author     Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-16 19:19:13 +0000
committer  Daniel Baumann <daniel.baumann@progress-linux.org>  2024-04-16 19:19:13 +0000
commit     ccd992355df7192993c666236047820244914598 (patch)
tree       f00fea65147227b7743083c6148396f74cd66935 /src/runtime
parent     Initial commit. (diff)
download   golang-1.21-ccd992355df7192993c666236047820244914598.tar.xz
           golang-1.21-ccd992355df7192993c666236047820244914598.zip
Adding upstream version 1.21.8. (upstream/1.21.8)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'src/runtime')
-rw-r--r--src/runtime/HACKING.md332
-rw-r--r--src/runtime/Makefile5
-rw-r--r--src/runtime/abi_test.go112
-rw-r--r--src/runtime/alg.go354
-rw-r--r--src/runtime/align_runtime_test.go51
-rw-r--r--src/runtime/align_test.go200
-rw-r--r--src/runtime/arena.go1003
-rw-r--r--src/runtime/arena_test.go529
-rw-r--r--src/runtime/asan.go67
-rw-r--r--src/runtime/asan/asan.go76
-rw-r--r--src/runtime/asan0.go23
-rw-r--r--src/runtime/asan_amd64.s91
-rw-r--r--src/runtime/asan_arm64.s76
-rw-r--r--src/runtime/asan_ppc64le.s87
-rw-r--r--src/runtime/asan_riscv64.s68
-rw-r--r--src/runtime/asm.s14
-rw-r--r--src/runtime/asm_386.s1653
-rw-r--r--src/runtime/asm_amd64.h25
-rw-r--r--src/runtime/asm_amd64.s2093
-rw-r--r--src/runtime/asm_arm.s1130
-rw-r--r--src/runtime/asm_arm64.s1573
-rw-r--r--src/runtime/asm_loong64.s844
-rw-r--r--src/runtime/asm_mips64x.s850
-rw-r--r--src/runtime/asm_mipsx.s928
-rw-r--r--src/runtime/asm_ppc64x.h55
-rw-r--r--src/runtime/asm_ppc64x.s1299
-rw-r--r--src/runtime/asm_riscv64.s935
-rw-r--r--src/runtime/asm_s390x.s950
-rw-r--r--src/runtime/asm_wasm.s525
-rw-r--r--src/runtime/atomic_arm64.s9
-rw-r--r--src/runtime/atomic_loong64.s9
-rw-r--r--src/runtime/atomic_mips64x.s13
-rw-r--r--src/runtime/atomic_mipsx.s11
-rw-r--r--src/runtime/atomic_pointer.go114
-rw-r--r--src/runtime/atomic_ppc64x.s14
-rw-r--r--src/runtime/atomic_riscv64.s10
-rw-r--r--src/runtime/auxv_none.go10
-rw-r--r--src/runtime/callers_test.go524
-rw-r--r--src/runtime/cgo.go63
-rw-r--r--src/runtime/cgo/abi_amd64.h99
-rw-r--r--src/runtime/cgo/abi_arm64.h43
-rw-r--r--src/runtime/cgo/abi_loong64.h60
-rw-r--r--src/runtime/cgo/abi_ppc64x.h195
-rw-r--r--src/runtime/cgo/asm_386.s42
-rw-r--r--src/runtime/cgo/asm_amd64.s47
-rw-r--r--src/runtime/cgo/asm_arm.s69
-rw-r--r--src/runtime/cgo/asm_arm64.s50
-rw-r--r--src/runtime/cgo/asm_loong64.s53
-rw-r--r--src/runtime/cgo/asm_mips64x.s95
-rw-r--r--src/runtime/cgo/asm_mipsx.s88
-rw-r--r--src/runtime/cgo/asm_ppc64x.s70
-rw-r--r--src/runtime/cgo/asm_riscv64.s91
-rw-r--r--src/runtime/cgo/asm_s390x.s68
-rw-r--r--src/runtime/cgo/asm_wasm.s11
-rw-r--r--src/runtime/cgo/callbacks.go152
-rw-r--r--src/runtime/cgo/callbacks_aix.go12
-rw-r--r--src/runtime/cgo/callbacks_traceback.go17
-rw-r--r--src/runtime/cgo/cgo.go40
-rw-r--r--src/runtime/cgo/dragonfly.go19
-rw-r--r--src/runtime/cgo/freebsd.go22
-rw-r--r--src/runtime/cgo/gcc_386.S42
-rw-r--r--src/runtime/cgo/gcc_aix_ppc64.S132
-rw-r--r--src/runtime/cgo/gcc_aix_ppc64.c35
-rw-r--r--src/runtime/cgo/gcc_amd64.S55
-rw-r--r--src/runtime/cgo/gcc_android.c90
-rw-r--r--src/runtime/cgo/gcc_arm.S31
-rw-r--r--src/runtime/cgo/gcc_arm64.S84
-rw-r--r--src/runtime/cgo/gcc_context.c20
-rw-r--r--src/runtime/cgo/gcc_darwin_amd64.c63
-rw-r--r--src/runtime/cgo/gcc_darwin_arm64.c142
-rw-r--r--src/runtime/cgo/gcc_dragonfly_amd64.c66
-rw-r--r--src/runtime/cgo/gcc_fatalf.c23
-rw-r--r--src/runtime/cgo/gcc_freebsd_386.c71
-rw-r--r--src/runtime/cgo/gcc_freebsd_amd64.c74
-rw-r--r--src/runtime/cgo/gcc_freebsd_arm.c77
-rw-r--r--src/runtime/cgo/gcc_freebsd_arm64.c68
-rw-r--r--src/runtime/cgo/gcc_freebsd_riscv64.c67
-rw-r--r--src/runtime/cgo/gcc_freebsd_sigaction.c80
-rw-r--r--src/runtime/cgo/gcc_libinit.c147
-rw-r--r--src/runtime/cgo/gcc_libinit_windows.c158
-rw-r--r--src/runtime/cgo/gcc_linux_386.c74
-rw-r--r--src/runtime/cgo/gcc_linux_amd64.c96
-rw-r--r--src/runtime/cgo/gcc_linux_arm.c69
-rw-r--r--src/runtime/cgo/gcc_linux_arm64.c91
-rw-r--r--src/runtime/cgo/gcc_linux_loong64.c69
-rw-r--r--src/runtime/cgo/gcc_linux_mips64x.c71
-rw-r--r--src/runtime/cgo/gcc_linux_mipsx.c72
-rw-r--r--src/runtime/cgo/gcc_linux_ppc64x.S86
-rw-r--r--src/runtime/cgo/gcc_linux_riscv64.c69
-rw-r--r--src/runtime/cgo/gcc_linux_s390x.c69
-rw-r--r--src/runtime/cgo/gcc_loong64.S67
-rw-r--r--src/runtime/cgo/gcc_mips64x.S89
-rw-r--r--src/runtime/cgo/gcc_mipsx.S77
-rw-r--r--src/runtime/cgo/gcc_mmap.c39
-rw-r--r--src/runtime/cgo/gcc_netbsd_386.c82
-rw-r--r--src/runtime/cgo/gcc_netbsd_amd64.c78
-rw-r--r--src/runtime/cgo/gcc_netbsd_arm.c79
-rw-r--r--src/runtime/cgo/gcc_netbsd_arm64.c80
-rw-r--r--src/runtime/cgo/gcc_openbsd_386.c70
-rw-r--r--src/runtime/cgo/gcc_openbsd_amd64.c65
-rw-r--r--src/runtime/cgo/gcc_openbsd_arm.c67
-rw-r--r--src/runtime/cgo/gcc_openbsd_arm64.c67
-rw-r--r--src/runtime/cgo/gcc_openbsd_mips64.c67
-rw-r--r--src/runtime/cgo/gcc_ppc64x.c73
-rw-r--r--src/runtime/cgo/gcc_riscv64.S82
-rw-r--r--src/runtime/cgo/gcc_s390x.S58
-rw-r--r--src/runtime/cgo/gcc_setenv.c27
-rw-r--r--src/runtime/cgo/gcc_sigaction.c82
-rw-r--r--src/runtime/cgo/gcc_signal2_ios_arm64.c11
-rw-r--r--src/runtime/cgo/gcc_signal_ios_arm64.c213
-rw-r--r--src/runtime/cgo/gcc_signal_ios_nolldb.c10
-rw-r--r--src/runtime/cgo/gcc_solaris_amd64.c77
-rw-r--r--src/runtime/cgo/gcc_stack_darwin.c20
-rw-r--r--src/runtime/cgo/gcc_stack_unix.c40
-rw-r--r--src/runtime/cgo/gcc_stack_windows.c7
-rw-r--r--src/runtime/cgo/gcc_traceback.c46
-rw-r--r--src/runtime/cgo/gcc_util.c69
-rw-r--r--src/runtime/cgo/gcc_windows_386.c51
-rw-r--r--src/runtime/cgo/gcc_windows_amd64.c51
-rw-r--r--src/runtime/cgo/gcc_windows_arm64.c40
-rw-r--r--src/runtime/cgo/handle.go144
-rw-r--r--src/runtime/cgo/handle_test.go103
-rw-r--r--src/runtime/cgo/iscgo.go17
-rw-r--r--src/runtime/cgo/libcgo.h156
-rw-r--r--src/runtime/cgo/libcgo_unix.h15
-rw-r--r--src/runtime/cgo/libcgo_windows.h6
-rw-r--r--src/runtime/cgo/linux.go74
-rw-r--r--src/runtime/cgo/linux_syscall.c85
-rw-r--r--src/runtime/cgo/mmap.go31
-rw-r--r--src/runtime/cgo/netbsd.go21
-rw-r--r--src/runtime/cgo/openbsd.go21
-rw-r--r--src/runtime/cgo/setenv.go21
-rw-r--r--src/runtime/cgo/sigaction.go22
-rw-r--r--src/runtime/cgo/signal_ios_arm64.go10
-rw-r--r--src/runtime/cgo/signal_ios_arm64.s56
-rw-r--r--src/runtime/cgo_mmap.go70
-rw-r--r--src/runtime/cgo_ppc64x.go13
-rw-r--r--src/runtime/cgo_sigaction.go94
-rw-r--r--src/runtime/cgocall.go727
-rw-r--r--src/runtime/cgocallback.go13
-rw-r--r--src/runtime/cgocheck.go292
-rw-r--r--src/runtime/chan.go851
-rw-r--r--src/runtime/chan_test.go1221
-rw-r--r--src/runtime/chanbarrier_test.go83
-rw-r--r--src/runtime/checkptr.go109
-rw-r--r--src/runtime/checkptr_test.go108
-rw-r--r--src/runtime/closure_test.go54
-rw-r--r--src/runtime/compiler.go12
-rw-r--r--src/runtime/complex.go61
-rw-r--r--src/runtime/complex_test.go67
-rw-r--r--src/runtime/conv_wasm_test.go128
-rw-r--r--src/runtime/coverage/apis.go184
-rw-r--r--src/runtime/coverage/dummy.s8
-rw-r--r--src/runtime/coverage/emit.go622
-rw-r--r--src/runtime/coverage/emitdata_test.go550
-rw-r--r--src/runtime/coverage/hooks.go42
-rw-r--r--src/runtime/coverage/testdata/harness.go259
-rw-r--r--src/runtime/coverage/testdata/issue56006/repro.go26
-rw-r--r--src/runtime/coverage/testdata/issue56006/repro_test.go8
-rw-r--r--src/runtime/coverage/testdata/issue59563/repro.go823
-rw-r--r--src/runtime/coverage/testdata/issue59563/repro_test.go14
-rw-r--r--src/runtime/coverage/testsupport.go323
-rw-r--r--src/runtime/coverage/ts_test.go207
-rw-r--r--src/runtime/covercounter.go26
-rw-r--r--src/runtime/covermeta.go72
-rw-r--r--src/runtime/cpuflags.go34
-rw-r--r--src/runtime/cpuflags_amd64.go24
-rw-r--r--src/runtime/cpuflags_arm64.go17
-rw-r--r--src/runtime/cpuprof.go241
-rw-r--r--src/runtime/cputicks.go11
-rw-r--r--src/runtime/crash_cgo_test.go872
-rw-r--r--src/runtime/crash_test.go882
-rw-r--r--src/runtime/crash_unix_test.go332
-rw-r--r--src/runtime/create_file_nounix.go14
-rw-r--r--src/runtime/create_file_unix.go14
-rw-r--r--src/runtime/debug.go115
-rw-r--r--src/runtime/debug/debug.s9
-rw-r--r--src/runtime/debug/garbage.go238
-rw-r--r--src/runtime/debug/garbage_test.go238
-rw-r--r--src/runtime/debug/heapdump_test.go95
-rw-r--r--src/runtime/debug/mod.go287
-rw-r--r--src/runtime/debug/mod_test.go75
-rw-r--r--src/runtime/debug/panic_test.go56
-rw-r--r--src/runtime/debug/stack.go30
-rw-r--r--src/runtime/debug/stack_test.go121
-rw-r--r--src/runtime/debug/stubs.go18
-rw-r--r--src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/5501685e611fa7642
-rw-r--r--src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/71634114e78567cf2
-rw-r--r--src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/c73dce23c1f2494c2
-rw-r--r--src/runtime/debug_test.go307
-rw-r--r--src/runtime/debugcall.go257
-rw-r--r--src/runtime/debuglog.go831
-rw-r--r--src/runtime/debuglog_off.go19
-rw-r--r--src/runtime/debuglog_on.go45
-rw-r--r--src/runtime/debuglog_test.go169
-rw-r--r--src/runtime/defer_test.go518
-rw-r--r--src/runtime/defs1_linux.go40
-rw-r--r--src/runtime/defs1_netbsd_386.go183
-rw-r--r--src/runtime/defs1_netbsd_amd64.go195
-rw-r--r--src/runtime/defs1_netbsd_arm.go188
-rw-r--r--src/runtime/defs1_netbsd_arm64.go203
-rw-r--r--src/runtime/defs1_solaris_amd64.go250
-rw-r--r--src/runtime/defs2_linux.go138
-rw-r--r--src/runtime/defs3_linux.go43
-rw-r--r--src/runtime/defs_aix.go172
-rw-r--r--src/runtime/defs_aix_ppc64.go212
-rw-r--r--src/runtime/defs_arm_linux.go124
-rw-r--r--src/runtime/defs_darwin.go165
-rw-r--r--src/runtime/defs_darwin_amd64.go373
-rw-r--r--src/runtime/defs_darwin_arm64.go240
-rw-r--r--src/runtime/defs_dragonfly.go132
-rw-r--r--src/runtime/defs_dragonfly_amd64.go211
-rw-r--r--src/runtime/defs_freebsd.go174
-rw-r--r--src/runtime/defs_freebsd_386.go270
-rw-r--r--src/runtime/defs_freebsd_amd64.go282
-rw-r--r--src/runtime/defs_freebsd_arm.go245
-rw-r--r--src/runtime/defs_freebsd_arm64.go265
-rw-r--r--src/runtime/defs_freebsd_riscv64.go266
-rw-r--r--src/runtime/defs_illumos_amd64.go14
-rw-r--r--src/runtime/defs_linux.go127
-rw-r--r--src/runtime/defs_linux_386.go253
-rw-r--r--src/runtime/defs_linux_amd64.go289
-rw-r--r--src/runtime/defs_linux_arm.go207
-rw-r--r--src/runtime/defs_linux_arm64.go211
-rw-r--r--src/runtime/defs_linux_loong64.go198
-rw-r--r--src/runtime/defs_linux_mips64x.go211
-rw-r--r--src/runtime/defs_linux_mipsx.go209
-rw-r--r--src/runtime/defs_linux_ppc64.go225
-rw-r--r--src/runtime/defs_linux_ppc64le.go225
-rw-r--r--src/runtime/defs_linux_riscv64.go235
-rw-r--r--src/runtime/defs_linux_s390x.go192
-rw-r--r--src/runtime/defs_netbsd.go133
-rw-r--r--src/runtime/defs_netbsd_386.go41
-rw-r--r--src/runtime/defs_netbsd_amd64.go48
-rw-r--r--src/runtime/defs_netbsd_arm.go39
-rw-r--r--src/runtime/defs_openbsd.go144
-rw-r--r--src/runtime/defs_openbsd_386.go180
-rw-r--r--src/runtime/defs_openbsd_amd64.go191
-rw-r--r--src/runtime/defs_openbsd_arm.go188
-rw-r--r--src/runtime/defs_openbsd_arm64.go171
-rw-r--r--src/runtime/defs_openbsd_mips64.go170
-rw-r--r--src/runtime/defs_plan9_386.go64
-rw-r--r--src/runtime/defs_plan9_amd64.go81
-rw-r--r--src/runtime/defs_plan9_arm.go66
-rw-r--r--src/runtime/defs_solaris.go162
-rw-r--r--src/runtime/defs_solaris_amd64.go48
-rw-r--r--src/runtime/defs_windows.go90
-rw-r--r--src/runtime/defs_windows_386.go81
-rw-r--r--src/runtime/defs_windows_amd64.go100
-rw-r--r--src/runtime/defs_windows_arm.go91
-rw-r--r--src/runtime/defs_windows_arm64.go89
-rw-r--r--src/runtime/duff_386.s779
-rw-r--r--src/runtime/duff_amd64.s427
-rw-r--r--src/runtime/duff_arm.s523
-rw-r--r--src/runtime/duff_arm64.s267
-rw-r--r--src/runtime/duff_loong64.s907
-rw-r--r--src/runtime/duff_mips64x.s909
-rw-r--r--src/runtime/duff_ppc64x.s397
-rw-r--r--src/runtime/duff_riscv64.s907
-rw-r--r--src/runtime/duff_s390x.s19
-rw-r--r--src/runtime/ehooks_test.go91
-rw-r--r--src/runtime/env_plan9.go126
-rw-r--r--src/runtime/env_posix.go70
-rw-r--r--src/runtime/env_test.go43
-rw-r--r--src/runtime/error.go330
-rw-r--r--src/runtime/example_test.go62
-rw-r--r--src/runtime/exithook.go69
-rw-r--r--src/runtime/export_aix_test.go7
-rw-r--r--src/runtime/export_arm_test.go9
-rw-r--r--src/runtime/export_darwin_test.go7
-rw-r--r--src/runtime/export_debug_amd64_test.go132
-rw-r--r--src/runtime/export_debug_arm64_test.go135
-rw-r--r--src/runtime/export_debug_test.go182
-rw-r--r--src/runtime/export_debuglog_test.go46
-rw-r--r--src/runtime/export_linux_test.go17
-rw-r--r--src/runtime/export_mmap_test.go21
-rw-r--r--src/runtime/export_pipe2_test.go11
-rw-r--r--src/runtime/export_pipe_test.go9
-rw-r--r--src/runtime/export_test.go1942
-rw-r--r--src/runtime/export_unix_test.go99
-rw-r--r--src/runtime/export_windows_test.go43
-rw-r--r--src/runtime/extern.go348
-rw-r--r--src/runtime/fastlog2.go27
-rw-r--r--src/runtime/fastlog2_test.go34
-rw-r--r--src/runtime/fastlog2table.go43
-rw-r--r--src/runtime/float.go54
-rw-r--r--src/runtime/float_test.go25
-rw-r--r--src/runtime/funcdata.h56
-rw-r--r--src/runtime/gc_test.go936
-rw-r--r--src/runtime/gcinfo_test.go207
-rw-r--r--src/runtime/go_tls.h17
-rw-r--r--src/runtime/hash32.go62
-rw-r--r--src/runtime/hash64.go92
-rw-r--r--src/runtime/hash_test.go786
-rw-r--r--src/runtime/heap_test.go21
-rw-r--r--src/runtime/heapdump.go752
-rw-r--r--src/runtime/histogram.go190
-rw-r--r--src/runtime/histogram_test.go112
-rw-r--r--src/runtime/iface.go534
-rw-r--r--src/runtime/iface_test.go439
-rw-r--r--src/runtime/import_test.go45
-rw-r--r--src/runtime/importx_test.go33
-rw-r--r--src/runtime/internal/atomic/atomic_386.go103
-rw-r--r--src/runtime/internal/atomic/atomic_386.s285
-rw-r--r--src/runtime/internal/atomic/atomic_amd64.go117
-rw-r--r--src/runtime/internal/atomic/atomic_amd64.s225
-rw-r--r--src/runtime/internal/atomic/atomic_arm.go244
-rw-r--r--src/runtime/internal/atomic/atomic_arm.s297
-rw-r--r--src/runtime/internal/atomic/atomic_arm64.go94
-rw-r--r--src/runtime/internal/atomic/atomic_arm64.s333
-rw-r--r--src/runtime/internal/atomic/atomic_loong64.go89
-rw-r--r--src/runtime/internal/atomic/atomic_loong64.s306
-rw-r--r--src/runtime/internal/atomic/atomic_mips64x.go89
-rw-r--r--src/runtime/internal/atomic/atomic_mips64x.s359
-rw-r--r--src/runtime/internal/atomic/atomic_mipsx.go167
-rw-r--r--src/runtime/internal/atomic/atomic_mipsx.s261
-rw-r--r--src/runtime/internal/atomic/atomic_ppc64x.go89
-rw-r--r--src/runtime/internal/atomic/atomic_ppc64x.s362
-rw-r--r--src/runtime/internal/atomic/atomic_riscv64.go85
-rw-r--r--src/runtime/internal/atomic/atomic_riscv64.s284
-rw-r--r--src/runtime/internal/atomic/atomic_s390x.go123
-rw-r--r--src/runtime/internal/atomic/atomic_s390x.s248
-rw-r--r--src/runtime/internal/atomic/atomic_test.go386
-rw-r--r--src/runtime/internal/atomic/atomic_wasm.go341
-rw-r--r--src/runtime/internal/atomic/atomic_wasm.s10
-rw-r--r--src/runtime/internal/atomic/bench_test.go195
-rw-r--r--src/runtime/internal/atomic/doc.go18
-rw-r--r--src/runtime/internal/atomic/stubs.go59
-rw-r--r--src/runtime/internal/atomic/sys_linux_arm.s134
-rw-r--r--src/runtime/internal/atomic/sys_nonlinux_arm.s79
-rw-r--r--src/runtime/internal/atomic/types.go587
-rw-r--r--src/runtime/internal/atomic/types_64bit.go33
-rw-r--r--src/runtime/internal/atomic/unaligned.go9
-rw-r--r--src/runtime/internal/math/math.go40
-rw-r--r--src/runtime/internal/math/math_test.go79
-rw-r--r--src/runtime/internal/startlinetest/func_amd64.go13
-rw-r--r--src/runtime/internal/startlinetest/func_amd64.s28
-rw-r--r--src/runtime/internal/sys/consts.go36
-rw-r--r--src/runtime/internal/sys/consts_norace.go9
-rw-r--r--src/runtime/internal/sys/consts_race.go9
-rw-r--r--src/runtime/internal/sys/intrinsics.go208
-rw-r--r--src/runtime/internal/sys/intrinsics_test.go38
-rw-r--r--src/runtime/internal/sys/nih.go41
-rw-r--r--src/runtime/internal/sys/sys.go7
-rw-r--r--src/runtime/internal/syscall/asm_linux_386.s34
-rw-r--r--src/runtime/internal/syscall/asm_linux_amd64.s47
-rw-r--r--src/runtime/internal/syscall/asm_linux_arm.s32
-rw-r--r--src/runtime/internal/syscall/asm_linux_arm64.s29
-rw-r--r--src/runtime/internal/syscall/asm_linux_loong64.s29
-rw-r--r--src/runtime/internal/syscall/asm_linux_mips64x.s30
-rw-r--r--src/runtime/internal/syscall/asm_linux_mipsx.s35
-rw-r--r--src/runtime/internal/syscall/asm_linux_ppc64x.s23
-rw-r--r--src/runtime/internal/syscall/asm_linux_riscv64.s43
-rw-r--r--src/runtime/internal/syscall/asm_linux_s390x.s28
-rw-r--r--src/runtime/internal/syscall/defs_linux_386.go29
-rw-r--r--src/runtime/internal/syscall/defs_linux_amd64.go29
-rw-r--r--src/runtime/internal/syscall/defs_linux_arm.go30
-rw-r--r--src/runtime/internal/syscall/defs_linux_arm64.go30
-rw-r--r--src/runtime/internal/syscall/defs_linux_loong64.go30
-rw-r--r--src/runtime/internal/syscall/defs_linux_mips64x.go32
-rw-r--r--src/runtime/internal/syscall/defs_linux_mipsx.go32
-rw-r--r--src/runtime/internal/syscall/defs_linux_ppc64x.go32
-rw-r--r--src/runtime/internal/syscall/defs_linux_riscv64.go30
-rw-r--r--src/runtime/internal/syscall/defs_linux_s390x.go30
-rw-r--r--src/runtime/internal/syscall/syscall_linux.go62
-rw-r--r--src/runtime/internal/syscall/syscall_linux_test.go19
-rw-r--r--src/runtime/internal/wasitest/host_test.go14
-rw-r--r--src/runtime/internal/wasitest/nonblock_test.go101
-rw-r--r--src/runtime/internal/wasitest/tcpecho_test.go99
-rw-r--r--src/runtime/internal/wasitest/testdata/nonblock.go65
-rw-r--r--src/runtime/internal/wasitest/testdata/tcpecho.go74
-rw-r--r--src/runtime/lfstack.go77
-rw-r--r--src/runtime/lfstack_test.go137
-rw-r--r--src/runtime/libfuzzer.go160
-rw-r--r--src/runtime/libfuzzer_amd64.s158
-rw-r--r--src/runtime/libfuzzer_arm64.s115
-rw-r--r--src/runtime/lock_futex.go246
-rw-r--r--src/runtime/lock_js.go309
-rw-r--r--src/runtime/lock_sema.go304
-rw-r--r--src/runtime/lock_wasip1.go107
-rw-r--r--src/runtime/lockrank.go209
-rw-r--r--src/runtime/lockrank_off.go68
-rw-r--r--src/runtime/lockrank_on.go389
-rw-r--r--src/runtime/lockrank_test.go29
-rw-r--r--src/runtime/malloc.go1636
-rw-r--r--src/runtime/malloc_test.go449
-rw-r--r--src/runtime/map.go1738
-rw-r--r--src/runtime/map_benchmark_test.go535
-rw-r--r--src/runtime/map_fast32.go462
-rw-r--r--src/runtime/map_fast64.go470
-rw-r--r--src/runtime/map_faststr.go485
-rw-r--r--src/runtime/map_test.go1258
-rw-r--r--src/runtime/mbarrier.go347
-rw-r--r--src/runtime/mbitmap.go1501
-rw-r--r--src/runtime/mcache.go331
-rw-r--r--src/runtime/mcentral.go257
-rw-r--r--src/runtime/mcheckmark.go104
-rw-r--r--src/runtime/mem.go156
-rw-r--r--src/runtime/mem_aix.go81
-rw-r--r--src/runtime/mem_bsd.go87
-rw-r--r--src/runtime/mem_darwin.go76
-rw-r--r--src/runtime/mem_js.go13
-rw-r--r--src/runtime/mem_linux.go181
-rw-r--r--src/runtime/mem_plan9.go21
-rw-r--r--src/runtime/mem_sbrk.go189
-rw-r--r--src/runtime/mem_wasip1.go13
-rw-r--r--src/runtime/mem_wasm.go20
-rw-r--r--src/runtime/mem_windows.go134
-rw-r--r--src/runtime/memclr_386.s137
-rw-r--r--src/runtime/memclr_amd64.s218
-rw-r--r--src/runtime/memclr_arm.s91
-rw-r--r--src/runtime/memclr_arm64.s182
-rw-r--r--src/runtime/memclr_loong64.s42
-rw-r--r--src/runtime/memclr_mips64x.s99
-rw-r--r--src/runtime/memclr_mipsx.s73
-rw-r--r--src/runtime/memclr_plan9_386.s58
-rw-r--r--src/runtime/memclr_plan9_amd64.s23
-rw-r--r--src/runtime/memclr_ppc64x.s190
-rw-r--r--src/runtime/memclr_riscv64.s104
-rw-r--r--src/runtime/memclr_s390x.s124
-rw-r--r--src/runtime/memclr_wasm.s20
-rw-r--r--src/runtime/memmove_386.s204
-rw-r--r--src/runtime/memmove_amd64.s532
-rw-r--r--src/runtime/memmove_arm.s264
-rw-r--r--src/runtime/memmove_arm64.s238
-rw-r--r--src/runtime/memmove_linux_amd64_test.go56
-rw-r--r--src/runtime/memmove_loong64.s107
-rw-r--r--src/runtime/memmove_mips64x.s107
-rw-r--r--src/runtime/memmove_mipsx.s260
-rw-r--r--src/runtime/memmove_plan9_386.s137
-rw-r--r--src/runtime/memmove_plan9_amd64.s135
-rw-r--r--src/runtime/memmove_ppc64x.s220
-rw-r--r--src/runtime/memmove_riscv64.s319
-rw-r--r--src/runtime/memmove_s390x.s191
-rw-r--r--src/runtime/memmove_test.go1124
-rw-r--r--src/runtime/memmove_wasm.s22
-rw-r--r--src/runtime/metrics.go855
-rw-r--r--src/runtime/metrics/description.go453
-rw-r--r--src/runtime/metrics/description_test.go158
-rw-r--r--src/runtime/metrics/doc.go400
-rw-r--r--src/runtime/metrics/example_test.go96
-rw-r--r--src/runtime/metrics/histogram.go33
-rw-r--r--src/runtime/metrics/sample.go47
-rw-r--r--src/runtime/metrics/value.go69
-rw-r--r--src/runtime/metrics_test.go763
-rw-r--r--src/runtime/mfinal.go525
-rw-r--r--src/runtime/mfinal_test.go250
-rw-r--r--src/runtime/mfixalloc.go111
-rw-r--r--src/runtime/mgc.go1818
-rw-r--r--src/runtime/mgclimit.go484
-rw-r--r--src/runtime/mgclimit_test.go255
-rw-r--r--src/runtime/mgcmark.go1598
-rw-r--r--src/runtime/mgcpacer.go1444
-rw-r--r--src/runtime/mgcpacer_test.go1097
-rw-r--r--src/runtime/mgcscavenge.go1463
-rw-r--r--src/runtime/mgcscavenge_test.go884
-rw-r--r--src/runtime/mgcstack.go350
-rw-r--r--src/runtime/mgcsweep.go982
-rw-r--r--src/runtime/mgcwork.go489
-rw-r--r--src/runtime/mheap.go2260
-rw-r--r--src/runtime/minmax.go72
-rw-r--r--src/runtime/minmax_test.go129
-rw-r--r--src/runtime/mkduff.go286
-rw-r--r--src/runtime/mkfastlog2table.go109
-rw-r--r--src/runtime/mklockrank.go401
-rw-r--r--src/runtime/mkpreempt.go630
-rw-r--r--src/runtime/mksizeclasses.go357
-rw-r--r--src/runtime/mmap.go19
-rw-r--r--src/runtime/mpagealloc.go1081
-rw-r--r--src/runtime/mpagealloc_32bit.go140
-rw-r--r--src/runtime/mpagealloc_64bit.go258
-rw-r--r--src/runtime/mpagealloc_test.go1040
-rw-r--r--src/runtime/mpagecache.go183
-rw-r--r--src/runtime/mpagecache_test.go424
-rw-r--r--src/runtime/mpallocbits.go446
-rw-r--r--src/runtime/mpallocbits_test.go551
-rw-r--r--src/runtime/mprof.go1278
-rw-r--r--src/runtime/mranges.go460
-rw-r--r--src/runtime/mranges_test.go275
-rw-r--r--src/runtime/msan.go62
-rw-r--r--src/runtime/msan/msan.go32
-rw-r--r--src/runtime/msan0.go23
-rw-r--r--src/runtime/msan_amd64.s89
-rw-r--r--src/runtime/msan_arm64.s73
-rw-r--r--src/runtime/msize.go25
-rw-r--r--src/runtime/mspanset.go404
-rw-r--r--src/runtime/mstats.go983
-rw-r--r--src/runtime/mwbbuf.go270
-rw-r--r--src/runtime/nbpipe_pipe.go19
-rw-r--r--src/runtime/nbpipe_pipe2.go11
-rw-r--r--src/runtime/nbpipe_pipe_test.go38
-rw-r--r--src/runtime/nbpipe_test.go74
-rw-r--r--src/runtime/net_plan9.go29
-rw-r--r--src/runtime/netpoll.go694
-rw-r--r--src/runtime/netpoll_aix.go229
-rw-r--r--src/runtime/netpoll_epoll.go172
-rw-r--r--src/runtime/netpoll_fake.go35
-rw-r--r--src/runtime/netpoll_kqueue.go215
-rw-r--r--src/runtime/netpoll_os_test.go28
-rw-r--r--src/runtime/netpoll_solaris.go332
-rw-r--r--src/runtime/netpoll_stub.go61
-rw-r--r--src/runtime/netpoll_wasip1.go254
-rw-r--r--src/runtime/netpoll_windows.go160
-rw-r--r--src/runtime/nonwindows_stub.go21
-rw-r--r--src/runtime/norace_linux_test.go43
-rw-r--r--src/runtime/norace_test.go47
-rw-r--r--src/runtime/numcpu_freebsd_test.go15
-rw-r--r--src/runtime/os2_aix.go763
-rw-r--r--src/runtime/os2_freebsd.go14
-rw-r--r--src/runtime/os2_openbsd.go14
-rw-r--r--src/runtime/os2_plan9.go74
-rw-r--r--src/runtime/os2_solaris.go13
-rw-r--r--src/runtime/os3_plan9.go166
-rw-r--r--src/runtime/os3_solaris.go637
-rw-r--r--src/runtime/os_aix.go415
-rw-r--r--src/runtime/os_android.go15
-rw-r--r--src/runtime/os_darwin.go476
-rw-r--r--src/runtime/os_darwin_arm64.go12
-rw-r--r--src/runtime/os_dragonfly.go344
-rw-r--r--src/runtime/os_freebsd.go482
-rw-r--r--src/runtime/os_freebsd2.go22
-rw-r--r--src/runtime/os_freebsd_amd64.go26
-rw-r--r--src/runtime/os_freebsd_arm.go48
-rw-r--r--src/runtime/os_freebsd_arm64.go12
-rw-r--r--src/runtime/os_freebsd_noauxv.go10
-rw-r--r--src/runtime/os_freebsd_riscv64.go7
-rw-r--r--src/runtime/os_illumos.go132
-rw-r--r--src/runtime/os_js.go37
-rw-r--r--src/runtime/os_linux.go918
-rw-r--r--src/runtime/os_linux_arm.go51
-rw-r--r--src/runtime/os_linux_arm64.go25
-rw-r--r--src/runtime/os_linux_be64.go42
-rw-r--r--src/runtime/os_linux_generic.go37
-rw-r--r--src/runtime/os_linux_loong64.go11
-rw-r--r--src/runtime/os_linux_mips64x.go52
-rw-r--r--src/runtime/os_linux_mipsx.go46
-rw-r--r--src/runtime/os_linux_noauxv.go10
-rw-r--r--src/runtime/os_linux_novdso.go10
-rw-r--r--src/runtime/os_linux_ppc64x.go23
-rw-r--r--src/runtime/os_linux_riscv64.go7
-rw-r--r--src/runtime/os_linux_s390x.go16
-rw-r--r--src/runtime/os_linux_x86.go9
-rw-r--r--src/runtime/os_netbsd.go450
-rw-r--r--src/runtime/os_netbsd_386.go19
-rw-r--r--src/runtime/os_netbsd_amd64.go19
-rw-r--r--src/runtime/os_netbsd_arm.go37
-rw-r--r--src/runtime/os_netbsd_arm64.go26
-rw-r--r--src/runtime/os_nonopenbsd.go17
-rw-r--r--src/runtime/os_only_solaris.go18
-rw-r--r--src/runtime/os_openbsd.go314
-rw-r--r--src/runtime/os_openbsd_arm.go23
-rw-r--r--src/runtime/os_openbsd_arm64.go12
-rw-r--r--src/runtime/os_openbsd_libc.go60
-rw-r--r--src/runtime/os_openbsd_mips64.go12
-rw-r--r--src/runtime/os_openbsd_syscall.go51
-rw-r--r--src/runtime/os_openbsd_syscall1.go20
-rw-r--r--src/runtime/os_openbsd_syscall2.go102
-rw-r--r--src/runtime/os_plan9.go547
-rw-r--r--src/runtime/os_plan9_arm.go16
-rw-r--r--src/runtime/os_solaris.go273
-rw-r--r--src/runtime/os_unix.go19
-rw-r--r--src/runtime/os_unix_nonlinux.go15
-rw-r--r--src/runtime/os_wasip1.go259
-rw-r--r--src/runtime/os_wasm.go153
-rw-r--r--src/runtime/os_windows.go1438
-rw-r--r--src/runtime/os_windows_arm.go22
-rw-r--r--src/runtime/os_windows_arm64.go14
-rw-r--r--src/runtime/pagetrace_off.go28
-rw-r--r--src/runtime/pagetrace_on.go358
-rw-r--r--src/runtime/panic.go1419
-rw-r--r--src/runtime/panic32.go105
-rw-r--r--src/runtime/panic_test.go48
-rw-r--r--src/runtime/panicnil_test.go54
-rw-r--r--src/runtime/pinner.go377
-rw-r--r--src/runtime/pinner_test.go524
-rw-r--r--src/runtime/plugin.go137
-rw-r--r--src/runtime/pprof/elf.go109
-rw-r--r--src/runtime/pprof/label.go108
-rw-r--r--src/runtime/pprof/label_test.go114
-rw-r--r--src/runtime/pprof/map.go90
-rw-r--r--src/runtime/pprof/mprof_test.go176
-rw-r--r--src/runtime/pprof/pe.go19
-rw-r--r--src/runtime/pprof/pprof.go910
-rw-r--r--src/runtime/pprof/pprof_norusage.go15
-rw-r--r--src/runtime/pprof/pprof_rusage.go35
-rw-r--r--src/runtime/pprof/pprof_test.go2337
-rw-r--r--src/runtime/pprof/pprof_windows.go22
-rw-r--r--src/runtime/pprof/proto.go762
-rw-r--r--src/runtime/pprof/proto_other.go30
-rw-r--r--src/runtime/pprof/proto_test.go470
-rw-r--r--src/runtime/pprof/proto_windows.go73
-rw-r--r--src/runtime/pprof/protobuf.go141
-rw-r--r--src/runtime/pprof/protomem.go93
-rw-r--r--src/runtime/pprof/protomem_test.go146
-rw-r--r--src/runtime/pprof/runtime.go52
-rw-r--r--src/runtime/pprof/runtime_test.go96
-rw-r--r--src/runtime/pprof/rusage_test.go41
-rw-r--r--src/runtime/pprof/testdata/README9
-rw-r--r--src/runtime/pprof/testdata/mappingtest/main.go108
-rw-r--r--src/runtime/pprof/testdata/test32bin0 -> 528 bytes
-rw-r--r--src/runtime/pprof/testdata/test32bebin0 -> 520 bytes
-rw-r--r--src/runtime/pprof/testdata/test64bin0 -> 760 bytes
-rw-r--r--src/runtime/pprof/testdata/test64bebin0 -> 856 bytes
-rw-r--r--src/runtime/preempt.go447
-rw-r--r--src/runtime/preempt_386.s47
-rw-r--r--src/runtime/preempt_amd64.s87
-rw-r--r--src/runtime/preempt_arm.s83
-rw-r--r--src/runtime/preempt_arm64.s85
-rw-r--r--src/runtime/preempt_loong64.s133
-rw-r--r--src/runtime/preempt_mips64x.s145
-rw-r--r--src/runtime/preempt_mipsx.s145
-rw-r--r--src/runtime/preempt_nonwindows.go13
-rw-r--r--src/runtime/preempt_ppc64x.s147
-rw-r--r--src/runtime/preempt_riscv64.s127
-rw-r--r--src/runtime/preempt_s390x.s51
-rw-r--r--src/runtime/preempt_wasm.s8
-rw-r--r--src/runtime/print.go301
-rw-r--r--src/runtime/proc.go6757
-rw-r--r--src/runtime/proc_runtime_test.go50
-rw-r--r--src/runtime/proc_test.go1160
-rw-r--r--src/runtime/profbuf.go561
-rw-r--r--src/runtime/profbuf_test.go182
-rw-r--r--src/runtime/proflabel.go40
-rw-r--r--src/runtime/race.go654
-rw-r--r--src/runtime/race/README17
-rw-r--r--src/runtime/race/doc.go11
-rw-r--r--src/runtime/race/internal/amd64v1/doc.go10
-rw-r--r--src/runtime/race/internal/amd64v1/race_darwin.sysobin0 -> 541464 bytes
-rw-r--r--src/runtime/race/internal/amd64v1/race_freebsd.sysobin0 -> 712464 bytes
-rw-r--r--src/runtime/race/internal/amd64v1/race_linux.sysobin0 -> 563848 bytes
-rw-r--r--src/runtime/race/internal/amd64v1/race_netbsd.sysobin0 -> 714520 bytes
-rw-r--r--src/runtime/race/internal/amd64v1/race_openbsd.sysobin0 -> 688784 bytes
-rw-r--r--src/runtime/race/internal/amd64v1/race_windows.sysobin0 -> 550036 bytes
-rw-r--r--src/runtime/race/internal/amd64v3/doc.go10
-rw-r--r--src/runtime/race/internal/amd64v3/race_linux.sysobin0 -> 563664 bytes
-rwxr-xr-xsrc/runtime/race/mkcgo.sh20
-rw-r--r--src/runtime/race/output_test.go480
-rw-r--r--src/runtime/race/race.go20
-rw-r--r--src/runtime/race/race_darwin_amd64.go101
-rw-r--r--src/runtime/race/race_darwin_arm64.go95
-rw-r--r--src/runtime/race/race_darwin_arm64.sysobin0 -> 484988 bytes
-rw-r--r--src/runtime/race/race_linux_arm64.sysobin0 -> 530736 bytes
-rw-r--r--src/runtime/race/race_linux_ppc64le.sysobin0 -> 669736 bytes
-rw-r--r--src/runtime/race/race_linux_s390x.sysobin0 -> 565472 bytes
-rw-r--r--src/runtime/race/race_linux_test.go65
-rw-r--r--src/runtime/race/race_test.go250
-rw-r--r--src/runtime/race/race_unix_test.go29
-rw-r--r--src/runtime/race/race_v1_amd64.go9
-rw-r--r--src/runtime/race/race_v3_amd64.go9
-rw-r--r--src/runtime/race/race_windows_test.go46
-rw-r--r--src/runtime/race/sched_test.go48
-rw-r--r--src/runtime/race/syso_test.go33
-rw-r--r--src/runtime/race/testdata/atomic_test.go325
-rw-r--r--src/runtime/race/testdata/cgo_test.go21
-rw-r--r--src/runtime/race/testdata/cgo_test_main.go30
-rw-r--r--src/runtime/race/testdata/chan_test.go787
-rw-r--r--src/runtime/race/testdata/comp_test.go186
-rw-r--r--src/runtime/race/testdata/finalizer_test.go68
-rw-r--r--src/runtime/race/testdata/io_test.go75
-rw-r--r--src/runtime/race/testdata/issue12225_test.go20
-rw-r--r--src/runtime/race/testdata/issue12664_test.go76
-rw-r--r--src/runtime/race/testdata/issue13264_test.go13
-rw-r--r--src/runtime/race/testdata/map_test.go335
-rw-r--r--src/runtime/race/testdata/mop_test.go2132
-rw-r--r--src/runtime/race/testdata/mutex_test.go150
-rw-r--r--src/runtime/race/testdata/pool_test.go47
-rw-r--r--src/runtime/race/testdata/reflect_test.go46
-rw-r--r--src/runtime/race/testdata/regression_test.go189
-rw-r--r--src/runtime/race/testdata/rwmutex_test.go154
-rw-r--r--src/runtime/race/testdata/select_test.go293
-rw-r--r--src/runtime/race/testdata/slice_test.go608
-rw-r--r--src/runtime/race/testdata/sync_test.go202
-rw-r--r--src/runtime/race/testdata/waitgroup_test.go360
-rw-r--r--src/runtime/race/timer_test.go33
-rw-r--r--src/runtime/race0.go44
-rw-r--r--src/runtime/race_amd64.s457
-rw-r--r--src/runtime/race_arm64.s498
-rw-r--r--src/runtime/race_ppc64le.s505
-rw-r--r--src/runtime/race_s390x.s391
-rw-r--r--src/runtime/rand_test.go53
-rw-r--r--src/runtime/rdebug.go22
-rw-r--r--src/runtime/retry.go23
-rw-r--r--src/runtime/rt0_aix_ppc64.s190
-rw-r--r--src/runtime/rt0_android_386.s27
-rw-r--r--src/runtime/rt0_android_amd64.s22
-rw-r--r--src/runtime/rt0_android_arm.s25
-rw-r--r--src/runtime/rt0_android_arm64.s26
-rw-r--r--src/runtime/rt0_darwin_amd64.s13
-rw-r--r--src/runtime/rt0_darwin_arm64.s63
-rw-r--r--src/runtime/rt0_dragonfly_amd64.s14
-rw-r--r--src/runtime/rt0_freebsd_386.s17
-rw-r--r--src/runtime/rt0_freebsd_amd64.s14
-rw-r--r--src/runtime/rt0_freebsd_arm.s11
-rw-r--r--src/runtime/rt0_freebsd_arm64.s74
-rw-r--r--src/runtime/rt0_freebsd_riscv64.s112
-rw-r--r--src/runtime/rt0_illumos_amd64.s11
-rw-r--r--src/runtime/rt0_ios_amd64.s14
-rw-r--r--src/runtime/rt0_ios_arm64.s14
-rw-r--r--src/runtime/rt0_js_wasm.s67
-rw-r--r--src/runtime/rt0_linux_386.s17
-rw-r--r--src/runtime/rt0_linux_amd64.s11
-rw-r--r--src/runtime/rt0_linux_arm.s33
-rw-r--r--src/runtime/rt0_linux_arm64.s73
-rw-r--r--src/runtime/rt0_linux_loong64.s72
-rw-r--r--src/runtime/rt0_linux_mips64x.s38
-rw-r--r--src/runtime/rt0_linux_mipsx.s27
-rw-r--r--src/runtime/rt0_linux_ppc64.s28
-rw-r--r--src/runtime/rt0_linux_ppc64le.s101
-rw-r--r--src/runtime/rt0_linux_riscv64.s112
-rw-r--r--src/runtime/rt0_linux_s390x.s23
-rw-r--r--src/runtime/rt0_netbsd_386.s17
-rw-r--r--src/runtime/rt0_netbsd_amd64.s11
-rw-r--r--src/runtime/rt0_netbsd_arm.s11
-rw-r--r--src/runtime/rt0_netbsd_arm64.s71
-rw-r--r--src/runtime/rt0_openbsd_386.s17
-rw-r--r--src/runtime/rt0_openbsd_amd64.s11
-rw-r--r--src/runtime/rt0_openbsd_arm.s11
-rw-r--r--src/runtime/rt0_openbsd_arm64.s79
-rw-r--r--src/runtime/rt0_openbsd_mips64.s36
-rw-r--r--src/runtime/rt0_plan9_386.s21
-rw-r--r--src/runtime/rt0_plan9_amd64.s19
-rw-r--r--src/runtime/rt0_plan9_arm.s15
-rw-r--r--src/runtime/rt0_solaris_amd64.s11
-rw-r--r--src/runtime/rt0_wasip1_wasm.s16
-rw-r--r--src/runtime/rt0_windows_386.s47
-rw-r--r--src/runtime/rt0_windows_amd64.s36
-rw-r--r--src/runtime/rt0_windows_arm.s12
-rw-r--r--src/runtime/rt0_windows_arm64.s29
-rw-r--r--src/runtime/runtime-gdb.py612
-rw-r--r--src/runtime/runtime-gdb_test.go793
-rw-r--r--src/runtime/runtime-gdb_unix_test.go212
-rw-r--r--src/runtime/runtime-lldb_test.go185
-rw-r--r--src/runtime/runtime-seh_windows_test.go191
-rw-r--r--src/runtime/runtime.go167
-rw-r--r--src/runtime/runtime1.go657
-rw-r--r--src/runtime/runtime2.go1193
-rw-r--r--src/runtime/runtime_boring.go19
-rw-r--r--src/runtime/runtime_linux_test.go65
-rw-r--r--src/runtime/runtime_mmap_test.go53
-rw-r--r--src/runtime/runtime_test.go543
-rw-r--r--src/runtime/runtime_unix_test.go56
-rw-r--r--src/runtime/rwmutex.go167
-rw-r--r--src/runtime/rwmutex_test.go195
-rw-r--r--src/runtime/security_aix.go17
-rw-r--r--src/runtime/security_issetugid.go19
-rw-r--r--src/runtime/security_linux.go15
-rw-r--r--src/runtime/security_nonunix.go13
-rw-r--r--src/runtime/security_test.go145
-rw-r--r--src/runtime/security_unix.go72
-rw-r--r--src/runtime/select.go632
-rw-r--r--src/runtime/sema.go633
-rw-r--r--src/runtime/sema_test.go170
-rw-r--r--src/runtime/semasleep_test.go121
-rw-r--r--src/runtime/sigaction.go16
-rw-r--r--src/runtime/signal_386.go59
-rw-r--r--src/runtime/signal_aix_ppc64.go85
-rw-r--r--src/runtime/signal_amd64.go87
-rw-r--r--src/runtime/signal_arm.go81
-rw-r--r--src/runtime/signal_arm64.go107
-rw-r--r--src/runtime/signal_darwin.go40
-rw-r--r--src/runtime/signal_darwin_amd64.go96
-rw-r--r--src/runtime/signal_darwin_arm64.go90
-rw-r--r--src/runtime/signal_dragonfly.go41
-rw-r--r--src/runtime/signal_dragonfly_amd64.go51
-rw-r--r--src/runtime/signal_freebsd.go41
-rw-r--r--src/runtime/signal_freebsd_386.go41
-rw-r--r--src/runtime/signal_freebsd_amd64.go51
-rw-r--r--src/runtime/signal_freebsd_arm.go55
-rw-r--r--src/runtime/signal_freebsd_arm64.go66
-rw-r--r--src/runtime/signal_freebsd_riscv64.go63
-rw-r--r--src/runtime/signal_linux_386.go46
-rw-r--r--src/runtime/signal_linux_amd64.go56
-rw-r--r--src/runtime/signal_linux_arm.go58
-rw-r--r--src/runtime/signal_linux_arm64.go71
-rw-r--r--src/runtime/signal_linux_loong64.go75
-rw-r--r--src/runtime/signal_linux_mips64x.go77
-rw-r--r--src/runtime/signal_linux_mipsx.go64
-rw-r--r--src/runtime/signal_linux_ppc64x.go81
-rw-r--r--src/runtime/signal_linux_riscv64.go68
-rw-r--r--src/runtime/signal_linux_s390x.go127
-rw-r--r--src/runtime/signal_loong64.go96
-rw-r--r--src/runtime/signal_mips64x.go100
-rw-r--r--src/runtime/signal_mipsx.go95
-rw-r--r--src/runtime/signal_netbsd.go41
-rw-r--r--src/runtime/signal_netbsd_386.go45
-rw-r--r--src/runtime/signal_netbsd_amd64.go55
-rw-r--r--src/runtime/signal_netbsd_arm.go55
-rw-r--r--src/runtime/signal_netbsd_arm64.go73
-rw-r--r--src/runtime/signal_openbsd.go41
-rw-r--r--src/runtime/signal_openbsd_386.go47
-rw-r--r--src/runtime/signal_openbsd_amd64.go55
-rw-r--r--src/runtime/signal_openbsd_arm.go59
-rw-r--r--src/runtime/signal_openbsd_arm64.go75
-rw-r--r--src/runtime/signal_openbsd_mips64.go78
-rw-r--r--src/runtime/signal_plan9.go57
-rw-r--r--src/runtime/signal_ppc64x.go111
-rw-r--r--src/runtime/signal_riscv64.go94
-rw-r--r--src/runtime/signal_solaris.go83
-rw-r--r--src/runtime/signal_solaris_amd64.go53
-rw-r--r--src/runtime/signal_unix.go1371
-rw-r--r--src/runtime/signal_windows.go445
-rw-r--r--src/runtime/signal_windows_test.go315
-rw-r--r--src/runtime/sigqueue.go275
-rw-r--r--src/runtime/sigqueue_note.go24
-rw-r--r--src/runtime/sigqueue_plan9.go161
-rw-r--r--src/runtime/sigtab_aix.go264
-rw-r--r--src/runtime/sigtab_linux_generic.go75
-rw-r--r--src/runtime/sigtab_linux_mipsx.go139
-rw-r--r--src/runtime/sizeclasses.go98
-rw-r--r--src/runtime/sizeof_test.go38
-rw-r--r--src/runtime/slice.go355
-rw-r--r--src/runtime/slice_test.go501
-rw-r--r--src/runtime/softfloat64.go627
-rw-r--r--src/runtime/softfloat64_test.go198
-rw-r--r--src/runtime/stack.go1347
-rw-r--r--src/runtime/stack_test.go958
-rw-r--r--src/runtime/start_line_amd64_test.go23
-rw-r--r--src/runtime/start_line_test.go138
-rw-r--r--src/runtime/stkframe.go289
-rw-r--r--src/runtime/string.go588
-rw-r--r--src/runtime/string_test.go606
-rw-r--r--src/runtime/stubs.go499
-rw-r--r--src/runtime/stubs2.go44
-rw-r--r--src/runtime/stubs3.go10
-rw-r--r--src/runtime/stubs_386.go24
-rw-r--r--src/runtime/stubs_amd64.go53
-rw-r--r--src/runtime/stubs_arm.go29
-rw-r--r--src/runtime/stubs_arm64.go27
-rw-r--r--src/runtime/stubs_linux.go20
-rw-r--r--src/runtime/stubs_loong64.go15
-rw-r--r--src/runtime/stubs_mips64x.go20
-rw-r--r--src/runtime/stubs_mipsx.go15
-rw-r--r--src/runtime/stubs_nonlinux.go12
-rw-r--r--src/runtime/stubs_ppc64.go12
-rw-r--r--src/runtime/stubs_ppc64x.go21
-rw-r--r--src/runtime/stubs_riscv64.go20
-rw-r--r--src/runtime/stubs_s390x.go13
-rw-r--r--src/runtime/symtab.go1125
-rw-r--r--src/runtime/symtab_test.go285
-rw-r--r--src/runtime/symtabinl.go116
-rw-r--r--src/runtime/symtabinl_test.go122
-rw-r--r--src/runtime/sys_aix_ppc64.s318
-rw-r--r--src/runtime/sys_arm.go21
-rw-r--r--src/runtime/sys_arm64.go18
-rw-r--r--src/runtime/sys_darwin.go603
-rw-r--r--src/runtime/sys_darwin_amd64.s798
-rw-r--r--src/runtime/sys_darwin_arm64.go65
-rw-r--r--src/runtime/sys_darwin_arm64.s769
-rw-r--r--src/runtime/sys_dragonfly_amd64.s412
-rw-r--r--src/runtime/sys_freebsd_386.s482
-rw-r--r--src/runtime/sys_freebsd_amd64.s588
-rw-r--r--src/runtime/sys_freebsd_arm.s456
-rw-r--r--src/runtime/sys_freebsd_arm64.s476
-rw-r--r--src/runtime/sys_freebsd_riscv64.s448
-rw-r--r--src/runtime/sys_libc.go54
-rw-r--r--src/runtime/sys_linux_386.s772
-rw-r--r--src/runtime/sys_linux_amd64.s706
-rw-r--r--src/runtime/sys_linux_arm.s652
-rw-r--r--src/runtime/sys_linux_arm64.s787
-rw-r--r--src/runtime/sys_linux_loong64.s630
-rw-r--r--src/runtime/sys_linux_mips64x.s588
-rw-r--r--src/runtime/sys_linux_mipsx.s507
-rw-r--r--src/runtime/sys_linux_ppc64x.s759
-rw-r--r--src/runtime/sys_linux_riscv64.s584
-rw-r--r--src/runtime/sys_linux_s390x.s606
-rw-r--r--src/runtime/sys_loong64.go20
-rw-r--r--src/runtime/sys_mips64x.go20
-rw-r--r--src/runtime/sys_mipsx.go20
-rw-r--r--src/runtime/sys_netbsd_386.s477
-rw-r--r--src/runtime/sys_netbsd_amd64.s458
-rw-r--r--src/runtime/sys_netbsd_arm.s427
-rw-r--r--src/runtime/sys_netbsd_arm64.s435
-rw-r--r--src/runtime/sys_nonppc64x.go10
-rw-r--r--src/runtime/sys_openbsd.go75
-rw-r--r--src/runtime/sys_openbsd1.go46
-rw-r--r--src/runtime/sys_openbsd2.go303
-rw-r--r--src/runtime/sys_openbsd3.go116
-rw-r--r--src/runtime/sys_openbsd_386.s990
-rw-r--r--src/runtime/sys_openbsd_amd64.s666
-rw-r--r--src/runtime/sys_openbsd_arm.s827
-rw-r--r--src/runtime/sys_openbsd_arm64.s652
-rw-r--r--src/runtime/sys_openbsd_mips64.s388
-rw-r--r--src/runtime/sys_plan9_386.s256
-rw-r--r--src/runtime/sys_plan9_amd64.s257
-rw-r--r--src/runtime/sys_plan9_arm.s320
-rw-r--r--src/runtime/sys_ppc64x.go22
-rw-r--r--src/runtime/sys_riscv64.go18
-rw-r--r--src/runtime/sys_s390x.go18
-rw-r--r--src/runtime/sys_solaris_amd64.s304
-rw-r--r--src/runtime/sys_wasm.go36
-rw-r--r--src/runtime/sys_wasm.s94
-rw-r--r--src/runtime/sys_windows_386.s303
-rw-r--r--src/runtime/sys_windows_amd64.s319
-rw-r--r--src/runtime/sys_windows_arm.s320
-rw-r--r--src/runtime/sys_windows_arm64.s288
-rw-r--r--src/runtime/sys_x86.go23
-rw-r--r--src/runtime/syscall2_solaris.go45
-rw-r--r--src/runtime/syscall_aix.go238
-rw-r--r--src/runtime/syscall_solaris.go330
-rw-r--r--src/runtime/syscall_unix_test.go25
-rw-r--r--src/runtime/syscall_windows.go546
-rw-r--r--src/runtime/syscall_windows_test.go1349
-rw-r--r--src/runtime/tagptr.go14
-rw-r--r--src/runtime/tagptr_32bit.go30
-rw-r--r--src/runtime/tagptr_64bit.go89
-rw-r--r--src/runtime/test_amd64.go7
-rw-r--r--src/runtime/test_amd64.s7
-rw-r--r--src/runtime/test_stubs.go9
-rw-r--r--src/runtime/testdata/testexithooks/testexithooks.go85
-rw-r--r--src/runtime/testdata/testfaketime/faketime.go28
-rw-r--r--src/runtime/testdata/testprog/abort.go23
-rw-r--r--src/runtime/testdata/testprog/badtraceback.go50
-rw-r--r--src/runtime/testdata/testprog/checkptr.go119
-rw-r--r--src/runtime/testdata/testprog/crash.go139
-rw-r--r--src/runtime/testdata/testprog/crashdump.go47
-rw-r--r--src/runtime/testdata/testprog/deadlock.go363
-rw-r--r--src/runtime/testdata/testprog/framepointer.go44
-rw-r--r--src/runtime/testdata/testprog/framepointer_amd64.s9
-rw-r--r--src/runtime/testdata/testprog/framepointer_arm64.s9
-rw-r--r--src/runtime/testdata/testprog/gc.go420
-rw-r--r--src/runtime/testdata/testprog/lockosthread.go246
-rw-r--r--src/runtime/testdata/testprog/main.go35
-rw-r--r--src/runtime/testdata/testprog/map.go77
-rw-r--r--src/runtime/testdata/testprog/memprof.go51
-rw-r--r--src/runtime/testdata/testprog/misc.go15
-rw-r--r--src/runtime/testdata/testprog/numcpu_freebsd.go140
-rw-r--r--src/runtime/testdata/testprog/panicprint.go111
-rw-r--r--src/runtime/testdata/testprog/panicrace.go27
-rw-r--r--src/runtime/testdata/testprog/preempt.go75
-rw-r--r--src/runtime/testdata/testprog/segv.go32
-rw-r--r--src/runtime/testdata/testprog/segv_linux.go29
-rw-r--r--src/runtime/testdata/testprog/signal.go30
-rw-r--r--src/runtime/testdata/testprog/sleep.go22
-rw-r--r--src/runtime/testdata/testprog/stringconcat.go20
-rw-r--r--src/runtime/testdata/testprog/syscall_windows.go73
-rw-r--r--src/runtime/testdata/testprog/syscalls.go11
-rw-r--r--src/runtime/testdata/testprog/syscalls_linux.go58
-rw-r--r--src/runtime/testdata/testprog/syscalls_none.go28
-rw-r--r--src/runtime/testdata/testprog/timeprof.go45
-rw-r--r--src/runtime/testdata/testprog/traceback_ancestors.go96
-rw-r--r--src/runtime/testdata/testprog/unsafe.go12
-rw-r--r--src/runtime/testdata/testprog/vdso.go54
-rw-r--r--src/runtime/testdata/testprogcgo/aprof.go56
-rw-r--r--src/runtime/testdata/testprogcgo/aprof_c.c9
-rw-r--r--src/runtime/testdata/testprogcgo/bigstack1_windows.c12
-rw-r--r--src/runtime/testdata/testprogcgo/bigstack_windows.c46
-rw-r--r--src/runtime/testdata/testprogcgo/bigstack_windows.go27
-rw-r--r--src/runtime/testdata/testprogcgo/bindm.c34
-rw-r--r--src/runtime/testdata/testprogcgo/bindm.go61
-rw-r--r--src/runtime/testdata/testprogcgo/callback.go116
-rw-r--r--src/runtime/testdata/testprogcgo/catchpanic.go47
-rw-r--r--src/runtime/testdata/testprogcgo/cgo.go108
-rw-r--r--src/runtime/testdata/testprogcgo/crash.go45
-rw-r--r--src/runtime/testdata/testprogcgo/deadlock.go30
-rw-r--r--src/runtime/testdata/testprogcgo/destructor.c22
-rw-r--r--src/runtime/testdata/testprogcgo/destructor.go23
-rw-r--r--src/runtime/testdata/testprogcgo/dll_windows.go25
-rw-r--r--src/runtime/testdata/testprogcgo/dropm.go60
-rw-r--r--src/runtime/testdata/testprogcgo/dropm_stub.go12
-rw-r--r--src/runtime/testdata/testprogcgo/eintr.go247
-rw-r--r--src/runtime/testdata/testprogcgo/exec.go107
-rw-r--r--src/runtime/testdata/testprogcgo/gprof.go46
-rw-r--r--src/runtime/testdata/testprogcgo/gprof_c.c30
-rw-r--r--src/runtime/testdata/testprogcgo/issue29707.go60
-rw-r--r--src/runtime/testdata/testprogcgo/lockosthread.c13
-rw-r--r--src/runtime/testdata/testprogcgo/lockosthread.go110
-rw-r--r--src/runtime/testdata/testprogcgo/main.go35
-rw-r--r--src/runtime/testdata/testprogcgo/needmdeadlock.go96
-rw-r--r--src/runtime/testdata/testprogcgo/numgoroutine.go93
-rw-r--r--src/runtime/testdata/testprogcgo/panic.c9
-rw-r--r--src/runtime/testdata/testprogcgo/panic.go23
-rw-r--r--src/runtime/testdata/testprogcgo/pprof.go93
-rw-r--r--src/runtime/testdata/testprogcgo/pprof_callback.go89
-rw-r--r--src/runtime/testdata/testprogcgo/raceprof.go79
-rw-r--r--src/runtime/testdata/testprogcgo/racesig.go93
-rw-r--r--src/runtime/testdata/testprogcgo/segv.go34
-rw-r--r--src/runtime/testdata/testprogcgo/segv_linux.go32
-rw-r--r--src/runtime/testdata/testprogcgo/sigfwd.go87
-rw-r--r--src/runtime/testdata/testprogcgo/sigpanic.go28
-rw-r--r--src/runtime/testdata/testprogcgo/sigstack.go99
-rw-r--r--src/runtime/testdata/testprogcgo/sigthrow.go20
-rw-r--r--src/runtime/testdata/testprogcgo/stack_windows.go57
-rw-r--r--src/runtime/testdata/testprogcgo/stackswitch.c147
-rw-r--r--src/runtime/testdata/testprogcgo/stackswitch.go43
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic.go25
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic_unix.c26
-rw-r--r--src/runtime/testdata/testprogcgo/threadpanic_windows.c23
-rw-r--r--src/runtime/testdata/testprogcgo/threadpprof.go128
-rw-r--r--src/runtime/testdata/testprogcgo/threadprof.go105
-rw-r--r--src/runtime/testdata/testprogcgo/trace.go60
-rw-r--r--src/runtime/testdata/testprogcgo/trace_unix.c27
-rw-r--r--src/runtime/testdata/testprogcgo/trace_windows.c29
-rw-r--r--src/runtime/testdata/testprogcgo/traceback.go54
-rw-r--r--src/runtime/testdata/testprogcgo/traceback_c.c65
-rw-r--r--src/runtime/testdata/testprogcgo/tracebackctxt.go136
-rw-r--r--src/runtime/testdata/testprogcgo/tracebackctxt_c.c103
-rw-r--r--src/runtime/testdata/testprogcgo/windows/win.go14
-rw-r--r--src/runtime/testdata/testprognet/main.go35
-rw-r--r--src/runtime/testdata/testprognet/net.go29
-rw-r--r--src/runtime/testdata/testprognet/signal.go27
-rw-r--r--src/runtime/testdata/testprognet/signalexec.go71
-rw-r--r--src/runtime/testdata/testsuid/main.go25
-rw-r--r--src/runtime/testdata/testwinlib/main.c67
-rw-r--r--src/runtime/testdata/testwinlib/main.go31
-rw-r--r--src/runtime/testdata/testwinlibsignal/dummy.go13
-rw-r--r--src/runtime/testdata/testwinlibsignal/main.c57
-rw-r--r--src/runtime/testdata/testwinlibthrow/main.go19
-rw-r--r--src/runtime/testdata/testwinlibthrow/veh.c26
-rw-r--r--src/runtime/testdata/testwinsignal/main.go53
-rw-r--r--src/runtime/testdata/testwintls/main.c29
-rw-r--r--src/runtime/testdata/testwintls/main.go12
-rw-r--r--src/runtime/textflag.h38
-rw-r--r--src/runtime/time.go1144
-rw-r--r--src/runtime/time_fake.go98
-rw-r--r--src/runtime/time_linux_amd64.s87
-rw-r--r--src/runtime/time_nofake.go32
-rw-r--r--src/runtime/time_test.go97
-rw-r--r--src/runtime/time_windows.h17
-rw-r--r--src/runtime/time_windows_386.s84
-rw-r--r--src/runtime/time_windows_amd64.s42
-rw-r--r--src/runtime/time_windows_arm.s90
-rw-r--r--src/runtime/time_windows_arm64.s47
-rw-r--r--src/runtime/timeasm.go14
-rw-r--r--src/runtime/timestub.go18
-rw-r--r--src/runtime/timestub2.go10
-rw-r--r--src/runtime/tls_arm.s100
-rw-r--r--src/runtime/tls_arm64.h51
-rw-r--r--src/runtime/tls_arm64.s62
-rw-r--r--src/runtime/tls_loong64.s26
-rw-r--r--src/runtime/tls_mips64x.s30
-rw-r--r--src/runtime/tls_mipsx.s29
-rw-r--r--src/runtime/tls_ppc64x.s51
-rw-r--r--src/runtime/tls_riscv64.s30
-rw-r--r--src/runtime/tls_s390x.s51
-rw-r--r--src/runtime/tls_stub.go10
-rw-r--r--src/runtime/tls_windows_amd64.go10
-rw-r--r--src/runtime/trace.go1818
-rw-r--r--src/runtime/trace/annotation.go198
-rw-r--r--src/runtime/trace/annotation_test.go156
-rw-r--r--src/runtime/trace/example_test.go39
-rw-r--r--src/runtime/trace/trace.go154
-rw-r--r--src/runtime/trace/trace_stack_test.go333
-rw-r--r--src/runtime/trace/trace_test.go794
-rw-r--r--src/runtime/trace_cgo_test.go105
-rw-r--r--src/runtime/traceback.go1640
-rw-r--r--src/runtime/traceback_test.go838
-rw-r--r--src/runtime/tracebackx_test.go18
-rw-r--r--src/runtime/type.go469
-rw-r--r--src/runtime/typekind.go43
-rw-r--r--src/runtime/unsafe.go114
-rw-r--r--src/runtime/utf8.go132
-rw-r--r--src/runtime/vdso_elf32.go79
-rw-r--r--src/runtime/vdso_elf64.go79
-rw-r--r--src/runtime/vdso_freebsd.go114
-rw-r--r--src/runtime/vdso_freebsd_arm.go21
-rw-r--r--src/runtime/vdso_freebsd_arm64.go21
-rw-r--r--src/runtime/vdso_freebsd_riscv64.go21
-rw-r--r--src/runtime/vdso_freebsd_x86.go90
-rw-r--r--src/runtime/vdso_in_none.go13
-rw-r--r--src/runtime/vdso_linux.go295
-rw-r--r--src/runtime/vdso_linux_386.go21
-rw-r--r--src/runtime/vdso_linux_amd64.go23
-rw-r--r--src/runtime/vdso_linux_arm.go21
-rw-r--r--src/runtime/vdso_linux_arm64.go21
-rw-r--r--src/runtime/vdso_linux_loong64.go27
-rw-r--r--src/runtime/vdso_linux_mips64x.go27
-rw-r--r--src/runtime/vdso_linux_ppc64x.go24
-rw-r--r--src/runtime/vdso_linux_riscv64.go21
-rw-r--r--src/runtime/vdso_linux_s390x.go25
-rw-r--r--src/runtime/vlop_386.s56
-rw-r--r--src/runtime/vlop_arm.s260
-rw-r--r--src/runtime/vlop_arm_test.go128
-rw-r--r--src/runtime/vlrt.go310
-rw-r--r--src/runtime/wincallback.go127
-rw-r--r--src/runtime/write_err.go13
-rw-r--r--src/runtime/write_err_android.go162
-rw-r--r--src/runtime/zcallback_windows.go5
-rw-r--r--src/runtime/zcallback_windows.s2015
-rw-r--r--src/runtime/zcallback_windows_arm.s4012
-rw-r--r--src/runtime/zcallback_windows_arm64.s4012
1080 files changed, 226184 insertions, 0 deletions
diff --git a/src/runtime/HACKING.md b/src/runtime/HACKING.md
new file mode 100644
index 0000000..ce0b42a
--- /dev/null
+++ b/src/runtime/HACKING.md
@@ -0,0 +1,332 @@
+This is a living document and at times it will be out of date. It is
+intended to articulate how programming in the Go runtime differs from
+writing normal Go. It focuses on pervasive concepts rather than
+details of particular interfaces.
+
+Scheduler structures
+====================
+
+The scheduler manages three types of resources that pervade the
+runtime: Gs, Ms, and Ps. It's important to understand these even if
+you're not working on the scheduler.
+
+Gs, Ms, Ps
+----------
+
+A "G" is simply a goroutine. It's represented by type `g`. When a
+goroutine exits, its `g` object is returned to a pool of free `g`s and
+can later be reused for some other goroutine.
+
+An "M" is an OS thread that can be executing user Go code, runtime
+code, a system call, or be idle. It's represented by type `m`. There
+can be any number of Ms at a time since any number of threads may be
+blocked in system calls.
+
+Finally, a "P" represents the resources required to execute user Go
+code, such as scheduler and memory allocator state. It's represented
+by type `p`. There are exactly `GOMAXPROCS` Ps. A P can be thought of
+like a CPU in the OS scheduler and the contents of the `p` type like
+per-CPU state. This is a good place to put state that needs to be
+sharded for efficiency, but doesn't need to be per-thread or
+per-goroutine.
+
+The scheduler's job is to match up a G (the code to execute), an M
+(where to execute it), and a P (the rights and resources to execute
+it). When an M stops executing user Go code, for example by entering a
+system call, it returns its P to the idle P pool. In order to resume
+executing user Go code, for example on return from a system call, it
+must acquire a P from the idle pool.
+
+All `g`, `m`, and `p` objects are heap allocated, but are never freed,
+so their memory remains type stable. As a result, the runtime can
+avoid write barriers in the depths of the scheduler.
+
+`getg()` and `getg().m.curg`
+----------------------------
+
+To get the current user `g`, use `getg().m.curg`.
+
+`getg()` alone returns the current `g`, but when executing on the
+system or signal stacks, this will return the current M's "g0" or
+"gsignal", respectively. This is usually not what you want.
+
+To determine if you're running on the user stack or the system stack,
+use `getg() == getg().m.curg`.
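+
+For illustration, runtime code typically distinguishes the two cases with
+a check of this shape (the wrapper function here is hypothetical):
+
+```go
+func onUserStack() bool {
+	gp := getg()
+	// On the system or signal stack, getg() returns g0 or gsignal,
+	// so only on the user stack does it equal m.curg.
+	return gp == gp.m.curg
+}
+```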
+
+Stacks
+======
+
+Every non-dead G has a *user stack* associated with it, which is what
+user Go code executes on. User stacks start small (e.g., 2K) and grow
+or shrink dynamically.
+
+Every M has a *system stack* associated with it (also known as the M's
+"g0" stack because it's implemented as a stub G) and, on Unix
+platforms, a *signal stack* (also known as the M's "gsignal" stack).
+System and signal stacks cannot grow, but are large enough to execute
+runtime and cgo code (8K in a pure Go binary; system-allocated in a
+cgo binary).
+
+Runtime code often temporarily switches to the system stack using
+`systemstack`, `mcall`, or `asmcgocall` to perform tasks that must not
+be preempted, that must not grow the user stack, or that switch user
+goroutines. Code running on the system stack is implicitly
+non-preemptible and the garbage collector does not scan system stacks.
+While running on the system stack, the current user stack is not used
+for execution.
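+
+A minimal sketch of the pattern (the function and flag are hypothetical;
+only `systemstack` itself is a real runtime primitive):
+
+```go
+var statsFlushing bool // hypothetical flag, for illustration only
+
+func startStatsFlush() {
+	systemstack(func() {
+		// This closure runs on the M's g0 stack: there are no
+		// stack-growth checks and it is implicitly non-preemptible.
+		statsFlushing = true
+	})
+}
+```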
+
+nosplit functions
+-----------------
+
+Most functions start with a prologue that inspects the stack pointer
+and the current G's stack bound and calls `morestack` if the stack
+needs to grow.
+
+Functions can be marked `//go:nosplit` (or `NOSPLIT` in assembly) to
+indicate that they should not get this prologue. This has several
+uses:
+
+- Functions that must run on the user stack, but must not call into
+ stack growth, for example because this would cause a deadlock, or
+ because they have untyped words on the stack.
+
+- Functions that must not be preempted on entry.
+
+- Functions that may run without a valid G. For example, functions
+ that run in early runtime start-up, or that may be entered from C
+ code such as cgo callbacks or the signal handler.
+
+Splittable functions ensure there's some amount of space on the stack
+for nosplit functions to run in and the linker checks that any static
+chain of nosplit function calls cannot exceed this bound.
+
+Any function with a `//go:nosplit` annotation should explain why it is
+nosplit in its documentation comment.
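+
+For example, a hypothetical nosplit helper might look like the following
+(only `write` and `unsafe.StringData` are real names; the function itself
+is made up):
+
+```go
+// writeMsg writes s directly to stderr.
+//
+// writeMsg is nosplit because it may be called from a signal handler or
+// during early start-up, when growing the stack is not safe.
+//
+//go:nosplit
+func writeMsg(s string) {
+	write(2, unsafe.Pointer(unsafe.StringData(s)), int32(len(s)))
+}
+```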
+
+Error handling and reporting
+============================
+
+Errors that can reasonably be recovered from in user code should use
+`panic` like usual. However, there are some situations where `panic`
+will cause an immediate fatal error, such as when called on the system
+stack or when called during `mallocgc`.
+
+Most errors in the runtime are not recoverable. For these, use
+`throw`, which dumps the traceback and immediately terminates the
+process. In general, `throw` should be passed a string constant to
+avoid allocating in perilous situations. By convention, additional
+details are printed before `throw` using `print` or `println` and the
+messages are prefixed with "runtime:".
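+
+For instance, a check of this shape follows the convention (the variable
+and condition are hypothetical):
+
+```go
+func checkPageBudget(pageBudget int64) {
+	if pageBudget < 0 {
+		// Print the details first, then throw with a constant string.
+		print("runtime: pageBudget=", pageBudget, "\n")
+		throw("page budget underflow")
+	}
+}
+```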
+
+For unrecoverable errors where user code is expected to be at fault for the
+failure (such as racing map writes), use `fatal`.
+
+For runtime error debugging, it may be useful to run with `GOTRACEBACK=system`
+or `GOTRACEBACK=crash`. The output of `panic` and `fatal` is as described by
+`GOTRACEBACK`. The output of `throw` always includes runtime frames, metadata
+and all goroutines regardless of `GOTRACEBACK` (i.e., equivalent to
+`GOTRACEBACK=system`). Whether `throw` crashes or not is still controlled by
+`GOTRACEBACK`.
+
+Synchronization
+===============
+
+The runtime has multiple synchronization mechanisms. They differ in
+semantics and, in particular, in whether they interact with the
+goroutine scheduler or the OS scheduler.
+
+The simplest is `mutex`, which is manipulated using `lock` and
+`unlock`. This should be used to protect shared structures for short
+periods. Blocking on a `mutex` directly blocks the M, without
+interacting with the Go scheduler. This means it is safe to use from
+the lowest levels of the runtime, but also prevents any associated G
+and P from being rescheduled. `rwmutex` is similar.
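+
+A typical use looks like this (the free list and its element type are
+hypothetical):
+
+```go
+type thing struct {
+	next *thing
+}
+
+var thingPool struct {
+	lock mutex
+	head *thing // free list, protected by lock
+}
+
+func putThing(t *thing) {
+	lock(&thingPool.lock)
+	t.next = thingPool.head
+	thingPool.head = t
+	unlock(&thingPool.lock)
+}
+```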
+
+For one-shot notifications, use `note`, which provides `notesleep` and
+`notewakeup`. Unlike traditional UNIX `sleep`/`wakeup`, `note`s are
+race-free, so `notesleep` returns immediately if the `notewakeup` has
+already happened. A `note` can be reset after use with `noteclear`,
+which must not race with a sleep or wakeup. Like `mutex`, blocking on
+a `note` blocks the M. However, there are different ways to sleep on a
+`note`: `notesleep` also prevents rescheduling of any associated G and
+P, while `notetsleepg` acts like a blocking system call that allows
+the P to be reused to run another G. This is still less efficient than
+blocking the G directly since it consumes an M.
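+
+A sketch of the one-shot pattern (the surrounding functions are
+hypothetical; `notesleep` may only be used on the g0 stack, so the waiter
+below uses `notetsleepg`, the variant for user goroutines):
+
+```go
+var shutdownNote note
+
+func waitForShutdown() {
+	// Blocks the M but releases the P while waiting, like a blocking
+	// system call; -1 means no timeout.
+	notetsleepg(&shutdownNote, -1)
+}
+
+func signalShutdown() {
+	// Safe even if it runs before the sleep: the sleeper returns
+	// immediately once the wakeup has been recorded.
+	notewakeup(&shutdownNote)
+}
+
+// noteclear(&shutdownNote) resets the note for reuse, provided no sleep
+// or wakeup can still be in flight.
+```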
+
+To interact directly with the goroutine scheduler, use `gopark` and
+`goready`. `gopark` parks the current goroutine—putting it in the
+"waiting" state and removing it from the scheduler's run queue—and
+schedules another goroutine on the current M/P. `goready` puts a
+parked goroutine back in the "runnable" state and adds it to the run
+queue.
+
+In summary,
+
+<table>
+<tr><th></th><th colspan="3">Blocks</th></tr>
+<tr><th>Interface</th><th>G</th><th>M</th><th>P</th></tr>
+<tr><td>(rw)mutex</td><td>Y</td><td>Y</td><td>Y</td></tr>
+<tr><td>note</td><td>Y</td><td>Y</td><td>Y/N</td></tr>
+<tr><td>park</td><td>Y</td><td>N</td><td>N</td></tr>
+</table>
+
+Atomics
+=======
+
+The runtime uses its own atomics package at `runtime/internal/atomic`.
+This corresponds to `sync/atomic`, but functions have different names
+for historical reasons and there are a few additional functions needed
+by the runtime.
+
+In general, we think hard about the uses of atomics in the runtime and
+try to avoid unnecessary atomic operations. If access to a variable is
+sometimes protected by another synchronization mechanism, the
+already-protected accesses generally don't need to be atomic. There
+are several reasons for this:
+
+1. Using non-atomic or atomic access where appropriate makes the code
+ more self-documenting. Atomic access to a variable implies there's
+ somewhere else that may concurrently access the variable.
+
+2. Non-atomic access allows for automatic race detection. The runtime
+ doesn't currently have a race detector, but it may in the future.
+ Atomic access defeats the race detector, while non-atomic access
+ allows the race detector to check your assumptions.
+
+3. Non-atomic access may improve performance.
+
+Of course, any non-atomic access to a shared variable should be
+documented to explain how that access is protected.
+
+Some common patterns that mix atomic and non-atomic access are:
+
+* Read-mostly variables where updates are protected by a lock. Within
+ the locked region, reads do not need to be atomic, but the write
+ does. Outside the locked region, reads need to be atomic.
+
+* Reads that only happen during STW, where no writes can happen during
+ STW, do not need to be atomic.
+
+That said, the advice from the Go memory model stands: "Don't be
+[too] clever." The performance of the runtime matters, but its
+robustness matters more.
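+
+As a sketch of the first pattern above (the variable, lock, and helpers
+are hypothetical; `atomic` is `runtime/internal/atomic`):
+
+```go
+var pauseLock mutex  // hypothetical lock protecting writes to maxPause
+var maxPause uint64  // read-mostly; written only with pauseLock held
+
+func recordPause(d uint64) {
+	lock(&pauseLock)
+	// Reads inside the locked region don't need to be atomic...
+	if d > maxPause {
+		// ...but the write does, because of the lock-free reader below.
+		atomic.Store64(&maxPause, d)
+	}
+	unlock(&pauseLock)
+}
+
+func lastMaxPause() uint64 {
+	// Outside the locked region, reads must be atomic.
+	return atomic.Load64(&maxPause)
+}
+```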
+
+Unmanaged memory
+================
+
+In general, the runtime tries to use regular heap allocation. However,
+in some cases the runtime must allocate objects outside of the garbage
+collected heap, in *unmanaged memory*. This is necessary if the
+objects are part of the memory manager itself or if they must be
+allocated in situations where the caller may not have a P.
+
+There are three mechanisms for allocating unmanaged memory:
+
+* sysAlloc obtains memory directly from the OS. This comes in whole
+ multiples of the system page size, but it can be freed with sysFree.
+
+* persistentalloc combines multiple smaller allocations into a single
+ sysAlloc to avoid fragmentation. However, there is no way to free
+ persistentalloced objects (hence the name).
+
+* fixalloc is a SLAB-style allocator that allocates objects of a fixed
+ size. fixalloced objects can be freed, but this memory can only be
+ reused by the same fixalloc pool, so it can only be reused for
+ objects of the same type.
+
+In general, types that are allocated using any of these should be
+marked as not in heap by embedding `runtime/internal/sys.NotInHeap`.
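+
+For example, a hypothetical type intended for a fixalloc pool might look
+like this (`sys` is `runtime/internal/sys`):
+
+```go
+// debugRecord is a hypothetical type allocated from unmanaged memory.
+// Embedding sys.NotInHeap means the compiler disallows heap allocation
+// of debugRecord and can omit write barriers when storing *debugRecord
+// values.
+type debugRecord struct {
+	_    sys.NotInHeap
+	next *debugRecord // fine: points into unmanaged memory, not the heap
+	id   uint64
+}
+```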
+
+Objects that are allocated in unmanaged memory **must not** contain
+heap pointers unless the following rules are also obeyed:
+
+1. Any pointers from unmanaged memory to the heap must be garbage
+ collection roots. More specifically, any pointer must either be
+ accessible through a global variable or be added as an explicit
+ garbage collection root in `runtime.markroot`.
+
+2. If the memory is reused, the heap pointers must be zero-initialized
+ before they become visible as GC roots. Otherwise, the GC may
+ observe stale heap pointers. See "Zero-initialization versus
+ zeroing".
+
+Zero-initialization versus zeroing
+==================================
+
+There are two types of zeroing in the runtime, depending on whether
+the memory is already initialized to a type-safe state.
+
+If memory is not in a type-safe state, meaning it potentially contains
+"garbage" because it was just allocated and it is being initialized
+for first use, then it must be *zero-initialized* using
+`memclrNoHeapPointers` or non-pointer writes. This does not perform
+write barriers.
+
+If memory is already in a type-safe state and is simply being set to
+the zero value, this must be done using regular writes, `typedmemclr`,
+or `memclrHasPointers`. This performs write barriers.
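+
+Schematically (the helpers are hypothetical; the clearing functions are
+the real ones named above):
+
+```go
+func initSlot(p unsafe.Pointer, size uintptr) {
+	// p was just carved out of unmanaged memory and may contain garbage:
+	// zero-initialize it without write barriers.
+	memclrNoHeapPointers(p, size)
+}
+
+func resetSlot(typ *_type, p unsafe.Pointer) {
+	// p already holds a valid value of type typ: setting it back to the
+	// zero value must go through write barriers.
+	typedmemclr(typ, p)
+}
+```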
+
+Runtime-only compiler directives
+================================
+
+In addition to the "//go:" directives documented in "go doc compile",
+the compiler supports additional directives only in the runtime.
+
+go:systemstack
+--------------
+
+`go:systemstack` indicates that a function must run on the system
+stack. This is checked dynamically by a special function prologue.
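+
+For example (a hypothetical function; only the annotation is the point):
+
+```go
+// growWorkBufs must run on the system stack, e.g. because it takes locks
+// that may only be held there (hypothetical reasoning and function).
+//
+//go:systemstack
+func growWorkBufs() {
+	// ... manipulate state that must not observe a goroutine stack growth ...
+}
+```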
+
+go:nowritebarrier
+-----------------
+
+`go:nowritebarrier` directs the compiler to emit an error if the
+following function contains any write barriers. (It *does not*
+suppress the generation of write barriers; it is simply an assertion.)
+
+Usually you want `go:nowritebarrierrec`. `go:nowritebarrier` is
+primarily useful in situations where it's "nice" not to have write
+barriers, but not required for correctness.
+
+go:nowritebarrierrec and go:yeswritebarrierrec
+----------------------------------------------
+
+`go:nowritebarrierrec` directs the compiler to emit an error if the
+following function or any function it calls recursively, up to a
+`go:yeswritebarrierrec`, contains a write barrier.
+
+Logically, the compiler floods the call graph starting from each
+`go:nowritebarrierrec` function and produces an error if it encounters
+a function containing a write barrier. This flood stops at
+`go:yeswritebarrierrec` functions.
+
+`go:nowritebarrierrec` is used in the implementation of the write
+barrier to prevent infinite loops.
+
+Both directives are used in the scheduler. The write barrier requires
+an active P (`getg().m.p != nil`) and scheduler code often runs
+without an active P. In this case, `go:nowritebarrierrec` is used on
+functions that release the P or may run without a P and
+`go:yeswritebarrierrec` is used when code re-acquires an active P.
+Since these are function-level annotations, code that releases or
+acquires a P may need to be split across two functions.
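+
+Schematically (hypothetical functions showing the shape of the split):
+
+```go
+// parkWithoutP releases the P and blocks. Neither it nor anything it
+// calls may contain a write barrier.
+//
+//go:nowritebarrierrec
+func parkWithoutP() {
+	// ... release the P and block ...
+}
+
+// resumeWithP is called once an active P has been re-acquired, so write
+// barriers are allowed again from here down.
+//
+//go:yeswritebarrierrec
+func resumeWithP() {
+	// ... run with an active P ...
+}
+```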
+
+go:uintptrkeepalive
+-------------------
+
+The //go:uintptrkeepalive directive must be followed by a function declaration.
+
+It specifies that the function's uintptr arguments may be pointer values that
+have been converted to uintptr and must be kept alive for the duration of the
+call, even though from the types alone it would appear that the object is no
+longer needed during the call.
+
+This directive is similar to //go:uintptrescapes, but it does not force
+arguments to escape. Since stack growth does not understand these arguments,
+this directive must be used with //go:nosplit (in the marked function and all
+transitive calls) to prevent stack growth.
+
+The conversion from pointer to uintptr must appear in the argument list of any
+call to this function. This directive is used for some low-level system call
+implementations.
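+
+A sketch of how such a function is declared (hypothetical; the real
+examples are the low-level syscall wrappers):
+
+```go
+//go:uintptrkeepalive
+//go:nosplit
+func lowlevelCall(trap, arg uintptr) uintptr {
+	// arg may really be an unsafe.Pointer converted at the call site, as in
+	// lowlevelCall(trap, uintptr(unsafe.Pointer(p))); the directive keeps p
+	// alive until this call returns. rawTrap is a hypothetical nosplit,
+	// assembly-implemented helper.
+	return rawTrap(trap, arg)
+}
+```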
diff --git a/src/runtime/Makefile b/src/runtime/Makefile
new file mode 100644
index 0000000..55087de
--- /dev/null
+++ b/src/runtime/Makefile
@@ -0,0 +1,5 @@
+# Copyright 2009 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+include ../Make.dist
diff --git a/src/runtime/abi_test.go b/src/runtime/abi_test.go
new file mode 100644
index 0000000..0c9488a
--- /dev/null
+++ b/src/runtime/abi_test.go
@@ -0,0 +1,112 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.regabiargs
+
+// This file contains tests specific to making sure the register ABI
+// works in a bunch of contexts in the runtime.
+
+package runtime_test
+
+import (
+ "internal/abi"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "runtime"
+ "strings"
+ "testing"
+ "time"
+)
+
+var regConfirmRun chan int
+
+//go:registerparams
+func regFinalizerPointer(v *Tint) (int, float32, [10]byte) {
+ regConfirmRun <- *(*int)(v)
+ return 5151, 4.0, [10]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
+}
+
+//go:registerparams
+func regFinalizerIface(v Tinter) (int, float32, [10]byte) {
+ regConfirmRun <- *(*int)(v.(*Tint))
+ return 5151, 4.0, [10]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
+}
+
+func TestFinalizerRegisterABI(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ // Actually run the test in a subprocess because we don't want
+ // finalizers from other tests interfering.
+ if os.Getenv("TEST_FINALIZER_REGABI") != "1" {
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestFinalizerRegisterABI", "-test.v"))
+ cmd.Env = append(cmd.Env, "TEST_FINALIZER_REGABI=1")
+ out, err := cmd.CombinedOutput()
+ if !strings.Contains(string(out), "PASS\n") || err != nil {
+ t.Fatalf("%s\n(exit status %v)", string(out), err)
+ }
+ return
+ }
+
+ // Optimistically clear any latent finalizers from e.g. the testing
+ // package before continuing.
+ //
+ // It's possible that a finalizer only becomes available to run
+ // after this point, which would interfere with the test and could
+ // cause a crash, but because we're running in a separate process
+ // it's extremely unlikely.
+ runtime.GC()
+ runtime.GC()
+
+ // fing will only pick the new IntRegArgs up if it's currently
+ // sleeping and wakes up, so wait for it to go to sleep.
+ success := false
+ for i := 0; i < 100; i++ {
+ if runtime.FinalizerGAsleep() {
+ success = true
+ break
+ }
+ time.Sleep(20 * time.Millisecond)
+ }
+ if !success {
+ t.Fatal("finalizer not asleep?")
+ }
+
+ argRegsBefore := runtime.SetIntArgRegs(abi.IntArgRegs)
+ defer runtime.SetIntArgRegs(argRegsBefore)
+
+ tests := []struct {
+ name string
+ fin any
+ confirmValue int
+ }{
+ {"Pointer", regFinalizerPointer, -1},
+ {"Interface", regFinalizerIface, -2},
+ }
+ for i := range tests {
+ test := &tests[i]
+ t.Run(test.name, func(t *testing.T) {
+ regConfirmRun = make(chan int)
+
+ x := new(Tint)
+ *x = (Tint)(test.confirmValue)
+ runtime.SetFinalizer(x, test.fin)
+
+ runtime.KeepAlive(x)
+
+ // Queue the finalizer.
+ runtime.GC()
+ runtime.GC()
+
+ select {
+ case <-time.After(time.Second):
+ t.Fatal("finalizer failed to execute")
+ case gotVal := <-regConfirmRun:
+ if gotVal != test.confirmValue {
+ t.Fatalf("wrong finalizer executed? got %d, want %d", gotVal, test.confirmValue)
+ }
+ }
+ })
+ }
+}
diff --git a/src/runtime/alg.go b/src/runtime/alg.go
new file mode 100644
index 0000000..a1f683f
--- /dev/null
+++ b/src/runtime/alg.go
@@ -0,0 +1,354 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/cpu"
+ "internal/goarch"
+ "unsafe"
+)
+
+const (
+ c0 = uintptr((8-goarch.PtrSize)/4*2860486313 + (goarch.PtrSize-4)/4*33054211828000289)
+ c1 = uintptr((8-goarch.PtrSize)/4*3267000013 + (goarch.PtrSize-4)/4*23344194077549503)
+)
+
+func memhash0(p unsafe.Pointer, h uintptr) uintptr {
+ return h
+}
+
+func memhash8(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 1)
+}
+
+func memhash16(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 2)
+}
+
+func memhash128(p unsafe.Pointer, h uintptr) uintptr {
+ return memhash(p, h, 16)
+}
+
+//go:nosplit
+func memhash_varlen(p unsafe.Pointer, h uintptr) uintptr {
+ ptr := getclosureptr()
+ size := *(*uintptr)(unsafe.Pointer(ptr + unsafe.Sizeof(h)))
+ return memhash(p, h, size)
+}
+
+// runtime variable to check if the processor we're running on
+// actually supports the instructions used by the AES-based
+// hash implementation.
+var useAeshash bool
+
+// in asm_*.s
+func memhash(p unsafe.Pointer, h, s uintptr) uintptr
+func memhash32(p unsafe.Pointer, h uintptr) uintptr
+func memhash64(p unsafe.Pointer, h uintptr) uintptr
+func strhash(p unsafe.Pointer, h uintptr) uintptr
+
+func strhashFallback(a unsafe.Pointer, h uintptr) uintptr {
+ x := (*stringStruct)(a)
+ return memhashFallback(x.str, h, uintptr(x.len))
+}
+
+// NOTE: Because NaN != NaN, a map can contain any
+// number of (mostly useless) entries keyed with NaNs.
+// To avoid long hash chains, we assign a random number
+// as the hash value for a NaN.
+
+func f32hash(p unsafe.Pointer, h uintptr) uintptr {
+ f := *(*float32)(p)
+ switch {
+ case f == 0:
+ return c1 * (c0 ^ h) // +0, -0
+ case f != f:
+ return c1 * (c0 ^ h ^ uintptr(fastrand())) // any kind of NaN
+ default:
+ return memhash(p, h, 4)
+ }
+}
+
+func f64hash(p unsafe.Pointer, h uintptr) uintptr {
+ f := *(*float64)(p)
+ switch {
+ case f == 0:
+ return c1 * (c0 ^ h) // +0, -0
+ case f != f:
+ return c1 * (c0 ^ h ^ uintptr(fastrand())) // any kind of NaN
+ default:
+ return memhash(p, h, 8)
+ }
+}
+
+func c64hash(p unsafe.Pointer, h uintptr) uintptr {
+ x := (*[2]float32)(p)
+ return f32hash(unsafe.Pointer(&x[1]), f32hash(unsafe.Pointer(&x[0]), h))
+}
+
+func c128hash(p unsafe.Pointer, h uintptr) uintptr {
+ x := (*[2]float64)(p)
+ return f64hash(unsafe.Pointer(&x[1]), f64hash(unsafe.Pointer(&x[0]), h))
+}
+
+func interhash(p unsafe.Pointer, h uintptr) uintptr {
+ a := (*iface)(p)
+ tab := a.tab
+ if tab == nil {
+ return h
+ }
+ t := tab._type
+ if t.Equal == nil {
+ // Check hashability here. We could do this check inside
+ // typehash, but we want to report the topmost type in
+ // the error text (e.g. in a struct with a field of slice type
+ // we want to report the struct, not the slice).
+ panic(errorString("hash of unhashable type " + toRType(t).string()))
+ }
+ if isDirectIface(t) {
+ return c1 * typehash(t, unsafe.Pointer(&a.data), h^c0)
+ } else {
+ return c1 * typehash(t, a.data, h^c0)
+ }
+}
+
+func nilinterhash(p unsafe.Pointer, h uintptr) uintptr {
+ a := (*eface)(p)
+ t := a._type
+ if t == nil {
+ return h
+ }
+ if t.Equal == nil {
+ // See comment in interhash above.
+ panic(errorString("hash of unhashable type " + toRType(t).string()))
+ }
+ if isDirectIface(t) {
+ return c1 * typehash(t, unsafe.Pointer(&a.data), h^c0)
+ } else {
+ return c1 * typehash(t, a.data, h^c0)
+ }
+}
+
+// typehash computes the hash of the object of type t at address p.
+// h is the seed.
+// This function is seldom used. Most maps use for hashing either
+// fixed functions (e.g. f32hash) or compiler-generated functions
+// (e.g. for a type like struct { x, y string }). This implementation
+// is slower but more general and is used for hashing interface types
+// (called from interhash or nilinterhash, above) or for hashing in
+// maps generated by reflect.MapOf (reflect_typehash, below).
+// Note: this function must match the compiler generated
+// functions exactly. See issue 37716.
+func typehash(t *_type, p unsafe.Pointer, h uintptr) uintptr {
+ if t.TFlag&abi.TFlagRegularMemory != 0 {
+ // Handle ptr sizes specially, see issue 37086.
+ switch t.Size_ {
+ case 4:
+ return memhash32(p, h)
+ case 8:
+ return memhash64(p, h)
+ default:
+ return memhash(p, h, t.Size_)
+ }
+ }
+ switch t.Kind_ & kindMask {
+ case kindFloat32:
+ return f32hash(p, h)
+ case kindFloat64:
+ return f64hash(p, h)
+ case kindComplex64:
+ return c64hash(p, h)
+ case kindComplex128:
+ return c128hash(p, h)
+ case kindString:
+ return strhash(p, h)
+ case kindInterface:
+ i := (*interfacetype)(unsafe.Pointer(t))
+ if len(i.Methods) == 0 {
+ return nilinterhash(p, h)
+ }
+ return interhash(p, h)
+ case kindArray:
+ a := (*arraytype)(unsafe.Pointer(t))
+ for i := uintptr(0); i < a.Len; i++ {
+ h = typehash(a.Elem, add(p, i*a.Elem.Size_), h)
+ }
+ return h
+ case kindStruct:
+ s := (*structtype)(unsafe.Pointer(t))
+ for _, f := range s.Fields {
+ if f.Name.IsBlank() {
+ continue
+ }
+ h = typehash(f.Typ, add(p, f.Offset), h)
+ }
+ return h
+ default:
+ // Should never happen, as typehash should only be called
+ // with comparable types.
+ panic(errorString("hash of unhashable type " + toRType(t).string()))
+ }
+}
+
+//go:linkname reflect_typehash reflect.typehash
+func reflect_typehash(t *_type, p unsafe.Pointer, h uintptr) uintptr {
+ return typehash(t, p, h)
+}
+
+func memequal0(p, q unsafe.Pointer) bool {
+ return true
+}
+func memequal8(p, q unsafe.Pointer) bool {
+ return *(*int8)(p) == *(*int8)(q)
+}
+func memequal16(p, q unsafe.Pointer) bool {
+ return *(*int16)(p) == *(*int16)(q)
+}
+func memequal32(p, q unsafe.Pointer) bool {
+ return *(*int32)(p) == *(*int32)(q)
+}
+func memequal64(p, q unsafe.Pointer) bool {
+ return *(*int64)(p) == *(*int64)(q)
+}
+func memequal128(p, q unsafe.Pointer) bool {
+ return *(*[2]int64)(p) == *(*[2]int64)(q)
+}
+func f32equal(p, q unsafe.Pointer) bool {
+ return *(*float32)(p) == *(*float32)(q)
+}
+func f64equal(p, q unsafe.Pointer) bool {
+ return *(*float64)(p) == *(*float64)(q)
+}
+func c64equal(p, q unsafe.Pointer) bool {
+ return *(*complex64)(p) == *(*complex64)(q)
+}
+func c128equal(p, q unsafe.Pointer) bool {
+ return *(*complex128)(p) == *(*complex128)(q)
+}
+func strequal(p, q unsafe.Pointer) bool {
+ return *(*string)(p) == *(*string)(q)
+}
+func interequal(p, q unsafe.Pointer) bool {
+ x := *(*iface)(p)
+ y := *(*iface)(q)
+ return x.tab == y.tab && ifaceeq(x.tab, x.data, y.data)
+}
+func nilinterequal(p, q unsafe.Pointer) bool {
+ x := *(*eface)(p)
+ y := *(*eface)(q)
+ return x._type == y._type && efaceeq(x._type, x.data, y.data)
+}
+func efaceeq(t *_type, x, y unsafe.Pointer) bool {
+ if t == nil {
+ return true
+ }
+ eq := t.Equal
+ if eq == nil {
+ panic(errorString("comparing uncomparable type " + toRType(t).string()))
+ }
+ if isDirectIface(t) {
+ // Direct interface types are ptr, chan, map, func, and single-element structs/arrays thereof.
+ // Maps and funcs are not comparable, so they can't reach here.
+ // Ptrs, chans, and single-element items can be compared directly using ==.
+ return x == y
+ }
+ return eq(x, y)
+}
+func ifaceeq(tab *itab, x, y unsafe.Pointer) bool {
+ if tab == nil {
+ return true
+ }
+ t := tab._type
+ eq := t.Equal
+ if eq == nil {
+ panic(errorString("comparing uncomparable type " + toRType(t).string()))
+ }
+ if isDirectIface(t) {
+ // See comment in efaceeq.
+ return x == y
+ }
+ return eq(x, y)
+}
+
+// Testing adapters for hash quality tests (see hash_test.go)
+func stringHash(s string, seed uintptr) uintptr {
+ return strhash(noescape(unsafe.Pointer(&s)), seed)
+}
+
+func bytesHash(b []byte, seed uintptr) uintptr {
+ s := (*slice)(unsafe.Pointer(&b))
+ return memhash(s.array, seed, uintptr(s.len))
+}
+
+func int32Hash(i uint32, seed uintptr) uintptr {
+ return memhash32(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func int64Hash(i uint64, seed uintptr) uintptr {
+ return memhash64(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func efaceHash(i any, seed uintptr) uintptr {
+ return nilinterhash(noescape(unsafe.Pointer(&i)), seed)
+}
+
+func ifaceHash(i interface {
+ F()
+}, seed uintptr) uintptr {
+ return interhash(noescape(unsafe.Pointer(&i)), seed)
+}
+
+const hashRandomBytes = goarch.PtrSize / 4 * 64
+
+// used in asm_{386,amd64,arm64}.s to seed the hash function
+var aeskeysched [hashRandomBytes]byte
+
+// used in hash{32,64}.go to seed the hash function
+var hashkey [4]uintptr
+
+func alginit() {
+ // Install AES hash algorithms if the instructions needed are present.
+ if (GOARCH == "386" || GOARCH == "amd64") &&
+ cpu.X86.HasAES && // AESENC
+ cpu.X86.HasSSSE3 && // PSHUFB
+ cpu.X86.HasSSE41 { // PINSR{D,Q}
+ initAlgAES()
+ return
+ }
+ if GOARCH == "arm64" && cpu.ARM64.HasAES {
+ initAlgAES()
+ return
+ }
+ getRandomData((*[len(hashkey) * goarch.PtrSize]byte)(unsafe.Pointer(&hashkey))[:])
+ hashkey[0] |= 1 // make sure these numbers are odd
+ hashkey[1] |= 1
+ hashkey[2] |= 1
+ hashkey[3] |= 1
+}
+
+func initAlgAES() {
+ useAeshash = true
+ // Initialize with random data so hash collisions will be hard to engineer.
+ getRandomData(aeskeysched[:])
+}
+
+// Note: These routines perform the read with a native endianness.
+func readUnaligned32(p unsafe.Pointer) uint32 {
+ q := (*[4]byte)(p)
+ if goarch.BigEndian {
+ return uint32(q[3]) | uint32(q[2])<<8 | uint32(q[1])<<16 | uint32(q[0])<<24
+ }
+ return uint32(q[0]) | uint32(q[1])<<8 | uint32(q[2])<<16 | uint32(q[3])<<24
+}
+
+func readUnaligned64(p unsafe.Pointer) uint64 {
+ q := (*[8]byte)(p)
+ if goarch.BigEndian {
+ return uint64(q[7]) | uint64(q[6])<<8 | uint64(q[5])<<16 | uint64(q[4])<<24 |
+ uint64(q[3])<<32 | uint64(q[2])<<40 | uint64(q[1])<<48 | uint64(q[0])<<56
+ }
+ return uint64(q[0]) | uint64(q[1])<<8 | uint64(q[2])<<16 | uint64(q[3])<<24 | uint64(q[4])<<32 | uint64(q[5])<<40 | uint64(q[6])<<48 | uint64(q[7])<<56
+}
diff --git a/src/runtime/align_runtime_test.go b/src/runtime/align_runtime_test.go
new file mode 100644
index 0000000..d78b0b2
--- /dev/null
+++ b/src/runtime/align_runtime_test.go
@@ -0,0 +1,51 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file lives in the runtime package
+// so we can get access to the runtime guts.
+// The rest of the implementation of this test is in align_test.go.
+
+package runtime
+
+import "unsafe"
+
+// AtomicFields is the set of fields on which we perform 64-bit atomic
+// operations (all the *64 operations in runtime/internal/atomic).
+var AtomicFields = []uintptr{
+ unsafe.Offsetof(m{}.procid),
+ unsafe.Offsetof(p{}.gcFractionalMarkTime),
+ unsafe.Offsetof(profBuf{}.overflow),
+ unsafe.Offsetof(profBuf{}.overflowTime),
+ unsafe.Offsetof(heapStatsDelta{}.tinyAllocCount),
+ unsafe.Offsetof(heapStatsDelta{}.smallAllocCount),
+ unsafe.Offsetof(heapStatsDelta{}.smallFreeCount),
+ unsafe.Offsetof(heapStatsDelta{}.largeAlloc),
+ unsafe.Offsetof(heapStatsDelta{}.largeAllocCount),
+ unsafe.Offsetof(heapStatsDelta{}.largeFree),
+ unsafe.Offsetof(heapStatsDelta{}.largeFreeCount),
+ unsafe.Offsetof(heapStatsDelta{}.committed),
+ unsafe.Offsetof(heapStatsDelta{}.released),
+ unsafe.Offsetof(heapStatsDelta{}.inHeap),
+ unsafe.Offsetof(heapStatsDelta{}.inStacks),
+ unsafe.Offsetof(heapStatsDelta{}.inPtrScalarBits),
+ unsafe.Offsetof(heapStatsDelta{}.inWorkBufs),
+ unsafe.Offsetof(lfnode{}.next),
+ unsafe.Offsetof(mstats{}.last_gc_nanotime),
+ unsafe.Offsetof(mstats{}.last_gc_unix),
+ unsafe.Offsetof(workType{}.bytesMarked),
+}
+
+// AtomicVariables is the set of global variables on which we perform
+// 64-bit atomic operations.
+var AtomicVariables = []unsafe.Pointer{
+ unsafe.Pointer(&ncgocall),
+ unsafe.Pointer(&test_z64),
+ unsafe.Pointer(&blockprofilerate),
+ unsafe.Pointer(&mutexprofilerate),
+ unsafe.Pointer(&gcController),
+ unsafe.Pointer(&memstats),
+ unsafe.Pointer(&sched),
+ unsafe.Pointer(&ticks),
+ unsafe.Pointer(&work),
+}
diff --git a/src/runtime/align_test.go b/src/runtime/align_test.go
new file mode 100644
index 0000000..2bad5b1
--- /dev/null
+++ b/src/runtime/align_test.go
@@ -0,0 +1,200 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "go/ast"
+ "go/build"
+ "go/importer"
+ "go/parser"
+ "go/printer"
+ "go/token"
+ "go/types"
+ "internal/testenv"
+ "os"
+ "regexp"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+// Check that 64-bit fields on which we apply atomic operations
+// are aligned to 8 bytes. This can be a problem on 32-bit systems.
+func TestAtomicAlignment(t *testing.T) {
+ testenv.MustHaveGoBuild(t) // go command needed to resolve std .a files for importer.Default().
+
+ // Read the code making the tables above, to see which fields and
+ // variables we are currently checking.
+ checked := map[string]bool{}
+ x, err := os.ReadFile("./align_runtime_test.go")
+ if err != nil {
+ t.Fatalf("read failed: %v", err)
+ }
+ fieldDesc := map[int]string{}
+ r := regexp.MustCompile(`unsafe[.]Offsetof[(](\w+){}[.](\w+)[)]`)
+ matches := r.FindAllStringSubmatch(string(x), -1)
+ for i, v := range matches {
+ checked["field runtime."+v[1]+"."+v[2]] = true
+ fieldDesc[i] = v[1] + "." + v[2]
+ }
+ varDesc := map[int]string{}
+ r = regexp.MustCompile(`unsafe[.]Pointer[(]&(\w+)[)]`)
+ matches = r.FindAllStringSubmatch(string(x), -1)
+ for i, v := range matches {
+ checked["var "+v[1]] = true
+ varDesc[i] = v[1]
+ }
+
+ // Check all of our alignments. This is the actual core of the test.
+ for i, d := range runtime.AtomicFields {
+ if d%8 != 0 {
+ t.Errorf("field alignment of %s failed: offset is %d", fieldDesc[i], d)
+ }
+ }
+ for i, p := range runtime.AtomicVariables {
+ if uintptr(p)%8 != 0 {
+ t.Errorf("variable alignment of %s failed: address is %x", varDesc[i], p)
+ }
+ }
+
+ // The code above is the actual test. The code below attempts to check
+ // that the tables used by the code above are exhaustive.
+
+ // Parse the whole runtime package, checking that arguments of
+ // appropriate atomic operations are in the list above.
+ fset := token.NewFileSet()
+ m, err := parser.ParseDir(fset, ".", nil, 0)
+ if err != nil {
+ t.Fatalf("parsing runtime failed: %v", err)
+ }
+ pkg := m["runtime"] // Note: ignore runtime_test and main packages
+
+ // Filter files by those for the current architecture/os being tested.
+ fileMap := map[string]bool{}
+ for _, f := range buildableFiles(t, ".") {
+ fileMap[f] = true
+ }
+ var files []*ast.File
+ for fname, f := range pkg.Files {
+ if fileMap[fname] {
+ files = append(files, f)
+ }
+ }
+
+ // Call go/types to analyze the runtime package.
+ var info types.Info
+ info.Types = map[ast.Expr]types.TypeAndValue{}
+ conf := types.Config{Importer: importer.Default()}
+ _, err = conf.Check("runtime", fset, files, &info)
+ if err != nil {
+ t.Fatalf("typechecking runtime failed: %v", err)
+ }
+
+ // Analyze all atomic.*64 callsites.
+ v := Visitor{t: t, fset: fset, types: info.Types, checked: checked}
+ ast.Walk(&v, pkg)
+}
+
+type Visitor struct {
+ fset *token.FileSet
+ types map[ast.Expr]types.TypeAndValue
+ checked map[string]bool
+ t *testing.T
+}
+
+func (v *Visitor) Visit(n ast.Node) ast.Visitor {
+ c, ok := n.(*ast.CallExpr)
+ if !ok {
+ return v
+ }
+ f, ok := c.Fun.(*ast.SelectorExpr)
+ if !ok {
+ return v
+ }
+ p, ok := f.X.(*ast.Ident)
+ if !ok {
+ return v
+ }
+ if p.Name != "atomic" {
+ return v
+ }
+ if !strings.HasSuffix(f.Sel.Name, "64") {
+ return v
+ }
+
+ a := c.Args[0]
+
+ // This is a call to atomic.XXX64(a, ...). Make sure a is aligned to 8 bytes.
+ // XXX = one of Load, Store, Cas, etc.
+ // The arg we care about the alignment of is always the first one.
+
+ if u, ok := a.(*ast.UnaryExpr); ok && u.Op == token.AND {
+ v.checkAddr(u.X)
+ return v
+ }
+
+	// In other cases there's nothing we can check. Assume we're ok.
+ v.t.Logf("unchecked atomic operation %s %v", v.fset.Position(n.Pos()), v.print(n))
+
+ return v
+}
+
+// checkAddr checks to make sure n is a properly aligned address for a 64-bit atomic operation.
+func (v *Visitor) checkAddr(n ast.Node) {
+ switch n := n.(type) {
+ case *ast.IndexExpr:
+ // Alignment of an array element is the same as the whole array.
+ v.checkAddr(n.X)
+ return
+ case *ast.Ident:
+ key := "var " + v.print(n)
+ if !v.checked[key] {
+ v.t.Errorf("unchecked variable %s %s", v.fset.Position(n.Pos()), key)
+ }
+ return
+ case *ast.SelectorExpr:
+ t := v.types[n.X].Type
+ if t == nil {
+			// Not sure what is happening here; go/types fails to
+ // type the selector arg on some platforms.
+ return
+ }
+ if p, ok := t.(*types.Pointer); ok {
+ // Note: we assume here that the pointer p in p.foo is properly
+ // aligned. We just check that foo is at a properly aligned offset.
+ t = p.Elem()
+ } else {
+ v.checkAddr(n.X)
+ }
+ if t.Underlying() == t {
+ v.t.Errorf("analysis can't handle unnamed type %s %v", v.fset.Position(n.Pos()), t)
+ }
+ key := "field " + t.String() + "." + n.Sel.Name
+ if !v.checked[key] {
+ v.t.Errorf("unchecked field %s %s", v.fset.Position(n.Pos()), key)
+ }
+ default:
+ v.t.Errorf("unchecked atomic address %s %v", v.fset.Position(n.Pos()), v.print(n))
+
+ }
+}
+
+func (v *Visitor) print(n ast.Node) string {
+ var b strings.Builder
+ printer.Fprint(&b, v.fset, n)
+ return b.String()
+}
+
+// buildableFiles returns the list of files in the given directory
+// that are actually used for the build, given GOOS/GOARCH restrictions.
+func buildableFiles(t *testing.T, dir string) []string {
+ ctxt := build.Default
+ ctxt.CgoEnabled = true
+ pkg, err := ctxt.ImportDir(dir, 0)
+ if err != nil {
+ t.Fatalf("can't find buildable files: %v", err)
+ }
+ return pkg.GoFiles
+}
diff --git a/src/runtime/arena.go b/src/runtime/arena.go
new file mode 100644
index 0000000..f9806c5
--- /dev/null
+++ b/src/runtime/arena.go
@@ -0,0 +1,1003 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Implementation of (safe) user arenas.
+//
+// This file contains the implementation of user arenas wherein Go values can
+// be manually allocated and freed in bulk. The act of manually freeing memory,
+// potentially before a GC cycle, means that a garbage collection cycle can be
+// delayed, improving efficiency by reducing GC cycle frequency. There are other
+// potential efficiency benefits, such as improved locality and access to a more
+// efficient allocation strategy.
+//
+// What makes the arenas here safe is that once they are freed, accessing the
+// arena's memory will cause an explicit program fault, and the arena's address
+// space will not be reused until no more pointers into it are found. There's one
+// exception to this: if an arena allocated memory that isn't exhausted, it's placed
+// back into a pool for reuse. This means that a crash is not always guaranteed.
+//
+// While this may seem unsafe, it still prevents memory corruption, and is in fact
+// necessary in order to make new(T) a valid implementation of arenas. Such a property
+// is desirable to allow for a trivial implementation. (It also avoids complexities
+// that arise from synchronization with the GC when trying to set the arena chunks to
+// fault while the GC is active.)
+//
+// The implementation works in layers. At the bottom, arenas are managed in chunks.
+// Each chunk must be a multiple of the heap arena size, or the heap arena size must
+// be divisible by the arena chunk size. The address space for each chunk, and each
+// corresponding heapArena for that address space, are eternally reserved for use as
+// arena chunks. That is, they can never be used for the general heap. Each chunk
+// is also represented by a single mspan, and is modeled as a single large heap
+// allocation. It must be, because each chunk contains ordinary Go values that may
+// point into the heap, so it must be scanned just like any other object. Any
+// pointer into a chunk will therefore always cause the whole chunk to be scanned
+// while its corresponding arena is still live.
+//
+// Chunks may be allocated either from new memory mapped by the OS on our behalf,
+// or by reusing old freed chunks. When chunks are freed, their underlying memory
+// is returned to the OS, set to fault on access, and may not be reused until the
+// program doesn't point into the chunk anymore (the code refers to this state as
+// "quarantined"), a property checked by the GC.
+//
+// The sweeper handles moving chunks out of this quarantine state to be ready for
+// reuse. When the chunk is placed into the quarantine state, its corresponding
+// span is marked as noscan so that the GC doesn't try to scan memory that would
+// cause a fault.
+//
+// At the next layer are the user arenas themselves. They consist of a single
+// active chunk which new Go values are bump-allocated into and a list of chunks
+// that were exhausted when allocating into the arena. Once the arena is freed,
+// it frees all full chunks it references, and places the active one onto a reuse
+// list for a future arena to use. Each arena keeps its list of referenced chunks
+// explicitly live until it is freed. Each user arena also maps to an object which
+// has a finalizer attached that ensures the arena's chunks are all freed even if
+// the arena itself is never explicitly freed.
+//
+// Pointer-ful memory is bump-allocated from low addresses to high addresses in each
+// chunk, while pointer-free memory is bump-allocated from high address to low
+// addresses. The reason for this is to take advantage of a GC optimization wherein
+// the GC will stop scanning an object when there are no more pointers in it, which
+// also allows us to elide clearing the heap bitmap for pointer-free Go values
+// allocated into arenas.
+//
+// Note that arenas are not safe to use concurrently.
+//
+// In summary, there are 2 resources: arenas, and arena chunks. They exist in the
+// following lifecycle:
+//
+// (1) A new arena is created via newArena.
+// (2) Chunks are allocated to hold memory allocated into the arena with new or slice.
+// (a) Chunks are first allocated from the reuse list of partially-used chunks.
+// (b) If there are no such chunks, then chunks on the ready list are taken.
+// (c) Failing all the above, memory for a new chunk is mapped.
+// (3) The arena is freed, or all references to it are dropped, triggering its finalizer.
+// (a) If the GC is not active, exhausted chunks are set to fault and placed on a
+// quarantine list.
+// (b) If the GC is active, exhausted chunks are placed on a fault list and will
+// go through step (a) at a later point in time.
+// (c) Any remaining partially-used chunk is placed on a reuse list.
+// (4) Once no more pointers are found into quarantined arena chunks, the sweeper
+// takes these chunks out of quarantine and places them on the ready list.
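+//
+// Viewed from the user's side (through the experimental arena package,
+// assuming its NewArena/New/Free API), the lifecycle above looks roughly
+// like:
+//
+//	a := arena.NewArena()  // (1)
+//	x := arena.New[T](a)   // (2): chunks are allocated as needed
+//	// ... use x ...
+//	a.Free()               // (3); chunks are quarantined, then reused (4)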
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "unsafe"
+)
+
+// Functions starting with arena_ are meant to be exported to downstream users
+// of arenas. They should wrap these functions in a higher-level API.
+//
+// The underlying arena and its resources are managed through an opaque unsafe.Pointer.
+
+// arena_newArena is a wrapper around newUserArena.
+//
+//go:linkname arena_newArena arena.runtime_arena_newArena
+func arena_newArena() unsafe.Pointer {
+ return unsafe.Pointer(newUserArena())
+}
+
+// arena_arena_New is a wrapper around (*userArena).new, except that typ
+// is an any (must be a *_type, still) and typ must be a type descriptor
+// for a pointer to the type to actually be allocated, i.e. pass a *T
+// to allocate a T. This is necessary because this function returns a *T.
+//
+//go:linkname arena_arena_New arena.runtime_arena_arena_New
+func arena_arena_New(arena unsafe.Pointer, typ any) any {
+ t := (*_type)(efaceOf(&typ).data)
+ if t.Kind_&kindMask != kindPtr {
+ throw("arena_New: non-pointer type")
+ }
+ te := (*ptrtype)(unsafe.Pointer(t)).Elem
+ x := ((*userArena)(arena)).new(te)
+ var result any
+ e := efaceOf(&result)
+ e._type = t
+ e.data = x
+ return result
+}
+
+// arena_arena_Slice is a wrapper around (*userArena).slice.
+//
+//go:linkname arena_arena_Slice arena.runtime_arena_arena_Slice
+func arena_arena_Slice(arena unsafe.Pointer, slice any, cap int) {
+ ((*userArena)(arena)).slice(slice, cap)
+}
+
+// arena_arena_Free is a wrapper around (*userArena).free.
+//
+//go:linkname arena_arena_Free arena.runtime_arena_arena_Free
+func arena_arena_Free(arena unsafe.Pointer) {
+ ((*userArena)(arena)).free()
+}
+
+// arena_heapify takes a value that lives in an arena and makes a copy
+// of it on the heap. Values that don't live in an arena are returned unmodified.
+//
+//go:linkname arena_heapify arena.runtime_arena_heapify
+func arena_heapify(s any) any {
+ var v unsafe.Pointer
+ e := efaceOf(&s)
+ t := e._type
+ switch t.Kind_ & kindMask {
+ case kindString:
+ v = stringStructOf((*string)(e.data)).str
+ case kindSlice:
+ v = (*slice)(e.data).array
+ case kindPtr:
+ v = e.data
+ default:
+ panic("arena: Clone only supports pointers, slices, and strings")
+ }
+ span := spanOf(uintptr(v))
+ if span == nil || !span.isUserArenaChunk {
+ // Not stored in a user arena chunk.
+ return s
+ }
+ // Heap-allocate storage for a copy.
+ var x any
+ switch t.Kind_ & kindMask {
+ case kindString:
+ s1 := s.(string)
+ s2, b := rawstring(len(s1))
+ copy(b, s1)
+ x = s2
+ case kindSlice:
+ len := (*slice)(e.data).len
+ et := (*slicetype)(unsafe.Pointer(t)).Elem
+ sl := new(slice)
+ *sl = slice{makeslicecopy(et, len, len, (*slice)(e.data).array), len, len}
+ xe := efaceOf(&x)
+ xe._type = t
+ xe.data = unsafe.Pointer(sl)
+ case kindPtr:
+ et := (*ptrtype)(unsafe.Pointer(t)).Elem
+ e2 := newobject(et)
+ typedmemmove(et, e2, e.data)
+ xe := efaceOf(&x)
+ xe._type = t
+ xe.data = e2
+ }
+ return x
+}
+
+const (
+ // userArenaChunkBytes is the size of a user arena chunk.
+ userArenaChunkBytesMax = 8 << 20
+ userArenaChunkBytes = uintptr(int64(userArenaChunkBytesMax-heapArenaBytes)&(int64(userArenaChunkBytesMax-heapArenaBytes)>>63) + heapArenaBytes) // min(userArenaChunkBytesMax, heapArenaBytes)
+
+ // userArenaChunkPages is the number of pages a user arena chunk uses.
+ userArenaChunkPages = userArenaChunkBytes / pageSize
+
+ // userArenaChunkMaxAllocBytes is the maximum size of an object that can
+ // be allocated from an arena. This number is chosen to cap worst-case
+ // fragmentation of user arenas to 25%. Larger allocations are redirected
+ // to the heap.
+ userArenaChunkMaxAllocBytes = userArenaChunkBytes / 4
+)
+
+func init() {
+ if userArenaChunkPages*pageSize != userArenaChunkBytes {
+ throw("user arena chunk size is not a multiple of the page size")
+ }
+ if userArenaChunkBytes%physPageSize != 0 {
+ throw("user arena chunk size is not a multiple of the physical page size")
+ }
+ if userArenaChunkBytes < heapArenaBytes {
+ if heapArenaBytes%userArenaChunkBytes != 0 {
+ throw("user arena chunk size is smaller than a heap arena, but doesn't divide it")
+ }
+ } else {
+ if userArenaChunkBytes%heapArenaBytes != 0 {
+			throw("user arena chunk size is larger than a heap arena, but not a multiple")
+ }
+ }
+ lockInit(&userArenaState.lock, lockRankUserArenaState)
+}
+
+type userArena struct {
+	// full is a list of full chunks that don't have enough free memory left, and
+ // that we'll free once this user arena is freed.
+ //
+ // Can't use mSpanList here because it's not-in-heap.
+ fullList *mspan
+
+ // active is the user arena chunk we're currently allocating into.
+ active *mspan
+
+ // refs is a set of references to the arena chunks so that they're kept alive.
+ //
+ // The last reference in the list always refers to active, while the rest of
+ // them correspond to fullList. Specifically, the head of fullList is the
+ // second-to-last one, fullList.next is the third-to-last, and so on.
+ //
+	// In other words, every time a new chunk becomes active, it's appended to this
+ // list.
+ refs []unsafe.Pointer
+
+ // defunct is true if free has been called on this arena.
+ //
+ // This is just a best-effort way to discover a concurrent allocation
+ // and free. Also used to detect a double-free.
+ defunct atomic.Bool
+}
+
+// newUserArena creates a new userArena ready to be used.
+func newUserArena() *userArena {
+ a := new(userArena)
+ SetFinalizer(a, func(a *userArena) {
+ // If arena handle is dropped without being freed, then call
+ // free on the arena, so the arena chunks are never reclaimed
+ // by the garbage collector.
+ a.free()
+ })
+ a.refill()
+ return a
+}
+
+// new allocates a new object of the provided type into the arena, and returns
+// its pointer.
+//
+// This operation is not safe to call concurrently with other operations on the
+// same arena.
+func (a *userArena) new(typ *_type) unsafe.Pointer {
+ return a.alloc(typ, -1)
+}
+
+// slice allocates a new slice backing store. slice must be a pointer to a slice
+// (i.e. *[]T), because userArenaSlice will update the slice directly.
+//
+// cap determines the capacity of the slice backing store and must be non-negative.
+//
+// This operation is not safe to call concurrently with other operations on the
+// same arena.
+func (a *userArena) slice(sl any, cap int) {
+ if cap < 0 {
+ panic("userArena.slice: negative cap")
+ }
+ i := efaceOf(&sl)
+ typ := i._type
+ if typ.Kind_&kindMask != kindPtr {
+ panic("slice result of non-ptr type")
+ }
+ typ = (*ptrtype)(unsafe.Pointer(typ)).Elem
+ if typ.Kind_&kindMask != kindSlice {
+ panic("slice of non-ptr-to-slice type")
+ }
+ typ = (*slicetype)(unsafe.Pointer(typ)).Elem
+ // t is now the element type of the slice we want to allocate.
+
+ *((*slice)(i.data)) = slice{a.alloc(typ, cap), cap, cap}
+}
+
+// free returns the userArena's chunks back to mheap and marks it as defunct.
+//
+// Must be called at most once for any given arena.
+//
+// This operation is not safe to call concurrently with other operations on the
+// same arena.
+func (a *userArena) free() {
+ // Check for a double-free.
+ if a.defunct.Load() {
+ panic("arena double free")
+ }
+
+ // Mark ourselves as defunct.
+ a.defunct.Store(true)
+ SetFinalizer(a, nil)
+
+ // Free all the full arenas.
+ //
+ // The refs on this list are in reverse order from the second-to-last.
+ s := a.fullList
+ i := len(a.refs) - 2
+ for s != nil {
+ a.fullList = s.next
+ s.next = nil
+ freeUserArenaChunk(s, a.refs[i])
+ s = a.fullList
+ i--
+ }
+ if a.fullList != nil || i >= 0 {
+ // There's still something left on the full list, or we
+ // failed to actually iterate over the entire refs list.
+ throw("full list doesn't match refs list in length")
+ }
+
+ // Put the active chunk onto the reuse list.
+ //
+ // Note that active's reference is always the last reference in refs.
+ s = a.active
+ if s != nil {
+ if raceenabled || msanenabled || asanenabled {
+ // Don't reuse arenas with sanitizers enabled. We want to catch
+ // any use-after-free errors aggressively.
+ freeUserArenaChunk(s, a.refs[len(a.refs)-1])
+ } else {
+ lock(&userArenaState.lock)
+ userArenaState.reuse = append(userArenaState.reuse, liveUserArenaChunk{s, a.refs[len(a.refs)-1]})
+ unlock(&userArenaState.lock)
+ }
+ }
+ // nil out a.active so that a race with freeing will more likely cause a crash.
+ a.active = nil
+ a.refs = nil
+}
+
+// alloc reserves space in the current chunk or calls refill and reserves space
+// in a new chunk. If cap is negative, the type will be taken literally, otherwise
+// it will be considered as an element type for a slice backing store with capacity
+// cap.
+func (a *userArena) alloc(typ *_type, cap int) unsafe.Pointer {
+ s := a.active
+ var x unsafe.Pointer
+ for {
+ x = s.userArenaNextFree(typ, cap)
+ if x != nil {
+ break
+ }
+ s = a.refill()
+ }
+ return x
+}
+
+// refill inserts the current arena chunk onto the full list and obtains a new
+// one, either from the partial list or allocating a new one, both from mheap.
+func (a *userArena) refill() *mspan {
+ // If there's an active chunk, assume it's full.
+ s := a.active
+ if s != nil {
+ if s.userArenaChunkFree.size() > userArenaChunkMaxAllocBytes {
+ // It's difficult to tell when we're actually out of memory
+ // in a chunk because the allocation that failed may still leave
+ // some free space available. However, that amount of free space
+ // should never exceed the maximum allocation size.
+ throw("wasted too much memory in an arena chunk")
+ }
+ s.next = a.fullList
+ a.fullList = s
+ a.active = nil
+ s = nil
+ }
+ var x unsafe.Pointer
+
+ // Check the partially-used list.
+ lock(&userArenaState.lock)
+ if len(userArenaState.reuse) > 0 {
+ // Pick off the last arena chunk from the list.
+ n := len(userArenaState.reuse) - 1
+ x = userArenaState.reuse[n].x
+ s = userArenaState.reuse[n].mspan
+ userArenaState.reuse[n].x = nil
+ userArenaState.reuse[n].mspan = nil
+ userArenaState.reuse = userArenaState.reuse[:n]
+ }
+ unlock(&userArenaState.lock)
+ if s == nil {
+ // Allocate a new one.
+ x, s = newUserArenaChunk()
+ if s == nil {
+ throw("out of memory")
+ }
+ }
+ a.refs = append(a.refs, x)
+ a.active = s
+ return s
+}
+
+type liveUserArenaChunk struct {
+ *mspan // Must represent a user arena chunk.
+
+ // Reference to mspan.base() to keep the chunk alive.
+ x unsafe.Pointer
+}
+
+var userArenaState struct {
+ lock mutex
+
+ // reuse contains a list of partially-used and already-live
+ // user arena chunks that can be quickly reused for another
+ // arena.
+ //
+ // Protected by lock.
+ reuse []liveUserArenaChunk
+
+ // fault contains full user arena chunks that need to be faulted.
+ //
+ // Protected by lock.
+ fault []liveUserArenaChunk
+}
+
+// userArenaNextFree reserves space in the user arena for an item of the specified
+// type. If cap is not -1, this is for an array of cap elements of type t.
+func (s *mspan) userArenaNextFree(typ *_type, cap int) unsafe.Pointer {
+ size := typ.Size_
+ if cap > 0 {
+ if size > ^uintptr(0)/uintptr(cap) {
+ // Overflow.
+ throw("out of memory")
+ }
+ size *= uintptr(cap)
+ }
+ if size == 0 || cap == 0 {
+ return unsafe.Pointer(&zerobase)
+ }
+ if size > userArenaChunkMaxAllocBytes {
+ // Redirect allocations that don't fit into a chunk well directly
+ // from the heap.
+ if cap >= 0 {
+ return newarray(typ, cap)
+ }
+ return newobject(typ)
+ }
+
+ // Prevent preemption as we set up the space for a new object.
+ //
+ // Act like we're allocating.
+ mp := acquirem()
+ if mp.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+ if mp.gsignal == getg() {
+ throw("malloc during signal")
+ }
+ mp.mallocing = 1
+
+ var ptr unsafe.Pointer
+ if typ.PtrBytes == 0 {
+ // Allocate pointer-less objects from the tail end of the chunk.
+ v, ok := s.userArenaChunkFree.takeFromBack(size, typ.Align_)
+ if ok {
+ ptr = unsafe.Pointer(v)
+ }
+ } else {
+ v, ok := s.userArenaChunkFree.takeFromFront(size, typ.Align_)
+ if ok {
+ ptr = unsafe.Pointer(v)
+ }
+ }
+ if ptr == nil {
+ // Failed to allocate.
+ mp.mallocing = 0
+ releasem(mp)
+ return nil
+ }
+ if s.needzero != 0 {
+ throw("arena chunk needs zeroing, but should already be zeroed")
+ }
+ // Set up heap bitmap and do extra accounting.
+ if typ.PtrBytes != 0 {
+ if cap >= 0 {
+ userArenaHeapBitsSetSliceType(typ, cap, ptr, s.base())
+ } else {
+ userArenaHeapBitsSetType(typ, ptr, s.base())
+ }
+ c := getMCache(mp)
+ if c == nil {
+ throw("mallocgc called without a P or outside bootstrapping")
+ }
+ if cap > 0 {
+ c.scanAlloc += size - (typ.Size_ - typ.PtrBytes)
+ } else {
+ c.scanAlloc += typ.PtrBytes
+ }
+ }
+
+ // Ensure that the stores above that initialize x to
+ // type-safe memory and set the heap bits occur before
+ // the caller can make ptr observable to the garbage
+ // collector. Otherwise, on weakly ordered machines,
+ // the garbage collector could follow a pointer to x,
+ // but see uninitialized memory or stale heap bits.
+ publicationBarrier()
+
+ mp.mallocing = 0
+ releasem(mp)
+
+ return ptr
+}
+
+// userArenaHeapBitsSetType is the equivalent of heapBitsSetType but for
+// non-slice-backing-store Go values allocated in a user arena chunk. It
+// sets up the heap bitmap for the value with type typ allocated at address ptr.
+// base is the base address of the arena chunk.
+func userArenaHeapBitsSetType(typ *_type, ptr unsafe.Pointer, base uintptr) {
+ h := writeHeapBitsForAddr(uintptr(ptr))
+
+ // Our last allocation might have ended right at a noMorePtrs mark,
+ // which we would not have erased. We need to erase that mark here,
+ // because we're going to start adding new heap bitmap bits.
+ // We only need to clear one mark, because below we make sure to
+ // pad out the bits with zeroes and only write one noMorePtrs bit
+ // for each new object.
+ // (This is only necessary at noMorePtrs boundaries, as noMorePtrs
+ // marks within an object allocated with newAt will be erased by
+ // the normal writeHeapBitsForAddr mechanism.)
+ //
+ // Note that we skip this if this is the first allocation in the
+ // arena because there's definitely no previous noMorePtrs mark
+ // (in fact, we *must* do this, because we're going to try to back
+ // up a pointer to fix this up).
+ if uintptr(ptr)%(8*goarch.PtrSize*goarch.PtrSize) == 0 && uintptr(ptr) != base {
+ // Back up one pointer and rewrite that pointer. That will
+ // cause the writeHeapBits implementation to clear the
+ // noMorePtrs bit we need to clear.
+ r := heapBitsForAddr(uintptr(ptr)-goarch.PtrSize, goarch.PtrSize)
+ _, p := r.next()
+ b := uintptr(0)
+ if p == uintptr(ptr)-goarch.PtrSize {
+ b = 1
+ }
+ h = writeHeapBitsForAddr(uintptr(ptr) - goarch.PtrSize)
+ h = h.write(b, 1)
+ }
+
+ p := typ.GCData // start of 1-bit pointer mask (or GC program)
+ var gcProgBits uintptr
+ if typ.Kind_&kindGCProg != 0 {
+ // Expand gc program, using the object itself for storage.
+ gcProgBits = runGCProg(addb(p, 4), (*byte)(ptr))
+ p = (*byte)(ptr)
+ }
+ nb := typ.PtrBytes / goarch.PtrSize
+
+ for i := uintptr(0); i < nb; i += ptrBits {
+ k := nb - i
+ if k > ptrBits {
+ k = ptrBits
+ }
+ h = h.write(readUintptr(addb(p, i/8)), k)
+ }
+ // Note: we call pad here to ensure we emit explicit 0 bits
+ // for the pointerless tail of the object. This ensures that
+ // there's only a single noMorePtrs mark for the next object
+ // to clear. We don't need to do this to clear stale noMorePtrs
+ // markers from previous uses because arena chunk pointer bitmaps
+ // are always fully cleared when reused.
+ h = h.pad(typ.Size_ - typ.PtrBytes)
+ h.flush(uintptr(ptr), typ.Size_)
+
+ if typ.Kind_&kindGCProg != 0 {
+ // Zero out temporary ptrmask buffer inside object.
+ memclrNoHeapPointers(ptr, (gcProgBits+7)/8)
+ }
+
+ // Double-check that the bitmap was written out correctly.
+ //
+ // Derived from heapBitsSetType.
+ const doubleCheck = false
+ if doubleCheck {
+ size := typ.Size_
+ x := uintptr(ptr)
+ h := heapBitsForAddr(x, size)
+ for i := uintptr(0); i < size; i += goarch.PtrSize {
+ // Compute the pointer bit we want at offset i.
+ want := false
+ off := i % typ.Size_
+ if off < typ.PtrBytes {
+ j := off / goarch.PtrSize
+ want = *addb(typ.GCData, j/8)>>(j%8)&1 != 0
+ }
+ if want {
+ var addr uintptr
+ h, addr = h.next()
+ if addr != x+i {
+ throw("userArenaHeapBitsSetType: pointer entry not correct")
+ }
+ }
+ }
+ if _, addr := h.next(); addr != 0 {
+ throw("userArenaHeapBitsSetType: extra pointer")
+ }
+ }
+}
+
+// userArenaHeapBitsSetSliceType is the equivalent of heapBitsSetType but for
+// Go slice backing store values allocated in a user arena chunk. It sets up the
+// heap bitmap for n consecutive values with type typ allocated at address ptr.
+func userArenaHeapBitsSetSliceType(typ *_type, n int, ptr unsafe.Pointer, base uintptr) {
+ mem, overflow := math.MulUintptr(typ.Size_, uintptr(n))
+ if overflow || n < 0 || mem > maxAlloc {
+ panic(plainError("runtime: allocation size out of range"))
+ }
+ for i := 0; i < n; i++ {
+ userArenaHeapBitsSetType(typ, add(ptr, uintptr(i)*typ.Size_), base)
+ }
+}
+
+// newUserArenaChunk allocates a user arena chunk, which maps to a single
+// heap arena and single span. Returns a pointer to the base of the chunk
+// (this is really important: we need to keep the chunk alive) and the span.
+func newUserArenaChunk() (unsafe.Pointer, *mspan) {
+ if gcphase == _GCmarktermination {
+ throw("newUserArenaChunk called with gcphase == _GCmarktermination")
+ }
+
+ // Deduct assist credit. Because user arena chunks are modeled as one
+ // giant heap object which counts toward heapLive, we're obligated to
+ // assist the GC proportionally (and it's worth noting that the arena
+ // does represent additional work for the GC, but we also have no idea
+ // what that looks like until we actually allocate things into the
+ // arena).
+ deductAssistCredit(userArenaChunkBytes)
+
+ // Set mp.mallocing to keep from being preempted by GC.
+ mp := acquirem()
+ if mp.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+ if mp.gsignal == getg() {
+ throw("malloc during signal")
+ }
+ mp.mallocing = 1
+
+ // Allocate a new user arena.
+ var span *mspan
+ systemstack(func() {
+ span = mheap_.allocUserArenaChunk()
+ })
+ if span == nil {
+ throw("out of memory")
+ }
+ x := unsafe.Pointer(span.base())
+
+ // Allocate black during GC.
+ // All slots hold nil so no scanning is needed.
+ // This may be racing with GC so do it atomically if there can be
+ // a race marking the bit.
+ if gcphase != _GCoff {
+ gcmarknewobject(span, span.base(), span.elemsize)
+ }
+
+ if raceenabled {
+ // TODO(mknyszek): Track individual objects.
+ racemalloc(unsafe.Pointer(span.base()), span.elemsize)
+ }
+
+ if msanenabled {
+ // TODO(mknyszek): Track individual objects.
+ msanmalloc(unsafe.Pointer(span.base()), span.elemsize)
+ }
+
+ if asanenabled {
+ // TODO(mknyszek): Track individual objects.
+ rzSize := computeRZlog(span.elemsize)
+ span.elemsize -= rzSize
+ span.limit -= rzSize
+ span.userArenaChunkFree = makeAddrRange(span.base(), span.limit)
+ asanpoison(unsafe.Pointer(span.limit), span.npages*pageSize-span.elemsize)
+ asanunpoison(unsafe.Pointer(span.base()), span.elemsize)
+ }
+
+ if rate := MemProfileRate; rate > 0 {
+ c := getMCache(mp)
+ if c == nil {
+ throw("newUserArenaChunk called without a P or outside bootstrapping")
+ }
+ // Note cache c only valid while m acquired; see #47302
+ if rate != 1 && userArenaChunkBytes < c.nextSample {
+ c.nextSample -= userArenaChunkBytes
+ } else {
+ profilealloc(mp, unsafe.Pointer(span.base()), userArenaChunkBytes)
+ }
+ }
+ mp.mallocing = 0
+ releasem(mp)
+
+ // Again, because this chunk counts toward heapLive, potentially trigger a GC.
+ if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
+ gcStart(t)
+ }
+
+ if debug.malloc {
+ if debug.allocfreetrace != 0 {
+ tracealloc(unsafe.Pointer(span.base()), userArenaChunkBytes, nil)
+ }
+
+ if inittrace.active && inittrace.id == getg().goid {
+ // Init functions are executed sequentially in a single goroutine.
+ inittrace.bytes += uint64(userArenaChunkBytes)
+ }
+ }
+
+ // Double-check it's aligned to the physical page size. Based on the current
+ // implementation this is trivially true, but it need not be in the future.
+ // However, if it's not aligned to the physical page size then we can't properly
+ // set it to fault later.
+ if uintptr(x)%physPageSize != 0 {
+ throw("user arena chunk is not aligned to the physical page size")
+ }
+
+ return x, span
+}
+
+// isUnusedUserArenaChunk indicates that the arena chunk has been set to fault
+// and doesn't contain any scannable memory anymore. However, it might still be
+// mSpanInUse as it sits on the quarantine list, since it needs to be swept.
+//
+// This is not safe to execute unless the caller has ownership of the mspan or
+// the world is stopped (preemption is prevented while the relevant state changes).
+//
+// This is really only meant to be used by accounting tests in the runtime to
+// distinguish when a span shouldn't be counted (since mSpanInUse might not be
+// enough).
+func (s *mspan) isUnusedUserArenaChunk() bool {
+ return s.isUserArenaChunk && s.spanclass == makeSpanClass(0, true)
+}
+
+// setUserArenaChunkToFault sets the address space for the user arena chunk to fault
+// and releases any underlying memory resources.
+//
+// Must be in a non-preemptible state to ensure the consistency of statistics
+// exported to MemStats.
+func (s *mspan) setUserArenaChunkToFault() {
+ if !s.isUserArenaChunk {
+ throw("invalid span in heapArena for user arena")
+ }
+ if s.npages*pageSize != userArenaChunkBytes {
+ throw("span on userArena.faultList has invalid size")
+ }
+
+ // Update the span class to be noscan. What we want to happen is that
+ // any pointer into the span keeps it from getting recycled, so we want
+ // the mark bit to get set, but we're about to set the address space to fault,
+ // so we have to prevent the GC from scanning this memory.
+ //
+ // It's OK to set it here because (1) a GC isn't in progress, so the scanning code
+ // won't make a bad decision, (2) we're currently non-preemptible and in the runtime,
+ // so a GC is blocked from starting. We might race with sweeping, which could
+ // put it on the "wrong" sweep list, but really don't care because the chunk is
+ // treated as a large object span and there's no meaningful difference between scan
+ // and noscan large objects in the sweeper. The STW at the start of the GC acts as a
+ // barrier for this update.
+ s.spanclass = makeSpanClass(0, true)
+
+ // Actually set the arena chunk to fault, so we'll get dangling pointer errors.
+ // sysFault currently uses a method on each OS that forces it to evacuate all
+ // memory backing the chunk.
+ sysFault(unsafe.Pointer(s.base()), s.npages*pageSize)
+
+ // Everything on the list is counted as in-use, however sysFault transitions to
+ // Reserved, not Prepared, so we skip updating heapFree or heapReleased and just
+ // remove the memory from the total altogether; it's just address space now.
+ gcController.heapInUse.add(-int64(s.npages * pageSize))
+
+ // Count this as a free of an object right now as opposed to when
+ // the span gets off the quarantine list. The main reason is so that the
+ // amount of bytes allocated doesn't exceed how much is counted as
+ // "mapped ready," which could cause a deadlock in the pacer.
+ gcController.totalFree.Add(int64(s.npages * pageSize))
+
+ // Update consistent stats to match.
+ //
+ // We're non-preemptible, so it's safe to update consistent stats (our P
+ // won't change out from under us).
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.committed, -int64(s.npages*pageSize))
+ atomic.Xaddint64(&stats.inHeap, -int64(s.npages*pageSize))
+ atomic.Xadd64(&stats.largeFreeCount, 1)
+ atomic.Xadd64(&stats.largeFree, int64(s.npages*pageSize))
+ memstats.heapStats.release()
+
+ // This counts as a free, so update heapLive.
+ gcController.update(-int64(s.npages*pageSize), 0)
+
+ // Mark it as free for the race detector.
+ if raceenabled {
+ racefree(unsafe.Pointer(s.base()), s.elemsize)
+ }
+
+ systemstack(func() {
+ // Add the user arena to the quarantine list.
+ lock(&mheap_.lock)
+ mheap_.userArena.quarantineList.insert(s)
+ unlock(&mheap_.lock)
+ })
+}
+
+// inUserArenaChunk returns true if p points to a user arena chunk.
+func inUserArenaChunk(p uintptr) bool {
+ s := spanOf(p)
+ if s == nil {
+ return false
+ }
+ return s.isUserArenaChunk
+}
+
+// freeUserArenaChunk releases the user arena represented by s back to the runtime.
+//
+// x must be a live pointer within s.
+//
+// The runtime will set the user arena to fault once it's safe (the GC is no longer running)
+// and then once the user arena is no longer referenced by the application, will allow it to
+// be reused.
+func freeUserArenaChunk(s *mspan, x unsafe.Pointer) {
+ if !s.isUserArenaChunk {
+ throw("span is not for a user arena")
+ }
+ if s.npages*pageSize != userArenaChunkBytes {
+ throw("invalid user arena span size")
+ }
+
+ // Mark the region as free to various sanitizers immediately instead
+ // of handling them at sweep time.
+ if raceenabled {
+ racefree(unsafe.Pointer(s.base()), s.elemsize)
+ }
+ if msanenabled {
+ msanfree(unsafe.Pointer(s.base()), s.elemsize)
+ }
+ if asanenabled {
+ asanpoison(unsafe.Pointer(s.base()), s.elemsize)
+ }
+
+ // Make ourselves non-preemptible as we manipulate state and statistics.
+ //
+ // Also required by setUserArenaChunkToFault.
+ mp := acquirem()
+
+ // We can only set user arenas to fault if we're in the _GCoff phase.
+ if gcphase == _GCoff {
+ lock(&userArenaState.lock)
+ faultList := userArenaState.fault
+ userArenaState.fault = nil
+ unlock(&userArenaState.lock)
+
+ s.setUserArenaChunkToFault()
+ for _, lc := range faultList {
+ lc.mspan.setUserArenaChunkToFault()
+ }
+
+ // Until the chunks are set to fault, keep them alive via the fault list.
+ KeepAlive(x)
+ KeepAlive(faultList)
+ } else {
+ // Put the user arena on the fault list.
+ lock(&userArenaState.lock)
+ userArenaState.fault = append(userArenaState.fault, liveUserArenaChunk{s, x})
+ unlock(&userArenaState.lock)
+ }
+ releasem(mp)
+}
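For orientation, the chunk allocation and free paths above are what ultimately back the experimental user-facing arena package. A minimal sketch of that lifecycle, assuming a toolchain built with GOEXPERIMENT=arenas (not part of this patch):

package main

import (
	"arena"
	"fmt"
)

type point struct{ X, Y int }

func main() {
	a := arena.NewArena() // chunks ultimately come from (*mheap).allocUserArenaChunk
	p := arena.New[point](a)
	p.X, p.Y = 1, 2
	s := arena.MakeSlice[int](a, 16, 16) // slice storage lives in the arena too
	fmt.Println(*p, len(s))
	a.Free() // releases chunks through freeUserArenaChunk and the quarantine list
}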
+
+// allocUserArenaChunk attempts to reuse a free user arena chunk represented
+// as a span.
+//
+// Must be in a non-preemptible state to ensure the consistency of statistics
+// exported to MemStats.
+//
+// Acquires the heap lock. Must run on the system stack for that reason.
+//
+//go:systemstack
+func (h *mheap) allocUserArenaChunk() *mspan {
+ var s *mspan
+ var base uintptr
+
+ // First check the free list.
+ lock(&h.lock)
+ if !h.userArena.readyList.isEmpty() {
+ s = h.userArena.readyList.first
+ h.userArena.readyList.remove(s)
+ base = s.base()
+ } else {
+ // Free list was empty, so allocate a new arena.
+ hintList := &h.userArena.arenaHints
+ if raceenabled {
+ // In race mode just use the regular heap hints. We might fragment
+ // the address space, but the race detector requires that the heap
+ // is mapped contiguously.
+ hintList = &h.arenaHints
+ }
+ v, size := h.sysAlloc(userArenaChunkBytes, hintList, false)
+ if size%userArenaChunkBytes != 0 {
+ throw("sysAlloc size is not divisible by userArenaChunkBytes")
+ }
+ if size > userArenaChunkBytes {
+ // We got more than we asked for. This can happen if
+ // heapArenaBytes > userArenaChunkBytes, or if sysAlloc just returns
+ // some extra as a result of trying to find an aligned region.
+ //
+ // Divide it up and put it on the ready list.
+ for i := uintptr(userArenaChunkBytes); i < size; i += userArenaChunkBytes {
+ s := h.allocMSpanLocked()
+ s.init(uintptr(v)+i, userArenaChunkPages)
+ h.userArena.readyList.insertBack(s)
+ }
+ size = userArenaChunkBytes
+ }
+ base = uintptr(v)
+ if base == 0 {
+ // Out of memory.
+ unlock(&h.lock)
+ return nil
+ }
+ s = h.allocMSpanLocked()
+ }
+ unlock(&h.lock)
+
+ // sysAlloc returns Reserved address space, and any span we're
+ // reusing is set to fault (so, also Reserved), so transition
+ // it to Prepared and then Ready.
+ //
+ // Unlike (*mheap).grow, just map in everything that we
+ // asked for. We're likely going to use it all.
+ sysMap(unsafe.Pointer(base), userArenaChunkBytes, &gcController.heapReleased)
+ sysUsed(unsafe.Pointer(base), userArenaChunkBytes, userArenaChunkBytes)
+
+ // Model the user arena as a heap span for a large object.
+ spc := makeSpanClass(0, false)
+ h.initSpan(s, spanAllocHeap, spc, base, userArenaChunkPages)
+ s.isUserArenaChunk = true
+
+ // Account for this new arena chunk memory.
+ gcController.heapInUse.add(int64(userArenaChunkBytes))
+ gcController.heapReleased.add(-int64(userArenaChunkBytes))
+
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.inHeap, int64(userArenaChunkBytes))
+ atomic.Xaddint64(&stats.committed, int64(userArenaChunkBytes))
+
+ // Model the arena as a single large malloc.
+ atomic.Xadd64(&stats.largeAlloc, int64(userArenaChunkBytes))
+ atomic.Xadd64(&stats.largeAllocCount, 1)
+ memstats.heapStats.release()
+
+ // Count the alloc in inconsistent, internal stats.
+ gcController.totalAlloc.Add(int64(userArenaChunkBytes))
+
+ // Update heapLive.
+ gcController.update(int64(userArenaChunkBytes), 0)
+
+ // Put the large span in the mcentral swept list so that it's
+ // visible to the background sweeper.
+ h.central[spc].mcentral.fullSwept(h.sweepgen).push(s)
+ s.limit = s.base() + userArenaChunkBytes
+ s.freeindex = 1
+ s.allocCount = 1
+
+ // This must clear the entire heap bitmap so that it's safe
+ // to allocate noscan data without writing anything out.
+ s.initHeapBits(true)
+
+ // Clear the span preemptively. It's an arena chunk, so let's assume
+ // everything is going to be used.
+ //
+ // This also seems to make a massive difference as to whether or
+ // not Linux decides to back this memory with transparent huge
+ // pages. There's latency involved in this zeroing, but the hugepage
+ // gains are almost always worth it. Note: it's important that we
+ // clear even if it's freshly mapped and we know there's no point
+ // to zeroing as *that* is the critical signal to use huge pages.
+ memclrNoHeapPointers(unsafe.Pointer(s.base()), s.elemsize)
+ s.needzero = 0
+
+ s.freeIndexForScan = 1
+
+ // Set up the range for allocation.
+ s.userArenaChunkFree = makeAddrRange(base, s.limit)
+ return s
+}
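The loop above that splits an oversized sysAlloc result is simple chunk arithmetic; here is a standalone sketch of the same partitioning, with a hypothetical chunk size standing in for userArenaChunkBytes:

package main

import "fmt"

func main() {
	const chunkBytes = 8 << 20 // hypothetical stand-in for userArenaChunkBytes
	base := uintptr(0x0c000000)
	size := uintptr(3 * chunkBytes) // pretend sysAlloc returned three chunks' worth
	// The first chunk is handed back to the caller; the rest would be wrapped
	// in spans and pushed onto userArena.readyList.
	for off := uintptr(chunkBytes); off < size; off += chunkBytes {
		fmt.Printf("ready-list chunk at %#x\n", base+off)
	}
	fmt.Printf("caller's chunk at %#x\n", base)
}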
diff --git a/src/runtime/arena_test.go b/src/runtime/arena_test.go
new file mode 100644
index 0000000..7e121ad
--- /dev/null
+++ b/src/runtime/arena_test.go
@@ -0,0 +1,529 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/goarch"
+ "reflect"
+ . "runtime"
+ "runtime/debug"
+ "runtime/internal/atomic"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type smallScalar struct {
+ X uintptr
+}
+type smallPointer struct {
+ X *smallPointer
+}
+type smallPointerMix struct {
+ A *smallPointer
+ B byte
+ C *smallPointer
+ D [11]byte
+}
+type mediumScalarEven [8192]byte
+type mediumScalarOdd [3321]byte
+type mediumPointerEven [1024]*smallPointer
+type mediumPointerOdd [1023]*smallPointer
+
+type largeScalar [UserArenaChunkBytes + 1]byte
+type largePointer [UserArenaChunkBytes/unsafe.Sizeof(&smallPointer{}) + 1]*smallPointer
+
+func TestUserArena(t *testing.T) {
+ // Set GOMAXPROCS to 2 so we don't run too many of these
+ // tests in parallel.
+ defer GOMAXPROCS(GOMAXPROCS(2))
+
+ // Start a subtest so that we can clean up after any parallel tests within.
+ t.Run("Alloc", func(t *testing.T) {
+ ss := &smallScalar{5}
+ runSubTestUserArenaNew(t, ss, true)
+
+ sp := &smallPointer{new(smallPointer)}
+ runSubTestUserArenaNew(t, sp, true)
+
+ spm := &smallPointerMix{sp, 5, nil, [11]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}}
+ runSubTestUserArenaNew(t, spm, true)
+
+ mse := new(mediumScalarEven)
+ for i := range mse {
+ mse[i] = 121
+ }
+ runSubTestUserArenaNew(t, mse, true)
+
+ mso := new(mediumScalarOdd)
+ for i := range mso {
+ mso[i] = 122
+ }
+ runSubTestUserArenaNew(t, mso, true)
+
+ mpe := new(mediumPointerEven)
+ for i := range mpe {
+ mpe[i] = sp
+ }
+ runSubTestUserArenaNew(t, mpe, true)
+
+ mpo := new(mediumPointerOdd)
+ for i := range mpo {
+ mpo[i] = sp
+ }
+ runSubTestUserArenaNew(t, mpo, true)
+
+ ls := new(largeScalar)
+ for i := range ls {
+ ls[i] = 123
+ }
+ // Not in parallel because we don't want to hold this large allocation live.
+ runSubTestUserArenaNew(t, ls, false)
+
+ lp := new(largePointer)
+ for i := range lp {
+ lp[i] = sp
+ }
+ // Not in parallel because we don't want to hold this large allocation live.
+ runSubTestUserArenaNew(t, lp, false)
+
+ sss := make([]smallScalar, 25)
+ for i := range sss {
+ sss[i] = smallScalar{12}
+ }
+ runSubTestUserArenaSlice(t, sss, true)
+
+ mpos := make([]mediumPointerOdd, 5)
+ for i := range mpos {
+ mpos[i] = *mpo
+ }
+ runSubTestUserArenaSlice(t, mpos, true)
+
+ sps := make([]smallPointer, UserArenaChunkBytes/unsafe.Sizeof(smallPointer{})+1)
+ for i := range sps {
+ sps[i] = *sp
+ }
+ // Not in parallel because we don't want to hold this large allocation live.
+ runSubTestUserArenaSlice(t, sps, false)
+
+ // Test zero-sized types.
+ t.Run("struct{}", func(t *testing.T) {
+ arena := NewUserArena()
+ var x any
+ x = (*struct{})(nil)
+ arena.New(&x)
+ if v := unsafe.Pointer(x.(*struct{})); v != ZeroBase {
+ t.Errorf("expected zero-sized type to be allocated as zerobase: got %x, want %x", v, ZeroBase)
+ }
+ arena.Free()
+ })
+ t.Run("[]struct{}", func(t *testing.T) {
+ arena := NewUserArena()
+ var sl []struct{}
+ arena.Slice(&sl, 10)
+ if v := unsafe.Pointer(&sl[0]); v != ZeroBase {
+ t.Errorf("expected zero-sized type to be allocated as zerobase: got %x, want %x", v, ZeroBase)
+ }
+ arena.Free()
+ })
+ t.Run("[]int (cap 0)", func(t *testing.T) {
+ arena := NewUserArena()
+ var sl []int
+ arena.Slice(&sl, 0)
+ if len(sl) != 0 {
+ t.Errorf("expected requested zero-sized slice to still have zero length: got %x, want 0", len(sl))
+ }
+ arena.Free()
+ })
+ })
+
+ // Run a GC cycle to get any arenas off the quarantine list.
+ GC()
+
+ if n := GlobalWaitingArenaChunks(); n != 0 {
+ t.Errorf("expected zero waiting arena chunks, found %d", n)
+ }
+}
+
+func runSubTestUserArenaNew[S comparable](t *testing.T, value *S, parallel bool) {
+ t.Run(reflect.TypeOf(value).Elem().Name(), func(t *testing.T) {
+ if parallel {
+ t.Parallel()
+ }
+
+ // Allocate and write data, enough to exhaust the arena.
+ //
+ // This is an underestimate, likely leaving some space in the arena. That's a good thing,
+ // because it gives us coverage of boundary cases.
+ n := int(UserArenaChunkBytes / unsafe.Sizeof(*value))
+ if n == 0 {
+ n = 1
+ }
+
+ // Create a new arena and do a bunch of operations on it.
+ arena := NewUserArena()
+
+ arenaValues := make([]*S, 0, n)
+ for j := 0; j < n; j++ {
+ var x any
+ x = (*S)(nil)
+ arena.New(&x)
+ s := x.(*S)
+ *s = *value
+ arenaValues = append(arenaValues, s)
+ }
+ // Check integrity of allocated data.
+ for _, s := range arenaValues {
+ if *s != *value {
+ t.Errorf("failed integrity check: got %#v, want %#v", *s, *value)
+ }
+ }
+
+ // Release the arena.
+ arena.Free()
+ })
+}
+
+func runSubTestUserArenaSlice[S comparable](t *testing.T, value []S, parallel bool) {
+ t.Run("[]"+reflect.TypeOf(value).Elem().Name(), func(t *testing.T) {
+ if parallel {
+ t.Parallel()
+ }
+
+ // Allocate and write data, enough to exhaust the arena.
+ //
+ // This is an underestimate, likely leaving some space in the arena. That's a good thing,
+ // because it gives us coverage of boundary cases.
+ n := int(UserArenaChunkBytes / (unsafe.Sizeof(*new(S)) * uintptr(cap(value))))
+ if n == 0 {
+ n = 1
+ }
+
+ // Create a new arena and do a bunch of operations on it.
+ arena := NewUserArena()
+
+ arenaValues := make([][]S, 0, n)
+ for j := 0; j < n; j++ {
+ var sl []S
+ arena.Slice(&sl, cap(value))
+ copy(sl, value)
+ arenaValues = append(arenaValues, sl)
+ }
+ // Check integrity of allocated data.
+ for _, sl := range arenaValues {
+ for i := range sl {
+ got := sl[i]
+ want := value[i]
+ if got != want {
+ t.Errorf("failed integrity check: got %#v, want %#v at index %d", got, want, i)
+ }
+ }
+ }
+
+ // Release the arena.
+ arena.Free()
+ })
+}
+
+func TestUserArenaLiveness(t *testing.T) {
+ t.Run("Free", func(t *testing.T) {
+ testUserArenaLiveness(t, false)
+ })
+ t.Run("Finalizer", func(t *testing.T) {
+ testUserArenaLiveness(t, true)
+ })
+}
+
+func testUserArenaLiveness(t *testing.T, useArenaFinalizer bool) {
+ // Disable the GC so that there's zero chance we try doing anything arena related *during*
+ // a mark phase, since otherwise a bunch of arenas could end up on the fault list.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ // Defensively ensure that any full arena chunks leftover from previous tests have been cleared.
+ GC()
+ GC()
+
+ arena := NewUserArena()
+
+ // Allocate a few pointer-ful but uninitialized objects so that later we can
+ // place a reference to a heap object at a more interesting location.
+ for i := 0; i < 3; i++ {
+ var x any
+ x = (*mediumPointerOdd)(nil)
+ arena.New(&x)
+ }
+
+ var x any
+ x = (*smallPointerMix)(nil)
+ arena.New(&x)
+ v := x.(*smallPointerMix)
+
+ var safeToFinalize atomic.Bool
+ var finalized atomic.Bool
+ v.C = new(smallPointer)
+ SetFinalizer(v.C, func(_ *smallPointer) {
+ if !safeToFinalize.Load() {
+ t.Error("finalized arena-referenced object unexpectedly")
+ }
+ finalized.Store(true)
+ })
+
+ // Make sure it stays alive.
+ GC()
+ GC()
+
+ // In order to ensure the object can be freed, we now need to make sure to use
+ // the entire arena. Exhaust the rest of the arena.
+
+ for i := 0; i < int(UserArenaChunkBytes/unsafe.Sizeof(mediumScalarEven{})); i++ {
+ var x any
+ x = (*mediumScalarEven)(nil)
+ arena.New(&x)
+ }
+
+ // Make sure it stays alive again.
+ GC()
+ GC()
+
+ v = nil
+
+ safeToFinalize.Store(true)
+ if useArenaFinalizer {
+ arena = nil
+
+ // Try to queue the arena finalizer.
+ GC()
+ GC()
+
+ // In order for the finalizer we actually want to run to execute,
+ // we need to make sure this one runs first.
+ if !BlockUntilEmptyFinalizerQueue(int64(2 * time.Second)) {
+ t.Fatal("finalizer queue was never emptied")
+ }
+ } else {
+ // Free the arena explicitly.
+ arena.Free()
+ }
+
+ // Try to queue the object's finalizer that we set earlier.
+ GC()
+ GC()
+
+ if !BlockUntilEmptyFinalizerQueue(int64(2 * time.Second)) {
+ t.Fatal("finalizer queue was never emptied")
+ }
+ if !finalized.Load() {
+ t.Error("expected arena-referenced object to be finalized")
+ }
+}
+
+func TestUserArenaClearsPointerBits(t *testing.T) {
+ // This is a regression test for a serious issue wherein if pointer bits
+ // aren't properly cleared, it's possible to allocate scalar data down
+ // into a previously pointer-ful area, causing misinterpretation by the GC.
+
+ // Create a large object, grab a pointer into it, and free it.
+ x := new([8 << 20]byte)
+ xp := uintptr(unsafe.Pointer(&x[124]))
+ var finalized atomic.Bool
+ SetFinalizer(x, func(_ *[8 << 20]byte) {
+ finalized.Store(true)
+ })
+
+ // Write three chunks worth of pointer data. Three gives us a
+ // high likelihood that when we write 2 later, we'll get the behavior
+ // we want.
+ a := NewUserArena()
+ for i := 0; i < int(UserArenaChunkBytes/goarch.PtrSize*3); i++ {
+ var x any
+ x = (*smallPointer)(nil)
+ a.New(&x)
+ }
+ a.Free()
+
+ // Recycle the arena chunks.
+ GC()
+ GC()
+
+ a = NewUserArena()
+ for i := 0; i < int(UserArenaChunkBytes/goarch.PtrSize*2); i++ {
+ var x any
+ x = (*smallScalar)(nil)
+ a.New(&x)
+ v := x.(*smallScalar)
+ // Write a pointer that should not keep x alive.
+ *v = smallScalar{xp}
+ }
+ KeepAlive(x)
+ x = nil
+
+ // Try to free x.
+ GC()
+ GC()
+
+ if !BlockUntilEmptyFinalizerQueue(int64(2 * time.Second)) {
+ t.Fatal("finalizer queue was never emptied")
+ }
+ if !finalized.Load() {
+ t.Fatal("heap allocation kept alive through non-pointer reference")
+ }
+
+ // Clean up the arena.
+ a.Free()
+ GC()
+ GC()
+}
+
+func TestUserArenaCloneString(t *testing.T) {
+ a := NewUserArena()
+
+ // A static string (not on heap or arena)
+ var s = "abcdefghij"
+
+ // Create a byte slice in the arena, initialize it with s
+ var b []byte
+ a.Slice(&b, len(s))
+ copy(b, s)
+
+ // Create a string using the same memory as the byte slice, hence in
+ // the arena. This could be an arena API, but hasn't really been needed
+ // yet.
+ var as string
+ asHeader := (*reflect.StringHeader)(unsafe.Pointer(&as))
+ asHeader.Data = (*reflect.SliceHeader)(unsafe.Pointer(&b)).Data
+ asHeader.Len = len(b)
+
+ // Clone should make a copy of as, since it is in the arena.
+ asCopy := UserArenaClone(as)
+ if (*reflect.StringHeader)(unsafe.Pointer(&as)).Data == (*reflect.StringHeader)(unsafe.Pointer(&asCopy)).Data {
+ t.Error("Clone did not make a copy")
+ }
+
+ // Clone should make a copy of subAs, since subAs is just part of as and so is in the arena.
+ subAs := as[1:3]
+ subAsCopy := UserArenaClone(subAs)
+ if (*reflect.StringHeader)(unsafe.Pointer(&subAs)).Data == (*reflect.StringHeader)(unsafe.Pointer(&subAsCopy)).Data {
+ t.Error("Clone did not make a copy")
+ }
+ if len(subAs) != len(subAsCopy) {
+ t.Errorf("Clone made an incorrect copy (bad length): %d -> %d", len(subAs), len(subAsCopy))
+ } else {
+ for i := range subAs {
+ if subAs[i] != subAsCopy[i] {
+ t.Errorf("Clone made an incorrect copy (data at index %d): %d -> %d", i, subAs[i], subAs[i])
+ }
+ }
+ }
+
+ // Clone should not make a copy of doubleAs, since doubleAs will be on the heap.
+ doubleAs := as + as
+ doubleAsCopy := UserArenaClone(doubleAs)
+ if (*reflect.StringHeader)(unsafe.Pointer(&doubleAs)).Data != (*reflect.StringHeader)(unsafe.Pointer(&doubleAsCopy)).Data {
+ t.Error("Clone should not have made a copy")
+ }
+
+ // Clone should not make a copy of s, since s is a static string.
+ sCopy := UserArenaClone(s)
+ if (*reflect.StringHeader)(unsafe.Pointer(&s)).Data != (*reflect.StringHeader)(unsafe.Pointer(&sCopy)).Data {
+ t.Error("Clone should not have made a copy")
+ }
+
+ a.Free()
+}
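The reflect.StringHeader manipulation in this test predates unsafe.String; a small sketch of the equivalent zero-copy view using the Go 1.20+ API (illustrative helper, not used by the test):

package main

import (
	"fmt"
	"unsafe"
)

// arenaString views a byte slice as a string without copying, matching what
// the StringHeader construction above does for the arena-backed bytes.
func arenaString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(&b[0], len(b))
}

func main() {
	fmt.Println(arenaString([]byte("abcdefghij")))
}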
+
+func TestUserArenaClonePointer(t *testing.T) {
+ a := NewUserArena()
+
+ // Clone should not make a copy of a heap-allocated smallScalar.
+ x := Escape(new(smallScalar))
+ xCopy := UserArenaClone(x)
+ if unsafe.Pointer(x) != unsafe.Pointer(xCopy) {
+ t.Errorf("Clone should not have made a copy: %#v -> %#v", x, xCopy)
+ }
+
+ // Clone should make a copy of an arena-allocated smallScalar.
+ var i any
+ i = (*smallScalar)(nil)
+ a.New(&i)
+ xArena := i.(*smallScalar)
+ xArenaCopy := UserArenaClone(xArena)
+ if unsafe.Pointer(xArena) == unsafe.Pointer(xArenaCopy) {
+ t.Errorf("Clone should have made a copy: %#v -> %#v", xArena, xArenaCopy)
+ }
+ if *xArena != *xArenaCopy {
+ t.Errorf("Clone made an incorrect copy copy: %#v -> %#v", *xArena, *xArenaCopy)
+ }
+
+ a.Free()
+}
+
+func TestUserArenaCloneSlice(t *testing.T) {
+ a := NewUserArena()
+
+ // A static string (not on heap or arena)
+ var s = "klmnopqrstuv"
+
+ // Create a byte slice in the arena, initialize it with s
+ var b []byte
+ a.Slice(&b, len(s))
+ copy(b, s)
+
+ // Clone should make a copy of b, since it is in the arena.
+ bCopy := UserArenaClone(b)
+ if unsafe.Pointer(&b[0]) == unsafe.Pointer(&bCopy[0]) {
+ t.Errorf("Clone did not make a copy: %#v -> %#v", b, bCopy)
+ }
+ if len(b) != len(bCopy) {
+ t.Errorf("Clone made an incorrect copy (bad length): %d -> %d", len(b), len(bCopy))
+ } else {
+ for i := range b {
+ if b[i] != bCopy[i] {
+ t.Errorf("Clone made an incorrect copy (data at index %d): %d -> %d", i, b[i], bCopy[i])
+ }
+ }
+ }
+
+ // Clone should make a copy of bSub, since bSub is just part of b and so is in the arena.
+ bSub := b[1:3]
+ bSubCopy := UserArenaClone(bSub)
+ if unsafe.Pointer(&bSub[0]) == unsafe.Pointer(&bSubCopy[0]) {
+ t.Errorf("Clone did not make a copy: %#v -> %#v", bSub, bSubCopy)
+ }
+ if len(bSub) != len(bSubCopy) {
+ t.Errorf("Clone made an incorrect copy (bad length): %d -> %d", len(bSub), len(bSubCopy))
+ } else {
+ for i := range bSub {
+ if bSub[i] != bSubCopy[i] {
+ t.Errorf("Clone made an incorrect copy (data at index %d): %d -> %d", i, bSub[i], bSubCopy[i])
+ }
+ }
+ }
+
+ // Clone should not make a copy of bNotArena, since it will not be in an arena.
+ bNotArena := make([]byte, len(s))
+ copy(bNotArena, s)
+ bNotArenaCopy := UserArenaClone(bNotArena)
+ if unsafe.Pointer(&bNotArena[0]) != unsafe.Pointer(&bNotArenaCopy[0]) {
+ t.Error("Clone should not have made a copy")
+ }
+
+ a.Free()
+}
+
+func TestUserArenaClonePanic(t *testing.T) {
+ var s string
+ func() {
+ x := smallScalar{2}
+ defer func() {
+ if v := recover(); v != nil {
+ s = v.(string)
+ }
+ }()
+ UserArenaClone(x)
+ }()
+ if s == "" {
+ t.Errorf("expected panic from Clone")
+ }
+}
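UserArenaClone in these tests corresponds to the exported arena.Clone helper; a brief sketch of how user code would reach it, again assuming GOEXPERIMENT=arenas:

package main

import (
	"arena"
	"fmt"
)

func main() {
	a := arena.NewArena()
	b := arena.MakeSlice[byte](a, 4, 4)
	copy(b, "abcd")
	// Clone copies arena-backed values to the regular heap so they can safely
	// outlive a.Free(); values already on the heap are returned unchanged.
	heapCopy := arena.Clone(b)
	a.Free()
	fmt.Println(string(heapCopy))
}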
diff --git a/src/runtime/asan.go b/src/runtime/asan.go
new file mode 100644
index 0000000..25b8327
--- /dev/null
+++ b/src/runtime/asan.go
@@ -0,0 +1,67 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Public address sanitizer API.
+func ASanRead(addr unsafe.Pointer, len int) {
+ sp := getcallersp()
+ pc := getcallerpc()
+ doasanread(addr, uintptr(len), sp, pc)
+}
+
+func ASanWrite(addr unsafe.Pointer, len int) {
+ sp := getcallersp()
+ pc := getcallerpc()
+ doasanwrite(addr, uintptr(len), sp, pc)
+}
+
+// Private interface for the runtime.
+const asanenabled = true
+
+// asan{read,write} are nosplit because they may be called between
+// fork and exec, when the stack must not grow. See issue #50391.
+
+//go:nosplit
+func asanread(addr unsafe.Pointer, sz uintptr) {
+ sp := getcallersp()
+ pc := getcallerpc()
+ doasanread(addr, sz, sp, pc)
+}
+
+//go:nosplit
+func asanwrite(addr unsafe.Pointer, sz uintptr) {
+ sp := getcallersp()
+ pc := getcallerpc()
+ doasanwrite(addr, sz, sp, pc)
+}
+
+//go:noescape
+func doasanread(addr unsafe.Pointer, sz, sp, pc uintptr)
+
+//go:noescape
+func doasanwrite(addr unsafe.Pointer, sz, sp, pc uintptr)
+
+//go:noescape
+func asanunpoison(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func asanpoison(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func asanregisterglobals(addr unsafe.Pointer, n uintptr)
+
+// These are called from asan_GOARCH.s
+//
+//go:cgo_import_static __asan_read_go
+//go:cgo_import_static __asan_write_go
+//go:cgo_import_static __asan_unpoison_go
+//go:cgo_import_static __asan_poison_go
+//go:cgo_import_static __asan_register_globals_go
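The exported ASanRead/ASanWrite hooks above are only compiled in with -asan; a hedged sketch of manual use, for code that touches memory Go's instrumentation cannot see (the buffer here is purely illustrative):

//go:build asan

package main

import (
	"runtime"
	"unsafe"
)

// touch reports an ASan error if any byte of buf is poisoned before we read
// or write it, mirroring what compiler instrumentation does for ordinary Go
// loads and stores.
func touch(buf []byte) {
	p := unsafe.Pointer(&buf[0])
	runtime.ASanRead(p, len(buf))
	runtime.ASanWrite(p, len(buf))
}

func main() {
	touch(make([]byte, 32))
}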
diff --git a/src/runtime/asan/asan.go b/src/runtime/asan/asan.go
new file mode 100644
index 0000000..25f15ae
--- /dev/null
+++ b/src/runtime/asan/asan.go
@@ -0,0 +1,76 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan && linux && (arm64 || amd64 || riscv64 || ppc64le)
+
+package asan
+
+/*
+#cgo CFLAGS: -fsanitize=address
+#cgo LDFLAGS: -fsanitize=address
+
+#include <stdbool.h>
+#include <stdint.h>
+#include <sanitizer/asan_interface.h>
+
+void __asan_read_go(void *addr, uintptr_t sz, void *sp, void *pc) {
+ if (__asan_region_is_poisoned(addr, sz)) {
+ __asan_report_error(pc, 0, sp, addr, false, sz);
+ }
+}
+
+void __asan_write_go(void *addr, uintptr_t sz, void *sp, void *pc) {
+ if (__asan_region_is_poisoned(addr, sz)) {
+ __asan_report_error(pc, 0, sp, addr, true, sz);
+ }
+}
+
+void __asan_unpoison_go(void *addr, uintptr_t sz) {
+ __asan_unpoison_memory_region(addr, sz);
+}
+
+void __asan_poison_go(void *addr, uintptr_t sz) {
+ __asan_poison_memory_region(addr, sz);
+}
+
+// Keep in sync with the definition in compiler-rt
+// https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/asan/asan_interface_internal.h#L41
+// This structure is used to describe the source location of
+// a place where global was defined.
+struct _asan_global_source_location {
+ const char *filename;
+ int line_no;
+ int column_no;
+};
+
+// Keep in sync with the definition in compiler-rt
+// https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/asan/asan_interface_internal.h#L48
+// So far, the current implementation is only compatible with the ASan library from version v7 to v9.
+// https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/asan/asan_init_version.h
+// This structure describes an instrumented global variable.
+//
+// TODO: If a later version of the ASan library changes __asan_global or __asan_global_source_location
+// structure, we need to make the same changes.
+struct _asan_global {
+ uintptr_t beg;
+ uintptr_t size;
+ uintptr_t size_with_redzone;
+ const char *name;
+ const char *module_name;
+ uintptr_t has_dynamic_init;
+ struct _asan_global_source_location *location;
+ uintptr_t odr_indicator;
+};
+
+
+extern void __asan_register_globals(void*, long int);
+
+// Register global variables.
+// The 'globals' is an array of structures describing 'n' globals.
+void __asan_register_globals_go(void *addr, uintptr_t n) {
+ struct _asan_global *globals = (struct _asan_global *)(addr);
+ __asan_register_globals(globals, n);
+}
+*/
+import "C"
diff --git a/src/runtime/asan0.go b/src/runtime/asan0.go
new file mode 100644
index 0000000..0948786
--- /dev/null
+++ b/src/runtime/asan0.go
@@ -0,0 +1,23 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !asan
+
+// Dummy ASan support API, used when not built with -asan.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const asanenabled = false
+
+// Because asanenabled is false, none of these functions should be called.
+
+func asanread(addr unsafe.Pointer, sz uintptr) { throw("asan") }
+func asanwrite(addr unsafe.Pointer, sz uintptr) { throw("asan") }
+func asanunpoison(addr unsafe.Pointer, sz uintptr) { throw("asan") }
+func asanpoison(addr unsafe.Pointer, sz uintptr) { throw("asan") }
+func asanregisterglobals(addr unsafe.Pointer, sz uintptr) { throw("asan") }
diff --git a/src/runtime/asan_amd64.s b/src/runtime/asan_amd64.s
new file mode 100644
index 0000000..bf847f2
--- /dev/null
+++ b/src/runtime/asan_amd64.s
@@ -0,0 +1,91 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// This is like race_amd64.s, but for the asan calls.
+// See race_amd64.s for detailed comments.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// Called from instrumented code.
+// func runtime·doasanread(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanread(SB), NOSPLIT, $0-32
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ MOVQ sp+16(FP), RARG2
+ MOVQ pc+24(FP), RARG3
+ // void __asan_read_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVQ $__asan_read_go(SB), AX
+ JMP asancall<>(SB)
+
+// func runtime·doasanwrite(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanwrite(SB), NOSPLIT, $0-32
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ MOVQ sp+16(FP), RARG2
+ MOVQ pc+24(FP), RARG3
+ // void __asan_write_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVQ $__asan_write_go(SB), AX
+ JMP asancall<>(SB)
+
+// func runtime·asanunpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanunpoison(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __asan_unpoison_go(void *addr, uintptr_t sz);
+ MOVQ $__asan_unpoison_go(SB), AX
+ JMP asancall<>(SB)
+
+// func runtime·asanpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanpoison(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __asan_poison_go(void *addr, uintptr_t sz);
+ MOVQ $__asan_poison_go(SB), AX
+ JMP asancall<>(SB)
+
+// func runtime·asanregisterglobals(addr unsafe.Pointer, n uintptr)
+TEXT runtime·asanregisterglobals(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __asan_register_globals_go(void *addr, uintptr_t n);
+ MOVQ $__asan_register_globals_go(SB), AX
+ JMP asancall<>(SB)
+
+// Switches SP to g0 stack and calls (AX). Arguments already set.
+TEXT asancall<>(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ CMPQ R14, $0
+ JE call // no g; still on a system stack
+
+ MOVQ g_m(R14), R13
+ // Switch to g0 stack.
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
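The shape of asancall<> is the same on every port: save SP, hop to g0's stack unless already there (or there is no g at all), call the C helper, then restore SP. A rough Go-level paraphrase of that control flow, using stand-in types rather than the real runtime structures:

package main

import "fmt"

// Stand-ins for the runtime's g and m; field names are illustrative only.
type m struct{ g0 *g }
type g struct{ m *m }

// asanCallShape mirrors asancall<>: the call always runs, but the stack it
// runs on depends on which goroutine (if any) the thread currently holds.
func asanCallShape(cur *g, call func()) {
	switch {
	case cur == nil:
		fmt.Println("no g: already on a system stack, call directly")
	case cur == cur.m.g0:
		fmt.Println("already on g0: call directly")
	default:
		fmt.Println("switch SP to g0's stack, call, then restore SP")
	}
	call()
}

func main() {
	asanCallShape(nil, func() {})
}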
diff --git a/src/runtime/asan_arm64.s b/src/runtime/asan_arm64.s
new file mode 100644
index 0000000..697c982
--- /dev/null
+++ b/src/runtime/asan_arm64.s
@@ -0,0 +1,76 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan
+
+#include "go_asm.h"
+#include "textflag.h"
+
+#define RARG0 R0
+#define RARG1 R1
+#define RARG2 R2
+#define RARG3 R3
+#define FARG R4
+
+// Called from instrumented code.
+// func runtime·doasanread(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanread(SB), NOSPLIT, $0-32
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ MOVD sp+16(FP), RARG2
+ MOVD pc+24(FP), RARG3
+ // void __asan_read_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVD $__asan_read_go(SB), FARG
+ JMP asancall<>(SB)
+
+// func runtime·doasanwrite(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanwrite(SB), NOSPLIT, $0-32
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ MOVD sp+16(FP), RARG2
+ MOVD pc+24(FP), RARG3
+ // void __asan_write_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVD $__asan_write_go(SB), FARG
+ JMP asancall<>(SB)
+
+// func runtime·asanunpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanunpoison(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __asan_unpoison_go(void *addr, uintptr_t sz);
+ MOVD $__asan_unpoison_go(SB), FARG
+ JMP asancall<>(SB)
+
+// func runtime·asanpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanpoison(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __asan_poison_go(void *addr, uintptr_t sz);
+ MOVD $__asan_poison_go(SB), FARG
+ JMP asancall<>(SB)
+
+// func runtime·asanregisterglobals(addr unsafe.Pointer, n uintptr)
+TEXT runtime·asanregisterglobals(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __asan_register_globals_go(void *addr, uintptr_t n);
+ MOVD $__asan_register_globals_go(SB), FARG
+ JMP asancall<>(SB)
+
+// Switches SP to g0 stack and calls (FARG). Arguments already set.
+TEXT asancall<>(SB), NOSPLIT, $0-0
+ MOVD RSP, R19 // callee-saved
+ CBZ g, g0stack // no g, still on a system stack
+ MOVD g_m(g), R10
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ g0stack
+
+ MOVD (g_sched+gobuf_sp)(R11), R5
+ MOVD R5, RSP
+
+g0stack:
+ BL (FARG)
+ MOVD R19, RSP
+ RET
diff --git a/src/runtime/asan_ppc64le.s b/src/runtime/asan_ppc64le.s
new file mode 100644
index 0000000..d13301a
--- /dev/null
+++ b/src/runtime/asan_ppc64le.s
@@ -0,0 +1,87 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan
+
+#include "go_asm.h"
+#include "textflag.h"
+
+#define RARG0 R3
+#define RARG1 R4
+#define RARG2 R5
+#define RARG3 R6
+#define FARG R12
+
+// Called from instrumented code.
+// func runtime·doasanread(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanread(SB),NOSPLIT|NOFRAME,$0-32
+ MOVD addr+0(FP), RARG0
+ MOVD sz+8(FP), RARG1
+ MOVD sp+16(FP), RARG2
+ MOVD pc+24(FP), RARG3
+ // void __asan_read_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVD $__asan_read_go(SB), FARG
+ BR asancall<>(SB)
+
+// func runtime·doasanwrite(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanwrite(SB),NOSPLIT|NOFRAME,$0-32
+ MOVD addr+0(FP), RARG0
+ MOVD sz+8(FP), RARG1
+ MOVD sp+16(FP), RARG2
+ MOVD pc+24(FP), RARG3
+ // void __asan_write_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOVD $__asan_write_go(SB), FARG
+ BR asancall<>(SB)
+
+// func runtime·asanunpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanunpoison(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD addr+0(FP), RARG0
+ MOVD sz+8(FP), RARG1
+ // void __asan_unpoison_go(void *addr, uintptr_t sz);
+ MOVD $__asan_unpoison_go(SB), FARG
+ BR asancall<>(SB)
+
+// func runtime·asanpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanpoison(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD addr+0(FP), RARG0
+ MOVD sz+8(FP), RARG1
+ // void __asan_poison_go(void *addr, uintptr_t sz);
+ MOVD $__asan_poison_go(SB), FARG
+ BR asancall<>(SB)
+
+// func runtime·asanregisterglobals(addr unsafe.Pointer, n uintptr)
+TEXT runtime·asanregisterglobals(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD addr+0(FP), RARG0
+ MOVD n+8(FP), RARG1
+ // void __asan_register_globals_go(void *addr, uintptr_t n);
+ MOVD $__asan_register_globals_go(SB), FARG
+ BR asancall<>(SB)
+
+// Switches SP to g0 stack and calls (FARG). Arguments already set.
+TEXT asancall<>(SB), NOSPLIT, $0-0
+ // LR saved in generated prologue
+ // Get info from the current goroutine
+ MOVD runtime·tls_g(SB), R10 // g offset in TLS
+ MOVD 0(R10), g
+ MOVD g_m(g), R7 // m for g
+ MOVD R1, R16 // callee-saved, preserved across C call
+ MOVD m_g0(R7), R10 // g0 for m
+ CMP R10, g // same g0?
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R10), R1 // switch R1
+call:
+ // prepare frame for C ABI
+ SUB $32, R1 // create frame for callee saving LR, CR, R2 etc.
+ RLDCR $0, R1, $~15, R1 // align SP to 16 bytes
+ MOVD FARG, CTR // address of function to be called
+ MOVD R0, 0(R1) // clear back chain pointer
+ BL (CTR)
+ MOVD $0, R0 // C code can clobber R0; set it back to 0
+ MOVD R16, R1 // restore R1
+ MOVD runtime·tls_g(SB), R10 // find correct g
+ MOVD 0(R10), g
+ RET
+
+// tls_g, g value for each thread in TLS
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
diff --git a/src/runtime/asan_riscv64.s b/src/runtime/asan_riscv64.s
new file mode 100644
index 0000000..6fcd94d
--- /dev/null
+++ b/src/runtime/asan_riscv64.s
@@ -0,0 +1,68 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build asan
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Called from instrumented code.
+// func runtime·doasanread(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanread(SB), NOSPLIT, $0-32
+ MOV addr+0(FP), X10
+ MOV sz+8(FP), X11
+ MOV sp+16(FP), X12
+ MOV pc+24(FP), X13
+ // void __asan_read_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOV $__asan_read_go(SB), X14
+ JMP asancall<>(SB)
+
+// func runtime·doasanwrite(addr unsafe.Pointer, sz, sp, pc uintptr)
+TEXT runtime·doasanwrite(SB), NOSPLIT, $0-32
+ MOV addr+0(FP), X10
+ MOV sz+8(FP), X11
+ MOV sp+16(FP), X12
+ MOV pc+24(FP), X13
+ // void __asan_write_go(void *addr, uintptr_t sz, void *sp, void *pc);
+ MOV $__asan_write_go(SB), X14
+ JMP asancall<>(SB)
+
+// func runtime·asanunpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanunpoison(SB), NOSPLIT, $0-16
+ MOV addr+0(FP), X10
+ MOV sz+8(FP), X11
+ // void __asan_unpoison_go(void *addr, uintptr_t sz);
+ MOV $__asan_unpoison_go(SB), X14
+ JMP asancall<>(SB)
+
+// func runtime·asanpoison(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·asanpoison(SB), NOSPLIT, $0-16
+ MOV addr+0(FP), X10
+ MOV sz+8(FP), X11
+ // void __asan_poison_go(void *addr, uintptr_t sz);
+ MOV $__asan_poison_go(SB), X14
+ JMP asancall<>(SB)
+
+// func runtime·asanregisterglobals(addr unsafe.Pointer, n uintptr)
+TEXT runtime·asanregisterglobals(SB), NOSPLIT, $0-16
+ MOV addr+0(FP), X10
+ MOV n+8(FP), X11
+ // void __asan_register_globals_go(void *addr, uintptr_t n);
+ MOV $__asan_register_globals_go(SB), X14
+ JMP asancall<>(SB)
+
+// Switches SP to g0 stack and calls (X14). Arguments already set.
+TEXT asancall<>(SB), NOSPLIT, $0-0
+ MOV X2, X8 // callee-saved
+ BEQZ g, g0stack // no g, still on a system stack
+ MOV g_m(g), X21
+ MOV m_g0(X21), X21
+ BEQ X21, g, g0stack
+
+ MOV (g_sched+gobuf_sp)(X21), X2
+
+g0stack:
+ JALR RA, X14
+ MOV X8, X2
+ RET
diff --git a/src/runtime/asm.s b/src/runtime/asm.s
new file mode 100644
index 0000000..f7bc5d4
--- /dev/null
+++ b/src/runtime/asm.s
@@ -0,0 +1,14 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+#ifndef GOARCH_amd64
+TEXT ·sigpanic0(SB),NOSPLIT,$0-0
+ JMP ·sigpanic<ABIInternal>(SB)
+#endif
+
+// See map.go comment on the need for this routine.
+TEXT ·mapinitnoop<ABIInternal>(SB),NOSPLIT,$0-0
+ RET
diff --git a/src/runtime/asm_386.s b/src/runtime/asm_386.s
new file mode 100644
index 0000000..67ffc24
--- /dev/null
+++ b/src/runtime/asm_386.s
@@ -0,0 +1,1653 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_386 is common startup code for most 386 systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_386(SB),NOSPLIT,$8
+ MOVL 8(SP), AX // argc
+ LEAL 12(SP), BX // argv
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+// _rt0_386_lib is common startup code for most 386 systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed on the stack following the
+// usual C ABI.
+TEXT _rt0_386_lib(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ PUSHL BX
+ PUSHL SI
+ PUSHL DI
+
+ MOVL 8(BP), AX
+ MOVL AX, _rt0_386_lib_argc<>(SB)
+ MOVL 12(BP), AX
+ MOVL AX, _rt0_386_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ SUBL $8, SP
+
+ // Create a new thread to do the runtime initialization.
+ MOVL _cgo_sys_thread_create(SB), AX
+ TESTL AX, AX
+ JZ nocgo
+
+ // Align stack to call C function.
+ // We moved SP to BP above, but BP was clobbered by the libpreinit call.
+ MOVL SP, BP
+ ANDL $~15, SP
+
+ MOVL $_rt0_386_lib_go(SB), BX
+ MOVL BX, 0(SP)
+ MOVL $0, 4(SP)
+
+ CALL AX
+
+ MOVL BP, SP
+
+ JMP restore
+
+nocgo:
+ MOVL $0x800000, 0(SP) // stacksize = 8192KB
+ MOVL $_rt0_386_lib_go(SB), AX
+ MOVL AX, 4(SP) // fn
+ CALL runtime·newosproc0(SB)
+
+restore:
+ ADDL $8, SP
+ POPL DI
+ POPL SI
+ POPL BX
+ POPL BP
+ RET
+
+// _rt0_386_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_386_lib.
+TEXT _rt0_386_lib_go(SB),NOSPLIT,$8
+ MOVL _rt0_386_lib_argc<>(SB), AX
+ MOVL AX, 0(SP)
+ MOVL _rt0_386_lib_argv<>(SB), AX
+ MOVL AX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+DATA _rt0_386_lib_argc<>(SB)/4, $0
+GLOBL _rt0_386_lib_argc<>(SB),NOPTR, $4
+DATA _rt0_386_lib_argv<>(SB)/4, $0
+GLOBL _rt0_386_lib_argv<>(SB),NOPTR, $4
+
+TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME|TOPFRAME,$0
+ // Copy arguments forward on an even stack.
+ // Users of this function jump to it, they don't call it.
+ MOVL 0(SP), AX
+ MOVL 4(SP), BX
+ SUBL $128, SP // plenty of scratch
+ ANDL $~15, SP
+ MOVL AX, 120(SP) // save argc, argv away
+ MOVL BX, 124(SP)
+
+ // set default stack bounds.
+ // _cgo_init may update stackguard.
+ MOVL $runtime·g0(SB), BP
+ LEAL (-64*1024+104)(SP), BX
+ MOVL BX, g_stackguard0(BP)
+ MOVL BX, g_stackguard1(BP)
+ MOVL BX, (g_stack+stack_lo)(BP)
+ MOVL SP, (g_stack+stack_hi)(BP)
+
+ // find out information about the processor we're on
+ // first see if CPUID instruction is supported.
+ PUSHFL
+ PUSHFL
+ XORL $(1<<21), 0(SP) // flip ID bit
+ POPFL
+ PUSHFL
+ POPL AX
+ XORL 0(SP), AX
+ POPFL // restore EFLAGS
+ TESTL $(1<<21), AX
+ JNE has_cpuid
+
+bad_proc: // show that the program requires MMX.
+ MOVL $2, 0(SP)
+ MOVL $bad_proc_msg<>(SB), 4(SP)
+ MOVL $0x3d, 8(SP)
+ CALL runtime·write(SB)
+ MOVL $1, 0(SP)
+ CALL runtime·exit(SB)
+ CALL runtime·abort(SB)
+
+has_cpuid:
+ MOVL $0, AX
+ CPUID
+ MOVL AX, SI
+ CMPL AX, $0
+ JE nocpuinfo
+
+ CMPL BX, $0x756E6547 // "Genu"
+ JNE notintel
+ CMPL DX, $0x49656E69 // "ineI"
+ JNE notintel
+ CMPL CX, $0x6C65746E // "ntel"
+ JNE notintel
+ MOVB $1, runtime·isIntel(SB)
+notintel:
+
+ // Load EAX=1 cpuid flags
+ MOVL $1, AX
+ CPUID
+ MOVL CX, DI // Moving to a global variable clobbers CX when generating PIC
+ MOVL AX, runtime·processorVersionInfo(SB)
+
+ // Check for MMX support
+ TESTL $(1<<23), DX // MMX
+ JZ bad_proc
+
+nocpuinfo:
+ // If there is a _cgo_init, call it to let it
+ // initialize and to set up GS. If not,
+ // we set up GS ourselves.
+ MOVL _cgo_init(SB), AX
+ TESTL AX, AX
+ JZ needtls
+#ifdef GOOS_android
+ // arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
+ // Compensate for tls_g (+8).
+ MOVL -8(TLS), BX
+ MOVL BX, 12(SP)
+ MOVL $runtime·tls_g(SB), 8(SP) // arg 3: &tls_g
+#else
+ MOVL $0, BX
+ MOVL BX, 12(SP) // arg 4: not used when using platform's TLS
+#ifdef GOOS_windows
+ MOVL $runtime·tls_g(SB), 8(SP) // arg 3: &tls_g
+#else
+ MOVL BX, 8(SP) // arg 3: not used when using platform's TLS
+#endif
+#endif
+ MOVL $setg_gcc<>(SB), BX
+ MOVL BX, 4(SP) // arg 2: setg_gcc
+ MOVL BP, 0(SP) // arg 1: g0
+ CALL AX
+
+ // update stackguard after _cgo_init
+ MOVL $runtime·g0(SB), CX
+ MOVL (g_stack+stack_lo)(CX), AX
+ ADDL $const_stackGuard, AX
+ MOVL AX, g_stackguard0(CX)
+ MOVL AX, g_stackguard1(CX)
+
+#ifndef GOOS_windows
+ // skip runtime·ldt0setup(SB) and tls test after _cgo_init for non-windows
+ JMP ok
+#endif
+needtls:
+#ifdef GOOS_openbsd
+ // skip runtime·ldt0setup(SB) and tls test on OpenBSD in all cases
+ JMP ok
+#endif
+#ifdef GOOS_plan9
+ // skip runtime·ldt0setup(SB) and tls test on Plan 9 in all cases
+ JMP ok
+#endif
+
+ // set up %gs
+ CALL ldt0setup<>(SB)
+
+ // store through it, to make sure it works
+ get_tls(BX)
+ MOVL $0x123, g(BX)
+ MOVL runtime·m0+m_tls(SB), AX
+ CMPL AX, $0x123
+ JEQ ok
+ MOVL AX, 0 // abort
+ok:
+ // set up m and g "registers"
+ get_tls(BX)
+ LEAL runtime·g0(SB), DX
+ MOVL DX, g(BX)
+ LEAL runtime·m0(SB), AX
+
+ // save m->g0 = g0
+ MOVL DX, m_g0(AX)
+ // save g0->m = m0
+ MOVL AX, g_m(DX)
+
+ CALL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+ // convention is D is always cleared
+ CLD
+
+ CALL runtime·check(SB)
+
+ // saved argc, argv
+ MOVL 120(SP), AX
+ MOVL AX, 0(SP)
+ MOVL 124(SP), AX
+ MOVL AX, 4(SP)
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ PUSHL $runtime·mainPC(SB) // entry
+ CALL runtime·newproc(SB)
+ POPL AX
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ CALL runtime·abort(SB)
+ RET
+
+DATA bad_proc_msg<>+0x00(SB)/61, $"This program can only be run on processors with MMX support.\n"
+GLOBL bad_proc_msg<>(SB), RODATA, $61
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ INT $3
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // Linux and MinGW start the FPU in extended double precision.
+ // Other operating systems use double precision.
+ // Change to double precision to match them,
+ // and to match other hardware that only has double.
+ FLDCW runtime·controlWord64(SB)
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $0-4
+ MOVL buf+0(FP), BX // gobuf
+ MOVL gobuf_g(BX), DX
+ MOVL 0(DX), CX // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT, $0
+ get_tls(CX)
+ MOVL DX, g(CX)
+ MOVL gobuf_sp(BX), SP // restore SP
+ MOVL gobuf_ret(BX), AX
+ MOVL gobuf_ctxt(BX), DX
+ MOVL $0, gobuf_sp(BX) // clear to help garbage collector
+ MOVL $0, gobuf_ret(BX)
+ MOVL $0, gobuf_ctxt(BX)
+ MOVL gobuf_pc(BX), BX
+ JMP BX
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $0-4
+ MOVL fn+0(FP), DI
+
+ get_tls(DX)
+ MOVL g(DX), AX // save state in g->sched
+ MOVL 0(SP), BX // caller's PC
+ MOVL BX, (g_sched+gobuf_pc)(AX)
+ LEAL fn+0(FP), BX // caller's SP
+ MOVL BX, (g_sched+gobuf_sp)(AX)
+
+ // switch to m->g0 & its stack, call fn
+ MOVL g(DX), BX
+ MOVL g_m(BX), BX
+ MOVL m_g0(BX), SI
+ CMPL SI, AX // if g == m->g0 call badmcall
+ JNE 3(PC)
+ MOVL $runtime·badmcall(SB), AX
+ JMP AX
+ MOVL SI, g(DX) // g = m->g0
+ MOVL (g_sched+gobuf_sp)(SI), SP // sp = m->g0->sched.sp
+ PUSHL AX
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ CALL DI
+ POPL AX
+ MOVL $runtime·badmcall2(SB), AX
+ JMP AX
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-4
+ MOVL fn+0(FP), DI // DI = fn
+ get_tls(CX)
+ MOVL g(CX), AX // AX = g
+ MOVL g_m(AX), BX // BX = m
+
+ CMPL AX, m_gsignal(BX)
+ JEQ noswitch
+
+ MOVL m_g0(BX), DX // DX = g0
+ CMPL AX, DX
+ JEQ noswitch
+
+ CMPL AX, m_curg(BX)
+ JNE bad
+
+ // switch stacks
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ CALL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ get_tls(CX)
+ MOVL DX, g(CX)
+ MOVL (g_sched+gobuf_sp)(DX), BX
+ MOVL BX, SP
+
+ // call target function
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ CALL DI
+
+ // switch back to g
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), BX
+ MOVL m_curg(BX), AX
+ MOVL AX, g(CX)
+ MOVL (g_sched+gobuf_sp)(AX), SP
+ MOVL $0, (g_sched+gobuf_sp)(AX)
+ RET
+
+noswitch:
+ // already on system stack; tail call the function
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVL DI, DX
+ MOVL 0(DI), DI
+ JMP DI
+
+bad:
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVL $runtime·badsystemstack(SB), AX
+ CALL AX
+ INT $3
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ get_tls(CX)
+ MOVL g(CX), BX
+ MOVL g_m(BX), BX
+ MOVL m_g0(BX), SI
+ CMPL g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack.
+ MOVL m_gsignal(BX), SI
+ CMPL g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 4(SP), DI // f's caller's PC
+ MOVL DI, (m_morebuf+gobuf_pc)(BX)
+ LEAL 8(SP), CX // f's caller's SP
+ MOVL CX, (m_morebuf+gobuf_sp)(BX)
+ get_tls(CX)
+ MOVL g(CX), SI
+ MOVL SI, (m_morebuf+gobuf_g)(BX)
+
+ // Set g->sched to context in f.
+ MOVL 0(SP), AX // f's PC
+ MOVL AX, (g_sched+gobuf_pc)(SI)
+ LEAL 4(SP), AX // f's SP
+ MOVL AX, (g_sched+gobuf_sp)(SI)
+ MOVL DX, (g_sched+gobuf_ctxt)(SI)
+
+ // Call newstack on m->g0's stack.
+ MOVL m_g0(BX), BP
+ MOVL BP, g(CX)
+ MOVL (g_sched+gobuf_sp)(BP), AX
+ MOVL -4(AX), BX // fault if CALL would, before smashing SP
+ MOVL AX, SP
+ CALL runtime·newstack(SB)
+ CALL runtime·abort(SB) // crash if newstack returns
+ RET
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0-0
+ MOVL $0, DX
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMPL CX, $MAXSIZE; \
+ JA 3(PC); \
+ MOVL $NAME(SB), AX; \
+ JMP AX
+// Note: can't just "JMP NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT, $0-28
+ MOVL frameSize+20(FP), CX
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVL $runtime·badreflectcall(SB), AX
+ JMP AX
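The DISPATCH chain above is effectively a lookup keyed on the smallest fixed frame size that fits frameSize; a self-contained sketch of that selection logic (sizes only, none of the assembly semantics):

package main

import "fmt"

// callSizes mirrors the constant-sized reflectcall variants (call16 ... call1073741824).
var callSizes = []uint32{
	16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536,
	131072, 262144, 524288, 1048576, 2097152, 4194304, 8388608, 16777216,
	33554432, 67108864, 134217728, 268435456, 536870912, 1073741824,
}

// pickCall returns the smallest fixed frame size that fits frameSize, the same
// decision the DISPATCH macro chain makes; 0 corresponds to badreflectcall.
func pickCall(frameSize uint32) uint32 {
	for _, sz := range callSizes {
		if frameSize <= sz {
			return sz
		}
	}
	return 0
}

func main() {
	fmt.Println(pickCall(40))   // 64
	fmt.Println(pickCall(4096)) // 4096
}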
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-28; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVL stackArgs+8(FP), SI; \
+ MOVL stackArgsSize+12(FP), CX; \
+ MOVL SP, DI; \
+ REP;MOVSB; \
+ /* call function */ \
+ MOVL f+4(FP), DX; \
+ MOVL (DX), AX; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ CALL AX; \
+ /* copy return values back */ \
+ MOVL stackArgsType+0(FP), DX; \
+ MOVL stackArgs+8(FP), DI; \
+ MOVL stackArgsSize+12(FP), CX; \
+ MOVL stackRetOffset+16(FP), BX; \
+ MOVL SP, SI; \
+ ADDL BX, DI; \
+ ADDL BX, SI; \
+ SUBL BX, CX; \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $20-0
+ MOVL DX, 0(SP)
+ MOVL DI, 4(SP)
+ MOVL SI, 8(SP)
+ MOVL CX, 12(SP)
+ MOVL $0, 16(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVL cycles+0(FP), AX
+again:
+ PAUSE
+ SUBL $1, AX
+ JNZ again
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0-0
+ // Stores are already ordered on x86, so this is just a
+ // compile barrier.
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT,$0
+ PUSHL AX
+ PUSHL BX
+ get_tls(BX)
+ MOVL g(BX), BX
+ LEAL arg+0(FP), AX
+ MOVL AX, (g_sched+gobuf_sp)(BX)
+ MOVL $runtime·systemstack_switch(SB), AX
+ MOVL AX, (g_sched+gobuf_pc)(BX)
+ MOVL $0, (g_sched+gobuf_ret)(BX)
+ // Assert ctxt is zero. See func save.
+ MOVL (g_sched+gobuf_ctxt)(BX), AX
+ TESTL AX, AX
+ JZ 2(PC)
+ CALL runtime·abort(SB)
+ POPL BX
+ POPL AX
+ RET
+
+// func asmcgocall_no_g(fn, arg unsafe.Pointer)
+// Call fn(arg) aligned appropriately for the gcc ABI.
+// Called on a system stack, and there may be no g yet (during needm).
+TEXT ·asmcgocall_no_g(SB),NOSPLIT,$0-8
+ MOVL fn+0(FP), AX
+ MOVL arg+4(FP), BX
+ MOVL SP, DX
+ SUBL $32, SP
+ ANDL $~15, SP // alignment, perhaps unnecessary
+ MOVL DX, 8(SP) // save old SP
+ MOVL BX, 0(SP) // first argument in x86-32 ABI
+ CALL AX
+ MOVL 8(SP), DX
+ MOVL DX, SP
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVL fn+0(FP), AX
+ MOVL arg+4(FP), BX
+
+ MOVL SP, DX
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ get_tls(CX)
+ MOVL g(CX), DI
+ CMPL DI, $0
+ JEQ nosave // Don't even have a G yet.
+ MOVL g_m(DI), BP
+ CMPL DI, m_gsignal(BP)
+ JEQ noswitch
+ MOVL m_g0(BP), SI
+ CMPL DI, SI
+ JEQ noswitch
+ CALL gosave_systemstack_switch<>(SB)
+ get_tls(CX)
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), SP
+
+noswitch:
+ // Now on a scheduling stack (a pthread-created stack).
+ SUBL $32, SP
+ ANDL $~15, SP // alignment, perhaps unnecessary
+ MOVL DI, 8(SP) // save g
+ MOVL (g_stack+stack_hi)(DI), DI
+ SUBL DX, DI
+ MOVL DI, 4(SP) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ MOVL BX, 0(SP) // first argument in x86-32 ABI
+ CALL AX
+
+ // Restore registers, g, stack pointer.
+ get_tls(CX)
+ MOVL 8(SP), DI
+ MOVL (g_stack+stack_hi)(DI), SI
+ SUBL 4(SP), SI
+ MOVL DI, g(CX)
+ MOVL SI, SP
+
+ MOVL AX, ret+8(FP)
+ RET
+nosave:
+ // Now on a scheduling stack (a pthread-created stack).
+ SUBL $32, SP
+ ANDL $~15, SP // alignment, perhaps unnecessary
+ MOVL DX, 4(SP) // save original stack pointer
+ MOVL BX, 0(SP) // first argument in x86-32 ABI
+ CALL AX
+
+ MOVL 4(SP), CX // restore original stack pointer
+ MOVL CX, SP
+ MOVL AX, ret+8(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12 // Frame size must match commented places below
+ NO_LOCAL_POINTERS
+
+ // When fn is nil, skip cgocallbackg and just dropm; frame holds the saved g.
+ // This path is used to drop the m while the thread is exiting.
+ MOVL fn+0(FP), AX
+ CMPL AX, $0
+ JNE loadg
+ // Restore the g from frame.
+ get_tls(CX)
+ MOVL frame+4(FP), BX
+ MOVL BX, g(CX)
+ JMP dropm
+
+loadg:
+ // If g is nil, Go did not create the current thread, or this thread
+ // never called into Go on pthread platforms.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call through AX.
+ get_tls(CX)
+#ifdef GOOS_windows
+ MOVL $0, BP
+ CMPL CX, $0
+ JEQ 2(PC) // TODO
+#endif
+ MOVL g(CX), BP
+ CMPL BP, $0
+ JEQ needm
+ MOVL g_m(BP), BP
+ MOVL BP, savedm-4(SP) // saved copy of oldm
+ JMP havem
+needm:
+ MOVL $runtime·needAndBindM(SB), AX
+ CALL AX
+ MOVL $0, savedm-4(SP)
+ get_tls(CX)
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVL m_g0(BP), SI
+ MOVL SP, (g_sched+gobuf_sp)(SI)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 0(SP).
+ MOVL m_g0(BP), SI
+ MOVL (g_sched+gobuf_sp)(SI), AX
+ MOVL AX, 0(SP)
+ MOVL SP, (g_sched+gobuf_sp)(SI)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVL m_curg(BP), SI
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), DI // prepare stack as DI
+ MOVL (g_sched+gobuf_pc)(SI), BP
+ MOVL BP, -4(DI) // "push" return PC on the g stack
+ // Gather our arguments into registers.
+ MOVL fn+0(FP), AX
+ MOVL frame+4(FP), BX
+ MOVL ctxt+8(FP), CX
+ LEAL -(4+12)(DI), SP // Must match declared frame size
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ MOVL CX, 8(SP)
+ CALL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ get_tls(CX)
+ MOVL g(CX), SI
+ MOVL 12(SP), BP // Must match declared frame size
+ MOVL BP, (g_sched+gobuf_pc)(SI)
+ LEAL (12+4)(SP), DI // Must match declared frame size
+ MOVL DI, (g_sched+gobuf_sp)(SI)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVL g(CX), BP
+ MOVL g_m(BP), BP
+ MOVL m_g0(BP), SI
+ MOVL SI, g(CX)
+ MOVL (g_sched+gobuf_sp)(SI), SP
+ MOVL 0(SP), AX
+ MOVL AX, (g_sched+gobuf_sp)(SI)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for the lifetime of the C thread on pthread platforms.
+	// If the m on entry wasn't nil,
+	// 1. the thread might be a Go thread,
+	// 2. or this wasn't the first call from a C thread on pthread platforms,
+	// in which case we skip dropm and reuse the m bound by the first call.
+ MOVL savedm-4(SP), DX
+ CMPL DX, $0
+ JNE droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOVL _cgo_pthread_key_created(SB), DX
+	// A nil _cgo_pthread_key_created pointer means cgo is disabled, so we need to dropm.
+ CMPL DX, $0
+ JEQ dropm
+ CMPL (DX), $0
+ JNE droppedm
+
+dropm:
+ MOVL $runtime·dropm(SB), AX
+ CALL AX
+droppedm:
+
+ // Done!
+ RET
+
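The dropm decision encoded in the assembly above reduces to a small predicate: drop the borrowed m only if the thread had no m on entry and no pthread key is available to keep it bound. A minimal Go sketch, with hypothetical names standing in for savedm-4(SP) and _cgo_pthread_key_created:

	package main

	// shouldDropM mirrors the branch structure at the end of cgocallback.
	// savedM reports whether the thread already had an m on entry;
	// pthreadKeyCreated collapses the two checks on _cgo_pthread_key_created
	// (pointer non-nil and value non-zero) into one flag.
	func shouldDropM(savedM, pthreadKeyCreated bool) bool {
		if savedM {
			return false // nothing was borrowed
		}
		if !pthreadKeyCreated {
			return true // no key (or cgo disabled): drop after every callback
		}
		return false // keep the m bound to the C thread for reuse
	}

	func main() {
		println(shouldDropM(false, false)) // true
		println(shouldDropM(false, true))  // false
	}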
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-4
+ MOVL gg+0(FP), BX
+#ifdef GOOS_windows
+ MOVL runtime·tls_g(SB), CX
+ CMPL BX, $0
+ JNE settls
+ MOVL $0, 0(CX)(FS)
+ RET
+settls:
+ MOVL g_m(BX), AX
+ LEAL m_tls(AX), AX
+ MOVL AX, 0(CX)(FS)
+#endif
+ get_tls(CX)
+ MOVL BX, g(CX)
+ RET
+
+// void setg_gcc(G*); set g. for use by gcc
+TEXT setg_gcc<>(SB), NOSPLIT, $0
+ get_tls(AX)
+ MOVL gg+0(FP), DX
+ MOVL DX, g(AX)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ INT $3
+loop:
+ JMP loop
+
+// check that SP is in range [g->stack.lo, g->stack.hi)
+TEXT runtime·stackcheck(SB), NOSPLIT, $0-0
+ get_tls(CX)
+ MOVL g(CX), AX
+ CMPL (g_stack+stack_hi)(AX), SP
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ CMPL SP, (g_stack+stack_lo)(AX)
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ // LFENCE/MFENCE instruction support is dependent on SSE2.
+	// When no SSE2 support is present, do not enforce any serialization
+ // since using CPUID to serialize the instruction stream is
+ // very costly.
+#ifdef GO386_softfloat
+ JMP rdtsc // no fence instructions available
+#endif
+ CMPB internal∕cpu·X86+const_offsetX86HasRDTSCP(SB), $1
+ JNE fences
+ // Instruction stream serializing RDTSCP is supported.
+ // RDTSCP is supported by Intel Nehalem (2008) and
+ // AMD K8 Rev. F (2006) and newer.
+ RDTSCP
+done:
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+fences:
+ // MFENCE is instruction stream serializing and flushes the
+ // store buffers on AMD. The serialization semantics of LFENCE on AMD
+ // are dependent on MSR C001_1029 and CPU generation.
+ // LFENCE on Intel does wait for all previous instructions to have executed.
+	// Intel recommends MFENCE;LFENCE in its manuals before RDTSC to have all
+	// previous instructions executed and all previous loads and stores globally visible.
+ // Using MFENCE;LFENCE here aligns the serializing properties without
+ // runtime detection of CPU manufacturer.
+ MFENCE
+ LFENCE
+rdtsc:
+ RDTSC
+ JMP done
+
+TEXT ldt0setup<>(SB),NOSPLIT,$16-0
+#ifdef GOOS_windows
+ CALL runtime·wintls(SB)
+#endif
+ // set up ldt 7 to point at m0.tls
+ // ldt 1 would be fine on Linux, but on OS X, 7 is as low as we can go.
+ // the entry number is just a hint. setldt will set up GS with what it used.
+ MOVL $7, 0(SP)
+ LEAL runtime·m0+m_tls(SB), AX
+ MOVL AX, 4(SP)
+ MOVL $32, 8(SP) // sizeof(tls array)
+ CALL runtime·setldt(SB)
+ RET
+
+TEXT runtime·emptyfunc(SB),0,$0-0
+ RET
+
+// hash function using AES hardware instructions
+TEXT runtime·memhash(SB),NOSPLIT,$0-16
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVL s+8(FP), BX // size
+ LEAL ret+12(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·memhashFallback(SB)
+
+TEXT runtime·strhash(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to string object
+ MOVL 4(AX), BX // length of string
+ MOVL (AX), AX // string data
+ LEAL ret+8(FP), DX
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·strhashFallback(SB)
+
+// AX: data
+// BX: length
+// DX: address to put return value
+TEXT aeshashbody<>(SB),NOSPLIT,$0-0
+ MOVL h+4(FP), X0 // 32 bits of per-table hash seed
+ PINSRW $4, BX, X0 // 16 bits of length
+ PSHUFHW $0, X0, X0 // replace size with its low 2 bytes repeated 4 times
+ MOVO X0, X1 // save unscrambled seed
+ PXOR runtime·aeskeysched(SB), X0 // xor in per-process seed
+ AESENC X0, X0 // scramble seed
+
+ CMPL BX, $16
+ JB aes0to15
+ JE aes16
+ CMPL BX, $32
+ JBE aes17to32
+ CMPL BX, $64
+ JBE aes33to64
+ JMP aes65plus
+
+aes0to15:
+ TESTL BX, BX
+ JE aes0
+
+ ADDL $16, AX
+ TESTW $0xff0, AX
+ JE endofpage
+
+ // 16 bytes loaded at this address won't cross
+ // a page boundary, so we can load it directly.
+ MOVOU -16(AX), X1
+ ADDL BX, BX
+ PAND masks<>(SB)(BX*8), X1
+
+final1:
+ PXOR X0, X1 // xor data with seed
+ AESENC X1, X1 // scramble combo 3 times
+ AESENC X1, X1
+ AESENC X1, X1
+ MOVL X1, (DX)
+ RET
+
+endofpage:
+ // address ends in 1111xxxx. Might be up against
+ // a page boundary, so load ending at last byte.
+ // Then shift bytes down using pshufb.
+ MOVOU -32(AX)(BX*1), X1
+ ADDL BX, BX
+ PSHUFB shifts<>(SB)(BX*8), X1
+ JMP final1
+
+aes0:
+ // Return scrambled input seed
+ AESENC X0, X0
+ MOVL X0, (DX)
+ RET
+
+aes16:
+ MOVOU (AX), X1
+ JMP final1
+
+aes17to32:
+ // make second starting seed
+ PXOR runtime·aeskeysched+16(SB), X1
+ AESENC X1, X1
+
+ // load data to be hashed
+ MOVOU (AX), X2
+ MOVOU -16(AX)(BX*1), X3
+
+ // xor with seed
+ PXOR X0, X2
+ PXOR X1, X3
+
+ // scramble 3 times
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // combine results
+ PXOR X3, X2
+ MOVL X2, (DX)
+ RET
+
+aes33to64:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ MOVOU (AX), X4
+ MOVOU 16(AX), X5
+ MOVOU -32(AX)(BX*1), X6
+ MOVOU -16(AX)(BX*1), X7
+
+ PXOR X0, X4
+ PXOR X1, X5
+ PXOR X2, X6
+ PXOR X3, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVL X4, (DX)
+ RET
+
+aes65plus:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // start with last (possibly overlapping) block
+ MOVOU -64(AX)(BX*1), X4
+ MOVOU -48(AX)(BX*1), X5
+ MOVOU -32(AX)(BX*1), X6
+ MOVOU -16(AX)(BX*1), X7
+
+ // scramble state once
+ AESENC X0, X4
+ AESENC X1, X5
+ AESENC X2, X6
+ AESENC X3, X7
+
+ // compute number of remaining 64-byte blocks
+ DECL BX
+ SHRL $6, BX
+
+aesloop:
+ // scramble state, xor in a block
+ MOVOU (AX), X0
+ MOVOU 16(AX), X1
+ MOVOU 32(AX), X2
+ MOVOU 48(AX), X3
+ AESENC X0, X4
+ AESENC X1, X5
+ AESENC X2, X6
+ AESENC X3, X7
+
+ // scramble state
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ ADDL $64, AX
+ DECL BX
+ JNE aesloop
+
+ // 3 more scrambles to finish
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVL X4, (DX)
+ RET
+
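For 17–32 byte inputs, aes17to32 above loads one 16-byte block from each end of the data, so every byte is covered even though the two loads may overlap in the middle. A minimal Go sketch of that addressing, not of the hash itself:

	package main

	import "fmt"

	// twoBlocks returns the 16-byte block at the start and the 16-byte block
	// ending at the last byte of p, assuming len(p) is in 17..32.
	func twoBlocks(p []byte) (first, last [16]byte) {
		n := len(p)
		copy(first[:], p[:16])
		copy(last[:], p[n-16:])
		return
	}

	func main() {
		data := []byte("abcdefghijklmnopqrst") // 20 bytes
		a, b := twoBlocks(data)
		fmt.Printf("%s | %s\n", a[:], b[:]) // bytes 0..15 and 4..19, overlapping by 12
	}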
+TEXT runtime·memhash32(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVL h+4(FP), X0 // seed
+ PINSRD $1, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVL X0, ret+8(FP)
+ RET
+noaes:
+ JMP runtime·memhash32Fallback(SB)
+
+TEXT runtime·memhash64(SB),NOSPLIT,$0-12
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVL p+0(FP), AX // ptr to data
+ MOVQ (AX), X0 // data
+ PINSRD $2, h+4(FP), X0 // seed
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVL X0, ret+8(FP)
+ RET
+noaes:
+ JMP runtime·memhash64Fallback(SB)
+
+// simple mask to get rid of data in the high part of the register.
+DATA masks<>+0x00(SB)/4, $0x00000000
+DATA masks<>+0x04(SB)/4, $0x00000000
+DATA masks<>+0x08(SB)/4, $0x00000000
+DATA masks<>+0x0c(SB)/4, $0x00000000
+
+DATA masks<>+0x10(SB)/4, $0x000000ff
+DATA masks<>+0x14(SB)/4, $0x00000000
+DATA masks<>+0x18(SB)/4, $0x00000000
+DATA masks<>+0x1c(SB)/4, $0x00000000
+
+DATA masks<>+0x20(SB)/4, $0x0000ffff
+DATA masks<>+0x24(SB)/4, $0x00000000
+DATA masks<>+0x28(SB)/4, $0x00000000
+DATA masks<>+0x2c(SB)/4, $0x00000000
+
+DATA masks<>+0x30(SB)/4, $0x00ffffff
+DATA masks<>+0x34(SB)/4, $0x00000000
+DATA masks<>+0x38(SB)/4, $0x00000000
+DATA masks<>+0x3c(SB)/4, $0x00000000
+
+DATA masks<>+0x40(SB)/4, $0xffffffff
+DATA masks<>+0x44(SB)/4, $0x00000000
+DATA masks<>+0x48(SB)/4, $0x00000000
+DATA masks<>+0x4c(SB)/4, $0x00000000
+
+DATA masks<>+0x50(SB)/4, $0xffffffff
+DATA masks<>+0x54(SB)/4, $0x000000ff
+DATA masks<>+0x58(SB)/4, $0x00000000
+DATA masks<>+0x5c(SB)/4, $0x00000000
+
+DATA masks<>+0x60(SB)/4, $0xffffffff
+DATA masks<>+0x64(SB)/4, $0x0000ffff
+DATA masks<>+0x68(SB)/4, $0x00000000
+DATA masks<>+0x6c(SB)/4, $0x00000000
+
+DATA masks<>+0x70(SB)/4, $0xffffffff
+DATA masks<>+0x74(SB)/4, $0x00ffffff
+DATA masks<>+0x78(SB)/4, $0x00000000
+DATA masks<>+0x7c(SB)/4, $0x00000000
+
+DATA masks<>+0x80(SB)/4, $0xffffffff
+DATA masks<>+0x84(SB)/4, $0xffffffff
+DATA masks<>+0x88(SB)/4, $0x00000000
+DATA masks<>+0x8c(SB)/4, $0x00000000
+
+DATA masks<>+0x90(SB)/4, $0xffffffff
+DATA masks<>+0x94(SB)/4, $0xffffffff
+DATA masks<>+0x98(SB)/4, $0x000000ff
+DATA masks<>+0x9c(SB)/4, $0x00000000
+
+DATA masks<>+0xa0(SB)/4, $0xffffffff
+DATA masks<>+0xa4(SB)/4, $0xffffffff
+DATA masks<>+0xa8(SB)/4, $0x0000ffff
+DATA masks<>+0xac(SB)/4, $0x00000000
+
+DATA masks<>+0xb0(SB)/4, $0xffffffff
+DATA masks<>+0xb4(SB)/4, $0xffffffff
+DATA masks<>+0xb8(SB)/4, $0x00ffffff
+DATA masks<>+0xbc(SB)/4, $0x00000000
+
+DATA masks<>+0xc0(SB)/4, $0xffffffff
+DATA masks<>+0xc4(SB)/4, $0xffffffff
+DATA masks<>+0xc8(SB)/4, $0xffffffff
+DATA masks<>+0xcc(SB)/4, $0x00000000
+
+DATA masks<>+0xd0(SB)/4, $0xffffffff
+DATA masks<>+0xd4(SB)/4, $0xffffffff
+DATA masks<>+0xd8(SB)/4, $0xffffffff
+DATA masks<>+0xdc(SB)/4, $0x000000ff
+
+DATA masks<>+0xe0(SB)/4, $0xffffffff
+DATA masks<>+0xe4(SB)/4, $0xffffffff
+DATA masks<>+0xe8(SB)/4, $0xffffffff
+DATA masks<>+0xec(SB)/4, $0x0000ffff
+
+DATA masks<>+0xf0(SB)/4, $0xffffffff
+DATA masks<>+0xf4(SB)/4, $0xffffffff
+DATA masks<>+0xf8(SB)/4, $0xffffffff
+DATA masks<>+0xfc(SB)/4, $0x00ffffff
+
+GLOBL masks<>(SB),RODATA,$256
+
+// these are arguments to pshufb. They move data down from
+// the high bytes of the register to the low bytes of the register.
+// index is how many bytes to move.
+DATA shifts<>+0x00(SB)/4, $0x00000000
+DATA shifts<>+0x04(SB)/4, $0x00000000
+DATA shifts<>+0x08(SB)/4, $0x00000000
+DATA shifts<>+0x0c(SB)/4, $0x00000000
+
+DATA shifts<>+0x10(SB)/4, $0xffffff0f
+DATA shifts<>+0x14(SB)/4, $0xffffffff
+DATA shifts<>+0x18(SB)/4, $0xffffffff
+DATA shifts<>+0x1c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x20(SB)/4, $0xffff0f0e
+DATA shifts<>+0x24(SB)/4, $0xffffffff
+DATA shifts<>+0x28(SB)/4, $0xffffffff
+DATA shifts<>+0x2c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x30(SB)/4, $0xff0f0e0d
+DATA shifts<>+0x34(SB)/4, $0xffffffff
+DATA shifts<>+0x38(SB)/4, $0xffffffff
+DATA shifts<>+0x3c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x40(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0x44(SB)/4, $0xffffffff
+DATA shifts<>+0x48(SB)/4, $0xffffffff
+DATA shifts<>+0x4c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x50(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0x54(SB)/4, $0xffffff0f
+DATA shifts<>+0x58(SB)/4, $0xffffffff
+DATA shifts<>+0x5c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x60(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0x64(SB)/4, $0xffff0f0e
+DATA shifts<>+0x68(SB)/4, $0xffffffff
+DATA shifts<>+0x6c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x70(SB)/4, $0x0c0b0a09
+DATA shifts<>+0x74(SB)/4, $0xff0f0e0d
+DATA shifts<>+0x78(SB)/4, $0xffffffff
+DATA shifts<>+0x7c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x80(SB)/4, $0x0b0a0908
+DATA shifts<>+0x84(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0x88(SB)/4, $0xffffffff
+DATA shifts<>+0x8c(SB)/4, $0xffffffff
+
+DATA shifts<>+0x90(SB)/4, $0x0a090807
+DATA shifts<>+0x94(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0x98(SB)/4, $0xffffff0f
+DATA shifts<>+0x9c(SB)/4, $0xffffffff
+
+DATA shifts<>+0xa0(SB)/4, $0x09080706
+DATA shifts<>+0xa4(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0xa8(SB)/4, $0xffff0f0e
+DATA shifts<>+0xac(SB)/4, $0xffffffff
+
+DATA shifts<>+0xb0(SB)/4, $0x08070605
+DATA shifts<>+0xb4(SB)/4, $0x0c0b0a09
+DATA shifts<>+0xb8(SB)/4, $0xff0f0e0d
+DATA shifts<>+0xbc(SB)/4, $0xffffffff
+
+DATA shifts<>+0xc0(SB)/4, $0x07060504
+DATA shifts<>+0xc4(SB)/4, $0x0b0a0908
+DATA shifts<>+0xc8(SB)/4, $0x0f0e0d0c
+DATA shifts<>+0xcc(SB)/4, $0xffffffff
+
+DATA shifts<>+0xd0(SB)/4, $0x06050403
+DATA shifts<>+0xd4(SB)/4, $0x0a090807
+DATA shifts<>+0xd8(SB)/4, $0x0e0d0c0b
+DATA shifts<>+0xdc(SB)/4, $0xffffff0f
+
+DATA shifts<>+0xe0(SB)/4, $0x05040302
+DATA shifts<>+0xe4(SB)/4, $0x09080706
+DATA shifts<>+0xe8(SB)/4, $0x0d0c0b0a
+DATA shifts<>+0xec(SB)/4, $0xffff0f0e
+
+DATA shifts<>+0xf0(SB)/4, $0x04030201
+DATA shifts<>+0xf4(SB)/4, $0x08070605
+DATA shifts<>+0xf8(SB)/4, $0x0c0b0a09
+DATA shifts<>+0xfc(SB)/4, $0xff0f0e0d
+
+GLOBL shifts<>(SB),RODATA,$256
+
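A worked example of what the masks<> and shifts<> tables encode for a short input of length n < 16: masks[n] keeps only the low n bytes of a 16-byte load, while shifts[n] is a PSHUFB control that moves the last n bytes of a load ending at the final byte down to positions 0..n-1 (0xff lanes select zero, since PSHUFB zeroes lanes whose control byte has its high bit set). A minimal Go sketch of the two effects:

	package main

	import "fmt"

	// applyMask emulates PAND with masks[n]: keep the low n bytes, zero the rest.
	func applyMask(block [16]byte, n int) [16]byte {
		for i := n; i < 16; i++ {
			block[i] = 0
		}
		return block
	}

	// applyShift emulates PSHUFB with shifts[n]: move the top n bytes down to 0..n-1.
	func applyShift(block [16]byte, n int) [16]byte {
		var out [16]byte
		copy(out[:n], block[16-n:]) // bytes 16-n..15 move down; the rest stay zero
		return out
	}

	func main() {
		var b [16]byte
		for i := range b {
			b[i] = byte(i + 1)
		}
		fmt.Println(applyMask(b, 3))  // [1 2 3 0 ... 0]
		fmt.Println(applyShift(b, 3)) // [14 15 16 0 ... 0]
	}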
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+	// check that masks<>(SB) and shifts<>(SB) are 16-byte aligned
+ MOVL $masks<>(SB), AX
+ MOVL $shifts<>(SB), BX
+ ORL BX, AX
+ TESTL $15, AX
+ SETEQ ret+0(FP)
+ RET
+
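checkASM above verifies the 16-byte alignment of both tables with a single OR and TEST. The equivalent check in Go:

	package main

	import "fmt"

	// bothAligned16 reports whether both addresses are 16-byte aligned;
	// ORing them first lets one test cover both.
	func bothAligned16(a, b uintptr) bool {
		return (a|b)&15 == 0
	}

	func main() {
		fmt.Println(bothAligned16(0x1000, 0x2040)) // true
		fmt.Println(bothAligned16(0x1000, 0x2044)) // false
	}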
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVL $0, AX
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$0
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), AX
+ MOVL m_curg(AX), AX
+ MOVL (g_stack+stack_hi)(AX), AX
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|TOPFRAME,$0-0
+ BYTE $0x90 // NOP
+ CALL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x90 // NOP
+
+// Add a module's moduledata to the linked list of moduledata objects. This
+// is called from .init_array by a function generated in the linker and so
+// follows the platform ABI wrt register preservation -- it only touches AX,
+// CX (implicitly) and DX, but it does not follow the ABI wrt arguments:
+// instead the pointer to the moduledata is passed in AX.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ MOVL runtime·lastmoduledatap(SB), DX
+ MOVL AX, moduledata_next(DX)
+ MOVL AX, runtime·lastmoduledatap(SB)
+ RET
+
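addmoduledata above appends a moduledata record to a singly linked list whose tail is tracked by runtime·lastmoduledatap. A minimal Go sketch of the same append, using a hypothetical node type in place of moduledata:

	package main

	import "fmt"

	type node struct {
		name string
		next *node
	}

	var (
		first           = &node{name: "main"}
		lastmoduledatap = first
	)

	// addmoduledata links n after the current tail and makes it the new tail,
	// which is all the two MOVLs above do.
	func addmoduledata(n *node) {
		lastmoduledatap.next = n
		lastmoduledatap = n
	}

	func main() {
		addmoduledata(&node{name: "plugin1"})
		for n := first; n != nil; n = n.next {
			fmt.Println(n.name)
		}
	}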
+TEXT runtime·uint32tofloat64(SB),NOSPLIT,$8-12
+ MOVL a+0(FP), AX
+ MOVL AX, 0(SP)
+ MOVL $0, 4(SP)
+ FMOVV 0(SP), F0
+ FMOVDP F0, ret+4(FP)
+ RET
+
+TEXT runtime·float64touint32(SB),NOSPLIT,$12-12
+ FMOVD a+0(FP), F0
+ FSTCW 0(SP)
+ FLDCW runtime·controlWord64trunc(SB)
+ FMOVVP F0, 4(SP)
+ FLDCW 0(SP)
+ MOVL 4(SP), AX
+ MOVL AX, ret+8(FP)
+ RET
+
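float64touint32 above temporarily loads a truncating x87 control word because Go's float-to-integer conversions truncate toward zero, while the FPU's default mode rounds to nearest. The behavior being implemented, stated in Go:

	package main

	import "fmt"

	func main() {
		x := 2.9
		fmt.Println(uint32(x)) // 2, not 3: the conversion truncates toward zero
	}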
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier returns space in a write barrier buffer which
+// should be filled in by the caller.
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in DI, and returns a pointer
+// to the buffer space in DI.
+// It clobbers FLAGS. It does not clobber any general-purpose registers,
+// but may clobber others (e.g., SSE registers).
+// Typical use would be, when doing *(CX+88) = AX
+// CMPL $0, runtime.writeBarrier(SB)
+// JEQ dowrite
+// CALL runtime.gcBatchBarrier2(SB)
+// MOVL AX, (DI)
+// MOVL 88(CX), DX
+// MOVL DX, 4(DI)
+// dowrite:
+// MOVL AX, 88(CX)
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$28
+ // Save the registers clobbered by the fast path. This is slightly
+ // faster than having the caller spill these.
+ MOVL CX, 20(SP)
+ MOVL BX, 24(SP)
+retry:
+ // TODO: Consider passing g.m.p in as an argument so they can be shared
+ // across a sequence of write barriers.
+ get_tls(BX)
+ MOVL g(BX), BX
+ MOVL g_m(BX), BX
+ MOVL m_p(BX), BX
+ // Get current buffer write position.
+ MOVL (p_wbBuf+wbBuf_next)(BX), CX // original next position
+ ADDL DI, CX // new next position
+ // Is the buffer full?
+ CMPL CX, (p_wbBuf+wbBuf_end)(BX)
+ JA flush
+ // Commit to the larger buffer.
+ MOVL CX, (p_wbBuf+wbBuf_next)(BX)
+ // Make return value (the original next position)
+ SUBL DI, CX
+ MOVL CX, DI
+ // Restore registers.
+ MOVL 20(SP), CX
+ MOVL 24(SP), BX
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVL DI, 0(SP)
+ MOVL AX, 4(SP)
+ // BX already saved
+ // CX already saved
+ MOVL DX, 8(SP)
+ MOVL BP, 12(SP)
+ MOVL SI, 16(SP)
+ // DI already saved
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVL 0(SP), DI
+ MOVL 4(SP), AX
+ MOVL 8(SP), DX
+ MOVL 12(SP), BP
+ MOVL 16(SP), SI
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $4, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $8, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $12, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $16, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $20, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $24, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $28, DI
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVL $32, DI
+ JMP gcWriteBarrier<>(SB)
+
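The wrappers above just encode the requested size in DI; the interesting part is the buffer bookkeeping in gcWriteBarrier<>: bump p.wbBuf.next, flush if the buffer would overflow, and hand the caller the space it reserved. A simplified Go sketch (the real buffer lives in the P and flushing involves the GC; the retry after a flush is folded into one pass here):

	package main

	import "fmt"

	type wbBuf struct {
		next, end int
		slots     []uintptr
	}

	func (b *wbBuf) flush() {
		// The runtime hands the filled slots to the GC here; this sketch just resets.
		b.next = 0
	}

	// reserve returns the index of `words` freshly reserved slots,
	// flushing first if they would not fit.
	func (b *wbBuf) reserve(words int) int {
		if b.next+words > b.end {
			b.flush()
		}
		p := b.next
		b.next += words
		return p
	}

	func main() {
		b := &wbBuf{end: 4, slots: make([]uintptr, 4)}
		fmt.Println(b.reserve(2)) // 0
		fmt.Println(b.reserve(2)) // 2
		fmt.Println(b.reserve(2)) // 0 again, after a flush
	}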
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVL CX, x+0(FP)
+ MOVL DX, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVL AX, x+0(FP)
+ MOVL CX, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-8
+ MOVL DX, x+0(FP)
+ MOVL BX, y+4(FP)
+ JMP runtime·goPanicSliceConvert(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL DX, lo+4(FP)
+ MOVL BX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL CX, lo+4(FP)
+ MOVL DX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVL SI, hi+0(FP)
+ MOVL AX, lo+4(FP)
+ MOVL CX, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
+
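The panicExtend* stubs exist because on 386 an int64 index does not fit in a single register, so the compiler hands the out-of-range value to the panic helpers as separate hi and lo 32-bit halves. A small Go illustration of the kind of code that reaches them:

	package main

	// elem indexes a slice with a 64-bit index; on 386 an out-of-range i is
	// reported through one of the panicExtend* stubs above, split into its
	// high and low 32-bit halves.
	func elem(s []byte, i int64) byte {
		return s[i]
	}

	func main() {
		println(elem([]byte{1, 2, 3}, 1)) // 2
	}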
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/4, $8
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
+#endif
+#ifdef GOOS_windows
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
+#endif
diff --git a/src/runtime/asm_amd64.h b/src/runtime/asm_amd64.h
new file mode 100644
index 0000000..f7a8896
--- /dev/null
+++ b/src/runtime/asm_amd64.h
@@ -0,0 +1,25 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Define features that are guaranteed to be supported by setting the AMD64 variable.
+// If a feature is supported, there's no need to check it at runtime every time.
+
+#ifdef GOAMD64_v2
+#define hasPOPCNT
+#define hasSSE42
+#endif
+
+#ifdef GOAMD64_v3
+#define hasAVX
+#define hasAVX2
+#define hasPOPCNT
+#define hasSSE42
+#endif
+
+#ifdef GOAMD64_v4
+#define hasAVX
+#define hasAVX2
+#define hasPOPCNT
+#define hasSSE42
+#endif
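These defines let amd64 assembly skip runtime feature tests when the GOAMD64 level already guarantees the instruction set. When no such guarantee exists, portable Go code typically checks at runtime instead; a minimal sketch using golang.org/x/sys/cpu (an external package, not part of this tree):

	package main

	import (
		"fmt"

		"golang.org/x/sys/cpu"
	)

	func main() {
		if cpu.X86.HasPOPCNT && cpu.X86.HasSSE42 {
			fmt.Println("v2-level features available at runtime")
		} else {
			fmt.Println("falling back to generic code paths")
		}
	}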
diff --git a/src/runtime/asm_amd64.s b/src/runtime/asm_amd64.s
new file mode 100644
index 0000000..edf0909
--- /dev/null
+++ b/src/runtime/asm_amd64.s
@@ -0,0 +1,2093 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+// _rt0_amd64 is common startup code for most amd64 systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_amd64(SB),NOSPLIT,$-8
+ MOVQ 0(SP), DI // argc
+ LEAQ 8(SP), SI // argv
+ JMP runtime·rt0_go(SB)
+
+// main is common startup code for most amd64 systems when using
+// external linking. The C startup code will call the symbol "main"
+// passing argc and argv in the usual C ABI registers DI and SI.
+TEXT main(SB),NOSPLIT,$-8
+ JMP runtime·rt0_go(SB)
+
+// _rt0_amd64_lib is common startup code for most amd64 systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// DI and SI.
+TEXT _rt0_amd64_lib(SB),NOSPLIT|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ MOVQ DI, _rt0_amd64_lib_argc<>(SB)
+ MOVQ SI, _rt0_amd64_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ // Create a new thread to finish Go runtime initialization.
+ MOVQ _cgo_sys_thread_create(SB), AX
+ TESTQ AX, AX
+ JZ nocgo
+
+ // We're calling back to C.
+ // Align stack per ELF ABI requirements.
+ MOVQ SP, BX // Callee-save in C ABI
+ ANDQ $~15, SP
+ MOVQ $_rt0_amd64_lib_go(SB), DI
+ MOVQ $0, SI
+ CALL AX
+ MOVQ BX, SP
+ JMP restore
+
+nocgo:
+ ADJSP $16
+ MOVQ $0x800000, 0(SP) // stacksize
+ MOVQ $_rt0_amd64_lib_go(SB), AX
+ MOVQ AX, 8(SP) // fn
+ CALL runtime·newosproc0(SB)
+ ADJSP $-16
+
+restore:
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// _rt0_amd64_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_amd64_lib.
+TEXT _rt0_amd64_lib_go(SB),NOSPLIT,$0
+ MOVQ _rt0_amd64_lib_argc<>(SB), DI
+ MOVQ _rt0_amd64_lib_argv<>(SB), SI
+ JMP runtime·rt0_go(SB)
+
+DATA _rt0_amd64_lib_argc<>(SB)/8, $0
+GLOBL _rt0_amd64_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_amd64_lib_argv<>(SB)/8, $0
+GLOBL _rt0_amd64_lib_argv<>(SB),NOPTR, $8
+
+#ifdef GOAMD64_v2
+DATA bad_cpu_msg<>+0x00(SB)/84, $"This program can only be run on AMD64 processors with v2 microarchitecture support.\n"
+#endif
+
+#ifdef GOAMD64_v3
+DATA bad_cpu_msg<>+0x00(SB)/84, $"This program can only be run on AMD64 processors with v3 microarchitecture support.\n"
+#endif
+
+#ifdef GOAMD64_v4
+DATA bad_cpu_msg<>+0x00(SB)/84, $"This program can only be run on AMD64 processors with v4 microarchitecture support.\n"
+#endif
+
+GLOBL bad_cpu_msg<>(SB), RODATA, $84
+
+// Define a list of AMD64 microarchitecture level features
+// https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
+
+	// SSE3 SSSE3 CMPXCHG16B SSE4.1 SSE4.2 POPCNT
+#define V2_FEATURES_CX (1 << 0 | 1 << 9 | 1 << 13 | 1 << 19 | 1 << 20 | 1 << 23)
+ // LAHF/SAHF
+#define V2_EXT_FEATURES_CX (1 << 0)
+ // FMA MOVBE OSXSAVE AVX F16C
+#define V3_FEATURES_CX (V2_FEATURES_CX | 1 << 12 | 1 << 22 | 1 << 27 | 1 << 28 | 1 << 29)
+	// ABM (FOR LZCNT)
+#define V3_EXT_FEATURES_CX (V2_EXT_FEATURES_CX | 1 << 5)
+ // BMI1 AVX2 BMI2
+#define V3_EXT_FEATURES_BX (1 << 3 | 1 << 5 | 1 << 8)
+ // XMM YMM
+#define V3_OS_SUPPORT_AX (1 << 1 | 1 << 2)
+
+#define V4_FEATURES_CX V3_FEATURES_CX
+
+#define V4_EXT_FEATURES_CX V3_EXT_FEATURES_CX
+ // AVX512F AVX512DQ AVX512CD AVX512BW AVX512VL
+#define V4_EXT_FEATURES_BX (V3_EXT_FEATURES_BX | 1 << 16 | 1 << 17 | 1 << 28 | 1 << 30 | 1 << 31)
+ // OPMASK ZMM
+#define V4_OS_SUPPORT_AX (V3_OS_SUPPORT_AX | 1 << 5 | (1 << 6 | 1 << 7))
+
+#ifdef GOAMD64_v2
+#define NEED_MAX_CPUID 0x80000001
+#define NEED_FEATURES_CX V2_FEATURES_CX
+#define NEED_EXT_FEATURES_CX V2_EXT_FEATURES_CX
+#endif
+
+#ifdef GOAMD64_v3
+#define NEED_MAX_CPUID 0x80000001
+#define NEED_FEATURES_CX V3_FEATURES_CX
+#define NEED_EXT_FEATURES_CX V3_EXT_FEATURES_CX
+#define NEED_EXT_FEATURES_BX V3_EXT_FEATURES_BX
+#define NEED_OS_SUPPORT_AX V3_OS_SUPPORT_AX
+#endif
+
+#ifdef GOAMD64_v4
+#define NEED_MAX_CPUID 0x80000001
+#define NEED_FEATURES_CX V4_FEATURES_CX
+#define NEED_EXT_FEATURES_CX V4_EXT_FEATURES_CX
+#define NEED_EXT_FEATURES_BX V4_EXT_FEATURES_BX
+
+// Darwin requires a different approach to check AVX512 support, see CL 285572.
+#ifdef GOOS_darwin
+#define NEED_OS_SUPPORT_AX V3_OS_SUPPORT_AX
+// These values are from:
+// https://github.com/apple/darwin-xnu/blob/xnu-4570.1.46/osfmk/i386/cpu_capabilities.h
+#define commpage64_base_address 0x00007fffffe00000
+#define commpage64_cpu_capabilities64 (commpage64_base_address+0x010)
+#define commpage64_version (commpage64_base_address+0x01E)
+#define hasAVX512F 0x0000004000000000
+#define hasAVX512CD 0x0000008000000000
+#define hasAVX512DQ 0x0000010000000000
+#define hasAVX512BW 0x0000020000000000
+#define hasAVX512VL 0x0000100000000000
+#define NEED_DARWIN_SUPPORT (hasAVX512F | hasAVX512DQ | hasAVX512CD | hasAVX512BW | hasAVX512VL)
+#else
+#define NEED_OS_SUPPORT_AX V4_OS_SUPPORT_AX
+#endif
+
+#endif
+
+TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME|TOPFRAME,$0
+ // copy arguments forward on an even stack
+ MOVQ DI, AX // argc
+ MOVQ SI, BX // argv
+ SUBQ $(5*8), SP // 3args 2auto
+ ANDQ $~15, SP
+ MOVQ AX, 24(SP)
+ MOVQ BX, 32(SP)
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVQ $runtime·g0(SB), DI
+ LEAQ (-64*1024)(SP), BX
+ MOVQ BX, g_stackguard0(DI)
+ MOVQ BX, g_stackguard1(DI)
+ MOVQ BX, (g_stack+stack_lo)(DI)
+ MOVQ SP, (g_stack+stack_hi)(DI)
+
+ // find out information about the processor we're on
+ MOVL $0, AX
+ CPUID
+ CMPL AX, $0
+ JE nocpuinfo
+
+ CMPL BX, $0x756E6547 // "Genu"
+ JNE notintel
+ CMPL DX, $0x49656E69 // "ineI"
+ JNE notintel
+ CMPL CX, $0x6C65746E // "ntel"
+ JNE notintel
+ MOVB $1, runtime·isIntel(SB)
+
+notintel:
+ // Load EAX=1 cpuid flags
+ MOVL $1, AX
+ CPUID
+ MOVL AX, runtime·processorVersionInfo(SB)
+
+nocpuinfo:
+ // if there is an _cgo_init, call it.
+ MOVQ _cgo_init(SB), AX
+ TESTQ AX, AX
+ JZ needtls
+ // arg 1: g0, already in DI
+ MOVQ $setg_gcc<>(SB), SI // arg 2: setg_gcc
+ MOVQ $0, DX // arg 3, 4: not used when using platform's TLS
+ MOVQ $0, CX
+#ifdef GOOS_android
+ MOVQ $runtime·tls_g(SB), DX // arg 3: &tls_g
+ // arg 4: TLS base, stored in slot 0 (Android's TLS_SLOT_SELF).
+ // Compensate for tls_g (+16).
+ MOVQ -16(TLS), CX
+#endif
+#ifdef GOOS_windows
+ MOVQ $runtime·tls_g(SB), DX // arg 3: &tls_g
+ // Adjust for the Win64 calling convention.
+ MOVQ CX, R9 // arg 4
+ MOVQ DX, R8 // arg 3
+ MOVQ SI, DX // arg 2
+ MOVQ DI, CX // arg 1
+#endif
+ CALL AX
+
+ // update stackguard after _cgo_init
+ MOVQ $runtime·g0(SB), CX
+ MOVQ (g_stack+stack_lo)(CX), AX
+ ADDQ $const_stackGuard, AX
+ MOVQ AX, g_stackguard0(CX)
+ MOVQ AX, g_stackguard1(CX)
+
+#ifndef GOOS_windows
+ JMP ok
+#endif
+needtls:
+#ifdef GOOS_plan9
+ // skip TLS setup on Plan 9
+ JMP ok
+#endif
+#ifdef GOOS_solaris
+ // skip TLS setup on Solaris
+ JMP ok
+#endif
+#ifdef GOOS_illumos
+ // skip TLS setup on illumos
+ JMP ok
+#endif
+#ifdef GOOS_darwin
+ // skip TLS setup on Darwin
+ JMP ok
+#endif
+#ifdef GOOS_openbsd
+ // skip TLS setup on OpenBSD
+ JMP ok
+#endif
+
+#ifdef GOOS_windows
+ CALL runtime·wintls(SB)
+#endif
+
+ LEAQ runtime·m0+m_tls(SB), DI
+ CALL runtime·settls(SB)
+
+ // store through it, to make sure it works
+ get_tls(BX)
+ MOVQ $0x123, g(BX)
+ MOVQ runtime·m0+m_tls(SB), AX
+ CMPQ AX, $0x123
+ JEQ 2(PC)
+ CALL runtime·abort(SB)
+ok:
+ // set the per-goroutine and per-mach "registers"
+ get_tls(BX)
+ LEAQ runtime·g0(SB), CX
+ MOVQ CX, g(BX)
+ LEAQ runtime·m0(SB), AX
+
+ // save m->g0 = g0
+ MOVQ CX, m_g0(AX)
+ // save m0 to g0->m
+ MOVQ AX, g_m(CX)
+
+ CLD // convention is D is always left cleared
+
+ // Check GOAMD64 requirements
+ // We need to do this after setting up TLS, so that
+ // we can report an error if there is a failure. See issue 49586.
+#ifdef NEED_FEATURES_CX
+ MOVL $0, AX
+ CPUID
+ CMPL AX, $0
+ JE bad_cpu
+ MOVL $1, AX
+ CPUID
+ ANDL $NEED_FEATURES_CX, CX
+ CMPL CX, $NEED_FEATURES_CX
+ JNE bad_cpu
+#endif
+
+#ifdef NEED_MAX_CPUID
+ MOVL $0x80000000, AX
+ CPUID
+ CMPL AX, $NEED_MAX_CPUID
+ JL bad_cpu
+#endif
+
+#ifdef NEED_EXT_FEATURES_BX
+ MOVL $7, AX
+ MOVL $0, CX
+ CPUID
+ ANDL $NEED_EXT_FEATURES_BX, BX
+ CMPL BX, $NEED_EXT_FEATURES_BX
+ JNE bad_cpu
+#endif
+
+#ifdef NEED_EXT_FEATURES_CX
+ MOVL $0x80000001, AX
+ CPUID
+ ANDL $NEED_EXT_FEATURES_CX, CX
+ CMPL CX, $NEED_EXT_FEATURES_CX
+ JNE bad_cpu
+#endif
+
+#ifdef NEED_OS_SUPPORT_AX
+ XORL CX, CX
+ XGETBV
+ ANDL $NEED_OS_SUPPORT_AX, AX
+ CMPL AX, $NEED_OS_SUPPORT_AX
+ JNE bad_cpu
+#endif
+
+#ifdef NEED_DARWIN_SUPPORT
+ MOVQ $commpage64_version, BX
+ CMPW (BX), $13 // cpu_capabilities64 undefined in versions < 13
+ JL bad_cpu
+ MOVQ $commpage64_cpu_capabilities64, BX
+ MOVQ (BX), BX
+ MOVQ $NEED_DARWIN_SUPPORT, CX
+ ANDQ CX, BX
+ CMPQ BX, CX
+ JNE bad_cpu
+#endif
+
+ CALL runtime·check(SB)
+
+ MOVL 24(SP), AX // copy argc
+ MOVL AX, 0(SP)
+ MOVQ 32(SP), AX // copy argv
+ MOVQ AX, 8(SP)
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVQ $runtime·mainPC(SB), AX // entry
+ PUSHQ AX
+ CALL runtime·newproc(SB)
+ POPQ AX
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ CALL runtime·abort(SB) // mstart should never return
+ RET
+
+bad_cpu: // show that the program requires a certain microarchitecture level.
+ MOVQ $2, 0(SP)
+ MOVQ $bad_cpu_msg<>(SB), AX
+ MOVQ AX, 8(SP)
+ MOVQ $84, 16(SP)
+ CALL runtime·write(SB)
+ MOVQ $1, 0(SP)
+ CALL runtime·exit(SB)
+ CALL runtime·abort(SB)
+ RET
+
+ // Prevent dead-code elimination of debugCallV2, which is
+ // intended to be called by debuggers.
+ MOVQ $runtime·debugCallV2<ABIInternal>(SB), AX
+ RET
+
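Each GOAMD64 gate in rt0_go ANDs the CPUID result with the required mask and compares it back against the mask, i.e. it checks that the required bits are a subset of the reported bits. The same test in Go, using the v2 ECX mask defined above:

	package main

	import "fmt"

	// SSE3, SSSE3, CMPXCHG16B, SSE4.1, SSE4.2, POPCNT
	const v2FeaturesCX = 1<<0 | 1<<9 | 1<<13 | 1<<19 | 1<<20 | 1<<23

	// hasAll reports whether every required bit is set in reported,
	// matching the ANDL/CMPL pattern used before each JNE bad_cpu.
	func hasAll(reported, required uint32) bool {
		return reported&required == required
	}

	func main() {
		fmt.Println(hasAll(0xffffffff, v2FeaturesCX))      // true
		fmt.Println(hasAll(v2FeaturesCX&^1, v2FeaturesCX)) // false: SSE3 bit missing
	}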
+// mainPC is a function value for runtime.main, to be passed to newproc.
+// The reference to runtime.main is made via ABIInternal, since the
+// actual function (not the ABI0 wrapper) is needed by newproc.
+DATA runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ BYTE $0xcc
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // No per-thread init.
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ CALL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// func gogo(buf *gobuf)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT, $0-8
+ MOVQ buf+0(FP), BX // gobuf
+ MOVQ gobuf_g(BX), DX
+ MOVQ 0(DX), CX // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT, $0
+ get_tls(CX)
+ MOVQ DX, g(CX)
+ MOVQ DX, R14 // set the g register
+ MOVQ gobuf_sp(BX), SP // restore SP
+ MOVQ gobuf_ret(BX), AX
+ MOVQ gobuf_ctxt(BX), DX
+ MOVQ gobuf_bp(BX), BP
+ MOVQ $0, gobuf_sp(BX) // clear to help garbage collector
+ MOVQ $0, gobuf_ret(BX)
+ MOVQ $0, gobuf_ctxt(BX)
+ MOVQ $0, gobuf_bp(BX)
+ MOVQ gobuf_pc(BX), BX
+ JMP BX
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVQ AX, DX // DX = fn
+
+ // Save state in g->sched. The caller's SP and PC are restored by gogo to
+ // resume execution in the caller's frame (implicit return). The caller's BP
+ // is also restored to support frame pointer unwinding.
+ MOVQ SP, BX // hide (SP) reads from vet
+ MOVQ 8(BX), BX // caller's PC
+ MOVQ BX, (g_sched+gobuf_pc)(R14)
+ LEAQ fn+0(FP), BX // caller's SP
+ MOVQ BX, (g_sched+gobuf_sp)(R14)
+ // Get the caller's frame pointer by dereferencing BP. Storing BP as it is
+ // can cause a frame pointer cycle, see CL 476235.
+ MOVQ (BP), BX // caller's BP
+ MOVQ BX, (g_sched+gobuf_bp)(R14)
+
+ // switch to m->g0 & its stack, call fn
+ MOVQ g_m(R14), BX
+ MOVQ m_g0(BX), SI // SI = g.m.g0
+ CMPQ SI, R14 // if g == m->g0 call badmcall
+ JNE goodm
+ JMP runtime·badmcall(SB)
+goodm:
+ MOVQ R14, AX // AX (and arg 0) = g
+ MOVQ SI, R14 // g = g.m.g0
+ get_tls(CX) // Set G in TLS
+ MOVQ R14, g(CX)
+ MOVQ (g_sched+gobuf_sp)(R14), SP // sp = g0.sched.sp
+ PUSHQ AX // open up space for fn's arg spill slot
+ MOVQ 0(DX), R12
+ CALL R12 // fn(g)
+ POPQ AX
+ JMP runtime·badmcall2(SB)
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+// The frame layout needs to match systemstack
+// so that it can pretend to be systemstack_switch.
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ // Make sure this function is not leaf,
+ // so the frame is saved.
+ CALL runtime·abort(SB)
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVQ fn+0(FP), DI // DI = fn
+ get_tls(CX)
+ MOVQ g(CX), AX // AX = g
+ MOVQ g_m(AX), BX // BX = m
+
+ CMPQ AX, m_gsignal(BX)
+ JEQ noswitch
+
+ MOVQ m_g0(BX), DX // DX = g0
+ CMPQ AX, DX
+ JEQ noswitch
+
+ CMPQ AX, m_curg(BX)
+ JNE bad
+
+ // Switch stacks.
+ // The original frame pointer is stored in BP,
+ // which is useful for stack unwinding.
+ // Save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ CALL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVQ DX, g(CX)
+ MOVQ DX, R14 // set the g register
+ MOVQ (g_sched+gobuf_sp)(DX), SP
+
+ // call target function
+ MOVQ DI, DX
+ MOVQ 0(DI), DI
+ CALL DI
+
+ // switch back to g
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), BX
+ MOVQ m_curg(BX), AX
+ MOVQ AX, g(CX)
+ MOVQ (g_sched+gobuf_sp)(AX), SP
+ MOVQ (g_sched+gobuf_bp)(AX), BP
+ MOVQ $0, (g_sched+gobuf_sp)(AX)
+ MOVQ $0, (g_sched+gobuf_bp)(AX)
+ RET
+
+noswitch:
+ // already on m stack; tail call the function
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVQ DI, DX
+ MOVQ 0(DI), DI
+ // The function epilogue is not called on a tail call.
+ // Pop BP from the stack to simulate it.
+ POPQ BP
+ JMP DI
+
+bad:
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ MOVQ $runtime·badsystemstack(SB), AX
+ CALL AX
+ INT $3
+
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ m_g0(BX), SI
+ CMPQ g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVQ m_gsignal(BX), SI
+ CMPQ g(CX), SI
+ JNE 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVQ 8(SP), AX // f's caller's PC
+ MOVQ AX, (m_morebuf+gobuf_pc)(BX)
+ LEAQ 16(SP), AX // f's caller's SP
+ MOVQ AX, (m_morebuf+gobuf_sp)(BX)
+ get_tls(CX)
+ MOVQ g(CX), SI
+ MOVQ SI, (m_morebuf+gobuf_g)(BX)
+
+ // Set g->sched to context in f.
+ MOVQ 0(SP), AX // f's PC
+ MOVQ AX, (g_sched+gobuf_pc)(SI)
+ LEAQ 8(SP), AX // f's SP
+ MOVQ AX, (g_sched+gobuf_sp)(SI)
+ MOVQ BP, (g_sched+gobuf_bp)(SI)
+ MOVQ DX, (g_sched+gobuf_ctxt)(SI)
+
+ // Call newstack on m->g0's stack.
+ MOVQ m_g0(BX), BX
+ MOVQ BX, g(CX)
+ MOVQ (g_sched+gobuf_sp)(BX), SP
+ MOVQ (g_sched+gobuf_bp)(BX), BP
+ CALL runtime·newstack(SB)
+ CALL runtime·abort(SB) // crash if newstack returns
+ RET
+
+// morestack but not preserving ctxt.
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
+ MOVL $0, DX
+ JMP runtime·morestack(SB)
+
+// spillArgs stores return values from registers to a *internal/abi.RegArgs in R12.
+TEXT ·spillArgs(SB),NOSPLIT,$0-0
+ MOVQ AX, 0(R12)
+ MOVQ BX, 8(R12)
+ MOVQ CX, 16(R12)
+ MOVQ DI, 24(R12)
+ MOVQ SI, 32(R12)
+ MOVQ R8, 40(R12)
+ MOVQ R9, 48(R12)
+ MOVQ R10, 56(R12)
+ MOVQ R11, 64(R12)
+ MOVQ X0, 72(R12)
+ MOVQ X1, 80(R12)
+ MOVQ X2, 88(R12)
+ MOVQ X3, 96(R12)
+ MOVQ X4, 104(R12)
+ MOVQ X5, 112(R12)
+ MOVQ X6, 120(R12)
+ MOVQ X7, 128(R12)
+ MOVQ X8, 136(R12)
+ MOVQ X9, 144(R12)
+ MOVQ X10, 152(R12)
+ MOVQ X11, 160(R12)
+ MOVQ X12, 168(R12)
+ MOVQ X13, 176(R12)
+ MOVQ X14, 184(R12)
+ RET
+
+// unspillArgs loads args into registers from a *internal/abi.RegArgs in R12.
+TEXT ·unspillArgs(SB),NOSPLIT,$0-0
+ MOVQ 0(R12), AX
+ MOVQ 8(R12), BX
+ MOVQ 16(R12), CX
+ MOVQ 24(R12), DI
+ MOVQ 32(R12), SI
+ MOVQ 40(R12), R8
+ MOVQ 48(R12), R9
+ MOVQ 56(R12), R10
+ MOVQ 64(R12), R11
+ MOVQ 72(R12), X0
+ MOVQ 80(R12), X1
+ MOVQ 88(R12), X2
+ MOVQ 96(R12), X3
+ MOVQ 104(R12), X4
+ MOVQ 112(R12), X5
+ MOVQ 120(R12), X6
+ MOVQ 128(R12), X7
+ MOVQ 136(R12), X8
+ MOVQ 144(R12), X9
+ MOVQ 152(R12), X10
+ MOVQ 160(R12), X11
+ MOVQ 168(R12), X12
+ MOVQ 176(R12), X13
+ MOVQ 184(R12), X14
+ RET
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMPQ CX, $MAXSIZE; \
+ JA 3(PC); \
+ MOVQ $NAME(SB), AX; \
+ JMP AX
+// Note: can't just "JMP NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT, $0-48
+ MOVLQZX frameSize+32(FP), CX
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVQ $runtime·badreflectcall(SB), AX
+ JMP AX
+
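The DISPATCH chain above compares the frame size against a fixed ladder of powers of two and jumps to the first call* function large enough to hold it, so only a handful of frame sizes need to exist. The selection logic, sketched in Go:

	package main

	import "fmt"

	// dispatchBucket returns the frame size of the call* function that
	// reflectcall would dispatch to, or 0 for badreflectcall.
	func dispatchBucket(frameSize uint32) uint32 {
		for size := uint32(16); size <= 1<<30; size <<= 1 {
			if frameSize <= size {
				return size
			}
		}
		return 0
	}

	func main() {
		fmt.Println(dispatchBucket(40))   // 64   -> runtime·call64
		fmt.Println(dispatchBucket(4096)) // 4096 -> runtime·call4096
	}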
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVQ stackArgs+16(FP), SI; \
+ MOVLQZX stackArgsSize+24(FP), CX; \
+ MOVQ SP, DI; \
+ REP;MOVSB; \
+ /* set up argument registers */ \
+ MOVQ regArgs+40(FP), R12; \
+ CALL ·unspillArgs(SB); \
+ /* call function */ \
+ MOVQ f+8(FP), DX; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ MOVQ (DX), R12; \
+ CALL R12; \
+ /* copy register return values back */ \
+ MOVQ regArgs+40(FP), R12; \
+ CALL ·spillArgs(SB); \
+ MOVLQZX stackArgsSize+24(FP), CX; \
+ MOVLQZX stackRetOffset+28(FP), BX; \
+ MOVQ stackArgs+16(FP), DI; \
+ MOVQ stackArgsType+0(FP), DX; \
+ MOVQ SP, SI; \
+ ADDQ BX, DI; \
+ ADDQ BX, SI; \
+ SUBQ BX, CX; \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ NO_LOCAL_POINTERS
+ MOVQ DX, 0(SP)
+ MOVQ DI, 8(SP)
+ MOVQ SI, 16(SP)
+ MOVQ CX, 24(SP)
+ MOVQ R12, 32(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVL cycles+0(FP), AX
+again:
+ PAUSE
+ SUBL $1, AX
+ JNZ again
+ RET
+
+
+TEXT ·publicationBarrier<ABIInternal>(SB),NOSPLIT,$0-0
+ // Stores are already ordered on x86, so this is just a
+ // compile barrier.
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with frame pointer
+// and without locals ($0) or else unwinding from
+// systemstack_switch is incorrect.
+// Smashes R9.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ // Take systemstack_switch PC and add 8 bytes to skip
+ // the prologue. The final location does not matter
+ // as long as we are between the prologue and the epilogue.
+ MOVQ $runtime·systemstack_switch+8(SB), R9
+ MOVQ R9, (g_sched+gobuf_pc)(R14)
+ LEAQ 8(SP), R9
+ MOVQ R9, (g_sched+gobuf_sp)(R14)
+ MOVQ $0, (g_sched+gobuf_ret)(R14)
+ MOVQ BP, (g_sched+gobuf_bp)(R14)
+ // Assert ctxt is zero. See func save.
+ MOVQ (g_sched+gobuf_ctxt)(R14), R9
+ TESTQ R9, R9
+ JZ 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func asmcgocall_no_g(fn, arg unsafe.Pointer)
+// Call fn(arg) aligned appropriately for the gcc ABI.
+// Called on a system stack, and there may be no g yet (during needm).
+TEXT ·asmcgocall_no_g(SB),NOSPLIT,$32-16
+ MOVQ fn+0(FP), AX
+ MOVQ arg+8(FP), BX
+ MOVQ SP, DX
+ ANDQ $~15, SP // alignment
+ MOVQ DX, 8(SP)
+ MOVQ BX, DI // DI = first argument in AMD64 ABI
+ MOVQ BX, CX // CX = first argument in Win64
+ CALL AX
+ MOVQ 8(SP), DX
+ MOVQ DX, SP
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVQ fn+0(FP), AX
+ MOVQ arg+8(FP), BX
+
+ MOVQ SP, DX
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ get_tls(CX)
+ MOVQ g(CX), DI
+ CMPQ DI, $0
+ JEQ nosave
+ MOVQ g_m(DI), R8
+ MOVQ m_gsignal(R8), SI
+ CMPQ DI, SI
+ JEQ nosave
+ MOVQ m_g0(R8), SI
+ CMPQ DI, SI
+ JEQ nosave
+
+ // Switch to system stack.
+ // The original frame pointer is stored in BP,
+ // which is useful for stack unwinding.
+ CALL gosave_systemstack_switch<>(SB)
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), SP
+
+ // Now on a scheduling stack (a pthread-created stack).
+ // Make sure we have enough room for 4 stack-backed fast-call
+ // registers as per windows amd64 calling convention.
+ SUBQ $64, SP
+ ANDQ $~15, SP // alignment for gcc ABI
+ MOVQ DI, 48(SP) // save g
+ MOVQ (g_stack+stack_hi)(DI), DI
+ SUBQ DX, DI
+ MOVQ DI, 40(SP) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ MOVQ BX, DI // DI = first argument in AMD64 ABI
+ MOVQ BX, CX // CX = first argument in Win64
+ CALL AX
+
+ // Restore registers, g, stack pointer.
+ get_tls(CX)
+ MOVQ 48(SP), DI
+ MOVQ (g_stack+stack_hi)(DI), SI
+ SUBQ 40(SP), SI
+ MOVQ DI, g(CX)
+ MOVQ SI, SP
+
+ MOVL AX, ret+16(FP)
+ RET
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ SUBQ $64, SP
+ ANDQ $~15, SP
+ MOVQ $0, 48(SP) // where above code stores g, in case someone looks during debugging
+ MOVQ DX, 40(SP) // save original stack pointer
+ MOVQ BX, DI // DI = first argument in AMD64 ABI
+ MOVQ BX, CX // CX = first argument in Win64
+ CALL AX
+ MOVQ 40(SP), SI // restore original stack pointer
+ MOVQ SI, SP
+ MOVL AX, ret+16(FP)
+ RET
+
+#ifdef GOOS_windows
+// Dummy TLS that's used on Windows so that we don't crash trying
+// to restore the G register in needm. needm and its callees are
+// very careful never to actually use the G; the TLS just can't be
+// unset since we're in Go code.
+GLOBL zeroTLS<>(SB),RODATA,$const_tlsSize
+#endif
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+ // Skip cgocallbackg, just dropm when fn is nil, and frame is the saved g.
+ // It is used to dropm while thread is exiting.
+ MOVQ fn+0(FP), AX
+ CMPQ AX, $0
+ JNE loadg
+ // Restore the g from frame.
+ get_tls(CX)
+ MOVQ frame+8(FP), BX
+ MOVQ BX, g(CX)
+ JMP dropm
+
+loadg:
+	// If g is nil, then either Go did not create the current thread,
+	// or this thread has never called into Go on pthread platforms.
+	// Call needm to obtain an m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call through AX.
+ get_tls(CX)
+#ifdef GOOS_windows
+ MOVL $0, BX
+ CMPQ CX, $0
+ JEQ 2(PC)
+#endif
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ needm
+ MOVQ g_m(BX), BX
+ MOVQ BX, savedm-8(SP) // saved copy of oldm
+ JMP havem
+needm:
+#ifdef GOOS_windows
+ // Set up a dummy TLS value. needm is careful not to use it,
+ // but it needs to be there to prevent autogenerated code from
+ // crashing when it loads from it.
+ // We don't need to clear it or anything later because needm
+ // will set up TLS properly.
+ MOVQ $zeroTLS<>(SB), DI
+ CALL runtime·settls(SB)
+#endif
+ // On some platforms (Windows) we cannot call needm through
+ // an ABI wrapper because there's no TLS set up, and the ABI
+ // wrapper will try to restore the G register (R14) from TLS.
+ // Clear X15 because Go expects it and we're not calling
+ // through a wrapper, but otherwise avoid setting the G
+ // register in the wrapper and call needm directly. It
+ // takes no arguments and doesn't return any values so
+ // there's no need to handle that. Clear R14 so that there's
+ // a bad value in there, in case needm tries to use it.
+ XORPS X15, X15
+ XORQ R14, R14
+ MOVQ $runtime·needAndBindM<ABIInternal>(SB), AX
+ CALL AX
+ MOVQ $0, savedm-8(SP)
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVQ m_g0(BX), SI
+ MOVQ SP, (g_sched+gobuf_sp)(SI)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 0(SP).
+ MOVQ m_g0(BX), SI
+ MOVQ (g_sched+gobuf_sp)(SI), AX
+ MOVQ AX, 0(SP)
+ MOVQ SP, (g_sched+gobuf_sp)(SI)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVQ m_curg(BX), SI
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), DI // prepare stack as DI
+ MOVQ (g_sched+gobuf_pc)(SI), BX
+ MOVQ BX, -8(DI) // "push" return PC on the g stack
+ // Gather our arguments into registers.
+ MOVQ fn+0(FP), BX
+ MOVQ frame+8(FP), CX
+ MOVQ ctxt+16(FP), DX
+ // Compute the size of the frame, including return PC and, if
+ // GOEXPERIMENT=framepointer, the saved base pointer
+ LEAQ fn+0(FP), AX
+ SUBQ SP, AX // AX is our actual frame size
+ SUBQ AX, DI // Allocate the same frame size on the g stack
+ MOVQ DI, SP
+
+ MOVQ BX, 0(SP)
+ MOVQ CX, 8(SP)
+ MOVQ DX, 16(SP)
+ MOVQ $runtime·cgocallbackg(SB), AX
+ CALL AX // indirect call to bypass nosplit check. We're on a different stack now.
+
+ // Compute the size of the frame again. FP and SP have
+ // completely different values here than they did above,
+ // but only their difference matters.
+ LEAQ fn+0(FP), AX
+ SUBQ SP, AX
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ get_tls(CX)
+ MOVQ g(CX), SI
+ MOVQ SP, DI
+ ADDQ AX, DI
+ MOVQ -8(DI), BX
+ MOVQ BX, (g_sched+gobuf_pc)(SI)
+ MOVQ DI, (g_sched+gobuf_sp)(SI)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ m_g0(BX), SI
+ MOVQ SI, g(CX)
+ MOVQ (g_sched+gobuf_sp)(SI), SP
+ MOVQ 0(SP), AX
+ MOVQ AX, (g_sched+gobuf_sp)(SI)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for the lifetime of the C thread on pthread platforms.
+	// If the m on entry wasn't nil, either
+	// 1. the thread is a Go thread,
+	// 2. or this wasn't the first call from a C thread on a pthread platform,
+	//    in which case we skip dropm so that the m bound by the first call is reused.
+ MOVQ savedm-8(SP), BX
+ CMPQ BX, $0
+ JNE done
+
+	// If a pthread key has been created, skip dropm so the m can be reused by the next call.
+ MOVQ _cgo_pthread_key_created(SB), AX
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled and we need dropm.
+ CMPQ AX, $0
+ JEQ dropm
+ CMPQ (AX), $0
+ JNE done
+
+dropm:
+ MOVQ $runtime·dropm(SB), AX
+ CALL AX
+#ifdef GOOS_windows
+ // We need to clear the TLS pointer in case the next
+ // thread that comes into Go tries to reuse that space
+ // but uses the same M.
+ XORQ DI, DI
+ CALL runtime·settls(SB)
+#endif
+done:
+
+ // Done!
+ RET
+
+// func setg(gg *g)
+// set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVQ gg+0(FP), BX
+ get_tls(CX)
+ MOVQ BX, g(CX)
+ RET
+
+// void setg_gcc(G*); set g called from gcc.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ get_tls(AX)
+ MOVQ DI, g(AX)
+ MOVQ DI, R14 // set the g register
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ INT $3
+loop:
+ JMP loop
+
+// check that SP is in range [g->stack.lo, g->stack.hi)
+TEXT runtime·stackcheck(SB), NOSPLIT|NOFRAME, $0-0
+ get_tls(CX)
+ MOVQ g(CX), AX
+ CMPQ (g_stack+stack_hi)(AX), SP
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ CMPQ SP, (g_stack+stack_lo)(AX)
+ JHI 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-0
+ CMPB internal∕cpu·X86+const_offsetX86HasRDTSCP(SB), $1
+ JNE fences
+ // Instruction stream serializing RDTSCP is supported.
+ // RDTSCP is supported by Intel Nehalem (2008) and
+ // AMD K8 Rev. F (2006) and newer.
+ RDTSCP
+done:
+ SHLQ $32, DX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+fences:
+ // MFENCE is instruction stream serializing and flushes the
+ // store buffers on AMD. The serialization semantics of LFENCE on AMD
+ // are dependent on MSR C001_1029 and CPU generation.
+ // LFENCE on Intel does wait for all previous instructions to have executed.
+	// Intel recommends MFENCE;LFENCE in its manuals before RDTSC so that all
+	// previous instructions have executed and all previous loads and stores are globally visible.
+ // Using MFENCE;LFENCE here aligns the serializing properties without
+ // runtime detection of CPU manufacturer.
+ MFENCE
+ LFENCE
+ RDTSC
+ JMP done
+
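+// Illustrative note (not part of the upstream comments): RDTSC/RDTSCP return
+// the 64-bit timestamp counter split across EDX:EAX, zeroing the upper halves
+// of RAX and RDX. The SHLQ/ADDQ pair at done above is therefore equivalent to
+// the following Go sketch, where hi and lo stand for the values left in DX and AX:
+//
+//	func combineTicks(hi, lo uint32) int64 {
+//		return int64(uint64(hi)<<32 | uint64(lo)) // ADDQ acts as OR: AX's high half is zero
+//	}
+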
+// func memhash(p unsafe.Pointer, h, s uintptr) uintptr
+// hash function using AES hardware instructions
+TEXT runtime·memhash<ABIInternal>(SB),NOSPLIT,$0-32
+ // AX = ptr to data
+ // BX = seed
+ // CX = size
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·memhashFallback<ABIInternal>(SB)
+
+// func strhash(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·strhash<ABIInternal>(SB),NOSPLIT,$0-24
+ // AX = ptr to string struct
+ // BX = seed
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ 8(AX), CX // length of string
+ MOVQ (AX), AX // string data
+ JMP aeshashbody<>(SB)
+noaes:
+ JMP runtime·strhashFallback<ABIInternal>(SB)
+
+// AX: data
+// BX: hash seed
+// CX: length
+// At return: AX = return value
+TEXT aeshashbody<>(SB),NOSPLIT,$0-0
+ // Fill an SSE register with our seeds.
+ MOVQ BX, X0 // 64 bits of per-table hash seed
+ PINSRW $4, CX, X0 // 16 bits of length
+ PSHUFHW $0, X0, X0 // repeat length 4 times total
+ MOVO X0, X1 // save unscrambled seed
+ PXOR runtime·aeskeysched(SB), X0 // xor in per-process seed
+ AESENC X0, X0 // scramble seed
+
+ CMPQ CX, $16
+ JB aes0to15
+ JE aes16
+ CMPQ CX, $32
+ JBE aes17to32
+ CMPQ CX, $64
+ JBE aes33to64
+ CMPQ CX, $128
+ JBE aes65to128
+ JMP aes129plus
+
+aes0to15:
+ TESTQ CX, CX
+ JE aes0
+
+ ADDQ $16, AX
+ TESTW $0xff0, AX
+ JE endofpage
+
+ // 16 bytes loaded at this address won't cross
+ // a page boundary, so we can load it directly.
+ MOVOU -16(AX), X1
+ ADDQ CX, CX
+ MOVQ $masks<>(SB), AX
+ PAND (AX)(CX*8), X1
+final1:
+ PXOR X0, X1 // xor data with seed
+ AESENC X1, X1 // scramble combo 3 times
+ AESENC X1, X1
+ AESENC X1, X1
+ MOVQ X1, AX // return X1
+ RET
+
+endofpage:
+ // address ends in 1111xxxx. Might be up against
+ // a page boundary, so load ending at last byte.
+ // Then shift bytes down using pshufb.
+ MOVOU -32(AX)(CX*1), X1
+ ADDQ CX, CX
+ MOVQ $shifts<>(SB), AX
+ PSHUFB (AX)(CX*8), X1
+ JMP final1
+
+aes0:
+ // Return scrambled input seed
+ AESENC X0, X0
+ MOVQ X0, AX // return X0
+ RET
+
+aes16:
+ MOVOU (AX), X1
+ JMP final1
+
+aes17to32:
+ // make second starting seed
+ PXOR runtime·aeskeysched+16(SB), X1
+ AESENC X1, X1
+
+ // load data to be hashed
+ MOVOU (AX), X2
+ MOVOU -16(AX)(CX*1), X3
+
+ // xor with seed
+ PXOR X0, X2
+ PXOR X1, X3
+
+ // scramble 3 times
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X2, X2
+ AESENC X3, X3
+
+ // combine results
+ PXOR X3, X2
+ MOVQ X2, AX // return X2
+ RET
+
+aes33to64:
+ // make 3 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+
+ MOVOU (AX), X4
+ MOVOU 16(AX), X5
+ MOVOU -32(AX)(CX*1), X6
+ MOVOU -16(AX)(CX*1), X7
+
+ PXOR X0, X4
+ PXOR X1, X5
+ PXOR X2, X6
+ PXOR X3, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ PXOR X6, X4
+ PXOR X7, X5
+ PXOR X5, X4
+ MOVQ X4, AX // return X4
+ RET
+
+aes65to128:
+ // make 7 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ MOVO X1, X4
+ MOVO X1, X5
+ MOVO X1, X6
+ MOVO X1, X7
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ PXOR runtime·aeskeysched+64(SB), X4
+ PXOR runtime·aeskeysched+80(SB), X5
+ PXOR runtime·aeskeysched+96(SB), X6
+ PXOR runtime·aeskeysched+112(SB), X7
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ // load data
+ MOVOU (AX), X8
+ MOVOU 16(AX), X9
+ MOVOU 32(AX), X10
+ MOVOU 48(AX), X11
+ MOVOU -64(AX)(CX*1), X12
+ MOVOU -48(AX)(CX*1), X13
+ MOVOU -32(AX)(CX*1), X14
+ MOVOU -16(AX)(CX*1), X15
+
+ // xor with seed
+ PXOR X0, X8
+ PXOR X1, X9
+ PXOR X2, X10
+ PXOR X3, X11
+ PXOR X4, X12
+ PXOR X5, X13
+ PXOR X6, X14
+ PXOR X7, X15
+
+ // scramble 3 times
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ // combine results
+ PXOR X12, X8
+ PXOR X13, X9
+ PXOR X14, X10
+ PXOR X15, X11
+ PXOR X10, X8
+ PXOR X11, X9
+ PXOR X9, X8
+ // X15 must be zero on return
+ PXOR X15, X15
+ MOVQ X8, AX // return X8
+ RET
+
+aes129plus:
+ // make 7 more starting seeds
+ MOVO X1, X2
+ MOVO X1, X3
+ MOVO X1, X4
+ MOVO X1, X5
+ MOVO X1, X6
+ MOVO X1, X7
+ PXOR runtime·aeskeysched+16(SB), X1
+ PXOR runtime·aeskeysched+32(SB), X2
+ PXOR runtime·aeskeysched+48(SB), X3
+ PXOR runtime·aeskeysched+64(SB), X4
+ PXOR runtime·aeskeysched+80(SB), X5
+ PXOR runtime·aeskeysched+96(SB), X6
+ PXOR runtime·aeskeysched+112(SB), X7
+ AESENC X1, X1
+ AESENC X2, X2
+ AESENC X3, X3
+ AESENC X4, X4
+ AESENC X5, X5
+ AESENC X6, X6
+ AESENC X7, X7
+
+ // start with last (possibly overlapping) block
+ MOVOU -128(AX)(CX*1), X8
+ MOVOU -112(AX)(CX*1), X9
+ MOVOU -96(AX)(CX*1), X10
+ MOVOU -80(AX)(CX*1), X11
+ MOVOU -64(AX)(CX*1), X12
+ MOVOU -48(AX)(CX*1), X13
+ MOVOU -32(AX)(CX*1), X14
+ MOVOU -16(AX)(CX*1), X15
+
+ // xor in seed
+ PXOR X0, X8
+ PXOR X1, X9
+ PXOR X2, X10
+ PXOR X3, X11
+ PXOR X4, X12
+ PXOR X5, X13
+ PXOR X6, X14
+ PXOR X7, X15
+
+ // compute number of remaining 128-byte blocks
+ DECQ CX
+ SHRQ $7, CX
+
+aesloop:
+ // scramble state
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ // scramble state, xor in a block
+ MOVOU (AX), X0
+ MOVOU 16(AX), X1
+ MOVOU 32(AX), X2
+ MOVOU 48(AX), X3
+ AESENC X0, X8
+ AESENC X1, X9
+ AESENC X2, X10
+ AESENC X3, X11
+ MOVOU 64(AX), X4
+ MOVOU 80(AX), X5
+ MOVOU 96(AX), X6
+ MOVOU 112(AX), X7
+ AESENC X4, X12
+ AESENC X5, X13
+ AESENC X6, X14
+ AESENC X7, X15
+
+ ADDQ $128, AX
+ DECQ CX
+ JNE aesloop
+
+ // 3 more scrambles to finish
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+ AESENC X8, X8
+ AESENC X9, X9
+ AESENC X10, X10
+ AESENC X11, X11
+ AESENC X12, X12
+ AESENC X13, X13
+ AESENC X14, X14
+ AESENC X15, X15
+
+ PXOR X12, X8
+ PXOR X13, X9
+ PXOR X14, X10
+ PXOR X15, X11
+ PXOR X10, X8
+ PXOR X11, X9
+ PXOR X9, X8
+ // X15 must be zero on return
+ PXOR X15, X15
+ MOVQ X8, AX // return X8
+ RET
+
+// func memhash32(p unsafe.Pointer, h uintptr) uintptr
+// ABIInternal for performance.
+TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT,$0-24
+ // AX = ptr to data
+ // BX = seed
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ BX, X0 // X0 = seed
+ PINSRD $2, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVQ X0, AX // return X0
+ RET
+noaes:
+ JMP runtime·memhash32Fallback<ABIInternal>(SB)
+
+// func memhash64(p unsafe.Pointer, h uintptr) uintptr
+// ABIInternal for performance.
+TEXT runtime·memhash64<ABIInternal>(SB),NOSPLIT,$0-24
+ // AX = ptr to data
+ // BX = seed
+ CMPB runtime·useAeshash(SB), $0
+ JEQ noaes
+ MOVQ BX, X0 // X0 = seed
+ PINSRQ $1, (AX), X0 // data
+ AESENC runtime·aeskeysched+0(SB), X0
+ AESENC runtime·aeskeysched+16(SB), X0
+ AESENC runtime·aeskeysched+32(SB), X0
+ MOVQ X0, AX // return X0
+ RET
+noaes:
+ JMP runtime·memhash64Fallback<ABIInternal>(SB)
+
+// simple mask to get rid of data in the high part of the register.
+DATA masks<>+0x00(SB)/8, $0x0000000000000000
+DATA masks<>+0x08(SB)/8, $0x0000000000000000
+DATA masks<>+0x10(SB)/8, $0x00000000000000ff
+DATA masks<>+0x18(SB)/8, $0x0000000000000000
+DATA masks<>+0x20(SB)/8, $0x000000000000ffff
+DATA masks<>+0x28(SB)/8, $0x0000000000000000
+DATA masks<>+0x30(SB)/8, $0x0000000000ffffff
+DATA masks<>+0x38(SB)/8, $0x0000000000000000
+DATA masks<>+0x40(SB)/8, $0x00000000ffffffff
+DATA masks<>+0x48(SB)/8, $0x0000000000000000
+DATA masks<>+0x50(SB)/8, $0x000000ffffffffff
+DATA masks<>+0x58(SB)/8, $0x0000000000000000
+DATA masks<>+0x60(SB)/8, $0x0000ffffffffffff
+DATA masks<>+0x68(SB)/8, $0x0000000000000000
+DATA masks<>+0x70(SB)/8, $0x00ffffffffffffff
+DATA masks<>+0x78(SB)/8, $0x0000000000000000
+DATA masks<>+0x80(SB)/8, $0xffffffffffffffff
+DATA masks<>+0x88(SB)/8, $0x0000000000000000
+DATA masks<>+0x90(SB)/8, $0xffffffffffffffff
+DATA masks<>+0x98(SB)/8, $0x00000000000000ff
+DATA masks<>+0xa0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xa8(SB)/8, $0x000000000000ffff
+DATA masks<>+0xb0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xb8(SB)/8, $0x0000000000ffffff
+DATA masks<>+0xc0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xc8(SB)/8, $0x00000000ffffffff
+DATA masks<>+0xd0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xd8(SB)/8, $0x000000ffffffffff
+DATA masks<>+0xe0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xe8(SB)/8, $0x0000ffffffffffff
+DATA masks<>+0xf0(SB)/8, $0xffffffffffffffff
+DATA masks<>+0xf8(SB)/8, $0x00ffffffffffffff
+GLOBL masks<>(SB),RODATA,$256
+
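+// Worked example (added for illustration, not upstream text): aes0to15 doubles
+// the length in CX and indexes masks<> with (CX*8), i.e. byte offset n*16.
+// For n = 3 that selects masks<>+0x30..0x3f above, the 128-bit value whose only
+// nonzero bytes are the low three (0xff, 0xff, 0xff), so the PAND keeps just the
+// 3 data bytes and zeroes the rest of the register.
+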
+// func checkASM() bool
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+	// Check that masks<>(SB) and shifts<>(SB) are 16-byte aligned.
+ MOVQ $masks<>(SB), AX
+ MOVQ $shifts<>(SB), BX
+ ORQ BX, AX
+ TESTQ $15, AX
+ SETEQ ret+0(FP)
+ RET
+
+// these are arguments to pshufb. They move data down from
+// the high bytes of the register to the low bytes of the register.
+// index is how many bytes to move.
+DATA shifts<>+0x00(SB)/8, $0x0000000000000000
+DATA shifts<>+0x08(SB)/8, $0x0000000000000000
+DATA shifts<>+0x10(SB)/8, $0xffffffffffffff0f
+DATA shifts<>+0x18(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x20(SB)/8, $0xffffffffffff0f0e
+DATA shifts<>+0x28(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x30(SB)/8, $0xffffffffff0f0e0d
+DATA shifts<>+0x38(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x40(SB)/8, $0xffffffff0f0e0d0c
+DATA shifts<>+0x48(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x50(SB)/8, $0xffffff0f0e0d0c0b
+DATA shifts<>+0x58(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x60(SB)/8, $0xffff0f0e0d0c0b0a
+DATA shifts<>+0x68(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x70(SB)/8, $0xff0f0e0d0c0b0a09
+DATA shifts<>+0x78(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x80(SB)/8, $0x0f0e0d0c0b0a0908
+DATA shifts<>+0x88(SB)/8, $0xffffffffffffffff
+DATA shifts<>+0x90(SB)/8, $0x0e0d0c0b0a090807
+DATA shifts<>+0x98(SB)/8, $0xffffffffffffff0f
+DATA shifts<>+0xa0(SB)/8, $0x0d0c0b0a09080706
+DATA shifts<>+0xa8(SB)/8, $0xffffffffffff0f0e
+DATA shifts<>+0xb0(SB)/8, $0x0c0b0a0908070605
+DATA shifts<>+0xb8(SB)/8, $0xffffffffff0f0e0d
+DATA shifts<>+0xc0(SB)/8, $0x0b0a090807060504
+DATA shifts<>+0xc8(SB)/8, $0xffffffff0f0e0d0c
+DATA shifts<>+0xd0(SB)/8, $0x0a09080706050403
+DATA shifts<>+0xd8(SB)/8, $0xffffff0f0e0d0c0b
+DATA shifts<>+0xe0(SB)/8, $0x0908070605040302
+DATA shifts<>+0xe8(SB)/8, $0xffff0f0e0d0c0b0a
+DATA shifts<>+0xf0(SB)/8, $0x0807060504030201
+DATA shifts<>+0xf8(SB)/8, $0xff0f0e0d0c0b0a09
+GLOBL shifts<>(SB),RODATA,$256
+
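+// Worked example (added for illustration, not upstream text): the endofpage
+// path likewise doubles the length and indexes shifts<> at byte offset n*16.
+// For n = 3 that selects shifts<>+0x30..0x3f, whose low index bytes are
+// 0x0d, 0x0e, 0x0f followed by 0xff..., so PSHUFB moves the last 3 bytes of
+// the over-read 16-byte load down to positions 0..2 and zeroes everything else
+// (an index byte with the high bit set clears the destination byte).
+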
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVL $0, AX
+ RET
+
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$0
+ get_tls(CX)
+ MOVQ g(CX), AX
+ MOVQ g_m(AX), AX
+ MOVQ m_curg(AX), AX
+ MOVQ (g_stack+stack_hi)(AX), AX
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|TOPFRAME|NOFRAME,$0-0
+ BYTE $0x90 // NOP
+ CALL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x90 // NOP
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ PUSHQ R15 // The access to global variables below implicitly uses R15, which is callee-save
+ MOVQ runtime·lastmoduledatap(SB), AX
+ MOVQ DI, moduledata_next(AX)
+ MOVQ DI, runtime·lastmoduledatap(SB)
+ POPQ R15
+ RET
+
+// Initialize special registers then jump to sigpanic.
+// This function is injected from the signal handler for panicking
+// signals. It is quite painful to set X15 in the signal context,
+// so we do it here.
+TEXT ·sigpanic0(SB),NOSPLIT,$0-0
+ get_tls(R14)
+ MOVQ g(R14), R14
+#ifndef GOOS_plan9
+ XORPS X15, X15
+#endif
+ JMP ·sigpanic<ABIInternal>(SB)
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier returns space in a write barrier buffer which
+// should be filled in by the caller.
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R11, and returns a pointer
+// to the buffer space in R11.
+// It clobbers FLAGS. It does not clobber any general-purpose registers,
+// but may clobber others (e.g., SSE registers).
+// Typical use would be, when doing *(CX+88) = AX
+// CMPL $0, runtime.writeBarrier(SB)
+// JEQ dowrite
+// CALL runtime.gcBatchBarrier2(SB)
+// MOVQ AX, (R11)
+// MOVQ 88(CX), DX
+// MOVQ DX, 8(R11)
+// dowrite:
+// MOVQ AX, 88(CX)
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$112
+ // Save the registers clobbered by the fast path. This is slightly
+ // faster than having the caller spill these.
+ MOVQ R12, 96(SP)
+ MOVQ R13, 104(SP)
+retry:
+ // TODO: Consider passing g.m.p in as an argument so they can be shared
+ // across a sequence of write barriers.
+ MOVQ g_m(R14), R13
+ MOVQ m_p(R13), R13
+ // Get current buffer write position.
+ MOVQ (p_wbBuf+wbBuf_next)(R13), R12 // original next position
+ ADDQ R11, R12 // new next position
+ // Is the buffer full?
+ CMPQ R12, (p_wbBuf+wbBuf_end)(R13)
+ JA flush
+ // Commit to the larger buffer.
+ MOVQ R12, (p_wbBuf+wbBuf_next)(R13)
+ // Make return value (the original next position)
+ SUBQ R11, R12
+ MOVQ R12, R11
+ // Restore registers.
+ MOVQ 96(SP), R12
+ MOVQ 104(SP), R13
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ // It is possible for wbBufFlush to clobber other registers
+ // (e.g., SSE registers), but the compiler takes care of saving
+ // those in the caller if necessary. This strikes a balance
+ // with registers that are likely to be used.
+ //
+ // We don't have type information for these, but all code under
+ // here is NOSPLIT, so nothing will observe these.
+ //
+ // TODO: We could strike a different balance; e.g., saving X0
+ // and not saving GP registers that are less likely to be used.
+ MOVQ DI, 0(SP)
+ MOVQ AX, 8(SP)
+ MOVQ BX, 16(SP)
+ MOVQ CX, 24(SP)
+ MOVQ DX, 32(SP)
+ // DI already saved
+ MOVQ SI, 40(SP)
+ MOVQ BP, 48(SP)
+ MOVQ R8, 56(SP)
+ MOVQ R9, 64(SP)
+ MOVQ R10, 72(SP)
+ MOVQ R11, 80(SP)
+ // R12 already saved
+ // R13 already saved
+ // R14 is g
+ MOVQ R15, 88(SP)
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVQ 0(SP), DI
+ MOVQ 8(SP), AX
+ MOVQ 16(SP), BX
+ MOVQ 24(SP), CX
+ MOVQ 32(SP), DX
+ MOVQ 40(SP), SI
+ MOVQ 48(SP), BP
+ MOVQ 56(SP), R8
+ MOVQ 64(SP), R9
+ MOVQ 72(SP), R10
+ MOVQ 80(SP), R11
+ MOVQ 88(SP), R15
+ JMP retry
+
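+// Illustrative Go-style sketch (added for explanation only; field names follow
+// the runtime's wbBuf structure, but this is a sketch, not runnable runtime
+// code) of the fast path implemented by gcWriteBarrier<> above: reserve
+// `bytes` of space in the per-P write barrier buffer, flushing and retrying
+// when it is full.
+//
+//	func wbBufReserve(pp *p, bytes uintptr) uintptr {
+//	retry:
+//		next := pp.wbBuf.next + bytes
+//		if next > pp.wbBuf.end {
+//			wbBufFlush() // spill the buffer to the GC, then retry
+//			goto retry
+//		}
+//		pp.wbBuf.next = next
+//		return next - bytes // caller stores the pointer(s) at this address
+//	}
+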
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $8, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $16, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $24, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $32, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $40, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $48, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $56, R11
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVL $64, R11
+ JMP gcWriteBarrier<>(SB)
+
+DATA debugCallFrameTooLarge<>+0x00(SB)/20, $"call frame too large"
+GLOBL debugCallFrameTooLarge<>(SB), RODATA, $20 // Size duplicated below
+
+// debugCallV2 is the entry point for debugger-injected function
+// calls on running goroutines. It informs the runtime that a
+// debug call has been injected and creates a call frame for the
+// debugger to fill in.
+//
+// To inject a function call, a debugger should:
+// 1. Check that the goroutine is in state _Grunning and that
+// there are at least 256 bytes free on the stack.
+// 2. Push the current PC on the stack (updating SP).
+// 3. Write the desired argument frame size at SP-16 (using the SP
+// after step 2).
+// 4. Save all machine registers (including flags and XMM registers)
+// so they can be restored later by the debugger.
+// 5. Set the PC to debugCallV2 and resume execution.
+//
+// If the goroutine is in state _Grunnable, then it's not generally
+// safe to inject a call because it may return out via other runtime
+// operations. Instead, the debugger should unwind the stack to find
+// the return to non-runtime code, add a temporary breakpoint there,
+// and inject the call once that breakpoint is hit.
+//
+// If the goroutine is in any other state, it's not safe to inject a call.
+//
+// This function communicates back to the debugger by setting R12 and
+// invoking INT3 to raise a breakpoint signal. See the comments in the
+// implementation for the protocol the debugger is expected to
+// follow. InjectDebugCall in the runtime tests demonstrates this protocol.
+//
+// The debugger must ensure that any pointers passed to the function
+// obey escape analysis requirements. Specifically, it must not pass
+// a stack pointer to an escaping argument. debugCallV2 cannot check
+// this invariant.
+//
+// This is ABIInternal because Go code injects its PC directly into new
+// goroutine stacks.
+TEXT runtime·debugCallV2<ABIInternal>(SB),NOSPLIT,$152-0
+ // Save all registers that may contain pointers so they can be
+ // conservatively scanned.
+ //
+ // We can't do anything that might clobber any of these
+ // registers before this.
+ MOVQ R15, r15-(14*8+8)(SP)
+ MOVQ R14, r14-(13*8+8)(SP)
+ MOVQ R13, r13-(12*8+8)(SP)
+ MOVQ R12, r12-(11*8+8)(SP)
+ MOVQ R11, r11-(10*8+8)(SP)
+ MOVQ R10, r10-(9*8+8)(SP)
+ MOVQ R9, r9-(8*8+8)(SP)
+ MOVQ R8, r8-(7*8+8)(SP)
+ MOVQ DI, di-(6*8+8)(SP)
+ MOVQ SI, si-(5*8+8)(SP)
+ MOVQ BP, bp-(4*8+8)(SP)
+ MOVQ BX, bx-(3*8+8)(SP)
+ MOVQ DX, dx-(2*8+8)(SP)
+ // Save the frame size before we clobber it. Either of the last
+ // saves could clobber this depending on whether there's a saved BP.
+ MOVQ frameSize-24(FP), DX // aka -16(RSP) before prologue
+ MOVQ CX, cx-(1*8+8)(SP)
+ MOVQ AX, ax-(0*8+8)(SP)
+
+ // Save the argument frame size.
+ MOVQ DX, frameSize-128(SP)
+
+ // Perform a safe-point check.
+ MOVQ retpc-8(FP), AX // Caller's PC
+ MOVQ AX, 0(SP)
+ CALL runtime·debugCallCheck(SB)
+ MOVQ 8(SP), AX
+ TESTQ AX, AX
+ JZ good
+ // The safety check failed. Put the reason string at the top
+ // of the stack.
+ MOVQ AX, 0(SP)
+ MOVQ 16(SP), AX
+ MOVQ AX, 8(SP)
+ // Set R12 to 8 and invoke INT3. The debugger should get the
+ // reason a call can't be injected from the top of the stack
+ // and resume execution.
+ MOVQ $8, R12
+ BYTE $0xcc
+ JMP restore
+
+good:
+ // Registers are saved and it's safe to make a call.
+ // Open up a call frame, moving the stack if necessary.
+ //
+ // Once the frame is allocated, this will set R12 to 0 and
+ // invoke INT3. The debugger should write the argument
+ // frame for the call at SP, set up argument registers, push
+ // the trapping PC on the stack, set the PC to the function to
+ // call, set RDX to point to the closure (if a closure call),
+ // and resume execution.
+ //
+ // If the function returns, this will set R12 to 1 and invoke
+ // INT3. The debugger can then inspect any return value saved
+ // on the stack at SP and in registers and resume execution again.
+ //
+ // If the function panics, this will set R12 to 2 and invoke INT3.
+ // The interface{} value of the panic will be at SP. The debugger
+ // can inspect the panic value and resume execution again.
+#define DEBUG_CALL_DISPATCH(NAME,MAXSIZE) \
+ CMPQ AX, $MAXSIZE; \
+ JA 5(PC); \
+ MOVQ $NAME(SB), AX; \
+ MOVQ AX, 0(SP); \
+ CALL runtime·debugCallWrap(SB); \
+ JMP restore
+
+ MOVQ frameSize-128(SP), AX
+ DEBUG_CALL_DISPATCH(debugCall32<>, 32)
+ DEBUG_CALL_DISPATCH(debugCall64<>, 64)
+ DEBUG_CALL_DISPATCH(debugCall128<>, 128)
+ DEBUG_CALL_DISPATCH(debugCall256<>, 256)
+ DEBUG_CALL_DISPATCH(debugCall512<>, 512)
+ DEBUG_CALL_DISPATCH(debugCall1024<>, 1024)
+ DEBUG_CALL_DISPATCH(debugCall2048<>, 2048)
+ DEBUG_CALL_DISPATCH(debugCall4096<>, 4096)
+ DEBUG_CALL_DISPATCH(debugCall8192<>, 8192)
+ DEBUG_CALL_DISPATCH(debugCall16384<>, 16384)
+ DEBUG_CALL_DISPATCH(debugCall32768<>, 32768)
+ DEBUG_CALL_DISPATCH(debugCall65536<>, 65536)
+ // The frame size is too large. Report the error.
+ MOVQ $debugCallFrameTooLarge<>(SB), AX
+ MOVQ AX, 0(SP)
+ MOVQ $20, 8(SP) // length of debugCallFrameTooLarge string
+ MOVQ $8, R12
+ BYTE $0xcc
+ JMP restore
+
+restore:
+ // Calls and failures resume here.
+ //
+ // Set R12 to 16 and invoke INT3. The debugger should restore
+ // all registers except RIP and RSP and resume execution.
+ MOVQ $16, R12
+ BYTE $0xcc
+ // We must not modify flags after this point.
+
+ // Restore pointer-containing registers, which may have been
+ // modified from the debugger's copy by stack copying.
+ MOVQ ax-(0*8+8)(SP), AX
+ MOVQ cx-(1*8+8)(SP), CX
+ MOVQ dx-(2*8+8)(SP), DX
+ MOVQ bx-(3*8+8)(SP), BX
+ MOVQ bp-(4*8+8)(SP), BP
+ MOVQ si-(5*8+8)(SP), SI
+ MOVQ di-(6*8+8)(SP), DI
+ MOVQ r8-(7*8+8)(SP), R8
+ MOVQ r9-(8*8+8)(SP), R9
+ MOVQ r10-(9*8+8)(SP), R10
+ MOVQ r11-(10*8+8)(SP), R11
+ MOVQ r12-(11*8+8)(SP), R12
+ MOVQ r13-(12*8+8)(SP), R13
+ MOVQ r14-(13*8+8)(SP), R14
+ MOVQ r15-(14*8+8)(SP), R15
+
+ RET
+
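+// Informal summary (added for illustration; the comment on debugCallV2 above
+// is authoritative) of the R12/INT3 protocol, written as a hypothetical Go
+// const block; these names do not exist in the runtime:
+//
+//	const (
+//		dbgCallReady    = 0  // frame allocated: write args, push trapping PC, set PC to target
+//		dbgCallReturned = 1  // injected call returned; results at SP and in registers
+//		dbgCallPanicked = 2  // injected call panicked; panic value (interface{}) at SP
+//		dbgCallRejected = 8  // call cannot be injected; (ptr, len) of a reason string at SP
+//		dbgCallRestore  = 16 // restore all registers except RIP and RSP, then resume
+//	)
+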
+// runtime.debugCallCheck assumes that functions defined with the
+// DEBUG_CALL_FN macro are safe points to inject calls.
+#define DEBUG_CALL_FN(NAME,MAXSIZE) \
+TEXT NAME(SB),WRAPPER,$MAXSIZE-0; \
+ NO_LOCAL_POINTERS; \
+ MOVQ $0, R12; \
+ BYTE $0xcc; \
+ MOVQ $1, R12; \
+ BYTE $0xcc; \
+ RET
+DEBUG_CALL_FN(debugCall32<>, 32)
+DEBUG_CALL_FN(debugCall64<>, 64)
+DEBUG_CALL_FN(debugCall128<>, 128)
+DEBUG_CALL_FN(debugCall256<>, 256)
+DEBUG_CALL_FN(debugCall512<>, 512)
+DEBUG_CALL_FN(debugCall1024<>, 1024)
+DEBUG_CALL_FN(debugCall2048<>, 2048)
+DEBUG_CALL_FN(debugCall4096<>, 4096)
+DEBUG_CALL_FN(debugCall8192<>, 8192)
+DEBUG_CALL_FN(debugCall16384<>, 16384)
+DEBUG_CALL_FN(debugCall32768<>, 32768)
+DEBUG_CALL_FN(debugCall65536<>, 65536)
+
+// func debugCallPanicked(val interface{})
+TEXT runtime·debugCallPanicked(SB),NOSPLIT,$16-16
+ // Copy the panic value to the top of stack.
+ MOVQ val_type+0(FP), AX
+ MOVQ AX, 0(SP)
+ MOVQ val_data+8(FP), AX
+ MOVQ AX, 8(SP)
+ MOVQ $2, R12
+ BYTE $0xcc
+ RET
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is still
+// allocated in the caller's stack frame. These stubs move the arguments into the
+// registers the corresponding runtime handler expects and then tail call to it.
+// The tail call makes these stubs disappear in backtraces.
+// Defined as ABIInternal since they do not use the stack-based Go ABI.
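+// For example (illustrative note, not upstream text): for panicIndex the
+// compiler leaves the failing index in AX and the length in CX; the stub
+// below only moves CX into BX so that goPanicIndex<ABIInternal>(x, y)
+// receives its arguments in the ABIInternal registers AX and BX before the
+// tail call.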
+TEXT runtime·panicIndex<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicIndex<ABIInternal>(SB)
+TEXT runtime·panicIndexU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicIndexU<ABIInternal>(SB)
+TEXT runtime·panicSliceAlen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSliceAlen<ABIInternal>(SB)
+TEXT runtime·panicSliceAlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSliceAlenU<ABIInternal>(SB)
+TEXT runtime·panicSliceAcap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSliceAcap<ABIInternal>(SB)
+TEXT runtime·panicSliceAcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSliceAcapU<ABIInternal>(SB)
+TEXT runtime·panicSliceB<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicSliceB<ABIInternal>(SB)
+TEXT runtime·panicSliceBU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicSliceBU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Alen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, AX
+ JMP runtime·goPanicSlice3Alen<ABIInternal>(SB)
+TEXT runtime·panicSlice3AlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, AX
+ JMP runtime·goPanicSlice3AlenU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Acap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, AX
+ JMP runtime·goPanicSlice3Acap<ABIInternal>(SB)
+TEXT runtime·panicSlice3AcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, AX
+ JMP runtime·goPanicSlice3AcapU<ABIInternal>(SB)
+TEXT runtime·panicSlice3B<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSlice3B<ABIInternal>(SB)
+TEXT runtime·panicSlice3BU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ JMP runtime·goPanicSlice3BU<ABIInternal>(SB)
+TEXT runtime·panicSlice3C<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicSlice3C<ABIInternal>(SB)
+TEXT runtime·panicSlice3CU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ CX, BX
+ JMP runtime·goPanicSlice3CU<ABIInternal>(SB)
+TEXT runtime·panicSliceConvert<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVQ DX, AX
+ JMP runtime·goPanicSliceConvert<ABIInternal>(SB)
+
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/8, $16
+GLOBL runtime·tls_g+0(SB), NOPTR, $8
+#endif
+#ifdef GOOS_windows
+GLOBL runtime·tls_g+0(SB), NOPTR, $8
+#endif
+
+// The compiler and assembler's -spectre=ret mode rewrites
+// all indirect CALL AX / JMP AX instructions to be
+// CALL retpolineAX / JMP retpolineAX.
+// See https://support.google.com/faqs/answer/7625886.
+#define RETPOLINE(reg) \
+ /* CALL setup */ BYTE $0xE8; BYTE $(2+2); BYTE $0; BYTE $0; BYTE $0; \
+ /* nospec: */ \
+ /* PAUSE */ BYTE $0xF3; BYTE $0x90; \
+ /* JMP nospec */ BYTE $0xEB; BYTE $-(2+2); \
+ /* setup: */ \
+ /* MOVQ AX, 0(SP) */ BYTE $0x48|((reg&8)>>1); BYTE $0x89; \
+ BYTE $0x04|((reg&7)<<3); BYTE $0x24; \
+ /* RET */ BYTE $0xC3
+
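+// For illustration (not upstream text), RETPOLINE(0) assembles to the
+// sequence below. The CALL pushes a return address, the MOVQ overwrites it
+// with the real target in AX, and the RET jumps there; any speculation of
+// the RET lands in the PAUSE/JMP trap rather than a predicted indirect branch.
+//
+//	CALL setup
+//	nospec:
+//		PAUSE
+//		JMP nospec        // speculation trap
+//	setup:
+//		MOVQ AX, 0(SP)    // replace the return address with the target
+//		RET               // effectively JMP AX
+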
+TEXT runtime·retpolineAX(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(0)
+TEXT runtime·retpolineCX(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(1)
+TEXT runtime·retpolineDX(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(2)
+TEXT runtime·retpolineBX(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(3)
+/* Register 4 is SP: an indirect CALL/JMP through SP can't happen / would need magic encodings, so there is no retpolineSP. */
+TEXT runtime·retpolineBP(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(5)
+TEXT runtime·retpolineSI(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(6)
+TEXT runtime·retpolineDI(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(7)
+TEXT runtime·retpolineR8(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(8)
+TEXT runtime·retpolineR9(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(9)
+TEXT runtime·retpolineR10(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(10)
+TEXT runtime·retpolineR11(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(11)
+TEXT runtime·retpolineR12(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(12)
+TEXT runtime·retpolineR13(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(13)
+TEXT runtime·retpolineR14(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(14)
+TEXT runtime·retpolineR15(SB),NOSPLIT|NOFRAME,$0; RETPOLINE(15)
+
+TEXT ·getfp<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVQ BP, AX
+ RET
diff --git a/src/runtime/asm_arm.s b/src/runtime/asm_arm.s
new file mode 100644
index 0000000..e3206a1
--- /dev/null
+++ b/src/runtime/asm_arm.s
@@ -0,0 +1,1130 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_arm is common startup code for most ARM systems when using
+// internal linking. This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program. The stack holds the
+// number of arguments and the C-style argv.
+TEXT _rt0_arm(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ B runtime·rt0_go(SB)
+
+// main is common startup code for most ARM systems when using
+// external linking. The C startup code will call the symbol "main"
+// passing argc and argv in the usual C ABI registers R0 and R1.
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ B runtime·rt0_go(SB)
+
+// _rt0_arm_lib is common startup code for most ARM systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// R0 and R1.
+TEXT _rt0_arm_lib(SB),NOSPLIT,$104
+ // Preserve callee-save registers. Raspberry Pi's dlopen(), for example,
+ // actually cares that R11 is preserved.
+ MOVW R4, 12(R13)
+ MOVW R5, 16(R13)
+ MOVW R6, 20(R13)
+ MOVW R7, 24(R13)
+ MOVW R8, 28(R13)
+ MOVW g, 32(R13)
+ MOVW R11, 36(R13)
+
+ // Skip floating point registers on GOARM < 6.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfpsave
+ MOVD F8, (40+8*0)(R13)
+ MOVD F9, (40+8*1)(R13)
+ MOVD F10, (40+8*2)(R13)
+ MOVD F11, (40+8*3)(R13)
+ MOVD F12, (40+8*4)(R13)
+ MOVD F13, (40+8*5)(R13)
+ MOVD F14, (40+8*6)(R13)
+ MOVD F15, (40+8*7)(R13)
+skipfpsave:
+ // Save argc/argv.
+ MOVW R0, _rt0_arm_lib_argc<>(SB)
+ MOVW R1, _rt0_arm_lib_argv<>(SB)
+
+ MOVW $0, g // Initialize g.
+
+ // Synchronous initialization.
+ CALL runtime·libpreinit(SB)
+
+ // Create a new thread to do the runtime initialization.
+ MOVW _cgo_sys_thread_create(SB), R2
+ CMP $0, R2
+ BEQ nocgo
+ MOVW $_rt0_arm_lib_go<>(SB), R0
+ MOVW $0, R1
+ BL (R2)
+ B rr
+nocgo:
+ MOVW $0x800000, R0 // stacksize = 8192KB
+ MOVW $_rt0_arm_lib_go<>(SB), R1 // fn
+ MOVW R0, 4(R13)
+ MOVW R1, 8(R13)
+ BL runtime·newosproc0(SB)
+rr:
+ // Restore callee-save registers and return.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfprest
+ MOVD (40+8*0)(R13), F8
+ MOVD (40+8*1)(R13), F9
+ MOVD (40+8*2)(R13), F10
+ MOVD (40+8*3)(R13), F11
+ MOVD (40+8*4)(R13), F12
+ MOVD (40+8*5)(R13), F13
+ MOVD (40+8*6)(R13), F14
+ MOVD (40+8*7)(R13), F15
+skipfprest:
+ MOVW 12(R13), R4
+ MOVW 16(R13), R5
+ MOVW 20(R13), R6
+ MOVW 24(R13), R7
+ MOVW 28(R13), R8
+ MOVW 32(R13), g
+ MOVW 36(R13), R11
+ RET
+
+// _rt0_arm_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_arm_lib.
+TEXT _rt0_arm_lib_go<>(SB),NOSPLIT,$8
+ MOVW _rt0_arm_lib_argc<>(SB), R0
+ MOVW _rt0_arm_lib_argv<>(SB), R1
+ B runtime·rt0_go(SB)
+
+DATA _rt0_arm_lib_argc<>(SB)/4,$0
+GLOBL _rt0_arm_lib_argc<>(SB),NOPTR,$4
+DATA _rt0_arm_lib_argv<>(SB)/4,$0
+GLOBL _rt0_arm_lib_argv<>(SB),NOPTR,$4
+
+// using NOFRAME means do not save LR on stack.
+// argc is in R0, argv is in R1.
+TEXT runtime·rt0_go(SB),NOSPLIT|NOFRAME|TOPFRAME,$0
+ MOVW $0xcafebabe, R12
+
+ // copy arguments forward on an even stack
+ // use R13 instead of SP to avoid linker rewriting the offsets
+ SUB $64, R13 // plenty of scratch
+ AND $~7, R13
+ MOVW R0, 60(R13) // save argc, argv away
+ MOVW R1, 64(R13)
+
+ // set up g register
+ // g is R10
+ MOVW $runtime·g0(SB), g
+ MOVW $runtime·m0(SB), R8
+
+ // save m->g0 = g0
+ MOVW g, m_g0(R8)
+ // save g->m = m0
+ MOVW R8, g_m(g)
+
+ // create istack out of the OS stack
+ // (1MB of system stack is available on iOS and Android)
+ MOVW $(-64*1024+104)(R13), R0
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R13, (g_stack+stack_hi)(g)
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+#ifdef GOOS_openbsd
+ // Save g to TLS so that it is available from signal trampoline.
+ BL runtime·save_g(SB)
+#endif
+
+ BL runtime·_initcgo(SB) // will clobber R0-R3
+
+ // update stackguard after _cgo_init
+ MOVW (g_stack+stack_lo)(g), R0
+ ADD $const_stackGuard, R0
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ BL runtime·check(SB)
+
+ // saved argc, argv
+ MOVW 60(R13), R0
+ MOVW R0, 4(R13)
+ MOVW 64(R13), R1
+ MOVW R1, 8(R13)
+ BL runtime·args(SB)
+ BL runtime·checkgoarm(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ SUB $8, R13
+ MOVW $runtime·mainPC(SB), R0
+ MOVW R0, 4(R13) // arg 1: fn
+ MOVW $0, R0
+ MOVW R0, 0(R13) // dummy LR
+ BL runtime·newproc(SB)
+ ADD $8, R13 // pop args and LR
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVW $1234, R0
+ MOVW $1000, R1
+ MOVW R0, (R1) // fail hard
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ // gdb won't skip this breakpoint instruction automatically,
+ // so you must manually "set $pc+=4" to skip it and continue.
+#ifdef GOOS_plan9
+ WORD $0xD1200070 // undefined instruction used as armv5 breakpoint in Plan 9
+#else
+ WORD $0xe7f001f0 // undefined instruction that gdb understands is a software breakpoint
+#endif
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ // disable runfast (flush-to-zero) mode of vfp if runtime.goarm > 5
+ MOVB runtime·goarm(SB), R11
+ CMP $5, R11
+ BLE 4(PC)
+ WORD $0xeef1ba10 // vmrs r11, fpscr
+ BIC $(1<<24), R11
+ WORD $0xeee1ba10 // vmsr fpscr, r11
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ BL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW buf+0(FP), R1
+ MOVW gobuf_g(R1), R0
+ MOVW 0(R0), R2 // make sure g != nil
+ B gogo<>(SB)
+
+TEXT gogo<>(SB),NOSPLIT|NOFRAME,$0
+ BL setg<>(SB)
+ MOVW gobuf_sp(R1), R13 // restore SP==R13
+ MOVW gobuf_lr(R1), LR
+ MOVW gobuf_ret(R1), R0
+ MOVW gobuf_ctxt(R1), R7
+ MOVW $0, R11
+ MOVW R11, gobuf_sp(R1) // clear to help garbage collector
+ MOVW R11, gobuf_ret(R1)
+ MOVW R11, gobuf_lr(R1)
+ MOVW R11, gobuf_ctxt(R1)
+ MOVW gobuf_pc(R1), R11
+ CMP R11, R11 // set condition codes for == test, needed by stack split
+ B (R11)
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB),NOSPLIT|NOFRAME,$0-4
+ // Save caller state in g->sched.
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_pc)(g)
+ MOVW $0, R11
+ MOVW R11, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVW g, R1
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ CMP g, R1
+ B.NE 2(PC)
+ B runtime·badmcall(SB)
+ MOVW fn+0(FP), R0
+ MOVW (g_sched+gobuf_sp)(g), R13
+ SUB $8, R13
+ MOVW R1, 4(R13)
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ BL (R0)
+ B runtime·badmcall2(SB)
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB),NOSPLIT,$0-0
+ MOVW $0, R0
+ BL (R0) // clobber lr to ensure push {lr} is kept
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB),NOSPLIT,$0-4
+ MOVW fn+0(FP), R0 // R0 = fn
+ MOVW g_m(g), R1 // R1 = m
+
+ MOVW m_gsignal(R1), R2 // R2 = gsignal
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_g0(R1), R2 // R2 = g0
+ CMP g, R2
+ B.EQ noswitch
+
+ MOVW m_curg(R1), R3
+ CMP g, R3
+ B.EQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVW $runtime·badsystemstack(SB), R0
+ BL (R0)
+ B runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ BL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVW R0, R5
+ MOVW R2, R0
+ BL setg<>(SB)
+ MOVW R5, R0
+ MOVW (g_sched+gobuf_sp)(R2), R13
+
+ // call target function
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ BL (R0)
+
+ // switch back to g
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW $0, R3
+ MOVW R3, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVW R0, R7
+ MOVW 0(R0), R0
+ MOVW.P 4(R13), R14 // restore LR
+ B (R0)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// R3 prolog's LR
+// using NOFRAME means do not save LR on stack.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ B runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVW m_gsignal(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ B runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW LR, (g_sched+gobuf_pc)(g)
+ MOVW R3, (g_sched+gobuf_lr)(g)
+ MOVW R7, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVW R3, (m_morebuf+gobuf_pc)(R8) // f's caller's PC
+ MOVW R13, (m_morebuf+gobuf_sp)(R8) // f's caller's SP
+ MOVW g, (m_morebuf+gobuf_g)(R8)
+
+ // Call newstack on m->g0's stack.
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW $0, R0
+ MOVW.W R0, -4(R13) // create a call frame on g0 (saved LR)
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ RET
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+	// register (R3), which the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVW R13, R13
+
+ MOVW $0, R7
+ B runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ CMP $MAXSIZE, R0; \
+ B.HI 3(PC); \
+ MOVW $NAME(SB), R1; \
+ B (R1)
+
+TEXT ·reflectcall(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW frameSize+20(FP), R0
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVW $runtime·badreflectcall(SB), R1
+ B (R1)
+
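+// For example (illustrative), a reflectcall with frameSize = 24 falls through
+// the call16 check (24 > 16) and dispatches to runtime·call32, the smallest
+// fixed-size frame that fits; the unused tail of the 32-byte frame is ignored.
+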
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-28; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVW stackArgs+8(FP), R0; \
+ MOVW stackArgsSize+12(FP), R2; \
+ ADD $4, R13, R1; \
+ CMP $0, R2; \
+ B.EQ 5(PC); \
+ MOVBU.P 1(R0), R5; \
+ MOVBU.P R5, 1(R1); \
+ SUB $1, R2, R2; \
+ B -5(PC); \
+ /* call function */ \
+ MOVW f+4(FP), R7; \
+ MOVW (R7), R0; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R0); \
+ /* copy return values back */ \
+ MOVW stackArgsType+0(FP), R4; \
+ MOVW stackArgs+8(FP), R0; \
+ MOVW stackArgsSize+12(FP), R2; \
+ MOVW stackArgsRetOffset+16(FP), R3; \
+ ADD $4, R13, R1; \
+ ADD R3, R1; \
+ ADD R3, R0; \
+ SUB R3, R2; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $20-0
+ MOVW R4, 4(R13)
+ MOVW R0, 8(R13)
+ MOVW R1, 12(R13)
+ MOVW R2, 16(R13)
+ MOVW $0, R7
+ MOVW R7, 20(R13)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R11.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·systemstack_switch(SB), R11
+ ADD $4, R11 // get past push {lr}
+ MOVW R11, (g_sched+gobuf_pc)(g)
+ MOVW R13, (g_sched+gobuf_sp)(g)
+ MOVW $0, R11
+ MOVW R11, (g_sched+gobuf_lr)(g)
+ MOVW R11, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVW (g_sched+gobuf_ctxt)(g), R11
+ TST R11, R11
+ B.EQ 2(PC)
+ BL runtime·abort(SB)
+ RET
+
+// func asmcgocall_no_g(fn, arg unsafe.Pointer)
+// Call fn(arg) aligned appropriately for the gcc ABI.
+// Called on a system stack, and there may be no g yet (during needm).
+TEXT ·asmcgocall_no_g(SB),NOSPLIT,$0-8
+ MOVW fn+0(FP), R1
+ MOVW arg+4(FP), R0
+ MOVW R13, R2
+ SUB $32, R13
+ BIC $0x7, R13 // alignment for gcc ABI
+ MOVW R2, 8(R13)
+ BL (R1)
+ MOVW 8(R13), R2
+ MOVW R2, R13
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVW fn+0(FP), R1
+ MOVW arg+4(FP), R0
+
+ MOVW R13, R2
+ CMP $0, g
+ BEQ nosave
+ MOVW g, R4
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVW g_m(g), R8
+ MOVW m_gsignal(R8), R3
+ CMP R3, g
+ BEQ nosave
+ MOVW m_g0(R8), R3
+ CMP R3, g
+ BEQ nosave
+ BL gosave_systemstack_switch<>(SB)
+ MOVW R0, R5
+ MOVW R3, R0
+ BL setg<>(SB)
+ MOVW R5, R0
+ MOVW (g_sched+gobuf_sp)(g), R13
+
+ // Now on a scheduling stack (a pthread-created stack).
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for gcc ABI
+ MOVW R4, 20(R13) // save old g
+ MOVW (g_stack+stack_hi)(R4), R4
+ SUB R2, R4
+ MOVW R4, 16(R13) // save depth in stack (can't just save SP, as stack might be copied during a callback)
+ BL (R1)
+
+ // Restore registers, g, stack pointer.
+ MOVW R0, R5
+ MOVW 20(R13), R0
+ BL setg<>(SB)
+ MOVW (g_stack+stack_hi)(g), R1
+ MOVW 16(R13), R2
+ SUB R2, R1
+ MOVW R5, R0
+ MOVW R1, R13
+
+ MOVW R0, ret+8(FP)
+ RET
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for gcc ABI
+ // save null g in case someone looks during debugging.
+ MOVW $0, R4
+ MOVW R4, 20(R13)
+ MOVW R2, 16(R13) // Save old stack pointer.
+ BL (R1)
+ // Restore stack pointer.
+ MOVW 16(R13), R2
+ MOVW R2, R13
+ MOVW R0, ret+8(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12
+ NO_LOCAL_POINTERS
+
+	// If fn is nil, frame is the saved g: skip cgocallbackg and just dropm.
+	// This path is used to drop the m while the thread is exiting.
+ MOVW fn+0(FP), R1
+ CMP $0, R1
+ B.NE loadg
+ // Restore the g from frame.
+ MOVW frame+4(FP), g
+ B dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+#ifdef GOOS_openbsd
+ BL runtime·load_g(SB)
+#else
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+#endif
+
+	// If g is nil, then either Go did not create the current thread,
+	// or this thread never called into Go on a pthread platform.
+	// Call needm to obtain an m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMP $0, g
+ B.EQ needm
+
+ MOVW g_m(g), R8
+ MOVW R8, savedm-4(SP)
+ B havem
+
+needm:
+ MOVW g, savedm-4(SP) // g is zero, so is m.
+ MOVW $runtime·needAndBindM(SB), R0
+ BL (R0)
+
+ // Set m->g0->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+	// the same SP back to m->g0->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R3
+ MOVW R13, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 4(R13) aka savedsp-12(SP).
+ MOVW m_g0(R8), R3
+ MOVW (g_sched+gobuf_sp)(R3), R4
+ MOVW R4, savedsp-12(SP) // must match frame size
+ MOVW R13, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVW m_curg(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVW (g_sched+gobuf_pc)(g), R5
+ MOVW R5, -(12+4)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVW fn+0(FP), R1
+ MOVW frame+4(FP), R2
+ MOVW ctxt+8(FP), R3
+ MOVW $-(12+4)(R4), R13 // switch stack; must match frame size
+ MOVW R1, 4(R13)
+ MOVW R2, 8(R13)
+ MOVW R3, 12(R13)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVW 0(R13), R5
+ MOVW R5, (g_sched+gobuf_pc)(g)
+ MOVW $(12+4)(R13), R4 // must match frame size
+ MOVW R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVW g_m(g), R8
+ MOVW m_g0(R8), R0
+ BL setg<>(SB)
+ MOVW (g_sched+gobuf_sp)(g), R13
+ MOVW savedsp-12(SP), R4 // must match frame size
+ MOVW R4, (g_sched+gobuf_sp)(g)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for the lifetime of the C thread on pthread platforms.
+	// If the m on entry wasn't nil, either
+	// 1. the thread is a Go thread,
+	// 2. or this wasn't the first call from a C thread on a pthread platform,
+	//    in which case we skip dropm so that the m bound by the first call is reused.
+ MOVW savedm-4(SP), R6
+ CMP $0, R6
+ B.NE done
+
+	// If a pthread key has been created, skip dropm so the m can be reused by the next call.
+ MOVW _cgo_pthread_key_created(SB), R6
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled and we need dropm.
+ CMP $0, R6
+ B.EQ dropm
+ MOVW (R6), R6
+ CMP $0, R6
+ B.NE done
+
+dropm:
+ MOVW $runtime·dropm(SB), R0
+ BL (R0)
+
+done:
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW gg+0(FP), R0
+ B setg<>(SB)
+
+TEXT setg<>(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW R0, g
+
+ // Save g to thread-local storage.
+#ifdef GOOS_windows
+ B runtime·save_g(SB)
+#else
+#ifdef GOOS_openbsd
+ B runtime·save_g(SB)
+#else
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ B.EQ 2(PC)
+ B runtime·save_g(SB)
+
+ MOVW g, R0
+ RET
+#endif
+#endif
+
+TEXT runtime·emptyfunc(SB),0,$0-0
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW $0, R0
+ MOVW (R0), R1
+
+// armPublicationBarrier is a native store/store barrier for ARMv7+.
+// On earlier ARM revisions, armPublicationBarrier is a no-op.
+// This will not work on SMP ARMv6 machines, if any are in use.
+// To implement publicationBarrier in sys_$GOOS_arm.s using the native
+// instructions, use:
+//
+// TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+// B runtime·armPublicationBarrier(SB)
+//
+TEXT runtime·armPublicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ST
+ RET
+
+// AES hashing not implemented for ARM
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-16
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB),NOSPLIT,$0
+ MOVW $0, R0
+ RET
+
+TEXT runtime·procyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW cycles+0(FP), R1
+ MOVW $0, R0
+yieldloop:
+ WORD $0xe320f001 // YIELD (NOP pre-ARMv6K)
+ CMP R0, R1
+ B.NE 2(PC)
+ RET
+ SUB $1, R1
+ B yieldloop
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$8
+ // R11 and g register are clobbered by load_g. They are
+ // callee-save in the gcc calling convention, so save them here.
+ MOVW R11, saveR11-4(SP)
+ MOVW g, saveG-8(SP)
+
+ BL runtime·load_g(SB)
+ MOVW g_m(g), R0
+ MOVW m_curg(R0), R0
+ MOVW (g_stack+stack_hi)(R0), R0
+
+ MOVW saveG-8(SP), g
+ MOVW saveR11-4(SP), R11
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVW R0, R0 // NOP
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOVW R0, R0 // NOP
+
+// x -> x/1000000, x%1000000, called from Go with args, results on stack.
+TEXT runtime·usplit(SB),NOSPLIT,$0-12
+ MOVW x+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ MOVW R0, q+4(FP)
+ MOVW R1, r+8(FP)
+ RET
+
+// R0, R1 = R0/1000000, R0%1000000
+TEXT runtime·usplitR0(SB),NOSPLIT,$0
+	// Magic multiply to avoid a software divide, since there may be no m available.
+	// See the output of go tool compile -S for x/1000000.
+ MOVW R0, R3
+ MOVW $1125899907, R1
+ MULLU R1, R0, (R0, R1)
+ MOVW R0>>18, R0
+ MOVW $1000000, R1
+ MULU R0, R1
+ SUB R1, R3, R1
+ RET
+
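+// Worked example (illustrative note): the constant 1125899907 is ceil(2^50/1e6),
+// and MULLU leaves the high 32 bits of the 64-bit product in R0, so R0>>18
+// computes q = floor(x*1125899907 / 2^50) = x/1000000 for 32-bit x.
+// E.g. x = 2500000: 2500000*1125899907 = 2814749767500000, and
+// 2814749767500000 >> 50 = 2, with remainder 2500000 - 2*1000000 = 500000.
+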
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ MOVW R9, saver9-4(SP) // The access to global variables below implicitly uses R9, which is callee-save
+ MOVW R11, saver11-8(SP) // Likewise, R11 is the temp register, but callee-save in C ABI
+ MOVW runtime·lastmoduledatap(SB), R1
+ MOVW R0, moduledata_next(R1)
+ MOVW R0, runtime·lastmoduledatap(SB)
+ MOVW saver11-8(SP), R11
+ MOVW saver9-4(SP), R9
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R8, and returns a pointer
+// to the buffer space in R8.
+// It clobbers condition codes.
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+// The act of CALLing gcWriteBarrier will clobber R14 (LR).
+TEXT gcWriteBarrier<>(SB),NOSPLIT|NOFRAME,$0
+ // Save the registers clobbered by the fast path.
+ MOVM.DB.W [R0,R1], (R13)
+retry:
+ MOVW g_m(g), R0
+ MOVW m_p(R0), R0
+ MOVW (p_wbBuf+wbBuf_next)(R0), R1
+ MOVW (p_wbBuf+wbBuf_end)(R0), R11
+ // Increment wbBuf.next position.
+ ADD R8, R1
+ // Is the buffer full?
+ CMP R11, R1
+ BHI flush
+ // Commit to the larger buffer.
+ MOVW R1, (p_wbBuf+wbBuf_next)(R0)
+ // Make return value (the original next position)
+ SUB R8, R1, R8
+ // Restore registers.
+ MOVM.IA.W (R13), [R0,R1]
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ //
+ // R0 and R1 were saved at entry.
+ // R10 is g, so preserved.
+ // R11 is linker temp, so no need to save.
+ // R13 is stack pointer.
+ // R15 is PC.
+ MOVM.DB.W [R2-R9,R12], (R13)
+ // Save R14 (LR) because the fast path above doesn't save it,
+ // but needs it to RET.
+ MOVM.DB.W [R14], (R13)
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVM.IA.W (R13), [R14]
+ MOVM.IA.W (R13), [R2-R9,R12]
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $4, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $8, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $12, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $16, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $20, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $24, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $28, R8
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $32, R8
+ JMP gcWriteBarrier<>(SB)
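+
+// The gcWriteBarrierN stubs above reserve room for N pointers (N*4 bytes on
+// this 32-bit port) in the per-P write barrier buffer and return a pointer to
+// the reserved space, flushing the buffer to the garbage collector when it
+// fills. A self-contained Go sketch of that reserve-or-flush bump allocation
+// (the wbBuf type and method names below are illustrative stand-ins, not the
+// runtime's own):
+//
+//	package main
+//
+//	import "fmt"
+//
+//	type wbBuf struct {
+//		next, end int // next free slot and one past the last slot
+//		buf       []uintptr
+//	}
+//
+//	// reserve returns the first of n newly reserved slots, flushing the
+//	// buffer first if it would overflow, mirroring the retry/flush split
+//	// in the assembly above.
+//	func (b *wbBuf) reserve(n int) int {
+//		for {
+//			next := b.next + n
+//			if next > b.end {
+//				b.flush()
+//				continue
+//			}
+//			old := b.next
+//			b.next = next
+//			return old
+//		}
+//	}
+//
+//	// flush stands in for runtime.wbBufFlush, which hands the buffered
+//	// pointers to the GC and resets the buffer.
+//	func (b *wbBuf) flush() { b.next = 0 }
+//
+//	func main() {
+//		b := &wbBuf{end: 8, buf: make([]uintptr, 8)}
+//		fmt.Println(b.reserve(2), b.reserve(3)) // 0 2
+//	}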
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVW R0, x+0(FP)
+ MOVW R1, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceConvert(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVW R4, hi+0(FP)
+ MOVW R0, lo+4(FP)
+ MOVW R1, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
diff --git a/src/runtime/asm_arm64.s b/src/runtime/asm_arm64.s
new file mode 100644
index 0000000..7866e35
--- /dev/null
+++ b/src/runtime/asm_arm64.s
@@ -0,0 +1,1573 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "tls_arm64.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // SP = stack; R0 = argc; R1 = argv
+
+ SUB $32, RSP
+ MOVW R0, 8(RSP) // argc
+ MOVD R1, 16(RSP) // argv
+
+#ifdef TLS_darwin
+ // Initialize TLS.
+ MOVD ZR, g // clear g, make sure it's not junk.
+ SUB $32, RSP
+ MRS_TPIDR_R0
+ AND $~7, R0
+ MOVD R0, 16(RSP) // arg2: TLS base
+ MOVD $runtime·tls_g(SB), R2
+ MOVD R2, 8(RSP) // arg1: &tlsg
+ BL ·tlsinit(SB)
+ ADD $32, RSP
+#endif
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ MOVD RSP, R7
+ MOVD $(-64*1024)(R7), R0
+ MOVD R0, g_stackguard0(g)
+ MOVD R0, g_stackguard1(g)
+ MOVD R0, (g_stack+stack_lo)(g)
+ MOVD R7, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R12
+ CBZ R12, nocgo
+
+#ifdef GOOS_android
+ MRS_TPIDR_R0 // load TLS base pointer
+ MOVD R0, R3 // arg 3: TLS base pointer
+ MOVD $runtime·tls_g(SB), R2 // arg 2: &tls_g
+#else
+ MOVD $0, R2 // arg 2: not used when using platform's TLS
+#endif
+ MOVD $setg_gcc<>(SB), R1 // arg 1: setg
+ MOVD g, R0 // arg 0: G
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R12)
+ ADD $16, RSP
+
+nocgo:
+ BL runtime·save_g(SB)
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R0
+ ADD $const_stackGuard, R0
+ MOVD R0, g_stackguard0(g)
+ MOVD R0, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R0
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R0)
+ // save m0 to g0->m
+ MOVD R0, g_m(g)
+
+ BL runtime·check(SB)
+
+#ifdef GOOS_windows
+ BL runtime·wintls(SB)
+#endif
+
+ MOVW 8(RSP), R0 // copy argc
+ MOVW R0, -8(RSP)
+ MOVD 16(RSP), R0 // copy argv
+ MOVD R0, 0(RSP)
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R0 // entry
+ SUB $16, RSP
+ MOVD R0, 8(RSP) // arg
+ MOVD $0, 0(RSP) // dummy LR
+ BL runtime·newproc(SB)
+ ADD $16, RSP
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ // Prevent dead-code elimination of debugCallV2, which is
+ // intended to be called by debuggers.
+ MOVD $runtime·debugCallV2<ABIInternal>(SB), R0
+
+ MOVD $0, R0
+ MOVD R0, (R0) // boom
+ UNDEF
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+// Windows ARM64 needs an immediate 0xf000 argument.
+// See go.dev/issues/53837.
+#define BREAK \
+#ifdef GOOS_windows \
+ BRK $0xf000 \
+#else \
+ BRK \
+#endif \
+
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ BREAK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ BL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), R6
+ MOVD 0(R6), R4 // make sure g != nil
+ B gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOVD R6, g
+ BL runtime·save_g(SB)
+
+ MOVD gobuf_sp(R5), R0
+ MOVD R0, RSP
+ MOVD gobuf_bp(R5), R29
+ MOVD gobuf_lr(R5), LR
+ MOVD gobuf_ret(R5), R0
+ MOVD gobuf_ctxt(R5), R26
+ MOVD $0, gobuf_sp(R5)
+ MOVD $0, gobuf_bp(R5)
+ MOVD $0, gobuf_ret(R5)
+ MOVD $0, gobuf_lr(R5)
+ MOVD $0, gobuf_ctxt(R5)
+ CMP ZR, ZR // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R6
+ B (R6)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD R0, R26 // context
+
+ // Save caller state in g->sched
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ B runtime·badmcall(SB)
+
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP // sp = m->g0->sched.sp
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD R3, R0 // arg = g
+ MOVD $0, -16(RSP) // dummy LR
+ SUB $16, RSP
+ MOVD 0(R26), R4 // code pointer
+ BL (R4)
+ B runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R26 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_curg(R4), R6
+ CMP g, R6
+ BEQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R3
+ BL (R3)
+ B runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ BL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R3
+ MOVD R3, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+
+ // call target function
+ MOVD 0(R26), R3 // code pointer
+ BL (R3)
+
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD $0, (g_sched+gobuf_sp)(g)
+ MOVD $0, (g_sched+gobuf_bp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVD 0(R26), R3 // code pointer
+ MOVD.P 16(RSP), R30 // restore LR
+ SUB $8, RSP, R29 // restore FP
+ B (R3)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3 prolog's LR (R30)
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ B runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R8), R4
+ CMP g, R4
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ B runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD R3, (g_sched+gobuf_lr)(g)
+ MOVD R26, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's callers.
+ MOVD R3, (m_morebuf+gobuf_pc)(R8) // f's caller's PC
+ MOVD RSP, R0
+ MOVD R0, (m_morebuf+gobuf_sp)(R8) // f's caller's RSP
+ MOVD g, (m_morebuf+gobuf_g)(R8)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD.W $0, -16(RSP) // create a call frame on g0 (saved LR; keep 16-aligned)
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+	// register (R3), which the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVD RSP, RSP
+
+ MOVW $0, R26
+ B runtime·morestack(SB)
+
+// spillArgs stores return values from registers to a *internal/abi.RegArgs in R20.
+TEXT ·spillArgs(SB),NOSPLIT,$0-0
+ STP (R0, R1), (0*8)(R20)
+ STP (R2, R3), (2*8)(R20)
+ STP (R4, R5), (4*8)(R20)
+ STP (R6, R7), (6*8)(R20)
+ STP (R8, R9), (8*8)(R20)
+ STP (R10, R11), (10*8)(R20)
+ STP (R12, R13), (12*8)(R20)
+ STP (R14, R15), (14*8)(R20)
+ FSTPD (F0, F1), (16*8)(R20)
+ FSTPD (F2, F3), (18*8)(R20)
+ FSTPD (F4, F5), (20*8)(R20)
+ FSTPD (F6, F7), (22*8)(R20)
+ FSTPD (F8, F9), (24*8)(R20)
+ FSTPD (F10, F11), (26*8)(R20)
+ FSTPD (F12, F13), (28*8)(R20)
+ FSTPD (F14, F15), (30*8)(R20)
+ RET
+
+// unspillArgs loads args into registers from a *internal/abi.RegArgs in R20.
+TEXT ·unspillArgs(SB),NOSPLIT,$0-0
+ LDP (0*8)(R20), (R0, R1)
+ LDP (2*8)(R20), (R2, R3)
+ LDP (4*8)(R20), (R4, R5)
+ LDP (6*8)(R20), (R6, R7)
+ LDP (8*8)(R20), (R8, R9)
+ LDP (10*8)(R20), (R10, R11)
+ LDP (12*8)(R20), (R12, R13)
+ LDP (14*8)(R20), (R14, R15)
+ FLDPD (16*8)(R20), (F0, F1)
+ FLDPD (18*8)(R20), (F2, F3)
+ FLDPD (20*8)(R20), (F4, F5)
+ FLDPD (22*8)(R20), (F6, F7)
+ FLDPD (24*8)(R20), (F8, F9)
+ FLDPD (26*8)(R20), (F10, F11)
+ FLDPD (28*8)(R20), (F12, F13)
+ FLDPD (30*8)(R20), (F14, F15)
+ RET
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R27; \
+ CMP R27, R16; \
+ BGT 3(PC); \
+ MOVD $NAME(SB), R27; \
+ B (R27)
+// Note: can't just "B NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-48
+ MOVWU frameSize+32(FP), R16
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R0
+ B (R0)
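+
+// The DISPATCH chain above simply selects the smallest fixed-frame helper
+// (call16 through call1073741824) whose frame can hold the argument frame.
+// A tiny Go sketch of that selection, for illustration only:
+//
+//	package main
+//
+//	import "fmt"
+//
+//	// pick mirrors the DISPATCH list: the first power-of-two size that is
+//	// at least frameSize names the helper reflectcall branches to.
+//	func pick(frameSize uint32) string {
+//		for size := uint32(16); size <= 1<<30; size <<= 1 {
+//			if frameSize <= size {
+//				return fmt.Sprintf("runtime.call%d", size)
+//			}
+//		}
+//		return "runtime.badreflectcall"
+//	}
+//
+//	func main() {
+//		fmt.Println(pick(40), pick(5000)) // runtime.call64 runtime.call8192
+//	}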
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD stackArgs+16(FP), R3; \
+ MOVWU stackArgsSize+24(FP), R4; \
+ ADD $8, RSP, R5; \
+ BIC $0xf, R4, R6; \
+ CBZ R6, 6(PC); \
+ /* if R6=(argsize&~15) != 0 */ \
+ ADD R6, R5, R6; \
+ /* copy 16 bytes a time */ \
+ LDP.P 16(R3), (R7, R8); \
+ STP.P (R7, R8), 16(R5); \
+ CMP R5, R6; \
+ BNE -3(PC); \
+ AND $0xf, R4, R6; \
+ CBZ R6, 6(PC); \
+ /* if R6=(argsize&15) != 0 */ \
+ ADD R6, R5, R6; \
+ /* copy 1 byte a time for the rest */ \
+ MOVBU.P 1(R3), R7; \
+ MOVBU.P R7, 1(R5); \
+ CMP R5, R6; \
+ BNE -3(PC); \
+ /* set up argument registers */ \
+ MOVD regArgs+40(FP), R20; \
+ CALL ·unspillArgs(SB); \
+ /* call function */ \
+ MOVD f+8(FP), R26; \
+ MOVD (R26), R20; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R20); \
+ /* copy return values back */ \
+ MOVD regArgs+40(FP), R20; \
+ CALL ·spillArgs(SB); \
+ MOVD stackArgsType+0(FP), R7; \
+ MOVD stackArgs+16(FP), R3; \
+ MOVWU stackArgsSize+24(FP), R4; \
+ MOVWU stackRetOffset+28(FP), R6; \
+ ADD $8, RSP, R5; \
+ ADD R6, R5; \
+ ADD R6, R3; \
+ SUB R6, R4; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $48-0
+ NO_LOCAL_POINTERS
+ STP (R7, R3), 8(RSP)
+ STP (R5, R4), 24(RSP)
+ MOVD R20, 40(RSP)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// func memhash32(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R10
+ CBZ R10, noaes
+ MOVD $runtime·aeskeysched+0(SB), R3
+
+ VEOR V0.B16, V0.B16, V0.B16
+ VLD1 (R3), [V2.B16]
+ VLD1 (R0), V0.S[1]
+ VMOV R1, V0.S[0]
+
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+
+ VMOV V0.D[0], R0
+ RET
+noaes:
+ B runtime·memhash32Fallback<ABIInternal>(SB)
+
+// func memhash64(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·memhash64<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R10
+ CBZ R10, noaes
+ MOVD $runtime·aeskeysched+0(SB), R3
+
+ VEOR V0.B16, V0.B16, V0.B16
+ VLD1 (R3), [V2.B16]
+ VLD1 (R0), V0.D[1]
+ VMOV R1, V0.D[0]
+
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V2.B16, V0.B16
+
+ VMOV V0.D[0], R0
+ RET
+noaes:
+ B runtime·memhash64Fallback<ABIInternal>(SB)
+
+// func memhash(p unsafe.Pointer, h, size uintptr) uintptr
+TEXT runtime·memhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-32
+ MOVB runtime·useAeshash(SB), R10
+ CBZ R10, noaes
+ B aeshashbody<>(SB)
+noaes:
+ B runtime·memhashFallback<ABIInternal>(SB)
+
+// func strhash(p unsafe.Pointer, h uintptr) uintptr
+TEXT runtime·strhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ MOVB runtime·useAeshash(SB), R10
+ CBZ R10, noaes
+ LDP (R0), (R0, R2) // string data / length
+ B aeshashbody<>(SB)
+noaes:
+ B runtime·strhashFallback<ABIInternal>(SB)
+
+// R0: data
+// R1: seed data
+// R2: length
+// At return, R0 = return value
+TEXT aeshashbody<>(SB),NOSPLIT|NOFRAME,$0
+ VEOR V30.B16, V30.B16, V30.B16
+ VMOV R1, V30.D[0]
+ VMOV R2, V30.D[1] // load length into seed
+
+ MOVD $runtime·aeskeysched+0(SB), R4
+ VLD1.P 16(R4), [V0.B16]
+ AESE V30.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ CMP $16, R2
+ BLO aes0to15
+ BEQ aes16
+ CMP $32, R2
+ BLS aes17to32
+ CMP $64, R2
+ BLS aes33to64
+ CMP $128, R2
+ BLS aes65to128
+ B aes129plus
+
+aes0to15:
+ CBZ R2, aes0
+ VEOR V2.B16, V2.B16, V2.B16
+ TBZ $3, R2, less_than_8
+ VLD1.P 8(R0), V2.D[0]
+
+less_than_8:
+ TBZ $2, R2, less_than_4
+ VLD1.P 4(R0), V2.S[2]
+
+less_than_4:
+ TBZ $1, R2, less_than_2
+ VLD1.P 2(R0), V2.H[6]
+
+less_than_2:
+ TBZ $0, R2, done
+ VLD1 (R0), V2.B[14]
+done:
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+
+ VMOV V2.D[0], R0
+ RET
+
+aes0:
+ VMOV V0.D[0], R0
+ RET
+
+aes16:
+ VLD1 (R0), [V2.B16]
+ B done
+
+aes17to32:
+ // make second seed
+ VLD1 (R4), [V1.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ SUB $16, R2, R10
+ VLD1.P (R0)(R10), [V2.B16]
+ VLD1 (R0), [V3.B16]
+
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V1.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ AESE V0.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V1.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ AESE V0.B16, V2.B16
+ AESE V1.B16, V3.B16
+
+ VEOR V3.B16, V2.B16, V2.B16
+
+ VMOV V2.D[0], R0
+ RET
+
+aes33to64:
+ VLD1 (R4), [V1.B16, V2.B16, V3.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ SUB $32, R2, R10
+
+ VLD1.P (R0)(R10), [V4.B16, V5.B16]
+ VLD1 (R0), [V6.B16, V7.B16]
+
+ AESE V0.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V3.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V0.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V3.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V0.B16, V4.B16
+ AESE V1.B16, V5.B16
+ AESE V2.B16, V6.B16
+ AESE V3.B16, V7.B16
+
+ VEOR V6.B16, V4.B16, V4.B16
+ VEOR V7.B16, V5.B16, V5.B16
+ VEOR V5.B16, V4.B16, V4.B16
+
+ VMOV V4.D[0], R0
+ RET
+
+aes65to128:
+ VLD1.P 64(R4), [V1.B16, V2.B16, V3.B16, V4.B16]
+ VLD1 (R4), [V5.B16, V6.B16, V7.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V30.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V30.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V30.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V30.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ SUB $64, R2, R10
+ VLD1.P (R0)(R10), [V8.B16, V9.B16, V10.B16, V11.B16]
+ VLD1 (R0), [V12.B16, V13.B16, V14.B16, V15.B16]
+ AESE V0.B16, V8.B16
+ AESMC V8.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESMC V9.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESMC V10.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESMC V11.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESMC V12.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESMC V13.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESMC V14.B16, V14.B16
+ AESE V7.B16, V15.B16
+ AESMC V15.B16, V15.B16
+
+ AESE V0.B16, V8.B16
+ AESMC V8.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESMC V9.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESMC V10.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESMC V11.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESMC V12.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESMC V13.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESMC V14.B16, V14.B16
+ AESE V7.B16, V15.B16
+ AESMC V15.B16, V15.B16
+
+ AESE V0.B16, V8.B16
+ AESE V1.B16, V9.B16
+ AESE V2.B16, V10.B16
+ AESE V3.B16, V11.B16
+ AESE V4.B16, V12.B16
+ AESE V5.B16, V13.B16
+ AESE V6.B16, V14.B16
+ AESE V7.B16, V15.B16
+
+ VEOR V12.B16, V8.B16, V8.B16
+ VEOR V13.B16, V9.B16, V9.B16
+ VEOR V14.B16, V10.B16, V10.B16
+ VEOR V15.B16, V11.B16, V11.B16
+ VEOR V10.B16, V8.B16, V8.B16
+ VEOR V11.B16, V9.B16, V9.B16
+ VEOR V9.B16, V8.B16, V8.B16
+
+ VMOV V8.D[0], R0
+ RET
+
+aes129plus:
+ PRFM (R0), PLDL1KEEP
+ VLD1.P 64(R4), [V1.B16, V2.B16, V3.B16, V4.B16]
+ VLD1 (R4), [V5.B16, V6.B16, V7.B16]
+ AESE V30.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V30.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V30.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V30.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V30.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V30.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V30.B16, V7.B16
+ AESMC V7.B16, V7.B16
+ ADD R0, R2, R10
+ SUB $128, R10, R10
+ VLD1.P 64(R10), [V8.B16, V9.B16, V10.B16, V11.B16]
+ VLD1 (R10), [V12.B16, V13.B16, V14.B16, V15.B16]
+ SUB $1, R2, R2
+ LSR $7, R2, R2
+
+aesloop:
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ VLD1.P 64(R0), [V8.B16, V9.B16, V10.B16, V11.B16]
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+
+ VLD1.P 64(R0), [V12.B16, V13.B16, V14.B16, V15.B16]
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+ SUB $1, R2, R2
+ CBNZ R2, aesloop
+
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V8.B16, V0.B16
+ AESMC V0.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESMC V1.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESMC V2.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESMC V3.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESMC V4.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESMC V5.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESMC V6.B16, V6.B16
+ AESE V15.B16, V7.B16
+ AESMC V7.B16, V7.B16
+
+ AESE V8.B16, V0.B16
+ AESE V9.B16, V1.B16
+ AESE V10.B16, V2.B16
+ AESE V11.B16, V3.B16
+ AESE V12.B16, V4.B16
+ AESE V13.B16, V5.B16
+ AESE V14.B16, V6.B16
+ AESE V15.B16, V7.B16
+
+ VEOR V0.B16, V1.B16, V0.B16
+ VEOR V2.B16, V3.B16, V2.B16
+ VEOR V4.B16, V5.B16, V4.B16
+ VEOR V6.B16, V7.B16, V6.B16
+ VEOR V0.B16, V2.B16, V0.B16
+ VEOR V4.B16, V6.B16, V4.B16
+ VEOR V4.B16, V0.B16, V0.B16
+
+ VMOV V0.D[0], R0
+ RET
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ MOVWU cycles+0(FP), R0
+again:
+ YIELD
+ SUBW $1, R0
+ CBNZ R0, again
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R0.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·systemstack_switch(SB), R0
+ ADD $8, R0 // get past prologue
+ MOVD R0, (g_sched+gobuf_pc)(g)
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ MOVD R29, (g_sched+gobuf_bp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD $0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R0
+ CBZ R0, 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func asmcgocall_no_g(fn, arg unsafe.Pointer)
+// Call fn(arg) aligned appropriately for the gcc ABI.
+// Called on a system stack, and there may be no g yet (during needm).
+TEXT ·asmcgocall_no_g(SB),NOSPLIT,$0-16
+ MOVD fn+0(FP), R1
+ MOVD arg+8(FP), R0
+ SUB $16, RSP // skip over saved frame pointer below RSP
+ BL (R1)
+ ADD $16, RSP // skip over saved frame pointer below RSP
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVD fn+0(FP), R1
+ MOVD arg+8(FP), R0
+
+ MOVD RSP, R2 // save original stack pointer
+ CBZ g, nosave
+ MOVD g, R4
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVD g_m(g), R8
+ MOVD m_gsignal(R8), R3
+ CMP R3, g
+ BEQ nosave
+ MOVD m_g0(R8), R3
+ CMP R3, g
+ BEQ nosave
+
+ // Switch to system stack.
+ MOVD R0, R9 // gosave_systemstack_switch<> and save_g might clobber R0
+ BL gosave_systemstack_switch<>(SB)
+ MOVD R3, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD (g_sched+gobuf_bp)(g), R29
+ MOVD R9, R0
+
+ // Now on a scheduling stack (a pthread-created stack).
+	// Save room for two of our pointers.
+ MOVD RSP, R13
+ SUB $16, R13
+ MOVD R13, RSP
+ MOVD R4, 0(RSP) // save old g on stack
+ MOVD (g_stack+stack_hi)(R4), R4
+ SUB R2, R4
+ MOVD R4, 8(RSP) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ BL (R1)
+ MOVD R0, R9
+
+ // Restore g, stack pointer. R0 is errno, so don't touch it
+ MOVD 0(RSP), g
+ BL runtime·save_g(SB)
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD 8(RSP), R6
+ SUB R6, R5
+ MOVD R9, R0
+ MOVD R5, RSP
+
+ MOVW R0, ret+16(FP)
+ RET
+
+nosave:
+ // Running on a system stack, perhaps even without a g.
+ // Having no g can happen during thread creation or thread teardown
+ // (see needm/dropm on Solaris, for example).
+ // This code is like the above sequence but without saving/restoring g
+ // and without worrying about the stack moving out from under us
+ // (because we're on a system stack, not a goroutine stack).
+ // The above code could be used directly if already on a system stack,
+ // but then the only path through this code would be a rare case on Solaris.
+ // Using this code for all "already on system stack" calls exercises it more,
+ // which should help keep it correct.
+ MOVD RSP, R13
+ SUB $16, R13
+ MOVD R13, RSP
+ MOVD $0, R4
+ MOVD R4, 0(RSP) // Where above code stores g, in case someone looks during debugging.
+ MOVD R2, 8(RSP) // Save original stack pointer.
+ BL (R1)
+ // Restore stack pointer.
+ MOVD 8(RSP), R2
+ MOVD R2, RSP
+ MOVD R0, ret+16(FP)
+ RET
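+
+// Ordinary cgo calls funnel through runtime.cgocall, which uses asmcgocall to
+// run the C function on the g0 (system) stack and then switch back. A minimal
+// cgo program that exercises this path (illustrative; nothing in it is
+// specific to arm64):
+//
+//	package main
+//
+//	// static int add(int a, int b) { return a + b; }
+//	import "C"
+//
+//	import "fmt"
+//
+//	func main() {
+//		// C.add reaches runtime.cgocall -> asmcgocall, which switches to
+//		// the system stack before branching to the C function.
+//		fmt.Println(C.add(2, 3)) // 5
+//	}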
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; in that case
+	// frame holds the saved g. This path is used to dropm while the
+	// thread is exiting.
+ MOVD fn+0(FP), R1
+ CBNZ R1, loadg
+ // Restore the g from frame.
+ MOVD frame+8(FP), g
+ B dropm
+
+loadg:
+ // Load g from thread-local storage.
+ BL runtime·load_g(SB)
+
+	// If g is nil, either the current thread was not created by Go, or
+	// (on pthread platforms) this thread has never called into Go.
+	// Call needm to obtain an m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CBZ g, needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ B havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needAndBindM(SB), R0
+ BL (R0)
+
+ // Set m->g0->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(R3)
+ MOVD R29, (g_sched+gobuf_bp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 16(RSP) aka savedsp-16(SP).
+ // Beware that the frame size is actually 32+16.
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-16(SP)
+ MOVD RSP, R0
+ MOVD R0, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -48(R4)
+ MOVD (g_sched+gobuf_bp)(g), R5
+ MOVD R5, -56(R4)
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R1
+ MOVD frame+8(FP), R2
+ MOVD ctxt+16(FP), R3
+ MOVD $-48(R4), R0 // maintain 16-byte SP alignment
+ MOVD R0, RSP // switch stack
+ MOVD R1, 8(RSP)
+ MOVD R2, 16(RSP)
+ MOVD R3, 24(RSP)
+ MOVD $runtime·cgocallbackg(SB), R0
+ CALL (R0) // indirect call to bypass nosplit check. We're on a different stack now.
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(RSP), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD RSP, R4
+ ADD $48, R4, R4
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R0
+ MOVD R0, RSP
+ MOVD savedsp-16(SP), R4
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for as long as the C thread stays alive on pthread platforms.
+	// If the m on entry wasn't nil, either
+	// 1. the thread is a Go-created thread,
+	// 2. or this wasn't the first call from a C thread on pthread platforms,
+	// in which case we skip dropm so the m bound by the first call is reused.
+ MOVD savedm-8(SP), R6
+ CBNZ R6, droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOVD _cgo_pthread_key_created(SB), R6
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled, so we need dropm.
+ CBZ R6, dropm
+ MOVD (R6), R6
+ CBNZ R6, droppedm
+
+dropm:
+ MOVD $runtime·dropm(SB), R0
+ BL (R0)
+droppedm:
+
+ // Done!
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$24
+ // g (R28) and REGTMP (R27) might be clobbered by load_g. They
+ // are callee-save in the gcc calling convention, so save them.
+ MOVD R27, savedR27-8(SP)
+ MOVD g, saveG-16(SP)
+
+ BL runtime·load_g(SB)
+ MOVD g_m(g), R0
+ MOVD m_curg(R0), R0
+ MOVD (g_stack+stack_hi)(R0), R0
+
+ MOVD saveG-16(SP), g
+	MOVD saveG-16(SP), g
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g called from gcc
+TEXT setg_gcc<>(SB),NOSPLIT,$8
+ MOVD R0, g
+ MOVD R27, savedR27-8(SP)
+ BL runtime·save_g(SB)
+ MOVD savedR27-8(SP), R27
+ RET
+
+TEXT runtime·emptyfunc(SB),0,$0-0
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD ZR, R0
+ MOVD (R0), R0
+ UNDEF
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R0
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVD R0, R0 // NOP
+ BL runtime·goexit1(SB) // does not return
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+TEXT runtime·addmoduledata(SB),NOSPLIT,$0-0
+ SUB $0x10, RSP
+ MOVD R27, 8(RSP) // The access to global variables below implicitly uses R27, which is callee-save
+ MOVD runtime·lastmoduledatap(SB), R1
+ MOVD R0, moduledata_next(R1)
+ MOVD R0, runtime·lastmoduledatap(SB)
+ MOVD 8(RSP), R27
+ ADD $0x10, RSP
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R25, and returns a pointer
+// to the buffer space in R25.
+// It clobbers condition codes.
+// It does not clobber any general-purpose registers except R27,
+// but may clobber others (e.g., floating point registers)
+// The act of CALLing gcWriteBarrier will clobber R30 (LR).
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$200
+ // Save the registers clobbered by the fast path.
+ STP (R0, R1), 184(RSP)
+retry:
+ MOVD g_m(g), R0
+ MOVD m_p(R0), R0
+ MOVD (p_wbBuf+wbBuf_next)(R0), R1
+ MOVD (p_wbBuf+wbBuf_end)(R0), R27
+ // Increment wbBuf.next position.
+ ADD R25, R1
+ // Is the buffer full?
+ CMP R27, R1
+ BHI flush
+ // Commit to the larger buffer.
+ MOVD R1, (p_wbBuf+wbBuf_next)(R0)
+ // Make return value (the original next position)
+ SUB R25, R1, R25
+ // Restore registers.
+ LDP 184(RSP), (R0, R1)
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ // R0 and R1 already saved
+ STP (R2, R3), 1*8(RSP)
+ STP (R4, R5), 3*8(RSP)
+ STP (R6, R7), 5*8(RSP)
+ STP (R8, R9), 7*8(RSP)
+ STP (R10, R11), 9*8(RSP)
+ STP (R12, R13), 11*8(RSP)
+ STP (R14, R15), 13*8(RSP)
+ // R16, R17 may be clobbered by linker trampoline
+ // R18 is unused.
+ STP (R19, R20), 15*8(RSP)
+ STP (R21, R22), 17*8(RSP)
+ STP (R23, R24), 19*8(RSP)
+ STP (R25, R26), 21*8(RSP)
+ // R27 is temp register.
+ // R28 is g.
+ // R29 is frame pointer (unused).
+ // R30 is LR, which was saved by the prologue.
+ // R31 is SP.
+
+ CALL runtime·wbBufFlush(SB)
+ LDP 1*8(RSP), (R2, R3)
+ LDP 3*8(RSP), (R4, R5)
+ LDP 5*8(RSP), (R6, R7)
+ LDP 7*8(RSP), (R8, R9)
+ LDP 9*8(RSP), (R10, R11)
+ LDP 11*8(RSP), (R12, R13)
+ LDP 13*8(RSP), (R14, R15)
+ LDP 15*8(RSP), (R19, R20)
+ LDP 17*8(RSP), (R21, R22)
+ LDP 19*8(RSP), (R23, R24)
+ LDP 21*8(RSP), (R25, R26)
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $8, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $16, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $24, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $32, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $40, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $48, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $56, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $64, R25
+ JMP gcWriteBarrier<>(SB)
+
+DATA debugCallFrameTooLarge<>+0x00(SB)/20, $"call frame too large"
+GLOBL debugCallFrameTooLarge<>(SB), RODATA, $20 // Size duplicated below
+
+// debugCallV2 is the entry point for debugger-injected function
+// calls on running goroutines. It informs the runtime that a
+// debug call has been injected and creates a call frame for the
+// debugger to fill in.
+//
+// To inject a function call, a debugger should:
+// 1. Check that the goroutine is in state _Grunning and that
+// there are at least 288 bytes free on the stack.
+// 2. Set SP as SP-16.
+// 3. Store the current LR in (SP) (using the SP after step 2).
+// 4. Store the current PC in the LR register.
+// 5. Write the desired argument frame size at SP-16
+// 6. Save all machine registers (including flags and fpsimd registers)
+// so they can be restored later by the debugger.
+// 7. Set the PC to debugCallV2 and resume execution.
+//
+// If the goroutine is in state _Grunnable, then it's not generally
+// safe to inject a call because it may return out via other runtime
+// operations. Instead, the debugger should unwind the stack to find
+// the return to non-runtime code, add a temporary breakpoint there,
+// and inject the call once that breakpoint is hit.
+//
+// If the goroutine is in any other state, it's not safe to inject a call.
+//
+// This function communicates back to the debugger by setting R20 and
+// invoking BRK to raise a breakpoint signal. Note that the signal PC of
+// the signal triggered by the BRK instruction is the PC where the signal
+// is trapped, not the next PC, so to resume execution, the debugger needs
+// to set the signal PC to PC+4. See the comments in the implementation for
+// the protocol the debugger is expected to follow. InjectDebugCall in the
+// runtime tests demonstrates this protocol.
+//
+// The debugger must ensure that any pointers passed to the function
+// obey escape analysis requirements. Specifically, it must not pass
+// a stack pointer to an escaping argument. debugCallV2 cannot check
+// this invariant.
+//
+// This is ABIInternal because Go code injects its PC directly into new
+// goroutine stacks.
+TEXT runtime·debugCallV2<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-0
+ STP (R29, R30), -280(RSP)
+ SUB $272, RSP, RSP
+ SUB $8, RSP, R29
+ // Save all registers that may contain pointers so they can be
+ // conservatively scanned.
+ //
+ // We can't do anything that might clobber any of these
+ // registers before this.
+ STP (R27, g), (30*8)(RSP)
+ STP (R25, R26), (28*8)(RSP)
+ STP (R23, R24), (26*8)(RSP)
+ STP (R21, R22), (24*8)(RSP)
+ STP (R19, R20), (22*8)(RSP)
+ STP (R16, R17), (20*8)(RSP)
+ STP (R14, R15), (18*8)(RSP)
+ STP (R12, R13), (16*8)(RSP)
+ STP (R10, R11), (14*8)(RSP)
+ STP (R8, R9), (12*8)(RSP)
+ STP (R6, R7), (10*8)(RSP)
+ STP (R4, R5), (8*8)(RSP)
+ STP (R2, R3), (6*8)(RSP)
+ STP (R0, R1), (4*8)(RSP)
+
+ // Perform a safe-point check.
+ MOVD R30, 8(RSP) // Caller's PC
+ CALL runtime·debugCallCheck(SB)
+ MOVD 16(RSP), R0
+ CBZ R0, good
+
+ // The safety check failed. Put the reason string at the top
+ // of the stack.
+ MOVD R0, 8(RSP)
+ MOVD 24(RSP), R0
+ MOVD R0, 16(RSP)
+
+ // Set R20 to 8 and invoke BRK. The debugger should get the
+ // reason a call can't be injected from SP+8 and resume execution.
+ MOVD $8, R20
+ BREAK
+ JMP restore
+
+good:
+ // Registers are saved and it's safe to make a call.
+ // Open up a call frame, moving the stack if necessary.
+ //
+ // Once the frame is allocated, this will set R20 to 0 and
+ // invoke BRK. The debugger should write the argument
+ // frame for the call at SP+8, set up argument registers,
+ // set the LR as the signal PC + 4, set the PC to the function
+ // to call, set R26 to point to the closure (if a closure call),
+ // and resume execution.
+ //
+ // If the function returns, this will set R20 to 1 and invoke
+ // BRK. The debugger can then inspect any return value saved
+ // on the stack at SP+8 and in registers. To resume execution,
+ // the debugger should restore the LR from (SP).
+ //
+ // If the function panics, this will set R20 to 2 and invoke BRK.
+ // The interface{} value of the panic will be at SP+8. The debugger
+ // can inspect the panic value and resume execution again.
+#define DEBUG_CALL_DISPATCH(NAME,MAXSIZE) \
+ CMP $MAXSIZE, R0; \
+ BGT 5(PC); \
+ MOVD $NAME(SB), R0; \
+ MOVD R0, 8(RSP); \
+ CALL runtime·debugCallWrap(SB); \
+ JMP restore
+
+ MOVD 256(RSP), R0 // the argument frame size
+ DEBUG_CALL_DISPATCH(debugCall32<>, 32)
+ DEBUG_CALL_DISPATCH(debugCall64<>, 64)
+ DEBUG_CALL_DISPATCH(debugCall128<>, 128)
+ DEBUG_CALL_DISPATCH(debugCall256<>, 256)
+ DEBUG_CALL_DISPATCH(debugCall512<>, 512)
+ DEBUG_CALL_DISPATCH(debugCall1024<>, 1024)
+ DEBUG_CALL_DISPATCH(debugCall2048<>, 2048)
+ DEBUG_CALL_DISPATCH(debugCall4096<>, 4096)
+ DEBUG_CALL_DISPATCH(debugCall8192<>, 8192)
+ DEBUG_CALL_DISPATCH(debugCall16384<>, 16384)
+ DEBUG_CALL_DISPATCH(debugCall32768<>, 32768)
+ DEBUG_CALL_DISPATCH(debugCall65536<>, 65536)
+ // The frame size is too large. Report the error.
+ MOVD $debugCallFrameTooLarge<>(SB), R0
+ MOVD R0, 8(RSP)
+ MOVD $20, R0
+ MOVD R0, 16(RSP) // length of debugCallFrameTooLarge string
+ MOVD $8, R20
+ BREAK
+ JMP restore
+
+restore:
+ // Calls and failures resume here.
+ //
+ // Set R20 to 16 and invoke BRK. The debugger should restore
+ // all registers except for PC and RSP and resume execution.
+ MOVD $16, R20
+ BREAK
+ // We must not modify flags after this point.
+
+ // Restore pointer-containing registers, which may have been
+ // modified from the debugger's copy by stack copying.
+ LDP (30*8)(RSP), (R27, g)
+ LDP (28*8)(RSP), (R25, R26)
+ LDP (26*8)(RSP), (R23, R24)
+ LDP (24*8)(RSP), (R21, R22)
+ LDP (22*8)(RSP), (R19, R20)
+ LDP (20*8)(RSP), (R16, R17)
+ LDP (18*8)(RSP), (R14, R15)
+ LDP (16*8)(RSP), (R12, R13)
+ LDP (14*8)(RSP), (R10, R11)
+ LDP (12*8)(RSP), (R8, R9)
+ LDP (10*8)(RSP), (R6, R7)
+ LDP (8*8)(RSP), (R4, R5)
+ LDP (6*8)(RSP), (R2, R3)
+ LDP (4*8)(RSP), (R0, R1)
+
+ LDP -8(RSP), (R29, R27)
+ ADD $288, RSP, RSP // Add 16 more bytes, see saveSigContext
+ MOVD -16(RSP), R30 // restore old lr
+ JMP (R27)
+
+// runtime.debugCallCheck assumes that functions defined with the
+// DEBUG_CALL_FN macro are safe points to inject calls.
+#define DEBUG_CALL_FN(NAME,MAXSIZE) \
+TEXT NAME(SB),WRAPPER,$MAXSIZE-0; \
+ NO_LOCAL_POINTERS; \
+ MOVD $0, R20; \
+ BREAK; \
+ MOVD $1, R20; \
+ BREAK; \
+ RET
+DEBUG_CALL_FN(debugCall32<>, 32)
+DEBUG_CALL_FN(debugCall64<>, 64)
+DEBUG_CALL_FN(debugCall128<>, 128)
+DEBUG_CALL_FN(debugCall256<>, 256)
+DEBUG_CALL_FN(debugCall512<>, 512)
+DEBUG_CALL_FN(debugCall1024<>, 1024)
+DEBUG_CALL_FN(debugCall2048<>, 2048)
+DEBUG_CALL_FN(debugCall4096<>, 4096)
+DEBUG_CALL_FN(debugCall8192<>, 8192)
+DEBUG_CALL_FN(debugCall16384<>, 16384)
+DEBUG_CALL_FN(debugCall32768<>, 32768)
+DEBUG_CALL_FN(debugCall65536<>, 65536)
+
+// func debugCallPanicked(val interface{})
+TEXT runtime·debugCallPanicked(SB),NOSPLIT,$16-16
+ // Copy the panic value to the top of stack at SP+8.
+ MOVD val_type+0(FP), R0
+ MOVD R0, 8(RSP)
+ MOVD val_data+8(FP), R0
+ MOVD R0, 16(RSP)
+ MOVD $2, R20
+ BREAK
+ RET
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+//
+// Defined as ABIInternal since the compiler generates ABIInternal
+// calls to it directly and it does not use the stack-based Go ABI.
+TEXT runtime·panicIndex<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicIndex<ABIInternal>(SB)
+TEXT runtime·panicIndexU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicIndexU<ABIInternal>(SB)
+TEXT runtime·panicSliceAlen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSliceAlen<ABIInternal>(SB)
+TEXT runtime·panicSliceAlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSliceAlenU<ABIInternal>(SB)
+TEXT runtime·panicSliceAcap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSliceAcap<ABIInternal>(SB)
+TEXT runtime·panicSliceAcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSliceAcapU<ABIInternal>(SB)
+TEXT runtime·panicSliceB<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSliceB<ABIInternal>(SB)
+TEXT runtime·panicSliceBU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSliceBU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Alen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R2, R0
+ MOVD R3, R1
+ JMP runtime·goPanicSlice3Alen<ABIInternal>(SB)
+TEXT runtime·panicSlice3AlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R2, R0
+ MOVD R3, R1
+ JMP runtime·goPanicSlice3AlenU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Acap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R2, R0
+ MOVD R3, R1
+ JMP runtime·goPanicSlice3Acap<ABIInternal>(SB)
+TEXT runtime·panicSlice3AcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R2, R0
+ MOVD R3, R1
+ JMP runtime·goPanicSlice3AcapU<ABIInternal>(SB)
+TEXT runtime·panicSlice3B<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSlice3B<ABIInternal>(SB)
+TEXT runtime·panicSlice3BU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R1, R0
+ MOVD R2, R1
+ JMP runtime·goPanicSlice3BU<ABIInternal>(SB)
+TEXT runtime·panicSlice3C<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSlice3C<ABIInternal>(SB)
+TEXT runtime·panicSlice3CU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSlice3CU<ABIInternal>(SB)
+TEXT runtime·panicSliceConvert<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R2, R0
+ MOVD R3, R1
+ JMP runtime·goPanicSliceConvert<ABIInternal>(SB)
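+
+// The stubs above are reached from compiler-inserted bounds checks; the
+// registers hold the failing index and length (or capacity) before the tail
+// call into the Go panic helpers. For example, the out-of-range index below
+// makes the compiler emit a branch to runtime.panicIndex (which exact stub
+// fires depends on the form of the bounds check):
+//
+//	package main
+//
+//	func main() {
+//		s := []int{1, 2, 3}
+//		i := 5
+//		_ = s[i] // runtime error: index out of range [5] with length 3
+//	}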
+
+TEXT ·getfp<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
+ MOVD R29, R0
+ RET
diff --git a/src/runtime/asm_loong64.s b/src/runtime/asm_loong64.s
new file mode 100644
index 0000000..6ffa139
--- /dev/null
+++ b/src/runtime/asm_loong64.s
@@ -0,0 +1,844 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+#define REGCTXT R29
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // R3 = stack; R4 = argc; R5 = argv
+
+ ADDV $-24, R3
+ MOVW R4, 8(R3) // argc
+ MOVV R5, 16(R3) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVV $runtime·g0(SB), g
+ MOVV $(-64*1024), R30
+ ADDV R30, R3, R19
+ MOVV R19, g_stackguard0(g)
+ MOVV R19, g_stackguard1(g)
+ MOVV R19, (g_stack+stack_lo)(g)
+ MOVV R3, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVV _cgo_init(SB), R25
+ BEQ R25, nocgo
+
+ MOVV R0, R7 // arg 3: not used
+ MOVV R0, R6 // arg 2: not used
+ MOVV $setg_gcc<>(SB), R5 // arg 1: setg
+ MOVV g, R4 // arg 0: G
+ JAL (R25)
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVV (g_stack+stack_lo)(g), R19
+ ADDV $const_stackGuard, R19
+ MOVV R19, g_stackguard0(g)
+ MOVV R19, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVV $runtime·m0(SB), R19
+
+ // save m->g0 = g0
+ MOVV g, m_g0(R19)
+ // save m0 to g0->m
+ MOVV R19, g_m(g)
+
+ JAL runtime·check(SB)
+
+ // args are already prepared
+ JAL runtime·args(SB)
+ JAL runtime·osinit(SB)
+ JAL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVV $runtime·mainPC(SB), R19 // entry
+ ADDV $-16, R3
+ MOVV R19, 8(R3)
+ MOVV R0, 0(R3)
+ JAL runtime·newproc(SB)
+ ADDV $16, R3
+
+ // start this M
+ JAL runtime·mstart(SB)
+
+ MOVV R0, 1(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ BREAK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ JAL runtime·mstart0(SB)
+ RET // not reached
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ RDTIMED R0, R4
+ MOVV R4, ret+0(FP)
+ RET
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOVV buf+0(FP), R4
+ MOVV gobuf_g(R4), R5
+ MOVV 0(R5), R0 // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOVV R5, g
+ JAL runtime·save_g(SB)
+
+ MOVV gobuf_sp(R4), R3
+ MOVV gobuf_lr(R4), R1
+ MOVV gobuf_ret(R4), R19
+ MOVV gobuf_ctxt(R4), REGCTXT
+ MOVV R0, gobuf_sp(R4)
+ MOVV R0, gobuf_ret(R4)
+ MOVV R0, gobuf_lr(R4)
+ MOVV R0, gobuf_ctxt(R4)
+ MOVV gobuf_pc(R4), R6
+ JMP (R6)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOVV R3, (g_sched+gobuf_sp)(g)
+ MOVV R1, (g_sched+gobuf_pc)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVV g, R19
+ MOVV g_m(g), R4
+ MOVV m_g0(R4), g
+ JAL runtime·save_g(SB)
+ BNE g, R19, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOVV fn+0(FP), REGCTXT // context
+ MOVV 0(REGCTXT), R5 // code pointer
+ MOVV (g_sched+gobuf_sp)(g), R3 // sp = m->g0->sched.sp
+ ADDV $-16, R3
+ MOVV R19, 8(R3)
+ MOVV R0, 0(R3)
+ JAL (R5)
+ JMP runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ JAL (R1) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVV fn+0(FP), R19 // R19 = fn
+ MOVV R19, REGCTXT // context
+ MOVV g_m(g), R4 // R4 = m
+
+ MOVV m_gsignal(R4), R5 // R5 = gsignal
+ BEQ g, R5, noswitch
+
+ MOVV m_g0(R4), R5 // R5 = g0
+ BEQ g, R5, noswitch
+
+ MOVV m_curg(R4), R6
+ BEQ g, R6, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVV $runtime·badsystemstack(SB), R7
+ JAL (R7)
+ JAL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ JAL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVV R5, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R19
+ MOVV R19, R3
+
+ // call target function
+ MOVV 0(REGCTXT), R6 // code pointer
+ JAL (R6)
+
+ // switch back to g
+ MOVV g_m(g), R4
+ MOVV m_curg(R4), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R3
+ MOVV R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVV 0(REGCTXT), R4 // code pointer
+ MOVV 0(R3), R1 // restore LR
+ ADDV $8, R3
+ JMP (R4)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// loong64: R5: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
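+//
+// For orientation, the compiler-generated prologue that reaches this
+// function looks roughly like the following sketch (illustrative only;
+// the exact comparison depends on the frame size):
+//
+//	if SP is too close to g.stackguard0 {
+//		R5 = LR                       // caller's return address, per the note above
+//		CALL runtime·morestack_noctxt // which in turn jumps here
+//	}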
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVV g_m(g), R7
+ MOVV m_g0(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackg0(SB)
+ JAL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVV m_gsignal(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackgsignal(SB)
+ JAL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVV R3, (g_sched+gobuf_sp)(g)
+ MOVV R1, (g_sched+gobuf_pc)(g)
+ MOVV R5, (g_sched+gobuf_lr)(g)
+ MOVV REGCTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVV R5, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVV R3, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVV g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVV m_g0(R7), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R3
+ // Create a stack frame on g0 to call newstack.
+ MOVV R0, -8(R3) // Zero saved LR in frame
+ ADDV $-8, R3
+ JAL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+ // register (R5), and the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVV R3, R3
+
+ MOVV R0, REGCTXT
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(argtype *_type, f *FuncVal, arg *byte, argsize, retoffset uint32).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVV $MAXSIZE, R30; \
+ SGTU R19, R30, R30; \
+ BNE R30, 3(PC); \
+ MOVV $NAME(SB), R4; \
+ JMP (R4)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
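+//
+// Go-level sketch of what the DISPATCH chain below amounts to
+// (illustrative only; callN stand in for the fixed-frame CALLFN stubs):
+//
+//	switch {
+//	case size <= 32:
+//		call32(...)
+//	case size <= 64:
+//		call64(...)
+//	// ... sizes keep doubling up to 1<<30 ...
+//	default:
+//		badreflectcall(...)
+//	}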
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-48
+ MOVWU stackArgsSize+24(FP), R19
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVV $runtime·badreflectcall(SB), R4
+ JMP (R4)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-24; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVV arg+16(FP), R4; \
+ MOVWU argsize+24(FP), R5; \
+ MOVV R3, R12; \
+ ADDV $8, R12; \
+ ADDV R12, R5; \
+ BEQ R12, R5, 6(PC); \
+ MOVBU (R4), R6; \
+ ADDV $1, R4; \
+ MOVBU R6, (R12); \
+ ADDV $1, R12; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOVV f+8(FP), REGCTXT; \
+ MOVV (REGCTXT), R6; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JAL (R6); \
+ /* copy return values back */ \
+ MOVV argtype+0(FP), R7; \
+ MOVV arg+16(FP), R4; \
+ MOVWU n+24(FP), R5; \
+ MOVWU retoffset+28(FP), R6; \
+ ADDV $8, R3, R12; \
+ ADDV R6, R12; \
+ ADDV R6, R4; \
+ SUBVU R6, R5; \
+ JAL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $32-0
+ MOVV R7, 8(R3)
+ MOVV R4, 16(R3)
+ MOVV R12, 24(R3)
+ MOVV R5, 32(R3)
+ JAL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R19.
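+//
+// Go-level sketch of the state saved below (illustrative only):
+//
+//	g.sched.pc = (address of systemstack_switch) + 8 // past the prologue
+//	g.sched.sp = SP
+//	g.sched.lr, g.sched.ret = 0, 0
+//	if g.sched.ctxt != 0 { abort() } // must already be zero; see func save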
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVV $runtime·systemstack_switch(SB), R19
+ ADDV $8, R19
+ MOVV R19, (g_sched+gobuf_pc)(g)
+ MOVV R3, (g_sched+gobuf_sp)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+ MOVV R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVV (g_sched+gobuf_ctxt)(g), R19
+ BEQ R19, 2(PC)
+ JAL runtime·abort(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
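+//
+// Go-level sketch of the stack bookkeeping below (illustrative only):
+//
+//	depth := oldg.stack.hi - SP // saved instead of SP itself, because the
+//	                            // goroutine stack may be copied while a
+//	                            // callback from C runs
+//	fn(arg)                     // C code runs on the g0 stack
+//	SP = oldg.stack.hi - depth  // recompute against the current stack.hi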
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVV fn+0(FP), R25
+ MOVV arg+8(FP), R4
+
+ MOVV R3, R12 // save original stack pointer
+ MOVV g, R13
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already.
+ MOVV g_m(g), R5
+ MOVV m_gsignal(R5), R6
+ BEQ R6, g, g0
+ MOVV m_g0(R5), R6
+ BEQ R6, g, g0
+
+ JAL gosave_systemstack_switch<>(SB)
+ MOVV R6, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R3
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers.
+ ADDV $-16, R3
+ MOVV R13, 0(R3) // save old g on stack
+ MOVV (g_stack+stack_hi)(R13), R13
+ SUBVU R12, R13
+ MOVV R13, 8(R3) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ JAL (R25)
+
+ // Restore g, stack pointer. R4 is return value.
+ MOVV 0(R3), g
+ JAL runtime·save_g(SB)
+ MOVV (g_stack+stack_hi)(g), R5
+ MOVV 8(R3), R6
+ SUBVU R6, R5
+ MOVV R5, R3
+
+ MOVW R4, ret+16(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; in that case frame is the saved g.
+	// This path is used to drop the m while the thread is exiting.
+ MOVV fn+0(FP), R5
+ BNE R5, loadg
+ // Restore the g from frame.
+ MOVV frame+8(FP), g
+ JMP dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R19
+ BEQ R19, nocgo
+ JAL runtime·load_g(SB)
+nocgo:
+
+	// g is nil if Go did not create the current thread, or if this
+	// thread has never called into Go on pthread platforms.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ g, needm
+
+ MOVV g_m(g), R12
+ MOVV R12, savedm-8(SP)
+ JMP havem
+
+needm:
+ MOVV g, savedm-8(SP) // g is zero, so is m.
+ MOVV $runtime·needAndBindM(SB), R4
+ JAL (R4)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVV g_m(g), R12
+ MOVV m_g0(R12), R19
+ MOVV R3, (g_sched+gobuf_sp)(R19)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+	// NOTE: unwindm knows that the saved g->sched.sp is at 8(R3) aka savedsp-24(SP).
+ MOVV m_g0(R12), R19
+ MOVV (g_sched+gobuf_sp)(R19), R13
+ MOVV R13, savedsp-24(SP) // must match frame size
+ MOVV R3, (g_sched+gobuf_sp)(R19)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+	// To save m->curg->sched.pc, we push it onto the curg stack and
+	// open a frame the same size as cgocallback's g0 frame.
+	// Once we switch to the curg stack, the pushed PC will appear
+	// to be the return PC of cgocallback, so that the traceback
+	// will seamlessly trace back into the earlier calls.
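+	// Layout of the frame built on m->curg's stack by the stores below,
+	// relative to the new SP (sizes match this function's $24 frame):
+	//	 0(SP): m->curg's saved PC (acts as the "saved LR")
+	//	 8(SP): fn
+	//	16(SP): frame
+	//	24(SP): ctxt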
+ MOVV m_curg(R12), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R13 // prepare stack as R13
+ MOVV (g_sched+gobuf_pc)(g), R4
+ MOVV R4, -(24+8)(R13) // "saved LR"; must match frame size
+ MOVV fn+0(FP), R5
+ MOVV frame+8(FP), R6
+ MOVV ctxt+16(FP), R7
+ MOVV $-(24+8)(R13), R3
+ MOVV R5, 8(R3)
+ MOVV R6, 16(R3)
+ MOVV R7, 24(R3)
+ JAL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVV 0(R3), R4
+ MOVV R4, (g_sched+gobuf_pc)(g)
+ MOVV $(24+8)(R3), R13 // must match frame size
+ MOVV R13, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVV g_m(g), R12
+ MOVV m_g0(R12), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R3
+ MOVV savedsp-24(SP), R13 // must match frame size
+ MOVV R13, (g_sched+gobuf_sp)(g)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for as long as the C thread stays alive on pthread platforms.
+	// If the m on entry wasn't nil,
+	// 1. the thread might be a Go thread,
+	// 2. or this wasn't the first call from a C thread on pthread platforms,
+	//    in which case dropm was skipped earlier so the m from the first call is reused.
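+	// Go-level sketch of the checks below (illustrative only):
+	//	if savedm == nil { // we borrowed this m via needm
+	//		if _cgo_pthread_key_created == nil || *_cgo_pthread_key_created == 0 {
+	//			dropm() // no pthread key: release the borrowed m now
+	//		}
+	//		// otherwise keep the m bound to this C thread for reuse
+	//	}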
+ MOVV savedm-8(SP), R12
+ BNE R12, droppedm
+
+	// Skip dropm so the m can be reused by the next call, provided a pthread key has been created.
+ MOVV _cgo_pthread_key_created(SB), R12
+	// If _cgo_pthread_key_created is a nil pointer, cgo is not in use, so we must dropm.
+ BEQ R12, dropm
+ MOVV (R12), R12
+ BNE R12, droppedm
+
+dropm:
+ MOVV $runtime·dropm(SB), R4
+ JAL (R4)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVV gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ JAL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g called from gcc with g in R19
+TEXT setg_gcc<>(SB),NOSPLIT,$0-0
+ MOVV R19, g
+ JAL runtime·save_g(SB)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+// AES hashing not implemented for loong64
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R19
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$16
+ // g (R22) and REGTMP (R30) might be clobbered by load_g. They
+ // are callee-save in the gcc calling convention, so save them.
+ MOVV R30, savedREGTMP-16(SP)
+ MOVV g, savedG-8(SP)
+
+ JAL runtime·load_g(SB)
+ MOVV g_m(g), R19
+ MOVV m_curg(R19), R19
+ MOVV (g_stack+stack_hi)(R19), R4 // return value in R4
+
+ MOVV savedG-8(SP), g
+ MOVV savedREGTMP-16(SP), R30
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ NOOP
+ JAL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ NOOP
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R19
+ MOVB R19, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R29, and returns a pointer
+// to the buffer space in R29.
+// It clobbers R30 (the linker temp register).
+// The act of CALLing gcWriteBarrier will clobber R1 (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
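+//
+// Go-level sketch of the fast path below (illustrative only; "bytes" is
+// the value passed in R29):
+//
+//	buf := &getg().m.p.ptr().wbBuf
+//	if buf.next+bytes > buf.end {
+//		wbBufFlush() // slow path: spill registers, flush the buffer, retry
+//	}
+//	ret := buf.next // returned in R29; the caller stores pointers here
+//	buf.next += bytes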
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$216
+ // Save the registers clobbered by the fast path.
+ MOVV R19, 208(R3)
+ MOVV R13, 216(R3)
+retry:
+ MOVV g_m(g), R19
+ MOVV m_p(R19), R19
+ MOVV (p_wbBuf+wbBuf_next)(R19), R13
+ MOVV (p_wbBuf+wbBuf_end)(R19), R30 // R30 is linker temp register
+ // Increment wbBuf.next position.
+ ADDV R29, R13
+ // Is the buffer full?
+ BLTU R30, R13, flush
+ // Commit to the larger buffer.
+ MOVV R13, (p_wbBuf+wbBuf_next)(R19)
+ // Make return value (the original next position)
+ SUBV R29, R13, R29
+ // Restore registers.
+ MOVV 208(R3), R19
+ MOVV 216(R3), R13
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVV R27, 8(R3)
+ MOVV R28, 16(R3)
+ // R1 is LR, which was saved by the prologue.
+ MOVV R2, 24(R3)
+ // R3 is SP.
+ MOVV R4, 32(R3)
+ MOVV R5, 40(R3)
+ MOVV R6, 48(R3)
+ MOVV R7, 56(R3)
+ MOVV R8, 64(R3)
+ MOVV R9, 72(R3)
+ MOVV R10, 80(R3)
+ MOVV R11, 88(R3)
+ MOVV R12, 96(R3)
+ // R13 already saved
+ MOVV R14, 104(R3)
+ MOVV R15, 112(R3)
+ MOVV R16, 120(R3)
+ MOVV R17, 128(R3)
+ MOVV R18, 136(R3)
+ // R19 already saved
+ MOVV R20, 144(R3)
+ MOVV R21, 152(R3)
+ // R22 is g.
+ MOVV R23, 160(R3)
+ MOVV R24, 168(R3)
+ MOVV R25, 176(R3)
+ MOVV R26, 184(R3)
+ // R27 already saved
+ // R28 already saved.
+ MOVV R29, 192(R3)
+ // R30 is tmp register.
+ MOVV R31, 200(R3)
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVV 8(R3), R27
+ MOVV 16(R3), R28
+ MOVV 24(R3), R2
+ MOVV 32(R3), R4
+ MOVV 40(R3), R5
+ MOVV 48(R3), R6
+ MOVV 56(R3), R7
+ MOVV 64(R3), R8
+ MOVV 72(R3), R9
+ MOVV 80(R3), R10
+ MOVV 88(R3), R11
+ MOVV 96(R3), R12
+ MOVV 104(R3), R14
+ MOVV 112(R3), R15
+ MOVV 120(R3), R16
+ MOVV 128(R3), R17
+ MOVV 136(R3), R18
+ MOVV 144(R3), R20
+ MOVV 152(R3), R21
+ MOVV 160(R3), R23
+ MOVV 168(R3), R24
+ MOVV 176(R3), R25
+ MOVV 184(R3), R26
+ MOVV 192(R3), R29
+ MOVV 200(R3), R31
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $8, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $16, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $24, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $32, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $40, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $48, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $56, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $64, R29
+ JMP gcWriteBarrier<>(SB)
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
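+//
+// For example, for a bounds check on s[i] the compiler emits roughly
+// (illustrative sketch; goPanicIndex lives in runtime/panic.go):
+//
+//	if uint(i) >= uint(len(s)) {
+//		// i and len(s) are left in R19 and R18; the spill slots
+//		// x+0(FP) and y+8(FP) live in this caller's frame.
+//		runtime·panicIndex() // stub below; ends in goPanicIndex(x, y)
+//	}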
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVV R17, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVV R17, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVV R17, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVV R17, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVV R18, x+0(FP)
+ MOVV R17, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVV R19, x+0(FP)
+ MOVV R18, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-16
+ MOVV R17, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSliceConvert(SB)
diff --git a/src/runtime/asm_mips64x.s b/src/runtime/asm_mips64x.s
new file mode 100644
index 0000000..19592b5
--- /dev/null
+++ b/src/runtime/asm_mips64x.s
@@ -0,0 +1,850 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+#define REGCTXT R22
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // R29 = stack; R4 = argc; R5 = argv
+
+ ADDV $-24, R29
+ MOVW R4, 8(R29) // argc
+ MOVV R5, 16(R29) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVV $runtime·g0(SB), g
+ MOVV $(-64*1024), R23
+ ADDV R23, R29, R1
+ MOVV R1, g_stackguard0(g)
+ MOVV R1, g_stackguard1(g)
+ MOVV R1, (g_stack+stack_lo)(g)
+ MOVV R29, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVV _cgo_init(SB), R25
+ BEQ R25, nocgo
+
+ MOVV R0, R7 // arg 3: not used
+ MOVV R0, R6 // arg 2: not used
+ MOVV $setg_gcc<>(SB), R5 // arg 1: setg
+ MOVV g, R4 // arg 0: G
+ JAL (R25)
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVV (g_stack+stack_lo)(g), R1
+ ADDV $const_stackGuard, R1
+ MOVV R1, g_stackguard0(g)
+ MOVV R1, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVV $runtime·m0(SB), R1
+
+ // save m->g0 = g0
+ MOVV g, m_g0(R1)
+ // save m0 to g0->m
+ MOVV R1, g_m(g)
+
+ JAL runtime·check(SB)
+
+ // args are already prepared
+ JAL runtime·args(SB)
+ JAL runtime·osinit(SB)
+ JAL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVV $runtime·mainPC(SB), R1 // entry
+ ADDV $-16, R29
+ MOVV R1, 8(R29)
+ MOVV R0, 0(R29)
+ JAL runtime·newproc(SB)
+ ADDV $16, R29
+
+ // start this M
+ JAL runtime·mstart(SB)
+
+ MOVV R0, 1(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R0, 2(R0) // TODO: TD
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ JAL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOVV buf+0(FP), R3
+ MOVV gobuf_g(R3), R4
+ MOVV 0(R4), R0 // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOVV R4, g
+ JAL runtime·save_g(SB)
+
+ MOVV 0(g), R2
+ MOVV gobuf_sp(R3), R29
+ MOVV gobuf_lr(R3), R31
+ MOVV gobuf_ret(R3), R1
+ MOVV gobuf_ctxt(R3), REGCTXT
+ MOVV R0, gobuf_sp(R3)
+ MOVV R0, gobuf_ret(R3)
+ MOVV R0, gobuf_lr(R3)
+ MOVV R0, gobuf_ctxt(R3)
+ MOVV gobuf_pc(R3), R4
+ JMP (R4)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R31, (g_sched+gobuf_pc)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVV g, R1
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), g
+ JAL runtime·save_g(SB)
+ BNE g, R1, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOVV fn+0(FP), REGCTXT // context
+ MOVV 0(REGCTXT), R4 // code pointer
+ MOVV (g_sched+gobuf_sp)(g), R29 // sp = m->g0->sched.sp
+ ADDV $-16, R29
+ MOVV R1, 8(R29)
+ MOVV R0, 0(R29)
+ JAL (R4)
+ JMP runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ JAL (R31) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVV fn+0(FP), R1 // R1 = fn
+ MOVV R1, REGCTXT // context
+ MOVV g_m(g), R2 // R2 = m
+
+ MOVV m_gsignal(R2), R3 // R3 = gsignal
+ BEQ g, R3, noswitch
+
+ MOVV m_g0(R2), R3 // R3 = g0
+ BEQ g, R3, noswitch
+
+ MOVV m_curg(R2), R4
+ BEQ g, R4, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVV $runtime·badsystemstack(SB), R4
+ JAL (R4)
+ JAL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ JAL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVV R3, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R1
+ MOVV R1, R29
+
+ // call target function
+ MOVV 0(REGCTXT), R4 // code pointer
+ JAL (R4)
+
+ // switch back to g
+ MOVV g_m(g), R1
+ MOVV m_curg(R1), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ MOVV R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVV 0(REGCTXT), R4 // code pointer
+ MOVV 0(R29), R31 // restore LR
+ ADDV $8, R29
+ JMP (R4)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R1: framesize, R2: argsize, R3: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVV g_m(g), R7
+ MOVV m_g0(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackg0(SB)
+ JAL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVV m_gsignal(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackgsignal(SB)
+ JAL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R31, (g_sched+gobuf_pc)(g)
+ MOVV R3, (g_sched+gobuf_lr)(g)
+ MOVV REGCTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVV R3, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVV R29, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVV g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVV m_g0(R7), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ // Create a stack frame on g0 to call newstack.
+ MOVV R0, -8(R29) // Zero saved LR in frame
+ ADDV $-8, R29
+ JAL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+ // register (R3), and the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVV R29, R29
+
+ MOVV R0, REGCTXT
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVV $MAXSIZE, R23; \
+ SGTU R1, R23, R23; \
+ BNE R23, 3(PC); \
+ MOVV $NAME(SB), R4; \
+ JMP (R4)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-48
+ MOVWU frameSize+32(FP), R1
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVV $runtime·badreflectcall(SB), R4
+ JMP (R4)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVV stackArgs+16(FP), R1; \
+ MOVWU stackArgsSize+24(FP), R2; \
+ MOVV R29, R3; \
+ ADDV $8, R3; \
+ ADDV R3, R2; \
+ BEQ R3, R2, 6(PC); \
+ MOVBU (R1), R4; \
+ ADDV $1, R1; \
+ MOVBU R4, (R3); \
+ ADDV $1, R3; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOVV f+8(FP), REGCTXT; \
+ MOVV (REGCTXT), R4; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JAL (R4); \
+ /* copy return values back */ \
+ MOVV stackArgsType+0(FP), R5; \
+ MOVV stackArgs+16(FP), R1; \
+ MOVWU stackArgsSize+24(FP), R2; \
+ MOVWU stackRetOffset+28(FP), R4; \
+ ADDV $8, R29, R3; \
+ ADDV R4, R3; \
+ ADDV R4, R1; \
+ SUBVU R4, R2; \
+ JAL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ MOVV R5, 8(R29)
+ MOVV R1, 16(R29)
+ MOVV R3, 24(R29)
+ MOVV R2, 32(R29)
+ MOVV $0, 40(R29)
+ JAL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R1.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVV $runtime·systemstack_switch(SB), R1
+ ADDV $8, R1 // get past prologue
+ MOVV R1, (g_sched+gobuf_pc)(g)
+ MOVV R29, (g_sched+gobuf_sp)(g)
+ MOVV R0, (g_sched+gobuf_lr)(g)
+ MOVV R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVV (g_sched+gobuf_ctxt)(g), R1
+ BEQ R1, 2(PC)
+ JAL runtime·abort(SB)
+ RET
+
+// func asmcgocall_no_g(fn, arg unsafe.Pointer)
+// Call fn(arg) aligned appropriately for the gcc ABI.
+// Called on a system stack, and there may be no g yet (during needm).
+TEXT ·asmcgocall_no_g(SB),NOSPLIT,$0-16
+ MOVV fn+0(FP), R25
+ MOVV arg+8(FP), R4
+ JAL (R25)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVV fn+0(FP), R25
+ MOVV arg+8(FP), R4
+
+ MOVV R29, R3 // save original stack pointer
+ MOVV g, R2
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVV g_m(g), R5
+ MOVV m_gsignal(R5), R6
+ BEQ R6, g, g0
+ MOVV m_g0(R5), R6
+ BEQ R6, g, g0
+
+ JAL gosave_systemstack_switch<>(SB)
+ MOVV R6, g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers.
+ ADDV $-16, R29
+ MOVV R2, 0(R29) // save old g on stack
+ MOVV (g_stack+stack_hi)(R2), R2
+ SUBVU R3, R2
+ MOVV R2, 8(R29) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ JAL (R25)
+
+ // Restore g, stack pointer. R2 is return value.
+ MOVV 0(R29), g
+ JAL runtime·save_g(SB)
+ MOVV (g_stack+stack_hi)(g), R5
+ MOVV 8(R29), R6
+ SUBVU R6, R5
+ MOVV R5, R29
+
+ MOVW R2, ret+16(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; in that case frame is the saved g.
+	// This path is used to drop the m while the thread is exiting.
+ MOVV fn+0(FP), R5
+ BNE R5, loadg
+ // Restore the g from frame.
+ MOVV frame+8(FP), g
+ JMP dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, nocgo
+ JAL runtime·load_g(SB)
+nocgo:
+
+	// g is nil if Go did not create the current thread, or if this
+	// thread has never called into Go on pthread platforms.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ g, needm
+
+ MOVV g_m(g), R3
+ MOVV R3, savedm-8(SP)
+ JMP havem
+
+needm:
+ MOVV g, savedm-8(SP) // g is zero, so is m.
+ MOVV $runtime·needAndBindM(SB), R4
+ JAL (R4)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), R1
+ MOVV R29, (g_sched+gobuf_sp)(R1)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+	// NOTE: unwindm knows that the saved g->sched.sp is at 8(R29) aka savedsp-24(SP).
+ MOVV m_g0(R3), R1
+ MOVV (g_sched+gobuf_sp)(R1), R2
+ MOVV R2, savedsp-24(SP) // must match frame size
+ MOVV R29, (g_sched+gobuf_sp)(R1)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVV m_curg(R3), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R2 // prepare stack as R2
+ MOVV (g_sched+gobuf_pc)(g), R4
+ MOVV R4, -(24+8)(R2) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVV fn+0(FP), R5
+ MOVV frame+8(FP), R6
+ MOVV ctxt+16(FP), R7
+ MOVV $-(24+8)(R2), R29 // switch stack; must match frame size
+ MOVV R5, 8(R29)
+ MOVV R6, 16(R29)
+ MOVV R7, 24(R29)
+ JAL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVV 0(R29), R4
+ MOVV R4, (g_sched+gobuf_pc)(g)
+ MOVV $(24+8)(R29), R2 // must match frame size
+ MOVV R2, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVV g_m(g), R3
+ MOVV m_g0(R3), g
+ JAL runtime·save_g(SB)
+ MOVV (g_sched+gobuf_sp)(g), R29
+ MOVV savedsp-24(SP), R2 // must match frame size
+ MOVV R2, (g_sched+gobuf_sp)(g)
+
+	// If the m on entry was nil, we called needm above to borrow an m,
+	// 1. for the duration of the call on non-pthread platforms,
+	// 2. or for as long as the C thread stays alive on pthread platforms.
+	// If the m on entry wasn't nil,
+	// 1. the thread might be a Go thread,
+	// 2. or this wasn't the first call from a C thread on pthread platforms,
+	//    in which case dropm was skipped earlier so the m from the first call is reused.
+ MOVV savedm-8(SP), R3
+ BNE R3, droppedm
+
+	// Skip dropm so the m can be reused by the next call, provided a pthread key has been created.
+ MOVV _cgo_pthread_key_created(SB), R3
+	// If _cgo_pthread_key_created is a nil pointer, cgo is not in use, so we must dropm.
+ BEQ R3, dropm
+ MOVV (R3), R3
+ BNE R3, droppedm
+
+dropm:
+ MOVV $runtime·dropm(SB), R4
+ JAL (R4)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVV gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ JAL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g called from gcc with g in R1
+TEXT setg_gcc<>(SB),NOSPLIT,$0-0
+ MOVV R1, g
+ JAL runtime·save_g(SB)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+// AES hashing not implemented for mips64
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R1
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$16
+ // g (R30) and REGTMP (R23) might be clobbered by load_g. They
+ // are callee-save in the gcc calling convention, so save them.
+ MOVV R23, savedR23-16(SP)
+ MOVV g, savedG-8(SP)
+
+ JAL runtime·load_g(SB)
+ MOVV g_m(g), R1
+ MOVV m_curg(R1), R1
+ MOVV (g_stack+stack_hi)(R1), R2 // return value in R2
+
+ MOVV savedG-8(SP), g
+ MOVV savedR23-16(SP), R23
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ NOR R0, R0 // NOP
+ JAL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ NOR R0, R0 // NOP
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R1
+ MOVB R1, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R25, and returns a pointer
+// to the buffer space in R25.
+// It clobbers R23 (the linker temp register).
+// The act of CALLing gcWriteBarrier will clobber R31 (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$192
+ // Save the registers clobbered by the fast path.
+ MOVV R1, 184(R29)
+ MOVV R2, 192(R29)
+retry:
+ MOVV g_m(g), R1
+ MOVV m_p(R1), R1
+ MOVV (p_wbBuf+wbBuf_next)(R1), R2
+ MOVV (p_wbBuf+wbBuf_end)(R1), R23 // R23 is linker temp register
+ // Increment wbBuf.next position.
+ ADDV R25, R2
+ // Is the buffer full?
+ SGTU R2, R23, R23
+ BNE R23, flush
+ // Commit to the larger buffer.
+ MOVV R2, (p_wbBuf+wbBuf_next)(R1)
+ // Make return value (the original next position)
+ SUBV R25, R2, R25
+ // Restore registers.
+ MOVV 184(R29), R1
+ MOVV 192(R29), R2
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVV R20, 8(R29)
+ MOVV R21, 16(R29)
+ // R1 already saved
+ // R2 already saved
+ MOVV R3, 24(R29)
+ MOVV R4, 32(R29)
+ MOVV R5, 40(R29)
+ MOVV R6, 48(R29)
+ MOVV R7, 56(R29)
+ MOVV R8, 64(R29)
+ MOVV R9, 72(R29)
+ MOVV R10, 80(R29)
+ MOVV R11, 88(R29)
+ MOVV R12, 96(R29)
+ MOVV R13, 104(R29)
+ MOVV R14, 112(R29)
+ MOVV R15, 120(R29)
+ MOVV R16, 128(R29)
+ MOVV R17, 136(R29)
+ MOVV R18, 144(R29)
+ MOVV R19, 152(R29)
+ // R20 already saved
+ // R21 already saved.
+ MOVV R22, 160(R29)
+ // R23 is tmp register.
+ MOVV R24, 168(R29)
+ MOVV R25, 176(R29)
+ // R26 is reserved by kernel.
+ // R27 is reserved by kernel.
+ // R28 is REGSB (not modified by Go code).
+ // R29 is SP.
+ // R30 is g.
+ // R31 is LR, which was saved by the prologue.
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVV 8(R29), R20
+ MOVV 16(R29), R21
+ MOVV 24(R29), R3
+ MOVV 32(R29), R4
+ MOVV 40(R29), R5
+ MOVV 48(R29), R6
+ MOVV 56(R29), R7
+ MOVV 64(R29), R8
+ MOVV 72(R29), R9
+ MOVV 80(R29), R10
+ MOVV 88(R29), R11
+ MOVV 96(R29), R12
+ MOVV 104(R29), R13
+ MOVV 112(R29), R14
+ MOVV 120(R29), R15
+ MOVV 128(R29), R16
+ MOVV 136(R29), R17
+ MOVV 144(R29), R18
+ MOVV 152(R29), R19
+ MOVV 160(R29), R22
+ MOVV 168(R29), R24
+ MOVV 176(R29), R25
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $8, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $16, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $24, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $32, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $40, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $48, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $56, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVV $64, R25
+ JMP gcWriteBarrier<>(SB)
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVV R2, x+0(FP)
+ MOVV R3, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVV R1, x+0(FP)
+ MOVV R2, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-16
+ MOVV R3, x+0(FP)
+ MOVV R4, y+8(FP)
+ JMP runtime·goPanicSliceConvert(SB)
diff --git a/src/runtime/asm_mipsx.s b/src/runtime/asm_mipsx.s
new file mode 100644
index 0000000..eed4a05
--- /dev/null
+++ b/src/runtime/asm_mipsx.s
@@ -0,0 +1,928 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+#define REGCTXT R22
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // R29 = stack; R4 = argc; R5 = argv
+
+ ADDU $-12, R29
+ MOVW R4, 4(R29) // argc
+ MOVW R5, 8(R29) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVW $runtime·g0(SB), g
+ MOVW $(-64*1024), R23
+ ADD R23, R29, R1
+ MOVW R1, g_stackguard0(g)
+ MOVW R1, g_stackguard1(g)
+ MOVW R1, (g_stack+stack_lo)(g)
+ MOVW R29, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVW _cgo_init(SB), R25
+ BEQ R25, nocgo
+ ADDU $-16, R29
+ MOVW R0, R7 // arg 3: not used
+ MOVW R0, R6 // arg 2: not used
+ MOVW $setg_gcc<>(SB), R5 // arg 1: setg
+ MOVW g, R4 // arg 0: G
+ JAL (R25)
+ ADDU $16, R29
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVW (g_stack+stack_lo)(g), R1
+ ADD $const_stackGuard, R1
+ MOVW R1, g_stackguard0(g)
+ MOVW R1, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVW $runtime·m0(SB), R1
+
+ // save m->g0 = g0
+ MOVW g, m_g0(R1)
+ // save m0 to g0->m
+ MOVW R1, g_m(g)
+
+ JAL runtime·check(SB)
+
+ // args are already prepared
+ JAL runtime·args(SB)
+ JAL runtime·osinit(SB)
+ JAL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVW $runtime·mainPC(SB), R1 // entry
+ ADDU $-8, R29
+ MOVW R1, 4(R29)
+ MOVW R0, 0(R29)
+ JAL runtime·newproc(SB)
+ ADDU $8, R29
+
+ // start this M
+ JAL runtime·mstart(SB)
+
+ UNDEF
+ RET
+
+DATA runtime·mainPC+0(SB)/4,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$4
+
+TEXT runtime·breakpoint(SB),NOSPLIT,$0-0
+ BREAK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT,$0-0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ JAL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW buf+0(FP), R3
+ MOVW gobuf_g(R3), R4
+ MOVW 0(R4), R5 // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW R4, g
+ JAL runtime·save_g(SB)
+ MOVW gobuf_sp(R3), R29
+ MOVW gobuf_lr(R3), R31
+ MOVW gobuf_ret(R3), R1
+ MOVW gobuf_ctxt(R3), REGCTXT
+ MOVW R0, gobuf_sp(R3)
+ MOVW R0, gobuf_ret(R3)
+ MOVW R0, gobuf_lr(R3)
+ MOVW R0, gobuf_ctxt(R3)
+ MOVW gobuf_pc(R3), R4
+ JMP (R4)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB),NOSPLIT|NOFRAME,$0-4
+ // Save caller state in g->sched
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R31, (g_sched+gobuf_pc)(g)
+ MOVW R0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVW g, R1
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), g
+ JAL runtime·save_g(SB)
+ BNE g, R1, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOVW fn+0(FP), REGCTXT // context
+ MOVW 0(REGCTXT), R4 // code pointer
+ MOVW (g_sched+gobuf_sp)(g), R29 // sp = m->g0->sched.sp
+ ADDU $-8, R29 // make room for 1 arg and fake LR
+ MOVW R1, 4(R29)
+ MOVW R0, 0(R29)
+ JAL (R4)
+ JMP runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB),NOSPLIT,$0-0
+ UNDEF
+ JAL (R31) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB),NOSPLIT,$0-4
+ MOVW fn+0(FP), R1 // R1 = fn
+ MOVW R1, REGCTXT // context
+ MOVW g_m(g), R2 // R2 = m
+
+ MOVW m_gsignal(R2), R3 // R3 = gsignal
+ BEQ g, R3, noswitch
+
+ MOVW m_g0(R2), R3 // R3 = g0
+ BEQ g, R3, noswitch
+
+ MOVW m_curg(R2), R4
+ BEQ g, R4, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVW $runtime·badsystemstack(SB), R4
+ JAL (R4)
+ JAL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ JAL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVW R3, g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R1
+ MOVW R1, R29
+
+ // call target function
+ MOVW 0(REGCTXT), R4 // code pointer
+ JAL (R4)
+
+ // switch back to g
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ MOVW R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVW 0(REGCTXT), R4 // code pointer
+ MOVW 0(R29), R31 // restore LR
+ ADD $4, R29
+ JMP (R4)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R1: framesize, R2: argsize, R3: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVW g_m(g), R7
+ MOVW m_g0(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackg0(SB)
+ JAL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVW m_gsignal(R7), R8
+ BNE g, R8, 3(PC)
+ JAL runtime·badmorestackgsignal(SB)
+ JAL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R31, (g_sched+gobuf_pc)(g)
+ MOVW R3, (g_sched+gobuf_lr)(g)
+ MOVW REGCTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVW R3, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVW R29, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVW g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVW m_g0(R7), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ // Create a stack frame on g0 to call newstack.
+ MOVW R0, -4(R29) // Zero saved LR in frame
+ ADDU $-4, R29
+ JAL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+ // register (R3), and the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVW R29, R29
+
+ MOVW R0, REGCTXT
+ JMP runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVW $MAXSIZE, R23; \
+ SGTU R1, R23, R23; \
+ BNE R23, 3(PC); \
+ MOVW $NAME(SB), R4; \
+ JMP (R4)
+
+TEXT ·reflectcall(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW frameSize+20(FP), R1
+
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVW $runtime·badreflectcall(SB), R4
+ JMP (R4)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB),WRAPPER,$MAXSIZE-28; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVW stackArgs+8(FP), R1; \
+ MOVW stackArgsSize+12(FP), R2; \
+ MOVW R29, R3; \
+ ADDU $4, R3; \
+ ADDU R3, R2; \
+ BEQ R3, R2, 6(PC); \
+ MOVBU (R1), R4; \
+ ADDU $1, R1; \
+ MOVBU R4, (R3); \
+ ADDU $1, R3; \
+ JMP -5(PC); \
+ /* call function */ \
+ MOVW f+4(FP), REGCTXT; \
+ MOVW (REGCTXT), R4; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JAL (R4); \
+ /* copy return values back */ \
+ MOVW stackArgsType+0(FP), R5; \
+ MOVW stackArgs+8(FP), R1; \
+ MOVW stackArgsSize+12(FP), R2; \
+ MOVW stackRetOffset+16(FP), R4; \
+ ADDU $4, R29, R3; \
+ ADDU R4, R3; \
+ ADDU R4, R1; \
+ SUBU R4, R2; \
+ JAL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $20-0
+ MOVW R5, 4(R29)
+ MOVW R1, 8(R29)
+ MOVW R3, 12(R29)
+ MOVW R2, 16(R29)
+ MOVW $0, 20(R29)
+ JAL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-4
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R1.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW $runtime·systemstack_switch(SB), R1
+ ADDU $8, R1 // get past prologue
+ MOVW R1, (g_sched+gobuf_pc)(g)
+ MOVW R29, (g_sched+gobuf_sp)(g)
+ MOVW R0, (g_sched+gobuf_lr)(g)
+ MOVW R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVW (g_sched+gobuf_ctxt)(g), R1
+ BEQ R1, 2(PC)
+ JAL runtime·abort(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-12
+ MOVW fn+0(FP), R25
+ MOVW arg+4(FP), R4
+
+ MOVW R29, R3 // save original stack pointer
+ MOVW g, R2
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVW g_m(g), R5
+ MOVW m_gsignal(R5), R6
+ BEQ R6, g, g0
+ MOVW m_g0(R5), R6
+ BEQ R6, g, g0
+
+ JAL gosave_systemstack_switch<>(SB)
+ MOVW R6, g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+	// Save room for two of our pointers and an O32 frame.
+ ADDU $-24, R29
+ AND $~7, R29 // O32 ABI expects 8-byte aligned stack on function entry
+ MOVW R2, 16(R29) // save old g on stack
+ MOVW (g_stack+stack_hi)(R2), R2
+ SUBU R3, R2
+ MOVW R2, 20(R29) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ JAL (R25)
+
+ // Restore g, stack pointer. R2 is return value.
+ MOVW 16(R29), g
+ JAL runtime·save_g(SB)
+ MOVW (g_stack+stack_hi)(g), R5
+ MOVW 20(R29), R6
+ SUBU R6, R5
+ MOVW R5, R29
+
+ MOVW R2, ret+8(FP)
+ RET
+
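A minimal Go sketch (illustrative only; the type and function names below are not runtime identifiers) of the stack-depth trick asmcgocall uses above: rather than saving the old g's raw SP, which can go stale if the stack is copied during a callback, it saves the SP's depth below stack.hi and recomputes the SP afterwards.

package sketch

type stack struct{ lo, hi uintptr }

// saveDepth records how far below stack.hi the old g's stack pointer was
// before the C call.
func saveDepth(st stack, sp uintptr) uintptr { return st.hi - sp }

// restoreSP recomputes the stack pointer from the saved depth, using the
// possibly updated stack.hi after a stack copy.
func restoreSP(st stack, depth uintptr) uintptr { return st.hi - depth }
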
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$12-12
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; frame is the saved g.
+	// This is used to dropm while the thread is exiting.
+ MOVW fn+0(FP), R5
+ BNE R5, loadg
+ // Restore the g from frame.
+ MOVW frame+4(FP), g
+ JMP dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, nocgo
+ JAL runtime·load_g(SB)
+nocgo:
+
+	// If g is nil, Go did not create the current thread, or this
+	// thread never called into Go on pthread platforms.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ g, needm
+
+ MOVW g_m(g), R3
+ MOVW R3, savedm-4(SP)
+ JMP havem
+
+needm:
+ MOVW g, savedm-4(SP) // g is zero, so is m.
+ MOVW $runtime·needAndBindM(SB), R4
+ JAL (R4)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), R1
+ MOVW R29, (g_sched+gobuf_sp)(R1)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 4(R29) aka savedsp-8(SP).
+ MOVW m_g0(R3), R1
+ MOVW (g_sched+gobuf_sp)(R1), R2
+ MOVW R2, savedsp-12(SP) // must match frame size
+ MOVW R29, (g_sched+gobuf_sp)(R1)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVW m_curg(R3), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R2 // prepare stack as R2
+ MOVW (g_sched+gobuf_pc)(g), R4
+ MOVW R4, -(12+4)(R2) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVW fn+0(FP), R5
+ MOVW frame+4(FP), R6
+ MOVW ctxt+8(FP), R7
+ MOVW $-(12+4)(R2), R29 // switch stack; must match frame size
+ MOVW R5, 4(R29)
+ MOVW R6, 8(R29)
+ MOVW R7, 12(R29)
+ JAL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVW 0(R29), R4
+ MOVW R4, (g_sched+gobuf_pc)(g)
+ MOVW $(12+4)(R29), R2 // must match frame size
+ MOVW R2, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVW g_m(g), R3
+ MOVW m_g0(R3), g
+ JAL runtime·save_g(SB)
+ MOVW (g_sched+gobuf_sp)(g), R29
+ MOVW savedsp-12(SP), R2 // must match frame size
+ MOVW R2, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m,
+ // 1. for the duration of the call on non-pthread platforms,
+	// 2. or for the lifetime of the C thread on pthread platforms.
+ // If the m on entry wasn't nil,
+ // 1. the thread might be a Go thread,
+ // 2. or it wasn't the first call from a C thread on pthread platforms,
+ // since then we skip dropm to reuse the m in the first call.
+ MOVW savedm-4(SP), R3
+ BNE R3, droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOVW _cgo_pthread_key_created(SB), R3
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled and we need dropm.
+ BEQ R3, dropm
+ MOVW (R3), R3
+ BNE R3, droppedm
+
+dropm:
+ MOVW $runtime·dropm(SB), R4
+ JAL (R4)
+droppedm:
+
+ // Done!
+ RET
+
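For orientation, a hedged Go sketch of the dropm decision made at the end of cgocallback above; the type, variable, and function names here are illustrative stand-ins, not the runtime's own.

package sketch

type m struct{} // placeholder for the runtime's m structure

func dropm() {} // stand-in for runtime.dropm

var cgoPthreadKeyCreated *uintptr // stand-in for _cgo_pthread_key_created

// maybeDropM drops a borrowed m only when no pthread key destructor will do
// it later: a non-nil savedM means no m was borrowed at all, and a non-zero
// key means the destructor drops the m when the C thread exits.
func maybeDropM(savedM *m) {
	if savedM != nil {
		return
	}
	if cgoPthreadKeyCreated != nil && *cgoPthreadKeyCreated != 0 {
		return
	}
	dropm()
}
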
+// void setg(G*); set g. for use by needm.
+// This only happens if iscgo, so jump straight to save_g
+TEXT runtime·setg(SB),NOSPLIT,$0-4
+ MOVW gg+0(FP), g
+ JAL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g in C TLS.
+// Must obey the gcc calling convention.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ MOVW R4, g
+ JAL runtime·save_g(SB)
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT,$0-0
+ UNDEF
+
+// AES hashing not implemented for mips
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-16
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-12
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB),NOSPLIT,$0
+ MOVW $0, R1
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+ // g (R30), R3 and REGTMP (R23) might be clobbered by load_g. R30 and R23
+ // are callee-save in the gcc calling convention, so save them.
+ MOVW R23, R8
+ MOVW g, R9
+ MOVW R31, R10 // this call frame does not save LR
+
+ JAL runtime·load_g(SB)
+ MOVW g_m(g), R1
+ MOVW m_curg(R1), R1
+ MOVW (g_stack+stack_hi)(R1), R2 // return value in R2
+
+ MOVW R8, R23
+ MOVW R9, g
+ MOVW R10, R31
+
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ NOR R0, R0 // NOP
+ JAL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ NOR R0, R0 // NOP
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R1
+ MOVB R1, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R25, and returns a pointer
+// to the buffer space in R25.
+// It clobbers R23 (the linker temp register).
+// The act of CALLing gcWriteBarrier will clobber R31 (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$104
+ // Save the registers clobbered by the fast path.
+ MOVW R1, 100(R29)
+ MOVW R2, 104(R29)
+retry:
+ MOVW g_m(g), R1
+ MOVW m_p(R1), R1
+ MOVW (p_wbBuf+wbBuf_next)(R1), R2
+ MOVW (p_wbBuf+wbBuf_end)(R1), R23 // R23 is linker temp register
+ // Increment wbBuf.next position.
+ ADD R25, R2
+ // Is the buffer full?
+ SGTU R2, R23, R23
+ BNE R23, flush
+ // Commit to the larger buffer.
+ MOVW R2, (p_wbBuf+wbBuf_next)(R1)
+ // Make return value (the original next position)
+ SUB R25, R2, R25
+ // Restore registers.
+ MOVW 100(R29), R1
+ MOVW 104(R29), R2
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOVW R20, 4(R29)
+ MOVW R21, 8(R29)
+ // R1 already saved
+ // R2 already saved
+ MOVW R3, 12(R29)
+ MOVW R4, 16(R29)
+ MOVW R5, 20(R29)
+ MOVW R6, 24(R29)
+ MOVW R7, 28(R29)
+ MOVW R8, 32(R29)
+ MOVW R9, 36(R29)
+ MOVW R10, 40(R29)
+ MOVW R11, 44(R29)
+ MOVW R12, 48(R29)
+ MOVW R13, 52(R29)
+ MOVW R14, 56(R29)
+ MOVW R15, 60(R29)
+ MOVW R16, 64(R29)
+ MOVW R17, 68(R29)
+ MOVW R18, 72(R29)
+ MOVW R19, 76(R29)
+ MOVW R20, 80(R29)
+ // R21 already saved
+ // R22 already saved.
+ MOVW R22, 84(R29)
+ // R23 is tmp register.
+ MOVW R24, 88(R29)
+ MOVW R25, 92(R29)
+ // R26 is reserved by kernel.
+ // R27 is reserved by kernel.
+ MOVW R28, 96(R29)
+ // R29 is SP.
+ // R30 is g.
+ // R31 is LR, which was saved by the prologue.
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVW 4(R29), R20
+ MOVW 8(R29), R21
+ MOVW 12(R29), R3
+ MOVW 16(R29), R4
+ MOVW 20(R29), R5
+ MOVW 24(R29), R6
+ MOVW 28(R29), R7
+ MOVW 32(R29), R8
+ MOVW 36(R29), R9
+ MOVW 40(R29), R10
+ MOVW 44(R29), R11
+ MOVW 48(R29), R12
+ MOVW 52(R29), R13
+ MOVW 56(R29), R14
+ MOVW 60(R29), R15
+ MOVW 64(R29), R16
+ MOVW 68(R29), R17
+ MOVW 72(R29), R18
+ MOVW 76(R29), R19
+ MOVW 80(R29), R20
+ MOVW 84(R29), R22
+ MOVW 88(R29), R24
+ MOVW 92(R29), R25
+ MOVW 96(R29), R28
+ JMP retry
+
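For context, a rough Go sketch of the buffer fast path the assembly above implements: reserve n bytes in the per-P write barrier buffer, flushing when the reservation would overrun the end. The struct and flush method are illustrative stand-ins, not the runtime's real wbBuf.

package sketch

type wbBuf struct {
	next, end uintptr // current fill position and the end of the buffer
}

// flush is a stand-in for runtime.wbBufFlush; it drains the buffer and
// resets next back to the start of the buffer.
func (b *wbBuf) flush(start uintptr) { b.next = start }

// reserve mirrors the retry/flush loop above: bump next by n bytes and
// return the address of the reserved space, flushing first if it would
// overrun end.
func reserve(b *wbBuf, start, n uintptr) uintptr {
	for {
		if b.next+n <= b.end {
			b.next += n
			return b.next - n
		}
		b.flush(start)
	}
}
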
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $4, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $8, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $12, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $16, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $20, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $24, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $28, R25
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVW $32, R25
+ JMP gcWriteBarrier<>(SB)
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
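By way of example, ordinary Go code like the following is what reaches these stubs: for an out-of-range index the compiler emits a call to runtime.panicIndex with the offending index and the length.

package main

func get(s []int, i int) int {
	// If i is out of range, the compiled code calls runtime.panicIndex(i, len(s)),
	// which forwards to runtime.goPanicIndex below.
	return s[i]
}

func main() {
	_ = get([]int{1, 2, 3}, 5) // panics: index out of range [5] with length 3
}
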
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-8
+ MOVW R2, x+0(FP)
+ MOVW R3, y+4(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-8
+ MOVW R1, x+0(FP)
+ MOVW R2, y+4(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-8
+ MOVW R3, x+0(FP)
+ MOVW R4, y+4(FP)
+ JMP runtime·goPanicSliceConvert(SB)
+
+// Extended versions for 64-bit indexes.
+TEXT runtime·panicExtendIndex(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendIndex(SB)
+TEXT runtime·panicExtendIndexU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendIndexU(SB)
+TEXT runtime·panicExtendSliceAlen(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlen(SB)
+TEXT runtime·panicExtendSliceAlenU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAlenU(SB)
+TEXT runtime·panicExtendSliceAcap(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcap(SB)
+TEXT runtime·panicExtendSliceAcapU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSliceAcapU(SB)
+TEXT runtime·panicExtendSliceB(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceB(SB)
+TEXT runtime·panicExtendSliceBU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSliceBU(SB)
+TEXT runtime·panicExtendSlice3Alen(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Alen(SB)
+TEXT runtime·panicExtendSlice3AlenU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AlenU(SB)
+TEXT runtime·panicExtendSlice3Acap(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3Acap(SB)
+TEXT runtime·panicExtendSlice3AcapU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R3, lo+4(FP)
+ MOVW R4, y+8(FP)
+ JMP runtime·goPanicExtendSlice3AcapU(SB)
+TEXT runtime·panicExtendSlice3B(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3B(SB)
+TEXT runtime·panicExtendSlice3BU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R2, lo+4(FP)
+ MOVW R3, y+8(FP)
+ JMP runtime·goPanicExtendSlice3BU(SB)
+TEXT runtime·panicExtendSlice3C(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3C(SB)
+TEXT runtime·panicExtendSlice3CU(SB),NOSPLIT,$0-12
+ MOVW R5, hi+0(FP)
+ MOVW R1, lo+4(FP)
+ MOVW R2, y+8(FP)
+ JMP runtime·goPanicExtendSlice3CU(SB)
diff --git a/src/runtime/asm_ppc64x.h b/src/runtime/asm_ppc64x.h
new file mode 100644
index 0000000..65870fe
--- /dev/null
+++ b/src/runtime/asm_ppc64x.h
@@ -0,0 +1,55 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// FIXED_FRAME defines the size of the fixed part of a stack frame. A stack
+// frame looks like this:
+//
+// +---------------------+
+// | local variable area |
+// +---------------------+
+// | argument area |
+// +---------------------+ <- R1+FIXED_FRAME
+// | fixed area |
+// +---------------------+ <- R1
+//
+// So a function that sets up a stack frame at all uses at least FIXED_FRAME
+// bytes of stack. This mostly affects assembly that calls other functions
+// with arguments (the arguments should be stored at FIXED_FRAME+0(R1),
+// FIXED_FRAME+8(R1) etc) and some other low-level places.
+//
+// The reason for using a constant is to make supporting PIC easier (although
+// we only support PIC on ppc64le, which has a minimum stack frame of 32 bytes
+// and currently always uses that much; PIC on ppc64 would need to use 48).
+
+#define FIXED_FRAME 32
+
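As a small illustration (not part of this header; the names are assumptions of the sketch), the layout above means a caller stores its i-th 8-byte outgoing argument at R1 + FIXED_FRAME + 8*i:

package sketch

const fixedFrame = 32 // mirrors FIXED_FRAME above (ppc64le)

// argSlotOffset returns the offset from R1 (the stack pointer) at which a
// caller stores its i-th 8-byte outgoing argument, per the layout above.
func argSlotOffset(i int) int { return fixedFrame + 8*i }
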
+// aix/ppc64 uses XCOFF which uses function descriptors.
+// AIX cannot perform the TOC relocation in a text section.
+// Therefore, these descriptors must live in a data section.
+#ifdef GOOS_aix
+#ifdef GOARCH_ppc64
+#define GO_PPC64X_HAS_FUNCDESC
+#define DEFINE_PPC64X_FUNCDESC(funcname, localfuncname) \
+ DATA funcname+0(SB)/8, $localfuncname(SB) \
+ DATA funcname+8(SB)/8, $TOC(SB) \
+ DATA funcname+16(SB)/8, $0 \
+ GLOBL funcname(SB), NOPTR, $24
+#endif
+#endif
+
+// linux/ppc64 uses ELFv1 which uses function descriptors.
+// These must also look like ABI0 functions on linux/ppc64
+// to work with abi.FuncPCABI0(sigtramp) in os_linux.go.
+// Only static codegen is supported on linux/ppc64, so TOC
+// is not needed.
+#ifdef GOOS_linux
+#ifdef GOARCH_ppc64
+#define GO_PPC64X_HAS_FUNCDESC
+#define DEFINE_PPC64X_FUNCDESC(funcname, localfuncname) \
+ TEXT funcname(SB),NOSPLIT|NOFRAME,$0 \
+ DWORD $localfuncname(SB) \
+ DWORD $0 \
+ DWORD $0
+#endif
+#endif
diff --git a/src/runtime/asm_ppc64x.s b/src/runtime/asm_ppc64x.s
new file mode 100644
index 0000000..66d0447
--- /dev/null
+++ b/src/runtime/asm_ppc64x.s
@@ -0,0 +1,1299 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+#ifdef GOOS_aix
+#define cgoCalleeStackSize 48
+#else
+#define cgoCalleeStackSize 32
+#endif
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // R1 = stack; R3 = argc; R4 = argv; R13 = C TLS base pointer
+
+ // initialize essential registers
+ BL runtime·reginit(SB)
+
+ SUB $(FIXED_FRAME+16), R1
+ MOVD R2, 24(R1) // stash the TOC pointer away again now we've created a new frame
+ MOVW R3, FIXED_FRAME+0(R1) // argc
+ MOVD R4, FIXED_FRAME+8(R1) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ BL runtime·save_g(SB)
+ MOVD $(-64*1024), R31
+ ADD R31, R1, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+ MOVD R3, (g_stack+stack_lo)(g)
+ MOVD R1, (g_stack+stack_hi)(g)
+
+ // If there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R12
+ CMP R0, R12
+ BEQ nocgo
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+ // Load the real entry address from the first slot of the function descriptor.
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+#endif
+ MOVD R12, CTR // r12 = "global function entry point"
+ MOVD R13, R5 // arg 2: TLS base pointer
+ MOVD $setg_gcc<>(SB), R4 // arg 1: setg
+ MOVD g, R3 // arg 0: G
+ // C functions expect 32 (48 for AIX) bytes of space on caller
+ // stack frame and a 16-byte aligned R1
+ MOVD R1, R14 // save current stack
+ SUB $cgoCalleeStackSize, R1 // reserve the callee area
+ RLDCR $0, R1, $~15, R1 // 16-byte align
+ BL (CTR) // may clobber R0, R3-R12
+ MOVD R14, R1 // restore stack
+#ifndef GOOS_aix
+ MOVD 24(R1), R2
+#endif
+ XOR R0, R0 // fix R0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R3
+ ADD $const_stackGuard, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R3
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R3)
+ // save m0 to g0->m
+ MOVD R3, g_m(g)
+
+ BL runtime·check(SB)
+
+ // args are already prepared
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R3 // entry
+ MOVDU R3, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ BL runtime·newproc(SB)
+ ADD $(8+FIXED_FRAME), R1
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVD R0, 0(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ TW $31, R0, R0
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+// Any changes must be reflected in runtime/cgo/gcc_aix_ppc64.S:.crosscall_ppc64
+TEXT _cgo_reginit(SB),NOSPLIT|NOFRAME,$0-0
+ // crosscall_ppc64 and crosscall2 need to reginit, but can't
+ // get at the 'runtime.reginit' symbol.
+ BR runtime·reginit(SB)
+
+TEXT runtime·reginit(SB),NOSPLIT|NOFRAME,$0-0
+ // set R0 to zero, it's expected by the toolchain
+ XOR R0, R0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ BL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), R6
+ MOVD 0(R6), R4 // make sure g != nil
+ BR gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOVD R6, g
+ BL runtime·save_g(SB)
+
+ MOVD gobuf_sp(R5), R1
+ MOVD gobuf_lr(R5), R31
+#ifndef GOOS_aix
+ MOVD 24(R1), R2 // restore R2
+#endif
+ MOVD R31, LR
+ MOVD gobuf_ret(R5), R3
+ MOVD gobuf_ctxt(R5), R11
+ MOVD R0, gobuf_sp(R5)
+ MOVD R0, gobuf_ret(R5)
+ MOVD R0, gobuf_lr(R5)
+ MOVD R0, gobuf_ctxt(R5)
+ CMP R0, R0 // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-8
+ // Save caller state in g->sched
+ // R11 should be safe across save_g??
+ MOVD R3, R11
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD LR, R31
+ MOVD R31, (g_sched+gobuf_pc)(g)
+ MOVD R0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ BR runtime·badmcall(SB)
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ MOVD (g_sched+gobuf_sp)(g), R1 // sp = m->g0->sched.sp
+ // Don't need to do anything special for regabiargs here
+ // R3 is g; stack is set anyway
+ MOVDU R3, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ MOVDU R0, -8(R1)
+ BL (CTR)
+ MOVD 24(R1), R2
+ BR runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ // We have several undefs here so that 16 bytes past
+ // $runtime·systemstack_switch lies within them whether or not the
+ // instructions that derive r2 from r12 are there.
+ UNDEF
+ UNDEF
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R11 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMP g, R5
+ BEQ noswitch
+
+ MOVD m_curg(R4), R6
+ CMP g, R6
+ BEQ switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+ BL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ BL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+
+ // call target function
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ BL (CTR)
+
+ // restore TOC pointer. It seems unlikely that we will use systemstack
+ // to call a function defined in another module, but the results of
+ // doing so would be so confusing that it's worth doing this.
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ MOVD (g_sched+gobuf_sp)(g), R3
+#ifndef GOOS_aix
+ MOVD 24(R3), R2
+#endif
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVD R0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // On other arches we do a tail call here, but it appears to be
+ // impossible to tail call a function pointer in shared mode on
+ // ppc64 because the caller is responsible for restoring the TOC.
+ MOVD 0(R11), R12 // code pointer
+ MOVD R12, CTR
+ BL (CTR)
+#ifndef GOOS_aix
+ MOVD 24(R1), R2
+#endif
+ RET
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3: framesize, R4: argsize, R5: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackg0(SB)
+ BL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ BL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD LR, R8
+ MOVD R8, (g_sched+gobuf_pc)(g)
+ MOVD R5, (g_sched+gobuf_lr)(g)
+ MOVD R11, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVD R5, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVD R1, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVD g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R7), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVDU R0, -(FIXED_FRAME+0)(R1) // create a call frame on g0
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+	// register (R5), which the unwinder currently doesn't understand.
+	// Make it SPWRITE to stop unwinding. (See issue 54332)
+	// Use OR R0, R1 instead of MOVD R1, R1 as the MOVD instruction
+	// has a special effect on Power8, 9 and 10 by lowering the thread
+	// priority and causing a slowdown in execution time.
+
+ OR R0, R1
+ MOVD R0, R11
+ BR runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
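A rough Go sketch (purely illustrative; pickCallFrame is not a runtime function) of the selection the DISPATCH chain below performs: jump to the smallest fixed-frame call<N> whose frame can hold frameSize, or to badreflectcall if none can.

package sketch

// pickCallFrame mirrors the DISPATCH chain below: take the smallest
// power-of-two frame size from 16 up to 1<<30 that can hold frameSize,
// or report failure (badreflectcall).
func pickCallFrame(frameSize uint32) (size uint32, ok bool) {
	for size = 16; size <= 1<<30; size <<= 1 {
		if frameSize <= size {
			return size, true // corresponds to runtime·call<size>
		}
	}
	return 0, false // corresponds to runtime·badreflectcall
}
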
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R31; \
+ CMP R3, R31; \
+ BGT 4(PC); \
+ MOVD $NAME(SB), R12; \
+ MOVD R12, CTR; \
+ BR (CTR)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-48
+ MOVWZ frameSize+32(FP), R3
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD stackArgs+16(FP), R3; \
+ MOVWZ stackArgsSize+24(FP), R4; \
+ MOVD R1, R5; \
+ CMP R4, $8; \
+ BLT tailsetup; \
+ /* copy 8 at a time if possible */ \
+ ADD $(FIXED_FRAME-8), R5; \
+ SUB $8, R3; \
+top: \
+ MOVDU 8(R3), R7; \
+ MOVDU R7, 8(R5); \
+ SUB $8, R4; \
+ CMP R4, $8; \
+ BGE top; \
+ /* handle remaining bytes */ \
+ CMP $0, R4; \
+ BEQ callfn; \
+ ADD $7, R3; \
+ ADD $7, R5; \
+ BR tail; \
+tailsetup: \
+ CMP $0, R4; \
+ BEQ callfn; \
+ ADD $(FIXED_FRAME-1), R5; \
+ SUB $1, R3; \
+tail: \
+ MOVBU 1(R3), R6; \
+ MOVBU R6, 1(R5); \
+ SUB $1, R4; \
+ CMP $0, R4; \
+ BGT tail; \
+callfn: \
+ /* call function */ \
+ MOVD f+8(FP), R11; \
+#ifdef GOOS_aix \
+ /* AIX won't trigger a SIGSEGV if R11 = nil */ \
+ /* So it manually triggers it */ \
+ CMP R0, R11 \
+ BNE 2(PC) \
+ MOVD R0, 0(R0) \
+#endif \
+ MOVD regArgs+40(FP), R20; \
+ BL runtime·unspillArgs(SB); \
+ MOVD (R11), R12; \
+ MOVD R12, CTR; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (CTR); \
+#ifndef GOOS_aix \
+ MOVD 24(R1), R2; \
+#endif \
+ /* copy return values back */ \
+ MOVD regArgs+40(FP), R20; \
+ BL runtime·spillArgs(SB); \
+ MOVD stackArgsType+0(FP), R7; \
+ MOVD stackArgs+16(FP), R3; \
+ MOVWZ stackArgsSize+24(FP), R4; \
+ MOVWZ stackRetOffset+28(FP), R6; \
+ ADD $FIXED_FRAME, R1, R5; \
+ ADD R6, R5; \
+ ADD R6, R3; \
+ SUB R6, R4; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ NO_LOCAL_POINTERS
+ MOVD R7, FIXED_FRAME+0(R1)
+ MOVD R3, FIXED_FRAME+8(R1)
+ MOVD R5, FIXED_FRAME+16(R1)
+ MOVD R4, FIXED_FRAME+24(R1)
+ MOVD R20, FIXED_FRAME+32(R1)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·procyield(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW cycles+0(FP), R7
+ // POWER does not have a pause/yield instruction equivalent.
+ // Instead, we can lower the program priority by setting the
+ // Program Priority Register prior to the wait loop and set it
+ // back to default afterwards. On Linux, the default priority is
+ // medium-low. For details, see page 837 of the ISA 3.0.
+ OR R1, R1, R1 // Set PPR priority to low
+again:
+ SUB $1, R7
+ CMP $0, R7
+ BNE again
+ OR R6, R6, R6 // Set PPR priority back to medium-low
+ RET
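
For reference, a hedged Go sketch of how a caller typically uses this routine: spin briefly with procyield before falling back to a heavier wait. Both functions here are illustrative stand-ins, not runtime APIs, and the spin counts are placeholders.

package sketch

// procyield is a stand-in for the assembly routine above: it spins for
// roughly `cycles` iterations at lowered hardware thread priority.
func procyield(cycles uint32) {
	for i := uint32(0); i < cycles; i++ {
	}
}

// spinThenWait shows the typical caller pattern: spin briefly with
// procyield before falling back to a heavier wait.
func spinThenWait(tryAcquire func() bool, block func()) {
	for i := 0; i < 4; i++ {
		if tryAcquire() {
			return
		}
		procyield(30)
	}
	block()
}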
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R31.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·systemstack_switch(SB), R31
+ ADD $16, R31 // get past prologue (including r2-setting instructions when they're there)
+ MOVD R31, (g_sched+gobuf_pc)(g)
+ MOVD R1, (g_sched+gobuf_sp)(g)
+ MOVD R0, (g_sched+gobuf_lr)(g)
+ MOVD R0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R31
+ CMP R0, R31
+ BEQ 2(PC)
+ BL runtime·abort(SB)
+ RET
+
+#ifdef GOOS_aix
+#define asmcgocallSaveOffset cgoCalleeStackSize + 8
+#else
+#define asmcgocallSaveOffset cgoCalleeStackSize
+#endif
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOVD fn+0(FP), R3
+ MOVD arg+8(FP), R4
+
+ MOVD R1, R7 // save original stack pointer
+ MOVD g, R5
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVD g_m(g), R8
+ MOVD m_gsignal(R8), R6
+ CMP R6, g
+ BEQ g0
+ MOVD m_g0(R8), R6
+ CMP R6, g
+ BEQ g0
+ BL gosave_systemstack_switch<>(SB)
+ MOVD R6, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+#ifdef GOOS_aix
+ // Create a fake LR to improve backtrace.
+ MOVD $runtime·asmcgocall(SB), R6
+ MOVD R6, 16(R1)
+	// AIX also saves one argument on the stack.
+ SUB $8, R1
+#endif
+ // Save room for two of our pointers, plus the callee
+ // save area that lives on the caller stack.
+ SUB $(asmcgocallSaveOffset+16), R1
+ RLDCR $0, R1, $~15, R1 // 16-byte alignment for gcc ABI
+ MOVD R5, (asmcgocallSaveOffset+8)(R1)// save old g on stack
+ MOVD (g_stack+stack_hi)(R5), R5
+ SUB R7, R5
+ MOVD R5, asmcgocallSaveOffset(R1) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+#ifdef GOOS_aix
+ MOVD R7, 0(R1) // Save frame pointer to allow manual backtrace with gdb
+#else
+ MOVD R0, 0(R1) // clear back chain pointer (TODO can we give it real back trace information?)
+#endif
+ // This is a "global call", so put the global entry point in r12
+ MOVD R3, R12
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+ // Load the real entry address from the first slot of the function descriptor.
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+#endif
+ MOVD R12, CTR
+ MOVD R4, R3 // arg in r3
+ BL (CTR)
+ // C code can clobber R0, so set it back to 0. F27-F31 are
+ // callee save, so we don't need to recover those.
+ XOR R0, R0
+ // Restore g, stack pointer, toc pointer.
+ // R3 is errno, so don't touch it
+ MOVD (asmcgocallSaveOffset+8)(R1), g
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD asmcgocallSaveOffset(R1), R6
+ SUB R6, R5
+#ifndef GOOS_aix
+ MOVD 24(R5), R2
+#endif
+ MOVD R5, R1
+ BL runtime·save_g(SB)
+
+ MOVW R3, ret+16(FP)
+ RET
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; frame is the saved g.
+	// This is used to dropm while the thread is exiting.
+ MOVD fn+0(FP), R5
+ CMP R5, $0
+ BNE loadg
+ // Restore the g from frame.
+ MOVD frame+8(FP), g
+ BR dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVBZ runtime·iscgo(SB), R3
+ CMP R3, $0
+ BEQ nocgo
+ BL runtime·load_g(SB)
+nocgo:
+
+	// If g is nil, Go did not create the current thread, or this
+	// thread never called into Go on pthread platforms.
+ // Call needm to obtain one for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMP g, $0
+ BEQ needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ BR havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needAndBindM(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD R1, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(R1) aka savedsp-16(SP).
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-24(SP) // must match frame size
+ MOVD R1, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -(24+FIXED_FRAME)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R5
+ MOVD frame+8(FP), R6
+ MOVD ctxt+16(FP), R7
+ MOVD $-(24+FIXED_FRAME)(R4), R1 // switch stack; must match frame size
+ MOVD R5, FIXED_FRAME+0(R1)
+ MOVD R6, FIXED_FRAME+8(R1)
+ MOVD R7, FIXED_FRAME+16(R1)
+
+ MOVD $runtime·cgocallbackg(SB), R12
+ MOVD R12, CTR
+ CALL (CTR) // indirect call to bypass nosplit check. We're on a different stack now.
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(R1), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD $(24+FIXED_FRAME)(R1), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R1
+ MOVD savedsp-24(SP), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m,
+ // 1. for the duration of the call on non-pthread platforms,
+	// 2. or for the lifetime of the C thread on pthread platforms.
+ // If the m on entry wasn't nil,
+ // 1. the thread might be a Go thread,
+ // 2. or it wasn't the first call from a C thread on pthread platforms,
+ // since then we skip dropm to reuse the m in the first call.
+ MOVD savedm-8(SP), R6
+ CMP R6, $0
+ BNE droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOVD _cgo_pthread_key_created(SB), R6
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled and we need dropm.
+ CMP R6, $0
+ BEQ dropm
+ MOVD (R6), R6
+ CMP R6, $0
+ BNE droppedm
+
+dropm:
+ MOVD $runtime·dropm(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+DEFINE_PPC64X_FUNCDESC(setg_gcc<>, _setg_gcc<>)
+TEXT _setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+#else
+TEXT setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+#endif
+ // The standard prologue clobbers R31, which is callee-save in
+ // the C ABI, so we have to use $-8-0 and save LR ourselves.
+ MOVD LR, R4
+ // Also save g and R31, since they're callee-save in C ABI
+ MOVD R31, R5
+ MOVD g, R6
+
+ MOVD R3, g
+ BL runtime·save_g(SB)
+
+ MOVD R6, g
+ MOVD R5, R31
+ MOVD R4, LR
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+#define TBR 268
+
+// int64 runtime·cputicks(void)
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ MOVD SPR(TBR), R3
+ MOVD R3, ret+0(FP)
+ RET
+
+// spillArgs stores return values from registers to a *internal/abi.RegArgs in R20.
+TEXT runtime·spillArgs(SB),NOSPLIT,$0-0
+ MOVD R3, 0(R20)
+ MOVD R4, 8(R20)
+ MOVD R5, 16(R20)
+ MOVD R6, 24(R20)
+ MOVD R7, 32(R20)
+ MOVD R8, 40(R20)
+ MOVD R9, 48(R20)
+ MOVD R10, 56(R20)
+ MOVD R14, 64(R20)
+ MOVD R15, 72(R20)
+ MOVD R16, 80(R20)
+ MOVD R17, 88(R20)
+ FMOVD F1, 96(R20)
+ FMOVD F2, 104(R20)
+ FMOVD F3, 112(R20)
+ FMOVD F4, 120(R20)
+ FMOVD F5, 128(R20)
+ FMOVD F6, 136(R20)
+ FMOVD F7, 144(R20)
+ FMOVD F8, 152(R20)
+ FMOVD F9, 160(R20)
+ FMOVD F10, 168(R20)
+ FMOVD F11, 176(R20)
+ FMOVD F12, 184(R20)
+ RET
+
+// unspillArgs loads args into registers from a *internal/abi.RegArgs in R20.
+TEXT runtime·unspillArgs(SB),NOSPLIT,$0-0
+ MOVD 0(R20), R3
+ MOVD 8(R20), R4
+ MOVD 16(R20), R5
+ MOVD 24(R20), R6
+ MOVD 32(R20), R7
+ MOVD 40(R20), R8
+ MOVD 48(R20), R9
+ MOVD 56(R20), R10
+ MOVD 64(R20), R14
+ MOVD 72(R20), R15
+ MOVD 80(R20), R16
+ MOVD 88(R20), R17
+ FMOVD 96(R20), F1
+ FMOVD 104(R20), F2
+ FMOVD 112(R20), F3
+ FMOVD 120(R20), F4
+ FMOVD 128(R20), F5
+ FMOVD 136(R20), F6
+ FMOVD 144(R20), F7
+ FMOVD 152(R20), F8
+ FMOVD 160(R20), F9
+ FMOVD 168(R20), F10
+ FMOVD 176(R20), F11
+ FMOVD 184(R20), F12
+ RET
+
+// AES hashing not implemented for ppc64
+TEXT runtime·memhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback<ABIInternal>(SB)
+TEXT runtime·strhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback<ABIInternal>(SB)
+TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback<ABIInternal>(SB)
+TEXT runtime·memhash64<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback<ABIInternal>(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R3
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+#ifdef GOOS_aix
+// On AIX, _cgo_topofstack is defined in runtime/cgo, because it must
+// be a longcall in order to prevent trampolines from ld.
+TEXT __cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+#else
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+#endif
+ // g (R30) and R31 are callee-save in the C ABI, so save them
+ MOVD g, R4
+ MOVD R31, R5
+ MOVD LR, R6
+
+ BL runtime·load_g(SB) // clobbers g (R30), R31
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), R3
+ MOVD (g_stack+stack_hi)(R3), R3
+
+ MOVD R4, g
+ MOVD R5, R31
+ MOVD R6, LR
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+//
+// When dynamically linking Go, it can be returned to from a function
+// implemented in a different module and so needs to reload the TOC pointer
+// from the stack (although this function declares that it does not set up a
+// frame, newproc1 does in fact allocate one for goexit and saves the TOC
+// pointer in the correct place).
+// goexit+_PCQuantum is halfway through the usual global entry point prologue
+// that derives r2 from r12, which is a bit silly but not harmful.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOVD 24(R1), R2
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOVD R0, R0 // NOP
+
+// prepGoExitFrame saves the current TOC pointer (i.e. the TOC pointer for the
+// module containing runtime) to the frame that goexit will execute in when
+// the goroutine exits. It's implemented in assembly mainly because that's the
+// easiest way to get access to R2.
+TEXT runtime·prepGoExitFrame(SB),NOSPLIT,$0-8
+ MOVD sp+0(FP), R3
+ MOVD R2, 24(R3)
+ RET
+
+TEXT runtime·addmoduledata(SB),NOSPLIT|NOFRAME,$0-0
+ ADD $-8, R1
+ MOVD R31, 0(R1)
+ MOVD runtime·lastmoduledatap(SB), R4
+ MOVD R3, moduledata_next(R4)
+ MOVD R3, runtime·lastmoduledatap(SB)
+ MOVD 0(R1), R31
+ ADD $8, R1
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVW $1, R3
+ MOVB R3, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R29, and returns a pointer
+// to the buffer space in R29.
+// It clobbers condition codes.
+// It does not clobber R0 through R17 (except special registers),
+// but may clobber any other register, *including* R31.
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$120
+ // The standard prologue clobbers R31.
+ // We use R18, R19, and R31 as scratch registers.
+retry:
+ MOVD g_m(g), R18
+ MOVD m_p(R18), R18
+ MOVD (p_wbBuf+wbBuf_next)(R18), R19
+ MOVD (p_wbBuf+wbBuf_end)(R18), R31
+ // Increment wbBuf.next position.
+ ADD R29, R19
+ // Is the buffer full?
+ CMPU R31, R19
+ BLT flush
+ // Commit to the larger buffer.
+ MOVD R19, (p_wbBuf+wbBuf_next)(R18)
+ // Make return value (the original next position)
+ SUB R29, R19, R29
+ RET
+
+flush:
+ // Save registers R0 through R15 since these were not saved by the caller.
+ // We don't save all registers on ppc64 because it takes too much space.
+ MOVD R20, (FIXED_FRAME+0)(R1)
+ MOVD R21, (FIXED_FRAME+8)(R1)
+ // R0 is always 0, so no need to spill.
+ // R1 is SP.
+ // R2 is SB.
+ MOVD R3, (FIXED_FRAME+16)(R1)
+ MOVD R4, (FIXED_FRAME+24)(R1)
+ MOVD R5, (FIXED_FRAME+32)(R1)
+ MOVD R6, (FIXED_FRAME+40)(R1)
+ MOVD R7, (FIXED_FRAME+48)(R1)
+ MOVD R8, (FIXED_FRAME+56)(R1)
+ MOVD R9, (FIXED_FRAME+64)(R1)
+ MOVD R10, (FIXED_FRAME+72)(R1)
+ // R11, R12 may be clobbered by external-linker-inserted trampoline
+ // R13 is REGTLS
+ MOVD R14, (FIXED_FRAME+80)(R1)
+ MOVD R15, (FIXED_FRAME+88)(R1)
+ MOVD R16, (FIXED_FRAME+96)(R1)
+ MOVD R17, (FIXED_FRAME+104)(R1)
+ MOVD R29, (FIXED_FRAME+112)(R1)
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOVD (FIXED_FRAME+0)(R1), R20
+ MOVD (FIXED_FRAME+8)(R1), R21
+ MOVD (FIXED_FRAME+16)(R1), R3
+ MOVD (FIXED_FRAME+24)(R1), R4
+ MOVD (FIXED_FRAME+32)(R1), R5
+ MOVD (FIXED_FRAME+40)(R1), R6
+ MOVD (FIXED_FRAME+48)(R1), R7
+ MOVD (FIXED_FRAME+56)(R1), R8
+ MOVD (FIXED_FRAME+64)(R1), R9
+ MOVD (FIXED_FRAME+72)(R1), R10
+ MOVD (FIXED_FRAME+80)(R1), R14
+ MOVD (FIXED_FRAME+88)(R1), R15
+ MOVD (FIXED_FRAME+96)(R1), R16
+ MOVD (FIXED_FRAME+104)(R1), R17
+ MOVD (FIXED_FRAME+112)(R1), R29
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $8, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $16, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $24, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $32, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $40, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $48, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $56, R29
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $64, R29
+ JMP gcWriteBarrier<>(SB)
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicIndex<ABIInternal>(SB)
+TEXT runtime·panicIndexU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicIndexU<ABIInternal>(SB)
+TEXT runtime·panicSliceAlen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSliceAlen<ABIInternal>(SB)
+TEXT runtime·panicSliceAlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSliceAlenU<ABIInternal>(SB)
+TEXT runtime·panicSliceAcap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSliceAcap<ABIInternal>(SB)
+TEXT runtime·panicSliceAcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSliceAcapU<ABIInternal>(SB)
+TEXT runtime·panicSliceB<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSliceB<ABIInternal>(SB)
+TEXT runtime·panicSliceBU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSliceBU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Alen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R5, R3
+ MOVD R6, R4
+ JMP runtime·goPanicSlice3Alen<ABIInternal>(SB)
+TEXT runtime·panicSlice3AlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R5, R3
+ MOVD R6, R4
+ JMP runtime·goPanicSlice3AlenU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Acap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R5, R3
+ MOVD R6, R4
+ JMP runtime·goPanicSlice3Acap<ABIInternal>(SB)
+TEXT runtime·panicSlice3AcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R5, R3
+ MOVD R6, R4
+ JMP runtime·goPanicSlice3AcapU<ABIInternal>(SB)
+TEXT runtime·panicSlice3B<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSlice3B<ABIInternal>(SB)
+TEXT runtime·panicSlice3BU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R4, R3
+ MOVD R5, R4
+ JMP runtime·goPanicSlice3BU<ABIInternal>(SB)
+TEXT runtime·panicSlice3C<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSlice3C<ABIInternal>(SB)
+TEXT runtime·panicSlice3CU<ABIInternal>(SB),NOSPLIT,$0-16
+ JMP runtime·goPanicSlice3CU<ABIInternal>(SB)
+TEXT runtime·panicSliceConvert<ABIInternal>(SB),NOSPLIT,$0-16
+ MOVD R5, R3
+ MOVD R6, R4
+ JMP runtime·goPanicSliceConvert<ABIInternal>(SB)
+
+// These functions are used when internally linking cgo with external
+// objects compiled with gcc -Os. They reduce prologue/epilogue
+// size by deferring preservation of callee-save registers to a shared
+// function. These are defined in PPC64 ELFv2 2.3.3 (but also present
+// in ELFv1)
+//
+// These appear unused, but the linker will redirect calls to functions
+// like _savegpr0_14 or _restgpr1_14 to runtime.elf_savegpr0 or
+// runtime.elf_restgpr1 with an appropriate offset based on the number of
+// register operations required when linking external objects that
+// make these calls. For GPR/FPR saves, the minimum register value is
+// 14, for VR it is 20.
+//
+// These are only used when linking such cgo code internally. Note, R12
+// and R0 may be used in different ways than regular ELF compliant
+// functions.
+TEXT runtime·elf_savegpr0(SB),NOSPLIT|NOFRAME,$0
+ // R0 holds the LR of the caller's caller, R1 holds save location
+ MOVD R14, -144(R1)
+ MOVD R15, -136(R1)
+ MOVD R16, -128(R1)
+ MOVD R17, -120(R1)
+ MOVD R18, -112(R1)
+ MOVD R19, -104(R1)
+ MOVD R20, -96(R1)
+ MOVD R21, -88(R1)
+ MOVD R22, -80(R1)
+ MOVD R23, -72(R1)
+ MOVD R24, -64(R1)
+ MOVD R25, -56(R1)
+ MOVD R26, -48(R1)
+ MOVD R27, -40(R1)
+ MOVD R28, -32(R1)
+ MOVD R29, -24(R1)
+ MOVD g, -16(R1)
+ MOVD R31, -8(R1)
+ MOVD R0, 16(R1)
+ RET
+TEXT runtime·elf_restgpr0(SB),NOSPLIT|NOFRAME,$0
+ // R1 holds save location. This returns to the LR saved on stack (bypassing the caller)
+ MOVD -144(R1), R14
+ MOVD -136(R1), R15
+ MOVD -128(R1), R16
+ MOVD -120(R1), R17
+ MOVD -112(R1), R18
+ MOVD -104(R1), R19
+ MOVD -96(R1), R20
+ MOVD -88(R1), R21
+ MOVD -80(R1), R22
+ MOVD -72(R1), R23
+ MOVD -64(R1), R24
+ MOVD -56(R1), R25
+ MOVD -48(R1), R26
+ MOVD -40(R1), R27
+ MOVD -32(R1), R28
+ MOVD -24(R1), R29
+ MOVD -16(R1), g
+ MOVD -8(R1), R31
+ MOVD 16(R1), R0 // Load and return to saved LR
+ MOVD R0, LR
+ RET
+TEXT runtime·elf_savegpr1(SB),NOSPLIT|NOFRAME,$0
+ // R12 holds the save location
+ MOVD R14, -144(R12)
+ MOVD R15, -136(R12)
+ MOVD R16, -128(R12)
+ MOVD R17, -120(R12)
+ MOVD R18, -112(R12)
+ MOVD R19, -104(R12)
+ MOVD R20, -96(R12)
+ MOVD R21, -88(R12)
+ MOVD R22, -80(R12)
+ MOVD R23, -72(R12)
+ MOVD R24, -64(R12)
+ MOVD R25, -56(R12)
+ MOVD R26, -48(R12)
+ MOVD R27, -40(R12)
+ MOVD R28, -32(R12)
+ MOVD R29, -24(R12)
+ MOVD g, -16(R12)
+ MOVD R31, -8(R12)
+ RET
+TEXT runtime·elf_restgpr1(SB),NOSPLIT|NOFRAME,$0
+ // R12 holds the save location
+ MOVD -144(R12), R14
+ MOVD -136(R12), R15
+ MOVD -128(R12), R16
+ MOVD -120(R12), R17
+ MOVD -112(R12), R18
+ MOVD -104(R12), R19
+ MOVD -96(R12), R20
+ MOVD -88(R12), R21
+ MOVD -80(R12), R22
+ MOVD -72(R12), R23
+ MOVD -64(R12), R24
+ MOVD -56(R12), R25
+ MOVD -48(R12), R26
+ MOVD -40(R12), R27
+ MOVD -32(R12), R28
+ MOVD -24(R12), R29
+ MOVD -16(R12), g
+ MOVD -8(R12), R31
+ RET
+TEXT runtime·elf_savefpr(SB),NOSPLIT|NOFRAME,$0
+ // R0 holds the LR of the caller's caller, R1 holds save location
+ FMOVD F14, -144(R1)
+ FMOVD F15, -136(R1)
+ FMOVD F16, -128(R1)
+ FMOVD F17, -120(R1)
+ FMOVD F18, -112(R1)
+ FMOVD F19, -104(R1)
+ FMOVD F20, -96(R1)
+ FMOVD F21, -88(R1)
+ FMOVD F22, -80(R1)
+ FMOVD F23, -72(R1)
+ FMOVD F24, -64(R1)
+ FMOVD F25, -56(R1)
+ FMOVD F26, -48(R1)
+ FMOVD F27, -40(R1)
+ FMOVD F28, -32(R1)
+ FMOVD F29, -24(R1)
+ FMOVD F30, -16(R1)
+ FMOVD F31, -8(R1)
+ MOVD R0, 16(R1)
+ RET
+TEXT runtime·elf_restfpr(SB),NOSPLIT|NOFRAME,$0
+ // R1 holds save location. This returns to the LR saved on stack (bypassing the caller)
+ FMOVD -144(R1), F14
+ FMOVD -136(R1), F15
+ FMOVD -128(R1), F16
+ FMOVD -120(R1), F17
+ FMOVD -112(R1), F18
+ FMOVD -104(R1), F19
+ FMOVD -96(R1), F20
+ FMOVD -88(R1), F21
+ FMOVD -80(R1), F22
+ FMOVD -72(R1), F23
+ FMOVD -64(R1), F24
+ FMOVD -56(R1), F25
+ FMOVD -48(R1), F26
+ FMOVD -40(R1), F27
+ FMOVD -32(R1), F28
+ FMOVD -24(R1), F29
+ FMOVD -16(R1), F30
+ FMOVD -8(R1), F31
+ MOVD 16(R1), R0 // Load and return to saved LR
+ MOVD R0, LR
+ RET
+TEXT runtime·elf_savevr(SB),NOSPLIT|NOFRAME,$0
+ // R0 holds the save location, R12 is clobbered
+ MOVD $-192, R12
+ STVX V20, (R0+R12)
+ MOVD $-176, R12
+ STVX V21, (R0+R12)
+ MOVD $-160, R12
+ STVX V22, (R0+R12)
+ MOVD $-144, R12
+ STVX V23, (R0+R12)
+ MOVD $-128, R12
+ STVX V24, (R0+R12)
+ MOVD $-112, R12
+ STVX V25, (R0+R12)
+ MOVD $-96, R12
+ STVX V26, (R0+R12)
+ MOVD $-80, R12
+ STVX V27, (R0+R12)
+ MOVD $-64, R12
+ STVX V28, (R0+R12)
+ MOVD $-48, R12
+ STVX V29, (R0+R12)
+ MOVD $-32, R12
+ STVX V30, (R0+R12)
+ MOVD $-16, R12
+ STVX V31, (R0+R12)
+ RET
+TEXT runtime·elf_restvr(SB),NOSPLIT|NOFRAME,$0
+ // R0 holds the save location, R12 is clobbered
+ MOVD $-192, R12
+ LVX (R0+R12), V20
+ MOVD $-176, R12
+ LVX (R0+R12), V21
+ MOVD $-160, R12
+ LVX (R0+R12), V22
+ MOVD $-144, R12
+ LVX (R0+R12), V23
+ MOVD $-128, R12
+ LVX (R0+R12), V24
+ MOVD $-112, R12
+ LVX (R0+R12), V25
+ MOVD $-96, R12
+ LVX (R0+R12), V26
+ MOVD $-80, R12
+ LVX (R0+R12), V27
+ MOVD $-64, R12
+ LVX (R0+R12), V28
+ MOVD $-48, R12
+ LVX (R0+R12), V29
+ MOVD $-32, R12
+ LVX (R0+R12), V30
+ MOVD $-16, R12
+ LVX (R0+R12), V31
+ RET
diff --git a/src/runtime/asm_riscv64.s b/src/runtime/asm_riscv64.s
new file mode 100644
index 0000000..eb53cbb
--- /dev/null
+++ b/src/runtime/asm_riscv64.s
@@ -0,0 +1,935 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// func rt0_go()
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // X2 = stack; A0 = argc; A1 = argv
+ ADD $-24, X2
+ MOV A0, 8(X2) // argc
+ MOV A1, 16(X2) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOV $runtime·g0(SB), g
+ MOV $(-64*1024), T0
+ ADD T0, X2, T1
+ MOV T1, g_stackguard0(g)
+ MOV T1, g_stackguard1(g)
+ MOV T1, (g_stack+stack_lo)(g)
+ MOV X2, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOV _cgo_init(SB), T0
+ BEQ T0, ZERO, nocgo
+
+ MOV ZERO, A3 // arg 3: not used
+ MOV ZERO, A2 // arg 2: not used
+ MOV $setg_gcc<>(SB), A1 // arg 1: setg
+ MOV g, A0 // arg 0: G
+ JALR RA, T0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOV (g_stack+stack_lo)(g), T0
+ ADD $const_stackGuard, T0
+ MOV T0, g_stackguard0(g)
+ MOV T0, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOV $runtime·m0(SB), T0
+
+ // save m->g0 = g0
+ MOV g, m_g0(T0)
+ // save m0 to g0->m
+ MOV T0, g_m(g)
+
+ CALL runtime·check(SB)
+
+ // args are already prepared
+ CALL runtime·args(SB)
+ CALL runtime·osinit(SB)
+ CALL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOV $runtime·mainPC(SB), T0 // entry
+ ADD $-16, X2
+ MOV T0, 8(X2)
+ MOV ZERO, 0(X2)
+ CALL runtime·newproc(SB)
+ ADD $16, X2
+
+ // start this M
+ CALL runtime·mstart(SB)
+
+ WORD $0 // crash if reached
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL runtime·mstart0(SB)
+ RET // not reached
+
+// void setg_gcc(G*); set g called from gcc with g in A0
+TEXT setg_gcc<>(SB),NOSPLIT,$0-0
+ MOV A0, g
+ CALL runtime·save_g(SB)
+ RET
+
+// func cputicks() int64
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ // RDTIME to emulate cpu ticks
+	// RDCYCLE reads a counter that is per-hart (per-core)
+	// according to the RISC-V manual; see issue 46737.
+ RDTIME A0
+ MOV A0, ret+0(FP)
+ RET
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ JALR RA, ZERO // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOV fn+0(FP), CTXT // CTXT = fn
+ MOV g_m(g), T0 // T0 = m
+
+ MOV m_gsignal(T0), T1 // T1 = gsignal
+ BEQ g, T1, noswitch
+
+ MOV m_g0(T0), T1 // T1 = g0
+ BEQ g, T1, noswitch
+
+ MOV m_curg(T0), T2
+ BEQ g, T2, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOV $runtime·badsystemstack(SB), T1
+ JALR RA, T1
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ CALL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOV T1, g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), T0
+ MOV T0, X2
+
+ // call target function
+ MOV 0(CTXT), T1 // code pointer
+ JALR RA, T1
+
+ // switch back to g
+ MOV g_m(g), T0
+ MOV m_curg(T0), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ MOV ZERO, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOV 0(CTXT), T1 // code pointer
+ ADD $8, X2
+ JMP (T1)
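+	// From Go code in the runtime this is reached as systemstack(func() { ... }),
+	// which runs fn on the g0 (system) stack, switching only if not already on
+	// it, and returns on the original stack.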
+
+TEXT runtime·getcallerpc(SB),NOSPLIT|NOFRAME,$0-8
+ MOV 0(X2), T0 // LR saved by caller
+ MOV T0, ret+0(FP)
+ RET
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Called with return address (i.e. caller's PC) in X5 (aka T0),
+// and the LR register contains the caller's LR.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+
+// func morestack()
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOV g_m(g), A0
+ MOV m_g0(A0), A1
+ BNE g, A1, 3(PC)
+ CALL runtime·badmorestackg0(SB)
+ CALL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOV m_gsignal(A0), A1
+ BNE g, A1, 3(PC)
+ CALL runtime·badmorestackgsignal(SB)
+ CALL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV T0, (g_sched+gobuf_pc)(g)
+ MOV RA, (g_sched+gobuf_lr)(g)
+ MOV CTXT, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOV RA, (m_morebuf+gobuf_pc)(A0) // f's caller's PC
+ MOV X2, (m_morebuf+gobuf_sp)(A0) // f's caller's SP
+ MOV g, (m_morebuf+gobuf_g)(A0)
+
+ // Call newstack on m->g0's stack.
+ MOV m_g0(A0), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ // Create a stack frame on g0 to call newstack.
+ MOV ZERO, -8(X2) // Zero saved LR in frame
+ ADD $-8, X2
+ CALL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
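+	// To summarize: morestack records where to resume f in g->sched and f's
+	// caller in m->morebuf, switches to the g0 stack, and calls newstack, which
+	// grows the stack (or handles a pending preemption) and restarts f via
+	// gogo(&g->sched).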
+
+// func morestack_noctxt()
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+ // register, and the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOV X2, X2
+
+ MOV ZERO, CTXT
+ JMP runtime·morestack(SB)
+
+// AES hashing not implemented for riscv64
+TEXT runtime·memhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback<ABIInternal>(SB)
+TEXT runtime·strhash<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback<ABIInternal>(SB)
+TEXT runtime·memhash32<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback<ABIInternal>(SB)
+TEXT runtime·memhash64<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback<ABIInternal>(SB)
+
+// func return0()
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOV $0, A0
+ RET
+
+// restore state from Gobuf; longjmp
+
+// func gogo(buf *gobuf)
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOV buf+0(FP), T0
+ MOV gobuf_g(T0), T1
+ MOV 0(T1), ZERO // make sure g != nil
+ JMP gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOV T1, g
+ CALL runtime·save_g(SB)
+
+ MOV gobuf_sp(T0), X2
+ MOV gobuf_lr(T0), RA
+ MOV gobuf_ret(T0), A0
+ MOV gobuf_ctxt(T0), CTXT
+ MOV ZERO, gobuf_sp(T0)
+ MOV ZERO, gobuf_ret(T0)
+ MOV ZERO, gobuf_lr(T0)
+ MOV ZERO, gobuf_ctxt(T0)
+ MOV gobuf_pc(T0), T0
+ JALR ZERO, T0
+
+// func procyield(cycles uint32)
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+
+// func mcall(fn func(*g))
+TEXT runtime·mcall<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-8
+ MOV X10, CTXT
+
+ // Save caller state in g->sched
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV RA, (g_sched+gobuf_pc)(g)
+ MOV ZERO, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOV g, X10
+ MOV g_m(g), T1
+ MOV m_g0(T1), g
+ CALL runtime·save_g(SB)
+ BNE g, X10, 2(PC)
+ JMP runtime·badmcall(SB)
+ MOV 0(CTXT), T1 // code pointer
+ MOV (g_sched+gobuf_sp)(g), X2 // sp = m->g0->sched.sp
+	// we don't need a special macro for regabi since arg0 (X10) = g
+ ADD $-16, X2
+ MOV X10, 8(X2) // setup g
+ MOV ZERO, 0(X2) // clear return address
+ JALR RA, T1
+ JMP runtime·badmcall2(SB)
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes X31.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOV $runtime·systemstack_switch(SB), X31
+ ADD $8, X31 // get past prologue
+ MOV X31, (g_sched+gobuf_pc)(g)
+ MOV X2, (g_sched+gobuf_sp)(g)
+ MOV ZERO, (g_sched+gobuf_lr)(g)
+ MOV ZERO, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOV (g_sched+gobuf_ctxt)(g), X31
+ BEQ ZERO, X31, 2(PC)
+ CALL runtime·abort(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ MOV fn+0(FP), X5
+ MOV arg+8(FP), X10
+
+ MOV X2, X8 // save original stack pointer
+ MOV g, X9
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOV g_m(g), X6
+ MOV m_gsignal(X6), X7
+ BEQ X7, g, g0
+ MOV m_g0(X6), X7
+ BEQ X7, g, g0
+
+ CALL gosave_systemstack_switch<>(SB)
+ MOV X7, g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers.
+ ADD $-16, X2
+ MOV X9, 0(X2) // save old g on stack
+ MOV (g_stack+stack_hi)(X9), X9
+ SUB X8, X9, X8
+ MOV X8, 8(X2) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+
+ JALR RA, (X5)
+
+ // Restore g, stack pointer. X10 is return value.
+ MOV 0(X2), g
+ CALL runtime·save_g(SB)
+ MOV (g_stack+stack_hi)(g), X5
+ MOV 8(X2), X6
+ SUB X6, X5, X6
+ MOV X6, X2
+
+ MOVW X10, ret+16(FP)
+ RET
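+	// Note that the goroutine stack may have been moved by a callback while fn
+	// was running, so the original stack pointer is reconstructed above as
+	// (stack.hi - saved depth) rather than restored directly.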
+
+// func asminit()
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOV $MAXSIZE, T1 \
+ BLTU T1, T0, 3(PC) \
+ MOV $NAME(SB), T2; \
+ JALR ZERO, T2
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+// func call(stackArgsType *rtype, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+TEXT reflect·call(SB), NOSPLIT, $0-0
+ JMP ·reflectcall(SB)
+
+// func call(stackArgsType *_type, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+TEXT ·reflectcall(SB), NOSPLIT|NOFRAME, $0-48
+ MOVWU frameSize+32(FP), T0
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOV $runtime·badreflectcall(SB), T2
+ JALR ZERO, T2
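+// For example, a frameSize of 48 falls through DISPATCH(call16, 16) and
+// DISPATCH(call32, 32) because 16 < 48 and 32 < 48, then jumps to
+// runtime·call64, the smallest constant-sized frame that fits.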
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOV stackArgs+16(FP), A1; \
+ MOVWU stackArgsSize+24(FP), A2; \
+ MOV X2, A3; \
+ ADD $8, A3; \
+ ADD A3, A2; \
+ BEQ A3, A2, 6(PC); \
+ MOVBU (A1), A4; \
+ ADD $1, A1; \
+ MOVB A4, (A3); \
+ ADD $1, A3; \
+ JMP -5(PC); \
+ /* set up argument registers */ \
+ MOV regArgs+40(FP), X25; \
+ CALL ·unspillArgs(SB); \
+ /* call function */ \
+ MOV f+8(FP), CTXT; \
+ MOV (CTXT), X25; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ JALR RA, X25; \
+ /* copy return values back */ \
+ MOV regArgs+40(FP), X25; \
+ CALL ·spillArgs(SB); \
+ MOV stackArgsType+0(FP), A5; \
+ MOV stackArgs+16(FP), A1; \
+ MOVWU stackArgsSize+24(FP), A2; \
+ MOVWU stackRetOffset+28(FP), A4; \
+ ADD $8, X2, A3; \
+ ADD A4, A3; \
+ ADD A4, A1; \
+ SUB A4, A2; \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ NO_LOCAL_POINTERS
+ MOV A5, 8(X2)
+ MOV A1, 16(X2)
+ MOV A3, 24(X2)
+ MOV A2, 32(X2)
+ MOV X25, 40(X2)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT,$8
+ // g (X27) and REG_TMP (X31) might be clobbered by load_g.
+ // X27 is callee-save in the gcc calling convention, so save it.
+ MOV g, savedX27-8(SP)
+
+ CALL runtime·load_g(SB)
+ MOV g_m(g), X5
+ MOV m_curg(X5), X5
+ MOV (g_stack+stack_hi)(X5), X10 // return value in X10
+
+ MOV savedX27-8(SP), g
+ RET
+
+// func goexit(neverCallThisFunction)
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ MOV ZERO, ZERO // NOP
+ JMP runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ MOV ZERO, ZERO // NOP
+
+// func cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; in that case frame is the saved g.
+	// This path is used to dropm while the thread is exiting.
+ MOV fn+0(FP), X7
+ BNE ZERO, X7, loadg
+ // Restore the g from frame.
+ MOV frame+8(FP), g
+ JMP dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVBU runtime·iscgo(SB), X5
+ BEQ ZERO, X5, nocgo
+ CALL runtime·load_g(SB)
+nocgo:
+
+	// If g is nil, either Go did not create the current thread,
+	// or this thread has never called into Go on pthread platforms.
+	// Call needm to obtain an m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ BEQ ZERO, g, needm
+
+ MOV g_m(g), X5
+ MOV X5, savedm-8(SP)
+ JMP havem
+
+needm:
+ MOV g, savedm-8(SP) // g is zero, so is m.
+ MOV $runtime·needAndBindM(SB), X6
+ JALR RA, X6
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOV g_m(g), X5
+ MOV m_g0(X5), X6
+ MOV X2, (g_sched+gobuf_sp)(X6)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+ // NOTE: unwindm knows that the saved g->sched.sp is at 8(X2) aka savedsp-24(SP).
+ MOV m_g0(X5), X6
+ MOV (g_sched+gobuf_sp)(X6), X7
+ MOV X7, savedsp-24(SP) // must match frame size
+ MOV X2, (g_sched+gobuf_sp)(X6)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOV m_curg(X5), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X6 // prepare stack as X6
+ MOV (g_sched+gobuf_pc)(g), X7
+ MOV X7, -(24+8)(X6) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOV fn+0(FP), X7
+ MOV frame+8(FP), X8
+ MOV ctxt+16(FP), X9
+ MOV $-(24+8)(X6), X2 // switch stack; must match frame size
+ MOV X7, 8(X2)
+ MOV X8, 16(X2)
+ MOV X9, 24(X2)
+ CALL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOV 0(X2), X7
+ MOV X7, (g_sched+gobuf_pc)(g)
+ MOV $(24+8)(X2), X6 // must match frame size
+ MOV X6, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOV g_m(g), X5
+ MOV m_g0(X5), g
+ CALL runtime·save_g(SB)
+ MOV (g_sched+gobuf_sp)(g), X2
+ MOV savedsp-24(SP), X6 // must match frame size
+ MOV X6, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m,
+ // 1. for the duration of the call on non-pthread platforms,
+ // 2. or the duration of the C thread alive on pthread platforms.
+ // If the m on entry wasn't nil,
+ // 1. the thread might be a Go thread,
+ // 2. or it wasn't the first call from a C thread on pthread platforms,
+ // since then we skip dropm to reuse the m in the first call.
+ MOV savedm-8(SP), X5
+ BNE ZERO, X5, droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOV _cgo_pthread_key_created(SB), X5
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled, so dropm is needed.
+ BEQ ZERO, X5, dropm
+ MOV (X5), X5
+ BNE ZERO, X5, droppedm
+
+dropm:
+ MOV $runtime·dropm(SB), X6
+ JALR RA, X6
+droppedm:
+
+ // Done!
+ RET
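+	// The (24+8) offsets above are this function's $24 frame plus the 8-byte
+	// saved-LR slot: the curg stack is laid out to look exactly like a
+	// cgocallback frame, so the traceback through cgocallbackg stays seamless.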
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ EBREAK
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ EBREAK
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOV gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ CALL runtime·save_g(SB)
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOV $1, T0
+ MOV T0, ret+0(FP)
+ RET
+
+// spillArgs stores return values from registers to a *internal/abi.RegArgs in X25.
+TEXT ·spillArgs(SB),NOSPLIT,$0-0
+ MOV X10, (0*8)(X25)
+ MOV X11, (1*8)(X25)
+ MOV X12, (2*8)(X25)
+ MOV X13, (3*8)(X25)
+ MOV X14, (4*8)(X25)
+ MOV X15, (5*8)(X25)
+ MOV X16, (6*8)(X25)
+ MOV X17, (7*8)(X25)
+ MOV X8, (8*8)(X25)
+ MOV X9, (9*8)(X25)
+ MOV X18, (10*8)(X25)
+ MOV X19, (11*8)(X25)
+ MOV X20, (12*8)(X25)
+ MOV X21, (13*8)(X25)
+ MOV X22, (14*8)(X25)
+ MOV X23, (15*8)(X25)
+ MOVD F10, (16*8)(X25)
+ MOVD F11, (17*8)(X25)
+ MOVD F12, (18*8)(X25)
+ MOVD F13, (19*8)(X25)
+ MOVD F14, (20*8)(X25)
+ MOVD F15, (21*8)(X25)
+ MOVD F16, (22*8)(X25)
+ MOVD F17, (23*8)(X25)
+ MOVD F8, (24*8)(X25)
+ MOVD F9, (25*8)(X25)
+ MOVD F18, (26*8)(X25)
+ MOVD F19, (27*8)(X25)
+ MOVD F20, (28*8)(X25)
+ MOVD F21, (29*8)(X25)
+ MOVD F22, (30*8)(X25)
+ MOVD F23, (31*8)(X25)
+ RET
+
+// unspillArgs loads args into registers from a *internal/abi.RegArgs in X25.
+TEXT ·unspillArgs(SB),NOSPLIT,$0-0
+ MOV (0*8)(X25), X10
+ MOV (1*8)(X25), X11
+ MOV (2*8)(X25), X12
+ MOV (3*8)(X25), X13
+ MOV (4*8)(X25), X14
+ MOV (5*8)(X25), X15
+ MOV (6*8)(X25), X16
+ MOV (7*8)(X25), X17
+ MOV (8*8)(X25), X8
+ MOV (9*8)(X25), X9
+ MOV (10*8)(X25), X18
+ MOV (11*8)(X25), X19
+ MOV (12*8)(X25), X20
+ MOV (13*8)(X25), X21
+ MOV (14*8)(X25), X22
+ MOV (15*8)(X25), X23
+ MOVD (16*8)(X25), F10
+ MOVD (17*8)(X25), F11
+ MOVD (18*8)(X25), F12
+ MOVD (19*8)(X25), F13
+ MOVD (20*8)(X25), F14
+ MOVD (21*8)(X25), F15
+ MOVD (22*8)(X25), F16
+ MOVD (23*8)(X25), F17
+ MOVD (24*8)(X25), F8
+ MOVD (25*8)(X25), F9
+ MOVD (26*8)(X25), F18
+ MOVD (27*8)(X25), F19
+ MOVD (28*8)(X25), F20
+ MOVD (29*8)(X25), F21
+ MOVD (30*8)(X25), F22
+ MOVD (31*8)(X25), F23
+ RET
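+// The slot order above mirrors the ABIInternal argument register sequence on
+// riscv64: integers in X10-X17, X8, X9, X18-X23 and floats in F10-F17, F8, F9,
+// F18-F23, each stored to consecutive 8-byte slots of the RegArgs struct.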
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in X24, and returns a pointer
+// to the buffer space in X24.
+// It clobbers X31 aka T6 (the linker temp register - REG_TMP).
+// The act of CALLing gcWriteBarrier will clobber RA (LR).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$208
+ // Save the registers clobbered by the fast path.
+ MOV A0, 24*8(X2)
+ MOV A1, 25*8(X2)
+retry:
+ MOV g_m(g), A0
+ MOV m_p(A0), A0
+ MOV (p_wbBuf+wbBuf_next)(A0), A1
+ MOV (p_wbBuf+wbBuf_end)(A0), T6 // T6 is linker temp register (REG_TMP)
+ // Increment wbBuf.next position.
+ ADD X24, A1
+ // Is the buffer full?
+ BLTU T6, A1, flush
+ // Commit to the larger buffer.
+ MOV A1, (p_wbBuf+wbBuf_next)(A0)
+ // Make the return value (the original next position)
+ SUB X24, A1, X24
+ // Restore registers.
+ MOV 24*8(X2), A0
+ MOV 25*8(X2), A1
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ MOV T0, 1*8(X2)
+ MOV T1, 2*8(X2)
+ // X0 is zero register
+ // X1 is LR, saved by prologue
+ // X2 is SP
+ // X3 is GP
+ // X4 is TP
+ MOV X7, 3*8(X2)
+ MOV X8, 4*8(X2)
+ MOV X9, 5*8(X2)
+ // X10 already saved (A0)
+ // X11 already saved (A1)
+ MOV X12, 6*8(X2)
+ MOV X13, 7*8(X2)
+ MOV X14, 8*8(X2)
+ MOV X15, 9*8(X2)
+ MOV X16, 10*8(X2)
+ MOV X17, 11*8(X2)
+ MOV X18, 12*8(X2)
+ MOV X19, 13*8(X2)
+ MOV X20, 14*8(X2)
+ MOV X21, 15*8(X2)
+ MOV X22, 16*8(X2)
+ MOV X23, 17*8(X2)
+ MOV X24, 18*8(X2)
+ MOV X25, 19*8(X2)
+ MOV X26, 20*8(X2)
+ // X27 is g.
+ MOV X28, 21*8(X2)
+ MOV X29, 22*8(X2)
+ MOV X30, 23*8(X2)
+ // X31 is tmp register.
+
+ CALL runtime·wbBufFlush(SB)
+
+ MOV 1*8(X2), T0
+ MOV 2*8(X2), T1
+ MOV 3*8(X2), X7
+ MOV 4*8(X2), X8
+ MOV 5*8(X2), X9
+ MOV 6*8(X2), X12
+ MOV 7*8(X2), X13
+ MOV 8*8(X2), X14
+ MOV 9*8(X2), X15
+ MOV 10*8(X2), X16
+ MOV 11*8(X2), X17
+ MOV 12*8(X2), X18
+ MOV 13*8(X2), X19
+ MOV 14*8(X2), X20
+ MOV 15*8(X2), X21
+ MOV 16*8(X2), X22
+ MOV 17*8(X2), X23
+ MOV 18*8(X2), X24
+ MOV 19*8(X2), X25
+ MOV 20*8(X2), X26
+ MOV 21*8(X2), X28
+ MOV 22*8(X2), X29
+ MOV 23*8(X2), X30
+
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOV $8, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOV $16, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOV $24, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOV $32, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOV $40, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOV $48, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOV $56, X24
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOV $64, X24
+ JMP gcWriteBarrier<>(SB)
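+// For example, a two-pointer write barrier uses gcWriteBarrier2, which requests
+// 16 bytes: wbBuf.next advances by 16 and the old position comes back in X24,
+// so the compiled code records its two pointers at 0(X24) and 8(X24).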
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers (ssa/gen/RISCV64Ops.go), but the space for those
+// arguments is allocated in the caller's stack frame.
+// These stubs write the args into that stack space and then tail call to the
+// corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
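+// For example, the compiler emits a bounds-check failure as a call to
+// runtime·panicIndex with the offending index in T0 and the length in T1;
+// the stub moves them into X10 and X11 (the first two ABIInternal argument
+// registers) and tail calls runtime·goPanicIndex.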
+TEXT runtime·panicIndex<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicIndex<ABIInternal>(SB)
+TEXT runtime·panicIndexU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicIndexU<ABIInternal>(SB)
+TEXT runtime·panicSliceAlen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSliceAlen<ABIInternal>(SB)
+TEXT runtime·panicSliceAlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSliceAlenU<ABIInternal>(SB)
+TEXT runtime·panicSliceAcap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSliceAcap<ABIInternal>(SB)
+TEXT runtime·panicSliceAcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSliceAcapU<ABIInternal>(SB)
+TEXT runtime·panicSliceB<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicSliceB<ABIInternal>(SB)
+TEXT runtime·panicSliceBU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicSliceBU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Alen<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T2, X10
+ MOV T3, X11
+ JMP runtime·goPanicSlice3Alen<ABIInternal>(SB)
+TEXT runtime·panicSlice3AlenU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T2, X10
+ MOV T3, X11
+ JMP runtime·goPanicSlice3AlenU<ABIInternal>(SB)
+TEXT runtime·panicSlice3Acap<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T2, X10
+ MOV T3, X11
+ JMP runtime·goPanicSlice3Acap<ABIInternal>(SB)
+TEXT runtime·panicSlice3AcapU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T2, X10
+ MOV T3, X11
+ JMP runtime·goPanicSlice3AcapU<ABIInternal>(SB)
+TEXT runtime·panicSlice3B<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSlice3B<ABIInternal>(SB)
+TEXT runtime·panicSlice3BU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T1, X10
+ MOV T2, X11
+ JMP runtime·goPanicSlice3BU<ABIInternal>(SB)
+TEXT runtime·panicSlice3C<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicSlice3C<ABIInternal>(SB)
+TEXT runtime·panicSlice3CU<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T0, X10
+ MOV T1, X11
+ JMP runtime·goPanicSlice3CU<ABIInternal>(SB)
+TEXT runtime·panicSliceConvert<ABIInternal>(SB),NOSPLIT,$0-16
+ MOV T2, X10
+ MOV T3, X11
+ JMP runtime·goPanicSliceConvert<ABIInternal>(SB)
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main<ABIInternal>(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
diff --git a/src/runtime/asm_s390x.s b/src/runtime/asm_s390x.s
new file mode 100644
index 0000000..a7f414e
--- /dev/null
+++ b/src/runtime/asm_s390x.s
@@ -0,0 +1,950 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// _rt0_s390x_lib is common startup code for s390x systems when
+// using -buildmode=c-archive or -buildmode=c-shared. The linker will
+// arrange to invoke this function as a global constructor (for
+// c-archive) or when the shared library is loaded (for c-shared).
+// We expect argc and argv to be passed in the usual C ABI registers
+// R2 and R3.
+TEXT _rt0_s390x_lib(SB), NOSPLIT|NOFRAME, $0
+ STMG R6, R15, 48(R15)
+ MOVD R2, _rt0_s390x_lib_argc<>(SB)
+ MOVD R3, _rt0_s390x_lib_argv<>(SB)
+
+ // Save R6-R15 in the register save area of the calling function.
+ STMG R6, R15, 48(R15)
+
+ // Allocate 80 bytes on the stack.
+ MOVD $-80(R15), R15
+
+ // Save F8-F15 in our stack frame.
+ FMOVD F8, 16(R15)
+ FMOVD F9, 24(R15)
+ FMOVD F10, 32(R15)
+ FMOVD F11, 40(R15)
+ FMOVD F12, 48(R15)
+ FMOVD F13, 56(R15)
+ FMOVD F14, 64(R15)
+ FMOVD F15, 72(R15)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R1
+ BL R1
+
+ // Create a new thread to finish Go runtime initialization.
+ MOVD _cgo_sys_thread_create(SB), R1
+ CMP R1, $0
+ BEQ nocgo
+ MOVD $_rt0_s390x_lib_go(SB), R2
+ MOVD $0, R3
+ BL R1
+ BR restore
+
+nocgo:
+ MOVD $0x800000, R1 // stacksize
+ MOVD R1, 0(R15)
+ MOVD $_rt0_s390x_lib_go(SB), R1
+ MOVD R1, 8(R15) // fn
+ MOVD $runtime·newosproc(SB), R1
+ BL R1
+
+restore:
+ // Restore F8-F15 from our stack frame.
+ FMOVD 16(R15), F8
+ FMOVD 24(R15), F9
+ FMOVD 32(R15), F10
+ FMOVD 40(R15), F11
+ FMOVD 48(R15), F12
+ FMOVD 56(R15), F13
+ FMOVD 64(R15), F14
+ FMOVD 72(R15), F15
+ MOVD $80(R15), R15
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+ RET
+
+// _rt0_s390x_lib_go initializes the Go runtime.
+// This is started in a separate thread by _rt0_s390x_lib.
+TEXT _rt0_s390x_lib_go(SB), NOSPLIT|NOFRAME, $0
+ MOVD _rt0_s390x_lib_argc<>(SB), R2
+ MOVD _rt0_s390x_lib_argv<>(SB), R3
+ MOVD $runtime·rt0_go(SB), R1
+ BR R1
+
+DATA _rt0_s390x_lib_argc<>(SB)/8, $0
+GLOBL _rt0_s390x_lib_argc<>(SB), NOPTR, $8
+DATA _rt0_s390x_lib_argv<>(SB)/8, $0
+GLOBL _rt0_s390x_lib_argv<>(SB), NOPTR, $8
+
+TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
+ // R2 = argc; R3 = argv; R11 = temp; R13 = g; R15 = stack pointer
+ // C TLS base pointer in AR0:AR1
+
+ // initialize essential registers
+ XOR R0, R0
+
+ SUB $24, R15
+ MOVW R2, 8(R15) // argc
+ MOVD R3, 16(R15) // argv
+
+ // create istack out of the given (operating system) stack.
+ // _cgo_init may update stackguard.
+ MOVD $runtime·g0(SB), g
+ MOVD R15, R11
+ SUB $(64*1024), R11
+ MOVD R11, g_stackguard0(g)
+ MOVD R11, g_stackguard1(g)
+ MOVD R11, (g_stack+stack_lo)(g)
+ MOVD R15, (g_stack+stack_hi)(g)
+
+ // if there is a _cgo_init, call it using the gcc ABI.
+ MOVD _cgo_init(SB), R11
+ CMPBEQ R11, $0, nocgo
+ MOVW AR0, R4 // (AR0 << 32 | AR1) is the TLS base pointer; MOVD is translated to EAR
+ SLD $32, R4, R4
+ MOVW AR1, R4 // arg 2: TLS base pointer
+ MOVD $setg_gcc<>(SB), R3 // arg 1: setg
+ MOVD g, R2 // arg 0: G
+ // C functions expect 160 bytes of space on caller stack frame
+ // and an 8-byte aligned stack pointer
+ MOVD R15, R9 // save current stack (R9 is preserved in the Linux ABI)
+ SUB $160, R15 // reserve 160 bytes
+ MOVD $~7, R6
+ AND R6, R15 // 8-byte align
+ BL R11 // this call clobbers volatile registers according to Linux ABI (R0-R5, R14)
+ MOVD R9, R15 // restore stack
+ XOR R0, R0 // zero R0
+
+nocgo:
+ // update stackguard after _cgo_init
+ MOVD (g_stack+stack_lo)(g), R2
+ ADD $const_stackGuard, R2
+ MOVD R2, g_stackguard0(g)
+ MOVD R2, g_stackguard1(g)
+
+ // set the per-goroutine and per-mach "registers"
+ MOVD $runtime·m0(SB), R2
+
+ // save m->g0 = g0
+ MOVD g, m_g0(R2)
+ // save m0 to g0->m
+ MOVD R2, g_m(g)
+
+ BL runtime·check(SB)
+
+ // argc/argv are already prepared on stack
+ BL runtime·args(SB)
+ BL runtime·osinit(SB)
+ BL runtime·schedinit(SB)
+
+ // create a new goroutine to start program
+ MOVD $runtime·mainPC(SB), R2 // entry
+ SUB $16, R15
+ MOVD R2, 8(R15)
+ MOVD $0, 0(R15)
+ BL runtime·newproc(SB)
+ ADD $16, R15
+
+ // start this M
+ BL runtime·mstart(SB)
+
+ MOVD $0, 1(R0)
+ RET
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+TEXT runtime·breakpoint(SB),NOSPLIT|NOFRAME,$0-0
+ BRRK
+ RET
+
+TEXT runtime·asminit(SB),NOSPLIT|NOFRAME,$0-0
+ RET
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL runtime·mstart0(SB)
+ RET // not reached
+
+/*
+ * go-routine
+ */
+
+// void gogo(Gobuf*)
+// restore state from Gobuf; longjmp
+TEXT runtime·gogo(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD buf+0(FP), R5
+ MOVD gobuf_g(R5), R6
+ MOVD 0(R6), R7 // make sure g != nil
+ BR gogo<>(SB)
+
+TEXT gogo<>(SB), NOSPLIT|NOFRAME, $0
+ MOVD R6, g
+ BL runtime·save_g(SB)
+
+ MOVD 0(g), R4
+ MOVD gobuf_sp(R5), R15
+ MOVD gobuf_lr(R5), LR
+ MOVD gobuf_ret(R5), R3
+ MOVD gobuf_ctxt(R5), R12
+ MOVD $0, gobuf_sp(R5)
+ MOVD $0, gobuf_ret(R5)
+ MOVD $0, gobuf_lr(R5)
+ MOVD $0, gobuf_ctxt(R5)
+ CMP R0, R0 // set condition codes for == test, needed by stack split
+ MOVD gobuf_pc(R5), R6
+ BR (R6)
+
+// void mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $-8-8
+ // Save caller state in g->sched
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD LR, (g_sched+gobuf_pc)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+
+ // Switch to m->g0 & its stack, call fn.
+ MOVD g, R3
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ CMP g, R3
+ BNE 2(PC)
+ BR runtime·badmcall(SB)
+ MOVD fn+0(FP), R12 // context
+ MOVD 0(R12), R4 // code pointer
+ MOVD (g_sched+gobuf_sp)(g), R15 // sp = m->g0->sched.sp
+ SUB $16, R15
+ MOVD R3, 8(R15)
+ MOVD $0, 0(R15)
+ BL (R4)
+ BR runtime·badmcall2(SB)
+
+// systemstack_switch is a dummy routine that systemstack leaves at the bottom
+// of the G stack. We need to distinguish the routine that
+// lives at the bottom of the G stack from the one that lives
+// at the top of the system stack because the one at the top of
+// the system stack terminates the stack walk (see topofstack()).
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ UNDEF
+ BL (LR) // make sure this function is not leaf
+ RET
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ MOVD fn+0(FP), R3 // R3 = fn
+ MOVD R3, R12 // context
+ MOVD g_m(g), R4 // R4 = m
+
+ MOVD m_gsignal(R4), R5 // R5 = gsignal
+ CMPBEQ g, R5, noswitch
+
+ MOVD m_g0(R4), R5 // R5 = g0
+ CMPBEQ g, R5, noswitch
+
+ MOVD m_curg(R4), R6
+ CMPBEQ g, R6, switch
+
+ // Bad: g is not gsignal, not g0, not curg. What is it?
+ // Hide call from linker nosplit analysis.
+ MOVD $runtime·badsystemstack(SB), R3
+ BL (R3)
+ BL runtime·abort(SB)
+
+switch:
+ // save our state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ BL gosave_systemstack_switch<>(SB)
+
+ // switch to g0
+ MOVD R5, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+
+ // call target function
+ MOVD 0(R12), R3 // code pointer
+ BL (R3)
+
+ // switch back to g
+ MOVD g_m(g), R3
+ MOVD m_curg(R3), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ MOVD $0, (g_sched+gobuf_sp)(g)
+ RET
+
+noswitch:
+ // already on m stack, just call directly
+ // Using a tail call here cleans up tracebacks since we won't stop
+ // at an intermediate systemstack.
+ MOVD 0(R12), R3 // code pointer
+ MOVD 0(R15), LR // restore LR
+ ADD $8, R15
+ BR (R3)
+
+/*
+ * support for morestack
+ */
+
+// Called during function prolog when more stack is needed.
+// Caller has already loaded:
+// R3: framesize, R4: argsize, R5: LR
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
+ // Cannot grow scheduler stack (m->g0).
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMPBNE g, R8, 3(PC)
+ BL runtime·badmorestackg0(SB)
+ BL runtime·abort(SB)
+
+ // Cannot grow signal stack (m->gsignal).
+ MOVD m_gsignal(R7), R8
+ CMP g, R8
+ BNE 3(PC)
+ BL runtime·badmorestackgsignal(SB)
+ BL runtime·abort(SB)
+
+ // Called from f.
+ // Set g->sched to context in f.
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD LR, R8
+ MOVD R8, (g_sched+gobuf_pc)(g)
+ MOVD R5, (g_sched+gobuf_lr)(g)
+ MOVD R12, (g_sched+gobuf_ctxt)(g)
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ MOVD R5, (m_morebuf+gobuf_pc)(R7) // f's caller's PC
+ MOVD R15, (m_morebuf+gobuf_sp)(R7) // f's caller's SP
+ MOVD g, (m_morebuf+gobuf_g)(R7)
+
+ // Call newstack on m->g0's stack.
+ MOVD m_g0(R7), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ // Create a stack frame on g0 to call newstack.
+ MOVD $0, -8(R15) // Zero saved LR in frame
+ SUB $8, R15
+ BL runtime·newstack(SB)
+
+ // Not reached, but make sure the return PC from the call to newstack
+ // is still in this function, and not the beginning of the next.
+ UNDEF
+
+TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
+ // Force SPWRITE. This function doesn't actually write SP,
+ // but it is called with a special calling convention where
+ // the caller doesn't save LR on stack but passes it as a
+ // register (R5), and the unwinder currently doesn't understand.
+ // Make it SPWRITE to stop unwinding. (See issue 54332)
+ MOVD R15, R15
+
+ MOVD $0, R12
+ BR runtime·morestack(SB)
+
+// reflectcall: call a function with the given argument list
+// func call(stackArgsType *_type, f *FuncVal, stackArgs *byte, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs).
+// we don't have variable-sized frames, so we use a small number
+// of constant-sized-frame functions to encode a few bits of size in the pc.
+// Caution: ugly multiline assembly macros in your future!
+
+#define DISPATCH(NAME,MAXSIZE) \
+ MOVD $MAXSIZE, R4; \
+ CMP R3, R4; \
+ BGT 3(PC); \
+ MOVD $NAME(SB), R5; \
+ BR (R5)
+// Note: can't just "BR NAME(SB)" - bad inlining results.
+
+TEXT ·reflectcall(SB), NOSPLIT, $-8-48
+ MOVWZ frameSize+32(FP), R3
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ MOVD $runtime·badreflectcall(SB), R5
+ BR (R5)
+
+#define CALLFN(NAME,MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ /* copy arguments to stack */ \
+ MOVD stackArgs+16(FP), R4; \
+ MOVWZ stackArgsSize+24(FP), R5; \
+ MOVD $stack-MAXSIZE(SP), R6; \
+loopArgs: /* copy 256 bytes at a time */ \
+ CMP R5, $256; \
+ BLT tailArgs; \
+ SUB $256, R5; \
+ MVC $256, 0(R4), 0(R6); \
+ MOVD $256(R4), R4; \
+ MOVD $256(R6), R6; \
+ BR loopArgs; \
+tailArgs: /* copy remaining bytes */ \
+ CMP R5, $0; \
+ BEQ callFunction; \
+ SUB $1, R5; \
+ EXRL $callfnMVC<>(SB), R5; \
+callFunction: \
+ MOVD f+8(FP), R12; \
+ MOVD (R12), R8; \
+ PCDATA $PCDATA_StackMapIndex, $0; \
+ BL (R8); \
+ /* copy return values back */ \
+ MOVD stackArgsType+0(FP), R7; \
+ MOVD stackArgs+16(FP), R6; \
+ MOVWZ stackArgsSize+24(FP), R5; \
+ MOVD $stack-MAXSIZE(SP), R4; \
+ MOVWZ stackRetOffset+28(FP), R1; \
+ ADD R1, R4; \
+ ADD R1, R6; \
+ SUB R1, R5; \
+ BL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ MOVD R7, 8(R15)
+ MOVD R6, 16(R15)
+ MOVD R4, 24(R15)
+ MOVD R5, 32(R15)
+ MOVD $0, 40(R15)
+ BL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+// Not a function: target for EXRL (execute relative long) instruction.
+TEXT callfnMVC<>(SB),NOSPLIT|NOFRAME,$0-0
+ MVC $1, 0(R4), 0(R6)
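+// EXRL executes the MVC above with its length field taken from the low byte of
+// R5, so the tailArgs path copies the remaining 1-255 bytes in one instruction
+// (the SUB $1 accounts for MVC encoding length minus one).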
+
+TEXT runtime·procyield(SB),NOSPLIT,$0-0
+ RET
+
+// Save state of caller into g->sched,
+// but using fake PC from systemstack_switch.
+// Must only be called from functions with no locals ($0)
+// or else unwinding from systemstack_switch is incorrect.
+// Smashes R1.
+TEXT gosave_systemstack_switch<>(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·systemstack_switch(SB), R1
+ ADD $16, R1 // get past prologue
+ MOVD R1, (g_sched+gobuf_pc)(g)
+ MOVD R15, (g_sched+gobuf_sp)(g)
+ MOVD $0, (g_sched+gobuf_lr)(g)
+ MOVD $0, (g_sched+gobuf_ret)(g)
+ // Assert ctxt is zero. See func save.
+ MOVD (g_sched+gobuf_ctxt)(g), R1
+ CMPBEQ R1, $0, 2(PC)
+ BL runtime·abort(SB)
+ RET
+
+// func asmcgocall(fn, arg unsafe.Pointer) int32
+// Call fn(arg) on the scheduler stack,
+// aligned appropriately for the gcc ABI.
+// See cgocall.go for more details.
+TEXT ·asmcgocall(SB),NOSPLIT,$0-20
+ // R2 = argc; R3 = argv; R11 = temp; R13 = g; R15 = stack pointer
+ // C TLS base pointer in AR0:AR1
+ MOVD fn+0(FP), R3
+ MOVD arg+8(FP), R4
+
+ MOVD R15, R2 // save original stack pointer
+ MOVD g, R5
+
+ // Figure out if we need to switch to m->g0 stack.
+ // We get called to create new OS threads too, and those
+ // come in on the m->g0 stack already. Or we might already
+ // be on the m->gsignal stack.
+ MOVD g_m(g), R6
+ MOVD m_gsignal(R6), R7
+ CMPBEQ R7, g, g0
+ MOVD m_g0(R6), R7
+ CMPBEQ R7, g, g0
+ BL gosave_systemstack_switch<>(SB)
+ MOVD R7, g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+
+ // Now on a scheduling stack (a pthread-created stack).
+g0:
+ // Save room for two of our pointers, plus 160 bytes of callee
+ // save area that lives on the caller stack.
+ SUB $176, R15
+ MOVD $~7, R6
+ AND R6, R15 // 8-byte alignment for gcc ABI
+ MOVD R5, 168(R15) // save old g on stack
+ MOVD (g_stack+stack_hi)(R5), R5
+ SUB R2, R5
+ MOVD R5, 160(R15) // save depth in old g stack (can't just save SP, as stack might be copied during a callback)
+ MOVD $0, 0(R15) // clear back chain pointer (TODO can we give it real back trace information?)
+ MOVD R4, R2 // arg in R2
+ BL R3 // can clobber: R0-R5, R14, F0-F3, F5, F7-F15
+
+ XOR R0, R0 // set R0 back to 0.
+ // Restore g, stack pointer.
+ MOVD 168(R15), g
+ BL runtime·save_g(SB)
+ MOVD (g_stack+stack_hi)(g), R5
+ MOVD 160(R15), R6
+ SUB R6, R5
+ MOVD R5, R15
+
+ MOVW R2, ret+16(FP)
+ RET
+
+// cgocallback(fn, frame unsafe.Pointer, ctxt uintptr)
+// See cgocall.go for more details.
+TEXT ·cgocallback(SB),NOSPLIT,$24-24
+ NO_LOCAL_POINTERS
+
+	// When fn is nil, skip cgocallbackg and just dropm; in that case frame is the saved g.
+	// This path is used to dropm while the thread is exiting.
+ MOVD fn+0(FP), R1
+ CMPBNE R1, $0, loadg
+ // Restore the g from frame.
+ MOVD frame+8(FP), g
+ BR dropm
+
+loadg:
+ // Load m and g from thread-local storage.
+ MOVB runtime·iscgo(SB), R3
+ CMPBEQ R3, $0, nocgo
+ BL runtime·load_g(SB)
+
+nocgo:
+	// If g is nil, either Go did not create the current thread,
+	// or this thread has never called into Go on pthread platforms.
+	// Call needm to obtain an m for temporary use.
+ // In this case, we're running on the thread stack, so there's
+ // lots of space, but the linker doesn't know. Hide the call from
+ // the linker analysis by using an indirect call.
+ CMPBEQ g, $0, needm
+
+ MOVD g_m(g), R8
+ MOVD R8, savedm-8(SP)
+ BR havem
+
+needm:
+ MOVD g, savedm-8(SP) // g is zero, so is m.
+ MOVD $runtime·needAndBindM(SB), R3
+ BL (R3)
+
+ // Set m->sched.sp = SP, so that if a panic happens
+ // during the function we are about to execute, it will
+ // have a valid SP to run on the g0 stack.
+ // The next few lines (after the havem label)
+ // will save this SP onto the stack and then write
+ // the same SP back to m->sched.sp. That seems redundant,
+ // but if an unrecovered panic happens, unwindm will
+ // restore the g->sched.sp from the stack location
+ // and then systemstack will try to use it. If we don't set it here,
+ // that restored SP will be uninitialized (typically 0) and
+ // will not be usable.
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), R3
+ MOVD R15, (g_sched+gobuf_sp)(R3)
+
+havem:
+ // Now there's a valid m, and we're running on its m->g0.
+ // Save current m->g0->sched.sp on stack and then set it to SP.
+ // Save current sp in m->g0->sched.sp in preparation for
+ // switch back to m->curg stack.
+	// NOTE: unwindm knows that the saved g->sched.sp is at 8(R15) aka savedsp-24(SP).
+ MOVD m_g0(R8), R3
+ MOVD (g_sched+gobuf_sp)(R3), R4
+ MOVD R4, savedsp-24(SP) // must match frame size
+ MOVD R15, (g_sched+gobuf_sp)(R3)
+
+ // Switch to m->curg stack and call runtime.cgocallbackg.
+ // Because we are taking over the execution of m->curg
+ // but *not* resuming what had been running, we need to
+ // save that information (m->curg->sched) so we can restore it.
+ // We can restore m->curg->sched.sp easily, because calling
+ // runtime.cgocallbackg leaves SP unchanged upon return.
+ // To save m->curg->sched.pc, we push it onto the curg stack and
+ // open a frame the same size as cgocallback's g0 frame.
+ // Once we switch to the curg stack, the pushed PC will appear
+ // to be the return PC of cgocallback, so that the traceback
+ // will seamlessly trace back into the earlier calls.
+ MOVD m_curg(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R4 // prepare stack as R4
+ MOVD (g_sched+gobuf_pc)(g), R5
+ MOVD R5, -(24+8)(R4) // "saved LR"; must match frame size
+ // Gather our arguments into registers.
+ MOVD fn+0(FP), R1
+ MOVD frame+8(FP), R2
+ MOVD ctxt+16(FP), R3
+ MOVD $-(24+8)(R4), R15 // switch stack; must match frame size
+ MOVD R1, 8(R15)
+ MOVD R2, 16(R15)
+ MOVD R3, 24(R15)
+ BL runtime·cgocallbackg(SB)
+
+ // Restore g->sched (== m->curg->sched) from saved values.
+ MOVD 0(R15), R5
+ MOVD R5, (g_sched+gobuf_pc)(g)
+ MOVD $(24+8)(R15), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // Switch back to m->g0's stack and restore m->g0->sched.sp.
+ // (Unlike m->curg, the g0 goroutine never uses sched.pc,
+ // so we do not have to restore it.)
+ MOVD g_m(g), R8
+ MOVD m_g0(R8), g
+ BL runtime·save_g(SB)
+ MOVD (g_sched+gobuf_sp)(g), R15
+ MOVD savedsp-24(SP), R4 // must match frame size
+ MOVD R4, (g_sched+gobuf_sp)(g)
+
+ // If the m on entry was nil, we called needm above to borrow an m,
+ // 1. for the duration of the call on non-pthread platforms,
+ // 2. or the duration of the C thread alive on pthread platforms.
+ // If the m on entry wasn't nil,
+ // 1. the thread might be a Go thread,
+ // 2. or it wasn't the first call from a C thread on pthread platforms,
+ // since then we skip dropm to reuse the m in the first call.
+ MOVD savedm-8(SP), R6
+ CMPBNE R6, $0, droppedm
+
+ // Skip dropm to reuse it in the next call, when a pthread key has been created.
+ MOVD _cgo_pthread_key_created(SB), R6
+	// If _cgo_pthread_key_created is a nil pointer, cgo is disabled, so dropm is needed.
+ CMPBEQ R6, $0, dropm
+ MOVD (R6), R6
+ CMPBNE R6, $0, droppedm
+
+dropm:
+ MOVD $runtime·dropm(SB), R3
+ BL (R3)
+droppedm:
+
+ // Done!
+ RET
+
+// void setg(G*); set g. for use by needm.
+TEXT runtime·setg(SB), NOSPLIT, $0-8
+ MOVD gg+0(FP), g
+ // This only happens if iscgo, so jump straight to save_g
+ BL runtime·save_g(SB)
+ RET
+
+// void setg_gcc(G*); set g in C TLS.
+// Must obey the gcc calling convention.
+TEXT setg_gcc<>(SB),NOSPLIT|NOFRAME,$0-0
+ // The standard prologue clobbers LR (R14), which is callee-save in
+ // the C ABI, so we have to use NOFRAME and save LR ourselves.
+ MOVD LR, R1
+ // Also save g, R10, and R11 since they're callee-save in C ABI
+ MOVD R10, R3
+ MOVD g, R4
+ MOVD R11, R5
+
+ MOVD R2, g
+ BL runtime·save_g(SB)
+
+ MOVD R5, R11
+ MOVD R4, g
+ MOVD R3, R10
+ MOVD R1, LR
+ RET
+
+TEXT runtime·abort(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW (R0), R0
+ UNDEF
+
+// int64 runtime·cputicks(void)
+TEXT runtime·cputicks(SB),NOSPLIT,$0-8
+ // The TOD clock on s390 counts from the year 1900 in ~250ps intervals.
+ // This means that since about 1972 the msb has been set, making the
+ // result of a call to STORE CLOCK (stck) a negative number.
+ // We clear the msb to make it positive.
+ STCK ret+0(FP) // serialises before and after call
+ MOVD ret+0(FP), R3 // R3 will wrap to 0 in the year 2043
+ SLD $1, R3
+ SRD $1, R3
+ MOVD R3, ret+0(FP)
+ RET
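+	// The SLD/SRD pair is an unsigned clear of the top bit: for example
+	// 0x8000000000000123 becomes 0x0000000000000123.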
+
+// AES hashing not implemented for s390x
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0
+ MOVW $0, R3
+ RET
+
+// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
+// Must obey the gcc calling convention.
+TEXT _cgo_topofstack(SB),NOSPLIT|NOFRAME,$0
+ // g (R13), R10, R11 and LR (R14) are callee-save in the C ABI, so save them
+ MOVD g, R1
+ MOVD R10, R3
+ MOVD LR, R4
+ MOVD R11, R5
+
+ BL runtime·load_g(SB) // clobbers g (R13), R10, R11
+ MOVD g_m(g), R2
+ MOVD m_curg(R2), R2
+ MOVD (g_stack+stack_hi)(R2), R2
+
+ MOVD R1, g
+ MOVD R3, R10
+ MOVD R4, LR
+ MOVD R5, R11
+ RET
+
+// The top-most function running on a goroutine
+// returns to goexit+PCQuantum.
+TEXT runtime·goexit(SB),NOSPLIT|NOFRAME|TOPFRAME,$0-0
+ BYTE $0x07; BYTE $0x00; // 2-byte nop
+ BL runtime·goexit1(SB) // does not return
+ // traceback from goexit1 must hit code range of goexit
+ BYTE $0x07; BYTE $0x00; // 2-byte nop
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ // Stores are already ordered on s390x, so this is just a
+ // compile barrier.
+ RET
+
+// This is called from .init_array and follows the platform, not Go, ABI.
+// We are overly conservative. We could only save the registers we use.
+// However, since this function is only called once per loaded module
+// performance is unimportant.
+TEXT runtime·addmoduledata(SB),NOSPLIT|NOFRAME,$0-0
+ // Save R6-R15 in the register save area of the calling function.
+ // Don't bother saving F8-F15 as we aren't doing any calls.
+ STMG R6, R15, 48(R15)
+
+ // append the argument (passed in R2, as per the ELF ABI) to the
+ // moduledata linked list.
+ MOVD runtime·lastmoduledatap(SB), R1
+ MOVD R2, moduledata_next(R1)
+ MOVD R2, runtime·lastmoduledatap(SB)
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+ RET
+
+TEXT ·checkASM(SB),NOSPLIT,$0-1
+ MOVB $1, ret+0(FP)
+ RET
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed in R9, and returns a pointer
+// to the buffer space in R9.
+// It clobbers R10 (the temp register) and R1 (used by PLT stub).
+// It does not clobber any other general-purpose registers,
+// but may clobber others (e.g., floating point registers).
+TEXT gcWriteBarrier<>(SB),NOSPLIT,$96
+ // Save the registers clobbered by the fast path.
+ MOVD R4, 96(R15)
+retry:
+ MOVD g_m(g), R1
+ MOVD m_p(R1), R1
+ // Increment wbBuf.next position.
+ MOVD R9, R4
+ ADD (p_wbBuf+wbBuf_next)(R1), R4
+ // Is the buffer full?
+ MOVD (p_wbBuf+wbBuf_end)(R1), R10
+ CMPUBGT R4, R10, flush
+ // Commit to the larger buffer.
+ MOVD R4, (p_wbBuf+wbBuf_next)(R1)
+ // Make return value (the original next position)
+ SUB R9, R4, R9
+ // Restore registers.
+ MOVD 96(R15), R4
+ RET
+
+flush:
+ // Save all general purpose registers since these could be
+ // clobbered by wbBufFlush and were not saved by the caller.
+ STMG R2, R3, 8(R15)
+ MOVD R0, 24(R15)
+ // R1 already saved.
+ // R4 already saved.
+ STMG R5, R12, 32(R15) // save R5 - R12
+ // R13 is g.
+ // R14 is LR.
+ // R15 is SP.
+
+ CALL runtime·wbBufFlush(SB)
+
+ LMG 8(R15), R2, R3 // restore R2 - R3
+ MOVD 24(R15), R0 // restore R0
+ LMG 32(R15), R5, R12 // restore R5 - R12
+ JMP retry
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $8, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $16, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $24, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $32, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $40, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $48, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $56, R9
+ JMP gcWriteBarrier<>(SB)
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ MOVD $64, R9
+ JMP gcWriteBarrier<>(SB)
+
+// Note: these functions use a special calling convention to save generated code space.
+// Arguments are passed in registers, but the space for those arguments is allocated
+// in the caller's stack frame. These stubs write the args into that stack space and
+// then tail call to the corresponding runtime handler.
+// The tail call makes these stubs disappear in backtraces.
+TEXT runtime·panicIndex(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndex(SB)
+TEXT runtime·panicIndexU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicIndexU(SB)
+TEXT runtime·panicSliceAlen(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlen(SB)
+TEXT runtime·panicSliceAlenU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAlenU(SB)
+TEXT runtime·panicSliceAcap(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcap(SB)
+TEXT runtime·panicSliceAcapU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSliceAcapU(SB)
+TEXT runtime·panicSliceB(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceB(SB)
+TEXT runtime·panicSliceBU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSliceBU(SB)
+TEXT runtime·panicSlice3Alen(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Alen(SB)
+TEXT runtime·panicSlice3AlenU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AlenU(SB)
+TEXT runtime·panicSlice3Acap(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3Acap(SB)
+TEXT runtime·panicSlice3AcapU(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSlice3AcapU(SB)
+TEXT runtime·panicSlice3B(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3B(SB)
+TEXT runtime·panicSlice3BU(SB),NOSPLIT,$0-16
+ MOVD R1, x+0(FP)
+ MOVD R2, y+8(FP)
+ JMP runtime·goPanicSlice3BU(SB)
+TEXT runtime·panicSlice3C(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3C(SB)
+TEXT runtime·panicSlice3CU(SB),NOSPLIT,$0-16
+ MOVD R0, x+0(FP)
+ MOVD R1, y+8(FP)
+ JMP runtime·goPanicSlice3CU(SB)
+TEXT runtime·panicSliceConvert(SB),NOSPLIT,$0-16
+ MOVD R2, x+0(FP)
+ MOVD R3, y+8(FP)
+ JMP runtime·goPanicSliceConvert(SB)
diff --git a/src/runtime/asm_wasm.s b/src/runtime/asm_wasm.s
new file mode 100644
index 0000000..9cd8b5a
--- /dev/null
+++ b/src/runtime/asm_wasm.s
@@ -0,0 +1,525 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+TEXT runtime·rt0_go(SB), NOSPLIT|NOFRAME|TOPFRAME, $0
+ // save m->g0 = g0
+ MOVD $runtime·g0(SB), runtime·m0+m_g0(SB)
+ // save m0 to g0->m
+ MOVD $runtime·m0(SB), runtime·g0+g_m(SB)
+ // set g to g0
+ MOVD $runtime·g0(SB), g
+ CALLNORESUME runtime·check(SB)
+#ifdef GOOS_js
+ CALLNORESUME runtime·args(SB)
+#endif
+ CALLNORESUME runtime·osinit(SB)
+ CALLNORESUME runtime·schedinit(SB)
+ MOVD $runtime·mainPC(SB), 0(SP)
+ CALLNORESUME runtime·newproc(SB)
+ CALL runtime·mstart(SB) // WebAssembly stack will unwind when switching to another goroutine
+ UNDEF
+
+TEXT runtime·mstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL runtime·mstart0(SB)
+ RET // not reached
+
+DATA runtime·mainPC+0(SB)/8,$runtime·main(SB)
+GLOBL runtime·mainPC(SB),RODATA,$8
+
+// func checkASM() bool
+TEXT ·checkASM(SB), NOSPLIT, $0-1
+ MOVB $1, ret+0(FP)
+ RET
+
+TEXT runtime·gogo(SB), NOSPLIT, $0-8
+ MOVD buf+0(FP), R0
+ MOVD gobuf_g(R0), R1
+ MOVD 0(R1), R2 // make sure g != nil
+ MOVD R1, g
+ MOVD gobuf_sp(R0), SP
+
+ // Put target PC at -8(SP), wasm_pc_f_loop will pick it up
+ Get SP
+ I32Const $8
+ I32Sub
+ I64Load gobuf_pc(R0)
+ I64Store $0
+
+ MOVD gobuf_ret(R0), RET0
+ MOVD gobuf_ctxt(R0), CTXT
+ // clear to help garbage collector
+ MOVD $0, gobuf_sp(R0)
+ MOVD $0, gobuf_ret(R0)
+ MOVD $0, gobuf_ctxt(R0)
+
+ I32Const $1
+ Return
+
+// func mcall(fn func(*g))
+// Switch to m->g0's stack, call fn(g).
+// Fn must never return. It should gogo(&g->sched)
+// to keep running g.
+TEXT runtime·mcall(SB), NOSPLIT, $0-8
+ // CTXT = fn
+ MOVD fn+0(FP), CTXT
+ // R1 = g.m
+ MOVD g_m(g), R1
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // save state in g->sched
+ MOVD 0(SP), g_sched+gobuf_pc(g) // caller's PC
+ MOVD $fn+0(FP), g_sched+gobuf_sp(g) // caller's SP
+
+ // if g == g0 call badmcall
+ Get g
+ Get R2
+ I64Eq
+ If
+ JMP runtime·badmcall(SB)
+ End
+
+ // switch to g0's stack
+ I64Load (g_sched+gobuf_sp)(R2)
+ I64Const $8
+ I64Sub
+ I32WrapI64
+ Set SP
+
+ // set arg to current g
+ MOVD g, 0(SP)
+
+ // switch to g0
+ MOVD R2, g
+
+ // call fn
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ CALL
+
+ Get SP
+ I32Const $8
+ I32Add
+ Set SP
+
+ JMP runtime·badmcall2(SB)
+
+// func systemstack(fn func())
+TEXT runtime·systemstack(SB), NOSPLIT, $0-8
+ // R0 = fn
+ MOVD fn+0(FP), R0
+ // R1 = g.m
+ MOVD g_m(g), R1
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // if g == g0
+ Get g
+ Get R2
+ I64Eq
+ If
+ // no switch:
+ MOVD R0, CTXT
+
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ JMP
+ End
+
+ // if g != m.curg
+ Get g
+ I64Load m_curg(R1)
+ I64Ne
+ If
+ CALLNORESUME runtime·badsystemstack(SB)
+ End
+
+ // switch:
+
+ // save state in g->sched. Pretend to
+ // be systemstack_switch if the G stack is scanned.
+ MOVD $runtime·systemstack_switch(SB), g_sched+gobuf_pc(g)
+
+ MOVD SP, g_sched+gobuf_sp(g)
+
+ // switch to g0
+ MOVD R2, g
+
+ // make it look like mstart called systemstack on g0, to stop traceback
+ I64Load (g_sched+gobuf_sp)(R2)
+ I64Const $8
+ I64Sub
+ Set R3
+
+ MOVD $runtime·mstart(SB), 0(R3)
+ MOVD R3, SP
+
+ // call fn
+ MOVD R0, CTXT
+
+ Get CTXT
+ I32WrapI64
+ I64Load $0
+ CALL
+
+ // switch back to g
+ MOVD g_m(g), R1
+ MOVD m_curg(R1), R2
+ MOVD R2, g
+ MOVD g_sched+gobuf_sp(R2), SP
+ MOVD $0, g_sched+gobuf_sp(R2)
+ RET
+
+TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
+ RET
+
+// AES hashing not implemented for wasm
+TEXT runtime·memhash(SB),NOSPLIT|NOFRAME,$0-32
+ JMP runtime·memhashFallback(SB)
+TEXT runtime·strhash(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·strhashFallback(SB)
+TEXT runtime·memhash32(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash32Fallback(SB)
+TEXT runtime·memhash64(SB),NOSPLIT|NOFRAME,$0-24
+ JMP runtime·memhash64Fallback(SB)
+
+TEXT runtime·return0(SB), NOSPLIT, $0-0
+ MOVD $0, RET0
+ RET
+
+TEXT runtime·asminit(SB), NOSPLIT, $0-0
+ // No per-thread init.
+ RET
+
+TEXT ·publicationBarrier(SB), NOSPLIT, $0-0
+ RET
+
+TEXT runtime·procyield(SB), NOSPLIT, $0-0 // FIXME
+ RET
+
+TEXT runtime·breakpoint(SB), NOSPLIT, $0-0
+ UNDEF
+
+// Called during function prolog when more stack is needed.
+//
+// The traceback routines see morestack on a g0 as being
+// the top of a stack (for example, morestack calling newstack
+// calling the scheduler calling newm calling gc), so we must
+// record an argument size. For that purpose, it has no arguments.
+TEXT runtime·morestack(SB), NOSPLIT, $0-0
+ // R1 = g.m
+ MOVD g_m(g), R1
+
+ // R2 = g0
+ MOVD m_g0(R1), R2
+
+ // Cannot grow scheduler stack (m->g0).
+ Get g
+ Get R1
+ I64Eq
+ If
+ CALLNORESUME runtime·badmorestackg0(SB)
+ End
+
+ // Cannot grow signal stack (m->gsignal).
+ Get g
+ I64Load m_gsignal(R1)
+ I64Eq
+ If
+ CALLNORESUME runtime·badmorestackgsignal(SB)
+ End
+
+ // Called from f.
+ // Set m->morebuf to f's caller.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVD 8(SP), m_morebuf+gobuf_pc(R1)
+ MOVD $16(SP), m_morebuf+gobuf_sp(R1) // f's caller's SP
+ MOVD g, m_morebuf+gobuf_g(R1)
+
+ // Set g->sched to context in f.
+ MOVD 0(SP), g_sched+gobuf_pc(g)
+ MOVD $8(SP), g_sched+gobuf_sp(g) // f's SP
+ MOVD CTXT, g_sched+gobuf_ctxt(g)
+
+ // Call newstack on m->g0's stack.
+ MOVD R2, g
+ MOVD g_sched+gobuf_sp(R2), SP
+ CALL runtime·newstack(SB)
+ UNDEF // crash if newstack returns
+
+// morestack but not preserving ctxt.
+TEXT runtime·morestack_noctxt(SB),NOSPLIT,$0
+ MOVD $0, CTXT
+ JMP runtime·morestack(SB)
+
+TEXT ·asmcgocall(SB), NOSPLIT, $0-0
+ UNDEF
+
+#define DISPATCH(NAME, MAXSIZE) \
+ Get R0; \
+ I64Const $MAXSIZE; \
+ I64LeU; \
+ If; \
+ JMP NAME(SB); \
+ End
+
+TEXT ·reflectcall(SB), NOSPLIT, $0-48
+ I64Load fn+8(FP)
+ I64Eqz
+ If
+ CALLNORESUME runtime·sigpanic<ABIInternal>(SB)
+ End
+
+ MOVW frameSize+32(FP), R0
+
+ DISPATCH(runtime·call16, 16)
+ DISPATCH(runtime·call32, 32)
+ DISPATCH(runtime·call64, 64)
+ DISPATCH(runtime·call128, 128)
+ DISPATCH(runtime·call256, 256)
+ DISPATCH(runtime·call512, 512)
+ DISPATCH(runtime·call1024, 1024)
+ DISPATCH(runtime·call2048, 2048)
+ DISPATCH(runtime·call4096, 4096)
+ DISPATCH(runtime·call8192, 8192)
+ DISPATCH(runtime·call16384, 16384)
+ DISPATCH(runtime·call32768, 32768)
+ DISPATCH(runtime·call65536, 65536)
+ DISPATCH(runtime·call131072, 131072)
+ DISPATCH(runtime·call262144, 262144)
+ DISPATCH(runtime·call524288, 524288)
+ DISPATCH(runtime·call1048576, 1048576)
+ DISPATCH(runtime·call2097152, 2097152)
+ DISPATCH(runtime·call4194304, 4194304)
+ DISPATCH(runtime·call8388608, 8388608)
+ DISPATCH(runtime·call16777216, 16777216)
+ DISPATCH(runtime·call33554432, 33554432)
+ DISPATCH(runtime·call67108864, 67108864)
+ DISPATCH(runtime·call134217728, 134217728)
+ DISPATCH(runtime·call268435456, 268435456)
+ DISPATCH(runtime·call536870912, 536870912)
+ DISPATCH(runtime·call1073741824, 1073741824)
+ JMP runtime·badreflectcall(SB)
+
+#define CALLFN(NAME, MAXSIZE) \
+TEXT NAME(SB), WRAPPER, $MAXSIZE-48; \
+ NO_LOCAL_POINTERS; \
+ MOVW stackArgsSize+24(FP), R0; \
+ \
+ Get R0; \
+ I64Eqz; \
+ Not; \
+ If; \
+ Get SP; \
+ I64Load stackArgs+16(FP); \
+ I32WrapI64; \
+ I64Load stackArgsSize+24(FP); \
+ I32WrapI64; \
+ MemoryCopy; \
+ End; \
+ \
+ MOVD f+8(FP), CTXT; \
+ Get CTXT; \
+ I32WrapI64; \
+ I64Load $0; \
+ CALL; \
+ \
+ I64Load32U stackRetOffset+28(FP); \
+ Set R0; \
+ \
+ MOVD stackArgsType+0(FP), RET0; \
+ \
+ I64Load stackArgs+16(FP); \
+ Get R0; \
+ I64Add; \
+ Set RET1; \
+ \
+ Get SP; \
+ I64ExtendI32U; \
+ Get R0; \
+ I64Add; \
+ Set RET2; \
+ \
+ I64Load32U stackArgsSize+24(FP); \
+ Get R0; \
+ I64Sub; \
+ Set RET3; \
+ \
+ CALL callRet<>(SB); \
+ RET
+
+// callRet copies return values back at the end of call*. This is a
+// separate function so it can allocate stack space for the arguments
+// to reflectcallmove. It does not follow the Go ABI; it expects its
+// arguments in registers.
+TEXT callRet<>(SB), NOSPLIT, $40-0
+ NO_LOCAL_POINTERS
+ MOVD RET0, 0(SP)
+ MOVD RET1, 8(SP)
+ MOVD RET2, 16(SP)
+ MOVD RET3, 24(SP)
+ MOVD $0, 32(SP)
+ CALL runtime·reflectcallmove(SB)
+ RET
+
+CALLFN(·call16, 16)
+CALLFN(·call32, 32)
+CALLFN(·call64, 64)
+CALLFN(·call128, 128)
+CALLFN(·call256, 256)
+CALLFN(·call512, 512)
+CALLFN(·call1024, 1024)
+CALLFN(·call2048, 2048)
+CALLFN(·call4096, 4096)
+CALLFN(·call8192, 8192)
+CALLFN(·call16384, 16384)
+CALLFN(·call32768, 32768)
+CALLFN(·call65536, 65536)
+CALLFN(·call131072, 131072)
+CALLFN(·call262144, 262144)
+CALLFN(·call524288, 524288)
+CALLFN(·call1048576, 1048576)
+CALLFN(·call2097152, 2097152)
+CALLFN(·call4194304, 4194304)
+CALLFN(·call8388608, 8388608)
+CALLFN(·call16777216, 16777216)
+CALLFN(·call33554432, 33554432)
+CALLFN(·call67108864, 67108864)
+CALLFN(·call134217728, 134217728)
+CALLFN(·call268435456, 268435456)
+CALLFN(·call536870912, 536870912)
+CALLFN(·call1073741824, 1073741824)
+
+TEXT runtime·goexit(SB), NOSPLIT|TOPFRAME, $0-0
+ NOP // first PC of goexit is skipped
+ CALL runtime·goexit1(SB) // does not return
+ UNDEF
+
+TEXT runtime·cgocallback(SB), NOSPLIT, $0-24
+ UNDEF
+
+// gcWriteBarrier informs the GC about heap pointer writes.
+//
+// gcWriteBarrier does NOT follow the Go ABI. It accepts the
+// number of bytes of buffer needed as a wasm argument
+// (put on the TOS by the caller, lives in local R0 in this body)
+// and returns a pointer to the buffer space as a wasm result
+// (left on the TOS in this body, appears on the wasm stack
+// in the caller).
+TEXT gcWriteBarrier<>(SB), NOSPLIT, $0
+ Loop
+ // R3 = g.m
+ MOVD g_m(g), R3
+ // R4 = p
+ MOVD m_p(R3), R4
+ // R5 = wbBuf.next
+ MOVD p_wbBuf+wbBuf_next(R4), R5
+
+ // Increment wbBuf.next
+ Get R5
+ Get R0
+ I64Add
+ Set R5
+
+ // Is the buffer full?
+ Get R5
+ I64Load (p_wbBuf+wbBuf_end)(R4)
+ I64LeU
+ If
+ // Commit to the larger buffer.
+ MOVD R5, p_wbBuf+wbBuf_next(R4)
+
+ // Make return value (the original next position)
+ Get R5
+ Get R0
+ I64Sub
+
+ Return
+ End
+
+ // Flush
+ CALLNORESUME runtime·wbBufFlush(SB)
+
+ // Retry
+ Br $0
+ End
+
+TEXT runtime·gcWriteBarrier1<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $8
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier2<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $16
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier3<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $24
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier4<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $32
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier5<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $40
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier6<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $48
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier7<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $56
+ Call gcWriteBarrier<>(SB)
+ Return
+TEXT runtime·gcWriteBarrier8<ABIInternal>(SB),NOSPLIT,$0
+ I64Const $64
+ Call gcWriteBarrier<>(SB)
+ Return
+
+TEXT wasm_pc_f_loop(SB),NOSPLIT,$0
+// Call the function for the current PC_F. Repeat until PAUSE != 0 indicates pause or exit.
+// The WebAssembly stack may unwind, e.g. when switching goroutines.
+// The Go stack on the linear memory is then used to jump to the correct functions
+// with this loop, without having to restore the full WebAssembly stack.
+// It is expected to have a pending call before entering the loop, so check PAUSE first.
+ Get PAUSE
+ I32Eqz
+ If
+ loop:
+ Loop
+ // Get PC_B & PC_F from -8(SP)
+ Get SP
+ I32Const $8
+ I32Sub
+ I32Load16U $0 // PC_B
+
+ Get SP
+ I32Const $8
+ I32Sub
+ I32Load16U $2 // PC_F
+
+ CallIndirect $0
+ Drop
+
+ Get PAUSE
+ I32Eqz
+ BrIf loop
+ End
+ End
+
+ I32Const $0
+ Set PAUSE
+
+ Return
+
+TEXT wasm_export_lib(SB),NOSPLIT,$0
+ UNDEF
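
The reflectcall dispatcher and the call16 ... call1073741824 stubs above select a
frame sized for a call whose argument layout is only known at run time. From Go
code this machinery is normally reached through the reflect package; a small,
architecture-independent illustration:

    package main

    import (
        "fmt"
        "reflect"
    )

    func add(a, b int) int { return a + b }

    func main() {
        fn := reflect.ValueOf(add)
        args := []reflect.Value{reflect.ValueOf(2), reflect.ValueOf(3)}
        // reflect packs the arguments into a frame and hands it to the runtime's
        // reflectcall, which dispatches to a callN stub large enough for that frame.
        out := fn.Call(args)
        fmt.Println(out[0].Int()) // 5
    }
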
diff --git a/src/runtime/atomic_arm64.s b/src/runtime/atomic_arm64.s
new file mode 100644
index 0000000..21b4d8c
--- /dev/null
+++ b/src/runtime/atomic_arm64.s
@@ -0,0 +1,9 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ DMB $0xe // DMB ST
+ RET
diff --git a/src/runtime/atomic_loong64.s b/src/runtime/atomic_loong64.s
new file mode 100644
index 0000000..4818a82
--- /dev/null
+++ b/src/runtime/atomic_loong64.s
@@ -0,0 +1,9 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ DBAR
+ RET
diff --git a/src/runtime/atomic_mips64x.s b/src/runtime/atomic_mips64x.s
new file mode 100644
index 0000000..dd6380c
--- /dev/null
+++ b/src/runtime/atomic_mips64x.s
@@ -0,0 +1,13 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "textflag.h"
+
+#define SYNC WORD $0xf
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ SYNC
+ RET
diff --git a/src/runtime/atomic_mipsx.s b/src/runtime/atomic_mipsx.s
new file mode 100644
index 0000000..ac255fe
--- /dev/null
+++ b/src/runtime/atomic_mipsx.s
@@ -0,0 +1,11 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0
+ SYNC
+ RET
diff --git a/src/runtime/atomic_pointer.go b/src/runtime/atomic_pointer.go
new file mode 100644
index 0000000..b61bf0b
--- /dev/null
+++ b/src/runtime/atomic_pointer.go
@@ -0,0 +1,114 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goexperiment"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// These functions cannot have go:noescape annotations,
+// because while ptr does not escape, new does.
+// If new is marked as not escaping, the compiler will make incorrect
+// escape analysis decisions about the pointer value being stored.
+
+// atomicwb performs a write barrier before an atomic pointer write.
+// The caller should guard the call with "if writeBarrier.enabled".
+//
+//go:nosplit
+func atomicwb(ptr *unsafe.Pointer, new unsafe.Pointer) {
+ slot := (*uintptr)(unsafe.Pointer(ptr))
+ buf := getg().m.p.ptr().wbBuf.get2()
+ buf[0] = *slot
+ buf[1] = uintptr(new)
+}
+
+// atomicstorep performs *ptr = new atomically and invokes a write barrier.
+//
+//go:nosplit
+func atomicstorep(ptr unsafe.Pointer, new unsafe.Pointer) {
+ if writeBarrier.enabled {
+ atomicwb((*unsafe.Pointer)(ptr), new)
+ }
+ if goexperiment.CgoCheck2 {
+ cgoCheckPtrWrite((*unsafe.Pointer)(ptr), new)
+ }
+ atomic.StorepNoWB(noescape(ptr), new)
+}
+
+// atomic_storePointer is the implementation of runtime/internal/UnsafePointer.Store
+// (like StoreNoWB but with the write barrier).
+//
+//go:nosplit
+//go:linkname atomic_storePointer runtime/internal/atomic.storePointer
+func atomic_storePointer(ptr *unsafe.Pointer, new unsafe.Pointer) {
+ atomicstorep(unsafe.Pointer(ptr), new)
+}
+
+// atomic_casPointer is the implementation of runtime/internal/UnsafePointer.CompareAndSwap
+// (like CompareAndSwapNoWB but with the write barrier).
+//
+//go:nosplit
+//go:linkname atomic_casPointer runtime/internal/atomic.casPointer
+func atomic_casPointer(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ if goexperiment.CgoCheck2 {
+ cgoCheckPtrWrite(ptr, new)
+ }
+ return atomic.Casp1(ptr, old, new)
+}
+
+// Like the functions above, but implemented in terms of sync/atomic's uintptr operations.
+// We cannot just call the runtime routines, because the race detector expects
+// to be able to intercept the sync/atomic forms but not the runtime forms.
+
+//go:linkname sync_atomic_StoreUintptr sync/atomic.StoreUintptr
+func sync_atomic_StoreUintptr(ptr *uintptr, new uintptr)
+
+//go:linkname sync_atomic_StorePointer sync/atomic.StorePointer
+//go:nosplit
+func sync_atomic_StorePointer(ptr *unsafe.Pointer, new unsafe.Pointer) {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ if goexperiment.CgoCheck2 {
+ cgoCheckPtrWrite(ptr, new)
+ }
+ sync_atomic_StoreUintptr((*uintptr)(unsafe.Pointer(ptr)), uintptr(new))
+}
+
+//go:linkname sync_atomic_SwapUintptr sync/atomic.SwapUintptr
+func sync_atomic_SwapUintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:linkname sync_atomic_SwapPointer sync/atomic.SwapPointer
+//go:nosplit
+func sync_atomic_SwapPointer(ptr *unsafe.Pointer, new unsafe.Pointer) unsafe.Pointer {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ if goexperiment.CgoCheck2 {
+ cgoCheckPtrWrite(ptr, new)
+ }
+ old := unsafe.Pointer(sync_atomic_SwapUintptr((*uintptr)(noescape(unsafe.Pointer(ptr))), uintptr(new)))
+ return old
+}
+
+//go:linkname sync_atomic_CompareAndSwapUintptr sync/atomic.CompareAndSwapUintptr
+func sync_atomic_CompareAndSwapUintptr(ptr *uintptr, old, new uintptr) bool
+
+//go:linkname sync_atomic_CompareAndSwapPointer sync/atomic.CompareAndSwapPointer
+//go:nosplit
+func sync_atomic_CompareAndSwapPointer(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool {
+ if writeBarrier.enabled {
+ atomicwb(ptr, new)
+ }
+ if goexperiment.CgoCheck2 {
+ cgoCheckPtrWrite(ptr, new)
+ }
+ return sync_atomic_CompareAndSwapUintptr((*uintptr)(noescape(unsafe.Pointer(ptr))), uintptr(old), uintptr(new))
+}
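
The sync_atomic_* shims above are what give user-level sync/atomic pointer writes
a GC write barrier. Application code reaches them through sync/atomic, for example
via the typed atomic.Pointer wrapper; a short sketch:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    type config struct{ addr string }

    func main() {
        var cur atomic.Pointer[config]

        // Publishing a new value is a pointer store that the runtime pairs with a
        // write barrier (sync_atomic_StorePointer above).
        cur.Store(&config{addr: "10.0.0.1:80"})

        // Readers observe either the old or the new pointer, never a torn value.
        if c := cur.Load(); c != nil {
            fmt.Println(c.addr)
        }

        // Conditional replacement maps onto sync_atomic_CompareAndSwapPointer.
        old := cur.Load()
        fmt.Println("swapped:", cur.CompareAndSwap(old, &config{addr: "10.0.0.2:80"}))
    }
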
diff --git a/src/runtime/atomic_ppc64x.s b/src/runtime/atomic_ppc64x.s
new file mode 100644
index 0000000..4742b6c
--- /dev/null
+++ b/src/runtime/atomic_ppc64x.s
@@ -0,0 +1,14 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ // LWSYNC is the "export" barrier recommended by Power ISA
+ // v2.07 book II, appendix B.2.2.2.
+ // LWSYNC is a load/load, load/store, and store/store barrier.
+ LWSYNC
+ RET
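
publicationBarrier is the store/store fence the runtime issues after initializing a
newly allocated object and before its address can become visible to other
processors, so a reader that sees the pointer also sees the initialized contents.
It is not callable from user code; in ordinary Go the same ordering comes from an
atomic publish, roughly:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    type node struct{ value int }

    var shared atomic.Pointer[node]

    // publish fully initializes a node and only then makes it visible. The atomic
    // store supplies the store/store ordering that publicationBarrier gives the
    // runtime on weakly ordered machines.
    func publish(v int) {
        n := &node{value: v} // initialize first ...
        shared.Store(n)      // ... then publish the pointer
    }

    func main() {
        publish(42)
        if n := shared.Load(); n != nil {
            fmt.Println(n.value) // a reader that sees the pointer also sees 42
        }
    }
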
diff --git a/src/runtime/atomic_riscv64.s b/src/runtime/atomic_riscv64.s
new file mode 100644
index 0000000..544a7c5
--- /dev/null
+++ b/src/runtime/atomic_riscv64.s
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func publicationBarrier()
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ FENCE
+ RET
diff --git a/src/runtime/auxv_none.go b/src/runtime/auxv_none.go
new file mode 100644
index 0000000..5d473ca
--- /dev/null
+++ b/src/runtime/auxv_none.go
@@ -0,0 +1,10 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !linux && !darwin && !dragonfly && !freebsd && !netbsd && !solaris
+
+package runtime
+
+func sysargs(argc int32, argv **byte) {
+}
diff --git a/src/runtime/callers_test.go b/src/runtime/callers_test.go
new file mode 100644
index 0000000..d316ee9
--- /dev/null
+++ b/src/runtime/callers_test.go
@@ -0,0 +1,524 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func f1(pan bool) []uintptr {
+ return f2(pan) // line 15
+}
+
+func f2(pan bool) []uintptr {
+ return f3(pan) // line 19
+}
+
+func f3(pan bool) []uintptr {
+ if pan {
+ panic("f3") // line 24
+ }
+ ret := make([]uintptr, 20)
+ return ret[:runtime.Callers(0, ret)] // line 27
+}
+
+func testCallers(t *testing.T, pcs []uintptr, pan bool) {
+ m := make(map[string]int, len(pcs))
+ frames := runtime.CallersFrames(pcs)
+ for {
+ frame, more := frames.Next()
+ if frame.Function != "" {
+ m[frame.Function] = frame.Line
+ }
+ if !more {
+ break
+ }
+ }
+
+ var seen []string
+ for k := range m {
+ seen = append(seen, k)
+ }
+ t.Logf("functions seen: %s", strings.Join(seen, " "))
+
+ var f3Line int
+ if pan {
+ f3Line = 24
+ } else {
+ f3Line = 27
+ }
+ want := []struct {
+ name string
+ line int
+ }{
+ {"f1", 15},
+ {"f2", 19},
+ {"f3", f3Line},
+ }
+ for _, w := range want {
+ if got := m["runtime_test."+w.name]; got != w.line {
+ t.Errorf("%s is line %d, want %d", w.name, got, w.line)
+ }
+ }
+}
+
+func testCallersEqual(t *testing.T, pcs []uintptr, want []string) {
+ t.Helper()
+
+ got := make([]string, 0, len(want))
+
+ frames := runtime.CallersFrames(pcs)
+ for {
+ frame, more := frames.Next()
+ if !more || len(got) >= len(want) {
+ break
+ }
+ got = append(got, frame.Function)
+ }
+ if !reflect.DeepEqual(want, got) {
+ t.Fatalf("wanted %v, got %v", want, got)
+ }
+}
+
+func TestCallers(t *testing.T) {
+ testCallers(t, f1(false), false)
+}
+
+func TestCallersPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersPanic.func1",
+ "runtime.gopanic", "runtime_test.f3", "runtime_test.f2", "runtime_test.f1",
+ "runtime_test.TestCallersPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallers(t, pcs, true)
+ testCallersEqual(t, pcs, want)
+ }()
+ f1(true)
+}
+
+func TestCallersDoublePanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDoublePanic.func1.1",
+ "runtime.gopanic", "runtime_test.TestCallersDoublePanic.func1", "runtime.gopanic", "runtime_test.TestCallersDoublePanic"}
+
+ defer func() {
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ testCallersEqual(t, pcs, want)
+ }()
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ panic(2)
+ }()
+ panic(1)
+}
+
+// Test that a defer after a successful recovery looks like it is called directly
+// from the function with the defers.
+func TestCallersAfterRecovery(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAfterRecovery.func1", "runtime_test.TestCallersAfterRecovery"}
+
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ defer func() {
+ if recover() == nil {
+ t.Fatal("did not recover from panic")
+ }
+ }()
+ panic(1)
+}
+
+func TestCallersAbortedPanic(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAbortedPanic.func2", "runtime_test.TestCallersAbortedPanic"}
+
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatalf("should be no panic remaining to recover")
+ }
+ }()
+
+ defer func() {
+ // panic1 was aborted/replaced by panic2, so when panic2 was
+ // recovered, there is no remaining panic on the stack.
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatalf("got %v, wanted %v", r, "panic2")
+ }
+ }()
+ defer func() {
+ // panic2 aborts/replaces panic1, because it is a recursive panic
+ // that is not recovered within the defer function called by
+ // panic1's panicking sequence.
+ panic("panic2")
+ }()
+ panic("panic1")
+}
+
+func TestCallersAbortedPanic2(t *testing.T) {
+ want := []string{"runtime.Callers", "runtime_test.TestCallersAbortedPanic2.func2", "runtime_test.TestCallersAbortedPanic2"}
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatalf("should be no panic remaining to recover")
+ }
+ }()
+ defer func() {
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ func() {
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatalf("got %v, wanted %v", r, "panic2")
+ }
+ }()
+ func() {
+ defer func() {
+ // Again, panic2 aborts/replaces panic1
+ panic("panic2")
+ }()
+ panic("panic1")
+ }()
+ }()
+}
+
+func TestCallersNilPointerPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersNilPointerPanic.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic",
+ "runtime_test.TestCallersNilPointerPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ var p *int
+ if *p == 3 {
+ t.Fatal("did not see nil pointer panic")
+ }
+}
+
+func TestCallersDivZeroPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack (due to
+ // open-coded defer processing)
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDivZeroPanic.func1",
+ "runtime.gopanic", "runtime.panicdivide",
+ "runtime_test.TestCallersDivZeroPanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ }()
+ var n int
+ if 5/n == 1 {
+ t.Fatal("did not see divide-by-zero panic")
+ }
+}
+
+func TestCallersDeferNilFuncPanic(t *testing.T) {
+ // Make sure we don't have any extra frames on the stack. We cut off the check
+ // at runtime.sigpanic, because non-open-coded defers (which may be used in
+ // non-opt or race checker mode) include an extra 'deferreturn' frame (which is
+ // where the nil pointer deref happens).
+ state := 1
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDeferNilFuncPanic.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ if state == 1 {
+ t.Fatal("nil defer func panicked at defer time rather than function exit time")
+ }
+
+ }()
+ var f func()
+ defer f()
+ // Use the value of 'state' to make sure nil defer func f causes panic at
+ // function exit, rather than at the defer statement.
+ state = 2
+}
+
+// Same test, but forcing non-open-coded defer by putting the defer in a loop. See
+// issue #36050
+func TestCallersDeferNilFuncPanicWithLoop(t *testing.T) {
+ state := 1
+ want := []string{"runtime.Callers", "runtime_test.TestCallersDeferNilFuncPanicWithLoop.func1",
+ "runtime.gopanic", "runtime.panicmem", "runtime.sigpanic", "runtime.deferreturn", "runtime_test.TestCallersDeferNilFuncPanicWithLoop"}
+
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:runtime.Callers(0, pcs)]
+ testCallersEqual(t, pcs, want)
+ if state == 1 {
+ t.Fatal("nil defer func panicked at defer time rather than function exit time")
+ }
+
+ }()
+
+ for i := 0; i < 1; i++ {
+ var f func()
+ defer f()
+ }
+ // Use the value of 'state' to make sure nil defer func f causes panic at
+ // function exit, rather than at the defer statement.
+ state = 2
+}
+
+// issue #51988
+// Func.Endlineno was lost when instantiating generic functions, leading to incorrect
+// stack trace positions.
+func TestCallersEndlineno(t *testing.T) {
+ testNormalEndlineno(t)
+ testGenericEndlineno[int](t)
+}
+
+func testNormalEndlineno(t *testing.T) {
+ defer testCallerLine(t, callerLine(t, 0)+1)
+}
+
+func testGenericEndlineno[_ any](t *testing.T) {
+ defer testCallerLine(t, callerLine(t, 0)+1)
+}
+
+func testCallerLine(t *testing.T, want int) {
+ if have := callerLine(t, 1); have != want {
+ t.Errorf("callerLine(1) returned %d, but want %d\n", have, want)
+ }
+}
+
+func callerLine(t *testing.T, skip int) int {
+ _, _, line, ok := runtime.Caller(skip + 1)
+ if !ok {
+ t.Fatalf("runtime.Caller(%d) failed", skip+1)
+ }
+ return line
+}
+
+func BenchmarkCallers(b *testing.B) {
+ b.Run("cached", func(b *testing.B) {
+ // Very pcvalueCache-friendly, no inlining.
+ callersCached(b, 100)
+ })
+ b.Run("inlined", func(b *testing.B) {
+ // Some inlining, still pretty cache-friendly.
+ callersInlined(b, 100)
+ })
+ b.Run("no-cache", func(b *testing.B) {
+ // Cache-hostile
+ callersNoCache(b, 100)
+ })
+}
+
+func callersCached(b *testing.B, n int) int {
+ if n <= 0 {
+ pcs := make([]uintptr, 32)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.Callers(0, pcs)
+ }
+ b.StopTimer()
+ return 0
+ }
+ return 1 + callersCached(b, n-1)
+}
+
+func callersInlined(b *testing.B, n int) int {
+ if n <= 0 {
+ pcs := make([]uintptr, 32)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.Callers(0, pcs)
+ }
+ b.StopTimer()
+ return 0
+ }
+ return 1 + callersInlined1(b, n-1)
+}
+func callersInlined1(b *testing.B, n int) int { return callersInlined2(b, n) }
+func callersInlined2(b *testing.B, n int) int { return callersInlined3(b, n) }
+func callersInlined3(b *testing.B, n int) int { return callersInlined4(b, n) }
+func callersInlined4(b *testing.B, n int) int { return callersInlined(b, n) }
+
+func callersNoCache(b *testing.B, n int) int {
+ if n <= 0 {
+ pcs := make([]uintptr, 32)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.Callers(0, pcs)
+ }
+ b.StopTimer()
+ return 0
+ }
+ switch n % 16 {
+ case 0:
+ return 1 + callersNoCache(b, n-1)
+ case 1:
+ return 1 + callersNoCache(b, n-1)
+ case 2:
+ return 1 + callersNoCache(b, n-1)
+ case 3:
+ return 1 + callersNoCache(b, n-1)
+ case 4:
+ return 1 + callersNoCache(b, n-1)
+ case 5:
+ return 1 + callersNoCache(b, n-1)
+ case 6:
+ return 1 + callersNoCache(b, n-1)
+ case 7:
+ return 1 + callersNoCache(b, n-1)
+ case 8:
+ return 1 + callersNoCache(b, n-1)
+ case 9:
+ return 1 + callersNoCache(b, n-1)
+ case 10:
+ return 1 + callersNoCache(b, n-1)
+ case 11:
+ return 1 + callersNoCache(b, n-1)
+ case 12:
+ return 1 + callersNoCache(b, n-1)
+ case 13:
+ return 1 + callersNoCache(b, n-1)
+ case 14:
+ return 1 + callersNoCache(b, n-1)
+ default:
+ return 1 + callersNoCache(b, n-1)
+ }
+}
+
+func BenchmarkFPCallers(b *testing.B) {
+ b.Run("cached", func(b *testing.B) {
+ // Very pcvalueCache-friendly, no inlining.
+ fpCallersCached(b, 100)
+ })
+}
+
+func fpCallersCached(b *testing.B, n int) int {
+ if n <= 0 {
+ pcs := make([]uintptr, 32)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.FPCallers(pcs)
+ }
+ b.StopTimer()
+ return 0
+ }
+ return 1 + fpCallersCached(b, n-1)
+}
+
+func TestFPUnwindAfterRecovery(t *testing.T) {
+ if !runtime.FramePointerEnabled {
+ t.Skip("frame pointers not supported for this architecture")
+ }
+ func() {
+ // Make sure that frame pointer unwinding succeeds from a deferred
+ // function run after recovering from a panic. It can fail if the
+ // recovery does not properly restore the caller's frame pointer before
+ // running the remaining deferred functions.
+ //
+ // Wrap this all in an extra function since the unwinding is most likely
+ // to fail trying to unwind *after* the frame we're currently in (since
+ // *that* bp will fail to be restored). Below we'll try to induce a crash,
+ // but if for some reason we can't, let's make sure the stack trace looks
+ // right.
+ want := []string{
+ "runtime_test.TestFPUnwindAfterRecovery.func1.1",
+ "runtime_test.TestFPUnwindAfterRecovery.func1",
+ "runtime_test.TestFPUnwindAfterRecovery",
+ }
+ defer func() {
+ pcs := make([]uintptr, 32)
+ for i := range pcs {
+ // If runtime.recovery doesn't properly restore the
+ // frame pointer before returning control to this
+ // function, it will point somewhere lower in the stack
+ // from one of the frames of runtime.gopanic() or one of
+ // its callees prior to recovery. So, we put some
+ // non-zero values on the stack to try and get frame
+ // pointer unwinding to crash if it sees the old,
+ // invalid frame pointer.
+ pcs[i] = 10
+ }
+ runtime.FPCallers(pcs)
+ // If it didn't crash, let's symbolize. Something is going
+ // to look wrong if the bp restoration just happened to
+ // reference a valid frame. Look for the frames we expect, in order.
+ var got []string
+ frames := runtime.CallersFrames(pcs)
+ for {
+ frame, more := frames.Next()
+ if !more {
+ break
+ }
+ got = append(got, frame.Function)
+ }
+ // Check that we see the frames in want and in that order.
+ // This is a bit roundabout because FPCallers doesn't do
+ // filtering of runtime internals like Callers.
+ i := 0
+ for _, f := range got {
+ if f != want[i] {
+ continue
+ }
+ i++
+ if i == len(want) {
+ break
+ }
+ }
+ if i != len(want) {
+ t.Fatalf("bad unwind: got %v, want %v in that order", got, want)
+ }
+ }()
+ defer func() {
+ if recover() == nil {
+ t.Fatal("did not recover from panic")
+ }
+ }()
+ panic(1)
+ }()
+}
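
The tests above all follow the same capture-then-symbolize pattern: runtime.Callers
fills a slice of program counters and runtime.CallersFrames expands it, including
inlined frames. A standalone version of that pattern:

    package main

    import (
        "fmt"
        "runtime"
    )

    func captureStack() []uintptr {
        pcs := make([]uintptr, 20)
        // skip=0 keeps runtime.Callers itself in the trace, as the tests above do.
        n := runtime.Callers(0, pcs)
        return pcs[:n]
    }

    func main() {
        frames := runtime.CallersFrames(captureStack())
        for {
            frame, more := frames.Next()
            fmt.Printf("%s (line %d)\n", frame.Function, frame.Line)
            if !more {
                break
            }
        }
    }
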
diff --git a/src/runtime/cgo.go b/src/runtime/cgo.go
new file mode 100644
index 0000000..3953035
--- /dev/null
+++ b/src/runtime/cgo.go
@@ -0,0 +1,63 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:cgo_export_static main
+
+// Filled in by runtime/cgo when linked into binary.
+
+//go:linkname _cgo_init _cgo_init
+//go:linkname _cgo_thread_start _cgo_thread_start
+//go:linkname _cgo_sys_thread_create _cgo_sys_thread_create
+//go:linkname _cgo_notify_runtime_init_done _cgo_notify_runtime_init_done
+//go:linkname _cgo_callers _cgo_callers
+//go:linkname _cgo_set_context_function _cgo_set_context_function
+//go:linkname _cgo_yield _cgo_yield
+//go:linkname _cgo_pthread_key_created _cgo_pthread_key_created
+//go:linkname _cgo_bindm _cgo_bindm
+//go:linkname _cgo_getstackbound _cgo_getstackbound
+
+var (
+ _cgo_init unsafe.Pointer
+ _cgo_thread_start unsafe.Pointer
+ _cgo_sys_thread_create unsafe.Pointer
+ _cgo_notify_runtime_init_done unsafe.Pointer
+ _cgo_callers unsafe.Pointer
+ _cgo_set_context_function unsafe.Pointer
+ _cgo_yield unsafe.Pointer
+ _cgo_pthread_key_created unsafe.Pointer
+ _cgo_bindm unsafe.Pointer
+ _cgo_getstackbound unsafe.Pointer
+)
+
+// iscgo is set to true by the runtime/cgo package
+var iscgo bool
+
+// set_crosscall2 is set by the runtime/cgo package
+var set_crosscall2 func()
+
+// cgoHasExtraM is set on startup when an extra M is created for cgo.
+// The extra M must be created before any C/C++ code calls cgocallback.
+var cgoHasExtraM bool
+
+// cgoUse is called by cgo-generated code (using go:linkname to get at
+// an unexported name). The calls serve two purposes:
+// 1) they are opaque to escape analysis, so the argument is considered to
+// escape to the heap.
+// 2) they keep the argument alive until the call site; the call is emitted after
+// the end of the (presumed) use of the argument by C.
+// cgoUse should not actually be called (see cgoAlwaysFalse).
+func cgoUse(any) { throw("cgoUse should not be called") }
+
+// cgoAlwaysFalse is a boolean value that is always false.
+// The cgo-generated code says if cgoAlwaysFalse { cgoUse(p) }.
+// The compiler cannot see that cgoAlwaysFalse is always false,
+// so it emits the test and keeps the call, giving the desired
+// escape analysis result. The test is cheaper than the call.
+var cgoAlwaysFalse bool
+
+var cgo_yield = &_cgo_yield
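
The cgoAlwaysFalse/cgoUse pair is an escape-analysis and liveness trick: a call the
compiler cannot prove dead both makes the argument escape and keeps it alive across
the C call. A rough sketch of the same shape, using hypothetical stand-in names
(alwaysFalse and use are illustrative, not the real generated identifiers):

    package main

    import "fmt"

    // alwaysFalse mirrors cgoAlwaysFalse: the compiler cannot prove it is always
    // false, so the guarded call below survives compilation.
    var alwaysFalse bool

    // use mirrors cgoUse: it is never actually called.
    //
    //go:noinline
    func use(p any) { panic("use should not be called") }

    func passToC(p *int) {
        // ... hand p to C here (elided in this sketch) ...
        if alwaysFalse {
            use(p) // never executes, but forces p to escape and stay live
        }
    }

    func main() {
        x := 42
        passToC(&x)
        fmt.Println(x)
    }
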
diff --git a/src/runtime/cgo/abi_amd64.h b/src/runtime/cgo/abi_amd64.h
new file mode 100644
index 0000000..9949435
--- /dev/null
+++ b/src/runtime/cgo/abi_amd64.h
@@ -0,0 +1,99 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Macros for transitioning from the host ABI to Go ABI0.
+//
+// These save the frame pointer, so in general, functions that use
+// these should have zero frame size to suppress the automatic frame
+// pointer, though it's harmless to not do this.
+
+#ifdef GOOS_windows
+
+// REGS_HOST_TO_ABI0_STACK is the stack bytes used by
+// PUSH_REGS_HOST_TO_ABI0.
+#define REGS_HOST_TO_ABI0_STACK (28*8 + 8)
+
+// PUSH_REGS_HOST_TO_ABI0 prepares for transitioning from
+// the host ABI to Go ABI0 code. It saves all registers that are
+// callee-save in the host ABI and caller-save in Go ABI0 and prepares
+// for entry to Go.
+//
+// Save DI SI BP BX R12 R13 R14 R15 X6-X15 registers and the DF flag.
+// Clear the DF flag for the Go ABI.
+// MXCSR matches the Go ABI, so we don't have to set that,
+// and Go doesn't modify it, so we don't have to save it.
+#define PUSH_REGS_HOST_TO_ABI0() \
+ PUSHFQ \
+ CLD \
+ ADJSP $(REGS_HOST_TO_ABI0_STACK - 8) \
+ MOVQ DI, (0*0)(SP) \
+ MOVQ SI, (1*8)(SP) \
+ MOVQ BP, (2*8)(SP) \
+ MOVQ BX, (3*8)(SP) \
+ MOVQ R12, (4*8)(SP) \
+ MOVQ R13, (5*8)(SP) \
+ MOVQ R14, (6*8)(SP) \
+ MOVQ R15, (7*8)(SP) \
+ MOVUPS X6, (8*8)(SP) \
+ MOVUPS X7, (10*8)(SP) \
+ MOVUPS X8, (12*8)(SP) \
+ MOVUPS X9, (14*8)(SP) \
+ MOVUPS X10, (16*8)(SP) \
+ MOVUPS X11, (18*8)(SP) \
+ MOVUPS X12, (20*8)(SP) \
+ MOVUPS X13, (22*8)(SP) \
+ MOVUPS X14, (24*8)(SP) \
+ MOVUPS X15, (26*8)(SP)
+
+#define POP_REGS_HOST_TO_ABI0() \
+ MOVQ (0*0)(SP), DI \
+ MOVQ (1*8)(SP), SI \
+ MOVQ (2*8)(SP), BP \
+ MOVQ (3*8)(SP), BX \
+ MOVQ (4*8)(SP), R12 \
+ MOVQ (5*8)(SP), R13 \
+ MOVQ (6*8)(SP), R14 \
+ MOVQ (7*8)(SP), R15 \
+ MOVUPS (8*8)(SP), X6 \
+ MOVUPS (10*8)(SP), X7 \
+ MOVUPS (12*8)(SP), X8 \
+ MOVUPS (14*8)(SP), X9 \
+ MOVUPS (16*8)(SP), X10 \
+ MOVUPS (18*8)(SP), X11 \
+ MOVUPS (20*8)(SP), X12 \
+ MOVUPS (22*8)(SP), X13 \
+ MOVUPS (24*8)(SP), X14 \
+ MOVUPS (26*8)(SP), X15 \
+ ADJSP $-(REGS_HOST_TO_ABI0_STACK - 8) \
+ POPFQ
+
+#else
+// SysV ABI
+
+#define REGS_HOST_TO_ABI0_STACK (6*8)
+
+// SysV MXCSR matches the Go ABI, so we don't have to set that,
+// and Go doesn't modify it, so we don't have to save it.
+// Both SysV and Go require DF to be cleared, so that's already clear.
+// The SysV and Go frame pointer conventions are compatible.
+#define PUSH_REGS_HOST_TO_ABI0() \
+ ADJSP $(REGS_HOST_TO_ABI0_STACK) \
+ MOVQ BP, (5*8)(SP) \
+ LEAQ (5*8)(SP), BP \
+ MOVQ BX, (0*8)(SP) \
+ MOVQ R12, (1*8)(SP) \
+ MOVQ R13, (2*8)(SP) \
+ MOVQ R14, (3*8)(SP) \
+ MOVQ R15, (4*8)(SP)
+
+#define POP_REGS_HOST_TO_ABI0() \
+ MOVQ (0*8)(SP), BX \
+ MOVQ (1*8)(SP), R12 \
+ MOVQ (2*8)(SP), R13 \
+ MOVQ (3*8)(SP), R14 \
+ MOVQ (4*8)(SP), R15 \
+ MOVQ (5*8)(SP), BP \
+ ADJSP $-(REGS_HOST_TO_ABI0_STACK)
+
+#endif
diff --git a/src/runtime/cgo/abi_arm64.h b/src/runtime/cgo/abi_arm64.h
new file mode 100644
index 0000000..e2b5e6d
--- /dev/null
+++ b/src/runtime/cgo/abi_arm64.h
@@ -0,0 +1,43 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Macros for transitioning from the host ABI to Go ABI0.
+//
+// These macros save and restore the callee-saved registers
+// from the stack, but they don't adjust the stack pointer, so
+// the user should prepare stack space in advance.
+// SAVE_R19_TO_R28(offset) saves R19 ~ R28 to the stack space
+// of ((offset)+0*8)(RSP) ~ ((offset)+9*8)(RSP).
+//
+// SAVE_F8_TO_F15(offset) saves F8 ~ F15 to the stack space
+// of ((offset)+0*8)(RSP) ~ ((offset)+7*8)(RSP).
+//
+// R29 is not saved because Go will save and restore it.
+
+#define SAVE_R19_TO_R28(offset) \
+ STP (R19, R20), ((offset)+0*8)(RSP) \
+ STP (R21, R22), ((offset)+2*8)(RSP) \
+ STP (R23, R24), ((offset)+4*8)(RSP) \
+ STP (R25, R26), ((offset)+6*8)(RSP) \
+ STP (R27, g), ((offset)+8*8)(RSP)
+
+#define RESTORE_R19_TO_R28(offset) \
+ LDP ((offset)+0*8)(RSP), (R19, R20) \
+ LDP ((offset)+2*8)(RSP), (R21, R22) \
+ LDP ((offset)+4*8)(RSP), (R23, R24) \
+ LDP ((offset)+6*8)(RSP), (R25, R26) \
+ LDP ((offset)+8*8)(RSP), (R27, g) /* R28 */
+
+#define SAVE_F8_TO_F15(offset) \
+ FSTPD (F8, F9), ((offset)+0*8)(RSP) \
+ FSTPD (F10, F11), ((offset)+2*8)(RSP) \
+ FSTPD (F12, F13), ((offset)+4*8)(RSP) \
+ FSTPD (F14, F15), ((offset)+6*8)(RSP)
+
+#define RESTORE_F8_TO_F15(offset) \
+ FLDPD ((offset)+0*8)(RSP), (F8, F9) \
+ FLDPD ((offset)+2*8)(RSP), (F10, F11) \
+ FLDPD ((offset)+4*8)(RSP), (F12, F13) \
+ FLDPD ((offset)+6*8)(RSP), (F14, F15)
+
diff --git a/src/runtime/cgo/abi_loong64.h b/src/runtime/cgo/abi_loong64.h
new file mode 100644
index 0000000..b10d837
--- /dev/null
+++ b/src/runtime/cgo/abi_loong64.h
@@ -0,0 +1,60 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Macros for transitioning from the host ABI to Go ABI0.
+//
+// These macros save and restore the callee-saved registers
+// from the stack, but they don't adjust the stack pointer, so
+// the user should prepare stack space in advance.
+// SAVE_R22_TO_R31(offset) saves R22 ~ R31 to the stack space
+// of ((offset)+0*8)(R3) ~ ((offset)+9*8)(R3).
+//
+// SAVE_F24_TO_F31(offset) saves F24 ~ F31 to the stack space
+// of ((offset)+0*8)(R3) ~ ((offset)+7*8)(R3).
+//
+// Note: g is R22
+
+#define SAVE_R22_TO_R31(offset) \
+ MOVV g, ((offset)+(0*8))(R3) \
+ MOVV R23, ((offset)+(1*8))(R3) \
+ MOVV R24, ((offset)+(2*8))(R3) \
+ MOVV R25, ((offset)+(3*8))(R3) \
+ MOVV R26, ((offset)+(4*8))(R3) \
+ MOVV R27, ((offset)+(5*8))(R3) \
+ MOVV R28, ((offset)+(6*8))(R3) \
+ MOVV R29, ((offset)+(7*8))(R3) \
+ MOVV R30, ((offset)+(8*8))(R3) \
+ MOVV R31, ((offset)+(9*8))(R3)
+
+#define SAVE_F24_TO_F31(offset) \
+ MOVD F24, ((offset)+(0*8))(R3) \
+ MOVD F25, ((offset)+(1*8))(R3) \
+ MOVD F26, ((offset)+(2*8))(R3) \
+ MOVD F27, ((offset)+(3*8))(R3) \
+ MOVD F28, ((offset)+(4*8))(R3) \
+ MOVD F29, ((offset)+(5*8))(R3) \
+ MOVD F30, ((offset)+(6*8))(R3) \
+ MOVD F31, ((offset)+(7*8))(R3)
+
+#define RESTORE_R22_TO_R31(offset) \
+ MOVV ((offset)+(0*8))(R3), g \
+ MOVV ((offset)+(1*8))(R3), R23 \
+ MOVV ((offset)+(2*8))(R3), R24 \
+ MOVV ((offset)+(3*8))(R3), R25 \
+ MOVV ((offset)+(4*8))(R3), R26 \
+ MOVV ((offset)+(5*8))(R3), R27 \
+ MOVV ((offset)+(6*8))(R3), R28 \
+ MOVV ((offset)+(7*8))(R3), R29 \
+ MOVV ((offset)+(8*8))(R3), R30 \
+ MOVV ((offset)+(9*8))(R3), R31
+
+#define RESTORE_F24_TO_F31(offset) \
+ MOVD ((offset)+(0*8))(R3), F24 \
+ MOVD ((offset)+(1*8))(R3), F25 \
+ MOVD ((offset)+(2*8))(R3), F26 \
+ MOVD ((offset)+(3*8))(R3), F27 \
+ MOVD ((offset)+(4*8))(R3), F28 \
+ MOVD ((offset)+(5*8))(R3), F29 \
+ MOVD ((offset)+(6*8))(R3), F30 \
+ MOVD ((offset)+(7*8))(R3), F31
diff --git a/src/runtime/cgo/abi_ppc64x.h b/src/runtime/cgo/abi_ppc64x.h
new file mode 100644
index 0000000..245a526
--- /dev/null
+++ b/src/runtime/cgo/abi_ppc64x.h
@@ -0,0 +1,195 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Macros for transitioning from the host ABI to Go ABI
+//
+// On PPC64/ELFv2 targets, the following registers are callee
+// saved when called from C. They must be preserved before
+// calling into Go which does not preserve any of them.
+//
+// R14-R31
+// CR2-4
+// VR20-31
+// F14-F31
+//
+// xcoff(aix) and ELFv1 are similar, but may only require a
+// subset of these.
+//
+// These macros assume a 16 byte aligned stack pointer. This
+// is required by ELFv1, ELFv2, and AIX PPC64.
+
+#define SAVE_GPR_SIZE (18*8)
+#define SAVE_GPR(offset) \
+ MOVD R14, (offset+8*0)(R1) \
+ MOVD R15, (offset+8*1)(R1) \
+ MOVD R16, (offset+8*2)(R1) \
+ MOVD R17, (offset+8*3)(R1) \
+ MOVD R18, (offset+8*4)(R1) \
+ MOVD R19, (offset+8*5)(R1) \
+ MOVD R20, (offset+8*6)(R1) \
+ MOVD R21, (offset+8*7)(R1) \
+ MOVD R22, (offset+8*8)(R1) \
+ MOVD R23, (offset+8*9)(R1) \
+ MOVD R24, (offset+8*10)(R1) \
+ MOVD R25, (offset+8*11)(R1) \
+ MOVD R26, (offset+8*12)(R1) \
+ MOVD R27, (offset+8*13)(R1) \
+ MOVD R28, (offset+8*14)(R1) \
+ MOVD R29, (offset+8*15)(R1) \
+ MOVD g, (offset+8*16)(R1) \
+ MOVD R31, (offset+8*17)(R1)
+
+#define RESTORE_GPR(offset) \
+ MOVD (offset+8*0)(R1), R14 \
+ MOVD (offset+8*1)(R1), R15 \
+ MOVD (offset+8*2)(R1), R16 \
+ MOVD (offset+8*3)(R1), R17 \
+ MOVD (offset+8*4)(R1), R18 \
+ MOVD (offset+8*5)(R1), R19 \
+ MOVD (offset+8*6)(R1), R20 \
+ MOVD (offset+8*7)(R1), R21 \
+ MOVD (offset+8*8)(R1), R22 \
+ MOVD (offset+8*9)(R1), R23 \
+ MOVD (offset+8*10)(R1), R24 \
+ MOVD (offset+8*11)(R1), R25 \
+ MOVD (offset+8*12)(R1), R26 \
+ MOVD (offset+8*13)(R1), R27 \
+ MOVD (offset+8*14)(R1), R28 \
+ MOVD (offset+8*15)(R1), R29 \
+ MOVD (offset+8*16)(R1), g \
+ MOVD (offset+8*17)(R1), R31
+
+#define SAVE_FPR_SIZE (18*8)
+#define SAVE_FPR(offset) \
+ FMOVD F14, (offset+8*0)(R1) \
+ FMOVD F15, (offset+8*1)(R1) \
+ FMOVD F16, (offset+8*2)(R1) \
+ FMOVD F17, (offset+8*3)(R1) \
+ FMOVD F18, (offset+8*4)(R1) \
+ FMOVD F19, (offset+8*5)(R1) \
+ FMOVD F20, (offset+8*6)(R1) \
+ FMOVD F21, (offset+8*7)(R1) \
+ FMOVD F22, (offset+8*8)(R1) \
+ FMOVD F23, (offset+8*9)(R1) \
+ FMOVD F24, (offset+8*10)(R1) \
+ FMOVD F25, (offset+8*11)(R1) \
+ FMOVD F26, (offset+8*12)(R1) \
+ FMOVD F27, (offset+8*13)(R1) \
+ FMOVD F28, (offset+8*14)(R1) \
+ FMOVD F29, (offset+8*15)(R1) \
+ FMOVD F30, (offset+8*16)(R1) \
+ FMOVD F31, (offset+8*17)(R1)
+
+#define RESTORE_FPR(offset) \
+ FMOVD (offset+8*0)(R1), F14 \
+ FMOVD (offset+8*1)(R1), F15 \
+ FMOVD (offset+8*2)(R1), F16 \
+ FMOVD (offset+8*3)(R1), F17 \
+ FMOVD (offset+8*4)(R1), F18 \
+ FMOVD (offset+8*5)(R1), F19 \
+ FMOVD (offset+8*6)(R1), F20 \
+ FMOVD (offset+8*7)(R1), F21 \
+ FMOVD (offset+8*8)(R1), F22 \
+ FMOVD (offset+8*9)(R1), F23 \
+ FMOVD (offset+8*10)(R1), F24 \
+ FMOVD (offset+8*11)(R1), F25 \
+ FMOVD (offset+8*12)(R1), F26 \
+ FMOVD (offset+8*13)(R1), F27 \
+ FMOVD (offset+8*14)(R1), F28 \
+ FMOVD (offset+8*15)(R1), F29 \
+ FMOVD (offset+8*16)(R1), F30 \
+ FMOVD (offset+8*17)(R1), F31
+
+// Save and restore VR20-31 (aka VSR56-63). These
+// macros must point to a 16B aligned offset.
+#define SAVE_VR_SIZE (12*16)
+#define SAVE_VR(offset, rtmp) \
+ MOVD $(offset+16*0), rtmp \
+ STVX V20, (rtmp)(R1) \
+ MOVD $(offset+16*1), rtmp \
+ STVX V21, (rtmp)(R1) \
+ MOVD $(offset+16*2), rtmp \
+ STVX V22, (rtmp)(R1) \
+ MOVD $(offset+16*3), rtmp \
+ STVX V23, (rtmp)(R1) \
+ MOVD $(offset+16*4), rtmp \
+ STVX V24, (rtmp)(R1) \
+ MOVD $(offset+16*5), rtmp \
+ STVX V25, (rtmp)(R1) \
+ MOVD $(offset+16*6), rtmp \
+ STVX V26, (rtmp)(R1) \
+ MOVD $(offset+16*7), rtmp \
+ STVX V27, (rtmp)(R1) \
+ MOVD $(offset+16*8), rtmp \
+ STVX V28, (rtmp)(R1) \
+ MOVD $(offset+16*9), rtmp \
+ STVX V29, (rtmp)(R1) \
+ MOVD $(offset+16*10), rtmp \
+ STVX V30, (rtmp)(R1) \
+ MOVD $(offset+16*11), rtmp \
+ STVX V31, (rtmp)(R1)
+
+#define RESTORE_VR(offset, rtmp) \
+ MOVD $(offset+16*0), rtmp \
+ LVX (rtmp)(R1), V20 \
+ MOVD $(offset+16*1), rtmp \
+ LVX (rtmp)(R1), V21 \
+ MOVD $(offset+16*2), rtmp \
+ LVX (rtmp)(R1), V22 \
+ MOVD $(offset+16*3), rtmp \
+ LVX (rtmp)(R1), V23 \
+ MOVD $(offset+16*4), rtmp \
+ LVX (rtmp)(R1), V24 \
+ MOVD $(offset+16*5), rtmp \
+ LVX (rtmp)(R1), V25 \
+ MOVD $(offset+16*6), rtmp \
+ LVX (rtmp)(R1), V26 \
+ MOVD $(offset+16*7), rtmp \
+ LVX (rtmp)(R1), V27 \
+ MOVD $(offset+16*8), rtmp \
+ LVX (rtmp)(R1), V28 \
+ MOVD $(offset+16*9), rtmp \
+ LVX (rtmp)(R1), V29 \
+ MOVD $(offset+16*10), rtmp \
+ LVX (rtmp)(R1), V30 \
+ MOVD $(offset+16*11), rtmp \
+ LVX (rtmp)(R1), V31
+
+// LR and CR are saved in the caller's frame. The callee must
+// make space for all other callee-save registers.
+#define SAVE_ALL_REG_SIZE (SAVE_GPR_SIZE+SAVE_FPR_SIZE+SAVE_VR_SIZE)
+
+// Stack a frame and save all callee-save registers following the
+// host OS's ABI. Fortunately, this is identical for AIX, ELFv1, and
+// ELFv2. All host ABIs require the stack pointer to maintain 16 byte
+// alignment, and save the callee-save registers in the same places.
+//
+// To restate, R1 is assumed to be aligned when this macro is used.
+// This assumes the caller's frame is compliant with the host ABI.
+// CR and LR are saved into the caller's frame per the host ABI.
+// R0 is initialized to $0 as expected by Go.
+#define STACK_AND_SAVE_HOST_TO_GO_ABI(extra) \
+ MOVD LR, R0 \
+ MOVD R0, 16(R1) \
+ MOVW CR, R0 \
+ MOVD R0, 8(R1) \
+ MOVDU R1, -(extra)-FIXED_FRAME-SAVE_ALL_REG_SIZE(R1) \
+ SAVE_GPR(extra+FIXED_FRAME) \
+ SAVE_FPR(extra+FIXED_FRAME+SAVE_GPR_SIZE) \
+ SAVE_VR(extra+FIXED_FRAME+SAVE_GPR_SIZE+SAVE_FPR_SIZE, R0) \
+ MOVD $0, R0
+
+// This unstacks the frame, restoring all callee-save registers
+// as saved by STACK_AND_SAVE_HOST_TO_GO_ABI.
+//
+// R0 is not guaranteed to contain $0 after this macro.
+#define UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(extra) \
+ RESTORE_GPR(extra+FIXED_FRAME) \
+ RESTORE_FPR(extra+FIXED_FRAME+SAVE_GPR_SIZE) \
+ RESTORE_VR(extra+FIXED_FRAME+SAVE_GPR_SIZE+SAVE_FPR_SIZE, R0) \
+ ADD $(extra+FIXED_FRAME+SAVE_ALL_REG_SIZE), R1 \
+ MOVD 16(R1), R0 \
+ MOVD R0, LR \
+ MOVD 8(R1), R0 \
+ MOVW R0, CR
diff --git a/src/runtime/cgo/asm_386.s b/src/runtime/cgo/asm_386.s
new file mode 100644
index 0000000..f9a662a
--- /dev/null
+++ b/src/runtime/cgo/asm_386.s
@@ -0,0 +1,42 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2.
+// Use a local trampoline to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVL _crosscall2_ptr(SB), AX
+ MOVL $crosscall2_trampoline<>(SB), BX
+ MOVL BX, (AX)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT,$28-16
+ MOVL BP, 24(SP)
+ MOVL BX, 20(SP)
+ MOVL SI, 16(SP)
+ MOVL DI, 12(SP)
+
+ MOVL ctxt+12(FP), AX
+ MOVL AX, 8(SP)
+ MOVL a+4(FP), AX
+ MOVL AX, 4(SP)
+ MOVL fn+0(FP), AX
+ MOVL AX, 0(SP)
+ CALL runtime·cgocallback(SB)
+
+ MOVL 12(SP), DI
+ MOVL 16(SP), SI
+ MOVL 20(SP), BX
+ MOVL 24(SP), BP
+ RET
diff --git a/src/runtime/cgo/asm_amd64.s b/src/runtime/cgo/asm_amd64.s
new file mode 100644
index 0000000..e319094
--- /dev/null
+++ b/src/runtime/cgo/asm_amd64.s
@@ -0,0 +1,47 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "abi_amd64.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2.
+// Use a local trampoline to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVQ _crosscall2_ptr(SB), AX
+ MOVQ $crosscall2_trampoline<>(SB), BX
+ MOVQ BX, (AX)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+// This signature is known to SWIG, so we can't change it.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0-0
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Make room for arguments to cgocallback.
+ ADJSP $0x18
+#ifndef GOOS_windows
+ MOVQ DI, 0x0(SP) /* fn */
+ MOVQ SI, 0x8(SP) /* arg */
+ // Skip n in DX.
+ MOVQ CX, 0x10(SP) /* ctxt */
+#else
+ MOVQ CX, 0x0(SP) /* fn */
+ MOVQ DX, 0x8(SP) /* arg */
+ // Skip n in R8.
+ MOVQ R9, 0x10(SP) /* ctxt */
+#endif
+
+ CALL runtime·cgocallback(SB)
+
+ ADJSP $-0x18
+ POP_REGS_HOST_TO_ABI0()
+ RET
diff --git a/src/runtime/cgo/asm_arm.s b/src/runtime/cgo/asm_arm.s
new file mode 100644
index 0000000..095e9c0
--- /dev/null
+++ b/src/runtime/cgo/asm_arm.s
@@ -0,0 +1,69 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2.
+// Use a local trampoline to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVW _crosscall2_ptr(SB), R1
+ MOVW $crosscall2_trampoline<>(SB), R2
+ MOVW R2, (R1)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ SUB $(8*9), R13 // Reserve space for the floating point registers.
+ // The C arguments arrive in R0, R1, R2, and R3. We want to
+ // pass R0, R1, and R3 to Go, so we push those on the stack.
+ // Also, save C callee-save registers R4-R12.
+ MOVM.WP [R0, R1, R3, R4, R5, R6, R7, R8, R9, g, R11, R12], (R13)
+ // Finally, save the link register R14. This also puts the
+ // arguments we pushed for cgocallback where they need to be,
+ // starting at 4(R13).
+ MOVW.W R14, -4(R13)
+
+ // Skip floating point registers on GOARM < 6.
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfpsave
+ MOVD F8, (13*4+8*1)(R13)
+ MOVD F9, (13*4+8*2)(R13)
+ MOVD F10, (13*4+8*3)(R13)
+ MOVD F11, (13*4+8*4)(R13)
+ MOVD F12, (13*4+8*5)(R13)
+ MOVD F13, (13*4+8*6)(R13)
+ MOVD F14, (13*4+8*7)(R13)
+ MOVD F15, (13*4+8*8)(R13)
+
+skipfpsave:
+ BL runtime·load_g(SB)
+ // We set up the arguments to cgocallback when saving registers above.
+ BL runtime·cgocallback(SB)
+
+ MOVB runtime·goarm(SB), R11
+ CMP $6, R11
+ BLT skipfprest
+ MOVD (13*4+8*1)(R13), F8
+ MOVD (13*4+8*2)(R13), F9
+ MOVD (13*4+8*3)(R13), F10
+ MOVD (13*4+8*4)(R13), F11
+ MOVD (13*4+8*5)(R13), F12
+ MOVD (13*4+8*6)(R13), F13
+ MOVD (13*4+8*7)(R13), F14
+ MOVD (13*4+8*8)(R13), F15
+
+skipfprest:
+ MOVW.P 4(R13), R14
+ MOVM.IAW (R13), [R0, R1, R3, R4, R5, R6, R7, R8, R9, g, R11, R12]
+ ADD $(8*9), R13
+ MOVW R14, R15
diff --git a/src/runtime/cgo/asm_arm64.s b/src/runtime/cgo/asm_arm64.s
new file mode 100644
index 0000000..5492dc1
--- /dev/null
+++ b/src/runtime/cgo/asm_arm64.s
@@ -0,0 +1,50 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "abi_arm64.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2.
+// Use a local trampoline to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVD _crosscall2_ptr(SB), R1
+ MOVD $crosscall2_trampoline<>(SB), R2
+ MOVD R2, (R1)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R0, R1, R3), skipping R2.
+ * Also note that at procedure entry in gc world, 8(RSP) will be the
+ * first arg.
+ */
+ SUB $(8*24), RSP
+ STP (R0, R1), (8*1)(RSP)
+ MOVD R3, (8*3)(RSP)
+
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+ STP (R29, R30), (8*22)(RSP)
+
+
+ // Initialize Go ABI environment
+ BL runtime·load_g(SB)
+ BL runtime·cgocallback(SB)
+
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+ LDP (8*22)(RSP), (R29, R30)
+
+ ADD $(8*24), RSP
+ RET
diff --git a/src/runtime/cgo/asm_loong64.s b/src/runtime/cgo/asm_loong64.s
new file mode 100644
index 0000000..19c8d74
--- /dev/null
+++ b/src/runtime/cgo/asm_loong64.s
@@ -0,0 +1,53 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "abi_loong64.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2.
+// Use a local trampoline to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVV _crosscall2_ptr(SB), R5
+ MOVV $crosscall2_trampoline<>(SB), R6
+ MOVV R6, (R5)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R4, R5, R7), skipping R6.
+ * Also note that at procedure entry in gc world, 8(R3) will be the
+ * first arg.
+ */
+
+ ADDV $(-23*8), R3
+ MOVV R4, (1*8)(R3) // fn unsafe.Pointer
+ MOVV R5, (2*8)(R3) // a unsafe.Pointer
+ MOVV R7, (3*8)(R3) // ctxt uintptr
+
+ SAVE_R22_TO_R31((4*8))
+ SAVE_F24_TO_F31((14*8))
+ MOVV R1, (22*8)(R3)
+
+ // Initialize Go ABI environment
+ JAL runtime·load_g(SB)
+
+ JAL runtime·cgocallback(SB)
+
+ RESTORE_R22_TO_R31((4*8))
+ RESTORE_F24_TO_F31((14*8))
+ MOVV (22*8)(R3), R1
+
+ ADDV $(23*8), R3
+
+ RET
diff --git a/src/runtime/cgo/asm_mips64x.s b/src/runtime/cgo/asm_mips64x.s
new file mode 100644
index 0000000..af817d5
--- /dev/null
+++ b/src/runtime/cgo/asm_mips64x.s
@@ -0,0 +1,95 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2
+// Use a local trampoline, to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVV _crosscall2_ptr(SB), R5
+ MOVV $crosscall2_trampoline<>(SB), R6
+ MOVV R6, (R5)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R4, R5, R7), skipping R6.
+ * Also note that at procedure entry in gc world, 8(R29) will be the
+ * first arg.
+ */
+#ifndef GOMIPS64_softfloat
+ ADDV $(-8*23), R29
+#else
+ ADDV $(-8*15), R29
+#endif
+ MOVV R4, (8*1)(R29) // fn unsafe.Pointer
+ MOVV R5, (8*2)(R29) // a unsafe.Pointer
+ MOVV R7, (8*3)(R29) // ctxt uintptr
+ MOVV R16, (8*4)(R29)
+ MOVV R17, (8*5)(R29)
+ MOVV R18, (8*6)(R29)
+ MOVV R19, (8*7)(R29)
+ MOVV R20, (8*8)(R29)
+ MOVV R21, (8*9)(R29)
+ MOVV R22, (8*10)(R29)
+ MOVV R23, (8*11)(R29)
+ MOVV RSB, (8*12)(R29)
+ MOVV g, (8*13)(R29)
+ MOVV R31, (8*14)(R29)
+#ifndef GOMIPS64_softfloat
+ MOVD F24, (8*15)(R29)
+ MOVD F25, (8*16)(R29)
+ MOVD F26, (8*17)(R29)
+ MOVD F27, (8*18)(R29)
+ MOVD F28, (8*19)(R29)
+ MOVD F29, (8*20)(R29)
+ MOVD F30, (8*21)(R29)
+ MOVD F31, (8*22)(R29)
+#endif
+ // Initialize Go ABI environment
+ // prepare SB register = PC & 0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+ JAL runtime·load_g(SB)
+
+ JAL runtime·cgocallback(SB)
+
+ MOVV (8*4)(R29), R16
+ MOVV (8*5)(R29), R17
+ MOVV (8*6)(R29), R18
+ MOVV (8*7)(R29), R19
+ MOVV (8*8)(R29), R20
+ MOVV (8*9)(R29), R21
+ MOVV (8*10)(R29), R22
+ MOVV (8*11)(R29), R23
+ MOVV (8*12)(R29), RSB
+ MOVV (8*13)(R29), g
+ MOVV (8*14)(R29), R31
+#ifndef GOMIPS64_softfloat
+ MOVD (8*15)(R29), F24
+ MOVD (8*16)(R29), F25
+ MOVD (8*17)(R29), F26
+ MOVD (8*18)(R29), F27
+ MOVD (8*19)(R29), F28
+ MOVD (8*20)(R29), F29
+ MOVD (8*21)(R29), F30
+ MOVD (8*22)(R29), F31
+ ADDV $(8*23), R29
+#else
+ ADDV $(8*15), R29
+#endif
+ RET
diff --git a/src/runtime/cgo/asm_mipsx.s b/src/runtime/cgo/asm_mipsx.s
new file mode 100644
index 0000000..198c59a
--- /dev/null
+++ b/src/runtime/cgo/asm_mipsx.s
@@ -0,0 +1,88 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2
+// Use a local trampoline, to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVW _crosscall2_ptr(SB), R5
+ MOVW $crosscall2_trampoline<>(SB), R6
+ MOVW R6, (R5)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * We still need to save all callee-save registers as before, and then
+ * push 3 args for fn (R4, R5, R7), skipping R6.
+ * Also note that at procedure entry in gc world, 4(R29) will be the
+ * first arg.
+ */
+
+ // Space for 9 caller-saved GPR + LR + 6 caller-saved FPR.
+ // The O32 ABI allows us to smash the 16-byte argument area of the caller's frame.
+#ifndef GOMIPS_softfloat
+ SUBU $(4*14+8*6-16), R29
+#else
+ SUBU $(4*14-16), R29 // For soft-float, no FPR.
+#endif
+ MOVW R4, (4*1)(R29) // fn unsafe.Pointer
+ MOVW R5, (4*2)(R29) // a unsafe.Pointer
+ MOVW R7, (4*3)(R29) // ctxt uintptr
+ MOVW R16, (4*4)(R29)
+ MOVW R17, (4*5)(R29)
+ MOVW R18, (4*6)(R29)
+ MOVW R19, (4*7)(R29)
+ MOVW R20, (4*8)(R29)
+ MOVW R21, (4*9)(R29)
+ MOVW R22, (4*10)(R29)
+ MOVW R23, (4*11)(R29)
+ MOVW g, (4*12)(R29)
+ MOVW R31, (4*13)(R29)
+#ifndef GOMIPS_softfloat
+ MOVD F20, (4*14)(R29)
+ MOVD F22, (4*14+8*1)(R29)
+ MOVD F24, (4*14+8*2)(R29)
+ MOVD F26, (4*14+8*3)(R29)
+ MOVD F28, (4*14+8*4)(R29)
+ MOVD F30, (4*14+8*5)(R29)
+#endif
+ JAL runtime·load_g(SB)
+
+ JAL runtime·cgocallback(SB)
+
+ MOVW (4*4)(R29), R16
+ MOVW (4*5)(R29), R17
+ MOVW (4*6)(R29), R18
+ MOVW (4*7)(R29), R19
+ MOVW (4*8)(R29), R20
+ MOVW (4*9)(R29), R21
+ MOVW (4*10)(R29), R22
+ MOVW (4*11)(R29), R23
+ MOVW (4*12)(R29), g
+ MOVW (4*13)(R29), R31
+#ifndef GOMIPS_softfloat
+ MOVD (4*14)(R29), F20
+ MOVD (4*14+8*1)(R29), F22
+ MOVD (4*14+8*2)(R29), F24
+ MOVD (4*14+8*3)(R29), F26
+ MOVD (4*14+8*4)(R29), F28
+ MOVD (4*14+8*5)(R29), F30
+
+ ADDU $(4*14+8*6-16), R29
+#else
+ ADDU $(4*14-16), R29
+#endif
+ RET
diff --git a/src/runtime/cgo/asm_ppc64x.s b/src/runtime/cgo/asm_ppc64x.s
new file mode 100644
index 0000000..a389745
--- /dev/null
+++ b/src/runtime/cgo/asm_ppc64x.s
@@ -0,0 +1,70 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+#include "asm_ppc64x.h"
+#include "abi_ppc64x.h"
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+// crosscall2 is marked with go:cgo_export_static. On AIX, this creates and exports
+// the symbol name and descriptor as the AIX linker expects, but does not work if
+// referenced from within Go. Create and use an aliased descriptor of crosscall2
+// to work around this.
+DEFINE_PPC64X_FUNCDESC(_crosscall2<>, crosscall2)
+#define CROSSCALL2_FPTR $_crosscall2<>(SB)
+#else
+// Use a local trampoline, to avoid taking the address of a dynamically exported
+// function.
+#define CROSSCALL2_FPTR $crosscall2_trampoline<>(SB)
+#endif
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVD _crosscall2_ptr(SB), R5
+ MOVD CROSSCALL2_FPTR, R6
+ MOVD R6, (R5)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+// The value of R2 is saved on the new stack frame, and not
+// the caller's frame due to issue #43228.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ // Start with standard C stack frame layout and linkage, allocate
+ // 32 bytes of argument space, save callee-save regs, and set R0 to $0.
+ STACK_AND_SAVE_HOST_TO_GO_ABI(32)
+ // The above will not preserve R2 (TOC). Save it in case Go is
+ // compiled without a TOC pointer (e.g. -buildmode=default).
+ MOVD R2, 24(R1)
+
+ // Load the current g.
+ BL runtime·load_g(SB)
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+ // Load the real entry address from the first slot of the function descriptor.
+ // The first argument fn might be null; that means dropm is being called from the pthread key destructor.
+ CMP R3, $0
+ BEQ nil_fn
+ MOVD 8(R3), R2
+ MOVD (R3), R3
+nil_fn:
+#endif
+ MOVD R3, FIXED_FRAME+0(R1) // fn unsafe.Pointer
+ MOVD R4, FIXED_FRAME+8(R1) // a unsafe.Pointer
+ // Skip R5 = n uint32
+ MOVD R6, FIXED_FRAME+16(R1) // ctxt uintptr
+ BL runtime·cgocallback(SB)
+
+ // Restore the old frame, and R2.
+ MOVD 24(R1), R2
+ UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(32)
+ RET
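The GO_PPC64X_HAS_FUNCDESC path above treats the incoming fn value as the address of a function descriptor rather than as a code address: MOVD 8(R3), R2 picks up the TOC and MOVD (R3), R3 picks up the real entry point. A rough C sketch of the descriptor layout this assumes is shown below; the struct and field names are invented for illustration and are not part of the patch.

#include <stdint.h>
#include <stdio.h>

/* Illustrative layout of an AIX/ELFv1 ppc64 function descriptor: three
 * doublewords. A C "function pointer" on such ABIs is the address of this
 * triple, which is why crosscall2 dereferences fn before storing it. */
struct funcdesc {
	uintptr_t entry; /* code address; what MOVD (R3), R3 extracts as the real fn */
	uintptr_t toc;   /* TOC pointer; what MOVD 8(R3), R2 loads into R2 */
	uintptr_t env;   /* environment pointer; unused by crosscall2 */
};

int main(void) {
	printf("descriptor is three doublewords: %zu bytes\n", sizeof(struct funcdesc));
	return 0;
}

Presumably the same triple is what DEFINE_PPC64X_FUNCDESC(_crosscall2<>, crosscall2) builds for the aliased descriptor behind CROSSCALL2_FPTR.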
diff --git a/src/runtime/cgo/asm_riscv64.s b/src/runtime/cgo/asm_riscv64.s
new file mode 100644
index 0000000..d75a543
--- /dev/null
+++ b/src/runtime/cgo/asm_riscv64.s
@@ -0,0 +1,91 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2
+// Use a local trampoline, to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOV _crosscall2_ptr(SB), X7
+ MOV $crosscall2_trampoline<>(SB), X8
+ MOV X8, (X7)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ /*
+ * Push arguments for fn (X10, X11, X13), along with all callee-save
+ * registers. Note that at procedure entry the first argument is at
+ * 8(X2).
+ */
+ ADD $(-8*29), X2
+ MOV X10, (8*1)(X2) // fn unsafe.Pointer
+ MOV X11, (8*2)(X2) // a unsafe.Pointer
+ MOV X13, (8*3)(X2) // ctxt uintptr
+ MOV X8, (8*4)(X2)
+ MOV X9, (8*5)(X2)
+ MOV X18, (8*6)(X2)
+ MOV X19, (8*7)(X2)
+ MOV X20, (8*8)(X2)
+ MOV X21, (8*9)(X2)
+ MOV X22, (8*10)(X2)
+ MOV X23, (8*11)(X2)
+ MOV X24, (8*12)(X2)
+ MOV X25, (8*13)(X2)
+ MOV X26, (8*14)(X2)
+ MOV g, (8*15)(X2)
+ MOV X1, (8*16)(X2)
+ MOVD F8, (8*17)(X2)
+ MOVD F9, (8*18)(X2)
+ MOVD F18, (8*19)(X2)
+ MOVD F19, (8*20)(X2)
+ MOVD F20, (8*21)(X2)
+ MOVD F21, (8*22)(X2)
+ MOVD F22, (8*23)(X2)
+ MOVD F23, (8*24)(X2)
+ MOVD F24, (8*25)(X2)
+ MOVD F25, (8*26)(X2)
+ MOVD F26, (8*27)(X2)
+ MOVD F27, (8*28)(X2)
+
+ // Initialize Go ABI environment
+ CALL runtime·load_g(SB)
+ CALL runtime·cgocallback(SB)
+
+ MOV (8*4)(X2), X8
+ MOV (8*5)(X2), X9
+ MOV (8*6)(X2), X18
+ MOV (8*7)(X2), X19
+ MOV (8*8)(X2), X20
+ MOV (8*9)(X2), X21
+ MOV (8*10)(X2), X22
+ MOV (8*11)(X2), X23
+ MOV (8*12)(X2), X24
+ MOV (8*13)(X2), X25
+ MOV (8*14)(X2), X26
+ MOV (8*15)(X2), g
+ MOV (8*16)(X2), X1
+ MOVD (8*17)(X2), F8
+ MOVD (8*18)(X2), F9
+ MOVD (8*19)(X2), F18
+ MOVD (8*20)(X2), F19
+ MOVD (8*21)(X2), F20
+ MOVD (8*22)(X2), F21
+ MOVD (8*23)(X2), F22
+ MOVD (8*24)(X2), F23
+ MOVD (8*25)(X2), F24
+ MOVD (8*26)(X2), F25
+ MOVD (8*27)(X2), F26
+ MOVD (8*28)(X2), F27
+ ADD $(8*29), X2
+
+ RET
diff --git a/src/runtime/cgo/asm_s390x.s b/src/runtime/cgo/asm_s390x.s
new file mode 100644
index 0000000..8f74fd5
--- /dev/null
+++ b/src/runtime/cgo/asm_s390x.s
@@ -0,0 +1,68 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// The pointer chain is: _crosscall2_ptr -> x_crosscall2_ptr -> crosscall2
+// Use a local trampoline, to avoid taking the address of a dynamically exported
+// function.
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ MOVD _crosscall2_ptr(SB), R1
+ MOVD $crosscall2_trampoline<>(SB), R2
+ MOVD R2, (R1)
+ RET
+
+TEXT crosscall2_trampoline<>(SB),NOSPLIT,$0-0
+ JMP crosscall2(SB)
+
+// Called by C code generated by cmd/cgo.
+// func crosscall2(fn, a unsafe.Pointer, n int32, ctxt uintptr)
+// Saves C callee-saved registers and calls cgocallback with three arguments.
+// fn is the PC of a func(a unsafe.Pointer) function.
+TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
+ // Start with standard C stack frame layout and linkage.
+
+ // Save R6-R15 in the register save area of the calling function.
+ STMG R6, R15, 48(R15)
+
+ // Allocate 96 bytes on the stack.
+ MOVD $-96(R15), R15
+
+ // Save F8-F15 in our stack frame.
+ FMOVD F8, 32(R15)
+ FMOVD F9, 40(R15)
+ FMOVD F10, 48(R15)
+ FMOVD F11, 56(R15)
+ FMOVD F12, 64(R15)
+ FMOVD F13, 72(R15)
+ FMOVD F14, 80(R15)
+ FMOVD F15, 88(R15)
+
+ // Initialize Go ABI environment.
+ BL runtime·load_g(SB)
+
+ MOVD R2, 8(R15) // fn unsafe.Pointer
+ MOVD R3, 16(R15) // a unsafe.Pointer
+ // Skip R4 = n uint32
+ MOVD R5, 24(R15) // ctxt uintptr
+ BL runtime·cgocallback(SB)
+
+ FMOVD 32(R15), F8
+ FMOVD 40(R15), F9
+ FMOVD 48(R15), F10
+ FMOVD 56(R15), F11
+ FMOVD 64(R15), F12
+ FMOVD 72(R15), F13
+ FMOVD 80(R15), F14
+ FMOVD 88(R15), F15
+
+ // De-allocate stack frame.
+ MOVD $96(R15), R15
+
+ // Restore R6-R15.
+ LMG 48(R15), R6, R15
+
+ RET
+
diff --git a/src/runtime/cgo/asm_wasm.s b/src/runtime/cgo/asm_wasm.s
new file mode 100644
index 0000000..e7f01bd
--- /dev/null
+++ b/src/runtime/cgo/asm_wasm.s
@@ -0,0 +1,11 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·set_crosscall2(SB),NOSPLIT,$0-0
+ UNDEF
+
+TEXT crosscall2(SB), NOSPLIT, $0
+ UNDEF
diff --git a/src/runtime/cgo/callbacks.go b/src/runtime/cgo/callbacks.go
new file mode 100644
index 0000000..3c246a8
--- /dev/null
+++ b/src/runtime/cgo/callbacks.go
@@ -0,0 +1,152 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import "unsafe"
+
+// These utility functions are available to be called from code
+// compiled with gcc via crosscall2.
+
+// The declaration of crosscall2 is:
+// void crosscall2(void (*fn)(void *), void *, int);
+//
+// We need to export the symbol crosscall2 in order to support
+// callbacks from shared libraries. This applies regardless of
+// linking mode.
+//
+// Compatibility note: SWIG uses crosscall2 in exactly one situation:
+// to call _cgo_panic using the pattern shown below. We need to keep
+// that pattern working. In particular, crosscall2 actually takes four
+// arguments, but it works to call it with three arguments when
+// calling _cgo_panic.
+//
+//go:cgo_export_static crosscall2
+//go:cgo_export_dynamic crosscall2
+
+// Panic. The argument is converted into a Go string.
+
+// Call like this in code compiled with gcc:
+// struct { const char *p; } a;
+// a.p = /* string to pass to panic */;
+// crosscall2(_cgo_panic, &a, sizeof a);
+// /* The function call will not return. */
+
+// TODO: We should export a regular C function to panic, change SWIG
+// to use that instead of the above pattern, and then we can drop
+// backwards-compatibility from crosscall2 and stop exporting it.
+
+//go:linkname _runtime_cgo_panic_internal runtime._cgo_panic_internal
+func _runtime_cgo_panic_internal(p *byte)
+
+//go:linkname _cgo_panic _cgo_panic
+//go:cgo_export_static _cgo_panic
+//go:cgo_export_dynamic _cgo_panic
+func _cgo_panic(a *struct{ cstr *byte }) {
+ _runtime_cgo_panic_internal(a.cstr)
+}
+
+//go:cgo_import_static x_cgo_init
+//go:linkname x_cgo_init x_cgo_init
+//go:linkname _cgo_init _cgo_init
+var x_cgo_init byte
+var _cgo_init = &x_cgo_init
+
+//go:cgo_import_static x_cgo_thread_start
+//go:linkname x_cgo_thread_start x_cgo_thread_start
+//go:linkname _cgo_thread_start _cgo_thread_start
+var x_cgo_thread_start byte
+var _cgo_thread_start = &x_cgo_thread_start
+
+// Creates a new system thread without updating any Go state.
+//
+// This method is invoked during shared library loading to create a new OS
+// thread to perform the runtime initialization. This method is similar to
+// _cgo_sys_thread_start except that it doesn't update any Go state.
+
+//go:cgo_import_static x_cgo_sys_thread_create
+//go:linkname x_cgo_sys_thread_create x_cgo_sys_thread_create
+//go:linkname _cgo_sys_thread_create _cgo_sys_thread_create
+var x_cgo_sys_thread_create byte
+var _cgo_sys_thread_create = &x_cgo_sys_thread_create
+
+// Indicates whether a dummy thread key has been created or not.
+//
+// When calling a Go exported function from C, we register a destructor
+// callback for a dummy thread key by using pthread_key_create.
+
+//go:cgo_import_static x_cgo_pthread_key_created
+//go:linkname x_cgo_pthread_key_created x_cgo_pthread_key_created
+//go:linkname _cgo_pthread_key_created _cgo_pthread_key_created
+var x_cgo_pthread_key_created byte
+var _cgo_pthread_key_created = &x_cgo_pthread_key_created
+
+// Export crosscall2 to a C function pointer variable.
+// Used to call dropm from the pthread key destructor while a C thread is exiting.
+
+//go:cgo_import_static x_crosscall2_ptr
+//go:linkname x_crosscall2_ptr x_crosscall2_ptr
+//go:linkname _crosscall2_ptr _crosscall2_ptr
+var x_crosscall2_ptr byte
+var _crosscall2_ptr = &x_crosscall2_ptr
+
+// Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+// It's for the runtime package to call at init time.
+func set_crosscall2()
+
+//go:linkname _set_crosscall2 runtime.set_crosscall2
+var _set_crosscall2 = set_crosscall2
+
+// Store the g into the thread-specific value,
+// so that pthread_key_destructor will call dropm when the thread is exiting.
+
+//go:cgo_import_static x_cgo_bindm
+//go:linkname x_cgo_bindm x_cgo_bindm
+//go:linkname _cgo_bindm _cgo_bindm
+var x_cgo_bindm byte
+var _cgo_bindm = &x_cgo_bindm
+
+// Notifies that the runtime has been initialized.
+//
+// We currently block at every CGO entry point (via _cgo_wait_runtime_init_done)
+// to ensure that the runtime has been initialized before the CGO call is
+// executed. This is necessary for shared libraries, where we kick off runtime
+// initialization in a separate thread and return without waiting for this
+// thread to complete the init.
+
+//go:cgo_import_static x_cgo_notify_runtime_init_done
+//go:linkname x_cgo_notify_runtime_init_done x_cgo_notify_runtime_init_done
+//go:linkname _cgo_notify_runtime_init_done _cgo_notify_runtime_init_done
+var x_cgo_notify_runtime_init_done byte
+var _cgo_notify_runtime_init_done = &x_cgo_notify_runtime_init_done
+
+// Sets the traceback context function. See runtime.SetCgoTraceback.
+
+//go:cgo_import_static x_cgo_set_context_function
+//go:linkname x_cgo_set_context_function x_cgo_set_context_function
+//go:linkname _cgo_set_context_function _cgo_set_context_function
+var x_cgo_set_context_function byte
+var _cgo_set_context_function = &x_cgo_set_context_function
+
+// Calls a libc function to execute background work injected via libc
+// interceptors, such as processing pending signals under the thread
+// sanitizer.
+//
+// Left as a nil pointer if no libc interceptors are expected.
+
+//go:cgo_import_static _cgo_yield
+//go:linkname _cgo_yield _cgo_yield
+var _cgo_yield unsafe.Pointer
+
+//go:cgo_export_static _cgo_topofstack
+//go:cgo_export_dynamic _cgo_topofstack
+
+// x_cgo_getstackbound gets the thread's C stack size and
+// sets the G's stack bound based on the stack size.
+
+//go:cgo_import_static x_cgo_getstackbound
+//go:linkname x_cgo_getstackbound x_cgo_getstackbound
+//go:linkname _cgo_getstackbound _cgo_getstackbound
+var x_cgo_getstackbound byte
+var _cgo_getstackbound = &x_cgo_getstackbound
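The compatibility note near the top of callbacks.go pins down the one SWIG pattern that crosscall2 must keep supporting. Fleshed out as a standalone C translation unit, it looks roughly like the sketch below; the helper name panic_from_c is invented for the example, the crosscall2 prototype is the three-argument form quoted in the comment, and the code only links against a Go c-archive or c-shared library that exports crosscall2 and _cgo_panic.

#include <stddef.h>

/* Declarations as seen from gcc-compiled code. _cgo_panic is declared
 * loosely here only so that its address can be taken. */
extern void crosscall2(void (*fn)(void *), void *arg, int n);
extern void _cgo_panic(void *arg);

/* panic_from_c mirrors the documented SWIG pattern: pack the message into a
 * one-field struct and hand its address to _cgo_panic via crosscall2.
 * The call does not return. */
void panic_from_c(const char *msg) {
	struct { const char *p; } a;
	a.p = msg;
	crosscall2(_cgo_panic, &a, sizeof a);
}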
diff --git a/src/runtime/cgo/callbacks_aix.go b/src/runtime/cgo/callbacks_aix.go
new file mode 100644
index 0000000..8f756fb
--- /dev/null
+++ b/src/runtime/cgo/callbacks_aix.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+// These functions must be exported in order to perform
+// longcall on cgo programs (cf gcc_aix_ppc64.c).
+//
+//go:cgo_export_static __cgo_topofstack
+//go:cgo_export_static runtime.rt0_go
+//go:cgo_export_static _rt0_ppc64_aix_lib
diff --git a/src/runtime/cgo/callbacks_traceback.go b/src/runtime/cgo/callbacks_traceback.go
new file mode 100644
index 0000000..dae31a8
--- /dev/null
+++ b/src/runtime/cgo/callbacks_traceback.go
@@ -0,0 +1,17 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || linux
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Calls the traceback function passed to SetCgoTraceback.
+
+//go:cgo_import_static x_cgo_callers
+//go:linkname x_cgo_callers x_cgo_callers
+//go:linkname _cgo_callers _cgo_callers
+var x_cgo_callers byte
+var _cgo_callers = &x_cgo_callers
diff --git a/src/runtime/cgo/cgo.go b/src/runtime/cgo/cgo.go
new file mode 100644
index 0000000..1e3a502
--- /dev/null
+++ b/src/runtime/cgo/cgo.go
@@ -0,0 +1,40 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package cgo contains runtime support for code generated
+by the cgo tool. See the documentation for the cgo command
+for details on using cgo.
+*/
+package cgo
+
+/*
+
+#cgo darwin,!arm64 LDFLAGS: -lpthread
+#cgo darwin,arm64 LDFLAGS: -framework CoreFoundation
+#cgo dragonfly LDFLAGS: -lpthread
+#cgo freebsd LDFLAGS: -lpthread
+#cgo android LDFLAGS: -llog
+#cgo !android,linux LDFLAGS: -lpthread
+#cgo netbsd LDFLAGS: -lpthread
+#cgo openbsd LDFLAGS: -lpthread
+#cgo aix LDFLAGS: -Wl,-berok
+#cgo solaris LDFLAGS: -lxnet
+#cgo solaris LDFLAGS: -lsocket
+
+// Use -fno-stack-protector to avoid problems locating the
+// proper support functions. See issues #52919, #54313, #58385.
+#cgo CFLAGS: -Wall -Werror -fno-stack-protector
+
+#cgo solaris CPPFLAGS: -D_POSIX_PTHREAD_SEMANTICS
+
+*/
+import "C"
+
+import "runtime/internal/sys"
+
+// Incomplete is used specifically for the semantics of incomplete C types.
+type Incomplete struct {
+ _ sys.NotInHeap
+}
diff --git a/src/runtime/cgo/dragonfly.go b/src/runtime/cgo/dragonfly.go
new file mode 100644
index 0000000..36d70e3
--- /dev/null
+++ b/src/runtime/cgo/dragonfly.go
@@ -0,0 +1,19 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard DragonFly crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+
+var _environ uintptr
+var _progname uintptr
diff --git a/src/runtime/cgo/freebsd.go b/src/runtime/cgo/freebsd.go
new file mode 100644
index 0000000..2d9f624
--- /dev/null
+++ b/src/runtime/cgo/freebsd.go
@@ -0,0 +1,22 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard FreeBSD crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+
+//go:cgo_export_dynamic environ
+//go:cgo_export_dynamic __progname
+
+var _environ uintptr
+var _progname uintptr
diff --git a/src/runtime/cgo/gcc_386.S b/src/runtime/cgo/gcc_386.S
new file mode 100644
index 0000000..5bd677f
--- /dev/null
+++ b/src/runtime/cgo/gcc_386.S
@@ -0,0 +1,42 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_386.S"
+
+/*
+ * Windows still insists on underscore prefixes for C function names.
+ */
+#if defined(_WIN32)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+/*
+ * void crosscall_386(void (*fn)(void))
+ *
+ * Calling into the 8c tool chain, where all registers are caller save.
+ * Called from standard x86 ABI, where %ebp, %ebx, %esi,
+ * and %edi are callee-save, so they must be saved explicitly.
+ */
+.globl EXT(crosscall_386)
+EXT(crosscall_386):
+ pushl %ebp
+ movl %esp, %ebp
+ pushl %ebx
+ pushl %esi
+ pushl %edi
+
+ movl 8(%ebp), %eax /* fn */
+ call *%eax
+
+ popl %edi
+ popl %esi
+ popl %ebx
+ popl %ebp
+ ret
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",@progbits
+#endif
diff --git a/src/runtime/cgo/gcc_aix_ppc64.S b/src/runtime/cgo/gcc_aix_ppc64.S
new file mode 100644
index 0000000..6f465f0
--- /dev/null
+++ b/src/runtime/cgo/gcc_aix_ppc64.S
@@ -0,0 +1,132 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_aix_ppc64.S"
+
+/*
+ * void crosscall_ppc64(void (*fn)(void), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ppc64 C ABI, where r2, r14-r31, f14-f31 are
+ * callee-save, so they must be saved explicitly.
+ * AIX has a special assembly syntax and keywords that can be mixed with
+ * Linux assembly.
+ */
+ .toc
+ .csect .text[PR]
+ .globl crosscall_ppc64
+ .globl .crosscall_ppc64
+ .csect crosscall_ppc64[DS]
+crosscall_ppc64:
+ .llong .crosscall_ppc64, TOC[tc0], 0
+ .csect .text[PR]
+.crosscall_ppc64:
+ // Start with standard C stack frame layout and linkage
+ mflr 0
+ std 0, 16(1) // Save LR in caller's frame
+ std 2, 40(1) // Save TOC in caller's frame
+ bl saveregs
+ stdu 1, -296(1)
+
+ // Set up Go ABI constant registers
+ // Must match _cgo_reginit in runtime package.
+ xor 0, 0, 0
+
+ // Restore g pointer (r30 in Go ABI, which may have been clobbered by C)
+ mr 30, 4
+
+ // Call fn
+ mr 12, 3
+ mtctr 12
+ bctrl
+
+ addi 1, 1, 296
+ bl restoreregs
+ ld 2, 40(1)
+ ld 0, 16(1)
+ mtlr 0
+ blr
+
+saveregs:
+ // Save callee-save registers
+ // O=-288; for R in {14..31}; do echo "\tstd\t$R, $O(1)"; ((O+=8)); done; for F in f{14..31}; do echo "\tstfd\t$F, $O(1)"; ((O+=8)); done
+ std 14, -288(1)
+ std 15, -280(1)
+ std 16, -272(1)
+ std 17, -264(1)
+ std 18, -256(1)
+ std 19, -248(1)
+ std 20, -240(1)
+ std 21, -232(1)
+ std 22, -224(1)
+ std 23, -216(1)
+ std 24, -208(1)
+ std 25, -200(1)
+ std 26, -192(1)
+ std 27, -184(1)
+ std 28, -176(1)
+ std 29, -168(1)
+ std 30, -160(1)
+ std 31, -152(1)
+ stfd 14, -144(1)
+ stfd 15, -136(1)
+ stfd 16, -128(1)
+ stfd 17, -120(1)
+ stfd 18, -112(1)
+ stfd 19, -104(1)
+ stfd 20, -96(1)
+ stfd 21, -88(1)
+ stfd 22, -80(1)
+ stfd 23, -72(1)
+ stfd 24, -64(1)
+ stfd 25, -56(1)
+ stfd 26, -48(1)
+ stfd 27, -40(1)
+ stfd 28, -32(1)
+ stfd 29, -24(1)
+ stfd 30, -16(1)
+ stfd 31, -8(1)
+
+ blr
+
+restoreregs:
+ // O=-288; for R in {14..31}; do echo "\tld\t$R, $O(1)"; ((O+=8)); done; for F in {14..31}; do echo "\tlfd\t$F, $O(1)"; ((O+=8)); done
+ ld 14, -288(1)
+ ld 15, -280(1)
+ ld 16, -272(1)
+ ld 17, -264(1)
+ ld 18, -256(1)
+ ld 19, -248(1)
+ ld 20, -240(1)
+ ld 21, -232(1)
+ ld 22, -224(1)
+ ld 23, -216(1)
+ ld 24, -208(1)
+ ld 25, -200(1)
+ ld 26, -192(1)
+ ld 27, -184(1)
+ ld 28, -176(1)
+ ld 29, -168(1)
+ ld 30, -160(1)
+ ld 31, -152(1)
+ lfd 14, -144(1)
+ lfd 15, -136(1)
+ lfd 16, -128(1)
+ lfd 17, -120(1)
+ lfd 18, -112(1)
+ lfd 19, -104(1)
+ lfd 20, -96(1)
+ lfd 21, -88(1)
+ lfd 22, -80(1)
+ lfd 23, -72(1)
+ lfd 24, -64(1)
+ lfd 25, -56(1)
+ lfd 26, -48(1)
+ lfd 27, -40(1)
+ lfd 28, -32(1)
+ lfd 29, -24(1)
+ lfd 30, -16(1)
+ lfd 31, -8(1)
+
+ blr
diff --git a/src/runtime/cgo/gcc_aix_ppc64.c b/src/runtime/cgo/gcc_aix_ppc64.c
new file mode 100644
index 0000000..9dd9524
--- /dev/null
+++ b/src/runtime/cgo/gcc_aix_ppc64.c
@@ -0,0 +1,35 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * On AIX, calls to _cgo_topofstack and Go's main are forced to be longcalls.
+ * Without this, ld might add trampolines in the middle of the .text section
+ * to reach these functions, which are normally declared in the runtime package.
+ */
+extern int __attribute__((longcall)) __cgo_topofstack(void);
+extern int __attribute__((longcall)) runtime_rt0_go(int argc, char **argv);
+extern void __attribute__((longcall)) _rt0_ppc64_aix_lib(void);
+
+int _cgo_topofstack(void) {
+ return __cgo_topofstack();
+}
+
+int main(int argc, char **argv) {
+ return runtime_rt0_go(argc, argv);
+}
+
+static void libinit(void) __attribute__ ((constructor));
+
+/*
+ * libinit aims to replace the .init_array section, which isn't available on AIX.
+ * Using __attribute__ ((constructor)) lets gcc handle this instead of
+ * adding special code in cmd/link.
+ * However, it will be called for every Go program that uses cgo.
+ * Inside _rt0_ppc64_aix_lib(), runtime.isarchive is checked in order
+ * to know if this program is a c-archive or a simple cgo program.
+ * If it's not set, _rt0_ppc64_aix_lib() returns directly.
+ */
+static void libinit() {
+ _rt0_ppc64_aix_lib();
+}
diff --git a/src/runtime/cgo/gcc_amd64.S b/src/runtime/cgo/gcc_amd64.S
new file mode 100644
index 0000000..5a1629e
--- /dev/null
+++ b/src/runtime/cgo/gcc_amd64.S
@@ -0,0 +1,55 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_amd64.S"
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+/*
+ * void crosscall_amd64(void (*fn)(void), void (*setg_gcc)(void*), void *g)
+ *
+ * Calling into the 6c tool chain, where all registers are caller save.
+ * Called from standard x86-64 ABI, where %rbx, %rbp, %r12-%r15
+ * are callee-save so they must be saved explicitly.
+ * The standard x86-64 ABI passes the three arguments m, g, fn
+ * in %rdi, %rsi, %rdx.
+ */
+.globl EXT(crosscall_amd64)
+EXT(crosscall_amd64):
+ pushq %rbx
+ pushq %rbp
+ pushq %r12
+ pushq %r13
+ pushq %r14
+ pushq %r15
+
+#if defined(_WIN64)
+ movq %r8, %rdi /* arg of setg_gcc */
+ call *%rdx /* setg_gcc */
+ call *%rcx /* fn */
+#else
+ movq %rdi, %rbx
+ movq %rdx, %rdi /* arg of setg_gcc */
+ call *%rsi /* setg_gcc */
+ call *%rbx /* fn */
+#endif
+
+ popq %r15
+ popq %r14
+ popq %r13
+ popq %r12
+ popq %rbp
+ popq %rbx
+ ret
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",@progbits
+#endif
diff --git a/src/runtime/cgo/gcc_android.c b/src/runtime/cgo/gcc_android.c
new file mode 100644
index 0000000..7ea2135
--- /dev/null
+++ b/src/runtime/cgo/gcc_android.c
@@ -0,0 +1,90 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <stdarg.h>
+#include <android/log.h>
+#include <pthread.h>
+#include <dlfcn.h>
+#include "libcgo.h"
+
+void
+fatalf(const char* format, ...)
+{
+ va_list ap;
+
+ // Write to both stderr and logcat.
+ //
+ // When running from an .apk, /dev/stderr and /dev/stdout
+ // redirect to /dev/null. And when running a test binary
+ // via adb shell, it's easy to miss logcat.
+
+ fprintf(stderr, "runtime/cgo: ");
+ va_start(ap, format);
+ vfprintf(stderr, format, ap);
+ va_end(ap);
+ fprintf(stderr, "\n");
+
+ va_start(ap, format);
+ __android_log_vprint(ANDROID_LOG_FATAL, "runtime/cgo", format, ap);
+ va_end(ap);
+
+ abort();
+}
+
+// Truncated to a different magic value on 32-bit; that's ok.
+#define magic1 (0x23581321345589ULL)
+
+// From https://android.googlesource.com/platform/bionic/+/refs/heads/android10-tests-release/libc/private/bionic_asm_tls.h#69.
+#define TLS_SLOT_APP 2
+
+// inittls allocates a thread-local storage slot for g.
+//
+// It finds the first available slot using pthread_key_create and uses
+// it as the offset value for runtime.tls_g.
+static void
+inittls(void **tlsg, void **tlsbase)
+{
+ pthread_key_t k;
+ int i, err;
+ void *handle, *get_ver, *off;
+
+ // Check for Android Q where we can use the free TLS_SLOT_APP slot.
+ handle = dlopen("libc.so", RTLD_LAZY);
+ if (handle == NULL) {
+ fatalf("inittls: failed to dlopen main program");
+ return;
+ }
+ // android_get_device_api_level was introduced in Android Q, so its mere presence
+ // is enough.
+ get_ver = dlsym(handle, "android_get_device_api_level");
+ dlclose(handle);
+ if (get_ver != NULL) {
+ off = (void *)(TLS_SLOT_APP*sizeof(void *));
+ // tlsg is initialized to Q's free TLS slot. Verify it while we're here.
+ if (*tlsg != off) {
+ fatalf("tlsg offset wrong, got %ld want %ld\n", *tlsg, off);
+ }
+ return;
+ }
+
+ err = pthread_key_create(&k, nil);
+ if(err != 0) {
+ fatalf("pthread_key_create failed: %d", err);
+ }
+ pthread_setspecific(k, (void*)magic1);
+ // If thread local slots are laid out as we expect, our magic word will
+ // be located at some low offset from tlsbase. However, just in case something went
+ // wrong, the search is limited to sensible offsets. PTHREAD_KEYS_MAX was the
+ // original limit, but issue 19472 made a higher limit necessary.
+ for (i=0; i<384; i++) {
+ if (*(tlsbase+i) == (void*)magic1) {
+ *tlsg = (void*)(i*sizeof(void *));
+ pthread_setspecific(k, 0);
+ return;
+ }
+ }
+ fatalf("inittls: could not find pthread key");
+}
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) = inittls;
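inittls above relies on pthread_setspecific placing the stored value in one of the slots reachable from tlsbase, and then finds that slot by scanning for a magic word. Reading tlsbase is the platform-specific part; the portable half of the trick can be shown on its own, as in the toy program below (an illustration, not runtime code), which reuses the same magic constant.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Truncated to a different magic value on 32-bit, as in gcc_android.c. */
#define MAGIC ((void *)(uintptr_t)0x23581321345589ULL)

int main(void) {
	pthread_key_t k;

	if (pthread_key_create(&k, NULL) != 0) {
		perror("pthread_key_create");
		return 1;
	}
	/* Store a recognizable magic word in the slot backing this key. */
	pthread_setspecific(k, MAGIC);
	/* gcc_android.c locates the raw slot by scanning from tlsbase;
	 * here we just read it back through the portable API. */
	if (pthread_getspecific(k) == MAGIC)
		printf("magic word found in the thread-specific slot\n");
	pthread_setspecific(k, NULL);
	pthread_key_delete(k);
	return 0;
}

(Compile with -pthread.)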
diff --git a/src/runtime/cgo/gcc_arm.S b/src/runtime/cgo/gcc_arm.S
new file mode 100644
index 0000000..474fc23
--- /dev/null
+++ b/src/runtime/cgo/gcc_arm.S
@@ -0,0 +1,31 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_arm.S"
+
+/*
+ * void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the 5c tool chain, where all registers are caller save.
+ * Called from standard ARM EABI, where r4-r11 are callee-save, so they
+ * must be saved explicitly.
+ */
+.globl crosscall_arm1
+crosscall_arm1:
+ push {r4, r5, r6, r7, r8, r9, r10, r11, ip, lr}
+ mov r4, r0
+ mov r5, r1
+ mov r0, r2
+
+ // Because the assembler might target an earlier revision of the ISA
+ // by default, we encode BLX as a .word.
+ .word 0xe12fff35 // blx r5 // setg(g)
+ .word 0xe12fff34 // blx r4 // fn()
+
+ pop {r4, r5, r6, r7, r8, r9, r10, r11, ip, pc}
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_arm64.S b/src/runtime/cgo/gcc_arm64.S
new file mode 100644
index 0000000..865f67c
--- /dev/null
+++ b/src/runtime/cgo/gcc_arm64.S
@@ -0,0 +1,84 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_arm64.S"
+
+/*
+ * Apple still insists on underscore prefixes for C function names.
+ */
+#if defined(__APPLE__)
+#define EXT(s) _##s
+#else
+#define EXT(s) s
+#endif
+
+// Apple's ld64 wants 4-byte alignment for ARM code sections.
+// .align in both Apple as and GNU as treat n as aligning to 2**n bytes.
+.align 2
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ARM EABI, where x19-x29 are callee-save, so they
+ * must be saved explicitly, along with x30 (LR).
+ */
+.globl EXT(crosscall1)
+EXT(crosscall1):
+ .cfi_startproc
+ stp x29, x30, [sp, #-96]!
+ .cfi_def_cfa_offset 96
+ .cfi_offset 29, -96
+ .cfi_offset 30, -88
+ mov x29, sp
+ .cfi_def_cfa_register 29
+ stp x19, x20, [sp, #80]
+ .cfi_offset 19, -16
+ .cfi_offset 20, -8
+ stp x21, x22, [sp, #64]
+ .cfi_offset 21, -32
+ .cfi_offset 22, -24
+ stp x23, x24, [sp, #48]
+ .cfi_offset 23, -48
+ .cfi_offset 24, -40
+ stp x25, x26, [sp, #32]
+ .cfi_offset 25, -64
+ .cfi_offset 26, -56
+ stp x27, x28, [sp, #16]
+ .cfi_offset 27, -80
+ .cfi_offset 28, -72
+
+ mov x19, x0
+ mov x20, x1
+ mov x0, x2
+
+ blr x20
+ blr x19
+
+ ldp x27, x28, [sp, #16]
+ .cfi_restore 27
+ .cfi_restore 28
+ ldp x25, x26, [sp, #32]
+ .cfi_restore 25
+ .cfi_restore 26
+ ldp x23, x24, [sp, #48]
+ .cfi_restore 23
+ .cfi_restore 24
+ ldp x21, x22, [sp, #64]
+ .cfi_restore 21
+ .cfi_restore 22
+ ldp x19, x20, [sp, #80]
+ .cfi_restore 19
+ .cfi_restore 20
+ ldp x29, x30, [sp], #96
+ .cfi_restore 29
+ .cfi_restore 30
+ .cfi_def_cfa 31, 0
+ ret
+ .cfi_endproc
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_context.c b/src/runtime/cgo/gcc_context.c
new file mode 100644
index 0000000..ad58692
--- /dev/null
+++ b/src/runtime/cgo/gcc_context.c
@@ -0,0 +1,20 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix || windows
+
+#include "libcgo.h"
+
+// Releases the cgo traceback context.
+void _cgo_release_context(uintptr_t ctxt) {
+ void (*pfn)(struct context_arg*);
+
+ pfn = _cgo_get_context_function();
+ if (ctxt != 0 && pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = ctxt;
+ (*pfn)(&arg);
+ }
+}
diff --git a/src/runtime/cgo/gcc_darwin_amd64.c b/src/runtime/cgo/gcc_darwin_amd64.c
new file mode 100644
index 0000000..955b81d
--- /dev/null
+++ b/src/runtime/cgo/gcc_darwin_amd64.c
@@ -0,0 +1,63 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <string.h> /* for strerror */
+#include <pthread.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ size_t size;
+
+ setg_gcc = setg;
+
+ size = pthread_get_stacksize_np(pthread_self());
+ g->stacklo = (uintptr)&size - size + 4096;
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ size = pthread_get_stacksize_np(pthread_self());
+ pthread_attr_init(&attr);
+ pthread_attr_setstacksize(&attr, size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
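x_cgo_init above estimates the lower stack bound recorded in g->stacklo by taking the address of a local variable (which sits near the top of the current thread's stack), subtracting the reported stack size, and adding a 4096-byte margin. A minimal Darwin-only sketch of that arithmetic, using the same pthread_get_stacksize_np call (illustrative, not part of the patch):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
	size_t size = pthread_get_stacksize_np(pthread_self());
	/* &size is an address near the top of this thread's stack, so
	 * subtracting the stack size gives an approximate lower bound;
	 * the +4096 leaves a small safety margin, as in x_cgo_init. */
	uintptr_t stacklo = (uintptr_t)&size - size + 4096;

	printf("stack size %zu, approximate stacklo %#lx\n",
	       size, (unsigned long)stacklo);
	return 0;
}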
diff --git a/src/runtime/cgo/gcc_darwin_arm64.c b/src/runtime/cgo/gcc_darwin_arm64.c
new file mode 100644
index 0000000..5b77a42
--- /dev/null
+++ b/src/runtime/cgo/gcc_darwin_arm64.c
@@ -0,0 +1,142 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <limits.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h> /* for strerror */
+#include <sys/param.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+#include <TargetConditionals.h>
+
+#if TARGET_OS_IPHONE
+#include <CoreFoundation/CFBundle.h>
+#include <CoreFoundation/CFString.h>
+#endif
+
+static void *threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ //fprintf(stderr, "runtime/cgo: _cgo_sys_thread_start: fn=%p, g=%p\n", ts->fn, ts->g); // debug
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ size = pthread_get_stacksize_np(pthread_self());
+ pthread_attr_init(&attr);
+ pthread_attr_setstacksize(&attr, size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+#if TARGET_OS_IPHONE
+ darwin_arm_init_thread_exception_port();
+#endif
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+#if TARGET_OS_IPHONE
+
+// init_working_dir sets the current working directory to the app root.
+// By default ios/arm64 processes start in "/".
+static void
+init_working_dir()
+{
+ CFBundleRef bundle = CFBundleGetMainBundle();
+ if (bundle == NULL) {
+ fprintf(stderr, "runtime/cgo: no main bundle\n");
+ return;
+ }
+ CFURLRef url_ref = CFBundleCopyResourceURL(bundle, CFSTR("Info"), CFSTR("plist"), NULL);
+ if (url_ref == NULL) {
+ // No Info.plist found. It can happen on Corellium virtual devices.
+ return;
+ }
+ CFStringRef url_str_ref = CFURLGetString(url_ref);
+ char buf[MAXPATHLEN];
+ Boolean res = CFStringGetCString(url_str_ref, buf, sizeof(buf), kCFStringEncodingUTF8);
+ CFRelease(url_ref);
+ if (!res) {
+ fprintf(stderr, "runtime/cgo: cannot get URL string\n");
+ return;
+ }
+
+ // url is of the form "file:///path/to/Info.plist".
+ // strip it down to the working directory "/path/to".
+ int url_len = strlen(buf);
+ if (url_len < sizeof("file://")+sizeof("/Info.plist")) {
+ fprintf(stderr, "runtime/cgo: bad URL: %s\n", buf);
+ return;
+ }
+ buf[url_len-sizeof("/Info.plist")+1] = 0;
+ char *dir = &buf[0] + sizeof("file://")-1;
+
+ if (chdir(dir) != 0) {
+ fprintf(stderr, "runtime/cgo: chdir(%s) failed\n", dir);
+ }
+
+ // The test harness in go_ios_exec passes the relative working directory
+ // in the GoExecWrapperWorkingDirectory property of the app bundle.
+ CFStringRef wd_ref = CFBundleGetValueForInfoDictionaryKey(bundle, CFSTR("GoExecWrapperWorkingDirectory"));
+ if (wd_ref != NULL) {
+ if (!CFStringGetCString(wd_ref, buf, sizeof(buf), kCFStringEncodingUTF8)) {
+ fprintf(stderr, "runtime/cgo: cannot get GoExecWrapperWorkingDirectory string\n");
+ return;
+ }
+ if (chdir(buf) != 0) {
+ fprintf(stderr, "runtime/cgo: chdir(%s) failed\n", buf);
+ }
+ }
+}
+
+#endif // TARGET_OS_IPHONE
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ size_t size;
+
+ //fprintf(stderr, "x_cgo_init = %p\n", &x_cgo_init); // aid debugging in presence of ASLR
+ setg_gcc = setg;
+ size = pthread_get_stacksize_np(pthread_self());
+ g->stacklo = (uintptr)&size - size + 4096;
+
+#if TARGET_OS_IPHONE
+ darwin_arm_init_mach_exception_handler();
+ darwin_arm_init_thread_exception_port();
+ init_working_dir();
+#endif
+}
diff --git a/src/runtime/cgo/gcc_dragonfly_amd64.c b/src/runtime/cgo/gcc_dragonfly_amd64.c
new file mode 100644
index 0000000..0003414
--- /dev/null
+++ b/src/runtime/cgo/gcc_dragonfly_amd64.c
@@ -0,0 +1,66 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_fatalf.c b/src/runtime/cgo/gcc_fatalf.c
new file mode 100644
index 0000000..9493dbb
--- /dev/null
+++ b/src/runtime/cgo/gcc_fatalf.c
@@ -0,0 +1,23 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || (!android && linux) || freebsd
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "libcgo.h"
+
+void
+fatalf(const char* format, ...)
+{
+ va_list ap;
+
+ fprintf(stderr, "runtime/cgo: ");
+ va_start(ap, format);
+ vfprintf(stderr, format, ap);
+ va_end(ap);
+ fprintf(stderr, "\n");
+ abort();
+}
diff --git a/src/runtime/cgo/gcc_freebsd_386.c b/src/runtime/cgo/gcc_freebsd_386.c
new file mode 100644
index 0000000..9097a2a
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_386.c
@@ -0,0 +1,71 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_amd64.c b/src/runtime/cgo/gcc_freebsd_amd64.c
new file mode 100644
index 0000000..6071ec3
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_amd64.c
@@ -0,0 +1,74 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <errno.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ // Deal with memory sanitizer/clang interaction.
+ // See gcc_linux_amd64.c for details.
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(attr);
+ free(attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ _cgo_tsan_acquire();
+ free(v);
+ _cgo_tsan_release();
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_arm.c b/src/runtime/cgo/gcc_freebsd_arm.c
new file mode 100644
index 0000000..5f89978
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_arm.c
@@ -0,0 +1,77 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <machine/sysarch.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+#ifdef ARM_TP_ADDRESS
+// ARM_TP_ADDRESS is (ARM_VECTORS_HIGH + 0x1000) or 0xffff1000
+// and is known to runtime.read_tls_fallback. Verify it with
+// cpp.
+#if ARM_TP_ADDRESS != 0xffff1000
+#error Wrong ARM_TP_ADDRESS!
+#endif
+#endif
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_arm64.c b/src/runtime/cgo/gcc_freebsd_arm64.c
new file mode 100644
index 0000000..dd8f888
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_arm64.c
@@ -0,0 +1,68 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <errno.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_riscv64.c b/src/runtime/cgo/gcc_freebsd_riscv64.c
new file mode 100644
index 0000000..6ce5e65
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_riscv64.c
@@ -0,0 +1,67 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <errno.h>
+#include <sys/signalvar.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ SIGFILLSET(ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_freebsd_sigaction.c b/src/runtime/cgo/gcc_freebsd_sigaction.c
new file mode 100644
index 0000000..b324983
--- /dev/null
+++ b/src/runtime/cgo/gcc_freebsd_sigaction.c
@@ -0,0 +1,80 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd && amd64
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <signal.h>
+
+#include "libcgo.h"
+
+// go_sigaction_t is a C version of the sigactiont struct from
+// os_freebsd.go. This definition — and its conversion to and from struct
+// sigaction — are specific to freebsd/amd64.
+typedef struct {
+ uint32_t __bits[_SIG_WORDS];
+} go_sigset_t;
+typedef struct {
+ uintptr_t handler;
+ int32_t flags;
+ go_sigset_t mask;
+} go_sigaction_t;
+
+int32_t
+x_cgo_sigaction(intptr_t signum, const go_sigaction_t *goact, go_sigaction_t *oldgoact) {
+ int32_t ret;
+ struct sigaction act;
+ struct sigaction oldact;
+ size_t i;
+
+ _cgo_tsan_acquire();
+
+ memset(&act, 0, sizeof act);
+ memset(&oldact, 0, sizeof oldact);
+
+ if (goact) {
+ if (goact->flags & SA_SIGINFO) {
+ act.sa_sigaction = (void(*)(int, siginfo_t*, void*))(goact->handler);
+ } else {
+ act.sa_handler = (void(*)(int))(goact->handler);
+ }
+ sigemptyset(&act.sa_mask);
+ for (i = 0; i < 8 * sizeof(goact->mask); i++) {
+ if (goact->mask.__bits[i/32] & ((uint32_t)(1)<<(i&31))) {
+ sigaddset(&act.sa_mask, i+1);
+ }
+ }
+ act.sa_flags = goact->flags;
+ }
+
+ ret = sigaction(signum, goact ? &act : NULL, oldgoact ? &oldact : NULL);
+ if (ret == -1) {
+ // runtime.sigaction expects _cgo_sigaction to return errno on error.
+ _cgo_tsan_release();
+ return errno;
+ }
+
+ if (oldgoact) {
+ if (oldact.sa_flags & SA_SIGINFO) {
+ oldgoact->handler = (uintptr_t)(oldact.sa_sigaction);
+ } else {
+ oldgoact->handler = (uintptr_t)(oldact.sa_handler);
+ }
+ for (i = 0 ; i < _SIG_WORDS; i++) {
+ oldgoact->mask.__bits[i] = 0;
+ }
+ for (i = 0; i < 8 * sizeof(oldgoact->mask); i++) {
+ if (sigismember(&oldact.sa_mask, i+1) == 1) {
+ oldgoact->mask.__bits[i/32] |= (uint32_t)(1)<<(i&31);
+ }
+ }
+ oldgoact->flags = oldact.sa_flags;
+ }
+
+ _cgo_tsan_release();
+ return ret;
+}
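x_cgo_sigaction converts between the flat uint32 bitmask used by the Go runtime and the opaque sigset_t that sigaction expects, mapping bit i to signal i+1. The self-contained program below exercises just that conversion in the forward direction; the toy_sigset_t name and the choice of signals are invented for the example.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the go_sigset_t bitmask defined above. */
typedef struct { uint32_t bits[4]; } toy_sigset_t;

int main(void) {
	toy_sigset_t m = {{0}};
	sigset_t set;
	int i;

	/* Mark SIGINT and SIGTERM: signal number sig corresponds to bit sig-1,
	 * stored as bit (sig-1)&31 of word (sig-1)/32, as in x_cgo_sigaction. */
	m.bits[(SIGINT-1)/32] |= (uint32_t)1 << ((SIGINT-1)&31);
	m.bits[(SIGTERM-1)/32] |= (uint32_t)1 << ((SIGTERM-1)&31);

	sigemptyset(&set);
	for (i = 0; i < (int)(8 * sizeof(m.bits)); i++) {
		if (m.bits[i/32] & ((uint32_t)1 << (i&31)))
			sigaddset(&set, i+1);
	}

	printf("SIGINT in set: %d, SIGHUP in set: %d\n",
	       sigismember(&set, SIGINT), sigismember(&set, SIGHUP));
	return 0;
}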
diff --git a/src/runtime/cgo/gcc_libinit.c b/src/runtime/cgo/gcc_libinit.c
new file mode 100644
index 0000000..9676593
--- /dev/null
+++ b/src/runtime/cgo/gcc_libinit.c
@@ -0,0 +1,147 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+#include <pthread.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h> // strerror
+#include <time.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static pthread_cond_t runtime_init_cond = PTHREAD_COND_INITIALIZER;
+static pthread_mutex_t runtime_init_mu = PTHREAD_MUTEX_INITIALIZER;
+static int runtime_init_done;
+
+// pthread_g is a pthread-specific key for storing the g that is bound to the C thread.
+// When a C thread exits with a non-NULL thread-specific g, the registered
+// pthread_key_destructor will dropm.
+static pthread_key_t pthread_g;
+static void pthread_key_destructor(void* g);
+uintptr_t x_cgo_pthread_key_created;
+void (*x_crosscall2_ptr)(void (*fn)(void *), void *, int, size_t);
+
+// The context function, used when tracing back C calls into Go.
+static void (*cgo_context_function)(struct context_arg*);
+
+void
+x_cgo_sys_thread_create(void* (*func)(void*), void* arg) {
+ pthread_t p;
+ int err = _cgo_try_pthread_create(&p, NULL, func, arg);
+ if (err != 0) {
+ fprintf(stderr, "pthread_create failed: %s", strerror(err));
+ abort();
+ }
+}
+
+uintptr_t
+_cgo_wait_runtime_init_done(void) {
+ void (*pfn)(struct context_arg*);
+
+ pthread_mutex_lock(&runtime_init_mu);
+ while (runtime_init_done == 0) {
+ pthread_cond_wait(&runtime_init_cond, &runtime_init_mu);
+ }
+
+ // The key and x_cgo_pthread_key_created are for the whole program,
+ // whereas the thread-specific value and the destructor call are per thread.
+ if (x_cgo_pthread_key_created == 0 && pthread_key_create(&pthread_g, pthread_key_destructor) == 0) {
+ x_cgo_pthread_key_created = 1;
+ }
+
+ // TODO(iant): For the case of a new C thread calling into Go, such
+ // as when using -buildmode=c-archive, we know that Go runtime
+ // initialization is complete but we do not know that all Go init
+ // functions have been run. We should not fetch cgo_context_function
+ // until they have been, because that is where a call to
+ // SetCgoTraceback is likely to occur. We are going to wait for Go
+ // initialization to be complete anyhow, later, by waiting for
+ // main_init_done to be closed in cgocallbackg1. We should wait here
+ // instead. See also issue #15943.
+ pfn = cgo_context_function;
+
+ pthread_mutex_unlock(&runtime_init_mu);
+ if (pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = 0;
+ (*pfn)(&arg);
+ return arg.Context;
+ }
+ return 0;
+}
+
+// Store the g into the thread-specific value associated with the pthread key pthread_g.
+// The registered pthread_key_destructor will then dropm when the thread exits.
+void x_cgo_bindm(void* g) {
+ // We assume this always succeeds; otherwise an extra M may leak
+ // when a C thread exits after a cgo call.
+ // We invoke this function only once per thread, in runtime.needAndBindM,
+ // and subsequent calls just reuse the bound m.
+ pthread_setspecific(pthread_g, g);
+}
+
+void
+x_cgo_notify_runtime_init_done(void* dummy __attribute__ ((unused))) {
+ pthread_mutex_lock(&runtime_init_mu);
+ runtime_init_done = 1;
+ pthread_cond_broadcast(&runtime_init_cond);
+ pthread_mutex_unlock(&runtime_init_mu);
+}
+
+// Sets the context function to call to record the traceback context
+// when calling a Go function from C code. Called from runtime.SetCgoTraceback.
+void x_cgo_set_context_function(void (*context)(struct context_arg*)) {
+ pthread_mutex_lock(&runtime_init_mu);
+ cgo_context_function = context;
+ pthread_mutex_unlock(&runtime_init_mu);
+}
+
+// Gets the context function.
+void (*(_cgo_get_context_function(void)))(struct context_arg*) {
+ void (*ret)(struct context_arg*);
+
+ pthread_mutex_lock(&runtime_init_mu);
+ ret = cgo_context_function;
+ pthread_mutex_unlock(&runtime_init_mu);
+ return ret;
+}
+
+// _cgo_try_pthread_create retries pthread_create if it fails with
+// EAGAIN.
+int
+_cgo_try_pthread_create(pthread_t* thread, const pthread_attr_t* attr, void* (*pfn)(void*), void* arg) {
+ int tries;
+ int err;
+ struct timespec ts;
+
+ for (tries = 0; tries < 20; tries++) {
+ err = pthread_create(thread, attr, pfn, arg);
+ if (err == 0) {
+ pthread_detach(*thread);
+ return 0;
+ }
+ if (err != EAGAIN) {
+ return err;
+ }
+ ts.tv_sec = 0;
+ ts.tv_nsec = (tries + 1) * 1000 * 1000; // Milliseconds.
+ nanosleep(&ts, nil);
+ }
+ return EAGAIN;
+}
+
+static void
+pthread_key_destructor(void* g) {
+ if (x_crosscall2_ptr != NULL) {
+ // fn == NULL means dropm.
+ // We restore g from the stored value before dropm in runtime.cgocallback,
+ // since the g stored in TLS by Go may already have been cleared on some
+ // platforms by the time this destructor is invoked.
+ x_crosscall2_ptr(NULL, g, 0, 0);
+ }
+}
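
For context, here is a minimal standalone sketch of the pthread key/destructor pattern that x_cgo_bindm and pthread_key_destructor rely on. The demo_* names are illustrative only; the real destructor calls back into Go through x_crosscall2_ptr instead of printing.

#include <pthread.h>
#include <stdio.h>

static pthread_key_t demo_key;

// Runs automatically when a thread exits with a non-NULL value stored
// under demo_key, just as pthread_key_destructor fires to drop the M.
static void
demo_destructor(void *value)
{
	printf("thread exiting, stored value=%p\n", value);
}

static void *
demo_thread(void *arg)
{
	pthread_setspecific(demo_key, arg);   // like x_cgo_bindm storing g
	return NULL;                          // destructor runs after return
}

int
main(void)
{
	pthread_t t;
	int token = 42;

	pthread_key_create(&demo_key, demo_destructor);
	pthread_create(&t, NULL, demo_thread, &token);
	pthread_join(t, NULL);                // destructor has run by now
	return 0;
}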
diff --git a/src/runtime/cgo/gcc_libinit_windows.c b/src/runtime/cgo/gcc_libinit_windows.c
new file mode 100644
index 0000000..9a8c65e
--- /dev/null
+++ b/src/runtime/cgo/gcc_libinit_windows.c
@@ -0,0 +1,158 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+// Ensure there's one symbol marked __declspec(dllexport).
+// If there are no exported symbols, the unfortunate behavior of
+// the binutils linker is to also strip the relocations table,
+// resulting in a non-PIE binary. The other option is the
+// --export-all-symbols flag, but we don't need to export all symbols
+// and this may overflow the export table (#40795).
+// See https://sourceware.org/bugzilla/show_bug.cgi?id=19011
+__declspec(dllexport) int _cgo_dummy_export;
+
+static volatile LONG runtime_init_once_gate = 0;
+static volatile LONG runtime_init_once_done = 0;
+
+static CRITICAL_SECTION runtime_init_cs;
+
+static HANDLE runtime_init_wait;
+static int runtime_init_done;
+
+uintptr_t x_cgo_pthread_key_created;
+void (*x_crosscall2_ptr)(void (*fn)(void *), void *, int, size_t);
+
+// Pre-initialize the runtime synchronization objects
+void
+_cgo_preinit_init() {
+ runtime_init_wait = CreateEvent(NULL, TRUE, FALSE, NULL);
+ if (runtime_init_wait == NULL) {
+ fprintf(stderr, "runtime: failed to create runtime initialization wait event.\n");
+ abort();
+ }
+
+ InitializeCriticalSection(&runtime_init_cs);
+}
+
+// Make sure that the preinit sequence has run.
+void
+_cgo_maybe_run_preinit() {
+ if (!InterlockedExchangeAdd(&runtime_init_once_done, 0)) {
+ if (InterlockedIncrement(&runtime_init_once_gate) == 1) {
+ _cgo_preinit_init();
+ InterlockedIncrement(&runtime_init_once_done);
+ } else {
+ // Decrement to avoid overflow.
+ InterlockedDecrement(&runtime_init_once_gate);
+ while(!InterlockedExchangeAdd(&runtime_init_once_done, 0)) {
+ Sleep(0);
+ }
+ }
+ }
+}
+
+void
+x_cgo_sys_thread_create(void (*func)(void*), void* arg) {
+ _cgo_beginthread(func, arg);
+}
+
+int
+_cgo_is_runtime_initialized() {
+ EnterCriticalSection(&runtime_init_cs);
+ int status = runtime_init_done;
+ LeaveCriticalSection(&runtime_init_cs);
+ return status;
+}
+
+uintptr_t
+_cgo_wait_runtime_init_done(void) {
+ void (*pfn)(struct context_arg*);
+
+ _cgo_maybe_run_preinit();
+ while (!_cgo_is_runtime_initialized()) {
+ WaitForSingleObject(runtime_init_wait, INFINITE);
+ }
+ pfn = _cgo_get_context_function();
+ if (pfn != nil) {
+ struct context_arg arg;
+
+ arg.Context = 0;
+ (*pfn)(&arg);
+ return arg.Context;
+ }
+ return 0;
+}
+
+// Should not be used since x_cgo_pthread_key_created will always be zero.
+void x_cgo_bindm(void* dummy) {
+ fprintf(stderr, "unexpected cgo_bindm on Windows\n");
+ abort();
+}
+
+void
+x_cgo_notify_runtime_init_done(void* dummy) {
+ _cgo_maybe_run_preinit();
+
+ EnterCriticalSection(&runtime_init_cs);
+ runtime_init_done = 1;
+ LeaveCriticalSection(&runtime_init_cs);
+
+ if (!SetEvent(runtime_init_wait)) {
+ fprintf(stderr, "runtime: failed to signal runtime initialization complete.\n");
+ abort();
+ }
+}
+
+// The context function, used when tracing back C calls into Go.
+static void (*cgo_context_function)(struct context_arg*);
+
+// Sets the context function to call to record the traceback context
+// when calling a Go function from C code. Called from runtime.SetCgoTraceback.
+void x_cgo_set_context_function(void (*context)(struct context_arg*)) {
+ EnterCriticalSection(&runtime_init_cs);
+ cgo_context_function = context;
+ LeaveCriticalSection(&runtime_init_cs);
+}
+
+// Gets the context function.
+void (*(_cgo_get_context_function(void)))(struct context_arg*) {
+ void (*ret)(struct context_arg*);
+
+ EnterCriticalSection(&runtime_init_cs);
+ ret = cgo_context_function;
+ LeaveCriticalSection(&runtime_init_cs);
+ return ret;
+}
+
+void _cgo_beginthread(void (*func)(void*), void* arg) {
+ int tries;
+ uintptr_t thandle;
+
+ for (tries = 0; tries < 20; tries++) {
+ thandle = _beginthread(func, 0, arg);
+ if (thandle == -1 && errno == EACCES) {
+ // "Insufficient resources", try again in a bit.
+ //
+ // Note that the first Sleep(0) is a yield.
+ Sleep(tries); // milliseconds
+ continue;
+ } else if (thandle == -1) {
+ break;
+ }
+ return; // Success!
+ }
+
+ fprintf(stderr, "runtime: failed to create new OS thread (%d)\n", errno);
+ abort();
+}
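
A condensed sketch of the gate/done double-counter pattern that _cgo_maybe_run_preinit uses to run one-time initialization without any pre-built lock. The do_init_once name and the puts body are illustrative assumptions, not part of the runtime.

#include <windows.h>
#include <stdio.h>

static volatile LONG once_gate = 0;
static volatile LONG once_done = 0;

static void
do_init_once(void)
{
	if (InterlockedExchangeAdd(&once_done, 0))   // already initialized
		return;
	if (InterlockedIncrement(&once_gate) == 1) { // first caller wins the race
		puts("initializing");
		InterlockedIncrement(&once_done);        // publish completion
	} else {
		InterlockedDecrement(&once_gate);        // lost the race; undo and wait
		while (!InterlockedExchangeAdd(&once_done, 0))
			Sleep(0);                            // yield until the winner finishes
	}
}

int
main(void)
{
	do_init_once();
	do_init_once();   // second call returns immediately
	return 0;
}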
diff --git a/src/runtime/cgo/gcc_linux_386.c b/src/runtime/cgo/gcc_linux_386.c
new file mode 100644
index 0000000..0ce9359
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_386.c
@@ -0,0 +1,74 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+static void (*setg_gcc)(void*);
+
+// This will be set in gcc_android.c for android-specific customization.
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_linux_amd64.c b/src/runtime/cgo/gcc_linux_amd64.c
new file mode 100644
index 0000000..fb164c1
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_amd64.c
@@ -0,0 +1,96 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <errno.h>
+#include <string.h> // strerror
+#include <signal.h>
+#include <stdlib.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+// This will be set in gcc_android.c for android-specific customization.
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ /* The memory sanitizer distributed with versions of clang
+ before 3.8 has a bug: if you call mmap before malloc, mmap
+ may return an address that is later overwritten by the msan
+ library. Avoid this problem by forcing a call to malloc
+ here, before we ever call mmap.
+
+ This is only required for the memory sanitizer, so it's
+ unfortunate that we always run it. It should be possible
+ to remove this when we no longer care about versions of
+ clang before 3.8. The test for this is
+ misc/cgo/testsanitizers.
+
+ GCC works hard to eliminate a seemingly unnecessary call to
+ malloc, so we actually use the memory we allocate. */
+
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)__builtin_frame_address(0) - size + 4096;
+ if (g->stacklo >= g->stackhi)
+ fatalf("bad stack bounds: lo=%p hi=%p\n", g->stacklo, g->stackhi);
+ pthread_attr_destroy(attr);
+ free(attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ _cgo_tsan_acquire();
+ free(v);
+ _cgo_tsan_release();
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
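
The x_cgo_init above estimates the initial thread's stack bounds from a nearby frame address and the default pthread stack size, leaving a 4096-byte margin. A minimal sketch of that heuristic is below; it only prints an approximation (the default attribute size need not match the main thread's real stack) and is not how a program would normally measure its stack.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	pthread_attr_t attr;
	size_t size;
	uintptr_t hi, lo;

	pthread_attr_init(&attr);
	pthread_attr_getstacksize(&attr, &size);      // default stack size for new threads

	hi = (uintptr_t)__builtin_frame_address(0);   // an address near the top of our stack
	lo = hi - size + 4096;                        // conservative lower bound with a margin

	printf("approx stack: [%p, %p), size %zu\n", (void *)lo, (void *)hi, size);

	pthread_attr_destroy(&attr);
	return 0;
}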
diff --git a/src/runtime/cgo/gcc_linux_arm.c b/src/runtime/cgo/gcc_linux_arm.c
new file mode 100644
index 0000000..5e97a9e
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_arm.c
@@ -0,0 +1,69 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_arm64.c b/src/runtime/cgo/gcc_linux_arm64.c
new file mode 100644
index 0000000..dac45e4
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_arm64.c
@@ -0,0 +1,91 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <stdlib.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase) __attribute__((common));
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t *attr;
+ size_t size;
+
+ /* The memory sanitizer distributed with versions of clang
+ before 3.8 has a bug: if you call mmap before malloc, mmap
+ may return an address that is later overwritten by the msan
+ library. Avoid this problem by forcing a call to malloc
+ here, before we ever call mmap.
+
+ This is only required for the memory sanitizer, so it's
+ unfortunate that we always run it. It should be possible
+ to remove this when we no longer care about versions of
+ clang before 3.8. The test for this is
+ misc/cgo/testsanitizers.
+
+ GCC works hard to eliminate a seemingly unnecessary call to
+ malloc, so we actually use the memory we allocate. */
+
+ setg_gcc = setg;
+ attr = (pthread_attr_t*)malloc(sizeof *attr);
+ if (attr == NULL) {
+ fatalf("malloc failed: %s", strerror(errno));
+ }
+ pthread_attr_init(attr);
+ pthread_attr_getstacksize(attr, &size);
+ g->stacklo = (uintptr)&size - size + 4096;
+ pthread_attr_destroy(attr);
+ free(attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_loong64.c b/src/runtime/cgo/gcc_linux_loong64.c
new file mode 100644
index 0000000..96a06eb
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_loong64.c
@@ -0,0 +1,69 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_mips64x.c b/src/runtime/cgo/gcc_linux_mips64x.c
new file mode 100644
index 0000000..c059fd1
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_mips64x.c
@@ -0,0 +1,71 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_mipsx.c b/src/runtime/cgo/gcc_linux_mipsx.c
new file mode 100644
index 0000000..218b8fd
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_mipsx.c
@@ -0,0 +1,72 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_ppc64x.S b/src/runtime/cgo/gcc_linux_ppc64x.S
new file mode 100644
index 0000000..745d232
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_ppc64x.S
@@ -0,0 +1,86 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+.file "gcc_linux_ppc64x.S"
+
+// Define a frame which has no argument space, but is compatible with
+// a call into a Go ABI. We allocate 32B to match FIXED_FRAME with
+// similar semantics, except we store the backchain pointer, not the
+// LR at offset 0. R2 is stored in the Go TOC save slot (offset 24).
+.set GPR_OFFSET, 32
+.set FPR_OFFSET, GPR_OFFSET + 18*8
+.set VR_OFFSET, FPR_OFFSET + 18*8
+.set FRAME_SIZE, VR_OFFSET + 12*16
+
+.macro FOR_EACH_GPR opcode r=14
+.ifge 31 - \r
+ \opcode \r, GPR_OFFSET + 8*(\r-14)(1)
+ FOR_EACH_GPR \opcode "(\r+1)"
+.endif
+.endm
+
+.macro FOR_EACH_FPR opcode fr=14
+.ifge 31 - \fr
+ \opcode \fr, FPR_OFFSET + 8*(\fr-14)(1)
+ FOR_EACH_FPR \opcode "(\fr+1)"
+.endif
+.endm
+
+.macro FOR_EACH_VR opcode vr=20
+.ifge 31 - \vr
+ li 0, VR_OFFSET + 16*(\vr-20)
+ \opcode \vr, 1, 0
+ FOR_EACH_VR \opcode "(\vr+1)"
+.endif
+.endm
+
+/*
+ * void crosscall_ppc64(void (*fn)(void), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard ppc64 C ABI, where r2, r14-r31, f14-f31 are
+ * callee-save, so they must be saved explicitly.
+ */
+.globl crosscall_ppc64
+crosscall_ppc64:
+ // Start with standard C stack frame layout and linkage
+ mflr %r0
+ std %r0, 16(%r1) // Save LR in caller's frame
+ mfcr %r0
+ std %r0, 8(%r1) // Save CR in caller's frame
+ stdu %r1, -FRAME_SIZE(%r1)
+ std %r2, 24(%r1)
+
+ FOR_EACH_GPR std
+ FOR_EACH_FPR stfd
+ FOR_EACH_VR stvx
+
+ // Set up Go ABI constant registers
+ li %r0, 0
+
+ // Restore g pointer (r30 in Go ABI, which may have been clobbered by C)
+ mr %r30, %r4
+
+ // Call fn
+ mr %r12, %r3
+ mtctr %r3
+ bctrl
+
+ FOR_EACH_GPR ld
+ FOR_EACH_FPR lfd
+ FOR_EACH_VR lvx
+
+ ld %r2, 24(%r1)
+ addi %r1, %r1, FRAME_SIZE
+ ld %r0, 16(%r1)
+ mtlr %r0
+ ld %r0, 8(%r1)
+ mtcr %r0
+ blr
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_linux_riscv64.c b/src/runtime/cgo/gcc_linux_riscv64.c
new file mode 100644
index 0000000..99c2866
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_riscv64.c
@@ -0,0 +1,69 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+
+ if (x_cgo_inittls) {
+ x_cgo_inittls(tlsg, tlsbase);
+ }
+}
diff --git a/src/runtime/cgo/gcc_linux_s390x.c b/src/runtime/cgo/gcc_linux_s390x.c
new file mode 100644
index 0000000..bb60048
--- /dev/null
+++ b/src/runtime/cgo/gcc_linux_s390x.c
@@ -0,0 +1,69 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_s390x(void (*fn)(void), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // Save g for this thread in C TLS
+ setg_gcc((void*)ts.g);
+
+ crosscall_s390x(ts.fn, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_loong64.S b/src/runtime/cgo/gcc_loong64.S
new file mode 100644
index 0000000..6b7668f
--- /dev/null
+++ b/src/runtime/cgo/gcc_loong64.S
@@ -0,0 +1,67 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_loong64.S"
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard lp64d ABI, where $r1, $r3, $r23-$r30, and $f24-$f31
+ * are callee-save, so they must be saved explicitly, along with $r1 (LR).
+ */
+.globl crosscall1
+crosscall1:
+ addi.d $r3, $r3, -160
+ st.d $r1, $r3, 0
+ st.d $r23, $r3, 8
+ st.d $r24, $r3, 16
+ st.d $r25, $r3, 24
+ st.d $r26, $r3, 32
+ st.d $r27, $r3, 40
+ st.d $r28, $r3, 48
+ st.d $r29, $r3, 56
+ st.d $r30, $r3, 64
+ st.d $r2, $r3, 72
+ st.d $r22, $r3, 80
+ fst.d $f24, $r3, 88
+ fst.d $f25, $r3, 96
+ fst.d $f26, $r3, 104
+ fst.d $f27, $r3, 112
+ fst.d $f28, $r3, 120
+ fst.d $f29, $r3, 128
+ fst.d $f30, $r3, 136
+ fst.d $f31, $r3, 144
+
+ move $r18, $r4 // save R4
+ move $r19, $r6
+ jirl $r1, $r5, 0 // call setg_gcc (clobbers R4)
+ jirl $r1, $r18, 0 // call fn
+
+ ld.d $r23, $r3, 8
+ ld.d $r24, $r3, 16
+ ld.d $r25, $r3, 24
+ ld.d $r26, $r3, 32
+ ld.d $r27, $r3, 40
+ ld.d $r28, $r3, 48
+ ld.d $r29, $r3, 56
+ ld.d $r30, $r3, 64
+ ld.d $r2, $r3, 72
+ ld.d $r22, $r3, 80
+ fld.d $f24, $r3, 88
+ fld.d $f25, $r3, 96
+ fld.d $f26, $r3, 104
+ fld.d $f27, $r3, 112
+ fld.d $f28, $r3, 120
+ fld.d $f29, $r3, 128
+ fld.d $f30, $r3, 136
+ fld.d $f31, $r3, 144
+ ld.d $r1, $r3, 0
+ addi.d $r3, $r3, 160
+ jirl $r0, $r1, 0
+
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_mips64x.S b/src/runtime/cgo/gcc_mips64x.S
new file mode 100644
index 0000000..1629e47
--- /dev/null
+++ b/src/runtime/cgo/gcc_mips64x.S
@@ -0,0 +1,89 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+.file "gcc_mips64x.S"
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard MIPS N64 ABI, where $16-$23, $28, $30, and $f24-$f31
+ * are callee-save, so they must be saved explicitly, along with $31 (LR).
+ */
+.globl crosscall1
+.set noat
+crosscall1:
+#ifndef __mips_soft_float
+ daddiu $29, $29, -160
+#else
+ daddiu $29, $29, -96 // For soft-float, no need to make room for FP registers
+#endif
+ sd $31, 0($29)
+ sd $16, 8($29)
+ sd $17, 16($29)
+ sd $18, 24($29)
+ sd $19, 32($29)
+ sd $20, 40($29)
+ sd $21, 48($29)
+ sd $22, 56($29)
+ sd $23, 64($29)
+ sd $28, 72($29)
+ sd $30, 80($29)
+#ifndef __mips_soft_float
+ sdc1 $f24, 88($29)
+ sdc1 $f25, 96($29)
+ sdc1 $f26, 104($29)
+ sdc1 $f27, 112($29)
+ sdc1 $f28, 120($29)
+ sdc1 $f29, 128($29)
+ sdc1 $f30, 136($29)
+ sdc1 $f31, 144($29)
+#endif
+
+ // prepare SB register = pc & 0xffffffff00000000
+ bal 1f
+1:
+ dsrl $28, $31, 32
+ dsll $28, $28, 32
+
+ move $20, $4 // save R4
+ move $1, $6
+ jalr $5 // call setg_gcc (clobbers R4)
+ jalr $20 // call fn
+
+ ld $16, 8($29)
+ ld $17, 16($29)
+ ld $18, 24($29)
+ ld $19, 32($29)
+ ld $20, 40($29)
+ ld $21, 48($29)
+ ld $22, 56($29)
+ ld $23, 64($29)
+ ld $28, 72($29)
+ ld $30, 80($29)
+#ifndef __mips_soft_float
+ ldc1 $f24, 88($29)
+ ldc1 $f25, 96($29)
+ ldc1 $f26, 104($29)
+ ldc1 $f27, 112($29)
+ ldc1 $f28, 120($29)
+ ldc1 $f29, 128($29)
+ ldc1 $f30, 136($29)
+ ldc1 $f31, 144($29)
+#endif
+ ld $31, 0($29)
+#ifndef __mips_soft_float
+ daddiu $29, $29, 160
+#else
+ daddiu $29, $29, 96
+#endif
+ jr $31
+
+.set at
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_mipsx.S b/src/runtime/cgo/gcc_mipsx.S
new file mode 100644
index 0000000..fb19c11
--- /dev/null
+++ b/src/runtime/cgo/gcc_mipsx.S
@@ -0,0 +1,77 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+.file "gcc_mipsx.S"
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard MIPS O32 ABI, where $16-$23, $30, and $f20-$f31
+ * are callee-save, so they must be saved explicitly, along with $31 (LR).
+ */
+.globl crosscall1
+.set noat
+crosscall1:
+#ifndef __mips_soft_float
+ addiu $29, $29, -88
+#else
+ addiu $29, $29, -40 // For soft-float, no need to make room for FP registers
+#endif
+ sw $31, 0($29)
+ sw $16, 4($29)
+ sw $17, 8($29)
+ sw $18, 12($29)
+ sw $19, 16($29)
+ sw $20, 20($29)
+ sw $21, 24($29)
+ sw $22, 28($29)
+ sw $23, 32($29)
+ sw $30, 36($29)
+
+#ifndef __mips_soft_float
+ sdc1 $f20, 40($29)
+ sdc1 $f22, 48($29)
+ sdc1 $f24, 56($29)
+ sdc1 $f26, 64($29)
+ sdc1 $f28, 72($29)
+ sdc1 $f30, 80($29)
+#endif
+ move $20, $4 // save R4
+ move $4, $6
+ jalr $5 // call setg_gcc
+ jalr $20 // call fn
+
+ lw $16, 4($29)
+ lw $17, 8($29)
+ lw $18, 12($29)
+ lw $19, 16($29)
+ lw $20, 20($29)
+ lw $21, 24($29)
+ lw $22, 28($29)
+ lw $23, 32($29)
+ lw $30, 36($29)
+#ifndef __mips_soft_float
+ ldc1 $f20, 40($29)
+ ldc1 $f22, 48($29)
+ ldc1 $f24, 56($29)
+ ldc1 $f26, 64($29)
+ ldc1 $f28, 72($29)
+ ldc1 $f30, 80($29)
+#endif
+ lw $31, 0($29)
+#ifndef __mips_soft_float
+ addiu $29, $29, 88
+#else
+ addiu $29, $29, 40
+#endif
+ jr $31
+
+.set at
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_mmap.c b/src/runtime/cgo/gcc_mmap.c
new file mode 100644
index 0000000..1fbd5e8
--- /dev/null
+++ b/src/runtime/cgo/gcc_mmap.c
@@ -0,0 +1,39 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && (amd64 || arm64 || ppc64le)) || (freebsd && amd64)
+
+#include <errno.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#include "libcgo.h"
+
+uintptr_t
+x_cgo_mmap(void *addr, uintptr_t length, int32_t prot, int32_t flags, int32_t fd, uint32_t offset) {
+ void *p;
+
+ _cgo_tsan_acquire();
+ p = mmap(addr, length, prot, flags, fd, offset);
+ _cgo_tsan_release();
+ if (p == MAP_FAILED) {
+ /* This is what the Go code expects on failure. */
+ return (uintptr_t)errno;
+ }
+ return (uintptr_t)p;
+}
+
+void
+x_cgo_munmap(void *addr, uintptr_t length) {
+ int r;
+
+ _cgo_tsan_acquire();
+ r = munmap(addr, length);
+ _cgo_tsan_release();
+ if (r < 0) {
+ /* The Go runtime is not prepared for munmap to fail. */
+ abort();
+ }
+}
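
x_cgo_mmap above folds failure into its return value: success returns the mapped address, failure returns the (small) errno as a uintptr_t. A hedged sketch of a caller decoding that convention follows; the 4096 threshold mirrors how the Go runtime distinguishes errnos from addresses and is an assumption here, as is linking the sketch next to gcc_mmap.c.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

// Defined in gcc_mmap.c above; this sketch would need to be linked against it.
uintptr_t x_cgo_mmap(void *addr, uintptr_t length, int32_t prot,
                     int32_t flags, int32_t fd, uint32_t offset);

int
main(void)
{
	uintptr_t r = x_cgo_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
	                         MAP_PRIVATE|MAP_ANON, -1, 0);
	if (r < 4096) {                  // small value: an errno, not an address
		fprintf(stderr, "mmap failed: %s\n", strerror((int)r));
		return 1;
	}
	printf("mapped at %p\n", (void *)r);
	return 0;
}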
diff --git a/src/runtime/cgo/gcc_netbsd_386.c b/src/runtime/cgo/gcc_netbsd_386.c
new file mode 100644
index 0000000..5495f0f
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_386.c
@@ -0,0 +1,82 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_netbsd_amd64.c b/src/runtime/cgo/gcc_netbsd_amd64.c
new file mode 100644
index 0000000..9f4b031
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_amd64.c
@@ -0,0 +1,78 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_netbsd_arm.c b/src/runtime/cgo/gcc_netbsd_arm.c
new file mode 100644
index 0000000..b0c80ea
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_arm.c
@@ -0,0 +1,79 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_netbsd_arm64.c b/src/runtime/cgo/gcc_netbsd_arm64.c
new file mode 100644
index 0000000..694116c
--- /dev/null
+++ b/src/runtime/cgo/gcc_netbsd_arm64.c
@@ -0,0 +1,80 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+ stack_t ss;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // On NetBSD, a new thread inherits the signal stack of the
+ // creating thread. That confuses minit, so we remove that
+ // signal stack here before calling the regular mstart. It's
+ // a bit baroque to remove a signal stack here only to add one
+ // in minit, but it's a simple change that keeps NetBSD
+ // working like other OS's. At this point all signals are
+ // blocked, so there is no race.
+ memset(&ss, 0, sizeof ss);
+ ss.ss_flags = SS_DISABLE;
+ sigaltstack(&ss, nil);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_386.c b/src/runtime/cgo/gcc_openbsd_386.c
new file mode 100644
index 0000000..127a1b6
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_386.c
@@ -0,0 +1,70 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ /*
+ * Set specific keys.
+ */
+ setg_gcc((void*)ts.g);
+
+ crosscall_386(ts.fn);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_amd64.c b/src/runtime/cgo/gcc_openbsd_amd64.c
new file mode 100644
index 0000000..09d2750
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_amd64.c
@@ -0,0 +1,65 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_arm.c b/src/runtime/cgo/gcc_openbsd_arm.c
new file mode 100644
index 0000000..9a5757f
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_arm.c
@@ -0,0 +1,67 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall_arm1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_arm1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_arm64.c b/src/runtime/cgo/gcc_openbsd_arm64.c
new file mode 100644
index 0000000..abf9f66
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_arm64.c
@@ -0,0 +1,67 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_openbsd_mips64.c b/src/runtime/cgo/gcc_openbsd_mips64.c
new file mode 100644
index 0000000..79f039a
--- /dev/null
+++ b/src/runtime/cgo/gcc_openbsd_mips64.c
@@ -0,0 +1,67 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <sys/types.h>
+#include <pthread.h>
+#include <signal.h>
+#include <string.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_ppc64x.c b/src/runtime/cgo/gcc_ppc64x.c
new file mode 100644
index 0000000..bfdcf65
--- /dev/null
+++ b/src/runtime/cgo/gcc_ppc64x.c
@@ -0,0 +1,73 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void *threadentry(void*);
+
+void (*x_cgo_inittls)(void **tlsg, void **tlsbase);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsbase)
+{
+ pthread_attr_t attr;
+ size_t size;
+
+ setg_gcc = setg;
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ g->stacklo = (uintptr)&attr - size + 4096;
+ pthread_attr_destroy(&attr);
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ // Leave stacklo=0 and set stackhi=size; mstart will do the rest.
+ ts->g->stackhi = size;
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fatalf("pthread_create failed: %s", strerror(err));
+ }
+}
+
+extern void crosscall_ppc64(void (*fn)(void), void *g);
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ _cgo_tsan_acquire();
+ free(v);
+ _cgo_tsan_release();
+
+ // Save g for this thread in C TLS
+ setg_gcc((void*)ts.g);
+
+ crosscall_ppc64(ts.fn, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_riscv64.S b/src/runtime/cgo/gcc_riscv64.S
new file mode 100644
index 0000000..8f07649
--- /dev/null
+++ b/src/runtime/cgo/gcc_riscv64.S
@@ -0,0 +1,82 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_riscv64.S"
+
+/*
+ * void crosscall1(void (*fn)(void), void (*setg_gcc)(void *g), void *g)
+ *
+ * Calling into the gc tool chain, where all registers are caller save.
+ * Called from standard RISCV ELF psABI, where x8-x9, x18-x27, f8-f9 and
+ * f18-f27 are callee-save, so they must be saved explicitly, along with
+ * x1 (LR).
+ */
+.globl crosscall1
+crosscall1:
+ sd x1, -200(sp)
+ addi sp, sp, -200
+ sd x8, 8(sp)
+ sd x9, 16(sp)
+ sd x18, 24(sp)
+ sd x19, 32(sp)
+ sd x20, 40(sp)
+ sd x21, 48(sp)
+ sd x22, 56(sp)
+ sd x23, 64(sp)
+ sd x24, 72(sp)
+ sd x25, 80(sp)
+ sd x26, 88(sp)
+ sd x27, 96(sp)
+ fsd f8, 104(sp)
+ fsd f9, 112(sp)
+ fsd f18, 120(sp)
+ fsd f19, 128(sp)
+ fsd f20, 136(sp)
+ fsd f21, 144(sp)
+ fsd f22, 152(sp)
+ fsd f23, 160(sp)
+ fsd f24, 168(sp)
+ fsd f25, 176(sp)
+ fsd f26, 184(sp)
+ fsd f27, 192(sp)
+
+ // a0 = *fn, a1 = *setg_gcc, a2 = *g
+ mv s1, a0
+ mv s0, a1
+ mv a0, a2
+ jalr ra, s0 // call setg_gcc (clobbers x30 aka g)
+ jalr ra, s1 // call fn
+
+ ld x1, 0(sp)
+ ld x8, 8(sp)
+ ld x9, 16(sp)
+ ld x18, 24(sp)
+ ld x19, 32(sp)
+ ld x20, 40(sp)
+ ld x21, 48(sp)
+ ld x22, 56(sp)
+ ld x23, 64(sp)
+ ld x24, 72(sp)
+ ld x25, 80(sp)
+ ld x26, 88(sp)
+ ld x27, 96(sp)
+ fld f8, 104(sp)
+ fld f9, 112(sp)
+ fld f18, 120(sp)
+ fld f19, 128(sp)
+ fld f20, 136(sp)
+ fld f21, 144(sp)
+ fld f22, 152(sp)
+ fld f23, 160(sp)
+ fld f24, 168(sp)
+ fld f25, 176(sp)
+ fld f26, 184(sp)
+ fld f27, 192(sp)
+ addi sp, sp, 200
+
+ jr ra
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_s390x.S b/src/runtime/cgo/gcc_s390x.S
new file mode 100644
index 0000000..8bd30fe
--- /dev/null
+++ b/src/runtime/cgo/gcc_s390x.S
@@ -0,0 +1,58 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+.file "gcc_s390x.S"
+
+/*
+ * void crosscall_s390x(void (*fn)(void), void *g)
+ *
+ * Calling into the go tool chain, where all registers are caller save.
+ * Called from standard s390x C ABI, where r6-r13, r15, and f8-f15 are
+ * callee-save, so they must be saved explicitly.
+ */
+.globl crosscall_s390x
+crosscall_s390x:
+ /* save r6-r15 in the register save area of the calling function */
+ stmg %r6, %r15, 48(%r15)
+
+ /* allocate 64 bytes of stack space to save f8-f15 */
+ lay %r15, -64(%r15)
+
+ /* save callee-saved floating point registers */
+ std %f8, 0(%r15)
+ std %f9, 8(%r15)
+ std %f10, 16(%r15)
+ std %f11, 24(%r15)
+ std %f12, 32(%r15)
+ std %f13, 40(%r15)
+ std %f14, 48(%r15)
+ std %f15, 56(%r15)
+
+ /* restore g pointer */
+ lgr %r13, %r3
+
+ /* call fn */
+ basr %r14, %r2
+
+ /* restore floating point registers */
+ ld %f8, 0(%r15)
+ ld %f9, 8(%r15)
+ ld %f10, 16(%r15)
+ ld %f11, 24(%r15)
+ ld %f12, 32(%r15)
+ ld %f13, 40(%r15)
+ ld %f14, 48(%r15)
+ ld %f15, 56(%r15)
+
+ /* de-allocate stack frame */
+ la %r15, 64(%r15)
+
+ /* restore general purpose registers */
+ lmg %r6, %r15, 48(%r15)
+
+ br %r14 /* restored by lmg */
+
+#ifdef __ELF__
+.section .note.GNU-stack,"",%progbits
+#endif
diff --git a/src/runtime/cgo/gcc_setenv.c b/src/runtime/cgo/gcc_setenv.c
new file mode 100644
index 0000000..47caa4b
--- /dev/null
+++ b/src/runtime/cgo/gcc_setenv.c
@@ -0,0 +1,27 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+#include "libcgo.h"
+
+#include <stdlib.h>
+
+/* Stub for calling setenv */
+void
+x_cgo_setenv(char **arg)
+{
+ _cgo_tsan_acquire();
+ setenv(arg[0], arg[1], 1);
+ _cgo_tsan_release();
+}
+
+/* Stub for calling unsetenv */
+void
+x_cgo_unsetenv(char **arg)
+{
+ _cgo_tsan_acquire();
+ unsetenv(arg[0]);
+ _cgo_tsan_release();
+}
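
These two stubs are how os.Setenv and os.Unsetenv keep the C environment in sync when cgo is linked in: the runtime forwards the (name, value) pair to x_cgo_setenv, which calls the libc setenv inside the TSAN acquire/release bracket. A minimal sketch of the observable effect on a Unix system (the CGO_DEMO variable name is made up purely for illustration):

package main

/*
#include <stdlib.h>
*/
import "C"

import (
	"fmt"
	"os"
	"unsafe"
)

func main() {
	// With cgo in the binary, os.Setenv is routed through x_cgo_setenv,
	// so the change is also visible to C's getenv.
	os.Setenv("CGO_DEMO", "hello")

	name := C.CString("CGO_DEMO")
	defer C.free(unsafe.Pointer(name))
	if v := C.getenv(name); v != nil {
		fmt.Println("C sees:", C.GoString(v))
	}
}
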
diff --git a/src/runtime/cgo/gcc_sigaction.c b/src/runtime/cgo/gcc_sigaction.c
new file mode 100644
index 0000000..374909b
--- /dev/null
+++ b/src/runtime/cgo/gcc_sigaction.c
@@ -0,0 +1,82 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (amd64 || arm64 || ppc64le)
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <signal.h>
+
+#include "libcgo.h"
+
+// go_sigaction_t is a C version of the sigactiont struct from
+// defs_linux_amd64.go. This definition — and its conversion to and from struct
+// sigaction — are specific to linux/amd64.
+typedef struct {
+ uintptr_t handler;
+ uint64_t flags;
+ uintptr_t restorer;
+ uint64_t mask;
+} go_sigaction_t;
+
+// SA_RESTORER is part of the kernel interface.
+// This is Linux i386/amd64 specific.
+#ifndef SA_RESTORER
+#define SA_RESTORER 0x4000000
+#endif
+
+int32_t
+x_cgo_sigaction(intptr_t signum, const go_sigaction_t *goact, go_sigaction_t *oldgoact) {
+ int32_t ret;
+ struct sigaction act;
+ struct sigaction oldact;
+ size_t i;
+
+ _cgo_tsan_acquire();
+
+ memset(&act, 0, sizeof act);
+ memset(&oldact, 0, sizeof oldact);
+
+ if (goact) {
+ if (goact->flags & SA_SIGINFO) {
+ act.sa_sigaction = (void(*)(int, siginfo_t*, void*))(goact->handler);
+ } else {
+ act.sa_handler = (void(*)(int))(goact->handler);
+ }
+ sigemptyset(&act.sa_mask);
+ for (i = 0; i < 8 * sizeof(goact->mask); i++) {
+ if (goact->mask & ((uint64_t)(1)<<i)) {
+ sigaddset(&act.sa_mask, (int)(i+1));
+ }
+ }
+ act.sa_flags = (int)(goact->flags & ~(uint64_t)SA_RESTORER);
+ }
+
+ ret = sigaction((int)signum, goact ? &act : NULL, oldgoact ? &oldact : NULL);
+ if (ret == -1) {
+ // runtime.rt_sigaction expects _cgo_sigaction to return errno on error.
+ _cgo_tsan_release();
+ return errno;
+ }
+
+ if (oldgoact) {
+ if (oldact.sa_flags & SA_SIGINFO) {
+ oldgoact->handler = (uintptr_t)(oldact.sa_sigaction);
+ } else {
+ oldgoact->handler = (uintptr_t)(oldact.sa_handler);
+ }
+ oldgoact->mask = 0;
+ for (i = 0; i < 8 * sizeof(oldgoact->mask); i++) {
+ if (sigismember(&oldact.sa_mask, (int)(i+1)) == 1) {
+ oldgoact->mask |= (uint64_t)(1)<<i;
+ }
+ }
+ oldgoact->flags = (uint64_t)oldact.sa_flags;
+ }
+
+ _cgo_tsan_release();
+ return ret;
+}
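
The mask conversion above maps signal number n to bit n-1 of the 64-bit mask, in both directions. A small standalone Go sketch of the same convention (maskFromSignals is a hypothetical helper, not part of the runtime):

package main

import "fmt"

// maskFromSignals builds a mask the same way x_cgo_sigaction does:
// signal number n (1-based) sets bit n-1.
func maskFromSignals(sigs ...uint) uint64 {
	var mask uint64
	for _, s := range sigs {
		if s >= 1 && s <= 64 {
			mask |= uint64(1) << (s - 1)
		}
	}
	return mask
}

func main() {
	// SIGHUP(1), SIGINT(2), SIGTERM(15) -> bits 0, 1 and 14.
	fmt.Printf("%#x\n", maskFromSignals(1, 2, 15)) // 0x4003
}
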
diff --git a/src/runtime/cgo/gcc_signal2_ios_arm64.c b/src/runtime/cgo/gcc_signal2_ios_arm64.c
new file mode 100644
index 0000000..f8cef54
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal2_ios_arm64.c
@@ -0,0 +1,11 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build lldb
+
+// Used by gcc_signal_darwin_arm64.c when doing the test build during cgo.
+// We hope that for real binaries the definition provided by Go will take precedence
+// and the linker will drop this .o file altogether, which is why this definition
+// is all by itself in its own file.
+void __attribute__((weak)) xx_cgo_panicmem(void) {}
diff --git a/src/runtime/cgo/gcc_signal_ios_arm64.c b/src/runtime/cgo/gcc_signal_ios_arm64.c
new file mode 100644
index 0000000..87055e9
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal_ios_arm64.c
@@ -0,0 +1,213 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Emulation of the Unix signal SIGSEGV.
+//
+// On iOS, Go tests and apps under development are run by lldb.
+// The debugger uses a task-level exception handler to intercept signals.
+// Despite having a 'handle' mechanism like gdb, lldb will not allow a
+// SIGSEGV to pass to the running program. For Go, this means we cannot
+// generate a panic, which cannot be recovered, and so tests fail.
+//
+// We work around this by registering a thread-level mach exception handler
+// and intercepting EXC_BAD_ACCESS. The kernel offers thread handlers a
+// chance to resolve exceptions before the task handler, so we can generate
+// the panic and avoid lldb's SIGSEGV handler.
+//
+// The dist tool enables this by build flag when testing.
+
+//go:build lldb
+
+#include <limits.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <signal.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <mach/arm/thread_status.h>
+#include <mach/exception_types.h>
+#include <mach/mach.h>
+#include <mach/mach_init.h>
+#include <mach/mach_port.h>
+#include <mach/thread_act.h>
+#include <mach/thread_status.h>
+
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+void xx_cgo_panicmem(void);
+uintptr_t x_cgo_panicmem = (uintptr_t)xx_cgo_panicmem;
+
+static pthread_mutex_t mach_exception_handler_port_set_mu;
+static mach_port_t mach_exception_handler_port_set = MACH_PORT_NULL;
+
+kern_return_t
+catch_exception_raise(
+ mach_port_t exception_port,
+ mach_port_t thread,
+ mach_port_t task,
+ exception_type_t exception,
+ exception_data_t code_vector,
+ mach_msg_type_number_t code_count)
+{
+ kern_return_t ret;
+ arm_unified_thread_state_t thread_state;
+ mach_msg_type_number_t state_count = ARM_UNIFIED_THREAD_STATE_COUNT;
+
+ // Returning KERN_SUCCESS intercepts the exception.
+ //
+ // Returning KERN_FAILURE lets the exception fall through to the
+ // next handler, which is the standard signal emulation code
+ // registered on the task port.
+
+ if (exception != EXC_BAD_ACCESS) {
+ return KERN_FAILURE;
+ }
+
+ ret = thread_get_state(thread, ARM_UNIFIED_THREAD_STATE, (thread_state_t)&thread_state, &state_count);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_get_state failed: %d\n", ret);
+ abort();
+ }
+
+ // Bounce call to sigpanic through asm that makes it look like
+ // we call sigpanic directly from the faulting code.
+#ifdef __arm64__
+ thread_state.ts_64.__x[1] = thread_state.ts_64.__lr;
+ thread_state.ts_64.__x[2] = thread_state.ts_64.__pc;
+ thread_state.ts_64.__pc = x_cgo_panicmem;
+#else
+ thread_state.ts_32.__r[1] = thread_state.ts_32.__lr;
+ thread_state.ts_32.__r[2] = thread_state.ts_32.__pc;
+ thread_state.ts_32.__pc = x_cgo_panicmem;
+#endif
+
+ if (0) {
+ // Useful debugging logic when panicmem is broken.
+ //
+ // Sends the first SIGSEGV and lets lldb catch the
+ // second one, avoiding a loop that locks up iOS
+ // devices requiring a hard reboot.
+ fprintf(stderr, "runtime/cgo: caught exc_bad_access\n");
+ fprintf(stderr, "__lr = %llx\n", thread_state.ts_64.__lr);
+ fprintf(stderr, "__pc = %llx\n", thread_state.ts_64.__pc);
+ static int pass1 = 0;
+ if (pass1) {
+ return KERN_FAILURE;
+ }
+ pass1 = 1;
+ }
+
+ ret = thread_set_state(thread, ARM_UNIFIED_THREAD_STATE, (thread_state_t)&thread_state, state_count);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_set_state failed: %d\n", ret);
+ abort();
+ }
+
+ return KERN_SUCCESS;
+}
+
+void
+darwin_arm_init_thread_exception_port()
+{
+ // Called by each new OS thread to bind its EXC_BAD_ACCESS exception
+ // to mach_exception_handler_port_set.
+ int ret;
+ mach_port_t port = MACH_PORT_NULL;
+
+ ret = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_allocate failed: %d\n", ret);
+ abort();
+ }
+ ret = mach_port_insert_right(
+ mach_task_self(),
+ port,
+ port,
+ MACH_MSG_TYPE_MAKE_SEND);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_insert_right failed: %d\n", ret);
+ abort();
+ }
+
+ ret = thread_set_exception_ports(
+ mach_thread_self(),
+ EXC_MASK_BAD_ACCESS,
+ port,
+ EXCEPTION_DEFAULT,
+ THREAD_STATE_NONE);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: thread_set_exception_ports failed: %d\n", ret);
+ abort();
+ }
+
+ ret = pthread_mutex_lock(&mach_exception_handler_port_set_mu);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_mutex_lock failed: %d\n", ret);
+ abort();
+ }
+ ret = mach_port_move_member(
+ mach_task_self(),
+ port,
+ mach_exception_handler_port_set);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_move_member failed: %d\n", ret);
+ abort();
+ }
+ ret = pthread_mutex_unlock(&mach_exception_handler_port_set_mu);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_mutex_unlock failed: %d\n", ret);
+ abort();
+ }
+}
+
+static void*
+mach_exception_handler(void *port)
+{
+ // Calls catch_exception_raise.
+ extern boolean_t exc_server();
+ mach_msg_server(exc_server, 2048, (mach_port_t)port, 0);
+ abort(); // never returns
+}
+
+void
+darwin_arm_init_mach_exception_handler()
+{
+ pthread_mutex_init(&mach_exception_handler_port_set_mu, NULL);
+
+ // Called once per process to initialize a mach port server, listening
+ // for EXC_BAD_ACCESS thread exceptions.
+ int ret;
+ pthread_t thr = NULL;
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+
+ ret = mach_port_allocate(
+ mach_task_self(),
+ MACH_PORT_RIGHT_PORT_SET,
+ &mach_exception_handler_port_set);
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: mach_port_allocate failed for port_set: %d\n", ret);
+ abort();
+ }
+
+ // Block all signals to the exception handler thread
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ // Start a thread to handle exceptions.
+ uintptr_t port_set = (uintptr_t)mach_exception_handler_port_set;
+ pthread_attr_init(&attr);
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+ ret = _cgo_try_pthread_create(&thr, &attr, mach_exception_handler, (void*)port_set);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (ret) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %d\n", ret);
+ abort();
+ }
+ pthread_attr_destroy(&attr);
+}
diff --git a/src/runtime/cgo/gcc_signal_ios_nolldb.c b/src/runtime/cgo/gcc_signal_ios_nolldb.c
new file mode 100644
index 0000000..9ddc37a
--- /dev/null
+++ b/src/runtime/cgo/gcc_signal_ios_nolldb.c
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !lldb && ios && arm64
+
+#include <stdint.h>
+
+void darwin_arm_init_thread_exception_port() {}
+void darwin_arm_init_mach_exception_handler() {}
diff --git a/src/runtime/cgo/gcc_solaris_amd64.c b/src/runtime/cgo/gcc_solaris_amd64.c
new file mode 100644
index 0000000..e89e844
--- /dev/null
+++ b/src/runtime/cgo/gcc_solaris_amd64.c
@@ -0,0 +1,77 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include <string.h>
+#include <signal.h>
+#include <ucontext.h>
+#include "libcgo.h"
+#include "libcgo_unix.h"
+
+static void* threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ ucontext_t ctx;
+
+ setg_gcc = setg;
+ if (getcontext(&ctx) != 0)
+ perror("runtime/cgo: getcontext failed");
+ g->stacklo = (uintptr_t)ctx.uc_stack.ss_sp;
+
+ // Solaris processes report a tiny stack when run with "ulimit -s unlimited".
+ // Correct that as best we can: assume it's at least 1 MB.
+ // See golang.org/issue/12210.
+ if(ctx.uc_stack.ss_size < 1024*1024)
+ g->stacklo -= 1024*1024 - ctx.uc_stack.ss_size;
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ pthread_attr_t attr;
+ sigset_t ign, oset;
+ pthread_t p;
+ void *base;
+ size_t size;
+ int err;
+
+ sigfillset(&ign);
+ pthread_sigmask(SIG_SETMASK, &ign, &oset);
+
+ pthread_attr_init(&attr);
+
+ if (pthread_attr_getstack(&attr, &base, &size) != 0)
+ perror("runtime/cgo: pthread_attr_getstack failed");
+ if (size == 0) {
+ ts->g->stackhi = 2 << 20;
+ if (pthread_attr_setstack(&attr, NULL, ts->g->stackhi) != 0)
+ perror("runtime/cgo: pthread_attr_setstack failed");
+ } else {
+ ts->g->stackhi = size;
+ }
+ pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+ err = _cgo_try_pthread_create(&p, &attr, threadentry, ts);
+
+ pthread_sigmask(SIG_SETMASK, &oset, nil);
+
+ if (err != 0) {
+ fprintf(stderr, "runtime/cgo: pthread_create failed: %s\n", strerror(err));
+ abort();
+ }
+}
+
+static void*
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+ return nil;
+}
diff --git a/src/runtime/cgo/gcc_stack_darwin.c b/src/runtime/cgo/gcc_stack_darwin.c
new file mode 100644
index 0000000..0a9038e
--- /dev/null
+++ b/src/runtime/cgo/gcc_stack_darwin.c
@@ -0,0 +1,20 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <pthread.h>
+#include "libcgo.h"
+
+void
+x_cgo_getstackbound(uintptr bounds[2])
+{
+ void* addr;
+ size_t size;
+ pthread_t p;
+
+ p = pthread_self();
+ addr = pthread_get_stackaddr_np(p); // high address (!)
+ size = pthread_get_stacksize_np(p);
+ bounds[0] = (uintptr)addr - size;
+ bounds[1] = (uintptr)addr;
+}
diff --git a/src/runtime/cgo/gcc_stack_unix.c b/src/runtime/cgo/gcc_stack_unix.c
new file mode 100644
index 0000000..f3fead9
--- /dev/null
+++ b/src/runtime/cgo/gcc_stack_unix.c
@@ -0,0 +1,40 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix && !darwin
+
+#ifndef _GNU_SOURCE // pthread_getattr_np
+#define _GNU_SOURCE
+#endif
+
+#include <pthread.h>
+#include "libcgo.h"
+
+void
+x_cgo_getstackbound(uintptr bounds[2])
+{
+ pthread_attr_t attr;
+ void *addr;
+ size_t size;
+
+#if defined(__GLIBC__) || (defined(__sun) && !defined(__illumos__))
+ // pthread_getattr_np is a GNU extension supported in glibc.
+ // Solaris is not glibc but does support pthread_getattr_np
+ // (and the fallback doesn't work...). Illumos does not.
+ pthread_getattr_np(pthread_self(), &attr); // GNU extension
+ pthread_attr_getstack(&attr, &addr, &size); // low address
+#elif defined(__illumos__)
+ pthread_attr_init(&attr);
+ pthread_attr_get_np(pthread_self(), &attr);
+ pthread_attr_getstack(&attr, &addr, &size); // low address
+#else
+ pthread_attr_init(&attr);
+ pthread_attr_getstacksize(&attr, &size);
+ addr = __builtin_frame_address(0) + 4096 - size;
+#endif
+ pthread_attr_destroy(&attr);
+
+ bounds[0] = (uintptr)addr;
+ bounds[1] = (uintptr)addr + size;
+}
diff --git a/src/runtime/cgo/gcc_stack_windows.c b/src/runtime/cgo/gcc_stack_windows.c
new file mode 100644
index 0000000..d798cc7
--- /dev/null
+++ b/src/runtime/cgo/gcc_stack_windows.c
@@ -0,0 +1,7 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "libcgo.h"
+
+void x_cgo_getstackbound(uintptr bounds[2]) {} // no-op for now
diff --git a/src/runtime/cgo/gcc_traceback.c b/src/runtime/cgo/gcc_traceback.c
new file mode 100644
index 0000000..c6643a1
--- /dev/null
+++ b/src/runtime/cgo/gcc_traceback.c
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || linux
+
+#include <stdint.h>
+#include "libcgo.h"
+
+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif
+
+#if __has_feature(memory_sanitizer)
+#include <sanitizer/msan_interface.h>
+#endif
+
+// Call the user's traceback function and then call sigtramp.
+// The runtime signal handler will jump to this code.
+// We do it this way so that the user's traceback function will be called
+// by a C function with proper unwind info.
+void
+x_cgo_callers(uintptr_t sig, void *info, void *context, void (*cgoTraceback)(struct cgoTracebackArg*), uintptr_t* cgoCallers, void (*sigtramp)(uintptr_t, void*, void*)) {
+ struct cgoTracebackArg arg;
+
+ arg.Context = 0;
+ arg.SigContext = (uintptr_t)(context);
+ arg.Buf = cgoCallers;
+ arg.Max = 32; // must match len(runtime.cgoCallers)
+
+#if __has_feature(memory_sanitizer)
+ // This function is called directly from the signal handler.
+ // The arguments are passed in registers, so whether msan
+ // considers cgoCallers to be initialized depends on whether
+ // it considers the appropriate register to be initialized.
+ // That can cause false reports in rare cases.
+ // Explicitly unpoison the memory to avoid that.
+ // See issue #47543 for more details.
+ __msan_unpoison(&arg, sizeof arg);
+#endif
+
+ _cgo_tsan_acquire();
+ (*cgoTraceback)(&arg);
+ _cgo_tsan_release();
+ sigtramp(sig, info, context);
+}
diff --git a/src/runtime/cgo/gcc_util.c b/src/runtime/cgo/gcc_util.c
new file mode 100644
index 0000000..3fcb48c
--- /dev/null
+++ b/src/runtime/cgo/gcc_util.c
@@ -0,0 +1,69 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "libcgo.h"
+
+/* Stub for creating a new thread */
+void
+x_cgo_thread_start(ThreadStart *arg)
+{
+ ThreadStart *ts;
+
+ /* Make our own copy that can persist after we return. */
+ _cgo_tsan_acquire();
+ ts = malloc(sizeof *ts);
+ _cgo_tsan_release();
+ if(ts == nil) {
+ fprintf(stderr, "runtime/cgo: out of memory in thread_start\n");
+ abort();
+ }
+ *ts = *arg;
+
+ _cgo_sys_thread_start(ts); /* OS-dependent half */
+}
+
+#ifndef CGO_TSAN
+void(* const _cgo_yield)() = NULL;
+#else
+
+#include <string.h>
+
+char x_cgo_yield_strncpy_src = 0;
+char x_cgo_yield_strncpy_dst = 0;
+size_t x_cgo_yield_strncpy_n = 0;
+
+/*
+Stub for allowing libc interceptors to execute.
+
+_cgo_yield is set to NULL if we do not expect libc interceptors to exist.
+*/
+static void
+x_cgo_yield()
+{
+ /*
+ The libc function(s) we call here must form a no-op and include at least one
+ call that triggers TSAN to process pending asynchronous signals.
+
+ sleep(0) would be fine, but it's not portable C (so it would need more header
+ guards).
+ free(NULL) has a fast-path special case in TSAN, so it doesn't
+ trigger signal delivery.
+ free(malloc(0)) would work (triggering the interceptors in malloc), but
+ it also runs a bunch of user-supplied malloc hooks.
+
+ So we choose strncpy(_, _, 0): it requires an extra header,
+ but it's standard and should be very efficient.
+
+ GCC 7 has an unfortunate habit of optimizing out strncpy calls (see
+ https://golang.org/issue/21196), so the arguments here need to be global
+ variables with external linkage in order to ensure that the call traps all the
+ way down into libc.
+ */
+ strncpy(&x_cgo_yield_strncpy_dst, &x_cgo_yield_strncpy_src,
+ x_cgo_yield_strncpy_n);
+}
+
+void(* const _cgo_yield)() = &x_cgo_yield;
+
+#endif /* CGO_TSAN */
diff --git a/src/runtime/cgo/gcc_windows_386.c b/src/runtime/cgo/gcc_windows_386.c
new file mode 100644
index 0000000..0f4f01c
--- /dev/null
+++ b/src/runtime/cgo/gcc_windows_386.c
@@ -0,0 +1,51 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+static void threadentry(void*);
+static DWORD *tls_g;
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ tls_g = (DWORD *)tlsg;
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ _cgo_beginthread(threadentry, ts);
+}
+
+static void
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // minit queries stack bounds from the OS.
+
+ /*
+ * Set specific keys in thread local storage.
+ */
+ asm volatile (
+ "movl %0, %%fs:0(%1)\n" // MOVL tls0, 0(tls_g)(FS)
+ "movl %%fs:0(%1), %%eax\n" // MOVL 0(tls_g)(FS), tmp
+ "movl %2, 0(%%eax)\n" // MOVL g, 0(AX)
+ :: "r"(ts.tls), "r"(*tls_g), "r"(ts.g) : "%eax"
+ );
+
+ crosscall_386(ts.fn);
+}
diff --git a/src/runtime/cgo/gcc_windows_amd64.c b/src/runtime/cgo/gcc_windows_amd64.c
new file mode 100644
index 0000000..3ff3c64
--- /dev/null
+++ b/src/runtime/cgo/gcc_windows_amd64.c
@@ -0,0 +1,51 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+static void threadentry(void*);
+static void (*setg_gcc)(void*);
+static DWORD *tls_g;
+
+void
+x_cgo_init(G *g, void (*setg)(void*), void **tlsg, void **tlsbase)
+{
+ setg_gcc = setg;
+ tls_g = (DWORD *)tlsg;
+}
+
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ _cgo_beginthread(threadentry, ts);
+}
+
+static void
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ // minit queries stack bounds from the OS.
+
+ /*
+ * Set specific keys in thread local storage.
+ */
+ asm volatile (
+ "movq %0, %%gs:0(%1)\n" // MOVL tls0, 0(tls_g)(GS)
+ :: "r"(ts.tls), "r"(*tls_g)
+ );
+
+ crosscall_amd64(ts.fn, setg_gcc, (void*)ts.g);
+}
diff --git a/src/runtime/cgo/gcc_windows_arm64.c b/src/runtime/cgo/gcc_windows_arm64.c
new file mode 100644
index 0000000..8f113cc
--- /dev/null
+++ b/src/runtime/cgo/gcc_windows_arm64.c
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include "libcgo.h"
+#include "libcgo_windows.h"
+
+static void threadentry(void*);
+static void (*setg_gcc)(void*);
+
+void
+x_cgo_init(G *g, void (*setg)(void*))
+{
+ setg_gcc = setg;
+}
+
+void
+_cgo_sys_thread_start(ThreadStart *ts)
+{
+ _cgo_beginthread(threadentry, ts);
+}
+
+extern void crosscall1(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+static void
+threadentry(void *v)
+{
+ ThreadStart ts;
+
+ ts = *(ThreadStart*)v;
+ free(v);
+
+ crosscall1(ts.fn, setg_gcc, (void *)ts.g);
+}
diff --git a/src/runtime/cgo/handle.go b/src/runtime/cgo/handle.go
new file mode 100644
index 0000000..061dfb0
--- /dev/null
+++ b/src/runtime/cgo/handle.go
@@ -0,0 +1,144 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import (
+ "sync"
+ "sync/atomic"
+)
+
+// Handle provides a way to pass values that contain Go pointers
+// (pointers to memory allocated by Go) between Go and C without
+// breaking the cgo pointer passing rules. A Handle is an integer
+// value that can represent any Go value. A Handle can be passed
+// through C and back to Go, and Go code can use the Handle to
+// retrieve the original Go value.
+//
+// The underlying type of Handle is guaranteed to fit in an integer type
+// that is large enough to hold the bit pattern of any pointer. The zero
+// value of a Handle is not valid, and thus is safe to use as a sentinel
+// in C APIs.
+//
+// For instance, on the Go side:
+//
+// package main
+//
+// /*
+// #include <stdint.h> // for uintptr_t
+//
+// extern void MyGoPrint(uintptr_t handle);
+// void myprint(uintptr_t handle);
+// */
+// import "C"
+// import "runtime/cgo"
+//
+// //export MyGoPrint
+// func MyGoPrint(handle C.uintptr_t) {
+// h := cgo.Handle(handle)
+// val := h.Value().(string)
+// println(val)
+// h.Delete()
+// }
+//
+// func main() {
+// val := "hello Go"
+// C.myprint(C.uintptr_t(cgo.NewHandle(val)))
+// // Output: hello Go
+// }
+//
+// and on the C side:
+//
+// #include <stdint.h> // for uintptr_t
+//
+// // A Go function
+// extern void MyGoPrint(uintptr_t handle);
+//
+// // A C function
+// void myprint(uintptr_t handle) {
+// MyGoPrint(handle);
+// }
+//
+// Some C functions accept a void* argument that points to an arbitrary
+// data value supplied by the caller. It is not safe to coerce a cgo.Handle
+// (an integer) to a Go unsafe.Pointer, but instead we can pass the address
+// of the cgo.Handle to the void* parameter, as in this variant of the
+// previous example:
+//
+// package main
+//
+// /*
+// extern void MyGoPrint(void *context);
+// static inline void myprint(void *context) {
+// MyGoPrint(context);
+// }
+// */
+// import "C"
+// import (
+// "runtime/cgo"
+// "unsafe"
+// )
+//
+// //export MyGoPrint
+// func MyGoPrint(context unsafe.Pointer) {
+// h := *(*cgo.Handle)(context)
+// val := h.Value().(string)
+// println(val)
+// h.Delete()
+// }
+//
+// func main() {
+// val := "hello Go"
+// h := cgo.NewHandle(val)
+// C.myprint(unsafe.Pointer(&h))
+// // Output: hello Go
+// }
+type Handle uintptr
+
+// NewHandle returns a handle for a given value.
+//
+// The handle is valid until the program calls Delete on it. The handle
+// uses resources, and this package assumes that C code may hold on to
+// the handle, so a program must explicitly call Delete when the handle
+// is no longer needed.
+//
+// The intended use is to pass the returned handle to C code, which
+// passes it back to Go, which calls Value.
+func NewHandle(v any) Handle {
+ h := handleIdx.Add(1)
+ if h == 0 {
+ panic("runtime/cgo: ran out of handle space")
+ }
+
+ handles.Store(h, v)
+ return Handle(h)
+}
+
+// Value returns the associated Go value for a valid handle.
+//
+// The method panics if the handle is invalid.
+func (h Handle) Value() any {
+ v, ok := handles.Load(uintptr(h))
+ if !ok {
+ panic("runtime/cgo: misuse of an invalid Handle")
+ }
+ return v
+}
+
+// Delete invalidates a handle. This method should only be called once
+// the program no longer needs to pass the handle to C and the C code
+// no longer has a copy of the handle value.
+//
+// The method panics if the handle is invalid.
+func (h Handle) Delete() {
+ _, ok := handles.LoadAndDelete(uintptr(h))
+ if !ok {
+ panic("runtime/cgo: misuse of an invalid Handle")
+ }
+}
+
+var (
+ handles = sync.Map{} // map[Handle]interface{}
+ handleIdx atomic.Uintptr
+)
diff --git a/src/runtime/cgo/handle_test.go b/src/runtime/cgo/handle_test.go
new file mode 100644
index 0000000..b341c8e
--- /dev/null
+++ b/src/runtime/cgo/handle_test.go
@@ -0,0 +1,103 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import (
+ "reflect"
+ "testing"
+)
+
+func TestHandle(t *testing.T) {
+ v := 42
+
+ tests := []struct {
+ v1 any
+ v2 any
+ }{
+ {v1: v, v2: v},
+ {v1: &v, v2: &v},
+ {v1: nil, v2: nil},
+ }
+
+ for _, tt := range tests {
+ h1 := NewHandle(tt.v1)
+ h2 := NewHandle(tt.v2)
+
+ if uintptr(h1) == 0 || uintptr(h2) == 0 {
+ t.Fatalf("NewHandle returns zero")
+ }
+
+ if uintptr(h1) == uintptr(h2) {
+ t.Fatalf("Duplicated Go values should have different handles, but got equal")
+ }
+
+ h1v := h1.Value()
+ h2v := h2.Value()
+ if !reflect.DeepEqual(h1v, h2v) || !reflect.DeepEqual(h1v, tt.v1) {
+ t.Fatalf("Value of a Handle got wrong, got %+v %+v, want %+v", h1v, h2v, tt.v1)
+ }
+
+ h1.Delete()
+ h2.Delete()
+ }
+
+ siz := 0
+ handles.Range(func(k, v any) bool {
+ siz++
+ return true
+ })
+ if siz != 0 {
+ t.Fatalf("handles are not cleared, got %d, want %d", siz, 0)
+ }
+}
+
+func TestInvalidHandle(t *testing.T) {
+ t.Run("zero", func(t *testing.T) {
+ h := Handle(0)
+
+ defer func() {
+ if r := recover(); r != nil {
+ return
+ }
+ t.Fatalf("Delete of zero handle did not trigger a panic")
+ }()
+
+ h.Delete()
+ })
+
+ t.Run("invalid", func(t *testing.T) {
+ h := NewHandle(42)
+
+ defer func() {
+ if r := recover(); r != nil {
+ h.Delete()
+ return
+ }
+ t.Fatalf("Invalid handle did not trigger a panic")
+ }()
+
+ Handle(h + 1).Delete()
+ })
+}
+
+func BenchmarkHandle(b *testing.B) {
+ b.Run("non-concurrent", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ h := NewHandle(i)
+ _ = h.Value()
+ h.Delete()
+ }
+ })
+ b.Run("concurrent", func(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ var v int
+ for pb.Next() {
+ h := NewHandle(v)
+ _ = h.Value()
+ h.Delete()
+ }
+ })
+ })
+}
diff --git a/src/runtime/cgo/iscgo.go b/src/runtime/cgo/iscgo.go
new file mode 100644
index 0000000..e12d0f4
--- /dev/null
+++ b/src/runtime/cgo/iscgo.go
@@ -0,0 +1,17 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The runtime package contains an uninitialized definition
+// for runtime·iscgo. Override it to tell the runtime we're here.
+// There are various function pointers that should be set too,
+// but those depend on dynamic linker magic to get initialized
+// correctly, and sometimes they break. This variable is a
+// backup: it depends only on old C style static linking rules.
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname _iscgo runtime.iscgo
+var _iscgo bool = true
diff --git a/src/runtime/cgo/libcgo.h b/src/runtime/cgo/libcgo.h
new file mode 100644
index 0000000..04755f0
--- /dev/null
+++ b/src/runtime/cgo/libcgo.h
@@ -0,0 +1,156 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+#undef nil
+#define nil ((void*)0)
+#define nelem(x) (sizeof(x)/sizeof((x)[0]))
+
+typedef uint32_t uint32;
+typedef uint64_t uint64;
+typedef uintptr_t uintptr;
+
+/*
+ * The beginning of the per-goroutine structure,
+ * as defined in ../pkg/runtime/runtime.h.
+ * Just enough to edit these two fields.
+ */
+typedef struct G G;
+struct G
+{
+ uintptr stacklo;
+ uintptr stackhi;
+};
+
+/*
+ * Arguments to the _cgo_thread_start call.
+ * Also known to ../pkg/runtime/runtime.h.
+ */
+typedef struct ThreadStart ThreadStart;
+struct ThreadStart
+{
+ G *g;
+ uintptr *tls;
+ void (*fn)(void);
+};
+
+/*
+ * Called by 5c/6c/8c world.
+ * Makes a local copy of the ThreadStart and
+ * calls _cgo_sys_thread_start(ts).
+ */
+extern void (*_cgo_thread_start)(ThreadStart *ts);
+
+/*
+ * Creates a new operating system thread without updating any Go state
+ * (OS dependent).
+ */
+extern void (*_cgo_sys_thread_create)(void* (*func)(void*), void* arg);
+
+/*
+ * Indicates whether a dummy pthread per-thread variable is allocated.
+ */
+extern uintptr_t *_cgo_pthread_key_created;
+
+/*
+ * Creates the new operating system thread (OS, arch dependent).
+ */
+void _cgo_sys_thread_start(ThreadStart *ts);
+
+/*
+ * Waits for the Go runtime to be initialized (OS dependent).
+ * If runtime.SetCgoTraceback is used to set a context function,
+ * calls the context function and returns the context value.
+ */
+uintptr_t _cgo_wait_runtime_init_done(void);
+
+/*
+ * Call fn in the 6c world.
+ */
+void crosscall_amd64(void (*fn)(void), void (*setg_gcc)(void*), void *g);
+
+/*
+ * Call fn in the 8c world.
+ */
+void crosscall_386(void (*fn)(void));
+
+/*
+ * Prints error then calls abort. For linux and android.
+ */
+void fatalf(const char* format, ...);
+
+/*
+ * Registers the current mach thread port for EXC_BAD_ACCESS processing.
+ */
+void darwin_arm_init_thread_exception_port(void);
+
+/*
+ * Starts a mach message server processing EXC_BAD_ACCESS.
+ */
+void darwin_arm_init_mach_exception_handler(void);
+
+/*
+ * The cgo context function. See runtime.SetCgoTraceback.
+ */
+struct context_arg {
+ uintptr_t Context;
+};
+extern void (*(_cgo_get_context_function(void)))(struct context_arg*);
+
+/*
+ * The argument for the cgo traceback callback. See runtime.SetCgoTraceback.
+ */
+struct cgoTracebackArg {
+ uintptr_t Context;
+ uintptr_t SigContext;
+ uintptr_t* Buf;
+ uintptr_t Max;
+};
+
+/*
+ * TSAN support. This is only useful when building with
+ * CGO_CFLAGS="-fsanitize=thread" CGO_LDFLAGS="-fsanitize=thread" go install
+ */
+#undef CGO_TSAN
+#if defined(__has_feature)
+# if __has_feature(thread_sanitizer)
+# define CGO_TSAN
+# endif
+#elif defined(__SANITIZE_THREAD__)
+# define CGO_TSAN
+#endif
+
+#ifdef CGO_TSAN
+
+// These must match the definitions in yesTsanProlog in cmd/cgo/out.go.
+// In general we should call _cgo_tsan_acquire when we enter C code,
+// and call _cgo_tsan_release when we return to Go code.
+// This is only necessary when calling code that might be instrumented
+// by TSAN, which mostly means system library calls that TSAN intercepts.
+// See the comment in cmd/cgo/out.go for more details.
+
+long long _cgo_sync __attribute__ ((common));
+
+extern void __tsan_acquire(void*);
+extern void __tsan_release(void*);
+
+__attribute__ ((unused))
+static void _cgo_tsan_acquire() {
+ __tsan_acquire(&_cgo_sync);
+}
+
+__attribute__ ((unused))
+static void _cgo_tsan_release() {
+ __tsan_release(&_cgo_sync);
+}
+
+#else // !defined(CGO_TSAN)
+
+#define _cgo_tsan_acquire()
+#define _cgo_tsan_release()
+
+#endif // !defined(CGO_TSAN)
diff --git a/src/runtime/cgo/libcgo_unix.h b/src/runtime/cgo/libcgo_unix.h
new file mode 100644
index 0000000..a56a366
--- /dev/null
+++ b/src/runtime/cgo/libcgo_unix.h
@@ -0,0 +1,15 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+ * Call pthread_create, retrying on EAGAIN.
+ */
+extern int _cgo_try_pthread_create(pthread_t*, const pthread_attr_t*, void* (*)(void*), void*);
+
+/*
+ * Same as _cgo_try_pthread_create, but passing on the pthread_create function.
+ * Only defined on OpenBSD.
+ */
+extern int _cgo_openbsd_try_pthread_create(int (*)(pthread_t*, const pthread_attr_t*, void *(*pfn)(void*), void*),
+ pthread_t*, const pthread_attr_t*, void* (*)(void*), void* arg);
diff --git a/src/runtime/cgo/libcgo_windows.h b/src/runtime/cgo/libcgo_windows.h
new file mode 100644
index 0000000..33d7637
--- /dev/null
+++ b/src/runtime/cgo/libcgo_windows.h
@@ -0,0 +1,6 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Call _beginthread, aborting on failure.
+void _cgo_beginthread(void (*func)(void*), void* arg);
diff --git a/src/runtime/cgo/linux.go b/src/runtime/cgo/linux.go
new file mode 100644
index 0000000..1d6fe03
--- /dev/null
+++ b/src/runtime/cgo/linux.go
@@ -0,0 +1,74 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Linux system call wrappers that provide POSIX semantics through the
+// corresponding cgo->libc (nptl) wrappers for various system calls.
+
+//go:build linux
+
+package cgo
+
+import "unsafe"
+
+// Each of the following entries is needed to ensure that the
+// syscall.syscall_linux code can conditionally call these
+// function pointers:
+//
+// 1. find the C-defined function start
+// 2. force the local byte alias to be mapped to that location
+// 3. map the Go pointer to the function to the syscall package
+
+//go:cgo_import_static _cgo_libc_setegid
+//go:linkname _cgo_libc_setegid _cgo_libc_setegid
+//go:linkname cgo_libc_setegid syscall.cgo_libc_setegid
+var _cgo_libc_setegid byte
+var cgo_libc_setegid = unsafe.Pointer(&_cgo_libc_setegid)
+
+//go:cgo_import_static _cgo_libc_seteuid
+//go:linkname _cgo_libc_seteuid _cgo_libc_seteuid
+//go:linkname cgo_libc_seteuid syscall.cgo_libc_seteuid
+var _cgo_libc_seteuid byte
+var cgo_libc_seteuid = unsafe.Pointer(&_cgo_libc_seteuid)
+
+//go:cgo_import_static _cgo_libc_setregid
+//go:linkname _cgo_libc_setregid _cgo_libc_setregid
+//go:linkname cgo_libc_setregid syscall.cgo_libc_setregid
+var _cgo_libc_setregid byte
+var cgo_libc_setregid = unsafe.Pointer(&_cgo_libc_setregid)
+
+//go:cgo_import_static _cgo_libc_setresgid
+//go:linkname _cgo_libc_setresgid _cgo_libc_setresgid
+//go:linkname cgo_libc_setresgid syscall.cgo_libc_setresgid
+var _cgo_libc_setresgid byte
+var cgo_libc_setresgid = unsafe.Pointer(&_cgo_libc_setresgid)
+
+//go:cgo_import_static _cgo_libc_setresuid
+//go:linkname _cgo_libc_setresuid _cgo_libc_setresuid
+//go:linkname cgo_libc_setresuid syscall.cgo_libc_setresuid
+var _cgo_libc_setresuid byte
+var cgo_libc_setresuid = unsafe.Pointer(&_cgo_libc_setresuid)
+
+//go:cgo_import_static _cgo_libc_setreuid
+//go:linkname _cgo_libc_setreuid _cgo_libc_setreuid
+//go:linkname cgo_libc_setreuid syscall.cgo_libc_setreuid
+var _cgo_libc_setreuid byte
+var cgo_libc_setreuid = unsafe.Pointer(&_cgo_libc_setreuid)
+
+//go:cgo_import_static _cgo_libc_setgroups
+//go:linkname _cgo_libc_setgroups _cgo_libc_setgroups
+//go:linkname cgo_libc_setgroups syscall.cgo_libc_setgroups
+var _cgo_libc_setgroups byte
+var cgo_libc_setgroups = unsafe.Pointer(&_cgo_libc_setgroups)
+
+//go:cgo_import_static _cgo_libc_setgid
+//go:linkname _cgo_libc_setgid _cgo_libc_setgid
+//go:linkname cgo_libc_setgid syscall.cgo_libc_setgid
+var _cgo_libc_setgid byte
+var cgo_libc_setgid = unsafe.Pointer(&_cgo_libc_setgid)
+
+//go:cgo_import_static _cgo_libc_setuid
+//go:linkname _cgo_libc_setuid _cgo_libc_setuid
+//go:linkname cgo_libc_setuid syscall.cgo_libc_setuid
+var _cgo_libc_setuid byte
+var cgo_libc_setuid = unsafe.Pointer(&_cgo_libc_setuid)
diff --git a/src/runtime/cgo/linux_syscall.c b/src/runtime/cgo/linux_syscall.c
new file mode 100644
index 0000000..0ea2da7
--- /dev/null
+++ b/src/runtime/cgo/linux_syscall.c
@@ -0,0 +1,85 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux
+
+#ifndef _GNU_SOURCE // setres[ug]id() API.
+#define _GNU_SOURCE
+#endif
+
+#include <grp.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <errno.h>
+#include "libcgo.h"
+
+/*
+ * Assumed POSIX compliant libc system call wrappers. For linux, the
+ * glibc/nptl/setxid mechanism ensures that POSIX semantics are
+ * honored for all pthreads (by default), and this in turn with cgo
+ * ensures that all Go threads launched with cgo are kept in sync for
+ * these function calls.
+ */
+
+// argset_t matches runtime/cgocall.go:argset.
+typedef struct {
+ uintptr_t* args;
+ uintptr_t retval;
+} argset_t;
+
+// libc backed posix-compliant syscalls.
+
+#define SET_RETVAL(fn) \
+ uintptr_t ret = (uintptr_t) fn ; \
+ if (ret == (uintptr_t) -1) { \
+ x->retval = (uintptr_t) errno; \
+ } else \
+ x->retval = ret
+
+void
+_cgo_libc_setegid(argset_t* x) {
+ SET_RETVAL(setegid((gid_t) x->args[0]));
+}
+
+void
+_cgo_libc_seteuid(argset_t* x) {
+ SET_RETVAL(seteuid((uid_t) x->args[0]));
+}
+
+void
+_cgo_libc_setgid(argset_t* x) {
+ SET_RETVAL(setgid((gid_t) x->args[0]));
+}
+
+void
+_cgo_libc_setgroups(argset_t* x) {
+ SET_RETVAL(setgroups((size_t) x->args[0], (const gid_t *) x->args[1]));
+}
+
+void
+_cgo_libc_setregid(argset_t* x) {
+ SET_RETVAL(setregid((gid_t) x->args[0], (gid_t) x->args[1]));
+}
+
+void
+_cgo_libc_setresgid(argset_t* x) {
+ SET_RETVAL(setresgid((gid_t) x->args[0], (gid_t) x->args[1],
+ (gid_t) x->args[2]));
+}
+
+void
+_cgo_libc_setresuid(argset_t* x) {
+ SET_RETVAL(setresuid((uid_t) x->args[0], (uid_t) x->args[1],
+ (uid_t) x->args[2]));
+}
+
+void
+_cgo_libc_setreuid(argset_t* x) {
+ SET_RETVAL(setreuid((uid_t) x->args[0], (uid_t) x->args[1]));
+}
+
+void
+_cgo_libc_setuid(argset_t* x) {
+ SET_RETVAL(setuid((uid_t) x->args[0]));
+}
diff --git a/src/runtime/cgo/mmap.go b/src/runtime/cgo/mmap.go
new file mode 100644
index 0000000..2f7e83b
--- /dev/null
+++ b/src/runtime/cgo/mmap.go
@@ -0,0 +1,31 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && amd64) || (linux && arm64) || (freebsd && amd64)
+
+package cgo
+
+// Import "unsafe" because we use go:linkname.
+import _ "unsafe"
+
+// When using cgo, call the C library for mmap, so that we call into
+// any sanitizer interceptors. This supports using the memory
+// sanitizer with Go programs. The memory sanitizer only applies to
+// C/C++ code; this permits that code to see the Go code as normal
+// program addresses that have been initialized.
+
+// To support interceptors that look for both mmap and munmap,
+// also call the C library for munmap.
+
+//go:cgo_import_static x_cgo_mmap
+//go:linkname x_cgo_mmap x_cgo_mmap
+//go:linkname _cgo_mmap _cgo_mmap
+var x_cgo_mmap byte
+var _cgo_mmap = &x_cgo_mmap
+
+//go:cgo_import_static x_cgo_munmap
+//go:linkname x_cgo_munmap x_cgo_munmap
+//go:linkname _cgo_munmap _cgo_munmap
+var x_cgo_munmap byte
+var _cgo_munmap = &x_cgo_munmap
diff --git a/src/runtime/cgo/netbsd.go b/src/runtime/cgo/netbsd.go
new file mode 100644
index 0000000..8a8018b
--- /dev/null
+++ b/src/runtime/cgo/netbsd.go
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build netbsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply environ and __progname, because we don't
+// link against the standard NetBSD crt0.o and the
+// libc dynamic library needs them.
+
+//go:linkname _environ environ
+//go:linkname _progname __progname
+//go:linkname ___ps_strings __ps_strings
+
+var _environ uintptr
+var _progname uintptr
+var ___ps_strings uintptr
diff --git a/src/runtime/cgo/openbsd.go b/src/runtime/cgo/openbsd.go
new file mode 100644
index 0000000..26b62fb
--- /dev/null
+++ b/src/runtime/cgo/openbsd.go
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+// Supply __guard_local because we don't link against the standard
+// OpenBSD crt0.o and the libc dynamic library needs it.
+
+//go:linkname _guard_local __guard_local
+
+var _guard_local uintptr
+
+// This is normally marked as hidden and placed in the
+// .openbsd.randomdata section.
+//
+//go:cgo_export_dynamic __guard_local __guard_local
diff --git a/src/runtime/cgo/setenv.go b/src/runtime/cgo/setenv.go
new file mode 100644
index 0000000..2247cb2
--- /dev/null
+++ b/src/runtime/cgo/setenv.go
@@ -0,0 +1,21 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package cgo
+
+import _ "unsafe" // for go:linkname
+
+//go:cgo_import_static x_cgo_setenv
+//go:linkname x_cgo_setenv x_cgo_setenv
+//go:linkname _cgo_setenv runtime._cgo_setenv
+var x_cgo_setenv byte
+var _cgo_setenv = &x_cgo_setenv
+
+//go:cgo_import_static x_cgo_unsetenv
+//go:linkname x_cgo_unsetenv x_cgo_unsetenv
+//go:linkname _cgo_unsetenv runtime._cgo_unsetenv
+var x_cgo_unsetenv byte
+var _cgo_unsetenv = &x_cgo_unsetenv
diff --git a/src/runtime/cgo/sigaction.go b/src/runtime/cgo/sigaction.go
new file mode 100644
index 0000000..dc714f7
--- /dev/null
+++ b/src/runtime/cgo/sigaction.go
@@ -0,0 +1,22 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && amd64) || (freebsd && amd64) || (linux && arm64) || (linux && ppc64le)
+
+package cgo
+
+// Import "unsafe" because we use go:linkname.
+import _ "unsafe"
+
+// When using cgo, call the C library for sigaction, so that we call into
+// any sanitizer interceptors. This supports using the sanitizers
+// with Go programs. The thread and memory sanitizers only apply to
+// C/C++ code; this permits that code to see the Go runtime's existing signal
+// handlers when registering new signal handlers for the process.
+
+//go:cgo_import_static x_cgo_sigaction
+//go:linkname x_cgo_sigaction x_cgo_sigaction
+//go:linkname _cgo_sigaction _cgo_sigaction
+var x_cgo_sigaction byte
+var _cgo_sigaction = &x_cgo_sigaction
diff --git a/src/runtime/cgo/signal_ios_arm64.go b/src/runtime/cgo/signal_ios_arm64.go
new file mode 100644
index 0000000..3425c44
--- /dev/null
+++ b/src/runtime/cgo/signal_ios_arm64.go
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package cgo
+
+import _ "unsafe"
+
+//go:cgo_export_static xx_cgo_panicmem xx_cgo_panicmem
+func xx_cgo_panicmem()
diff --git a/src/runtime/cgo/signal_ios_arm64.s b/src/runtime/cgo/signal_ios_arm64.s
new file mode 100644
index 0000000..1ae00d1
--- /dev/null
+++ b/src/runtime/cgo/signal_ios_arm64.s
@@ -0,0 +1,56 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// xx_cgo_panicmem is the entrypoint for SIGSEGV as intercepted via a
+// mach thread port as EXC_BAD_ACCESS. As the segfault may have happened
+// in C code, we first need to load_g then call xx_cgo_panicmem.
+//
+// R1 - LR at moment of fault
+// R2 - PC at moment of fault
+TEXT xx_cgo_panicmem(SB),NOSPLIT|NOFRAME,$0
+ // If in external C code, we need to load the g register.
+ BL runtime·load_g(SB)
+ CMP $0, g
+ BNE ongothread
+
+ // On a foreign thread.
+ // TODO(crawshaw): call badsignal
+ MOVD.W $0, -16(RSP)
+ MOVW $139, R1
+ MOVW R1, 8(RSP)
+ B runtime·exit(SB)
+
+ongothread:
+ // Trigger a SIGSEGV panic.
+ //
+ // The goal is to arrange the stack so it looks like the runtime
+ // function sigpanic was called from the PC that faulted. It has
+ // to be sigpanic, as the stack unwinding code in traceback.go
+ // looks explicitly for it.
+ //
+ // To do this we call into runtime·setsigsegv, which sets the
+ // appropriate state inside the g object. We give it the faulting
+ // PC on the stack, then put it in the LR before calling sigpanic.
+
+ // Build a 32-byte stack frame for us for this call.
+ // Saved LR (none available) is at the bottom,
+ // then the PC argument for setsigsegv,
+ // then a copy of the LR for us to restore.
+ MOVD.W $0, -32(RSP)
+ MOVD R1, 8(RSP)
+ MOVD R2, 16(RSP)
+ BL runtime·setsigsegv(SB)
+ MOVD 8(RSP), R1
+ MOVD 16(RSP), R2
+
+ // Build a 16-byte stack frame for the simulated
+ // call to sigpanic, by taking 16 bytes away from the
+ // 32-byte stack frame above.
+ // The saved LR in this frame is the LR at time of fault,
+ // and the LR on entry to sigpanic is the PC at time of fault.
+ MOVD.W R1, 16(RSP)
+ MOVD R2, R30
+ B runtime·sigpanic(SB)
diff --git a/src/runtime/cgo_mmap.go b/src/runtime/cgo_mmap.go
new file mode 100644
index 0000000..30660f7
--- /dev/null
+++ b/src/runtime/cgo_mmap.go
@@ -0,0 +1,70 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Support for memory sanitizer. See runtime/cgo/mmap.go.
+
+//go:build (linux && amd64) || (linux && arm64) || (freebsd && amd64)
+
+package runtime
+
+import "unsafe"
+
+// _cgo_mmap is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//
+//go:linkname _cgo_mmap _cgo_mmap
+var _cgo_mmap unsafe.Pointer
+
+// _cgo_munmap is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//
+//go:linkname _cgo_munmap _cgo_munmap
+var _cgo_munmap unsafe.Pointer
+
+// mmap is used to route the mmap system call through C code when using cgo, to
+// support sanitizer interceptors. Don't allow stack splits, since this function
+// (used by sysAlloc) is called in a lot of low-level parts of the runtime and
+// callers often assume it won't acquire any locks.
+//
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ if _cgo_mmap != nil {
+ // Make ret a uintptr so that writing to it in the
+ // function literal does not trigger a write barrier.
+ // A write barrier here could break because of the way
+ // that mmap uses the same value both as a pointer and
+ // an errno value.
+ var ret uintptr
+ systemstack(func() {
+ ret = callCgoMmap(addr, n, prot, flags, fd, off)
+ })
+ if ret < 4096 {
+ return nil, int(ret)
+ }
+ return unsafe.Pointer(ret), 0
+ }
+ return sysMmap(addr, n, prot, flags, fd, off)
+}
+
+func munmap(addr unsafe.Pointer, n uintptr) {
+ if _cgo_munmap != nil {
+ systemstack(func() { callCgoMunmap(addr, n) })
+ return
+ }
+ sysMunmap(addr, n)
+}
+
+// sysMmap calls the mmap system call. It is implemented in assembly.
+func sysMmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// callCgoMmap calls the mmap function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+func callCgoMmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) uintptr
+
+// sysMunmap calls the munmap system call. It is implemented in assembly.
+func sysMunmap(addr unsafe.Pointer, n uintptr)
+
+// callCgoMunmap calls the munmap function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+func callCgoMunmap(addr unsafe.Pointer, n uintptr)
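
The "ret < 4096" check in mmap above works because a successful mapping is always page-aligned and never placed in the first page, while a failure comes back from the C wrapper (or the raw system call) as a small positive errno. A tiny standalone sketch of that decoding convention (mmapErrno is a hypothetical helper, used only for illustration):

package main

import "fmt"

// mmapErrno mirrors the convention used by the runtime wrapper: any return
// value below the minimum page size (4096) is an errno, not an address.
func mmapErrno(ret uintptr) (errno int, failed bool) {
	if ret < 4096 {
		return int(ret), true
	}
	return 0, false
}

func main() {
	if errno, failed := mmapErrno(12); failed {
		fmt.Println("mmap failed, errno =", errno) // 12 is ENOMEM on Linux
	}
}
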
diff --git a/src/runtime/cgo_ppc64x.go b/src/runtime/cgo_ppc64x.go
new file mode 100644
index 0000000..c723213
--- /dev/null
+++ b/src/runtime/cgo_ppc64x.go
@@ -0,0 +1,13 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+package runtime
+
+// crosscall_ppc64 calls into the runtime to set up the registers the
+// Go runtime expects and so the symbol it calls needs to be exported
+// for external linking to work.
+//
+//go:cgo_export_static _cgo_reginit
diff --git a/src/runtime/cgo_sigaction.go b/src/runtime/cgo_sigaction.go
new file mode 100644
index 0000000..9500c52
--- /dev/null
+++ b/src/runtime/cgo_sigaction.go
@@ -0,0 +1,94 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Support for sanitizers. See runtime/cgo/sigaction.go.
+
+//go:build (linux && amd64) || (freebsd && amd64) || (linux && arm64) || (linux && ppc64le)
+
+package runtime
+
+import "unsafe"
+
+// _cgo_sigaction is filled in by runtime/cgo when it is linked into the
+// program, so it is only non-nil when using cgo.
+//
+//go:linkname _cgo_sigaction _cgo_sigaction
+var _cgo_sigaction unsafe.Pointer
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, new, old *sigactiont) {
+ // racewalk.go avoids adding sanitizing instrumentation to package runtime,
+ // but we might be calling into instrumented C functions here,
+ // so we need the pointer parameters to be properly marked.
+ //
+ // Mark the input as having been written before the call
+ // and the output as read after.
+ if msanenabled && new != nil {
+ msanwrite(unsafe.Pointer(new), unsafe.Sizeof(*new))
+ }
+ if asanenabled && new != nil {
+ asanwrite(unsafe.Pointer(new), unsafe.Sizeof(*new))
+ }
+ if _cgo_sigaction == nil || inForkedChild {
+ sysSigaction(sig, new, old)
+ } else {
+ // We need to call _cgo_sigaction, which means we need a big enough stack
+ // for C. To complicate matters, we may be in libpreinit (before the
+ // runtime has been initialized) or in an asynchronous signal handler (with
+ // the current thread in transition between goroutines, or with the g0
+ // system stack already in use).
+
+ var ret int32
+
+ var g *g
+ if mainStarted {
+ g = getg()
+ }
+ sp := uintptr(unsafe.Pointer(&sig))
+ switch {
+ case g == nil:
+ // No g: we're on a C stack or a signal stack.
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ case sp < g.stack.lo || sp >= g.stack.hi:
+ // We're no longer on g's stack, so we must be handling a signal. It's
+ // possible that we interrupted the thread during a transition between g
+ // and g0, so we should stay on the current stack to avoid corrupting g0.
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ default:
+ // We're running on g's stack, so either we're not in a signal handler or
+ // the signal handler has set the correct g. If we're on gsignal or g0,
+ // systemstack will make the call directly; otherwise, it will switch to
+ // g0 to ensure we have enough room to call a libc function.
+ //
+ // The function literal that we pass to systemstack is not nosplit, but
+ // that's ok: we'll be running on a fresh, clean system stack so the stack
+ // check will always succeed anyway.
+ systemstack(func() {
+ ret = callCgoSigaction(uintptr(sig), new, old)
+ })
+ }
+
+ const EINVAL = 22
+ if ret == EINVAL {
+ // libc reserves certain signals — normally 32-33 — for pthreads, and
+ // returns EINVAL for sigaction calls on those signals. If we get EINVAL,
+ // fall back to making the syscall directly.
+ sysSigaction(sig, new, old)
+ }
+ }
+
+ if msanenabled && old != nil {
+ msanread(unsafe.Pointer(old), unsafe.Sizeof(*old))
+ }
+ if asanenabled && old != nil {
+ asanread(unsafe.Pointer(old), unsafe.Sizeof(*old))
+ }
+}
+
+// callCgoSigaction calls the sigaction function in the runtime/cgo package
+// using the GCC calling convention. It is implemented in assembly.
+//
+//go:noescape
+func callCgoSigaction(sig uintptr, new, old *sigactiont) int32
diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
new file mode 100644
index 0000000..d226c2e
--- /dev/null
+++ b/src/runtime/cgocall.go
@@ -0,0 +1,727 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Cgo call and callback support.
+//
+// To call into the C function f from Go, the cgo-generated code calls
+// runtime.cgocall(_cgo_Cfunc_f, frame), where _cgo_Cfunc_f is a
+// gcc-compiled function written by cgo.
+//
+// runtime.cgocall (below) calls entersyscall so as not to block
+// other goroutines or the garbage collector, and then calls
+// runtime.asmcgocall(_cgo_Cfunc_f, frame).
+//
+// runtime.asmcgocall (in asm_$GOARCH.s) switches to the m->g0 stack
+// (assumed to be an operating system-allocated stack, so safe to run
+// gcc-compiled code on) and calls _cgo_Cfunc_f(frame).
+//
+// _cgo_Cfunc_f invokes the actual C function f with arguments
+// taken from the frame structure, records the results in the frame,
+// and returns to runtime.asmcgocall.
+//
+// After it regains control, runtime.asmcgocall switches back to the
+// original g (m->curg)'s stack and returns to runtime.cgocall.
+//
+// After it regains control, runtime.cgocall calls exitsyscall, which blocks
+// until this m can run Go code without violating the $GOMAXPROCS limit,
+// and then unlocks g from m.
+//
+// The above description skipped over the possibility of the gcc-compiled
+// function f calling back into Go. If that happens, we continue down
+// the rabbit hole during the execution of f.
+//
+// To make it possible for gcc-compiled C code to call a Go function p.GoF,
+// cgo writes a gcc-compiled function named GoF (not p.GoF, since gcc doesn't
+// know about packages). The gcc-compiled C function f calls GoF.
+//
+// GoF initializes "frame", a structure containing all of its
+// arguments and slots for p.GoF's results. It calls
+// crosscall2(_cgoexp_GoF, frame, framesize, ctxt) using the gcc ABI.
+//
+// crosscall2 (in cgo/asm_$GOARCH.s) is a four-argument adapter from
+// the gcc function call ABI to the gc function call ABI. At this
+// point we're in the Go runtime, but we're still running on m.g0's
+// stack and outside the $GOMAXPROCS limit. crosscall2 calls
+// runtime.cgocallback(_cgoexp_GoF, frame, ctxt) using the gc ABI.
+// (crosscall2's framesize argument is no longer used, but there's one
+// case where SWIG calls crosscall2 directly and expects to pass this
+// argument. See _cgo_panic.)
+//
+// runtime.cgocallback (in asm_$GOARCH.s) switches from m.g0's stack
+// to the original g (m.curg)'s stack, on which it calls
+// runtime.cgocallbackg(_cgoexp_GoF, frame, ctxt). As part of the
+// stack switch, runtime.cgocallback saves the current SP as
+// m.g0.sched.sp, so that any use of m.g0's stack during the execution
+// of the callback will be done below the existing stack frames.
+// Before overwriting m.g0.sched.sp, it pushes the old value on the
+// m.g0 stack, so that it can be restored later.
+//
+// runtime.cgocallbackg (below) is now running on a real goroutine
+// stack (not an m.g0 stack). First it calls runtime.exitsyscall, which will
+// block until the $GOMAXPROCS limit allows running this goroutine.
+// Once exitsyscall has returned, it is safe to do things like call the memory
+// allocator or invoke the Go callback function. runtime.cgocallbackg
+// first defers a function to unwind m.g0.sched.sp, so that if p.GoF
+// panics, m.g0.sched.sp will be restored to its old value: the m.g0 stack
+// and the m.curg stack will be unwound in lock step.
+// Then it calls _cgoexp_GoF(frame).
+//
+// _cgoexp_GoF, which was generated by cmd/cgo, unpacks the arguments
+// from frame, calls p.GoF, writes the results back to frame, and
+// returns. Now we start unwinding this whole process.
+//
+// runtime.cgocallbackg pops but does not execute the deferred
+// function to unwind m.g0.sched.sp, calls runtime.entersyscall, and
+// returns to runtime.cgocallback.
+//
+// After it regains control, runtime.cgocallback switches back to
+// m.g0's stack (the pointer is still in m.g0.sched.sp), restores the old
+// m.g0.sched.sp value from the stack, and returns to crosscall2.
+//
+// crosscall2 restores the callee-save registers for gcc and returns
+// to GoF, which unpacks any result values and returns to f.
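+//
+// As a minimal, illustrative sketch (not part of the runtime), a user-level
+// cgo program that exercises the Go-to-C path described above could look
+// like this:
+//
+//	package main
+//
+//	// static int add(int a, int b) { return a + b; }
+//	import "C"
+//
+//	import "fmt"
+//
+//	func main() {
+//		// Calling C.add goes through the cgo-generated wrapper and
+//		// runtime.cgocall, as described above.
+//		fmt.Println(C.add(1, 2))
+//	}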
+
+package runtime
+
+import (
+ "internal/goarch"
+ "internal/goexperiment"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Addresses collected in a cgo backtrace when crashing.
+// Length must match arg.Max in x_cgo_callers in runtime/cgo/gcc_traceback.c.
+type cgoCallers [32]uintptr
+
+// argset matches runtime/cgo/linux_syscall.c:argset_t
+type argset struct {
+ args unsafe.Pointer
+ retval uintptr
+}
+
+// wrapper for syscall package to call cgocall for libc (cgo) calls.
+//
+//go:linkname syscall_cgocaller syscall.cgocaller
+//go:nosplit
+//go:uintptrescapes
+func syscall_cgocaller(fn unsafe.Pointer, args ...uintptr) uintptr {
+ as := argset{args: unsafe.Pointer(&args[0])}
+ cgocall(fn, unsafe.Pointer(&as))
+ return as.retval
+}
+
+var ncgocall uint64 // number of cgo calls in total for dead m
+
+// Call from Go to C.
+//
+// This must be nosplit because it's used for syscalls on some
+// platforms. Syscalls may have untyped arguments on the stack, so
+// it's not safe to grow or scan the stack.
+//
+//go:nosplit
+func cgocall(fn, arg unsafe.Pointer) int32 {
+ if !iscgo && GOOS != "solaris" && GOOS != "illumos" && GOOS != "windows" {
+ throw("cgocall unavailable")
+ }
+
+ if fn == nil {
+ throw("cgocall nil")
+ }
+
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&racecgosync))
+ }
+
+ mp := getg().m
+ mp.ncgocall++
+
+ // Reset traceback.
+ mp.cgoCallers[0] = 0
+
+ // Announce we are entering a system call
+ // so that the scheduler knows to create another
+ // M to run goroutines while we are in the
+ // foreign code.
+ //
+ // The call to asmcgocall is guaranteed not to
+ // grow the stack and does not allocate memory,
+ // so it is safe to call while "in a system call", outside
+ // the $GOMAXPROCS accounting.
+ //
+ // fn may call back into Go code, in which case we'll exit the
+ // "system call", run the Go code (which may grow the stack),
+ // and then re-enter the "system call" reusing the PC and SP
+ // saved by entersyscall here.
+ entersyscall()
+
+ // Tell asynchronous preemption that we're entering external
+ // code. We do this after entersyscall because this may block
+ // and cause an async preemption to fail, but at this point a
+ // sync preemption will succeed (though this is not a matter
+ // of correctness).
+ osPreemptExtEnter(mp)
+
+ mp.incgo = true
+ // We use ncgo as a check during execution tracing for whether there is
+ // any C on the call stack, which there will be after this point. If
+ // there isn't, we can use frame pointer unwinding to collect call
+ // stacks efficiently. This will be the case for the first Go-to-C call
+ // on a stack, so it's preferable to update it here, after we emit a
+ // trace event in entersyscall above.
+ mp.ncgo++
+
+ errno := asmcgocall(fn, arg)
+
+ // Update accounting before exitsyscall because exitsyscall may
+ // reschedule us on to a different M.
+ mp.incgo = false
+ mp.ncgo--
+
+ osPreemptExtExit(mp)
+
+ exitsyscall()
+
+ // Note that raceacquire must be called only after exitsyscall has
+ // wired this M to a P.
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&racecgosync))
+ }
+
+ // From the garbage collector's perspective, time can move
+ // backwards in the sequence above. If there's a callback into
+ // Go code, GC will see this function at the call to
+ // asmcgocall. When the Go call later returns to C, the
+ // syscall PC/SP is rolled back and the GC sees this function
+ // back at the call to entersyscall. Normally, fn and arg
+ // would be live at entersyscall and dead at asmcgocall, so if
+ // time moved backwards, GC would see these arguments as dead
+ // and then live. Prevent these undead arguments from crashing
+ // GC by forcing them to stay live across this time warp.
+ KeepAlive(fn)
+ KeepAlive(arg)
+ KeepAlive(mp)
+
+ return errno
+}
+
+// Set or reset the system stack bounds for a callback on sp.
+//
+// Must be nosplit because it is called by needm prior to fully initializing
+// the M.
+//
+//go:nosplit
+func callbackUpdateSystemStack(mp *m, sp uintptr, signal bool) {
+ g0 := mp.g0
+ if sp > g0.stack.lo && sp <= g0.stack.hi {
+ // Stack already in bounds, nothing to do.
+ return
+ }
+
+ if mp.ncgo > 0 {
+ // ncgo > 0 indicates that this M was in Go further up the stack
+ // (it called C and is now receiving a callback). It is not
+ // safe for the C call to change the stack out from under us.
+
+ // Note that this case isn't possible for signal == true, as
+ // that is always passing a new M from needm.
+
+ // Stack is bogus, but reset the bounds anyway so we can print.
+ hi := g0.stack.hi
+ lo := g0.stack.lo
+ g0.stack.hi = sp + 1024
+ g0.stack.lo = sp - 32*1024
+ g0.stackguard0 = g0.stack.lo + stackGuard
+
+ print("M ", mp.id, " procid ", mp.procid, " runtime: cgocallback with sp=", hex(sp), " out of bounds [", hex(lo), ", ", hex(hi), "]")
+ print("\n")
+ exit(2)
+ }
+
+ // This M does not have Go further up the stack. However, it may have
+ // previously called into Go, initializing the stack bounds. Between
+ // that call returning and now the stack may have changed (perhaps the
+ // C thread is running a coroutine library). We need to update the
+ // stack bounds for this case.
+ //
+ // Set the stack bounds to match the current stack. We don't
+ // actually know how big the stack is, just as we don't know how big
+ // any scheduling stack is, but we assume there's at least 32 kB. If
+ // we can get a more accurate stack bound from pthread, use that,
+ // provided it actually contains SP.
+ g0.stack.hi = sp + 1024
+ g0.stack.lo = sp - 32*1024
+ if !signal && _cgo_getstackbound != nil {
+ // Don't adjust if called from the signal handler.
+ // We are on the signal stack, not the pthread stack.
+ // (We could get the stack bounds from sigaltstack, but
+ // we're getting out of the signal handler very soon
+ // anyway. Not worth it.)
+ var bounds [2]uintptr
+ asmcgocall(_cgo_getstackbound, unsafe.Pointer(&bounds))
+ // getstackbound is an unsupported no-op on Windows.
+ //
+ // Don't use these bounds if they don't contain SP. Perhaps we
+ // were called by something not using the standard thread
+ // stack.
+ if bounds[0] != 0 && sp > bounds[0] && sp <= bounds[1] {
+ g0.stack.lo = bounds[0]
+ g0.stack.hi = bounds[1]
+ }
+ }
+ g0.stackguard0 = g0.stack.lo + stackGuard
+}
+
+// Call from C back to Go. fn must point to an ABIInternal Go entry-point.
+//
+//go:nosplit
+func cgocallbackg(fn, frame unsafe.Pointer, ctxt uintptr) {
+ gp := getg()
+ if gp != gp.m.curg {
+ println("runtime: bad g in cgocallback")
+ exit(2)
+ }
+
+ sp := gp.m.g0.sched.sp // system sp saved by cgocallback.
+ callbackUpdateSystemStack(gp.m, sp, false)
+
+ // The call from C is on gp.m's g0 stack, so we must ensure
+ // that we stay on that M. We have to do this before calling
+ // exitsyscall, since it would otherwise be free to move us to
+ // a different M. The call to unlockOSThread is in unwindm.
+ lockOSThread()
+
+ checkm := gp.m
+
+ // Save current syscall parameters, so m.syscall can be
+ // used again if the callback decides to make a syscall.
+ syscall := gp.m.syscall
+
+ // entersyscall saves the caller's SP to allow the GC to trace the Go
+ // stack. However, since we're returning to an earlier stack frame and
+ // need to pair with the entersyscall() call made by cgocall, we must
+ // save syscall* and let reentersyscall restore them.
+ savedsp := unsafe.Pointer(gp.syscallsp)
+ savedpc := gp.syscallpc
+ exitsyscall() // coming out of cgo call
+ gp.m.incgo = false
+ if gp.m.isextra {
+ gp.m.isExtraInC = false
+ }
+
+ osPreemptExtExit(gp.m)
+
+ cgocallbackg1(fn, frame, ctxt) // will call unlockOSThread
+
+ // At this point unlockOSThread has been called.
+ // The following code must not change to a different m.
+ // This is enforced by checking incgo in the schedule function.
+
+ gp.m.incgo = true
+ if gp.m.isextra {
+ gp.m.isExtraInC = true
+ }
+
+ if gp.m != checkm {
+ throw("m changed unexpectedly in cgocallbackg")
+ }
+
+ osPreemptExtEnter(gp.m)
+
+ // going back to cgo call
+ reentersyscall(savedpc, uintptr(savedsp))
+
+ gp.m.syscall = syscall
+}
+
+func cgocallbackg1(fn, frame unsafe.Pointer, ctxt uintptr) {
+ gp := getg()
+
+ // When we return, undo the call to lockOSThread in cgocallbackg.
+ // We must still stay on the same m.
+ defer unlockOSThread()
+
+ if gp.m.needextram || extraMWaiters.Load() > 0 {
+ gp.m.needextram = false
+ systemstack(newextram)
+ }
+
+ if ctxt != 0 {
+ s := append(gp.cgoCtxt, ctxt)
+
+ // Now we need to set gp.cgoCtxt = s, but we could get
+ // a SIGPROF signal while manipulating the slice, and
+ // the SIGPROF handler could pick up gp.cgoCtxt while
+ // tracing up the stack. We need to ensure that the
+ // handler always sees a valid slice, so set the
+ // values in an order such that it always does.
+ p := (*slice)(unsafe.Pointer(&gp.cgoCtxt))
+ atomicstorep(unsafe.Pointer(&p.array), unsafe.Pointer(&s[0]))
+ p.cap = cap(s)
+ p.len = len(s)
+
+ defer func(gp *g) {
+ // Decrease the length of the slice by one, safely.
+ p := (*slice)(unsafe.Pointer(&gp.cgoCtxt))
+ p.len--
+ }(gp)
+ }
+
+ if gp.m.ncgo == 0 {
+ // The C call to Go came from a thread not currently running
+ // any Go. In the case of -buildmode=c-archive or c-shared,
+ // this call may be coming in before package initialization
+ // is complete. Wait until it is.
+ <-main_init_done
+ }
+
+ // Check whether the profiler needs to be turned on or off; this route to
+ // run Go code does not use runtime.execute, so bypasses the check there.
+ hz := sched.profilehz
+ if gp.m.profilehz != hz {
+ setThreadCPUProfiler(hz)
+ }
+
+ // Add entry to defer stack in case of panic.
+ restore := true
+ defer unwindm(&restore)
+
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&racecgosync))
+ }
+
+ // Invoke callback. This function is generated by cmd/cgo and
+ // will unpack the argument frame and call the Go function.
+ var cb func(frame unsafe.Pointer)
+ cbFV := funcval{uintptr(fn)}
+ *(*unsafe.Pointer)(unsafe.Pointer(&cb)) = noescape(unsafe.Pointer(&cbFV))
+ cb(frame)
+
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&racecgosync))
+ }
+
+ // Do not unwind m->g0->sched.sp.
+ // Our caller, cgocallback, will do that.
+ restore = false
+}
+
+func unwindm(restore *bool) {
+ if *restore {
+ // Restore sp saved by cgocallback during
+ // unwind of g's stack (see comment at top of file).
+ mp := acquirem()
+ sched := &mp.g0.sched
+ sched.sp = *(*uintptr)(unsafe.Pointer(sched.sp + alignUp(sys.MinFrameSize, sys.StackAlign)))
+
+ // Do the accounting that cgocall will not have a chance to do
+ // during an unwind.
+ //
+ // In the case where a Go call originates from C, ncgo is 0
+ // and there is no matching cgocall to end.
+ if mp.ncgo > 0 {
+ mp.incgo = false
+ mp.ncgo--
+ osPreemptExtExit(mp)
+ }
+
+ releasem(mp)
+ }
+}
+
+// called from assembly.
+func badcgocallback() {
+ throw("misaligned stack in cgocallback")
+}
+
+// called from (incomplete) assembly.
+func cgounimpl() {
+ throw("cgo not implemented")
+}
+
+var racecgosync uint64 // represents possible synchronization in C code
+
+// Pointer checking for cgo code.
+
+// We want to detect all cases where a program that does not use
+// unsafe makes a cgo call passing a Go pointer to memory that
+// contains an unpinned Go pointer. Here a Go pointer is defined as a
+// pointer to memory allocated by the Go runtime. Programs that use
+// unsafe can evade this restriction easily, so we don't try to catch
+// them. The cgo program will rewrite all possibly bad pointer
+// arguments to call cgoCheckPointer, where we can catch cases of a Go
+// pointer pointing to an unpinned Go pointer.
+
+// Complicating matters, taking the address of a slice or array
+// element permits the C program to access all elements of the slice
+// or array. In that case we will see a pointer to a single element,
+// but we need to check the entire data structure.
+
+// The cgoCheckPointer call takes additional arguments indicating that
+// it was called on an address expression. An additional argument of
+// true means that it only needs to check a single element. An
+// additional argument of a slice or array means that it needs to
+// check the entire slice/array, but nothing else. Otherwise, the
+// pointer could be anything, and we check the entire heap object,
+// which is conservative but safe.
+
+// When and if we implement a moving garbage collector,
+// cgoCheckPointer will pin the pointer for the duration of the cgo
+// call. (This is necessary but not sufficient; the cgo program will
+// also have to change to pin Go pointers that cannot point to Go
+// pointers.)
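+
+// As an illustrative sketch (hypothetical user code, not part of the
+// runtime), assuming the default cgocheck setting, the rules above reject a
+// call like:
+//
+//	// static void use(void *p) {}
+//	import "C"
+//
+//	import "unsafe"
+//
+//	type S struct{ p *int }
+//
+//	func f() {
+//		x := 1
+//		s := &S{p: &x}
+//		// s is a Go pointer to memory holding an unpinned Go pointer,
+//		// so the generated cgoCheckPointer call panics here.
+//		C.use(unsafe.Pointer(s))
+//	}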
+
+// cgoCheckPointer checks if the argument contains a Go pointer that
+// points to an unpinned Go pointer, and panics if it does.
+func cgoCheckPointer(ptr any, arg any) {
+ if !goexperiment.CgoCheck2 && debug.cgocheck == 0 {
+ return
+ }
+
+ ep := efaceOf(&ptr)
+ t := ep._type
+
+ top := true
+ if arg != nil && (t.Kind_&kindMask == kindPtr || t.Kind_&kindMask == kindUnsafePointer) {
+ p := ep.data
+ if t.Kind_&kindDirectIface == 0 {
+ p = *(*unsafe.Pointer)(p)
+ }
+ if p == nil || !cgoIsGoPointer(p) {
+ return
+ }
+ aep := efaceOf(&arg)
+ switch aep._type.Kind_ & kindMask {
+ case kindBool:
+ if t.Kind_&kindMask == kindUnsafePointer {
+ // We don't know the type of the element.
+ break
+ }
+ pt := (*ptrtype)(unsafe.Pointer(t))
+ cgoCheckArg(pt.Elem, p, true, false, cgoCheckPointerFail)
+ return
+ case kindSlice:
+ // Check the slice rather than the pointer.
+ ep = aep
+ t = ep._type
+ case kindArray:
+ // Check the array rather than the pointer.
+ // Pass top as false since we have a pointer
+ // to the array.
+ ep = aep
+ t = ep._type
+ top = false
+ default:
+ throw("can't happen")
+ }
+ }
+
+ cgoCheckArg(t, ep.data, t.Kind_&kindDirectIface == 0, top, cgoCheckPointerFail)
+}
+
+const cgoCheckPointerFail = "cgo argument has Go pointer to unpinned Go pointer"
+const cgoResultFail = "cgo result has Go pointer"
+
+// cgoCheckArg is the real work of cgoCheckPointer. The argument p
+// is either a pointer to the value (of type t), or the value itself,
+// depending on indir. The top parameter is whether we are at the top
+// level, where Go pointers are allowed. Go pointers to pinned objects are
+// always allowed.
+func cgoCheckArg(t *_type, p unsafe.Pointer, indir, top bool, msg string) {
+ if t.PtrBytes == 0 || p == nil {
+ // If the type has no pointers there is nothing to do.
+ return
+ }
+
+ switch t.Kind_ & kindMask {
+ default:
+ throw("can't happen")
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(t))
+ if !indir {
+ if at.Len != 1 {
+ throw("can't happen")
+ }
+ cgoCheckArg(at.Elem, p, at.Elem.Kind_&kindDirectIface == 0, top, msg)
+ return
+ }
+ for i := uintptr(0); i < at.Len; i++ {
+ cgoCheckArg(at.Elem, p, true, top, msg)
+ p = add(p, at.Elem.Size_)
+ }
+ case kindChan, kindMap:
+ // These types contain internal pointers that will
+ // always be allocated in the Go heap. It's never OK
+ // to pass them to C.
+ panic(errorString(msg))
+ case kindFunc:
+ if indir {
+ p = *(*unsafe.Pointer)(p)
+ }
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ panic(errorString(msg))
+ case kindInterface:
+ it := *(**_type)(p)
+ if it == nil {
+ return
+ }
+ // A type known at compile time is OK since it's
+ // constant. A type not known at compile time will be
+ // in the heap and will not be OK.
+ if inheap(uintptr(unsafe.Pointer(it))) {
+ panic(errorString(msg))
+ }
+ p = *(*unsafe.Pointer)(add(p, goarch.PtrSize))
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ if !top && !isPinned(p) {
+ panic(errorString(msg))
+ }
+ cgoCheckArg(it, p, it.Kind_&kindDirectIface == 0, false, msg)
+ case kindSlice:
+ st := (*slicetype)(unsafe.Pointer(t))
+ s := (*slice)(p)
+ p = s.array
+ if p == nil || !cgoIsGoPointer(p) {
+ return
+ }
+ if !top && !isPinned(p) {
+ panic(errorString(msg))
+ }
+ if st.Elem.PtrBytes == 0 {
+ return
+ }
+ for i := 0; i < s.cap; i++ {
+ cgoCheckArg(st.Elem, p, true, false, msg)
+ p = add(p, st.Elem.Size_)
+ }
+ case kindString:
+ ss := (*stringStruct)(p)
+ if !cgoIsGoPointer(ss.str) {
+ return
+ }
+ if !top && !isPinned(ss.str) {
+ panic(errorString(msg))
+ }
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ if !indir {
+ if len(st.Fields) != 1 {
+ throw("can't happen")
+ }
+ cgoCheckArg(st.Fields[0].Typ, p, st.Fields[0].Typ.Kind_&kindDirectIface == 0, top, msg)
+ return
+ }
+ for _, f := range st.Fields {
+ if f.Typ.PtrBytes == 0 {
+ continue
+ }
+ cgoCheckArg(f.Typ, add(p, f.Offset), true, top, msg)
+ }
+ case kindPtr, kindUnsafePointer:
+ if indir {
+ p = *(*unsafe.Pointer)(p)
+ if p == nil {
+ return
+ }
+ }
+
+ if !cgoIsGoPointer(p) {
+ return
+ }
+ if !top && !isPinned(p) {
+ panic(errorString(msg))
+ }
+
+ cgoCheckUnknownPointer(p, msg)
+ }
+}
+
+// cgoCheckUnknownPointer is called for an arbitrary pointer into Go
+// memory. It checks whether that Go memory contains any other
+// pointer into unpinned Go memory. If it does, we panic.
+// The return values are unused but useful to see in panic tracebacks.
+func cgoCheckUnknownPointer(p unsafe.Pointer, msg string) (base, i uintptr) {
+ if inheap(uintptr(p)) {
+ b, span, _ := findObject(uintptr(p), 0, 0)
+ base = b
+ if base == 0 {
+ return
+ }
+ n := span.elemsize
+ hbits := heapBitsForAddr(base, n)
+ for {
+ var addr uintptr
+ if hbits, addr = hbits.next(); addr == 0 {
+ break
+ }
+ pp := *(*unsafe.Pointer)(unsafe.Pointer(addr))
+ if cgoIsGoPointer(pp) && !isPinned(pp) {
+ panic(errorString(msg))
+ }
+ }
+
+ return
+ }
+
+ for _, datap := range activeModules() {
+ if cgoInRange(p, datap.data, datap.edata) || cgoInRange(p, datap.bss, datap.ebss) {
+ // We have no way to know the size of the object.
+ // We have to assume that it might contain a pointer.
+ panic(errorString(msg))
+ }
+ // In the text or noptr sections, we know that the
+ // pointer does not point to a Go pointer.
+ }
+
+ return
+}
+
+// cgoIsGoPointer reports whether the pointer is a Go pointer--a
+// pointer to Go memory. We only care about Go memory that might
+// contain pointers.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func cgoIsGoPointer(p unsafe.Pointer) bool {
+ if p == nil {
+ return false
+ }
+
+ if inHeapOrStack(uintptr(p)) {
+ return true
+ }
+
+ for _, datap := range activeModules() {
+ if cgoInRange(p, datap.data, datap.edata) || cgoInRange(p, datap.bss, datap.ebss) {
+ return true
+ }
+ }
+
+ return false
+}
+
+// cgoInRange reports whether p is between start and end.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func cgoInRange(p unsafe.Pointer, start, end uintptr) bool {
+ return start <= uintptr(p) && uintptr(p) < end
+}
+
+// cgoCheckResult is called to check the result parameter of an
+// exported Go function. It panics if the result is or contains a Go
+// pointer.
+func cgoCheckResult(val any) {
+ if !goexperiment.CgoCheck2 && debug.cgocheck == 0 {
+ return
+ }
+
+ ep := efaceOf(&val)
+ t := ep._type
+ cgoCheckArg(t, ep.data, t.Kind_&kindDirectIface == 0, false, cgoResultFail)
+}
diff --git a/src/runtime/cgocallback.go b/src/runtime/cgocallback.go
new file mode 100644
index 0000000..59953f1
--- /dev/null
+++ b/src/runtime/cgocallback.go
@@ -0,0 +1,13 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// These functions are called from C code via cgo/callbacks.go.
+
+// Panic.
+
+func _cgo_panic_internal(p *byte) {
+ panic(gostringnocopy(p))
+}
diff --git a/src/runtime/cgocheck.go b/src/runtime/cgocheck.go
new file mode 100644
index 0000000..ec5734a
--- /dev/null
+++ b/src/runtime/cgocheck.go
@@ -0,0 +1,292 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code to check that pointer writes follow the cgo rules.
+// These functions are invoked when GOEXPERIMENT=cgocheck2 is enabled.
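+//
+// For example (a usage sketch, not runtime code), a program can be built and
+// run with these write checks enabled via:
+//
+//	GOEXPERIMENT=cgocheck2 go run .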
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+const cgoWriteBarrierFail = "unpinned Go pointer stored into non-Go memory"
+
+// cgoCheckPtrWrite is called whenever a pointer is stored into memory.
+// It throws if the program is storing an unpinned Go pointer into non-Go
+// memory.
+//
+// This is called from generated code when GOEXPERIMENT=cgocheck2 is enabled.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckPtrWrite(dst *unsafe.Pointer, src unsafe.Pointer) {
+ if !mainStarted {
+ // Something early in startup hates this function.
+ // Don't start doing any actual checking until the
+ // runtime has set itself up.
+ return
+ }
+ if !cgoIsGoPointer(src) {
+ return
+ }
+ if cgoIsGoPointer(unsafe.Pointer(dst)) {
+ return
+ }
+
+ // If we are running on the system stack then dst might be an
+ // address on the stack, which is OK.
+ gp := getg()
+ if gp == gp.m.g0 || gp == gp.m.gsignal {
+ return
+ }
+
+ // Allocating memory can write to various mfixalloc structs
+ // that look like they are non-Go memory.
+ if gp.m.mallocing != 0 {
+ return
+ }
+
+ // If the object is pinned, it's safe to store it in C memory. The GC
+ // ensures it will not be moved or freed.
+ if isPinned(src) {
+ return
+ }
+
+ // It's OK to write to memory allocated by persistentalloc.
+ // Do this check last because it is more expensive and rarely true.
+ // If it is false the expense doesn't matter since we are crashing.
+ if inPersistentAlloc(uintptr(unsafe.Pointer(dst))) {
+ return
+ }
+
+ systemstack(func() {
+ println("write of unpinned Go pointer", hex(uintptr(src)), "to non-Go memory", hex(uintptr(unsafe.Pointer(dst))))
+ throw(cgoWriteBarrierFail)
+ })
+}
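+
+// As an illustrative sketch (hypothetical user code, not part of the
+// runtime), with GOEXPERIMENT=cgocheck2 a store like the following one is
+// reported by cgoCheckPtrWrite:
+//
+//	// #include <stdlib.h>
+//	import "C"
+//
+//	import "unsafe"
+//
+//	func f() {
+//		p := (**int)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
+//		x := 1
+//		*p = &x // fatal error: unpinned Go pointer stored into non-Go memory
+//	}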
+
+// cgoCheckMemmove is called when moving a block of memory.
+// It throws if the program is copying a block that contains an unpinned Go
+// pointer into non-Go memory.
+//
+// This is called from generated code when GOEXPERIMENT=cgocheck2 is enabled.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckMemmove(typ *_type, dst, src unsafe.Pointer) {
+ cgoCheckMemmove2(typ, dst, src, 0, typ.Size_)
+}
+
+// cgoCheckMemmove2 is called when moving a block of memory.
+// dst and src point off bytes into the value to copy.
+// size is the number of bytes to copy.
+// It throws if the program is copying a block that contains an unpinned Go
+// pointer into non-Go memory.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckMemmove2(typ *_type, dst, src unsafe.Pointer, off, size uintptr) {
+ if typ.PtrBytes == 0 {
+ return
+ }
+ if !cgoIsGoPointer(src) {
+ return
+ }
+ if cgoIsGoPointer(dst) {
+ return
+ }
+ cgoCheckTypedBlock(typ, src, off, size)
+}
+
+// cgoCheckSliceCopy is called when copying n elements of a slice.
+// src and dst are pointers to the first element of the slice.
+// typ is the element type of the slice.
+// It throws if the program is copying slice elements that contain unpinned Go
+// pointers into non-Go memory.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckSliceCopy(typ *_type, dst, src unsafe.Pointer, n int) {
+ if typ.PtrBytes == 0 {
+ return
+ }
+ if !cgoIsGoPointer(src) {
+ return
+ }
+ if cgoIsGoPointer(dst) {
+ return
+ }
+ p := src
+ for i := 0; i < n; i++ {
+ cgoCheckTypedBlock(typ, p, 0, typ.Size_)
+ p = add(p, typ.Size_)
+ }
+}
+
+// cgoCheckTypedBlock checks the block of memory at src, for up to size bytes,
+// and throws if it finds an unpinned Go pointer. The type of the memory is typ,
+// and src is off bytes into that type.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckTypedBlock(typ *_type, src unsafe.Pointer, off, size uintptr) {
+ // Anything past typ.PtrBytes is not a pointer.
+ if typ.PtrBytes <= off {
+ return
+ }
+ if ptrdataSize := typ.PtrBytes - off; size > ptrdataSize {
+ size = ptrdataSize
+ }
+
+ if typ.Kind_&kindGCProg == 0 {
+ cgoCheckBits(src, typ.GCData, off, size)
+ return
+ }
+
+ // The type has a GC program. Try to find GC bits somewhere else.
+ for _, datap := range activeModules() {
+ if cgoInRange(src, datap.data, datap.edata) {
+ doff := uintptr(src) - datap.data
+ cgoCheckBits(add(src, -doff), datap.gcdatamask.bytedata, off+doff, size)
+ return
+ }
+ if cgoInRange(src, datap.bss, datap.ebss) {
+ boff := uintptr(src) - datap.bss
+ cgoCheckBits(add(src, -boff), datap.gcbssmask.bytedata, off+boff, size)
+ return
+ }
+ }
+
+ s := spanOfUnchecked(uintptr(src))
+ if s.state.get() == mSpanManual {
+ // There are no heap bits for values stored on the stack.
+ // For a channel receive, src might be on the stack of some
+ // other goroutine, so we can't unwind the stack even if
+ // we wanted to.
+ // We can't expand the GC program without extra storage
+ // space we can't easily get.
+ // Fortunately we have the type information.
+ systemstack(func() {
+ cgoCheckUsingType(typ, src, off, size)
+ })
+ return
+ }
+
+ // src must be in the regular heap.
+
+ hbits := heapBitsForAddr(uintptr(src), size)
+ for {
+ var addr uintptr
+ if hbits, addr = hbits.next(); addr == 0 {
+ break
+ }
+ v := *(*unsafe.Pointer)(unsafe.Pointer(addr))
+ if cgoIsGoPointer(v) && !isPinned(v) {
+ throw(cgoWriteBarrierFail)
+ }
+ }
+}
+
+// cgoCheckBits checks the block of memory at src, for up to size
+// bytes, and throws if it finds an unpinned Go pointer. The gcbits mark each
+// pointer value. The src pointer is off bytes into the gcbits.
+//
+//go:nosplit
+//go:nowritebarrier
+func cgoCheckBits(src unsafe.Pointer, gcbits *byte, off, size uintptr) {
+ skipMask := off / goarch.PtrSize / 8
+ skipBytes := skipMask * goarch.PtrSize * 8
+ ptrmask := addb(gcbits, skipMask)
+ src = add(src, skipBytes)
+ off -= skipBytes
+ size += off
+ var bits uint32
+ for i := uintptr(0); i < size; i += goarch.PtrSize {
+ if i&(goarch.PtrSize*8-1) == 0 {
+ bits = uint32(*ptrmask)
+ ptrmask = addb(ptrmask, 1)
+ } else {
+ bits >>= 1
+ }
+ if off > 0 {
+ off -= goarch.PtrSize
+ } else {
+ if bits&1 != 0 {
+ v := *(*unsafe.Pointer)(add(src, i))
+ if cgoIsGoPointer(v) && !isPinned(v) {
+ throw(cgoWriteBarrierFail)
+ }
+ }
+ }
+ }
+}
+
+// cgoCheckUsingType is like cgoCheckTypedBlock, but is a last-ditch
+// fallback that looks for pointers in src using the type information.
+// We only use this when looking at a value on the stack when the type
+// uses a GC program, because otherwise it's more efficient to use the
+// GC bits. This is called on the system stack.
+//
+//go:nowritebarrier
+//go:systemstack
+func cgoCheckUsingType(typ *_type, src unsafe.Pointer, off, size uintptr) {
+ if typ.PtrBytes == 0 {
+ return
+ }
+
+ // Anything past typ.PtrBytes is not a pointer.
+ if typ.PtrBytes <= off {
+ return
+ }
+ if ptrdataSize := typ.PtrBytes - off; size > ptrdataSize {
+ size = ptrdataSize
+ }
+
+ if typ.Kind_&kindGCProg == 0 {
+ cgoCheckBits(src, typ.GCData, off, size)
+ return
+ }
+ switch typ.Kind_ & kindMask {
+ default:
+ throw("can't happen")
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(typ))
+ for i := uintptr(0); i < at.Len; i++ {
+ if off < at.Elem.Size_ {
+ cgoCheckUsingType(at.Elem, src, off, size)
+ }
+ src = add(src, at.Elem.Size_)
+ skipped := off
+ if skipped > at.Elem.Size_ {
+ skipped = at.Elem.Size_
+ }
+ checked := at.Elem.Size_ - skipped
+ off -= skipped
+ if size <= checked {
+ return
+ }
+ size -= checked
+ }
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(typ))
+ for _, f := range st.Fields {
+ if off < f.Typ.Size_ {
+ cgoCheckUsingType(f.Typ, src, off, size)
+ }
+ src = add(src, f.Typ.Size_)
+ skipped := off
+ if skipped > f.Typ.Size_ {
+ skipped = f.Typ.Size_
+ }
+ checked := f.Typ.Size_ - skipped
+ off -= skipped
+ if size <= checked {
+ return
+ }
+ size -= checked
+ }
+ }
+}
diff --git a/src/runtime/chan.go b/src/runtime/chan.go
new file mode 100644
index 0000000..ff9e2a9
--- /dev/null
+++ b/src/runtime/chan.go
@@ -0,0 +1,851 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go channels.
+
+// Invariants:
+// At least one of c.sendq and c.recvq is empty,
+// except for the case of an unbuffered channel with a single goroutine
+// blocked on it for both sending and receiving using a select statement,
+// in which case the length of c.sendq and c.recvq is limited only by the
+// size of the select statement.
+//
+// For buffered channels, also:
+// c.qcount > 0 implies that c.recvq is empty.
+// c.qcount < c.dataqsiz implies that c.sendq is empty.
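+//
+// As an illustrative example of the select exception above (not runtime
+// code), a single goroutine running
+//
+//	c := make(chan int)
+//	select {
+//	case c <- 1:
+//	case <-c:
+//	}
+//
+// parks with a sudog on both c.sendq and c.recvq at the same time (and, with
+// no other goroutines, is eventually reported as deadlocked).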
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "unsafe"
+)
+
+const (
+ maxAlign = 8
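+ // hchanSize is unsafe.Sizeof(hchan{}) rounded up to a multiple of
+ // maxAlign, so that the element buffer allocated immediately after
+ // the hchan header (see makechan) is suitably aligned.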
+ hchanSize = unsafe.Sizeof(hchan{}) + uintptr(-int(unsafe.Sizeof(hchan{}))&(maxAlign-1))
+ debugChan = false
+)
+
+type hchan struct {
+ qcount uint // total data in the queue
+ dataqsiz uint // size of the circular queue
+ buf unsafe.Pointer // points to an array of dataqsiz elements
+ elemsize uint16
+ closed uint32
+ elemtype *_type // element type
+ sendx uint // send index
+ recvx uint // receive index
+ recvq waitq // list of recv waiters
+ sendq waitq // list of send waiters
+
+ // lock protects all fields in hchan, as well as several
+ // fields in sudogs blocked on this channel.
+ //
+ // Do not change another G's status while holding this lock
+ // (in particular, do not ready a G), as this can deadlock
+ // with stack shrinking.
+ lock mutex
+}
+
+type waitq struct {
+ first *sudog
+ last *sudog
+}
+
+//go:linkname reflect_makechan reflect.makechan
+func reflect_makechan(t *chantype, size int) *hchan {
+ return makechan(t, size)
+}
+
+func makechan64(t *chantype, size int64) *hchan {
+ if int64(int(size)) != size {
+ panic(plainError("makechan: size out of range"))
+ }
+
+ return makechan(t, int(size))
+}
+
+func makechan(t *chantype, size int) *hchan {
+ elem := t.Elem
+
+ // compiler checks this but be safe.
+ if elem.Size_ >= 1<<16 {
+ throw("makechan: invalid channel element type")
+ }
+ if hchanSize%maxAlign != 0 || elem.Align_ > maxAlign {
+ throw("makechan: bad alignment")
+ }
+
+ mem, overflow := math.MulUintptr(elem.Size_, uintptr(size))
+ if overflow || mem > maxAlloc-hchanSize || size < 0 {
+ panic(plainError("makechan: size out of range"))
+ }
+
+ // Hchan does not contain pointers interesting for GC when elements stored in buf do not contain pointers.
+ // buf points into the same allocation, elemtype is persistent.
+ // SudoG's are referenced from their owning thread so they can't be collected.
+ // TODO(dvyukov,rlh): Rethink when collector can move allocated objects.
+ var c *hchan
+ switch {
+ case mem == 0:
+ // Queue or element size is zero.
+ c = (*hchan)(mallocgc(hchanSize, nil, true))
+ // Race detector uses this location for synchronization.
+ c.buf = c.raceaddr()
+ case elem.PtrBytes == 0:
+ // Elements do not contain pointers.
+ // Allocate hchan and buf in one call.
+ c = (*hchan)(mallocgc(hchanSize+mem, nil, true))
+ c.buf = add(unsafe.Pointer(c), hchanSize)
+ default:
+ // Elements contain pointers.
+ c = new(hchan)
+ c.buf = mallocgc(mem, elem, true)
+ }
+
+ c.elemsize = uint16(elem.Size_)
+ c.elemtype = elem
+ c.dataqsiz = uint(size)
+ lockInit(&c.lock, lockRankHchan)
+
+ if debugChan {
+ print("makechan: chan=", c, "; elemsize=", elem.Size_, "; dataqsiz=", size, "\n")
+ }
+ return c
+}
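+
+// As an illustrative note (not part of the original source), the three
+// allocation strategies above correspond to channels such as:
+//
+//	make(chan struct{}, 8) // mem == 0: only the hchan header is allocated
+//	make(chan int, 8)      // elements without pointers: hchan and buf in one allocation
+//	make(chan *int, 8)     // elements with pointers: buf is a separate, typed allocation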
+
+// chanbuf(c, i) is a pointer to the i'th slot in the buffer.
+func chanbuf(c *hchan, i uint) unsafe.Pointer {
+ return add(c.buf, uintptr(i)*uintptr(c.elemsize))
+}
+
+// full reports whether a send on c would block (that is, the channel is full).
+// It uses a single word-sized read of mutable state, so although
+// the answer is instantaneously true, the correct answer may have changed
+// by the time the calling function receives the return value.
+func full(c *hchan) bool {
+ // c.dataqsiz is immutable (never written after the channel is created)
+ // so it is safe to read at any time during channel operation.
+ if c.dataqsiz == 0 {
+ // Assumes that a pointer read is relaxed-atomic.
+ return c.recvq.first == nil
+ }
+ // Assumes that a uint read is relaxed-atomic.
+ return c.qcount == c.dataqsiz
+}
+
+// entry point for c <- x from compiled code.
+//
+//go:nosplit
+func chansend1(c *hchan, elem unsafe.Pointer) {
+ chansend(c, elem, true, getcallerpc())
+}
+
+/*
+ * generic single channel send/recv
+ * If block is false,
+ * then the protocol will not
+ * sleep but return if it could
+ * not complete.
+ *
+ * sleep can wake up with g.param == nil
+ * when a channel involved in the sleep has
+ * been closed. it is easiest to loop and re-run
+ * the operation; we'll see that it's now closed.
+ */
+func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
+ if c == nil {
+ if !block {
+ return false
+ }
+ gopark(nil, nil, waitReasonChanSendNilChan, traceBlockForever, 2)
+ throw("unreachable")
+ }
+
+ if debugChan {
+ print("chansend: chan=", c, "\n")
+ }
+
+ if raceenabled {
+ racereadpc(c.raceaddr(), callerpc, abi.FuncPCABIInternal(chansend))
+ }
+
+ // Fast path: check for failed non-blocking operation without acquiring the lock.
+ //
+ // After observing that the channel is not closed, we observe that the channel is
+ // not ready for sending. Each of these observations is a single word-sized read
+ // (first c.closed and second full()).
+ // Because a closed channel cannot transition from 'ready for sending' to
+ // 'not ready for sending', even if the channel is closed between the two observations,
+ // they imply a moment between the two when the channel was both not yet closed
+ // and not ready for sending. We behave as if we observed the channel at that moment,
+ // and report that the send cannot proceed.
+ //
+ // It is okay if the reads are reordered here: if we observe that the channel is not
+ // ready for sending and then observe that it is not closed, that implies that the
+ // channel wasn't closed during the first observation. However, nothing here
+ // guarantees forward progress. We rely on the side effects of lock release in
+ // chanrecv() and closechan() to update this thread's view of c.closed and full().
+ if !block && c.closed == 0 && full(c) {
+ return false
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ lock(&c.lock)
+
+ if c.closed != 0 {
+ unlock(&c.lock)
+ panic(plainError("send on closed channel"))
+ }
+
+ if sg := c.recvq.dequeue(); sg != nil {
+ // Found a waiting receiver. We pass the value we want to send
+ // directly to the receiver, bypassing the channel buffer (if any).
+ send(c, sg, ep, func() { unlock(&c.lock) }, 3)
+ return true
+ }
+
+ if c.qcount < c.dataqsiz {
+ // Space is available in the channel buffer. Enqueue the element to send.
+ qp := chanbuf(c, c.sendx)
+ if raceenabled {
+ racenotify(c, c.sendx, nil)
+ }
+ typedmemmove(c.elemtype, qp, ep)
+ c.sendx++
+ if c.sendx == c.dataqsiz {
+ c.sendx = 0
+ }
+ c.qcount++
+ unlock(&c.lock)
+ return true
+ }
+
+ if !block {
+ unlock(&c.lock)
+ return false
+ }
+
+ // Block on the channel. Some receiver will complete our operation for us.
+ gp := getg()
+ mysg := acquireSudog()
+ mysg.releasetime = 0
+ if t0 != 0 {
+ mysg.releasetime = -1
+ }
+ // No stack splits between assigning elem and enqueuing mysg
+ // on gp.waiting where copystack can find it.
+ mysg.elem = ep
+ mysg.waitlink = nil
+ mysg.g = gp
+ mysg.isSelect = false
+ mysg.c = c
+ gp.waiting = mysg
+ gp.param = nil
+ c.sendq.enqueue(mysg)
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ gp.parkingOnChan.Store(true)
+ gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanSend, traceBlockChanSend, 2)
+ // Ensure the value being sent is kept alive until the
+ // receiver copies it out. The sudog has a pointer to the
+ // stack object, but sudogs aren't considered as roots of the
+ // stack tracer.
+ KeepAlive(ep)
+
+ // someone woke us up.
+ if mysg != gp.waiting {
+ throw("G waiting list is corrupted")
+ }
+ gp.waiting = nil
+ gp.activeStackChans = false
+ closed := !mysg.success
+ gp.param = nil
+ if mysg.releasetime > 0 {
+ blockevent(mysg.releasetime-t0, 2)
+ }
+ mysg.c = nil
+ releaseSudog(mysg)
+ if closed {
+ if c.closed == 0 {
+ throw("chansend: spurious wakeup")
+ }
+ panic(plainError("send on closed channel"))
+ }
+ return true
+}
+
+// send processes a send operation on an empty channel c.
+// The value ep sent by the sender is copied to the receiver sg.
+// The receiver is then woken up to go on its merry way.
+// Channel c must be empty and locked. send unlocks c with unlockf.
+// sg must already be dequeued from c.
+// ep must be non-nil and point to the heap or the caller's stack.
+func send(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
+ if raceenabled {
+ if c.dataqsiz == 0 {
+ racesync(c, sg)
+ } else {
+ // Pretend we go through the buffer, even though
+ // we copy directly. Note that we need to increment
+ // the head/tail locations only when raceenabled.
+ racenotify(c, c.recvx, nil)
+ racenotify(c, c.recvx, sg)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
+ }
+ }
+ if sg.elem != nil {
+ sendDirect(c.elemtype, sg, ep)
+ sg.elem = nil
+ }
+ gp := sg.g
+ unlockf()
+ gp.param = unsafe.Pointer(sg)
+ sg.success = true
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ goready(gp, skip+1)
+}
+
+// Sends and receives on unbuffered or empty-buffered channels are the
+// only operations where one running goroutine writes to the stack of
+// another running goroutine. The GC assumes that stack writes only
+// happen when the goroutine is running and are only done by that
+// goroutine. Using a write barrier is sufficient to make up for
+// violating that assumption, but the write barrier has to work.
+// typedmemmove will call bulkBarrierPreWrite, but the target bytes
+// are not in the heap, so that will not help. We arrange to call
+// memmove and typeBitsBulkBarrier instead.
+
+func sendDirect(t *_type, sg *sudog, src unsafe.Pointer) {
+ // src is on our stack, dst is a slot on another stack.
+
+ // Once we read sg.elem out of sg, it will no longer
+ // be updated if the destination's stack gets copied (shrunk).
+ // So make sure that no preemption points can happen between read & use.
+ dst := sg.elem
+ typeBitsBulkBarrier(t, uintptr(dst), uintptr(src), t.Size_)
+ // No need for cgo write barrier checks because dst is always
+ // Go memory.
+ memmove(dst, src, t.Size_)
+}
+
+func recvDirect(t *_type, sg *sudog, dst unsafe.Pointer) {
+ // dst is on our stack or the heap, src is on another stack.
+ // The channel is locked, so src will not move during this
+ // operation.
+ src := sg.elem
+ typeBitsBulkBarrier(t, uintptr(dst), uintptr(src), t.Size_)
+ memmove(dst, src, t.Size_)
+}
+
+func closechan(c *hchan) {
+ if c == nil {
+ panic(plainError("close of nil channel"))
+ }
+
+ lock(&c.lock)
+ if c.closed != 0 {
+ unlock(&c.lock)
+ panic(plainError("close of closed channel"))
+ }
+
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(c.raceaddr(), callerpc, abi.FuncPCABIInternal(closechan))
+ racerelease(c.raceaddr())
+ }
+
+ c.closed = 1
+
+ var glist gList
+
+ // release all readers
+ for {
+ sg := c.recvq.dequeue()
+ if sg == nil {
+ break
+ }
+ if sg.elem != nil {
+ typedmemclr(c.elemtype, sg.elem)
+ sg.elem = nil
+ }
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ gp := sg.g
+ gp.param = unsafe.Pointer(sg)
+ sg.success = false
+ if raceenabled {
+ raceacquireg(gp, c.raceaddr())
+ }
+ glist.push(gp)
+ }
+
+ // release all writers (they will panic)
+ for {
+ sg := c.sendq.dequeue()
+ if sg == nil {
+ break
+ }
+ sg.elem = nil
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ gp := sg.g
+ gp.param = unsafe.Pointer(sg)
+ sg.success = false
+ if raceenabled {
+ raceacquireg(gp, c.raceaddr())
+ }
+ glist.push(gp)
+ }
+ unlock(&c.lock)
+
+ // Ready all Gs now that we've dropped the channel lock.
+ for !glist.empty() {
+ gp := glist.pop()
+ gp.schedlink = 0
+ goready(gp, 3)
+ }
+}
+
+// empty reports whether a read from c would block (that is, the channel is
+// empty). It uses a single atomic read of mutable state.
+func empty(c *hchan) bool {
+ // c.dataqsiz is immutable.
+ if c.dataqsiz == 0 {
+ return atomic.Loadp(unsafe.Pointer(&c.sendq.first)) == nil
+ }
+ return atomic.Loaduint(&c.qcount) == 0
+}
+
+// entry points for <- c from compiled code.
+//
+//go:nosplit
+func chanrecv1(c *hchan, elem unsafe.Pointer) {
+ chanrecv(c, elem, true)
+}
+
+//go:nosplit
+func chanrecv2(c *hchan, elem unsafe.Pointer) (received bool) {
+ _, received = chanrecv(c, elem, true)
+ return
+}
+
+// chanrecv receives on channel c and writes the received data to ep.
+// ep may be nil, in which case received data is ignored.
+// If block == false and no elements are available, returns (false, false).
+// Otherwise, if c is closed, zeros *ep and returns (true, false).
+// Otherwise, fills in *ep with an element and returns (true, true).
+// A non-nil ep must point to the heap or the caller's stack.
+func chanrecv(c *hchan, ep unsafe.Pointer, block bool) (selected, received bool) {
+ // raceenabled: don't need to check ep, as it is always on the stack
+ // or is new memory allocated by reflect.
+
+ if debugChan {
+ print("chanrecv: chan=", c, "\n")
+ }
+
+ if c == nil {
+ if !block {
+ return
+ }
+ gopark(nil, nil, waitReasonChanReceiveNilChan, traceBlockForever, 2)
+ throw("unreachable")
+ }
+
+ // Fast path: check for failed non-blocking operation without acquiring the lock.
+ if !block && empty(c) {
+ // After observing that the channel is not ready for receiving, we observe whether the
+ // channel is closed.
+ //
+ // Reordering of these checks could lead to incorrect behavior when racing with a close.
+ // For example, if the channel was open and not empty, was closed, and then drained,
+ // reordered reads could incorrectly indicate "open and empty". To prevent reordering,
+ // we use atomic loads for both checks, and rely on emptying and closing to happen in
+ // separate critical sections under the same lock. This assumption fails when closing
+ // an unbuffered channel with a blocked send, but that is an error condition anyway.
+ if atomic.Load(&c.closed) == 0 {
+ // Because a channel cannot be reopened, the later observation of the channel
+ // being not closed implies that it was also not closed at the moment of the
+ // first observation. We behave as if we observed the channel at that moment
+ // and report that the receive cannot proceed.
+ return
+ }
+ // The channel is irreversibly closed. Re-check whether the channel has any pending data
+ // to receive, which could have arrived between the empty and closed checks above.
+ // Sequential consistency is also required here, when racing with such a send.
+ if empty(c) {
+ // The channel is irreversibly closed and empty.
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ if ep != nil {
+ typedmemclr(c.elemtype, ep)
+ }
+ return true, false
+ }
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ lock(&c.lock)
+
+ if c.closed != 0 {
+ if c.qcount == 0 {
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ unlock(&c.lock)
+ if ep != nil {
+ typedmemclr(c.elemtype, ep)
+ }
+ return true, false
+ }
+ // The channel has been closed, but its buffer still has data.
+ } else {
+ // The channel is not closed; look for a waiting sender.
+ if sg := c.sendq.dequeue(); sg != nil {
+ // Found a waiting sender. If buffer is size 0, receive value
+ // directly from sender. Otherwise, receive from head of queue
+ // and add sender's value to the tail of the queue (both map to
+ // the same buffer slot because the queue is full).
+ recv(c, sg, ep, func() { unlock(&c.lock) }, 3)
+ return true, true
+ }
+ }
+
+ if c.qcount > 0 {
+ // Receive directly from queue
+ qp := chanbuf(c, c.recvx)
+ if raceenabled {
+ racenotify(c, c.recvx, nil)
+ }
+ if ep != nil {
+ typedmemmove(c.elemtype, ep, qp)
+ }
+ typedmemclr(c.elemtype, qp)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.qcount--
+ unlock(&c.lock)
+ return true, true
+ }
+
+ if !block {
+ unlock(&c.lock)
+ return false, false
+ }
+
+ // no sender available: block on this channel.
+ gp := getg()
+ mysg := acquireSudog()
+ mysg.releasetime = 0
+ if t0 != 0 {
+ mysg.releasetime = -1
+ }
+ // No stack splits between assigning elem and enqueuing mysg
+ // on gp.waiting where copystack can find it.
+ mysg.elem = ep
+ mysg.waitlink = nil
+ gp.waiting = mysg
+ mysg.g = gp
+ mysg.isSelect = false
+ mysg.c = c
+ gp.param = nil
+ c.recvq.enqueue(mysg)
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ gp.parkingOnChan.Store(true)
+ gopark(chanparkcommit, unsafe.Pointer(&c.lock), waitReasonChanReceive, traceBlockChanRecv, 2)
+
+ // someone woke us up
+ if mysg != gp.waiting {
+ throw("G waiting list is corrupted")
+ }
+ gp.waiting = nil
+ gp.activeStackChans = false
+ if mysg.releasetime > 0 {
+ blockevent(mysg.releasetime-t0, 2)
+ }
+ success := mysg.success
+ gp.param = nil
+ mysg.c = nil
+ releaseSudog(mysg)
+ return true, success
+}
+
+// recv processes a receive operation on a full channel c.
+// There are 2 parts:
+// 1. The value sent by the sender sg is put into the channel
+// and the sender is woken up to go on its merry way.
+// 2. The value received by the receiver (the current G) is
+// written to ep.
+//
+// For synchronous channels, both values are the same.
+// For asynchronous channels, the receiver gets its data from
+// the channel buffer and the sender's data is put in the
+// channel buffer.
+// Channel c must be full and locked. recv unlocks c with unlockf.
+// sg must already be dequeued from c.
+// A non-nil ep must point to the heap or the caller's stack.
+func recv(c *hchan, sg *sudog, ep unsafe.Pointer, unlockf func(), skip int) {
+ if c.dataqsiz == 0 {
+ if raceenabled {
+ racesync(c, sg)
+ }
+ if ep != nil {
+ // copy data from sender
+ recvDirect(c.elemtype, sg, ep)
+ }
+ } else {
+ // Queue is full. Take the item at the
+ // head of the queue. Make the sender enqueue
+ // its item at the tail of the queue. Since the
+ // queue is full, those are both the same slot.
+ qp := chanbuf(c, c.recvx)
+ if raceenabled {
+ racenotify(c, c.recvx, nil)
+ racenotify(c, c.recvx, sg)
+ }
+ // copy data from queue to receiver
+ if ep != nil {
+ typedmemmove(c.elemtype, ep, qp)
+ }
+ // copy data from sender to queue
+ typedmemmove(c.elemtype, qp, sg.elem)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.sendx = c.recvx // c.sendx = (c.sendx+1) % c.dataqsiz
+ }
+ sg.elem = nil
+ gp := sg.g
+ unlockf()
+ gp.param = unsafe.Pointer(sg)
+ sg.success = true
+ if sg.releasetime != 0 {
+ sg.releasetime = cputicks()
+ }
+ goready(gp, skip+1)
+}
+
+func chanparkcommit(gp *g, chanLock unsafe.Pointer) bool {
+ // There are unlocked sudogs that point into gp's stack. Stack
+ // copying must lock the channels of those sudogs.
+ // Set activeStackChans here instead of before we try parking
+ // because we could self-deadlock in stack growth on the
+ // channel lock.
+ gp.activeStackChans = true
+ // Mark that it's safe for stack shrinking to occur now,
+ // because any thread acquiring this G's stack for shrinking
+ // is guaranteed to observe activeStackChans after this store.
+ gp.parkingOnChan.Store(false)
+ // Make sure we unlock after setting activeStackChans and
+ // unsetting parkingOnChan. The moment we unlock chanLock
+ // we risk gp getting readied by a channel operation and
+ // so gp could continue running before everything before
+ // the unlock is visible (even to gp itself).
+ unlock((*mutex)(chanLock))
+ return true
+}
+
+// compiler implements
+//
+// select {
+// case c <- v:
+// ... foo
+// default:
+// ... bar
+// }
+//
+// as
+//
+// if selectnbsend(c, v) {
+// ... foo
+// } else {
+// ... bar
+// }
+func selectnbsend(c *hchan, elem unsafe.Pointer) (selected bool) {
+ return chansend(c, elem, false, getcallerpc())
+}
+
+// compiler implements
+//
+// select {
+// case v, ok = <-c:
+// ... foo
+// default:
+// ... bar
+// }
+//
+// as
+//
+// if selected, ok = selectnbrecv(&v, c); selected {
+// ... foo
+// } else {
+// ... bar
+// }
+func selectnbrecv(elem unsafe.Pointer, c *hchan) (selected, received bool) {
+ return chanrecv(c, elem, false)
+}
+
+//go:linkname reflect_chansend reflect.chansend0
+func reflect_chansend(c *hchan, elem unsafe.Pointer, nb bool) (selected bool) {
+ return chansend(c, elem, !nb, getcallerpc())
+}
+
+//go:linkname reflect_chanrecv reflect.chanrecv
+func reflect_chanrecv(c *hchan, nb bool, elem unsafe.Pointer) (selected bool, received bool) {
+ return chanrecv(c, elem, !nb)
+}
+
+//go:linkname reflect_chanlen reflect.chanlen
+func reflect_chanlen(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.qcount)
+}
+
+//go:linkname reflectlite_chanlen internal/reflectlite.chanlen
+func reflectlite_chanlen(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.qcount)
+}
+
+//go:linkname reflect_chancap reflect.chancap
+func reflect_chancap(c *hchan) int {
+ if c == nil {
+ return 0
+ }
+ return int(c.dataqsiz)
+}
+
+//go:linkname reflect_chanclose reflect.chanclose
+func reflect_chanclose(c *hchan) {
+ closechan(c)
+}
+
+func (q *waitq) enqueue(sgp *sudog) {
+ sgp.next = nil
+ x := q.last
+ if x == nil {
+ sgp.prev = nil
+ q.first = sgp
+ q.last = sgp
+ return
+ }
+ sgp.prev = x
+ x.next = sgp
+ q.last = sgp
+}
+
+func (q *waitq) dequeue() *sudog {
+ for {
+ sgp := q.first
+ if sgp == nil {
+ return nil
+ }
+ y := sgp.next
+ if y == nil {
+ q.first = nil
+ q.last = nil
+ } else {
+ y.prev = nil
+ q.first = y
+ sgp.next = nil // mark as removed (see dequeueSudoG)
+ }
+
+ // if a goroutine was put on this queue because of a
+ // select, there is a small window between the goroutine
+ // being woken up by a different case and it grabbing the
+ // channel locks. Once it has the lock
+ // it removes itself from the queue, so we won't see it after that.
+ // We use a flag in the G struct to tell us when someone
+ // else has won the race to signal this goroutine but the goroutine
+ // hasn't removed itself from the queue yet.
+ if sgp.isSelect && !sgp.g.selectDone.CompareAndSwap(0, 1) {
+ continue
+ }
+
+ return sgp
+ }
+}
+
+func (c *hchan) raceaddr() unsafe.Pointer {
+ // Treat read-like and write-like operations on the channel as
+ // happening at this address. Avoid using the address of qcount
+ // or dataqsiz, because the len() and cap() builtins read
+ // those addresses, and we don't want them racing with
+ // operations like close().
+ return unsafe.Pointer(&c.buf)
+}
+
+func racesync(c *hchan, sg *sudog) {
+ racerelease(chanbuf(c, 0))
+ raceacquireg(sg.g, chanbuf(c, 0))
+ racereleaseg(sg.g, chanbuf(c, 0))
+ raceacquire(chanbuf(c, 0))
+}
+
+// Notify the race detector of a send or receive involving buffer entry idx
+// and a channel c or its communicating partner sg.
+// This function handles the special case of c.elemsize==0.
+func racenotify(c *hchan, idx uint, sg *sudog) {
+ // We could have passed the unsafe.Pointer corresponding to entry idx
+ // instead of idx itself. However, in a future version of this function,
+ // we can use idx to better handle the case of elemsize==0.
+ // A future improvement to the detector is to call TSan with c and idx:
+ // this way, Go will continue to not allocate buffer entries for channels
+ // of elemsize==0, yet the race detector can be made to handle multiple
+ // sync objects underneath the hood (one sync object per idx).
+ qp := chanbuf(c, idx)
+ // When elemsize==0, we don't allocate a full buffer for the channel.
+ // Instead of individual buffer entries, the race detector uses the
+ // c.buf as the only buffer entry. This simplification prevents us from
+ // following the memory model's happens-before rules (rules that are
+ // implemented in racereleaseacquire). Instead, we accumulate happens-before
+ // information in the synchronization object associated with c.buf.
+ if c.elemsize == 0 {
+ if sg == nil {
+ raceacquire(qp)
+ racerelease(qp)
+ } else {
+ raceacquireg(sg.g, qp)
+ racereleaseg(sg.g, qp)
+ }
+ } else {
+ if sg == nil {
+ racereleaseacquire(qp)
+ } else {
+ racereleaseacquireg(sg.g, qp)
+ }
+ }
+}
diff --git a/src/runtime/chan_test.go b/src/runtime/chan_test.go
new file mode 100644
index 0000000..256f976
--- /dev/null
+++ b/src/runtime/chan_test.go
@@ -0,0 +1,1221 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "math"
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+)
+
+func TestChan(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+ N := 200
+ if testing.Short() {
+ N = 20
+ }
+ for chanCap := 0; chanCap < N; chanCap++ {
+ {
+ // Ensure that receive from empty chan blocks.
+ c := make(chan int, chanCap)
+ recv1 := false
+ go func() {
+ _ = <-c
+ recv1 = true
+ }()
+ recv2 := false
+ go func() {
+ _, _ = <-c
+ recv2 = true
+ }()
+ time.Sleep(time.Millisecond)
+ if recv1 || recv2 {
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ }
+ // Ensure that non-blocking receive does not block.
+ select {
+ case _ = <-c:
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ default:
+ }
+ select {
+ case _, _ = <-c:
+ t.Fatalf("chan[%d]: receive from empty chan", chanCap)
+ default:
+ }
+ c <- 0
+ c <- 0
+ }
+
+ {
+ // Ensure that send to full chan blocks.
+ c := make(chan int, chanCap)
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ sent := uint32(0)
+ go func() {
+ c <- 0
+ atomic.StoreUint32(&sent, 1)
+ }()
+ time.Sleep(time.Millisecond)
+ if atomic.LoadUint32(&sent) != 0 {
+ t.Fatalf("chan[%d]: send to full chan", chanCap)
+ }
+ // Ensure that non-blocking send does not block.
+ select {
+ case c <- 0:
+ t.Fatalf("chan[%d]: send to full chan", chanCap)
+ default:
+ }
+ <-c
+ }
+
+ {
+ // Ensure that we receive 0 from closed chan.
+ c := make(chan int, chanCap)
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ close(c)
+ for i := 0; i < chanCap; i++ {
+ v := <-c
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+ if v := <-c; v != 0 {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, 0)
+ }
+ if v, ok := <-c; v != 0 || ok {
+ t.Fatalf("chan[%d]: received %v/%v, expected %v/%v", chanCap, v, ok, 0, false)
+ }
+ }
+
+ {
+ // Ensure that close unblocks receive.
+ c := make(chan int, chanCap)
+ done := make(chan bool)
+ go func() {
+ v, ok := <-c
+ done <- v == 0 && ok == false
+ }()
+ time.Sleep(time.Millisecond)
+ close(c)
+ if !<-done {
+ t.Fatalf("chan[%d]: received non zero from closed chan", chanCap)
+ }
+ }
+
+ {
+ // Send 100 integers,
+ // ensure that we receive them non-corrupted in FIFO order.
+ c := make(chan int, chanCap)
+ go func() {
+ for i := 0; i < 100; i++ {
+ c <- i
+ }
+ }()
+ for i := 0; i < 100; i++ {
+ v := <-c
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+
+ // Same, but using recv2.
+ go func() {
+ for i := 0; i < 100; i++ {
+ c <- i
+ }
+ }()
+ for i := 0; i < 100; i++ {
+ v, ok := <-c
+ if !ok {
+ t.Fatalf("chan[%d]: receive failed, expected %v", chanCap, i)
+ }
+ if v != i {
+ t.Fatalf("chan[%d]: received %v, expected %v", chanCap, v, i)
+ }
+ }
+
+ // Send 1000 integers in 4 goroutines,
+ // ensure that we receive what we send.
+ const P = 4
+ const L = 1000
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < L; i++ {
+ c <- i
+ }
+ }()
+ }
+ done := make(chan map[int]int)
+ for p := 0; p < P; p++ {
+ go func() {
+ recv := make(map[int]int)
+ for i := 0; i < L; i++ {
+ v := <-c
+ recv[v] = recv[v] + 1
+ }
+ done <- recv
+ }()
+ }
+ recv := make(map[int]int)
+ for p := 0; p < P; p++ {
+ for k, v := range <-done {
+ recv[k] = recv[k] + v
+ }
+ }
+ if len(recv) != L {
+ t.Fatalf("chan[%d]: received %v values, expected %v", chanCap, len(recv), L)
+ }
+ for _, v := range recv {
+ if v != P {
+ t.Fatalf("chan[%d]: received %v values, expected %v", chanCap, v, P)
+ }
+ }
+ }
+
+ {
+ // Test len/cap.
+ c := make(chan int, chanCap)
+ if len(c) != 0 || cap(c) != chanCap {
+ t.Fatalf("chan[%d]: bad len/cap, expect %v/%v, got %v/%v", chanCap, 0, chanCap, len(c), cap(c))
+ }
+ for i := 0; i < chanCap; i++ {
+ c <- i
+ }
+ if len(c) != chanCap || cap(c) != chanCap {
+ t.Fatalf("chan[%d]: bad len/cap, expect %v/%v, got %v/%v", chanCap, chanCap, chanCap, len(c), cap(c))
+ }
+ }
+
+ }
+}
+
+func TestNonblockRecvRace(t *testing.T) {
+ n := 10000
+ if testing.Short() {
+ n = 100
+ }
+ for i := 0; i < n; i++ {
+ c := make(chan int, 1)
+ c <- 1
+ go func() {
+ select {
+ case <-c:
+ default:
+ t.Error("chan is not ready")
+ }
+ }()
+ close(c)
+ <-c
+ if t.Failed() {
+ return
+ }
+ }
+}
+
+// This test checks that select acts on the state of the channels at one
+// moment in the execution, not over a smeared time window.
+// In the test, one goroutine does:
+//
+// create c1, c2
+// make c1 ready for receiving
+// create second goroutine
+// make c2 ready for receiving
+// make c1 no longer ready for receiving (if possible)
+//
+// The second goroutine does a non-blocking select receiving from c1 and c2.
+// From the time the second goroutine is created, at least one of c1 and c2
+// is always ready for receiving, so the select in the second goroutine must
+// always receive from one or the other. It must never execute the default case.
+func TestNonblockSelectRace(t *testing.T) {
+ n := 100000
+ if testing.Short() {
+ n = 1000
+ }
+ done := make(chan bool, 1)
+ for i := 0; i < n; i++ {
+ c1 := make(chan int, 1)
+ c2 := make(chan int, 1)
+ c1 <- 1
+ go func() {
+ select {
+ case <-c1:
+ case <-c2:
+ default:
+ done <- false
+ return
+ }
+ done <- true
+ }()
+ c2 <- 1
+ select {
+ case <-c1:
+ default:
+ }
+ if !<-done {
+ t.Fatal("no chan is ready")
+ }
+ }
+}
+
+// Same as TestNonblockSelectRace, but close(c2) replaces c2 <- 1.
+func TestNonblockSelectRace2(t *testing.T) {
+ n := 100000
+ if testing.Short() {
+ n = 1000
+ }
+ done := make(chan bool, 1)
+ for i := 0; i < n; i++ {
+ c1 := make(chan int, 1)
+ c2 := make(chan int)
+ c1 <- 1
+ go func() {
+ select {
+ case <-c1:
+ case <-c2:
+ default:
+ done <- false
+ return
+ }
+ done <- true
+ }()
+ close(c2)
+ select {
+ case <-c1:
+ default:
+ }
+ if !<-done {
+ t.Fatal("no chan is ready")
+ }
+ }
+}
+
+func TestSelfSelect(t *testing.T) {
+ // Ensure that send/recv on the same chan in select
+ // does not crash nor deadlock.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ for _, chanCap := range []int{0, 10} {
+ var wg sync.WaitGroup
+ wg.Add(2)
+ c := make(chan int, chanCap)
+ for p := 0; p < 2; p++ {
+ p := p
+ go func() {
+ defer wg.Done()
+ for i := 0; i < 1000; i++ {
+ if p == 0 || i%2 == 0 {
+ select {
+ case c <- p:
+ case v := <-c:
+ if chanCap == 0 && v == p {
+ t.Errorf("self receive")
+ return
+ }
+ }
+ } else {
+ select {
+ case v := <-c:
+ if chanCap == 0 && v == p {
+ t.Errorf("self receive")
+ return
+ }
+ case c <- p:
+ }
+ }
+ }
+ }()
+ }
+ wg.Wait()
+ }
+}
+
+func TestSelectStress(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(10))
+ var c [4]chan int
+ c[0] = make(chan int)
+ c[1] = make(chan int)
+ c[2] = make(chan int, 2)
+ c[3] = make(chan int, 3)
+ N := int(1e5)
+ if testing.Short() {
+ N /= 10
+ }
+ // There are 4 goroutines that send N values on each of the chans,
+ // + 4 goroutines that receive N values on each of the chans,
+ // + 1 goroutine that sends N values on each of the chans in a single select,
+ // + 1 goroutine that receives N values on each of the chans in a single select.
+ // All these sends, receives and selects interact chaotically at runtime,
+ // but we are careful that this whole construct does not deadlock.
+ var wg sync.WaitGroup
+ wg.Add(10)
+ for k := 0; k < 4; k++ {
+ k := k
+ go func() {
+ for i := 0; i < N; i++ {
+ c[k] <- 0
+ }
+ wg.Done()
+ }()
+ go func() {
+ for i := 0; i < N; i++ {
+ <-c[k]
+ }
+ wg.Done()
+ }()
+ }
+ go func() {
+ var n [4]int
+ c1 := c
+ for i := 0; i < 4*N; i++ {
+ select {
+ case c1[3] <- 0:
+ n[3]++
+ if n[3] == N {
+ c1[3] = nil
+ }
+ case c1[2] <- 0:
+ n[2]++
+ if n[2] == N {
+ c1[2] = nil
+ }
+ case c1[0] <- 0:
+ n[0]++
+ if n[0] == N {
+ c1[0] = nil
+ }
+ case c1[1] <- 0:
+ n[1]++
+ if n[1] == N {
+ c1[1] = nil
+ }
+ }
+ }
+ wg.Done()
+ }()
+ go func() {
+ var n [4]int
+ c1 := c
+ for i := 0; i < 4*N; i++ {
+ select {
+ case <-c1[0]:
+ n[0]++
+ if n[0] == N {
+ c1[0] = nil
+ }
+ case <-c1[1]:
+ n[1]++
+ if n[1] == N {
+ c1[1] = nil
+ }
+ case <-c1[2]:
+ n[2]++
+ if n[2] == N {
+ c1[2] = nil
+ }
+ case <-c1[3]:
+ n[3]++
+ if n[3] == N {
+ c1[3] = nil
+ }
+ }
+ }
+ wg.Done()
+ }()
+ wg.Wait()
+}
+
+func TestSelectFairness(t *testing.T) {
+ const trials = 10000
+ if runtime.GOOS == "linux" && runtime.GOARCH == "ppc64le" {
+ testenv.SkipFlaky(t, 22047)
+ }
+ c1 := make(chan byte, trials+1)
+ c2 := make(chan byte, trials+1)
+ for i := 0; i < trials+1; i++ {
+ c1 <- 1
+ c2 <- 2
+ }
+ c3 := make(chan byte)
+ c4 := make(chan byte)
+ out := make(chan byte)
+ done := make(chan byte)
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ var b byte
+ select {
+ case b = <-c3:
+ case b = <-c4:
+ case b = <-c1:
+ case b = <-c2:
+ }
+ select {
+ case out <- b:
+ case <-done:
+ return
+ }
+ }
+ }()
+ cnt1, cnt2 := 0, 0
+ for i := 0; i < trials; i++ {
+ switch b := <-out; b {
+ case 1:
+ cnt1++
+ case 2:
+ cnt2++
+ default:
+ t.Fatalf("unexpected value %d on channel", b)
+ }
+ }
+ // If the select in the goroutine is fair,
+ // cnt1 and cnt2 should be about the same value.
+ // With 10,000 trials, the expected margin of error at
+ // a confidence level of six nines is 4.891676 / (2 * Sqrt(10000)).
+ r := float64(cnt1) / trials
+ e := math.Abs(r - 0.5)
+ t.Log(cnt1, cnt2, r, e)
+ if e > 4.891676/(2*math.Sqrt(trials)) {
+ t.Errorf("unfair select: in %d trials, results were %d, %d", trials, cnt1, cnt2)
+ }
+ close(done)
+ wg.Wait()
+}
+
+func TestChanSendInterface(t *testing.T) {
+ type mt struct{}
+ m := &mt{}
+ c := make(chan any, 1)
+ c <- m
+ select {
+ case c <- m:
+ default:
+ }
+ select {
+ case c <- m:
+ case c <- &mt{}:
+ default:
+ }
+}
+
+func TestPseudoRandomSend(t *testing.T) {
+ n := 100
+ for _, chanCap := range []int{0, n} {
+ c := make(chan int, chanCap)
+ l := make([]int, n)
+ var m sync.Mutex
+ m.Lock()
+ go func() {
+ for i := 0; i < n; i++ {
+ runtime.Gosched()
+ l[i] = <-c
+ }
+ m.Unlock()
+ }()
+ for i := 0; i < n; i++ {
+ select {
+ case c <- 1:
+ case c <- 0:
+ }
+ }
+ m.Lock() // wait
+ n0 := 0
+ n1 := 0
+ for _, i := range l {
+ n0 += (i + 1) % 2
+ n1 += i
+ }
+ if n0 <= n/10 || n1 <= n/10 {
+ t.Errorf("Want pseudorandom, got %d zeros and %d ones (chan cap %d)", n0, n1, chanCap)
+ }
+ }
+}
+
+func TestMultiConsumer(t *testing.T) {
+ const nwork = 23
+ const niter = 271828
+
+ pn := []int{2, 3, 7, 11, 13, 17, 19, 23, 27, 31}
+
+ q := make(chan int, nwork*3)
+ r := make(chan int, nwork*3)
+
+ // workers
+ var wg sync.WaitGroup
+ for i := 0; i < nwork; i++ {
+ wg.Add(1)
+ go func(w int) {
+ for v := range q {
+ // mess with the fifo-ish nature of range
+ if pn[w%len(pn)] == v {
+ runtime.Gosched()
+ }
+ r <- v
+ }
+ wg.Done()
+ }(i)
+ }
+
+ // feeder & closer
+ expect := 0
+ go func() {
+ for i := 0; i < niter; i++ {
+ v := pn[i%len(pn)]
+ expect += v
+ q <- v
+ }
+ close(q) // no more work
+ wg.Wait() // workers done
+ close(r) // ... so there can be no more results
+ }()
+
+ // consume & check
+ n := 0
+ s := 0
+ for v := range r {
+ n++
+ s += v
+ }
+ if n != niter || s != expect {
+ t.Errorf("Expected sum %d (got %d) from %d iter (saw %d)",
+ expect, s, niter, n)
+ }
+}
+
+func TestShrinkStackDuringBlockedSend(t *testing.T) {
+ // make sure that channel operations still work when we are
+ // blocked on a channel send and we shrink the stack.
+ // NOTE: this test probably won't fail unless stack1.go:stackDebug
+ // is set to >= 1.
+ const n = 10
+ c := make(chan int)
+ done := make(chan struct{})
+
+ go func() {
+ for i := 0; i < n; i++ {
+ c <- i
+ // use lots of stack, briefly.
+ stackGrowthRecursive(20)
+ }
+ done <- struct{}{}
+ }()
+
+ for i := 0; i < n; i++ {
+ x := <-c
+ if x != i {
+ t.Errorf("bad channel read: want %d, got %d", i, x)
+ }
+ // Waste some time so sender can finish using lots of stack
+ // and block in channel send.
+ time.Sleep(1 * time.Millisecond)
+ // trigger GC which will shrink the stack of the sender.
+ runtime.GC()
+ }
+ <-done
+}
+
+func TestNoShrinkStackWhileParking(t *testing.T) {
+ if runtime.GOOS == "netbsd" && runtime.GOARCH == "arm64" {
+ testenv.SkipFlaky(t, 49382)
+ }
+ if runtime.GOOS == "openbsd" {
+ testenv.SkipFlaky(t, 51482)
+ }
+
+ // The goal of this test is to trigger a "racy sudog adjustment"
+ // throw. Basically, there's a window between when a goroutine
+ // becomes available for preemption for stack scanning (and thus,
+ // stack shrinking) but before the goroutine has fully parked on a
+ // channel. See issue 40641 for more details on the problem.
+ //
+ // The way we try to induce this failure is to set up two
+ // goroutines: a sender and a receiver that communicate across
+ // a channel. We try to set up a situation where the sender
+ // grows its stack temporarily then *fully* blocks on a channel
+ // often. Meanwhile a GC is triggered so that we try to get a
+ // mark worker to shrink the sender's stack and race with the
+ // sender parking.
+ //
+ // Unfortunately the race window here is so small that we
+ // either need a ridiculous number of iterations, or we add
+ // "usleep(1000)" to park_m, just before the unlockf call.
+ const n = 10
+ send := func(c chan<- int, done chan struct{}) {
+ for i := 0; i < n; i++ {
+ c <- i
+ // Use lots of stack briefly so that
+ // the GC is going to want to shrink us
+ // when it scans us. Make sure not to
+ // do any function calls otherwise
+ // in order to avoid us shrinking ourselves
+ // when we're preempted.
+ stackGrowthRecursive(20)
+ }
+ done <- struct{}{}
+ }
+ recv := func(c <-chan int, done chan struct{}) {
+ for i := 0; i < n; i++ {
+ // Sleep here so that the sender always
+ // fully blocks.
+ time.Sleep(10 * time.Microsecond)
+ <-c
+ }
+ done <- struct{}{}
+ }
+ for i := 0; i < n*20; i++ {
+ c := make(chan int)
+ done := make(chan struct{})
+ go recv(c, done)
+ go send(c, done)
+ // Wait a little bit before triggering
+ // the GC to make sure the sender and
+ // receiver have gotten into their groove.
+ time.Sleep(50 * time.Microsecond)
+ runtime.GC()
+ <-done
+ <-done
+ }
+}
+
+func TestSelectDuplicateChannel(t *testing.T) {
+ // This test makes sure we can queue a G on
+ // the same channel multiple times.
+ c := make(chan int)
+ d := make(chan int)
+ e := make(chan int)
+
+ // goroutine A
+ go func() {
+ select {
+ case <-c:
+ case <-c:
+ case <-d:
+ }
+ e <- 9
+ }()
+ time.Sleep(time.Millisecond) // make sure goroutine A gets queued first on c
+
+ // goroutine B
+ go func() {
+ <-c
+ }()
+ time.Sleep(time.Millisecond) // make sure goroutine B gets queued on c before continuing
+
+ d <- 7 // wake up A, it dequeues itself from c. This operation used to corrupt c.recvq.
+ <-e // A tells us it's done
+ c <- 8 // wake up B. This operation used to fail because c.recvq was corrupted (it tries to wake up an already running G instead of B)
+}
+
+func TestSelectStackAdjust(t *testing.T) {
+ // Test that channel receive slots that contain local stack
+ // pointers are adjusted correctly by stack shrinking.
+ c := make(chan *int)
+ d := make(chan *int)
+ ready1 := make(chan bool)
+ ready2 := make(chan bool)
+
+ f := func(ready chan bool, dup bool) {
+ // Temporarily grow the stack to 10K.
+ stackGrowthRecursive((10 << 10) / (128 * 8))
+
+ // We're ready to trigger GC and stack shrink.
+ ready <- true
+
+ val := 42
+ var cx *int
+ cx = &val
+
+ var c2 chan *int
+ var d2 chan *int
+ if dup {
+ c2 = c
+ d2 = d
+ }
+
+ // Receive from d. cx won't be affected.
+ select {
+ case cx = <-c:
+ case <-c2:
+ case <-d:
+ case <-d2:
+ }
+
+ // Check that pointer in cx was adjusted correctly.
+ if cx != &val {
+ t.Error("cx no longer points to val")
+ } else if val != 42 {
+ t.Error("val changed")
+ } else {
+ *cx = 43
+ if val != 43 {
+ t.Error("changing *cx failed to change val")
+ }
+ }
+ ready <- true
+ }
+
+ go f(ready1, false)
+ go f(ready2, true)
+
+ // Let the goroutines get into the select.
+ <-ready1
+ <-ready2
+ time.Sleep(10 * time.Millisecond)
+
+ // Force concurrent GC to shrink the stacks.
+ runtime.GC()
+
+ // Wake selects.
+ close(d)
+ <-ready1
+ <-ready2
+}
+
+type struct0 struct{}
+
+func BenchmarkMakeChan(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x chan byte
+ for i := 0; i < b.N; i++ {
+ x = make(chan byte, 8)
+ }
+ close(x)
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x chan int
+ for i := 0; i < b.N; i++ {
+ x = make(chan int, 8)
+ }
+ close(x)
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x chan *byte
+ for i := 0; i < b.N; i++ {
+ x = make(chan *byte, 8)
+ }
+ close(x)
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("0", func(b *testing.B) {
+ var x chan struct0
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct0, 8)
+ }
+ close(x)
+ })
+ b.Run("32", func(b *testing.B) {
+ var x chan struct32
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct32, 8)
+ }
+ close(x)
+ })
+ b.Run("40", func(b *testing.B) {
+ var x chan struct40
+ for i := 0; i < b.N; i++ {
+ x = make(chan struct40, 8)
+ }
+ close(x)
+ })
+ })
+}
+
+func BenchmarkChanNonblocking(b *testing.B) {
+ myc := make(chan int)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-myc:
+ default:
+ }
+ }
+ })
+}
+
+func BenchmarkSelectUncontended(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ myc1 := make(chan int, 1)
+ myc2 := make(chan int, 1)
+ myc1 <- 0
+ for pb.Next() {
+ select {
+ case <-myc1:
+ myc2 <- 0
+ case <-myc2:
+ myc1 <- 0
+ }
+ }
+ })
+}
+
+func BenchmarkSelectSyncContended(b *testing.B) {
+ myc1 := make(chan int)
+ myc2 := make(chan int)
+ myc3 := make(chan int)
+ done := make(chan int)
+ b.RunParallel(func(pb *testing.PB) {
+ go func() {
+ for {
+ select {
+ case myc1 <- 0:
+ case myc2 <- 0:
+ case myc3 <- 0:
+ case <-done:
+ return
+ }
+ }
+ }()
+ for pb.Next() {
+ select {
+ case <-myc1:
+ case <-myc2:
+ case <-myc3:
+ }
+ }
+ })
+ close(done)
+}
+
+func BenchmarkSelectAsyncContended(b *testing.B) {
+ procs := runtime.GOMAXPROCS(0)
+ myc1 := make(chan int, procs)
+ myc2 := make(chan int, procs)
+ b.RunParallel(func(pb *testing.PB) {
+ myc1 <- 0
+ for pb.Next() {
+ select {
+ case <-myc1:
+ myc2 <- 0
+ case <-myc2:
+ myc1 <- 0
+ }
+ }
+ })
+}
+
+func BenchmarkSelectNonblock(b *testing.B) {
+ myc1 := make(chan int)
+ myc2 := make(chan int)
+ myc3 := make(chan int, 1)
+ myc4 := make(chan int, 1)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-myc1:
+ default:
+ }
+ select {
+ case myc2 <- 0:
+ default:
+ }
+ select {
+ case <-myc3:
+ default:
+ }
+ select {
+ case myc4 <- 0:
+ default:
+ }
+ }
+ })
+}
+
+func BenchmarkChanUncontended(b *testing.B) {
+ const C = 100
+ b.RunParallel(func(pb *testing.PB) {
+ myc := make(chan int, C)
+ for pb.Next() {
+ for i := 0; i < C; i++ {
+ myc <- 0
+ }
+ for i := 0; i < C; i++ {
+ <-myc
+ }
+ }
+ })
+}
+
+func BenchmarkChanContended(b *testing.B) {
+ const C = 100
+ myc := make(chan int, C*runtime.GOMAXPROCS(0))
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ for i := 0; i < C; i++ {
+ myc <- 0
+ }
+ for i := 0; i < C; i++ {
+ <-myc
+ }
+ }
+ })
+}
+
+func benchmarkChanSync(b *testing.B, work int) {
+ const CallsPerSched = 1000
+ procs := 2
+ N := int32(b.N / CallsPerSched / procs * procs)
+ c := make(chan bool, procs)
+ myc := make(chan int)
+ for p := 0; p < procs; p++ {
+ go func() {
+ for {
+ i := atomic.AddInt32(&N, -1)
+ if i < 0 {
+ break
+ }
+ for g := 0; g < CallsPerSched; g++ {
+ if i%2 == 0 {
+ <-myc
+ localWork(work)
+ myc <- 0
+ localWork(work)
+ } else {
+ myc <- 0
+ localWork(work)
+ <-myc
+ localWork(work)
+ }
+ }
+ }
+ c <- true
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ }
+}
+
+func BenchmarkChanSync(b *testing.B) {
+ benchmarkChanSync(b, 0)
+}
+
+func BenchmarkChanSyncWork(b *testing.B) {
+ benchmarkChanSync(b, 1000)
+}
+
+func benchmarkChanProdCons(b *testing.B, chanSize, localWork int) {
+ const CallsPerSched = 1000
+ procs := runtime.GOMAXPROCS(-1)
+ N := int32(b.N / CallsPerSched)
+ c := make(chan bool, 2*procs)
+ myc := make(chan int, chanSize)
+ for p := 0; p < procs; p++ {
+ go func() {
+ foo := 0
+ for atomic.AddInt32(&N, -1) >= 0 {
+ for g := 0; g < CallsPerSched; g++ {
+ for i := 0; i < localWork; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ myc <- 1
+ }
+ }
+ myc <- 0
+ c <- foo == 42
+ }()
+ go func() {
+ foo := 0
+ for {
+ v := <-myc
+ if v == 0 {
+ break
+ }
+ for i := 0; i < localWork; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ }
+ c <- foo == 42
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ <-c
+ }
+}
+
+func BenchmarkChanProdCons0(b *testing.B) {
+ benchmarkChanProdCons(b, 0, 0)
+}
+
+func BenchmarkChanProdCons10(b *testing.B) {
+ benchmarkChanProdCons(b, 10, 0)
+}
+
+func BenchmarkChanProdCons100(b *testing.B) {
+ benchmarkChanProdCons(b, 100, 0)
+}
+
+func BenchmarkChanProdConsWork0(b *testing.B) {
+ benchmarkChanProdCons(b, 0, 100)
+}
+
+func BenchmarkChanProdConsWork10(b *testing.B) {
+ benchmarkChanProdCons(b, 10, 100)
+}
+
+func BenchmarkChanProdConsWork100(b *testing.B) {
+ benchmarkChanProdCons(b, 100, 100)
+}
+
+func BenchmarkSelectProdCons(b *testing.B) {
+ const CallsPerSched = 1000
+ procs := runtime.GOMAXPROCS(-1)
+ N := int32(b.N / CallsPerSched)
+ c := make(chan bool, 2*procs)
+ myc := make(chan int, 128)
+ myclose := make(chan bool)
+ for p := 0; p < procs; p++ {
+ go func() {
+ // Producer: sends to myc.
+ foo := 0
+ // Intended to not fire during benchmarking.
+ mytimer := time.After(time.Hour)
+ for atomic.AddInt32(&N, -1) >= 0 {
+ for g := 0; g < CallsPerSched; g++ {
+ // Model some local work.
+ for i := 0; i < 100; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ select {
+ case myc <- 1:
+ case <-mytimer:
+ case <-myclose:
+ }
+ }
+ }
+ myc <- 0
+ c <- foo == 42
+ }()
+ go func() {
+ // Consumer: receives from myc.
+ foo := 0
+ // Intended to not fire during benchmarking.
+ mytimer := time.After(time.Hour)
+ loop:
+ for {
+ select {
+ case v := <-myc:
+ if v == 0 {
+ break loop
+ }
+ case <-mytimer:
+ case <-myclose:
+ }
+ // Model some local work.
+ for i := 0; i < 100; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ }
+ c <- foo == 42
+ }()
+ }
+ for p := 0; p < procs; p++ {
+ <-c
+ <-c
+ }
+}
+
+func BenchmarkReceiveDataFromClosedChan(b *testing.B) {
+ count := b.N
+ ch := make(chan struct{}, count)
+ for i := 0; i < count; i++ {
+ ch <- struct{}{}
+ }
+ close(ch)
+
+ b.ResetTimer()
+ for range ch {
+ }
+}
+
+func BenchmarkChanCreation(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ myc := make(chan int, 1)
+ myc <- 0
+ <-myc
+ }
+ })
+}
+
+func BenchmarkChanSem(b *testing.B) {
+ type Empty struct{}
+ myc := make(chan Empty, runtime.GOMAXPROCS(0))
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ myc <- Empty{}
+ <-myc
+ }
+ })
+}
+
+func BenchmarkChanPopular(b *testing.B) {
+ const n = 1000
+ c := make(chan bool)
+ var a []chan bool
+ var wg sync.WaitGroup
+ wg.Add(n)
+ for j := 0; j < n; j++ {
+ d := make(chan bool)
+ a = append(a, d)
+ go func() {
+ for i := 0; i < b.N; i++ {
+ select {
+ case <-c:
+ case <-d:
+ }
+ }
+ wg.Done()
+ }()
+ }
+ for i := 0; i < b.N; i++ {
+ for _, d := range a {
+ d <- true
+ }
+ }
+ wg.Wait()
+}
+
+func BenchmarkChanClosed(b *testing.B) {
+ c := make(chan struct{})
+ close(c)
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ select {
+ case <-c:
+ default:
+ b.Error("Unreachable")
+ }
+ }
+ })
+}
+
+var (
+ alwaysFalse = false
+ workSink = 0
+)
+
+func localWork(w int) {
+ foo := 0
+ for i := 0; i < w; i++ {
+ foo /= (foo + 1)
+ }
+ if alwaysFalse {
+ workSink += foo
+ }
+}
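
The threshold used in TestSelectFairness above follows from a normal approximation to the binomial distribution: if select is fair, cnt1 is Binomial(trials, 0.5), the observed ratio has standard error 0.5/sqrt(trials), and 4.891676 is approximately the two-sided z-value for a confidence level of six nines, giving an allowed deviation of 4.891676/(2*sqrt(10000)) ≈ 0.0245. A small self-contained sketch of that arithmetic, using the same constants as the test:

package main

import (
	"fmt"
	"math"
)

func main() {
	const trials = 10000
	const z = 4.891676                 // ~two-sided z-value for a 0.999999 confidence level
	se := 0.5 / math.Sqrt(trials)      // standard error of cnt1/trials when p = 0.5
	margin := z * se                   // same bound as the test: 4.891676/(2*Sqrt(10000))
	lo := int((0.5 - margin) * trials) // roughly the smallest cnt1 the test accepts
	hi := int((0.5 + margin) * trials)
	fmt.Printf("allowed deviation %.4f, i.e. cnt1 in about [%d, %d]\n", margin, lo, hi)
}
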
diff --git a/src/runtime/chanbarrier_test.go b/src/runtime/chanbarrier_test.go
new file mode 100644
index 0000000..d479574
--- /dev/null
+++ b/src/runtime/chanbarrier_test.go
@@ -0,0 +1,83 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+)
+
+type response struct {
+}
+
+type myError struct {
+}
+
+func (myError) Error() string { return "" }
+
+func doRequest(useSelect bool) (*response, error) {
+ type async struct {
+ resp *response
+ err error
+ }
+ ch := make(chan *async, 0)
+ done := make(chan struct{}, 0)
+
+ if useSelect {
+ go func() {
+ select {
+ case ch <- &async{resp: nil, err: myError{}}:
+ case <-done:
+ }
+ }()
+ } else {
+ go func() {
+ ch <- &async{resp: nil, err: myError{}}
+ }()
+ }
+
+ r := <-ch
+ runtime.Gosched()
+ return r.resp, r.err
+}
+
+func TestChanSendSelectBarrier(t *testing.T) {
+ testChanSendBarrier(true)
+}
+
+func TestChanSendBarrier(t *testing.T) {
+ testChanSendBarrier(false)
+}
+
+func testChanSendBarrier(useSelect bool) {
+ var wg sync.WaitGroup
+ var globalMu sync.Mutex
+ outer := 100
+ inner := 100000
+ if testing.Short() || runtime.GOARCH == "wasm" {
+ outer = 10
+ inner = 1000
+ }
+ for i := 0; i < outer; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ var garbage []byte
+ for j := 0; j < inner; j++ {
+ _, err := doRequest(useSelect)
+ _, ok := err.(myError)
+ if !ok {
+ panic(1)
+ }
+ garbage = make([]byte, 1<<10)
+ }
+ globalMu.Lock()
+ global = garbage
+ globalMu.Unlock()
+ }()
+ }
+ wg.Wait()
+}
diff --git a/src/runtime/checkptr.go b/src/runtime/checkptr.go
new file mode 100644
index 0000000..3c49645
--- /dev/null
+++ b/src/runtime/checkptr.go
@@ -0,0 +1,109 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func checkptrAlignment(p unsafe.Pointer, elem *_type, n uintptr) {
+ // nil pointer is always suitably aligned (#47430).
+ if p == nil {
+ return
+ }
+
+ // Check that (*[n]elem)(p) is appropriately aligned.
+ // Note that we allow unaligned pointers if the types they point to contain
+ // no pointers themselves. See issue 37298.
+ // TODO(mdempsky): What about fieldAlign?
+ if elem.PtrBytes != 0 && uintptr(p)&(uintptr(elem.Align_)-1) != 0 {
+ throw("checkptr: misaligned pointer conversion")
+ }
+
+ // Check that (*[n]elem)(p) doesn't straddle multiple heap objects.
+ // TODO(mdempsky): Fix #46938 so we don't need to worry about overflow here.
+ if checkptrStraddles(p, n*elem.Size_) {
+ throw("checkptr: converted pointer straddles multiple allocations")
+ }
+}
+
+// checkptrStraddles reports whether the first size bytes of memory
+// addressed by ptr are known to straddle more than one Go allocation.
+func checkptrStraddles(ptr unsafe.Pointer, size uintptr) bool {
+ if size <= 1 {
+ return false
+ }
+
+ // Check that add(ptr, size-1) won't overflow. This avoids the risk
+ // of producing an illegal pointer value (assuming ptr is legal).
+ if uintptr(ptr) >= -(size - 1) {
+ return true
+ }
+ end := add(ptr, size-1)
+
+ // TODO(mdempsky): Detect when [ptr, end] contains Go allocations,
+ // but neither ptr nor end point into one themselves.
+
+ return checkptrBase(ptr) != checkptrBase(end)
+}
+
+func checkptrArithmetic(p unsafe.Pointer, originals []unsafe.Pointer) {
+ if 0 < uintptr(p) && uintptr(p) < minLegalPointer {
+ throw("checkptr: pointer arithmetic computed bad pointer value")
+ }
+
+ // Check that if the computed pointer p points into a heap
+ // object, then one of the original pointers must have pointed
+ // into the same object.
+ base := checkptrBase(p)
+ if base == 0 {
+ return
+ }
+
+ for _, original := range originals {
+ if base == checkptrBase(original) {
+ return
+ }
+ }
+
+ throw("checkptr: pointer arithmetic result points to invalid allocation")
+}
+
+// checkptrBase returns the base address for the allocation containing
+// the address p.
+//
+// Importantly, if p1 and p2 point into the same variable, then
+// checkptrBase(p1) == checkptrBase(p2). However, the converse/inverse
+// is not necessarily true as allocations can have trailing padding,
+// and multiple variables may be packed into a single allocation.
+func checkptrBase(p unsafe.Pointer) uintptr {
+ // stack
+ if gp := getg(); gp.stack.lo <= uintptr(p) && uintptr(p) < gp.stack.hi {
+ // TODO(mdempsky): Walk the stack to identify the
+ // specific stack frame or even stack object that p
+ // points into.
+ //
+ // In the meantime, use "1" as a pseudo-address to
+ // represent the stack. This is an invalid address on
+ // all platforms, so it's guaranteed to be distinct
+ // from any of the addresses we might return below.
+ return 1
+ }
+
+ // heap (must check after stack because of #35068)
+ if base, _, _ := findObject(uintptr(p), 0, 0); base != 0 {
+ return base
+ }
+
+ // data or bss
+ for _, datap := range activeModules() {
+ if datap.data <= uintptr(p) && uintptr(p) < datap.edata {
+ return datap.data
+ }
+ if datap.bss <= uintptr(p) && uintptr(p) < datap.ebss {
+ return datap.bss
+ }
+ }
+
+ return 0
+}
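
For orientation, the checks above fire in programs compiled with the checkptr instrumentation (enabled explicitly with -gcflags=all=-d=checkptr=1, and by default under -race and -msan). A hypothetical stand-alone program that trips checkptrAlignment, assuming it is built with that flag; the names here are illustrative:

package main

import "unsafe"

var sink **int64 // package-level sink so the conversion is not optimized away

func main() {
	var x [2]int64
	p := unsafe.Pointer(&x[0])
	// **int64 contains a pointer and the address below is odd, so an
	// instrumented build dies with
	//   fatal error: checkptr: misaligned pointer conversion
	sink = (**int64)(unsafe.Pointer(uintptr(p) + 1))
}
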
diff --git a/src/runtime/checkptr_test.go b/src/runtime/checkptr_test.go
new file mode 100644
index 0000000..811c0f0
--- /dev/null
+++ b/src/runtime/checkptr_test.go
@@ -0,0 +1,108 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "os/exec"
+ "strings"
+ "testing"
+)
+
+func TestCheckPtr(t *testing.T) {
+ // This test requires rebuilding packages with -d=checkptr=1,
+ // so it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ t.Parallel()
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog", "-gcflags=all=-d=checkptr=1")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ testCases := []struct {
+ cmd string
+ want string
+ }{
+ {"CheckPtrAlignmentPtr", "fatal error: checkptr: misaligned pointer conversion\n"},
+ {"CheckPtrAlignmentNoPtr", ""},
+ {"CheckPtrAlignmentNilPtr", ""},
+ {"CheckPtrArithmetic", "fatal error: checkptr: pointer arithmetic result points to invalid allocation\n"},
+ {"CheckPtrArithmetic2", "fatal error: checkptr: pointer arithmetic result points to invalid allocation\n"},
+ {"CheckPtrSize", "fatal error: checkptr: converted pointer straddles multiple allocations\n"},
+ {"CheckPtrSmall", "fatal error: checkptr: pointer arithmetic computed bad pointer value\n"},
+ {"CheckPtrSliceOK", ""},
+ {"CheckPtrSliceFail", "fatal error: checkptr: unsafe.Slice result straddles multiple allocations\n"},
+ {"CheckPtrStringOK", ""},
+ {"CheckPtrStringFail", "fatal error: checkptr: unsafe.String result straddles multiple allocations\n"},
+ }
+
+ for _, tc := range testCases {
+ tc := tc
+ t.Run(tc.cmd, func(t *testing.T) {
+ t.Parallel()
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, tc.cmd)).CombinedOutput()
+ if err != nil {
+ t.Log(err)
+ }
+ if tc.want == "" {
+ if len(got) > 0 {
+ t.Errorf("output:\n%s\nwant no output", got)
+ }
+ return
+ }
+ if !strings.HasPrefix(string(got), tc.want) {
+ t.Errorf("output:\n%s\n\nwant output starting with: %s", got, tc.want)
+ }
+ })
+ }
+}
+
+func TestCheckPtr2(t *testing.T) {
+ // This test requires rebuilding packages with -d=checkptr=2,
+ // so it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ t.Parallel()
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog", "-gcflags=all=-d=checkptr=2")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ testCases := []struct {
+ cmd string
+ want string
+ }{
+ {"CheckPtrAlignmentNested", "fatal error: checkptr: converted pointer straddles multiple allocations\n"},
+ }
+
+ for _, tc := range testCases {
+ tc := tc
+ t.Run(tc.cmd, func(t *testing.T) {
+ t.Parallel()
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, tc.cmd)).CombinedOutput()
+ if err != nil {
+ t.Log(err)
+ }
+ if tc.want == "" {
+ if len(got) > 0 {
+ t.Errorf("output:\n%s\nwant no output", got)
+ }
+ return
+ }
+ if !strings.HasPrefix(string(got), tc.want) {
+ t.Errorf("output:\n%s\n\nwant output starting with: %s", got, tc.want)
+ }
+ })
+ }
+}
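
The "straddles multiple allocations" failures listed in the tables above can also be reproduced outside the runtime tests. A hypothetical stand-alone program modeled on the CheckPtrSliceFail case, again assuming a build with -gcflags=all=-d=checkptr=1:

package main

import "unsafe"

var (
	keep *int64   // keeps the allocation below on the heap
	sink []int64
)

func main() {
	p := new(int64)
	keep = p
	// 100 elements run far past the single heap-allocated int64, so the
	// instrumented build reports
	//   fatal error: checkptr: unsafe.Slice result straddles multiple allocations
	sink = unsafe.Slice(p, 100)
}
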
diff --git a/src/runtime/closure_test.go b/src/runtime/closure_test.go
new file mode 100644
index 0000000..741c932
--- /dev/null
+++ b/src/runtime/closure_test.go
@@ -0,0 +1,54 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import "testing"
+
+var s int
+
+func BenchmarkCallClosure(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s += func(ii int) int { return 2 * ii }(i)
+ }
+}
+
+func BenchmarkCallClosure1(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ j := i
+ s += func(ii int) int { return 2*ii + j }(i)
+ }
+}
+
+var ss *int
+
+func BenchmarkCallClosure2(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ j := i
+ s += func() int {
+ ss = &j
+ return 2
+ }()
+ }
+}
+
+func addr1(x int) *int {
+ return func() *int { return &x }()
+}
+
+func BenchmarkCallClosure3(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ ss = addr1(i)
+ }
+}
+
+func addr2() (x int, p *int) {
+ return 0, func() *int { return &x }()
+}
+
+func BenchmarkCallClosure4(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _, ss = addr2()
+ }
+}
diff --git a/src/runtime/compiler.go b/src/runtime/compiler.go
new file mode 100644
index 0000000..f430a27
--- /dev/null
+++ b/src/runtime/compiler.go
@@ -0,0 +1,12 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Compiler is the name of the compiler toolchain that built the
+// running binary. Known toolchains are:
+//
+// gc Also known as cmd/compile.
+// gccgo The gccgo front end, part of the GCC compiler suite.
+const Compiler = "gc"
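
A trivial usage sketch of the constant above, paired with runtime.Version, which reports the toolchain release:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Binaries built with the standard toolchain print "gc";
	// binaries built with gccgo report "gccgo".
	fmt.Println("compiler:", runtime.Compiler, "version:", runtime.Version())
}
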
diff --git a/src/runtime/complex.go b/src/runtime/complex.go
new file mode 100644
index 0000000..07c596f
--- /dev/null
+++ b/src/runtime/complex.go
@@ -0,0 +1,61 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// inf2one returns a signed 1 if f is an infinity and a signed 0 otherwise.
+// The sign of the result is the sign of f.
+func inf2one(f float64) float64 {
+ g := 0.0
+ if isInf(f) {
+ g = 1.0
+ }
+ return copysign(g, f)
+}
+
+func complex128div(n complex128, m complex128) complex128 {
+ var e, f float64 // complex(e, f) = n/m
+
+ // Algorithm for robust complex division as described in
+ // Robert L. Smith: Algorithm 116: Complex division. Commun. ACM 5(8): 435 (1962).
+ if abs(real(m)) >= abs(imag(m)) {
+ ratio := imag(m) / real(m)
+ denom := real(m) + ratio*imag(m)
+ e = (real(n) + imag(n)*ratio) / denom
+ f = (imag(n) - real(n)*ratio) / denom
+ } else {
+ ratio := real(m) / imag(m)
+ denom := imag(m) + ratio*real(m)
+ e = (real(n)*ratio + imag(n)) / denom
+ f = (imag(n)*ratio - real(n)) / denom
+ }
+
+ if isNaN(e) && isNaN(f) {
+ // Correct final result to infinities and zeros if applicable.
+ // Matches C99: ISO/IEC 9899:1999 - G.5.1 Multiplicative operators.
+
+ a, b := real(n), imag(n)
+ c, d := real(m), imag(m)
+
+ switch {
+ case m == 0 && (!isNaN(a) || !isNaN(b)):
+ e = copysign(inf, c) * a
+ f = copysign(inf, c) * b
+
+ case (isInf(a) || isInf(b)) && isFinite(c) && isFinite(d):
+ a = inf2one(a)
+ b = inf2one(b)
+ e = inf * (a*c + b*d)
+ f = inf * (b*c - a*d)
+
+ case (isInf(c) || isInf(d)) && isFinite(a) && isFinite(b):
+ c = inf2one(c)
+ d = inf2one(d)
+ e = 0 * (a*c + b*d)
+ f = 0 * (b*c - a*d)
+ }
+ }
+
+ return complex(e, f)
+}
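
The point of Smith's algorithm is that it never forms c*c + d*d, which overflows long before the true quotient does. A sketch comparing the textbook formula against the built-in division (which goes through complex128div above), assuming IEEE-754 float64 arithmetic:

package main

import "fmt"

// naiveDiv uses the textbook formula; its denominator c*c + d*d overflows
// to +Inf for operands near the float64 range limit.
func naiveDiv(n, m complex128) complex128 {
	a, b := real(n), imag(n)
	c, d := real(m), imag(m)
	denom := c*c + d*d
	return complex((a*c+b*d)/denom, (b*c-a*d)/denom)
}

func main() {
	n := complex(1e307, 1e307)
	m := complex(2e307, 2e307)
	fmt.Println("textbook:", naiveDiv(n, m)) // (NaN+NaNi): numerator and denominator overflow
	fmt.Println("runtime: ", n/m)            // (0.5+0i): Smith's ratio keeps intermediates finite
}
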
diff --git a/src/runtime/complex_test.go b/src/runtime/complex_test.go
new file mode 100644
index 0000000..f41e6a3
--- /dev/null
+++ b/src/runtime/complex_test.go
@@ -0,0 +1,67 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math/cmplx"
+ "testing"
+)
+
+var result complex128
+
+func BenchmarkComplex128DivNormal(b *testing.B) {
+ d := 15 + 2i
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivNisNaN(b *testing.B) {
+ d := cmplx.NaN()
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivDisNaN(b *testing.B) {
+ d := 15 + 2i
+ n := cmplx.NaN()
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ d += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivNisInf(b *testing.B) {
+ d := 15 + 2i
+ n := cmplx.Inf()
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ d += 0.1i
+ res += n / d
+ }
+ result = res
+}
+
+func BenchmarkComplex128DivDisInf(b *testing.B) {
+ d := cmplx.Inf()
+ n := 32 + 3i
+ res := 0i
+ for i := 0; i < b.N; i++ {
+ n += 0.1i
+ res += n / d
+ }
+ result = res
+}
diff --git a/src/runtime/conv_wasm_test.go b/src/runtime/conv_wasm_test.go
new file mode 100644
index 0000000..5054fca
--- /dev/null
+++ b/src/runtime/conv_wasm_test.go
@@ -0,0 +1,128 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "testing"
+)
+
+var res int64
+var ures uint64
+
+func TestFloatTruncation(t *testing.T) {
+ testdata := []struct {
+ input float64
+ convInt64 int64
+ convUInt64 uint64
+ overflow bool
+ }{
+ // max +- 1
+ {
+ input: 0x7fffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // For out-of-bounds conversion, the result is implementation-dependent.
+ // This test verifies the behavior on the wasm architecture.
+ {
+ input: 0x8000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x7ffffffffffffffe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // neg max +- 1
+ {
+ input: -0x8000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x8000000000000001,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7fffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // trunc point +- 1
+ {
+ input: 0x7ffffffffffffdff,
+ convInt64: 0x7ffffffffffffc00,
+ convUInt64: 0x7ffffffffffffc00,
+ },
+ {
+ input: 0x7ffffffffffffe00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x7ffffffffffffdfe,
+ convInt64: 0x7ffffffffffffc00,
+ convUInt64: 0x7ffffffffffffc00,
+ },
+ // neg trunc point +- 1
+ {
+ input: -0x7ffffffffffffdff,
+ convInt64: -0x7ffffffffffffc00,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7ffffffffffffe00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: -0x7ffffffffffffdfe,
+ convInt64: -0x7ffffffffffffc00,
+ convUInt64: 0x8000000000000000,
+ },
+ // umax +- 1
+ {
+ input: 0xffffffffffffffff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0x10000000000000000,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0xfffffffffffffffe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ // umax trunc +- 1
+ {
+ input: 0xfffffffffffffbff,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0xfffffffffffff800,
+ },
+ {
+ input: 0xfffffffffffffc00,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0x8000000000000000,
+ },
+ {
+ input: 0xfffffffffffffbfe,
+ convInt64: -0x8000000000000000,
+ convUInt64: 0xfffffffffffff800,
+ },
+ }
+ for _, item := range testdata {
+ if got, want := int64(item.input), item.convInt64; got != want {
+ t.Errorf("int64(%f): got %x, want %x", item.input, got, want)
+ }
+ if got, want := uint64(item.input), item.convUInt64; got != want {
+ t.Errorf("uint64(%f): got %x, want %x", item.input, got, want)
+ }
+ }
+}
diff --git a/src/runtime/coverage/apis.go b/src/runtime/coverage/apis.go
new file mode 100644
index 0000000..05da345
--- /dev/null
+++ b/src/runtime/coverage/apis.go
@@ -0,0 +1,184 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import (
+ "fmt"
+ "internal/coverage"
+ "io"
+ "reflect"
+ "sync/atomic"
+ "unsafe"
+)
+
+// WriteMetaDir writes a coverage meta-data file for the currently
+// running program to the directory specified in 'dir'. An error will
+// be returned if the operation can't be completed successfully (for
+// example, if the currently running program was not built with
+// "-cover", or if the directory does not exist).
+func WriteMetaDir(dir string) error {
+ if !finalHashComputed {
+ return fmt.Errorf("error: no meta-data available (binary not built with -cover?)")
+ }
+ return emitMetaDataToDirectory(dir, getCovMetaList())
+}
+
+// WriteMeta writes the meta-data content (the payload that would
+// normally be emitted to a meta-data file) for the currently running
+// program to the writer 'w'. An error will be returned if the
+// operation can't be completed successfully (for example, if the
+// currently running program was not built with "-cover", or if a
+// write fails).
+func WriteMeta(w io.Writer) error {
+ if w == nil {
+ return fmt.Errorf("error: nil writer in WriteMeta")
+ }
+ if !finalHashComputed {
+ return fmt.Errorf("error: no meta-data available (binary not built with -cover?)")
+ }
+ ml := getCovMetaList()
+ return writeMetaData(w, ml, cmode, cgran, finalHash)
+}
+
+// WriteCountersDir writes a coverage counter-data file for the
+// currently running program to the directory specified in 'dir'. An
+// error will be returned if the operation can't be completed
+// successfully (for example, if the currently running program was not
+// built with "-cover", or if the directory does not exist). The
+// counter data written will be a snapshot taken at the point of the
+// call.
+func WriteCountersDir(dir string) error {
+ if cmode != coverage.CtrModeAtomic {
+ return fmt.Errorf("WriteCountersDir invoked for program built with -covermode=%s (please use -covermode=atomic)", cmode.String())
+ }
+ return emitCounterDataToDirectory(dir)
+}
+
+// WriteCounters writes coverage counter-data content for the
+// currently running program to the writer 'w'. An error will be
+// returned if the operation can't be completed successfully (for
+// example, if the currently running program was not built with
+// "-cover", or if a write fails). The counter data written will be a
+// snapshot taken at the point of the invocation.
+func WriteCounters(w io.Writer) error {
+ if w == nil {
+ return fmt.Errorf("error: nil writer in WriteCounters")
+ }
+ if cmode != coverage.CtrModeAtomic {
+ return fmt.Errorf("WriteCounters invoked for program built with -covermode=%s (please use -covermode=atomic)", cmode.String())
+ }
+ // Ask the runtime for the list of coverage counter symbols.
+ cl := getCovCounterList()
+ if len(cl) == 0 {
+ return fmt.Errorf("program not built with -cover")
+ }
+ if !finalHashComputed {
+ return fmt.Errorf("meta-data not written yet, unable to write counter data")
+ }
+
+ pm := getCovPkgMap()
+ s := &emitState{
+ counterlist: cl,
+ pkgmap: pm,
+ }
+ return s.emitCounterDataToWriter(w)
+}
+
+// ClearCounters clears/resets all coverage counter variables in the
+// currently running program. It returns an error if the program in
+// question was not built with the "-cover" flag. Clearing of coverage
+// counters is also not supported for programs not using atomic
+// counter mode (see more detailed comments below for the rationale
+// here).
+func ClearCounters() error {
+ cl := getCovCounterList()
+ if len(cl) == 0 {
+ return fmt.Errorf("program not built with -cover")
+ }
+ if cmode != coverage.CtrModeAtomic {
+ return fmt.Errorf("ClearCounters invoked for program built with -covermode=%s (please use -covermode=atomic)", cmode.String())
+ }
+
+ // Implementation note: this function would be faster and simpler
+ // if we could just zero out the entire counter array, but for the
+ // moment we go through and zero out just the slots in the array
+ // corresponding to the counter values. We do this to avoid the
+ // following bad scenario: suppose that a user builds their Go
+ // program with "-cover", and that program has a function (call it
+ // main.XYZ) that invokes ClearCounters:
+ //
+ // func XYZ() {
+ // ... do some stuff ...
+ // coverage.ClearCounters()
+ // if someCondition { <<--- HERE
+ // ...
+ // }
+ // }
+ //
+ // At the point where ClearCounters executes, main.XYZ has not yet
+ // finished running, thus as soon as the call returns the line
+ // marked "HERE" above will trigger the writing of a non-zero
+ // value into main.XYZ's counter slab. However since we've just
+ // finished clearing the entire counter segment, we will have lost
+ // the values in the prolog portion of main.XYZ's counter slab
+ // (nctrs, pkgid, funcid). This means that later on at the end of
+ // program execution as we walk through the entire counter array
+ // for the program looking for executed functions, we'll zoom past
+ // main.XYZ's prolog (which was zero'd) and hit the non-zero
+ // counter value corresponding to the "HERE" block, which will
+ // then be interpreted as the start of another live function.
+ // Things will go downhill from there.
+ //
+ // This same scenario is also a potential risk if the program is
+ // running on an architecture that permits reordering of
+ // writes/stores, since the inconsistency described above could
+ // arise here. Example scenario:
+ //
+ // func ABC() {
+ // ... // prolog
+ // if alwaysTrue() {
+ // XYZ() // counter update here
+ // }
+ // }
+ //
+ // In the instrumented version of ABC, the prolog of the function
+ // will contain a series of stores to the initial portion of the
+ // counter array to write number-of-counters, pkgid, funcid. Later
+ // in the function there is also a store to increment a counter
+ // for the block containing the call to XYZ(). If the CPU is
+ // allowed to reorder stores and decides to issue the XYZ store
+ // before the prolog stores, this could be observable as an
+ // inconsistency similar to the one above. Hence the requirement
+ // for atomic counter mode: according to package atomic docs,
+ // "...operations that happen in a specific order on one thread,
+ // will always be observed to happen in exactly that order by
+ // another thread". Thus we can be sure that there will be no
+ // inconsistency when reading the counter array from the thread
+ // running ClearCounters.
+
+ var sd []atomic.Uint32
+
+ bufHdr := (*reflect.SliceHeader)(unsafe.Pointer(&sd))
+ for _, c := range cl {
+ bufHdr.Data = uintptr(unsafe.Pointer(c.Counters))
+ bufHdr.Len = int(c.Len)
+ bufHdr.Cap = int(c.Len)
+ for i := 0; i < len(sd); i++ {
+ // Skip ahead until the next non-zero value.
+ sdi := sd[i].Load()
+ if sdi == 0 {
+ continue
+ }
+ // We found a function that was executed; clear its counters.
+ nCtrs := sdi
+ for j := 0; j < int(nCtrs); j++ {
+ sd[i+coverage.FirstCtrOffset+j].Store(0)
+ }
+ // Move to next function.
+ i += coverage.FirstCtrOffset + int(nCtrs) - 1
+ }
+ }
+ return nil
+}
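
These exported helpers are intended for long-running programs built with -cover (and, for the counter-data variants, -covermode=atomic) that never reach the normal exit-time emission path. A hypothetical sketch of a server that snapshots its coverage data on demand; the endpoint, port, and directory handling are illustrative only:

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"runtime/coverage"
)

// Built with: go build -cover -covermode=atomic -o server .
func main() {
	dir := os.Getenv("GOCOVERDIR")
	if dir == "" {
		dir = "." // fall back to the current directory for this sketch
	}
	http.HandleFunc("/coverage", func(w http.ResponseWriter, r *http.Request) {
		// Write (or reuse) the meta-data file, then a fresh counter snapshot.
		if err := coverage.WriteMetaDir(dir); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		if err := coverage.WriteCountersDir(dir); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprintln(w, "coverage snapshot written to", dir)
	})
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
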
diff --git a/src/runtime/coverage/dummy.s b/src/runtime/coverage/dummy.s
new file mode 100644
index 0000000..7592859
--- /dev/null
+++ b/src/runtime/coverage/dummy.s
@@ -0,0 +1,8 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The runtime package uses //go:linkname to push a few functions into this
+// package, but we still need a .s file so the Go tool does not pass -complete
+// to 'go tool compile'; otherwise the compiler would complain about the Go
+// functions with no bodies.
diff --git a/src/runtime/coverage/emit.go b/src/runtime/coverage/emit.go
new file mode 100644
index 0000000..bb0c6fb
--- /dev/null
+++ b/src/runtime/coverage/emit.go
@@ -0,0 +1,622 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import (
+ "crypto/md5"
+ "fmt"
+ "internal/coverage"
+ "internal/coverage/encodecounter"
+ "internal/coverage/encodemeta"
+ "internal/coverage/rtcov"
+ "io"
+ "os"
+ "path/filepath"
+ "reflect"
+ "runtime"
+ "strconv"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+// This file contains functions that support the writing of data files
+// emitted at the end of code coverage testing runs, from instrumented
+// executables.
+
+// getCovMetaList returns a list of meta-data blobs registered
+// for the currently executing instrumented program. It is defined in the
+// runtime.
+func getCovMetaList() []rtcov.CovMetaBlob
+
+// getCovCounterList returns a list of counter-data blobs registered
+// for the currently executing instrumented program. It is defined in the
+// runtime.
+func getCovCounterList() []rtcov.CovCounterBlob
+
+// getCovPkgMap returns a map storing the remapped package IDs for
+// hard-coded runtime packages (see internal/coverage/pkgid.go for
+// more on why hard-coded package IDs are needed). This function
+// is defined in the runtime.
+func getCovPkgMap() map[int]int
+
+// emitState holds useful state information during the emit process.
+//
+// When an instrumented program finishes execution and starts the
+// process of writing out coverage data, it's possible that an
+// existing meta-data file already exists in the output directory. In
+// this case openOutputFiles() below will leave the 'mf' field below
+// as nil. If a new meta-data file is needed, field 'mfname' will be
+// the final desired path of the meta file, 'mftmp' will be a
+// temporary file, and 'mf' will be an open os.File pointer for
+// 'mftmp'. The meta-data file payload will be written to 'mf', the
+// temp file will be then closed and renamed (from 'mftmp' to
+// 'mfname'), so as to ensure that the meta-data file is created
+// atomically; we want this so that things work smoothly in cases
+// where there are several instances of a given instrumented program
+// all terminating at the same time and trying to create meta-data
+// files simultaneously.
+//
+// For counter data files there is less chance of a collision, hence
+// openOutputFiles() stores the counter data file path in 'cfname' and
+// then places the open *os.File into 'cf'.
+type emitState struct {
+ mfname string // path of final meta-data output file
+ mftmp string // path to meta-data temp file (if needed)
+ mf *os.File // open os.File for meta-data temp file
+ cfname string // path of final counter data file
+ cftmp string // path to counter data temp file
+ cf *os.File // open os.File for counter data file
+ outdir string // output directory
+
+ // List of meta-data symbols obtained from the runtime
+ metalist []rtcov.CovMetaBlob
+
+ // List of counter-data symbols obtained from the runtime
+ counterlist []rtcov.CovCounterBlob
+
+ // Table to use for remapping hard-coded pkg ids.
+ pkgmap map[int]int
+
+ // emit debug trace output
+ debug bool
+}
+
+var (
+ // finalHash is computed at init time from the list of meta-data
+ // symbols registered during init. It is used both for writing the
+ // meta-data file and counter-data files.
+ finalHash [16]byte
+ // Set to true when we've computed finalHash + finalMetaLen.
+ finalHashComputed bool
+ // Total meta-data length.
+ finalMetaLen uint64
+ // Records whether we've already attempted to write meta-data.
+ metaDataEmitAttempted bool
+ // Counter mode for this instrumented program run.
+ cmode coverage.CounterMode
+ // Counter granularity for this instrumented program run.
+ cgran coverage.CounterGranularity
+ // Cached value of GOCOVERDIR environment variable.
+ goCoverDir string
+ // Copy of os.Args made at init time, converted into map format.
+ capturedOsArgs map[string]string
+ // Flag used in tests to signal that coverage data already written.
+ covProfileAlreadyEmitted bool
+)
+
+// fileType is used to select between counter-data files and
+// meta-data files.
+type fileType int
+
+const (
+ noFile = 1 << iota
+ metaDataFile
+ counterDataFile
+)
+
+// emitMetaData emits the meta-data output file for this coverage run.
+// This entry point is intended to be invoked by the compiler from
+// an instrumented program's main package init func.
+func emitMetaData() {
+ if covProfileAlreadyEmitted {
+ return
+ }
+ ml, err := prepareForMetaEmit()
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "error: coverage meta-data prep failed: %v\n", err)
+ if os.Getenv("GOCOVERDEBUG") != "" {
+ panic("meta-data write failure")
+ }
+ }
+ if len(ml) == 0 {
+ fmt.Fprintf(os.Stderr, "program not built with -cover\n")
+ return
+ }
+
+ goCoverDir = os.Getenv("GOCOVERDIR")
+ if goCoverDir == "" {
+ fmt.Fprintf(os.Stderr, "warning: GOCOVERDIR not set, no coverage data emitted\n")
+ return
+ }
+
+ if err := emitMetaDataToDirectory(goCoverDir, ml); err != nil {
+ fmt.Fprintf(os.Stderr, "error: coverage meta-data emit failed: %v\n", err)
+ if os.Getenv("GOCOVERDEBUG") != "" {
+ panic("meta-data write failure")
+ }
+ }
+}
+
+func modeClash(m coverage.CounterMode) bool {
+ if m == coverage.CtrModeRegOnly || m == coverage.CtrModeTestMain {
+ return false
+ }
+ if cmode == coverage.CtrModeInvalid {
+ cmode = m
+ return false
+ }
+ return cmode != m
+}
+
+func granClash(g coverage.CounterGranularity) bool {
+ if cgran == coverage.CtrGranularityInvalid {
+ cgran = g
+ return false
+ }
+ return cgran != g
+}
+
+// prepareForMetaEmit performs preparatory steps needed prior to
+// emitting a meta-data file, notably computing a final hash of
+// all meta-data blobs and capturing os args.
+func prepareForMetaEmit() ([]rtcov.CovMetaBlob, error) {
+ // Ask the runtime for the list of coverage meta-data symbols.
+ ml := getCovMetaList()
+
+ // In the normal case (go build -o prog.exe ... ; ./prog.exe)
+ // len(ml) will always be non-zero, but we check here since at
+ // some point this function will be reachable via user-callable
+ // APIs (for example, to write out coverage data from a server
+ // program that doesn't ever call os.Exit).
+ if len(ml) == 0 {
+ return nil, nil
+ }
+
+ s := &emitState{
+ metalist: ml,
+ debug: os.Getenv("GOCOVERDEBUG") != "",
+ }
+
+ // Capture os.Args() now so as to avoid issues if args
+ // are rewritten during program execution.
+ capturedOsArgs = captureOsArgs()
+
+ if s.debug {
+ fmt.Fprintf(os.Stderr, "=+= GOCOVERDIR is %s\n", os.Getenv("GOCOVERDIR"))
+ fmt.Fprintf(os.Stderr, "=+= contents of covmetalist:\n")
+ for k, b := range ml {
+ fmt.Fprintf(os.Stderr, "=+= slot: %d path: %s ", k, b.PkgPath)
+ if b.PkgID != -1 {
+ fmt.Fprintf(os.Stderr, " hcid: %d", b.PkgID)
+ }
+ fmt.Fprintf(os.Stderr, "\n")
+ }
+ pm := getCovPkgMap()
+ fmt.Fprintf(os.Stderr, "=+= remap table:\n")
+ for from, to := range pm {
+ fmt.Fprintf(os.Stderr, "=+= from %d to %d\n",
+ uint32(from), uint32(to))
+ }
+ }
+
+ h := md5.New()
+ tlen := uint64(unsafe.Sizeof(coverage.MetaFileHeader{}))
+ for _, entry := range ml {
+ if _, err := h.Write(entry.Hash[:]); err != nil {
+ return nil, err
+ }
+ tlen += uint64(entry.Len)
+ ecm := coverage.CounterMode(entry.CounterMode)
+ if modeClash(ecm) {
+ return nil, fmt.Errorf("coverage counter mode clash: package %s uses mode=%d, but package %s uses mode=%s\n", ml[0].PkgPath, cmode, entry.PkgPath, ecm)
+ }
+ ecg := coverage.CounterGranularity(entry.CounterGranularity)
+ if granClash(ecg) {
+ return nil, fmt.Errorf("coverage counter granularity clash: package %s uses gran=%d, but package %s uses gran=%s\n", ml[0].PkgPath, cgran, entry.PkgPath, ecg)
+ }
+ }
+
+ // Hash mode and granularity as well.
+ h.Write([]byte(cmode.String()))
+ h.Write([]byte(cgran.String()))
+
+ // Compute final digest.
+ fh := h.Sum(nil)
+ copy(finalHash[:], fh)
+ finalHashComputed = true
+ finalMetaLen = tlen
+
+ return ml, nil
+}
+
+// emitMetaDataToDirectory emits the meta-data output file to the specified
+// directory, returning an error if something went wrong.
+func emitMetaDataToDirectory(outdir string, ml []rtcov.CovMetaBlob) error {
+ ml, err := prepareForMetaEmit()
+ if err != nil {
+ return err
+ }
+ if len(ml) == 0 {
+ return nil
+ }
+
+ metaDataEmitAttempted = true
+
+ s := &emitState{
+ metalist: ml,
+ debug: os.Getenv("GOCOVERDEBUG") != "",
+ outdir: outdir,
+ }
+
+ // Open output files.
+ if err := s.openOutputFiles(finalHash, finalMetaLen, metaDataFile); err != nil {
+ return err
+ }
+
+ // Emit meta-data file only if needed (may already be present).
+ if s.needMetaDataFile() {
+ if err := s.emitMetaDataFile(finalHash, finalMetaLen); err != nil {
+ return err
+ }
+ }
+ return nil
+}
+
+// emitCounterData emits the counter data output file for this coverage run.
+// This entry point is intended to be invoked by the runtime when an
+// instrumented program is terminating or calling os.Exit().
+func emitCounterData() {
+ if goCoverDir == "" || !finalHashComputed || covProfileAlreadyEmitted {
+ return
+ }
+ if err := emitCounterDataToDirectory(goCoverDir); err != nil {
+ fmt.Fprintf(os.Stderr, "error: coverage counter data emit failed: %v\n", err)
+ if os.Getenv("GOCOVERDEBUG") != "" {
+ panic("counter-data write failure")
+ }
+ }
+}
+
+// emitCounterDataToDirectory emits the counter-data output file for this coverage run.
+func emitCounterDataToDirectory(outdir string) error {
+ // Ask the runtime for the list of coverage counter symbols.
+ cl := getCovCounterList()
+ if len(cl) == 0 {
+ // no work to do here.
+ return nil
+ }
+
+ if !finalHashComputed {
+ return fmt.Errorf("error: meta-data not available (binary not built with -cover?)")
+ }
+
+ // Ask the runtime for the list of coverage counter symbols.
+ pm := getCovPkgMap()
+ s := &emitState{
+ counterlist: cl,
+ pkgmap: pm,
+ outdir: outdir,
+ debug: os.Getenv("GOCOVERDEBUG") != "",
+ }
+
+ // Open output file.
+ if err := s.openOutputFiles(finalHash, finalMetaLen, counterDataFile); err != nil {
+ return err
+ }
+ if s.cf == nil {
+ return fmt.Errorf("counter data output file open failed (no additional info")
+ }
+
+ // Emit counter data file.
+ if err := s.emitCounterDataFile(finalHash, s.cf); err != nil {
+ return err
+ }
+ if err := s.cf.Close(); err != nil {
+ return fmt.Errorf("closing counter data file: %v", err)
+ }
+
+ // Counter file has now been closed. Rename the temp to the
+ // final desired path.
+ if err := os.Rename(s.cftmp, s.cfname); err != nil {
+ return fmt.Errorf("writing %s: rename from %s failed: %v\n", s.cfname, s.cftmp, err)
+ }
+
+ return nil
+}
+
+// emitCounterDataToWriter emits counter data for this coverage run to an io.Writer.
+func (s *emitState) emitCounterDataToWriter(w io.Writer) error {
+ if err := s.emitCounterDataFile(finalHash, w); err != nil {
+ return err
+ }
+ return nil
+}
+
+// openMetaFile determines whether we need to emit a meta-data output
+// file, or whether we can reuse the existing file in the coverage out
+// dir. It updates mfname/mftmp/mf fields in 's', returning an error
+// if something went wrong. See the comment on the emitState type
+// definition above for more on how file opening is managed.
+func (s *emitState) openMetaFile(metaHash [16]byte, metaLen uint64) error {
+
+ // Open meta-outfile for reading to see if it exists.
+ fn := fmt.Sprintf("%s.%x", coverage.MetaFilePref, metaHash)
+ s.mfname = filepath.Join(s.outdir, fn)
+ fi, err := os.Stat(s.mfname)
+ if err != nil || fi.Size() != int64(metaLen) {
+ // We need a new meta-file.
+ tname := "tmp." + fn + strconv.FormatInt(time.Now().UnixNano(), 10)
+ s.mftmp = filepath.Join(s.outdir, tname)
+ s.mf, err = os.Create(s.mftmp)
+ if err != nil {
+ return fmt.Errorf("creating meta-data file %s: %v", s.mftmp, err)
+ }
+ }
+ return nil
+}
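+
+// Note: because the meta-data file name embeds the meta-data hash, an
+// existing file with the expected name and size is treated as already up
+// to date; in that case s.mf is left nil, which is what needMetaDataFile
+// keys off of further below.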
+
+// openCounterFile opens an output file for the counter data portion
+// of a test coverage run. It updates the 'cfname', 'cftmp', and 'cf'
+// fields in 's', returning an error if something went wrong.
+func (s *emitState) openCounterFile(metaHash [16]byte) error {
+ processID := os.Getpid()
+ fn := fmt.Sprintf(coverage.CounterFileTempl, coverage.CounterFilePref, metaHash, processID, time.Now().UnixNano())
+ s.cfname = filepath.Join(s.outdir, fn)
+ s.cftmp = filepath.Join(s.outdir, "tmp."+fn)
+ var err error
+ s.cf, err = os.Create(s.cftmp)
+ if err != nil {
+ return fmt.Errorf("creating counter data file %s: %v", s.cftmp, err)
+ }
+ return nil
+}
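+
+// Note: the counter-data file name built above includes the meta-data hash
+// plus the process ID and a nanosecond timestamp, presumably so that
+// multiple instrumented processes sharing one output directory emit
+// distinct files; like the meta-data file, it is first written under a
+// "tmp." name and renamed into place once complete.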
+
+// openOutputFiles opens output files in preparation for emitting
+// coverage data. In the case of the meta-data file, openOutputFiles
+// may determine that we can reuse an existing meta-data file in the
+// outdir, in which case it will leave the 'mf' field in the state
+// struct as nil. If a new meta-file is needed, the field 'mfname'
+// will be the final desired path of the meta file, 'mftmp' will be a
+// temporary file, and 'mf' will be an open os.File pointer for
+// 'mftmp'. The idea is that the client/caller will write content into
+// 'mf', close it, and then rename 'mftmp' to 'mfname'. This function
+// also opens the counter data output file, setting 'cf' and 'cfname'
+// in the state struct.
+func (s *emitState) openOutputFiles(metaHash [16]byte, metaLen uint64, which fileType) error {
+ fi, err := os.Stat(s.outdir)
+ if err != nil {
+ return fmt.Errorf("output directory %q inaccessible (err: %v); no coverage data written", s.outdir, err)
+ }
+ if !fi.IsDir() {
+ return fmt.Errorf("output directory %q not a directory; no coverage data written", s.outdir)
+ }
+
+ if (which & metaDataFile) != 0 {
+ if err := s.openMetaFile(metaHash, metaLen); err != nil {
+ return err
+ }
+ }
+ if (which & counterDataFile) != 0 {
+ if err := s.openCounterFile(metaHash); err != nil {
+ return err
+ }
+ }
+ return nil
+}
+
+// emitMetaDataFile emits coverage meta-data to a previously opened
+// temporary file (s.mftmp), then renames the generated file to the
+// final path (s.mfname).
+func (s *emitState) emitMetaDataFile(finalHash [16]byte, tlen uint64) error {
+ if err := writeMetaData(s.mf, s.metalist, cmode, cgran, finalHash); err != nil {
+ return fmt.Errorf("writing %s: %v\n", s.mftmp, err)
+ }
+ if err := s.mf.Close(); err != nil {
+ return fmt.Errorf("closing meta data temp file: %v", err)
+ }
+
+ // Temp file has now been flushed and closed. Rename the temp to the
+ // final desired path.
+ if err := os.Rename(s.mftmp, s.mfname); err != nil {
+ return fmt.Errorf("writing %s: rename from %s failed: %v\n", s.mfname, s.mftmp, err)
+ }
+
+ return nil
+}
+
+// needMetaDataFile returns TRUE if we need to emit a meta-data file
+// for this program run. It should be used only after
+// openOutputFiles() has been invoked.
+func (s *emitState) needMetaDataFile() bool {
+ return s.mf != nil
+}
+
+func writeMetaData(w io.Writer, metalist []rtcov.CovMetaBlob, cmode coverage.CounterMode, gran coverage.CounterGranularity, finalHash [16]byte) error {
+ mfw := encodemeta.NewCoverageMetaFileWriter("<io.Writer>", w)
+
+ // Note: "sd" is re-initialized on each iteration of the loop
+ // below, and would normally be declared inside the loop, but is placed
+ // here for escape analysis reasons, since we capture it in bufHdr.
+ var sd []byte
+ bufHdr := (*reflect.SliceHeader)(unsafe.Pointer(&sd))
+
+ var blobs [][]byte
+ for _, e := range metalist {
+ bufHdr.Data = uintptr(unsafe.Pointer(e.P))
+ bufHdr.Len = int(e.Len)
+ bufHdr.Cap = int(e.Len)
+ blobs = append(blobs, sd)
+ }
+ return mfw.Write(finalHash, blobs, cmode, gran)
+}
+
+func (s *emitState) VisitFuncs(f encodecounter.CounterVisitorFn) error {
+ var sd []atomic.Uint32
+ var tcounters []uint32
+ bufHdr := (*reflect.SliceHeader)(unsafe.Pointer(&sd))
+
+ rdCounters := func(actrs []atomic.Uint32, ctrs []uint32) []uint32 {
+ ctrs = ctrs[:0]
+ for i := range actrs {
+ ctrs = append(ctrs, actrs[i].Load())
+ }
+ return ctrs
+ }
+
+ dpkg := uint32(0)
+ for _, c := range s.counterlist {
+ bufHdr.Data = uintptr(unsafe.Pointer(c.Counters))
+ bufHdr.Len = int(c.Len)
+ bufHdr.Cap = int(c.Len)
+ for i := 0; i < len(sd); i++ {
+ // Skip ahead until the next non-zero value.
+ sdi := sd[i].Load()
+ if sdi == 0 {
+ continue
+ }
+
+ // We found a function that was executed.
+ nCtrs := sd[i+coverage.NumCtrsOffset].Load()
+ pkgId := sd[i+coverage.PkgIdOffset].Load()
+ funcId := sd[i+coverage.FuncIdOffset].Load()
+ cst := i + coverage.FirstCtrOffset
+ counters := sd[cst : cst+int(nCtrs)]
+
+ // Check to make sure that we have at least one live
+ // counter. See the implementation note in ClearCoverageCounters
+ // for a description of why this is needed.
+ isLive := false
+ for i := 0; i < len(counters); i++ {
+ if counters[i].Load() != 0 {
+ isLive = true
+ break
+ }
+ }
+ if !isLive {
+ // Skip this function.
+ i += coverage.FirstCtrOffset + int(nCtrs) - 1
+ continue
+ }
+
+ if s.debug {
+ if pkgId != dpkg {
+ dpkg = pkgId
+ fmt.Fprintf(os.Stderr, "\n=+= %d: pk=%d visit live fcn",
+ i, pkgId)
+ }
+ fmt.Fprintf(os.Stderr, " {i=%d F%d NC%d}", i, funcId, nCtrs)
+ }
+
+ // Vet and/or fix up package ID. A package ID of zero
+ // indicates that there is some new package X that is a
+ // runtime dependency, and this package has code that
+ // executes before its corresponding init package runs.
+ // This is a fatal error that we should only see during
+ // Go development (e.g. tip).
+ ipk := int32(pkgId)
+ if ipk == 0 {
+ fmt.Fprintf(os.Stderr, "\n")
+ reportErrorInHardcodedList(int32(i), ipk, funcId, nCtrs)
+ } else if ipk < 0 {
+ if newId, ok := s.pkgmap[int(ipk)]; ok {
+ pkgId = uint32(newId)
+ } else {
+ fmt.Fprintf(os.Stderr, "\n")
+ reportErrorInHardcodedList(int32(i), ipk, funcId, nCtrs)
+ }
+ } else {
+ // The package ID value stored in the counter array
+ // has 1 added to it (so as to preclude the
+ // possibility of a zero value; see
+ // runtime.addCovMeta), so subtract off 1 here to form
+ // the real package ID.
+ pkgId--
+ }
+
+ tcounters = rdCounters(counters, tcounters)
+ if err := f(pkgId, funcId, tcounters); err != nil {
+ return err
+ }
+
+ // Skip over this function.
+ i += coverage.FirstCtrOffset + int(nCtrs) - 1
+ }
+ if s.debug {
+ fmt.Fprintf(os.Stderr, "\n")
+ }
+ }
+ return nil
+}
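+
+// Rough sketch of the counter-array layout walked by VisitFuncs above,
+// inferred from the coverage.*Offset constants used (the authoritative
+// definitions live in internal/coverage): each function occupies a record
+// of the form
+//
+//	[ numCtrs | pkgID | funcID | ctr 0 ... ctr numCtrs-1 ]
+//
+// with the header words at NumCtrsOffset/PkgIdOffset/FuncIdOffset and the
+// counters starting at FirstCtrOffset. A record whose leading word is
+// still zero belongs to a function that never executed, so the scan simply
+// advances to the next non-zero word.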
+
+// captureOsArgs converts os.Args into the format we use to store
+// this info in the counter data file (the counter data file "args"
+// section is a generic key-value collection). See the 'args' section
+// in internal/coverage/defs.go for more info. The args map is also
+// used to capture the GOOS and GOARCH values.
+func captureOsArgs() map[string]string {
+ m := make(map[string]string)
+ m["argc"] = strconv.Itoa(len(os.Args))
+ for k, a := range os.Args {
+ m[fmt.Sprintf("argv%d", k)] = a
+ }
+ m["GOOS"] = runtime.GOOS
+ m["GOARCH"] = runtime.GOARCH
+ return m
+}
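+
+// For example, a hypothetical invocation "prog -x" would produce a map
+// along the lines of
+//
+//	map[string]string{
+//		"argc":   "2",
+//		"argv0":  "prog",
+//		"argv1":  "-x",
+//		"GOOS":   "linux",  // whatever runtime.GOOS reports
+//		"GOARCH": "amd64",  // whatever runtime.GOARCH reports
+//	}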
+
+// emitCounterDataFile emits the counter data portion of a
+// coverage output file (to the file 's.cf').
+func (s *emitState) emitCounterDataFile(finalHash [16]byte, w io.Writer) error {
+ cfw := encodecounter.NewCoverageDataWriter(w, coverage.CtrULeb128)
+ if err := cfw.Write(finalHash, capturedOsArgs, s); err != nil {
+ return err
+ }
+ return nil
+}
+
+// markProfileEmitted signals the runtime/coverage machinery that
+// coverage data output files have already been written out, and there
+// is no need to take any additional action at exit time. This
+// function is called (via linknamed reference) from the
+// coverage-related boilerplate code in _testmain.go emitted for go
+// unit tests.
+func markProfileEmitted(val bool) {
+ covProfileAlreadyEmitted = val
+}
+
+func reportErrorInHardcodedList(slot, pkgID int32, fnID, nCtrs uint32) {
+ metaList := getCovMetaList()
+ pkgMap := getCovPkgMap()
+
+ println("internal error in coverage meta-data tracking:")
+ println("encountered bad pkgID:", pkgID, " at slot:", slot,
+ " fnID:", fnID, " numCtrs:", nCtrs)
+ println("list of hard-coded runtime package IDs needs revising.")
+ println("[see the comment on the 'rtPkgs' var in ")
+ println(" <goroot>/src/internal/coverage/pkid.go]")
+ println("registered list:")
+ for k, b := range metaList {
+ print("slot: ", k, " path='", b.PkgPath, "' ")
+ if b.PkgID != -1 {
+ print(" hard-coded id: ", b.PkgID)
+ }
+ println("")
+ }
+ println("remap table:")
+ for from, to := range pkgMap {
+ println("from ", from, " to ", to)
+ }
+}
diff --git a/src/runtime/coverage/emitdata_test.go b/src/runtime/coverage/emitdata_test.go
new file mode 100644
index 0000000..3558dd2
--- /dev/null
+++ b/src/runtime/coverage/emitdata_test.go
@@ -0,0 +1,550 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import (
+ "fmt"
+ "internal/coverage"
+ "internal/goexperiment"
+ "internal/platform"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+// Set to true for debugging (linux only).
+const fixedTestDir = false
+
+func TestCoverageApis(t *testing.T) {
+ if testing.Short() {
+ t.Skipf("skipping test: too long for short mode")
+ }
+ if !goexperiment.CoverageRedesign {
+ t.Skipf("skipping new coverage tests (experiment not enabled)")
+ }
+ testenv.MustHaveGoBuild(t)
+ dir := t.TempDir()
+ if fixedTestDir {
+ dir = "/tmp/qqqzzz"
+ os.RemoveAll(dir)
+ mkdir(t, dir)
+ }
+
+ // Build harness. We need two copies of the harness, one built
+ // with -covermode=atomic and one built non-atomic.
+ bdir1 := mkdir(t, filepath.Join(dir, "build1"))
+ hargs1 := []string{"-covermode=atomic", "-coverpkg=all"}
+ atomicHarnessPath := buildHarness(t, bdir1, hargs1)
+ nonAtomicMode := testing.CoverMode()
+ if testing.CoverMode() == "atomic" {
+ nonAtomicMode = "set"
+ }
+ bdir2 := mkdir(t, filepath.Join(dir, "build2"))
+ hargs2 := []string{"-coverpkg=all", "-covermode=" + nonAtomicMode}
+ nonAtomicHarnessPath := buildHarness(t, bdir2, hargs2)
+
+ t.Logf("atomic harness path is %s", atomicHarnessPath)
+ t.Logf("non-atomic harness path is %s", nonAtomicHarnessPath)
+
+ // Sub-tests for each API we want to inspect, plus
+ // extras for error testing.
+ t.Run("emitToDir", func(t *testing.T) {
+ t.Parallel()
+ testEmitToDir(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitToWriter", func(t *testing.T) {
+ t.Parallel()
+ testEmitToWriter(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitToNonexistentDir", func(t *testing.T) {
+ t.Parallel()
+ testEmitToNonexistentDir(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitToNilWriter", func(t *testing.T) {
+ t.Parallel()
+ testEmitToNilWriter(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitToFailingWriter", func(t *testing.T) {
+ t.Parallel()
+ testEmitToFailingWriter(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitWithCounterClear", func(t *testing.T) {
+ t.Parallel()
+ testEmitWithCounterClear(t, atomicHarnessPath, dir)
+ })
+ t.Run("emitToDirNonAtomic", func(t *testing.T) {
+ t.Parallel()
+ testEmitToDirNonAtomic(t, nonAtomicHarnessPath, nonAtomicMode, dir)
+ })
+ t.Run("emitToWriterNonAtomic", func(t *testing.T) {
+ t.Parallel()
+ testEmitToWriterNonAtomic(t, nonAtomicHarnessPath, nonAtomicMode, dir)
+ })
+ t.Run("emitWithCounterClearNonAtomic", func(t *testing.T) {
+ t.Parallel()
+ testEmitWithCounterClearNonAtomic(t, nonAtomicHarnessPath, nonAtomicMode, dir)
+ })
+}
+
+// upmergeCoverData helps improve coverage data for this package
+// itself. If this test itself is being invoked with "-cover", then
+// what we'd like is for package coverage data (that is, coverage for
+// routines in "runtime/coverage") to be incorporated into the test
+// run from the "harness.exe" runs we've just done. We can accomplish
+// this by doing a merge from the harness gocoverdirs to the test
+// gocoverdir.
+func upmergeCoverData(t *testing.T, gocoverdir string, mode string) {
+ if testing.CoverMode() != mode {
+ return
+ }
+ testGoCoverDir := os.Getenv("GOCOVERDIR")
+ if testGoCoverDir == "" {
+ return
+ }
+ args := []string{"tool", "covdata", "merge", "-pkg=runtime/coverage",
+ "-o", testGoCoverDir, "-i", gocoverdir}
+ t.Logf("up-merge of covdata from %s to %s", gocoverdir, testGoCoverDir)
+ t.Logf("executing: go %+v", args)
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ if b, err := cmd.CombinedOutput(); err != nil {
+ t.Fatalf("covdata merge failed (%v): %s", err, b)
+ }
+}
+
+// buildHarness builds the helper program "harness.exe".
+func buildHarness(t *testing.T, dir string, opts []string) string {
+ harnessPath := filepath.Join(dir, "harness.exe")
+ harnessSrc := filepath.Join("testdata", "harness.go")
+ args := []string{"build", "-o", harnessPath}
+ args = append(args, opts...)
+ args = append(args, harnessSrc)
+ //t.Logf("harness build: go %+v\n", args)
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ if b, err := cmd.CombinedOutput(); err != nil {
+ t.Fatalf("build failed (%v): %s", err, b)
+ }
+ return harnessPath
+}
+
+func mkdir(t *testing.T, d string) string {
+ t.Helper()
+ if err := os.Mkdir(d, 0777); err != nil {
+ t.Fatalf("mkdir failed: %v", err)
+ }
+ return d
+}
+
+// updateGoCoverDir updates the specified environment 'env' to set
+// GOCOVERDIR to 'gcd' (if setGoCoverDir is TRUE) or removes
+// GOCOVERDIR from the environment (if setGoCoverDir is false).
+func updateGoCoverDir(env []string, gcd string, setGoCoverDir bool) []string {
+ rv := []string{}
+ found := false
+ for _, v := range env {
+ if strings.HasPrefix(v, "GOCOVERDIR=") {
+ if !setGoCoverDir {
+ continue
+ }
+ v = "GOCOVERDIR=" + gcd
+ found = true
+ }
+ rv = append(rv, v)
+ }
+ if !found && setGoCoverDir {
+ rv = append(rv, "GOCOVERDIR="+gcd)
+ }
+ return rv
+}
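+
+// For example, with a hypothetical starting environment:
+//
+//	updateGoCoverDir([]string{"PATH=/bin", "GOCOVERDIR=/old"}, "/new", true)
+//	// -> []string{"PATH=/bin", "GOCOVERDIR=/new"}
+//
+//	updateGoCoverDir([]string{"PATH=/bin", "GOCOVERDIR=/old"}, "/new", false)
+//	// -> []string{"PATH=/bin"}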
+
+func runHarness(t *testing.T, harnessPath string, tp string, setGoCoverDir bool, rdir, edir string) (string, error) {
+ t.Logf("running: %s -tp %s -o %s with rdir=%s and GOCOVERDIR=%v", harnessPath, tp, edir, rdir, setGoCoverDir)
+ cmd := exec.Command(harnessPath, "-tp", tp, "-o", edir)
+ cmd.Dir = rdir
+ cmd.Env = updateGoCoverDir(os.Environ(), rdir, setGoCoverDir)
+ b, err := cmd.CombinedOutput()
+ //t.Logf("harness run output: %s\n", string(b))
+ return string(b), err
+}
+
+func testForSpecificFunctions(t *testing.T, dir string, want []string, avoid []string) string {
+ args := []string{"tool", "covdata", "debugdump",
+ "-live", "-pkg=command-line-arguments", "-i=" + dir}
+ t.Logf("running: go %v\n", args)
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ b, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("go tool covdata failed (%v): %s", err, b)
+ }
+ output := string(b)
+ rval := ""
+ for _, f := range want {
+ wf := "Func: " + f + "\n"
+ if strings.Contains(output, wf) {
+ continue
+ }
+ rval += fmt.Sprintf("error: output should contain %q but does not\n", wf)
+ }
+ for _, f := range avoid {
+ wf := "Func: " + f + "\n"
+ if strings.Contains(output, wf) {
+ rval += fmt.Sprintf("error: output should not contain %q but does\n", wf)
+ }
+ }
+ if rval != "" {
+ t.Logf("=-= begin output:\n" + output + "\n=-= end output\n")
+ }
+ return rval
+}
+
+func withAndWithoutRunner(f func(setit bool, tag string)) {
+ // Run 'f' with and without GOCOVERDIR set.
+ for i := 0; i < 2; i++ {
+ tag := "x"
+ setGoCoverDir := true
+ if i == 0 {
+ setGoCoverDir = false
+ tag = "y"
+ }
+ f(setGoCoverDir, tag)
+ }
+}
+
+func mktestdirs(t *testing.T, tag, tp, dir string) (string, string) {
+ t.Helper()
+ rdir := mkdir(t, filepath.Join(dir, tp+"-rdir-"+tag))
+ edir := mkdir(t, filepath.Join(dir, tp+"-edir-"+tag))
+ return rdir, edir
+}
+
+func testEmitToDir(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitToDir"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp,
+ setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp emitDir': %v", err)
+ }
+
+ // Just check to make sure meta-data file and counter data file were
+ // written. Another alternative would be to run "go tool covdata"
+ // or equivalent, but for now, this is what we've got.
+ dents, err := os.ReadDir(edir)
+ if err != nil {
+ t.Fatalf("os.ReadDir(%s) failed: %v", edir, err)
+ }
+ mfc := 0
+ cdc := 0
+ for _, e := range dents {
+ if e.IsDir() {
+ continue
+ }
+ if strings.HasPrefix(e.Name(), coverage.MetaFilePref) {
+ mfc++
+ } else if strings.HasPrefix(e.Name(), coverage.CounterFilePref) {
+ cdc++
+ }
+ }
+ wantmf := 1
+ wantcf := 1
+ if mfc != wantmf {
+ t.Errorf("EmitToDir: want %d meta-data files, got %d\n", wantmf, mfc)
+ }
+ if cdc != wantcf {
+ t.Errorf("EmitToDir: want %d counter-data files, got %d\n", wantcf, cdc)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToWriter(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitToWriter"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp, setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ want := []string{"main", tp}
+ avoid := []string{"final"}
+ if msg := testForSpecificFunctions(t, edir, want, avoid); msg != "" {
+ t.Errorf("coverage data from %q output match failed: %s", tp, msg)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToNonexistentDir(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitToNonexistentDir"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp, setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToUnwritableDir(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+
+ tp := "emitToUnwritableDir"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+
+ // Make edir unwritable.
+ if err := os.Chmod(edir, 0555); err != nil {
+ t.Fatalf("chmod failed: %v", err)
+ }
+ defer os.Chmod(edir, 0777)
+
+ output, err := runHarness(t, harnessPath, tp, setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToNilWriter(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitToNilWriter"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp, setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToFailingWriter(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitToFailingWriter"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp, setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitWithCounterClear(t *testing.T, harnessPath string, dir string) {
+ withAndWithoutRunner(func(setGoCoverDir bool, tag string) {
+ tp := "emitWithCounterClear"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp,
+ setGoCoverDir, rdir, edir)
+ if err != nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': %v", tp, err)
+ }
+ want := []string{tp, "postClear"}
+ avoid := []string{"preClear", "main", "final"}
+ if msg := testForSpecificFunctions(t, edir, want, avoid); msg != "" {
+ t.Logf("%s", output)
+ t.Errorf("coverage data from %q output match failed: %s", tp, msg)
+ }
+ upmergeCoverData(t, edir, "atomic")
+ upmergeCoverData(t, rdir, "atomic")
+ })
+}
+
+func testEmitToDirNonAtomic(t *testing.T, harnessPath string, naMode string, dir string) {
+ tp := "emitToDir"
+ tag := "nonatomdir"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp,
+ true, rdir, edir)
+
+ // We expect an error here.
+ if err == nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': did not get expected error", tp)
+ }
+
+ got := strings.TrimSpace(string(output))
+ want := "WriteCountersDir invoked for program built"
+ if !strings.Contains(got, want) {
+ t.Errorf("running 'harness -tp %s': got:\n%s\nwant: %s",
+ tp, got, want)
+ }
+ upmergeCoverData(t, edir, naMode)
+ upmergeCoverData(t, rdir, naMode)
+}
+
+func testEmitToWriterNonAtomic(t *testing.T, harnessPath string, naMode string, dir string) {
+ tp := "emitToWriter"
+ tag := "nonatomw"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp,
+ true, rdir, edir)
+
+ // We expect an error here.
+ if err == nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s': did not get expected error", tp)
+ }
+
+ got := strings.TrimSpace(string(output))
+ want := "WriteCounters invoked for program built"
+ if !strings.Contains(got, want) {
+ t.Errorf("running 'harness -tp %s': got:\n%s\nwant: %s",
+ tp, got, want)
+ }
+
+ upmergeCoverData(t, edir, naMode)
+ upmergeCoverData(t, rdir, naMode)
+}
+
+func testEmitWithCounterClearNonAtomic(t *testing.T, harnessPath string, naMode string, dir string) {
+ tp := "emitWithCounterClear"
+ tag := "cclear"
+ rdir, edir := mktestdirs(t, tag, tp, dir)
+ output, err := runHarness(t, harnessPath, tp,
+ true, rdir, edir)
+
+ // We expect an error here.
+ if err == nil {
+ t.Logf("%s", output)
+ t.Fatalf("running 'harness -tp %s' nonatomic: did not get expected error", tp)
+ }
+
+ got := strings.TrimSpace(string(output))
+ want := "ClearCounters invoked for program built"
+ if !strings.Contains(got, want) {
+ t.Errorf("running 'harness -tp %s': got:\n%s\nwant: %s",
+ tp, got, want)
+ }
+
+ upmergeCoverData(t, edir, naMode)
+ upmergeCoverData(t, rdir, naMode)
+}
+
+func TestApisOnNocoverBinary(t *testing.T) {
+ if testing.Short() {
+ t.Skipf("skipping test: too long for short mode")
+ }
+ testenv.MustHaveGoBuild(t)
+ dir := t.TempDir()
+
+ // Build harness with no -cover.
+ bdir := mkdir(t, filepath.Join(dir, "nocover"))
+ edir := mkdir(t, filepath.Join(dir, "emitDirNo"))
+ harnessPath := buildHarness(t, bdir, nil)
+ output, err := runHarness(t, harnessPath, "emitToDir", false, edir, edir)
+ if err == nil {
+ t.Fatalf("expected error on TestApisOnNocoverBinary harness run")
+ }
+ const want = "not built with -cover"
+ if !strings.Contains(output, want) {
+ t.Errorf("error output does not contain %q: %s", want, output)
+ }
+}
+
+func TestIssue56006EmitDataRaceCoverRunningGoroutine(t *testing.T) {
+ if testing.Short() {
+ t.Skipf("skipping test: too long for short mode")
+ }
+ if !goexperiment.CoverageRedesign {
+ t.Skipf("skipping new coverage tests (experiment not enabled)")
+ }
+
+ // This test requires "go test -race -cover", meaning that we need
+ // go build, go run, and "-race" support.
+ testenv.MustHaveGoRun(t)
+ if !platform.RaceDetectorSupported(runtime.GOOS, runtime.GOARCH) ||
+ !testenv.HasCGO() {
+ t.Skip("skipped due to lack of race detector support / CGO")
+ }
+
+ // This will run a program with -cover and -race where we have a
+ // goroutine still running (and updating counters) at the point where
+ // the test runtime is trying to write out counter data.
+ cmd := exec.Command(testenv.GoToolPath(t), "test", "-cover", "-race")
+ cmd.Dir = filepath.Join("testdata", "issue56006")
+ b, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("go test -cover -race failed: %v", err)
+ }
+
+ // Don't want to see any data races in output.
+ avoid := []string{"DATA RACE"}
+ for _, no := range avoid {
+ if strings.Contains(string(b), no) {
+ t.Logf("%s\n", string(b))
+ t.Fatalf("found %s in test output, not permitted", no)
+ }
+ }
+}
+
+func TestIssue59563TruncatedCoverPkgAll(t *testing.T) {
+ if testing.Short() {
+ t.Skipf("skipping test: too long for short mode")
+ }
+ testenv.MustHaveGoRun(t)
+
+ tmpdir := t.TempDir()
+ ppath := filepath.Join(tmpdir, "foo.cov")
+
+ cmd := exec.Command(testenv.GoToolPath(t), "test", "-coverpkg=all", "-coverprofile="+ppath)
+ cmd.Dir = filepath.Join("testdata", "issue59563")
+ b, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("go test -cover failed: %v", err)
+ }
+
+ cmd = exec.Command(testenv.GoToolPath(t), "tool", "cover", "-func="+ppath)
+ b, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("go tool cover -func failed: %v", err)
+ }
+
+ lines := strings.Split(string(b), "\n")
+ nfound := 0
+ bad := false
+ for _, line := range lines {
+ f := strings.Fields(line)
+ if len(f) == 0 {
+ continue
+ }
+ // We're only interested in the specific function "large" for
+ // the testcase being built. See issue #59563 for details on why
+ // size matters.
+ if !(strings.HasPrefix(f[0], "runtime/coverage/testdata/issue59563/repro.go") && strings.Contains(line, "large")) {
+ continue
+ }
+ nfound++
+ want := "100.0%"
+ if f[len(f)-1] != want {
+ t.Errorf("wanted %s got: %q\n", want, line)
+ bad = true
+ }
+ }
+ if nfound != 1 {
+ t.Errorf("wanted 1 found, got %d\n", nfound)
+ bad = true
+ }
+ if bad {
+ t.Logf("func output:\n%s\n", string(b))
+ }
+}
diff --git a/src/runtime/coverage/hooks.go b/src/runtime/coverage/hooks.go
new file mode 100644
index 0000000..a9fbf9d
--- /dev/null
+++ b/src/runtime/coverage/hooks.go
@@ -0,0 +1,42 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import _ "unsafe"
+
+// initHook is invoked from the main package "init" routine in
+// programs built with "-cover". This function is intended to be
+// called only by the compiler.
+//
+// If 'istest' is false, it indicates we're building a regular program
+// ("go build -cover ..."), in which case we immediately try to write
+// out the meta-data file, and register emitCounterData as an exit
+// hook.
+//
+// If 'istest' is true (indicating that the program in question is a
+// Go test binary), then we tentatively queue up both emitMetaData and
+// emitCounterData as exit hooks. In the normal case (e.g. regular "go
+// test -cover" run) the testmain.go boilerplate will run at the end
+// of the test, write out the coverage percentage, and then invoke
+// markProfileEmitted() to indicate that no more work needs to be
+// done. If, however, that call is never made, this is a sign that the
+// test binary is being used as a replacement binary for the tool
+// being tested, hence we do want to run exit hooks when the program
+// terminates.
+func initHook(istest bool) {
+ // Note: hooks are run in reverse registration order, so
+ // register the counter data hook before the meta-data hook
+ // (in the case where two hooks are needed).
+ runOnNonZeroExit := true
+ runtime_addExitHook(emitCounterData, runOnNonZeroExit)
+ if istest {
+ runtime_addExitHook(emitMetaData, runOnNonZeroExit)
+ } else {
+ emitMetaData()
+ }
+}
+
+//go:linkname runtime_addExitHook runtime.addExitHook
+func runtime_addExitHook(f func(), runOnNonZeroExit bool)
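+
+// Sketch of the resulting exit-time behaviour for a test binary that never
+// calls markProfileEmitted (per the reverse-registration-order note above):
+// emitMetaData runs first, computing and publishing the meta-data hash, and
+// emitCounterData runs second, which matters because the counter-data
+// writer refuses to emit anything until that hash has been computed.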
diff --git a/src/runtime/coverage/testdata/harness.go b/src/runtime/coverage/testdata/harness.go
new file mode 100644
index 0000000..5c87e4c
--- /dev/null
+++ b/src/runtime/coverage/testdata/harness.go
@@ -0,0 +1,259 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "flag"
+ "fmt"
+ "internal/coverage/slicewriter"
+ "io"
+ "io/ioutil"
+ "log"
+ "path/filepath"
+ "runtime/coverage"
+ "strings"
+)
+
+var verbflag = flag.Int("v", 0, "Verbose trace output level")
+var testpointflag = flag.String("tp", "", "Testpoint to run")
+var outdirflag = flag.String("o", "", "Output dir into which to emit")
+
+func emitToWriter() {
+ log.SetPrefix("emitToWriter: ")
+ var slwm slicewriter.WriteSeeker
+ if err := coverage.WriteMeta(&slwm); err != nil {
+ log.Fatalf("error: WriteMeta returns %v", err)
+ }
+ mf := filepath.Join(*outdirflag, "covmeta.0abcdef")
+ if err := ioutil.WriteFile(mf, slwm.BytesWritten(), 0666); err != nil {
+ log.Fatalf("error: writing %s: %v", mf, err)
+ }
+ var slwc slicewriter.WriteSeeker
+ if err := coverage.WriteCounters(&slwc); err != nil {
+ log.Fatalf("error: WriteCounters returns %v", err)
+ }
+ cf := filepath.Join(*outdirflag, "covcounters.0abcdef.99.77")
+ if err := ioutil.WriteFile(cf, slwc.BytesWritten(), 0666); err != nil {
+ log.Fatalf("error: writing %s: %v", cf, err)
+ }
+}
+
+func emitToDir() {
+ log.SetPrefix("emitToDir: ")
+ if err := coverage.WriteMetaDir(*outdirflag); err != nil {
+ log.Fatalf("error: WriteMetaDir returns %v", err)
+ }
+ if err := coverage.WriteCountersDir(*outdirflag); err != nil {
+ log.Fatalf("error: WriteCountersDir returns %v", err)
+ }
+}
+
+func emitToNonexistentDir() {
+ log.SetPrefix("emitToNonexistentDir: ")
+
+ want := []string{
+ "no such file or directory", // linux-ish
+ "system cannot find the file specified", // windows
+ "does not exist", // plan9
+ }
+
+ checkWant := func(which string, got string) {
+ found := false
+ for _, w := range want {
+ if strings.Contains(got, w) {
+ found = true
+ break
+ }
+ }
+ if !found {
+ log.Fatalf("%s emit to bad dir: got error:\n %v\nwanted error with one of:\n %+v", which, got, want)
+ }
+ }
+
+ // Mangle the output directory to produce something nonexistent.
+ mangled := *outdirflag + "_MANGLED"
+ if err := coverage.WriteMetaDir(mangled); err == nil {
+ log.Fatal("expected error from WriteMetaDir to nonexistent dir")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ checkWant("meta data", got)
+ }
+
+ // Now try to emit counter data file to a bad dir.
+ if err := coverage.WriteCountersDir(mangled); err == nil {
+ log.Fatal("expected error emitting counter data to bad dir")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ checkWant("counter data", got)
+ }
+}
+
+func emitToUnwritableDir() {
+ log.SetPrefix("emitToUnwritableDir: ")
+
+ want := "permission denied"
+
+ if err := coverage.WriteMetaDir(*outdirflag); err == nil {
+ log.Fatal("expected error from WriteMetaDir to unwritable dir")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ if !strings.Contains(got, want) {
+ log.Fatalf("meta-data emit to unwritable dir: wanted error containing %q got %q", want, got)
+ }
+ }
+
+ // Similarly with writing counter data.
+ if err := coverage.WriteCountersDir(*outdirflag); err == nil {
+ log.Fatal("expected error emitting counter data to unwritable dir")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ if !strings.Contains(got, want) {
+ log.Fatalf("emitting counter data to unwritable dir: wanted error containing %q got %q", want, got)
+ }
+ }
+}
+
+func emitToNilWriter() {
+ log.SetPrefix("emitToNilWriter: ")
+ want := "nil writer"
+ var bad io.WriteSeeker
+ if err := coverage.WriteMeta(bad); err == nil {
+ log.Fatal("expected error passing nil writer for meta emit")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ if !strings.Contains(got, want) {
+ log.Fatalf("emitting meta-data passing nil writer: wanted error containing %q got %q", want, got)
+ }
+ }
+
+ if err := coverage.WriteCounters(bad); err == nil {
+ log.Fatal("expected error passing nil writer for counter emit")
+ } else {
+ got := fmt.Sprintf("%v", err)
+ if !strings.Contains(got, want) {
+ log.Fatalf("emitting counter data passing nil writer: wanted error containing %q got %q", want, got)
+ }
+ }
+}
+
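+// failingWriter wraps a slicewriter.WriteSeeker and injects a synthetic
+// write error once writeCount reaches writeLimit; a negative writeLimit
+// disables fault injection (used by writeStressTest below for its initial
+// "count the writes" pass).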
+type failingWriter struct {
+ writeCount int
+ writeLimit int
+ slws slicewriter.WriteSeeker
+}
+
+func (f *failingWriter) Write(p []byte) (n int, err error) {
+ c := f.writeCount
+ f.writeCount++
+ if f.writeLimit < 0 || c < f.writeLimit {
+ return f.slws.Write(p)
+ }
+ return 0, fmt.Errorf("manufactured write error")
+}
+
+func (f *failingWriter) Seek(offset int64, whence int) (int64, error) {
+ return f.slws.Seek(offset, whence)
+}
+
+func (f *failingWriter) reset(lim int) {
+ f.writeCount = 0
+ f.writeLimit = lim
+ f.slws = slicewriter.WriteSeeker{}
+}
+
+func writeStressTest(tag string, testf func(testf *failingWriter) error) {
+ // Invoke the function initially without the write limit
+ // set, to capture the number of writes performed.
+ fw := &failingWriter{writeLimit: -1}
+ testf(fw)
+
+ // Now that we know how many writes are going to happen, run the
+ // function repeatedly, each time with a Write operation set to
+ // fail at a new spot. The goal here is to make sure that:
+ // A) an error is reported, and B) nothing crashes.
+ tot := fw.writeCount
+ for i := 0; i < tot; i++ {
+ fw.reset(i)
+ err := testf(fw)
+ if err == nil {
+ log.Fatalf("no error from write %d tag %s", i, tag)
+ }
+ }
+}
+
+func postClear() int {
+ return 42
+}
+
+func preClear() int {
+ return 42
+}
+
+// This test is designed to ensure that write errors are properly
+// handled by the code that writes out coverage data. It repeatedly
+// invokes the 'emit to writer' apis using a specially crafted writer
+// that captures the total number of expected writes, then replays the
+// execution N times with a manufactured write error at the
+// appropriate spot.
+func emitToFailingWriter() {
+ log.SetPrefix("emitToFailingWriter: ")
+
+ writeStressTest("emit-meta", func(f *failingWriter) error {
+ return coverage.WriteMeta(f)
+ })
+ writeStressTest("emit-counter", func(f *failingWriter) error {
+ return coverage.WriteCounters(f)
+ })
+}
+
+func emitWithCounterClear() {
+ log.SetPrefix("emitWithCounterClear: ")
+ preClear()
+ if err := coverage.ClearCounters(); err != nil {
+ log.Fatalf("clear failed: %v", err)
+ }
+ postClear()
+ if err := coverage.WriteMetaDir(*outdirflag); err != nil {
+ log.Fatalf("error: WriteMetaDir returns %v", err)
+ }
+ if err := coverage.WriteCountersDir(*outdirflag); err != nil {
+ log.Fatalf("error: WriteCountersDir returns %v", err)
+ }
+}
+
+func final() int {
+ println("I run last.")
+ return 43
+}
+
+func main() {
+ log.SetFlags(0)
+ flag.Parse()
+ if *testpointflag == "" {
+ log.Fatalf("error: no testpoint (use -tp flag)")
+ }
+ if *outdirflag == "" {
+ log.Fatalf("error: no output dir specified (use -o flag)")
+ }
+ switch *testpointflag {
+ case "emitToDir":
+ emitToDir()
+ case "emitToWriter":
+ emitToWriter()
+ case "emitToNonexistentDir":
+ emitToNonexistentDir()
+ case "emitToUnwritableDir":
+ emitToUnwritableDir()
+ case "emitToNilWriter":
+ emitToNilWriter()
+ case "emitToFailingWriter":
+ emitToFailingWriter()
+ case "emitWithCounterClear":
+ emitWithCounterClear()
+ default:
+ log.Fatalf("error: unknown testpoint %q", *testpointflag)
+ }
+ final()
+}
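+
+// Typical invocation from the tests in emitdata_test.go looks roughly like
+//
+//	GOCOVERDIR=<rundir> ./harness.exe -tp emitToDir -o <outdir>
+//
+// where harness.exe itself has been built with "go build -cover" (or with
+// an explicit -covermode flag).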
diff --git a/src/runtime/coverage/testdata/issue56006/repro.go b/src/runtime/coverage/testdata/issue56006/repro.go
new file mode 100644
index 0000000..60a4925
--- /dev/null
+++ b/src/runtime/coverage/testdata/issue56006/repro.go
@@ -0,0 +1,26 @@
+package main
+
+//go:noinline
+func blah(x int) int {
+ if x != 0 {
+ return x + 42
+ }
+ return x - 42
+}
+
+func main() {
+ go infloop()
+ println(blah(1) + blah(0))
+}
+
+var G int
+
+func infloop() {
+ for {
+ G += blah(1)
+ G += blah(0)
+ if G > 10000 {
+ G = 0
+ }
+ }
+}
diff --git a/src/runtime/coverage/testdata/issue56006/repro_test.go b/src/runtime/coverage/testdata/issue56006/repro_test.go
new file mode 100644
index 0000000..674d819
--- /dev/null
+++ b/src/runtime/coverage/testdata/issue56006/repro_test.go
@@ -0,0 +1,8 @@
+package main
+
+import "testing"
+
+func TestSomething(t *testing.T) {
+ go infloop()
+ println(blah(1) + blah(0))
+}
diff --git a/src/runtime/coverage/testdata/issue59563/repro.go b/src/runtime/coverage/testdata/issue59563/repro.go
new file mode 100644
index 0000000..d054567
--- /dev/null
+++ b/src/runtime/coverage/testdata/issue59563/repro.go
@@ -0,0 +1,823 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package repro
+
+import (
+ "fmt"
+ "net/http"
+)
+
+func small() {
+ go func() {
+ fmt.Println(http.ListenAndServe("localhost:7070", nil))
+ }()
+}
+
+func large(x int) int {
+ if x == 0 {
+ x += 0
+ } else if x == 1 {
+ x += 1
+ } else if x == 2 {
+ x += 2
+ } else if x == 3 {
+ x += 3
+ } else if x == 4 {
+ x += 4
+ } else if x == 5 {
+ x += 5
+ } else if x == 6 {
+ x += 6
+ } else if x == 7 {
+ x += 7
+ } else if x == 8 {
+ x += 8
+ } else if x == 9 {
+ x += 9
+ } else if x == 10 {
+ x += 10
+ } else if x == 11 {
+ x += 11
+ } else if x == 12 {
+ x += 12
+ } else if x == 13 {
+ x += 13
+ } else if x == 14 {
+ x += 14
+ } else if x == 15 {
+ x += 15
+ } else if x == 16 {
+ x += 16
+ } else if x == 17 {
+ x += 17
+ } else if x == 18 {
+ x += 18
+ } else if x == 19 {
+ x += 19
+ } else if x == 20 {
+ x += 20
+ } else if x == 21 {
+ x += 21
+ } else if x == 22 {
+ x += 22
+ } else if x == 23 {
+ x += 23
+ } else if x == 24 {
+ x += 24
+ } else if x == 25 {
+ x += 25
+ } else if x == 26 {
+ x += 26
+ } else if x == 27 {
+ x += 27
+ } else if x == 28 {
+ x += 28
+ } else if x == 29 {
+ x += 29
+ } else if x == 30 {
+ x += 30
+ } else if x == 31 {
+ x += 31
+ } else if x == 32 {
+ x += 32
+ } else if x == 33 {
+ x += 33
+ } else if x == 34 {
+ x += 34
+ } else if x == 35 {
+ x += 35
+ } else if x == 36 {
+ x += 36
+ } else if x == 37 {
+ x += 37
+ } else if x == 38 {
+ x += 38
+ } else if x == 39 {
+ x += 39
+ } else if x == 40 {
+ x += 40
+ } else if x == 41 {
+ x += 41
+ } else if x == 42 {
+ x += 42
+ } else if x == 43 {
+ x += 43
+ } else if x == 44 {
+ x += 44
+ } else if x == 45 {
+ x += 45
+ } else if x == 46 {
+ x += 46
+ } else if x == 47 {
+ x += 47
+ } else if x == 48 {
+ x += 48
+ } else if x == 49 {
+ x += 49
+ } else if x == 50 {
+ x += 50
+ } else if x == 51 {
+ x += 51
+ } else if x == 52 {
+ x += 52
+ } else if x == 53 {
+ x += 53
+ } else if x == 54 {
+ x += 54
+ } else if x == 55 {
+ x += 55
+ } else if x == 56 {
+ x += 56
+ } else if x == 57 {
+ x += 57
+ } else if x == 58 {
+ x += 58
+ } else if x == 59 {
+ x += 59
+ } else if x == 60 {
+ x += 60
+ } else if x == 61 {
+ x += 61
+ } else if x == 62 {
+ x += 62
+ } else if x == 63 {
+ x += 63
+ } else if x == 64 {
+ x += 64
+ } else if x == 65 {
+ x += 65
+ } else if x == 66 {
+ x += 66
+ } else if x == 67 {
+ x += 67
+ } else if x == 68 {
+ x += 68
+ } else if x == 69 {
+ x += 69
+ } else if x == 70 {
+ x += 70
+ } else if x == 71 {
+ x += 71
+ } else if x == 72 {
+ x += 72
+ } else if x == 73 {
+ x += 73
+ } else if x == 74 {
+ x += 74
+ } else if x == 75 {
+ x += 75
+ } else if x == 76 {
+ x += 76
+ } else if x == 77 {
+ x += 77
+ } else if x == 78 {
+ x += 78
+ } else if x == 79 {
+ x += 79
+ } else if x == 80 {
+ x += 80
+ } else if x == 81 {
+ x += 81
+ } else if x == 82 {
+ x += 82
+ } else if x == 83 {
+ x += 83
+ } else if x == 84 {
+ x += 84
+ } else if x == 85 {
+ x += 85
+ } else if x == 86 {
+ x += 86
+ } else if x == 87 {
+ x += 87
+ } else if x == 88 {
+ x += 88
+ } else if x == 89 {
+ x += 89
+ } else if x == 90 {
+ x += 90
+ } else if x == 91 {
+ x += 91
+ } else if x == 92 {
+ x += 92
+ } else if x == 93 {
+ x += 93
+ } else if x == 94 {
+ x += 94
+ } else if x == 95 {
+ x += 95
+ } else if x == 96 {
+ x += 96
+ } else if x == 97 {
+ x += 97
+ } else if x == 98 {
+ x += 98
+ } else if x == 99 {
+ x += 99
+ } else if x == 100 {
+ x += 100
+ } else if x == 101 {
+ x += 101
+ } else if x == 102 {
+ x += 102
+ } else if x == 103 {
+ x += 103
+ } else if x == 104 {
+ x += 104
+ } else if x == 105 {
+ x += 105
+ } else if x == 106 {
+ x += 106
+ } else if x == 107 {
+ x += 107
+ } else if x == 108 {
+ x += 108
+ } else if x == 109 {
+ x += 109
+ } else if x == 110 {
+ x += 110
+ } else if x == 111 {
+ x += 111
+ } else if x == 112 {
+ x += 112
+ } else if x == 113 {
+ x += 113
+ } else if x == 114 {
+ x += 114
+ } else if x == 115 {
+ x += 115
+ } else if x == 116 {
+ x += 116
+ } else if x == 117 {
+ x += 117
+ } else if x == 118 {
+ x += 118
+ } else if x == 119 {
+ x += 119
+ } else if x == 120 {
+ x += 120
+ } else if x == 121 {
+ x += 121
+ } else if x == 122 {
+ x += 122
+ } else if x == 123 {
+ x += 123
+ } else if x == 124 {
+ x += 124
+ } else if x == 125 {
+ x += 125
+ } else if x == 126 {
+ x += 126
+ } else if x == 127 {
+ x += 127
+ } else if x == 128 {
+ x += 128
+ } else if x == 129 {
+ x += 129
+ } else if x == 130 {
+ x += 130
+ } else if x == 131 {
+ x += 131
+ } else if x == 132 {
+ x += 132
+ } else if x == 133 {
+ x += 133
+ } else if x == 134 {
+ x += 134
+ } else if x == 135 {
+ x += 135
+ } else if x == 136 {
+ x += 136
+ } else if x == 137 {
+ x += 137
+ } else if x == 138 {
+ x += 138
+ } else if x == 139 {
+ x += 139
+ } else if x == 140 {
+ x += 140
+ } else if x == 141 {
+ x += 141
+ } else if x == 142 {
+ x += 142
+ } else if x == 143 {
+ x += 143
+ } else if x == 144 {
+ x += 144
+ } else if x == 145 {
+ x += 145
+ } else if x == 146 {
+ x += 146
+ } else if x == 147 {
+ x += 147
+ } else if x == 148 {
+ x += 148
+ } else if x == 149 {
+ x += 149
+ } else if x == 150 {
+ x += 150
+ } else if x == 151 {
+ x += 151
+ } else if x == 152 {
+ x += 152
+ } else if x == 153 {
+ x += 153
+ } else if x == 154 {
+ x += 154
+ } else if x == 155 {
+ x += 155
+ } else if x == 156 {
+ x += 156
+ } else if x == 157 {
+ x += 157
+ } else if x == 158 {
+ x += 158
+ } else if x == 159 {
+ x += 159
+ } else if x == 160 {
+ x += 160
+ } else if x == 161 {
+ x += 161
+ } else if x == 162 {
+ x += 162
+ } else if x == 163 {
+ x += 163
+ } else if x == 164 {
+ x += 164
+ } else if x == 165 {
+ x += 165
+ } else if x == 166 {
+ x += 166
+ } else if x == 167 {
+ x += 167
+ } else if x == 168 {
+ x += 168
+ } else if x == 169 {
+ x += 169
+ } else if x == 170 {
+ x += 170
+ } else if x == 171 {
+ x += 171
+ } else if x == 172 {
+ x += 172
+ } else if x == 173 {
+ x += 173
+ } else if x == 174 {
+ x += 174
+ } else if x == 175 {
+ x += 175
+ } else if x == 176 {
+ x += 176
+ } else if x == 177 {
+ x += 177
+ } else if x == 178 {
+ x += 178
+ } else if x == 179 {
+ x += 179
+ } else if x == 180 {
+ x += 180
+ } else if x == 181 {
+ x += 181
+ } else if x == 182 {
+ x += 182
+ } else if x == 183 {
+ x += 183
+ } else if x == 184 {
+ x += 184
+ } else if x == 185 {
+ x += 185
+ } else if x == 186 {
+ x += 186
+ } else if x == 187 {
+ x += 187
+ } else if x == 188 {
+ x += 188
+ } else if x == 189 {
+ x += 189
+ } else if x == 190 {
+ x += 190
+ } else if x == 191 {
+ x += 191
+ } else if x == 192 {
+ x += 192
+ } else if x == 193 {
+ x += 193
+ } else if x == 194 {
+ x += 194
+ } else if x == 195 {
+ x += 195
+ } else if x == 196 {
+ x += 196
+ } else if x == 197 {
+ x += 197
+ } else if x == 198 {
+ x += 198
+ } else if x == 199 {
+ x += 199
+ } else if x == 200 {
+ x += 200
+ } else if x == 201 {
+ x += 201
+ } else if x == 202 {
+ x += 202
+ } else if x == 203 {
+ x += 203
+ } else if x == 204 {
+ x += 204
+ } else if x == 205 {
+ x += 205
+ } else if x == 206 {
+ x += 206
+ } else if x == 207 {
+ x += 207
+ } else if x == 208 {
+ x += 208
+ } else if x == 209 {
+ x += 209
+ } else if x == 210 {
+ x += 210
+ } else if x == 211 {
+ x += 211
+ } else if x == 212 {
+ x += 212
+ } else if x == 213 {
+ x += 213
+ } else if x == 214 {
+ x += 214
+ } else if x == 215 {
+ x += 215
+ } else if x == 216 {
+ x += 216
+ } else if x == 217 {
+ x += 217
+ } else if x == 218 {
+ x += 218
+ } else if x == 219 {
+ x += 219
+ } else if x == 220 {
+ x += 220
+ } else if x == 221 {
+ x += 221
+ } else if x == 222 {
+ x += 222
+ } else if x == 223 {
+ x += 223
+ } else if x == 224 {
+ x += 224
+ } else if x == 225 {
+ x += 225
+ } else if x == 226 {
+ x += 226
+ } else if x == 227 {
+ x += 227
+ } else if x == 228 {
+ x += 228
+ } else if x == 229 {
+ x += 229
+ } else if x == 230 {
+ x += 230
+ } else if x == 231 {
+ x += 231
+ } else if x == 232 {
+ x += 232
+ } else if x == 233 {
+ x += 233
+ } else if x == 234 {
+ x += 234
+ } else if x == 235 {
+ x += 235
+ } else if x == 236 {
+ x += 236
+ } else if x == 237 {
+ x += 237
+ } else if x == 238 {
+ x += 238
+ } else if x == 239 {
+ x += 239
+ } else if x == 240 {
+ x += 240
+ } else if x == 241 {
+ x += 241
+ } else if x == 242 {
+ x += 242
+ } else if x == 243 {
+ x += 243
+ } else if x == 244 {
+ x += 244
+ } else if x == 245 {
+ x += 245
+ } else if x == 246 {
+ x += 246
+ } else if x == 247 {
+ x += 247
+ } else if x == 248 {
+ x += 248
+ } else if x == 249 {
+ x += 249
+ } else if x == 250 {
+ x += 250
+ } else if x == 251 {
+ x += 251
+ } else if x == 252 {
+ x += 252
+ } else if x == 253 {
+ x += 253
+ } else if x == 254 {
+ x += 254
+ } else if x == 255 {
+ x += 255
+ } else if x == 256 {
+ x += 256
+ } else if x == 257 {
+ x += 257
+ } else if x == 258 {
+ x += 258
+ } else if x == 259 {
+ x += 259
+ } else if x == 260 {
+ x += 260
+ } else if x == 261 {
+ x += 261
+ } else if x == 262 {
+ x += 262
+ } else if x == 263 {
+ x += 263
+ } else if x == 264 {
+ x += 264
+ } else if x == 265 {
+ x += 265
+ } else if x == 266 {
+ x += 266
+ } else if x == 267 {
+ x += 267
+ } else if x == 268 {
+ x += 268
+ } else if x == 269 {
+ x += 269
+ } else if x == 270 {
+ x += 270
+ } else if x == 271 {
+ x += 271
+ } else if x == 272 {
+ x += 272
+ } else if x == 273 {
+ x += 273
+ } else if x == 274 {
+ x += 274
+ } else if x == 275 {
+ x += 275
+ } else if x == 276 {
+ x += 276
+ } else if x == 277 {
+ x += 277
+ } else if x == 278 {
+ x += 278
+ } else if x == 279 {
+ x += 279
+ } else if x == 280 {
+ x += 280
+ } else if x == 281 {
+ x += 281
+ } else if x == 282 {
+ x += 282
+ } else if x == 283 {
+ x += 283
+ } else if x == 284 {
+ x += 284
+ } else if x == 285 {
+ x += 285
+ } else if x == 286 {
+ x += 286
+ } else if x == 287 {
+ x += 287
+ } else if x == 288 {
+ x += 288
+ } else if x == 289 {
+ x += 289
+ } else if x == 290 {
+ x += 290
+ } else if x == 291 {
+ x += 291
+ } else if x == 292 {
+ x += 292
+ } else if x == 293 {
+ x += 293
+ } else if x == 294 {
+ x += 294
+ } else if x == 295 {
+ x += 295
+ } else if x == 296 {
+ x += 296
+ } else if x == 297 {
+ x += 297
+ } else if x == 298 {
+ x += 298
+ } else if x == 299 {
+ x += 299
+ } else if x == 300 {
+ x += 300
+ } else if x == 301 {
+ x += 301
+ } else if x == 302 {
+ x += 302
+ } else if x == 303 {
+ x += 303
+ } else if x == 304 {
+ x += 304
+ } else if x == 305 {
+ x += 305
+ } else if x == 306 {
+ x += 306
+ } else if x == 307 {
+ x += 307
+ } else if x == 308 {
+ x += 308
+ } else if x == 309 {
+ x += 309
+ } else if x == 310 {
+ x += 310
+ } else if x == 311 {
+ x += 311
+ } else if x == 312 {
+ x += 312
+ } else if x == 313 {
+ x += 313
+ } else if x == 314 {
+ x += 314
+ } else if x == 315 {
+ x += 315
+ } else if x == 316 {
+ x += 316
+ } else if x == 317 {
+ x += 317
+ } else if x == 318 {
+ x += 318
+ } else if x == 319 {
+ x += 319
+ } else if x == 320 {
+ x += 320
+ } else if x == 321 {
+ x += 321
+ } else if x == 322 {
+ x += 322
+ } else if x == 323 {
+ x += 323
+ } else if x == 324 {
+ x += 324
+ } else if x == 325 {
+ x += 325
+ } else if x == 326 {
+ x += 326
+ } else if x == 327 {
+ x += 327
+ } else if x == 328 {
+ x += 328
+ } else if x == 329 {
+ x += 329
+ } else if x == 330 {
+ x += 330
+ } else if x == 331 {
+ x += 331
+ } else if x == 332 {
+ x += 332
+ } else if x == 333 {
+ x += 333
+ } else if x == 334 {
+ x += 334
+ } else if x == 335 {
+ x += 335
+ } else if x == 336 {
+ x += 336
+ } else if x == 337 {
+ x += 337
+ } else if x == 338 {
+ x += 338
+ } else if x == 339 {
+ x += 339
+ } else if x == 340 {
+ x += 340
+ } else if x == 341 {
+ x += 341
+ } else if x == 342 {
+ x += 342
+ } else if x == 343 {
+ x += 343
+ } else if x == 344 {
+ x += 344
+ } else if x == 345 {
+ x += 345
+ } else if x == 346 {
+ x += 346
+ } else if x == 347 {
+ x += 347
+ } else if x == 348 {
+ x += 348
+ } else if x == 349 {
+ x += 349
+ } else if x == 350 {
+ x += 350
+ } else if x == 351 {
+ x += 351
+ } else if x == 352 {
+ x += 352
+ } else if x == 353 {
+ x += 353
+ } else if x == 354 {
+ x += 354
+ } else if x == 355 {
+ x += 355
+ } else if x == 356 {
+ x += 356
+ } else if x == 357 {
+ x += 357
+ } else if x == 358 {
+ x += 358
+ } else if x == 359 {
+ x += 359
+ } else if x == 360 {
+ x += 360
+ } else if x == 361 {
+ x += 361
+ } else if x == 362 {
+ x += 362
+ } else if x == 363 {
+ x += 363
+ } else if x == 364 {
+ x += 364
+ } else if x == 365 {
+ x += 365
+ } else if x == 366 {
+ x += 366
+ } else if x == 367 {
+ x += 367
+ } else if x == 368 {
+ x += 368
+ } else if x == 369 {
+ x += 369
+ } else if x == 370 {
+ x += 370
+ } else if x == 371 {
+ x += 371
+ } else if x == 372 {
+ x += 372
+ } else if x == 373 {
+ x += 373
+ } else if x == 374 {
+ x += 374
+ } else if x == 375 {
+ x += 375
+ } else if x == 376 {
+ x += 376
+ } else if x == 377 {
+ x += 377
+ } else if x == 378 {
+ x += 378
+ } else if x == 379 {
+ x += 379
+ } else if x == 380 {
+ x += 380
+ } else if x == 381 {
+ x += 381
+ } else if x == 382 {
+ x += 382
+ } else if x == 383 {
+ x += 383
+ } else if x == 384 {
+ x += 384
+ } else if x == 385 {
+ x += 385
+ } else if x == 386 {
+ x += 386
+ } else if x == 387 {
+ x += 387
+ } else if x == 388 {
+ x += 388
+ } else if x == 389 {
+ x += 389
+ } else if x == 390 {
+ x += 390
+ } else if x == 391 {
+ x += 391
+ } else if x == 392 {
+ x += 392
+ } else if x == 393 {
+ x += 393
+ } else if x == 394 {
+ x += 394
+ } else if x == 395 {
+ x += 395
+ } else if x == 396 {
+ x += 396
+ } else if x == 397 {
+ x += 397
+ } else if x == 398 {
+ x += 398
+ } else if x == 399 {
+ x += 399
+ } else if x == 400 {
+ x += 400
+ }
+ return x * x
+}
diff --git a/src/runtime/coverage/testdata/issue59563/repro_test.go b/src/runtime/coverage/testdata/issue59563/repro_test.go
new file mode 100644
index 0000000..15c8e01
--- /dev/null
+++ b/src/runtime/coverage/testdata/issue59563/repro_test.go
@@ -0,0 +1,14 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package repro
+
+import "testing"
+
+func TestSomething(t *testing.T) {
+ small()
+ for i := 0; i < 1001; i++ {
+ large(i)
+ }
+}
diff --git a/src/runtime/coverage/testsupport.go b/src/runtime/coverage/testsupport.go
new file mode 100644
index 0000000..f169580
--- /dev/null
+++ b/src/runtime/coverage/testsupport.go
@@ -0,0 +1,323 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import (
+ "encoding/json"
+ "fmt"
+ "internal/coverage"
+ "internal/coverage/calloc"
+ "internal/coverage/cformat"
+ "internal/coverage/cmerge"
+ "internal/coverage/decodecounter"
+ "internal/coverage/decodemeta"
+ "internal/coverage/pods"
+ "io"
+ "os"
+ "path/filepath"
+ "runtime/internal/atomic"
+ "strings"
+ "unsafe"
+)
+
+// processCoverTestDir is called (via a linknamed reference) from
+// testmain code when "go test -cover" is in effect. It is not
+// intended to be used other than internally by the Go command's
+// generated code.
+func processCoverTestDir(dir string, cfile string, cm string, cpkg string) error {
+ return processCoverTestDirInternal(dir, cfile, cm, cpkg, os.Stdout)
+}
+
+// processCoverTestDirInternal is an io.Writer version of processCoverTestDir,
+// exposed for unit testing.
+func processCoverTestDirInternal(dir string, cfile string, cm string, cpkg string, w io.Writer) error {
+ cmode := coverage.ParseCounterMode(cm)
+ if cmode == coverage.CtrModeInvalid {
+ return fmt.Errorf("invalid counter mode %q", cm)
+ }
+
+ // Emit meta-data and counter data.
+ ml := getCovMetaList()
+ if len(ml) == 0 {
+ // This corresponds to the case where we have a package that
+ // contains test code but no functions (which is fine). In this
+ // case there is no need to emit anything.
+ } else {
+ if err := emitMetaDataToDirectory(dir, ml); err != nil {
+ return err
+ }
+ if err := emitCounterDataToDirectory(dir); err != nil {
+ return err
+ }
+ }
+
+ // Collect pods from test run. For the majority of cases we would
+ // expect to see a single pod here, but allow for multiple pods in
+ // case the test harness is doing extra work to collect data files
+ // from builds that it kicks off as part of the testing.
+ podlist, err := pods.CollectPods([]string{dir}, false)
+ if err != nil {
+ return fmt.Errorf("reading from %s: %v", dir, err)
+ }
+
+ // Open text output file if appropriate.
+ var tf *os.File
+ var tfClosed bool
+ if cfile != "" {
+ var err error
+ tf, err = os.Create(cfile)
+ if err != nil {
+ return fmt.Errorf("internal error: opening coverage data output file %q: %v", cfile, err)
+ }
+ defer func() {
+ if !tfClosed {
+ tfClosed = true
+ tf.Close()
+ }
+ }()
+ }
+
+ // Read/process the pods.
+ ts := &tstate{
+ cm: &cmerge.Merger{},
+ cf: cformat.NewFormatter(cmode),
+ cmode: cmode,
+ }
+ // Generate the expected hash string based on the final meta-data
+ // hash for this test, then look only for pods that refer to that
+ // hash (just in case there are multiple instrumented executables
+ // in play). See issue #57924 for more on this.
+ hashstring := fmt.Sprintf("%x", finalHash)
+ importpaths := make(map[string]struct{})
+ for _, p := range podlist {
+ if !strings.Contains(p.MetaFile, hashstring) {
+ continue
+ }
+ if err := ts.processPod(p, importpaths); err != nil {
+ return err
+ }
+ }
+
+ metafilespath := filepath.Join(dir, coverage.MetaFilesFileName)
+ if _, err := os.Stat(metafilespath); err == nil {
+ if err := ts.readAuxMetaFiles(metafilespath, importpaths); err != nil {
+ return err
+ }
+ }
+
+ // Emit percent.
+ if err := ts.cf.EmitPercent(w, cpkg, true, true); err != nil {
+ return err
+ }
+
+ // Emit text output.
+ if tf != nil {
+ if err := ts.cf.EmitTextual(tf); err != nil {
+ return err
+ }
+ tfClosed = true
+ if err := tf.Close(); err != nil {
+ return fmt.Errorf("closing %s: %v", cfile, err)
+ }
+ }
+
+ return nil
+}
+
+type tstate struct {
+ calloc.BatchCounterAlloc
+ cm *cmerge.Merger
+ cf *cformat.Formatter
+ cmode coverage.CounterMode
+}
+
+// processPod reads coverage counter data for a specific pod.
+func (ts *tstate) processPod(p pods.Pod, importpaths map[string]struct{}) error {
+ // Open meta-data file
+ f, err := os.Open(p.MetaFile)
+ if err != nil {
+ return fmt.Errorf("unable to open meta-data file %s: %v", p.MetaFile, err)
+ }
+ defer func() {
+ f.Close()
+ }()
+ var mfr *decodemeta.CoverageMetaFileReader
+ mfr, err = decodemeta.NewCoverageMetaFileReader(f, nil)
+ if err != nil {
+ return fmt.Errorf("error reading meta-data file %s: %v", p.MetaFile, err)
+ }
+ newmode := mfr.CounterMode()
+ if newmode != ts.cmode {
+ return fmt.Errorf("internal error: counter mode clash: %q from test harness, %q from data file %s", ts.cmode.String(), newmode.String(), p.MetaFile)
+ }
+ newgran := mfr.CounterGranularity()
+ if err := ts.cm.SetModeAndGranularity(p.MetaFile, cmode, newgran); err != nil {
+ return err
+ }
+
+ // A map to store counter data, indexed by pkgid/fnid tuple.
+ pmm := make(map[pkfunc][]uint32)
+
+ // Helper to read a single counter data file.
+ readcdf := func(cdf string) error {
+ cf, err := os.Open(cdf)
+ if err != nil {
+ return fmt.Errorf("opening counter data file %s: %s", cdf, err)
+ }
+ defer cf.Close()
+ var cdr *decodecounter.CounterDataReader
+ cdr, err = decodecounter.NewCounterDataReader(cdf, cf)
+ if err != nil {
+ return fmt.Errorf("reading counter data file %s: %s", cdf, err)
+ }
+ var data decodecounter.FuncPayload
+ for {
+ ok, err := cdr.NextFunc(&data)
+ if err != nil {
+ return fmt.Errorf("reading counter data file %s: %v", cdf, err)
+ }
+ if !ok {
+ break
+ }
+
+ // NB: sanity check on pkg and func IDs?
+ key := pkfunc{pk: data.PkgIdx, fcn: data.FuncIdx}
+ if prev, found := pmm[key]; found {
+ // Note: no overflow reporting here.
+ if err, _ := ts.cm.MergeCounters(data.Counters, prev); err != nil {
+ return fmt.Errorf("processing counter data file %s: %v", cdf, err)
+ }
+ }
+ c := ts.AllocateCounters(len(data.Counters))
+ copy(c, data.Counters)
+ pmm[key] = c
+ }
+ return nil
+ }
+
+ // Read counter data files.
+ for _, cdf := range p.CounterDataFiles {
+ if err := readcdf(cdf); err != nil {
+ return err
+ }
+ }
+
+ // Visit meta-data file.
+ np := uint32(mfr.NumPackages())
+ payload := []byte{}
+ for pkIdx := uint32(0); pkIdx < np; pkIdx++ {
+ var pd *decodemeta.CoverageMetaDataDecoder
+ pd, payload, err = mfr.GetPackageDecoder(pkIdx, payload)
+ if err != nil {
+ return fmt.Errorf("reading pkg %d from meta-file %s: %s", pkIdx, p.MetaFile, err)
+ }
+ ts.cf.SetPackage(pd.PackagePath())
+ importpaths[pd.PackagePath()] = struct{}{}
+ var fd coverage.FuncDesc
+ nf := pd.NumFuncs()
+ for fnIdx := uint32(0); fnIdx < nf; fnIdx++ {
+ if err := pd.ReadFunc(fnIdx, &fd); err != nil {
+ return fmt.Errorf("reading meta-data file %s: %v",
+ p.MetaFile, err)
+ }
+ key := pkfunc{pk: pkIdx, fcn: fnIdx}
+ counters, haveCounters := pmm[key]
+ for i := 0; i < len(fd.Units); i++ {
+ u := fd.Units[i]
+ // Skip units with non-zero parent (no way to represent
+ // these in the existing format).
+ if u.Parent != 0 {
+ continue
+ }
+ count := uint32(0)
+ if haveCounters {
+ count = counters[i]
+ }
+ ts.cf.AddUnit(fd.Srcfile, fd.Funcname, fd.Lit, u, count)
+ }
+ }
+ }
+ return nil
+}
+
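+// pkfunc identifies a function in a pod's counter data by its package
+// index and function index within the corresponding meta-data.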
+type pkfunc struct {
+ pk, fcn uint32
+}
+
+func (ts *tstate) readAuxMetaFiles(metafiles string, importpaths map[string]struct{}) error {
+	// Unmarshal the information on available aux metafiles into
+ // a MetaFileCollection struct.
+ var mfc coverage.MetaFileCollection
+ data, err := os.ReadFile(metafiles)
+ if err != nil {
+ return fmt.Errorf("error reading auxmetafiles file %q: %v", metafiles, err)
+ }
+ if err := json.Unmarshal(data, &mfc); err != nil {
+ return fmt.Errorf("error reading auxmetafiles file %q: %v", metafiles, err)
+ }
+
+ // Walk through each available aux meta-file. If we've already
+ // seen the package path in question during the walk of the
+ // "regular" meta-data file, then we can skip the package,
+ // otherwise construct a dummy pod with the single meta-data file
+ // (no counters) and invoke processPod on it.
+ for i := range mfc.ImportPaths {
+ p := mfc.ImportPaths[i]
+ if _, ok := importpaths[p]; ok {
+ continue
+ }
+ var pod pods.Pod
+ pod.MetaFile = mfc.MetaFileFragments[i]
+ if err := ts.processPod(pod, importpaths); err != nil {
+ return err
+ }
+ }
+ return nil
+}
+
+// snapshot returns a snapshot of coverage percentage at a moment in
+// time within a running test, so as to support the testing.Coverage()
+// function. This version doesn't examine coverage meta-data, so the
+// result it returns will be less accurate (more "slop"), since we
+// don't consult the meta-data to see how many statements are
+// associated with each counter.
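+//
+// For example (illustrative numbers only): if the registered counter
+// segments hold 1000 words in total and 250 of the per-function
+// counters are non-zero, snapshot reports 0.25, with no weighting for
+// the number of statements behind each counter.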
+func snapshot() float64 {
+ cl := getCovCounterList()
+ if len(cl) == 0 {
+ // no work to do here.
+ return 0.0
+ }
+
+ tot := uint64(0)
+ totExec := uint64(0)
+ for _, c := range cl {
+ sd := unsafe.Slice((*atomic.Uint32)(unsafe.Pointer(c.Counters)), c.Len)
+ tot += uint64(len(sd))
+ for i := 0; i < len(sd); i++ {
+ // Skip ahead until the next non-zero value.
+ if sd[i].Load() == 0 {
+ continue
+ }
+ // We found a function that was executed.
+ nCtrs := sd[i+coverage.NumCtrsOffset].Load()
+ cst := i + coverage.FirstCtrOffset
+
+ if cst+int(nCtrs) > len(sd) {
+ break
+ }
+ counters := sd[cst : cst+int(nCtrs)]
+ for i := range counters {
+ if counters[i].Load() != 0 {
+ totExec++
+ }
+ }
+ i += coverage.FirstCtrOffset + int(nCtrs) - 1
+ }
+ }
+ if tot == 0 {
+ return 0.0
+ }
+ return float64(totExec) / float64(tot)
+}
diff --git a/src/runtime/coverage/ts_test.go b/src/runtime/coverage/ts_test.go
new file mode 100644
index 0000000..b4c6e97
--- /dev/null
+++ b/src/runtime/coverage/ts_test.go
@@ -0,0 +1,207 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package coverage
+
+import (
+ "encoding/json"
+ "internal/coverage"
+ "internal/goexperiment"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "strings"
+ "testing"
+ _ "unsafe"
+)
+
+//go:linkname testing_testGoCoverDir testing.testGoCoverDir
+func testing_testGoCoverDir() string
+
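+// testGoCoverDir returns the coverage output directory that the
+// testing package has set up for this test run, falling back to a
+// fresh temporary directory when none is available.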
+func testGoCoverDir(t *testing.T) string {
+ tgcd := testing_testGoCoverDir()
+ if tgcd != "" {
+ return tgcd
+ }
+ return t.TempDir()
+}
+
+// TestTestSupport does a basic verification of the functionality in
+// runtime/coverage.processCoverTestDir (doing this here as opposed to
+// relying on other test paths will provide a better signal when
+// running "go test -cover" for this package).
+func TestTestSupport(t *testing.T) {
+ if !goexperiment.CoverageRedesign {
+ return
+ }
+ if testing.CoverMode() == "" {
+ return
+ }
+ tgcd := testGoCoverDir(t)
+ t.Logf("testing.testGoCoverDir() returns %s mode=%s\n",
+ tgcd, testing.CoverMode())
+
+ textfile := filepath.Join(t.TempDir(), "file.txt")
+ var sb strings.Builder
+ err := processCoverTestDirInternal(tgcd, textfile,
+ testing.CoverMode(), "", &sb)
+ if err != nil {
+		t.Fatalf("processCoverTestDirInternal failed: %v", err)
+ }
+
+ // Check for existence of text file.
+ if inf, err := os.Open(textfile); err != nil {
+ t.Fatalf("problems opening text file %s: %v", textfile, err)
+ } else {
+ inf.Close()
+ }
+
+ // Check for percent output with expected tokens.
+ strout := sb.String()
+ want := "of statements"
+ if !strings.Contains(strout, want) {
+ t.Logf("output from run: %s\n", strout)
+ t.Fatalf("percent output missing token: %q", want)
+ }
+}
+
+var funcInvoked bool
+
+//go:noinline
+func thisFunctionOnlyCalledFromSnapshotTest(n int) int {
+ if funcInvoked {
+ panic("bad")
+ }
+ funcInvoked = true
+
+ // Contents here not especially important, just so long as we
+ // have some statements.
+ t := 0
+ for i := 0; i < n; i++ {
+ for j := 0; j < i; j++ {
+ t += i ^ j
+ }
+ }
+ return t
+}
+
+// Tests runtime/coverage.snapshot() directly. Note that if
+// coverage is not enabled, the hook is designed to just return
+// zero.
+func TestCoverageSnapshot(t *testing.T) {
+ C1 := snapshot()
+ thisFunctionOnlyCalledFromSnapshotTest(15)
+ C2 := snapshot()
+ cond := "C1 > C2"
+ val := C1 > C2
+ if testing.CoverMode() != "" {
+ cond = "C1 >= C2"
+ val = C1 >= C2
+ }
+ t.Logf("%f %f\n", C1, C2)
+ if val {
+ t.Errorf("erroneous snapshots, %s = true C1=%f C2=%f",
+ cond, C1, C2)
+ }
+}
+
+const hellogo = `
+package main
+
+func main() {
+ println("hello")
+}
+`
+
+// Returns a pair F,T where F is a meta-data file generated from
+// "hello.go" above, and T is a token to look for that should be
+// present in the coverage report from F.
+func genAuxMeta(t *testing.T, dstdir string) (string, string) {
+ // Do a GOCOVERDIR=<tmp> go run hello.go
+ src := filepath.Join(dstdir, "hello.go")
+ if err := os.WriteFile(src, []byte(hellogo), 0777); err != nil {
+ t.Fatalf("write failed: %v", err)
+ }
+ args := []string{"run", "-covermode=" + testing.CoverMode(), src}
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ cmd.Env = updateGoCoverDir(os.Environ(), dstdir, true)
+ if b, err := cmd.CombinedOutput(); err != nil {
+ t.Fatalf("go run failed (%v): %s", err, b)
+ }
+
+ // Pick out the generated meta-data file.
+ files, err := os.ReadDir(dstdir)
+ if err != nil {
+ t.Fatalf("reading %s: %v", dstdir, err)
+ }
+ for _, f := range files {
+ if strings.HasPrefix(f.Name(), "covmeta") {
+ return filepath.Join(dstdir, f.Name()), "hello.go:"
+ }
+ }
+ t.Fatalf("could not locate generated meta-data file")
+ return "", ""
+}
+
+func TestAuxMetaDataFiles(t *testing.T) {
+ if !goexperiment.CoverageRedesign {
+ return
+ }
+ if testing.CoverMode() == "" {
+ return
+ }
+ testenv.MustHaveGoRun(t)
+ tgcd := testGoCoverDir(t)
+ t.Logf("testing.testGoCoverDir() returns %s mode=%s\n",
+ tgcd, testing.CoverMode())
+
+ td := t.TempDir()
+
+ // Manufacture a new, separate meta-data file not related to this
+ // test. Contents are not important, just so long as the
+ // packages/paths are different.
+ othermetadir := filepath.Join(td, "othermeta")
+ if err := os.Mkdir(othermetadir, 0777); err != nil {
+ t.Fatalf("mkdir failed: %v", err)
+ }
+ mfile, token := genAuxMeta(t, othermetadir)
+
+ // Write a metafiles file.
+ metafiles := filepath.Join(tgcd, coverage.MetaFilesFileName)
+ mfc := coverage.MetaFileCollection{
+ ImportPaths: []string{"command-line-arguments"},
+ MetaFileFragments: []string{mfile},
+ }
+ jdata, err := json.Marshal(mfc)
+ if err != nil {
+ t.Fatalf("marshal MetaFileCollection: %v", err)
+ }
+ if err := os.WriteFile(metafiles, jdata, 0666); err != nil {
+ t.Fatalf("write failed: %v", err)
+ }
+
+ // Kick off guts of test.
+ var sb strings.Builder
+ textfile := filepath.Join(td, "file2.txt")
+ err = processCoverTestDirInternal(tgcd, textfile,
+ testing.CoverMode(), "", &sb)
+ if err != nil {
+		t.Fatalf("processCoverTestDirInternal failed: %v", err)
+ }
+ if err = os.Remove(metafiles); err != nil {
+ t.Fatalf("removing metafiles file: %v", err)
+ }
+
+ // Look for the expected things in the coverage profile.
+	contents, err := os.ReadFile(textfile)
+	if err != nil {
+		t.Fatalf("problems reading text file %s: %v", textfile, err)
+	}
+	strc := string(contents)
+ if !strings.Contains(strc, token) {
+ t.Logf("content: %s\n", string(contents))
+ t.Fatalf("cov profile does not contain aux meta content %q", token)
+ }
+}
diff --git a/src/runtime/covercounter.go b/src/runtime/covercounter.go
new file mode 100644
index 0000000..72842bd
--- /dev/null
+++ b/src/runtime/covercounter.go
@@ -0,0 +1,26 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/coverage/rtcov"
+ "unsafe"
+)
+
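+// runtime_coverage_getCovCounterList returns the coverage counter
+// blobs registered by each loaded module, for use by the
+// runtime/coverage package (accessed via the linkname below).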
+//go:linkname runtime_coverage_getCovCounterList runtime/coverage.getCovCounterList
+func runtime_coverage_getCovCounterList() []rtcov.CovCounterBlob {
+ res := []rtcov.CovCounterBlob{}
+ u32sz := unsafe.Sizeof(uint32(0))
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.covctrs == datap.ecovctrs {
+ continue
+ }
+ res = append(res, rtcov.CovCounterBlob{
+ Counters: (*uint32)(unsafe.Pointer(datap.covctrs)),
+ Len: uint64((datap.ecovctrs - datap.covctrs) / u32sz),
+ })
+ }
+ return res
+}
diff --git a/src/runtime/covermeta.go b/src/runtime/covermeta.go
new file mode 100644
index 0000000..54ef42a
--- /dev/null
+++ b/src/runtime/covermeta.go
@@ -0,0 +1,72 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/coverage/rtcov"
+ "unsafe"
+)
+
+// covMeta is the top-level container for bits of state related to
+// code coverage meta-data in the runtime.
+var covMeta struct {
+ // metaList contains the list of currently registered meta-data
+ // blobs for the running program.
+ metaList []rtcov.CovMetaBlob
+
+ // pkgMap records mappings from hard-coded package IDs to
+ // slots in the covMetaList above.
+ pkgMap map[int]int
+
+ // Set to true if we discover a package mapping glitch.
+ hardCodedListNeedsUpdating bool
+}
+
+// addCovMeta is invoked during package "init" functions by the
+// compiler when compiling for coverage instrumentation; here 'p' is a
+// meta-data blob of length 'dlen' for the package in question, 'hash'
+// is a compiler-computed md5.sum for the blob, 'pkpath' is the
+// package path, 'pkid' is the hard-coded ID that the compiler is
+// using for the package (or -1 if the compiler doesn't think a
+// hard-coded ID is needed), and 'cmode'/'cgran' are the coverage
+// counter mode and granularity requested by the user. Return value is
+// the ID for the package for use by the package code itself.
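+//
+// As a rough sketch (identifiers below are illustrative, not the
+// actual generated names), the instrumented package's init code makes
+// a call of the form:
+//
+//	pkgID := addCovMeta(unsafe.Pointer(&metaBlob[0]), uint32(len(metaBlob)),
+//		metaHash, "example.com/mypkg", -1, cmode, cgran)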
+func addCovMeta(p unsafe.Pointer, dlen uint32, hash [16]byte, pkpath string, pkid int, cmode uint8, cgran uint8) uint32 {
+ slot := len(covMeta.metaList)
+ covMeta.metaList = append(covMeta.metaList,
+ rtcov.CovMetaBlob{
+ P: (*byte)(p),
+ Len: dlen,
+ Hash: hash,
+ PkgPath: pkpath,
+ PkgID: pkid,
+ CounterMode: cmode,
+ CounterGranularity: cgran,
+ })
+ if pkid != -1 {
+ if covMeta.pkgMap == nil {
+ covMeta.pkgMap = make(map[int]int)
+ }
+ if _, ok := covMeta.pkgMap[pkid]; ok {
+ throw("runtime.addCovMeta: coverage package map collision")
+ }
+ // Record the real slot (position on meta-list) for this
+ // package; we'll use the map to fix things up later on.
+ covMeta.pkgMap[pkid] = slot
+ }
+
+ // ID zero is reserved as invalid.
+ return uint32(slot + 1)
+}
+
+//go:linkname runtime_coverage_getCovMetaList runtime/coverage.getCovMetaList
+func runtime_coverage_getCovMetaList() []rtcov.CovMetaBlob {
+ return covMeta.metaList
+}
+
+//go:linkname runtime_coverage_getCovPkgMap runtime/coverage.getCovPkgMap
+func runtime_coverage_getCovPkgMap() map[int]int {
+ return covMeta.pkgMap
+}
diff --git a/src/runtime/cpuflags.go b/src/runtime/cpuflags.go
new file mode 100644
index 0000000..bbe93c5
--- /dev/null
+++ b/src/runtime/cpuflags.go
@@ -0,0 +1,34 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// Offsets into internal/cpu records for use in assembly.
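+// Assembly files that #include "go_asm.h" refer to these with a
+// const_ prefix (for example const_offsetX86HasAVX2).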
+const (
+ offsetX86HasAVX = unsafe.Offsetof(cpu.X86.HasAVX)
+ offsetX86HasAVX2 = unsafe.Offsetof(cpu.X86.HasAVX2)
+ offsetX86HasERMS = unsafe.Offsetof(cpu.X86.HasERMS)
+ offsetX86HasRDTSCP = unsafe.Offsetof(cpu.X86.HasRDTSCP)
+
+ offsetARMHasIDIVA = unsafe.Offsetof(cpu.ARM.HasIDIVA)
+
+ offsetMIPS64XHasMSA = unsafe.Offsetof(cpu.MIPS64X.HasMSA)
+)
+
+var (
+ // Set in runtime.cpuinit.
+ // TODO: deprecate these; use internal/cpu directly.
+ x86HasPOPCNT bool
+ x86HasSSE41 bool
+ x86HasFMA bool
+
+ armHasVFPv4 bool
+
+ arm64HasATOMICS bool
+)
diff --git a/src/runtime/cpuflags_amd64.go b/src/runtime/cpuflags_amd64.go
new file mode 100644
index 0000000..8cca4bc
--- /dev/null
+++ b/src/runtime/cpuflags_amd64.go
@@ -0,0 +1,24 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+)
+
+var useAVXmemmove bool
+
+func init() {
+ // Let's remove stepping and reserved fields
+ processor := processorVersionInfo & 0x0FFF3FF0
+
+ isIntelBridgeFamily := isIntel &&
+ processor == 0x206A0 ||
+ processor == 0x206D0 ||
+ processor == 0x306A0 ||
+ processor == 0x306E0
+
+ useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
+}
diff --git a/src/runtime/cpuflags_arm64.go b/src/runtime/cpuflags_arm64.go
new file mode 100644
index 0000000..2ed1811
--- /dev/null
+++ b/src/runtime/cpuflags_arm64.go
@@ -0,0 +1,17 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+)
+
+var arm64UseAlignedLoads bool
+
+func init() {
+ if cpu.ARM64.IsNeoverse {
+ arm64UseAlignedLoads = true
+ }
+}
diff --git a/src/runtime/cpuprof.go b/src/runtime/cpuprof.go
new file mode 100644
index 0000000..0d7eeac
--- /dev/null
+++ b/src/runtime/cpuprof.go
@@ -0,0 +1,241 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// CPU profiling.
+//
+// The signal handler for the profiling clock tick adds a new stack trace
+// to a log of recent traces. The log is read by a user goroutine that
+// turns it into formatted profile data. If the reader does not keep up
+// with the log, those writes will be recorded as a count of lost records.
+// The actual profile buffer is in profbuf.go.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ maxCPUProfStack = 64
+
+ // profBufWordCount is the size of the CPU profile buffer's storage for the
+ // header and stack of each sample, measured in 64-bit words. Every sample
+ // has a required header of two words. With a small additional header (a
+ // word or two) and stacks at the profiler's maximum length of 64 frames,
+ // that capacity can support 1900 samples or 19 thread-seconds at a 100 Hz
+ // sample rate, at a cost of 1 MiB.
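+	// (Arithmetic: 1<<17 words * 8 bytes = 1 MiB of storage; at roughly
+	// 2+2+64 = 68 words per maximum-size sample that works out to about
+	// 1900 samples, i.e. about 19 seconds of CPU time at 100 Hz.)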
+ profBufWordCount = 1 << 17
+ // profBufTagCount is the size of the CPU profile buffer's storage for the
+ // goroutine tags associated with each sample. A capacity of 1<<14 means
+ // room for 16k samples, or 160 thread-seconds at a 100 Hz sample rate.
+ profBufTagCount = 1 << 14
+)
+
+type cpuProfile struct {
+ lock mutex
+ on bool // profiling is on
+ log *profBuf // profile events written here
+
+ // extra holds extra stacks accumulated in addNonGo
+ // corresponding to profiling signals arriving on
+ // non-Go-created threads. Those stacks are written
+ // to log the next time a normal Go thread gets the
+ // signal handler.
+ // Assuming the stacks are 2 words each (we don't get
+ // a full traceback from those threads), plus one word
+ // size for framing, 100 Hz profiling would generate
+ // 300 words per second.
+ // Hopefully a normal Go thread will get the profiling
+ // signal at least once every few seconds.
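+	// At that rate the 1000-entry buffer absorbs roughly three seconds
+	// of backlog before addNonGo starts dropping stacks (see lostExtra).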
+ extra [1000]uintptr
+ numExtra int
+ lostExtra uint64 // count of frames lost because extra is full
+ lostAtomic uint64 // count of frames lost because of being in atomic64 on mips/arm; updated racily
+}
+
+var cpuprof cpuProfile
+
+// SetCPUProfileRate sets the CPU profiling rate to hz samples per second.
+// If hz <= 0, SetCPUProfileRate turns off profiling.
+// If the profiler is on, the rate cannot be changed without first turning it off.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.cpuprofile flag instead of calling
+// SetCPUProfileRate directly.
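+//
+// For example, the usual indirect path (a sketch, error handling
+// elided) is:
+//
+//	f, _ := os.Create("cpu.prof")
+//	pprof.StartCPUProfile(f) // calls SetCPUProfileRate internally
+//	defer pprof.StopCPUProfile()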
+func SetCPUProfileRate(hz int) {
+ // Clamp hz to something reasonable.
+ if hz < 0 {
+ hz = 0
+ }
+ if hz > 1000000 {
+ hz = 1000000
+ }
+
+ lock(&cpuprof.lock)
+ if hz > 0 {
+ if cpuprof.on || cpuprof.log != nil {
+ print("runtime: cannot set cpu profile rate until previous profile has finished.\n")
+ unlock(&cpuprof.lock)
+ return
+ }
+
+ cpuprof.on = true
+ cpuprof.log = newProfBuf(1, profBufWordCount, profBufTagCount)
+ hdr := [1]uint64{uint64(hz)}
+ cpuprof.log.write(nil, nanotime(), hdr[:], nil)
+ setcpuprofilerate(int32(hz))
+ } else if cpuprof.on {
+ setcpuprofilerate(0)
+ cpuprof.on = false
+ cpuprof.addExtra()
+ cpuprof.log.close()
+ }
+ unlock(&cpuprof.lock)
+}
+
+// add adds the stack trace to the profile.
+// It is called from signal handlers and other limited environments
+// and cannot allocate memory or acquire locks that might be
+// held at the time of the signal, nor can it use substantial amounts
+// of stack.
+//
+//go:nowritebarrierrec
+func (p *cpuProfile) add(tagPtr *unsafe.Pointer, stk []uintptr) {
+ // Simple cas-lock to coordinate with setcpuprofilerate.
+ for !prof.signalLock.CompareAndSwap(0, 1) {
+ // TODO: Is it safe to osyield here? https://go.dev/issue/52672
+ osyield()
+ }
+
+ if prof.hz.Load() != 0 { // implies cpuprof.log != nil
+ if p.numExtra > 0 || p.lostExtra > 0 || p.lostAtomic > 0 {
+ p.addExtra()
+ }
+ hdr := [1]uint64{1}
+ // Note: write "knows" that the argument is &gp.labels,
+ // because otherwise its write barrier behavior may not
+ // be correct. See the long comment there before
+ // changing the argument here.
+ cpuprof.log.write(tagPtr, nanotime(), hdr[:], stk)
+ }
+
+ prof.signalLock.Store(0)
+}
+
+// addNonGo adds the non-Go stack trace to the profile.
+// It is called from a non-Go thread, so we cannot use much stack at all,
+// nor do anything that needs a g or an m.
+// In particular, we can't call cpuprof.log.write.
+// Instead, we copy the stack into cpuprof.extra,
+// which will be drained the next time a Go thread
+// gets the signal handling event.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func (p *cpuProfile) addNonGo(stk []uintptr) {
+ // Simple cas-lock to coordinate with SetCPUProfileRate.
+ // (Other calls to add or addNonGo should be blocked out
+ // by the fact that only one SIGPROF can be handled by the
+ // process at a time. If not, this lock will serialize those too.
+ // The use of timer_create(2) on Linux to request process-targeted
+ // signals may have changed this.)
+ for !prof.signalLock.CompareAndSwap(0, 1) {
+ // TODO: Is it safe to osyield here? https://go.dev/issue/52672
+ osyield()
+ }
+
+ if cpuprof.numExtra+1+len(stk) < len(cpuprof.extra) {
+ i := cpuprof.numExtra
+ cpuprof.extra[i] = uintptr(1 + len(stk))
+ copy(cpuprof.extra[i+1:], stk)
+ cpuprof.numExtra += 1 + len(stk)
+ } else {
+ cpuprof.lostExtra++
+ }
+
+ prof.signalLock.Store(0)
+}
+
+// addExtra adds the "extra" profiling events,
+// queued by addNonGo, to the profile log.
+// addExtra is called either from a signal handler on a Go thread
+// or from an ordinary goroutine; either way it can use stack
+// and has a g. The world may be stopped, though.
+func (p *cpuProfile) addExtra() {
+ // Copy accumulated non-Go profile events.
+ hdr := [1]uint64{1}
+ for i := 0; i < p.numExtra; {
+ p.log.write(nil, 0, hdr[:], p.extra[i+1:i+int(p.extra[i])])
+ i += int(p.extra[i])
+ }
+ p.numExtra = 0
+
+ // Report any lost events.
+ if p.lostExtra > 0 {
+ hdr := [1]uint64{p.lostExtra}
+ lostStk := [2]uintptr{
+ abi.FuncPCABIInternal(_LostExternalCode) + sys.PCQuantum,
+ abi.FuncPCABIInternal(_ExternalCode) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostExtra = 0
+ }
+
+ if p.lostAtomic > 0 {
+ hdr := [1]uint64{p.lostAtomic}
+ lostStk := [2]uintptr{
+ abi.FuncPCABIInternal(_LostSIGPROFDuringAtomic64) + sys.PCQuantum,
+ abi.FuncPCABIInternal(_System) + sys.PCQuantum,
+ }
+ p.log.write(nil, 0, hdr[:], lostStk[:])
+ p.lostAtomic = 0
+ }
+
+}
+
+// CPUProfile panics.
+// It formerly provided raw access to chunks of
+// a pprof-format profile generated by the runtime.
+// The details of generating that format have changed,
+// so this functionality has been removed.
+//
+// Deprecated: Use the runtime/pprof package,
+// or the handlers in the net/http/pprof package,
+// or the testing package's -test.cpuprofile flag instead.
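+//
+// For example, "go test -cpuprofile=cpu.out" collects an equivalent
+// profile, which can then be inspected with "go tool pprof cpu.out".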
+func CPUProfile() []byte {
+ panic("CPUProfile no longer available")
+}
+
+//go:linkname runtime_pprof_runtime_cyclesPerSecond runtime/pprof.runtime_cyclesPerSecond
+func runtime_pprof_runtime_cyclesPerSecond() int64 {
+ return tickspersecond()
+}
+
+// readProfile, provided to runtime/pprof, returns the next chunk of
+// binary CPU profiling stack trace data, blocking until data is available.
+// If profiling is turned off and all the profile data accumulated while it was
+// on has been returned, readProfile returns eof=true.
+// The caller must save the returned data and tags before calling readProfile again.
+// The returned data contains a whole number of records, and tags contains
+// exactly one entry per record.
+//
+//go:linkname runtime_pprof_readProfile runtime/pprof.readProfile
+func runtime_pprof_readProfile() ([]uint64, []unsafe.Pointer, bool) {
+ lock(&cpuprof.lock)
+ log := cpuprof.log
+ unlock(&cpuprof.lock)
+ readMode := profBufBlocking
+ if GOOS == "darwin" || GOOS == "ios" {
+ readMode = profBufNonBlocking // For #61768; on Darwin notes are not async-signal-safe. See sigNoteSetup in os_darwin.go.
+ }
+ data, tags, eof := log.read(readMode)
+ if len(data) == 0 && eof {
+ lock(&cpuprof.lock)
+ cpuprof.log = nil
+ unlock(&cpuprof.lock)
+ }
+ return data, tags, eof
+}
diff --git a/src/runtime/cputicks.go b/src/runtime/cputicks.go
new file mode 100644
index 0000000..2cf3240
--- /dev/null
+++ b/src/runtime/cputicks.go
@@ -0,0 +1,11 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !arm && !arm64 && !mips64 && !mips64le && !mips && !mipsle && !wasm
+
+package runtime
+
+// careful: cputicks is not guaranteed to be monotonic! In particular, we have
+// noticed drift between cpus on certain os/arch combinations. See issue 8976.
+func cputicks() int64
diff --git a/src/runtime/crash_cgo_test.go b/src/runtime/crash_cgo_test.go
new file mode 100644
index 0000000..424aedb
--- /dev/null
+++ b/src/runtime/crash_cgo_test.go
@@ -0,0 +1,872 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build cgo
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/goos"
+ "internal/platform"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+ "time"
+)
+
+func TestCgoCrashHandler(t *testing.T) {
+ t.Parallel()
+ testCrashHandler(t, true)
+}
+
+func TestCgoSignalDeadlock(t *testing.T) {
+ // Don't call t.Parallel, since too much work going on at the
+ // same time can cause the testprogcgo code to overrun its
+ // timeouts (issue #18598).
+
+ if testing.Short() && runtime.GOOS == "windows" {
+ t.Skip("Skipping in short mode") // takes up to 64 seconds
+ }
+ got := runTestProg(t, "testprogcgo", "CgoSignalDeadlock")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoTraceback(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "CgoTraceback")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoCallbackGC(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ if testing.Short() {
+ switch {
+ case runtime.GOOS == "dragonfly":
+ t.Skip("see golang.org/issue/11990")
+ case runtime.GOOS == "linux" && runtime.GOARCH == "arm":
+ t.Skip("too slow for arm builders")
+ case runtime.GOOS == "linux" && (runtime.GOARCH == "mips64" || runtime.GOARCH == "mips64le"):
+ t.Skip("too slow for mips64x builders")
+ }
+ }
+ if testenv.Builder() == "darwin-amd64-10_14" {
+ // TODO(#23011): When the 10.14 builders are gone, remove this skip.
+ t.Skip("skipping due to platform bug on macOS 10.14; see https://golang.org/issue/43926")
+ }
+ got := runTestProg(t, "testprogcgo", "CgoCallbackGC")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoExternalThreadPanic(t *testing.T) {
+ t.Parallel()
+ if runtime.GOOS == "plan9" {
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "CgoExternalThreadPanic")
+ want := "panic: BOOM"
+ if !strings.Contains(got, want) {
+ t.Fatalf("want failure containing %q. output:\n%s\n", want, got)
+ }
+}
+
+func TestCgoExternalThreadSIGPROF(t *testing.T) {
+ t.Parallel()
+ // issue 9456.
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+
+ got := runTestProg(t, "testprogcgo", "CgoExternalThreadSIGPROF", "GO_START_SIGPROF_THREAD=1")
+ if want := "OK\n"; got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoExternalThreadSignal(t *testing.T) {
+ t.Parallel()
+ // issue 10139
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+
+ got := runTestProg(t, "testprogcgo", "CgoExternalThreadSignal")
+ if want := "OK\n"; got != want {
+ if runtime.GOOS == "ios" && strings.Contains(got, "C signal did not crash as expected") {
+ testenv.SkipFlaky(t, 59913)
+ }
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestCgoDLLImports(t *testing.T) {
+ // test issue 9356
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ got := runTestProg(t, "testprogcgo", "CgoDLLImportsMain")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %v", want, got)
+ }
+}
+
+func TestCgoExecSignalMask(t *testing.T) {
+ t.Parallel()
+ // Test issue 13164.
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping signal mask test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "CgoExecSignalMask", "GOTRACEBACK=system")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
+
+func TestEnsureDropM(t *testing.T) {
+ t.Parallel()
+ // Test for issue 13881.
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping dropm test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "EnsureDropM")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
+
+// Test for issue 14387.
+// Test that the program that doesn't need any cgo pointer checking
+// takes about the same amount of time with it as without it.
+func TestCgoCheckBytes(t *testing.T) {
+ t.Parallel()
+ // Make sure we don't count the build time as part of the run time.
+ testenv.MustHaveGoBuild(t)
+ exe, err := buildTestProg(t, "testprogcgo")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ // Try it 10 times to avoid flakiness.
+ const tries = 10
+ var tot1, tot2 time.Duration
+ for i := 0; i < tries; i++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, "CgoCheckBytes"))
+ cmd.Env = append(cmd.Env, "GODEBUG=cgocheck=0", fmt.Sprintf("GO_CGOCHECKBYTES_TRY=%d", i))
+
+ start := time.Now()
+ cmd.Run()
+ d1 := time.Since(start)
+
+ cmd = testenv.CleanCmdEnv(exec.Command(exe, "CgoCheckBytes"))
+ cmd.Env = append(cmd.Env, fmt.Sprintf("GO_CGOCHECKBYTES_TRY=%d", i))
+
+ start = time.Now()
+ cmd.Run()
+ d2 := time.Since(start)
+
+ if d1*20 > d2 {
+ // The slow version (d2) was less than 20 times
+ // slower than the fast version (d1), so OK.
+ return
+ }
+
+ tot1 += d1
+ tot2 += d2
+ }
+
+ t.Errorf("cgo check too slow: got %v, expected at most %v", tot2/tries, (tot1/tries)*20)
+}
+
+func TestCgoPanicDeadlock(t *testing.T) {
+ t.Parallel()
+ // test issue 14432
+ got := runTestProg(t, "testprogcgo", "CgoPanicDeadlock")
+ want := "panic: cgo error\n\n"
+ if !strings.HasPrefix(got, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, got)
+ }
+}
+
+func TestCgoCCodeSIGPROF(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "CgoCCodeSIGPROF")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoPprofCallback(t *testing.T) {
+ if testing.Short() {
+ t.Skip("skipping in short mode") // takes a full second
+ }
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping cgo pprof callback test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "CgoPprofCallback")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoCrashTraceback(t *testing.T) {
+ t.Parallel()
+ switch platform := runtime.GOOS + "/" + runtime.GOARCH; platform {
+ case "darwin/amd64":
+ case "linux/amd64":
+ case "linux/arm64":
+ case "linux/ppc64le":
+ default:
+ t.Skipf("not yet supported on %s", platform)
+ }
+ got := runTestProg(t, "testprogcgo", "CrashTraceback")
+ for i := 1; i <= 3; i++ {
+ if !strings.Contains(got, fmt.Sprintf("cgo symbolizer:%d", i)) {
+ t.Errorf("missing cgo symbolizer:%d", i)
+ }
+ }
+}
+
+func TestCgoCrashTracebackGo(t *testing.T) {
+ t.Parallel()
+ switch platform := runtime.GOOS + "/" + runtime.GOARCH; platform {
+ case "darwin/amd64":
+ case "linux/amd64":
+ case "linux/arm64":
+ case "linux/ppc64le":
+ default:
+ t.Skipf("not yet supported on %s", platform)
+ }
+ got := runTestProg(t, "testprogcgo", "CrashTracebackGo")
+ for i := 1; i <= 3; i++ {
+ want := fmt.Sprintf("main.h%d", i)
+ if !strings.Contains(got, want) {
+ t.Errorf("missing %s", want)
+ }
+ }
+}
+
+func TestCgoTracebackContext(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "TracebackContext")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoTracebackContextPreemption(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "TracebackContextPreemption")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func testCgoPprof(t *testing.T, buildArg, runArg, top, bottom string) {
+ t.Parallel()
+ if runtime.GOOS != "linux" || (runtime.GOARCH != "amd64" && runtime.GOARCH != "ppc64le" && runtime.GOARCH != "arm64") {
+ t.Skipf("not yet supported on %s/%s", runtime.GOOS, runtime.GOARCH)
+ }
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprogcgo", buildArg)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, runArg))
+ got, err := cmd.CombinedOutput()
+ if err != nil {
+ if testenv.Builder() == "linux-amd64-alpine" {
+ // See Issue 18243 and Issue 19938.
+ t.Skipf("Skipping failing test on Alpine (golang.org/issue/18243). Ignoring error: %v", err)
+ }
+ t.Fatalf("%s\n\n%v", got, err)
+ }
+ fn := strings.TrimSpace(string(got))
+ defer os.Remove(fn)
+
+ for try := 0; try < 2; try++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-tagignore=ignore", "-traces"))
+ // Check that pprof works both with and without explicit executable on command line.
+ if try == 0 {
+ cmd.Args = append(cmd.Args, exe, fn)
+ } else {
+ cmd.Args = append(cmd.Args, fn)
+ }
+
+ found := false
+ for i, e := range cmd.Env {
+ if strings.HasPrefix(e, "PPROF_TMPDIR=") {
+ cmd.Env[i] = "PPROF_TMPDIR=" + os.TempDir()
+ found = true
+ break
+ }
+ }
+ if !found {
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ }
+
+ out, err := cmd.CombinedOutput()
+ t.Logf("%s:\n%s", cmd.Args, out)
+ if err != nil {
+ t.Error(err)
+ continue
+ }
+
+ trace := findTrace(string(out), top)
+ if len(trace) == 0 {
+ t.Errorf("%s traceback missing.", top)
+ continue
+ }
+ if trace[len(trace)-1] != bottom {
+ t.Errorf("invalid traceback origin: got=%v; want=[%s ... %s]", trace, top, bottom)
+ }
+ }
+}
+
+func TestCgoPprof(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprof", "cpuHog", "runtime.main")
+}
+
+func TestCgoPprofPIE(t *testing.T) {
+ testCgoPprof(t, "-buildmode=pie", "CgoPprof", "cpuHog", "runtime.main")
+}
+
+func TestCgoPprofThread(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprofThread", "cpuHogThread", "cpuHogThread2")
+}
+
+func TestCgoPprofThreadNoTraceback(t *testing.T) {
+ testCgoPprof(t, "", "CgoPprofThreadNoTraceback", "cpuHogThread", "runtime._ExternalCode")
+}
+
+func TestRaceProf(t *testing.T) {
+ if !platform.RaceDetectorSupported(runtime.GOOS, runtime.GOARCH) {
+ t.Skipf("skipping on %s/%s because race detector not supported", runtime.GOOS, runtime.GOARCH)
+ }
+ if runtime.GOOS == "windows" {
+ t.Skipf("skipping: test requires pthread support")
+ // TODO: Can this test be rewritten to use the C11 thread API instead?
+ }
+
+ testenv.MustHaveGoRun(t)
+
+ // This test requires building various packages with -race, so
+ // it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-race")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "CgoRaceprof")).CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ want := "OK\n"
+ if string(got) != want {
+ t.Errorf("expected %q got %s", want, got)
+ }
+}
+
+func TestRaceSignal(t *testing.T) {
+ if !platform.RaceDetectorSupported(runtime.GOOS, runtime.GOARCH) {
+ t.Skipf("skipping on %s/%s because race detector not supported", runtime.GOOS, runtime.GOARCH)
+ }
+ if runtime.GOOS == "windows" {
+ t.Skipf("skipping: test requires pthread support")
+ // TODO: Can this test be rewritten to use the C11 thread API instead?
+ }
+ if runtime.GOOS == "darwin" || runtime.GOOS == "ios" {
+ testenv.SkipFlaky(t, 60316)
+ }
+
+ t.Parallel()
+
+ testenv.MustHaveGoRun(t)
+
+ // This test requires building various packages with -race, so
+ // it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ exe, err := buildTestProg(t, "testprogcgo", "-race")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(testenv.Command(t, exe, "CgoRaceSignal")).CombinedOutput()
+ if err != nil {
+ t.Logf("%s\n", got)
+ t.Fatal(err)
+ }
+ want := "OK\n"
+ if string(got) != want {
+ t.Errorf("expected %q got %s", want, got)
+ }
+}
+
+func TestCgoNumGoroutine(t *testing.T) {
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping numgoroutine test on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "NumGoroutine")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCatchPanic(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ case "darwin":
+ if runtime.GOARCH == "amd64" {
+ t.Skipf("crash() on darwin/amd64 doesn't raise SIGABRT")
+ }
+ }
+
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprogcgo")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ for _, early := range []bool{true, false} {
+ cmd := testenv.CleanCmdEnv(exec.Command(exe, "CgoCatchPanic"))
+ // Make sure a panic results in a crash.
+ cmd.Env = append(cmd.Env, "GOTRACEBACK=crash")
+ if early {
+ // Tell testprogcgo to install an early signal handler for SIGABRT
+ cmd.Env = append(cmd.Env, "CGOCATCHPANIC_EARLY_HANDLER=1")
+ }
+ if out, err := cmd.CombinedOutput(); err != nil {
+ t.Errorf("testprogcgo CgoCatchPanic failed: %v\n%s", err, out)
+ }
+ }
+}
+
+func TestCgoLockOSThreadExit(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ testLockOSThreadExit(t, "testprogcgo")
+}
+
+func TestWindowsStackMemoryCgo(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ testenv.SkipFlaky(t, 22575)
+ o := runTestProg(t, "testprogcgo", "StackMemory")
+ stackUsage, err := strconv.Atoi(o)
+ if err != nil {
+ t.Fatalf("Failed to read stack usage: %v", err)
+ }
+ if expected, got := 100<<10, stackUsage; got > expected {
+ t.Fatalf("expected < %d bytes of memory per thread, got %d", expected, got)
+ }
+}
+
+func TestSigStackSwapping(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no sigaltstack on %s", runtime.GOOS)
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "SigStack")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
+func TestCgoTracebackSigpanic(t *testing.T) {
+ // Test unwinding over a sigpanic in C code without a C
+ // symbolizer. See issue #23576.
+ if runtime.GOOS == "windows" {
+ // On Windows if we get an exception in C code, we let
+ // the Windows exception handler unwind it, rather
+ // than injecting a sigpanic.
+ t.Skip("no sigpanic in C on windows")
+ }
+ if runtime.GOOS == "ios" {
+ testenv.SkipFlaky(t, 59912)
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "TracebackSigpanic")
+ t.Log(got)
+ // We should see the function that calls the C function.
+ want := "main.TracebackSigpanic"
+ if !strings.Contains(got, want) {
+ if runtime.GOOS == "android" && (runtime.GOARCH == "arm" || runtime.GOARCH == "arm64") {
+ testenv.SkipFlaky(t, 58794)
+ }
+ t.Errorf("did not see %q in output", want)
+ }
+ // We shouldn't inject a sigpanic call. (see issue 57698)
+ nowant := "runtime.sigpanic"
+ if strings.Contains(got, nowant) {
+ t.Errorf("unexpectedly saw %q in output", nowant)
+ }
+ // No runtime errors like "runtime: unexpected return pc".
+ nowant = "runtime: "
+ if strings.Contains(got, nowant) {
+ t.Errorf("unexpectedly saw %q in output", nowant)
+ }
+}
+
+func TestCgoPanicCallback(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "PanicCallback")
+ t.Log(got)
+ want := "panic: runtime error: invalid memory address or nil pointer dereference"
+ if !strings.Contains(got, want) {
+ t.Errorf("did not see %q in output", want)
+ }
+ want = "panic_callback"
+ if !strings.Contains(got, want) {
+ t.Errorf("did not see %q in output", want)
+ }
+ want = "PanicCallback"
+ if !strings.Contains(got, want) {
+ t.Errorf("did not see %q in output", want)
+ }
+ // No runtime errors like "runtime: unexpected return pc".
+ nowant := "runtime: "
+ if strings.Contains(got, nowant) {
+		t.Errorf("unexpectedly saw %q in output", nowant)
+ }
+}
+
+// Test that C code called via cgo can use large Windows thread stacks
+// and call back in to Go without crashing. See issue #20975.
+//
+// See also TestBigStackCallbackSyscall.
+func TestBigStackCallbackCgo(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows specific test")
+ }
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "BigStack")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q got %v", want, got)
+ }
+}
+
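+// nextTrace returns the function names from the next stack trace in a
+// "go tool pprof -traces" dump, along with the remaining unconsumed
+// lines. Traces are separated by lines starting with "---"; within a
+// trace, the last whitespace-separated field of each line is taken to
+// be the function name (an illustrative line: "     10ms   main.cpuHog").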
+func nextTrace(lines []string) ([]string, []string) {
+ var trace []string
+ for n, line := range lines {
+ if strings.HasPrefix(line, "---") {
+ return trace, lines[n+1:]
+ }
+ fields := strings.Fields(strings.TrimSpace(line))
+ if len(fields) == 0 {
+ continue
+ }
+ // Last field contains the function name.
+ trace = append(trace, fields[len(fields)-1])
+ }
+ return nil, nil
+}
+
+func findTrace(text, top string) []string {
+ lines := strings.Split(text, "\n")
+ _, lines = nextTrace(lines) // Skip the header.
+ for len(lines) > 0 {
+ var t []string
+ t, lines = nextTrace(lines)
+ if len(t) == 0 {
+ continue
+ }
+ if t[0] == top {
+ return t
+ }
+ }
+ return nil
+}
+
+func TestSegv(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+
+ for _, test := range []string{"Segv", "SegvInCgo", "TgkillSegv", "TgkillSegvInCgo"} {
+ test := test
+
+ // The tgkill variants only run on Linux.
+ if runtime.GOOS != "linux" && strings.HasPrefix(test, "Tgkill") {
+ continue
+ }
+
+ t.Run(test, func(t *testing.T) {
+ if test == "SegvInCgo" && runtime.GOOS == "ios" {
+ testenv.SkipFlaky(t, 59947) // Don't even try, in case it times out.
+ }
+
+ t.Parallel()
+ prog := "testprog"
+ if strings.HasSuffix(test, "InCgo") {
+ prog = "testprogcgo"
+ }
+ got := runTestProg(t, prog, test)
+ t.Log(got)
+ want := "SIGSEGV"
+ if !strings.Contains(got, want) {
+ if runtime.GOOS == "darwin" && runtime.GOARCH == "amd64" && strings.Contains(got, "fatal: morestack on g0") {
+ testenv.SkipFlaky(t, 39457)
+ }
+ t.Errorf("did not see %q in output", want)
+ }
+
+ // No runtime errors like "runtime: unknown pc".
+ switch runtime.GOOS {
+ case "darwin", "ios", "illumos", "solaris":
+ // Runtime sometimes throws when generating the traceback.
+ testenv.SkipFlaky(t, 49182)
+ case "linux":
+ if runtime.GOARCH == "386" {
+ // Runtime throws when generating a traceback from
+ // a VDSO call via asmcgocall.
+ testenv.SkipFlaky(t, 50504)
+ }
+ }
+ if test == "SegvInCgo" && strings.Contains(got, "unknown pc") {
+ testenv.SkipFlaky(t, 50979)
+ }
+
+ for _, nowant := range []string{"fatal error: ", "runtime: "} {
+ if strings.Contains(got, nowant) {
+ if runtime.GOOS == "darwin" && strings.Contains(got, "0xb01dfacedebac1e") {
+ // See the comment in signal_darwin_amd64.go.
+ t.Skip("skipping due to Darwin handling of malformed addresses")
+ }
+ t.Errorf("unexpectedly saw %q in output", nowant)
+ }
+ }
+ })
+ }
+}
+
+func TestAbortInCgo(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ // N.B. On Windows, C abort() causes the program to exit
+ // without going through the runtime at all.
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "Abort")
+ t.Log(got)
+ want := "SIGABRT"
+ if !strings.Contains(got, want) {
+ t.Errorf("did not see %q in output", want)
+ }
+ // No runtime errors like "runtime: unknown pc".
+ nowant := "runtime: "
+ if strings.Contains(got, nowant) {
+		t.Errorf("unexpectedly saw %q in output", nowant)
+ }
+}
+
+// TestEINTR tests that we handle EINTR correctly.
+// See issue #20400 and friends.
+func TestEINTR(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no EINTR on %s", runtime.GOOS)
+ case "linux":
+ if runtime.GOARCH == "386" {
+ // On linux-386 the Go signal handler sets
+ // a restorer function that is not preserved
+ // by the C sigaction call in the test,
+ // causing the signal handler to crash when
+ // returning the normal code. The test is not
+ // architecture-specific, so just skip on 386
+ // rather than doing a complicated workaround.
+ t.Skip("skipping on linux-386; C sigaction does not preserve Go restorer")
+ }
+ }
+
+ t.Parallel()
+ output := runTestProg(t, "testprogcgo", "EINTR")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+// Issue #42207.
+func TestNeedmDeadlock(t *testing.T) {
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprogcgo", "NeedmDeadlock")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestCgoTracebackGoroutineProfile(t *testing.T) {
+ output := runTestProg(t, "testprogcgo", "GoroutineProfile")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestCgoTraceParser(t *testing.T) {
+ // Test issue 29707.
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprogcgo", "CgoTraceParser")
+ want := "OK\n"
+ ErrTimeOrder := "ErrTimeOrder\n"
+ if output == ErrTimeOrder {
+ t.Skipf("skipping due to golang.org/issue/16755: %v", output)
+ } else if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestCgoTraceParserWithOneProc(t *testing.T) {
+ // Test issue 29707.
+ switch runtime.GOOS {
+ case "plan9", "windows":
+ t.Skipf("no pthreads on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprogcgo", "CgoTraceParser", "GOMAXPROCS=1")
+ want := "OK\n"
+ ErrTimeOrder := "ErrTimeOrder\n"
+ if output == ErrTimeOrder {
+ t.Skipf("skipping due to golang.org/issue/16755: %v", output)
+ } else if output != want {
+ t.Fatalf("GOMAXPROCS=1, want %s, got %s\n", want, output)
+ }
+}
+
+func TestCgoSigfwd(t *testing.T) {
+ t.Parallel()
+ if !goos.IsUnix {
+ t.Skipf("no signals on %s", runtime.GOOS)
+ }
+
+ got := runTestProg(t, "testprogcgo", "CgoSigfwd", "GO_TEST_CGOSIGFWD=1")
+ if want := "OK\n"; got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestDestructorCallback(t *testing.T) {
+ t.Parallel()
+ got := runTestProg(t, "testprogcgo", "DestructorCallback")
+ if want := "OK\n"; got != want {
+ t.Errorf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestDestructorCallbackRace(t *testing.T) {
+ // This test requires building with -race,
+ // so it's somewhat slow.
+ if testing.Short() {
+ t.Skip("skipping test in -short mode")
+ }
+
+ if !platform.RaceDetectorSupported(runtime.GOOS, runtime.GOARCH) {
+ t.Skipf("skipping on %s/%s because race detector not supported", runtime.GOOS, runtime.GOARCH)
+ }
+
+ t.Parallel()
+
+ exe, err := buildTestProg(t, "testprogcgo", "-race")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "DestructorCallback")).CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ if want := "OK\n"; string(got) != want {
+ t.Errorf("expected %q, but got:\n%s", want, got)
+ }
+}
+
+func TestEnsureBindM(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "windows", "plan9":
+ t.Skipf("skipping bindm test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "EnsureBindM")
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
+
+func TestStackSwitchCallback(t *testing.T) {
+ t.Parallel()
+ switch runtime.GOOS {
+ case "windows", "plan9", "android", "ios", "openbsd": // no getcontext
+ t.Skipf("skipping test on %s", runtime.GOOS)
+ }
+ got := runTestProg(t, "testprogcgo", "StackSwitchCallback")
+ skip := "SKIP\n"
+ if got == skip {
+ t.Skip("skipping on musl/bionic libc")
+ }
+ want := "OK\n"
+ if got != want {
+ t.Errorf("expected %q, got %v", want, got)
+ }
+}
diff --git a/src/runtime/crash_test.go b/src/runtime/crash_test.go
new file mode 100644
index 0000000..53f7028
--- /dev/null
+++ b/src/runtime/crash_test.go
@@ -0,0 +1,882 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "errors"
+ "flag"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+)
+
+var toRemove []string
+
+func TestMain(m *testing.M) {
+ _, coreErrBefore := os.Stat("core")
+
+ status := m.Run()
+ for _, file := range toRemove {
+ os.RemoveAll(file)
+ }
+
+ _, coreErrAfter := os.Stat("core")
+ if coreErrBefore != nil && coreErrAfter == nil {
+ fmt.Fprintln(os.Stderr, "runtime.test: some test left a core file behind")
+ if status == 0 {
+ status = 1
+ }
+ }
+
+ os.Exit(status)
+}
+
+var testprog struct {
+ sync.Mutex
+ dir string
+ target map[string]*buildexe
+}
+
+type buildexe struct {
+ once sync.Once
+ exe string
+ err error
+}
+
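+// runTestProg builds (at most once per configuration) and runs the
+// named binary from testdata/ with the given subcommand name and
+// extra environment variables, returning its combined output.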
+func runTestProg(t *testing.T, binary, name string, env ...string) string {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ testenv.MustHaveGoBuild(t)
+ t.Helper()
+
+ exe, err := buildTestProg(t, binary)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ return runBuiltTestProg(t, exe, name, env...)
+}
+
+func runBuiltTestProg(t *testing.T, exe, name string, env ...string) string {
+ t.Helper()
+
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ start := time.Now()
+
+ cmd := testenv.CleanCmdEnv(testenv.Command(t, exe, name))
+ cmd.Env = append(cmd.Env, env...)
+ if testing.Short() {
+ cmd.Env = append(cmd.Env, "RUNTIME_TEST_SHORT=1")
+ }
+ out, err := cmd.CombinedOutput()
+ if err == nil {
+ t.Logf("%v (%v): ok", cmd, time.Since(start))
+ } else {
+ if _, ok := err.(*exec.ExitError); ok {
+ t.Logf("%v: %v", cmd, err)
+ } else if errors.Is(err, exec.ErrWaitDelay) {
+ t.Fatalf("%v: %v", cmd, err)
+ } else {
+ t.Fatalf("%v failed to start: %v", cmd, err)
+ }
+ }
+ return string(out)
+}
+
+var serializeBuild = make(chan bool, 2)
+
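+// buildTestProg builds the named program from testdata/ with the given
+// extra build flags, caching the resulting executable so that each
+// binary/flag combination is built at most once per test process.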
+func buildTestProg(t *testing.T, binary string, flags ...string) (string, error) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ testenv.MustHaveGoBuild(t)
+
+ testprog.Lock()
+ if testprog.dir == "" {
+ dir, err := os.MkdirTemp("", "go-build")
+ if err != nil {
+ t.Fatalf("failed to create temp directory: %v", err)
+ }
+ testprog.dir = dir
+ toRemove = append(toRemove, dir)
+ }
+
+ if testprog.target == nil {
+ testprog.target = make(map[string]*buildexe)
+ }
+ name := binary
+ if len(flags) > 0 {
+ name += "_" + strings.Join(flags, "_")
+ }
+ target, ok := testprog.target[name]
+ if !ok {
+ target = &buildexe{}
+ testprog.target[name] = target
+ }
+
+ dir := testprog.dir
+
+ // Unlock testprog while actually building, so that other
+ // tests can look up executables that were already built.
+ testprog.Unlock()
+
+ target.once.Do(func() {
+ // Only do two "go build"'s at a time,
+ // to keep load from getting too high.
+ serializeBuild <- true
+ defer func() { <-serializeBuild }()
+
+ // Don't get confused if testenv.GoToolPath calls t.Skip.
+ target.err = errors.New("building test called t.Skip")
+
+ exe := filepath.Join(dir, name+".exe")
+
+ start := time.Now()
+ cmd := exec.Command(testenv.GoToolPath(t), append([]string{"build", "-o", exe}, flags...)...)
+ t.Logf("running %v", cmd)
+ cmd.Dir = "testdata/" + binary
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ target.err = fmt.Errorf("building %s %v: %v\n%s", binary, flags, err, out)
+ } else {
+ t.Logf("built %v in %v", name, time.Since(start))
+ target.exe = exe
+ target.err = nil
+ }
+ })
+
+ return target.exe, target.err
+}
+
+func TestVDSO(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "SignalInVDSO")
+ want := "success\n"
+ if output != want {
+ t.Fatalf("output:\n%s\n\nwanted:\n%s", output, want)
+ }
+}
+
+func testCrashHandler(t *testing.T, cgo bool) {
+ type crashTest struct {
+ Cgo bool
+ }
+ var output string
+ if cgo {
+ output = runTestProg(t, "testprogcgo", "Crash")
+ } else {
+ output = runTestProg(t, "testprog", "Crash")
+ }
+ want := "main: recovered done\nnew-thread: recovered done\nsecond-new-thread: recovered done\nmain-again: recovered done\n"
+ if output != want {
+ t.Fatalf("output:\n%s\n\nwanted:\n%s", output, want)
+ }
+}
+
+func TestCrashHandler(t *testing.T) {
+ testCrashHandler(t, false)
+}
+
+func testDeadlock(t *testing.T, name string) {
+	// External linking brings in cgo, which keeps deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ output := runTestProg(t, "testprog", name)
+ want := "fatal error: all goroutines are asleep - deadlock!\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestSimpleDeadlock(t *testing.T) {
+ testDeadlock(t, "SimpleDeadlock")
+}
+
+func TestInitDeadlock(t *testing.T) {
+ testDeadlock(t, "InitDeadlock")
+}
+
+func TestLockedDeadlock(t *testing.T) {
+ testDeadlock(t, "LockedDeadlock")
+}
+
+func TestLockedDeadlock2(t *testing.T) {
+ testDeadlock(t, "LockedDeadlock2")
+}
+
+func TestGoexitDeadlock(t *testing.T) {
+	// External linking brings in cgo, which keeps deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ output := runTestProg(t, "testprog", "GoexitDeadlock")
+ want := "no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestStackOverflow(t *testing.T) {
+ output := runTestProg(t, "testprog", "StackOverflow")
+ want := []string{
+ "runtime: goroutine stack exceeds 1474560-byte limit\n",
+ "fatal error: stack overflow",
+ // information about the current SP and stack bounds
+ "runtime: sp=",
+ "stack=[",
+ }
+ if !strings.HasPrefix(output, want[0]) {
+ t.Errorf("output does not start with %q", want[0])
+ }
+ for _, s := range want[1:] {
+ if !strings.Contains(output, s) {
+ t.Errorf("output does not contain %q", s)
+ }
+ }
+ if t.Failed() {
+ t.Logf("output:\n%s", output)
+ }
+}
+
+func TestThreadExhaustion(t *testing.T) {
+ output := runTestProg(t, "testprog", "ThreadExhaustion")
+ want := "runtime: program exceeds 10-thread limit\nfatal error: thread exhaustion"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecursivePanic(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic")
+ want := `wrap: bad
+panic: again
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic2(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic2")
+ want := `first panic
+second panic
+panic: third panic
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic3(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic3")
+ want := `panic: first panic
+
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic4(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic4")
+ want := `panic: first panic [recovered]
+ panic: second panic
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestRecursivePanic5(t *testing.T) {
+ output := runTestProg(t, "testprog", "RecursivePanic5")
+ want := `first panic
+second panic
+panic: third panic
+`
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+}
+
+func TestGoexitCrash(t *testing.T) {
+	// External linking brings in cgo, which keeps deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ output := runTestProg(t, "testprog", "GoexitExit")
+ want := "no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestGoexitDefer(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Errorf("non-nil recover during Goexit")
+ }
+ c <- struct{}{}
+ }()
+ runtime.Goexit()
+ }()
+ // Note: if the defer fails to run, we will get a deadlock here
+ <-c
+}
+
+func TestGoNil(t *testing.T) {
+ output := runTestProg(t, "testprog", "GoNil")
+ want := "go of nil func value"
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestMainGoroutineID(t *testing.T) {
+ output := runTestProg(t, "testprog", "MainGoroutineID")
+ want := "panic: test\n\ngoroutine 1 [running]:\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestNoHelperGoroutines(t *testing.T) {
+ output := runTestProg(t, "testprog", "NoHelperGoroutines")
+ matches := regexp.MustCompile(`goroutine [0-9]+ \[`).FindAllStringSubmatch(output, -1)
+ if len(matches) != 1 || matches[0][0] != "goroutine 1 [" {
+ t.Fatalf("want to see only goroutine 1, see:\n%s", output)
+ }
+}
+
+func TestBreakpoint(t *testing.T) {
+ output := runTestProg(t, "testprog", "Breakpoint")
+ // If runtime.Breakpoint() is inlined, then the stack trace prints
+ // "runtime.Breakpoint(...)" instead of "runtime.Breakpoint()".
+ want := "runtime.Breakpoint("
+ if !strings.Contains(output, want) {
+ t.Fatalf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+}
+
+func TestGoexitInPanic(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ // see issue 8774: this code used to trigger an infinite recursion
+ output := runTestProg(t, "testprog", "GoexitInPanic")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+// Issue 14965: Runtime panics should be of type runtime.Error
+func TestRuntimePanicWithRuntimeError(t *testing.T) {
+ testCases := [...]func(){
+ 0: func() {
+ var m map[uint64]bool
+ m[1234] = true
+ },
+ 1: func() {
+ ch := make(chan struct{})
+ close(ch)
+ close(ch)
+ },
+ 2: func() {
+ var ch = make(chan struct{})
+ close(ch)
+ ch <- struct{}{}
+ },
+ 3: func() {
+ var s = make([]int, 2)
+ _ = s[2]
+ },
+ 4: func() {
+ n := -1
+ _ = make(chan bool, n)
+ },
+ 5: func() {
+ close((chan bool)(nil))
+ },
+ }
+
+ for i, fn := range testCases {
+ got := panicValue(fn)
+ if _, ok := got.(runtime.Error); !ok {
+ t.Errorf("test #%d: recovered value %v(type %T) does not implement runtime.Error", i, got, got)
+ }
+ }
+}
+
+func panicValue(fn func()) (recovered any) {
+ defer func() {
+ recovered = recover()
+ }()
+ fn()
+ return
+}
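+
+// Illustrative caller-side sketch (not from the upstream source): the property
+// checked above lets callers distinguish runtime faults from ordinary panics
+// in a recover handler.
+//
+//	defer func() {
+//		if r := recover(); r != nil {
+//			if re, ok := r.(runtime.Error); ok {
+//				fmt.Println("runtime error:", re.Error())
+//			}
+//		}
+//	}()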
+
+func TestPanicAfterGoexit(t *testing.T) {
+ // an uncaught panic should still work after goexit
+ output := runTestProg(t, "testprog", "PanicAfterGoexit")
+ want := "panic: hello"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoveredPanicAfterGoexit(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ output := runTestProg(t, "testprog", "RecoveredPanicAfterGoexit")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoverBeforePanicAfterGoexit(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ t.Parallel()
+ output := runTestProg(t, "testprog", "RecoverBeforePanicAfterGoexit")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestRecoverBeforePanicAfterGoexit2(t *testing.T) {
+ // External linking brings in cgo, which prevents deadlock detection from working.
+ testenv.MustInternalLink(t, false)
+
+ t.Parallel()
+ output := runTestProg(t, "testprog", "RecoverBeforePanicAfterGoexit2")
+ want := "fatal error: no goroutines (main called runtime.Goexit) - deadlock!"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestNetpollDeadlock(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprognet", "NetpollDeadlock")
+ want := "done\n"
+ if !strings.HasSuffix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestPanicTraceback(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "PanicTraceback")
+ want := "panic: hello\n\tpanic: panic pt2\n\tpanic: panic pt1\n"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+
+ // Check functions in the traceback.
+ fns := []string{"main.pt1.func1", "panic", "main.pt2.func1", "panic", "main.pt2", "main.pt1"}
+ for _, fn := range fns {
+ re := regexp.MustCompile(`(?m)^` + regexp.QuoteMeta(fn) + `\(.*\n`)
+ idx := re.FindStringIndex(output)
+ if idx == nil {
+ t.Fatalf("expected %q function in traceback:\n%s", fn, output)
+ }
+ output = output[idx[1]:]
+ }
+}
+
+func testPanicDeadlock(t *testing.T, name string, want string) {
+ // test issue 14432
+ output := runTestProg(t, "testprog", name)
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestPanicDeadlockGosched(t *testing.T) {
+ testPanicDeadlock(t, "GoschedInPanic", "panic: errorThatGosched\n\n")
+}
+
+func TestPanicDeadlockSyscall(t *testing.T) {
+ testPanicDeadlock(t, "SyscallInPanic", "1\n2\npanic: 3\n\n")
+}
+
+func TestPanicLoop(t *testing.T) {
+ output := runTestProg(t, "testprog", "PanicLoop")
+ if want := "panic while printing panic value"; !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+}
+
+func TestMemPprof(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "MemProf")).CombinedOutput()
+ if err != nil {
+ t.Fatalf("testprog failed: %s, output:\n%s", err, got)
+ }
+ fn := strings.TrimSpace(string(got))
+ defer os.Remove(fn)
+
+ for try := 0; try < 2; try++ {
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-alloc_space", "-top"))
+ // Check that pprof works both with and without explicit executable on command line.
+ if try == 0 {
+ cmd.Args = append(cmd.Args, exe, fn)
+ } else {
+ cmd.Args = append(cmd.Args, fn)
+ }
+ found := false
+ for i, e := range cmd.Env {
+ if strings.HasPrefix(e, "PPROF_TMPDIR=") {
+ cmd.Env[i] = "PPROF_TMPDIR=" + os.TempDir()
+ found = true
+ break
+ }
+ }
+ if !found {
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ }
+
+ top, err := cmd.CombinedOutput()
+ t.Logf("%s:\n%s", cmd.Args, top)
+ if err != nil {
+ t.Error(err)
+ } else if !bytes.Contains(top, []byte("MemProf")) {
+ t.Error("missing MemProf in pprof output")
+ }
+ }
+}
+
+var concurrentMapTest = flag.Bool("run_concurrent_map_tests", false, "also run flaky concurrent map tests")
+
+func TestConcurrentMapWrites(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapWrites")
+ want := "fatal error: concurrent map writes"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+func TestConcurrentMapReadWrite(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapReadWrite")
+ want := "fatal error: concurrent map read and map write"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+func TestConcurrentMapIterateWrite(t *testing.T) {
+ if !*concurrentMapTest {
+ t.Skip("skipping without -run_concurrent_map_tests")
+ }
+ testenv.MustHaveGoRun(t)
+ output := runTestProg(t, "testprog", "concurrentMapIterateWrite")
+ want := "fatal error: concurrent map iteration and map write"
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+type point struct {
+ x, y *int
+}
+
+func (p *point) negate() {
+ *p.x = *p.x * -1
+ *p.y = *p.y * -1
+}
+
+// Test for issue #10152.
+func TestPanicInlined(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r == nil {
+ t.Fatalf("recover failed")
+ }
+ buf := make([]byte, 2048)
+ n := runtime.Stack(buf, false)
+ buf = buf[:n]
+ if !bytes.Contains(buf, []byte("(*point).negate(")) {
+ t.Fatalf("expecting stack trace to contain call to (*point).negate()")
+ }
+ }()
+
+ pt := new(point)
+ pt.negate()
+}
+
+// Test for issues #3934 and #20018.
+// We want to delay exiting until a panic print is complete.
+func TestPanicRace(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ // The test is intentionally racy, and in my testing does not
+ // produce the expected output about 0.05% of the time.
+ // So run the program in a loop and only fail the test if we
+ // get the wrong output ten times in a row.
+ const tries = 10
+retry:
+ for i := 0; i < tries; i++ {
+ got, err := testenv.CleanCmdEnv(exec.Command(exe, "PanicRace")).CombinedOutput()
+ if err == nil {
+ t.Logf("try %d: program exited successfully, should have failed", i+1)
+ continue
+ }
+
+ if i > 0 {
+ t.Logf("try %d:\n", i+1)
+ }
+ t.Logf("%s\n", got)
+
+ wants := []string{
+ "panic: crash",
+ "PanicRace",
+ "created by ",
+ }
+ for _, want := range wants {
+ if !bytes.Contains(got, []byte(want)) {
+ t.Logf("did not find expected string %q", want)
+ continue retry
+ }
+ }
+
+ // Test generated expected output.
+ return
+ }
+ t.Errorf("test ran %d times without producing expected output", tries)
+}
+
+func TestBadTraceback(t *testing.T) {
+ output := runTestProg(t, "testprog", "BadTraceback")
+ for _, want := range []string{
+ "unexpected return pc",
+ "called from 0xbad",
+ "00000bad", // Smashed LR in hex dump
+ "<main.badLR", // Symbolization in hex dump (badLR1 or badLR2)
+ } {
+ if !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+ }
+}
+
+func TestTimePprof(t *testing.T) {
+ // This test is unreliable on any system in which nanotime
+ // calls into libc.
+ switch runtime.GOOS {
+ case "aix", "darwin", "illumos", "openbsd", "solaris":
+ t.Skipf("skipping on %s because nanotime calls libc", runtime.GOOS)
+ }
+
+ // Pass GOTRACEBACK for issue #41120 to try to get more
+ // information on timeout.
+ fn := runTestProg(t, "testprog", "TimeProf", "GOTRACEBACK=crash")
+ fn = strings.TrimSpace(fn)
+ defer os.Remove(fn)
+
+ cmd := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "tool", "pprof", "-top", "-nodecount=1", fn))
+ cmd.Env = append(cmd.Env, "PPROF_TMPDIR="+os.TempDir())
+ top, err := cmd.CombinedOutput()
+ t.Logf("%s", top)
+ if err != nil {
+ t.Error(err)
+ } else if bytes.Contains(top, []byte("ExternalCode")) {
+ t.Error("profiler refers to ExternalCode")
+ }
+}
+
+// Test that runtime.abort actually aborts the process.
+func TestAbort(t *testing.T) {
+ // Pass GOTRACEBACK to ensure we get runtime frames.
+ output := runTestProg(t, "testprog", "Abort", "GOTRACEBACK=system")
+ if want := "runtime.abort"; !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+ if strings.Contains(output, "BAD") {
+ t.Errorf("output contains BAD:\n%s", output)
+ }
+ // Check that it's a signal traceback.
+ want := "PC="
+ // For systems that use a breakpoint, check specifically for that.
+ switch runtime.GOARCH {
+ case "386", "amd64":
+ switch runtime.GOOS {
+ case "plan9":
+ want = "sys: breakpoint"
+ case "windows":
+ want = "Exception 0x80000003"
+ default:
+ want = "SIGTRAP"
+ }
+ }
+ if !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+}
+
+// For TestRuntimePanic: test a panic in the runtime package without
+// involving the testing harness.
+func init() {
+ if os.Getenv("GO_TEST_RUNTIME_PANIC") == "1" {
+ defer func() {
+ if r := recover(); r != nil {
+ // The runtime panic should have crashed the process rather than
+ // being recovered here; exit 0 so the parent test detects the failure.
+ os.Exit(0)
+ }
+ }()
+ runtime.PanicForTesting(nil, 1)
+ // The panic above should have crashed the process; exit 0 so the parent test detects the failure.
+ os.Exit(0)
+ }
+}
+
+func TestRuntimePanic(t *testing.T) {
+ testenv.MustHaveExec(t)
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestRuntimePanic"))
+ cmd.Env = append(cmd.Env, "GO_TEST_RUNTIME_PANIC=1")
+ out, err := cmd.CombinedOutput()
+ t.Logf("%s", out)
+ if err == nil {
+ t.Error("child process did not fail")
+ } else if want := "runtime.unexportedPanicForTesting"; !bytes.Contains(out, []byte(want)) {
+ t.Errorf("output did not contain expected string %q", want)
+ }
+}
+
+// Test that g0 stack overflows are handled gracefully.
+func TestG0StackOverflow(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ switch runtime.GOOS {
+ case "android", "darwin", "dragonfly", "freebsd", "ios", "linux", "netbsd", "openbsd":
+ t.Skipf("g0 stack is wrong on pthread platforms (see golang.org/issue/26061)")
+ }
+
+ if os.Getenv("TEST_G0_STACK_OVERFLOW") != "1" {
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestG0StackOverflow", "-test.v"))
+ cmd.Env = append(cmd.Env, "TEST_G0_STACK_OVERFLOW=1")
+ out, err := cmd.CombinedOutput()
+ // Don't check err since it's expected to crash.
+ if n := strings.Count(string(out), "morestack on g0\n"); n != 1 {
+ t.Fatalf("%s\n(exit status %v)", out, err)
+ }
+ // Check that it's a signal-style traceback.
+ if runtime.GOOS != "windows" {
+ if want := "PC="; !strings.Contains(string(out), want) {
+ t.Errorf("output does not contain %q:\n%s", want, out)
+ }
+ }
+ return
+ }
+
+ runtime.G0StackOverflow()
+}
+
+// Test that panic message is not clobbered.
+// See issue 30150.
+func TestDoublePanic(t *testing.T) {
+ output := runTestProg(t, "testprog", "DoublePanic", "GODEBUG=clobberfree=1")
+ wants := []string{"panic: XXX", "panic: YYY"}
+ for _, want := range wants {
+ if !strings.Contains(output, want) {
+ t.Errorf("output:\n%s\n\nwant output containing: %s", output, want)
+ }
+ }
+}
+
+// Test that a panic while panicking discards the error message.
+// See issue 52257.
+func TestPanicWhilePanicking(t *testing.T) {
+ tests := []struct {
+ Want string
+ Func string
+ }{
+ {
+ "panic while printing panic value: important error message",
+ "ErrorPanic",
+ },
+ {
+ "panic while printing panic value: important stringer message",
+ "StringerPanic",
+ },
+ {
+ "panic while printing panic value: type",
+ "DoubleErrorPanic",
+ },
+ {
+ "panic while printing panic value: type",
+ "DoubleStringerPanic",
+ },
+ {
+ "panic while printing panic value: type",
+ "CircularPanic",
+ },
+ {
+ "important string message",
+ "StringPanic",
+ },
+ {
+ "nil",
+ "NilPanic",
+ },
+ }
+ for _, x := range tests {
+ output := runTestProg(t, "testprog", x.Func)
+ if !strings.Contains(output, x.Want) {
+ t.Errorf("output does not contain %q:\n%s", x.Want, output)
+ }
+ }
+}
+
+func TestPanicOnUnsafeSlice(t *testing.T) {
+ output := runTestProg(t, "testprog", "panicOnNilAndEleSizeIsZero")
+ want := "panic: runtime error: unsafe.Slice: ptr is nil and len is not zero"
+ if !strings.Contains(output, want) {
+ t.Errorf("output does not contain %q:\n%s", want, output)
+ }
+}
diff --git a/src/runtime/crash_unix_test.go b/src/runtime/crash_unix_test.go
new file mode 100644
index 0000000..cc60bfb
--- /dev/null
+++ b/src/runtime/crash_unix_test.go
@@ -0,0 +1,332 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "bytes"
+ "internal/testenv"
+ "io"
+ "os"
+ "os/exec"
+ "runtime"
+ "runtime/debug"
+ "strings"
+ "sync"
+ "syscall"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ // We can't use SIGQUIT to kill subprocesses because
+ // it's blocked. Use SIGKILL instead. See issue
+ // #19196 for an example of when this happens.
+ testenv.Sigquit = syscall.SIGKILL
+ }
+}
+
+func TestBadOpen(t *testing.T) {
+ // make sure we get the correct error code if open fails. Same for
+ // read/write/close on the resulting -1 fd. See issue 10052.
+ nonfile := []byte("/notreallyafile")
+ fd := runtime.Open(&nonfile[0], 0, 0)
+ if fd != -1 {
+ t.Errorf("open(%q)=%d, want -1", nonfile, fd)
+ }
+ var buf [32]byte
+ r := runtime.Read(-1, unsafe.Pointer(&buf[0]), int32(len(buf)))
+ if got, want := r, -int32(syscall.EBADF); got != want {
+ t.Errorf("read()=%d, want %d", got, want)
+ }
+ w := runtime.Write(^uintptr(0), unsafe.Pointer(&buf[0]), int32(len(buf)))
+ if got, want := w, -int32(syscall.EBADF); got != want {
+ t.Errorf("write()=%d, want %d", got, want)
+ }
+ c := runtime.Close(-1)
+ if c != -1 {
+ t.Errorf("close()=%d, want -1", c)
+ }
+}
+
+func TestCrashDumpsAllThreads(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ switch runtime.GOOS {
+ case "darwin", "dragonfly", "freebsd", "linux", "netbsd", "openbsd", "illumos", "solaris":
+ default:
+ t.Skipf("skipping; not supported on %v", runtime.GOOS)
+ }
+
+ if runtime.GOOS == "openbsd" && (runtime.GOARCH == "arm" || runtime.GOARCH == "mips64") {
+ // This may be ncpu < 2 related...
+ t.Skipf("skipping; test fails on %s/%s - see issue #42464", runtime.GOOS, runtime.GOARCH)
+ }
+
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ t.Skip("skipping; SIGQUIT is blocked, see golang.org/issue/19196")
+ }
+
+ testenv.MustHaveGoBuild(t)
+
+ if strings.Contains(os.Getenv("GOFLAGS"), "mayMoreStackPreempt") {
+ // This test occasionally times out in this debug mode. This is probably
+ // revealing a real bug in the scheduler, but since it seems to only
+ // affect this test and this is itself a test of a debug mode, it's not
+ // a high priority.
+ testenv.SkipFlaky(t, 55160)
+ }
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ cmd := testenv.Command(t, exe, "CrashDumpsAllThreads")
+ cmd = testenv.CleanCmdEnv(cmd)
+ cmd.Dir = t.TempDir() // put any core file in tempdir
+ cmd.Env = append(cmd.Env,
+ "GOTRACEBACK=crash",
+ // Set GOGC=off. Because of golang.org/issue/10958, the tight
+ // loops in the test program are not preemptible. If GC kicks
+ // in, it may lock up and prevent main from saying it's ready.
+ "GOGC=off",
+ // Set GODEBUG=asyncpreemptoff=1. If a thread is preempted
+ // when it receives SIGQUIT, it won't show the expected
+ // stack trace. See issue 35356.
+ "GODEBUG=asyncpreemptoff=1",
+ )
+
+ var outbuf bytes.Buffer
+ cmd.Stdout = &outbuf
+ cmd.Stderr = &outbuf
+
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer rp.Close()
+
+ cmd.ExtraFiles = []*os.File{wp}
+
+ if err := cmd.Start(); err != nil {
+ wp.Close()
+ t.Fatalf("starting program: %v", err)
+ }
+
+ if err := wp.Close(); err != nil {
+ t.Logf("closing write pipe: %v", err)
+ }
+ if _, err := rp.Read(make([]byte, 1)); err != nil {
+ t.Fatalf("reading from pipe: %v", err)
+ }
+
+ if err := cmd.Process.Signal(syscall.SIGQUIT); err != nil {
+ t.Fatalf("signal: %v", err)
+ }
+
+ // No point in checking the error return from Wait--we expect
+ // it to fail.
+ cmd.Wait()
+
+ // We want to see a stack trace for each thread.
+ // Before https://golang.org/cl/2811 running threads would say
+ // "goroutine running on other thread; stack unavailable".
+ out := outbuf.Bytes()
+ n := bytes.Count(out, []byte("main.crashDumpsAllThreadsLoop("))
+ if n != 4 {
+ t.Errorf("found %d instances of main.crashDumpsAllThreadsLoop; expected 4", n)
+ t.Logf("%s", out)
+ }
+}
+
+func TestPanicSystemstack(t *testing.T) {
+ // Test that GOTRACEBACK=crash prints both the system and user
+ // stack of other threads.
+
+ // The GOTRACEBACK=crash handler takes 0.1 seconds even if
+ // it's not writing a core file and potentially much longer if
+ // it is. Skip in short mode.
+ if testing.Short() {
+ t.Skip("Skipping in short mode (GOTRACEBACK=crash is slow)")
+ }
+
+ if runtime.Sigisblocked(int(syscall.SIGQUIT)) {
+ t.Skip("skipping; SIGQUIT is blocked, see golang.org/issue/19196")
+ }
+
+ t.Parallel()
+ cmd := exec.Command(os.Args[0], "testPanicSystemstackInternal")
+ cmd = testenv.CleanCmdEnv(cmd)
+ cmd.Dir = t.TempDir() // put any core file in tempdir
+ cmd.Env = append(cmd.Env, "GOTRACEBACK=crash")
+ pr, pw, err := os.Pipe()
+ if err != nil {
+ t.Fatal("creating pipe: ", err)
+ }
+ cmd.Stderr = pw
+ if err := cmd.Start(); err != nil {
+ t.Fatal("starting command: ", err)
+ }
+ defer cmd.Process.Wait()
+ defer cmd.Process.Kill()
+ if err := pw.Close(); err != nil {
+ t.Log("closing write pipe: ", err)
+ }
+ defer pr.Close()
+
+ // Wait for "x\nx\n" to indicate almost-readiness.
+ buf := make([]byte, 4)
+ _, err = io.ReadFull(pr, buf)
+ if err != nil || string(buf) != "x\nx\n" {
+ t.Fatal("subprocess failed; output:\n", string(buf))
+ }
+
+ // The child blockers print "x\n" and then block on a lock. Receiving
+ // those bytes only indicates that the child is _about to block_. Since
+ // we don't have a way to know when it is fully blocked, sleep a bit to
+ // make us less likely to lose the race and signal before the child
+ // blocks.
+ time.Sleep(100 * time.Millisecond)
+
+ // Send SIGQUIT.
+ if err := cmd.Process.Signal(syscall.SIGQUIT); err != nil {
+ t.Fatal("signaling subprocess: ", err)
+ }
+
+ // Get traceback.
+ tb, err := io.ReadAll(pr)
+ if err != nil {
+ t.Fatal("reading traceback from pipe: ", err)
+ }
+
+ // Traceback should have two testPanicSystemstackInternal's
+ // and two blockOnSystemStackInternal's.
+ userFunc := "testPanicSystemstackInternal"
+ sysFunc := "blockOnSystemStackInternal"
+ nUser := bytes.Count(tb, []byte(userFunc))
+ nSys := bytes.Count(tb, []byte(sysFunc))
+ if nUser != 2 || nSys != 2 {
+ t.Fatalf("want %d user stack frames in %s and %d system stack frames in %s, got %d and %d:\n%s", 2, userFunc, 2, sysFunc, nUser, nSys, string(tb))
+ }
+
+ // Traceback should not contain "unexpected SPWRITE" when
+ // unwinding the system stacks.
+ if bytes.Contains(tb, []byte("unexpected SPWRITE")) {
+ t.Errorf("unexpected \"unexpected SPWRITE\" in traceback:\n%s", tb)
+ }
+}
+
+func init() {
+ if len(os.Args) >= 2 && os.Args[1] == "testPanicSystemstackInternal" {
+ // Complete any in-flight GCs and disable future ones. We're going to
+ // block goroutines on runtime locks, which aren't ever preemptible for the
+ // GC to scan them.
+ runtime.GC()
+ debug.SetGCPercent(-1)
+ // Get two threads running on the system stack with
+ // something recognizable in the stack trace.
+ runtime.GOMAXPROCS(2)
+ go testPanicSystemstackInternal()
+ testPanicSystemstackInternal()
+ }
+}
+
+func testPanicSystemstackInternal() {
+ runtime.BlockOnSystemStack()
+ os.Exit(1) // Should be unreachable.
+}
+
+func TestSignalExitStatus(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+ err = testenv.CleanCmdEnv(exec.Command(exe, "SignalExitStatus")).Run()
+ if err == nil {
+ t.Error("test program succeeded unexpectedly")
+ } else if ee, ok := err.(*exec.ExitError); !ok {
+ t.Errorf("error (%v) has type %T; expected exec.ExitError", err, err)
+ } else if ws, ok := ee.Sys().(syscall.WaitStatus); !ok {
+ t.Errorf("error.Sys (%v) has type %T; expected syscall.WaitStatus", ee.Sys(), ee.Sys())
+ } else if !ws.Signaled() || ws.Signal() != syscall.SIGTERM {
+ t.Errorf("got %v; expected SIGTERM", ee)
+ }
+}
+
+func TestSignalIgnoreSIGTRAP(t *testing.T) {
+ if runtime.GOOS == "openbsd" {
+ testenv.SkipFlaky(t, 49725)
+ }
+
+ output := runTestProg(t, "testprognet", "SignalIgnoreSIGTRAP")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestSignalDuringExec(t *testing.T) {
+ switch runtime.GOOS {
+ case "darwin", "dragonfly", "freebsd", "linux", "netbsd", "openbsd":
+ default:
+ t.Skipf("skipping test on %s", runtime.GOOS)
+ }
+ output := runTestProg(t, "testprognet", "SignalDuringExec")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestSignalM(t *testing.T) {
+ r, w, errno := runtime.Pipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer func() {
+ runtime.Close(r)
+ runtime.Close(w)
+ }()
+ runtime.Closeonexec(r)
+ runtime.Closeonexec(w)
+
+ var want, got int64
+ var wg sync.WaitGroup
+ ready := make(chan *runtime.M)
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ want, got = runtime.WaitForSigusr1(r, w, func(mp *runtime.M) {
+ ready <- mp
+ })
+ runtime.UnlockOSThread()
+ wg.Done()
+ }()
+ waitingM := <-ready
+ runtime.SendSigusr1(waitingM)
+
+ timer := time.AfterFunc(time.Second, func() {
+ // Write 1 to tell WaitForSigusr1 that we timed out.
+ bw := byte(1)
+ if n := runtime.Write(uintptr(w), unsafe.Pointer(&bw), 1); n != 1 {
+ t.Errorf("pipe write failed: %d", n)
+ }
+ })
+ defer timer.Stop()
+
+ wg.Wait()
+ if got == -1 {
+ t.Fatal("signalM signal not received")
+ } else if want != got {
+ t.Fatalf("signal sent to M %d, but received on M %d", want, got)
+ }
+}
diff --git a/src/runtime/create_file_nounix.go b/src/runtime/create_file_nounix.go
new file mode 100644
index 0000000..60f7517
--- /dev/null
+++ b/src/runtime/create_file_nounix.go
@@ -0,0 +1,14 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !unix
+
+package runtime
+
+const canCreateFile = false
+
+func create(name *byte, perm int32) int32 {
+ throw("unimplemented")
+ return -1
+}
diff --git a/src/runtime/create_file_unix.go b/src/runtime/create_file_unix.go
new file mode 100644
index 0000000..7280810
--- /dev/null
+++ b/src/runtime/create_file_unix.go
@@ -0,0 +1,14 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+const canCreateFile = true
+
+// create returns an fd to a write-only file.
+func create(name *byte, perm int32) int32 {
+ return open(name, _O_CREAT|_O_WRONLY|_O_TRUNC, perm)
+}
diff --git a/src/runtime/debug.go b/src/runtime/debug.go
new file mode 100644
index 0000000..9a92b45
--- /dev/null
+++ b/src/runtime/debug.go
@@ -0,0 +1,115 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// GOMAXPROCS sets the maximum number of CPUs that can be executing
+// simultaneously and returns the previous setting. It defaults to
+// the value of runtime.NumCPU. If n < 1, it does not change the current setting.
+// This call will go away when the scheduler improves.
+func GOMAXPROCS(n int) int {
+ if GOARCH == "wasm" && n > 1 {
+ n = 1 // WebAssembly has no threads yet, so only one CPU is possible.
+ }
+
+ lock(&sched.lock)
+ ret := int(gomaxprocs)
+ unlock(&sched.lock)
+ if n <= 0 || n == ret {
+ return ret
+ }
+
+ stopTheWorldGC(stwGOMAXPROCS)
+
+ // newprocs will be processed by startTheWorld
+ newprocs = int32(n)
+
+ startTheWorldGC()
+ return ret
+}
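+
+// Illustrative caller-side sketch (not from the upstream source): temporarily
+// pin the scheduler to one P and restore the old value, or pass a
+// non-positive n to only query the current setting.
+//
+//	prev := runtime.GOMAXPROCS(1) // returns the previous setting
+//	defer runtime.GOMAXPROCS(prev)
+//	cur := runtime.GOMAXPROCS(0) // n < 1 only reports the current value
+//	_ = cur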
+
+// NumCPU returns the number of logical CPUs usable by the current process.
+//
+// The set of available CPUs is checked by querying the operating system
+// at process startup. Changes to operating system CPU allocation after
+// process startup are not reflected.
+func NumCPU() int {
+ return int(ncpu)
+}
+
+// NumCgoCall returns the number of cgo calls made by the current process.
+func NumCgoCall() int64 {
+ var n = int64(atomic.Load64(&ncgocall))
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ n += int64(mp.ncgocall)
+ }
+ return n
+}
+
+// NumGoroutine returns the number of goroutines that currently exist.
+func NumGoroutine() int {
+ return int(gcount())
+}
+
+//go:linkname debug_modinfo runtime/debug.modinfo
+func debug_modinfo() string {
+ return modinfo
+}
+
+// mayMoreStackPreempt is a maymorestack hook that forces a preemption
+// at every possible cooperative preemption point.
+//
+// This is valuable to apply to the runtime, which can be sensitive to
+// preemption points. To apply this to all preemption points in the
+// runtime and runtime-like code, use the following in bash or zsh:
+//
+// X=(-{gc,asm}flags={runtime/...,reflect,sync}=-d=maymorestack=runtime.mayMoreStackPreempt) GOFLAGS=${X[@]}
+//
+// This must be deeply nosplit because it is called from a function
+// prologue before the stack is set up and because the compiler will
+// call it from any splittable prologue (leading to infinite
+// recursion).
+//
+// Ideally it should also use very little stack because the linker
+// doesn't currently account for this in nosplit stack depth checking.
+//
+// Ensure mayMoreStackPreempt can be called for all ABIs.
+//
+//go:nosplit
+//go:linkname mayMoreStackPreempt
+func mayMoreStackPreempt() {
+ // Don't do anything on the g0 or gsignal stack.
+ gp := getg()
+ if gp == gp.m.g0 || gp == gp.m.gsignal {
+ return
+ }
+ // Force a preemption, unless the stack is already poisoned.
+ if gp.stackguard0 < stackPoisonMin {
+ gp.stackguard0 = stackPreempt
+ }
+}
+
+// mayMoreStackMove is a maymorestack hook that forces stack movement
+// at every possible point.
+//
+// See mayMoreStackPreempt.
+//
+//go:nosplit
+//go:linkname mayMoreStackMove
+func mayMoreStackMove() {
+ // Don't do anything on the g0 or gsignal stack.
+ gp := getg()
+ if gp == gp.m.g0 || gp == gp.m.gsignal {
+ return
+ }
+ // Force stack movement, unless the stack is already poisoned.
+ if gp.stackguard0 < stackPoisonMin {
+ gp.stackguard0 = stackForceMove
+ }
+}
diff --git a/src/runtime/debug/debug.s b/src/runtime/debug/debug.s
new file mode 100644
index 0000000..6aae33a
--- /dev/null
+++ b/src/runtime/debug/debug.s
@@ -0,0 +1,9 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Nothing to see here.
+// This file exists so that the go command knows that parts of the
+// package are implemented in C, so that it does not instruct the
+// Go compiler to complain about extern declarations.
+// The actual implementations are in package runtime.
diff --git a/src/runtime/debug/garbage.go b/src/runtime/debug/garbage.go
new file mode 100644
index 0000000..0f53928
--- /dev/null
+++ b/src/runtime/debug/garbage.go
@@ -0,0 +1,238 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "runtime"
+ "sort"
+ "time"
+)
+
+// GCStats collects information about recent garbage collections.
+type GCStats struct {
+ LastGC time.Time // time of last collection
+ NumGC int64 // number of garbage collections
+ PauseTotal time.Duration // total pause for all collections
+ Pause []time.Duration // pause history, most recent first
+ PauseEnd []time.Time // pause end times history, most recent first
+ PauseQuantiles []time.Duration // pause time quantiles, filled by ReadGCStats when non-empty
+}
+
+// ReadGCStats reads statistics about garbage collection into stats.
+// The number of entries in the pause history is system-dependent;
+// stats.Pause slice will be reused if large enough, reallocated otherwise.
+// ReadGCStats may use the full capacity of the stats.Pause slice.
+// If stats.PauseQuantiles is non-empty, ReadGCStats fills it with quantiles
+// summarizing the distribution of pause time. For example, if
+// len(stats.PauseQuantiles) is 5, it will be filled with the minimum,
+// 25%, 50%, 75%, and maximum pause times.
+func ReadGCStats(stats *GCStats) {
+ // Create a buffer with space for at least two copies of the
+ // pause history tracked by the runtime. One will be returned
+ // to the caller and the other will be used as transfer buffer
+ // for end times history and as a temporary buffer for
+ // computing quantiles.
+ const maxPause = len(((*runtime.MemStats)(nil)).PauseNs)
+ if cap(stats.Pause) < 2*maxPause+3 {
+ stats.Pause = make([]time.Duration, 2*maxPause+3)
+ }
+
+ // readGCStats fills in the pause and end times histories (up to
+ // maxPause entries) and then three more: Unix ns time of last GC,
+ // number of GC, and total pause time in nanoseconds. Here we
+ // depend on the fact that time.Duration's native unit is
+ // nanoseconds, so the pauses and the total pause time do not need
+ // any conversion.
+ readGCStats(&stats.Pause)
+ n := len(stats.Pause) - 3
+ stats.LastGC = time.Unix(0, int64(stats.Pause[n]))
+ stats.NumGC = int64(stats.Pause[n+1])
+ stats.PauseTotal = stats.Pause[n+2]
+ n /= 2 // buffer holds pauses and end times
+ stats.Pause = stats.Pause[:n]
+
+ if cap(stats.PauseEnd) < maxPause {
+ stats.PauseEnd = make([]time.Time, 0, maxPause)
+ }
+ stats.PauseEnd = stats.PauseEnd[:0]
+ for _, ns := range stats.Pause[n : n+n] {
+ stats.PauseEnd = append(stats.PauseEnd, time.Unix(0, int64(ns)))
+ }
+
+ if len(stats.PauseQuantiles) > 0 {
+ if n == 0 {
+ for i := range stats.PauseQuantiles {
+ stats.PauseQuantiles[i] = 0
+ }
+ } else {
+ // There's room for a second copy of the data in stats.Pause.
+ // See the allocation at the top of the function.
+ sorted := stats.Pause[n : n+n]
+ copy(sorted, stats.Pause)
+ sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
+ nq := len(stats.PauseQuantiles) - 1
+ for i := 0; i < nq; i++ {
+ stats.PauseQuantiles[i] = sorted[len(sorted)*i/nq]
+ }
+ stats.PauseQuantiles[nq] = sorted[len(sorted)-1]
+ }
+ }
+}
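+
+// Illustrative caller-side sketch (not from the upstream source): request the
+// pause history plus five quantiles (minimum, 25%, 50%, 75%, maximum).
+//
+//	var stats debug.GCStats
+//	stats.PauseQuantiles = make([]time.Duration, 5)
+//	debug.ReadGCStats(&stats)
+//	fmt.Println("collections:", stats.NumGC)
+//	fmt.Println("median pause:", stats.PauseQuantiles[2])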
+
+// SetGCPercent sets the garbage collection target percentage:
+// a collection is triggered when the ratio of freshly allocated data
+// to live data remaining after the previous collection reaches this percentage.
+// SetGCPercent returns the previous setting.
+// The initial setting is the value of the GOGC environment variable
+// at startup, or 100 if the variable is not set.
+// This setting may be effectively reduced in order to maintain a memory
+// limit.
+// A negative percentage effectively disables garbage collection, unless
+// the memory limit is reached.
+// See SetMemoryLimit for more details.
+func SetGCPercent(percent int) int {
+ return int(setGCPercent(int32(percent)))
+}
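+
+// Illustrative caller-side sketch (not from the upstream source): the usual
+// pattern for disabling the collector around a critical section and restoring
+// the caller's setting afterwards (the same idiom garbage_test.go in this
+// patch uses).
+//
+//	defer debug.SetGCPercent(debug.SetGCPercent(-1))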
+
+// FreeOSMemory forces a garbage collection followed by an
+// attempt to return as much memory to the operating system
+// as possible. (Even if this is not called, the runtime gradually
+// returns memory to the operating system in a background task.)
+func FreeOSMemory() {
+ freeOSMemory()
+}
+
+// SetMaxStack sets the maximum amount of memory that
+// can be used by a single goroutine stack.
+// If any goroutine exceeds this limit while growing its stack,
+// the program crashes.
+// SetMaxStack returns the previous setting.
+// The initial setting is 1 GB on 64-bit systems, 250 MB on 32-bit systems.
+// There may be a system-imposed maximum stack limit regardless
+// of the value provided to SetMaxStack.
+//
+// SetMaxStack is useful mainly for limiting the damage done by
+// goroutines that enter an infinite recursion. It only limits future
+// stack growth.
+func SetMaxStack(bytes int) int {
+ return setMaxStack(bytes)
+}
+
+// SetMaxThreads sets the maximum number of operating system
+// threads that the Go program can use. If it attempts to use more than
+// this many, the program crashes.
+// SetMaxThreads returns the previous setting.
+// The initial setting is 10,000 threads.
+//
+// The limit controls the number of operating system threads, not the number
+// of goroutines. A Go program creates a new thread only when a goroutine
+// is ready to run but all the existing threads are blocked in system calls, cgo calls,
+// or are locked to other goroutines due to use of runtime.LockOSThread.
+//
+// SetMaxThreads is useful mainly for limiting the damage done by
+// programs that create an unbounded number of threads. The idea is
+// to take down the program before it takes down the operating system.
+func SetMaxThreads(threads int) int {
+ return setMaxThreads(threads)
+}
+
+// SetPanicOnFault controls the runtime's behavior when a program faults
+// at an unexpected (non-nil) address. Such faults are typically caused by
+// bugs such as runtime memory corruption, so the default response is to crash
+// the program. Programs working with memory-mapped files or unsafe
+// manipulation of memory may cause faults at non-nil addresses in less
+// dramatic situations; SetPanicOnFault allows such programs to request
+// that the runtime trigger only a panic, not a crash.
+// The runtime.Error that the runtime panics with may have an additional method:
+//
+// Addr() uintptr
+//
+// If that method exists, it returns the memory address which triggered the fault.
+// The results of Addr are best-effort and the veracity of the result
+// may depend on the platform.
+// SetPanicOnFault applies only to the current goroutine.
+// It returns the previous setting.
+func SetPanicOnFault(enabled bool) bool {
+ return setPanicOnFault(enabled)
+}
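+
+// Illustrative caller-side sketch (not from the upstream source): recover from
+// a fault while reading memory-mapped data and report the faulting address if
+// the optional Addr method is present. The mapped slice and index i are
+// hypothetical stand-ins for caller data.
+//
+//	defer debug.SetPanicOnFault(debug.SetPanicOnFault(true))
+//	defer func() {
+//		if r := recover(); r != nil {
+//			if fe, ok := r.(interface{ Addr() uintptr }); ok {
+//				fmt.Printf("fault at %#x\n", fe.Addr())
+//			}
+//		}
+//	}()
+//	_ = mapped[i]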
+
+// WriteHeapDump writes a description of the heap and the objects in
+// it to the given file descriptor.
+//
+// WriteHeapDump suspends the execution of all goroutines until the heap
+// dump is completely written. Thus, the file descriptor must not be
+// connected to a pipe or socket whose other end is in the same Go
+// process; instead, use a temporary file or network socket.
+//
+// The heap dump format is defined at https://golang.org/s/go15heapdump.
+func WriteHeapDump(fd uintptr)
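+
+// Illustrative caller-side sketch (not from the upstream source): write the
+// dump to a regular temporary file, never to a pipe or socket read by the
+// same process (mirrors the usage in heapdump_test.go later in this patch).
+//
+//	f, err := os.CreateTemp("", "heapdump")
+//	if err != nil {
+//		log.Fatal(err)
+//	}
+//	defer f.Close()
+//	debug.WriteHeapDump(f.Fd())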
+
+// SetTraceback sets the amount of detail printed by the runtime in
+// the traceback it prints before exiting due to an unrecovered panic
+// or an internal runtime error.
+// The level argument takes the same values as the GOTRACEBACK
+// environment variable. For example, SetTraceback("all") ensures
+// that the program prints all goroutines when it crashes.
+// See the package runtime documentation for details.
+// If SetTraceback is called with a level lower than that of the
+// environment variable, the call is ignored.
+func SetTraceback(level string)
+
+// SetMemoryLimit provides the runtime with a soft memory limit.
+//
+// The runtime undertakes several processes to try to respect this
+// memory limit, including adjustments to the frequency of garbage
+// collections and returning memory to the underlying system more
+// aggressively. This limit will be respected even if GOGC=off (or,
+// if SetGCPercent(-1) is executed).
+//
+// The input limit is provided as bytes, and includes all memory
+// mapped, managed, and not released by the Go runtime. Notably, it
+// does not account for space used by the Go binary and memory
+// external to Go, such as memory managed by the underlying system
+// on behalf of the process, or memory managed by non-Go code inside
+// the same process. Examples of excluded memory sources include: OS
+// kernel memory held on behalf of the process, memory allocated by
+// C code, and memory mapped by syscall.Mmap (because it is not
+// managed by the Go runtime).
+//
+// More specifically, the following expression accurately reflects
+// the value the runtime attempts to maintain as the limit:
+//
+// runtime.MemStats.Sys - runtime.MemStats.HeapReleased
+//
+// or in terms of the runtime/metrics package:
+//
+// /memory/classes/total:bytes - /memory/classes/heap/released:bytes
+//
+// A zero limit or a limit that's lower than the amount of memory
+// used by the Go runtime may cause the garbage collector to run
+// nearly continuously. However, the application may still make
+// progress.
+//
+// The memory limit is always respected by the Go runtime, so to
+// effectively disable this behavior, set the limit very high.
+// math.MaxInt64 is the canonical value for disabling the limit,
+// but values much greater than the available memory on the underlying
+// system work just as well.
+//
+// See https://go.dev/doc/gc-guide for a detailed guide explaining
+// the soft memory limit in more detail, as well as a variety of common
+// use-cases and scenarios.
+//
+// The initial setting is math.MaxInt64 unless the GOMEMLIMIT
+// environment variable is set, in which case it provides the initial
+// setting. GOMEMLIMIT is a numeric value in bytes with an optional
+// unit suffix. The supported suffixes include B, KiB, MiB, GiB, and
+// TiB. These suffixes represent quantities of bytes as defined by
+// the IEC 80000-13 standard. That is, they are based on powers of
+// two: KiB means 2^10 bytes, MiB means 2^20 bytes, and so on.
+//
+// SetMemoryLimit returns the previously set memory limit.
+// A negative input does not adjust the limit, and allows for
+// retrieval of the currently set memory limit.
+func SetMemoryLimit(limit int64) int64 {
+ return setMemoryLimit(limit)
+}
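+
+// Illustrative caller-side sketch (not from the upstream source): set a soft
+// limit of 512 MiB, then read the current limit back without changing it.
+//
+//	debug.SetMemoryLimit(512 << 20) // 512 MiB
+//	cur := debug.SetMemoryLimit(-1) // a negative input only queries the limit
+//	fmt.Println("soft memory limit:", cur)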
diff --git a/src/runtime/debug/garbage_test.go b/src/runtime/debug/garbage_test.go
new file mode 100644
index 0000000..cd91782
--- /dev/null
+++ b/src/runtime/debug/garbage_test.go
@@ -0,0 +1,238 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "internal/testenv"
+ "os"
+ "runtime"
+ . "runtime/debug"
+ "testing"
+ "time"
+)
+
+func TestReadGCStats(t *testing.T) {
+ defer SetGCPercent(SetGCPercent(-1))
+
+ var stats GCStats
+ var mstats runtime.MemStats
+ var min, max time.Duration
+
+ // First ReadGCStats will allocate, second should not,
+ // especially if we follow up with an explicit garbage collection.
+ stats.PauseQuantiles = make([]time.Duration, 10)
+ ReadGCStats(&stats)
+ runtime.GC()
+
+ // Assume these will return same data: no GC during ReadGCStats.
+ ReadGCStats(&stats)
+ runtime.ReadMemStats(&mstats)
+
+ if stats.NumGC != int64(mstats.NumGC) {
+ t.Errorf("stats.NumGC = %d, but mstats.NumGC = %d", stats.NumGC, mstats.NumGC)
+ }
+ if stats.PauseTotal != time.Duration(mstats.PauseTotalNs) {
+ t.Errorf("stats.PauseTotal = %d, but mstats.PauseTotalNs = %d", stats.PauseTotal, mstats.PauseTotalNs)
+ }
+ if stats.LastGC.UnixNano() != int64(mstats.LastGC) {
+ t.Errorf("stats.LastGC.UnixNano = %d, but mstats.LastGC = %d", stats.LastGC.UnixNano(), mstats.LastGC)
+ }
+ n := int(mstats.NumGC)
+ if n > len(mstats.PauseNs) {
+ n = len(mstats.PauseNs)
+ }
+ if len(stats.Pause) != n {
+ t.Errorf("len(stats.Pause) = %d, want %d", len(stats.Pause), n)
+ } else {
+ off := (int(mstats.NumGC) + len(mstats.PauseNs) - 1) % len(mstats.PauseNs)
+ for i := 0; i < n; i++ {
+ dt := stats.Pause[i]
+ if dt != time.Duration(mstats.PauseNs[off]) {
+ t.Errorf("stats.Pause[%d] = %d, want %d", i, dt, mstats.PauseNs[off])
+ }
+ if max < dt {
+ max = dt
+ }
+ if min > dt || i == 0 {
+ min = dt
+ }
+ off = (off + len(mstats.PauseNs) - 1) % len(mstats.PauseNs)
+ }
+ }
+
+ q := stats.PauseQuantiles
+ nq := len(q)
+ if q[0] != min || q[nq-1] != max {
+ t.Errorf("stats.PauseQuantiles = [%d, ..., %d], want [%d, ..., %d]", q[0], q[nq-1], min, max)
+ }
+
+ for i := 0; i < nq-1; i++ {
+ if q[i] > q[i+1] {
+ t.Errorf("stats.PauseQuantiles[%d]=%d > stats.PauseQuantiles[%d]=%d", i, q[i], i+1, q[i+1])
+ }
+ }
+
+ // compare memory stats with gc stats:
+ if len(stats.PauseEnd) != n {
+ t.Fatalf("len(stats.PauseEnd) = %d, want %d", len(stats.PauseEnd), n)
+ }
+ off := (int(mstats.NumGC) + len(mstats.PauseEnd) - 1) % len(mstats.PauseEnd)
+ for i := 0; i < n; i++ {
+ dt := stats.PauseEnd[i]
+ if dt.UnixNano() != int64(mstats.PauseEnd[off]) {
+ t.Errorf("stats.PauseEnd[%d] = %d, want %d", i, dt.UnixNano(), mstats.PauseEnd[off])
+ }
+ off = (off + len(mstats.PauseEnd) - 1) % len(mstats.PauseEnd)
+ }
+}
+
+var big []byte
+
+func TestFreeOSMemory(t *testing.T) {
+ // Tests FreeOSMemory by making big susceptible to collection
+ // and checking that at least that much memory is returned to
+ // the OS after.
+
+ const bigBytes = 32 << 20
+ big = make([]byte, bigBytes)
+
+ // Make sure any in-progress GCs are complete.
+ runtime.GC()
+
+ var before runtime.MemStats
+ runtime.ReadMemStats(&before)
+
+ // Clear the last reference to the big allocation, making it
+ // susceptible to collection.
+ big = nil
+
+ // FreeOSMemory runs a GC cycle before releasing memory,
+ // so it's fine to skip a GC here.
+ //
+ // It's possible the background scavenger runs concurrently
+ // with this function and does most of the work for it.
+ // If that happens, it's OK. What we want is a test that fails
+ // often if FreeOSMemory does not work correctly, and a test
+ // that passes every time if it does.
+ FreeOSMemory()
+
+ var after runtime.MemStats
+ runtime.ReadMemStats(&after)
+
+ // Check to make sure that the big allocation (now freed)
+ // had its memory shift into HeapReleased as a result of that
+ // FreeOSMemory.
+ if after.HeapReleased <= before.HeapReleased {
+ t.Fatalf("no memory released: %d -> %d", before.HeapReleased, after.HeapReleased)
+ }
+
+ // Check to make sure bigBytes was released, plus some slack. Pages may get
+ // allocated in between the two measurements above for a variety for reasons,
+ // most commonly for GC work bufs. Since this can get fairly high, depending
+ // on scheduling and what GOMAXPROCS is, give a lot of slack up-front.
+ //
+ // Add a little more slack too if the page size is bigger than the runtime page size.
+ // "big" could end up unaligned on its ends, forcing the scavenger to skip at worst
+ // 2x pages.
+ slack := uint64(bigBytes / 2)
+ pageSize := uint64(os.Getpagesize())
+ if pageSize > 8<<10 {
+ slack += pageSize * 2
+ }
+ if slack > bigBytes {
+ // We basically already checked this.
+ return
+ }
+ if after.HeapReleased-before.HeapReleased < bigBytes-slack {
+ t.Fatalf("less than %d released: %d -> %d", bigBytes-slack, before.HeapReleased, after.HeapReleased)
+ }
+}
+
+var (
+ setGCPercentBallast any
+ setGCPercentSink any
+)
+
+func TestSetGCPercent(t *testing.T) {
+ testenv.SkipFlaky(t, 20076)
+
+ // Test that the variable is being set and returned correctly.
+ old := SetGCPercent(123)
+ new := SetGCPercent(old)
+ if new != 123 {
+ t.Errorf("SetGCPercent(123); SetGCPercent(x) = %d, want 123", new)
+ }
+
+ // Test that the percentage is implemented correctly.
+ defer func() {
+ SetGCPercent(old)
+ setGCPercentBallast, setGCPercentSink = nil, nil
+ }()
+ SetGCPercent(100)
+ runtime.GC()
+ // Create 100 MB of live heap as a baseline.
+ const baseline = 100 << 20
+ var ms runtime.MemStats
+ runtime.ReadMemStats(&ms)
+ setGCPercentBallast = make([]byte, baseline-ms.Alloc)
+ runtime.GC()
+ runtime.ReadMemStats(&ms)
+ if abs64(baseline-int64(ms.Alloc)) > 10<<20 {
+ t.Fatalf("failed to set up baseline live heap; got %d MB, want %d MB", ms.Alloc>>20, baseline>>20)
+ }
+ // NextGC should be ~200 MB.
+ const thresh = 20 << 20 // TODO: Figure out why this is so noisy on some builders
+ if want := int64(2 * baseline); abs64(want-int64(ms.NextGC)) > thresh {
+ t.Errorf("NextGC = %d MB, want %d±%d MB", ms.NextGC>>20, want>>20, thresh>>20)
+ }
+ // Create some garbage, but not enough to trigger another GC.
+ for i := 0; i < int(1.2*baseline); i += 1 << 10 {
+ setGCPercentSink = make([]byte, 1<<10)
+ }
+ setGCPercentSink = nil
+ // Adjust GOGC to 50. NextGC should be ~150 MB.
+ SetGCPercent(50)
+ runtime.ReadMemStats(&ms)
+ if want := int64(1.5 * baseline); abs64(want-int64(ms.NextGC)) > thresh {
+ t.Errorf("NextGC = %d MB, want %d±%d MB", ms.NextGC>>20, want>>20, thresh>>20)
+ }
+
+ // Trigger a GC and get back to 100 MB live with GOGC=100.
+ SetGCPercent(100)
+ runtime.GC()
+ // Raise live to 120 MB.
+ setGCPercentSink = make([]byte, int(0.2*baseline))
+ // Lower GOGC to 10. This must force a GC.
+ runtime.ReadMemStats(&ms)
+ ngc1 := ms.NumGC
+ SetGCPercent(10)
+ // It may require an allocation to actually force the GC.
+ setGCPercentSink = make([]byte, 1<<20)
+ runtime.ReadMemStats(&ms)
+ ngc2 := ms.NumGC
+ if ngc1 == ngc2 {
+ t.Errorf("expected GC to run but it did not")
+ }
+}
+
+func abs64(a int64) int64 {
+ if a < 0 {
+ return -a
+ }
+ return a
+}
+
+func TestSetMaxThreadsOvf(t *testing.T) {
+ // Verify that a big threads count will not overflow the int32
+ // maxmcount variable, causing a panic (see Issue 16076).
+ //
+ // This can only happen when ints are 64 bits, since on platforms
+ // with 32 bit ints SetMaxThreads (which takes an int parameter)
+ // cannot be given anything that will overflow an int32.
+ //
+ // Call SetMaxThreads with 1<<31, but only on 64 bit systems.
+ nt := SetMaxThreads(1 << (30 + ^uint(0)>>63))
+ SetMaxThreads(nt) // restore previous value
+}
diff --git a/src/runtime/debug/heapdump_test.go b/src/runtime/debug/heapdump_test.go
new file mode 100644
index 0000000..ee6b054
--- /dev/null
+++ b/src/runtime/debug/heapdump_test.go
@@ -0,0 +1,95 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "os"
+ "runtime"
+ . "runtime/debug"
+ "testing"
+)
+
+func TestWriteHeapDumpNonempty(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skipf("WriteHeapDump is not available on %s.", runtime.GOOS)
+ }
+ f, err := os.CreateTemp("", "heapdumptest")
+ if err != nil {
+ t.Fatalf("TempFile failed: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+ WriteHeapDump(f.Fd())
+ fi, err := f.Stat()
+ if err != nil {
+ t.Fatalf("Stat failed: %v", err)
+ }
+ const minSize = 1
+ if size := fi.Size(); size < minSize {
+ t.Fatalf("Heap dump size %d bytes, expected at least %d bytes", size, minSize)
+ }
+}
+
+type Obj struct {
+ x, y int
+}
+
+func objfin(x *Obj) {
+ //println("finalized", x)
+}
+
+func TestWriteHeapDumpFinalizers(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skipf("WriteHeapDump is not available on %s.", runtime.GOOS)
+ }
+ f, err := os.CreateTemp("", "heapdumptest")
+ if err != nil {
+ t.Fatalf("TempFile failed: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ // bug 9172: WriteHeapDump couldn't handle more than one finalizer
+ println("allocating objects")
+ x := &Obj{}
+ runtime.SetFinalizer(x, objfin)
+ y := &Obj{}
+ runtime.SetFinalizer(y, objfin)
+
+ // Trigger collection of x and y, queueing of their finalizers.
+ println("starting gc")
+ runtime.GC()
+
+ // Make sure WriteHeapDump doesn't fail with multiple queued finalizers.
+ println("starting dump")
+ WriteHeapDump(f.Fd())
+ println("done dump")
+}
+
+type G[T any] struct{}
+type I interface {
+ M()
+}
+
+//go:noinline
+func (g G[T]) M() {}
+
+var dummy I = G[int]{}
+var dummy2 I = G[G[int]]{}
+
+func TestWriteHeapDumpTypeName(t *testing.T) {
+ if runtime.GOOS == "js" {
+ t.Skipf("WriteHeapDump is not available on %s.", runtime.GOOS)
+ }
+ f, err := os.CreateTemp("", "heapdumptest")
+ if err != nil {
+ t.Fatalf("TempFile failed: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+ WriteHeapDump(f.Fd())
+ dummy.M()
+ dummy2.M()
+}
diff --git a/src/runtime/debug/mod.go b/src/runtime/debug/mod.go
new file mode 100644
index 0000000..7f85174
--- /dev/null
+++ b/src/runtime/debug/mod.go
@@ -0,0 +1,287 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "fmt"
+ "runtime"
+ "strconv"
+ "strings"
+)
+
+// exported from runtime.
+func modinfo() string
+
+// ReadBuildInfo returns the build information embedded
+// in the running binary. The information is available only
+// in binaries built with module support.
+func ReadBuildInfo() (info *BuildInfo, ok bool) {
+ data := modinfo()
+ if len(data) < 32 {
+ return nil, false
+ }
+ data = data[16 : len(data)-16]
+ bi, err := ParseBuildInfo(data)
+ if err != nil {
+ return nil, false
+ }
+
+ // The go version is stored separately from other build info, mostly for
+ // historical reasons. It is not part of the modinfo() string, and
+ // ParseBuildInfo does not recognize it. We inject it here to hide this
+ // awkwardness from the user.
+ bi.GoVersion = runtime.Version()
+
+ return bi, true
+}
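+
+// Illustrative caller-side sketch (not from the upstream source): print the
+// toolchain version, the main module, and the recorded vcs.revision setting,
+// if any.
+//
+//	if bi, ok := debug.ReadBuildInfo(); ok {
+//		fmt.Println(bi.GoVersion, bi.Main.Path, bi.Main.Version)
+//		for _, s := range bi.Settings {
+//			if s.Key == "vcs.revision" {
+//				fmt.Println("built from", s.Value)
+//			}
+//		}
+//	}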
+
+// BuildInfo represents the build information read from a Go binary.
+type BuildInfo struct {
+ // GoVersion is the version of the Go toolchain that built the binary
+ // (for example, "go1.19.2").
+ GoVersion string
+
+ // Path is the package path of the main package for the binary
+ // (for example, "golang.org/x/tools/cmd/stringer").
+ Path string
+
+ // Main describes the module that contains the main package for the binary.
+ Main Module
+
+ // Deps describes all the dependency modules, both direct and indirect,
+ // that contributed packages to the build of this binary.
+ Deps []*Module
+
+ // Settings describes the build settings used to build the binary.
+ Settings []BuildSetting
+}
+
+// A Module describes a single module included in a build.
+type Module struct {
+ Path string // module path
+ Version string // module version
+ Sum string // checksum
+ Replace *Module // replaced by this module
+}
+
+// A BuildSetting is a key-value pair describing one setting that influenced a build.
+//
+// Defined keys include:
+//
+// - -buildmode: the buildmode flag used (typically "exe")
+// - -compiler: the compiler toolchain flag used (typically "gc")
+// - CGO_ENABLED: the effective CGO_ENABLED environment variable
+// - CGO_CFLAGS: the effective CGO_CFLAGS environment variable
+// - CGO_CPPFLAGS: the effective CGO_CPPFLAGS environment variable
+// - CGO_CXXFLAGS: the effective CGO_CXXFLAGS environment variable
+// - CGO_LDFLAGS: the effective CGO_LDFLAGS environment variable
+// - GOARCH: the architecture target
+// - GOAMD64/GOARM/GO386/etc: the architecture feature level for GOARCH
+// - GOOS: the operating system target
+// - vcs: the version control system for the source tree where the build ran
+// - vcs.revision: the revision identifier for the current commit or checkout
+// - vcs.time: the modification time associated with vcs.revision, in RFC3339 format
+// - vcs.modified: true or false indicating whether the source tree had local modifications
+type BuildSetting struct {
+ // Key and Value describe the build setting.
+ // Key must not contain an equals sign, space, tab, or newline.
+ // Value must not contain newlines ('\n').
+ Key, Value string
+}
+
+// quoteKey reports whether key is required to be quoted.
+func quoteKey(key string) bool {
+ return len(key) == 0 || strings.ContainsAny(key, "= \t\r\n\"`")
+}
+
+// quoteValue reports whether value is required to be quoted.
+func quoteValue(value string) bool {
+ return strings.ContainsAny(value, " \t\r\n\"`")
+}
+
+func (bi *BuildInfo) String() string {
+ buf := new(strings.Builder)
+ if bi.GoVersion != "" {
+ fmt.Fprintf(buf, "go\t%s\n", bi.GoVersion)
+ }
+ if bi.Path != "" {
+ fmt.Fprintf(buf, "path\t%s\n", bi.Path)
+ }
+ var formatMod func(string, Module)
+ formatMod = func(word string, m Module) {
+ buf.WriteString(word)
+ buf.WriteByte('\t')
+ buf.WriteString(m.Path)
+ buf.WriteByte('\t')
+ buf.WriteString(m.Version)
+ if m.Replace == nil {
+ buf.WriteByte('\t')
+ buf.WriteString(m.Sum)
+ } else {
+ buf.WriteByte('\n')
+ formatMod("=>", *m.Replace)
+ }
+ buf.WriteByte('\n')
+ }
+ if bi.Main != (Module{}) {
+ formatMod("mod", bi.Main)
+ }
+ for _, dep := range bi.Deps {
+ formatMod("dep", *dep)
+ }
+ for _, s := range bi.Settings {
+ key := s.Key
+ if quoteKey(key) {
+ key = strconv.Quote(key)
+ }
+ value := s.Value
+ if quoteValue(value) {
+ value = strconv.Quote(value)
+ }
+ fmt.Fprintf(buf, "build\t%s=%s\n", key, value)
+ }
+
+ return buf.String()
+}
+
+func ParseBuildInfo(data string) (bi *BuildInfo, err error) {
+ lineNum := 1
+ defer func() {
+ if err != nil {
+ err = fmt.Errorf("could not parse Go build info: line %d: %w", lineNum, err)
+ }
+ }()
+
+ var (
+ pathLine = "path\t"
+ modLine = "mod\t"
+ depLine = "dep\t"
+ repLine = "=>\t"
+ buildLine = "build\t"
+ newline = "\n"
+ tab = "\t"
+ )
+
+ readModuleLine := func(elem []string) (Module, error) {
+ if len(elem) != 2 && len(elem) != 3 {
+ return Module{}, fmt.Errorf("expected 2 or 3 columns; got %d", len(elem))
+ }
+ version := elem[1]
+ sum := ""
+ if len(elem) == 3 {
+ sum = elem[2]
+ }
+ return Module{
+ Path: elem[0],
+ Version: version,
+ Sum: sum,
+ }, nil
+ }
+
+ bi = new(BuildInfo)
+ var (
+ last *Module
+ line string
+ ok bool
+ )
+ // Reverse of BuildInfo.String(), except for go version.
+ for len(data) > 0 {
+ line, data, ok = strings.Cut(data, newline)
+ if !ok {
+ break
+ }
+ switch {
+ case strings.HasPrefix(line, pathLine):
+ elem := line[len(pathLine):]
+ bi.Path = string(elem)
+ case strings.HasPrefix(line, modLine):
+ elem := strings.Split(line[len(modLine):], tab)
+ last = &bi.Main
+ *last, err = readModuleLine(elem)
+ if err != nil {
+ return nil, err
+ }
+ case strings.HasPrefix(line, depLine):
+ elem := strings.Split(line[len(depLine):], tab)
+ last = new(Module)
+ bi.Deps = append(bi.Deps, last)
+ *last, err = readModuleLine(elem)
+ if err != nil {
+ return nil, err
+ }
+ case strings.HasPrefix(line, repLine):
+ elem := strings.Split(line[len(repLine):], tab)
+ if len(elem) != 3 {
+ return nil, fmt.Errorf("expected 3 columns for replacement; got %d", len(elem))
+ }
+ if last == nil {
+ return nil, fmt.Errorf("replacement with no module on previous line")
+ }
+ last.Replace = &Module{
+ Path: string(elem[0]),
+ Version: string(elem[1]),
+ Sum: string(elem[2]),
+ }
+ last = nil
+ case strings.HasPrefix(line, buildLine):
+ kv := line[len(buildLine):]
+ if len(kv) < 1 {
+ return nil, fmt.Errorf("build line missing '='")
+ }
+
+ var key, rawValue string
+ switch kv[0] {
+ case '=':
+ return nil, fmt.Errorf("build line with missing key")
+
+ case '`', '"':
+ rawKey, err := strconv.QuotedPrefix(kv)
+ if err != nil {
+ return nil, fmt.Errorf("invalid quoted key in build line")
+ }
+ if len(kv) == len(rawKey) {
+ return nil, fmt.Errorf("build line missing '=' after quoted key")
+ }
+ if c := kv[len(rawKey)]; c != '=' {
+ return nil, fmt.Errorf("unexpected character after quoted key: %q", c)
+ }
+ key, _ = strconv.Unquote(rawKey)
+ rawValue = kv[len(rawKey)+1:]
+
+ default:
+ var ok bool
+ key, rawValue, ok = strings.Cut(kv, "=")
+ if !ok {
+ return nil, fmt.Errorf("build line missing '=' after key")
+ }
+ if quoteKey(key) {
+ return nil, fmt.Errorf("unquoted key %q must be quoted", key)
+ }
+ }
+
+ var value string
+ if len(rawValue) > 0 {
+ switch rawValue[0] {
+ case '`', '"':
+ var err error
+ value, err = strconv.Unquote(rawValue)
+ if err != nil {
+ return nil, fmt.Errorf("invalid quoted value in build line")
+ }
+
+ default:
+ value = rawValue
+ if quoteValue(value) {
+ return nil, fmt.Errorf("unquoted value %q must be quoted", value)
+ }
+ }
+ }
+
+ bi.Settings = append(bi.Settings, BuildSetting{Key: key, Value: value})
+ }
+ lineNum++
+ }
+ return bi, nil
+}
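+
+// A small usage sketch (the input string is hypothetical; error handling
+// shortened): parsing a build-info string and re-serializing it yields an
+// equivalent value, which is the round-trip property the fuzz test below
+// checks.
+//
+//	bi, err := ParseBuildInfo("path\texample.com/m\nmod\texample.com/m\tv1.0.0\n")
+//	if err != nil {
+//		// malformed input
+//	}
+//	_ = bi.String()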
diff --git a/src/runtime/debug/mod_test.go b/src/runtime/debug/mod_test.go
new file mode 100644
index 0000000..b291769
--- /dev/null
+++ b/src/runtime/debug/mod_test.go
@@ -0,0 +1,75 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "reflect"
+ "runtime/debug"
+ "strings"
+ "testing"
+)
+
+// strip removes two leading tabs after each newline of s.
+func strip(s string) string {
+ replaced := strings.ReplaceAll(s, "\n\t\t", "\n")
+ if len(replaced) > 0 && replaced[0] == '\n' {
+ replaced = replaced[1:]
+ }
+ return replaced
+}
+
+func FuzzParseBuildInfoRoundTrip(f *testing.F) {
+ // Package built from outside a module, missing some fields.
+ f.Add(strip(`
+ path rsc.io/fortune
+ mod rsc.io/fortune v1.0.0
+ `))
+
+ // Package built from the standard library, missing some fields.
+ f.Add(`path cmd/test2json`)
+
+ // Package built from inside a module.
+ f.Add(strip(`
+ go 1.18
+ path example.com/m
+ mod example.com/m (devel)
+ build -compiler=gc
+ `))
+
+ // Package built in GOPATH mode.
+ f.Add(strip(`
+ go 1.18
+ path example.com/m
+ build -compiler=gc
+ `))
+
+ // Escaped build info.
+ f.Add(strip(`
+ go 1.18
+ path example.com/m
+ build CRAZY_ENV="requires\nescaping"
+ `))
+
+ f.Fuzz(func(t *testing.T, s string) {
+ bi, err := debug.ParseBuildInfo(s)
+ if err != nil {
+ // Not a round-trippable BuildInfo string.
+ t.Log(err)
+ return
+ }
+
+ // s2 could have different escaping from s.
+ // However, it should parse to exactly the same contents.
+ s2 := bi.String()
+ bi2, err := debug.ParseBuildInfo(s2)
+ if err != nil {
+ t.Fatalf("%v:\n%s", err, s2)
+ }
+
+ if !reflect.DeepEqual(bi2, bi) {
+ t.Fatalf("Parsed representation differs.\ninput:\n%s\noutput:\n%s", s, s2)
+ }
+ })
+}
diff --git a/src/runtime/debug/panic_test.go b/src/runtime/debug/panic_test.go
new file mode 100644
index 0000000..ec5294c
--- /dev/null
+++ b/src/runtime/debug/panic_test.go
@@ -0,0 +1,56 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || darwin || dragonfly || freebsd || linux || netbsd || openbsd
+
+// TODO: test on Windows?
+
+package debug_test
+
+import (
+ "runtime"
+ "runtime/debug"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestPanicOnFault(t *testing.T) {
+ if runtime.GOARCH == "s390x" {
+ t.Skip("s390x fault addresses are missing the low order bits")
+ }
+ if runtime.GOOS == "ios" {
+ t.Skip("iOS doesn't provide fault addresses")
+ }
+ if runtime.GOOS == "netbsd" && runtime.GOARCH == "arm" {
+ t.Skip("netbsd-arm doesn't provide fault address (golang.org/issue/45026)")
+ }
+ m, err := syscall.Mmap(-1, 0, 0x1000, syscall.PROT_READ /* Note: no PROT_WRITE */, syscall.MAP_SHARED|syscall.MAP_ANON)
+ if err != nil {
+ t.Fatalf("can't map anonymous memory: %s", err)
+ }
+ defer syscall.Munmap(m)
+ old := debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(old)
+ const lowBits = 0x3e7
+ defer func() {
+ r := recover()
+ if r == nil {
+ t.Fatalf("write did not fault")
+ }
+ type addressable interface {
+ Addr() uintptr
+ }
+ a, ok := r.(addressable)
+ if !ok {
+ t.Fatalf("fault does not contain address")
+ }
+ want := uintptr(unsafe.Pointer(&m[lowBits]))
+ got := a.Addr()
+ if got != want {
+ t.Fatalf("fault address %x, want %x", got, want)
+ }
+ }()
+ m[lowBits] = 1 // will fault
+}
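+
+// The same pattern, sketched for application code: the recovered value
+// satisfies the Addr method the test relies on (the helper and the logging
+// call here are illustrative only):
+//
+//	old := debug.SetPanicOnFault(true)
+//	defer debug.SetPanicOnFault(old)
+//	defer func() {
+//		if r := recover(); r != nil {
+//			if a, ok := r.(interface{ Addr() uintptr }); ok {
+//				fmt.Printf("faulted at %#x\n", a.Addr())
+//			}
+//		}
+//	}()
+//	touchPossiblyUnmappedMemory() // hypothetical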
diff --git a/src/runtime/debug/stack.go b/src/runtime/debug/stack.go
new file mode 100644
index 0000000..5d810af
--- /dev/null
+++ b/src/runtime/debug/stack.go
@@ -0,0 +1,30 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package debug contains facilities for programs to debug themselves while
+// they are running.
+package debug
+
+import (
+ "os"
+ "runtime"
+)
+
+// PrintStack prints to standard error the stack trace returned by runtime.Stack.
+func PrintStack() {
+ os.Stderr.Write(Stack())
+}
+
+// Stack returns a formatted stack trace of the goroutine that calls it.
+// It calls runtime.Stack with a large enough buffer to capture the entire trace.
+func Stack() []byte {
+ buf := make([]byte, 1024)
+ for {
+ n := runtime.Stack(buf, false)
+ if n < len(buf) {
+ return buf[:n]
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+}
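+
+// A typical use, sketched for reference: capture the trace from a deferred
+// recover so a failure can be logged without crashing the process (the
+// surrounding handler is illustrative):
+//
+//	defer func() {
+//		if r := recover(); r != nil {
+//			os.Stderr.Write(Stack())
+//		}
+//	}()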
diff --git a/src/runtime/debug/stack_test.go b/src/runtime/debug/stack_test.go
new file mode 100644
index 0000000..671057c
--- /dev/null
+++ b/src/runtime/debug/stack_test.go
@@ -0,0 +1,121 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ . "runtime/debug"
+ "strings"
+ "testing"
+)
+
+func TestMain(m *testing.M) {
+ if os.Getenv("GO_RUNTIME_DEBUG_TEST_DUMP_GOROOT") != "" {
+ fmt.Println(runtime.GOROOT())
+ os.Exit(0)
+ }
+ os.Exit(m.Run())
+}
+
+type T int
+
+func (t *T) ptrmethod() []byte {
+ return Stack()
+}
+func (t T) method() []byte {
+ return t.ptrmethod()
+}
+
+/*
+The traceback should look something like this, modulo line numbers and hex constants.
+Don't worry much about the base levels, but check the ones in our own package.
+
+ goroutine 10 [running]:
+ runtime/debug.Stack(0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack.go:28 +0x80
+ runtime/debug.(*T).ptrmethod(0xc82005ee70, 0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack_test.go:15 +0x29
+ runtime/debug.T.method(0x0, 0x0, 0x0, 0x0)
+ /Users/r/go/src/runtime/debug/stack_test.go:18 +0x32
+ runtime/debug.TestStack(0xc8201ce000)
+ /Users/r/go/src/runtime/debug/stack_test.go:37 +0x38
+ testing.tRunner(0xc8201ce000, 0x664b58)
+ /Users/r/go/src/testing/testing.go:456 +0x98
+ created by testing.RunTests
+ /Users/r/go/src/testing/testing.go:561 +0x86d
+*/
+func TestStack(t *testing.T) {
+ b := T(0).method()
+ lines := strings.Split(string(b), "\n")
+ if len(lines) < 6 {
+ t.Fatal("too few lines")
+ }
+
+ // If built with -trimpath, file locations should start with package paths.
+ // Otherwise, file locations should start with a GOROOT/src prefix
+ // (for whatever value of GOROOT is baked into the binary, not the one
+ // that may be set in the environment).
+ fileGoroot := ""
+ if envGoroot := os.Getenv("GOROOT"); envGoroot != "" {
+ // Since GOROOT is set explicitly in the environment, we can't be certain
+ // that it is the same GOROOT value baked into the binary, and we can't
+ // change the value in-process because runtime.GOROOT uses the value from
+ // initial (not current) environment. Spawn a subprocess to determine the
+ // real baked-in GOROOT.
+ t.Logf("found GOROOT %q from environment; checking embedded GOROOT value", envGoroot)
+ testenv.MustHaveExec(t)
+ exe, err := os.Executable()
+ if err != nil {
+ t.Fatal(err)
+ }
+ cmd := exec.Command(exe)
+ cmd.Env = append(os.Environ(), "GOROOT=", "GO_RUNTIME_DEBUG_TEST_DUMP_GOROOT=1")
+ out, err := cmd.Output()
+ if err != nil {
+ t.Fatal(err)
+ }
+ fileGoroot = string(bytes.TrimSpace(out))
+ } else {
+ // Since GOROOT is not set in the environment, its value (if any) must come
+ // from the path embedded in the binary.
+ fileGoroot = runtime.GOROOT()
+ }
+ filePrefix := ""
+ if fileGoroot != "" {
+ filePrefix = filepath.ToSlash(fileGoroot) + "/src/"
+ }
+
+ n := 0
+ frame := func(file, code string) {
+ t.Helper()
+
+ line := lines[n]
+ if !strings.Contains(line, code) {
+ t.Errorf("expected %q in %q", code, line)
+ }
+ n++
+
+ line = lines[n]
+
+ wantPrefix := "\t" + filePrefix + file
+ if !strings.HasPrefix(line, wantPrefix) {
+ t.Errorf("in line %q, expected prefix %q", line, wantPrefix)
+ }
+ n++
+ }
+ n++
+
+ frame("runtime/debug/stack.go", "runtime/debug.Stack")
+ frame("runtime/debug/stack_test.go", "runtime/debug_test.(*T).ptrmethod")
+ frame("runtime/debug/stack_test.go", "runtime/debug_test.T.method")
+ frame("runtime/debug/stack_test.go", "runtime/debug_test.TestStack")
+ frame("testing/testing.go", "")
+}
diff --git a/src/runtime/debug/stubs.go b/src/runtime/debug/stubs.go
new file mode 100644
index 0000000..913d4b9
--- /dev/null
+++ b/src/runtime/debug/stubs.go
@@ -0,0 +1,18 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package debug
+
+import (
+ "time"
+)
+
+// Implemented in package runtime.
+func readGCStats(*[]time.Duration)
+func freeOSMemory()
+func setMaxStack(int) int
+func setGCPercent(int32) int32
+func setPanicOnFault(bool) bool
+func setMaxThreads(int) int
+func setMemoryLimit(int64) int64
diff --git a/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/5501685e611fa764 b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/5501685e611fa764
new file mode 100644
index 0000000..4ab5d92
--- /dev/null
+++ b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/5501685e611fa764
@@ -0,0 +1,2 @@
+go test fuzz v1
+string("mod\t\t0\n")
diff --git a/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/71634114e78567cf b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/71634114e78567cf
new file mode 100644
index 0000000..741c4df
--- /dev/null
+++ b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/71634114e78567cf
@@ -0,0 +1,2 @@
+go test fuzz v1
+string("mod\t0\t\n")
diff --git a/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/c73dce23c1f2494c b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/c73dce23c1f2494c
new file mode 100644
index 0000000..60f9338
--- /dev/null
+++ b/src/runtime/debug/testdata/fuzz/FuzzParseBuildInfoRoundTrip/c73dce23c1f2494c
@@ -0,0 +1,2 @@
+go test fuzz v1
+string("build\t0=\" 0\"\n")
diff --git a/src/runtime/debug_test.go b/src/runtime/debug_test.go
new file mode 100644
index 0000000..75fe07e
--- /dev/null
+++ b/src/runtime/debug_test.go
@@ -0,0 +1,307 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO: This test could be implemented on all (most?) UNIXes if we
+// added syscall.Tgkill more widely.
+
+// We skip all of these tests under race mode because our test thread
+// spends all of its time in the race runtime, which isn't a safe
+// point.
+
+//go:build (amd64 || arm64) && linux && !race
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/abi"
+ "math"
+ "os"
+ "regexp"
+ "runtime"
+ "runtime/debug"
+ "sync/atomic"
+ "syscall"
+ "testing"
+)
+
+func startDebugCallWorker(t *testing.T) (g *runtime.G, after func()) {
+ // This can deadlock if run under a debugger because it
+ // depends on catching SIGTRAP, which is usually swallowed by
+ // a debugger.
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads or if a GC
+ // tries to interrupt an atomic loop (see issue #10958). Execute
+ // an extra GC to ensure even the sweep phase is done (out of
+ // caution to prevent #49370 from happening).
+ // TODO(mknyszek): This extra GC cycle is likely unnecessary
+ // because preemption (which may happen during the sweep phase)
+ // isn't much of an issue anymore thanks to asynchronous preemption.
+ // The biggest risk is having a write barrier in the debug call
+ // injection test code fire, because it runs in a signal handler
+ // and may not have a P.
+ //
+ // We use 8 Ps so there's room for the debug call worker,
+ // something that's trying to preempt the call worker, and the
+ // goroutine that's trying to stop the call worker.
+ ogomaxprocs := runtime.GOMAXPROCS(8)
+ ogcpercent := debug.SetGCPercent(-1)
+ runtime.GC()
+
+ // ready is a buffered channel so debugCallWorker won't block
+ // on sending to it. This makes it less likely we'll catch
+ // debugCallWorker while it's in the runtime.
+ ready := make(chan *runtime.G, 1)
+ var stop uint32
+ done := make(chan error)
+ go debugCallWorker(ready, &stop, done)
+ g = <-ready
+ return g, func() {
+ atomic.StoreUint32(&stop, 1)
+ err := <-done
+ if err != nil {
+ t.Fatal(err)
+ }
+ runtime.GOMAXPROCS(ogomaxprocs)
+ debug.SetGCPercent(ogcpercent)
+ }
+}
+
+func debugCallWorker(ready chan<- *runtime.G, stop *uint32, done chan<- error) {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ ready <- runtime.Getg()
+
+ x := 2
+ debugCallWorker2(stop, &x)
+ if x != 1 {
+ done <- fmt.Errorf("want x = 2, got %d; register pointer not adjusted?", x)
+ }
+ close(done)
+}
+
+// Don't inline this function, since we want to test adjusting
+// pointers in the arguments.
+//
+//go:noinline
+func debugCallWorker2(stop *uint32, x *int) {
+ for atomic.LoadUint32(stop) == 0 {
+ // Strongly encourage x to live in a register so we
+ // can test pointer register adjustment.
+ *x++
+ }
+ *x = 1
+}
+
+func debugCallTKill(tid int) error {
+ return syscall.Tgkill(syscall.Getpid(), tid, syscall.SIGTRAP)
+}
+
+// skipUnderDebugger skips the current test when running under a
+// debugger (specifically if this process has a tracer). This is
+// Linux-specific.
+func skipUnderDebugger(t *testing.T) {
+ pid := syscall.Getpid()
+ status, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
+ if err != nil {
+ t.Logf("couldn't get proc tracer: %s", err)
+ return
+ }
+ re := regexp.MustCompile(`TracerPid:\s+([0-9]+)`)
+ sub := re.FindSubmatch(status)
+ if sub == nil {
+ t.Logf("couldn't find proc tracer PID")
+ return
+ }
+ if string(sub[1]) == "0" {
+ return
+ }
+ t.Skip("test will deadlock under a debugger")
+}
+
+func TestDebugCall(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ type stackArgs struct {
+ x0 int
+ x1 float64
+ y0Ret int
+ y1Ret float64
+ }
+
+ // Inject a call into the debugCallWorker goroutine and test
+ // basic argument and result passing.
+ fn := func(x int, y float64) (y0Ret int, y1Ret float64) {
+ return x + 1, y + 1.0
+ }
+ var args *stackArgs
+ var regs abi.RegArgs
+ intRegs := regs.Ints[:]
+ floatRegs := regs.Floats[:]
+ fval := float64(42.0)
+ if len(intRegs) > 0 {
+ intRegs[0] = 42
+ floatRegs[0] = math.Float64bits(fval)
+ } else {
+ args = &stackArgs{
+ x0: 42,
+ x1: 42.0,
+ }
+ }
+
+ if _, err := runtime.InjectDebugCall(g, fn, &regs, args, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+ var result0 int
+ var result1 float64
+ if len(intRegs) > 0 {
+ result0 = int(intRegs[0])
+ result1 = math.Float64frombits(floatRegs[0])
+ } else {
+ result0 = args.y0Ret
+ result1 = args.y1Ret
+ }
+ if result0 != 43 {
+ t.Errorf("want 43, got %d", result0)
+ }
+ if result1 != fval+1 {
+ t.Errorf("want 43, got %f", result1)
+ }
+}
+
+func TestDebugCallLarge(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call with a large call frame.
+ const N = 128
+ var args struct {
+ in [N]int
+ out [N]int
+ }
+ fn := func(in [N]int) (out [N]int) {
+ for i := range in {
+ out[i] = in[i] + 1
+ }
+ return
+ }
+ var want [N]int
+ for i := range args.in {
+ args.in[i] = i
+ want[i] = i + 1
+ }
+ if _, err := runtime.InjectDebugCall(g, fn, nil, &args, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+ if want != args.out {
+ t.Fatalf("want %v, got %v", want, args.out)
+ }
+}
+
+func TestDebugCallGC(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call that performs a GC.
+ if _, err := runtime.InjectDebugCall(g, runtime.GC, nil, nil, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+}
+
+func TestDebugCallGrowStack(t *testing.T) {
+ g, after := startDebugCallWorker(t)
+ defer after()
+
+ // Inject a call that grows the stack. debugCallWorker checks
+ // for stack pointer breakage.
+ if _, err := runtime.InjectDebugCall(g, func() { growStack(nil) }, nil, nil, debugCallTKill, false); err != nil {
+ t.Fatal(err)
+ }
+}
+
+//go:nosplit
+func debugCallUnsafePointWorker(gpp **runtime.G, ready, stop *uint32) {
+ // The nosplit causes this function to not contain safe-points
+ // except at calls.
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ *gpp = runtime.Getg()
+
+ for atomic.LoadUint32(stop) == 0 {
+ atomic.StoreUint32(ready, 1)
+ }
+}
+
+func TestDebugCallUnsafePoint(t *testing.T) {
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads or if a GC
+ // tries to interrupt an atomic loop (see issue #10958).
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+
+ // InjectDebugCall cannot be executed while a GC is actively in
+ // progress. Wait until the current GC is done, and turn it off.
+ //
+ // See #49370.
+ runtime.GC()
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ // Test that the runtime refuses call injection at unsafe points.
+ var g *runtime.G
+ var ready, stop uint32
+ defer atomic.StoreUint32(&stop, 1)
+ go debugCallUnsafePointWorker(&g, &ready, &stop)
+ for atomic.LoadUint32(&ready) == 0 {
+ runtime.Gosched()
+ }
+
+ _, err := runtime.InjectDebugCall(g, func() {}, nil, nil, debugCallTKill, true)
+ if msg := "call not at safe point"; err == nil || err.Error() != msg {
+ t.Fatalf("want %q, got %s", msg, err)
+ }
+}
+
+func TestDebugCallPanic(t *testing.T) {
+ skipUnderDebugger(t)
+
+ // This can deadlock if there aren't enough threads.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+
+ // InjectDebugCall cannot be executed while a GC is actively in
+ // progress. Wait until the current GC is done, and turn it off.
+ //
+ // See #10958 and #49370.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ // TODO(mknyszek): This extra GC cycle is likely unnecessary
+ // because preemption (which may happen during the sweep phase)
+ // isn't much of an issue anymore thanks to asynchronous preemption.
+ // The biggest risk is having a write barrier in the debug call
+ // injection test code fire, because it runs in a signal handler
+ // and may not have a P.
+ runtime.GC()
+
+ ready := make(chan *runtime.G)
+ var stop uint32
+ defer atomic.StoreUint32(&stop, 1)
+ go func() {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+ ready <- runtime.Getg()
+ for atomic.LoadUint32(&stop) == 0 {
+ }
+ }()
+ g := <-ready
+
+ p, err := runtime.InjectDebugCall(g, func() { panic("test") }, nil, nil, debugCallTKill, false)
+ if err != nil {
+ t.Fatal(err)
+ }
+ if ps, ok := p.(string); !ok || ps != "test" {
+ t.Fatalf("wanted panic %v, got %v", "test", p)
+ }
+}
diff --git a/src/runtime/debugcall.go b/src/runtime/debugcall.go
new file mode 100644
index 0000000..e793545
--- /dev/null
+++ b/src/runtime/debugcall.go
@@ -0,0 +1,257 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || arm64
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+const (
+ debugCallSystemStack = "executing on Go runtime stack"
+ debugCallUnknownFunc = "call from unknown function"
+ debugCallRuntime = "call from within the Go runtime"
+ debugCallUnsafePoint = "call not at safe point"
+)
+
+func debugCallV2()
+func debugCallPanicked(val any)
+
+// debugCallCheck checks whether it is safe to inject a debugger
+// function call with return PC pc. If not, it returns a string
+// explaining why.
+//
+//go:nosplit
+func debugCallCheck(pc uintptr) string {
+ // No user calls from the system stack.
+ if getg() != getg().m.curg {
+ return debugCallSystemStack
+ }
+ if sp := getcallersp(); !(getg().stack.lo < sp && sp <= getg().stack.hi) {
+ // Fast syscalls (nanotime) and racecall switch to the
+ // g0 stack without switching g. We can't safely make
+ // a call in this state. (We can't even safely
+ // systemstack.)
+ return debugCallSystemStack
+ }
+
+ // Switch to the system stack to avoid overflowing the user
+ // stack.
+ var ret string
+ systemstack(func() {
+ f := findfunc(pc)
+ if !f.valid() {
+ ret = debugCallUnknownFunc
+ return
+ }
+
+ name := funcname(f)
+
+ switch name {
+ case "debugCall32",
+ "debugCall64",
+ "debugCall128",
+ "debugCall256",
+ "debugCall512",
+ "debugCall1024",
+ "debugCall2048",
+ "debugCall4096",
+ "debugCall8192",
+ "debugCall16384",
+ "debugCall32768",
+ "debugCall65536":
+ // These functions are allowed so that the debugger can initiate multiple function calls.
+ // See: https://golang.org/cl/161137/
+ return
+ }
+
+ // Disallow calls from the runtime. We could
+ // potentially make this condition tighter (e.g., not
+ // when locks are held), but there are enough tightly
+ // coded sequences (e.g., defer handling) that it's
+ // better to play it safe.
+ if pfx := "runtime."; len(name) > len(pfx) && name[:len(pfx)] == pfx {
+ ret = debugCallRuntime
+ return
+ }
+
+ // Check that this isn't an unsafe-point.
+ if pc != f.entry() {
+ pc--
+ }
+ up := pcdatavalue(f, abi.PCDATA_UnsafePoint, pc, nil)
+ if up != abi.UnsafePointSafe {
+ // Not at a safe point.
+ ret = debugCallUnsafePoint
+ }
+ })
+ return ret
+}
+
+// debugCallWrap starts a new goroutine to run a debug call and blocks
+// the calling goroutine. On the goroutine, it prepares to recover
+// panics from the debug call, and then calls the call dispatching
+// function at PC dispatch.
+//
+// This must be deeply nosplit because there are untyped values on the
+// stack from debugCallV2.
+//
+//go:nosplit
+func debugCallWrap(dispatch uintptr) {
+ var lockedExt uint32
+ callerpc := getcallerpc()
+ gp := getg()
+
+ // Lock ourselves to the OS thread.
+ //
+ // Debuggers rely on us running on the same thread until we get to
+ // dispatch the function they asked us to.
+ //
+ // We're going to transfer this to the new G we just created.
+ lockOSThread()
+
+ // Create a new goroutine to execute the call on. Run this on
+ // the system stack to avoid growing our stack.
+ systemstack(func() {
+ // TODO(mknyszek): It would be nice to wrap these arguments in an allocated
+ // closure and start the goroutine with that closure, but the compiler disallows
+ // implicit closure allocation in the runtime.
+ fn := debugCallWrap1
+ newg := newproc1(*(**funcval)(unsafe.Pointer(&fn)), gp, callerpc)
+ args := &debugCallWrapArgs{
+ dispatch: dispatch,
+ callingG: gp,
+ }
+ newg.param = unsafe.Pointer(args)
+
+ // Transfer locked-ness to the new goroutine.
+ // Save lock state to restore later.
+ mp := gp.m
+ if mp != gp.lockedm.ptr() {
+ throw("inconsistent lockedm")
+ }
+ // Save the external lock count and clear it so
+ // that it can't be unlocked from the debug call.
+ // Note: we already locked internally to the thread,
+ // so if we were locked before we're still locked now.
+ lockedExt = mp.lockedExt
+ mp.lockedExt = 0
+
+ mp.lockedg.set(newg)
+ newg.lockedm.set(mp)
+ gp.lockedm = 0
+
+ // Mark the calling goroutine as being at an async
+ // safe-point, since it has a few conservative frames
+ // at the bottom of the stack. This also prevents
+ // stack shrinks.
+ gp.asyncSafePoint = true
+
+ // Stash newg away so we can execute it below (mcall's
+ // closure can't capture anything).
+ gp.schedlink.set(newg)
+ })
+
+ // Switch to the new goroutine.
+ mcall(func(gp *g) {
+ // Get newg.
+ newg := gp.schedlink.ptr()
+ gp.schedlink = 0
+
+ // Park the calling goroutine.
+ if traceEnabled() {
+ traceGoPark(traceBlockDebugCall, 1)
+ }
+ casGToWaiting(gp, _Grunning, waitReasonDebugCall)
+ dropg()
+
+ // Directly execute the new goroutine. The debug
+ // protocol will continue on the new goroutine, so
+ // it's important we not just let the scheduler do
+ // this or it may resume a different goroutine.
+ execute(newg, true)
+ })
+
+ // We'll resume here when the call returns.
+
+ // Restore locked state.
+ mp := gp.m
+ mp.lockedExt = lockedExt
+ mp.lockedg.set(gp)
+ gp.lockedm.set(mp)
+
+ // Undo the lockOSThread we did earlier.
+ unlockOSThread()
+
+ gp.asyncSafePoint = false
+}
+
+type debugCallWrapArgs struct {
+ dispatch uintptr
+ callingG *g
+}
+
+// debugCallWrap1 is the continuation of debugCallWrap on the callee
+// goroutine.
+func debugCallWrap1() {
+ gp := getg()
+ args := (*debugCallWrapArgs)(gp.param)
+ dispatch, callingG := args.dispatch, args.callingG
+ gp.param = nil
+
+ // Dispatch call and trap panics.
+ debugCallWrap2(dispatch)
+
+ // Resume the caller goroutine.
+ getg().schedlink.set(callingG)
+ mcall(func(gp *g) {
+ callingG := gp.schedlink.ptr()
+ gp.schedlink = 0
+
+ // Unlock this goroutine from the M if necessary. The
+ // calling G will relock.
+ if gp.lockedm != 0 {
+ gp.lockedm = 0
+ gp.m.lockedg = 0
+ }
+
+ // Switch back to the calling goroutine. At some point
+ // the scheduler will schedule us again and we'll
+ // finish exiting.
+ if traceEnabled() {
+ traceGoSched()
+ }
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ globrunqput(gp)
+ unlock(&sched.lock)
+
+ if traceEnabled() {
+ traceGoUnpark(callingG, 0)
+ }
+ casgstatus(callingG, _Gwaiting, _Grunnable)
+ execute(callingG, true)
+ })
+}
+
+func debugCallWrap2(dispatch uintptr) {
+ // Call the dispatch function and trap panics.
+ var dispatchF func()
+ dispatchFV := funcval{dispatch}
+ *(*unsafe.Pointer)(unsafe.Pointer(&dispatchF)) = noescape(unsafe.Pointer(&dispatchFV))
+
+ var ok bool
+ defer func() {
+ if !ok {
+ err := recover()
+ debugCallPanicked(err)
+ }
+ }()
+ dispatchF()
+ ok = true
+}
diff --git a/src/runtime/debuglog.go b/src/runtime/debuglog.go
new file mode 100644
index 0000000..873f1b4
--- /dev/null
+++ b/src/runtime/debuglog.go
@@ -0,0 +1,831 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file provides an internal debug logging facility. The debug
+// log is a lightweight, in-memory, per-M ring buffer. By default, the
+// runtime prints the debug log on panic.
+//
+// To print something to the debug log, call dlog to obtain a dlogger
+// and use the methods on that to add values. The values will be
+// space-separated in the output (much like println).
+//
+// This facility can be enabled by passing -tags debuglog when
+// building. Without this tag, dlog calls compile to nothing.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// debugLogBytes is the size of each per-M ring buffer. This is
+// allocated off-heap to avoid blowing up the M and hence the GC'd
+// heap size.
+const debugLogBytes = 16 << 10
+
+// debugLogStringLimit is the maximum number of bytes in a string.
+// Above this, the string will be truncated with "..(n more bytes).."
+const debugLogStringLimit = debugLogBytes / 8
+
+// dlog returns a debug logger. The caller can use methods on the
+// returned logger to add values, which will be space-separated in the
+// final output, much like println. The caller must call end() to
+// finish the message.
+//
+// dlog can be used from highly-constrained corners of the runtime: it
+// is safe to use in the signal handler, from within the write
+// barrier, from within the stack implementation, and in places that
+// must be recursively nosplit.
+//
+// This will be compiled away if built without the debuglog build tag.
+// However, argument construction may not be. If any of the arguments
+// are not literals or trivial expressions, consider protecting the
+// call with "if dlogEnabled".
+//
+//go:nosplit
+//go:nowritebarrierrec
+func dlog() *dlogger {
+ if !dlogEnabled {
+ return nil
+ }
+
+ // Get the time.
+ tick, nano := uint64(cputicks()), uint64(nanotime())
+
+ // Try to get a cached logger.
+ l := getCachedDlogger()
+
+ // If we couldn't get a cached logger, try to get one from the
+ // global pool.
+ if l == nil {
+ allp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ all := (*dlogger)(unsafe.Pointer(atomic.Loaduintptr(allp)))
+ for l1 := all; l1 != nil; l1 = l1.allLink {
+ if l1.owned.Load() == 0 && l1.owned.CompareAndSwap(0, 1) {
+ l = l1
+ break
+ }
+ }
+ }
+
+ // If that failed, allocate a new logger.
+ if l == nil {
+ // Use sysAllocOS instead of sysAlloc because we want to interfere
+ // with the runtime as little as possible, and sysAlloc updates accounting.
+ l = (*dlogger)(sysAllocOS(unsafe.Sizeof(dlogger{})))
+ if l == nil {
+ throw("failed to allocate debug log")
+ }
+ l.w.r.data = &l.w.data
+ l.owned.Store(1)
+
+ // Prepend to allDloggers list.
+ headp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ for {
+ head := atomic.Loaduintptr(headp)
+ l.allLink = (*dlogger)(unsafe.Pointer(head))
+ if atomic.Casuintptr(headp, head, uintptr(unsafe.Pointer(l))) {
+ break
+ }
+ }
+ }
+
+ // If the time delta is getting too high, write a new sync
+ // packet. We set the limit so we don't write more than 6
+ // bytes of delta in the record header.
+ const deltaLimit = 1<<(3*7) - 1 // ~2ms between sync packets
+ if tick-l.w.tick > deltaLimit || nano-l.w.nano > deltaLimit {
+ l.w.writeSync(tick, nano)
+ }
+
+ // Reserve space for framing header.
+ l.w.ensure(debugLogHeaderSize)
+ l.w.write += debugLogHeaderSize
+
+ // Write record header.
+ l.w.uvarint(tick - l.w.tick)
+ l.w.uvarint(nano - l.w.nano)
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.p != 0 {
+ l.w.varint(int64(gp.m.p.ptr().id))
+ } else {
+ l.w.varint(-1)
+ }
+
+ return l
+}
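+
+// A minimal usage sketch (only meaningful when built with -tags debuglog; the
+// logged names and values are illustrative):
+//
+//	dlog().s("span state").hex(uint64(state)).i(npages).end()
+//
+// Per the comment on dlog above, wrapping such a call in "if dlogEnabled" is
+// worthwhile when the arguments are not literals or trivial expressions.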
+
+// A dlogger writes to the debug log.
+//
+// To obtain a dlogger, call dlog(). When done with the dlogger, call
+// end().
+type dlogger struct {
+ _ sys.NotInHeap
+ w debugLogWriter
+
+ // allLink is the next dlogger in the allDloggers list.
+ allLink *dlogger
+
+ // owned indicates that this dlogger is owned by an M. This is
+ // accessed atomically.
+ owned atomic.Uint32
+}
+
+// allDloggers is a list of all dloggers, linked through
+// dlogger.allLink. This is accessed atomically. This is prepend only,
+// so it doesn't need to protect against ABA races.
+var allDloggers *dlogger
+
+//go:nosplit
+func (l *dlogger) end() {
+ if !dlogEnabled {
+ return
+ }
+
+ // Fill in framing header.
+ size := l.w.write - l.w.r.end
+ if !l.w.writeFrameAt(l.w.r.end, size) {
+ throw("record too large")
+ }
+
+ // Commit the record.
+ l.w.r.end = l.w.write
+
+ // Attempt to return this logger to the cache.
+ if putCachedDlogger(l) {
+ return
+ }
+
+ // Return the logger to the global pool.
+ l.owned.Store(0)
+}
+
+const (
+ debugLogUnknown = 1 + iota
+ debugLogBoolTrue
+ debugLogBoolFalse
+ debugLogInt
+ debugLogUint
+ debugLogHex
+ debugLogPtr
+ debugLogString
+ debugLogConstString
+ debugLogStringOverflow
+
+ debugLogPC
+ debugLogTraceback
+)
+
+//go:nosplit
+func (l *dlogger) b(x bool) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ if x {
+ l.w.byte(debugLogBoolTrue)
+ } else {
+ l.w.byte(debugLogBoolFalse)
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) i(x int) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i8(x int8) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i16(x int16) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i32(x int32) *dlogger {
+ return l.i64(int64(x))
+}
+
+//go:nosplit
+func (l *dlogger) i64(x int64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogInt)
+ l.w.varint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) u(x uint) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) uptr(x uintptr) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u8(x uint8) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u16(x uint16) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u32(x uint32) *dlogger {
+ return l.u64(uint64(x))
+}
+
+//go:nosplit
+func (l *dlogger) u64(x uint64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogUint)
+ l.w.uvarint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) hex(x uint64) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogHex)
+ l.w.uvarint(x)
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) p(x any) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogPtr)
+ if x == nil {
+ l.w.uvarint(0)
+ } else {
+ v := efaceOf(&x)
+ switch v._type.Kind_ & kindMask {
+ case kindChan, kindFunc, kindMap, kindPtr, kindUnsafePointer:
+ l.w.uvarint(uint64(uintptr(v.data)))
+ default:
+ throw("not a pointer type")
+ }
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) s(x string) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+
+ strData := unsafe.StringData(x)
+ datap := &firstmoduledata
+ if len(x) > 4 && datap.etext <= uintptr(unsafe.Pointer(strData)) && uintptr(unsafe.Pointer(strData)) < datap.end {
+ // String constants are in the rodata section, which
+ // isn't recorded in moduledata. But it has to be
+ // somewhere between etext and end.
+ l.w.byte(debugLogConstString)
+ l.w.uvarint(uint64(len(x)))
+ l.w.uvarint(uint64(uintptr(unsafe.Pointer(strData)) - datap.etext))
+ } else {
+ l.w.byte(debugLogString)
+ // We can't use unsafe.Slice as it may panic, which isn't safe
+ // in this (potentially) nowritebarrier context.
+ var b []byte
+ bb := (*slice)(unsafe.Pointer(&b))
+ bb.array = unsafe.Pointer(strData)
+ bb.len, bb.cap = len(x), len(x)
+ if len(b) > debugLogStringLimit {
+ b = b[:debugLogStringLimit]
+ }
+ l.w.uvarint(uint64(len(b)))
+ l.w.bytes(b)
+ if len(b) != len(x) {
+ l.w.byte(debugLogStringOverflow)
+ l.w.uvarint(uint64(len(x) - len(b)))
+ }
+ }
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) pc(x uintptr) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogPC)
+ l.w.uvarint(uint64(x))
+ return l
+}
+
+//go:nosplit
+func (l *dlogger) traceback(x []uintptr) *dlogger {
+ if !dlogEnabled {
+ return l
+ }
+ l.w.byte(debugLogTraceback)
+ l.w.uvarint(uint64(len(x)))
+ for _, pc := range x {
+ l.w.uvarint(uint64(pc))
+ }
+ return l
+}
+
+// A debugLogWriter is a ring buffer of binary debug log records.
+//
+// A log record consists of a 2-byte framing header and a sequence of
+// fields. The framing header gives the size of the record as a little
+// endian 16-bit value. Each field starts with a byte indicating its
+// type, followed by type-specific data. If the size in the framing
+// header is 0, it's a sync record consisting of two little endian
+// 64-bit values giving a new time base.
+//
+// Because this is a ring buffer, new records will eventually
+// overwrite old records. Hence, it maintains a reader that consumes
+// the log as it gets overwritten. That reader state is where an
+// actual log reader would start.
+type debugLogWriter struct {
+ _ sys.NotInHeap
+ write uint64
+ data debugLogBuf
+
+ // tick and nano are the time bases from the most recently
+ // written sync record.
+ tick, nano uint64
+
+ // r is a reader that consumes records as they get overwritten
+ // by the writer. It also acts as the initial reader state
+ // when printing the log.
+ r debugLogReader
+
+ // buf is a scratch buffer for encoding. This is here to
+ // reduce stack usage.
+ buf [10]byte
+}
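+
+// To make the framing concrete, here is a sketch of how a reader decodes the
+// record header described above (this mirrors readUint16LEAt below; the
+// function name is illustrative):
+//
+//	func frameSize(buf []byte, pos uint64) (size uint16, sync bool) {
+//		n := uint64(len(buf))
+//		size = uint16(buf[pos%n]) | uint16(buf[(pos+1)%n])<<8
+//		// size == 0 marks a sync record: the header is followed by two
+//		// little-endian 64-bit values giving a new tick/nano time base.
+//		return size, size == 0
+//	}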
+
+type debugLogBuf struct {
+ _ sys.NotInHeap
+ b [debugLogBytes]byte
+}
+
+const (
+ // debugLogHeaderSize is the number of bytes in the framing
+ // header of every dlog record.
+ debugLogHeaderSize = 2
+
+ // debugLogSyncSize is the number of bytes in a sync record.
+ debugLogSyncSize = debugLogHeaderSize + 2*8
+)
+
+//go:nosplit
+func (l *debugLogWriter) ensure(n uint64) {
+ for l.write+n >= l.r.begin+uint64(len(l.data.b)) {
+ // Consume record at begin.
+ if l.r.skip() == ^uint64(0) {
+ // Wrapped around within a record.
+ //
+ // TODO(austin): It would be better to just
+ // eat the whole buffer at this point, but we
+ // have to communicate that to the reader
+ // somehow.
+ throw("record wrapped around")
+ }
+ }
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeFrameAt(pos, size uint64) bool {
+ l.data.b[pos%uint64(len(l.data.b))] = uint8(size)
+ l.data.b[(pos+1)%uint64(len(l.data.b))] = uint8(size >> 8)
+ return size <= 0xFFFF
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeSync(tick, nano uint64) {
+ l.tick, l.nano = tick, nano
+ l.ensure(debugLogHeaderSize)
+ l.writeFrameAt(l.write, 0)
+ l.write += debugLogHeaderSize
+ l.writeUint64LE(tick)
+ l.writeUint64LE(nano)
+ l.r.end = l.write
+}
+
+//go:nosplit
+func (l *debugLogWriter) writeUint64LE(x uint64) {
+ var b [8]byte
+ b[0] = byte(x)
+ b[1] = byte(x >> 8)
+ b[2] = byte(x >> 16)
+ b[3] = byte(x >> 24)
+ b[4] = byte(x >> 32)
+ b[5] = byte(x >> 40)
+ b[6] = byte(x >> 48)
+ b[7] = byte(x >> 56)
+ l.bytes(b[:])
+}
+
+//go:nosplit
+func (l *debugLogWriter) byte(x byte) {
+ l.ensure(1)
+ pos := l.write
+ l.write++
+ l.data.b[pos%uint64(len(l.data.b))] = x
+}
+
+//go:nosplit
+func (l *debugLogWriter) bytes(x []byte) {
+ l.ensure(uint64(len(x)))
+ pos := l.write
+ l.write += uint64(len(x))
+ for len(x) > 0 {
+ n := copy(l.data.b[pos%uint64(len(l.data.b)):], x)
+ pos += uint64(n)
+ x = x[n:]
+ }
+}
+
+//go:nosplit
+func (l *debugLogWriter) varint(x int64) {
+ var u uint64
+ if x < 0 {
+ u = (^uint64(x) << 1) | 1 // complement x, bit 0 is 1
+ } else {
+ u = (uint64(x) << 1) // do not complement x, bit 0 is 0
+ }
+ l.uvarint(u)
+}
+
+//go:nosplit
+func (l *debugLogWriter) uvarint(u uint64) {
+ i := 0
+ for u >= 0x80 {
+ l.buf[i] = byte(u) | 0x80
+ u >>= 7
+ i++
+ }
+ l.buf[i] = byte(u)
+ i++
+ l.bytes(l.buf[:i])
+}
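+
+// Worked examples of the two encodings above, computed from the code for
+// reference:
+//
+//	varint(-2): the zig-zag step maps -2 to 3, emitted as the single byte 0x03
+//	varint(2):  the zig-zag step maps 2 to 4, emitted as the single byte 0x04
+//	uvarint(300): 300 = 0b1_0010_1100, low 7 bits first: bytes 0xAC 0x02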
+
+type debugLogReader struct {
+ data *debugLogBuf
+
+ // begin and end are the positions in the log of the beginning
+ // and end of the log data, modulo len(data).
+ begin, end uint64
+
+ // tick and nano are the current time base at begin.
+ tick, nano uint64
+}
+
+//go:nosplit
+func (r *debugLogReader) skip() uint64 {
+ // Read size at pos.
+ if r.begin+debugLogHeaderSize > r.end {
+ return ^uint64(0)
+ }
+ size := uint64(r.readUint16LEAt(r.begin))
+ if size == 0 {
+ // Sync packet.
+ r.tick = r.readUint64LEAt(r.begin + debugLogHeaderSize)
+ r.nano = r.readUint64LEAt(r.begin + debugLogHeaderSize + 8)
+ size = debugLogSyncSize
+ }
+ if r.begin+size > r.end {
+ return ^uint64(0)
+ }
+ r.begin += size
+ return size
+}
+
+//go:nosplit
+func (r *debugLogReader) readUint16LEAt(pos uint64) uint16 {
+ return uint16(r.data.b[pos%uint64(len(r.data.b))]) |
+ uint16(r.data.b[(pos+1)%uint64(len(r.data.b))])<<8
+}
+
+//go:nosplit
+func (r *debugLogReader) readUint64LEAt(pos uint64) uint64 {
+ var b [8]byte
+ for i := range b {
+ b[i] = r.data.b[pos%uint64(len(r.data.b))]
+ pos++
+ }
+ return uint64(b[0]) | uint64(b[1])<<8 |
+ uint64(b[2])<<16 | uint64(b[3])<<24 |
+ uint64(b[4])<<32 | uint64(b[5])<<40 |
+ uint64(b[6])<<48 | uint64(b[7])<<56
+}
+
+func (r *debugLogReader) peek() (tick uint64) {
+ // Consume any sync records.
+ size := uint64(0)
+ for size == 0 {
+ if r.begin+debugLogHeaderSize > r.end {
+ return ^uint64(0)
+ }
+ size = uint64(r.readUint16LEAt(r.begin))
+ if size != 0 {
+ break
+ }
+ if r.begin+debugLogSyncSize > r.end {
+ return ^uint64(0)
+ }
+ // Sync packet.
+ r.tick = r.readUint64LEAt(r.begin + debugLogHeaderSize)
+ r.nano = r.readUint64LEAt(r.begin + debugLogHeaderSize + 8)
+ r.begin += debugLogSyncSize
+ }
+
+ // Peek tick delta.
+ if r.begin+size > r.end {
+ return ^uint64(0)
+ }
+ pos := r.begin + debugLogHeaderSize
+ var u uint64
+ for i := uint(0); ; i += 7 {
+ b := r.data.b[pos%uint64(len(r.data.b))]
+ pos++
+ u |= uint64(b&^0x80) << i
+ if b&0x80 == 0 {
+ break
+ }
+ }
+ if pos > r.begin+size {
+ return ^uint64(0)
+ }
+ return r.tick + u
+}
+
+func (r *debugLogReader) header() (end, tick, nano uint64, p int) {
+ // Read size. We've already skipped sync packets and checked
+ // bounds in peek.
+ size := uint64(r.readUint16LEAt(r.begin))
+ end = r.begin + size
+ r.begin += debugLogHeaderSize
+
+ // Read tick, nano, and p.
+ tick = r.uvarint() + r.tick
+ nano = r.uvarint() + r.nano
+ p = int(r.varint())
+
+ return
+}
+
+func (r *debugLogReader) uvarint() uint64 {
+ var u uint64
+ for i := uint(0); ; i += 7 {
+ b := r.data.b[r.begin%uint64(len(r.data.b))]
+ r.begin++
+ u |= uint64(b&^0x80) << i
+ if b&0x80 == 0 {
+ break
+ }
+ }
+ return u
+}
+
+func (r *debugLogReader) varint() int64 {
+ u := r.uvarint()
+ var v int64
+ if u&1 == 0 {
+ v = int64(u >> 1)
+ } else {
+ v = ^int64(u >> 1)
+ }
+ return v
+}
+
+func (r *debugLogReader) printVal() bool {
+ typ := r.data.b[r.begin%uint64(len(r.data.b))]
+ r.begin++
+
+ switch typ {
+ default:
+ print("<unknown field type ", hex(typ), " pos ", r.begin-1, " end ", r.end, ">\n")
+ return false
+
+ case debugLogUnknown:
+ print("<unknown kind>")
+
+ case debugLogBoolTrue:
+ print(true)
+
+ case debugLogBoolFalse:
+ print(false)
+
+ case debugLogInt:
+ print(r.varint())
+
+ case debugLogUint:
+ print(r.uvarint())
+
+ case debugLogHex, debugLogPtr:
+ print(hex(r.uvarint()))
+
+ case debugLogString:
+ sl := r.uvarint()
+ if r.begin+sl > r.end {
+ r.begin = r.end
+ print("<string length corrupted>")
+ break
+ }
+ for sl > 0 {
+ b := r.data.b[r.begin%uint64(len(r.data.b)):]
+ if uint64(len(b)) > sl {
+ b = b[:sl]
+ }
+ r.begin += uint64(len(b))
+ sl -= uint64(len(b))
+ gwrite(b)
+ }
+
+ case debugLogConstString:
+ len, ptr := int(r.uvarint()), uintptr(r.uvarint())
+ ptr += firstmoduledata.etext
+ // We can't use unsafe.String as it may panic, which isn't safe
+ // in this (potentially) nowritebarrier context.
+ str := stringStruct{
+ str: unsafe.Pointer(ptr),
+ len: len,
+ }
+ s := *(*string)(unsafe.Pointer(&str))
+ print(s)
+
+ case debugLogStringOverflow:
+ print("..(", r.uvarint(), " more bytes)..")
+
+ case debugLogPC:
+ printDebugLogPC(uintptr(r.uvarint()), false)
+
+ case debugLogTraceback:
+ n := int(r.uvarint())
+ for i := 0; i < n; i++ {
+ print("\n\t")
+ // gentraceback PCs are always return PCs.
+ // Convert them to call PCs.
+ //
+ // TODO(austin): Expand inlined frames.
+ printDebugLogPC(uintptr(r.uvarint()), true)
+ }
+ }
+
+ return true
+}
+
+// printDebugLog prints the debug log.
+func printDebugLog() {
+ if !dlogEnabled {
+ return
+ }
+
+ // This function should not panic or throw since it is used in
+ // the fatal panic path and this may deadlock.
+
+ printlock()
+
+ // Get the list of all debug logs.
+ allp := (*uintptr)(unsafe.Pointer(&allDloggers))
+ all := (*dlogger)(unsafe.Pointer(atomic.Loaduintptr(allp)))
+
+ // Count the logs.
+ n := 0
+ for l := all; l != nil; l = l.allLink {
+ n++
+ }
+ if n == 0 {
+ printunlock()
+ return
+ }
+
+ // Prepare read state for all logs.
+ type readState struct {
+ debugLogReader
+ first bool
+ lost uint64
+ nextTick uint64
+ }
+ // Use sysAllocOS instead of sysAlloc because we want to interfere
+ // with the runtime as little as possible, and sysAlloc updates accounting.
+ state1 := sysAllocOS(unsafe.Sizeof(readState{}) * uintptr(n))
+ if state1 == nil {
+ println("failed to allocate read state for", n, "logs")
+ printunlock()
+ return
+ }
+ state := (*[1 << 20]readState)(state1)[:n]
+ {
+ l := all
+ for i := range state {
+ s := &state[i]
+ s.debugLogReader = l.w.r
+ s.first = true
+ s.lost = l.w.r.begin
+ s.nextTick = s.peek()
+ l = l.allLink
+ }
+ }
+
+ // Print records.
+ for {
+ // Find the next record.
+ var best struct {
+ tick uint64
+ i int
+ }
+ best.tick = ^uint64(0)
+ for i := range state {
+ if state[i].nextTick < best.tick {
+ best.tick = state[i].nextTick
+ best.i = i
+ }
+ }
+ if best.tick == ^uint64(0) {
+ break
+ }
+
+ // Print record.
+ s := &state[best.i]
+ if s.first {
+ print(">> begin log ", best.i)
+ if s.lost != 0 {
+ print("; lost first ", s.lost>>10, "KB")
+ }
+ print(" <<\n")
+ s.first = false
+ }
+
+ end, _, nano, p := s.header()
+ oldEnd := s.end
+ s.end = end
+
+ print("[")
+ var tmpbuf [21]byte
+ pnano := int64(nano) - runtimeInitTime
+ if pnano < 0 {
+ // Logged before runtimeInitTime was set.
+ pnano = 0
+ }
+ pnanoBytes := itoaDiv(tmpbuf[:], uint64(pnano), 9)
+ print(slicebytetostringtmp((*byte)(noescape(unsafe.Pointer(&pnanoBytes[0]))), len(pnanoBytes)))
+ print(" P ", p, "] ")
+
+ for i := 0; s.begin < s.end; i++ {
+ if i > 0 {
+ print(" ")
+ }
+ if !s.printVal() {
+ // Abort this P log.
+ print("<aborting P log>")
+ end = oldEnd
+ break
+ }
+ }
+ println()
+
+ // Move on to the next record.
+ s.begin = end
+ s.end = oldEnd
+ s.nextTick = s.peek()
+ }
+
+ printunlock()
+}
+
+// printDebugLogPC prints a single symbolized PC. If returnPC is true,
+// pc is a return PC that must first be converted to a call PC.
+func printDebugLogPC(pc uintptr, returnPC bool) {
+ fn := findfunc(pc)
+ if returnPC && (!fn.valid() || pc > fn.entry()) {
+ // TODO(austin): Don't back up if the previous frame
+ // was a sigpanic.
+ pc--
+ }
+
+ print(hex(pc))
+ if !fn.valid() {
+ print(" [unknown PC]")
+ } else {
+ name := funcname(fn)
+ file, line := funcline(fn, pc)
+ print(" [", name, "+", hex(pc-fn.entry()),
+ " ", file, ":", line, "]")
+ }
+}
diff --git a/src/runtime/debuglog_off.go b/src/runtime/debuglog_off.go
new file mode 100644
index 0000000..fa3be39
--- /dev/null
+++ b/src/runtime/debuglog_off.go
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !debuglog
+
+package runtime
+
+const dlogEnabled = false
+
+type dlogPerM struct{}
+
+func getCachedDlogger() *dlogger {
+ return nil
+}
+
+func putCachedDlogger(l *dlogger) bool {
+ return false
+}
diff --git a/src/runtime/debuglog_on.go b/src/runtime/debuglog_on.go
new file mode 100644
index 0000000..b815020
--- /dev/null
+++ b/src/runtime/debuglog_on.go
@@ -0,0 +1,45 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build debuglog
+
+package runtime
+
+const dlogEnabled = true
+
+// dlogPerM is the per-M debug log data. This is embedded in the m
+// struct.
+type dlogPerM struct {
+ dlogCache *dlogger
+}
+
+// getCachedDlogger returns a cached dlogger if it can do so
+// efficiently, or nil otherwise. The returned dlogger will be owned.
+func getCachedDlogger() *dlogger {
+ mp := acquirem()
+ // We don't return a cached dlogger if we're running on the
+ // signal stack in case the signal arrived while in
+ // get/putCachedDlogger. (Too bad we don't have non-atomic
+ // exchange!)
+ var l *dlogger
+ if getg() != mp.gsignal {
+ l = mp.dlogCache
+ mp.dlogCache = nil
+ }
+ releasem(mp)
+ return l
+}
+
+// putCachedDlogger attempts to return l to the local cache. It
+// returns false if this fails.
+func putCachedDlogger(l *dlogger) bool {
+ mp := acquirem()
+ if getg() != mp.gsignal && mp.dlogCache == nil {
+ mp.dlogCache = l
+ releasem(mp)
+ return true
+ }
+ releasem(mp)
+ return false
+}
diff --git a/src/runtime/debuglog_test.go b/src/runtime/debuglog_test.go
new file mode 100644
index 0000000..18c54a8
--- /dev/null
+++ b/src/runtime/debuglog_test.go
@@ -0,0 +1,169 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO(austin): All of these tests are skipped if the debuglog build
+// tag isn't provided. That means we basically never test debuglog.
+// There are two potential ways around this:
+//
+// 1. Make these tests re-build the runtime test with the debuglog
+// build tag and re-invoke themselves.
+//
+// 2. Always build the whole debuglog infrastructure and depend on
+// linker dead-code elimination to drop it. This is easy for dlog()
+// since there won't be any calls to it. For printDebugLog, we can
+// make panic call a wrapper that calls printDebugLog if the
+// debuglog build tag is set, and otherwise does nothing. Then tests
+// could call printDebugLog directly. This is the right answer in
+// principle, but currently our linker reads in all symbols
+// regardless, so this would slow down and bloat all links. If the
+// linker gets more efficient about this, we should revisit this
+// approach.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/testenv"
+ "regexp"
+ "runtime"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+func skipDebugLog(t *testing.T) {
+ if !runtime.DlogEnabled {
+ t.Skip("debug log disabled (rebuild with -tags debuglog)")
+ }
+}
+
+func dlogCanonicalize(x string) string {
+ begin := regexp.MustCompile(`(?m)^>> begin log \d+ <<\n`)
+ x = begin.ReplaceAllString(x, "")
+ prefix := regexp.MustCompile(`(?m)^\[[^]]+\]`)
+ x = prefix.ReplaceAllString(x, "[]")
+ return x
+}
+
+func TestDebugLog(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ runtime.Dlog().S("testing").End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ if want := "[] testing\n"; got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
+
+func TestDebugLogTypes(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ var varString = strings.Repeat("a", 4)
+ runtime.Dlog().B(true).B(false).I(-42).I16(0x7fff).U64(^uint64(0)).Hex(0xfff).P(nil).S(varString).S("const string").End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ if want := "[] true false -42 32767 18446744073709551615 0xfff 0x0 aaaa const string\n"; got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
+
+func TestDebugLogSym(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ pc, _, _, _ := runtime.Caller(0)
+ runtime.Dlog().PC(pc).End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ want := regexp.MustCompile(`\[\] 0x[0-9a-f]+ \[runtime_test\.TestDebugLogSym\+0x[0-9a-f]+ .*/debuglog_test\.go:[0-9]+\]\n`)
+ if !want.MatchString(got) {
+ t.Fatalf("want matching %s, got %q", want, got)
+ }
+}
+
+func TestDebugLogInterleaving(t *testing.T) {
+ skipDebugLog(t)
+ runtime.ResetDebugLog()
+ var wg sync.WaitGroup
+ done := int32(0)
+ wg.Add(1)
+ go func() {
+ // Encourage main goroutine to move around to
+ // different Ms and Ps.
+ for atomic.LoadInt32(&done) == 0 {
+ runtime.Gosched()
+ }
+ wg.Done()
+ }()
+ var want strings.Builder
+ for i := 0; i < 1000; i++ {
+ runtime.Dlog().I(i).End()
+ fmt.Fprintf(&want, "[] %d\n", i)
+ runtime.Gosched()
+ }
+ atomic.StoreInt32(&done, 1)
+ wg.Wait()
+
+ gotFull := runtime.DumpDebugLog()
+ got := dlogCanonicalize(gotFull)
+ if got != want.String() {
+ // Since the timestamps are useful in understanding
+ // failures of this test, we print the uncanonicalized
+ // output.
+ t.Fatalf("want %q, got (uncanonicalized) %q", want.String(), gotFull)
+ }
+}
+
+func TestDebugLogWraparound(t *testing.T) {
+ skipDebugLog(t)
+
+ // Make sure we don't switch logs so it's easier to fill one up.
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ runtime.ResetDebugLog()
+ var longString = strings.Repeat("a", 128)
+ var want strings.Builder
+ for i, j := 0, 0; j < 2*runtime.DebugLogBytes; i, j = i+1, j+len(longString) {
+ runtime.Dlog().I(i).S(longString).End()
+ fmt.Fprintf(&want, "[] %d %s\n", i, longString)
+ }
+ log := runtime.DumpDebugLog()
+
+ // Check for "lost" message.
+ lost := regexp.MustCompile(`^>> begin log \d+; lost first \d+KB <<\n`)
+ if !lost.MatchString(log) {
+ t.Fatalf("want matching %s, got %q", lost, log)
+ }
+ idx := lost.FindStringIndex(log)
+ // Strip lost message.
+ log = dlogCanonicalize(log[idx[1]:])
+
+ // Check log.
+ if !strings.HasSuffix(want.String(), log) {
+ t.Fatalf("wrong suffix:\n%s", log)
+ }
+}
+
+func TestDebugLogLongString(t *testing.T) {
+ skipDebugLog(t)
+
+ runtime.ResetDebugLog()
+ var longString = strings.Repeat("a", runtime.DebugLogStringLimit+1)
+ runtime.Dlog().S(longString).End()
+ got := dlogCanonicalize(runtime.DumpDebugLog())
+ want := "[] " + strings.Repeat("a", runtime.DebugLogStringLimit) + " ..(1 more bytes)..\n"
+ if got != want {
+ t.Fatalf("want %q, got %q", want, got)
+ }
+}
+
+// TestDebugLogBuild verifies that the runtime builds with -tags=debuglog.
+func TestDebugLogBuild(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+
+ // It doesn't matter which program we build, anything will rebuild the
+ // runtime.
+ if _, err := buildTestProg(t, "testprog", "-tags=debuglog"); err != nil {
+ t.Fatal(err)
+ }
+}
diff --git a/src/runtime/defer_test.go b/src/runtime/defer_test.go
new file mode 100644
index 0000000..3a54951
--- /dev/null
+++ b/src/runtime/defer_test.go
@@ -0,0 +1,518 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "reflect"
+ "runtime"
+ "testing"
+)
+
+// Make sure open-coded defer exit code is not lost, even when there is an
+// unconditional panic (hence no return from the function)
+func TestUnconditionalPanic(t *testing.T) {
+ defer func() {
+ if recover() != "testUnconditional" {
+ t.Fatal("expected unconditional panic")
+ }
+ }()
+ panic("testUnconditional")
+}
+
+var glob int = 3
+
+// Test an open-coded defer and non-open-coded defer - make sure both defers run
+// and call recover()
+func TestOpenAndNonOpenDefers(t *testing.T) {
+ for {
+ // Non-open defer because in a loop
+ defer func(n int) {
+ if recover() != "testNonOpenDefer" {
+ t.Fatal("expected testNonOpen panic")
+ }
+ }(3)
+ if glob > 2 {
+ break
+ }
+ }
+ testOpen(t, 47)
+ panic("testNonOpenDefer")
+}
+
+//go:noinline
+func testOpen(t *testing.T, arg int) {
+ defer func(n int) {
+ if recover() != "testOpenDefer" {
+ t.Fatal("expected testOpen panic")
+ }
+ }(4)
+ if arg > 2 {
+ panic("testOpenDefer")
+ }
+}
+
+// Test a non-open-coded defer and an open-coded defer - make sure both defers run
+// and call recover()
+func TestNonOpenAndOpenDefers(t *testing.T) {
+ testOpen(t, 47)
+ for {
+ // Non-open defer because in a loop
+ defer func(n int) {
+ if recover() != "testNonOpenDefer" {
+ t.Fatal("expected testNonOpen panic")
+ }
+ }(3)
+ if glob > 2 {
+ break
+ }
+ }
+ panic("testNonOpenDefer")
+}
+
+var list []int
+
+// Make sure that conditional open-coded defers are activated correctly and run in
+// the correct order.
+func TestConditionalDefers(t *testing.T) {
+ list = make([]int, 0, 10)
+
+ defer func() {
+ if recover() != "testConditional" {
+ t.Fatal("expected panic")
+ }
+ want := []int{4, 2, 1}
+ if !reflect.DeepEqual(want, list) {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", want, list))
+ }
+
+ }()
+ testConditionalDefers(8)
+}
+
+func testConditionalDefers(n int) {
+ doappend := func(i int) {
+ list = append(list, i)
+ }
+
+ defer doappend(1)
+ if n > 5 {
+ defer doappend(2)
+ if n > 8 {
+ defer doappend(3)
+ } else {
+ defer doappend(4)
+ }
+ }
+ panic("testConditional")
+}
+
+// Test that there is no compile-time or run-time error if an open-coded defer
+// call is removed by constant propagation and dead-code elimination.
+func TestDisappearingDefer(t *testing.T) {
+ switch runtime.GOOS {
+ case "invalidOS":
+ defer func() {
+ t.Fatal("Defer shouldn't run")
+ }()
+ }
+}
+
+// This tests an extra recursive panic behavior that is only specified in the
+// code. Suppose a first panic P1 happens and starts processing defer calls. If a
+// second panic P2 happens while processing defer call D in frame F, then defer
+// call processing is restarted (with some potentially new defer calls created by
+// D or its callees). If the defer processing reaches the started defer call D
+// again in the defer stack, then the original panic P1 is aborted and cannot
+// continue panic processing or be recovered. If the panic P2 does a recover at
+// some point, it will naturally remove the original panic P1 from the stack
+// (since the original panic had to be in frame F or a descendant of F).
+func TestAbortedPanic(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r != nil {
+ t.Fatal(fmt.Sprintf("wanted nil recover, got %v", r))
+ }
+ }()
+ defer func() {
+ r := recover()
+ if r != "panic2" {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", "panic2", r))
+ }
+ }()
+ defer func() {
+ panic("panic2")
+ }()
+ panic("panic1")
+}
+
+// This tests that recover() does not succeed unless it is called directly from a
+// defer function that is directly called by the panic. Here, we first call it
+// from a defer function that is created by the defer function called directly by
+// the panic. In that case the recover should not succeed and returns nil; only
+// the outer defer function, called directly by the panic, recovers "panic1".
+func TestRecoverMatching(t *testing.T) {
+ defer func() {
+ r := recover()
+ if r != "panic1" {
+ t.Fatal(fmt.Sprintf("wanted %v, got %v", "panic1", r))
+ }
+ }()
+ defer func() {
+ defer func() {
+ // Shouldn't succeed, even though it is called directly
+ // from a defer function, since this defer function was
+ // not directly called by the panic.
+ r := recover()
+ if r != nil {
+ t.Fatal(fmt.Sprintf("wanted nil recover, got %v", r))
+ }
+ }()
+ }()
+ panic("panic1")
+}
+
+type nonSSAable [128]byte
+
+type bigStruct struct {
+ x, y, z, w, p, q int64
+}
+
+type containsBigStruct struct {
+ element bigStruct
+}
+
+func mknonSSAable() nonSSAable {
+ globint1++
+ return nonSSAable{0, 0, 0, 0, 5}
+}
+
+var globint1, globint2, globint3 int
+
+//go:noinline
+func sideeffect(n int64) int64 {
+ globint2++
+ return n
+}
+
+func sideeffect2(in containsBigStruct) containsBigStruct {
+ globint3++
+ return in
+}
+
+// Test that nonSSAable arguments to defer are handled correctly and only evaluated once.
+func TestNonSSAableArgs(t *testing.T) {
+ globint1 = 0
+ globint2 = 0
+ globint3 = 0
+ var save1 byte
+ var save2 int64
+ var save3 int64
+ var save4 int64
+
+ defer func() {
+ if globint1 != 1 {
+ t.Fatal(fmt.Sprintf("globint1: wanted: 1, got %v", globint1))
+ }
+ if save1 != 5 {
+ t.Fatal(fmt.Sprintf("save1: wanted: 5, got %v", save1))
+ }
+ if globint2 != 1 {
+ t.Fatal(fmt.Sprintf("globint2: wanted: 1, got %v", globint2))
+ }
+ if save2 != 2 {
+ t.Fatal(fmt.Sprintf("save2: wanted: 2, got %v", save2))
+ }
+ if save3 != 4 {
+ t.Fatal(fmt.Sprintf("save3: wanted: 4, got %v", save3))
+ }
+ if globint3 != 1 {
+ t.Fatal(fmt.Sprintf("globint3: wanted: 1, got %v", globint3))
+ }
+ if save4 != 4 {
+			t.Fatal(fmt.Sprintf("save4: wanted: 4, got %v", save4))
+ }
+ }()
+
+ // Test function returning a non-SSAable arg
+ defer func(n nonSSAable) {
+ save1 = n[4]
+ }(mknonSSAable())
+ // Test composite literal that is not SSAable
+ defer func(b bigStruct) {
+ save2 = b.y
+ }(bigStruct{1, 2, 3, 4, 5, sideeffect(6)})
+
+ // Test struct field reference that is non-SSAable
+ foo := containsBigStruct{}
+ foo.element.z = 4
+ defer func(element bigStruct) {
+ save3 = element.z
+ }(foo.element)
+ defer func(element bigStruct) {
+ save4 = element.z
+ }(sideeffect2(foo).element)
+}
+
+//go:noinline
+func doPanic() {
+ panic("Test panic")
+}
+
+func TestDeferForFuncWithNoExit(t *testing.T) {
+ cond := 1
+ defer func() {
+ if cond != 2 {
+ t.Fatal(fmt.Sprintf("cond: wanted 2, got %v", cond))
+ }
+ if recover() != "Test panic" {
+ t.Fatal("Didn't find expected panic")
+ }
+ }()
+ x := 0
+	// Force a stack copy, to make sure that the &cond pointer passed to the
+	// defer function is properly updated.
+ growStackIter(&x, 1000)
+ cond = 2
+ doPanic()
+
+ // This function has no exit/return, since it ends with an infinite loop
+ for {
+ }
+}
+
+// Test case approximating issue #37664, where a recursive function (interpreter)
+// may do repeated recovers/re-panics until it reaches the frame where the panic
+// can actually be handled. The recurseFnPanicRec() function is testing that there
+// are no stale defer structs on the defer chain after the interpreter() sequence,
+// by writing a bunch of 0xffffffffs into several recursive stack frames, and then
+// doing a single panic-recover which would invoke any such stale defer structs.
+func TestDeferWithRepeatedRepanics(t *testing.T) {
+ interpreter(0, 6, 2)
+ recurseFnPanicRec(0, 10)
+ interpreter(0, 5, 1)
+ recurseFnPanicRec(0, 10)
+ interpreter(0, 6, 3)
+ recurseFnPanicRec(0, 10)
+}
+
+func interpreter(level int, maxlevel int, rec int) {
+ defer func() {
+ e := recover()
+ if e == nil {
+ return
+ }
+ if level != e.(int) {
+ //fmt.Fprintln(os.Stderr, "re-panicing, level", level)
+ panic(e)
+ }
+ //fmt.Fprintln(os.Stderr, "Recovered, level", level)
+ }()
+ if level+1 < maxlevel {
+ interpreter(level+1, maxlevel, rec)
+ } else {
+ //fmt.Fprintln(os.Stderr, "Initiating panic")
+ panic(rec)
+ }
+}
+
+func recurseFnPanicRec(level int, maxlevel int) {
+ defer func() {
+ recover()
+ }()
+ recurseFn(level, maxlevel)
+}
+
+var saveInt uint32
+
+func recurseFn(level int, maxlevel int) {
+ a := [40]uint32{0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff}
+ if level+1 < maxlevel {
+		// Make sure the array a is referenced, so it is not optimized away
+ saveInt = a[4]
+ recurseFn(level+1, maxlevel)
+ } else {
+ panic("recurseFn panic")
+ }
+}
+
+// Try to reproduce issue #37688, where a pointer to an open-coded defer struct is
+// mistakenly held, and that struct keeps a pointer to a stack-allocated defer
+// struct, and that stack-allocated struct gets overwritten or the stack gets
+// moved, so a memory error happens on GC.
+func TestIssue37688(t *testing.T) {
+ for j := 0; j < 10; j++ {
+ g2()
+ g3()
+ }
+}
+
+type foo struct {
+}
+
+//go:noinline
+func (f *foo) method1() {
+}
+
+//go:noinline
+func (f *foo) method2() {
+}
+
+func g2() {
+ var a foo
+ ap := &a
+ // The loop forces this defer to be heap-allocated and the remaining two
+ // to be stack-allocated.
+ for i := 0; i < 1; i++ {
+ defer ap.method1()
+ }
+ defer ap.method2()
+ defer ap.method1()
+ ff1(ap, 1, 2, 3, 4, 5, 6, 7, 8, 9)
+	// Try to get the stack to be moved by growing it too large, so the
+	// existing stack-allocated defers become invalid.
+ rec1(2000)
+}
+
+func g3() {
+ // Mix up the stack layout by adding in an extra function frame
+ g2()
+}
+
+var globstruct struct {
+ a, b, c, d, e, f, g, h, i int
+}
+
+func ff1(ap *foo, a, b, c, d, e, f, g, h, i int) {
+ defer ap.method1()
+
+ // Make a defer that has a very large set of args, hence big size for the
+ // defer record for the open-coded frame (which means it won't use the
+ // defer pool)
+ defer func(ap *foo, a, b, c, d, e, f, g, h, i int) {
+ if v := recover(); v != nil {
+ }
+ globstruct.a = a
+ globstruct.b = b
+ globstruct.c = c
+ globstruct.d = d
+ globstruct.e = e
+ globstruct.f = f
+ globstruct.g = g
+ globstruct.h = h
+ }(ap, a, b, c, d, e, f, g, h, i)
+ panic("ff1 panic")
+}
+
+func rec1(max int) {
+ if max > 0 {
+ rec1(max - 1)
+ }
+}
+
+func TestIssue43921(t *testing.T) {
+ defer func() {
+ expect(t, 1, recover())
+ }()
+ func() {
+ // Prevent open-coded defers
+ for {
+ defer func() {}()
+ break
+ }
+
+ defer func() {
+ defer func() {
+ expect(t, 4, recover())
+ }()
+ panic(4)
+ }()
+ panic(1)
+
+ }()
+}
+
+func expect(t *testing.T, n int, err any) {
+ if n != err {
+ t.Fatalf("have %v, want %v", err, n)
+ }
+}
+
+func TestIssue43920(t *testing.T) {
+ var steps int
+
+ defer func() {
+ expect(t, 1, recover())
+ }()
+ defer func() {
+ defer func() {
+ defer func() {
+ expect(t, 5, recover())
+ }()
+ defer panic(5)
+ func() {
+ panic(4)
+ }()
+ }()
+ defer func() {
+ expect(t, 3, recover())
+ }()
+ defer panic(3)
+ }()
+ func() {
+ defer step(t, &steps, 1)
+ panic(1)
+ }()
+}
+
+func step(t *testing.T, steps *int, want int) {
+ *steps++
+ if *steps != want {
+ t.Fatalf("have %v, want %v", *steps, want)
+ }
+}
+
+func TestIssue43941(t *testing.T) {
+ var steps int = 7
+ defer func() {
+ step(t, &steps, 14)
+ expect(t, 4, recover())
+ }()
+ func() {
+ func() {
+ defer func() {
+ defer func() {
+ expect(t, 3, recover())
+ }()
+ defer panic(3)
+ panic(2)
+ }()
+ defer func() {
+ expect(t, 1, recover())
+ }()
+ defer panic(1)
+ }()
+ defer func() {}()
+ defer func() {}()
+ defer step(t, &steps, 10)
+ defer step(t, &steps, 9)
+ step(t, &steps, 8)
+ }()
+ func() {
+ defer step(t, &steps, 13)
+ defer step(t, &steps, 12)
+ func() {
+ defer step(t, &steps, 11)
+ panic(4)
+ }()
+
+ // Code below isn't executed,
+ // but removing it breaks the test case.
+ defer func() {}()
+ defer panic(-1)
+ defer step(t, &steps, -1)
+ defer step(t, &steps, -1)
+ defer func() {}()
+ }()
+}
diff --git a/src/runtime/defs1_linux.go b/src/runtime/defs1_linux.go
new file mode 100644
index 0000000..709f19e
--- /dev/null
+++ b/src/runtime/defs1_linux.go
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=amd64 cgo -cdefs defs.go defs1.go >amd64/defs.h
+*/
+
+package runtime
+
+/*
+#include <ucontext.h>
+#include <fcntl.h>
+#include <asm/signal.h>
+*/
+import "C"
+
+const (
+ O_RDONLY = C.O_RDONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+ SA_RESTORER = C.SA_RESTORER
+)
+
+type Usigset C.__sigset_t
+type Fpxreg C.struct__libc_fpxreg
+type Xmmreg C.struct__libc_xmmreg
+type Fpstate C.struct__libc_fpstate
+type Fpxreg1 C.struct__fpxreg
+type Xmmreg1 C.struct__xmmreg
+type Fpstate1 C.struct__fpstate
+type Fpreg1 C.struct__fpreg
+type StackT C.stack_t
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+type Sigcontext C.struct_sigcontext
diff --git a/src/runtime/defs1_netbsd_386.go b/src/runtime/defs1_netbsd_386.go
new file mode 100644
index 0000000..f7fe45b
--- /dev/null
+++ b/src/runtime/defs1_netbsd_386.go
@@ -0,0 +1,183 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_386.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason [20]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [19]uint32
+ __fpregs [644]byte
+ _mc_tlsbase int32
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ uc_mcontext mcontextt
+ __uc_pad [4]int32
+}
+
+type keventt struct {
+ ident uint32
+ filter uint32
+ flags uint32
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_386.go
+
+const (
+ _REG_GS = 0x0
+ _REG_FS = 0x1
+ _REG_ES = 0x2
+ _REG_DS = 0x3
+ _REG_EDI = 0x4
+ _REG_ESI = 0x5
+ _REG_EBP = 0x6
+ _REG_ESP = 0x7
+ _REG_EBX = 0x8
+ _REG_EDX = 0x9
+ _REG_ECX = 0xa
+ _REG_EAX = 0xb
+ _REG_TRAPNO = 0xc
+ _REG_ERR = 0xd
+ _REG_EIP = 0xe
+ _REG_CS = 0xf
+ _REG_EFL = 0x10
+ _REG_UESP = 0x11
+ _REG_SS = 0x12
+)
diff --git a/src/runtime/defs1_netbsd_amd64.go b/src/runtime/defs1_netbsd_amd64.go
new file mode 100644
index 0000000..80908cd
--- /dev/null
+++ b/src/runtime/defs1_netbsd_amd64.go
@@ -0,0 +1,195 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _pad int32
+ _reason [24]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [26]uint64
+ _mc_tlsbase uint64
+ __fpregs [512]int8
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ pad_cgo_0 [4]byte
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ uc_mcontext mcontextt
+}
+
+type keventt struct {
+ ident uint64
+ filter uint32
+ flags uint32
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go
+
+const (
+ _REG_RDI = 0x0
+ _REG_RSI = 0x1
+ _REG_RDX = 0x2
+ _REG_RCX = 0x3
+ _REG_R8 = 0x4
+ _REG_R9 = 0x5
+ _REG_R10 = 0x6
+ _REG_R11 = 0x7
+ _REG_R12 = 0x8
+ _REG_R13 = 0x9
+ _REG_R14 = 0xa
+ _REG_R15 = 0xb
+ _REG_RBP = 0xc
+ _REG_RBX = 0xd
+ _REG_RAX = 0xe
+ _REG_GS = 0xf
+ _REG_FS = 0x10
+ _REG_ES = 0x11
+ _REG_DS = 0x12
+ _REG_TRAPNO = 0x13
+ _REG_ERR = 0x14
+ _REG_RIP = 0x15
+ _REG_CS = 0x16
+ _REG_RFLAGS = 0x17
+ _REG_RSP = 0x18
+ _REG_SS = 0x19
+)
diff --git a/src/runtime/defs1_netbsd_arm.go b/src/runtime/defs1_netbsd_arm.go
new file mode 100644
index 0000000..c63e592
--- /dev/null
+++ b/src/runtime/defs1_netbsd_arm.go
@@ -0,0 +1,188 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason uintptr
+ _reasonx [16]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ _ [4]byte // EABI
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ _ [4]byte // EABI
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [17]uint32
+ _ [4]byte // EABI
+ __fpu [272]byte // EABI
+ _mc_tlsbase uint32
+ _ [4]byte // EABI
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ _ [4]byte // EABI
+ uc_mcontext mcontextt
+ __uc_pad [2]int32
+}
+
+type keventt struct {
+ ident uint32
+ filter uint32
+ flags uint32
+ fflags uint32
+ data int64
+ udata *byte
+ _ [4]byte // EABI
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+const (
+ _REG_R0 = 0x0
+ _REG_R1 = 0x1
+ _REG_R2 = 0x2
+ _REG_R3 = 0x3
+ _REG_R4 = 0x4
+ _REG_R5 = 0x5
+ _REG_R6 = 0x6
+ _REG_R7 = 0x7
+ _REG_R8 = 0x8
+ _REG_R9 = 0x9
+ _REG_R10 = 0xa
+ _REG_R11 = 0xb
+ _REG_R12 = 0xc
+ _REG_R13 = 0xd
+ _REG_R14 = 0xe
+ _REG_R15 = 0xf
+ _REG_CPSR = 0x10
+)
diff --git a/src/runtime/defs1_netbsd_arm64.go b/src/runtime/defs1_netbsd_arm64.go
new file mode 100644
index 0000000..804b5b0
--- /dev/null
+++ b/src/runtime/defs1_netbsd_arm64.go
@@ -0,0 +1,203 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x400000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = 0x0
+ _EVFILT_WRITE = 0x1
+)
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type siginfo struct {
+ _signo int32
+ _code int32
+ _errno int32
+ _reason uintptr
+ _reasonx [16]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ _ [4]byte // EABI
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type mcontextt struct {
+ __gregs [35]uint64
+ __fregs [4160]byte // _NFREG * 128 + 32 + 32
+ _ [8]uint64 // future use
+}
+
+type ucontextt struct {
+ uc_flags uint32
+ uc_link *ucontextt
+ uc_sigmask sigset
+ uc_stack stackt
+ _ [4]byte // EABI
+ uc_mcontext mcontextt
+ __uc_pad [2]int32
+}
+
+type keventt struct {
+ ident uint64
+ filter uint32
+ flags uint32
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_netbsd.go defs_netbsd_arm.go
+
+const (
+ _REG_X0 = 0
+ _REG_X1 = 1
+ _REG_X2 = 2
+ _REG_X3 = 3
+ _REG_X4 = 4
+ _REG_X5 = 5
+ _REG_X6 = 6
+ _REG_X7 = 7
+ _REG_X8 = 8
+ _REG_X9 = 9
+ _REG_X10 = 10
+ _REG_X11 = 11
+ _REG_X12 = 12
+ _REG_X13 = 13
+ _REG_X14 = 14
+ _REG_X15 = 15
+ _REG_X16 = 16
+ _REG_X17 = 17
+ _REG_X18 = 18
+ _REG_X19 = 19
+ _REG_X20 = 20
+ _REG_X21 = 21
+ _REG_X22 = 22
+ _REG_X23 = 23
+ _REG_X24 = 24
+ _REG_X25 = 25
+ _REG_X26 = 26
+ _REG_X27 = 27
+ _REG_X28 = 28
+ _REG_X29 = 29
+ _REG_X30 = 30
+ _REG_X31 = 31
+ _REG_ELR = 32
+ _REG_SPSR = 33
+ _REG_TPIDR = 34
+)
diff --git a/src/runtime/defs1_solaris_amd64.go b/src/runtime/defs1_solaris_amd64.go
new file mode 100644
index 0000000..9ebe5bb
--- /dev/null
+++ b/src/runtime/defs1_solaris_amd64.go
@@ -0,0 +1,250 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_solaris.go defs_solaris_amd64.go
+
+package runtime
+
+const (
+ _EINTR = 0x4
+ _EBADF = 0x9
+ _EFAULT = 0xe
+ _EAGAIN = 0xb
+ _EBUSY = 0x10
+ _ETIME = 0x3e
+ _ETIMEDOUT = 0x91
+ _EWOULDBLOCK = 0xb
+ _EINPROGRESS = 0x96
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x100
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x8
+ _SA_RESTART = 0x4
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x15
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGCHLD = 0x12
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGIO = 0x16
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGWINCH = 0x14
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ __SC_PAGESIZE = 0xb
+ __SC_NPROCESSORS_ONLN = 0xf
+
+ _PTHREAD_CREATE_DETACHED = 0x40
+
+ _FORK_NOSIGCHLD = 0x1
+ _FORK_WAITPID = 0x2
+
+ _MAXHOSTNAMELEN = 0x100
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x80
+ _O_TRUNC = 0x200
+ _O_CREAT = 0x100
+ _O_CLOEXEC = 0x800000
+
+ _POLLIN = 0x1
+ _POLLOUT = 0x4
+ _POLLHUP = 0x10
+ _POLLERR = 0x8
+
+ _PORT_SOURCE_FD = 0x4
+ _PORT_SOURCE_ALERT = 0x5
+ _PORT_ALERT_UPDATE = 0x2
+)
+
+type semt struct {
+ sem_count uint32
+ sem_type uint16
+ sem_magic uint16
+ sem_pad1 [3]uint64
+ sem_pad2 [2]uint64
+}
+
+type sigset struct {
+ __sigbits [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ si_pad int32
+ __data [240]byte
+}
+
+type sigactiont struct {
+ sa_flags int32
+ pad_cgo_0 [4]byte
+ _funcptr [8]byte
+ sa_mask sigset
+}
+
+type fpregset struct {
+ fp_reg_set [528]byte
+}
+
+type mcontext struct {
+ gregs [28]int64
+ fpregs fpregset
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_sigmask sigset
+ uc_stack stackt
+ pad_cgo_0 [8]byte
+ uc_mcontext mcontext
+ uc_filler [5]int64
+ pad_cgo_1 [8]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type portevent struct {
+ portev_events int32
+ portev_source uint16
+ portev_pad uint16
+ portev_object uint64
+ portev_user *byte
+}
+
+type pthread uint32
+type pthreadattr struct {
+ __pthread_attrp *byte
+}
+
+type stat struct {
+ st_dev uint64
+ st_ino uint64
+ st_mode uint32
+ st_nlink uint32
+ st_uid uint32
+ st_gid uint32
+ st_rdev uint64
+ st_size int64
+ st_atim timespec
+ st_mtim timespec
+ st_ctim timespec
+ st_blksize int32
+ pad_cgo_0 [4]byte
+ st_blocks int64
+ st_fstype [16]int8
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_solaris.go defs_solaris_amd64.go
+
+const (
+ _REG_RDI = 0x8
+ _REG_RSI = 0x9
+ _REG_RDX = 0xc
+ _REG_RCX = 0xd
+ _REG_R8 = 0x7
+ _REG_R9 = 0x6
+ _REG_R10 = 0x5
+ _REG_R11 = 0x4
+ _REG_R12 = 0x3
+ _REG_R13 = 0x2
+ _REG_R14 = 0x1
+ _REG_R15 = 0x0
+ _REG_RBP = 0xa
+ _REG_RBX = 0xb
+ _REG_RAX = 0xe
+ _REG_GS = 0x17
+ _REG_FS = 0x16
+ _REG_ES = 0x18
+ _REG_DS = 0x19
+ _REG_TRAPNO = 0xf
+ _REG_ERR = 0x10
+ _REG_RIP = 0x11
+ _REG_CS = 0x12
+ _REG_RFLAGS = 0x13
+ _REG_RSP = 0x14
+ _REG_SS = 0x15
+)
diff --git a/src/runtime/defs2_linux.go b/src/runtime/defs2_linux.go
new file mode 100644
index 0000000..5d6730a
--- /dev/null
+++ b/src/runtime/defs2_linux.go
@@ -0,0 +1,138 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+ * Input to cgo -cdefs
+
+GOARCH=386 go tool cgo -cdefs defs2_linux.go >defs_linux_386.h
+
+The asm header tricks we have to use for Linux on amd64
+(see defs.c and defs1.c) don't work here, so this is yet another
+file. Sigh.
+*/
+
+package runtime
+
+/*
+#cgo CFLAGS: -I/tmp/linux/arch/x86/include -I/tmp/linux/include -D_LOOSE_KERNEL_NAMES -D__ARCH_SI_UID_T=__kernel_uid32_t
+
+#define size_t __kernel_size_t
+#define pid_t int
+#include <asm/signal.h>
+#include <asm/mman.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/siginfo.h>
+#include <asm-generic/errno.h>
+#include <asm-generic/fcntl.h>
+#include <asm-generic/poll.h>
+#include <linux/eventpoll.h>
+
+// This is the sigaction structure from the Linux 2.1.68 kernel which
+// is used with the rt_sigaction system call. For 386 this is not
+// defined in any public header file.
+
+struct kernel_sigaction {
+ __sighandler_t k_sa_handler;
+ unsigned long sa_flags;
+ void (*sa_restorer) (void);
+ unsigned long long sa_mask;
+};
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EAGAIN = C.EAGAIN
+ ENOMEM = C.ENOMEM
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_RESTORER = C.SA_RESTORER
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ O_RDONLY = C.O_RDONLY
+ O_CLOEXEC = C.O_CLOEXEC
+)
+
+type Fpreg C.struct__fpreg
+type Fpxreg C.struct__fpxreg
+type Xmmreg C.struct__xmmreg
+type Fpstate C.struct__fpstate
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Sigaction C.struct_kernel_sigaction
+type Siginfo C.siginfo_t
+type StackT C.stack_t
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
+type Itimerval C.struct_itimerval
+type EpollEvent C.struct_epoll_event
diff --git a/src/runtime/defs3_linux.go b/src/runtime/defs3_linux.go
new file mode 100644
index 0000000..99479aa
--- /dev/null
+++ b/src/runtime/defs3_linux.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=ppc64 cgo -cdefs defs_linux.go defs3_linux.go > defs_linux_ppc64.h
+*/
+
+package runtime
+
+/*
+#define size_t __kernel_size_t
+#define sigset_t __sigset_t // rename the sigset_t here otherwise cgo will complain about "inconsistent definitions for C.sigset_t"
+#define _SYS_TYPES_H // avoid inclusion of sys/types.h
+#include <asm/ucontext.h>
+#include <asm-generic/fcntl.h>
+*/
+import "C"
+
+const (
+ O_RDONLY = C.O_RDONLY
+ O_CLOEXEC = C.O_CLOEXEC
+ SA_RESTORER = 0 // unused
+)
+
+type Usigset C.__sigset_t
+
+// types used in sigcontext
+type Ptregs C.struct_pt_regs
+type Gregset C.elf_gregset_t
+type FPregset C.elf_fpregset_t
+type Vreg C.elf_vrreg_t
+
+type StackT C.stack_t
+
+// PPC64 uses sigcontext in place of mcontext in ucontext.
+// see https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/uapi/asm/ucontext.h
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
diff --git a/src/runtime/defs_aix.go b/src/runtime/defs_aix.go
new file mode 100644
index 0000000..2f28e53
--- /dev/null
+++ b/src/runtime/defs_aix.go
@@ -0,0 +1,172 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo -godefs
+GOARCH=ppc64 go tool cgo -godefs defs_aix.go > defs_aix_ppc64_tmp.go
+
+This is only a helper to create defs_aix_ppc64.go
+Go runtime functions require the "linux" names of fields (ss_sp, si_addr, etc.).
+However, AIX structures don't provide such names and must be modified.
+
+TODO(aix): create a script to automatise defs_aix creation.
+
+Modifications made:
+ - sigset replaced by a [4]uint64 array
+ - add sigset_all variable
+ - siginfo.si_addr uintptr instead of *byte
+ - add (*timeval) set_usec
+ - stackt.ss_sp uintptr instead of *byte
+ - stackt.ss_size uintptr instead of uint64
+ - sigcontext.sc_jmpbuf context64 instead of jmpbuf
+ - ucontext.__extctx is a uintptr because we don't need the extctx struct
+ - ucontext.uc_mcontext: replace the jmpbuf structure with the context64 structure
+ - sigaction.sa_handler represents union field as both are uintptr
+ - tstate.* replace *byte by uintptr
+
+
+*/
+
+package runtime
+
+/*
+
+#include <sys/types.h>
+#include <sys/errno.h>
+#include <sys/time.h>
+#include <sys/signal.h>
+#include <sys/mman.h>
+#include <sys/thread.h>
+#include <sys/resource.h>
+
+#include <unistd.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <semaphore.h>
+*/
+import "C"
+
+const (
+ _EPERM = C.EPERM
+ _ENOENT = C.ENOENT
+ _EINTR = C.EINTR
+ _EAGAIN = C.EAGAIN
+ _ENOMEM = C.ENOMEM
+ _EACCES = C.EACCES
+ _EFAULT = C.EFAULT
+ _EINVAL = C.EINVAL
+ _ETIMEDOUT = C.ETIMEDOUT
+
+ _PROT_NONE = C.PROT_NONE
+ _PROT_READ = C.PROT_READ
+ _PROT_WRITE = C.PROT_WRITE
+ _PROT_EXEC = C.PROT_EXEC
+
+ _MAP_ANON = C.MAP_ANONYMOUS
+ _MAP_PRIVATE = C.MAP_PRIVATE
+ _MAP_FIXED = C.MAP_FIXED
+ _MADV_DONTNEED = C.MADV_DONTNEED
+
+ _SIGHUP = C.SIGHUP
+ _SIGINT = C.SIGINT
+ _SIGQUIT = C.SIGQUIT
+ _SIGILL = C.SIGILL
+ _SIGTRAP = C.SIGTRAP
+ _SIGABRT = C.SIGABRT
+ _SIGBUS = C.SIGBUS
+ _SIGFPE = C.SIGFPE
+ _SIGKILL = C.SIGKILL
+ _SIGUSR1 = C.SIGUSR1
+ _SIGSEGV = C.SIGSEGV
+ _SIGUSR2 = C.SIGUSR2
+ _SIGPIPE = C.SIGPIPE
+ _SIGALRM = C.SIGALRM
+ _SIGCHLD = C.SIGCHLD
+ _SIGCONT = C.SIGCONT
+ _SIGSTOP = C.SIGSTOP
+ _SIGTSTP = C.SIGTSTP
+ _SIGTTIN = C.SIGTTIN
+ _SIGTTOU = C.SIGTTOU
+ _SIGURG = C.SIGURG
+ _SIGXCPU = C.SIGXCPU
+ _SIGXFSZ = C.SIGXFSZ
+ _SIGVTALRM = C.SIGVTALRM
+ _SIGPROF = C.SIGPROF
+ _SIGWINCH = C.SIGWINCH
+ _SIGIO = C.SIGIO
+ _SIGPWR = C.SIGPWR
+ _SIGSYS = C.SIGSYS
+ _SIGTERM = C.SIGTERM
+ _SIGEMT = C.SIGEMT
+ _SIGWAITING = C.SIGWAITING
+
+ _FPE_INTDIV = C.FPE_INTDIV
+ _FPE_INTOVF = C.FPE_INTOVF
+ _FPE_FLTDIV = C.FPE_FLTDIV
+ _FPE_FLTOVF = C.FPE_FLTOVF
+ _FPE_FLTUND = C.FPE_FLTUND
+ _FPE_FLTRES = C.FPE_FLTRES
+ _FPE_FLTINV = C.FPE_FLTINV
+ _FPE_FLTSUB = C.FPE_FLTSUB
+
+ _BUS_ADRALN = C.BUS_ADRALN
+ _BUS_ADRERR = C.BUS_ADRERR
+ _BUS_OBJERR = C.BUS_OBJERR
+
+ _SEGV_MAPERR = C.SEGV_MAPERR
+ _SEGV_ACCERR = C.SEGV_ACCERR
+
+ _ITIMER_REAL = C.ITIMER_REAL
+ _ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ _ITIMER_PROF = C.ITIMER_PROF
+
+ _O_RDONLY = C.O_RDONLY
+ _O_WRONLY = C.O_WRONLY
+ _O_NONBLOCK = C.O_NONBLOCK
+ _O_CREAT = C.O_CREAT
+ _O_TRUNC = C.O_TRUNC
+
+ _SS_DISABLE = C.SS_DISABLE
+ _SI_USER = C.SI_USER
+ _SIG_BLOCK = C.SIG_BLOCK
+ _SIG_UNBLOCK = C.SIG_UNBLOCK
+ _SIG_SETMASK = C.SIG_SETMASK
+
+ _SA_SIGINFO = C.SA_SIGINFO
+ _SA_RESTART = C.SA_RESTART
+ _SA_ONSTACK = C.SA_ONSTACK
+
+ _PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ __SC_PAGE_SIZE = C._SC_PAGE_SIZE
+ __SC_NPROCESSORS_ONLN = C._SC_NPROCESSORS_ONLN
+
+ _F_SETFL = C.F_SETFL
+ _F_GETFD = C.F_GETFD
+ _F_GETFL = C.F_GETFL
+)
+
+type sigset C.sigset_t
+type siginfo C.siginfo_t
+type timespec C.struct_timespec
+type timestruc C.struct_timestruc_t
+type timeval C.struct_timeval
+type itimerval C.struct_itimerval
+
+type stackt C.stack_t
+type sigcontext C.struct_sigcontext
+type ucontext C.ucontext_t
+type _Ctype_struct___extctx uint64 // ucontext uses a pointer to this structure, but it shouldn't be used
+type jmpbuf C.struct___jmpbuf
+type context64 C.struct___context64
+type sigactiont C.struct_sigaction
+type tstate C.struct_tstate
+type rusage C.struct_rusage
+
+type pthread C.pthread_t
+type pthread_attr C.pthread_attr_t
+
+type semt C.sem_t
diff --git a/src/runtime/defs_aix_ppc64.go b/src/runtime/defs_aix_ppc64.go
new file mode 100644
index 0000000..8e85096
--- /dev/null
+++ b/src/runtime/defs_aix_ppc64.go
@@ -0,0 +1,212 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix
+
+package runtime
+
+const (
+ _EPERM = 0x1
+ _ENOENT = 0x2
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+ _EACCES = 0xd
+ _EFAULT = 0xe
+ _EINVAL = 0x16
+ _ETIMEDOUT = 0x4e
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x10
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x100
+ _MADV_DONTNEED = 0x4
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0xa
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0x1e
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0x1f
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGCHLD = 0x14
+ _SIGCONT = 0x13
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x10
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x22
+ _SIGPROF = 0x20
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x17
+ _SIGPWR = 0x1d
+ _SIGSYS = 0xc
+ _SIGTERM = 0xf
+ _SIGEMT = 0x7
+ _SIGWAITING = 0x27
+
+ _FPE_INTDIV = 0x14
+ _FPE_INTOVF = 0x15
+ _FPE_FLTDIV = 0x16
+ _FPE_FLTOVF = 0x17
+ _FPE_FLTUND = 0x18
+ _FPE_FLTRES = 0x19
+ _FPE_FLTINV = 0x1a
+ _FPE_FLTSUB = 0x1b
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x32
+ _SEGV_ACCERR = 0x33
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x100
+ _O_TRUNC = 0x200
+
+ _SS_DISABLE = 0x2
+ _SI_USER = 0x0
+ _SIG_BLOCK = 0x0
+ _SIG_UNBLOCK = 0x1
+ _SIG_SETMASK = 0x2
+
+ _SA_SIGINFO = 0x100
+ _SA_RESTART = 0x8
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ __SC_PAGE_SIZE = 0x30
+ __SC_NPROCESSORS_ONLN = 0x48
+
+ _F_SETFL = 0x4
+ _F_GETFD = 0x1
+ _F_GETFL = 0x3
+)
+
+type sigset [4]uint64
+
+var sigset_all = sigset{^uint64(0), ^uint64(0), ^uint64(0), ^uint64(0)}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_band int64
+ si_value [2]int32 // [8]byte
+ __si_flags int32
+ __pad [3]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ __pad [4]int32
+ pas_cgo_0 [4]byte
+}
+
+type sigcontext struct {
+ sc_onstack int32
+ pad_cgo_0 [4]byte
+ sc_mask sigset
+ sc_uerror int32
+ sc_jmpbuf context64
+}
+
+type ucontext struct {
+ __sc_onstack int32
+ pad_cgo_0 [4]byte
+ uc_sigmask sigset
+ __sc_error int32
+ pad_cgo_1 [4]byte
+ uc_mcontext context64
+ uc_link *ucontext
+ uc_stack stackt
+ __extctx uintptr // pointer to struct __extctx but we don't use it
+ __extctx_magic int32
+ __pad int32
+}
+
+type context64 struct {
+ gpr [32]uint64
+ msr uint64
+ iar uint64
+ lr uint64
+ ctr uint64
+ cr uint32
+ xer uint32
+ fpscr uint32
+ fpscrx uint32
+ except [1]uint64
+ fpr [32]float64
+ fpeu uint8
+ fpinfo uint8
+ fpscr24_31 uint8
+ pad [1]uint8
+ excp_type int32
+}
+
+type sigactiont struct {
+	sa_handler uintptr // a union of two pointers
+ sa_mask sigset
+ sa_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type pthread uint32
+type pthread_attr *byte
+
+type semt int32
diff --git a/src/runtime/defs_arm_linux.go b/src/runtime/defs_arm_linux.go
new file mode 100644
index 0000000..805735b
--- /dev/null
+++ b/src/runtime/defs_arm_linux.go
@@ -0,0 +1,124 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+On a Debian Lenny arm linux distribution:
+
+cgo -cdefs defs_arm.c >arm/defs.h
+*/
+
+package runtime
+
+/*
+#cgo CFLAGS: -I/usr/src/linux-headers-2.6.26-2-versatile/include
+
+#define __ARCH_SI_UID_T int
+#include <asm/signal.h>
+#include <asm/mman.h>
+#include <asm/sigcontext.h>
+#include <asm/ucontext.h>
+#include <asm/siginfo.h>
+#include <linux/time.h>
+
+struct xsiginfo {
+ int si_signo;
+ int si_errno;
+ int si_code;
+ char _sifields[4];
+};
+
+#undef sa_handler
+#undef sa_flags
+#undef sa_restorer
+#undef sa_mask
+
+struct xsigaction {
+ void (*sa_handler)(void);
+ unsigned long sa_flags;
+ void (*sa_restorer)(void);
+ unsigned int sa_mask; // mask last for extensibility
+};
+*/
+import "C"
+
+const (
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_RESTORER = C.SA_RESTORER
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ FPE_INTDIV = C.FPE_INTDIV & 0xFFFF
+ FPE_INTOVF = C.FPE_INTOVF & 0xFFFF
+ FPE_FLTDIV = C.FPE_FLTDIV & 0xFFFF
+ FPE_FLTOVF = C.FPE_FLTOVF & 0xFFFF
+ FPE_FLTUND = C.FPE_FLTUND & 0xFFFF
+ FPE_FLTRES = C.FPE_FLTRES & 0xFFFF
+ FPE_FLTINV = C.FPE_FLTINV & 0xFFFF
+ FPE_FLTSUB = C.FPE_FLTSUB & 0xFFFF
+
+ BUS_ADRALN = C.BUS_ADRALN & 0xFFFF
+ BUS_ADRERR = C.BUS_ADRERR & 0xFFFF
+ BUS_OBJERR = C.BUS_OBJERR & 0xFFFF
+
+ SEGV_MAPERR = C.SEGV_MAPERR & 0xFFFF
+ SEGV_ACCERR = C.SEGV_ACCERR & 0xFFFF
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_PROF = C.ITIMER_PROF
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+)
+
+type Timespec C.struct_timespec
+type StackT C.stack_t
+type Sigcontext C.struct_sigcontext
+type Ucontext C.struct_ucontext
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+type Siginfo C.struct_xsiginfo
+type Sigaction C.struct_xsigaction
diff --git a/src/runtime/defs_darwin.go b/src/runtime/defs_darwin.go
new file mode 100644
index 0000000..9c6eeee
--- /dev/null
+++ b/src/runtime/defs_darwin.go
@@ -0,0 +1,165 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_darwin.go >defs_darwin_amd64.h
+*/
+
+package runtime
+
+/*
+#define __DARWIN_UNIX03 0
+#include <mach/mach_time.h>
+#include <sys/types.h>
+#include <sys/time.h>
+#include <errno.h>
+#include <signal.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <pthread.h>
+#include <fcntl.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ETIMEDOUT = C.ETIMEDOUT
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_FREE_REUSABLE = C.MADV_FREE_REUSABLE
+ MADV_FREE_REUSE = C.MADV_FREE_REUSE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_USERTRAMP = C.SA_USERTRAMP
+ SA_64REGSET = C.SA_64REGSET
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = C.EV_RECEIPT
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+
+ O_WRONLY = C.O_WRONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CREAT = C.O_CREAT
+ O_TRUNC = C.O_TRUNC
+)
+
+type StackT C.struct_sigaltstack
+type Sighandler C.union___sigaction_u
+
+type Sigaction C.struct___sigaction // used in syscalls
+type Usigaction C.struct_sigaction // used by sigaction's second argument
+type Sigset C.sigset_t
+type Sigval C.union_sigval
+type Siginfo C.siginfo_t
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+type Timespec C.struct_timespec
+
+type FPControl C.struct_fp_control
+type FPStatus C.struct_fp_status
+type RegMMST C.struct_mmst_reg
+type RegXMM C.struct_xmm_reg
+
+type Regs64 C.struct_x86_thread_state64
+type FloatState64 C.struct_x86_float_state64
+type ExceptionState64 C.struct_x86_exception_state64
+type Mcontext64 C.struct_mcontext64
+
+type Regs32 C.struct_i386_thread_state
+type FloatState32 C.struct_i386_float_state
+type ExceptionState32 C.struct_i386_exception_state
+type Mcontext32 C.struct_mcontext32
+
+type Ucontext C.struct_ucontext
+
+type Kevent C.struct_kevent
+
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+type PthreadMutex C.pthread_mutex_t
+type PthreadMutexAttr C.pthread_mutexattr_t
+type PthreadCond C.pthread_cond_t
+type PthreadCondAttr C.pthread_condattr_t
+
+type MachTimebaseInfo C.mach_timebase_info_data_t
diff --git a/src/runtime/defs_darwin_amd64.go b/src/runtime/defs_darwin_amd64.go
new file mode 100644
index 0000000..fc7de33
--- /dev/null
+++ b/src/runtime/defs_darwin_amd64.go
@@ -0,0 +1,373 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_darwin.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+ _MADV_FREE_REUSABLE = 0x7
+ _MADV_FREE_REUSE = 0x8
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+ _SA_USERTRAMP = 0x100
+ _SA_64REGSET = 0x200
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x7
+ _FPE_INTOVF = 0x8
+ _FPE_FLTDIV = 0x1
+ _FPE_FLTOVF = 0x2
+ _FPE_FLTUND = 0x3
+ _FPE_FLTRES = 0x4
+ _FPE_FLTINV = 0x5
+ _FPE_FLTSUB = 0x6
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+
+ _PTHREAD_CREATE_DETACHED = 0x2
+
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type sigactiont struct {
+ __sigaction_u [8]byte
+ sa_tramp unsafe.Pointer
+ sa_mask uint32
+ sa_flags int32
+}
+
+type usigactiont struct {
+ __sigaction_u [8]byte
+ sa_mask uint32
+ sa_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ si_band int64
+ __pad [7]uint64
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type fpcontrol struct {
+ pad_cgo_0 [2]byte
+}
+
+type fpstatus struct {
+ pad_cgo_0 [2]byte
+}
+
+type regmmst struct {
+ mmst_reg [10]int8
+ mmst_rsrv [6]int8
+}
+
+type regxmm struct {
+ xmm_reg [16]int8
+}
+
+type regs64 struct {
+ rax uint64
+ rbx uint64
+ rcx uint64
+ rdx uint64
+ rdi uint64
+ rsi uint64
+ rbp uint64
+ rsp uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rip uint64
+ rflags uint64
+ cs uint64
+ fs uint64
+ gs uint64
+}
+
+type floatstate64 struct {
+ fpu_reserved [2]int32
+ fpu_fcw fpcontrol
+ fpu_fsw fpstatus
+ fpu_ftw uint8
+ fpu_rsrv1 uint8
+ fpu_fop uint16
+ fpu_ip uint32
+ fpu_cs uint16
+ fpu_rsrv2 uint16
+ fpu_dp uint32
+ fpu_ds uint16
+ fpu_rsrv3 uint16
+ fpu_mxcsr uint32
+ fpu_mxcsrmask uint32
+ fpu_stmm0 regmmst
+ fpu_stmm1 regmmst
+ fpu_stmm2 regmmst
+ fpu_stmm3 regmmst
+ fpu_stmm4 regmmst
+ fpu_stmm5 regmmst
+ fpu_stmm6 regmmst
+ fpu_stmm7 regmmst
+ fpu_xmm0 regxmm
+ fpu_xmm1 regxmm
+ fpu_xmm2 regxmm
+ fpu_xmm3 regxmm
+ fpu_xmm4 regxmm
+ fpu_xmm5 regxmm
+ fpu_xmm6 regxmm
+ fpu_xmm7 regxmm
+ fpu_xmm8 regxmm
+ fpu_xmm9 regxmm
+ fpu_xmm10 regxmm
+ fpu_xmm11 regxmm
+ fpu_xmm12 regxmm
+ fpu_xmm13 regxmm
+ fpu_xmm14 regxmm
+ fpu_xmm15 regxmm
+ fpu_rsrv4 [96]int8
+ fpu_reserved1 int32
+}
+
+type exceptionstate64 struct {
+ trapno uint16
+ cpu uint16
+ err uint32
+ faultvaddr uint64
+}
+
+type mcontext64 struct {
+ es exceptionstate64
+ ss regs64
+ fs floatstate64
+ pad_cgo_0 [4]byte
+}
+
+type regs32 struct {
+ eax uint32
+ ebx uint32
+ ecx uint32
+ edx uint32
+ edi uint32
+ esi uint32
+ ebp uint32
+ esp uint32
+ ss uint32
+ eflags uint32
+ eip uint32
+ cs uint32
+ ds uint32
+ es uint32
+ fs uint32
+ gs uint32
+}
+
+type floatstate32 struct {
+ fpu_reserved [2]int32
+ fpu_fcw fpcontrol
+ fpu_fsw fpstatus
+ fpu_ftw uint8
+ fpu_rsrv1 uint8
+ fpu_fop uint16
+ fpu_ip uint32
+ fpu_cs uint16
+ fpu_rsrv2 uint16
+ fpu_dp uint32
+ fpu_ds uint16
+ fpu_rsrv3 uint16
+ fpu_mxcsr uint32
+ fpu_mxcsrmask uint32
+ fpu_stmm0 regmmst
+ fpu_stmm1 regmmst
+ fpu_stmm2 regmmst
+ fpu_stmm3 regmmst
+ fpu_stmm4 regmmst
+ fpu_stmm5 regmmst
+ fpu_stmm6 regmmst
+ fpu_stmm7 regmmst
+ fpu_xmm0 regxmm
+ fpu_xmm1 regxmm
+ fpu_xmm2 regxmm
+ fpu_xmm3 regxmm
+ fpu_xmm4 regxmm
+ fpu_xmm5 regxmm
+ fpu_xmm6 regxmm
+ fpu_xmm7 regxmm
+ fpu_rsrv4 [224]int8
+ fpu_reserved1 int32
+}
+
+type exceptionstate32 struct {
+ trapno uint16
+ cpu uint16
+ err uint32
+ faultvaddr uint32
+}
+
+type mcontext32 struct {
+ es exceptionstate32
+ ss regs32
+ fs floatstate32
+}
+
+type ucontext struct {
+ uc_onstack int32
+ uc_sigmask uint32
+ uc_stack stackt
+ uc_link *ucontext
+ uc_mcsize uint64
+ uc_mcontext *mcontext64
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutex struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutexattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+type pthreadcond struct {
+ X__sig int64
+ X__opaque [40]int8
+}
+type pthreadcondattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+
+type machTimebaseInfo struct {
+ numer uint32
+ denom uint32
+}
diff --git a/src/runtime/defs_darwin_arm64.go b/src/runtime/defs_darwin_arm64.go
new file mode 100644
index 0000000..e26df02
--- /dev/null
+++ b/src/runtime/defs_darwin_arm64.go
@@ -0,0 +1,240 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_darwin.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+ _MADV_FREE_REUSABLE = 0x7
+ _MADV_FREE_REUSE = 0x8
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+ _SA_USERTRAMP = 0x100
+ _SA_64REGSET = 0x200
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x7
+ _FPE_INTOVF = 0x8
+ _FPE_FLTDIV = 0x1
+ _FPE_FLTOVF = 0x2
+ _FPE_FLTUND = 0x3
+ _FPE_FLTRES = 0x4
+ _FPE_FLTINV = 0x5
+ _FPE_FLTSUB = 0x6
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+
+ _PTHREAD_CREATE_DETACHED = 0x2
+
+ _PTHREAD_KEYS_MAX = 512
+
+ _F_GETFL = 0x3
+ _F_SETFL = 0x4
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type sigactiont struct {
+ __sigaction_u [8]byte
+ sa_tramp unsafe.Pointer
+ sa_mask uint32
+ sa_flags int32
+}
+
+type usigactiont struct {
+ __sigaction_u [8]byte
+ sa_mask uint32
+ sa_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr *byte
+ si_value [8]byte
+ si_band int64
+ __pad [7]uint64
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type exceptionstate64 struct {
+ far uint64 // virtual fault addr
+ esr uint32 // exception syndrome
+ exc uint32 // number of arm exception taken
+}
+
+type regs64 struct {
+ x [29]uint64 // registers x0 to x28
+ fp uint64 // frame register, x29
+ lr uint64 // link register, x30
+ sp uint64 // stack pointer, x31
+ pc uint64 // program counter
+ cpsr uint32 // current program status register
+ __pad uint32
+}
+
+type neonstate64 struct {
+ v [64]uint64 // actually [32]uint128
+ fpsr uint32
+ fpcr uint32
+}
+
+type mcontext64 struct {
+ es exceptionstate64
+ ss regs64
+ ns neonstate64
+}
+
+type ucontext struct {
+ uc_onstack int32
+ uc_sigmask uint32
+ uc_stack stackt
+ uc_link *ucontext
+ uc_mcsize uint64
+ uc_mcontext *mcontext64
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutex struct {
+ X__sig int64
+ X__opaque [56]int8
+}
+type pthreadmutexattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+type pthreadcond struct {
+ X__sig int64
+ X__opaque [40]int8
+}
+type pthreadcondattr struct {
+ X__sig int64
+ X__opaque [8]int8
+}
+
+type machTimebaseInfo struct {
+ numer uint32
+ denom uint32
+}
+
+type pthreadkey uint64
diff --git a/src/runtime/defs_dragonfly.go b/src/runtime/defs_dragonfly.go
new file mode 100644
index 0000000..9dcfdf0
--- /dev/null
+++ b/src/runtime/defs_dragonfly.go
@@ -0,0 +1,132 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_dragonfly.go >defs_dragonfly_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/user.h>
+#include <sys/time.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/ucontext.h>
+#include <sys/rtprio.h>
+#include <sys/signal.h>
+#include <sys/unistd.h>
+#include <errno.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EBUSY = C.EBUSY
+ EAGAIN = C.EAGAIN
+
+ O_WRONLY = C.O_WRONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CREAT = C.O_CREAT
+ O_TRUNC = C.O_TRUNC
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Rtprio C.struct_rtprio
+type Lwpparams C.struct_lwp_params
+type Sigset C.struct___sigset
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type Kevent C.struct_kevent
diff --git a/src/runtime/defs_dragonfly_amd64.go b/src/runtime/defs_dragonfly_amd64.go
new file mode 100644
index 0000000..f1a2302
--- /dev/null
+++ b/src/runtime/defs_dragonfly_amd64.go
@@ -0,0 +1,211 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_dragonfly.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EBUSY = 0x10
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x20000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type lwpparams struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack uintptr
+ tid1 unsafe.Pointer // *int32
+ tid2 unsafe.Pointer // *int32
+}
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ si_band int64
+ __spare__ [7]int32
+ pad_cgo_0 [4]byte
+}
+
+type mcontext struct {
+ mc_onstack uint64
+ mc_rdi uint64
+ mc_rsi uint64
+ mc_rdx uint64
+ mc_rcx uint64
+ mc_r8 uint64
+ mc_r9 uint64
+ mc_rax uint64
+ mc_rbx uint64
+ mc_rbp uint64
+ mc_r10 uint64
+ mc_r11 uint64
+ mc_r12 uint64
+ mc_r13 uint64
+ mc_r14 uint64
+ mc_r15 uint64
+ mc_xflags uint64
+ mc_trapno uint64
+ mc_addr uint64
+ mc_flags uint64
+ mc_err uint64
+ mc_rip uint64
+ mc_cs uint64
+ mc_rflags uint64
+ mc_rsp uint64
+ mc_ss uint64
+ mc_len uint32
+ mc_fpformat uint32
+ mc_ownedfp uint32
+ mc_reserved uint32
+ mc_unused [8]uint32
+ mc_fpregs [256]int32
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ pad_cgo_0 [48]byte
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ __spare__ [8]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
diff --git a/src/runtime/defs_freebsd.go b/src/runtime/defs_freebsd.go
new file mode 100644
index 0000000..d86ae91
--- /dev/null
+++ b/src/runtime/defs_freebsd.go
@@ -0,0 +1,174 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_freebsd.go >defs_freebsd_amd64.h
+GOARCH=386 go tool cgo -cdefs defs_freebsd.go >defs_freebsd_386.h
+GOARCH=arm go tool cgo -cdefs defs_freebsd.go >defs_freebsd_arm.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/time.h>
+#include <signal.h>
+#include <errno.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/ucontext.h>
+#include <sys/umtx.h>
+#include <sys/_umtx.h>
+#include <sys/rtprio.h>
+#include <sys/thr.h>
+#include <sys/_sigset.h>
+#include <sys/unistd.h>
+#include <sys/sysctl.h>
+#include <sys/cpuset.h>
+#include <sys/param.h>
+#include <sys/vdso.h>
+*/
+import "C"
+
+// Local consts.
+const (
+ _NBBY = C.NBBY // Number of bits in a byte.
+ _CTL_MAXNAME = C.CTL_MAXNAME // Largest number of components supported.
+ _CPU_LEVEL_WHICH = C.CPU_LEVEL_WHICH // Actual mask/id for which.
+ _CPU_WHICH_PID = C.CPU_WHICH_PID // Specifies a process id.
+)
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ ETIMEDOUT = C.ETIMEDOUT
+
+ O_WRONLY = C.O_WRONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CREAT = C.O_CREAT
+ O_TRUNC = C.O_TRUNC
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_SHARED = C.MAP_SHARED
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ CLOCK_MONOTONIC = C.CLOCK_MONOTONIC
+ CLOCK_REALTIME = C.CLOCK_REALTIME
+
+ UMTX_OP_WAIT_UINT = C.UMTX_OP_WAIT_UINT
+ UMTX_OP_WAIT_UINT_PRIVATE = C.UMTX_OP_WAIT_UINT_PRIVATE
+ UMTX_OP_WAKE = C.UMTX_OP_WAKE
+ UMTX_OP_WAKE_PRIVATE = C.UMTX_OP_WAKE_PRIVATE
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = C.EV_RECEIPT
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Rtprio C.struct_rtprio
+type ThrParam C.struct_thr_param
+type Sigset C.struct___sigset
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type Umtx_time C.struct__umtx_time
+
+type KeventT C.struct_kevent
+
+type bintime C.struct_bintime
+type vdsoTimehands C.struct_vdso_timehands
+type vdsoTimekeep C.struct_vdso_timekeep
+
+const (
+ _VDSO_TK_VER_CURR = C.VDSO_TK_VER_CURR
+
+ vdsoTimehandsSize = C.sizeof_struct_vdso_timehands
+ vdsoTimekeepSize = C.sizeof_struct_vdso_timekeep
+)
diff --git a/src/runtime/defs_freebsd_386.go b/src/runtime/defs_freebsd_386.go
new file mode 100644
index 0000000..ee82741
--- /dev/null
+++ b/src/runtime/defs_freebsd_386.go
@@ -0,0 +1,270 @@
+// Code generated by cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int32
+ parent_tid *int32
+ flags int32
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int32 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_value [4]byte
+ _reason [32]byte
+}
+
+type mcontext struct {
+ mc_onstack uint32
+ mc_gs uint32
+ mc_fs uint32
+ mc_es uint32
+ mc_ds uint32
+ mc_edi uint32
+ mc_esi uint32
+ mc_ebp uint32
+ mc_isp uint32
+ mc_ebx uint32
+ mc_edx uint32
+ mc_ecx uint32
+ mc_eax uint32
+ mc_trapno uint32
+ mc_err uint32
+ mc_eip uint32
+ mc_cs uint32
+ mc_eflags uint32
+ mc_esp uint32
+ mc_ss uint32
+ mc_len uint32
+ mc_fpformat uint32
+ mc_ownedfp uint32
+ mc_flags uint32
+ mc_fpstate [128]uint32
+ mc_fsbase uint32
+ mc_gsbase uint32
+ mc_xfpustate uint32
+ mc_xfpustate_len uint32
+ mc_spare2 [4]uint32
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+ ext [4]uint64
+}
+
+type bintime struct {
+ sec int32
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ x86_shift uint32
+ x86_hpet_idx uint32
+ res [6]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x50
+ vdsoTimekeepSize = 0xc
+)
diff --git a/src/runtime/defs_freebsd_amd64.go b/src/runtime/defs_freebsd_amd64.go
new file mode 100644
index 0000000..9003f92
--- /dev/null
+++ b/src/runtime/defs_freebsd_amd64.go
@@ -0,0 +1,282 @@
+// Code generated by cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int64
+ parent_tid *int64
+ flags int32
+ pad_cgo_0 [4]byte
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int64 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ _reason [40]byte
+}
+
+type mcontext struct {
+ mc_onstack uint64
+ mc_rdi uint64
+ mc_rsi uint64
+ mc_rdx uint64
+ mc_rcx uint64
+ mc_r8 uint64
+ mc_r9 uint64
+ mc_rax uint64
+ mc_rbx uint64
+ mc_rbp uint64
+ mc_r10 uint64
+ mc_r11 uint64
+ mc_r12 uint64
+ mc_r13 uint64
+ mc_r14 uint64
+ mc_r15 uint64
+ mc_trapno uint32
+ mc_fs uint16
+ mc_gs uint16
+ mc_addr uint64
+ mc_flags uint32
+ mc_es uint16
+ mc_ds uint16
+ mc_err uint64
+ mc_rip uint64
+ mc_cs uint64
+ mc_rflags uint64
+ mc_rsp uint64
+ mc_ss uint64
+ mc_len uint64
+ mc_fpformat uint64
+ mc_ownedfp uint64
+ mc_fpstate [64]uint64
+ mc_fsbase uint64
+ mc_gsbase uint64
+ mc_xfpustate uint64
+ mc_xfpustate_len uint64
+ mc_spare [4]uint64
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+ ext [4]uint64
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ x86_shift uint32
+ x86_hpet_idx uint32
+ res [6]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_freebsd_arm.go b/src/runtime/defs_freebsd_arm.go
new file mode 100644
index 0000000..68cc1b9
--- /dev/null
+++ b/src/runtime/defs_freebsd_arm.go
@@ -0,0 +1,245 @@
+// Code generated by cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int32
+ parent_tid *int32
+ flags int32
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int32 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uintptr
+ si_value [4]byte
+ _reason [32]byte
+}
+
+type mcontext struct {
+ __gregs [17]uint32
+ __fpu [140]byte
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ pad_cgo_0 [4]byte
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+ pad_cgo_1 [4]byte
+ ext [4]uint64
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ physical uint32
+ res [7]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_freebsd_arm64.go b/src/runtime/defs_freebsd_arm64.go
new file mode 100644
index 0000000..1d67236
--- /dev/null
+++ b/src/runtime/defs_freebsd_arm64.go
@@ -0,0 +1,265 @@
+// Code generated by cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int64
+ parent_tid *int64
+ flags int32
+ pad_cgo_0 [4]byte
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int64 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ _reason [40]byte
+}
+
+type gpregs struct {
+ gp_x [30]uint64
+ gp_lr uint64
+ gp_sp uint64
+ gp_elr uint64
+ gp_spsr uint32
+ gp_pad int32
+}
+
+type fpregs struct {
+ fp_q [64]uint64 // actually [32]uint128
+ fp_sr uint32
+ fp_cr uint32
+ fp_flags int32
+ fp_pad int32
+}
+
+type mcontext struct {
+ mc_gpregs gpregs
+ mc_fpregs fpregs
+ mc_flags int32
+ mc_pad int32
+ mc_spare [8]uint64
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+ ext [4]uint64
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ physical uint32
+ res [7]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_freebsd_riscv64.go b/src/runtime/defs_freebsd_riscv64.go
new file mode 100644
index 0000000..b977bde
--- /dev/null
+++ b/src/runtime/defs_freebsd_riscv64.go
@@ -0,0 +1,266 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_freebsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _NBBY = 0x8
+ _CTL_MAXNAME = 0x18
+ _CPU_LEVEL_WHICH = 0x3
+ _CPU_WHICH_PID = 0x2
+)
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+ _ETIMEDOUT = 0x3c
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x100000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_SHARED = 0x1
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x5
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _CLOCK_MONOTONIC = 0x4
+ _CLOCK_REALTIME = 0x0
+
+ _UMTX_OP_WAIT_UINT = 0xb
+ _UMTX_OP_WAIT_UINT_PRIVATE = 0xf
+ _UMTX_OP_WAKE = 0x3
+ _UMTX_OP_WAKE_PRIVATE = 0x10
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x2
+ _FPE_INTOVF = 0x1
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_RECEIPT = 0x40
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type rtprio struct {
+ _type uint16
+ prio uint16
+}
+
+type thrparam struct {
+ start_func uintptr
+ arg unsafe.Pointer
+ stack_base uintptr
+ stack_size uintptr
+ tls_base unsafe.Pointer
+ tls_size uintptr
+ child_tid unsafe.Pointer // *int64
+ parent_tid *int64
+ flags int32
+ pad_cgo_0 [4]byte
+ rtp *rtprio
+ spare [3]uintptr
+}
+
+type thread int64 // long
+
+type sigset struct {
+ __bits [4]uint32
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type siginfo struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ si_pid int32
+ si_uid uint32
+ si_status int32
+ si_addr uint64
+ si_value [8]byte
+ _reason [40]byte
+}
+
+type gpregs struct {
+ gp_ra uint64
+ gp_sp uint64
+ gp_gp uint64
+ gp_tp uint64
+ gp_t [7]uint64
+ gp_s [12]uint64
+ gp_a [8]uint64
+ gp_sepc uint64
+ gp_sstatus uint64
+}
+
+type fpregs struct {
+ fp_x [64]uint64 // actually __uint64_t fp_x[32][2]
+ fp_fcsr uint64
+ fp_flags int32
+ pad int32
+}
+
+type mcontext struct {
+ mc_gpregs gpregs
+ mc_fpregs fpregs
+ mc_flags int32
+ mc_pad int32
+ mc_spare [8]uint64
+}
+
+type ucontext struct {
+ uc_sigmask sigset
+ uc_mcontext mcontext
+ uc_link *ucontext
+ uc_stack stackt
+ uc_flags int32
+ __spare__ [4]int32
+ pad_cgo_0 [12]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type umtx_time struct {
+ _timeout timespec
+ _flags uint32
+ _clockid uint32
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+ ext [4]uint64
+}
+
+type bintime struct {
+ sec int64
+ frac uint64
+}
+
+type vdsoTimehands struct {
+ algo uint32
+ gen uint32
+ scale uint64
+ offset_count uint32
+ counter_mask uint32
+ offset bintime
+ boottime bintime
+ physical uint32
+ res [7]uint32
+}
+
+type vdsoTimekeep struct {
+ ver uint32
+ enabled uint32
+ current uint32
+ pad_cgo_0 [4]byte
+}
+
+const (
+ _VDSO_TK_VER_CURR = 0x1
+
+ vdsoTimehandsSize = 0x58
+ vdsoTimekeepSize = 0x10
+)
diff --git a/src/runtime/defs_illumos_amd64.go b/src/runtime/defs_illumos_amd64.go
new file mode 100644
index 0000000..9c5413b
--- /dev/null
+++ b/src/runtime/defs_illumos_amd64.go
@@ -0,0 +1,14 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _RCTL_LOCAL_DENY = 0x2
+
+ _RCTL_LOCAL_MAXIMAL = 0x80000000
+
+ _RCTL_FIRST = 0x0
+ _RCTL_NEXT = 0x1
+)
diff --git a/src/runtime/defs_linux.go b/src/runtime/defs_linux.go
new file mode 100644
index 0000000..296fcb4
--- /dev/null
+++ b/src/runtime/defs_linux.go
@@ -0,0 +1,127 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo -cdefs
+
+GOARCH=amd64 go tool cgo -cdefs defs_linux.go defs1_linux.go >defs_linux_amd64.h
+*/
+
+package runtime
+
+/*
+// Linux glibc and Linux kernel define different and conflicting
+// definitions for struct sigaction, struct timespec, etc.
+// We want the kernel ones, which are in the asm/* headers.
+// But then we'd get conflicts when we include the system
+// headers for things like ucontext_t, so that happens in
+// a separate file, defs1.go.
+
+#define _SYS_TYPES_H // avoid inclusion of sys/types.h
+#include <asm/posix_types.h>
+#define size_t __kernel_size_t
+#include <asm/signal.h>
+#include <asm/siginfo.h>
+#include <asm/mman.h>
+#include <asm-generic/errno.h>
+#include <asm-generic/poll.h>
+#include <linux/eventpoll.h>
+#include <linux/time.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EAGAIN = C.EAGAIN
+ ENOMEM = C.ENOMEM
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANONYMOUS
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+ MADV_HUGEPAGE = C.MADV_HUGEPAGE
+ MADV_NOHUGEPAGE = C.MADV_NOHUGEPAGE
+
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+ SA_SIGINFO = C.SA_SIGINFO
+
+ SI_KERNEL = C.SI_KERNEL
+ SI_TIMER = C.SI_TIMER
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGBUS = C.SIGBUS
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGUSR1 = C.SIGUSR1
+ SIGSEGV = C.SIGSEGV
+ SIGUSR2 = C.SIGUSR2
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGSTKFLT = C.SIGSTKFLT
+ SIGCHLD = C.SIGCHLD
+ SIGCONT = C.SIGCONT
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGURG = C.SIGURG
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGIO = C.SIGIO
+ SIGPWR = C.SIGPWR
+ SIGSYS = C.SIGSYS
+
+ SIGRTMIN = C.SIGRTMIN
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ CLOCK_THREAD_CPUTIME_ID = C.CLOCK_THREAD_CPUTIME_ID
+
+ SIGEV_THREAD_ID = C.SIGEV_THREAD_ID
+)
+
+type Sigset C.sigset_t
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Sigaction C.struct_sigaction
+type Siginfo C.siginfo_t
+type Itimerspec C.struct_itimerspec
+type Itimerval C.struct_itimerval
+type Sigevent C.struct_sigevent
diff --git a/src/runtime/defs_linux_386.go b/src/runtime/defs_linux_386.go
new file mode 100644
index 0000000..5fef556
--- /dev/null
+++ b/src/runtime/defs_linux_386.go
@@ -0,0 +1,253 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs2_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x4000000
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type fpreg struct {
+ significand [4]uint16
+ exponent uint16
+}
+
+type fpxreg struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg struct {
+ element [4]uint32
+}
+
+type fpstate struct {
+ cw uint32
+ sw uint32
+ tag uint32
+ ipoff uint32
+ cssel uint32
+ dataoff uint32
+ datasel uint32
+ _st [8]fpreg
+ status uint16
+ magic uint16
+ _fxsr_env [6]uint32
+ mxcsr uint32
+ reserved uint32
+ _fxsr_st [8]fpxreg
+ _xmm [8]xmmreg
+ padding1 [44]uint32
+ anon0 [48]byte
+}
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint32
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ gs uint16
+ __gsh uint16
+ fs uint16
+ __fsh uint16
+ es uint16
+ __esh uint16
+ ds uint16
+ __dsh uint16
+ edi uint32
+ esi uint32
+ ebp uint32
+ esp uint32
+ ebx uint32
+ edx uint32
+ ecx uint32
+ eax uint32
+ trapno uint32
+ err uint32
+ eip uint32
+ cs uint16
+ __csh uint16
+ eflags uint32
+ esp_at_signal uint32
+ ss uint16
+ __ssh uint16
+ fpstate *fpstate
+ oldmask uint32
+ cr2 uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint32
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
diff --git a/src/runtime/defs_linux_amd64.go b/src/runtime/defs_linux_amd64.go
new file mode 100644
index 0000000..dce7799
--- /dev/null
+++ b/src/runtime/defs_linux_amd64.go
@@ -0,0 +1,289 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs1_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x4000000
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs1_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type usigset struct {
+ __val [16]uint64
+}
+
+type fpxreg struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg struct {
+ element [4]uint32
+}
+
+type fpstate struct {
+ cwd uint16
+ swd uint16
+ ftw uint16
+ fop uint16
+ rip uint64
+ rdp uint64
+ mxcsr uint32
+ mxcr_mask uint32
+ _st [8]fpxreg
+ _xmm [16]xmmreg
+ padding [24]uint32
+}
+
+type fpxreg1 struct {
+ significand [4]uint16
+ exponent uint16
+ padding [3]uint16
+}
+
+type xmmreg1 struct {
+ element [4]uint32
+}
+
+type fpstate1 struct {
+ cwd uint16
+ swd uint16
+ ftw uint16
+ fop uint16
+ rip uint64
+ rdp uint64
+ mxcsr uint32
+ mxcr_mask uint32
+ _st [8]fpxreg1
+ _xmm [16]xmmreg1
+ padding [24]uint32
+}
+
+type fpreg1 struct {
+ significand [4]uint16
+ exponent uint16
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type mcontext struct {
+ gregs [23]uint64
+ fpregs *fpstate
+ __reserved1 [8]uint64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext mcontext
+ uc_sigmask usigset
+ __fpregs_mem fpstate
+}
+
+type sigcontext struct {
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rdi uint64
+ rsi uint64
+ rbp uint64
+ rbx uint64
+ rdx uint64
+ rax uint64
+ rcx uint64
+ rsp uint64
+ rip uint64
+ eflags uint64
+ cs uint16
+ gs uint16
+ fs uint16
+ __pad0 uint16
+ err uint64
+ trapno uint64
+ oldmask uint64
+ cr2 uint64
+ fpstate *fpstate1
+ __reserved1 [8]uint64
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
diff --git a/src/runtime/defs_linux_arm.go b/src/runtime/defs_linux_arm.go
new file mode 100644
index 0000000..71cf8c6
--- /dev/null
+++ b/src/runtime/defs_linux_arm.go
@@ -0,0 +1,207 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// Constants
+const (
+ _EINTR = 0x4
+ _ENOMEM = 0xc
+ _EAGAIN = 0xb
+
+ _PROT_NONE = 0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0 // unused on ARM
+ _SA_SIGINFO = 0x4
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+ _SIGRTMIN = 0x20
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+ _ITIMER_REAL = 0
+ _ITIMER_PROF = 0x2
+ _ITIMER_VIRTUAL = 0x1
+ _O_RDONLY = 0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ trap_no uint32
+ error_code uint32
+ oldmask uint32
+ r0 uint32
+ r1 uint32
+ r2 uint32
+ r3 uint32
+ r4 uint32
+ r5 uint32
+ r6 uint32
+ r7 uint32
+ r8 uint32
+ r9 uint32
+ r10 uint32
+ fp uint32
+ ip uint32
+ sp uint32
+ lr uint32
+ pc uint32
+ cpsr uint32
+ fault_address uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint32
+ __unused [31]int32
+ uc_regspace [128]uint32
+}
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint32
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
diff --git a/src/runtime/defs_linux_arm64.go b/src/runtime/defs_linux_arm64.go
new file mode 100644
index 0000000..606cd70
--- /dev/null
+++ b/src/runtime/defs_linux_arm64.go
@@ -0,0 +1,211 @@
+// Created by cgo -cdefs and converted (by hand) to Go
+// ../cmd/cgo/cgo -cdefs defs_linux.go defs1_linux.go defs2_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x0 // Only used on intel
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+
+ _AF_UNIX = 0x1
+ _SOCK_DGRAM = 0x2
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+// Created by cgo -cdefs and then converted to Go by hand
+// ../cmd/cgo/cgo -cdefs defs_linux.go defs1_linux.go defs2_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type usigset struct {
+ __val [16]uint64
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ fault_address uint64
+ /* AArch64 registers */
+ regs [31]uint64
+ sp uint64
+ pc uint64
+ pstate uint64
+ _pad [8]byte // __attribute__((__aligned__(16)))
+ __reserved [4096]byte
+}
+
+type sockaddr_un struct {
+ family uint16
+ path [108]byte
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ _pad [(1024 - 64) / 8]byte
+ _pad2 [8]byte // sigcontext must be aligned to 16-byte
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_loong64.go b/src/runtime/defs_linux_loong64.go
new file mode 100644
index 0000000..692d8c7
--- /dev/null
+++ b/src/runtime/defs_linux_loong64.go
@@ -0,0 +1,198 @@
+// Generated using cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_linux.go defs1_linux.go defs2_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+ _SA_RESTORER = 0x0
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
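
setNsec splits an absolute nanosecond count into the second/nanosecond pair the kernel expects; 1e9 is an untyped constant with an integer value, so both the division and the remainder stay in int64. A tiny standalone illustration of the same arithmetic outside the runtime:

package main

import "fmt"

func main() {
    ns := int64(2_750_000_123)  // 2.750000123 seconds as nanoseconds
    sec, nsec := ns/1e9, ns%1e9 // the same split performed by setNsec
    fmt.Println(sec, nsec)      // 2 750000123
}
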
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_mask uint64
+ // Linux on loong64 does not have the sa_restorer field, but the setsig
+ // function references it (for x86). Not much harm to include it at the end.
+ sa_restorer uintptr
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ __pad0 [1]int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type usigset struct {
+ val [16]uint64
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ sc_pc uint64
+ sc_regs [32]uint64
+ sc_flags uint32
+ sc_extcontext [0]uint64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask usigset
+ uc_x_unused [0]uint8
+ uc_pad_cgo_0 [8]byte
+ uc_mcontext sigcontext
+}
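
On the sa_restorer comment in sigactiont above: the shared Linux signal-setup code ORs _SA_RESTORER into sa_flags and, on x86, fills in sa_restorer, so ports whose kernel struct lacks the field define _SA_RESTORER = 0x0 and carry a trailing, kernel-invisible sa_restorer member so that shared code still compiles. A simplified sketch of that shape, reusing the constants from this file; it is not the runtime's setsig, whose details differ:

package main

import "fmt"

const (
    _SA_SIGINFO  = 0x4
    _SA_ONSTACK  = 0x8000000
    _SA_RESTART  = 0x10000000
    _SA_RESTORER = 0x0 // no-op on ports without a restorer, as on loong64
)

type sigactiont struct {
    sa_handler  uintptr
    sa_flags    uint64
    sa_mask     uint64
    sa_restorer uintptr // declared only so shared code can assign it; the kernel never sees it here
}

// setsigSketch mirrors the shape of the shared setup path: the restorer flag is
// ORed in unconditionally, and sa_restorer is written (the real path only does
// this on x86), which is why the Go-side struct still declares the field.
func setsigSketch(fn, restorer uintptr) sigactiont {
    return sigactiont{
        sa_handler:  fn,
        sa_flags:    _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART | _SA_RESTORER,
        sa_restorer: restorer,
    }
}

func main() {
    sa := setsigSketch(0x1234, 0)
    fmt.Printf("flags=%#x handler=%#x\n", sa.sa_flags, sa.sa_handler)
}
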
diff --git a/src/runtime/defs_linux_mips64x.go b/src/runtime/defs_linux_mips64x.go
new file mode 100644
index 0000000..8a0af41
--- /dev/null
+++ b/src/runtime/defs_linux_mips64x.go
@@ -0,0 +1,211 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (mips64 || mips64le) && linux
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x800
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x8
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+ _SIGCHLD = 0x12
+ _SIGPWR = 0x13
+ _SIGWINCH = 0x14
+ _SIGURG = 0x15
+ _SIGIO = 0x16
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_flags uint32
+ sa_handler uintptr
+ sa_mask [2]uint64
+ // The Linux header does not have an sa_restorer field,
+ // but it is used in setsig(); it is harmless to include it here.
+ sa_restorer uintptr
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ __pad0 [1]int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x100
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x80
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+}
+
+type sigcontext struct {
+ sc_regs [32]uint64
+ sc_fpregs [32]uint64
+ sc_mdhi uint64
+ sc_hi1 uint64
+ sc_hi2 uint64
+ sc_hi3 uint64
+ sc_mdlo uint64
+ sc_lo1 uint64
+ sc_lo2 uint64
+ sc_lo3 uint64
+ sc_pc uint64
+ sc_fpc_csr uint32
+ sc_used_math uint32
+ sc_dsp uint32
+ sc_reserved uint32
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint64
+}
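
The recurring "below here is a union; si_addr is the only field we use" comments mark where a C union has been flattened: Go has no union type, so the converted struct declares just the one member the runtime reads, placed at the union's offset, and the padded siginfo wrapper keeps the overall size honest. A small illustration that the declared member really does sit at the fixed union offset (layout copied from the mips64x siginfoFields above):

package main

import (
    "fmt"
    "unsafe"
)

// Same layout as the mips64x siginfoFields: three int32s, an explicit pad,
// then si_addr standing in for the whole C union.
type siginfoSketch struct {
    si_signo int32
    si_code  int32
    si_errno int32
    _        [1]int32
    si_addr  uint64
}

func main() {
    // The union begins at a fixed offset; any other union member the runtime
    // ever needed would overlay this same storage rather than add a field.
    fmt.Println(unsafe.Offsetof(siginfoSketch{}.si_addr)) // 16
}
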
diff --git a/src/runtime/defs_linux_mipsx.go b/src/runtime/defs_linux_mipsx.go
new file mode 100644
index 0000000..8322bea
--- /dev/null
+++ b/src/runtime/defs_linux_mipsx.go
@@ -0,0 +1,209 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (mips || mipsle) && linux
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x800
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x8
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGUSR1 = 0x10
+ _SIGUSR2 = 0x11
+ _SIGCHLD = 0x12
+ _SIGPWR = 0x13
+ _SIGWINCH = 0x14
+ _SIGURG = 0x15
+ _SIGIO = 0x16
+ _SIGSTOP = 0x17
+ _SIGTSTP = 0x18
+ _SIGCONT = 0x19
+ _SIGTTIN = 0x1a
+ _SIGTTOU = 0x1b
+ _SIGVTALRM = 0x1c
+ _SIGPROF = 0x1d
+ _SIGXCPU = 0x1e
+ _SIGXFSZ = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+type timespec struct {
+ tv_sec int32
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = timediv(ns, 1e9, &ts.tv_nsec)
+}
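
On 32-bit mips the timespec fields are int32, so setNsec goes through the runtime's timediv helper rather than a direct 64-bit divide, which on 32-bit targets would otherwise lower to a software-division call that is awkward in nosplit code. A standalone shift-and-subtract sketch with the same quotient/remainder contract; the real runtime.timediv additionally saturates on overflow:

package main

import "fmt"

// timedivSketch divides v by div using only shifts and subtractions, storing
// the remainder through rem. Assumes v >= 0 and div > 0; overflow handling is
// omitted for brevity.
func timedivSketch(v int64, div int32, rem *int32) int32 {
    res := int32(0)
    for bit := 30; bit >= 0; bit-- {
        if v >= int64(div)<<uint(bit) {
            v -= int64(div) << uint(bit)
            res += 1 << uint(bit)
        }
    }
    if rem != nil {
        *rem = int32(v)
    }
    return res
}

func main() {
    var nsec int32
    sec := timedivSketch(2_750_000_123, 1e9, &nsec) // same call shape as setNsec above
    fmt.Println(sec, nsec)                          // 2 750000123
}
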
+
+type timeval struct {
+ tv_sec int32
+ tv_usec int32
+}
+
+//go:nosplit
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type sigactiont struct {
+ sa_flags uint32
+ sa_handler uintptr
+ sa_mask [4]uint32
+ // The Linux header does not have an sa_restorer field,
+ // but it is used in setsig(); it is harmless to include it here.
+ sa_restorer uintptr
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint32
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x80
+ _O_CREAT = 0x100
+ _O_TRUNC = 0x200
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_size uintptr
+ ss_flags int32
+}
+
+type sigcontext struct {
+ sc_regmask uint32
+ sc_status uint32
+ sc_pc uint64
+ sc_regs [32]uint64
+ sc_fpregs [32]uint64
+ sc_acx uint32
+ sc_fpc_csr uint32
+ sc_fpc_eir uint32
+ sc_used_math uint32
+ sc_dsp uint32
+ sc_mdhi uint64
+ sc_mdlo uint64
+ sc_hi1 uint32
+ sc_lo1 uint32
+ sc_hi2 uint32
+ sc_lo2 uint32
+ sc_hi3 uint32
+ sc_lo3 uint32
+}
+
+type ucontext struct {
+ uc_flags uint32
+ uc_link *ucontext
+ uc_stack stackt
+ Pad_cgo_0 [4]byte
+ uc_mcontext sigcontext
+ uc_sigmask [4]uint32
+}
diff --git a/src/runtime/defs_linux_ppc64.go b/src/runtime/defs_linux_ppc64.go
new file mode 100644
index 0000000..f87924a
--- /dev/null
+++ b/src/runtime/defs_linux_ppc64.go
@@ -0,0 +1,225 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type ptregs struct {
+ gpr [32]uint64
+ nip uint64
+ msr uint64
+ orig_gpr3 uint64
+ ctr uint64
+ link uint64
+ xer uint64
+ ccr uint64
+ softe uint64
+ trap uint64
+ dar uint64
+ dsisr uint64
+ result uint64
+}
+
+type vreg struct {
+ u [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ _unused [4]uint64
+ signal int32
+ _pad0 int32
+ handler uint64
+ oldmask uint64
+ regs *ptregs
+ gp_regs [48]uint64
+ fp_regs [33]float64
+ v_regs *vreg
+ vmx_reserve [101]int64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ __unused [15]uint64
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_ppc64le.go b/src/runtime/defs_linux_ppc64le.go
new file mode 100644
index 0000000..f87924a
--- /dev/null
+++ b/src/runtime/defs_linux_ppc64le.go
@@ -0,0 +1,225 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+//struct Sigset {
+// uint64 sig[1];
+//};
+//typedef uint64 Sigset;
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_linux.go defs3_linux.go
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type ptregs struct {
+ gpr [32]uint64
+ nip uint64
+ msr uint64
+ orig_gpr3 uint64
+ ctr uint64
+ link uint64
+ xer uint64
+ ccr uint64
+ softe uint64
+ trap uint64
+ dar uint64
+ dsisr uint64
+ result uint64
+}
+
+type vreg struct {
+ u [4]uint32
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ pad_cgo_0 [4]byte
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ _unused [4]uint64
+ signal int32
+ _pad0 int32
+ handler uint64
+ oldmask uint64
+ regs *ptregs
+ gp_regs [48]uint64
+ fp_regs [33]float64
+ v_regs *vreg
+ vmx_reserve [101]int64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask uint64
+ __unused [15]uint64
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_riscv64.go b/src/runtime/defs_linux_riscv64.go
new file mode 100644
index 0000000..29b1ef2
--- /dev/null
+++ b/src/runtime/defs_linux_riscv64.go
@@ -0,0 +1,235 @@
+// Generated using cgo, then manually converted into appropriate naming and code
+// for the Go runtime.
+// go tool cgo -godefs defs_linux.go defs1_linux.go defs2_linux.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_RESTORER = 0x0
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_mask uint64
+ // Linux on riscv64 does not have the sa_restorer field, but the setsig
+ // function references it (for x86). Not much harm to include it at the end.
+ sa_restorer uintptr
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+)
+
+type user_regs_struct struct {
+ pc uint64
+ ra uint64
+ sp uint64
+ gp uint64
+ tp uint64
+ t0 uint64
+ t1 uint64
+ t2 uint64
+ s0 uint64
+ s1 uint64
+ a0 uint64
+ a1 uint64
+ a2 uint64
+ a3 uint64
+ a4 uint64
+ a5 uint64
+ a6 uint64
+ a7 uint64
+ s2 uint64
+ s3 uint64
+ s4 uint64
+ s5 uint64
+ s6 uint64
+ s7 uint64
+ s8 uint64
+ s9 uint64
+ s10 uint64
+ s11 uint64
+ t3 uint64
+ t4 uint64
+ t5 uint64
+ t6 uint64
+}
+
+type user_fpregs_struct struct {
+ f [528]byte
+}
+
+type usigset struct {
+ us_x__val [16]uint64
+}
+
+type sigcontext struct {
+ sc_regs user_regs_struct
+ sc_fpregs user_fpregs_struct
+}
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_sigmask usigset
+ uc_x__unused [0]uint8
+ uc_pad_cgo_0 [8]byte
+ uc_mcontext sigcontext
+}
diff --git a/src/runtime/defs_linux_s390x.go b/src/runtime/defs_linux_s390x.go
new file mode 100644
index 0000000..b028021
--- /dev/null
+++ b/src/runtime/defs_linux_s390x.go
@@ -0,0 +1,192 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EAGAIN = 0xb
+ _ENOMEM = 0xc
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x20
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x8
+ _MADV_HUGEPAGE = 0xe
+ _MADV_NOHUGEPAGE = 0xf
+ _MADV_COLLAPSE = 0x19
+
+ _SA_RESTART = 0x10000000
+ _SA_ONSTACK = 0x8000000
+ _SA_SIGINFO = 0x4
+
+ _SI_KERNEL = 0x80
+ _SI_TIMER = -0x2
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGBUS = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGUSR1 = 0xa
+ _SIGSEGV = 0xb
+ _SIGUSR2 = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGSTKFLT = 0x10
+ _SIGCHLD = 0x11
+ _SIGCONT = 0x12
+ _SIGSTOP = 0x13
+ _SIGTSTP = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGURG = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGIO = 0x1d
+ _SIGPWR = 0x1e
+ _SIGSYS = 0x1f
+
+ _SIGRTMIN = 0x20
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _CLOCK_THREAD_CPUTIME_ID = 0x3
+
+ _SIGEV_THREAD_ID = 0x4
+)
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags uint64
+ sa_restorer uintptr
+ sa_mask uint64
+}
+
+type siginfoFields struct {
+ si_signo int32
+ si_errno int32
+ si_code int32
+ // below here is a union; si_addr is the only field we use
+ si_addr uint64
+}
+
+type siginfo struct {
+ siginfoFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_si_max_size - unsafe.Sizeof(siginfoFields{})]byte
+}
+
+type itimerspec struct {
+ it_interval timespec
+ it_value timespec
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type sigeventFields struct {
+ value uintptr
+ signo int32
+ notify int32
+ // below here is a union; sigev_notify_thread_id is the only field we use
+ sigev_notify_thread_id int32
+}
+
+type sigevent struct {
+ sigeventFields
+
+ // Pad struct to the max size in the kernel.
+ _ [_sigev_max_size - unsafe.Sizeof(sigeventFields{})]byte
+}
+
+const (
+ _O_RDONLY = 0x0
+ _O_WRONLY = 0x1
+ _O_CREAT = 0x40
+ _O_TRUNC = 0x200
+ _O_NONBLOCK = 0x800
+ _O_CLOEXEC = 0x80000
+ _SA_RESTORER = 0
+)
+
+type stackt struct {
+ ss_sp *byte
+ ss_flags int32
+ ss_size uintptr
+}
+
+type sigcontext struct {
+ psw_mask uint64
+ psw_addr uint64
+ gregs [16]uint64
+ aregs [16]uint32
+ fpc uint32
+ fpregs [16]uint64
+}
+
+type ucontext struct {
+ uc_flags uint64
+ uc_link *ucontext
+ uc_stack stackt
+ uc_mcontext sigcontext
+ uc_sigmask uint64
+}
diff --git a/src/runtime/defs_netbsd.go b/src/runtime/defs_netbsd.go
new file mode 100644
index 0000000..43923e3
--- /dev/null
+++ b/src/runtime/defs_netbsd.go
@@ -0,0 +1,133 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go >defs_netbsd_amd64.h
+GOARCH=386 go tool cgo -cdefs defs_netbsd.go defs_netbsd_386.go >defs_netbsd_386.h
+GOARCH=arm go tool cgo -cdefs defs_netbsd.go defs_netbsd_arm.go >defs_netbsd_arm.h
+*/
+
+// +godefs map __fpregset_t [644]byte
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/signal.h>
+#include <sys/event.h>
+#include <sys/time.h>
+#include <sys/ucontext.h>
+#include <sys/unistd.h>
+#include <errno.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+
+ O_WRONLY = C.O_WRONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CREAT = C.O_CREAT
+ O_TRUNC = C.O_TRUNC
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_RECEIPT = 0
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type Sigset C.sigset_t
+type Siginfo C.struct__ksiginfo
+
+type StackT C.stack_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type McontextT C.mcontext_t
+type UcontextT C.ucontext_t
+
+type Kevent C.struct_kevent
diff --git a/src/runtime/defs_netbsd_386.go b/src/runtime/defs_netbsd_386.go
new file mode 100644
index 0000000..2943ea3
--- /dev/null
+++ b/src/runtime/defs_netbsd_386.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=386 go tool cgo -cdefs defs_netbsd.go defs_netbsd_386.go >defs_netbsd_386.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_GS = C._REG_GS
+ REG_FS = C._REG_FS
+ REG_ES = C._REG_ES
+ REG_DS = C._REG_DS
+ REG_EDI = C._REG_EDI
+ REG_ESI = C._REG_ESI
+ REG_EBP = C._REG_EBP
+ REG_ESP = C._REG_ESP
+ REG_EBX = C._REG_EBX
+ REG_EDX = C._REG_EDX
+ REG_ECX = C._REG_ECX
+ REG_EAX = C._REG_EAX
+ REG_TRAPNO = C._REG_TRAPNO
+ REG_ERR = C._REG_ERR
+ REG_EIP = C._REG_EIP
+ REG_CS = C._REG_CS
+ REG_EFL = C._REG_EFL
+ REG_UESP = C._REG_UESP
+ REG_SS = C._REG_SS
+)
diff --git a/src/runtime/defs_netbsd_amd64.go b/src/runtime/defs_netbsd_amd64.go
new file mode 100644
index 0000000..33d80ff
--- /dev/null
+++ b/src/runtime/defs_netbsd_amd64.go
@@ -0,0 +1,48 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_netbsd.go defs_netbsd_amd64.go >defs_netbsd_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_RDI = C._REG_RDI
+ REG_RSI = C._REG_RSI
+ REG_RDX = C._REG_RDX
+ REG_RCX = C._REG_RCX
+ REG_R8 = C._REG_R8
+ REG_R9 = C._REG_R9
+ REG_R10 = C._REG_R10
+ REG_R11 = C._REG_R11
+ REG_R12 = C._REG_R12
+ REG_R13 = C._REG_R13
+ REG_R14 = C._REG_R14
+ REG_R15 = C._REG_R15
+ REG_RBP = C._REG_RBP
+ REG_RBX = C._REG_RBX
+ REG_RAX = C._REG_RAX
+ REG_GS = C._REG_GS
+ REG_FS = C._REG_FS
+ REG_ES = C._REG_ES
+ REG_DS = C._REG_DS
+ REG_TRAPNO = C._REG_TRAPNO
+ REG_ERR = C._REG_ERR
+ REG_RIP = C._REG_RIP
+ REG_CS = C._REG_CS
+ REG_RFLAGS = C._REG_RFLAGS
+ REG_RSP = C._REG_RSP
+ REG_SS = C._REG_SS
+)
diff --git a/src/runtime/defs_netbsd_arm.go b/src/runtime/defs_netbsd_arm.go
new file mode 100644
index 0000000..74b3752
--- /dev/null
+++ b/src/runtime/defs_netbsd_arm.go
@@ -0,0 +1,39 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=arm go tool cgo -cdefs defs_netbsd.go defs_netbsd_arm.go >defs_netbsd_arm.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <machine/mcontext.h>
+*/
+import "C"
+
+const (
+ REG_R0 = C._REG_R0
+ REG_R1 = C._REG_R1
+ REG_R2 = C._REG_R2
+ REG_R3 = C._REG_R3
+ REG_R4 = C._REG_R4
+ REG_R5 = C._REG_R5
+ REG_R6 = C._REG_R6
+ REG_R7 = C._REG_R7
+ REG_R8 = C._REG_R8
+ REG_R9 = C._REG_R9
+ REG_R10 = C._REG_R10
+ REG_R11 = C._REG_R11
+ REG_R12 = C._REG_R12
+ REG_R13 = C._REG_R13
+ REG_R14 = C._REG_R14
+ REG_R15 = C._REG_R15
+ REG_CPSR = C._REG_CPSR
+)
diff --git a/src/runtime/defs_openbsd.go b/src/runtime/defs_openbsd.go
new file mode 100644
index 0000000..2ca6a88
--- /dev/null
+++ b/src/runtime/defs_openbsd.go
@@ -0,0 +1,144 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -godefs defs_openbsd.go
+GOARCH=386 go tool cgo -godefs defs_openbsd.go
+GOARCH=arm go tool cgo -godefs defs_openbsd.go
+GOARCH=arm64 go tool cgo -godefs defs_openbsd.go
+GOARCH=mips64 go tool cgo -godefs defs_openbsd.go
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/event.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include <sys/unistd.h>
+#include <sys/signal.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <signal.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CLOEXEC = C.O_CLOEXEC
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+ MAP_STACK = C.MAP_STACK
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGINFO = C.SIGINFO
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ EV_ADD = C.EV_ADD
+ EV_DELETE = C.EV_DELETE
+ EV_CLEAR = C.EV_CLEAR
+ EV_ERROR = C.EV_ERROR
+ EV_EOF = C.EV_EOF
+ EVFILT_READ = C.EVFILT_READ
+ EVFILT_WRITE = C.EVFILT_WRITE
+)
+
+type TforkT C.struct___tfork
+
+type Sigcontext C.struct_sigcontext
+type Siginfo C.siginfo_t
+type Sigset C.sigset_t
+type Sigval C.union_sigval
+
+type StackT C.stack_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type KeventT C.struct_kevent
+
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+type PthreadCond C.pthread_cond_t
+type PthreadCondAttr C.pthread_condattr_t
+type PthreadMutex C.pthread_mutex_t
+type PthreadMutexAttr C.pthread_mutexattr_t
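
defs_openbsd.go above is never compiled into the runtime (note the //go:build ignore tag); it is the cgo input whose output (via -godefs, or historically -cdefs) is converted by hand into the per-arch files that follow, with the exported C-mirroring names lowered to the underscore-prefixed runtime spelling. A compilable sketch of roughly what that conversion yields for C.EINTR and C.struct_kevent on amd64; compare defs_openbsd_amd64.go below, and treat the exact cgo output as approximate:

package main

import (
    "fmt"
    "unsafe"
)

// Hand-converted counterparts of the template's `EINTR = C.EINTR` and
// `type KeventT C.struct_kevent` lines, as they appear in the generated
// amd64 definitions.
const _EINTR = 0x4

type keventt struct {
    ident  uint64
    filter int16
    flags  uint16
    fflags uint32
    data   int64
    udata  *byte
}

func main() {
    fmt.Println(_EINTR, unsafe.Sizeof(keventt{})) // 4 32
}
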
diff --git a/src/runtime/defs_openbsd_386.go b/src/runtime/defs_openbsd_386.go
new file mode 100644
index 0000000..fde8af5
--- /dev/null
+++ b/src/runtime/defs_openbsd_386.go
@@ -0,0 +1,180 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_gs uint32
+ sc_fs uint32
+ sc_es uint32
+ sc_ds uint32
+ sc_edi uint32
+ sc_esi uint32
+ sc_ebp uint32
+ sc_ebx uint32
+ sc_edx uint32
+ sc_ecx uint32
+ sc_eax uint32
+ sc_eip uint32
+ sc_cs uint32
+ sc_eflags uint32
+ sc_esp uint32
+ sc_ss uint32
+ __sc_unused uint32
+ sc_mask uint32
+ sc_trapno uint32
+ sc_err uint32
+ sc_fpstate unsafe.Pointer
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ _data [116]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
diff --git a/src/runtime/defs_openbsd_amd64.go b/src/runtime/defs_openbsd_amd64.go
new file mode 100644
index 0000000..0f29d0c
--- /dev/null
+++ b/src/runtime/defs_openbsd_amd64.go
@@ -0,0 +1,191 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_rdi uint64
+ sc_rsi uint64
+ sc_rdx uint64
+ sc_rcx uint64
+ sc_r8 uint64
+ sc_r9 uint64
+ sc_r10 uint64
+ sc_r11 uint64
+ sc_r12 uint64
+ sc_r13 uint64
+ sc_r14 uint64
+ sc_r15 uint64
+ sc_rbp uint64
+ sc_rbx uint64
+ sc_rax uint64
+ sc_gs uint64
+ sc_fs uint64
+ sc_es uint64
+ sc_ds uint64
+ sc_trapno uint64
+ sc_err uint64
+ sc_rip uint64
+ sc_cs uint64
+ sc_rflags uint64
+ sc_rsp uint64
+ sc_ss uint64
+ sc_fpstate unsafe.Pointer
+ __sc_unused int32
+ sc_mask int32
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
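
The keventt layout and the _EV_*/_EVFILT_* constants above feed the kqueue-based network poller, which arms each descriptor for edge-triggered read and write events. A compilable sketch of that registration; the kevent wrapper signature is an assumption here, and the real code (netpoll_kqueue.go) differs in detail:

package main

import "unsafe"

const (
    _EV_ADD       = 0x1
    _EV_CLEAR     = 0x20
    _EVFILT_READ  = -0x1
    _EVFILT_WRITE = -0x2
)

type keventt struct {
    ident  uint64
    filter int16
    flags  uint16
    fflags uint32
    data   int64
    udata  *byte
}

// kevent stands in for the system-call wrapper; its signature is assumed,
// not copied from the runtime.
func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32) int32 { return 0 }

// netpollopenSketch arms fd for read and write in edge-triggered mode
// (EV_CLEAR) and stashes an opaque tag in udata for the wakeup path.
func netpollopenSketch(kq int32, fd uintptr, tag unsafe.Pointer) int32 {
    var ev [2]keventt
    ev[0].ident = uint64(fd)
    ev[0].filter = _EVFILT_READ
    ev[0].flags = _EV_ADD | _EV_CLEAR
    ev[0].udata = (*byte)(tag)
    ev[1] = ev[0]
    ev[1].filter = _EVFILT_WRITE
    return kevent(kq, &ev[0], 2, nil, 0, nil)
}

func main() {
    _ = netpollopenSketch(3, 7, nil)
}
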
diff --git a/src/runtime/defs_openbsd_arm.go b/src/runtime/defs_openbsd_arm.go
new file mode 100644
index 0000000..b56f3b4
--- /dev/null
+++ b/src/runtime/defs_openbsd_arm.go
@@ -0,0 +1,188 @@
+// created by cgo -cdefs and then converted to Go
+// cgo -cdefs defs_openbsd.go
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ __sc_unused int32
+ sc_mask int32
+
+ sc_spsr uint32
+ sc_r0 uint32
+ sc_r1 uint32
+ sc_r2 uint32
+ sc_r3 uint32
+ sc_r4 uint32
+ sc_r5 uint32
+ sc_r6 uint32
+ sc_r7 uint32
+ sc_r8 uint32
+ sc_r9 uint32
+ sc_r10 uint32
+ sc_r11 uint32
+ sc_r12 uint32
+ sc_usr_sp uint32
+ sc_usr_lr uint32
+ sc_svc_lr uint32
+ sc_pc uint32
+ sc_fpused uint32
+ sc_fpscr uint32
+ sc_fpreg [32]uint64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int32
+ pad_cgo_0 [4]byte
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = int64(timediv(ns, 1e9, &ts.tv_nsec))
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int32
+ pad_cgo_0 [4]byte
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = x
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint32
+ filter int16
+ flags uint16
+ fflags uint32
+ pad_cgo_0 [4]byte
+ data int64
+ udata *byte
+ pad_cgo_1 [4]byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
diff --git a/src/runtime/defs_openbsd_arm64.go b/src/runtime/defs_openbsd_arm64.go
new file mode 100644
index 0000000..0a9acc0
--- /dev/null
+++ b/src/runtime/defs_openbsd_arm64.go
@@ -0,0 +1,171 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _PTHREAD_CREATE_DETACHED = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ __sc_unused int32
+ sc_mask int32
+ sc_sp uintptr
+ sc_lr uintptr
+ sc_elr uintptr
+ sc_spsr uintptr
+ sc_x [30]uintptr
+ sc_cookie int64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
+
+type pthread uintptr
+type pthreadattr uintptr
+type pthreadcond uintptr
+type pthreadcondattr uintptr
+type pthreadmutex uintptr
+type pthreadmutexattr uintptr
diff --git a/src/runtime/defs_openbsd_mips64.go b/src/runtime/defs_openbsd_mips64.go
new file mode 100644
index 0000000..1e469e4
--- /dev/null
+++ b/src/runtime/defs_openbsd_mips64.go
@@ -0,0 +1,170 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Generated from:
+//
+// GOARCH=mips64 go tool cgo -godefs defs_openbsd.go
+//
+// Then converted to the form used by the runtime.
+
+package runtime
+
+import "unsafe"
+
+const (
+ _EINTR = 0x4
+ _EFAULT = 0xe
+ _EAGAIN = 0x23
+
+ _O_WRONLY = 0x1
+ _O_NONBLOCK = 0x4
+ _O_CREAT = 0x200
+ _O_TRUNC = 0x400
+ _O_CLOEXEC = 0x10000
+
+ _PROT_NONE = 0x0
+ _PROT_READ = 0x1
+ _PROT_WRITE = 0x2
+ _PROT_EXEC = 0x4
+
+ _MAP_ANON = 0x1000
+ _MAP_PRIVATE = 0x2
+ _MAP_FIXED = 0x10
+ _MAP_STACK = 0x4000
+
+ _MADV_DONTNEED = 0x4
+ _MADV_FREE = 0x6
+
+ _SA_SIGINFO = 0x40
+ _SA_RESTART = 0x2
+ _SA_ONSTACK = 0x1
+
+ _SIGHUP = 0x1
+ _SIGINT = 0x2
+ _SIGQUIT = 0x3
+ _SIGILL = 0x4
+ _SIGTRAP = 0x5
+ _SIGABRT = 0x6
+ _SIGEMT = 0x7
+ _SIGFPE = 0x8
+ _SIGKILL = 0x9
+ _SIGBUS = 0xa
+ _SIGSEGV = 0xb
+ _SIGSYS = 0xc
+ _SIGPIPE = 0xd
+ _SIGALRM = 0xe
+ _SIGTERM = 0xf
+ _SIGURG = 0x10
+ _SIGSTOP = 0x11
+ _SIGTSTP = 0x12
+ _SIGCONT = 0x13
+ _SIGCHLD = 0x14
+ _SIGTTIN = 0x15
+ _SIGTTOU = 0x16
+ _SIGIO = 0x17
+ _SIGXCPU = 0x18
+ _SIGXFSZ = 0x19
+ _SIGVTALRM = 0x1a
+ _SIGPROF = 0x1b
+ _SIGWINCH = 0x1c
+ _SIGINFO = 0x1d
+ _SIGUSR1 = 0x1e
+ _SIGUSR2 = 0x1f
+
+ _FPE_INTDIV = 0x1
+ _FPE_INTOVF = 0x2
+ _FPE_FLTDIV = 0x3
+ _FPE_FLTOVF = 0x4
+ _FPE_FLTUND = 0x5
+ _FPE_FLTRES = 0x6
+ _FPE_FLTINV = 0x7
+ _FPE_FLTSUB = 0x8
+
+ _BUS_ADRALN = 0x1
+ _BUS_ADRERR = 0x2
+ _BUS_OBJERR = 0x3
+
+ _SEGV_MAPERR = 0x1
+ _SEGV_ACCERR = 0x2
+
+ _ITIMER_REAL = 0x0
+ _ITIMER_VIRTUAL = 0x1
+ _ITIMER_PROF = 0x2
+
+ _EV_ADD = 0x1
+ _EV_DELETE = 0x2
+ _EV_CLEAR = 0x20
+ _EV_ERROR = 0x4000
+ _EV_EOF = 0x8000
+ _EVFILT_READ = -0x1
+ _EVFILT_WRITE = -0x2
+)
+
+type tforkt struct {
+ tf_tcb unsafe.Pointer
+ tf_tid *int32
+ tf_stack uintptr
+}
+
+type sigcontext struct {
+ sc_cookie uint64
+ sc_mask uint64
+ sc_pc uint64
+ sc_regs [32]uint64
+ mullo uint64
+ mulhi uint64
+ sc_fpregs [33]uint64
+ sc_fpused uint64
+ sc_fpc_eir uint64
+ _xxx [8]int64
+}
+
+type siginfo struct {
+ si_signo int32
+ si_code int32
+ si_errno int32
+ pad_cgo_0 [4]byte
+ _data [120]byte
+}
+
+type stackt struct {
+ ss_sp uintptr
+ ss_size uintptr
+ ss_flags int32
+ pad_cgo_0 [4]byte
+}
+
+type timespec struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+//go:nosplit
+func (ts *timespec) setNsec(ns int64) {
+ ts.tv_sec = ns / 1e9
+ ts.tv_nsec = ns % 1e9
+}
+
+type timeval struct {
+ tv_sec int64
+ tv_usec int64
+}
+
+func (tv *timeval) set_usec(x int32) {
+ tv.tv_usec = int64(x)
+}
+
+type itimerval struct {
+ it_interval timeval
+ it_value timeval
+}
+
+type keventt struct {
+ ident uint64
+ filter int16
+ flags uint16
+ fflags uint32
+ data int64
+ udata *byte
+}
diff --git a/src/runtime/defs_plan9_386.go b/src/runtime/defs_plan9_386.go
new file mode 100644
index 0000000..428044d
--- /dev/null
+++ b/src/runtime/defs_plan9_386.go
@@ -0,0 +1,64 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ di uint32 /* general registers */
+ si uint32 /* ... */
+ bp uint32 /* ... */
+ nsp uint32
+ bx uint32 /* ... */
+ dx uint32 /* ... */
+ cx uint32 /* ... */
+ ax uint32 /* ... */
+ gs uint32 /* data segments */
+ fs uint32 /* ... */
+ es uint32 /* ... */
+ ds uint32 /* ... */
+ trap uint32 /* trap _type */
+ ecode uint32 /* error code (or zero) */
+ pc uint32 /* pc */
+ cs uint32 /* old context */
+ flags uint32 /* old flags */
+ sp uint32
+ ss uint32 /* old stack segment */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.pc) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(0) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint32(x) }
+func (c *sigctxt) setlr(x uintptr) {}
+
+func (c *sigctxt) savelr(x uintptr) {}
+
+func dumpregs(u *ureg) {
+ print("ax ", hex(u.ax), "\n")
+ print("bx ", hex(u.bx), "\n")
+ print("cx ", hex(u.cx), "\n")
+ print("dx ", hex(u.dx), "\n")
+ print("di ", hex(u.di), "\n")
+ print("si ", hex(u.si), "\n")
+ print("bp ", hex(u.bp), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("pc ", hex(u.pc), "\n")
+ print("flags ", hex(u.flags), "\n")
+ print("cs ", hex(u.cs), "\n")
+ print("fs ", hex(u.fs), "\n")
+ print("gs ", hex(u.gs), "\n")
+}
+
+func sigpanictramp()
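
The sigctxt methods above (pc, sp, setpc, setsp, lr, savelr) are the small register-access API that the portable note/signal handling code programs against: on a fault it can push state and redirect the interrupted goroutine into a panic trampoline purely through these accessors. A much-simplified, self-contained sketch of that manipulation on a toy 32-bit ureg; the real flow goes through sigpanictramp and Plan 9 notes, and the addresses here are invented:

package main

import "fmt"

type ureg struct {
    pc uint32
    sp uint32
}

type sigctxt struct{ u *ureg }

func (c *sigctxt) pc() uintptr     { return uintptr(c.u.pc) }
func (c *sigctxt) sp() uintptr     { return uintptr(c.u.sp) }
func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }
func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint32(x) }

// redirect models the faulting-goroutine rewrite: make room on the interrupted
// stack (where a real handler would store the old pc for tracebacks) and point
// pc at a handler trampoline.
func redirect(c *sigctxt, tramp uintptr) {
    c.setsp(c.sp() - 4) // space for a 32-bit return address
    c.setpc(tramp)
}

func main() {
    c := &sigctxt{&ureg{pc: 0x1000, sp: 0x8000}}
    redirect(c, 0x2000)
    fmt.Printf("pc=%#x sp=%#x\n", c.pc(), c.sp()) // pc=0x2000 sp=0x7ffc
}
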
diff --git a/src/runtime/defs_plan9_amd64.go b/src/runtime/defs_plan9_amd64.go
new file mode 100644
index 0000000..15a27fc
--- /dev/null
+++ b/src/runtime/defs_plan9_amd64.go
@@ -0,0 +1,81 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ ax uint64
+ bx uint64
+ cx uint64
+ dx uint64
+ si uint64
+ di uint64
+ bp uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+
+ ds uint16
+ es uint16
+ fs uint16
+ gs uint16
+
+ _type uint64
+ error uint64 /* error code (or zero) */
+ ip uint64 /* pc */
+ cs uint64 /* old context */
+ flags uint64 /* old flags */
+ sp uint64 /* sp */
+ ss uint64 /* old stack segment */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.ip) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(0) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.ip = uint64(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint64(x) }
+func (c *sigctxt) setlr(x uintptr) {}
+
+func (c *sigctxt) savelr(x uintptr) {}
+
+func dumpregs(u *ureg) {
+ print("ax ", hex(u.ax), "\n")
+ print("bx ", hex(u.bx), "\n")
+ print("cx ", hex(u.cx), "\n")
+ print("dx ", hex(u.dx), "\n")
+ print("di ", hex(u.di), "\n")
+ print("si ", hex(u.si), "\n")
+ print("bp ", hex(u.bp), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("r8 ", hex(u.r8), "\n")
+ print("r9 ", hex(u.r9), "\n")
+ print("r10 ", hex(u.r10), "\n")
+ print("r11 ", hex(u.r11), "\n")
+ print("r12 ", hex(u.r12), "\n")
+ print("r13 ", hex(u.r13), "\n")
+ print("r14 ", hex(u.r14), "\n")
+ print("r15 ", hex(u.r15), "\n")
+ print("ip ", hex(u.ip), "\n")
+ print("flags ", hex(u.flags), "\n")
+ print("cs ", hex(u.cs), "\n")
+ print("fs ", hex(u.fs), "\n")
+ print("gs ", hex(u.gs), "\n")
+}
+
+func sigpanictramp()
diff --git a/src/runtime/defs_plan9_arm.go b/src/runtime/defs_plan9_arm.go
new file mode 100644
index 0000000..1adc16e
--- /dev/null
+++ b/src/runtime/defs_plan9_arm.go
@@ -0,0 +1,66 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _PAGESIZE = 0x1000
+
+type ureg struct {
+ r0 uint32 /* general registers */
+ r1 uint32 /* ... */
+ r2 uint32 /* ... */
+ r3 uint32 /* ... */
+ r4 uint32 /* ... */
+ r5 uint32 /* ... */
+ r6 uint32 /* ... */
+ r7 uint32 /* ... */
+ r8 uint32 /* ... */
+ r9 uint32 /* ... */
+ r10 uint32 /* ... */
+ r11 uint32 /* ... */
+ r12 uint32 /* ... */
+ sp uint32
+ link uint32 /* ... */
+ trap uint32 /* trap type */
+ psr uint32
+ pc uint32 /* interrupted addr */
+}
+
+type sigctxt struct {
+ u *ureg
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uintptr { return uintptr(c.u.pc) }
+
+func (c *sigctxt) sp() uintptr { return uintptr(c.u.sp) }
+func (c *sigctxt) lr() uintptr { return uintptr(c.u.link) }
+
+func (c *sigctxt) setpc(x uintptr) { c.u.pc = uint32(x) }
+func (c *sigctxt) setsp(x uintptr) { c.u.sp = uint32(x) }
+func (c *sigctxt) setlr(x uintptr) { c.u.link = uint32(x) }
+func (c *sigctxt) savelr(x uintptr) { c.u.r0 = uint32(x) }
+
+func dumpregs(u *ureg) {
+ print("r0 ", hex(u.r0), "\n")
+ print("r1 ", hex(u.r1), "\n")
+ print("r2 ", hex(u.r2), "\n")
+ print("r3 ", hex(u.r3), "\n")
+ print("r4 ", hex(u.r4), "\n")
+ print("r5 ", hex(u.r5), "\n")
+ print("r6 ", hex(u.r6), "\n")
+ print("r7 ", hex(u.r7), "\n")
+ print("r8 ", hex(u.r8), "\n")
+ print("r9 ", hex(u.r9), "\n")
+ print("r10 ", hex(u.r10), "\n")
+ print("r11 ", hex(u.r11), "\n")
+ print("r12 ", hex(u.r12), "\n")
+ print("sp ", hex(u.sp), "\n")
+ print("link ", hex(u.link), "\n")
+ print("pc ", hex(u.pc), "\n")
+ print("psr ", hex(u.psr), "\n")
+}
+
+func sigpanictramp()
diff --git a/src/runtime/defs_solaris.go b/src/runtime/defs_solaris.go
new file mode 100644
index 0000000..11708ee
--- /dev/null
+++ b/src/runtime/defs_solaris.go
@@ -0,0 +1,162 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_solaris.go >defs_solaris_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/select.h>
+#include <sys/siginfo.h>
+#include <sys/signal.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/ucontext.h>
+#include <sys/regset.h>
+#include <sys/unistd.h>
+#include <sys/fork.h>
+#include <sys/port.h>
+#include <semaphore.h>
+#include <errno.h>
+#include <signal.h>
+#include <pthread.h>
+#include <netdb.h>
+*/
+import "C"
+
+const (
+ EINTR = C.EINTR
+ EBADF = C.EBADF
+ EFAULT = C.EFAULT
+ EAGAIN = C.EAGAIN
+ EBUSY = C.EBUSY
+ ETIME = C.ETIME
+ ETIMEDOUT = C.ETIMEDOUT
+ EWOULDBLOCK = C.EWOULDBLOCK
+ EINPROGRESS = C.EINPROGRESS
+
+ PROT_NONE = C.PROT_NONE
+ PROT_READ = C.PROT_READ
+ PROT_WRITE = C.PROT_WRITE
+ PROT_EXEC = C.PROT_EXEC
+
+ MAP_ANON = C.MAP_ANON
+ MAP_PRIVATE = C.MAP_PRIVATE
+ MAP_FIXED = C.MAP_FIXED
+
+ MADV_DONTNEED = C.MADV_DONTNEED
+ MADV_FREE = C.MADV_FREE
+
+ SA_SIGINFO = C.SA_SIGINFO
+ SA_RESTART = C.SA_RESTART
+ SA_ONSTACK = C.SA_ONSTACK
+
+ SIGHUP = C.SIGHUP
+ SIGINT = C.SIGINT
+ SIGQUIT = C.SIGQUIT
+ SIGILL = C.SIGILL
+ SIGTRAP = C.SIGTRAP
+ SIGABRT = C.SIGABRT
+ SIGEMT = C.SIGEMT
+ SIGFPE = C.SIGFPE
+ SIGKILL = C.SIGKILL
+ SIGBUS = C.SIGBUS
+ SIGSEGV = C.SIGSEGV
+ SIGSYS = C.SIGSYS
+ SIGPIPE = C.SIGPIPE
+ SIGALRM = C.SIGALRM
+ SIGTERM = C.SIGTERM
+ SIGURG = C.SIGURG
+ SIGSTOP = C.SIGSTOP
+ SIGTSTP = C.SIGTSTP
+ SIGCONT = C.SIGCONT
+ SIGCHLD = C.SIGCHLD
+ SIGTTIN = C.SIGTTIN
+ SIGTTOU = C.SIGTTOU
+ SIGIO = C.SIGIO
+ SIGXCPU = C.SIGXCPU
+ SIGXFSZ = C.SIGXFSZ
+ SIGVTALRM = C.SIGVTALRM
+ SIGPROF = C.SIGPROF
+ SIGWINCH = C.SIGWINCH
+ SIGUSR1 = C.SIGUSR1
+ SIGUSR2 = C.SIGUSR2
+
+ FPE_INTDIV = C.FPE_INTDIV
+ FPE_INTOVF = C.FPE_INTOVF
+ FPE_FLTDIV = C.FPE_FLTDIV
+ FPE_FLTOVF = C.FPE_FLTOVF
+ FPE_FLTUND = C.FPE_FLTUND
+ FPE_FLTRES = C.FPE_FLTRES
+ FPE_FLTINV = C.FPE_FLTINV
+ FPE_FLTSUB = C.FPE_FLTSUB
+
+ BUS_ADRALN = C.BUS_ADRALN
+ BUS_ADRERR = C.BUS_ADRERR
+ BUS_OBJERR = C.BUS_OBJERR
+
+ SEGV_MAPERR = C.SEGV_MAPERR
+ SEGV_ACCERR = C.SEGV_ACCERR
+
+ ITIMER_REAL = C.ITIMER_REAL
+ ITIMER_VIRTUAL = C.ITIMER_VIRTUAL
+ ITIMER_PROF = C.ITIMER_PROF
+
+ _SC_NPROCESSORS_ONLN = C._SC_NPROCESSORS_ONLN
+
+ PTHREAD_CREATE_DETACHED = C.PTHREAD_CREATE_DETACHED
+
+ FORK_NOSIGCHLD = C.FORK_NOSIGCHLD
+ FORK_WAITPID = C.FORK_WAITPID
+
+ MAXHOSTNAMELEN = C.MAXHOSTNAMELEN
+
+ O_WRONLY = C.O_WRONLY
+ O_NONBLOCK = C.O_NONBLOCK
+ O_CREAT = C.O_CREAT
+ O_TRUNC = C.O_TRUNC
+ O_CLOEXEC = C.O_CLOEXEC
+ F_GETFL = C.F_GETFL
+ F_SETFL = C.F_SETFL
+
+ POLLIN = C.POLLIN
+ POLLOUT = C.POLLOUT
+ POLLHUP = C.POLLHUP
+ POLLERR = C.POLLERR
+
+ PORT_SOURCE_FD = C.PORT_SOURCE_FD
+ PORT_SOURCE_ALERT = C.PORT_SOURCE_ALERT
+ PORT_ALERT_UPDATE = C.PORT_ALERT_UPDATE
+)
+
+type SemT C.sem_t
+
+type Sigset C.sigset_t
+type StackT C.stack_t
+
+type Siginfo C.siginfo_t
+type Sigaction C.struct_sigaction
+
+type Fpregset C.fpregset_t
+type Mcontext C.mcontext_t
+type Ucontext C.ucontext_t
+
+type Timespec C.struct_timespec
+type Timeval C.struct_timeval
+type Itimerval C.struct_itimerval
+
+type PortEvent C.port_event_t
+type Pthread C.pthread_t
+type PthreadAttr C.pthread_attr_t
+
+// depends on Timespec, must appear below
+type Stat C.struct_stat
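For context on the -cdefs step described at the top of this file: defs_solaris.go is never compiled (note the //go:build ignore tag); `go tool cgo -cdefs` resolves every C.<name> reference against the system headers and emits concrete Go constants and struct layouts, which the runtime then combines bitwise exactly as C code would. A minimal standalone sketch of that usage pattern, using placeholder values rather than the generated ones:

package main

import "fmt"

// Placeholder values standing in for the generated PROT_*/MAP_* constants;
// the real numbers come from the Solaris headers via `go tool cgo -cdefs`.
const (
	protRead   = 0x1
	protWrite  = 0x2
	mapAnon    = 0x100
	mapPrivate = 0x2
)

func main() {
	prot := protRead | protWrite  // readable, writable mapping
	flags := mapAnon | mapPrivate // anonymous, private memory
	fmt.Printf("prot=%#x flags=%#x\n", prot, flags)
}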
diff --git a/src/runtime/defs_solaris_amd64.go b/src/runtime/defs_solaris_amd64.go
new file mode 100644
index 0000000..56e4b38
--- /dev/null
+++ b/src/runtime/defs_solaris_amd64.go
@@ -0,0 +1,48 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+/*
+Input to cgo.
+
+GOARCH=amd64 go tool cgo -cdefs defs_solaris.go defs_solaris_amd64.go >defs_solaris_amd64.h
+*/
+
+package runtime
+
+/*
+#include <sys/types.h>
+#include <sys/regset.h>
+*/
+import "C"
+
+const (
+ REG_RDI = C.REG_RDI
+ REG_RSI = C.REG_RSI
+ REG_RDX = C.REG_RDX
+ REG_RCX = C.REG_RCX
+ REG_R8 = C.REG_R8
+ REG_R9 = C.REG_R9
+ REG_R10 = C.REG_R10
+ REG_R11 = C.REG_R11
+ REG_R12 = C.REG_R12
+ REG_R13 = C.REG_R13
+ REG_R14 = C.REG_R14
+ REG_R15 = C.REG_R15
+ REG_RBP = C.REG_RBP
+ REG_RBX = C.REG_RBX
+ REG_RAX = C.REG_RAX
+ REG_GS = C.REG_GS
+ REG_FS = C.REG_FS
+ REG_ES = C.REG_ES
+ REG_DS = C.REG_DS
+ REG_TRAPNO = C.REG_TRAPNO
+ REG_ERR = C.REG_ERR
+ REG_RIP = C.REG_RIP
+ REG_CS = C.REG_CS
+ REG_RFLAGS = C.REG_RFL
+ REG_RSP = C.REG_RSP
+ REG_SS = C.REG_SS
+)
diff --git a/src/runtime/defs_windows.go b/src/runtime/defs_windows.go
new file mode 100644
index 0000000..56698fa
--- /dev/null
+++ b/src/runtime/defs_windows.go
@@ -0,0 +1,90 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Windows architecture-independent definitions.
+
+package runtime
+
+const (
+ _PROT_NONE = 0
+ _PROT_READ = 1
+ _PROT_WRITE = 2
+ _PROT_EXEC = 4
+
+ _MAP_ANON = 1
+ _MAP_PRIVATE = 2
+
+ _DUPLICATE_SAME_ACCESS = 0x2
+ _THREAD_PRIORITY_HIGHEST = 0x2
+
+ _SIGINT = 0x2
+ _SIGTERM = 0xF
+ _CTRL_C_EVENT = 0x0
+ _CTRL_BREAK_EVENT = 0x1
+ _CTRL_CLOSE_EVENT = 0x2
+ _CTRL_LOGOFF_EVENT = 0x5
+ _CTRL_SHUTDOWN_EVENT = 0x6
+
+ _EXCEPTION_ACCESS_VIOLATION = 0xc0000005
+ _EXCEPTION_IN_PAGE_ERROR = 0xc0000006
+ _EXCEPTION_BREAKPOINT = 0x80000003
+ _EXCEPTION_ILLEGAL_INSTRUCTION = 0xc000001d
+ _EXCEPTION_FLT_DENORMAL_OPERAND = 0xc000008d
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO = 0xc000008e
+ _EXCEPTION_FLT_INEXACT_RESULT = 0xc000008f
+ _EXCEPTION_FLT_OVERFLOW = 0xc0000091
+ _EXCEPTION_FLT_UNDERFLOW = 0xc0000093
+ _EXCEPTION_INT_DIVIDE_BY_ZERO = 0xc0000094
+ _EXCEPTION_INT_OVERFLOW = 0xc0000095
+
+ _INFINITE = 0xffffffff
+ _WAIT_TIMEOUT = 0x102
+
+ _EXCEPTION_CONTINUE_EXECUTION = -0x1
+ _EXCEPTION_CONTINUE_SEARCH = 0x0
+)
+
+type systeminfo struct {
+ anon0 [4]byte
+ dwpagesize uint32
+ lpminimumapplicationaddress *byte
+ lpmaximumapplicationaddress *byte
+ dwactiveprocessormask uintptr
+ dwnumberofprocessors uint32
+ dwprocessortype uint32
+ dwallocationgranularity uint32
+ wprocessorlevel uint16
+ wprocessorrevision uint16
+}
+
+type exceptionpointers struct {
+ record *exceptionrecord
+ context *context
+}
+
+type exceptionrecord struct {
+ exceptioncode uint32
+ exceptionflags uint32
+ exceptionrecord *exceptionrecord
+ exceptionaddress uintptr
+ numberparameters uint32
+ exceptioninformation [15]uintptr
+}
+
+type overlapped struct {
+ internal uintptr
+ internalhigh uintptr
+ anon0 [8]byte
+ hevent *byte
+}
+
+type memoryBasicInformation struct {
+ baseAddress uintptr
+ allocationBase uintptr
+ allocationProtect uint32
+ regionSize uintptr
+ state uint32
+ protect uint32
+ type_ uint32
+}
diff --git a/src/runtime/defs_windows_386.go b/src/runtime/defs_windows_386.go
new file mode 100644
index 0000000..b11b155
--- /dev/null
+++ b/src/runtime/defs_windows_386.go
@@ -0,0 +1,81 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _CONTEXT_CONTROL = 0x10001
+
+type floatingsavearea struct {
+ controlword uint32
+ statusword uint32
+ tagword uint32
+ erroroffset uint32
+ errorselector uint32
+ dataoffset uint32
+ dataselector uint32
+ registerarea [80]uint8
+ cr0npxstate uint32
+}
+
+type context struct {
+ contextflags uint32
+ dr0 uint32
+ dr1 uint32
+ dr2 uint32
+ dr3 uint32
+ dr6 uint32
+ dr7 uint32
+ floatsave floatingsavearea
+ seggs uint32
+ segfs uint32
+ seges uint32
+ segds uint32
+ edi uint32
+ esi uint32
+ ebx uint32
+ edx uint32
+ ecx uint32
+ eax uint32
+ ebp uint32
+ eip uint32
+ segcs uint32
+ eflags uint32
+ esp uint32
+ segss uint32
+ extendedregisters [512]uint8
+}
+
+func (c *context) ip() uintptr { return uintptr(c.eip) }
+func (c *context) sp() uintptr { return uintptr(c.esp) }
+
+// 386 does not have a link register, so this returns 0.
+func (c *context) lr() uintptr { return 0 }
+func (c *context) set_lr(x uintptr) {}
+
+func (c *context) set_ip(x uintptr) { c.eip = uint32(x) }
+func (c *context) set_sp(x uintptr) { c.esp = uint32(x) }
+
+// 386 does not have a frame pointer register.
+func (c *context) set_fp(x uintptr) {}
+
+func prepareContextForSigResume(c *context) {
+ c.edx = c.esp
+ c.ecx = c.eip
+}
+
+func dumpregs(r *context) {
+ print("eax ", hex(r.eax), "\n")
+ print("ebx ", hex(r.ebx), "\n")
+ print("ecx ", hex(r.ecx), "\n")
+ print("edx ", hex(r.edx), "\n")
+ print("edi ", hex(r.edi), "\n")
+ print("esi ", hex(r.esi), "\n")
+ print("ebp ", hex(r.ebp), "\n")
+ print("esp ", hex(r.esp), "\n")
+ print("eip ", hex(r.eip), "\n")
+ print("eflags ", hex(r.eflags), "\n")
+ print("cs ", hex(r.segcs), "\n")
+ print("fs ", hex(r.segfs), "\n")
+ print("gs ", hex(r.seggs), "\n")
+}
diff --git a/src/runtime/defs_windows_amd64.go b/src/runtime/defs_windows_amd64.go
new file mode 100644
index 0000000..20c9c4d
--- /dev/null
+++ b/src/runtime/defs_windows_amd64.go
@@ -0,0 +1,100 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const _CONTEXT_CONTROL = 0x100001
+
+type m128a struct {
+ low uint64
+ high int64
+}
+
+type context struct {
+ p1home uint64
+ p2home uint64
+ p3home uint64
+ p4home uint64
+ p5home uint64
+ p6home uint64
+ contextflags uint32
+ mxcsr uint32
+ segcs uint16
+ segds uint16
+ seges uint16
+ segfs uint16
+ seggs uint16
+ segss uint16
+ eflags uint32
+ dr0 uint64
+ dr1 uint64
+ dr2 uint64
+ dr3 uint64
+ dr6 uint64
+ dr7 uint64
+ rax uint64
+ rcx uint64
+ rdx uint64
+ rbx uint64
+ rsp uint64
+ rbp uint64
+ rsi uint64
+ rdi uint64
+ r8 uint64
+ r9 uint64
+ r10 uint64
+ r11 uint64
+ r12 uint64
+ r13 uint64
+ r14 uint64
+ r15 uint64
+ rip uint64
+ anon0 [512]byte
+ vectorregister [26]m128a
+ vectorcontrol uint64
+ debugcontrol uint64
+ lastbranchtorip uint64
+ lastbranchfromrip uint64
+ lastexceptiontorip uint64
+ lastexceptionfromrip uint64
+}
+
+func (c *context) ip() uintptr { return uintptr(c.rip) }
+func (c *context) sp() uintptr { return uintptr(c.rsp) }
+
+// AMD64 does not have a link register, so this returns 0.
+func (c *context) lr() uintptr { return 0 }
+func (c *context) set_lr(x uintptr) {}
+
+func (c *context) set_ip(x uintptr) { c.rip = uint64(x) }
+func (c *context) set_sp(x uintptr) { c.rsp = uint64(x) }
+func (c *context) set_fp(x uintptr) { c.rbp = uint64(x) }
+
+func prepareContextForSigResume(c *context) {
+ c.r8 = c.rsp
+ c.r9 = c.rip
+}
+
+func dumpregs(r *context) {
+ print("rax ", hex(r.rax), "\n")
+ print("rbx ", hex(r.rbx), "\n")
+ print("rcx ", hex(r.rcx), "\n")
+ print("rdi ", hex(r.rdi), "\n")
+ print("rsi ", hex(r.rsi), "\n")
+ print("rbp ", hex(r.rbp), "\n")
+ print("rsp ", hex(r.rsp), "\n")
+ print("r8 ", hex(r.r8), "\n")
+ print("r9 ", hex(r.r9), "\n")
+ print("r10 ", hex(r.r10), "\n")
+ print("r11 ", hex(r.r11), "\n")
+ print("r12 ", hex(r.r12), "\n")
+ print("r13 ", hex(r.r13), "\n")
+ print("r14 ", hex(r.r14), "\n")
+ print("r15 ", hex(r.r15), "\n")
+ print("rip ", hex(r.rip), "\n")
+ print("rflags ", hex(r.eflags), "\n")
+ print("cs ", hex(r.segcs), "\n")
+ print("fs ", hex(r.segfs), "\n")
+ print("gs ", hex(r.seggs), "\n")
+}
diff --git a/src/runtime/defs_windows_arm.go b/src/runtime/defs_windows_arm.go
new file mode 100644
index 0000000..7a18c95
--- /dev/null
+++ b/src/runtime/defs_windows_arm.go
@@ -0,0 +1,91 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// NOTE(rsc): _CONTEXT_CONTROL is actually 0x200001 and should include PC, SP, and LR.
+// However, empirically, LR doesn't come along on Windows 10
+// unless you also set _CONTEXT_INTEGER (0x200002).
+// Without LR, we skip over the next-to-bottom function in profiles
+// when the bottom function is frameless.
+// So we set both here, to make a working _CONTEXT_CONTROL.
+const _CONTEXT_CONTROL = 0x200003
+
+type neon128 struct {
+ low uint64
+ high int64
+}
+
+type context struct {
+ contextflags uint32
+ r0 uint32
+ r1 uint32
+ r2 uint32
+ r3 uint32
+ r4 uint32
+ r5 uint32
+ r6 uint32
+ r7 uint32
+ r8 uint32
+ r9 uint32
+ r10 uint32
+ r11 uint32
+ r12 uint32
+
+ spr uint32
+ lrr uint32
+ pc uint32
+ cpsr uint32
+
+ fpscr uint32
+ padding uint32
+
+ floatNeon [16]neon128
+
+ bvr [8]uint32
+ bcr [8]uint32
+ wvr [1]uint32
+ wcr [1]uint32
+ padding2 [2]uint32
+}
+
+func (c *context) ip() uintptr { return uintptr(c.pc) }
+func (c *context) sp() uintptr { return uintptr(c.spr) }
+func (c *context) lr() uintptr { return uintptr(c.lrr) }
+
+func (c *context) set_ip(x uintptr) { c.pc = uint32(x) }
+func (c *context) set_sp(x uintptr) { c.spr = uint32(x) }
+func (c *context) set_lr(x uintptr) { c.lrr = uint32(x) }
+
+// arm does not have a frame pointer register.
+func (c *context) set_fp(x uintptr) {}
+
+func prepareContextForSigResume(c *context) {
+ c.r0 = c.spr
+ c.r1 = c.pc
+}
+
+func dumpregs(r *context) {
+ print("r0 ", hex(r.r0), "\n")
+ print("r1 ", hex(r.r1), "\n")
+ print("r2 ", hex(r.r2), "\n")
+ print("r3 ", hex(r.r3), "\n")
+ print("r4 ", hex(r.r4), "\n")
+ print("r5 ", hex(r.r5), "\n")
+ print("r6 ", hex(r.r6), "\n")
+ print("r7 ", hex(r.r7), "\n")
+ print("r8 ", hex(r.r8), "\n")
+ print("r9 ", hex(r.r9), "\n")
+ print("r10 ", hex(r.r10), "\n")
+ print("r11 ", hex(r.r11), "\n")
+ print("r12 ", hex(r.r12), "\n")
+ print("sp ", hex(r.spr), "\n")
+ print("lr ", hex(r.lrr), "\n")
+ print("pc ", hex(r.pc), "\n")
+ print("cpsr ", hex(r.cpsr), "\n")
+}
+
+func stackcheck() {
+ // TODO: not implemented on ARM
+}
diff --git a/src/runtime/defs_windows_arm64.go b/src/runtime/defs_windows_arm64.go
new file mode 100644
index 0000000..ef2efb1
--- /dev/null
+++ b/src/runtime/defs_windows_arm64.go
@@ -0,0 +1,89 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// NOTE(rsc): _CONTEXT_CONTROL is actually 0x400001 and should include PC, SP, and LR.
+// However, empirically, LR doesn't come along on Windows 10
+// unless you also set _CONTEXT_INTEGER (0x400002).
+// Without LR, we skip over the next-to-bottom function in profiles
+// when the bottom function is frameless.
+// So we set both here, to make a working _CONTEXT_CONTROL.
+const _CONTEXT_CONTROL = 0x400003
+
+type neon128 struct {
+ low uint64
+ high int64
+}
+
+// See https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-arm64_nt_context
+type context struct {
+ contextflags uint32
+ cpsr uint32
+ x [31]uint64 // fp is x[29], lr is x[30]
+ xsp uint64
+ pc uint64
+ v [32]neon128
+ fpcr uint32
+ fpsr uint32
+ bcr [8]uint32
+ bvr [8]uint64
+ wcr [2]uint32
+ wvr [2]uint64
+}
+
+func (c *context) ip() uintptr { return uintptr(c.pc) }
+func (c *context) sp() uintptr { return uintptr(c.xsp) }
+func (c *context) lr() uintptr { return uintptr(c.x[30]) }
+
+func (c *context) set_ip(x uintptr) { c.pc = uint64(x) }
+func (c *context) set_sp(x uintptr) { c.xsp = uint64(x) }
+func (c *context) set_lr(x uintptr) { c.x[30] = uint64(x) }
+func (c *context) set_fp(x uintptr) { c.x[29] = uint64(x) }
+
+func prepareContextForSigResume(c *context) {
+ c.x[0] = c.xsp
+ c.x[1] = c.pc
+}
+
+func dumpregs(r *context) {
+ print("r0 ", hex(r.x[0]), "\n")
+ print("r1 ", hex(r.x[1]), "\n")
+ print("r2 ", hex(r.x[2]), "\n")
+ print("r3 ", hex(r.x[3]), "\n")
+ print("r4 ", hex(r.x[4]), "\n")
+ print("r5 ", hex(r.x[5]), "\n")
+ print("r6 ", hex(r.x[6]), "\n")
+ print("r7 ", hex(r.x[7]), "\n")
+ print("r8 ", hex(r.x[8]), "\n")
+ print("r9 ", hex(r.x[9]), "\n")
+ print("r10 ", hex(r.x[10]), "\n")
+ print("r11 ", hex(r.x[11]), "\n")
+ print("r12 ", hex(r.x[12]), "\n")
+ print("r13 ", hex(r.x[13]), "\n")
+ print("r14 ", hex(r.x[14]), "\n")
+ print("r15 ", hex(r.x[15]), "\n")
+ print("r16 ", hex(r.x[16]), "\n")
+ print("r17 ", hex(r.x[17]), "\n")
+ print("r18 ", hex(r.x[18]), "\n")
+ print("r19 ", hex(r.x[19]), "\n")
+ print("r20 ", hex(r.x[20]), "\n")
+ print("r21 ", hex(r.x[21]), "\n")
+ print("r22 ", hex(r.x[22]), "\n")
+ print("r23 ", hex(r.x[23]), "\n")
+ print("r24 ", hex(r.x[24]), "\n")
+ print("r25 ", hex(r.x[25]), "\n")
+ print("r26 ", hex(r.x[26]), "\n")
+ print("r27 ", hex(r.x[27]), "\n")
+ print("r28 ", hex(r.x[28]), "\n")
+ print("r29 ", hex(r.x[29]), "\n")
+ print("lr ", hex(r.x[30]), "\n")
+ print("sp ", hex(r.xsp), "\n")
+ print("pc ", hex(r.pc), "\n")
+ print("cpsr ", hex(r.cpsr), "\n")
+}
+
+func stackcheck() {
+ // TODO: not implemented on ARM64
+}
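The NOTE(rsc) comments in defs_windows_arm.go and defs_windows_arm64.go above describe the same workaround: the value assigned to _CONTEXT_CONTROL is simply the documented control flag OR'd with _CONTEXT_INTEGER so that LR is captured. A quick standalone check of that arithmetic, using only the hex values quoted in those comments:

package main

import "fmt"

func main() {
	// Windows/ARM: _CONTEXT_CONTROL (0x200001) | _CONTEXT_INTEGER (0x200002)
	fmt.Printf("%#x\n", 0x200001|0x200002) // 0x200003, the constant in defs_windows_arm.go
	// Windows/ARM64: same pattern with the ARM64 flag bits
	fmt.Printf("%#x\n", 0x400001|0x400002) // 0x400003, the constant in defs_windows_arm64.go
}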
diff --git a/src/runtime/duff_386.s b/src/runtime/duff_386.s
new file mode 100644
index 0000000..ab01430
--- /dev/null
+++ b/src/runtime/duff_386.s
@@ -0,0 +1,779 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT, $0-0
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ STOSL
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT, $0-0
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ MOVL (SI), CX
+ ADDL $4, SI
+ MOVL CX, (DI)
+ ADDL $4, DI
+
+ RET
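As the generated headers note, each duff_*.s file is just one store (or one load/store pair) repeated a fixed number of times: duffzero above is 128 STOSL instructions followed by RET, and the compiler jumps partway into the body to zero or copy exactly the size it needs. A simplified sketch of that style of generator — not the real mkduff.go, whose output also carries the build header and per-architecture details — showing only the repetition idea:

package main

import (
	"fmt"
	"os"
)

func main() {
	// Emit an unrolled zeroing body in the style of duff_386.s: the TEXT
	// header, 128 repetitions of STOSL, then RET. Entering the body N
	// instructions from the end zeroes N four-byte words.
	fmt.Fprintln(os.Stdout, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
	for i := 0; i < 128; i++ {
		fmt.Fprintln(os.Stdout, "\tSTOSL")
	}
	fmt.Fprintln(os.Stdout, "\tRET")
}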
diff --git a/src/runtime/duff_amd64.s b/src/runtime/duff_amd64.s
new file mode 100644
index 0000000..69e9980
--- /dev/null
+++ b/src/runtime/duff_amd64.s
@@ -0,0 +1,427 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ MOVUPS X15,(DI)
+ MOVUPS X15,16(DI)
+ MOVUPS X15,32(DI)
+ MOVUPS X15,48(DI)
+ LEAQ 64(DI),DI
+
+ RET
+
+TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ MOVUPS (SI), X0
+ ADDQ $16, SI
+ MOVUPS X0, (DI)
+ ADDQ $16, DI
+
+ RET
diff --git a/src/runtime/duff_arm.s b/src/runtime/duff_arm.s
new file mode 100644
index 0000000..ba8235b
--- /dev/null
+++ b/src/runtime/duff_arm.s
@@ -0,0 +1,523 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT, $0-0
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ MOVW.P R0, 4(R1)
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT, $0-0
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ MOVW.P 4(R1), R0
+ MOVW.P R0, 4(R2)
+
+ RET
diff --git a/src/runtime/duff_arm64.s b/src/runtime/duff_arm64.s
new file mode 100644
index 0000000..33c4905
--- /dev/null
+++ b/src/runtime/duff_arm64.s
@@ -0,0 +1,267 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP.P (ZR, ZR), 16(R20)
+ STP (ZR, ZR), (R20)
+ RET
+
+TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ LDP.P 16(R20), (R26, R27)
+ STP.P (R26, R27), 16(R21)
+
+ RET
diff --git a/src/runtime/duff_loong64.s b/src/runtime/duff_loong64.s
new file mode 100644
index 0000000..7f78e4f
--- /dev/null
+++ b/src/runtime/duff_loong64.s
@@ -0,0 +1,907 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ MOVV R0, 8(R19)
+ ADDV $8, R19
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ MOVV (R19), R30
+ ADDV $8, R19
+ MOVV R30, (R20)
+ ADDV $8, R20
+
+ RET
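
The unrolled load/advance/store/advance sequence above is the body of duffcopy: the compiler does not call it from the top but branches to an offset inside it, so exactly as many 8-byte moves run as the copy needs (the Duff's-device trick; mkduff.go generates the repetition). Below is a minimal conceptual sketch of the equivalent word-by-word copy in plain Go rather than assembly; the function name is illustrative, not part of the runtime.

// Conceptual sketch only: the effect of n duffcopy steps, written as a loop.
// The real routine is the unrolled assembly above, entered at an offset the
// compiler computes so that exactly n eight-byte moves are executed.
package main

import "fmt"

// duffcopyWords mirrors the load/advance/store/advance pattern: copy n
// 64-bit words from src to dst.
func duffcopyWords(dst, src []uint64, n int) {
	for i := 0; i < n; i++ {
		dst[i] = src[i]
	}
}

func main() {
	src := []uint64{1, 2, 3, 4}
	dst := make([]uint64, len(src))
	duffcopyWords(dst, src, len(src))
	fmt.Println(dst) // [1 2 3 4]
}
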
diff --git a/src/runtime/duff_mips64x.s b/src/runtime/duff_mips64x.s
new file mode 100644
index 0000000..3a8524c
--- /dev/null
+++ b/src/runtime/duff_mips64x.s
@@ -0,0 +1,909 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+//go:build mips64 || mips64le
+
+#include "textflag.h"
+
+TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ MOVV R0, 8(R1)
+ ADDV $8, R1
+ RET
+
+TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ MOVV (R1), R23
+ ADDV $8, R1
+ MOVV R23, (R2)
+ ADDV $8, R2
+
+ RET
diff --git a/src/runtime/duff_ppc64x.s b/src/runtime/duff_ppc64x.s
new file mode 100644
index 0000000..a3caaa8
--- /dev/null
+++ b/src/runtime/duff_ppc64x.s
@@ -0,0 +1,397 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+
+TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ MOVDU R0, 8(R20)
+ RET
+
+TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ MOVDU 8(R20), R5
+ MOVDU R5, 8(R21)
+ RET
diff --git a/src/runtime/duff_riscv64.s b/src/runtime/duff_riscv64.s
new file mode 100644
index 0000000..ec44767
--- /dev/null
+++ b/src/runtime/duff_riscv64.s
@@ -0,0 +1,907 @@
+// Code generated by mkduff.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkduff.go for comments.
+
+#include "textflag.h"
+
+TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ MOV ZERO, (X25)
+ ADD $8, X25
+ RET
+
+TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ MOV (X24), X31
+ ADD $8, X24
+ MOV X31, (X25)
+ ADD $8, X25
+
+ RET
diff --git a/src/runtime/duff_s390x.s b/src/runtime/duff_s390x.s
new file mode 100644
index 0000000..95d492a
--- /dev/null
+++ b/src/runtime/duff_s390x.s
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// s390x can copy/zero 1-256 bytes with a single instruction,
+// so there's no need for these, except to satisfy the prototypes
+// in stubs.go.
+
+TEXT runtime·duffzero(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, 2(R0)
+ RET
+
+TEXT runtime·duffcopy(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD $0, 2(R0)
+ RET
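
As the comment above notes, duff_s390x.s is only a stub: s390x storage-to-storage instructions can copy or zero 1-256 bytes at once, so the compiler has no need for the unrolled routines there and the bodies exist only to satisfy the prototypes in stubs.go. The sketch below shows the kind of small fixed-size copy that other architectures typically lower to duffcopy; the struct layout and size are arbitrary, and the exact lowering depends on the compiler version and copy size.

// Illustrative only: a small fixed-size value copy of the sort the compiler
// may lower to an unrolled duffcopy on most architectures and to a single
// storage-to-storage instruction on s390x.
package main

import "fmt"

type block struct {
	words [16]uint64 // 128 bytes, within the 1-256 byte range mentioned above
}

func main() {
	var a, b block
	for i := range a.words {
		a.words[i] = uint64(i)
	}
	b = a // fixed-size value copy
	fmt.Println(b.words[15]) // 15
}
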
diff --git a/src/runtime/ehooks_test.go b/src/runtime/ehooks_test.go
new file mode 100644
index 0000000..ee286ec
--- /dev/null
+++ b/src/runtime/ehooks_test.go
@@ -0,0 +1,91 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/platform"
+ "internal/testenv"
+ "os/exec"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func TestExitHooks(t *testing.T) {
+ bmodes := []string{""}
+ if testing.Short() {
+ t.Skip("skipping due to -short")
+ }
+	// Note the HasCGO() test below; this is to prevent the test
+	// from running if CGO_ENABLED=0 is in effect.
+ haverace := platform.RaceDetectorSupported(runtime.GOOS, runtime.GOARCH)
+ if haverace && testenv.HasCGO() {
+ bmodes = append(bmodes, "-race")
+ }
+ for _, bmode := range bmodes {
+ scenarios := []struct {
+ mode string
+ expected string
+ musthave string
+ }{
+ {
+ mode: "simple",
+ expected: "bar foo",
+ musthave: "",
+ },
+ {
+ mode: "goodexit",
+ expected: "orange apple",
+ musthave: "",
+ },
+ {
+ mode: "badexit",
+ expected: "blub blix",
+ musthave: "",
+ },
+ {
+ mode: "panics",
+ expected: "",
+ musthave: "fatal error: internal error: exit hook invoked panic",
+ },
+ {
+ mode: "callsexit",
+ expected: "",
+ musthave: "fatal error: internal error: exit hook invoked exit",
+ },
+ }
+
+ exe, err := buildTestProg(t, "testexithooks", bmode)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ bt := ""
+ if bmode != "" {
+ bt = " bmode: " + bmode
+ }
+ for _, s := range scenarios {
+ cmd := exec.Command(exe, []string{"-mode", s.mode}...)
+ out, _ := cmd.CombinedOutput()
+ outs := strings.ReplaceAll(string(out), "\n", " ")
+ outs = strings.TrimSpace(outs)
+ if s.expected != "" {
+ if s.expected != outs {
+ t.Logf("raw output: %q", outs)
+ t.Errorf("failed%s mode %s: wanted %q got %q", bt,
+ s.mode, s.expected, outs)
+ }
+ } else if s.musthave != "" {
+ if !strings.Contains(outs, s.musthave) {
+ t.Logf("raw output: %q", outs)
+ t.Errorf("failed mode %s: output does not contain %q",
+ s.mode, s.musthave)
+ }
+ } else {
+ panic("badly written scenario")
+ }
+ }
+ }
+}
diff --git a/src/runtime/env_plan9.go b/src/runtime/env_plan9.go
new file mode 100644
index 0000000..d206c5d
--- /dev/null
+++ b/src/runtime/env_plan9.go
@@ -0,0 +1,126 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+const (
+ // Plan 9 environment device
+ envDir = "/env/"
+ // size of buffer to read from a directory
+ dirBufSize = 4096
+ // size of buffer to read an environment variable (may grow)
+ envBufSize = 128
+ // offset of the name field in a 9P directory entry - see syscall.UnmarshalDir()
+ nameOffset = 39
+)
+
+// goenvs caches the Plan 9 environment variables at start of execution into
+// string array envs, to supply the initial contents for os.Environ.
+// Subsequent calls to os.Setenv will change this cache, without writing back
+// to the (possibly shared) Plan 9 environment, so that Setenv and Getenv
+// conform to the same Posix semantics as on other operating systems.
+// For Plan 9 shared environment semantics, instead of Getenv(key) and
+// Setenv(key, value), one can use os.ReadFile("/env/" + key) and
+// os.WriteFile("/env/" + key, value, 0666) respectively.
+//
+//go:nosplit
+func goenvs() {
+ buf := make([]byte, envBufSize)
+ copy(buf, envDir)
+ dirfd := open(&buf[0], _OREAD, 0)
+ if dirfd < 0 {
+ return
+ }
+ defer closefd(dirfd)
+ dofiles(dirfd, func(name []byte) {
+ name = append(name, 0)
+ buf = buf[:len(envDir)]
+ copy(buf, envDir)
+ buf = append(buf, name...)
+ fd := open(&buf[0], _OREAD, 0)
+ if fd < 0 {
+ return
+ }
+ defer closefd(fd)
+ n := len(buf)
+ r := 0
+ for {
+ r = int(pread(fd, unsafe.Pointer(&buf[0]), int32(n), 0))
+ if r < n {
+ break
+ }
+ n = int(seek(fd, 0, 2)) + 1
+ if len(buf) < n {
+ buf = make([]byte, n)
+ }
+ }
+ if r <= 0 {
+ r = 0
+ } else if buf[r-1] == 0 {
+ r--
+ }
+ name[len(name)-1] = '='
+ env := make([]byte, len(name)+r)
+ copy(env, name)
+ copy(env[len(name):], buf[:r])
+ envs = append(envs, string(env))
+ })
+}
+
+// dofiles reads the directory opened with file descriptor fd, applying function f
+// to each filename in it.
+//
+//go:nosplit
+func dofiles(dirfd int32, f func([]byte)) {
+ dirbuf := new([dirBufSize]byte)
+
+ var off int64 = 0
+ for {
+ n := pread(dirfd, unsafe.Pointer(&dirbuf[0]), int32(dirBufSize), off)
+ if n <= 0 {
+ return
+ }
+ for b := dirbuf[:n]; len(b) > 0; {
+ var name []byte
+ name, b = gdirname(b)
+ if name == nil {
+ return
+ }
+ f(name)
+ }
+ off += int64(n)
+ }
+}
+
+// gdirname returns the first filename from a buffer of directory entries,
+// and a slice containing the remaining directory entries.
+// If the buffer doesn't start with a valid directory entry, the returned name is nil.
+//
+//go:nosplit
+func gdirname(buf []byte) (name []byte, rest []byte) {
+ if 2+nameOffset+2 > len(buf) {
+ return
+ }
+ entryLen, buf := gbit16(buf)
+ if entryLen > len(buf) {
+ return
+ }
+ n, b := gbit16(buf[nameOffset:])
+ if n > len(b) {
+ return
+ }
+ name = b[:n]
+ rest = buf[entryLen:]
+ return
+}
+
+// gbit16 reads a 16-bit little-endian binary number from b and returns it
+// with the remaining slice of b.
+//
+//go:nosplit
+func gbit16(b []byte) (int, []byte) {
+ return int(b[0]) | int(b[1])<<8, b[2:]
+}
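
The goenvs comment above points out that os.Setenv only updates the runtime's cached copy, and that code wanting Plan 9's shared environment semantics can read and write the /env device directly. The sketch below shows that pattern; it is Plan 9 only, and the key and value names are placeholders.

// Plan 9 only. Shared-environment access as described in the goenvs comment:
// read and write /env/<key> directly instead of the cached os.Getenv view.
package main

import (
	"fmt"
	"os"
)

func main() {
	// Writing /env/<key> updates the (possibly shared) environment group.
	if err := os.WriteFile("/env/examplekey", []byte("examplevalue"), 0666); err != nil {
		fmt.Println("write:", err)
		return
	}
	// Reading /env/<key> sees changes made by other processes in the group.
	val, err := os.ReadFile("/env/examplekey")
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Println(string(val))
}
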
diff --git a/src/runtime/env_posix.go b/src/runtime/env_posix.go
new file mode 100644
index 0000000..0eb4f0d
--- /dev/null
+++ b/src/runtime/env_posix.go
@@ -0,0 +1,70 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func gogetenv(key string) string {
+ env := environ()
+ if env == nil {
+ throw("getenv before env init")
+ }
+ for _, s := range env {
+ if len(s) > len(key) && s[len(key)] == '=' && envKeyEqual(s[:len(key)], key) {
+ return s[len(key)+1:]
+ }
+ }
+ return ""
+}
+
+// envKeyEqual reports whether a == b, with ASCII-only case insensitivity
+// on Windows. The two strings must have the same length.
+func envKeyEqual(a, b string) bool {
+ if GOOS == "windows" { // case insensitive
+ for i := 0; i < len(a); i++ {
+ ca, cb := a[i], b[i]
+ if ca == cb || lowerASCII(ca) == lowerASCII(cb) {
+ continue
+ }
+ return false
+ }
+ return true
+ }
+ return a == b
+}
+
+func lowerASCII(c byte) byte {
+ if 'A' <= c && c <= 'Z' {
+ return c + ('a' - 'A')
+ }
+ return c
+}
+
+var _cgo_setenv unsafe.Pointer // pointer to C function
+var _cgo_unsetenv unsafe.Pointer // pointer to C function
+
+// Update the C environment if cgo is loaded.
+func setenv_c(k string, v string) {
+ if _cgo_setenv == nil {
+ return
+ }
+ arg := [2]unsafe.Pointer{cstring(k), cstring(v)}
+ asmcgocall(_cgo_setenv, unsafe.Pointer(&arg))
+}
+
+// Update the C environment if cgo is loaded.
+func unsetenv_c(k string) {
+ if _cgo_unsetenv == nil {
+ return
+ }
+ arg := [1]unsafe.Pointer{cstring(k)}
+ asmcgocall(_cgo_unsetenv, unsafe.Pointer(&arg))
+}
+
+func cstring(s string) unsafe.Pointer {
+ p := make([]byte, len(s)+1)
+ copy(p, s)
+ return unsafe.Pointer(&p[0])
+}
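
gogetenv above scans the environ slice for "KEY=value" entries and, via envKeyEqual, compares keys case-insensitively only on Windows. A self-contained sketch of the same lookup logic follows, with the case-insensitive behaviour made an explicit parameter for illustration; the function names are not the runtime's.

// Minimal sketch of the lookup logic in gogetenv/envKeyEqual above:
// scan "KEY=value" entries and compare the key, optionally ignoring
// ASCII case (as the runtime does on Windows).
package main

import "fmt"

func lowerASCII(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

func keyEqual(a, b string, caseInsensitive bool) bool {
	if len(a) != len(b) {
		return false
	}
	if !caseInsensitive {
		return a == b
	}
	for i := 0; i < len(a); i++ {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}
	return true
}

func getenv(env []string, key string, caseInsensitive bool) string {
	for _, s := range env {
		if len(s) > len(key) && s[len(key)] == '=' && keyEqual(s[:len(key)], key, caseInsensitive) {
			return s[len(key)+1:]
		}
	}
	return ""
}

func main() {
	env := []string{"PATH=/usr/bin", "Home=/home/gopher"}
	fmt.Println(getenv(env, "HOME", true))  // "/home/gopher" (case-insensitive match)
	fmt.Println(getenv(env, "HOME", false)) // "" (exact match required)
}
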
diff --git a/src/runtime/env_test.go b/src/runtime/env_test.go
new file mode 100644
index 0000000..c009d0f
--- /dev/null
+++ b/src/runtime/env_test.go
@@ -0,0 +1,43 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+)
+
+func TestFixedGOROOT(t *testing.T) {
+ // Restore both the real GOROOT environment variable, and runtime's copies:
+ if orig, ok := syscall.Getenv("GOROOT"); ok {
+ defer syscall.Setenv("GOROOT", orig)
+ } else {
+ defer syscall.Unsetenv("GOROOT")
+ }
+ envs := runtime.Envs()
+ oldenvs := append([]string{}, envs...)
+ defer runtime.SetEnvs(oldenvs)
+
+ // attempt to reuse existing envs backing array.
+ want := runtime.GOROOT()
+ runtime.SetEnvs(append(envs[:0], "GOROOT="+want))
+
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`initial runtime.GOROOT()=%q, want %q`, got, want)
+ }
+ if err := syscall.Setenv("GOROOT", "/os"); err != nil {
+ t.Fatal(err)
+ }
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`after setenv runtime.GOROOT()=%q, want %q`, got, want)
+ }
+ if err := syscall.Unsetenv("GOROOT"); err != nil {
+ t.Fatal(err)
+ }
+ if got := runtime.GOROOT(); got != want {
+ t.Errorf(`after unsetenv runtime.GOROOT()=%q, want %q`, got, want)
+ }
+}
diff --git a/src/runtime/error.go b/src/runtime/error.go
new file mode 100644
index 0000000..3590ccd
--- /dev/null
+++ b/src/runtime/error.go
@@ -0,0 +1,330 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/bytealg"
+
+// The Error interface identifies a run time error.
+type Error interface {
+ error
+
+ // RuntimeError is a no-op function but
+ // serves to distinguish types that are run time
+ // errors from ordinary errors: a type is a
+ // run time error if it has a RuntimeError method.
+ RuntimeError()
+}
+
+// A TypeAssertionError explains a failed type assertion.
+type TypeAssertionError struct {
+ _interface *_type
+ concrete *_type
+ asserted *_type
+ missingMethod string // one method needed by Interface, missing from Concrete
+}
+
+func (*TypeAssertionError) RuntimeError() {}
+
+func (e *TypeAssertionError) Error() string {
+ inter := "interface"
+ if e._interface != nil {
+ inter = toRType(e._interface).string()
+ }
+ as := toRType(e.asserted).string()
+ if e.concrete == nil {
+ return "interface conversion: " + inter + " is nil, not " + as
+ }
+ cs := toRType(e.concrete).string()
+ if e.missingMethod == "" {
+ msg := "interface conversion: " + inter + " is " + cs + ", not " + as
+ if cs == as {
+ // provide slightly clearer error message
+ if toRType(e.concrete).pkgpath() != toRType(e.asserted).pkgpath() {
+ msg += " (types from different packages)"
+ } else {
+ msg += " (types from different scopes)"
+ }
+ }
+ return msg
+ }
+ return "interface conversion: " + cs + " is not " + as +
+ ": missing method " + e.missingMethod
+}
+
+// itoa converts val to a decimal representation. The result is
+// written somewhere within buf and the location of the result is returned.
+// buf must be at least 20 bytes.
+//
+//go:nosplit
+func itoa(buf []byte, val uint64) []byte {
+ i := len(buf) - 1
+ for val >= 10 {
+ buf[i] = byte(val%10 + '0')
+ i--
+ val /= 10
+ }
+ buf[i] = byte(val + '0')
+ return buf[i:]
+}
+
+// An errorString represents a runtime error described by a single string.
+type errorString string
+
+func (e errorString) RuntimeError() {}
+
+func (e errorString) Error() string {
+ return "runtime error: " + string(e)
+}
+
+type errorAddressString struct {
+ msg string // error message
+ addr uintptr // memory address where the error occurred
+}
+
+func (e errorAddressString) RuntimeError() {}
+
+func (e errorAddressString) Error() string {
+ return "runtime error: " + e.msg
+}
+
+// Addr returns the memory address where a fault occurred.
+// The address provided is best-effort.
+// The veracity of the result may depend on the platform.
+// Errors providing this method will only be returned as
+// a result of using runtime/debug.SetPanicOnFault.
+func (e errorAddressString) Addr() uintptr {
+ return e.addr
+}
+
+// plainError represents a runtime error described by a string that is
+// not given the "runtime error: " prefix that errorString.Error() adds.
+// See Issue #14965.
+type plainError string
+
+func (e plainError) RuntimeError() {}
+
+func (e plainError) Error() string {
+ return string(e)
+}
+
+// A boundsError represents an indexing or slicing operation gone wrong.
+type boundsError struct {
+ x int64
+ y int
+ // Values in an index or slice expression can be signed or unsigned.
+ // That means we'd need 65 bits to encode all possible indexes, from -2^63 to 2^64-1.
+ // Instead, we keep track of whether x should be interpreted as signed or unsigned.
+ // y is known to be nonnegative and to fit in an int.
+ signed bool
+ code boundsErrorCode
+}
+
+type boundsErrorCode uint8
+
+const (
+ boundsIndex boundsErrorCode = iota // s[x], 0 <= x < len(s) failed
+
+ boundsSliceAlen // s[?:x], 0 <= x <= len(s) failed
+ boundsSliceAcap // s[?:x], 0 <= x <= cap(s) failed
+ boundsSliceB // s[x:y], 0 <= x <= y failed (but boundsSliceA didn't happen)
+
+ boundsSlice3Alen // s[?:?:x], 0 <= x <= len(s) failed
+ boundsSlice3Acap // s[?:?:x], 0 <= x <= cap(s) failed
+ boundsSlice3B // s[?:x:y], 0 <= x <= y failed (but boundsSlice3A didn't happen)
+ boundsSlice3C // s[x:y:?], 0 <= x <= y failed (but boundsSlice3A/B didn't happen)
+
+ boundsConvert // (*[x]T)(s), 0 <= x <= len(s) failed
+ // Note: in the above, len(s) and cap(s) are stored in y
+)
+
+// boundsErrorFmts provide error text for various out-of-bounds panics.
+// Note: if you change these strings, you should adjust the size of the buffer
+// in boundsError.Error below as well.
+var boundsErrorFmts = [...]string{
+ boundsIndex: "index out of range [%x] with length %y",
+ boundsSliceAlen: "slice bounds out of range [:%x] with length %y",
+ boundsSliceAcap: "slice bounds out of range [:%x] with capacity %y",
+ boundsSliceB: "slice bounds out of range [%x:%y]",
+ boundsSlice3Alen: "slice bounds out of range [::%x] with length %y",
+ boundsSlice3Acap: "slice bounds out of range [::%x] with capacity %y",
+ boundsSlice3B: "slice bounds out of range [:%x:%y]",
+ boundsSlice3C: "slice bounds out of range [%x:%y:]",
+ boundsConvert: "cannot convert slice with length %y to array or pointer to array with length %x",
+}
+
+// boundsNegErrorFmts are overriding formats if x is negative. In this case there's no need to report y.
+var boundsNegErrorFmts = [...]string{
+ boundsIndex: "index out of range [%x]",
+ boundsSliceAlen: "slice bounds out of range [:%x]",
+ boundsSliceAcap: "slice bounds out of range [:%x]",
+ boundsSliceB: "slice bounds out of range [%x:]",
+ boundsSlice3Alen: "slice bounds out of range [::%x]",
+ boundsSlice3Acap: "slice bounds out of range [::%x]",
+ boundsSlice3B: "slice bounds out of range [:%x:]",
+ boundsSlice3C: "slice bounds out of range [%x::]",
+}
+
+func (e boundsError) RuntimeError() {}
+
+func appendIntStr(b []byte, v int64, signed bool) []byte {
+ if signed && v < 0 {
+ b = append(b, '-')
+ v = -v
+ }
+ var buf [20]byte
+ b = append(b, itoa(buf[:], uint64(v))...)
+ return b
+}
+
+func (e boundsError) Error() string {
+ fmt := boundsErrorFmts[e.code]
+ if e.signed && e.x < 0 {
+ fmt = boundsNegErrorFmts[e.code]
+ }
+ // max message length is 99: "runtime error: slice bounds out of range [::%x] with capacity %y"
+ // x can be at most 20 characters. y can be at most 19.
+ b := make([]byte, 0, 100)
+ b = append(b, "runtime error: "...)
+ for i := 0; i < len(fmt); i++ {
+ c := fmt[i]
+ if c != '%' {
+ b = append(b, c)
+ continue
+ }
+ i++
+ switch fmt[i] {
+ case 'x':
+ b = appendIntStr(b, e.x, e.signed)
+ case 'y':
+ b = appendIntStr(b, int64(e.y), true)
+ }
+ }
+ return string(b)
+}
+
+type stringer interface {
+ String() string
+}
+
+// printany prints an argument passed to panic.
+// If panic is called with a value that has a String or Error method,
+// it has already been converted into a string by preprintpanics.
+func printany(i any) {
+ switch v := i.(type) {
+ case nil:
+ print("nil")
+ case bool:
+ print(v)
+ case int:
+ print(v)
+ case int8:
+ print(v)
+ case int16:
+ print(v)
+ case int32:
+ print(v)
+ case int64:
+ print(v)
+ case uint:
+ print(v)
+ case uint8:
+ print(v)
+ case uint16:
+ print(v)
+ case uint32:
+ print(v)
+ case uint64:
+ print(v)
+ case uintptr:
+ print(v)
+ case float32:
+ print(v)
+ case float64:
+ print(v)
+ case complex64:
+ print(v)
+ case complex128:
+ print(v)
+ case string:
+ print(v)
+ default:
+ printanycustomtype(i)
+ }
+}
+
+func printanycustomtype(i any) {
+ eface := efaceOf(&i)
+ typestring := toRType(eface._type).string()
+
+ switch eface._type.Kind_ {
+ case kindString:
+ print(typestring, `("`, *(*string)(eface.data), `")`)
+ case kindBool:
+ print(typestring, "(", *(*bool)(eface.data), ")")
+ case kindInt:
+ print(typestring, "(", *(*int)(eface.data), ")")
+ case kindInt8:
+ print(typestring, "(", *(*int8)(eface.data), ")")
+ case kindInt16:
+ print(typestring, "(", *(*int16)(eface.data), ")")
+ case kindInt32:
+ print(typestring, "(", *(*int32)(eface.data), ")")
+ case kindInt64:
+ print(typestring, "(", *(*int64)(eface.data), ")")
+ case kindUint:
+ print(typestring, "(", *(*uint)(eface.data), ")")
+ case kindUint8:
+ print(typestring, "(", *(*uint8)(eface.data), ")")
+ case kindUint16:
+ print(typestring, "(", *(*uint16)(eface.data), ")")
+ case kindUint32:
+ print(typestring, "(", *(*uint32)(eface.data), ")")
+ case kindUint64:
+ print(typestring, "(", *(*uint64)(eface.data), ")")
+ case kindUintptr:
+ print(typestring, "(", *(*uintptr)(eface.data), ")")
+ case kindFloat32:
+ print(typestring, "(", *(*float32)(eface.data), ")")
+ case kindFloat64:
+ print(typestring, "(", *(*float64)(eface.data), ")")
+ case kindComplex64:
+ print(typestring, *(*complex64)(eface.data))
+ case kindComplex128:
+ print(typestring, *(*complex128)(eface.data))
+ default:
+ print("(", typestring, ") ", eface.data)
+ }
+}
+
+// panicwrap generates a panic for a call to a wrapped value method
+// with a nil pointer receiver.
+//
+// It is called from the generated wrapper code.
+func panicwrap() {
+ pc := getcallerpc()
+ name := funcNameForPrint(funcname(findfunc(pc)))
+ // name is something like "main.(*T).F".
+ // We want to extract pkg ("main"), typ ("T"), and meth ("F").
+ // Do it by finding the parens.
+ i := bytealg.IndexByteString(name, '(')
+ if i < 0 {
+ throw("panicwrap: no ( in " + name)
+ }
+ pkg := name[:i-1]
+ if i+2 >= len(name) || name[i-1:i+2] != ".(*" {
+ throw("panicwrap: unexpected string after package name: " + name)
+ }
+ name = name[i+2:]
+ i = bytealg.IndexByteString(name, ')')
+ if i < 0 {
+ throw("panicwrap: no ) in " + name)
+ }
+ if i+2 >= len(name) || name[i:i+2] != ")." {
+ throw("panicwrap: unexpected string after type name: " + name)
+ }
+ typ := name[:i]
+ meth := name[i+2:]
+ panic(plainError("value method " + pkg + "." + typ + "." + meth + " called using nil *" + typ + " pointer"))
+}
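
Every error type in this file implements the exported runtime.Error interface, so user code that recovers a panic can tell runtime faults (for example the boundsError behind an out-of-range index) apart from ordinary errors. A short example of doing that from outside the runtime; the helper function is illustrative.

// Example of observing one of the error types defined above from user
// code: an out-of-range index panics with a value that implements
// runtime.Error (a boundsError internally).
package main

import (
	"fmt"
	"runtime"
)

func index(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			if re, ok := r.(runtime.Error); ok {
				err = re // e.g. "runtime error: index out of range [5] with length 3"
				return
			}
			panic(r) // not a runtime error; re-panic
		}
	}()
	return s[i], nil
}

func main() {
	_, err := index([]int{1, 2, 3}, 5)
	fmt.Println(err)
}
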
diff --git a/src/runtime/example_test.go b/src/runtime/example_test.go
new file mode 100644
index 0000000..dcb8f77
--- /dev/null
+++ b/src/runtime/example_test.go
@@ -0,0 +1,62 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+func ExampleFrames() {
+ c := func() {
+ // Ask runtime.Callers for up to 10 PCs, including runtime.Callers itself.
+ pc := make([]uintptr, 10)
+ n := runtime.Callers(0, pc)
+ if n == 0 {
+ // No PCs available. This can happen if the first argument to
+ // runtime.Callers is large.
+ //
+ // Return now to avoid processing the zero Frame that would
+ // otherwise be returned by frames.Next below.
+ return
+ }
+
+ pc = pc[:n] // pass only valid pcs to runtime.CallersFrames
+ frames := runtime.CallersFrames(pc)
+
+ // Loop to get frames.
+ // A fixed number of PCs can expand to an indefinite number of Frames.
+ for {
+ frame, more := frames.Next()
+
+ // Process this frame.
+ //
+ // To keep this example's output stable
+ // even if there are changes in the testing package,
+ // stop unwinding when we leave package runtime.
+ if !strings.Contains(frame.File, "runtime/") {
+ break
+ }
+ fmt.Printf("- more:%v | %s\n", more, frame.Function)
+
+ // Check whether there are more frames to process after this one.
+ if !more {
+ break
+ }
+ }
+ }
+
+ b := func() { c() }
+ a := func() { b() }
+
+ a()
+ // Output:
+ // - more:true | runtime.Callers
+ // - more:true | runtime_test.ExampleFrames.func1
+ // - more:true | runtime_test.ExampleFrames.func2
+ // - more:true | runtime_test.ExampleFrames.func3
+ // - more:true | runtime_test.ExampleFrames
+}
diff --git a/src/runtime/exithook.go b/src/runtime/exithook.go
new file mode 100644
index 0000000..65b426b
--- /dev/null
+++ b/src/runtime/exithook.go
@@ -0,0 +1,69 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// addExitHook registers the specified function 'f' to be run at
+// program termination (e.g. when someone invokes os.Exit(), or when
+// main.main returns). Hooks are run in reverse order of registration:
+// the first hook added is the last one run.
+//
+// CAREFUL: the expectation is that addExitHook should only be called
+// from a safe context (e.g. not an error/panic path or signal
+// handler, preemption enabled, allocation allowed, write barriers
+// allowed, etc), and that the exit function 'f' will be invoked under
+// similar circumstances. That is the say, we are expecting that 'f'
+// uses normal / high-level Go code as opposed to one of the more
+// restricted dialects used for the trickier parts of the runtime.
+func addExitHook(f func(), runOnNonZeroExit bool) {
+ exitHooks.hooks = append(exitHooks.hooks, exitHook{f: f, runOnNonZeroExit: runOnNonZeroExit})
+}
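+
+// A minimal usage sketch (illustrative; the hook bodies and exit codes are
+// hypothetical). Because hooks run in reverse registration order, the hook
+// registered last runs first, and on a non-zero exit only hooks registered
+// with runOnNonZeroExit=true are invoked:
+//
+//	addExitHook(func() { println("clean exit only") }, false)
+//	addExitHook(func() { println("always") }, true)
+//	// exit code 0: runs "always", then "clean exit only".
+//	// exit code 2: runs only "always".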
+
+// exitHook stores a function to be run on program exit, registered
+// by the utility runtime.addExitHook.
+type exitHook struct {
+ f func() // func to run
+ runOnNonZeroExit bool // whether to run on non-zero exit code
+}
+
+// exitHooks stores state related to hook functions registered to
+// run when program execution terminates.
+var exitHooks struct {
+ hooks []exitHook
+ runningExitHooks bool
+}
+
+// runExitHooks runs any registered exit hook functions (funcs
+// previously registered using runtime.addExitHook). Here 'exitCode'
+// is the status code being passed to os.Exit, or zero if the program
+// is terminating normally without calling os.Exit.
+func runExitHooks(exitCode int) {
+ if exitHooks.runningExitHooks {
+ throw("internal error: exit hook invoked exit")
+ }
+ exitHooks.runningExitHooks = true
+
+ runExitHook := func(f func()) (caughtPanic bool) {
+ defer func() {
+ if x := recover(); x != nil {
+ caughtPanic = true
+ }
+ }()
+ f()
+ return
+ }
+
+ finishPageTrace()
+ for i := range exitHooks.hooks {
+ h := exitHooks.hooks[len(exitHooks.hooks)-i-1]
+ if exitCode != 0 && !h.runOnNonZeroExit {
+ continue
+ }
+ if caughtPanic := runExitHook(h.f); caughtPanic {
+ throw("internal error: exit hook invoked panic")
+ }
+ }
+ exitHooks.hooks = nil
+ exitHooks.runningExitHooks = false
+}
diff --git a/src/runtime/export_aix_test.go b/src/runtime/export_aix_test.go
new file mode 100644
index 0000000..4845533
--- /dev/null
+++ b/src/runtime/export_aix_test.go
@@ -0,0 +1,7 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var SetNonblock = setNonblock
diff --git a/src/runtime/export_arm_test.go b/src/runtime/export_arm_test.go
new file mode 100644
index 0000000..b8a89fc
--- /dev/null
+++ b/src/runtime/export_arm_test.go
@@ -0,0 +1,9 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+var Usplit = usplit
diff --git a/src/runtime/export_darwin_test.go b/src/runtime/export_darwin_test.go
new file mode 100644
index 0000000..4845533
--- /dev/null
+++ b/src/runtime/export_darwin_test.go
@@ -0,0 +1,7 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var SetNonblock = setNonblock
diff --git a/src/runtime/export_debug_amd64_test.go b/src/runtime/export_debug_amd64_test.go
new file mode 100644
index 0000000..f9908cd
--- /dev/null
+++ b/src/runtime/export_debug_amd64_test.go
@@ -0,0 +1,132 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 && linux
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigContext struct {
+ savedRegs sigcontext
+ // sigcontext.fpstate is a pointer, so we need to save
+ // the value it points to in an fpstate1 structure.
+ savedFP fpstate1
+}
+
+func sigctxtSetContextRegister(ctxt *sigctxt, x uint64) {
+ ctxt.regs().rdx = x
+}
+
+func sigctxtAtTrapInstruction(ctxt *sigctxt) bool {
+ return *(*byte)(unsafe.Pointer(uintptr(ctxt.rip() - 1))) == 0xcc // INT 3
+}
+
+func sigctxtStatus(ctxt *sigctxt) uint64 {
+ return ctxt.r12()
+}
+
+func (h *debugCallHandler) saveSigContext(ctxt *sigctxt) {
+ // Push current PC on the stack.
+ rsp := ctxt.rsp() - goarch.PtrSize
+ *(*uint64)(unsafe.Pointer(uintptr(rsp))) = ctxt.rip()
+ ctxt.set_rsp(rsp)
+ // Write the argument frame size.
+ *(*uintptr)(unsafe.Pointer(uintptr(rsp - 16))) = h.argSize
+ // Save current registers.
+ h.sigCtxt.savedRegs = *ctxt.regs()
+ h.sigCtxt.savedFP = *h.sigCtxt.savedRegs.fpstate
+ h.sigCtxt.savedRegs.fpstate = nil
+}
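+
+// Layout left by saveSigContext above, relative to the RSP it installs:
+// [RSP] holds the interrupted RIP (serving as the return address for the
+// injected call), and [RSP-16] holds the argument frame size, stored out
+// of band just below the stack pointer.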
+
+// case 0
+func (h *debugCallHandler) debugCallRun(ctxt *sigctxt) {
+ rsp := ctxt.rsp()
+ memmove(unsafe.Pointer(uintptr(rsp)), h.argp, h.argSize)
+ if h.regArgs != nil {
+ storeRegArgs(ctxt.regs(), h.regArgs)
+ }
+ // Push return PC.
+ rsp -= goarch.PtrSize
+ ctxt.set_rsp(rsp)
+ // The signal PC already points to the instruction after the trap,
+ // so it can be used directly as the return PC.
+ *(*uint64)(unsafe.Pointer(uintptr(rsp))) = ctxt.rip()
+ // Set PC to call and context register.
+ ctxt.set_rip(uint64(h.fv.fn))
+ sigctxtSetContextRegister(ctxt, uint64(uintptr(unsafe.Pointer(h.fv))))
+}
+
+// case 1
+func (h *debugCallHandler) debugCallReturn(ctxt *sigctxt) {
+ rsp := ctxt.rsp()
+ memmove(h.argp, unsafe.Pointer(uintptr(rsp)), h.argSize)
+ if h.regArgs != nil {
+ loadRegArgs(h.regArgs, ctxt.regs())
+ }
+}
+
+// case 2
+func (h *debugCallHandler) debugCallPanicOut(ctxt *sigctxt) {
+ rsp := ctxt.rsp()
+ memmove(unsafe.Pointer(&h.panic), unsafe.Pointer(uintptr(rsp)), 2*goarch.PtrSize)
+}
+
+// case 8
+func (h *debugCallHandler) debugCallUnsafe(ctxt *sigctxt) {
+ rsp := ctxt.rsp()
+ reason := *(*string)(unsafe.Pointer(uintptr(rsp)))
+ h.err = plainError(reason)
+}
+
+// case 16
+func (h *debugCallHandler) restoreSigContext(ctxt *sigctxt) {
+ // Restore all registers except RIP and RSP.
+ rip, rsp := ctxt.rip(), ctxt.rsp()
+ fp := ctxt.regs().fpstate
+ *ctxt.regs() = h.sigCtxt.savedRegs
+ ctxt.regs().fpstate = fp
+ *fp = h.sigCtxt.savedFP
+ ctxt.set_rip(rip)
+ ctxt.set_rsp(rsp)
+}
+
+// storeRegArgs sets up argument registers in the signal
+// context state from an abi.RegArgs.
+//
+// Both src and dst must be non-nil.
+func storeRegArgs(dst *sigcontext, src *abi.RegArgs) {
+ dst.rax = uint64(src.Ints[0])
+ dst.rbx = uint64(src.Ints[1])
+ dst.rcx = uint64(src.Ints[2])
+ dst.rdi = uint64(src.Ints[3])
+ dst.rsi = uint64(src.Ints[4])
+ dst.r8 = uint64(src.Ints[5])
+ dst.r9 = uint64(src.Ints[6])
+ dst.r10 = uint64(src.Ints[7])
+ dst.r11 = uint64(src.Ints[8])
+ for i := range src.Floats {
+ dst.fpstate._xmm[i].element[0] = uint32(src.Floats[i] >> 0)
+ dst.fpstate._xmm[i].element[1] = uint32(src.Floats[i] >> 32)
+ }
+}
+
+func loadRegArgs(dst *abi.RegArgs, src *sigcontext) {
+ dst.Ints[0] = uintptr(src.rax)
+ dst.Ints[1] = uintptr(src.rbx)
+ dst.Ints[2] = uintptr(src.rcx)
+ dst.Ints[3] = uintptr(src.rdi)
+ dst.Ints[4] = uintptr(src.rsi)
+ dst.Ints[5] = uintptr(src.r8)
+ dst.Ints[6] = uintptr(src.r9)
+ dst.Ints[7] = uintptr(src.r10)
+ dst.Ints[8] = uintptr(src.r11)
+ for i := range dst.Floats {
+ dst.Floats[i] = uint64(src.fpstate._xmm[i].element[0]) << 0
+ dst.Floats[i] |= uint64(src.fpstate._xmm[i].element[1]) << 32
+ }
+}
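+
+// Note (derived from the two helpers above): each floating-point argument
+// register round-trips through the saved XMM state as two 32-bit halves,
+// with element[0] holding the low 32 bits and element[1] the high 32 bits.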
diff --git a/src/runtime/export_debug_arm64_test.go b/src/runtime/export_debug_arm64_test.go
new file mode 100644
index 0000000..ee90241
--- /dev/null
+++ b/src/runtime/export_debug_arm64_test.go
@@ -0,0 +1,135 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build arm64 && linux
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigContext struct {
+ savedRegs sigcontext
+}
+
+func sigctxtSetContextRegister(ctxt *sigctxt, x uint64) {
+ ctxt.regs().regs[26] = x
+}
+
+func sigctxtAtTrapInstruction(ctxt *sigctxt) bool {
+ return *(*uint32)(unsafe.Pointer(ctxt.sigpc())) == 0xd4200000 // BRK 0
+}
+
+func sigctxtStatus(ctxt *sigctxt) uint64 {
+ return ctxt.r20()
+}
+
+func (h *debugCallHandler) saveSigContext(ctxt *sigctxt) {
+ sp := ctxt.sp()
+ sp -= 2 * goarch.PtrSize
+ ctxt.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = ctxt.lr() // save the current lr
+ ctxt.set_lr(ctxt.pc()) // set new lr to the current pc
+ // Write the argument frame size.
+ *(*uintptr)(unsafe.Pointer(uintptr(sp - 16))) = h.argSize
+ // Save current registers.
+ h.sigCtxt.savedRegs = *ctxt.regs()
+}
+
+// case 0
+func (h *debugCallHandler) debugCallRun(ctxt *sigctxt) {
+ sp := ctxt.sp()
+ memmove(unsafe.Pointer(uintptr(sp)+8), h.argp, h.argSize)
+ if h.regArgs != nil {
+ storeRegArgs(ctxt.regs(), h.regArgs)
+ }
+ // Push return PC, which should be the signal PC+4, because
+ // the signal PC is the PC of the trap instruction itself.
+ ctxt.set_lr(ctxt.pc() + 4)
+ // Set PC to call and context register.
+ ctxt.set_pc(uint64(h.fv.fn))
+ sigctxtSetContextRegister(ctxt, uint64(uintptr(unsafe.Pointer(h.fv))))
+}
+
+// case 1
+func (h *debugCallHandler) debugCallReturn(ctxt *sigctxt) {
+ sp := ctxt.sp()
+ memmove(h.argp, unsafe.Pointer(uintptr(sp)+8), h.argSize)
+ if h.regArgs != nil {
+ loadRegArgs(h.regArgs, ctxt.regs())
+ }
+ // Restore the old lr from *sp
+ olr := *(*uint64)(unsafe.Pointer(uintptr(sp)))
+ ctxt.set_lr(olr)
+ pc := ctxt.pc()
+ ctxt.set_pc(pc + 4) // step to next instruction
+}
+
+// case 2
+func (h *debugCallHandler) debugCallPanicOut(ctxt *sigctxt) {
+ sp := ctxt.sp()
+ memmove(unsafe.Pointer(&h.panic), unsafe.Pointer(uintptr(sp)+8), 2*goarch.PtrSize)
+ ctxt.set_pc(ctxt.pc() + 4)
+}
+
+// case 8
+func (h *debugCallHandler) debugCallUnsafe(ctxt *sigctxt) {
+ sp := ctxt.sp()
+ reason := *(*string)(unsafe.Pointer(uintptr(sp) + 8))
+ h.err = plainError(reason)
+ ctxt.set_pc(ctxt.pc() + 4)
+}
+
+// case 16
+func (h *debugCallHandler) restoreSigContext(ctxt *sigctxt) {
+ // Restore all registers except for pc and sp
+ pc, sp := ctxt.pc(), ctxt.sp()
+ *ctxt.regs() = h.sigCtxt.savedRegs
+ ctxt.set_pc(pc + 4)
+ ctxt.set_sp(sp)
+}
+
+// storeRegArgs sets up argument registers in the signal
+// context state from an abi.RegArgs.
+//
+// Both src and dst must be non-nil.
+func storeRegArgs(dst *sigcontext, src *abi.RegArgs) {
+ for i, r := range src.Ints {
+ dst.regs[i] = uint64(r)
+ }
+ for i, r := range src.Floats {
+ *(fpRegAddr(dst, i)) = r
+ }
+}
+
+func loadRegArgs(dst *abi.RegArgs, src *sigcontext) {
+ for i := range dst.Ints {
+ dst.Ints[i] = uintptr(src.regs[i])
+ }
+ for i := range dst.Floats {
+ dst.Floats[i] = *(fpRegAddr(src, i))
+ }
+}
+
+// fpRegAddr returns the address of the ith fp-simd register in sigcontext.
+func fpRegAddr(dst *sigcontext, i int) *uint64 {
+ /* FP-SIMD registers are saved in sigcontext.__reserved, which is organized in
+ the following C structs:
+ struct fpsimd_context {
+ struct _aarch64_ctx head;
+ __u32 fpsr;
+ __u32 fpcr;
+ __uint128_t vregs[32];
+ };
+ struct _aarch64_ctx {
+ __u32 magic;
+ __u32 size;
+ };
+ So the offset of the ith FP-SIMD register is 16+i*128.
+ */
+ return (*uint64)(unsafe.Pointer(&dst.__reserved[16+i*128]))
+}
diff --git a/src/runtime/export_debug_test.go b/src/runtime/export_debug_test.go
new file mode 100644
index 0000000..76dc206
--- /dev/null
+++ b/src/runtime/export_debug_test.go
@@ -0,0 +1,182 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (amd64 || arm64) && linux
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+// InjectDebugCall injects a debugger call to fn into g. regArgs must
+// contain any arguments to fn that are passed in registers, according
+// to the internal Go ABI. It may be nil if no arguments are passed in
+// registers to fn. args must be a pointer to a valid call frame (including
+// arguments and return space) for fn, or nil. tkill must be a function that
+// will send SIGTRAP to thread ID tid. gp must be locked to its OS thread and
+// running.
+//
+// On success, InjectDebugCall returns the panic value of fn or nil.
+// If fn did not panic, its results will be available in args.
+func InjectDebugCall(gp *g, fn any, regArgs *abi.RegArgs, stackArgs any, tkill func(tid int) error, returnOnUnsafePoint bool) (any, error) {
+ if gp.lockedm == 0 {
+ return nil, plainError("goroutine not locked to thread")
+ }
+
+ tid := int(gp.lockedm.ptr().procid)
+ if tid == 0 {
+ return nil, plainError("missing tid")
+ }
+
+ f := efaceOf(&fn)
+ if f._type == nil || f._type.Kind_&kindMask != kindFunc {
+ return nil, plainError("fn must be a function")
+ }
+ fv := (*funcval)(f.data)
+
+ a := efaceOf(&stackArgs)
+ if a._type != nil && a._type.Kind_&kindMask != kindPtr {
+ return nil, plainError("args must be a pointer or nil")
+ }
+ argp := a.data
+ var argSize uintptr
+ if argp != nil {
+ argSize = (*ptrtype)(unsafe.Pointer(a._type)).Elem.Size_
+ }
+
+ h := new(debugCallHandler)
+ h.gp = gp
+ // gp may not be running right now, but we can still get the M
+ // it will run on since it's locked.
+ h.mp = gp.lockedm.ptr()
+ h.fv, h.regArgs, h.argp, h.argSize = fv, regArgs, argp, argSize
+ h.handleF = h.handle // Avoid allocating closure during signal
+
+ defer func() { testSigtrap = nil }()
+ for i := 0; ; i++ {
+ testSigtrap = h.inject
+ noteclear(&h.done)
+ h.err = ""
+
+ if err := tkill(tid); err != nil {
+ return nil, err
+ }
+ // Wait for completion.
+ notetsleepg(&h.done, -1)
+ if h.err != "" {
+ switch h.err {
+ case "call not at safe point":
+ if returnOnUnsafePoint {
+ // This is for TestDebugCallUnsafePoint.
+ return nil, h.err
+ }
+ fallthrough
+ case "retry _Grunnable", "executing on Go runtime stack", "call from within the Go runtime":
+ // These are transient states. Try to get out of them.
+ if i < 100 {
+ usleep(100)
+ Gosched()
+ continue
+ }
+ }
+ return nil, h.err
+ }
+ return h.panic, nil
+ }
+}
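+
+// A hedged usage sketch (hypothetical; it mirrors the general shape of the
+// debug-call tests rather than any specific caller). The target goroutine
+// must already be locked to its OS thread, and tkill must deliver SIGTRAP
+// to that thread:
+//
+//	tkill := func(tid int) error {
+//		return syscall.Tgkill(syscall.Getpid(), tid, syscall.SIGTRAP)
+//	}
+//	res, err := runtime.InjectDebugCall(gp, fn, regArgs, &args, tkill, false)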
+
+type debugCallHandler struct {
+ gp *g
+ mp *m
+ fv *funcval
+ regArgs *abi.RegArgs
+ argp unsafe.Pointer
+ argSize uintptr
+ panic any
+
+ handleF func(info *siginfo, ctxt *sigctxt, gp2 *g) bool
+
+ err plainError
+ done note
+ sigCtxt sigContext
+}
+
+func (h *debugCallHandler) inject(info *siginfo, ctxt *sigctxt, gp2 *g) bool {
+ // TODO(49370): This code is riddled with write barriers, but called from
+ // a signal handler. Add the go:nowritebarrierrec annotation and restructure
+ // this to avoid write barriers.
+
+ switch h.gp.atomicstatus.Load() {
+ case _Grunning:
+ if getg().m != h.mp {
+ println("trap on wrong M", getg().m, h.mp)
+ return false
+ }
+ // Save the signal context
+ h.saveSigContext(ctxt)
+ // Set PC to debugCallV2.
+ ctxt.setsigpc(uint64(abi.FuncPCABIInternal(debugCallV2)))
+ // Call injected. Switch to the debugCall protocol.
+ testSigtrap = h.handleF
+ case _Grunnable:
+ // Ask InjectDebugCall to pause for a bit and then try
+ // again to interrupt this goroutine.
+ h.err = plainError("retry _Grunnable")
+ notewakeup(&h.done)
+ default:
+ h.err = plainError("goroutine in unexpected state at call inject")
+ notewakeup(&h.done)
+ }
+ // Resume execution.
+ return true
+}
+
+func (h *debugCallHandler) handle(info *siginfo, ctxt *sigctxt, gp2 *g) bool {
+ // TODO(49370): This code is riddled with write barriers, but called from
+ // a signal handler. Add the go:nowritebarrierrec annotation and restructure
+ // this to avoid write barriers.
+
+ // Double-check m.
+ if getg().m != h.mp {
+ println("trap on wrong M", getg().m, h.mp)
+ return false
+ }
+ f := findfunc(ctxt.sigpc())
+ if !(hasPrefix(funcname(f), "runtime.debugCall") || hasPrefix(funcname(f), "debugCall")) {
+ println("trap in unknown function", funcname(f))
+ return false
+ }
+ if !sigctxtAtTrapInstruction(ctxt) {
+ println("trap at non-INT3 instruction pc =", hex(ctxt.sigpc()))
+ return false
+ }
+
+ switch status := sigctxtStatus(ctxt); status {
+ case 0:
+ // Frame is ready. Copy the arguments to the frame and to registers.
+ // Call the debug function.
+ h.debugCallRun(ctxt)
+ case 1:
+ // Function returned. Copy frame and result registers back out.
+ h.debugCallReturn(ctxt)
+ case 2:
+ // Function panicked. Copy panic out.
+ h.debugCallPanicOut(ctxt)
+ case 8:
+ // Call isn't safe. Get the reason.
+ h.debugCallUnsafe(ctxt)
+ // Don't wake h.done. We need to transition to status 16 first.
+ case 16:
+ h.restoreSigContext(ctxt)
+ // Done
+ notewakeup(&h.done)
+ default:
+ h.err = plainError("unexpected debugCallV2 status")
+ notewakeup(&h.done)
+ }
+ // Resume execution.
+ return true
+}
diff --git a/src/runtime/export_debuglog_test.go b/src/runtime/export_debuglog_test.go
new file mode 100644
index 0000000..f12aab0
--- /dev/null
+++ b/src/runtime/export_debuglog_test.go
@@ -0,0 +1,46 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export debuglog guts for testing.
+
+package runtime
+
+const DlogEnabled = dlogEnabled
+
+const DebugLogBytes = debugLogBytes
+
+const DebugLogStringLimit = debugLogStringLimit
+
+var Dlog = dlog
+
+func (l *dlogger) End() { l.end() }
+func (l *dlogger) B(x bool) *dlogger { return l.b(x) }
+func (l *dlogger) I(x int) *dlogger { return l.i(x) }
+func (l *dlogger) I16(x int16) *dlogger { return l.i16(x) }
+func (l *dlogger) U64(x uint64) *dlogger { return l.u64(x) }
+func (l *dlogger) Hex(x uint64) *dlogger { return l.hex(x) }
+func (l *dlogger) P(x any) *dlogger { return l.p(x) }
+func (l *dlogger) S(x string) *dlogger { return l.s(x) }
+func (l *dlogger) PC(x uintptr) *dlogger { return l.pc(x) }
+
+func DumpDebugLog() string {
+ gp := getg()
+ gp.writebuf = make([]byte, 0, 1<<20)
+ printDebugLog()
+ buf := gp.writebuf
+ gp.writebuf = nil
+
+ return string(buf)
+}
+
+func ResetDebugLog() {
+ stopTheWorld(stwForTestResetDebugLog)
+ for l := allDloggers; l != nil; l = l.allLink {
+ l.w.write = 0
+ l.w.tick, l.w.nano = 0, 0
+ l.w.r.begin, l.w.r.end = 0, 0
+ l.w.r.tick, l.w.r.nano = 0, 0
+ }
+ startTheWorld()
+}
diff --git a/src/runtime/export_linux_test.go b/src/runtime/export_linux_test.go
new file mode 100644
index 0000000..426fd1e
--- /dev/null
+++ b/src/runtime/export_linux_test.go
@@ -0,0 +1,17 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+const SiginfoMaxSize = _si_max_size
+const SigeventMaxSize = _sigev_max_size
+
+var NewOSProc0 = newosproc0
+var Mincore = mincore
+var Add = add
+
+type Siginfo siginfo
+type Sigevent sigevent
diff --git a/src/runtime/export_mmap_test.go b/src/runtime/export_mmap_test.go
new file mode 100644
index 0000000..f73fcbd
--- /dev/null
+++ b/src/runtime/export_mmap_test.go
@@ -0,0 +1,21 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+// Export guts for testing.
+
+package runtime
+
+var Mmap = mmap
+var Munmap = munmap
+
+const ENOMEM = _ENOMEM
+const MAP_ANON = _MAP_ANON
+const MAP_PRIVATE = _MAP_PRIVATE
+const MAP_FIXED = _MAP_FIXED
+
+func GetPhysPageSize() uintptr {
+ return physPageSize
+}
diff --git a/src/runtime/export_pipe2_test.go b/src/runtime/export_pipe2_test.go
new file mode 100644
index 0000000..8d49009
--- /dev/null
+++ b/src/runtime/export_pipe2_test.go
@@ -0,0 +1,11 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || linux || netbsd || openbsd || solaris
+
+package runtime
+
+func Pipe() (r, w int32, errno int32) {
+ return pipe2(0)
+}
diff --git a/src/runtime/export_pipe_test.go b/src/runtime/export_pipe_test.go
new file mode 100644
index 0000000..0583039
--- /dev/null
+++ b/src/runtime/export_pipe_test.go
@@ -0,0 +1,9 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || darwin
+
+package runtime
+
+var Pipe = pipe
diff --git a/src/runtime/export_test.go b/src/runtime/export_test.go
new file mode 100644
index 0000000..34dd890
--- /dev/null
+++ b/src/runtime/export_test.go
@@ -0,0 +1,1942 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "internal/goos"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+var Fadd64 = fadd64
+var Fsub64 = fsub64
+var Fmul64 = fmul64
+var Fdiv64 = fdiv64
+var F64to32 = f64to32
+var F32to64 = f32to64
+var Fcmp64 = fcmp64
+var Fintto64 = fintto64
+var F64toint = f64toint
+
+var Entersyscall = entersyscall
+var Exitsyscall = exitsyscall
+var LockedOSThread = lockedOSThread
+var Xadduintptr = atomic.Xadduintptr
+
+var Fastlog2 = fastlog2
+
+var Atoi = atoi
+var Atoi32 = atoi32
+var ParseByteCount = parseByteCount
+
+var Nanotime = nanotime
+var NetpollBreak = netpollBreak
+var Usleep = usleep
+
+var PhysPageSize = physPageSize
+var PhysHugePageSize = physHugePageSize
+
+var NetpollGenericInit = netpollGenericInit
+
+var Memmove = memmove
+var MemclrNoHeapPointers = memclrNoHeapPointers
+
+var CgoCheckPointer = cgoCheckPointer
+
+const TracebackInnerFrames = tracebackInnerFrames
+const TracebackOuterFrames = tracebackOuterFrames
+
+var LockPartialOrder = lockPartialOrder
+
+type LockRank lockRank
+
+func (l LockRank) String() string {
+ return lockRank(l).String()
+}
+
+const PreemptMSupported = preemptMSupported
+
+type LFNode struct {
+ Next uint64
+ Pushcnt uintptr
+}
+
+func LFStackPush(head *uint64, node *LFNode) {
+ (*lfstack)(head).push((*lfnode)(unsafe.Pointer(node)))
+}
+
+func LFStackPop(head *uint64) *LFNode {
+ return (*LFNode)(unsafe.Pointer((*lfstack)(head).pop()))
+}
+func LFNodeValidate(node *LFNode) {
+ lfnodeValidate((*lfnode)(unsafe.Pointer(node)))
+}
+
+func Netpoll(delta int64) {
+ systemstack(func() {
+ netpoll(delta)
+ })
+}
+
+func GCMask(x any) (ret []byte) {
+ systemstack(func() {
+ ret = getgcmask(x)
+ })
+ return
+}
+
+func RunSchedLocalQueueTest() {
+ pp := new(p)
+ gs := make([]g, len(pp.runq))
+ Escape(gs) // Ensure gs doesn't move, since we use guintptrs
+ for i := 0; i < len(pp.runq); i++ {
+ if g, _ := runqget(pp); g != nil {
+ throw("runq is not empty initially")
+ }
+ for j := 0; j < i; j++ {
+ runqput(pp, &gs[i], false)
+ }
+ for j := 0; j < i; j++ {
+ if g, _ := runqget(pp); g != &gs[i] {
+ print("bad element at iter ", i, "/", j, "\n")
+ throw("bad element")
+ }
+ }
+ if g, _ := runqget(pp); g != nil {
+ throw("runq is not empty afterwards")
+ }
+ }
+}
+
+func RunSchedLocalQueueStealTest() {
+ p1 := new(p)
+ p2 := new(p)
+ gs := make([]g, len(p1.runq))
+ Escape(gs) // Ensure gs doesn't move, since we use guintptrs
+ for i := 0; i < len(p1.runq); i++ {
+ for j := 0; j < i; j++ {
+ gs[j].sig = 0
+ runqput(p1, &gs[j], false)
+ }
+ gp := runqsteal(p2, p1, true)
+ s := 0
+ if gp != nil {
+ s++
+ gp.sig++
+ }
+ for {
+ gp, _ = runqget(p2)
+ if gp == nil {
+ break
+ }
+ s++
+ gp.sig++
+ }
+ for {
+ gp, _ = runqget(p1)
+ if gp == nil {
+ break
+ }
+ gp.sig++
+ }
+ for j := 0; j < i; j++ {
+ if gs[j].sig != 1 {
+ print("bad element ", j, "(", gs[j].sig, ") at iter ", i, "\n")
+ throw("bad element")
+ }
+ }
+ if s != i/2 && s != i/2+1 {
+ print("bad steal ", s, ", want ", i/2, " or ", i/2+1, ", iter ", i, "\n")
+ throw("bad steal")
+ }
+ }
+}
+
+func RunSchedLocalQueueEmptyTest(iters int) {
+ // Test that runq is not spuriously reported as empty.
+ // Runq emptiness affects scheduling decisions and spurious emptiness
+ // can lead to underutilization (both runnable Gs and idle Ps coexist
+ // for arbitrary long time).
+ done := make(chan bool, 1)
+ p := new(p)
+ gs := make([]g, 2)
+ Escape(gs) // Ensure gs doesn't move, since we use guintptrs
+ ready := new(uint32)
+ for i := 0; i < iters; i++ {
+ *ready = 0
+ next0 := (i & 1) == 0
+ next1 := (i & 2) == 0
+ runqput(p, &gs[0], next0)
+ go func() {
+ for atomic.Xadd(ready, 1); atomic.Load(ready) != 2; {
+ }
+ if runqempty(p) {
+ println("next:", next0, next1)
+ throw("queue is empty")
+ }
+ done <- true
+ }()
+ for atomic.Xadd(ready, 1); atomic.Load(ready) != 2; {
+ }
+ runqput(p, &gs[1], next1)
+ runqget(p)
+ <-done
+ runqget(p)
+ }
+}
+
+var (
+ StringHash = stringHash
+ BytesHash = bytesHash
+ Int32Hash = int32Hash
+ Int64Hash = int64Hash
+ MemHash = memhash
+ MemHash32 = memhash32
+ MemHash64 = memhash64
+ EfaceHash = efaceHash
+ IfaceHash = ifaceHash
+)
+
+var UseAeshash = &useAeshash
+
+func MemclrBytes(b []byte) {
+ s := (*slice)(unsafe.Pointer(&b))
+ memclrNoHeapPointers(s.array, uintptr(s.len))
+}
+
+const HashLoad = hashLoad
+
+// entry point for testing
+func GostringW(w []uint16) (s string) {
+ systemstack(func() {
+ s = gostringw(&w[0])
+ })
+ return
+}
+
+var Open = open
+var Close = closefd
+var Read = read
+var Write = write
+
+func Envs() []string { return envs }
+func SetEnvs(e []string) { envs = e }
+
+// For benchmarking.
+
+// blockWrapper is a wrapper type that ensures a T is placed within a
+// large object. This is necessary for safely benchmarking things
+// that manipulate the heap bitmap, like heapBitsSetType.
+//
+// More specifically, allocating threads assume they're the sole writers
+// to their span's heap bits, which allows those writes to be non-atomic.
+// The heap bitmap is written byte-wise, so if one tried to call heapBitsSetType
+// on an existing object in a small object span, we might corrupt that
+// span's bitmap with a concurrent byte write to the heap bitmap. Large
+// object spans contain exactly one object, so we can be sure no other P
+// is going to be allocating from it concurrently, hence this wrapper type
+// which ensures we have a T in a large object span.
+type blockWrapper[T any] struct {
+ value T
+ _ [_MaxSmallSize]byte // Ensure we're a large object.
+}
+
+func BenchSetType[T any](n int, resetTimer func()) {
+ x := new(blockWrapper[T])
+
+ // Escape x to ensure it is allocated on the heap, as we are
+ // working on the heap bits here.
+ Escape(x)
+
+ // Grab the type.
+ var i any = *new(T)
+ e := *efaceOf(&i)
+ t := e._type
+
+ // Benchmark setting the type bits for just the internal T of the block.
+ benchSetType(n, resetTimer, 1, unsafe.Pointer(&x.value), t)
+}
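+
+// A minimal usage sketch (hypothetical benchmark in runtime_test; the type
+// T is illustrative):
+//
+//	func BenchmarkSetTypeT(b *testing.B) {
+//		runtime.BenchSetType[T](b.N, b.ResetTimer)
+//	}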
+
+const maxArrayBlockWrapperLen = 32
+
+// arrayBlockWrapper is like blockWrapper, but the interior value is intended
+// to be used as a backing store for a slice.
+type arrayBlockWrapper[T any] struct {
+ value [maxArrayBlockWrapperLen]T
+ _ [_MaxSmallSize]byte // Ensure we're a large object.
+}
+
+// arrayLargeBlockWrapper is like arrayBlockWrapper, but the interior array
+// accommodates many more elements.
+type arrayLargeBlockWrapper[T any] struct {
+ value [1024]T
+ _ [_MaxSmallSize]byte // Ensure we're a large object.
+}
+
+func BenchSetTypeSlice[T any](n int, resetTimer func(), len int) {
+ // We have two separate cases here because, for big types with relatively
+ // small slices, we want to avoid generating a really big allocation,
+ // which would likely force a GC and skew the test results.
+ var y unsafe.Pointer
+ if len <= maxArrayBlockWrapperLen {
+ x := new(arrayBlockWrapper[T])
+ // Escape x to ensure it is allocated on the heap, as we are
+ // working on the heap bits here.
+ Escape(x)
+ y = unsafe.Pointer(&x.value[0])
+ } else {
+ x := new(arrayLargeBlockWrapper[T])
+ Escape(x)
+ y = unsafe.Pointer(&x.value[0])
+ }
+
+ // Grab the type.
+ var i any = *new(T)
+ e := *efaceOf(&i)
+ t := e._type
+
+ // Benchmark setting the type for a slice created from the array
+ // of T within the arrayBlock.
+ benchSetType(n, resetTimer, len, y, t)
+}
+
+// benchSetType is the implementation of the BenchSetType* functions.
+// x must be len consecutive Ts allocated within a large object span (to
+// avoid a race on the heap bitmap).
+//
+// Note: this function cannot be generic. It would get its type from one of
+// its callers (BenchSetType or BenchSetTypeSlice) whose type parameters are
+// set by a call in the runtime_test package. That means this function and its
+// callers will get instantiated in the package that provides the type argument,
+// i.e. runtime_test. However, we call a function on the system stack. In race
+// mode the runtime package is usually left uninstrumented because e.g. g0 has
+// no valid racectx, but if we're instantiated in the runtime_test package,
+// we might accidentally cause runtime code to be incorrectly instrumented.
+func benchSetType(n int, resetTimer func(), len int, x unsafe.Pointer, t *_type) {
+ // Compute the input sizes.
+ size := t.Size() * uintptr(len)
+
+ // Validate this function's invariant.
+ s := spanOfHeap(uintptr(x))
+ if s == nil {
+ panic("no heap span for input")
+ }
+ if s.spanclass.sizeclass() != 0 {
+ panic("span is not a large object span")
+ }
+
+ // Round up the size to the size class to make the benchmark a little more
+ // realistic. However, validate it, to make sure this is safe.
+ allocSize := roundupsize(size)
+ if s.npages*pageSize < allocSize {
+ panic("backing span not large enough for benchmark")
+ }
+
+ // Benchmark heapBitsSetType by calling it in a loop. This is safe because
+ // x is in a large object span.
+ resetTimer()
+ systemstack(func() {
+ for i := 0; i < n; i++ {
+ heapBitsSetType(uintptr(x), allocSize, size, t)
+ }
+ })
+
+ // Make sure x doesn't get freed, since we're taking a uintptr.
+ KeepAlive(x)
+}
+
+const PtrSize = goarch.PtrSize
+
+var ForceGCPeriod = &forcegcperiod
+
+// SetTracebackEnv is like runtime/debug.SetTraceback, but it raises
+// the "environment" traceback level, so later calls to
+// debug.SetTraceback (e.g., from testing timeouts) can't lower it.
+func SetTracebackEnv(level string) {
+ setTraceback(level)
+ traceback_env = traceback_cache
+}
+
+var ReadUnaligned32 = readUnaligned32
+var ReadUnaligned64 = readUnaligned64
+
+func CountPagesInUse() (pagesInUse, counted uintptr) {
+ stopTheWorld(stwForTestCountPagesInUse)
+
+ pagesInUse = uintptr(mheap_.pagesInUse.Load())
+
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ counted += s.npages
+ }
+ }
+
+ startTheWorld()
+
+ return
+}
+
+func Fastrand() uint32 { return fastrand() }
+func Fastrand64() uint64 { return fastrand64() }
+func Fastrandn(n uint32) uint32 { return fastrandn(n) }
+
+type ProfBuf profBuf
+
+func NewProfBuf(hdrsize, bufwords, tags int) *ProfBuf {
+ return (*ProfBuf)(newProfBuf(hdrsize, bufwords, tags))
+}
+
+func (p *ProfBuf) Write(tag *unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ (*profBuf)(p).write(tag, now, hdr, stk)
+}
+
+const (
+ ProfBufBlocking = profBufBlocking
+ ProfBufNonBlocking = profBufNonBlocking
+)
+
+func (p *ProfBuf) Read(mode profBufReadMode) ([]uint64, []unsafe.Pointer, bool) {
+ return (*profBuf)(p).read(profBufReadMode(mode))
+}
+
+func (p *ProfBuf) Close() {
+ (*profBuf)(p).close()
+}
+
+func ReadMetricsSlow(memStats *MemStats, samplesp unsafe.Pointer, len, cap int) {
+ stopTheWorld(stwForTestReadMetricsSlow)
+
+ // Initialize the metrics beforehand because this could
+ // allocate and skew the stats.
+ metricsLock()
+ initMetrics()
+ metricsUnlock()
+
+ systemstack(func() {
+ // Read memstats first. It's going to flush
+ // the mcaches which readMetrics does not do, so
+ // going the other way around may result in
+ // inconsistent statistics.
+ readmemstats_m(memStats)
+ })
+
+ // Read metrics off the system stack.
+ //
+ // The only part of readMetrics that could allocate
+ // and skew the stats is initMetrics.
+ readMetrics(samplesp, len, cap)
+
+ startTheWorld()
+}
+
+var DoubleCheckReadMemStats = &doubleCheckReadMemStats
+
+// ReadMemStatsSlow returns both the runtime-computed MemStats and
+// MemStats accumulated by scanning the heap.
+func ReadMemStatsSlow() (base, slow MemStats) {
+ stopTheWorld(stwForTestReadMemStatsSlow)
+
+ // Run on the system stack to avoid stack growth allocation.
+ systemstack(func() {
+ // Make sure stats don't change.
+ getg().m.mallocing++
+
+ readmemstats_m(&base)
+
+ // Initialize slow from base and zero the fields we're
+ // recomputing.
+ slow = base
+ slow.Alloc = 0
+ slow.TotalAlloc = 0
+ slow.Mallocs = 0
+ slow.Frees = 0
+ slow.HeapReleased = 0
+ var bySize [_NumSizeClasses]struct {
+ Mallocs, Frees uint64
+ }
+
+ // Add up current allocations in spans.
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ if s.isUnusedUserArenaChunk() {
+ continue
+ }
+ if sizeclass := s.spanclass.sizeclass(); sizeclass == 0 {
+ slow.Mallocs++
+ slow.Alloc += uint64(s.elemsize)
+ } else {
+ slow.Mallocs += uint64(s.allocCount)
+ slow.Alloc += uint64(s.allocCount) * uint64(s.elemsize)
+ bySize[sizeclass].Mallocs += uint64(s.allocCount)
+ }
+ }
+
+ // Add in frees by just reading the stats for those directly.
+ var m heapStatsDelta
+ memstats.heapStats.unsafeRead(&m)
+
+ // Collect per-sizeclass free stats.
+ var smallFree uint64
+ for i := 0; i < _NumSizeClasses; i++ {
+ slow.Frees += uint64(m.smallFreeCount[i])
+ bySize[i].Frees += uint64(m.smallFreeCount[i])
+ bySize[i].Mallocs += uint64(m.smallFreeCount[i])
+ smallFree += uint64(m.smallFreeCount[i]) * uint64(class_to_size[i])
+ }
+ slow.Frees += uint64(m.tinyAllocCount) + uint64(m.largeFreeCount)
+ slow.Mallocs += slow.Frees
+
+ slow.TotalAlloc = slow.Alloc + uint64(m.largeFree) + smallFree
+
+ for i := range slow.BySize {
+ slow.BySize[i].Mallocs = bySize[i].Mallocs
+ slow.BySize[i].Frees = bySize[i].Frees
+ }
+
+ for i := mheap_.pages.start; i < mheap_.pages.end; i++ {
+ chunk := mheap_.pages.tryChunkOf(i)
+ if chunk == nil {
+ continue
+ }
+ pg := chunk.scavenged.popcntRange(0, pallocChunkPages)
+ slow.HeapReleased += uint64(pg) * pageSize
+ }
+ for _, p := range allp {
+ pg := sys.OnesCount64(p.pcache.scav)
+ slow.HeapReleased += uint64(pg) * pageSize
+ }
+
+ getg().m.mallocing--
+ })
+
+ startTheWorld()
+ return
+}
+
+// ShrinkStackAndVerifyFramePointers attempts to shrink the stack of the current goroutine
+// and verifies that unwinding the new stack doesn't crash, even if the old
+// stack has been freed or reused (simulated via poisoning).
+func ShrinkStackAndVerifyFramePointers() {
+ before := stackPoisonCopy
+ defer func() { stackPoisonCopy = before }()
+ stackPoisonCopy = 1
+
+ gp := getg()
+ systemstack(func() {
+ shrinkstack(gp)
+ })
+ // If our new stack contains frame pointers into the old stack, this will
+ // crash because the old stack has been poisoned.
+ FPCallers(make([]uintptr, 1024))
+}
+
+// BlockOnSystemStack switches to the system stack, prints "x\n" to
+// stderr, and blocks in a stack containing
+// "runtime.blockOnSystemStackInternal".
+func BlockOnSystemStack() {
+ systemstack(blockOnSystemStackInternal)
+}
+
+func blockOnSystemStackInternal() {
+ print("x\n")
+ lock(&deadlock)
+ lock(&deadlock)
+}
+
+type RWMutex struct {
+ rw rwmutex
+}
+
+func (rw *RWMutex) Init() {
+ rw.rw.init(lockRankTestR, lockRankTestRInternal, lockRankTestW)
+}
+
+func (rw *RWMutex) RLock() {
+ rw.rw.rlock()
+}
+
+func (rw *RWMutex) RUnlock() {
+ rw.rw.runlock()
+}
+
+func (rw *RWMutex) Lock() {
+ rw.rw.lock()
+}
+
+func (rw *RWMutex) Unlock() {
+ rw.rw.unlock()
+}
+
+const RuntimeHmapSize = unsafe.Sizeof(hmap{})
+
+func MapBucketsCount(m map[int]int) int {
+ h := *(**hmap)(unsafe.Pointer(&m))
+ return 1 << h.B
+}
+
+func MapBucketsPointerIsNil(m map[int]int) bool {
+ h := *(**hmap)(unsafe.Pointer(&m))
+ return h.buckets == nil
+}
+
+func LockOSCounts() (external, internal uint32) {
+ gp := getg()
+ if gp.m.lockedExt+gp.m.lockedInt == 0 {
+ if gp.lockedm != 0 {
+ panic("lockedm on non-locked goroutine")
+ }
+ } else {
+ if gp.lockedm == 0 {
+ panic("nil lockedm on locked goroutine")
+ }
+ }
+ return gp.m.lockedExt, gp.m.lockedInt
+}
+
+//go:noinline
+func TracebackSystemstack(stk []uintptr, i int) int {
+ if i == 0 {
+ pc, sp := getcallerpc(), getcallersp()
+ var u unwinder
+ u.initAt(pc, sp, 0, getg(), unwindJumpStack) // Don't ignore errors, for testing
+ return tracebackPCs(&u, 0, stk)
+ }
+ n := 0
+ systemstack(func() {
+ n = TracebackSystemstack(stk, i-1)
+ })
+ return n
+}
+
+func KeepNArenaHints(n int) {
+ hint := mheap_.arenaHints
+ for i := 1; i < n; i++ {
+ hint = hint.next
+ if hint == nil {
+ return
+ }
+ }
+ hint.next = nil
+}
+
+// MapNextArenaHint reserves a page at the next arena growth hint,
+// preventing the arena from growing there, and returns the range of
+// addresses that are no longer viable.
+//
+// This may fail to reserve memory. If it fails, it still returns the
+// address range it attempted to reserve.
+func MapNextArenaHint() (start, end uintptr, ok bool) {
+ hint := mheap_.arenaHints
+ addr := hint.addr
+ if hint.down {
+ start, end = addr-heapArenaBytes, addr
+ addr -= physPageSize
+ } else {
+ start, end = addr, addr+heapArenaBytes
+ }
+ got := sysReserve(unsafe.Pointer(addr), physPageSize)
+ ok = (addr == uintptr(got))
+ if !ok {
+ // We were unable to get the requested reservation.
+ // Release what we did get and fail.
+ sysFreeOS(got, physPageSize)
+ }
+ return
+}
+
+func GetNextArenaHint() uintptr {
+ return mheap_.arenaHints.addr
+}
+
+type G = g
+
+type Sudog = sudog
+
+func Getg() *G {
+ return getg()
+}
+
+func Goid() uint64 {
+ return getg().goid
+}
+
+func GIsWaitingOnMutex(gp *G) bool {
+ return readgstatus(gp) == _Gwaiting && gp.waitreason.isMutexWait()
+}
+
+var CasGStatusAlwaysTrack = &casgstatusAlwaysTrack
+
+//go:noinline
+func PanicForTesting(b []byte, i int) byte {
+ return unexportedPanicForTesting(b, i)
+}
+
+//go:noinline
+func unexportedPanicForTesting(b []byte, i int) byte {
+ return b[i]
+}
+
+func G0StackOverflow() {
+ systemstack(func() {
+ stackOverflow(nil)
+ })
+}
+
+func stackOverflow(x *byte) {
+ var buf [256]byte
+ stackOverflow(&buf[0])
+}
+
+func MapTombstoneCheck(m map[int]int) {
+ // Make sure emptyOne and emptyRest are distributed correctly.
+ // We should have a series of filled and emptyOne cells, followed by
+ // a series of emptyRest cells.
+ h := *(**hmap)(unsafe.Pointer(&m))
+ i := any(m)
+ t := *(**maptype)(unsafe.Pointer(&i))
+
+ for x := 0; x < 1<<h.B; x++ {
+ b0 := (*bmap)(add(h.buckets, uintptr(x)*uintptr(t.BucketSize)))
+ n := 0
+ for b := b0; b != nil; b = b.overflow(t) {
+ for i := 0; i < bucketCnt; i++ {
+ if b.tophash[i] != emptyRest {
+ n++
+ }
+ }
+ }
+ k := 0
+ for b := b0; b != nil; b = b.overflow(t) {
+ for i := 0; i < bucketCnt; i++ {
+ if k < n && b.tophash[i] == emptyRest {
+ panic("early emptyRest")
+ }
+ if k >= n && b.tophash[i] != emptyRest {
+ panic("late non-emptyRest")
+ }
+ if k == n-1 && b.tophash[i] == emptyOne {
+ panic("last non-emptyRest entry is emptyOne")
+ }
+ k++
+ }
+ }
+ }
+}
+
+func RunGetgThreadSwitchTest() {
+ // Test that getg works correctly with thread switch.
+ // With gccgo, if we generate getg inlined, the backend
+ // may cache the address of the TLS variable, which
+ // will become invalid after a thread switch. This test
+ // checks that the bad caching doesn't happen.
+
+ ch := make(chan int)
+ go func(ch chan int) {
+ ch <- 5
+ LockOSThread()
+ }(ch)
+
+ g1 := getg()
+
+ // Block on a receive. This is likely to get us a thread
+ // switch. If we yield to the sender goroutine, it will
+ // lock the thread, forcing us to resume on a different
+ // thread.
+ <-ch
+
+ g2 := getg()
+ if g1 != g2 {
+ panic("g1 != g2")
+ }
+
+ // Also test getg after some control flow, as the
+ // backend is sensitive to control flow.
+ g3 := getg()
+ if g1 != g3 {
+ panic("g1 != g3")
+ }
+}
+
+const (
+ PageSize = pageSize
+ PallocChunkPages = pallocChunkPages
+ PageAlloc64Bit = pageAlloc64Bit
+ PallocSumBytes = pallocSumBytes
+)
+
+// Expose pallocSum for testing.
+type PallocSum pallocSum
+
+func PackPallocSum(start, max, end uint) PallocSum { return PallocSum(packPallocSum(start, max, end)) }
+func (m PallocSum) Start() uint { return pallocSum(m).start() }
+func (m PallocSum) Max() uint { return pallocSum(m).max() }
+func (m PallocSum) End() uint { return pallocSum(m).end() }
+
+// Expose pallocBits for testing.
+type PallocBits pallocBits
+
+func (b *PallocBits) Find(npages uintptr, searchIdx uint) (uint, uint) {
+ return (*pallocBits)(b).find(npages, searchIdx)
+}
+func (b *PallocBits) AllocRange(i, n uint) { (*pallocBits)(b).allocRange(i, n) }
+func (b *PallocBits) Free(i, n uint) { (*pallocBits)(b).free(i, n) }
+func (b *PallocBits) Summarize() PallocSum { return PallocSum((*pallocBits)(b).summarize()) }
+func (b *PallocBits) PopcntRange(i, n uint) uint { return (*pageBits)(b).popcntRange(i, n) }
+
+// SummarizeSlow is a slow but more obviously correct implementation
+// of (*pallocBits).summarize. Used for testing.
+func SummarizeSlow(b *PallocBits) PallocSum {
+ var start, max, end uint
+
+ const N = uint(len(b)) * 64
+ for start < N && (*pageBits)(b).get(start) == 0 {
+ start++
+ }
+ for end < N && (*pageBits)(b).get(N-end-1) == 0 {
+ end++
+ }
+ run := uint(0)
+ for i := uint(0); i < N; i++ {
+ if (*pageBits)(b).get(i) == 0 {
+ run++
+ } else {
+ run = 0
+ }
+ if run > max {
+ max = run
+ }
+ }
+ return PackPallocSum(start, max, end)
+}
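+
+// Worked example (illustrative): if the only allocated pages in the chunk
+// are pages 4 and 5, then start is 4 (leading free pages), end is
+// PallocChunkPages-6 (trailing free pages), and max equals end, since that
+// trailing run is the longest run of free pages.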
+
+// Expose non-trivial helpers for testing.
+func FindBitRange64(c uint64, n uint) uint { return findBitRange64(c, n) }
+
+// Given two PallocBits, returns a set of bit ranges where
+// they differ.
+func DiffPallocBits(a, b *PallocBits) []BitRange {
+ ba := (*pageBits)(a)
+ bb := (*pageBits)(b)
+
+ var d []BitRange
+ base, size := uint(0), uint(0)
+ for i := uint(0); i < uint(len(ba))*64; i++ {
+ if ba.get(i) != bb.get(i) {
+ if size == 0 {
+ base = i
+ }
+ size++
+ } else {
+ if size != 0 {
+ d = append(d, BitRange{base, size})
+ }
+ size = 0
+ }
+ }
+ if size != 0 {
+ d = append(d, BitRange{base, size})
+ }
+ return d
+}
+
+// StringifyPallocBits gets the bits in the bit range r from b,
+// and returns a string containing the bits as ASCII 0 and 1
+// characters.
+func StringifyPallocBits(b *PallocBits, r BitRange) string {
+ str := ""
+ for j := r.I; j < r.I+r.N; j++ {
+ if (*pageBits)(b).get(j) != 0 {
+ str += "1"
+ } else {
+ str += "0"
+ }
+ }
+ return str
+}
+
+// Expose pallocData for testing.
+type PallocData pallocData
+
+func (d *PallocData) FindScavengeCandidate(searchIdx uint, min, max uintptr) (uint, uint) {
+ return (*pallocData)(d).findScavengeCandidate(searchIdx, min, max)
+}
+func (d *PallocData) AllocRange(i, n uint) { (*pallocData)(d).allocRange(i, n) }
+func (d *PallocData) ScavengedSetRange(i, n uint) {
+ (*pallocData)(d).scavenged.setRange(i, n)
+}
+func (d *PallocData) PallocBits() *PallocBits {
+ return (*PallocBits)(&(*pallocData)(d).pallocBits)
+}
+func (d *PallocData) Scavenged() *PallocBits {
+ return (*PallocBits)(&(*pallocData)(d).scavenged)
+}
+
+// Expose fillAligned for testing.
+func FillAligned(x uint64, m uint) uint64 { return fillAligned(x, m) }
+
+// Expose pageCache for testing.
+type PageCache pageCache
+
+const PageCachePages = pageCachePages
+
+func NewPageCache(base uintptr, cache, scav uint64) PageCache {
+ return PageCache(pageCache{base: base, cache: cache, scav: scav})
+}
+func (c *PageCache) Empty() bool { return (*pageCache)(c).empty() }
+func (c *PageCache) Base() uintptr { return (*pageCache)(c).base }
+func (c *PageCache) Cache() uint64 { return (*pageCache)(c).cache }
+func (c *PageCache) Scav() uint64 { return (*pageCache)(c).scav }
+func (c *PageCache) Alloc(npages uintptr) (uintptr, uintptr) {
+ return (*pageCache)(c).alloc(npages)
+}
+func (c *PageCache) Flush(s *PageAlloc) {
+ cp := (*pageCache)(c)
+ sp := (*pageAlloc)(s)
+
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(sp.mheapLock)
+ cp.flush(sp)
+ unlock(sp.mheapLock)
+ })
+}
+
+// Expose chunk index type.
+type ChunkIdx chunkIdx
+
+// Expose pageAlloc for testing. Note that because pageAlloc is
+// not in the heap, neither is PageAlloc.
+type PageAlloc pageAlloc
+
+func (p *PageAlloc) Alloc(npages uintptr) (uintptr, uintptr) {
+ pp := (*pageAlloc)(p)
+
+ var addr, scav uintptr
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ addr, scav = pp.alloc(npages)
+ unlock(pp.mheapLock)
+ })
+ return addr, scav
+}
+func (p *PageAlloc) AllocToCache() PageCache {
+ pp := (*pageAlloc)(p)
+
+ var c PageCache
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ c = PageCache(pp.allocToCache())
+ unlock(pp.mheapLock)
+ })
+ return c
+}
+func (p *PageAlloc) Free(base, npages uintptr) {
+ pp := (*pageAlloc)(p)
+
+ systemstack(func() {
+ // None of the tests need any higher-level locking, so we just
+ // take the lock internally.
+ lock(pp.mheapLock)
+ pp.free(base, npages)
+ unlock(pp.mheapLock)
+ })
+}
+func (p *PageAlloc) Bounds() (ChunkIdx, ChunkIdx) {
+ return ChunkIdx((*pageAlloc)(p).start), ChunkIdx((*pageAlloc)(p).end)
+}
+func (p *PageAlloc) Scavenge(nbytes uintptr) (r uintptr) {
+ pp := (*pageAlloc)(p)
+ systemstack(func() {
+ r = pp.scavenge(nbytes, nil, true)
+ })
+ return
+}
+func (p *PageAlloc) InUse() []AddrRange {
+ ranges := make([]AddrRange, 0, len(p.inUse.ranges))
+ for _, r := range p.inUse.ranges {
+ ranges = append(ranges, AddrRange{r})
+ }
+ return ranges
+}
+
+// Returns nil if the PallocData's L2 is missing.
+func (p *PageAlloc) PallocData(i ChunkIdx) *PallocData {
+ ci := chunkIdx(i)
+ return (*PallocData)((*pageAlloc)(p).tryChunkOf(ci))
+}
+
+// AddrRange is a wrapper around addrRange for testing.
+type AddrRange struct {
+ addrRange
+}
+
+// MakeAddrRange creates a new address range.
+func MakeAddrRange(base, limit uintptr) AddrRange {
+ return AddrRange{makeAddrRange(base, limit)}
+}
+
+// Base returns the virtual base address of the address range.
+func (a AddrRange) Base() uintptr {
+ return a.addrRange.base.addr()
+}
+
+// Limit returns the virtual address of the limit of the address range.
+func (a AddrRange) Limit() uintptr {
+ return a.addrRange.limit.addr()
+}
+
+// Equals returns true if the two address ranges are exactly equal.
+func (a AddrRange) Equals(b AddrRange) bool {
+ return a == b
+}
+
+// Size returns the size in bytes of the address range.
+func (a AddrRange) Size() uintptr {
+ return a.addrRange.size()
+}
+
+// testSysStat is the sysStat passed to test versions of various
+// runtime structures. We do actually have to keep track of this
+// because otherwise memstats.mappedReady won't actually line up
+// with other stats in the runtime during tests.
+var testSysStat = &memstats.other_sys
+
+// AddrRanges is a wrapper around addrRanges for testing.
+type AddrRanges struct {
+ addrRanges
+ mutable bool
+}
+
+// NewAddrRanges creates a new empty addrRanges.
+//
+// Note that this initializes addrRanges just like in the
+// runtime, so its memory is persistentalloc'd. Call this
+// function sparingly since the memory it allocates is
+// leaked.
+//
+// This AddrRanges is mutable, so we can test methods like
+// Add.
+func NewAddrRanges() AddrRanges {
+ r := addrRanges{}
+ r.init(testSysStat)
+ return AddrRanges{r, true}
+}
+
+// MakeAddrRanges creates a new addrRanges populated with
+// the ranges in a.
+//
+// The returned AddrRanges is immutable, so methods like
+// Add will fail.
+func MakeAddrRanges(a ...AddrRange) AddrRanges {
+ // Methods that manipulate the backing store of addrRanges.ranges should
+ // not be used on the result from this function (e.g. add) since they may
+ // trigger reallocation. That would normally be fine, except the new
+ // backing store won't come from the heap, but from persistentalloc, so
+ // we'll leak some memory implicitly.
+ ranges := make([]addrRange, 0, len(a))
+ total := uintptr(0)
+ for _, r := range a {
+ ranges = append(ranges, r.addrRange)
+ total += r.Size()
+ }
+ return AddrRanges{addrRanges{
+ ranges: ranges,
+ totalBytes: total,
+ sysStat: testSysStat,
+ }, false}
+}
+
+// Ranges returns a copy of the ranges described by the
+// addrRanges.
+func (a *AddrRanges) Ranges() []AddrRange {
+ result := make([]AddrRange, 0, len(a.addrRanges.ranges))
+ for _, r := range a.addrRanges.ranges {
+ result = append(result, AddrRange{r})
+ }
+ return result
+}
+
+// FindSucc returns the successor to base. See addrRanges.findSucc
+// for more details.
+func (a *AddrRanges) FindSucc(base uintptr) int {
+ return a.findSucc(base)
+}
+
+// Add adds a new AddrRange to the AddrRanges.
+//
+// The AddrRange must be mutable (i.e. created by NewAddrRanges),
+// otherwise this method will throw.
+func (a *AddrRanges) Add(r AddrRange) {
+ if !a.mutable {
+ throw("attempt to mutate immutable AddrRanges")
+ }
+ a.add(r.addrRange)
+}
+
+// TotalBytes returns the totalBytes field of the addrRanges.
+func (a *AddrRanges) TotalBytes() uintptr {
+ return a.addrRanges.totalBytes
+}
+
+// BitRange represents a range over a bitmap.
+type BitRange struct {
+ I, N uint // bit index and length in bits
+}
+
+// NewPageAlloc creates a new page allocator for testing and
+// initializes it with the scav and chunks maps. Each key in these maps
+// represents a chunk index and each value is a series of bit ranges to
+// set within each bitmap's chunk.
+//
+// The initialization of the pageAlloc preserves the invariant that if a
+// scavenged bit is set the alloc bit is necessarily unset, so some
+// of the bits described by scav may be cleared in the final bitmap if
+// ranges in chunks overlap with them.
+//
+// scav is optional, and if nil, the scavenged bitmap will be cleared
+// (as opposed to all 1s, which it usually is). Furthermore, every
+// chunk index in scav must appear in chunks; ones that do not are
+// ignored.
+func NewPageAlloc(chunks, scav map[ChunkIdx][]BitRange) *PageAlloc {
+ p := new(pageAlloc)
+
+ // We've got an entry, so initialize the pageAlloc.
+ p.init(new(mutex), testSysStat, true)
+ lockInit(p.mheapLock, lockRankMheap)
+ for i, init := range chunks {
+ addr := chunkBase(chunkIdx(i))
+
+ // Mark the chunk's existence in the pageAlloc.
+ systemstack(func() {
+ lock(p.mheapLock)
+ p.grow(addr, pallocChunkBytes)
+ unlock(p.mheapLock)
+ })
+
+ // Initialize the bitmap and update pageAlloc metadata.
+ ci := chunkIndex(addr)
+ chunk := p.chunkOf(ci)
+
+ // Clear all the scavenged bits that p.grow set.
+ chunk.scavenged.clearRange(0, pallocChunkPages)
+
+ // Simulate the allocation and subsequent free of all pages in
+ // the chunk for the scavenge index. This sets the state equivalent
+ // to all pages within the index being free.
+ p.scav.index.alloc(ci, pallocChunkPages)
+ p.scav.index.free(ci, 0, pallocChunkPages)
+
+ // Apply scavenge state if applicable.
+ if scav != nil {
+ if scvg, ok := scav[i]; ok {
+ for _, s := range scvg {
+ // Ignore the case of s.N == 0. setRange doesn't handle
+ // it and it's a no-op anyway.
+ if s.N != 0 {
+ chunk.scavenged.setRange(s.I, s.N)
+ }
+ }
+ }
+ }
+
+ // Apply alloc state.
+ for _, s := range init {
+ // Ignore the case of s.N == 0. allocRange doesn't handle
+ // it and it's a no-op anyway.
+ if s.N != 0 {
+ chunk.allocRange(s.I, s.N)
+
+ // Make sure the scavenge index is updated.
+ p.scav.index.alloc(ci, s.N)
+ }
+ }
+
+ // Update heap metadata for the allocRange calls above.
+ systemstack(func() {
+ lock(p.mheapLock)
+ p.update(addr, pallocChunkPages, false, false)
+ unlock(p.mheapLock)
+ })
+ }
+
+ return (*PageAlloc)(p)
+}
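+
+// A minimal usage sketch (illustrative; the bit range is hypothetical):
+// a single chunk at BaseChunkIdx with its first 64 pages allocated and no
+// pages marked scavenged. FreePageAlloc must be called to release the
+// mappings when done:
+//
+//	p := NewPageAlloc(map[ChunkIdx][]BitRange{
+//		BaseChunkIdx: {{0, 64}},
+//	}, nil)
+//	defer FreePageAlloc(p)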
+
+// FreePageAlloc releases hard OS resources owned by the pageAlloc. Once this
+// is called the pageAlloc may no longer be used. The object itself will be
+// collected by the garbage collector once it is no longer live.
+func FreePageAlloc(pp *PageAlloc) {
+ p := (*pageAlloc)(pp)
+
+ // Free all the mapped space for the summary levels.
+ if pageAlloc64Bit != 0 {
+ for l := 0; l < summaryLevels; l++ {
+ sysFreeOS(unsafe.Pointer(&p.summary[l][0]), uintptr(cap(p.summary[l]))*pallocSumBytes)
+ }
+ } else {
+ resSize := uintptr(0)
+ for _, s := range p.summary {
+ resSize += uintptr(cap(s)) * pallocSumBytes
+ }
+ sysFreeOS(unsafe.Pointer(&p.summary[0][0]), alignUp(resSize, physPageSize))
+ }
+
+ // Free extra data structures.
+ sysFreeOS(unsafe.Pointer(&p.scav.index.chunks[0]), uintptr(cap(p.scav.index.chunks))*unsafe.Sizeof(atomicScavChunkData{}))
+
+ // Subtract back out whatever we mapped for the summaries.
+ // sysUsed adds to p.sysStat and memstats.mappedReady no matter what
+ // (and in anger should actually be accounted for), and there's no other
+ // way to figure out how much we actually mapped.
+ gcController.mappedReady.Add(-int64(p.summaryMappedReady))
+ testSysStat.add(-int64(p.summaryMappedReady))
+
+ // Free the mapped space for chunks.
+ for i := range p.chunks {
+ if x := p.chunks[i]; x != nil {
+ p.chunks[i] = nil
+ // This memory comes from sysAlloc and will always be page-aligned.
+ sysFree(unsafe.Pointer(x), unsafe.Sizeof(*p.chunks[0]), testSysStat)
+ }
+ }
+}
+
+// BaseChunkIdx is a convenient chunkIdx value which works on both
+// 64-bit and 32-bit platforms, allowing the tests to share code
+// between the two.
+//
+// This should not be higher than 0x100*pallocChunkBytes to support
+// mips and mipsle, which only have 31-bit address spaces.
+var BaseChunkIdx = func() ChunkIdx {
+ var prefix uintptr
+ if pageAlloc64Bit != 0 {
+ prefix = 0xc000
+ } else {
+ prefix = 0x100
+ }
+ baseAddr := prefix * pallocChunkBytes
+ if goos.IsAix != 0 {
+ baseAddr += arenaBaseOffset
+ }
+ return ChunkIdx(chunkIndex(baseAddr))
+}()
+
+// PageBase returns an address given a chunk index and a page index
+// relative to that chunk.
+func PageBase(c ChunkIdx, pageIdx uint) uintptr {
+ return chunkBase(chunkIdx(c)) + uintptr(pageIdx)*pageSize
+}
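+
+// For example, PageBase(c, 0) is the base address of chunk c, and each
+// page index adds pageSize bytes, so PageBase(BaseChunkIdx, 3) is
+// chunkBase(chunkIdx(BaseChunkIdx)) + 3*pageSize.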
+
+type BitsMismatch struct {
+ Base uintptr
+ Got, Want uint64
+}
+
+func CheckScavengedBitsCleared(mismatches []BitsMismatch) (n int, ok bool) {
+ ok = true
+
+ // Run on the system stack to avoid stack growth allocation.
+ systemstack(func() {
+ getg().m.mallocing++
+
+ // Lock so that we can safely access the bitmap.
+ lock(&mheap_.lock)
+ chunkLoop:
+ for i := mheap_.pages.start; i < mheap_.pages.end; i++ {
+ chunk := mheap_.pages.tryChunkOf(i)
+ if chunk == nil {
+ continue
+ }
+ for j := 0; j < pallocChunkPages/64; j++ {
+ // Run over each 64-bit bitmap section and ensure
+ // scavenged is being cleared properly on allocation.
+ // If a used bit and scavenged bit are both set, that's
+ // an error, and could indicate a larger problem, or
+ // an accounting problem.
+ want := chunk.scavenged[j] &^ chunk.pallocBits[j]
+ got := chunk.scavenged[j]
+ if want != got {
+ ok = false
+ if n >= len(mismatches) {
+ break chunkLoop
+ }
+ mismatches[n] = BitsMismatch{
+ Base: chunkBase(i) + uintptr(j)*64*pageSize,
+ Got: got,
+ Want: want,
+ }
+ n++
+ }
+ }
+ }
+ unlock(&mheap_.lock)
+
+ getg().m.mallocing--
+ })
+ return
+}
+
+func PageCachePagesLeaked() (leaked uintptr) {
+ stopTheWorld(stwForTestPageCachePagesLeaked)
+
+ // Walk over destroyed Ps and look for unflushed caches.
+ deadp := allp[len(allp):cap(allp)]
+ for _, p := range deadp {
+ // Since we're going past len(allp) we may see nil Ps.
+ // Just ignore them.
+ if p != nil {
+ leaked += uintptr(sys.OnesCount64(p.pcache.cache))
+ }
+ }
+
+ startTheWorld()
+ return
+}
+
+var Semacquire = semacquire
+var Semrelease1 = semrelease1
+
+func SemNwait(addr *uint32) uint32 {
+ root := semtable.rootFor(addr)
+ return root.nwait.Load()
+}
+
+const SemTableSize = semTabSize
+
+// SemTable is a wrapper around semTable exported for testing.
+type SemTable struct {
+ semTable
+}
+
+// Enqueue simulates enqueuing a waiter for a semaphore (or lock) at addr.
+func (t *SemTable) Enqueue(addr *uint32) {
+ s := acquireSudog()
+ s.releasetime = 0
+ s.acquiretime = 0
+ s.ticket = 0
+ t.semTable.rootFor(addr).queue(addr, s, false)
+}
+
+// Dequeue simulates dequeuing a waiter for a semaphore (or lock) at addr.
+//
+// Returns true if there actually was a waiter to be dequeued.
+func (t *SemTable) Dequeue(addr *uint32) bool {
+ s, _ := t.semTable.rootFor(addr).dequeue(addr)
+ if s != nil {
+ releaseSudog(s)
+ return true
+ }
+ return false
+}
+
+// mspan wrapper for testing.
+type MSpan mspan
+
+// Allocate an mspan for testing.
+func AllocMSpan() *MSpan {
+ var s *mspan
+ systemstack(func() {
+ lock(&mheap_.lock)
+ s = (*mspan)(mheap_.spanalloc.alloc())
+ unlock(&mheap_.lock)
+ })
+ return (*MSpan)(s)
+}
+
+// Free an allocated mspan.
+func FreeMSpan(s *MSpan) {
+ systemstack(func() {
+ lock(&mheap_.lock)
+ mheap_.spanalloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.lock)
+ })
+}
+
+func MSpanCountAlloc(ms *MSpan, bits []byte) int {
+ s := (*mspan)(ms)
+ s.nelems = uintptr(len(bits) * 8)
+ s.gcmarkBits = (*gcBits)(unsafe.Pointer(&bits[0]))
+ result := s.countAlloc()
+ s.gcmarkBits = nil
+ return result
+}
+
+const (
+ TimeHistSubBucketBits = timeHistSubBucketBits
+ TimeHistNumSubBuckets = timeHistNumSubBuckets
+ TimeHistNumBuckets = timeHistNumBuckets
+ TimeHistMinBucketBits = timeHistMinBucketBits
+ TimeHistMaxBucketBits = timeHistMaxBucketBits
+)
+
+type TimeHistogram timeHistogram
+
+// Count returns the count for the given bucket, subBucket indices.
+// Returns true if the bucket was valid; otherwise it returns the count
+// for the underflow bucket if bucket < 0, or for the overflow bucket
+// if the computed index is out of range, and false.
+func (th *TimeHistogram) Count(bucket, subBucket int) (uint64, bool) {
+ t := (*timeHistogram)(th)
+ if bucket < 0 {
+ return t.underflow.Load(), false
+ }
+ i := bucket*TimeHistNumSubBuckets + subBucket
+ if i >= len(t.counts) {
+ return t.overflow.Load(), false
+ }
+ return t.counts[i].Load(), true
+}
+
+func (th *TimeHistogram) Record(duration int64) {
+ (*timeHistogram)(th).record(duration)
+}
+
+var TimeHistogramMetricsBuckets = timeHistogramMetricsBuckets
+
+func SetIntArgRegs(a int) int {
+ lock(&finlock)
+ old := intArgRegs
+ if a >= 0 {
+ intArgRegs = a
+ }
+ unlock(&finlock)
+ return old
+}
+
+func FinalizerGAsleep() bool {
+ return fingStatus.Load()&fingWait != 0
+}
+
+// For GCTestMoveStackOnNextCall, it's important not to introduce an
+// extra layer of call, since then there's a return before the "real"
+// next call.
+var GCTestMoveStackOnNextCall = gcTestMoveStackOnNextCall
+
+// For GCTestIsReachable, it's important that we do this as a call so
+// escape analysis can see through it.
+func GCTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
+ return gcTestIsReachable(ptrs...)
+}
+
+// For GCTestPointerClass, it's important that we do this as a call so
+// escape analysis can see through it.
+//
+// This is nosplit because gcTestPointerClass is.
+//
+//go:nosplit
+func GCTestPointerClass(p unsafe.Pointer) string {
+ return gcTestPointerClass(p)
+}
+
+const Raceenabled = raceenabled
+
+const (
+ GCBackgroundUtilization = gcBackgroundUtilization
+ GCGoalUtilization = gcGoalUtilization
+ DefaultHeapMinimum = defaultHeapMinimum
+ MemoryLimitHeapGoalHeadroomPercent = memoryLimitHeapGoalHeadroomPercent
+ MemoryLimitMinHeapGoalHeadroom = memoryLimitMinHeapGoalHeadroom
+)
+
+type GCController struct {
+ gcControllerState
+}
+
+func NewGCController(gcPercent int, memoryLimit int64) *GCController {
+ // Force the controller to escape. We're going to
+ // do 64-bit atomics on it, and if it gets stack-allocated
+ // on a 32-bit architecture, it may get allocated unaligned
+ // space.
+ g := Escape(new(GCController))
+ g.gcControllerState.test = true // Mark it as a test copy.
+ g.init(int32(gcPercent), memoryLimit)
+ return g
+}
+
+func (c *GCController) StartCycle(stackSize, globalsSize uint64, scannableFrac float64, gomaxprocs int) {
+ trigger, _ := c.trigger()
+ if c.heapMarked > trigger {
+ trigger = c.heapMarked
+ }
+ c.maxStackScan.Store(stackSize)
+ c.globalsScan.Store(globalsSize)
+ c.heapLive.Store(trigger)
+ c.heapScan.Add(int64(float64(trigger-c.heapMarked) * scannableFrac))
+ c.startCycle(0, gomaxprocs, gcTrigger{kind: gcTriggerHeap})
+}
+
+func (c *GCController) AssistWorkPerByte() float64 {
+ return c.assistWorkPerByte.Load()
+}
+
+func (c *GCController) HeapGoal() uint64 {
+ return c.heapGoal()
+}
+
+func (c *GCController) HeapLive() uint64 {
+ return c.heapLive.Load()
+}
+
+func (c *GCController) HeapMarked() uint64 {
+ return c.heapMarked
+}
+
+func (c *GCController) Triggered() uint64 {
+ return c.triggered
+}
+
+type GCControllerReviseDelta struct {
+ HeapLive int64
+ HeapScan int64
+ HeapScanWork int64
+ StackScanWork int64
+ GlobalsScanWork int64
+}
+
+func (c *GCController) Revise(d GCControllerReviseDelta) {
+ c.heapLive.Add(d.HeapLive)
+ c.heapScan.Add(d.HeapScan)
+ c.heapScanWork.Add(d.HeapScanWork)
+ c.stackScanWork.Add(d.StackScanWork)
+ c.globalsScanWork.Add(d.GlobalsScanWork)
+ c.revise()
+}
+
+func (c *GCController) EndCycle(bytesMarked uint64, assistTime, elapsed int64, gomaxprocs int) {
+ c.assistTime.Store(assistTime)
+ c.endCycle(elapsed, gomaxprocs, false)
+ c.resetLive(bytesMarked)
+ c.commit(false)
+}
+
+func (c *GCController) AddIdleMarkWorker() bool {
+ return c.addIdleMarkWorker()
+}
+
+func (c *GCController) NeedIdleMarkWorker() bool {
+ return c.needIdleMarkWorker()
+}
+
+func (c *GCController) RemoveIdleMarkWorker() {
+ c.removeIdleMarkWorker()
+}
+
+func (c *GCController) SetMaxIdleMarkWorkers(max int32) {
+ c.setMaxIdleMarkWorkers(max)
+}
+
+var alwaysFalse bool
+var escapeSink any
+
+func Escape[T any](x T) T {
+ if alwaysFalse {
+ escapeSink = x
+ }
+ return x
+}
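+
+// Escape is how the constructors above (for example NewGCController and
+// NewGCCPULimiter) force a value onto the heap: the compiler cannot prove
+// that the assignment to escapeSink never runs, so the argument escapes.
+// A minimal sketch of the pattern (hypothetical usage):
+//
+//	c := Escape(new(GCController)) // heap-allocated, so 64-bit atomics on it are aligned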
+
+// Acquirem blocks preemption.
+func Acquirem() {
+ acquirem()
+}
+
+func Releasem() {
+ releasem(getg().m)
+}
+
+var Timediv = timediv
+
+type PIController struct {
+ piController
+}
+
+func NewPIController(kp, ti, tt, min, max float64) *PIController {
+ return &PIController{piController{
+ kp: kp,
+ ti: ti,
+ tt: tt,
+ min: min,
+ max: max,
+ }}
+}
+
+func (c *PIController) Next(input, setpoint, period float64) (float64, bool) {
+ return c.piController.next(input, setpoint, period)
+}
+
+const (
+ CapacityPerProc = capacityPerProc
+ GCCPULimiterUpdatePeriod = gcCPULimiterUpdatePeriod
+)
+
+type GCCPULimiter struct {
+ limiter gcCPULimiterState
+}
+
+func NewGCCPULimiter(now int64, gomaxprocs int32) *GCCPULimiter {
+ // Force the controller to escape. We're going to
+ // do 64-bit atomics on it, and if it gets stack-allocated
+ // on a 32-bit architecture, it may get allocated unaligned
+ // space.
+ l := Escape(new(GCCPULimiter))
+ l.limiter.test = true
+ l.limiter.resetCapacity(now, gomaxprocs)
+ return l
+}
+
+func (l *GCCPULimiter) Fill() uint64 {
+ return l.limiter.bucket.fill
+}
+
+func (l *GCCPULimiter) Capacity() uint64 {
+ return l.limiter.bucket.capacity
+}
+
+func (l *GCCPULimiter) Overflow() uint64 {
+ return l.limiter.overflow
+}
+
+func (l *GCCPULimiter) Limiting() bool {
+ return l.limiter.limiting()
+}
+
+func (l *GCCPULimiter) NeedUpdate(now int64) bool {
+ return l.limiter.needUpdate(now)
+}
+
+func (l *GCCPULimiter) StartGCTransition(enableGC bool, now int64) {
+ l.limiter.startGCTransition(enableGC, now)
+}
+
+func (l *GCCPULimiter) FinishGCTransition(now int64) {
+ l.limiter.finishGCTransition(now)
+}
+
+func (l *GCCPULimiter) Update(now int64) {
+ l.limiter.update(now)
+}
+
+func (l *GCCPULimiter) AddAssistTime(t int64) {
+ l.limiter.addAssistTime(t)
+}
+
+func (l *GCCPULimiter) ResetCapacity(now int64, nprocs int32) {
+ l.limiter.resetCapacity(now, nprocs)
+}
+
+const ScavengePercent = scavengePercent
+
+type Scavenger struct {
+ Sleep func(int64) int64
+ Scavenge func(uintptr) (uintptr, int64)
+ ShouldStop func() bool
+ GoMaxProcs func() int32
+
+ released atomic.Uintptr
+ scavenger scavengerState
+ stop chan<- struct{}
+ done <-chan struct{}
+}
+
+func (s *Scavenger) Start() {
+ if s.Sleep == nil || s.Scavenge == nil || s.ShouldStop == nil || s.GoMaxProcs == nil {
+ panic("must populate all stubs")
+ }
+
+ // Install hooks.
+ s.scavenger.sleepStub = s.Sleep
+ s.scavenger.scavenge = s.Scavenge
+ s.scavenger.shouldStop = s.ShouldStop
+ s.scavenger.gomaxprocs = s.GoMaxProcs
+
+ // Start up scavenger goroutine, and wait for it to be ready.
+ stop := make(chan struct{})
+ s.stop = stop
+ done := make(chan struct{})
+ s.done = done
+ go func() {
+ // This should match bgscavenge, loosely.
+ s.scavenger.init()
+ s.scavenger.park()
+ for {
+ select {
+ case <-stop:
+ close(done)
+ return
+ default:
+ }
+ released, workTime := s.scavenger.run()
+ if released == 0 {
+ s.scavenger.park()
+ continue
+ }
+ s.released.Add(released)
+ s.scavenger.sleep(workTime)
+ }
+ }()
+ if !s.BlockUntilParked(1e9 /* 1 second */) {
+ panic("timed out waiting for scavenger to get ready")
+ }
+}
+
+// BlockUntilParked blocks until the scavenger parks, or until
+// timeout is exceeded. Returns true if the scavenger parked.
+//
+// Note that in testing, parked means something slightly different.
+// In anger, the scavenger parks to sleep, too, but in testing,
+// it only parks when it actually has no work to do.
+func (s *Scavenger) BlockUntilParked(timeout int64) bool {
+ // Just spin, waiting for it to park.
+ //
+ // The actual parking process is racy with respect to
+ // wakeups, which is fine, but for testing we need something
+ // a bit more robust.
+ start := nanotime()
+ for nanotime()-start < timeout {
+ lock(&s.scavenger.lock)
+ parked := s.scavenger.parked
+ unlock(&s.scavenger.lock)
+ if parked {
+ return true
+ }
+ Gosched()
+ }
+ return false
+}
+
+// Released returns how many bytes the scavenger released.
+func (s *Scavenger) Released() uintptr {
+ return s.released.Load()
+}
+
+// Wake wakes up a parked scavenger to keep running.
+func (s *Scavenger) Wake() {
+ s.scavenger.wake()
+}
+
+// Stop cleans up the scavenger's resources. The scavenger
+// must be parked for this to work.
+func (s *Scavenger) Stop() {
+ lock(&s.scavenger.lock)
+ parked := s.scavenger.parked
+ unlock(&s.scavenger.lock)
+ if !parked {
+ panic("tried to clean up scavenger that is not parked")
+ }
+ close(s.stop)
+ s.Wake()
+ <-s.done
+}
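+
+// A sketch of how a test might drive this harness (hypothetical wiring, not
+// taken from a real test): install stubs, start the scavenger, wake it once,
+// and wait for it to park again before tearing it down.
+//
+//	s := new(Scavenger)
+//	s.Sleep = func(ns int64) int64 { return ns }
+//	s.Scavenge = func(n uintptr) (uintptr, int64) { return 0, 0 } // report no work
+//	s.ShouldStop = func() bool { return false }
+//	s.GoMaxProcs = func() int32 { return 1 }
+//	s.Start()
+//	s.Wake()
+//	if !s.BlockUntilParked(1e9 /* 1 second */) {
+//		t.Fatal("scavenger did not park")
+//	}
+//	s.Stop()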
+
+type ScavengeIndex struct {
+ i scavengeIndex
+}
+
+func NewScavengeIndex(min, max ChunkIdx) *ScavengeIndex {
+ s := new(ScavengeIndex)
+ // This is a bit lazy but we easily guarantee we'll be able
+ // to reference all the relevant chunks. The worst-case
+ // memory usage here is 512 MiB, but tests generally use
+ // small offsets from BaseChunkIdx, which results in ~100s
+ // of KiB in memory use.
+ //
+ // This may still be worth making better, at least by sharing
+ // this fairly large array across calls with a sync.Pool or
+ // something. Currently, when the tests are run serially,
+ // it takes around 0.5s. Not all that much, but if we have
+ // a lot of tests like this it could add up.
+ s.i.chunks = make([]atomicScavChunkData, max)
+ s.i.min.Store(uintptr(min))
+ s.i.max.Store(uintptr(max))
+ s.i.minHeapIdx.Store(uintptr(min))
+ s.i.test = true
+ return s
+}
+
+func (s *ScavengeIndex) Find(force bool) (ChunkIdx, uint) {
+ ci, off := s.i.find(force)
+ return ChunkIdx(ci), off
+}
+
+func (s *ScavengeIndex) AllocRange(base, limit uintptr) {
+ sc, ec := chunkIndex(base), chunkIndex(limit-1)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit-1)
+
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ s.i.alloc(sc, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ s.i.alloc(sc, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ s.i.alloc(c, pallocChunkPages)
+ }
+ s.i.alloc(ec, ei+1)
+ }
+}
+
+func (s *ScavengeIndex) FreeRange(base, limit uintptr) {
+ sc, ec := chunkIndex(base), chunkIndex(limit-1)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit-1)
+
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ s.i.free(sc, si, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ s.i.free(sc, si, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ s.i.free(c, 0, pallocChunkPages)
+ }
+ s.i.free(ec, 0, ei+1)
+ }
+}
+
+func (s *ScavengeIndex) ResetSearchAddrs() {
+ for _, a := range []*atomicOffAddr{&s.i.searchAddrBg, &s.i.searchAddrForce} {
+ addr, marked := a.Load()
+ if marked {
+ a.StoreUnmark(addr, addr)
+ }
+ a.Clear()
+ }
+ s.i.freeHWM = minOffAddr
+}
+
+func (s *ScavengeIndex) NextGen() {
+ s.i.nextGen()
+}
+
+func (s *ScavengeIndex) SetEmpty(ci ChunkIdx) {
+ s.i.setEmpty(chunkIdx(ci))
+}
+
+func (s *ScavengeIndex) SetNoHugePage(ci ChunkIdx) {
+ s.i.setNoHugePage(chunkIdx(ci))
+}
+
+func CheckPackScavChunkData(gen uint32, inUse, lastInUse uint16, flags uint8) bool {
+ sc0 := scavChunkData{
+ gen: gen,
+ inUse: inUse,
+ lastInUse: lastInUse,
+ scavChunkFlags: scavChunkFlags(flags),
+ }
+ scp := sc0.pack()
+ sc1 := unpackScavChunkData(scp)
+ return sc0 == sc1
+}
+
+const GTrackingPeriod = gTrackingPeriod
+
+var ZeroBase = unsafe.Pointer(&zerobase)
+
+const UserArenaChunkBytes = userArenaChunkBytes
+
+type UserArena struct {
+ arena *userArena
+}
+
+func NewUserArena() *UserArena {
+ return &UserArena{newUserArena()}
+}
+
+func (a *UserArena) New(out *any) {
+ i := efaceOf(out)
+ typ := i._type
+ if typ.Kind_&kindMask != kindPtr {
+ panic("new result of non-ptr type")
+ }
+ typ = (*ptrtype)(unsafe.Pointer(typ)).Elem
+ i.data = a.arena.new(typ)
+}
+
+func (a *UserArena) Slice(sl any, cap int) {
+ a.arena.slice(sl, cap)
+}
+
+func (a *UserArena) Free() {
+ a.arena.free()
+}
+
+func GlobalWaitingArenaChunks() int {
+ n := 0
+ systemstack(func() {
+ lock(&mheap_.lock)
+ for s := mheap_.userArena.quarantineList.first; s != nil; s = s.next {
+ n++
+ }
+ unlock(&mheap_.lock)
+ })
+ return n
+}
+
+func UserArenaClone[T any](s T) T {
+ return arena_heapify(s).(T)
+}
+
+var AlignUp = alignUp
+
+// BlockUntilEmptyFinalizerQueue blocks until either the finalizer
+// queue is emptied (and the finalizers have executed) or the timeout
+// is reached. Returns true if the finalizer queue was emptied.
+func BlockUntilEmptyFinalizerQueue(timeout int64) bool {
+ start := nanotime()
+ for nanotime()-start < timeout {
+ lock(&finlock)
+ // We know the queue has been drained when both finq is nil
+ // and the finalizer g has stopped executing.
+ empty := finq == nil
+ empty = empty && readgstatus(fing) == _Gwaiting && fing.waitreason == waitReasonFinalizerWait
+ unlock(&finlock)
+ if empty {
+ return true
+ }
+ Gosched()
+ }
+ return false
+}
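+
+// A sketch of typical use from a test (hypothetical, not taken from a
+// specific test): trigger a GC so finalizers for unreachable objects get
+// queued, then wait for the queue to drain before asserting on their
+// side effects.
+//
+//	runtime.GC()
+//	if !runtime.BlockUntilEmptyFinalizerQueue(1e9 /* 1 second */) {
+//		t.Fatal("finalizers did not run in time")
+//	}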
+
+func FrameStartLine(f *Frame) int {
+ return f.startLine
+}
+
+// PersistentAlloc allocates some memory that lives outside the Go heap.
+// This memory will never be freed; use sparingly.
+func PersistentAlloc(n uintptr) unsafe.Pointer {
+ return persistentalloc(n, 0, &memstats.other_sys)
+}
+
+// FPCallers works like Callers and uses frame pointer unwinding to populate
+// pcBuf with the return addresses of the physical frames on the stack.
+func FPCallers(pcBuf []uintptr) int {
+ return fpTracebackPCs(unsafe.Pointer(getfp()), pcBuf)
+}
+
+const FramePointerEnabled = framepointer_enabled
+
+var (
+ IsPinned = isPinned
+ GetPinCounter = pinnerGetPinCounter
+)
+
+func SetPinnerLeakPanic(f func()) {
+ pinnerLeakPanic = f
+}
+func GetPinnerLeakPanic() func() {
+ return pinnerLeakPanic
+}
diff --git a/src/runtime/export_unix_test.go b/src/runtime/export_unix_test.go
new file mode 100644
index 0000000..56ff771
--- /dev/null
+++ b/src/runtime/export_unix_test.go
@@ -0,0 +1,99 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+import "unsafe"
+
+var NonblockingPipe = nonblockingPipe
+var Fcntl = fcntl
+var Closeonexec = closeonexec
+
+func sigismember(mask *sigset, i int) bool {
+ clear := *mask
+ sigdelset(&clear, i)
+ return clear != *mask
+}
+
+func Sigisblocked(i int) bool {
+ var sigmask sigset
+ sigprocmask(_SIG_SETMASK, nil, &sigmask)
+ return sigismember(&sigmask, i)
+}
+
+type M = m
+
+var waitForSigusr1 struct {
+ rdpipe int32
+ wrpipe int32
+ mID int64
+}
+
+// WaitForSigusr1 blocks until a SIGUSR1 is received. It calls ready
+// when it is set up to receive SIGUSR1. The ready function should
+// cause a SIGUSR1 to be sent. The r and w arguments are a pipe that
+// the signal handler can use to report when the signal is received.
+//
+// Once SIGUSR1 is received, it returns the ID of the current M and
+// the ID of the M the SIGUSR1 was received on. If the caller writes
+// a non-zero byte to w, WaitForSigusr1 returns immediately with -1, -1.
+func WaitForSigusr1(r, w int32, ready func(mp *M)) (int64, int64) {
+ lockOSThread()
+ // Make sure we can receive SIGUSR1.
+ unblocksig(_SIGUSR1)
+
+ waitForSigusr1.rdpipe = r
+ waitForSigusr1.wrpipe = w
+
+ mp := getg().m
+ testSigusr1 = waitForSigusr1Callback
+ ready(mp)
+
+ // Wait for the signal. We use a pipe rather than a note
+ // because write is always async-signal-safe.
+ entersyscallblock()
+ var b byte
+ read(waitForSigusr1.rdpipe, noescape(unsafe.Pointer(&b)), 1)
+ exitsyscall()
+
+ gotM := waitForSigusr1.mID
+ testSigusr1 = nil
+
+ unlockOSThread()
+
+ if b != 0 {
+ // timeout signal from caller
+ return -1, -1
+ }
+ return mp.id, gotM
+}
+
+// waitForSigusr1Callback is called from the signal handler during
+// WaitForSigusr1. It must not have write barriers because there may
+// not be a P.
+//
+//go:nowritebarrierrec
+func waitForSigusr1Callback(gp *g) bool {
+ if gp == nil || gp.m == nil {
+ waitForSigusr1.mID = -1
+ } else {
+ waitForSigusr1.mID = gp.m.id
+ }
+ b := byte(0)
+ write(uintptr(waitForSigusr1.wrpipe), noescape(unsafe.Pointer(&b)), 1)
+ return true
+}
+
+// SendSigusr1 sends SIGUSR1 to mp.
+func SendSigusr1(mp *M) {
+ signalM(mp, _SIGUSR1)
+}
+
+const (
+ O_WRONLY = _O_WRONLY
+ O_CREAT = _O_CREAT
+ O_TRUNC = _O_TRUNC
+)
diff --git a/src/runtime/export_windows_test.go b/src/runtime/export_windows_test.go
new file mode 100644
index 0000000..cf0db57
--- /dev/null
+++ b/src/runtime/export_windows_test.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Export guts for testing.
+
+package runtime
+
+import "unsafe"
+
+const MaxArgs = maxArgs
+
+var (
+ OsYield = osyield
+ TimeBeginPeriodRetValue = &timeBeginPeriodRetValue
+)
+
+func NumberOfProcessors() int32 {
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return int32(info.dwnumberofprocessors)
+}
+
+type ContextStub struct {
+ context
+}
+
+func (c ContextStub) GetPC() uintptr {
+ return c.ip()
+}
+
+func NewContextStub() *ContextStub {
+ var ctx context
+ ctx.set_ip(getcallerpc())
+ ctx.set_sp(getcallersp())
+ fp := getfp()
+ // getfp is not implemented on windows/386 and windows/arm,
+ // in which case it returns 0.
+ if fp != 0 {
+ ctx.set_fp(*(*uintptr)(unsafe.Pointer(fp)))
+ }
+ return &ContextStub{ctx}
+}
diff --git a/src/runtime/extern.go b/src/runtime/extern.go
new file mode 100644
index 0000000..de4a0ca
--- /dev/null
+++ b/src/runtime/extern.go
@@ -0,0 +1,348 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package runtime contains operations that interact with Go's runtime system,
+such as functions to control goroutines. It also includes the low-level type information
+used by the reflect package; see reflect's documentation for the programmable
+interface to the run-time type system.
+
+# Environment Variables
+
+The following environment variables ($name or %name%, depending on the host
+operating system) control the run-time behavior of Go programs. The meanings
+and use may change from release to release.
+
+The GOGC variable sets the initial garbage collection target percentage.
+A collection is triggered when the ratio of freshly allocated data to live data
+remaining after the previous collection reaches this percentage. The default
+is GOGC=100. Setting GOGC=off disables the garbage collector entirely.
+[runtime/debug.SetGCPercent] allows changing this percentage at run time.
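+For example, with the default GOGC=100, a program whose live heap was 100 MiB
+after the previous collection will trigger the next collection once roughly
+another 100 MiB has been freshly allocated.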
+
+The GOMEMLIMIT variable sets a soft memory limit for the runtime. This memory limit
+includes the Go heap and all other memory managed by the runtime, and excludes
+external memory sources such as mappings of the binary itself, memory managed in
+other languages, and memory held by the operating system on behalf of the Go
+program. GOMEMLIMIT is a numeric value in bytes with an optional unit suffix.
+The supported suffixes include B, KiB, MiB, GiB, and TiB. These suffixes
+represent quantities of bytes as defined by the IEC 80000-13 standard. That is,
+they are based on powers of two: KiB means 2^10 bytes, MiB means 2^20 bytes,
+and so on. The default setting is math.MaxInt64, which effectively disables the
+memory limit. [runtime/debug.SetMemoryLimit] allows changing this limit at run
+time.
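+For example, GOMEMLIMIT=2GiB sets the limit to 2^31 bytes.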
+
+The GODEBUG variable controls debugging variables within the runtime.
+It is a comma-separated list of name=val pairs setting these named variables:
+
+ allocfreetrace: setting allocfreetrace=1 causes every allocation to be
+ profiled and a stack trace printed on each object's allocation and free.
+
+ clobberfree: setting clobberfree=1 causes the garbage collector to
+ clobber the memory content of an object with bad content when it frees
+ the object.
+
+ cpu.*: cpu.all=off disables the use of all optional instruction set extensions.
+ cpu.extension=off disables use of instructions from the specified instruction set extension.
+ extension is the lower case name for the instruction set extension such as sse41 or avx
+ as listed in internal/cpu package. As an example cpu.avx=off disables runtime detection
+ and thereby use of AVX instructions.
+
+ cgocheck: setting cgocheck=0 disables all checks for packages
+ using cgo to incorrectly pass Go pointers to non-Go code.
+ Setting cgocheck=1 (the default) enables relatively cheap
+ checks that may miss some errors. A more complete, but slow,
+ cgocheck mode can be enabled using GOEXPERIMENT (which
+ requires a rebuild), see https://pkg.go.dev/internal/goexperiment for details.
+
+ disablethp: setting disablethp=1 on Linux disables transparent huge pages for the heap.
+ It has no effect on other platforms. disablethp is meant for compatibility with versions
+ of Go before 1.21, which stopped working around a Linux kernel default that can result
+ in significant memory overuse. See https://go.dev/issue/64332. This setting will be
+ removed in a future release, so operators should tweak their Linux configuration to suit
+ their needs before then. See https://go.dev/doc/gc-guide#Linux_transparent_huge_pages.
+
+ dontfreezetheworld: by default, the start of a fatal panic or throw
+ "freezes the world", preempting all threads to stop all running
+ goroutines, which makes it possible to traceback all goroutines, and
+ keeps their state close to the point of panic. Setting
+ dontfreezetheworld=1 disables this preemption, allowing goroutines to
+ continue executing during panic processing. Note that goroutines that
+ naturally enter the scheduler will still stop. This can be useful when
+ debugging the runtime scheduler, as freezetheworld perturbs scheduler
+ state and thus may hide problems.
+
+ efence: setting efence=1 causes the allocator to run in a mode
+ where each object is allocated on a unique page and addresses are
+ never recycled.
+
+ gccheckmark: setting gccheckmark=1 enables verification of the
+ garbage collector's concurrent mark phase by performing a
+ second mark pass while the world is stopped. If the second
+ pass finds a reachable object that was not found by concurrent
+ mark, the garbage collector will panic.
+
+ gcpacertrace: setting gcpacertrace=1 causes the garbage collector to
+ print information about the internal state of the concurrent pacer.
+
+ gcshrinkstackoff: setting gcshrinkstackoff=1 disables moving goroutines
+ onto smaller stacks. In this mode, a goroutine's stack can only grow.
+
+ gcstoptheworld: setting gcstoptheworld=1 disables concurrent garbage collection,
+ making every garbage collection a stop-the-world event. Setting gcstoptheworld=2
+ also disables concurrent sweeping after the garbage collection finishes.
+
+ gctrace: setting gctrace=1 causes the garbage collector to emit a single line to standard
+ error at each collection, summarizing the amount of memory collected and the
+ length of the pause. The format of this line is subject to change. Included in
+ the explanation below is also the relevant runtime/metrics metric for each field.
+ Currently, it is:
+ gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # MB stacks, #MB globals, # P
+ where the fields are as follows:
+ gc # the GC number, incremented at each GC
+ @#s time in seconds since program start
+ #% percentage of time spent in GC since program start
+ #+...+# wall-clock/CPU times for the phases of the GC
+ #->#-># MB heap size at GC start, at GC end, and live heap, or /gc/scan/heap:bytes
+ # MB goal goal heap size, or /gc/heap/goal:bytes
+ # MB stacks estimated scannable stack size, or /gc/scan/stack:bytes
+ # MB globals scannable global size, or /gc/scan/globals:bytes
+ # P number of processors used, or /sched/gomaxprocs:threads
+ The phases are stop-the-world (STW) sweep termination, concurrent
+ mark and scan, and STW mark termination. The CPU times
+ for mark/scan are broken down in to assist time (GC performed in
+ line with allocation), background GC time, and idle GC time.
+ If the line ends with "(forced)", this GC was forced by a
+ runtime.GC() call.
+
+ harddecommit: setting harddecommit=1 causes memory that is returned to the OS to
+ also have protections removed on it. This is the only mode of operation on Windows,
+ but is helpful in debugging scavenger-related issues on other platforms. Currently,
+ only supported on Linux.
+
+ inittrace: setting inittrace=1 causes the runtime to emit a single line to standard
+ error for each package with init work, summarizing the execution time and memory
+ allocation. No information is printed for inits executed as part of plugin loading
+ and for packages without both user defined and compiler generated init work.
+ The format of this line is subject to change. Currently, it is:
+ init # @#ms, # ms clock, # bytes, # allocs
+ where the fields are as follows:
+ init # the package name
+ @# ms time in milliseconds when the init started since program start
+ # clock wall-clock time for package initialization work
+ # bytes memory allocated on the heap
+ # allocs number of heap allocations
+
+ madvdontneed: setting madvdontneed=0 will use MADV_FREE
+ instead of MADV_DONTNEED on Linux when returning memory to the
+ kernel. This is more efficient, but means RSS numbers will
+ drop only when the OS is under memory pressure. On the BSDs and
+ Illumos/Solaris, setting madvdontneed=1 will use MADV_DONTNEED instead
+ of MADV_FREE. This is less efficient, but causes RSS numbers to drop
+ more quickly.
+
+ memprofilerate: setting memprofilerate=X will update the value of runtime.MemProfileRate.
+ When set to 0 memory profiling is disabled. Refer to the description of
+ MemProfileRate for the default value.
+
+ pagetrace: setting pagetrace=/path/to/file will write out a trace of page events
+ that can be viewed, analyzed, and visualized using the x/debug/cmd/pagetrace tool.
+ Build your program with GOEXPERIMENT=pagetrace to enable this functionality. Do not
+ enable this functionality if your program is a setuid binary as it introduces a security
+ risk in that scenario. Currently not supported on Windows, plan9 or js/wasm. Setting this
+ option for some applications can produce large traces, so use with care.
+
+ invalidptr: invalidptr=1 (the default) causes the garbage collector and stack
+ copier to crash the program if an invalid pointer value (for example, 1)
+ is found in a pointer-typed location. Setting invalidptr=0 disables this check.
+ This should only be used as a temporary workaround to diagnose buggy code.
+ The real fix is to not store integers in pointer-typed locations.
+
+ sbrk: setting sbrk=1 replaces the memory allocator and garbage collector
+ with a trivial allocator that obtains memory from the operating system and
+ never reclaims any memory.
+
+ scavtrace: setting scavtrace=1 causes the runtime to emit a single line to standard
+ error, roughly once per GC cycle, summarizing the amount of work done by the
+ scavenger as well as the total amount of memory returned to the operating system
+ and an estimate of physical memory utilization. The format of this line is subject
+ to change, but currently it is:
+ scav # KiB work (bg), # KiB work (eager), # KiB total, #% util
+ where the fields are as follows:
+ # KiB work (bg) the amount of memory returned to the OS in the background since
+ the last line
+ # KiB work (eager) the amount of memory returned to the OS eagerly since the last line
+		# KiB total        the amount of address space currently returned to the OS
+ #% util the fraction of all unscavenged heap memory which is in-use
+ If the line ends with "(forced)", then scavenging was forced by a
+ debug.FreeOSMemory() call.
+
+ scheddetail: setting schedtrace=X and scheddetail=1 causes the scheduler to emit
+ detailed multiline info every X milliseconds, describing state of the scheduler,
+ processors, threads and goroutines.
+
+ schedtrace: setting schedtrace=X causes the scheduler to emit a single line to standard
+ error every X milliseconds, summarizing the scheduler state.
+
+ tracebackancestors: setting tracebackancestors=N extends tracebacks with the stacks at
+ which goroutines were created, where N limits the number of ancestor goroutines to
+ report. This also extends the information returned by runtime.Stack. Ancestor's goroutine
+ IDs will refer to the ID of the goroutine at the time of creation; it's possible for this
+ ID to be reused for another goroutine. Setting N to 0 will report no ancestry information.
+
+ tracefpunwindoff: setting tracefpunwindoff=1 forces the execution tracer to
+ use the runtime's default stack unwinder instead of frame pointer unwinding.
+ This increases tracer overhead, but could be helpful as a workaround or for
+ debugging unexpected regressions caused by frame pointer unwinding.
+
+ asyncpreemptoff: asyncpreemptoff=1 disables signal-based
+ asynchronous goroutine preemption. This makes some loops
+ non-preemptible for long periods, which may delay GC and
+ goroutine scheduling. This is useful for debugging GC issues
+ because it also disables the conservative stack scanning used
+ for asynchronously preempted goroutines.
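+
+As an example, GODEBUG=gctrace=1,schedtrace=1000 enables both the
+per-collection gctrace line and a scheduler summary line every 1000
+milliseconds.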
+
+The net and net/http packages also refer to debugging variables in GODEBUG.
+See the documentation for those packages for details.
+
+The GOMAXPROCS variable limits the number of operating system threads that
+can execute user-level Go code simultaneously. There is no limit to the number of threads
+that can be blocked in system calls on behalf of Go code; those do not count against
+the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes
+the limit.
+
+The GORACE variable configures the race detector, for programs built using -race.
+See https://golang.org/doc/articles/race_detector.html for details.
+
+The GOTRACEBACK variable controls the amount of output generated when a Go
+program fails due to an unrecovered panic or an unexpected runtime condition.
+By default, a failure prints a stack trace for the current goroutine,
+eliding functions internal to the run-time system, and then exits with exit code 2.
+The failure prints stack traces for all goroutines if there is no current goroutine
+or the failure is internal to the run-time.
+GOTRACEBACK=none omits the goroutine stack traces entirely.
+GOTRACEBACK=single (the default) behaves as described above.
+GOTRACEBACK=all adds stack traces for all user-created goroutines.
+GOTRACEBACK=system is like “all” but adds stack frames for run-time functions
+and shows goroutines created internally by the run-time.
+GOTRACEBACK=crash is like “system” but crashes in an operating system-specific
+manner instead of exiting. For example, on Unix systems, the crash raises
+SIGABRT to trigger a core dump.
+GOTRACEBACK=wer is like “crash” but doesn't disable Windows Error Reporting (WER).
+For historical reasons, the GOTRACEBACK settings 0, 1, and 2 are synonyms for
+none, all, and system, respectively.
+The runtime/debug package's SetTraceback function allows increasing the
+amount of output at run time, but it cannot reduce the amount below that
+specified by the environment variable.
+See https://golang.org/pkg/runtime/debug/#SetTraceback.
+
+The GOARCH, GOOS, GOPATH, and GOROOT environment variables complete
+the set of Go environment variables. They influence the building of Go programs
+(see https://golang.org/cmd/go and https://golang.org/pkg/go/build).
+GOARCH, GOOS, and GOROOT are recorded at compile time and made available by
+constants or functions in this package, but they do not influence the execution
+of the run-time system.
+
+# Security
+
+On Unix platforms, Go's runtime system behaves slightly differently when a
+binary is setuid/setgid or executed with setuid/setgid-like properties, in order
+to prevent dangerous behaviors. On Linux this is determined by checking for the
+AT_SECURE flag in the auxiliary vector, on the BSDs and Solaris/Illumos it is
+determined by checking the issetugid syscall, and on AIX it is determined by
+checking if the uid/gid match the effective uid/gid.
+
+When the runtime determines the binary is setuid/setgid-like, it does three main
+things:
+ - The standard input/output file descriptors (0, 1, 2) are checked to be open.
+ If any of them are closed, they are opened pointing at /dev/null.
+ - The value of the GOTRACEBACK environment variable is set to 'none'.
+ - When a signal is received that terminates the program, or the program
+ encounters an unrecoverable panic that would otherwise override the value
+ of GOTRACEBACK, the goroutine stack, registers, and other memory related
+ information are omitted.
+*/
+package runtime
+
+import (
+ "internal/goarch"
+ "internal/goos"
+)
+
+// Caller reports file and line number information about function invocations on
+// the calling goroutine's stack. The argument skip is the number of stack frames
+// to ascend, with 0 identifying the caller of Caller. (For historical reasons the
+// meaning of skip differs between Caller and Callers.) The return values report the
+// program counter, file name, and line number within the file of the corresponding
+// call. The boolean ok is false if it was not possible to recover the information.
+func Caller(skip int) (pc uintptr, file string, line int, ok bool) {
+ rpc := make([]uintptr, 1)
+ n := callers(skip+1, rpc[:])
+ if n < 1 {
+ return
+ }
+ frame, _ := CallersFrames(rpc).Next()
+ return frame.PC, frame.File, frame.Line, frame.PC != 0
+}
+
+// Callers fills the slice pc with the return program counters of function invocations
+// on the calling goroutine's stack. The argument skip is the number of stack frames
+// to skip before recording in pc, with 0 identifying the frame for Callers itself and
+// 1 identifying the caller of Callers.
+// It returns the number of entries written to pc.
+//
+// To translate these PCs into symbolic information such as function
+// names and line numbers, use CallersFrames. CallersFrames accounts
+// for inlined functions and adjusts the return program counters into
+// call program counters. Iterating over the returned slice of PCs
+// directly is discouraged, as is using FuncForPC on any of the
+// returned PCs, since these cannot account for inlining or return
+// program counter adjustment.
+func Callers(skip int, pc []uintptr) int {
+ // runtime.callers uses pc.array==nil as a signal
+ // to print a stack trace. Pick off 0-length pc here
+ // so that we don't let a nil pc slice get to it.
+ if len(pc) == 0 {
+ return 0
+ }
+ return callers(skip, pc)
+}
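+
+// A sketch of the pattern recommended above, translating PCs with
+// CallersFrames rather than inspecting them directly:
+//
+//	pc := make([]uintptr, 10)
+//	n := runtime.Callers(0, pc)
+//	frames := runtime.CallersFrames(pc[:n])
+//	for {
+//		frame, more := frames.Next()
+//		// use frame.Function, frame.File, frame.Line ...
+//		if !more {
+//			break
+//		}
+//	}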
+
+var defaultGOROOT string // set by cmd/link
+
+// GOROOT returns the root of the Go tree. It uses the
+// GOROOT environment variable, if set at process start,
+// or else the root used during the Go build.
+func GOROOT() string {
+ s := gogetenv("GOROOT")
+ if s != "" {
+ return s
+ }
+ return defaultGOROOT
+}
+
+// buildVersion is the Go tree's version string at build time.
+//
+// If any GOEXPERIMENTs are set to non-default values, it will include
+// "X:<GOEXPERIMENT>".
+//
+// This is set by the linker.
+//
+// This is accessed by "go version <binary>".
+var buildVersion string
+
+// Version returns the Go tree's version string.
+// It is either the commit hash and date at the time of the build or,
+// when possible, a release tag like "go1.3".
+func Version() string {
+ return buildVersion
+}
+
+// GOOS is the running program's operating system target:
+// one of darwin, freebsd, linux, and so on.
+// To view possible combinations of GOOS and GOARCH, run "go tool dist list".
+const GOOS string = goos.GOOS
+
+// GOARCH is the running program's architecture target:
+// one of 386, amd64, arm, s390x, and so on.
+const GOARCH string = goarch.GOARCH
diff --git a/src/runtime/fastlog2.go b/src/runtime/fastlog2.go
new file mode 100644
index 0000000..1f251bf
--- /dev/null
+++ b/src/runtime/fastlog2.go
@@ -0,0 +1,27 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// fastlog2 implements a fast approximation to the base 2 log of a
+// float64. This is used to compute a geometric distribution for heap
+// sampling, without introducing dependencies into package math. This
+// uses a very rough approximation using the float64 exponent and the
+// first 25 bits of the mantissa. The top 5 bits of the mantissa are
+// used to load limits from a table of constants and the rest are used
+// to scale linearly between them.
+func fastlog2(x float64) float64 {
+ const fastlogScaleBits = 20
+ const fastlogScaleRatio = 1.0 / (1 << fastlogScaleBits)
+
+ xBits := float64bits(x)
+	// Extract the exponent from the IEEE float64, and index a constant
+	// table with the top fastlogNumBits (5) bits of the mantissa.
+ xExp := int64((xBits>>52)&0x7FF) - 1023
+ xManIndex := (xBits >> (52 - fastlogNumBits)) % (1 << fastlogNumBits)
+ xManScale := (xBits >> (52 - fastlogNumBits - fastlogScaleBits)) % (1 << fastlogScaleBits)
+
+ low, high := fastlog2Table[xManIndex], fastlog2Table[xManIndex+1]
+ return float64(xExp) + low + (high-low)*float64(xManScale)*fastlogScaleRatio
+}
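+
+// As a worked example, for x = 1.5 the unbiased exponent is 0 and the top
+// five mantissa bits are 10000, so xManIndex = 16 and xManScale = 0; the
+// result is fastlog2Table[16] = 0.58496..., which is exactly log2(1.5).
+// Inputs whose mantissa falls between two table entries are interpolated
+// linearly using the next 20 mantissa bits.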
diff --git a/src/runtime/fastlog2_test.go b/src/runtime/fastlog2_test.go
new file mode 100644
index 0000000..ae0f40b
--- /dev/null
+++ b/src/runtime/fastlog2_test.go
@@ -0,0 +1,34 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ "runtime"
+ "testing"
+)
+
+func TestFastLog2(t *testing.T) {
+ // Compute the euclidean distance between math.Log2 and the FastLog2
+ // implementation over the range of interest for heap sampling.
+ const randomBitCount = 26
+ var e float64
+
+ inc := 1
+ if testing.Short() {
+ // Check 1K total values, down from 64M.
+ inc = 1 << 16
+ }
+ for i := 1; i < 1<<randomBitCount; i += inc {
+ l, fl := math.Log2(float64(i)), runtime.Fastlog2(float64(i))
+ d := l - fl
+ e += d * d
+ }
+ e = math.Sqrt(e)
+
+ if e > 1.0 {
+ t.Fatalf("imprecision on fastlog2 implementation, want <=1.0, got %f", e)
+ }
+}
diff --git a/src/runtime/fastlog2table.go b/src/runtime/fastlog2table.go
new file mode 100644
index 0000000..6ba4a7d
--- /dev/null
+++ b/src/runtime/fastlog2table.go
@@ -0,0 +1,43 @@
+// Code generated by mkfastlog2table.go; DO NOT EDIT.
+// Run go generate from src/runtime to update.
+// See mkfastlog2table.go for comments.
+
+package runtime
+
+const fastlogNumBits = 5
+
+var fastlog2Table = [1<<fastlogNumBits + 1]float64{
+ 0,
+ 0.0443941193584535,
+ 0.08746284125033943,
+ 0.12928301694496647,
+ 0.16992500144231248,
+ 0.2094533656289499,
+ 0.24792751344358555,
+ 0.28540221886224837,
+ 0.3219280948873623,
+ 0.3575520046180837,
+ 0.39231742277876036,
+ 0.4262647547020979,
+ 0.4594316186372973,
+ 0.4918530963296748,
+ 0.5235619560570128,
+ 0.5545888516776374,
+ 0.5849625007211563,
+ 0.6147098441152082,
+ 0.6438561897747247,
+ 0.6724253419714956,
+ 0.7004397181410922,
+ 0.7279204545631992,
+ 0.7548875021634686,
+ 0.7813597135246596,
+ 0.8073549220576042,
+ 0.8328900141647417,
+ 0.8579809951275721,
+ 0.8826430493618412,
+ 0.9068905956085185,
+ 0.9307373375628862,
+ 0.9541963103868752,
+ 0.9772799234999164,
+ 1,
+}
diff --git a/src/runtime/float.go b/src/runtime/float.go
new file mode 100644
index 0000000..9f281c4
--- /dev/null
+++ b/src/runtime/float.go
@@ -0,0 +1,54 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var inf = float64frombits(0x7FF0000000000000)
+
+// isNaN reports whether f is an IEEE 754 “not-a-number” value.
+func isNaN(f float64) (is bool) {
+ // IEEE 754 says that only NaNs satisfy f != f.
+ return f != f
+}
+
+// isFinite reports whether f is neither NaN nor an infinity.
+func isFinite(f float64) bool {
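+	// f - f is 0 for any finite f, but NaN when f is NaN or ±Inf,
+	// so a single subtraction distinguishes finite values from both.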
+ return !isNaN(f - f)
+}
+
+// isInf reports whether f is an infinity.
+func isInf(f float64) bool {
+ return !isNaN(f) && !isFinite(f)
+}
+
+// abs returns the absolute value of x.
+//
+// Special cases are:
+//
+// abs(±Inf) = +Inf
+// abs(NaN) = NaN
+func abs(x float64) float64 {
+ const sign = 1 << 63
+ return float64frombits(float64bits(x) &^ sign)
+}
+
+// copysign returns a value with the magnitude
+// of x and the sign of y.
+func copysign(x, y float64) float64 {
+ const sign = 1 << 63
+ return float64frombits(float64bits(x)&^sign | float64bits(y)&sign)
+}
+
+// float64bits returns the IEEE 754 binary representation of f.
+func float64bits(f float64) uint64 {
+ return *(*uint64)(unsafe.Pointer(&f))
+}
+
+// float64frombits returns the floating point number corresponding
+// to the IEEE 754 binary representation b.
+func float64frombits(b uint64) float64 {
+ return *(*float64)(unsafe.Pointer(&b))
+}
diff --git a/src/runtime/float_test.go b/src/runtime/float_test.go
new file mode 100644
index 0000000..b2aa43d
--- /dev/null
+++ b/src/runtime/float_test.go
@@ -0,0 +1,25 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "testing"
+)
+
+func TestIssue48807(t *testing.T) {
+ for _, i := range []uint64{
+ 0x8234508000000001, // from issue48807
+ 1<<56 + 1<<32 + 1,
+ } {
+ got := float32(i)
+ dontwant := float32(float64(i))
+ if got == dontwant {
+ // The test cases above should be uint64s such that
+ // this equality doesn't hold. These examples trigger
+ // the case where using an intermediate float64 doesn't work.
+ t.Errorf("direct float32 conversion doesn't work: arg=%x got=%x dontwant=%x", i, got, dontwant)
+ }
+ }
+}
diff --git a/src/runtime/funcdata.h b/src/runtime/funcdata.h
new file mode 100644
index 0000000..edc0316
--- /dev/null
+++ b/src/runtime/funcdata.h
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file defines the IDs for PCDATA and FUNCDATA instructions
+// in Go binaries. It is included by assembly sources, so it must
+// be written using #defines.
+//
+// These must agree with internal/abi/symtab.go.
+
+#define PCDATA_UnsafePoint 0
+#define PCDATA_StackMapIndex 1
+#define PCDATA_InlTreeIndex 2
+#define PCDATA_ArgLiveIndex 3
+
+#define FUNCDATA_ArgsPointerMaps 0 /* garbage collector blocks */
+#define FUNCDATA_LocalsPointerMaps 1
+#define FUNCDATA_StackObjects 2
+#define FUNCDATA_InlTree 3
+#define FUNCDATA_OpenCodedDeferInfo 4 /* info for func with open-coded defers */
+#define FUNCDATA_ArgInfo 5
+#define FUNCDATA_ArgLiveInfo 6
+#define FUNCDATA_WrapInfo 7
+
+// Pseudo-assembly statements.
+
+// GO_ARGS, GO_RESULTS_INITIALIZED, and NO_LOCAL_POINTERS are macros
+// that communicate to the runtime information about the location and liveness
+// of pointers in an assembly function's arguments, results, and stack frame.
+// This communication is only required in assembly functions that make calls
+// to other functions that might be preempted or grow the stack.
+// NOSPLIT functions that make no calls do not need to use these macros.
+
+// GO_ARGS indicates that the Go prototype for this assembly function
+// defines the pointer map for the function's arguments.
+// GO_ARGS should be the first instruction in a function that uses it.
+// It can be omitted if there are no arguments at all.
+// GO_ARGS is inserted implicitly by the linker for any function whose
+// name starts with a middle-dot and that also has a Go prototype; it
+// is therefore usually not necessary to write explicitly.
+#define GO_ARGS FUNCDATA $FUNCDATA_ArgsPointerMaps, go_args_stackmap(SB)
+
+// GO_RESULTS_INITIALIZED indicates that the assembly function
+// has initialized the stack space for its results and that those results
+// should be considered live for the remainder of the function.
+#define GO_RESULTS_INITIALIZED PCDATA $PCDATA_StackMapIndex, $1
+
+// NO_LOCAL_POINTERS indicates that the assembly function stores
+// no pointers to heap objects in its local stack variables.
+#define NO_LOCAL_POINTERS FUNCDATA $FUNCDATA_LocalsPointerMaps, no_pointers_stackmap(SB)
+
+// ArgsSizeUnknown is set in Func.argsize to mark all functions
+// whose argument size is unknown (C vararg functions, and
+// assembly code without an explicit specification).
+// This value is generated by the compiler, assembler, or linker.
+#define ArgsSizeUnknown 0x80000000
diff --git a/src/runtime/gc_test.go b/src/runtime/gc_test.go
new file mode 100644
index 0000000..9302ea3
--- /dev/null
+++ b/src/runtime/gc_test.go
@@ -0,0 +1,936 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ "os"
+ "reflect"
+ "runtime"
+ "runtime/debug"
+ "sort"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func TestGcSys(t *testing.T) {
+ t.Skip("skipping known-flaky test; golang.org/issue/37331")
+ if os.Getenv("GOGC") == "off" {
+ t.Skip("skipping test; GOGC=off in environment")
+ }
+ got := runTestProg(t, "testprog", "GCSys")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
+
+func TestGcDeepNesting(t *testing.T) {
+ type T [2][2][2][2][2][2][2][2][2][2]*int
+ a := new(T)
+
+ // Prevent the compiler from applying escape analysis.
+ // This makes sure new(T) is allocated on heap, not on the stack.
+	// This makes sure new(T) is allocated on the heap, not on the stack.
+
+ a[0][0][0][0][0][0][0][0][0][0] = new(int)
+ *a[0][0][0][0][0][0][0][0][0][0] = 13
+ runtime.GC()
+ if *a[0][0][0][0][0][0][0][0][0][0] != 13 {
+ t.Fail()
+ }
+}
+
+func TestGcMapIndirection(t *testing.T) {
+ defer debug.SetGCPercent(debug.SetGCPercent(1))
+ runtime.GC()
+ type T struct {
+ a [256]int
+ }
+ m := make(map[T]T)
+ for i := 0; i < 2000; i++ {
+ var a T
+ a.a[0] = i
+ m[a] = T{}
+ }
+}
+
+func TestGcArraySlice(t *testing.T) {
+ type X struct {
+ buf [1]byte
+ nextbuf []byte
+ next *X
+ }
+ var head *X
+ for i := 0; i < 10; i++ {
+ p := &X{}
+ p.buf[0] = 42
+ p.next = head
+ if head != nil {
+ p.nextbuf = head.buf[:]
+ }
+ head = p
+ runtime.GC()
+ }
+ for p := head; p != nil; p = p.next {
+ if p.buf[0] != 42 {
+ t.Fatal("corrupted heap")
+ }
+ }
+}
+
+func TestGcRescan(t *testing.T) {
+ type X struct {
+ c chan error
+ nextx *X
+ }
+ type Y struct {
+ X
+ nexty *Y
+ p *int
+ }
+ var head *Y
+ for i := 0; i < 10; i++ {
+ p := &Y{}
+ p.c = make(chan error)
+ if head != nil {
+ p.nextx = &head.X
+ }
+ p.nexty = head
+ p.p = new(int)
+ *p.p = 42
+ head = p
+ runtime.GC()
+ }
+ for p := head; p != nil; p = p.nexty {
+ if *p.p != 42 {
+ t.Fatal("corrupted heap")
+ }
+ }
+}
+
+func TestGcLastTime(t *testing.T) {
+ ms := new(runtime.MemStats)
+ t0 := time.Now().UnixNano()
+ runtime.GC()
+ t1 := time.Now().UnixNano()
+ runtime.ReadMemStats(ms)
+ last := int64(ms.LastGC)
+ if t0 > last || last > t1 {
+ t.Fatalf("bad last GC time: got %v, want [%v, %v]", last, t0, t1)
+ }
+ pause := ms.PauseNs[(ms.NumGC+255)%256]
+ // Due to timer granularity, pause can actually be 0 on windows
+ // or on virtualized environments.
+ if pause == 0 {
+ t.Logf("last GC pause was 0")
+ } else if pause > 10e9 {
+ t.Logf("bad last GC pause: got %v, want [0, 10e9]", pause)
+ }
+}
+
+var hugeSink any
+
+func TestHugeGCInfo(t *testing.T) {
+ // The test ensures that compiler can chew these huge types even on weakest machines.
+ // The types are not allocated at runtime.
+ if hugeSink != nil {
+		// 400MB on 32-bit systems, 4TB on 64-bit systems.
+ const n = (400 << 20) + (unsafe.Sizeof(uintptr(0))-4)<<40
+ hugeSink = new([n]*byte)
+ hugeSink = new([n]uintptr)
+ hugeSink = new(struct {
+ x float64
+ y [n]*byte
+ z []string
+ })
+ hugeSink = new(struct {
+ x float64
+ y [n]uintptr
+ z []string
+ })
+ }
+}
+
+func TestPeriodicGC(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no sysmon on wasm yet")
+ }
+
+ // Make sure we're not in the middle of a GC.
+ runtime.GC()
+
+ var ms1, ms2 runtime.MemStats
+ runtime.ReadMemStats(&ms1)
+
+ // Make periodic GC run continuously.
+ orig := *runtime.ForceGCPeriod
+ *runtime.ForceGCPeriod = 0
+
+ // Let some periodic GCs happen. In a heavily loaded system,
+ // it's possible these will be delayed, so this is designed to
+ // succeed quickly if things are working, but to give it some
+ // slack if things are slow.
+ var numGCs uint32
+ const want = 2
+ for i := 0; i < 200 && numGCs < want; i++ {
+ time.Sleep(5 * time.Millisecond)
+
+ // Test that periodic GC actually happened.
+ runtime.ReadMemStats(&ms2)
+ numGCs = ms2.NumGC - ms1.NumGC
+ }
+ *runtime.ForceGCPeriod = orig
+
+ if numGCs < want {
+ t.Fatalf("no periodic GC: got %v GCs, want >= 2", numGCs)
+ }
+}
+
+func TestGcZombieReporting(t *testing.T) {
+ // This test is somewhat sensitive to how the allocator works.
+	// Pointers in the zombies slice may cross spans, so we
+	// add invalidptr=0 to avoid the badPointer check.
+ // See issue https://golang.org/issues/49613/
+ got := runTestProg(t, "testprog", "GCZombie", "GODEBUG=invalidptr=0")
+ want := "found pointer to free object"
+ if !strings.Contains(got, want) {
+ t.Fatalf("expected %q in output, but got %q", want, got)
+ }
+}
+
+func TestGCTestMoveStackOnNextCall(t *testing.T) {
+ t.Parallel()
+ var onStack int
+ // GCTestMoveStackOnNextCall can fail in rare cases if there's
+ // a preemption. This won't happen many times in quick
+ // succession, so just retry a few times.
+ for retry := 0; retry < 5; retry++ {
+ runtime.GCTestMoveStackOnNextCall()
+ if moveStackCheck(t, &onStack, uintptr(unsafe.Pointer(&onStack))) {
+ // Passed.
+ return
+ }
+ }
+ t.Fatal("stack did not move")
+}
+
+// This must not be inlined because the point is to force a stack
+// growth check and move the stack.
+//
+//go:noinline
+func moveStackCheck(t *testing.T, new *int, old uintptr) bool {
+ // new should have been updated by the stack move;
+ // old should not have.
+
+ // Capture new's value before doing anything that could
+ // further move the stack.
+ new2 := uintptr(unsafe.Pointer(new))
+
+ t.Logf("old stack pointer %x, new stack pointer %x", old, new2)
+ if new2 == old {
+ // Check that we didn't screw up the test's escape analysis.
+ if cls := runtime.GCTestPointerClass(unsafe.Pointer(new)); cls != "stack" {
+ t.Fatalf("test bug: new (%#x) should be a stack pointer, not %s", new2, cls)
+ }
+ // This was a real failure.
+ return false
+ }
+ return true
+}
+
+func TestGCTestMoveStackRepeatedly(t *testing.T) {
+ // Move the stack repeatedly to make sure we're not doubling
+ // it each time.
+ for i := 0; i < 100; i++ {
+ runtime.GCTestMoveStackOnNextCall()
+ moveStack1(false)
+ }
+}
+
+//go:noinline
+func moveStack1(x bool) {
+ // Make sure this function doesn't get auto-nosplit.
+ if x {
+ println("x")
+ }
+}
+
+func TestGCTestIsReachable(t *testing.T) {
+ var all, half []unsafe.Pointer
+ var want uint64
+ for i := 0; i < 16; i++ {
+ // The tiny allocator muddies things, so we use a
+ // scannable type.
+ p := unsafe.Pointer(new(*int))
+ all = append(all, p)
+ if i%2 == 0 {
+ half = append(half, p)
+ want |= 1 << i
+ }
+ }
+
+ got := runtime.GCTestIsReachable(all...)
+ if want != got {
+ t.Fatalf("did not get expected reachable set; want %b, got %b", want, got)
+ }
+ runtime.KeepAlive(half)
+}
+
+var pointerClassBSS *int
+var pointerClassData = 42
+
+func TestGCTestPointerClass(t *testing.T) {
+ t.Parallel()
+ check := func(p unsafe.Pointer, want string) {
+ t.Helper()
+ got := runtime.GCTestPointerClass(p)
+ if got != want {
+ // Convert the pointer to a uintptr to avoid
+ // escaping it.
+ t.Errorf("for %#x, want class %s, got %s", uintptr(p), want, got)
+ }
+ }
+ var onStack int
+ var notOnStack int
+ check(unsafe.Pointer(&onStack), "stack")
+ check(unsafe.Pointer(runtime.Escape(&notOnStack)), "heap")
+ check(unsafe.Pointer(&pointerClassBSS), "bss")
+ check(unsafe.Pointer(&pointerClassData), "data")
+ check(nil, "other")
+}
+
+func BenchmarkSetTypePtr(b *testing.B) {
+ benchSetType[*byte](b)
+}
+
+func BenchmarkSetTypePtr8(b *testing.B) {
+ benchSetType[[8]*byte](b)
+}
+
+func BenchmarkSetTypePtr16(b *testing.B) {
+ benchSetType[[16]*byte](b)
+}
+
+func BenchmarkSetTypePtr32(b *testing.B) {
+ benchSetType[[32]*byte](b)
+}
+
+func BenchmarkSetTypePtr64(b *testing.B) {
+ benchSetType[[64]*byte](b)
+}
+
+func BenchmarkSetTypePtr126(b *testing.B) {
+ benchSetType[[126]*byte](b)
+}
+
+func BenchmarkSetTypePtr128(b *testing.B) {
+ benchSetType[[128]*byte](b)
+}
+
+func BenchmarkSetTypePtrSlice(b *testing.B) {
+ benchSetTypeSlice[*byte](b, 1<<10)
+}
+
+type Node1 struct {
+ Value [1]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode1(b *testing.B) {
+ benchSetType[Node1](b)
+}
+
+func BenchmarkSetTypeNode1Slice(b *testing.B) {
+ benchSetTypeSlice[Node1](b, 32)
+}
+
+type Node8 struct {
+ Value [8]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode8(b *testing.B) {
+ benchSetType[Node8](b)
+}
+
+func BenchmarkSetTypeNode8Slice(b *testing.B) {
+ benchSetTypeSlice[Node8](b, 32)
+}
+
+type Node64 struct {
+ Value [64]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode64(b *testing.B) {
+ benchSetType[Node64](b)
+}
+
+func BenchmarkSetTypeNode64Slice(b *testing.B) {
+ benchSetTypeSlice[Node64](b, 32)
+}
+
+type Node64Dead struct {
+ Left, Right *byte
+ Value [64]uintptr
+}
+
+func BenchmarkSetTypeNode64Dead(b *testing.B) {
+ benchSetType[Node64Dead](b)
+}
+
+func BenchmarkSetTypeNode64DeadSlice(b *testing.B) {
+ benchSetTypeSlice[Node64Dead](b, 32)
+}
+
+type Node124 struct {
+ Value [124]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode124(b *testing.B) {
+ benchSetType[Node124](b)
+}
+
+func BenchmarkSetTypeNode124Slice(b *testing.B) {
+ benchSetTypeSlice[Node124](b, 32)
+}
+
+type Node126 struct {
+ Value [126]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode126(b *testing.B) {
+ benchSetType[Node126](b)
+}
+
+func BenchmarkSetTypeNode126Slice(b *testing.B) {
+ benchSetTypeSlice[Node126](b, 32)
+}
+
+type Node128 struct {
+ Value [128]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode128(b *testing.B) {
+ benchSetType[Node128](b)
+}
+
+func BenchmarkSetTypeNode128Slice(b *testing.B) {
+ benchSetTypeSlice[Node128](b, 32)
+}
+
+type Node130 struct {
+ Value [130]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode130(b *testing.B) {
+ benchSetType[Node130](b)
+}
+
+func BenchmarkSetTypeNode130Slice(b *testing.B) {
+ benchSetTypeSlice[Node130](b, 32)
+}
+
+type Node1024 struct {
+ Value [1024]uintptr
+ Left, Right *byte
+}
+
+func BenchmarkSetTypeNode1024(b *testing.B) {
+ benchSetType[Node1024](b)
+}
+
+func BenchmarkSetTypeNode1024Slice(b *testing.B) {
+ benchSetTypeSlice[Node1024](b, 32)
+}
+
+func benchSetType[T any](b *testing.B) {
+ b.SetBytes(int64(unsafe.Sizeof(*new(T))))
+ runtime.BenchSetType[T](b.N, b.ResetTimer)
+}
+
+func benchSetTypeSlice[T any](b *testing.B, len int) {
+ b.SetBytes(int64(unsafe.Sizeof(*new(T)) * uintptr(len)))
+ runtime.BenchSetTypeSlice[T](b.N, b.ResetTimer, len)
+}
+
+func BenchmarkAllocation(b *testing.B) {
+ type T struct {
+ x, y *byte
+ }
+ ngo := runtime.GOMAXPROCS(0)
+ work := make(chan bool, b.N+ngo)
+ result := make(chan *T)
+ for i := 0; i < b.N; i++ {
+ work <- true
+ }
+ for i := 0; i < ngo; i++ {
+ work <- false
+ }
+ for i := 0; i < ngo; i++ {
+ go func() {
+ var x *T
+ for <-work {
+ for i := 0; i < 1000; i++ {
+ x = &T{}
+ }
+ }
+ result <- x
+ }()
+ }
+ for i := 0; i < ngo; i++ {
+ <-result
+ }
+}
+
+func TestPrintGC(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ done := make(chan bool)
+ go func() {
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.GC()
+ }
+ }
+ }()
+ for i := 0; i < 1e4; i++ {
+ func() {
+ defer print("")
+ }()
+ }
+ close(done)
+}
+
+func testTypeSwitch(x any) error {
+ switch y := x.(type) {
+ case nil:
+ // ok
+ case error:
+ return y
+ }
+ return nil
+}
+
+func testAssert(x any) error {
+ if y, ok := x.(error); ok {
+ return y
+ }
+ return nil
+}
+
+func testAssertVar(x any) error {
+ var y, ok = x.(error)
+ if ok {
+ return y
+ }
+ return nil
+}
+
+var a bool
+
+//go:noinline
+func testIfaceEqual(x any) {
+ if x == "abc" {
+ a = true
+ }
+}
+
+func TestPageAccounting(t *testing.T) {
+ // Grow the heap in small increments. This used to drop the
+ // pages-in-use count below zero because of a rounding
+ // mismatch (golang.org/issue/15022).
+ const blockSize = 64 << 10
+ blocks := make([]*[blockSize]byte, (64<<20)/blockSize)
+ for i := range blocks {
+ blocks[i] = new([blockSize]byte)
+ }
+
+ // Check that the running page count matches reality.
+ pagesInUse, counted := runtime.CountPagesInUse()
+ if pagesInUse != counted {
+ t.Fatalf("mheap_.pagesInUse is %d, but direct count is %d", pagesInUse, counted)
+ }
+}
+
+func init() {
+ // Enable ReadMemStats' double-check mode.
+ *runtime.DoubleCheckReadMemStats = true
+}
+
+func TestReadMemStats(t *testing.T) {
+ base, slow := runtime.ReadMemStatsSlow()
+ if base != slow {
+ logDiff(t, "MemStats", reflect.ValueOf(base), reflect.ValueOf(slow))
+ t.Fatal("memstats mismatch")
+ }
+}
+
+func logDiff(t *testing.T, prefix string, got, want reflect.Value) {
+ typ := got.Type()
+ switch typ.Kind() {
+ case reflect.Array, reflect.Slice:
+ if got.Len() != want.Len() {
+ t.Logf("len(%s): got %v, want %v", prefix, got, want)
+ return
+ }
+ for i := 0; i < got.Len(); i++ {
+ logDiff(t, fmt.Sprintf("%s[%d]", prefix, i), got.Index(i), want.Index(i))
+ }
+ case reflect.Struct:
+ for i := 0; i < typ.NumField(); i++ {
+ gf, wf := got.Field(i), want.Field(i)
+ logDiff(t, prefix+"."+typ.Field(i).Name, gf, wf)
+ }
+ case reflect.Map:
+ t.Fatal("not implemented: logDiff for map")
+ default:
+ if got.Interface() != want.Interface() {
+ t.Logf("%s: got %v, want %v", prefix, got, want)
+ }
+ }
+}
+
+func BenchmarkReadMemStats(b *testing.B) {
+ var ms runtime.MemStats
+ const heapSize = 100 << 20
+ x := make([]*[1024]byte, heapSize/1024)
+ for i := range x {
+ x[i] = new([1024]byte)
+ }
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.ReadMemStats(&ms)
+ }
+
+ runtime.KeepAlive(x)
+}
+
+func applyGCLoad(b *testing.B) func() {
+ // We’ll apply load to the runtime with maxProcs-1 goroutines
+ // and use one more to actually benchmark. It doesn't make sense
+ // to try to run this test with only 1 P (that's what
+ // BenchmarkReadMemStats is for).
+ maxProcs := runtime.GOMAXPROCS(-1)
+ if maxProcs == 1 {
+ b.Skip("This benchmark can only be run with GOMAXPROCS > 1")
+ }
+
+ // Code to build a big tree with lots of pointers.
+ type node struct {
+ children [16]*node
+ }
+ var buildTree func(depth int) *node
+ buildTree = func(depth int) *node {
+ tree := new(node)
+ if depth != 0 {
+ for i := range tree.children {
+ tree.children[i] = buildTree(depth - 1)
+ }
+ }
+ return tree
+ }
+
+ // Keep the GC busy by continuously generating large trees.
+ done := make(chan struct{})
+ var wg sync.WaitGroup
+ for i := 0; i < maxProcs-1; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ var hold *node
+ loop:
+ for {
+ hold = buildTree(5)
+ select {
+ case <-done:
+ break loop
+ default:
+ }
+ }
+ runtime.KeepAlive(hold)
+ }()
+ }
+ return func() {
+ close(done)
+ wg.Wait()
+ }
+}
+
+func BenchmarkReadMemStatsLatency(b *testing.B) {
+ stop := applyGCLoad(b)
+
+ // Collect latency samples here.
+ latencies := make([]time.Duration, 0, 1024)
+
+ // Call ReadMemStats continuously for b.N iterations and
+ // measure the latency of each call.
+ b.ResetTimer()
+ var ms runtime.MemStats
+ for i := 0; i < b.N; i++ {
+ // Sleep for a bit, otherwise we're just going to keep
+ // stopping the world and no one will get to do anything.
+ time.Sleep(100 * time.Millisecond)
+ start := time.Now()
+ runtime.ReadMemStats(&ms)
+ latencies = append(latencies, time.Since(start))
+ }
+ // Make sure to stop the timer before we wait! The load created above
+ // is very heavy-weight and not easy to stop, so we could end up
+ // confusing the benchmarking framework for small b.N.
+ b.StopTimer()
+ stop()
+
+ // Disable the default */op metrics.
+ // ns/op isn't meaningful here: it's an average, and the sleep in
+ // our b.N loop above skews it significantly.
+ b.ReportMetric(0, "ns/op")
+ b.ReportMetric(0, "B/op")
+ b.ReportMetric(0, "allocs/op")
+
+ // Sort latencies then report percentiles.
+ sort.Slice(latencies, func(i, j int) bool {
+ return latencies[i] < latencies[j]
+ })
+ b.ReportMetric(float64(latencies[len(latencies)*50/100]), "p50-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*90/100]), "p90-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*99/100]), "p99-ns")
+}
+
+func TestUserForcedGC(t *testing.T) {
+ // Test that runtime.GC() triggers a GC even if GOGC=off.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+
+ var ms1, ms2 runtime.MemStats
+ runtime.ReadMemStats(&ms1)
+ runtime.GC()
+ runtime.ReadMemStats(&ms2)
+ if ms1.NumGC == ms2.NumGC {
+ t.Fatalf("runtime.GC() did not trigger GC")
+ }
+ if ms1.NumForcedGC == ms2.NumForcedGC {
+ t.Fatalf("runtime.GC() was not accounted in NumForcedGC")
+ }
+}
+
+func writeBarrierBenchmark(b *testing.B, f func()) {
+ runtime.GC()
+ var ms runtime.MemStats
+ runtime.ReadMemStats(&ms)
+ //b.Logf("heap size: %d MB", ms.HeapAlloc>>20)
+
+ // Keep GC running continuously during the benchmark, which in
+ // turn keeps the write barrier on continuously.
+ var stop uint32
+ done := make(chan bool)
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ runtime.GC()
+ }
+ close(done)
+ }()
+ defer func() {
+ atomic.StoreUint32(&stop, 1)
+ <-done
+ }()
+
+ b.ResetTimer()
+ f()
+ b.StopTimer()
+}
+
+func BenchmarkWriteBarrier(b *testing.B) {
+ if runtime.GOMAXPROCS(-1) < 2 {
+ // We don't want GC to take our time.
+ b.Skip("need GOMAXPROCS >= 2")
+ }
+
+ // Construct a large tree both so the GC runs for a while and
+ // so we have a data structure to manipulate the pointers of.
+ type node struct {
+ l, r *node
+ }
+ var wbRoots []*node
+ var mkTree func(level int) *node
+ mkTree = func(level int) *node {
+ if level == 0 {
+ return nil
+ }
+ n := &node{mkTree(level - 1), mkTree(level - 1)}
+ if level == 10 {
+ // Seed GC with enough early pointers so it
+ // doesn't start termination barriers when it
+ // only has the top of the tree.
+ wbRoots = append(wbRoots, n)
+ }
+ return n
+ }
+ const depth = 22 // 64 MB
+ root := mkTree(depth)
+
+ writeBarrierBenchmark(b, func() {
+ var stack [depth]*node
+ tos := -1
+
+ // There are two write barriers per iteration, so i+=2.
+ for i := 0; i < b.N; i += 2 {
+ if tos == -1 {
+ stack[0] = root
+ tos = 0
+ }
+
+ // Perform one step of reversing the tree.
+ n := stack[tos]
+ if n.l == nil {
+ tos--
+ } else {
+ n.l, n.r = n.r, n.l
+ stack[tos] = n.l
+ stack[tos+1] = n.r
+ tos++
+ }
+
+ if i%(1<<12) == 0 {
+ // Avoid non-preemptible loops (see issue #10958).
+ runtime.Gosched()
+ }
+ }
+ })
+
+ runtime.KeepAlive(wbRoots)
+}
+
+func BenchmarkBulkWriteBarrier(b *testing.B) {
+ if runtime.GOMAXPROCS(-1) < 2 {
+ // We don't want GC to take our time.
+ b.Skip("need GOMAXPROCS >= 2")
+ }
+
+ // Construct a large set of objects we can copy around.
+ const heapSize = 64 << 20
+ type obj [16]*byte
+ ptrs := make([]*obj, heapSize/unsafe.Sizeof(obj{}))
+ for i := range ptrs {
+ ptrs[i] = new(obj)
+ }
+
+ writeBarrierBenchmark(b, func() {
+ const blockSize = 1024
+ var pos int
+ for i := 0; i < b.N; i += blockSize {
+ // Rotate block.
+ block := ptrs[pos : pos+blockSize]
+ first := block[0]
+ copy(block, block[1:])
+ block[blockSize-1] = first
+
+ pos += blockSize
+ if pos+blockSize > len(ptrs) {
+ pos = 0
+ }
+
+ runtime.Gosched()
+ }
+ })
+
+ runtime.KeepAlive(ptrs)
+}
+
+func BenchmarkScanStackNoLocals(b *testing.B) {
+ var ready sync.WaitGroup
+ teardown := make(chan bool)
+ for j := 0; j < 10; j++ {
+ ready.Add(1)
+ go func() {
+ x := 100000
+ countpwg(&x, &ready, teardown)
+ }()
+ }
+ ready.Wait()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ b.StartTimer()
+ runtime.GC()
+ runtime.GC()
+ b.StopTimer()
+ }
+ close(teardown)
+}
+
+func BenchmarkMSpanCountAlloc(b *testing.B) {
+ // Allocate one dummy mspan for the whole benchmark.
+ s := runtime.AllocMSpan()
+ defer runtime.FreeMSpan(s)
+
+ // n is the number of bytes to benchmark against.
+ // n must always be a multiple of 8, since gcBits is
+ // always rounded up to 8 bytes.
+ for _, n := range []int{8, 16, 32, 64, 128} {
+ b.Run(fmt.Sprintf("bits=%d", n*8), func(b *testing.B) {
+ // Initialize a new byte slice with pseudo-random data.
+ bits := make([]byte, n)
+ rand.Read(bits)
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.MSpanCountAlloc(s, bits)
+ }
+ })
+ }
+}
+
+func countpwg(n *int, ready *sync.WaitGroup, teardown chan bool) {
+ if *n == 0 {
+ ready.Done()
+ <-teardown
+ return
+ }
+ *n--
+ countpwg(n, ready, teardown)
+}
+
+func TestMemoryLimit(t *testing.T) {
+ if testing.Short() {
+ t.Skip("stress test that takes time to run")
+ }
+ if runtime.NumCPU() < 4 {
+ t.Skip("want at least 4 CPUs for this test")
+ }
+ got := runTestProg(t, "testprog", "GCMemoryLimit")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
+
+func TestMemoryLimitNoGCPercent(t *testing.T) {
+ if testing.Short() {
+ t.Skip("stress test that takes time to run")
+ }
+ if runtime.NumCPU() < 4 {
+ t.Skip("want at least 4 CPUs for this test")
+ }
+ got := runTestProg(t, "testprog", "GCMemoryLimitNoGCPercent")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
diff --git a/src/runtime/gcinfo_test.go b/src/runtime/gcinfo_test.go
new file mode 100644
index 0000000..787160d
--- /dev/null
+++ b/src/runtime/gcinfo_test.go
@@ -0,0 +1,207 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "runtime"
+ "testing"
+)
+
+const (
+ typeScalar = 0
+ typePointer = 1
+)
+
+// TestGCInfo tests that various objects in heap, data and bss receive correct GC pointer type info.
+func TestGCInfo(t *testing.T) {
+ verifyGCInfo(t, "bss Ptr", &bssPtr, infoPtr)
+ verifyGCInfo(t, "bss ScalarPtr", &bssScalarPtr, infoScalarPtr)
+ verifyGCInfo(t, "bss PtrScalar", &bssPtrScalar, infoPtrScalar)
+ verifyGCInfo(t, "bss BigStruct", &bssBigStruct, infoBigStruct())
+ verifyGCInfo(t, "bss string", &bssString, infoString)
+ verifyGCInfo(t, "bss slice", &bssSlice, infoSlice)
+ verifyGCInfo(t, "bss eface", &bssEface, infoEface)
+ verifyGCInfo(t, "bss iface", &bssIface, infoIface)
+
+ verifyGCInfo(t, "data Ptr", &dataPtr, infoPtr)
+ verifyGCInfo(t, "data ScalarPtr", &dataScalarPtr, infoScalarPtr)
+ verifyGCInfo(t, "data PtrScalar", &dataPtrScalar, infoPtrScalar)
+ verifyGCInfo(t, "data BigStruct", &dataBigStruct, infoBigStruct())
+ verifyGCInfo(t, "data string", &dataString, infoString)
+ verifyGCInfo(t, "data slice", &dataSlice, infoSlice)
+ verifyGCInfo(t, "data eface", &dataEface, infoEface)
+ verifyGCInfo(t, "data iface", &dataIface, infoIface)
+
+ {
+ var x Ptr
+ verifyGCInfo(t, "stack Ptr", &x, infoPtr)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x ScalarPtr
+ verifyGCInfo(t, "stack ScalarPtr", &x, infoScalarPtr)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x PtrScalar
+ verifyGCInfo(t, "stack PtrScalar", &x, infoPtrScalar)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x BigStruct
+ verifyGCInfo(t, "stack BigStruct", &x, infoBigStruct())
+ runtime.KeepAlive(x)
+ }
+ {
+ var x string
+ verifyGCInfo(t, "stack string", &x, infoString)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x []string
+ verifyGCInfo(t, "stack slice", &x, infoSlice)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x any
+ verifyGCInfo(t, "stack eface", &x, infoEface)
+ runtime.KeepAlive(x)
+ }
+ {
+ var x Iface
+ verifyGCInfo(t, "stack iface", &x, infoIface)
+ runtime.KeepAlive(x)
+ }
+
+ for i := 0; i < 10; i++ {
+ verifyGCInfo(t, "heap Ptr", runtime.Escape(new(Ptr)), trimDead(infoPtr))
+ verifyGCInfo(t, "heap PtrSlice", runtime.Escape(&make([]*byte, 10)[0]), trimDead(infoPtr10))
+ verifyGCInfo(t, "heap ScalarPtr", runtime.Escape(new(ScalarPtr)), trimDead(infoScalarPtr))
+ verifyGCInfo(t, "heap ScalarPtrSlice", runtime.Escape(&make([]ScalarPtr, 4)[0]), trimDead(infoScalarPtr4))
+ verifyGCInfo(t, "heap PtrScalar", runtime.Escape(new(PtrScalar)), trimDead(infoPtrScalar))
+ verifyGCInfo(t, "heap BigStruct", runtime.Escape(new(BigStruct)), trimDead(infoBigStruct()))
+ verifyGCInfo(t, "heap string", runtime.Escape(new(string)), trimDead(infoString))
+ verifyGCInfo(t, "heap eface", runtime.Escape(new(any)), trimDead(infoEface))
+ verifyGCInfo(t, "heap iface", runtime.Escape(new(Iface)), trimDead(infoIface))
+ }
+}
+
+func verifyGCInfo(t *testing.T, name string, p any, mask0 []byte) {
+ mask := runtime.GCMask(p)
+ if !bytes.Equal(mask, mask0) {
+ t.Errorf("bad GC program for %v:\nwant %+v\ngot %+v", name, mask0, mask)
+ return
+ }
+}
+
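+// trimDead drops trailing scalar entries from an expected mask; the masks the
+// runtime reports for heap objects end at the last pointer word, so the heap
+// cases above compare against this trimmed form.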
+func trimDead(mask []byte) []byte {
+ for len(mask) > 0 && mask[len(mask)-1] == typeScalar {
+ mask = mask[:len(mask)-1]
+ }
+ return mask
+}
+
+var infoPtr = []byte{typePointer}
+
+type Ptr struct {
+ *byte
+}
+
+var infoPtr10 = []byte{typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer, typePointer}
+
+type ScalarPtr struct {
+ q int
+ w *int
+ e int
+ r *int
+ t int
+ y *int
+}
+
+var infoScalarPtr = []byte{typeScalar, typePointer, typeScalar, typePointer, typeScalar, typePointer}
+
+var infoScalarPtr4 = append(append(append(append([]byte(nil), infoScalarPtr...), infoScalarPtr...), infoScalarPtr...), infoScalarPtr...)
+
+type PtrScalar struct {
+ q *int
+ w int
+ e *int
+ r int
+ t *int
+ y int
+}
+
+var infoPtrScalar = []byte{typePointer, typeScalar, typePointer, typeScalar, typePointer, typeScalar}
+
+type BigStruct struct {
+ q *int
+ w byte
+ e [17]byte
+ r []byte
+ t int
+ y uint16
+ u uint64
+ i string
+}
+
+func infoBigStruct() []byte {
+ switch runtime.GOARCH {
+ case "386", "arm", "mips", "mipsle":
+ return []byte{
+ typePointer, // q *int
+ typeScalar, typeScalar, typeScalar, typeScalar, typeScalar, // w byte; e [17]byte
+ typePointer, typeScalar, typeScalar, // r []byte
+ typeScalar, typeScalar, typeScalar, typeScalar, // t int; y uint16; u uint64
+ typePointer, typeScalar, // i string
+ }
+ case "arm64", "amd64", "loong64", "mips64", "mips64le", "ppc64", "ppc64le", "riscv64", "s390x", "wasm":
+ return []byte{
+ typePointer, // q *int
+ typeScalar, typeScalar, typeScalar, // w byte; e [17]byte
+ typePointer, typeScalar, typeScalar, // r []byte
+ typeScalar, typeScalar, typeScalar, // t int; y uint16; u uint64
+ typePointer, typeScalar, // i string
+ }
+ default:
+ panic("unknown arch")
+ }
+}
+
+type Iface interface {
+ f()
+}
+
+type IfaceImpl int
+
+func (IfaceImpl) f() {
+}
+
+var (
+ // BSS
+ bssPtr Ptr
+ bssScalarPtr ScalarPtr
+ bssPtrScalar PtrScalar
+ bssBigStruct BigStruct
+ bssString string
+ bssSlice []string
+ bssEface any
+ bssIface Iface
+
+ // DATA
+ dataPtr = Ptr{new(byte)}
+ dataScalarPtr = ScalarPtr{q: 1}
+ dataPtrScalar = PtrScalar{w: 1}
+ dataBigStruct = BigStruct{w: 1}
+ dataString = "foo"
+ dataSlice = []string{"foo"}
+ dataEface any = 42
+ dataIface Iface = IfaceImpl(42)
+
+ infoString = []byte{typePointer, typeScalar}
+ infoSlice = []byte{typePointer, typeScalar, typeScalar}
+ infoEface = []byte{typeScalar, typePointer}
+ infoIface = []byte{typeScalar, typePointer}
+)
diff --git a/src/runtime/go_tls.h b/src/runtime/go_tls.h
new file mode 100644
index 0000000..a47e798
--- /dev/null
+++ b/src/runtime/go_tls.h
@@ -0,0 +1,17 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#ifdef GOARCH_arm
+#define LR R14
+#endif
+
+#ifdef GOARCH_amd64
+#define get_tls(r) MOVQ TLS, r
+#define g(r) 0(r)(TLS*1)
+#endif
+
+#ifdef GOARCH_386
+#define get_tls(r) MOVL TLS, r
+#define g(r) 0(r)(TLS*1)
+#endif
diff --git a/src/runtime/hash32.go b/src/runtime/hash32.go
new file mode 100644
index 0000000..0616c7d
--- /dev/null
+++ b/src/runtime/hash32.go
@@ -0,0 +1,62 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Hashing algorithm inspired by
+// wyhash: https://github.com/wangyi-fudan/wyhash/blob/ceb019b530e2c1c14d70b79bfa2bc49de7d95bc1/Modern%20Non-Cryptographic%20Hash%20Function%20and%20Pseudorandom%20Number%20Generator.pdf
+
+//go:build 386 || arm || mips || mipsle
+
+package runtime
+
+import "unsafe"
+
+func memhash32Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ a, b := mix32(uint32(seed), uint32(4^hashkey[0]))
+ t := readUnaligned32(p)
+ a ^= t
+ b ^= t
+ a, b = mix32(a, b)
+ a, b = mix32(a, b)
+ return uintptr(a ^ b)
+}
+
+func memhash64Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ a, b := mix32(uint32(seed), uint32(8^hashkey[0]))
+ a ^= readUnaligned32(p)
+ b ^= readUnaligned32(add(p, 4))
+ a, b = mix32(a, b)
+ a, b = mix32(a, b)
+ return uintptr(a ^ b)
+}
+
+func memhashFallback(p unsafe.Pointer, seed, s uintptr) uintptr {
+
+ a, b := mix32(uint32(seed), uint32(s^hashkey[0]))
+ if s == 0 {
+ return uintptr(a ^ b)
+ }
+ for ; s > 8; s -= 8 {
+ a ^= readUnaligned32(p)
+ b ^= readUnaligned32(add(p, 4))
+ a, b = mix32(a, b)
+ p = add(p, 8)
+ }
+ if s >= 4 {
+ a ^= readUnaligned32(p)
+ b ^= readUnaligned32(add(p, s-4))
+ } else {
+ t := uint32(*(*byte)(p))
+ t |= uint32(*(*byte)(add(p, s>>1))) << 8
+ t |= uint32(*(*byte)(add(p, s-1))) << 16
+ b ^= t
+ }
+ a, b = mix32(a, b)
+ a, b = mix32(a, b)
+ return uintptr(a ^ b)
+}
+
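+// mix32 xors the two inputs with per-process hash keys, multiplies them into a
+// 64-bit product, and returns the product's low and high 32-bit halves.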
+func mix32(a, b uint32) (uint32, uint32) {
+ c := uint64(a^uint32(hashkey[1])) * uint64(b^uint32(hashkey[2]))
+ return uint32(c), uint32(c >> 32)
+}
diff --git a/src/runtime/hash64.go b/src/runtime/hash64.go
new file mode 100644
index 0000000..2864a4b
--- /dev/null
+++ b/src/runtime/hash64.go
@@ -0,0 +1,92 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Hashing algorithm inspired by
+// wyhash: https://github.com/wangyi-fudan/wyhash
+
+//go:build amd64 || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x || wasm
+
+package runtime
+
+import (
+ "runtime/internal/math"
+ "unsafe"
+)
+
+const (
+ m1 = 0xa0761d6478bd642f
+ m2 = 0xe7037ed1a0b428db
+ m3 = 0x8ebc6af09c88c6e3
+ m4 = 0x589965cc75374cc3
+ m5 = 0x1d8e4e27c47d124f
+)
+
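+// memhashFallback is the wyhash-inspired fallback used when the AES-based hash
+// is unavailable: short inputs are loaded directly (with overlapping reads for
+// in-between sizes), inputs longer than 48 bytes are consumed in three
+// independent 16-byte lanes, and everything is folded together with mix.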
+func memhashFallback(p unsafe.Pointer, seed, s uintptr) uintptr {
+ var a, b uintptr
+ seed ^= hashkey[0] ^ m1
+ switch {
+ case s == 0:
+ return seed
+ case s < 4:
+ a = uintptr(*(*byte)(p))
+ a |= uintptr(*(*byte)(add(p, s>>1))) << 8
+ a |= uintptr(*(*byte)(add(p, s-1))) << 16
+ case s == 4:
+ a = r4(p)
+ b = a
+ case s < 8:
+ a = r4(p)
+ b = r4(add(p, s-4))
+ case s == 8:
+ a = r8(p)
+ b = a
+ case s <= 16:
+ a = r8(p)
+ b = r8(add(p, s-8))
+ default:
+ l := s
+ if l > 48 {
+ seed1 := seed
+ seed2 := seed
+ for ; l > 48; l -= 48 {
+ seed = mix(r8(p)^m2, r8(add(p, 8))^seed)
+ seed1 = mix(r8(add(p, 16))^m3, r8(add(p, 24))^seed1)
+ seed2 = mix(r8(add(p, 32))^m4, r8(add(p, 40))^seed2)
+ p = add(p, 48)
+ }
+ seed ^= seed1 ^ seed2
+ }
+ for ; l > 16; l -= 16 {
+ seed = mix(r8(p)^m2, r8(add(p, 8))^seed)
+ p = add(p, 16)
+ }
+ a = r8(add(p, l-16))
+ b = r8(add(p, l-8))
+ }
+
+ return mix(m5^s, mix(a^m2, b^seed))
+}
+
+func memhash32Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ a := r4(p)
+ return mix(m5^4, mix(a^m2, a^seed^hashkey[0]^m1))
+}
+
+func memhash64Fallback(p unsafe.Pointer, seed uintptr) uintptr {
+ a := r8(p)
+ return mix(m5^8, mix(a^m2, a^seed^hashkey[0]^m1))
+}
+
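+// mix returns the xor of the high and low halves of the full 128-bit product of
+// a and b; this multiply-and-fold is the core wyhash mixing step.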
+func mix(a, b uintptr) uintptr {
+ hi, lo := math.Mul64(uint64(a), uint64(b))
+ return uintptr(hi ^ lo)
+}
+
+func r4(p unsafe.Pointer) uintptr {
+ return uintptr(readUnaligned32(p))
+}
+
+func r8(p unsafe.Pointer) uintptr {
+ return uintptr(readUnaligned64(p))
+}
diff --git a/src/runtime/hash_test.go b/src/runtime/hash_test.go
new file mode 100644
index 0000000..6562829
--- /dev/null
+++ b/src/runtime/hash_test.go
@@ -0,0 +1,786 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/race"
+ "math"
+ "math/rand"
+ . "runtime"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+func TestMemHash32Equality(t *testing.T) {
+ if *UseAeshash {
+ t.Skip("skipping since AES hash implementation is used")
+ }
+ var b [4]byte
+ r := rand.New(rand.NewSource(1234))
+ seed := uintptr(r.Uint64())
+ for i := 0; i < 100; i++ {
+ randBytes(r, b[:])
+ got := MemHash32(unsafe.Pointer(&b), seed)
+ want := MemHash(unsafe.Pointer(&b), seed, 4)
+ if got != want {
+ t.Errorf("MemHash32(%x, %v) = %v; want %v", b, seed, got, want)
+ }
+ }
+}
+
+func TestMemHash64Equality(t *testing.T) {
+ if *UseAeshash {
+ t.Skip("skipping since AES hash implementation is used")
+ }
+ var b [8]byte
+ r := rand.New(rand.NewSource(1234))
+ seed := uintptr(r.Uint64())
+ for i := 0; i < 100; i++ {
+ randBytes(r, b[:])
+ got := MemHash64(unsafe.Pointer(&b), seed)
+ want := MemHash(unsafe.Pointer(&b), seed, 8)
+ if got != want {
+ t.Errorf("MemHash64(%x, %v) = %v; want %v", b, seed, got, want)
+ }
+ }
+}
+
+// Smhasher is a torture test for hash functions.
+// https://code.google.com/p/smhasher/
+// This code is a port of some of the Smhasher tests to Go.
+//
+// The current AES hash function passes Smhasher. Our fallback
+// hash functions don't, so we only enable the difficult tests when
+// we know the AES implementation is available.
+
+// Sanity checks.
+// hash should not depend on values outside key.
+// hash should not depend on alignment.
+func TestSmhasherSanity(t *testing.T) {
+ r := rand.New(rand.NewSource(1234))
+ const REP = 10
+ const KEYMAX = 128
+ const PAD = 16
+ const OFFMAX = 16
+ for k := 0; k < REP; k++ {
+ for n := 0; n < KEYMAX; n++ {
+ for i := 0; i < OFFMAX; i++ {
+ var b [KEYMAX + OFFMAX + 2*PAD]byte
+ var c [KEYMAX + OFFMAX + 2*PAD]byte
+ randBytes(r, b[:])
+ randBytes(r, c[:])
+ copy(c[PAD+i:PAD+i+n], b[PAD:PAD+n])
+ if BytesHash(b[PAD:PAD+n], 0) != BytesHash(c[PAD+i:PAD+i+n], 0) {
+ t.Errorf("hash depends on bytes outside key")
+ }
+ }
+ }
+ }
+}
+
+type HashSet struct {
+ m map[uintptr]struct{} // set of hashes added
+ n int // number of hashes added
+}
+
+func newHashSet() *HashSet {
+ return &HashSet{make(map[uintptr]struct{}), 0}
+}
+func (s *HashSet) add(h uintptr) {
+ s.m[h] = struct{}{}
+ s.n++
+}
+func (s *HashSet) addS(x string) {
+ s.add(StringHash(x, 0))
+}
+func (s *HashSet) addB(x []byte) {
+ s.add(BytesHash(x, 0))
+}
+func (s *HashSet) addS_seed(x string, seed uintptr) {
+ s.add(StringHash(x, seed))
+}
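+// check reports an error if the set saw far more collisions than the birthday
+// bound predicts: for n uniformly random hashSize-bit values, the expected
+// number of colliding pairs is roughly n*(n-1)/2 / 2^hashSize.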
+func (s *HashSet) check(t *testing.T) {
+ const SLOP = 50.0
+ collisions := s.n - len(s.m)
+ pairs := int64(s.n) * int64(s.n-1) / 2
+ expected := float64(pairs) / math.Pow(2.0, float64(hashSize))
+ stddev := math.Sqrt(expected)
+ if float64(collisions) > expected+SLOP*(3*stddev+1) {
+ t.Errorf("unexpected number of collisions: got=%d mean=%f stddev=%f threshold=%f", collisions, expected, stddev, expected+SLOP*(3*stddev+1))
+ }
+}
+
+// A string with increasing numbers of appended zero bytes must produce distinct hashes.
+func TestSmhasherAppendedZeros(t *testing.T) {
+ s := "hello" + strings.Repeat("\x00", 256)
+ h := newHashSet()
+ for i := 0; i <= len(s); i++ {
+ h.addS(s[:i])
+ }
+ h.check(t)
+}
+
+// All 0-3 byte strings have distinct hashes.
+func TestSmhasherSmallKeys(t *testing.T) {
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ h := newHashSet()
+ var b [3]byte
+ for i := 0; i < 256; i++ {
+ b[0] = byte(i)
+ h.addB(b[:1])
+ for j := 0; j < 256; j++ {
+ b[1] = byte(j)
+ h.addB(b[:2])
+ if !testing.Short() {
+ for k := 0; k < 256; k++ {
+ b[2] = byte(k)
+ h.addB(b[:3])
+ }
+ }
+ }
+ }
+ h.check(t)
+}
+
+// Different length strings of all zeros have distinct hashes.
+func TestSmhasherZeros(t *testing.T) {
+ N := 256 * 1024
+ if testing.Short() {
+ N = 1024
+ }
+ h := newHashSet()
+ b := make([]byte, N)
+ for i := 0; i <= N; i++ {
+ h.addB(b[:i])
+ }
+ h.check(t)
+}
+
+// Strings with up to two nonzero bytes all have distinct hashes.
+func TestSmhasherTwoNonzero(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ h := newHashSet()
+ for n := 2; n <= 16; n++ {
+ twoNonZero(h, n)
+ }
+ h.check(t)
+}
+func twoNonZero(h *HashSet, n int) {
+ b := make([]byte, n)
+
+ // all zero
+ h.addB(b)
+
+ // one non-zero byte
+ for i := 0; i < n; i++ {
+ for x := 1; x < 256; x++ {
+ b[i] = byte(x)
+ h.addB(b)
+ b[i] = 0
+ }
+ }
+
+ // two non-zero bytes
+ for i := 0; i < n; i++ {
+ for x := 1; x < 256; x++ {
+ b[i] = byte(x)
+ for j := i + 1; j < n; j++ {
+ for y := 1; y < 256; y++ {
+ b[j] = byte(y)
+ h.addB(b)
+ b[j] = 0
+ }
+ }
+ b[i] = 0
+ }
+ }
+}
+
+// Test strings with repeats, like "abcdabcdabcdabcd..."
+func TestSmhasherCyclic(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ r := rand.New(rand.NewSource(1234))
+ const REPEAT = 8
+ const N = 1000000
+ for n := 4; n <= 12; n++ {
+ h := newHashSet()
+ b := make([]byte, REPEAT*n)
+ for i := 0; i < N; i++ {
+ b[0] = byte(i * 79 % 97)
+ b[1] = byte(i * 43 % 137)
+ b[2] = byte(i * 151 % 197)
+ b[3] = byte(i * 199 % 251)
+ randBytes(r, b[4:n])
+ for j := n; j < n*REPEAT; j++ {
+ b[j] = b[j-n]
+ }
+ h.addB(b)
+ }
+ h.check(t)
+ }
+}
+
+// Test strings with only a few bits set
+func TestSmhasherSparse(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ sparse(t, 32, 6)
+ sparse(t, 40, 6)
+ sparse(t, 48, 5)
+ sparse(t, 56, 5)
+ sparse(t, 64, 5)
+ sparse(t, 96, 4)
+ sparse(t, 256, 3)
+ sparse(t, 2048, 2)
+}
+func sparse(t *testing.T, n int, k int) {
+ b := make([]byte, n/8)
+ h := newHashSet()
+ setbits(h, b, 0, k)
+ h.check(t)
+}
+
+// set up to k bits at index i and greater
+func setbits(h *HashSet, b []byte, i int, k int) {
+ h.addB(b)
+ if k == 0 {
+ return
+ }
+ for j := i; j < len(b)*8; j++ {
+ b[j/8] |= byte(1 << uint(j&7))
+ setbits(h, b, j+1, k-1)
+ b[j/8] &= byte(^(1 << uint(j&7)))
+ }
+}
+
+// Test all possible combinations of n blocks from the set s.
+// "permutation" is a bad name here, but it is what Smhasher uses.
+func TestSmhasherPermutation(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ permutation(t, []uint32{0, 1, 2, 3, 4, 5, 6, 7}, 8)
+ permutation(t, []uint32{0, 1 << 29, 2 << 29, 3 << 29, 4 << 29, 5 << 29, 6 << 29, 7 << 29}, 8)
+ permutation(t, []uint32{0, 1}, 20)
+ permutation(t, []uint32{0, 1 << 31}, 20)
+ permutation(t, []uint32{0, 1, 2, 3, 4, 5, 6, 7, 1 << 29, 2 << 29, 3 << 29, 4 << 29, 5 << 29, 6 << 29, 7 << 29}, 6)
+}
+func permutation(t *testing.T, s []uint32, n int) {
+ b := make([]byte, n*4)
+ h := newHashSet()
+ genPerm(h, b, s, 0)
+ h.check(t)
+}
+func genPerm(h *HashSet, b []byte, s []uint32, n int) {
+ h.addB(b[:n])
+ if n == len(b) {
+ return
+ }
+ for _, v := range s {
+ b[n] = byte(v)
+ b[n+1] = byte(v >> 8)
+ b[n+2] = byte(v >> 16)
+ b[n+3] = byte(v >> 24)
+ genPerm(h, b, s, n+4)
+ }
+}
+
+type Key interface {
+ clear() // set bits all to 0
+ random(r *rand.Rand) // set key to something random
+ bits() int // how many bits key has
+ flipBit(i int) // flip bit i of the key
+ hash() uintptr // hash the key
+ name() string // for error reporting
+}
+
+type BytesKey struct {
+ b []byte
+}
+
+func (k *BytesKey) clear() {
+ for i := range k.b {
+ k.b[i] = 0
+ }
+}
+func (k *BytesKey) random(r *rand.Rand) {
+ randBytes(r, k.b)
+}
+func (k *BytesKey) bits() int {
+ return len(k.b) * 8
+}
+func (k *BytesKey) flipBit(i int) {
+ k.b[i>>3] ^= byte(1 << uint(i&7))
+}
+func (k *BytesKey) hash() uintptr {
+ return BytesHash(k.b, 0)
+}
+func (k *BytesKey) name() string {
+ return fmt.Sprintf("bytes%d", len(k.b))
+}
+
+type Int32Key struct {
+ i uint32
+}
+
+func (k *Int32Key) clear() {
+ k.i = 0
+}
+func (k *Int32Key) random(r *rand.Rand) {
+ k.i = r.Uint32()
+}
+func (k *Int32Key) bits() int {
+ return 32
+}
+func (k *Int32Key) flipBit(i int) {
+ k.i ^= 1 << uint(i)
+}
+func (k *Int32Key) hash() uintptr {
+ return Int32Hash(k.i, 0)
+}
+func (k *Int32Key) name() string {
+ return "int32"
+}
+
+type Int64Key struct {
+ i uint64
+}
+
+func (k *Int64Key) clear() {
+ k.i = 0
+}
+func (k *Int64Key) random(r *rand.Rand) {
+ k.i = uint64(r.Uint32()) + uint64(r.Uint32())<<32
+}
+func (k *Int64Key) bits() int {
+ return 64
+}
+func (k *Int64Key) flipBit(i int) {
+ k.i ^= 1 << uint(i)
+}
+func (k *Int64Key) hash() uintptr {
+ return Int64Hash(k.i, 0)
+}
+func (k *Int64Key) name() string {
+ return "int64"
+}
+
+type EfaceKey struct {
+ i any
+}
+
+func (k *EfaceKey) clear() {
+ k.i = nil
+}
+func (k *EfaceKey) random(r *rand.Rand) {
+ k.i = uint64(r.Int63())
+}
+func (k *EfaceKey) bits() int {
+ // use 64 bits. This tests inlined interfaces
+ // on 64-bit targets and indirect interfaces on
+ // 32-bit targets.
+ return 64
+}
+func (k *EfaceKey) flipBit(i int) {
+ k.i = k.i.(uint64) ^ uint64(1)<<uint(i)
+}
+func (k *EfaceKey) hash() uintptr {
+ return EfaceHash(k.i, 0)
+}
+func (k *EfaceKey) name() string {
+ return "Eface"
+}
+
+type IfaceKey struct {
+ i interface {
+ F()
+ }
+}
+type fInter uint64
+
+func (x fInter) F() {
+}
+
+func (k *IfaceKey) clear() {
+ k.i = nil
+}
+func (k *IfaceKey) random(r *rand.Rand) {
+ k.i = fInter(r.Int63())
+}
+func (k *IfaceKey) bits() int {
+ // use 64 bits. This tests inlined interfaces
+ // on 64-bit targets and indirect interfaces on
+ // 32-bit targets.
+ return 64
+}
+func (k *IfaceKey) flipBit(i int) {
+ k.i = k.i.(fInter) ^ fInter(1)<<uint(i)
+}
+func (k *IfaceKey) hash() uintptr {
+ return IfaceHash(k.i, 0)
+}
+func (k *IfaceKey) name() string {
+ return "Iface"
+}
+
+// Flipping a single bit of a key should flip each output bit with 50% probability.
+func TestSmhasherAvalanche(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ avalancheTest1(t, &BytesKey{make([]byte, 2)})
+ avalancheTest1(t, &BytesKey{make([]byte, 4)})
+ avalancheTest1(t, &BytesKey{make([]byte, 8)})
+ avalancheTest1(t, &BytesKey{make([]byte, 16)})
+ avalancheTest1(t, &BytesKey{make([]byte, 32)})
+ avalancheTest1(t, &BytesKey{make([]byte, 200)})
+ avalancheTest1(t, &Int32Key{})
+ avalancheTest1(t, &Int64Key{})
+ avalancheTest1(t, &EfaceKey{})
+ avalancheTest1(t, &IfaceKey{})
+}
+func avalancheTest1(t *testing.T, k Key) {
+ const REP = 100000
+ r := rand.New(rand.NewSource(1234))
+ n := k.bits()
+
+ // grid[i][j] is a count of whether flipping
+ // input bit i affects output bit j.
+ grid := make([][hashSize]int, n)
+
+ for z := 0; z < REP; z++ {
+ // pick a random key, hash it
+ k.random(r)
+ h := k.hash()
+
+ // flip each bit, hash & compare the results
+ for i := 0; i < n; i++ {
+ k.flipBit(i)
+ d := h ^ k.hash()
+ k.flipBit(i)
+
+ // record the effects of that bit flip
+ g := &grid[i]
+ for j := 0; j < hashSize; j++ {
+ g[j] += int(d & 1)
+ d >>= 1
+ }
+ }
+ }
+
+ // Each entry in the grid should be about REP/2.
+ // More precisely, we did N = k.bits() * hashSize experiments where
+ // each is the sum of REP coin flips. We want to find bounds on the
+ // sum of coin flips such that a truly random experiment would have
+ // all sums inside those bounds with 99% probability.
+ N := n * hashSize
+ var c float64
+ // find c such that Prob(mean-c*stddev < x < mean+c*stddev)^N > .9999
+ for c = 0.0; math.Pow(math.Erf(c/math.Sqrt(2)), float64(N)) < .9999; c += .1 {
+ }
+ c *= 8.0 // allowed slack - we don't need to be perfectly random
+ mean := .5 * REP
+ stddev := .5 * math.Sqrt(REP)
+ low := int(mean - c*stddev)
+ high := int(mean + c*stddev)
+ for i := 0; i < n; i++ {
+ for j := 0; j < hashSize; j++ {
+ x := grid[i][j]
+ if x < low || x > high {
+ t.Errorf("bad bias for %s bit %d -> bit %d: %d/%d\n", k.name(), i, j, x, REP)
+ }
+ }
+ }
+}
+
+// All bit rotations of a set of distinct keys
+func TestSmhasherWindowed(t *testing.T) {
+ if race.Enabled {
+ t.Skip("Too long for race mode")
+ }
+ t.Logf("32 bit keys")
+ windowed(t, &Int32Key{})
+ t.Logf("64 bit keys")
+ windowed(t, &Int64Key{})
+ t.Logf("string keys")
+ windowed(t, &BytesKey{make([]byte, 128)})
+}
+func windowed(t *testing.T, k Key) {
+ if GOARCH == "wasm" {
+ t.Skip("Too slow on wasm")
+ }
+ if PtrSize == 4 {
+ // This test tends to be flaky on 32-bit systems.
+ // There aren't enough bits in the hash output, so we
+ // expect a nontrivial number of collisions, and it is
+ // often quite a bit higher than expected. See issue 43130.
+ t.Skip("Flaky on 32-bit systems")
+ }
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ const BITS = 16
+
+ for r := 0; r < k.bits(); r++ {
+ h := newHashSet()
+ for i := 0; i < 1<<BITS; i++ {
+ k.clear()
+ for j := 0; j < BITS; j++ {
+ if i>>uint(j)&1 != 0 {
+ k.flipBit((j + r) % k.bits())
+ }
+ }
+ h.add(k.hash())
+ }
+ h.check(t)
+ }
+}
+
+// All keys of the form prefix + [A-Za-z0-9]*N + suffix.
+func TestSmhasherText(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ text(t, "Foo", "Bar")
+ text(t, "FooBar", "")
+ text(t, "", "FooBar")
+}
+func text(t *testing.T, prefix, suffix string) {
+ const N = 4
+ const S = "ABCDEFGHIJKLMNOPQRSTabcdefghijklmnopqrst0123456789"
+ const L = len(S)
+ b := make([]byte, len(prefix)+N+len(suffix))
+ copy(b, prefix)
+ copy(b[len(prefix)+N:], suffix)
+ h := newHashSet()
+ c := b[len(prefix):]
+ for i := 0; i < L; i++ {
+ c[0] = S[i]
+ for j := 0; j < L; j++ {
+ c[1] = S[j]
+ for k := 0; k < L; k++ {
+ c[2] = S[k]
+ for x := 0; x < L; x++ {
+ c[3] = S[x]
+ h.addB(b)
+ }
+ }
+ }
+ }
+ h.check(t)
+}
+
+// Make sure different seed values generate different hashes.
+func TestSmhasherSeed(t *testing.T) {
+ h := newHashSet()
+ const N = 100000
+ s := "hello"
+ for i := 0; i < N; i++ {
+ h.addS_seed(s, uintptr(i))
+ }
+ h.check(t)
+}
+
+// size of the hash output (32 or 64 bits)
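+// On 64-bit platforms ^uintptr(0)>>63<<5 evaluates to 32, giving 64; on 32-bit it is 0, giving 32.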
+const hashSize = 32 + int(^uintptr(0)>>63<<5)
+
+func randBytes(r *rand.Rand, b []byte) {
+ for i := range b {
+ b[i] = byte(r.Uint32())
+ }
+}
+
+func benchmarkHash(b *testing.B, n int) {
+ s := strings.Repeat("A", n)
+
+ for i := 0; i < b.N; i++ {
+ StringHash(s, 0)
+ }
+ b.SetBytes(int64(n))
+}
+
+func BenchmarkHash5(b *testing.B) { benchmarkHash(b, 5) }
+func BenchmarkHash16(b *testing.B) { benchmarkHash(b, 16) }
+func BenchmarkHash64(b *testing.B) { benchmarkHash(b, 64) }
+func BenchmarkHash1024(b *testing.B) { benchmarkHash(b, 1024) }
+func BenchmarkHash65536(b *testing.B) { benchmarkHash(b, 65536) }
+
+func TestArrayHash(t *testing.T) {
+ // Make sure that "" in arrays hash correctly. The hash
+ // should at least scramble the input seed so that, e.g.,
+ // {"","foo"} and {"foo",""} have different hashes.
+
+ // If the hash is bad, then all (8 choose 4) = 70 keys
+ // have the same hash. If so, we allocate 70/8 = 8
+ // overflow buckets. If the hash is good we don't
+ // normally allocate any overflow buckets, and the
+ // probability of even one or two overflows goes down rapidly.
+ // (There is always 1 allocation of the bucket array. The map
+ // header is allocated on the stack.)
+ f := func() {
+ // Make the key type at most 128 bytes. Otherwise,
+ // we get an allocation per key.
+ type key [8]string
+ m := make(map[key]bool, 70)
+
+ // fill m with keys that have 4 "foo"s and 4 ""s.
+ for i := 0; i < 256; i++ {
+ var k key
+ cnt := 0
+ for j := uint(0); j < 8; j++ {
+ if i>>j&1 != 0 {
+ k[j] = "foo"
+ cnt++
+ }
+ }
+ if cnt == 4 {
+ m[k] = true
+ }
+ }
+ if len(m) != 70 {
+ t.Errorf("bad test: (8 choose 4) should be 70, not %d", len(m))
+ }
+ }
+ if n := testing.AllocsPerRun(10, f); n > 6 {
+ t.Errorf("too many allocs %f - hash not balanced", n)
+ }
+}
+func TestStructHash(t *testing.T) {
+ // See the comment in TestArrayHash.
+ f := func() {
+ type key struct {
+ a, b, c, d, e, f, g, h string
+ }
+ m := make(map[key]bool, 70)
+
+ // fill m with keys that have 4 "foo"s and 4 ""s.
+ for i := 0; i < 256; i++ {
+ var k key
+ cnt := 0
+ if i&1 != 0 {
+ k.a = "foo"
+ cnt++
+ }
+ if i&2 != 0 {
+ k.b = "foo"
+ cnt++
+ }
+ if i&4 != 0 {
+ k.c = "foo"
+ cnt++
+ }
+ if i&8 != 0 {
+ k.d = "foo"
+ cnt++
+ }
+ if i&16 != 0 {
+ k.e = "foo"
+ cnt++
+ }
+ if i&32 != 0 {
+ k.f = "foo"
+ cnt++
+ }
+ if i&64 != 0 {
+ k.g = "foo"
+ cnt++
+ }
+ if i&128 != 0 {
+ k.h = "foo"
+ cnt++
+ }
+ if cnt == 4 {
+ m[k] = true
+ }
+ }
+ if len(m) != 70 {
+ t.Errorf("bad test: (8 choose 4) should be 70, not %d", len(m))
+ }
+ }
+ if n := testing.AllocsPerRun(10, f); n > 6 {
+ t.Errorf("too many allocs %f - hash not balanced", n)
+ }
+}
+
+var sink uint64
+
+func BenchmarkAlignedLoad(b *testing.B) {
+ var buf [16]byte
+ p := unsafe.Pointer(&buf[0])
+ var s uint64
+ for i := 0; i < b.N; i++ {
+ s += ReadUnaligned64(p)
+ }
+ sink = s
+}
+
+func BenchmarkUnalignedLoad(b *testing.B) {
+ var buf [16]byte
+ p := unsafe.Pointer(&buf[1])
+ var s uint64
+ for i := 0; i < b.N; i++ {
+ s += ReadUnaligned64(p)
+ }
+ sink = s
+}
+
+func TestCollisions(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping in short mode")
+ }
+ for i := 0; i < 16; i++ {
+ for j := 0; j < 16; j++ {
+ if j == i {
+ continue
+ }
+ var a [16]byte
+ m := make(map[uint16]struct{}, 1<<16)
+ for n := 0; n < 1<<16; n++ {
+ a[i] = byte(n)
+ a[j] = byte(n >> 8)
+ m[uint16(BytesHash(a[:], 0))] = struct{}{}
+ }
+ // N balls in N bins, for N=65536
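+ // Throwing 65536 balls into 65536 bins leaves about
+ // 65536*(1-1/e) ~= 41427 bins occupied; 40*stdDev is a very
+ // generous tolerance around that mean.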
+ avg := 41427
+ stdDev := 123
+ if len(m) < avg-40*stdDev || len(m) > avg+40*stdDev {
+ t.Errorf("bad number of collisions i=%d j=%d outputs=%d out of 65536\n", i, j, len(m))
+ }
+ }
+ }
+}
diff --git a/src/runtime/heap_test.go b/src/runtime/heap_test.go
new file mode 100644
index 0000000..4b73ab5
--- /dev/null
+++ b/src/runtime/heap_test.go
@@ -0,0 +1,21 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "testing"
+ _ "unsafe"
+)
+
+//go:linkname heapObjectsCanMove runtime.heapObjectsCanMove
+func heapObjectsCanMove() bool
+
+func TestHeapObjectsCanMove(t *testing.T) {
+ if heapObjectsCanMove() {
+ // If this happens (or this test stops building),
+ // it will break go4.org/unsafe/assume-no-moving-gc.
+ t.Fatalf("heap objects can move!")
+ }
+}
diff --git a/src/runtime/heapdump.go b/src/runtime/heapdump.go
new file mode 100644
index 0000000..8ddec8b
--- /dev/null
+++ b/src/runtime/heapdump.go
@@ -0,0 +1,752 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Implementation of runtime/debug.WriteHeapDump. Writes all
+// objects in the heap plus additional info (roots, threads,
+// finalizers, etc.) to a file.
+
+// The format of the dumped file is described at
+// https://golang.org/s/go15heapdump.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+//go:linkname runtime_debug_WriteHeapDump runtime/debug.WriteHeapDump
+func runtime_debug_WriteHeapDump(fd uintptr) {
+ stopTheWorld(stwWriteHeapDump)
+
+ // Keep m on this G's stack instead of the system stack.
+ // Both readmemstats_m and writeheapdump_m have pretty large
+ // peak stack depths and we risk blowing the system stack.
+ // This is safe because the world is stopped, so we don't
+ // need to worry about anyone shrinking and therefore moving
+ // our stack.
+ var m MemStats
+ systemstack(func() {
+ // Call readmemstats_m here instead of deeper in
+ // writeheapdump_m because we might blow the system stack
+ // otherwise.
+ readmemstats_m(&m)
+ writeheapdump_m(fd, &m)
+ })
+
+ startTheWorld()
+}
+
+const (
+ fieldKindEol = 0
+ fieldKindPtr = 1
+ fieldKindIface = 2
+ fieldKindEface = 3
+ tagEOF = 0
+ tagObject = 1
+ tagOtherRoot = 2
+ tagType = 3
+ tagGoroutine = 4
+ tagStackFrame = 5
+ tagParams = 6
+ tagFinalizer = 7
+ tagItab = 8
+ tagOSThread = 9
+ tagMemStats = 10
+ tagQueuedFinalizer = 11
+ tagData = 12
+ tagBSS = 13
+ tagDefer = 14
+ tagPanic = 15
+ tagMemProf = 16
+ tagAllocSample = 17
+)
+
+var dumpfd uintptr // fd to write the dump to.
+var tmpbuf []byte
+
+// buffer of pending write data
+const (
+ bufSize = 4096
+)
+
+var buf [bufSize]byte
+var nbuf uintptr
+
+func dwrite(data unsafe.Pointer, len uintptr) {
+ if len == 0 {
+ return
+ }
+ if nbuf+len <= bufSize {
+ copy(buf[nbuf:], (*[bufSize]byte)(data)[:len])
+ nbuf += len
+ return
+ }
+
+ write(dumpfd, unsafe.Pointer(&buf), int32(nbuf))
+ if len >= bufSize {
+ write(dumpfd, data, int32(len))
+ nbuf = 0
+ } else {
+ copy(buf[:], (*[bufSize]byte)(data)[:len])
+ nbuf = len
+ }
+}
+
+func dwritebyte(b byte) {
+ dwrite(unsafe.Pointer(&b), 1)
+}
+
+func flush() {
+ write(dumpfd, unsafe.Pointer(&buf), int32(nbuf))
+ nbuf = 0
+}
+
+// Cache of types that have been serialized already.
+// We use a type's hash field to pick a bucket.
+// Inside a bucket, we keep a list of types that
+// have been serialized so far, most recently used first.
+// Note: when a bucket overflows we may end up
+// serializing a type more than once. That's ok.
+const (
+ typeCacheBuckets = 256
+ typeCacheAssoc = 4
+)
+
+type typeCacheBucket struct {
+ t [typeCacheAssoc]*_type
+}
+
+var typecache [typeCacheBuckets]typeCacheBucket
+
+// dump a uint64 in a varint format parseable by encoding/binary.
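+// For example, 300 (0x12c) is written as the bytes 0xac 0x02, matching the
+// encoding produced by encoding/binary.PutUvarint.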
+func dumpint(v uint64) {
+ var buf [10]byte
+ var n int
+ for v >= 0x80 {
+ buf[n] = byte(v | 0x80)
+ n++
+ v >>= 7
+ }
+ buf[n] = byte(v)
+ n++
+ dwrite(unsafe.Pointer(&buf), uintptr(n))
+}
+
+func dumpbool(b bool) {
+ if b {
+ dumpint(1)
+ } else {
+ dumpint(0)
+ }
+}
+
+// dump varint uint64 length followed by memory contents.
+func dumpmemrange(data unsafe.Pointer, len uintptr) {
+ dumpint(uint64(len))
+ dwrite(data, len)
+}
+
+func dumpslice(b []byte) {
+ dumpint(uint64(len(b)))
+ if len(b) > 0 {
+ dwrite(unsafe.Pointer(&b[0]), uintptr(len(b)))
+ }
+}
+
+func dumpstr(s string) {
+ dumpmemrange(unsafe.Pointer(unsafe.StringData(s)), uintptr(len(s)))
+}
+
+// dump information for a type.
+func dumptype(t *_type) {
+ if t == nil {
+ return
+ }
+
+ // If we've definitely serialized the type before,
+ // no need to do it again.
+ b := &typecache[t.Hash&(typeCacheBuckets-1)]
+ if t == b.t[0] {
+ return
+ }
+ for i := 1; i < typeCacheAssoc; i++ {
+ if t == b.t[i] {
+ // Move-to-front
+ for j := i; j > 0; j-- {
+ b.t[j] = b.t[j-1]
+ }
+ b.t[0] = t
+ return
+ }
+ }
+
+ // Might not have been dumped yet. Dump it and
+ // remember we did so.
+ for j := typeCacheAssoc - 1; j > 0; j-- {
+ b.t[j] = b.t[j-1]
+ }
+ b.t[0] = t
+
+ // dump the type
+ dumpint(tagType)
+ dumpint(uint64(uintptr(unsafe.Pointer(t))))
+ dumpint(uint64(t.Size_))
+ rt := toRType(t)
+ if x := t.Uncommon(); x == nil || rt.nameOff(x.PkgPath).Name() == "" {
+ dumpstr(rt.string())
+ } else {
+ pkgpath := rt.nameOff(x.PkgPath).Name()
+ name := rt.name()
+ dumpint(uint64(uintptr(len(pkgpath)) + 1 + uintptr(len(name))))
+ dwrite(unsafe.Pointer(unsafe.StringData(pkgpath)), uintptr(len(pkgpath)))
+ dwritebyte('.')
+ dwrite(unsafe.Pointer(unsafe.StringData(name)), uintptr(len(name)))
+ }
+ dumpbool(t.Kind_&kindDirectIface == 0 || t.PtrBytes != 0)
+}
+
+// dump an object.
+func dumpobj(obj unsafe.Pointer, size uintptr, bv bitvector) {
+ dumpint(tagObject)
+ dumpint(uint64(uintptr(obj)))
+ dumpmemrange(obj, size)
+ dumpfields(bv)
+}
+
+func dumpotherroot(description string, to unsafe.Pointer) {
+ dumpint(tagOtherRoot)
+ dumpstr(description)
+ dumpint(uint64(uintptr(to)))
+}
+
+func dumpfinalizer(obj unsafe.Pointer, fn *funcval, fint *_type, ot *ptrtype) {
+ dumpint(tagFinalizer)
+ dumpint(uint64(uintptr(obj)))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn.fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fint))))
+ dumpint(uint64(uintptr(unsafe.Pointer(ot))))
+}
+
+type childInfo struct {
+ // Information passed up from the callee frame about
+ // the layout of the outargs region.
+ argoff uintptr // where the arguments start in the frame
+ arglen uintptr // size of args region
+ args bitvector // if args.n >= 0, pointer map of args region
+ sp *uint8 // callee sp
+ depth uintptr // depth in call stack (0 == most recent)
+}
+
+// dump kinds & offsets of interesting fields in bv.
+func dumpbv(cbv *bitvector, offset uintptr) {
+ for i := uintptr(0); i < uintptr(cbv.n); i++ {
+ if cbv.ptrbit(i) == 1 {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(offset + i*goarch.PtrSize))
+ }
+ }
+}
+
+func dumpframe(s *stkframe, child *childInfo) {
+ f := s.fn
+
+ // Figure out what we can about our stack map
+ pc := s.pc
+ pcdata := int32(-1) // Use the entry map at function entry
+ if pc != f.entry() {
+ pc--
+ pcdata = pcdatavalue(f, abi.PCDATA_StackMapIndex, pc, nil)
+ }
+ if pcdata == -1 {
+ // We do not have a valid pcdata value but there might be a
+ // stackmap for this function. It is likely that we are looking
+ // at the function prologue; assume so and hope for the best.
+ pcdata = 0
+ }
+ stkmap := (*stackmap)(funcdata(f, abi.FUNCDATA_LocalsPointerMaps))
+
+ var bv bitvector
+ if stkmap != nil && stkmap.n > 0 {
+ bv = stackmapdata(stkmap, pcdata)
+ } else {
+ bv.n = -1
+ }
+
+ // Dump main body of stack frame.
+ dumpint(tagStackFrame)
+ dumpint(uint64(s.sp)) // lowest address in frame
+ dumpint(uint64(child.depth)) // # of frames deep on the stack
+ dumpint(uint64(uintptr(unsafe.Pointer(child.sp)))) // sp of child, or 0 if bottom of stack
+ dumpmemrange(unsafe.Pointer(s.sp), s.fp-s.sp) // frame contents
+ dumpint(uint64(f.entry()))
+ dumpint(uint64(s.pc))
+ dumpint(uint64(s.continpc))
+ name := funcname(f)
+ if name == "" {
+ name = "unknown function"
+ }
+ dumpstr(name)
+
+ // Dump fields in the outargs section
+ if child.args.n >= 0 {
+ dumpbv(&child.args, child.argoff)
+ } else {
+ // conservative - everything might be a pointer
+ for off := child.argoff; off < child.argoff+child.arglen; off += goarch.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ }
+
+ // Dump fields in the local vars section
+ if stkmap == nil {
+ // No locals information, dump everything.
+ for off := child.arglen; off < s.varp-s.sp; off += goarch.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ } else if stkmap.n < 0 {
+ // Locals size information, dump just the locals.
+ size := uintptr(-stkmap.n)
+ for off := s.varp - size - s.sp; off < s.varp-s.sp; off += goarch.PtrSize {
+ dumpint(fieldKindPtr)
+ dumpint(uint64(off))
+ }
+ } else if stkmap.n > 0 {
+ // Locals bitmap information, scan just the pointers in
+ // locals.
+ dumpbv(&bv, s.varp-uintptr(bv.n)*goarch.PtrSize-s.sp)
+ }
+ dumpint(fieldKindEol)
+
+ // Record arg info for parent.
+ child.argoff = s.argp - s.fp
+ child.arglen = s.argBytes()
+ child.sp = (*uint8)(unsafe.Pointer(s.sp))
+ child.depth++
+ stkmap = (*stackmap)(funcdata(f, abi.FUNCDATA_ArgsPointerMaps))
+ if stkmap != nil {
+ child.args = stackmapdata(stkmap, pcdata)
+ } else {
+ child.args.n = -1
+ }
+ return
+}
+
+func dumpgoroutine(gp *g) {
+ var sp, pc, lr uintptr
+ if gp.syscallsp != 0 {
+ sp = gp.syscallsp
+ pc = gp.syscallpc
+ lr = 0
+ } else {
+ sp = gp.sched.sp
+ pc = gp.sched.pc
+ lr = gp.sched.lr
+ }
+
+ dumpint(tagGoroutine)
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ dumpint(uint64(sp))
+ dumpint(gp.goid)
+ dumpint(uint64(gp.gopc))
+ dumpint(uint64(readgstatus(gp)))
+ dumpbool(isSystemGoroutine(gp, false))
+ dumpbool(false) // isbackground
+ dumpint(uint64(gp.waitsince))
+ dumpstr(gp.waitreason.String())
+ dumpint(uint64(uintptr(gp.sched.ctxt)))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp.m))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp._defer))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp._panic))))
+
+ // dump stack
+ var child childInfo
+ child.args.n = -1
+ child.arglen = 0
+ child.sp = nil
+ child.depth = 0
+ var u unwinder
+ for u.initAt(pc, sp, lr, gp, 0); u.valid(); u.next() {
+ dumpframe(&u.frame, &child)
+ }
+
+ // dump defer & panic records
+ for d := gp._defer; d != nil; d = d.link {
+ dumpint(tagDefer)
+ dumpint(uint64(uintptr(unsafe.Pointer(d))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ dumpint(uint64(d.sp))
+ dumpint(uint64(d.pc))
+ fn := *(**funcval)(unsafe.Pointer(&d.fn))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn))))
+ if d.fn == nil {
+ // d.fn can be nil for open-coded defers
+ dumpint(uint64(0))
+ } else {
+ dumpint(uint64(uintptr(unsafe.Pointer(fn.fn))))
+ }
+ dumpint(uint64(uintptr(unsafe.Pointer(d.link))))
+ }
+ for p := gp._panic; p != nil; p = p.link {
+ dumpint(tagPanic)
+ dumpint(uint64(uintptr(unsafe.Pointer(p))))
+ dumpint(uint64(uintptr(unsafe.Pointer(gp))))
+ eface := efaceOf(&p.arg)
+ dumpint(uint64(uintptr(unsafe.Pointer(eface._type))))
+ dumpint(uint64(uintptr(unsafe.Pointer(eface.data))))
+ dumpint(0) // was p->defer, no longer recorded
+ dumpint(uint64(uintptr(unsafe.Pointer(p.link))))
+ }
+}
+
+func dumpgs() {
+ assertWorldStopped()
+
+ // goroutines & stacks
+ forEachG(func(gp *g) {
+ status := readgstatus(gp) // The world is stopped so gp will not be in a scan state.
+ switch status {
+ default:
+ print("runtime: unexpected G.status ", hex(status), "\n")
+ throw("dumpgs in STW - bad status")
+ case _Gdead:
+ // ok
+ case _Grunnable,
+ _Gsyscall,
+ _Gwaiting:
+ dumpgoroutine(gp)
+ }
+ })
+}
+
+func finq_callback(fn *funcval, obj unsafe.Pointer, nret uintptr, fint *_type, ot *ptrtype) {
+ dumpint(tagQueuedFinalizer)
+ dumpint(uint64(uintptr(obj)))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fn.fn))))
+ dumpint(uint64(uintptr(unsafe.Pointer(fint))))
+ dumpint(uint64(uintptr(unsafe.Pointer(ot))))
+}
+
+func dumproots() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ // TODO(mwhudson): dump datamask etc from all objects
+ // data segment
+ dumpint(tagData)
+ dumpint(uint64(firstmoduledata.data))
+ dumpmemrange(unsafe.Pointer(firstmoduledata.data), firstmoduledata.edata-firstmoduledata.data)
+ dumpfields(firstmoduledata.gcdatamask)
+
+ // bss segment
+ dumpint(tagBSS)
+ dumpint(uint64(firstmoduledata.bss))
+ dumpmemrange(unsafe.Pointer(firstmoduledata.bss), firstmoduledata.ebss-firstmoduledata.bss)
+ dumpfields(firstmoduledata.gcbssmask)
+
+ // Finalizer specials attached to objects in in-use spans.
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ // Finalizers
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialFinalizer {
+ continue
+ }
+ spf := (*specialfinalizer)(unsafe.Pointer(sp))
+ p := unsafe.Pointer(s.base() + uintptr(spf.special.offset))
+ dumpfinalizer(p, spf.fn, spf.fint, spf.ot)
+ }
+ }
+ }
+
+ // Finalizer queue
+ iterate_finq(finq_callback)
+}
+
+// Bit vector of free marks.
+// Needs to be as big as the largest number of objects per span.
+var freemark [_PageSize / 8]bool
+
+func dumpobjs() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ p := s.base()
+ size := s.elemsize
+ n := (s.npages << _PageShift) / size
+ if n > uintptr(len(freemark)) {
+ throw("freemark array doesn't have enough entries")
+ }
+
+ for freeIndex := uintptr(0); freeIndex < s.nelems; freeIndex++ {
+ if s.isFree(freeIndex) {
+ freemark[freeIndex] = true
+ }
+ }
+
+ for j := uintptr(0); j < n; j, p = j+1, p+size {
+ if freemark[j] {
+ freemark[j] = false
+ continue
+ }
+ dumpobj(unsafe.Pointer(p), size, makeheapobjbv(p, size))
+ }
+ }
+}
+
+func dumpparams() {
+ dumpint(tagParams)
+ x := uintptr(1)
+ if *(*byte)(unsafe.Pointer(&x)) == 1 {
+ dumpbool(false) // little-endian ptrs
+ } else {
+ dumpbool(true) // big-endian ptrs
+ }
+ dumpint(goarch.PtrSize)
+ var arenaStart, arenaEnd uintptr
+ for i1 := range mheap_.arenas {
+ if mheap_.arenas[i1] == nil {
+ continue
+ }
+ for i, ha := range mheap_.arenas[i1] {
+ if ha == nil {
+ continue
+ }
+ base := arenaBase(arenaIdx(i1)<<arenaL1Shift | arenaIdx(i))
+ if arenaStart == 0 || base < arenaStart {
+ arenaStart = base
+ }
+ if base+heapArenaBytes > arenaEnd {
+ arenaEnd = base + heapArenaBytes
+ }
+ }
+ }
+ dumpint(uint64(arenaStart))
+ dumpint(uint64(arenaEnd))
+ dumpstr(goarch.GOARCH)
+ dumpstr(buildVersion)
+ dumpint(uint64(ncpu))
+}
+
+func itab_callback(tab *itab) {
+ t := tab._type
+ dumptype(t)
+ dumpint(tagItab)
+ dumpint(uint64(uintptr(unsafe.Pointer(tab))))
+ dumpint(uint64(uintptr(unsafe.Pointer(t))))
+}
+
+func dumpitabs() {
+ iterate_itabs(itab_callback)
+}
+
+func dumpms() {
+ for mp := allm; mp != nil; mp = mp.alllink {
+ dumpint(tagOSThread)
+ dumpint(uint64(uintptr(unsafe.Pointer(mp))))
+ dumpint(uint64(mp.id))
+ dumpint(mp.procid)
+ }
+}
+
+//go:systemstack
+func dumpmemstats(m *MemStats) {
+ assertWorldStopped()
+
+ // These ints should be identical to the exported
+ // MemStats structure and should be ordered the same
+ // way too.
+ dumpint(tagMemStats)
+ dumpint(m.Alloc)
+ dumpint(m.TotalAlloc)
+ dumpint(m.Sys)
+ dumpint(m.Lookups)
+ dumpint(m.Mallocs)
+ dumpint(m.Frees)
+ dumpint(m.HeapAlloc)
+ dumpint(m.HeapSys)
+ dumpint(m.HeapIdle)
+ dumpint(m.HeapInuse)
+ dumpint(m.HeapReleased)
+ dumpint(m.HeapObjects)
+ dumpint(m.StackInuse)
+ dumpint(m.StackSys)
+ dumpint(m.MSpanInuse)
+ dumpint(m.MSpanSys)
+ dumpint(m.MCacheInuse)
+ dumpint(m.MCacheSys)
+ dumpint(m.BuckHashSys)
+ dumpint(m.GCSys)
+ dumpint(m.OtherSys)
+ dumpint(m.NextGC)
+ dumpint(m.LastGC)
+ dumpint(m.PauseTotalNs)
+ for i := 0; i < 256; i++ {
+ dumpint(m.PauseNs[i])
+ }
+ dumpint(uint64(m.NumGC))
+}
+
+func dumpmemprof_callback(b *bucket, nstk uintptr, pstk *uintptr, size, allocs, frees uintptr) {
+ stk := (*[100000]uintptr)(unsafe.Pointer(pstk))
+ dumpint(tagMemProf)
+ dumpint(uint64(uintptr(unsafe.Pointer(b))))
+ dumpint(uint64(size))
+ dumpint(uint64(nstk))
+ for i := uintptr(0); i < nstk; i++ {
+ pc := stk[i]
+ f := findfunc(pc)
+ if !f.valid() {
+ var buf [64]byte
+ n := len(buf)
+ n--
+ buf[n] = ')'
+ if pc == 0 {
+ n--
+ buf[n] = '0'
+ } else {
+ for pc > 0 {
+ n--
+ buf[n] = "0123456789abcdef"[pc&15]
+ pc >>= 4
+ }
+ }
+ n--
+ buf[n] = 'x'
+ n--
+ buf[n] = '0'
+ n--
+ buf[n] = '('
+ dumpslice(buf[n:])
+ dumpstr("?")
+ dumpint(0)
+ } else {
+ dumpstr(funcname(f))
+ if i > 0 && pc > f.entry() {
+ pc--
+ }
+ file, line := funcline(f, pc)
+ dumpstr(file)
+ dumpint(uint64(line))
+ }
+ }
+ dumpint(uint64(allocs))
+ dumpint(uint64(frees))
+}
+
+func dumpmemprof() {
+ // To protect mheap_.allspans.
+ assertWorldStopped()
+
+ iterate_memprof(dumpmemprof_callback)
+ for _, s := range mheap_.allspans {
+ if s.state.get() != mSpanInUse {
+ continue
+ }
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialProfile {
+ continue
+ }
+ spp := (*specialprofile)(unsafe.Pointer(sp))
+ p := s.base() + uintptr(spp.special.offset)
+ dumpint(tagAllocSample)
+ dumpint(uint64(p))
+ dumpint(uint64(uintptr(unsafe.Pointer(spp.b))))
+ }
+ }
+}
+
+var dumphdr = []byte("go1.7 heap dump\n")
+
+func mdump(m *MemStats) {
+ assertWorldStopped()
+
+ // make sure we're done sweeping
+ for _, s := range mheap_.allspans {
+ if s.state.get() == mSpanInUse {
+ s.ensureSwept()
+ }
+ }
+ memclrNoHeapPointers(unsafe.Pointer(&typecache), unsafe.Sizeof(typecache))
+ dwrite(unsafe.Pointer(&dumphdr[0]), uintptr(len(dumphdr)))
+ dumpparams()
+ dumpitabs()
+ dumpobjs()
+ dumpgs()
+ dumpms()
+ dumproots()
+ dumpmemstats(m)
+ dumpmemprof()
+ dumpint(tagEOF)
+ flush()
+}
+
+func writeheapdump_m(fd uintptr, m *MemStats) {
+ assertWorldStopped()
+
+ gp := getg()
+ casGToWaiting(gp.m.curg, _Grunning, waitReasonDumpingHeap)
+
+ // Set dump file.
+ dumpfd = fd
+
+ // Call dump routine.
+ mdump(m)
+
+ // Reset dump file.
+ dumpfd = 0
+ if tmpbuf != nil {
+ sysFree(unsafe.Pointer(&tmpbuf[0]), uintptr(len(tmpbuf)), &memstats.other_sys)
+ tmpbuf = nil
+ }
+
+ casgstatus(gp.m.curg, _Gwaiting, _Grunning)
+}
+
+// dumpint() the kind & offset of each field in an object.
+func dumpfields(bv bitvector) {
+ dumpbv(&bv, 0)
+ dumpint(fieldKindEol)
+}
+
+func makeheapobjbv(p uintptr, size uintptr) bitvector {
+ // Extend the temp buffer if necessary.
+ nptr := size / goarch.PtrSize
+ if uintptr(len(tmpbuf)) < nptr/8+1 {
+ if tmpbuf != nil {
+ sysFree(unsafe.Pointer(&tmpbuf[0]), uintptr(len(tmpbuf)), &memstats.other_sys)
+ }
+ n := nptr/8 + 1
+ p := sysAlloc(n, &memstats.other_sys)
+ if p == nil {
+ throw("heapdump: out of memory")
+ }
+ tmpbuf = (*[1 << 30]byte)(p)[:n]
+ }
+ // Convert heap bitmap to pointer bitmap.
+ for i := uintptr(0); i < nptr/8+1; i++ {
+ tmpbuf[i] = 0
+ }
+
+ hbits := heapBitsForAddr(p, size)
+ for {
+ var addr uintptr
+ hbits, addr = hbits.next()
+ if addr == 0 {
+ break
+ }
+ i := (addr - p) / goarch.PtrSize
+ tmpbuf[i/8] |= 1 << (i % 8)
+ }
+ return bitvector{int32(nptr), &tmpbuf[0]}
+}
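
The bit-packing at the end of makeheapobjbv above (tmpbuf[i/8] |= 1 << (i%8)) is the usual dense-bitmap encoding: one bit per pointer-sized word of the object. A small standalone sketch of the same packing and read-back; the names and sample word offsets are illustrative, not taken from the runtime:

package main

import "fmt"

func main() {
	const nptr = 20               // object size in pointer-sized words
	buf := make([]byte, nptr/8+1) // one bit per word, rounded up

	// Mark a few words as pointer slots, the same way makeheapobjbv does.
	for _, i := range []uint{0, 3, 17} {
		buf[i/8] |= 1 << (i % 8)
	}

	// Read the bitmap back.
	for i := uint(0); i < nptr; i++ {
		if buf[i/8]&(1<<(i%8)) != 0 {
			fmt.Printf("word %d holds a pointer\n", i)
		}
	}
}
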
diff --git a/src/runtime/histogram.go b/src/runtime/histogram.go
new file mode 100644
index 0000000..43dfe61
--- /dev/null
+++ b/src/runtime/histogram.go
@@ -0,0 +1,190 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // For the time histogram type, we use an HDR histogram.
+ // Values are placed in buckets based solely on the most
+ // significant set bit. Thus, buckets are power-of-2 sized.
+ // Values are then placed into sub-buckets based on the value of
+ // the next timeHistSubBucketBits most significant bits. Thus,
+ // sub-buckets are linear within a bucket.
+ //
+ // Therefore, the number of sub-buckets (timeHistNumSubBuckets)
+ // defines the error. This error may be computed as
+ // 1/timeHistNumSubBuckets*100%. For example, for 16 sub-buckets
+ // per bucket the error is approximately 6%.
+ //
+ // The number of buckets (timeHistNumBuckets), on the
+ // other hand, defines the range. To avoid producing a large number
+ // of buckets that are close together, especially for small numbers
+ // (e.g. 1, 2, 3, 4, 5 ns) that aren't very useful, timeHistNumBuckets
+ // is defined in terms of the least significant bit (timeHistMinBucketBits)
+ // that needs to be set before we start bucketing and the most
+ // significant bit (timeHistMaxBucketBits) that we bucket before we just
+ // dump it into a catch-all bucket.
+ //
+ // As an example, consider the configuration:
+ //
+ // timeHistMinBucketBits = 9
+ // timeHistMaxBucketBits = 48
+ // timeHistSubBucketBits = 2
+ //
+ // Then:
+ //
+ // 011000001
+ // ^--
+ // │ ^
+ // │ └---- Next 2 bits -> sub-bucket 3
+ // └------- Bit 9 unset -> bucket 0
+ //
+ // 110000001
+ // ^--
+ // │ ^
+ // │ └---- Next 2 bits -> sub-bucket 2
+ // └------- Bit 9 set -> bucket 1
+ //
+ // 1000000010
+ // ^-- ^
+ // │ ^ └-- Lower bits ignored
+ // │ └---- Next 2 bits -> sub-bucket 0
+ // └------- Bit 10 set -> bucket 2
+ //
+ // Following this pattern, bucket 38 will have the bit 46 set. We don't
+ // have any buckets for higher values, so we spill the rest into an overflow
+ // bucket containing values of 2^47-1 nanoseconds or approx. 1 day or more.
+ // This range is more than enough to handle durations produced by the runtime.
+ timeHistMinBucketBits = 9
+ timeHistMaxBucketBits = 48 // Note that this is exclusive; 1 higher than the actual range.
+ timeHistSubBucketBits = 2
+ timeHistNumSubBuckets = 1 << timeHistSubBucketBits
+ timeHistNumBuckets = timeHistMaxBucketBits - timeHistMinBucketBits + 1
+ // Two extra buckets, one for underflow, one for overflow.
+ timeHistTotalBuckets = timeHistNumBuckets*timeHistNumSubBuckets + 2
+)
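
A standalone sketch of the bucketing arithmetic described in the comment above, reproducing its three worked examples. It uses math/bits.Len64 in place of the runtime's internal sys.Len64, and the helper name bucketFor is illustrative only:

package main

import (
	"fmt"
	"math/bits"
)

const (
	minBucketBits = 9 // timeHistMinBucketBits
	subBucketBits = 2 // timeHistSubBucketBits
	numSubBuckets = 1 << subBucketBits
)

// bucketFor maps a non-negative duration in nanoseconds to its
// bucket and sub-bucket indices, mirroring timeHistogram.record.
func bucketFor(d uint64) (bucket, subBucket uint) {
	bucketBit := uint(bits.Len64(d))
	if bucketBit < minBucketBits {
		bucketBit = minBucketBits
		bucket = 0
	} else {
		bucket = bucketBit - minBucketBits + 1
	}
	subBucket = uint(d>>(bucketBit-1-subBucketBits)) % numSubBuckets
	return
}

func main() {
	for _, d := range []uint64{0b011000001, 0b110000001, 0b1000000010} {
		b, sb := bucketFor(d)
		fmt.Printf("%b -> bucket %d, sub-bucket %d\n", d, b, sb)
	}
	// Prints bucket 0/sub-bucket 3, bucket 1/sub-bucket 2, and
	// bucket 2/sub-bucket 0, matching the three examples in the comment.
}
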
+
+// timeHistogram represents a distribution of durations in
+// nanoseconds.
+//
+// The accuracy and range of the histogram are defined by the
+// timeHistSubBucketBits and timeHistNumBuckets constants.
+//
+// It is an HDR histogram with exponentially-distributed
+// buckets and linearly distributed sub-buckets.
+//
+// The histogram is safe for concurrent reads and writes.
+type timeHistogram struct {
+ counts [timeHistNumBuckets * timeHistNumSubBuckets]atomic.Uint64
+
+ // underflow counts all the times we got a negative duration
+ // sample. Because of how time works on some platforms, it's
+ // possible to measure negative durations. We could ignore them,
+ // but we record them anyway because it's better to have some
+ // signal that it's happening than just missing samples.
+ underflow atomic.Uint64
+
+ // overflow counts all the times we got a duration that exceeded
+ // the range counts represents.
+ overflow atomic.Uint64
+}
+
+// record adds the given duration to the distribution.
+//
+// Disallow preemptions and stack growths because this function
+// may run in sensitive locations.
+//
+//go:nosplit
+func (h *timeHistogram) record(duration int64) {
+ // If the duration is negative, capture that in underflow.
+ if duration < 0 {
+ h.underflow.Add(1)
+ return
+ }
+ // bucketBit is the target bit for the bucket, which is usually the
+ // highest 1 bit, but if the value is less than the minimum, it is the
+ // highest 1 bit of the minimum (which will be zero in the duration).
+ //
+ // bucket is the bucket index, which is the bucketBit minus the
+ // highest bit of the minimum, plus one to leave room for the catch-all
+ // bucket for samples lower than the minimum.
+ var bucketBit, bucket uint
+ if l := sys.Len64(uint64(duration)); l < timeHistMinBucketBits {
+ bucketBit = timeHistMinBucketBits
+ bucket = 0 // bucketBit - timeHistMinBucketBits
+ } else {
+ bucketBit = uint(l)
+ bucket = bucketBit - timeHistMinBucketBits + 1
+ }
+ // If the bucket we computed is greater than the number of buckets,
+ // count that in overflow.
+ if bucket >= timeHistNumBuckets {
+ h.overflow.Add(1)
+ return
+ }
+ // The sub-bucket index is just the next timeHistSubBucketBits after the bucketBit.
+ subBucket := uint(duration>>(bucketBit-1-timeHistSubBucketBits)) % timeHistNumSubBuckets
+ h.counts[bucket*timeHistNumSubBuckets+subBucket].Add(1)
+}
+
+const (
+ fInf = 0x7FF0000000000000
+ fNegInf = 0xFFF0000000000000
+)
+
+func float64Inf() float64 {
+ inf := uint64(fInf)
+ return *(*float64)(unsafe.Pointer(&inf))
+}
+
+func float64NegInf() float64 {
+ inf := uint64(fNegInf)
+ return *(*float64)(unsafe.Pointer(&inf))
+}
+
+// timeHistogramMetricsBuckets generates a slice of boundaries for
+// the timeHistogram. These boundaries are represented in seconds,
+// not nanoseconds like the timeHistogram represents durations.
+func timeHistogramMetricsBuckets() []float64 {
+ b := make([]float64, timeHistTotalBuckets+1)
+ // Underflow bucket.
+ b[0] = float64NegInf()
+
+ for j := 0; j < timeHistNumSubBuckets; j++ {
+ // No bucket bit for the first few buckets. Just sub-bucket bits after the
+ // min bucket bit.
+ bucketNanos := uint64(j) << (timeHistMinBucketBits - 1 - timeHistSubBucketBits)
+ // Convert nanoseconds to seconds via a division.
+ // These values will all be exactly representable by a float64.
+ b[j+1] = float64(bucketNanos) / 1e9
+ }
+ // Generate the rest of the buckets. It's easier to reason
+ // about if we cut out the 0'th bucket.
+ for i := timeHistMinBucketBits; i < timeHistMaxBucketBits; i++ {
+ for j := 0; j < timeHistNumSubBuckets; j++ {
+ // Set the bucket bit.
+ bucketNanos := uint64(1) << (i - 1)
+ // Set the sub-bucket bits.
+ bucketNanos |= uint64(j) << (i - 1 - timeHistSubBucketBits)
+ // The index for this bucket is going to be the (i+1)'th bucket
+ // (note that we're starting from zero, but handled the first bucket
+ // earlier, so we need to compensate), and the j'th sub bucket.
+ // Add 1 because we left space for -Inf.
+ bucketIndex := (i-timeHistMinBucketBits+1)*timeHistNumSubBuckets + j + 1
+ // Convert nanoseconds to seconds via a division.
+ // These values will all be exactly representable by a float64.
+ b[bucketIndex] = float64(bucketNanos) / 1e9
+ }
+ }
+ // Overflow bucket.
+ b[len(b)-2] = float64(uint64(1)<<(timeHistMaxBucketBits-1)) / 1e9
+ b[len(b)-1] = float64Inf()
+ return b
+}
diff --git a/src/runtime/histogram_test.go b/src/runtime/histogram_test.go
new file mode 100644
index 0000000..5246e86
--- /dev/null
+++ b/src/runtime/histogram_test.go
@@ -0,0 +1,112 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ . "runtime"
+ "testing"
+)
+
+var dummyTimeHistogram TimeHistogram
+
+func TestTimeHistogram(t *testing.T) {
+ // We need to use a global dummy because this
+ // could get stack-allocated with a non-8-byte alignment.
+ // The result of this bad alignment is a segfault on
+ // 32-bit platforms when calling Record.
+ h := &dummyTimeHistogram
+
+ // Record exactly one sample in each bucket.
+ for j := 0; j < TimeHistNumSubBuckets; j++ {
+ v := int64(j) << (TimeHistMinBucketBits - 1 - TimeHistSubBucketBits)
+ for k := 0; k < j; k++ {
+ // Record a number of times equal to the bucket index.
+ h.Record(v)
+ }
+ }
+ for i := TimeHistMinBucketBits; i < TimeHistMaxBucketBits; i++ {
+ base := int64(1) << (i - 1)
+ for j := 0; j < TimeHistNumSubBuckets; j++ {
+ v := int64(j) << (i - 1 - TimeHistSubBucketBits)
+ for k := 0; k < (i+1-TimeHistMinBucketBits)*TimeHistNumSubBuckets+j; k++ {
+ // Record a number of times equal to the bucket index.
+ h.Record(base + v)
+ }
+ }
+ }
+ // Hit the underflow and overflow buckets.
+ h.Record(int64(-1))
+ h.Record(math.MaxInt64)
+ h.Record(math.MaxInt64)
+
+ // Check to make sure there's exactly one count in each
+ // bucket.
+ for i := 0; i < TimeHistNumBuckets; i++ {
+ for j := 0; j < TimeHistNumSubBuckets; j++ {
+ c, ok := h.Count(i, j)
+ if !ok {
+ t.Errorf("unexpected invalid bucket: (%d, %d)", i, j)
+ } else if idx := uint64(i*TimeHistNumSubBuckets + j); c != idx {
+ t.Errorf("bucket (%d, %d) has count that is not %d: %d", i, j, idx, c)
+ }
+ }
+ }
+ c, ok := h.Count(-1, 0)
+ if ok {
+ t.Errorf("expected to hit underflow bucket: (%d, %d)", -1, 0)
+ }
+ if c != 1 {
+ t.Errorf("overflow bucket has count that is not 1: %d", c)
+ }
+
+ c, ok = h.Count(TimeHistNumBuckets+1, 0)
+ if ok {
+ t.Errorf("expected to hit overflow bucket: (%d, %d)", TimeHistNumBuckets+1, 0)
+ }
+ if c != 2 {
+ t.Errorf("overflow bucket has count that is not 2: %d", c)
+ }
+
+ dummyTimeHistogram = TimeHistogram{}
+}
+
+func TestTimeHistogramMetricsBuckets(t *testing.T) {
+ buckets := TimeHistogramMetricsBuckets()
+
+ nonInfBucketsLen := TimeHistNumSubBuckets * TimeHistNumBuckets
+ expBucketsLen := nonInfBucketsLen + 3 // Count -Inf, the edge for the overflow bucket, and +Inf.
+ if len(buckets) != expBucketsLen {
+ t.Fatalf("unexpected length of buckets: got %d, want %d", len(buckets), expBucketsLen)
+ }
+ // Check some values.
+ idxToBucket := map[int]float64{
+ 0: math.Inf(-1),
+ 1: 0.0,
+ 2: float64(0x040) / 1e9,
+ 3: float64(0x080) / 1e9,
+ 4: float64(0x0c0) / 1e9,
+ 5: float64(0x100) / 1e9,
+ 6: float64(0x140) / 1e9,
+ 7: float64(0x180) / 1e9,
+ 8: float64(0x1c0) / 1e9,
+ 9: float64(0x200) / 1e9,
+ 10: float64(0x280) / 1e9,
+ 11: float64(0x300) / 1e9,
+ 12: float64(0x380) / 1e9,
+ 13: float64(0x400) / 1e9,
+ 15: float64(0x600) / 1e9,
+ 81: float64(0x8000000) / 1e9,
+ 82: float64(0xa000000) / 1e9,
+ 108: float64(0x380000000) / 1e9,
+ expBucketsLen - 2: float64(0x1<<47) / 1e9,
+ expBucketsLen - 1: math.Inf(1),
+ }
+ for idx, bucket := range idxToBucket {
+ if got, want := buckets[idx], bucket; got != want {
+ t.Errorf("expected bucket %d to have value %e, got %e", idx, want, got)
+ }
+ }
+}
diff --git a/src/runtime/iface.go b/src/runtime/iface.go
new file mode 100644
index 0000000..87f7c20
--- /dev/null
+++ b/src/runtime/iface.go
@@ -0,0 +1,534 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const itabInitSize = 512
+
+var (
+ itabLock mutex // lock for accessing itab table
+ itabTable = &itabTableInit // pointer to current table
+ itabTableInit = itabTableType{size: itabInitSize} // starter table
+)
+
+// Note: change the formula in the mallocgc call in itabAdd if you change these fields.
+type itabTableType struct {
+ size uintptr // length of entries array. Always a power of 2.
+ count uintptr // current number of filled entries.
+ entries [itabInitSize]*itab // really [size] large
+}
+
+func itabHashFunc(inter *interfacetype, typ *_type) uintptr {
+ // compiler has provided some good hash codes for us.
+ return uintptr(inter.Type.Hash ^ typ.Hash)
+}
+
+func getitab(inter *interfacetype, typ *_type, canfail bool) *itab {
+ if len(inter.Methods) == 0 {
+ throw("internal error - misuse of itab")
+ }
+
+ // easy case
+ if typ.TFlag&abi.TFlagUncommon == 0 {
+ if canfail {
+ return nil
+ }
+ name := toRType(&inter.Type).nameOff(inter.Methods[0].Name)
+ panic(&TypeAssertionError{nil, typ, &inter.Type, name.Name()})
+ }
+
+ var m *itab
+
+ // First, look in the existing table to see if we can find the itab we need.
+ // This is by far the most common case, so do it without locks.
+ // Use atomic to ensure we see any previous writes done by the thread
+ // that updates the itabTable field (with atomic.Storep in itabAdd).
+ t := (*itabTableType)(atomic.Loadp(unsafe.Pointer(&itabTable)))
+ if m = t.find(inter, typ); m != nil {
+ goto finish
+ }
+
+ // Not found. Grab the lock and try again.
+ lock(&itabLock)
+ if m = itabTable.find(inter, typ); m != nil {
+ unlock(&itabLock)
+ goto finish
+ }
+
+ // Entry doesn't exist yet. Make a new entry & add it.
+ m = (*itab)(persistentalloc(unsafe.Sizeof(itab{})+uintptr(len(inter.Methods)-1)*goarch.PtrSize, 0, &memstats.other_sys))
+ m.inter = inter
+ m._type = typ
+ // The hash is used in type switches. However, the compiler statically generates itab's
+ // for all interface/type pairs used in switches (which are added to itabTable
+ // in itabsinit). The dynamically-generated itab's never participate in type switches,
+ // and thus the hash is irrelevant.
+ // Note: m.hash is _not_ the hash used for the runtime itabTable hash table.
+ m.hash = 0
+ m.init()
+ itabAdd(m)
+ unlock(&itabLock)
+finish:
+ if m.fun[0] != 0 {
+ return m
+ }
+ if canfail {
+ return nil
+ }
+ // this can only happen if the conversion
+ // was already done once using the , ok form
+ // and we have a cached negative result.
+ // The cached result doesn't record which
+ // interface function was missing, so initialize
+ // the itab again to get the missing function name.
+ panic(&TypeAssertionError{concrete: typ, asserted: &inter.Type, missingMethod: m.init()})
+}
+
+// find finds the given interface/type pair in t.
+// Returns nil if the given interface/type pair isn't present.
+func (t *itabTableType) find(inter *interfacetype, typ *_type) *itab {
+ // Implemented using quadratic probing.
+ // Probe sequence is h(i) = h0 + i*(i+1)/2 mod 2^k.
+ // We're guaranteed to hit all table entries using this probe sequence.
+ mask := t.size - 1
+ h := itabHashFunc(inter, typ) & mask
+ for i := uintptr(1); ; i++ {
+ p := (**itab)(add(unsafe.Pointer(&t.entries), h*goarch.PtrSize))
+ // Use atomic read here so if we see m != nil, we also see
+ // the initializations of the fields of m.
+ // m := *p
+ m := (*itab)(atomic.Loadp(unsafe.Pointer(p)))
+ if m == nil {
+ return nil
+ }
+ if m.inter == inter && m._type == typ {
+ return m
+ }
+ h += i
+ h &= mask
+ }
+}
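
The probe sequence used by find and add (h += i on the i'th step, so the offsets from h0 are triangular numbers) visits every slot of a power-of-two table before repeating, which is what guarantees the loops terminate. A tiny standalone check of that property for one table size; the constants here are illustrative:

package main

import "fmt"

func main() {
	const size = 16 // any power of two, like itabTableType.size
	mask := uint(size - 1)

	seen := make(map[uint]bool)
	h := uint(7) & mask // arbitrary starting hash
	for i := uint(1); len(seen) < size; i++ {
		seen[h] = true
		h = (h + i) & mask // same step as find/add above
	}
	fmt.Printf("probe sequence visited %d of %d slots\n", len(seen), size)
}
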
+
+// itabAdd adds the given itab to the itab hash table.
+// itabLock must be held.
+func itabAdd(m *itab) {
+ // Bugs can lead to calling this while mallocing is set,
+ // typically because this is called while panicking.
+ // Crash reliably, rather than only when we need to grow
+ // the hash table.
+ if getg().m.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+
+ t := itabTable
+ if t.count >= 3*(t.size/4) { // 75% load factor
+ // Grow hash table.
+ // t2 = new(itabTableType) + some additional entries
+ // We lie and tell malloc we want pointer-free memory because
+ // all the pointed-to values are not in the heap.
+ t2 := (*itabTableType)(mallocgc((2+2*t.size)*goarch.PtrSize, nil, true))
+ t2.size = t.size * 2
+
+ // Copy over entries.
+ // Note: while copying, other threads may look for an itab and
+ // fail to find it. That's ok, they will then try to get the itab lock
+ // and as a consequence wait until this copying is complete.
+ iterate_itabs(t2.add)
+ if t2.count != t.count {
+ throw("mismatched count during itab table copy")
+ }
+ // Publish new hash table. Use an atomic write: see comment in getitab.
+ atomicstorep(unsafe.Pointer(&itabTable), unsafe.Pointer(t2))
+ // Adopt the new table as our own.
+ t = itabTable
+ // Note: the old table can be GC'ed here.
+ }
+ t.add(m)
+}
+
+// add adds the given itab to itab table t.
+// itabLock must be held.
+func (t *itabTableType) add(m *itab) {
+ // See comment in find about the probe sequence.
+ // Insert new itab in the first empty spot in the probe sequence.
+ mask := t.size - 1
+ h := itabHashFunc(m.inter, m._type) & mask
+ for i := uintptr(1); ; i++ {
+ p := (**itab)(add(unsafe.Pointer(&t.entries), h*goarch.PtrSize))
+ m2 := *p
+ if m2 == m {
+ // A given itab may be used in more than one module
+ // and thanks to the way global symbol resolution works, the
+ // pointed-to itab may already have been inserted into the
+ // global 'hash'.
+ return
+ }
+ if m2 == nil {
+ // Use atomic write here so if a reader sees m, it also
+ // sees the correctly initialized fields of m.
+ // NoWB is ok because m is not in heap memory.
+ // *p = m
+ atomic.StorepNoWB(unsafe.Pointer(p), unsafe.Pointer(m))
+ t.count++
+ return
+ }
+ h += i
+ h &= mask
+ }
+}
+
+// init fills in the m.fun array with all the code pointers for
+// the m.inter/m._type pair. If the type does not implement the interface,
+// it sets m.fun[0] to 0 and returns the name of an interface function that is missing.
+// It is ok to call this multiple times on the same m, even concurrently.
+func (m *itab) init() string {
+ inter := m.inter
+ typ := m._type
+ x := typ.Uncommon()
+
+ // both inter and typ have their methods sorted by name,
+ // and interface names are unique,
+ // so we can iterate over both in lock step;
+ // the loop is O(ni+nt) not O(ni*nt).
+ ni := len(inter.Methods)
+ nt := int(x.Mcount)
+ xmhdr := (*[1 << 16]abi.Method)(add(unsafe.Pointer(x), uintptr(x.Moff)))[:nt:nt]
+ j := 0
+ methods := (*[1 << 16]unsafe.Pointer)(unsafe.Pointer(&m.fun[0]))[:ni:ni]
+ var fun0 unsafe.Pointer
+imethods:
+ for k := 0; k < ni; k++ {
+ i := &inter.Methods[k]
+ itype := toRType(&inter.Type).typeOff(i.Typ)
+ name := toRType(&inter.Type).nameOff(i.Name)
+ iname := name.Name()
+ ipkg := pkgPath(name)
+ if ipkg == "" {
+ ipkg = inter.PkgPath.Name()
+ }
+ for ; j < nt; j++ {
+ t := &xmhdr[j]
+ rtyp := toRType(typ)
+ tname := rtyp.nameOff(t.Name)
+ if rtyp.typeOff(t.Mtyp) == itype && tname.Name() == iname {
+ pkgPath := pkgPath(tname)
+ if pkgPath == "" {
+ pkgPath = rtyp.nameOff(x.PkgPath).Name()
+ }
+ if tname.IsExported() || pkgPath == ipkg {
+ if m != nil {
+ ifn := rtyp.textOff(t.Ifn)
+ if k == 0 {
+ fun0 = ifn // we'll set m.fun[0] at the end
+ } else {
+ methods[k] = ifn
+ }
+ }
+ continue imethods
+ }
+ }
+ }
+ // didn't find method
+ m.fun[0] = 0
+ return iname
+ }
+ m.fun[0] = uintptr(fun0)
+ return ""
+}
+
+func itabsinit() {
+ lockInit(&itabLock, lockRankItab)
+ lock(&itabLock)
+ for _, md := range activeModules() {
+ for _, i := range md.itablinks {
+ itabAdd(i)
+ }
+ }
+ unlock(&itabLock)
+}
+
+// panicdottypeE is called when doing an e.(T) conversion and the conversion fails.
+// have = the dynamic type we have.
+// want = the static type we're trying to convert to.
+// iface = the static type we're converting from.
+func panicdottypeE(have, want, iface *_type) {
+ panic(&TypeAssertionError{iface, have, want, ""})
+}
+
+// panicdottypeI is called when doing an i.(T) conversion and the conversion fails.
+// Same args as panicdottypeE, but "have" is the dynamic itab we have.
+func panicdottypeI(have *itab, want, iface *_type) {
+ var t *_type
+ if have != nil {
+ t = have._type
+ }
+ panicdottypeE(t, want, iface)
+}
+
+// panicnildottype is called when doing an i.(T) conversion and the interface i is nil.
+// want = the static type we're trying to convert to.
+func panicnildottype(want *_type) {
+ panic(&TypeAssertionError{nil, nil, want, ""})
+ // TODO: Add the static type we're converting from as well.
+ // It might generate a better error message.
+ // Just to match other nil conversion errors, we don't for now.
+}
+
+// The specialized convTx routines need a type descriptor to use when calling mallocgc.
+// We don't need the type to be exact, just to have the correct size, alignment, and pointer-ness.
+// However, when debugging, it'd be nice to have some indication in mallocgc where the types came from,
+// so we use named types here.
+// We then construct interface values of these types,
+// and then extract the type word to use as needed.
+type (
+ uint16InterfacePtr uint16
+ uint32InterfacePtr uint32
+ uint64InterfacePtr uint64
+ stringInterfacePtr string
+ sliceInterfacePtr []byte
+)
+
+var (
+ uint16Eface any = uint16InterfacePtr(0)
+ uint32Eface any = uint32InterfacePtr(0)
+ uint64Eface any = uint64InterfacePtr(0)
+ stringEface any = stringInterfacePtr("")
+ sliceEface any = sliceInterfacePtr(nil)
+
+ uint16Type *_type = efaceOf(&uint16Eface)._type
+ uint32Type *_type = efaceOf(&uint32Eface)._type
+ uint64Type *_type = efaceOf(&uint64Eface)._type
+ stringType *_type = efaceOf(&stringEface)._type
+ sliceType *_type = efaceOf(&sliceEface)._type
+)
+
+// The conv and assert functions below do very similar things.
+// The convXXX functions are guaranteed by the compiler to succeed.
+// The assertXXX functions may fail (either panicking or returning false,
+// depending on whether they are 1-result or 2-result).
+// The convXXX functions succeed on a nil input, whereas the assertXXX
+// functions fail on a nil input.
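
At the language level, the difference described here looks like the following user-level illustration (not runtime code): converting a nil concrete value to an interface succeeds, while asserting on a nil interface fails or panics depending on the form used.

package main

import "fmt"

func main() {
	// Conversion (the convXXX path): a nil concrete value still produces
	// a non-nil interface holding that nil value.
	var s []byte // nil slice
	var e any = s
	fmt.Println(e != nil) // true

	// Assertion (the assertXXX path): a nil interface fails the assertion.
	var x any // nil interface
	_, ok := x.(error)
	fmt.Println(ok) // false: the two-result form reports failure

	// The one-result form panics on failure.
	defer func() { fmt.Println("recovered:", recover() != nil) }()
	_ = x.(error)
}
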
+
+// convT converts a value of type t, which is pointed to by v, to a pointer that can
+// be used as the second word of an interface value.
+func convT(t *_type, v unsafe.Pointer) unsafe.Pointer {
+ if raceenabled {
+ raceReadObjectPC(t, v, getcallerpc(), abi.FuncPCABIInternal(convT))
+ }
+ if msanenabled {
+ msanread(v, t.Size_)
+ }
+ if asanenabled {
+ asanread(v, t.Size_)
+ }
+ x := mallocgc(t.Size_, t, true)
+ typedmemmove(t, x, v)
+ return x
+}
+func convTnoptr(t *_type, v unsafe.Pointer) unsafe.Pointer {
+ // TODO: maybe take size instead of type?
+ if raceenabled {
+ raceReadObjectPC(t, v, getcallerpc(), abi.FuncPCABIInternal(convTnoptr))
+ }
+ if msanenabled {
+ msanread(v, t.Size_)
+ }
+ if asanenabled {
+ asanread(v, t.Size_)
+ }
+
+ x := mallocgc(t.Size_, t, false)
+ memmove(x, v, t.Size_)
+ return x
+}
+
+func convT16(val uint16) (x unsafe.Pointer) {
+ if val < uint16(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ if goarch.BigEndian {
+ x = add(x, 6)
+ }
+ } else {
+ x = mallocgc(2, uint16Type, false)
+ *(*uint16)(x) = val
+ }
+ return
+}
+
+func convT32(val uint32) (x unsafe.Pointer) {
+ if val < uint32(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ if goarch.BigEndian {
+ x = add(x, 4)
+ }
+ } else {
+ x = mallocgc(4, uint32Type, false)
+ *(*uint32)(x) = val
+ }
+ return
+}
+
+func convT64(val uint64) (x unsafe.Pointer) {
+ if val < uint64(len(staticuint64s)) {
+ x = unsafe.Pointer(&staticuint64s[val])
+ } else {
+ x = mallocgc(8, uint64Type, false)
+ *(*uint64)(x) = val
+ }
+ return
+}
+
+func convTstring(val string) (x unsafe.Pointer) {
+ if val == "" {
+ x = unsafe.Pointer(&zeroVal[0])
+ } else {
+ x = mallocgc(unsafe.Sizeof(val), stringType, true)
+ *(*string)(x) = val
+ }
+ return
+}
+
+func convTslice(val []byte) (x unsafe.Pointer) {
+ // Note: this must work for any element type, not just byte.
+ if (*slice)(unsafe.Pointer(&val)).array == nil {
+ x = unsafe.Pointer(&zeroVal[0])
+ } else {
+ x = mallocgc(unsafe.Sizeof(val), sliceType, true)
+ *(*[]byte)(x) = val
+ }
+ return
+}
+
+// convI2I returns the new itab to be used for the destination value
+// when converting a value with itab src to the dst interface.
+func convI2I(dst *interfacetype, src *itab) *itab {
+ if src == nil {
+ return nil
+ }
+ if src.inter == dst {
+ return src
+ }
+ return getitab(dst, src._type, false)
+}
+
+func assertI2I(inter *interfacetype, tab *itab) *itab {
+ if tab == nil {
+ // explicit conversions require non-nil interface value.
+ panic(&TypeAssertionError{nil, nil, &inter.Type, ""})
+ }
+ if tab.inter == inter {
+ return tab
+ }
+ return getitab(inter, tab._type, false)
+}
+
+func assertI2I2(inter *interfacetype, i iface) (r iface) {
+ tab := i.tab
+ if tab == nil {
+ return
+ }
+ if tab.inter != inter {
+ tab = getitab(inter, tab._type, true)
+ if tab == nil {
+ return
+ }
+ }
+ r.tab = tab
+ r.data = i.data
+ return
+}
+
+func assertE2I(inter *interfacetype, t *_type) *itab {
+ if t == nil {
+ // explicit conversions require non-nil interface value.
+ panic(&TypeAssertionError{nil, nil, &inter.Type, ""})
+ }
+ return getitab(inter, t, false)
+}
+
+func assertE2I2(inter *interfacetype, e eface) (r iface) {
+ t := e._type
+ if t == nil {
+ return
+ }
+ tab := getitab(inter, t, true)
+ if tab == nil {
+ return
+ }
+ r.tab = tab
+ r.data = e.data
+ return
+}
+
+//go:linkname reflect_ifaceE2I reflect.ifaceE2I
+func reflect_ifaceE2I(inter *interfacetype, e eface, dst *iface) {
+ *dst = iface{assertE2I(inter, e._type), e.data}
+}
+
+//go:linkname reflectlite_ifaceE2I internal/reflectlite.ifaceE2I
+func reflectlite_ifaceE2I(inter *interfacetype, e eface, dst *iface) {
+ *dst = iface{assertE2I(inter, e._type), e.data}
+}
+
+func iterate_itabs(fn func(*itab)) {
+ // Note: only runs during stop the world or with itabLock held,
+ // so no other locks/atomics needed.
+ t := itabTable
+ for i := uintptr(0); i < t.size; i++ {
+ m := *(**itab)(add(unsafe.Pointer(&t.entries), i*goarch.PtrSize))
+ if m != nil {
+ fn(m)
+ }
+ }
+}
+
+// staticuint64s is used to avoid allocating in convTx for small integer values.
+var staticuint64s = [...]uint64{
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+ 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+ 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
+ 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27,
+ 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
+ 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
+ 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f,
+ 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
+ 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
+ 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
+ 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
+ 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
+ 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f,
+ 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
+ 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f,
+ 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+ 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+ 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+ 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
+ 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7,
+ 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf,
+ 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
+ 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf,
+ 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+ 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+ 0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
+ 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf,
+ 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+ 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+ 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7,
+ 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
+}
+
+// The linker redirects a reference of a method that it determined
+// unreachable to a reference to this function, so it will throw if
+// ever called.
+func unreachableMethod() {
+ throw("unreachable method called. linker bug?")
+}
diff --git a/src/runtime/iface_test.go b/src/runtime/iface_test.go
new file mode 100644
index 0000000..06f6eeb
--- /dev/null
+++ b/src/runtime/iface_test.go
@@ -0,0 +1,439 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+type I1 interface {
+ Method1()
+}
+
+type I2 interface {
+ Method1()
+ Method2()
+}
+
+type TS uint16
+type TM uintptr
+type TL [2]uintptr
+
+func (TS) Method1() {}
+func (TS) Method2() {}
+func (TM) Method1() {}
+func (TM) Method2() {}
+func (TL) Method1() {}
+func (TL) Method2() {}
+
+type T8 uint8
+type T16 uint16
+type T32 uint32
+type T64 uint64
+type Tstr string
+type Tslice []byte
+
+func (T8) Method1() {}
+func (T16) Method1() {}
+func (T32) Method1() {}
+func (T64) Method1() {}
+func (Tstr) Method1() {}
+func (Tslice) Method1() {}
+
+var (
+ e any
+ e_ any
+ i1 I1
+ i2 I2
+ ts TS
+ tm TM
+ tl TL
+ ok bool
+)
+
+// Issue 9370
+func TestCmpIfaceConcreteAlloc(t *testing.T) {
+ if runtime.Compiler != "gc" {
+ t.Skip("skipping on non-gc compiler")
+ }
+
+ n := testing.AllocsPerRun(1, func() {
+ _ = e == ts
+ _ = i1 == ts
+ _ = e == 1
+ })
+
+ if n > 0 {
+ t.Fatalf("iface cmp allocs=%v; want 0", n)
+ }
+}
+
+func BenchmarkEqEfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = e == ts
+ }
+}
+
+func BenchmarkEqIfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = i1 == ts
+ }
+}
+
+func BenchmarkNeEfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = e != ts
+ }
+}
+
+func BenchmarkNeIfaceConcrete(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ _ = i1 != ts
+ }
+}
+
+func BenchmarkConvT2EByteSized(b *testing.B) {
+ b.Run("bool", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = yes
+ }
+ })
+ b.Run("uint8", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = eight8
+ }
+ })
+}
+
+func BenchmarkConvT2ESmall(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = ts
+ }
+}
+
+func BenchmarkConvT2EUintptr(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = tm
+ }
+}
+
+func BenchmarkConvT2ELarge(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = tl
+ }
+}
+
+func BenchmarkConvT2ISmall(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = ts
+ }
+}
+
+func BenchmarkConvT2IUintptr(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = tm
+ }
+}
+
+func BenchmarkConvT2ILarge(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ i1 = tl
+ }
+}
+
+func BenchmarkConvI2E(b *testing.B) {
+ i2 = tm
+ for i := 0; i < b.N; i++ {
+ e = i2
+ }
+}
+
+func BenchmarkConvI2I(b *testing.B) {
+ i2 = tm
+ for i := 0; i < b.N; i++ {
+ i1 = i2
+ }
+}
+
+func BenchmarkAssertE2T(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ tm = e.(TM)
+ }
+}
+
+func BenchmarkAssertE2TLarge(b *testing.B) {
+ e = tl
+ for i := 0; i < b.N; i++ {
+ tl = e.(TL)
+ }
+}
+
+func BenchmarkAssertE2I(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ i1 = e.(I1)
+ }
+}
+
+func BenchmarkAssertI2T(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ tm = i1.(TM)
+ }
+}
+
+func BenchmarkAssertI2I(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ i2 = i1.(I2)
+ }
+}
+
+func BenchmarkAssertI2E(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ e = i1.(any)
+ }
+}
+
+func BenchmarkAssertE2E(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ e_ = e
+ }
+}
+
+func BenchmarkAssertE2T2(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ tm, ok = e.(TM)
+ }
+}
+
+func BenchmarkAssertE2T2Blank(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = e.(TM)
+ }
+}
+
+func BenchmarkAssertI2E2(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ e, ok = i1.(any)
+ }
+}
+
+func BenchmarkAssertI2E2Blank(b *testing.B) {
+ i1 = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = i1.(any)
+ }
+}
+
+func BenchmarkAssertE2E2(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ e_, ok = e.(any)
+ }
+}
+
+func BenchmarkAssertE2E2Blank(b *testing.B) {
+ e = tm
+ for i := 0; i < b.N; i++ {
+ _, ok = e.(any)
+ }
+}
+
+func TestNonEscapingConvT2E(t *testing.T) {
+ m := make(map[any]bool)
+ m[42] = true
+ if !m[42] {
+ t.Fatalf("42 is not present in the map")
+ }
+ if m[0] {
+ t.Fatalf("0 is present in the map")
+ }
+
+ n := testing.AllocsPerRun(1000, func() {
+ if m[0] {
+ t.Fatalf("0 is present in the map")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestNonEscapingConvT2I(t *testing.T) {
+ m := make(map[I1]bool)
+ m[TM(42)] = true
+ if !m[TM(42)] {
+ t.Fatalf("42 is not present in the map")
+ }
+ if m[TM(0)] {
+ t.Fatalf("0 is present in the map")
+ }
+
+ n := testing.AllocsPerRun(1000, func() {
+ if m[TM(0)] {
+ t.Fatalf("0 is present in the map")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestZeroConvT2x(t *testing.T) {
+ tests := []struct {
+ name string
+ fn func()
+ }{
+ {name: "E8", fn: func() { e = eight8 }}, // any byte-sized value does not allocate
+ {name: "E16", fn: func() { e = zero16 }}, // zero values do not allocate
+ {name: "E32", fn: func() { e = zero32 }},
+ {name: "E64", fn: func() { e = zero64 }},
+ {name: "Estr", fn: func() { e = zerostr }},
+ {name: "Eslice", fn: func() { e = zeroslice }},
+ {name: "Econstflt", fn: func() { e = 99.0 }}, // constants do not allocate
+ {name: "Econststr", fn: func() { e = "change" }},
+ {name: "I8", fn: func() { i1 = eight8I }},
+ {name: "I16", fn: func() { i1 = zero16I }},
+ {name: "I32", fn: func() { i1 = zero32I }},
+ {name: "I64", fn: func() { i1 = zero64I }},
+ {name: "Istr", fn: func() { i1 = zerostrI }},
+ {name: "Islice", fn: func() { i1 = zerosliceI }},
+ }
+
+ for _, test := range tests {
+ t.Run(test.name, func(t *testing.T) {
+ n := testing.AllocsPerRun(1000, test.fn)
+ if n != 0 {
+ t.Errorf("want zero allocs, got %v", n)
+ }
+ })
+ }
+}
+
+var (
+ eight8 uint8 = 8
+ eight8I T8 = 8
+ yes bool = true
+
+ zero16 uint16 = 0
+ zero16I T16 = 0
+ one16 uint16 = 1
+ thousand16 uint16 = 1000
+
+ zero32 uint32 = 0
+ zero32I T32 = 0
+ one32 uint32 = 1
+ thousand32 uint32 = 1000
+
+ zero64 uint64 = 0
+ zero64I T64 = 0
+ one64 uint64 = 1
+ thousand64 uint64 = 1000
+
+ zerostr string = ""
+ zerostrI Tstr = ""
+ nzstr string = "abc"
+
+ zeroslice []byte = nil
+ zerosliceI Tslice = nil
+ nzslice []byte = []byte("abc")
+
+ zerobig [512]byte
+ nzbig [512]byte = [512]byte{511: 1}
+)
+
+func BenchmarkConvT2Ezero(b *testing.B) {
+ b.Run("zero", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zero64
+ }
+ })
+ b.Run("str", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zerostr
+ }
+ })
+ b.Run("slice", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zeroslice
+ }
+ })
+ b.Run("big", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = zerobig
+ }
+ })
+ })
+ b.Run("nonzero", func(b *testing.B) {
+ b.Run("str", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzstr
+ }
+ })
+ b.Run("slice", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzslice
+ }
+ })
+ b.Run("big", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = nzbig
+ }
+ })
+ })
+ b.Run("smallint", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = one64
+ }
+ })
+ })
+ b.Run("largeint", func(b *testing.B) {
+ b.Run("16", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand16
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand32
+ }
+ })
+ b.Run("64", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ e = thousand64
+ }
+ })
+ })
+}
diff --git a/src/runtime/import_test.go b/src/runtime/import_test.go
new file mode 100644
index 0000000..2bf80aa
--- /dev/null
+++ b/src/runtime/import_test.go
@@ -0,0 +1,45 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file and importx_test.go make it possible to write tests in the runtime
+// package, which is generally more convenient for testing runtime internals.
+// For tests that mostly touch public APIs, it's generally easier to write them
+// in the runtime_test package and export any runtime internals via
+// export_test.go.
+//
+// There are a few limitations on runtime package tests that this bridges:
+//
+// 1. Tests use the signature "XTest<name>(t TestingT)". Since runtime can't import
+// testing, test functions can't use testing.T, so instead we have the TestingT
+// interface, which *testing.T satisfies. And we start names with "XTest"
+// because otherwise go test will complain about Test functions with the wrong
+// signature. To actually expose these as test functions, this file contains
+// trivial wrappers.
+//
+// 2. Runtime package tests can't directly import other std packages, so we
+// inject any necessary functions from std.
+
+// TODO: Generate this
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/testenv"
+ "runtime"
+ "testing"
+)
+
+func init() {
+ runtime.FmtSprintf = fmt.Sprintf
+ runtime.TestenvOptimizationOff = testenv.OptimizationOff
+}
+
+func TestInlineUnwinder(t *testing.T) {
+ runtime.XTestInlineUnwinder(t)
+}
+
+func TestSPWrite(t *testing.T) {
+ runtime.XTestSPWrite(t)
+}
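
For orientation, adding a new test under this convention would look roughly like the sketch below. XTestFoo is a hypothetical name, and the two pieces live in the runtime and runtime_test packages respectively, so this is a pattern sketch rather than a single runnable file:

// In package runtime, alongside the code under test, using the TestingT
// interface defined in importx_test.go:
func XTestFoo(t TestingT) {
	if got := 1 + 1; got != 2 {
		t.Fatalf("unexpected result: %d", got)
	}
}

// In package runtime_test (this file), a trivial wrapper so `go test`
// discovers it:
func TestFoo(t *testing.T) {
	runtime.XTestFoo(t)
}
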
diff --git a/src/runtime/importx_test.go b/src/runtime/importx_test.go
new file mode 100644
index 0000000..4574af7
--- /dev/null
+++ b/src/runtime/importx_test.go
@@ -0,0 +1,33 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// See import_test.go. This is the half that lives in the runtime package.
+
+// TODO: Generate this
+
+package runtime
+
+type TestingT interface {
+ Cleanup(func())
+ Error(args ...any)
+ Errorf(format string, args ...any)
+ Fail()
+ FailNow()
+ Failed() bool
+ Fatal(args ...any)
+ Fatalf(format string, args ...any)
+ Helper()
+ Log(args ...any)
+ Logf(format string, args ...any)
+ Name() string
+ Setenv(key, value string)
+ Skip(args ...any)
+ SkipNow()
+ Skipf(format string, args ...any)
+ Skipped() bool
+ TempDir() string
+}
+
+var FmtSprintf func(format string, a ...any) string
+var TestenvOptimizationOff func() bool
diff --git a/src/runtime/internal/atomic/atomic_386.go b/src/runtime/internal/atomic/atomic_386.go
new file mode 100644
index 0000000..bf2f4b9
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_386.go
@@ -0,0 +1,103 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build 386
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Load
+//go:linkname Loadp
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
diff --git a/src/runtime/internal/atomic/atomic_386.s b/src/runtime/internal/atomic/atomic_386.s
new file mode 100644
index 0000000..724d515
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_386.s
@@ -0,0 +1,285 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "funcdata.h"
+
+// bool Cas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-13
+ MOVL ptr+0(FP), BX
+ MOVL old+4(FP), AX
+ MOVL new+8(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+12(FP)
+ RET
+
+TEXT ·Casint32(SB), NOSPLIT, $0-13
+ JMP ·Cas(SB)
+
+TEXT ·Casint64(SB), NOSPLIT, $0-21
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-13
+ JMP ·Cas(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-13
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-8
+ JMP ·Load(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT, $0-8
+ JMP ·Load(SB)
+
+TEXT ·Storeint32(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+TEXT ·Storeint64(SB), NOSPLIT, $0-12
+ JMP ·Store64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Loadint32(SB), NOSPLIT, $0-8
+ JMP ·Load(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-12
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint32(SB), NOSPLIT, $0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-20
+ JMP ·Xadd64(SB)
+
+// bool ·Cas64(uint64 *val, uint64 old, uint64 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-21
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ MOVL old_lo+4(FP), AX
+ MOVL old_hi+8(FP), DX
+ MOVL new_lo+12(FP), BX
+ MOVL new_hi+16(FP), CX
+ LOCK
+ CMPXCHG8B 0(BP)
+ SETEQ ret+20(FP)
+ RET
+
+// bool Casp1(void **p, void *old, void *new)
+// Atomically:
+// if(*p == old){
+// *p = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-13
+ MOVL ptr+0(FP), BX
+ MOVL old+4(FP), AX
+ MOVL new+8(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+12(FP)
+ RET
+
+// uint32 Xadd(uint32 volatile *val, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-12
+ MOVL ptr+0(FP), BX
+ MOVL delta+4(FP), AX
+ MOVL AX, CX
+ LOCK
+ XADDL AX, 0(BX)
+ ADDL CX, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT ·Xadd64(SB), NOSPLIT, $0-20
+ NO_LOCAL_POINTERS
+ // no XADDQ so use CMPXCHG8B loop
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // DI:SI = delta
+ MOVL delta_lo+4(FP), SI
+ MOVL delta_hi+8(FP), DI
+ // DX:AX = *addr
+ MOVL 0(BP), AX
+ MOVL 4(BP), DX
+addloop:
+ // CX:BX = DX:AX (*addr) + DI:SI (delta)
+ MOVL AX, BX
+ MOVL DX, CX
+ ADDL SI, BX
+ ADCL DI, CX
+
+ // if *addr == DX:AX {
+ // *addr = CX:BX
+ // } else {
+ // DX:AX = *addr
+ // }
+ // all in one instruction
+ LOCK
+ CMPXCHG8B 0(BP)
+
+ JNZ addloop
+
+ // success
+ // return CX:BX
+ MOVL BX, ret_lo+12(FP)
+ MOVL CX, ret_hi+16(FP)
+ RET
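
The shape of the loop above (read the current value, compute the sum, retry with a compare-and-swap until no other writer intervened) is the standard way to build fetch-and-add when the hardware only offers a 64-bit compare-exchange. A user-level Go sketch of the same shape, using sync/atomic rather than the runtime's internal package; the name add64 is illustrative:

package main

import (
	"fmt"
	"sync/atomic"
)

// add64 retries a compare-and-swap until it wins, mirroring the
// CMPXCHG8B loop in Xadd64 above. The address must be 8-byte aligned
// on 386, which is what the TESTL $7 check above enforces.
func add64(addr *uint64, delta int64) uint64 {
	for {
		old := atomic.LoadUint64(addr)
		nv := old + uint64(delta)
		if atomic.CompareAndSwapUint64(addr, old, nv) {
			return nv
		}
	}
}

func main() {
	var x uint64 = 40
	fmt.Println(add64(&x, 2)) // 42
}
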
+
+TEXT ·Xchg(SB), NOSPLIT, $0-12
+ MOVL ptr+0(FP), BX
+ MOVL new+4(FP), AX
+ XCHGL AX, 0(BX)
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT ·Xchgint32(SB), NOSPLIT, $0-12
+ JMP ·Xchg(SB)
+
+TEXT ·Xchgint64(SB), NOSPLIT, $0-20
+ JMP ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-12
+ JMP ·Xchg(SB)
+
+TEXT ·Xchg64(SB),NOSPLIT,$0-20
+ NO_LOCAL_POINTERS
+ // no XCHGQ so use CMPXCHG8B loop
+ MOVL ptr+0(FP), BP
+ TESTL $7, BP
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // CX:BX = new
+ MOVL new_lo+4(FP), BX
+ MOVL new_hi+8(FP), CX
+ // DX:AX = *addr
+ MOVL 0(BP), AX
+ MOVL 4(BP), DX
+swaploop:
+ // if *addr == DX:AX
+ // *addr = CX:BX
+ // else
+ // DX:AX = *addr
+ // all in one instruction
+ LOCK
+ CMPXCHG8B 0(BP)
+ JNZ swaploop
+
+ // success
+ // return DX:AX
+ MOVL AX, ret_lo+12(FP)
+ MOVL DX, ret_hi+16(FP)
+ RET
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), BX
+ MOVL val+4(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT ·Store(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), BX
+ MOVL val+4(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-8
+ JMP ·Store(SB)
+
+// uint64 atomicload64(uint64 volatile* addr);
+TEXT ·Load64(SB), NOSPLIT, $0-12
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), AX
+ TESTL $7, AX
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ MOVQ (AX), M0
+ MOVQ M0, ret+4(FP)
+ EMMS
+ RET
+
+// void ·Store64(uint64 volatile* addr, uint64 v);
+TEXT ·Store64(SB), NOSPLIT, $0-12
+ NO_LOCAL_POINTERS
+ MOVL ptr+0(FP), AX
+ TESTL $7, AX
+ JZ 2(PC)
+ CALL ·panicUnaligned(SB)
+ // MOVQ and EMMS were introduced on the Pentium MMX.
+ MOVQ val+4(FP), M0
+ MOVQ M0, (AX)
+ EMMS
+ // This is essentially a no-op, but it provides required memory fencing.
+ // It can be replaced with MFENCE, but MFENCE was introduced only on the Pentium4 (SSE2).
+ XORL AX, AX
+ LOCK
+ XADDL AX, (SP)
+ RET
+
+// void ·Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), AX
+ MOVB val+4(FP), BX
+ LOCK
+ ORB BX, (AX)
+ RET
+
+// void ·And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), AX
+ MOVB val+4(FP), BX
+ LOCK
+ ANDB BX, (AX)
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-5
+ MOVL ptr+0(FP), BX
+ MOVB val+4(FP), AX
+ XCHGB AX, 0(BX)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), AX
+ MOVL val+4(FP), BX
+ LOCK
+ ORL BX, (AX)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), AX
+ MOVL val+4(FP), BX
+ LOCK
+ ANDL BX, (AX)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_amd64.go b/src/runtime/internal/atomic/atomic_amd64.go
new file mode 100644
index 0000000..52a8362
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_amd64.go
@@ -0,0 +1,117 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// StorepNoWB performs *ptr = val atomically and without a write
+// barrier.
+//
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
diff --git a/src/runtime/internal/atomic/atomic_amd64.s b/src/runtime/internal/atomic/atomic_amd64.s
new file mode 100644
index 0000000..d21514b
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_amd64.s
@@ -0,0 +1,225 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Note: some of these functions are semantically inlined
+// by the compiler (in src/cmd/compile/internal/gc/ssa.go).
+
+#include "textflag.h"
+
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Loadint32(SB), NOSPLIT, $0-12
+ JMP ·Load(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+// bool Cas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB),NOSPLIT,$0-17
+ MOVQ ptr+0(FP), BX
+ MOVL old+8(FP), AX
+ MOVL new+12(FP), CX
+ LOCK
+ CMPXCHGL CX, 0(BX)
+ SETEQ ret+16(FP)
+ RET
+
+// bool ·Cas64(uint64 *val, uint64 old, uint64 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVQ ptr+0(FP), BX
+ MOVQ old+8(FP), AX
+ MOVQ new+16(FP), CX
+ LOCK
+ CMPXCHGQ CX, 0(BX)
+ SETEQ ret+24(FP)
+ RET
+
+// bool Casp1(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ MOVQ ptr+0(FP), BX
+ MOVQ old+8(FP), AX
+ MOVQ new+16(FP), CX
+ LOCK
+ CMPXCHGQ CX, 0(BX)
+ SETEQ ret+24(FP)
+ RET
+
+TEXT ·Casint32(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Casint64(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+// uint32 Xadd(uint32 volatile *val, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVQ ptr+0(FP), BX
+ MOVL delta+8(FP), AX
+ MOVL AX, CX
+ LOCK
+ XADDL AX, 0(BX)
+ ADDL CX, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+// uint64 Xadd64(uint64 volatile *val, int64 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVQ ptr+0(FP), BX
+ MOVQ delta+8(FP), AX
+ MOVQ AX, CX
+ LOCK
+ XADDQ AX, 0(BX)
+ ADDQ CX, AX
+ MOVQ AX, ret+16(FP)
+ RET
+
+TEXT ·Xaddint32(SB), NOSPLIT, $0-20
+ JMP ·Xadd(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// uint32 Xchg(ptr *uint32, new uint32)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVQ ptr+0(FP), BX
+ MOVL new+8(FP), AX
+ XCHGL AX, 0(BX)
+ MOVL AX, ret+16(FP)
+ RET
+
+// uint64 Xchg64(ptr *uint64, new uint64)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVQ ptr+0(FP), BX
+ MOVQ new+8(FP), AX
+ XCHGQ AX, 0(BX)
+ MOVQ AX, ret+16(FP)
+ RET
+
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ JMP ·Xchg(SB)
+
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ MOVQ ptr+0(FP), BX
+ MOVQ val+8(FP), AX
+ XCHGQ AX, 0(BX)
+ RET
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), BX
+ MOVL val+8(FP), AX
+ XCHGL AX, 0(BX)
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), BX
+ MOVB val+8(FP), AX
+ XCHGB AX, 0(BX)
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVQ ptr+0(FP), BX
+ MOVQ val+8(FP), AX
+ XCHGQ AX, 0(BX)
+ RET
+
+TEXT ·Storeint32(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·Storeint64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+// void ·Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), AX
+ MOVB val+8(FP), BX
+ LOCK
+ ORB BX, (AX)
+ RET
+
+// void ·And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVQ ptr+0(FP), AX
+ MOVB val+8(FP), BX
+ LOCK
+ ANDB BX, (AX)
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), AX
+ MOVL val+8(FP), BX
+ LOCK
+ ORL BX, (AX)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVQ ptr+0(FP), AX
+ MOVL val+8(FP), BX
+ LOCK
+ ANDL BX, (AX)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_arm.go b/src/runtime/internal/atomic/atomic_arm.go
new file mode 100644
index 0000000..bdb1847
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm.go
@@ -0,0 +1,244 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build arm
+
+package atomic
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Xchg
+//go:linkname Xchguintptr
+
+type spinlock struct {
+ v uint32
+}
+
+//go:nosplit
+func (l *spinlock) lock() {
+ for {
+ if Cas(&l.v, 0, 1) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func (l *spinlock) unlock() {
+ Store(&l.v, 0)
+}
+
+var locktab [57]struct {
+ l spinlock
+ pad [cpu.CacheLinePadSize - unsafe.Sizeof(spinlock{})]byte
+}
+
+func addrLock(addr *uint64) *spinlock {
+ return &locktab[(uintptr(unsafe.Pointer(addr))>>3)%uintptr(len(locktab))].l
+}
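
addrLock implements lock striping: each 8-byte-aligned address hashes to one of 57 spinlocks, so a given uint64 always uses the same lock while unrelated variables usually get different ones. A standalone sketch of the index computation; the sample addresses are arbitrary:

package main

import "fmt"

func main() {
	const nLocks = 57 // same table size as locktab
	for _, a := range []uintptr{0x1000, 0x1008, 0x1000 + 57*8} {
		fmt.Printf("addr %#x -> lock %d\n", a, (a>>3)%nLocks)
	}
	// 0x1000 and 0x1000+57*8 collide on the same lock; 0x1008 does not.
}
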
+
+// Atomic add and return new value.
+//
+//go:nosplit
+func Xadd(val *uint32, delta int32) uint32 {
+ for {
+ oval := *val
+ nval := oval + uint32(delta)
+ if Cas(val, oval, nval) {
+ return nval
+ }
+ }
+}
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:nosplit
+func Xchg(addr *uint32, v uint32) uint32 {
+ for {
+ old := *addr
+ if Cas(addr, old, v) {
+ return old
+ }
+ }
+}
+
+//go:nosplit
+func Xchguintptr(addr *uintptr, v uintptr) uintptr {
+ return uintptr(Xchg((*uint32)(unsafe.Pointer(addr)), uint32(v)))
+}
+
+// Not noescape -- it installs a pointer to addr.
+func StorepNoWB(addr unsafe.Pointer, v unsafe.Pointer)
+
+//go:noescape
+func Store(addr *uint32, v uint32)
+
+//go:noescape
+func StoreRel(addr *uint32, v uint32)
+
+//go:noescape
+func StoreReluintptr(addr *uintptr, v uintptr)
+
+//go:nosplit
+func goCas64(addr *uint64, old, new uint64) bool {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var ok bool
+ addrLock(addr).lock()
+ if *addr == old {
+ *addr = new
+ ok = true
+ }
+ addrLock(addr).unlock()
+ return ok
+}
+
+//go:nosplit
+func goXadd64(addr *uint64, delta int64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr + uint64(delta)
+ *addr = r
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goXchg64(addr *uint64, v uint64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr
+ *addr = v
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goLoad64(addr *uint64) uint64 {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ var r uint64
+ addrLock(addr).lock()
+ r = *addr
+ addrLock(addr).unlock()
+ return r
+}
+
+//go:nosplit
+func goStore64(addr *uint64, v uint64) {
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ *(*int)(nil) = 0 // crash on unaligned uint64
+ }
+ _ = *addr // if nil, fault before taking the lock
+ addrLock(addr).lock()
+ *addr = v
+ addrLock(addr).unlock()
+}
+
+//go:nosplit
+func Or8(addr *uint8, v uint8) {
+ // Align down to 4 bytes and use 32-bit CAS.
+ uaddr := uintptr(unsafe.Pointer(addr))
+ addr32 := (*uint32)(unsafe.Pointer(uaddr &^ 3))
+ word := uint32(v) << ((uaddr & 3) * 8) // little endian
+ for {
+ old := *addr32
+ if Cas(addr32, old, old|word) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func And8(addr *uint8, v uint8) {
+ // Align down to 4 bytes and use 32-bit CAS.
+ uaddr := uintptr(unsafe.Pointer(addr))
+ addr32 := (*uint32)(unsafe.Pointer(uaddr &^ 3))
+ word := uint32(v) << ((uaddr & 3) * 8) // little endian
+ mask := uint32(0xFF) << ((uaddr & 3) * 8) // little endian
+ word |= ^mask
+ for {
+ old := *addr32
+ if Cas(addr32, old, old&word) {
+ return
+ }
+ }
+}
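
Editorial note: Or8 and And8 above emulate byte-wide atomics by retrying a 32-bit CAS on the word that contains the byte. The shift and mask arithmetic is easy to lose inside the loop, so here is a minimal sketch of just that arithmetic for a little-endian machine such as ARM; the names are illustrative only and not part of the patch.

package sketch

// byteWordMasks shows how Or8/And8 position a byte value inside the aligned
// 32-bit word that contains it (little-endian layout).
func byteWordMasks(uaddr uintptr, v uint8) (word, mask uint32) {
	shift := (uaddr & 3) * 8     // bit offset of the byte within its word
	word = uint32(v) << shift    // the value, moved into its byte lane
	mask = uint32(0xFF) << shift // selects exactly that byte lane
	return
}

// For Or8 the CAS target is old|word: the other three bytes are OR'd with 0.
// For And8 the target is old&(word|^mask): the other three bytes are AND'd
// with 0xFF, so in both cases the neighbouring bytes are left untouched.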
+
+//go:nosplit
+func Or(addr *uint32, v uint32) {
+ for {
+ old := *addr
+ if Cas(addr, old, old|v) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func And(addr *uint32, v uint32) {
+ for {
+ old := *addr
+ if Cas(addr, old, old&v) {
+ return
+ }
+ }
+}
+
+//go:nosplit
+func armcas(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Load(addr *uint32) uint32
+
+// NO go:noescape annotation; *addr escapes if result escapes (#31525)
+func Loadp(addr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func Load8(addr *uint8) uint8
+
+//go:noescape
+func LoadAcq(addr *uint32) uint32
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Cas64(addr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(addr *uint32, old, new uint32) bool
+
+//go:noescape
+func Xadd64(addr *uint64, delta int64) uint64
+
+//go:noescape
+func Xchg64(addr *uint64, v uint64) uint64
+
+//go:noescape
+func Load64(addr *uint64) uint64
+
+//go:noescape
+func Store8(addr *uint8, v uint8)
+
+//go:noescape
+func Store64(addr *uint64, v uint64)
diff --git a/src/runtime/internal/atomic/atomic_arm.s b/src/runtime/internal/atomic/atomic_arm.s
new file mode 100644
index 0000000..92cbe8a
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm.s
@@ -0,0 +1,297 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "funcdata.h"
+
+// bool armcas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+//
+// To implement ·cas in sys_$GOOS_arm.s
+// using the native instructions, use:
+//
+// TEXT ·cas(SB),NOSPLIT,$0
+// B ·armcas(SB)
+//
+TEXT ·armcas(SB),NOSPLIT,$0-13
+ MOVW ptr+0(FP), R1
+ MOVW old+4(FP), R2
+ MOVW new+8(FP), R3
+casl:
+ LDREX (R1), R0
+ CMP R0, R2
+ BNE casfail
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISHST
+
+ STREX R3, (R1), R0
+ CMP $0, R0
+ BNE casl
+ MOVW $1, R0
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R0, ret+12(FP)
+ RET
+casfail:
+ MOVW $0, R0
+ MOVB R0, ret+12(FP)
+ RET
+
+// stubs
+
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-8
+ B ·Load(SB)
+
+TEXT ·Casint32(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·Casint64(SB),NOSPLIT,$-4-21
+ B ·Cas64(SB)
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·Casp1(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·CasRel(SB),NOSPLIT,$0-13
+ B ·Cas(SB)
+
+TEXT ·Loadint32(SB),NOSPLIT,$0-8
+ B ·Load(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$-4-12
+ B ·Load64(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-8
+ B ·Load(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-8
+ B ·Load(SB)
+
+TEXT ·Storeint32(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·Storeint64(SB),NOSPLIT,$0-12
+ B ·Store64(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StorepNoWB(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StoreRel(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·StoreReluintptr(SB),NOSPLIT,$0-8
+ B ·Store(SB)
+
+TEXT ·Xaddint32(SB),NOSPLIT,$0-12
+ B ·Xadd(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$-4-20
+ B ·Xadd64(SB)
+
+TEXT ·Xadduintptr(SB),NOSPLIT,$0-12
+ B ·Xadd(SB)
+
+TEXT ·Xchgint32(SB),NOSPLIT,$0-12
+ B ·Xchg(SB)
+
+TEXT ·Xchgint64(SB),NOSPLIT,$-4-20
+ B ·Xchg64(SB)
+
+// 64-bit atomics
+// The native ARM implementations use LDREXD/STREXD, which are
+// available on ARMv6k or later. We use them only on ARMv7.
+// On older ARM, we use Go implementations which simulate 64-bit
+// atomics with locks.
+TEXT armCas64<>(SB),NOSPLIT,$0-21
+ // addr is already in R1
+ MOVW old_lo+4(FP), R2
+ MOVW old_hi+8(FP), R3
+ MOVW new_lo+12(FP), R4
+ MOVW new_hi+16(FP), R5
+cas64loop:
+ LDREXD (R1), R6 // loads R6 and R7
+ CMP R2, R6
+ BNE cas64fail
+ CMP R3, R7
+ BNE cas64fail
+
+ DMB MB_ISHST
+
+ STREXD R4, (R1), R0 // stores R4 and R5
+ CMP $0, R0
+ BNE cas64loop
+ MOVW $1, R0
+
+ DMB MB_ISH
+
+ MOVBU R0, swapped+20(FP)
+ RET
+cas64fail:
+ MOVW $0, R0
+ MOVBU R0, swapped+20(FP)
+ RET
+
+TEXT armXadd64<>(SB),NOSPLIT,$0-20
+ // addr is already in R1
+ MOVW delta_lo+4(FP), R2
+ MOVW delta_hi+8(FP), R3
+
+add64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+ ADD.S R2, R4
+ ADC R3, R5
+
+ DMB MB_ISHST
+
+ STREXD R4, (R1), R0 // stores R4 and R5
+ CMP $0, R0
+ BNE add64loop
+
+ DMB MB_ISH
+
+ MOVW R4, new_lo+12(FP)
+ MOVW R5, new_hi+16(FP)
+ RET
+
+TEXT armXchg64<>(SB),NOSPLIT,$0-20
+ // addr is already in R1
+ MOVW new_lo+4(FP), R2
+ MOVW new_hi+8(FP), R3
+
+swap64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+
+ DMB MB_ISHST
+
+ STREXD R2, (R1), R0 // stores R2 and R3
+ CMP $0, R0
+ BNE swap64loop
+
+ DMB MB_ISH
+
+ MOVW R4, old_lo+12(FP)
+ MOVW R5, old_hi+16(FP)
+ RET
+
+TEXT armLoad64<>(SB),NOSPLIT,$0-12
+ // addr is already in R1
+
+ LDREXD (R1), R2 // loads R2 and R3
+ DMB MB_ISH
+
+ MOVW R2, val_lo+4(FP)
+ MOVW R3, val_hi+8(FP)
+ RET
+
+TEXT armStore64<>(SB),NOSPLIT,$0-12
+ // addr is already in R1
+ MOVW val_lo+4(FP), R2
+ MOVW val_hi+8(FP), R3
+
+store64loop:
+ LDREXD (R1), R4 // loads R4 and R5
+
+ DMB MB_ISHST
+
+ STREXD R2, (R1), R0 // stores R2 and R3
+ CMP $0, R0
+ BNE store64loop
+
+ DMB MB_ISH
+ RET
+
+// The following functions all panic if their address argument isn't
+// 8-byte aligned. Since we're calling back into Go code to do this,
+// we have to cooperate with stack unwinding. In the normal case, the
+// functions tail-call into the appropriate implementation, which
+// means they must not open a frame. Hence, when they go down the
+// panic path, at that point they push the LR to create a real frame
+// (they don't need to pop it because panic won't return; however, we
+// do need to set the SP delta back).
+
+// Check if R1 is 8-byte aligned, panic if not.
+// Clobbers R2.
+#define CHECK_ALIGN \
+ AND.S $7, R1, R2 \
+ BEQ 4(PC) \
+ MOVW.W R14, -4(R13) /* prepare a real frame */ \
+ BL ·panicUnaligned(SB) \
+ ADD $4, R13 /* compensate SP delta */
+
+TEXT ·Cas64(SB),NOSPLIT,$-4-21
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ CHECK_ALIGN
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armCas64<>(SB)
+ JMP ·goCas64(SB)
+
+TEXT ·Xadd64(SB),NOSPLIT,$-4-20
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ CHECK_ALIGN
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armXadd64<>(SB)
+ JMP ·goXadd64(SB)
+
+TEXT ·Xchg64(SB),NOSPLIT,$-4-20
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ CHECK_ALIGN
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armXchg64<>(SB)
+ JMP ·goXchg64(SB)
+
+TEXT ·Load64(SB),NOSPLIT,$-4-12
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ CHECK_ALIGN
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armLoad64<>(SB)
+ JMP ·goLoad64(SB)
+
+TEXT ·Store64(SB),NOSPLIT,$-4-12
+ NO_LOCAL_POINTERS
+ MOVW addr+0(FP), R1
+ CHECK_ALIGN
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP armStore64<>(SB)
+ JMP ·goStore64(SB)
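
Editorial note: taken together, each 64-bit entry point above does three things: verify 8-byte alignment (CHECK_ALIGN), then tail-jump either to the LDREXD/STREXD implementation on ARMv7 or to the lock-based Go fallback from atomic_arm.go. The sketch below is a pseudo-Go rendering of that dispatch; the stub functions stand in for symbols defined elsewhere in this patch and are hypothetical as written.

package sketch

import "unsafe"

// Stubs standing in for runtime pieces defined elsewhere in this patch
// (runtime·goarm, ·panicUnaligned, armCas64<>, ·goCas64); only the control
// flow in cas64Dispatch is the point of the sketch.
var goarm uint8 = 7

func panicUnaligned() { panic("unaligned 64-bit atomic operation") }

func armCas64(addr *uint64, old, new uint64) bool { return false } // LDREXD/STREXD path
func goCas64(addr *uint64, old, new uint64) bool  { return false } // spinlock fallback

// cas64Dispatch mirrors the shape of ·Cas64 above.
func cas64Dispatch(addr *uint64, old, new uint64) bool {
	if uintptr(unsafe.Pointer(addr))&7 != 0 {
		panicUnaligned() // CHECK_ALIGN: crash rather than silently tear
	}
	if goarm >= 7 {
		return armCas64(addr, old, new) // ARMv7+: native 64-bit exclusives
	}
	return goCas64(addr, old, new) // older ARM: lock-based emulation from atomic_arm.go
}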
diff --git a/src/runtime/internal/atomic/atomic_arm64.go b/src/runtime/internal/atomic/atomic_arm64.go
new file mode 100644
index 0000000..459fb99
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm64.go
@@ -0,0 +1,94 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build arm64
+
+package atomic
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+const (
+ offsetARM64HasATOMICS = unsafe.Offsetof(cpu.ARM64.HasATOMICS)
+)
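
Editorial note: offsetARM64HasATOMICS lets the assembly in atomic_arm64.s (added later in this patch) test internal/cpu's feature flag with a single MOVBU before each operation, choosing between the ARMv8.1 LSE instructions and an LDAXR/STLXR retry loop. A rough Go-level equivalent of that per-call choice; the flag variable and both helpers are hypothetical stand-ins.

package sketch

// hasATOMICS stands in for internal/cpu's ARM64.HasATOMICS flag, which the
// assembly reads via offsetARM64HasATOMICS.
var hasATOMICS bool

func swpalw(ptr *uint32, new uint32) uint32   { panic("LSE SWPALW path (assembly)") }
func llscXchg(ptr *uint32, new uint32) uint32 { panic("LDAXRW/STLXRW loop (assembly)") }

// xchgSketch mirrors the CBZ-on-HasATOMICS dispatch used by ·Xchg and friends.
func xchgSketch(ptr *uint32, new uint32) uint32 {
	if hasATOMICS { // ARMv8.1 LSE atomics available
		return swpalw(ptr, new) // one SWPALW instruction does the whole exchange
	}
	return llscXchg(ptr, new) // otherwise: exclusive-monitor retry loop
}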
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(addr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_arm64.s b/src/runtime/internal/atomic/atomic_arm64.s
new file mode 100644
index 0000000..5f77d92
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_arm64.s
@@ -0,0 +1,333 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·Casint32(SB), NOSPLIT, $0-17
+ B ·Cas(SB)
+
+TEXT ·Casint64(SB), NOSPLIT, $0-25
+ B ·Cas64(SB)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ B ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ B ·Cas(SB)
+
+TEXT ·Loadint32(SB), NOSPLIT, $0-12
+ B ·Load(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ B ·Load64(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-16
+ B ·Load64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT, $0-16
+ B ·Load64(SB)
+
+TEXT ·Storeint32(SB), NOSPLIT, $0-12
+ B ·Store(SB)
+
+TEXT ·Storeint64(SB), NOSPLIT, $0-16
+ B ·Store64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ B ·Store64(SB)
+
+TEXT ·Xaddint32(SB), NOSPLIT, $0-20
+ B ·Xadd(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ B ·Xadd64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ B ·Xadd64(SB)
+
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ B ·Cas64(SB)
+
+// uint32 ·Load(uint32 volatile* addr)
+TEXT ·Load(SB),NOSPLIT,$0-12
+ MOVD ptr+0(FP), R0
+ LDARW (R0), R0
+ MOVW R0, ret+8(FP)
+ RET
+
+// uint8 ·Load8(uint8 volatile* addr)
+TEXT ·Load8(SB),NOSPLIT,$0-9
+ MOVD ptr+0(FP), R0
+ LDARB (R0), R0
+ MOVB R0, ret+8(FP)
+ RET
+
+// uint64 ·Load64(uint64 volatile* addr)
+TEXT ·Load64(SB),NOSPLIT,$0-16
+ MOVD ptr+0(FP), R0
+ LDAR (R0), R0
+ MOVD R0, ret+8(FP)
+ RET
+
+// void *·Loadp(void *volatile *addr)
+TEXT ·Loadp(SB),NOSPLIT,$0-16
+ MOVD ptr+0(FP), R0
+ LDAR (R0), R0
+ MOVD R0, ret+8(FP)
+ RET
+
+// uint32 ·LoadAcq(uint32 volatile* addr)
+TEXT ·LoadAcq(SB),NOSPLIT,$0-12
+ B ·Load(SB)
+
+// uint64 ·LoadAcq64(uint64 volatile* addr)
+TEXT ·LoadAcq64(SB),NOSPLIT,$0-16
+ B ·Load64(SB)
+
+// uintptr ·LoadAcquintptr(uintptr volatile* addr)
+TEXT ·LoadAcquintptr(SB),NOSPLIT,$0-16
+ B ·Load64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ B ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ B ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ B ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ B ·Store64(SB)
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ STLRW R1, (R0)
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ STLRB R1, (R0)
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD val+8(FP), R1
+ STLR R1, (R0)
+ RET
+
+// uint32 Xchg(ptr *uint32, new uint32)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R0
+ MOVW new+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ SWPALW R1, (R0), R2
+ MOVW R2, ret+16(FP)
+ RET
+load_store_loop:
+ LDAXRW (R0), R2
+ STLXRW R1, (R0), R3
+ CBNZ R3, load_store_loop
+ MOVW R2, ret+16(FP)
+ RET
+
+// uint64 Xchg64(ptr *uint64, new uint64)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ SWPALD R1, (R0), R2
+ MOVD R2, ret+16(FP)
+ RET
+load_store_loop:
+ LDAXR (R0), R2
+ STLXR R1, (R0), R3
+ CBNZ R3, load_store_loop
+ MOVD R2, ret+16(FP)
+ RET
+
+// bool Cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R0
+ MOVW old+8(FP), R1
+ MOVW new+12(FP), R2
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ MOVD R1, R3
+ CASALW R3, (R0), R2
+ CMP R1, R3
+ CSET EQ, R0
+ MOVB R0, ret+16(FP)
+ RET
+load_store_loop:
+ LDAXRW (R0), R3
+ CMPW R1, R3
+ BNE ok
+ STLXRW R2, (R0), R3
+ CBNZ R3, load_store_loop
+ok:
+ CSET EQ, R0
+ MOVB R0, ret+16(FP)
+ RET
+
+// bool ·Cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD new+16(FP), R2
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ MOVD R1, R3
+ CASALD R3, (R0), R2
+ CMP R1, R3
+ CSET EQ, R0
+ MOVB R0, ret+24(FP)
+ RET
+load_store_loop:
+ LDAXR (R0), R3
+ CMP R1, R3
+ BNE ok
+ STLXR R2, (R0), R3
+ CBNZ R3, load_store_loop
+ok:
+ CSET EQ, R0
+ MOVB R0, ret+24(FP)
+ RET
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R0
+ MOVW delta+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ LDADDALW R1, (R0), R2
+ ADD R1, R2
+ MOVW R2, ret+16(FP)
+ RET
+load_store_loop:
+ LDAXRW (R0), R2
+ ADDW R2, R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, load_store_loop
+ MOVW R2, ret+16(FP)
+ RET
+
+// uint64 Xadd64(uint64 volatile *ptr, int64 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R0
+ MOVD delta+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ LDADDALD R1, (R0), R2
+ ADD R1, R2
+ MOVD R2, ret+16(FP)
+ RET
+load_store_loop:
+ LDAXR (R0), R2
+ ADD R2, R1, R2
+ STLXR R2, (R0), R3
+ CBNZ R3, load_store_loop
+ MOVD R2, ret+16(FP)
+ RET
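
Editorial note: the extra ADD after LDADDALW/LDADDALD above is needed because the LSE fetch-and-add instructions return the value the memory held before the addition, while Xadd and Xadd64 are specified to return the new value. The one-liner below makes that contract explicit; fetchAdd is a placeholder for the hardware primitive.

package sketch

// xaddFromFetchAdd derives Xadd's result from a fetch-and-add primitive that
// returns the old value, as LDADDALW does.
func xaddFromFetchAdd(fetchAdd func(ptr *uint32, delta uint32) uint32, ptr *uint32, delta int32) uint32 {
	old := fetchAdd(ptr, uint32(delta)) // hardware hands back the previous value
	return old + uint32(delta)          // Xadd's contract: hand back the new value
}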
+
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ B ·Xchg(SB)
+
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ B ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ B ·Xchg64(SB)
+
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ MVN R1, R2
+ LDCLRALB R2, (R0), R3
+ RET
+load_store_loop:
+ LDAXRB (R0), R2
+ AND R1, R2
+ STLXRB R2, (R0), R3
+ CBNZ R3, load_store_loop
+ RET
+
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R0
+ MOVB val+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ LDORALB R1, (R0), R2
+ RET
+load_store_loop:
+ LDAXRB (R0), R2
+ ORR R1, R2
+ STLXRB R2, (R0), R3
+ CBNZ R3, load_store_loop
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ MVN R1, R2
+ LDCLRALW R2, (R0), R3
+ RET
+load_store_loop:
+ LDAXRW (R0), R2
+ AND R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, load_store_loop
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R0
+ MOVW val+8(FP), R1
+ MOVBU internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
+ CBZ R4, load_store_loop
+ LDORALW R1, (R0), R2
+ RET
+load_store_loop:
+ LDAXRW (R0), R2
+ ORR R1, R2
+ STLXRW R2, (R0), R3
+ CBNZ R3, load_store_loop
+ RET
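
Editorial note: the LSE fast paths for And8 and And above have no "atomic AND with value" instruction to lean on; LDCLRAL{B,W} atomically clears the bits set in its operand, so the value is first complemented with MVN. The identity being relied on, written out in Go:

package sketch

// andViaClear shows why MVN + LDCLRAL implements an atomic AND:
// clearing the bits that are zero in v leaves exactly x & v.
func andViaClear(x, v uint32) uint32 {
	return x &^ ^v // == x & ^(^v) == x & v
}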
diff --git a/src/runtime/internal/atomic/atomic_loong64.go b/src/runtime/internal/atomic/atomic_loong64.go
new file mode 100644
index 0000000..d82a5b8
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_loong64.go
@@ -0,0 +1,89 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build loong64
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_loong64.s b/src/runtime/internal/atomic/atomic_loong64.s
new file mode 100644
index 0000000..34193ad
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_loong64.s
@@ -0,0 +1,306 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// bool cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*ptr == old){
+// *ptr = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVV ptr+0(FP), R4
+ MOVW old+8(FP), R5
+ MOVW new+12(FP), R6
+ DBAR
+cas_again:
+ MOVV R6, R7
+ LL (R4), R8
+ BNE R5, R8, cas_fail
+ SC R7, (R4)
+ BEQ R7, cas_again
+ MOVV $1, R4
+ MOVB R4, ret+16(FP)
+ DBAR
+ RET
+cas_fail:
+ MOVV $0, R4
+ JMP -4(PC)
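
Editorial note: Cas above is the canonical load-linked/store-conditional loop that the rest of this file (and the MIPS files later in the patch) repeat: LL opens a reservation on the word, SC succeeds only if the reservation survived, and a failed SC means another CPU intervened, so the operation retries. The same shape in pseudo-Go, with ll and sc as hypothetical stand-ins for the LL/SC (or LLV/SCV) instructions:

package sketch

// Placeholders for the hardware primitives; they cannot be expressed in Go.
func ll(ptr *uint32) uint32         { panic("LL: load-linked (assembly)") }
func sc(ptr *uint32, v uint32) bool { panic("SC: store-conditional (assembly)") }

// casLLSC mirrors the cas_again/cas_fail structure of ·Cas above.
func casLLSC(ptr *uint32, old, new uint32) bool {
	for {
		if ll(ptr) != old { // value changed: report failure
			return false
		}
		if sc(ptr, new) { // reservation held: the swap took effect atomically
			return true
		}
		// reservation lost to another writer; try again
	}
}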
+
+// bool cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+// if(*ptr == old){
+// *ptr = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVV ptr+0(FP), R4
+ MOVV old+8(FP), R5
+ MOVV new+16(FP), R6
+ DBAR
+cas64_again:
+ MOVV R6, R7
+ LLV (R4), R8
+ BNE R5, R8, cas64_fail
+ SCV R7, (R4)
+ BEQ R7, cas64_again
+ MOVV $1, R4
+ MOVB R4, ret+24(FP)
+ DBAR
+ RET
+cas64_fail:
+ MOVV $0, R4
+ JMP -4(PC)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// bool casp(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R4
+ MOVW delta+8(FP), R5
+ DBAR
+ LL (R4), R6
+ ADDU R6, R5, R7
+ MOVV R7, R6
+ SC R7, (R4)
+ BEQ R7, -4(PC)
+ MOVW R6, ret+16(FP)
+ DBAR
+ RET
+
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R4
+ MOVV delta+8(FP), R5
+ DBAR
+ LLV (R4), R6
+ ADDVU R6, R5, R7
+ MOVV R7, R6
+ SCV R7, (R4)
+ BEQ R7, -4(PC)
+ MOVV R6, ret+16(FP)
+ DBAR
+ RET
+
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R4
+ MOVW new+8(FP), R5
+
+ DBAR
+ MOVV R5, R6
+ LL (R4), R7
+ SC R6, (R4)
+ BEQ R6, -3(PC)
+ MOVW R7, ret+16(FP)
+ DBAR
+ RET
+
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R4
+ MOVV new+8(FP), R5
+
+ DBAR
+ MOVV R5, R6
+ LLV (R4), R7
+ SCV R6, (R4)
+ BEQ R6, -3(PC)
+ MOVV R7, ret+16(FP)
+ DBAR
+ RET
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R4
+ MOVW val+8(FP), R5
+ DBAR
+ MOVW R5, 0(R4)
+ DBAR
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R4
+ MOVB val+8(FP), R5
+ DBAR
+ MOVB R5, 0(R4)
+ DBAR
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVV ptr+0(FP), R4
+ MOVV val+8(FP), R5
+ DBAR
+ MOVV R5, 0(R4)
+ DBAR
+ RET
+
+// void Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R4
+ MOVBU val+8(FP), R5
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R6
+ AND R4, R6
+ // R7 = ((ptr & 3) * 8)
+ AND $3, R4, R7
+ SLLV $3, R7
+	// Shift val for aligned ptr. R5 = val << R7
+ SLLV R7, R5
+
+ DBAR
+ LL (R6), R7
+ OR R5, R7
+ SC R7, (R6)
+ BEQ R7, -4(PC)
+ DBAR
+ RET
+
+// void And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R4
+ MOVBU val+8(FP), R5
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R6
+ AND R4, R6
+ // R7 = ((ptr & 3) * 8)
+ AND $3, R4, R7
+ SLLV $3, R7
+ // Shift val for aligned ptr. R5 = val << R7 | ^(0xFF << R7)
+ MOVV $0xFF, R8
+ SLLV R7, R5
+ SLLV R7, R8
+ NOR R0, R8
+ OR R8, R5
+
+ DBAR
+ LL (R6), R7
+ AND R5, R7
+ SC R7, (R6)
+ BEQ R7, -4(PC)
+ DBAR
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R4
+ MOVW val+8(FP), R5
+ DBAR
+ LL (R4), R6
+ OR R5, R6
+ SC R6, (R4)
+ BEQ R6, -4(PC)
+ DBAR
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R4
+ MOVW val+8(FP), R5
+ DBAR
+ LL (R4), R6
+ AND R5, R6
+ SC R6, (R4)
+ BEQ R6, -4(PC)
+ DBAR
+ RET
+
+// uint32 runtime∕internal∕atomic·Load(uint32 volatile* ptr)
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-12
+ MOVV ptr+0(FP), R19
+ DBAR
+ MOVWU 0(R19), R19
+ DBAR
+ MOVW R19, ret+8(FP)
+ RET
+
+// uint8 runtime∕internal∕atomic·Load8(uint8 volatile* ptr)
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-9
+ MOVV ptr+0(FP), R19
+ DBAR
+ MOVBU 0(R19), R19
+ DBAR
+ MOVB R19, ret+8(FP)
+ RET
+
+// uint64 runtime∕internal∕atomic·Load64(uint64 volatile* ptr)
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R19
+ DBAR
+ MOVV 0(R19), R19
+ DBAR
+ MOVV R19, ret+8(FP)
+ RET
+
+// void *runtime∕internal∕atomic·Loadp(void *volatile *ptr)
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R19
+ DBAR
+ MOVV 0(R19), R19
+ DBAR
+ MOVV R19, ret+8(FP)
+ RET
+
+// uint32 runtime∕internal∕atomic·LoadAcq(uint32 volatile* ptr)
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-12
+ JMP ·Load(SB)
+
+// uint64 ·LoadAcq64(uint64 volatile* ptr)
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
+// uintptr ·LoadAcquintptr(uintptr volatile* ptr)
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
diff --git a/src/runtime/internal/atomic/atomic_mips64x.go b/src/runtime/internal/atomic/atomic_mips64x.go
new file mode 100644
index 0000000..1e12b83
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mips64x.go
@@ -0,0 +1,89 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_mips64x.s b/src/runtime/internal/atomic/atomic_mips64x.s
new file mode 100644
index 0000000..b4411d8
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mips64x.s
@@ -0,0 +1,359 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "textflag.h"
+
+#define SYNC WORD $0xf
+
+// bool cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVV ptr+0(FP), R1
+ MOVW old+8(FP), R2
+ MOVW new+12(FP), R5
+ SYNC
+cas_again:
+ MOVV R5, R3
+ LL (R1), R4
+ BNE R2, R4, cas_fail
+ SC R3, (R1)
+ BEQ R3, cas_again
+ MOVV $1, R1
+ MOVB R1, ret+16(FP)
+ SYNC
+ RET
+cas_fail:
+ MOVV $0, R1
+ JMP -4(PC)
+
+// bool cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVV ptr+0(FP), R1
+ MOVV old+8(FP), R2
+ MOVV new+16(FP), R5
+ SYNC
+cas64_again:
+ MOVV R5, R3
+ LLV (R1), R4
+ BNE R2, R4, cas64_fail
+ SCV R3, (R1)
+ BEQ R3, cas64_again
+ MOVV $1, R1
+ MOVB R1, ret+24(FP)
+ SYNC
+ RET
+cas64_fail:
+ MOVV $0, R1
+ JMP -4(PC)
+
+TEXT ·Casint32(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Casint64(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT|NOFRAME, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Storeint32(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·Storeint64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+TEXT ·Loadint32(SB), NOSPLIT, $0-12
+ JMP ·Load(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint32(SB), NOSPLIT, $0-20
+ JMP ·Xadd(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// bool casp(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R2
+ MOVW delta+8(FP), R3
+ SYNC
+ LL (R2), R1
+ ADDU R1, R3, R4
+ MOVV R4, R1
+ SC R4, (R2)
+ BEQ R4, -4(PC)
+ MOVW R1, ret+16(FP)
+ SYNC
+ RET
+
+// uint64 Xadd64(uint64 volatile *ptr, int64 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R2
+ MOVV delta+8(FP), R3
+ SYNC
+ LLV (R2), R1
+ ADDVU R1, R3, R4
+ MOVV R4, R1
+ SCV R4, (R2)
+ BEQ R4, -4(PC)
+ MOVV R1, ret+16(FP)
+ SYNC
+ RET
+
+// uint32 Xchg(ptr *uint32, new uint32)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVV ptr+0(FP), R2
+ MOVW new+8(FP), R5
+
+ SYNC
+ MOVV R5, R3
+ LL (R2), R1
+ SC R3, (R2)
+ BEQ R3, -3(PC)
+ MOVW R1, ret+16(FP)
+ SYNC
+ RET
+
+// uint64 Xchg64(ptr *uint64, new uint64)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVV ptr+0(FP), R2
+ MOVV new+8(FP), R5
+
+ SYNC
+ MOVV R5, R3
+ LLV (R2), R1
+ SCV R3, (R2)
+ BEQ R3, -3(PC)
+ MOVV R1, ret+16(FP)
+ SYNC
+ RET
+
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ JMP ·Xchg(SB)
+
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+ SYNC
+ MOVW R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVB val+8(FP), R2
+ SYNC
+ MOVB R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVV ptr+0(FP), R1
+ MOVV val+8(FP), R2
+ SYNC
+ MOVV R2, 0(R1)
+ SYNC
+ RET
+
+// void Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVBU val+8(FP), R2
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R3
+ AND R1, R3
+ // Compute val shift.
+#ifdef GOARCH_mips64
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ // R4 = ((ptr & 3) * 8)
+ AND $3, R1, R4
+ SLLV $3, R4
+ // Shift val for aligned ptr. R2 = val << R4
+ SLLV R4, R2
+
+ SYNC
+ LL (R3), R4
+ OR R2, R4
+ SC R4, (R3)
+ BEQ R4, -4(PC)
+ SYNC
+ RET
+
+// void And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVV ptr+0(FP), R1
+ MOVBU val+8(FP), R2
+ // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ MOVV $~3, R3
+ AND R1, R3
+ // Compute val shift.
+#ifdef GOARCH_mips64
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ // R4 = ((ptr & 3) * 8)
+ AND $3, R1, R4
+ SLLV $3, R4
+ // Shift val for aligned ptr. R2 = val << R4 | ^(0xFF << R4)
+ MOVV $0xFF, R5
+ SLLV R4, R2
+ SLLV R4, R5
+ NOR R0, R5
+ OR R5, R2
+
+ SYNC
+ LL (R3), R4
+ AND R2, R4
+ SC R4, (R3)
+ BEQ R4, -4(PC)
+ SYNC
+ RET
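
Editorial note: the notable difference from the loong64 version above is the GOARCH_mips-guarded XOR $3: on big-endian MIPS the byte at offset 0 sits in the most significant lane of its 32-bit word, so the low address bits must be flipped before computing the shift. The adjustment, spelled out as an illustrative helper (not part of the patch):

package sketch

// laneShift returns the bit offset of the byte at uaddr inside its aligned
// 32-bit word, matching the ptr^3 trick used above for big-endian targets.
func laneShift(uaddr uintptr, bigEndian bool) uint {
	idx := uaddr & 3
	if bigEndian {
		idx ^= 3 // byte 0 is the most significant byte on big endian
	}
	return uint(idx) * 8
}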
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+
+ SYNC
+ LL (R1), R3
+ OR R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVV ptr+0(FP), R1
+ MOVW val+8(FP), R2
+
+ SYNC
+ LL (R1), R3
+ AND R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+// uint32 ·Load(uint32 volatile* ptr)
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-12
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVWU 0(R1), R1
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+// uint8 ·Load8(uint8 volatile* ptr)
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-9
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVBU 0(R1), R1
+ SYNC
+ MOVB R1, ret+8(FP)
+ RET
+
+// uint64 ·Load64(uint64 volatile* ptr)
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVV 0(R1), R1
+ SYNC
+ MOVV R1, ret+8(FP)
+ RET
+
+// void *·Loadp(void *volatile *ptr)
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$0-16
+ MOVV ptr+0(FP), R1
+ SYNC
+ MOVV 0(R1), R1
+ SYNC
+ MOVV R1, ret+8(FP)
+ RET
+
+// uint32 ·LoadAcq(uint32 volatile* ptr)
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-12
+ JMP atomic·Load(SB)
+
+// uint64 ·LoadAcq64(uint64 volatile* ptr)
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$0-16
+ JMP atomic·Load64(SB)
+
+// uintptr ·LoadAcquintptr(uintptr volatile* ptr)
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-16
+ JMP atomic·Load64(SB)
diff --git a/src/runtime/internal/atomic/atomic_mipsx.go b/src/runtime/internal/atomic/atomic_mipsx.go
new file mode 100644
index 0000000..5dd15a0
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mipsx.go
@@ -0,0 +1,167 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Xadd64
+//go:linkname Xchg64
+//go:linkname Cas64
+//go:linkname Load64
+//go:linkname Store64
+
+package atomic
+
+import (
+ "internal/cpu"
+ "unsafe"
+)
+
+// TODO implement lock striping
+var lock struct {
+ state uint32
+ pad [cpu.CacheLinePadSize - 4]byte
+}
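
Editorial note: every 64-bit helper below funnels through this one global spinlock; the TODO above is about striping it the way atomic_arm.go earlier in this patch stripes locktab, so that unrelated uint64 variables rarely contend. A hedged sketch of that direction, with hypothetical names and a hard-coded pad size standing in for cpu.CacheLinePadSize:

package sketch

import "unsafe"

const cacheLinePadSize = 32 // stand-in for cpu.CacheLinePadSize on mips/mipsle

// A striped variant of the single lock above: each 8-byte-aligned address
// hashes to its own padded lock word.
var stripedLocks [57]struct {
	state uint32
	pad   [cacheLinePadSize - 4]byte
}

func stripeFor(addr *uint64) *uint32 {
	return &stripedLocks[(uintptr(unsafe.Pointer(addr))>>3)%uintptr(len(stripedLocks))].state
}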
+
+//go:noescape
+func spinLock(state *uint32)
+
+//go:noescape
+func spinUnlock(state *uint32)
+
+//go:nosplit
+func lockAndCheck(addr *uint64) {
+ // ensure 8-byte alignment
+ if uintptr(unsafe.Pointer(addr))&7 != 0 {
+ panicUnaligned()
+ }
+ // force dereference before taking lock
+ _ = *addr
+
+ spinLock(&lock.state)
+}
+
+//go:nosplit
+func unlock() {
+ spinUnlock(&lock.state)
+}
+
+//go:nosplit
+func unlockNoFence() {
+ lock.state = 0
+}
+
+//go:nosplit
+func Xadd64(addr *uint64, delta int64) (new uint64) {
+ lockAndCheck(addr)
+
+ new = *addr + uint64(delta)
+ *addr = new
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Xchg64(addr *uint64, new uint64) (old uint64) {
+ lockAndCheck(addr)
+
+ old = *addr
+ *addr = new
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Cas64(addr *uint64, old, new uint64) (swapped bool) {
+ lockAndCheck(addr)
+
+ if (*addr) == old {
+ *addr = new
+ unlock()
+ return true
+ }
+
+ unlockNoFence()
+ return false
+}
+
+//go:nosplit
+func Load64(addr *uint64) (val uint64) {
+ lockAndCheck(addr)
+
+ val = *addr
+
+ unlock()
+ return
+}
+
+//go:nosplit
+func Store64(addr *uint64, val uint64) {
+ lockAndCheck(addr)
+
+ *addr = val
+
+ unlock()
+ return
+}
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+//go:noescape
+func CasRel(addr *uint32, old, new uint32) bool
diff --git a/src/runtime/internal/atomic/atomic_mipsx.s b/src/runtime/internal/atomic/atomic_mipsx.s
new file mode 100644
index 0000000..390e9ce
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_mipsx.s
@@ -0,0 +1,261 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "textflag.h"
+
+// bool Cas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB),NOSPLIT,$0-13
+ MOVW ptr+0(FP), R1
+ MOVW old+4(FP), R2
+ MOVW new+8(FP), R5
+ SYNC
+try_cas:
+ MOVW R5, R3
+ LL (R1), R4 // R4 = *R1
+ BNE R2, R4, cas_fail
+ SC R3, (R1) // *R1 = R3
+ BEQ R3, try_cas
+ SYNC
+ MOVB R3, ret+12(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+12(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+ SYNC
+ MOVW R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVB val+4(FP), R2
+ SYNC
+ MOVB R2, 0(R1)
+ SYNC
+ RET
+
+TEXT ·Load(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), R1
+ SYNC
+ MOVW 0(R1), R1
+ SYNC
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Load8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ SYNC
+ MOVB 0(R1), R1
+ SYNC
+ MOVB R1, ret+4(FP)
+ RET
+
+// uint32 Xadd(uint32 volatile *val, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB),NOSPLIT,$0-12
+ MOVW ptr+0(FP), R2
+ MOVW delta+4(FP), R3
+ SYNC
+try_xadd:
+ LL (R2), R1 // R1 = *R2
+ ADDU R1, R3, R4
+ MOVW R4, R1
+ SC R4, (R2) // *R2 = R4
+ BEQ R4, try_xadd
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+// uint32 Xchg(ptr *uint32, new uint32)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg(SB),NOSPLIT,$0-12
+ MOVW ptr+0(FP), R2
+ MOVW new+4(FP), R5
+ SYNC
+try_xchg:
+ MOVW R5, R3
+ LL (R2), R1 // R1 = *R2
+ SC R3, (R2) // *R2 = R3
+ BEQ R3, try_xchg
+ SYNC
+ MOVW R1, ret+8(FP)
+ RET
+
+TEXT ·Casint32(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·Casint64(SB),NOSPLIT,$0-21
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·CasRel(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-8
+ JMP ·Load(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-8
+ JMP ·Load(SB)
+
+TEXT ·Loadp(SB),NOSPLIT,$-0-8
+ JMP ·Load(SB)
+
+TEXT ·Storeint32(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·Storeint64(SB),NOSPLIT,$0-12
+ JMP ·Store64(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·Xadduintptr(SB),NOSPLIT,$0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Loadint32(SB),NOSPLIT,$0-8
+ JMP ·Load(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$0-12
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint32(SB),NOSPLIT,$0-12
+ JMP ·Xadd(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$0-20
+ JMP ·Xadd64(SB)
+
+TEXT ·Casp1(SB),NOSPLIT,$0-13
+ JMP ·Cas(SB)
+
+TEXT ·Xchgint32(SB),NOSPLIT,$0-12
+ JMP ·Xchg(SB)
+
+TEXT ·Xchgint64(SB),NOSPLIT,$0-20
+ JMP ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB),NOSPLIT,$0-12
+ JMP ·Xchg(SB)
+
+TEXT ·StorepNoWB(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·StoreRel(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+TEXT ·StoreReluintptr(SB),NOSPLIT,$0-8
+ JMP ·Store(SB)
+
+// void Or8(byte volatile*, byte);
+TEXT ·Or8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVBU val+4(FP), R2
+ MOVW $~3, R3 // Align ptr down to 4 bytes so we can use 32-bit load/store.
+ AND R1, R3
+#ifdef GOARCH_mips
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ AND $3, R1, R4 // R4 = ((ptr & 3) * 8)
+ SLL $3, R4
+ SLL R4, R2, R2 // Shift val for aligned ptr. R2 = val << R4
+ SYNC
+try_or8:
+ LL (R3), R4 // R4 = *R3
+ OR R2, R4
+ SC R4, (R3) // *R3 = R4
+ BEQ R4, try_or8
+ SYNC
+ RET
+
+// void And8(byte volatile*, byte);
+TEXT ·And8(SB),NOSPLIT,$0-5
+ MOVW ptr+0(FP), R1
+ MOVBU val+4(FP), R2
+ MOVW $~3, R3
+ AND R1, R3
+#ifdef GOARCH_mips
+ // Big endian. ptr = ptr ^ 3
+ XOR $3, R1
+#endif
+ AND $3, R1, R4 // R4 = ((ptr & 3) * 8)
+ SLL $3, R4
+ MOVW $0xFF, R5
+ SLL R4, R2
+ SLL R4, R5
+ NOR R0, R5
+ OR R5, R2 // Shift val for aligned ptr. R2 = val << R4 | ^(0xFF << R4)
+ SYNC
+try_and8:
+ LL (R3), R4 // R4 = *R3
+ AND R2, R4
+ SC R4, (R3) // *R3 = R4
+ BEQ R4, try_and8
+ SYNC
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+
+ SYNC
+ LL (R1), R3
+ OR R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-8
+ MOVW ptr+0(FP), R1
+ MOVW val+4(FP), R2
+
+ SYNC
+ LL (R1), R3
+ AND R2, R3
+ SC R3, (R1)
+ BEQ R3, -4(PC)
+ SYNC
+ RET
+
+TEXT ·spinLock(SB),NOSPLIT,$0-4
+ MOVW state+0(FP), R1
+ MOVW $1, R2
+ SYNC
+try_lock:
+ MOVW R2, R3
+check_again:
+ LL (R1), R4
+ BNE R4, check_again
+ SC R3, (R1)
+ BEQ R3, try_lock
+ SYNC
+ RET
+
+TEXT ·spinUnlock(SB),NOSPLIT,$0-4
+ MOVW state+0(FP), R1
+ SYNC
+ MOVW R0, (R1)
+ SYNC
+ RET
diff --git a/src/runtime/internal/atomic/atomic_ppc64x.go b/src/runtime/internal/atomic/atomic_ppc64x.go
new file mode 100644
index 0000000..998d16e
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_ppc64x.go
@@ -0,0 +1,89 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
diff --git a/src/runtime/internal/atomic/atomic_ppc64x.s b/src/runtime/internal/atomic/atomic_ppc64x.s
new file mode 100644
index 0000000..04f0ead
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_ppc64x.s
@@ -0,0 +1,362 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+
+// For more details about how various memory models are
+// enforced on POWER, the following paper provides more
+// details about how they enforce C/C++ like models. This
+// gives context about why the strange looking code
+// sequences below work.
+//
+// http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
+
+// uint32 ·Load(uint32 volatile* ptr)
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$-8-12
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVWZ 0(R3), R3
+ CMPW R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVW R3, ret+8(FP)
+ RET
+
+// uint8 ·Load8(uint8 volatile* ptr)
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$-8-9
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVBZ 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVB R3, ret+8(FP)
+ RET
+
+// uint64 ·Load64(uint64 volatile* ptr)
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
+
+// void *·Loadp(void *volatile *ptr)
+TEXT ·Loadp(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ SYNC
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7,0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
+
+// uint32 ·LoadAcq(uint32 volatile* ptr)
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$-8-12
+ MOVD ptr+0(FP), R3
+ MOVWZ 0(R3), R3
+ CMPW R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7, 0x4
+ ISYNC
+ MOVW R3, ret+8(FP)
+ RET
+
+// uint64 ·LoadAcq64(uint64 volatile* ptr)
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$-8-16
+ MOVD ptr+0(FP), R3
+ MOVD 0(R3), R3
+ CMP R3, R3, CR7
+ BC 4, 30, 1(PC) // bne- cr7, 0x4
+ ISYNC
+ MOVD R3, ret+8(FP)
+ RET
+
+// bool cas(uint32 *ptr, uint32 old, uint32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ LWSYNC
+cas_again:
+ LWAR (R3), R6
+ CMPW R6, R4
+ BNE cas_fail
+ STWCCC R5, (R3)
+ BNE cas_again
+ MOVD $1, R3
+ LWSYNC
+ MOVB R3, ret+16(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+16(FP)
+ RET
+
+// bool ·Cas64(uint64 *ptr, uint64 old, uint64 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R3
+ MOVD old+8(FP), R4
+ MOVD new+16(FP), R5
+ LWSYNC
+cas64_again:
+ LDAR (R3), R6
+ CMP R6, R4
+ BNE cas64_fail
+ STDCCC R5, (R3)
+ BNE cas64_again
+ MOVD $1, R3
+ LWSYNC
+ MOVB R3, ret+24(FP)
+ RET
+cas64_fail:
+ MOVB R0, ret+24(FP)
+ RET
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ LWSYNC
+cas_again:
+ LWAR (R3), $0, R6 // 0 = Mutex release hint
+ CMPW R6, R4
+ BNE cas_fail
+ STWCCC R5, (R3)
+ BNE cas_again
+ MOVD $1, R3
+ MOVB R3, ret+16(FP)
+ RET
+cas_fail:
+ MOVB R0, ret+16(FP)
+ RET
+
+TEXT ·Casint32(SB), NOSPLIT, $0-17
+ BR ·Cas(SB)
+
+TEXT ·Casint64(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+TEXT ·Loaduintptr(SB), NOSPLIT|NOFRAME, $0-16
+ BR ·Load64(SB)
+
+TEXT ·LoadAcquintptr(SB), NOSPLIT|NOFRAME, $0-16
+ BR ·LoadAcq64(SB)
+
+TEXT ·Loaduint(SB), NOSPLIT|NOFRAME, $0-16
+ BR ·Load64(SB)
+
+TEXT ·Storeint32(SB), NOSPLIT, $0-12
+ BR ·Store(SB)
+
+TEXT ·Storeint64(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ BR ·StoreRel64(SB)
+
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+TEXT ·Loadint32(SB), NOSPLIT, $0-12
+ BR ·Load(SB)
+
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+TEXT ·Xaddint32(SB), NOSPLIT, $0-20
+ BR ·Xadd(SB)
+
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+// bool casp(void **val, void *old, void *new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else
+// return 0;
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// uint32 xadd(uint32 volatile *ptr, int32 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW delta+8(FP), R5
+ LWSYNC
+ LWAR (R4), R3
+ ADD R5, R3
+ STWCCC R3, (R4)
+ BNE -3(PC)
+ MOVW R3, ret+16(FP)
+ RET
+
+// uint64 Xadd64(uint64 volatile *val, int64 delta)
+// Atomically:
+// *val += delta;
+// return *val;
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD delta+8(FP), R5
+ LWSYNC
+ LDAR (R4), R3
+ ADD R5, R3
+ STDCCC R3, (R4)
+ BNE -3(PC)
+ MOVD R3, ret+16(FP)
+ RET
+
+// uint32 Xchg(ptr *uint32, new uint32)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW new+8(FP), R5
+ LWSYNC
+ LWAR (R4), R3
+ STWCCC R5, (R4)
+ BNE -2(PC)
+ ISYNC
+ MOVW R3, ret+16(FP)
+ RET
+
+// uint64 Xchg64(ptr *uint64, new uint64)
+// Atomically:
+// old := *ptr;
+// *ptr = new;
+// return old;
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD new+8(FP), R5
+ LWSYNC
+ LDAR (R4), R3
+ STDCCC R5, (R4)
+ BNE -2(PC)
+ ISYNC
+ MOVD R3, ret+16(FP)
+ RET
+
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ BR ·Xchg(SB)
+
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ BR ·Xchg64(SB)
+
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ BR ·Xchg64(SB)
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ SYNC
+ MOVW R4, 0(R3)
+ RET
+
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVB val+8(FP), R4
+ SYNC
+ MOVB R4, 0(R3)
+ RET
+
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R3
+ MOVD val+8(FP), R4
+ SYNC
+ MOVD R4, 0(R3)
+ RET
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+ MOVW R4, 0(R3)
+ RET
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R3
+ MOVD val+8(FP), R4
+ LWSYNC
+ MOVD R4, 0(R3)
+ RET
+
+// void ·Or8(byte volatile*, byte);
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ LWSYNC
+again:
+ LBAR (R3), R6
+ OR R4, R6
+ STBCCC R6, (R3)
+ BNE again
+ RET
+
+// void ·And8(byte volatile*, byte);
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ LWSYNC
+again:
+ LBAR (R3), R6
+ AND R4, R6
+ STBCCC R6, (R3)
+ BNE again
+ RET
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+again:
+ LWAR (R3), R6
+ OR R4, R6
+ STWCCC R6, (R3)
+ BNE again
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LWSYNC
+again:
+	LWAR	(R3), R6
+ AND R4, R6
+ STWCCC R6, (R3)
+ BNE again
+ RET
diff --git a/src/runtime/internal/atomic/atomic_riscv64.go b/src/runtime/internal/atomic/atomic_riscv64.go
new file mode 100644
index 0000000..8f24d61
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_riscv64.go
@@ -0,0 +1,85 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Load(ptr *uint32) uint32
+
+//go:noescape
+func Load8(ptr *uint8) uint8
+
+//go:noescape
+func Load64(ptr *uint64) uint64
+
+// NO go:noescape annotation; *ptr escapes if result escapes (#31525)
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+
+//go:noescape
+func LoadAcq(ptr *uint32) uint32
+
+//go:noescape
+func LoadAcq64(ptr *uint64) uint64
+
+//go:noescape
+func LoadAcquintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:noescape
+func StoreRel(ptr *uint32, val uint32)
+
+//go:noescape
+func StoreRel64(ptr *uint64, val uint64)
+
+//go:noescape
+func StoreReluintptr(ptr *uintptr, val uintptr)
diff --git a/src/runtime/internal/atomic/atomic_riscv64.s b/src/runtime/internal/atomic/atomic_riscv64.s
new file mode 100644
index 0000000..21d5adc
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_riscv64.s
@@ -0,0 +1,284 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// RISC-V's atomic operations have two bits, aq ("acquire") and rl ("release"),
+// which may be toggled on and off. Their precise semantics are defined in
+// section 6.3 of the specification, but the basic idea is as follows:
+//
+// - If neither aq nor rl is set, the CPU may reorder the atomic arbitrarily.
+// It guarantees only that it will execute atomically.
+//
+// - If aq is set, the CPU may move the instruction backward, but not forward.
+//
+// - If rl is set, the CPU may move the instruction forward, but not backward.
+//
+// - If both are set, the CPU may not reorder the instruction at all.
+//
+// These four modes correspond to other well-known memory models on other CPUs.
+// On ARM, aq corresponds to a dmb ishst, aq+rl corresponds to a dmb ish. On
+// Intel, aq corresponds to an lfence, rl to an sfence, and aq+rl to an mfence
+// (or a lock prefix).
+//
+// Go's memory model requires that
+// - if a read happens after a write, the read must observe the write, and
+// that
+// - if a read happens concurrently with a write, the read may observe the
+// write.
+// aq is sufficient to guarantee this, so that's what we use here. (This jibes
+// with ARM, which uses dmb ishst.)
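
Editorial note: the guarantee described above is the usual message-passing pattern: a release on the writer side plus an acquire on the reader side means that once the reader sees the flag, it also sees everything written before the flag. A small illustration using the public sync/atomic API (which ultimately compiles down to the primitives in this package); runnable on any GOARCH:

package main

import (
	"fmt"
	"sync/atomic"
)

var (
	payload int32        // written before the flag is raised
	ready   atomic.Int32 // the flag; Store is a release, Load is an acquire
)

func writer() {
	atomic.StoreInt32(&payload, 42) // happens before the flag store below
	ready.Store(1)                  // release: publishes payload
}

func reader() int32 {
	for ready.Load() == 0 { // acquire: pairs with the Store in writer
	}
	return atomic.LoadInt32(&payload) // guaranteed to observe 42
}

func main() {
	go writer()
	fmt.Println(reader())
}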
+
+#include "textflag.h"
+
+// func Cas(ptr *uint64, old, new uint64) bool
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// } else {
+// return 0;
+// }
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOV ptr+0(FP), A0
+ MOVW old+8(FP), A1
+ MOVW new+12(FP), A2
+cas_again:
+ LRW (A0), A3
+ BNE A3, A1, cas_fail
+ SCW A2, (A0), A4
+ BNE A4, ZERO, cas_again
+ MOV $1, A0
+ MOVB A0, ret+16(FP)
+ RET
+cas_fail:
+ MOV $0, A0
+	MOVB	A0, ret+16(FP)
+ RET
+
+// func Cas64(ptr *uint64, old, new uint64) bool
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOV ptr+0(FP), A0
+ MOV old+8(FP), A1
+ MOV new+16(FP), A2
+cas_again:
+ LRD (A0), A3
+ BNE A3, A1, cas_fail
+ SCD A2, (A0), A4
+ BNE A4, ZERO, cas_again
+ MOV $1, A0
+ MOVB A0, ret+24(FP)
+ RET
+cas_fail:
+ MOVB ZERO, ret+24(FP)
+ RET
+
+// func Load(ptr *uint32) uint32
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-12
+ MOV ptr+0(FP), A0
+ LRW (A0), A0
+ MOVW A0, ret+8(FP)
+ RET
+
+// func Load8(ptr *uint8) uint8
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-9
+ MOV ptr+0(FP), A0
+ FENCE
+ MOVBU (A0), A1
+ FENCE
+ MOVB A1, ret+8(FP)
+ RET
+
+// func Load64(ptr *uint64) uint64
+TEXT ·Load64(SB),NOSPLIT|NOFRAME,$0-16
+ MOV ptr+0(FP), A0
+ LRD (A0), A0
+ MOV A0, ret+8(FP)
+ RET
+
+// func Store(ptr *uint32, val uint32)
+TEXT ·Store(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOSWAPW A1, (A0), ZERO
+ RET
+
+// func Store8(ptr *uint8, val uint8)
+TEXT ·Store8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ FENCE
+ MOVB A1, (A0)
+ FENCE
+ RET
+
+// func Store64(ptr *uint64, val uint64)
+TEXT ·Store64(SB), NOSPLIT, $0-16
+ MOV ptr+0(FP), A0
+ MOV val+8(FP), A1
+ AMOSWAPD A1, (A0), ZERO
+ RET
+
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ JMP ·Cas64(SB)
+
+TEXT ·Casint32(SB),NOSPLIT,$0-17
+ JMP ·Cas(SB)
+
+TEXT ·Casint64(SB),NOSPLIT,$0-25
+ JMP ·Cas64(SB)
+
+TEXT ·Casuintptr(SB),NOSPLIT,$0-25
+ JMP ·Cas64(SB)
+
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ JMP ·Cas(SB)
+
+TEXT ·Loaduintptr(SB),NOSPLIT,$0-16
+ JMP ·Load64(SB)
+
+TEXT ·Storeint32(SB),NOSPLIT,$0-12
+ JMP ·Store(SB)
+
+TEXT ·Storeint64(SB),NOSPLIT,$0-16
+ JMP ·Store64(SB)
+
+TEXT ·Storeuintptr(SB),NOSPLIT,$0-16
+ JMP ·Store64(SB)
+
+TEXT ·Loaduint(SB),NOSPLIT,$0-16
+ JMP ·Loaduintptr(SB)
+
+TEXT ·Loadint32(SB),NOSPLIT,$0-12
+ JMP ·Load(SB)
+
+TEXT ·Loadint64(SB),NOSPLIT,$0-16
+ JMP ·Load64(SB)
+
+TEXT ·Xaddint32(SB),NOSPLIT,$0-20
+ JMP ·Xadd(SB)
+
+TEXT ·Xaddint64(SB),NOSPLIT,$0-24
+ MOV ptr+0(FP), A0
+ MOV delta+8(FP), A1
+ AMOADDD A1, (A0), A0
+ ADD A0, A1, A0
+	MOV	A0, ret+16(FP)
+ RET
+
+TEXT ·LoadAcq(SB),NOSPLIT|NOFRAME,$0-12
+ JMP ·Load(SB)
+
+TEXT ·LoadAcq64(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
+TEXT ·LoadAcquintptr(SB),NOSPLIT|NOFRAME,$0-16
+ JMP ·Load64(SB)
+
+// func Loadp(ptr unsafe.Pointer) unsafe.Pointer
+TEXT ·Loadp(SB),NOSPLIT,$0-16
+ JMP ·Load64(SB)
+
+// func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreRel(SB), NOSPLIT, $0-12
+ JMP ·Store(SB)
+
+TEXT ·StoreRel64(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+TEXT ·StoreReluintptr(SB), NOSPLIT, $0-16
+ JMP ·Store64(SB)
+
+// func Xchg(ptr *uint32, new uint32) uint32
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOV ptr+0(FP), A0
+ MOVW new+8(FP), A1
+ AMOSWAPW A1, (A0), A1
+ MOVW A1, ret+16(FP)
+ RET
+
+// func Xchg64(ptr *uint64, new uint64) uint64
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOV ptr+0(FP), A0
+ MOV new+8(FP), A1
+ AMOSWAPD A1, (A0), A1
+ MOV A1, ret+16(FP)
+ RET
+
+// func Xadd(ptr *uint32, delta int32) uint32
+// Atomically:
+//	*ptr += delta;
+//	return *ptr;
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOV ptr+0(FP), A0
+ MOVW delta+8(FP), A1
+ AMOADDW A1, (A0), A2
+ ADD A2,A1,A0
+ MOVW A0, ret+16(FP)
+ RET
+
+// func Xadd64(ptr *uint64, delta int64) uint64
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOV ptr+0(FP), A0
+ MOV delta+8(FP), A1
+ AMOADDD A1, (A0), A2
+ ADD A2, A1, A0
+ MOV A0, ret+16(FP)
+ RET
+
+// func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ JMP ·Xadd64(SB)
+
+// func Xchgint32(ptr *int32, new int32) int32
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ JMP ·Xchg(SB)
+
+// func Xchgint64(ptr *int64, new int64) int64
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+// func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ JMP ·Xchg64(SB)
+
+// func And8(ptr *uint8, val uint8)
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ AND $3, A0, A2
+ AND $-4, A0
+ SLL $3, A2
+ XOR $255, A1
+ SLL A2, A1
+ XOR $-1, A1
+ AMOANDW A1, (A0), ZERO
+ RET
+
+// func Or8(ptr *uint8, val uint8)
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOV ptr+0(FP), A0
+ MOVBU val+8(FP), A1
+ AND $3, A0, A2
+ AND $-4, A0
+ SLL $3, A2
+ SLL A2, A1
+ AMOORW A1, (A0), ZERO
+ RET
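The shift-and-mask sequence in And8/Or8 above is easier to follow as arithmetic; a hypothetical Go helper (byteMasks is an invented name, little-endian byte order assumed as on RISC-V) that mirrors it:

    package main

    import "fmt"

    // byteMasks mirrors And8/Or8 above: the byte at addr is updated through a
    // 32-bit AMO on its containing word (little-endian byte order).
    func byteMasks(addr uintptr, val uint8) (word uintptr, orMask, andMask uint32) {
        shift := (addr & 3) * 8            // SLL $3 of the low address bits
        word = addr &^ 3                   // AND $-4: word-aligned AMO target
        orMask = uint32(val) << shift      // other bytes 0, the identity for OR
        andMask = ^(uint32(^val) << shift) // other bytes 0xff, the identity for AND
        return
    }

    func main() {
        w, om, am := byteMasks(0x1003, 0x5a) // byte 3 of the word at 0x1000
        fmt.Printf("word=%#x or=%#x and=%#x\n", w, om, am)
        // word=0x1000 or=0x5a000000 and=0x5affffff
    }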
+
+// func And(ptr *uint32, val uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOANDW A1, (A0), ZERO
+ RET
+
+// func Or(ptr *uint32, val uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOV ptr+0(FP), A0
+ MOVW val+8(FP), A1
+ AMOORW A1, (A0), ZERO
+ RET
diff --git a/src/runtime/internal/atomic/atomic_s390x.go b/src/runtime/internal/atomic/atomic_s390x.go
new file mode 100644
index 0000000..9855bf0
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_s390x.go
@@ -0,0 +1,123 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:noescape
+func Store(ptr *uint32, val uint32)
+
+//go:noescape
+func Store8(ptr *uint8, val uint8)
+
+//go:noescape
+func Store64(ptr *uint64, val uint64)
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:nosplit
+//go:noinline
+func StoreRel(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreReluintptr(ptr *uintptr, val uintptr) {
+ *ptr = val
+}
+
+//go:noescape
+func And8(ptr *uint8, val uint8)
+
+//go:noescape
+func Or8(ptr *uint8, val uint8)
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:noescape
+func And(ptr *uint32, val uint32)
+
+//go:noescape
+func Or(ptr *uint32, val uint32)
+
+//go:noescape
+func Xadd(ptr *uint32, delta int32) uint32
+
+//go:noescape
+func Xadd64(ptr *uint64, delta int64) uint64
+
+//go:noescape
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+
+//go:noescape
+func Xchg(ptr *uint32, new uint32) uint32
+
+//go:noescape
+func Xchg64(ptr *uint64, new uint64) uint64
+
+//go:noescape
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+
+//go:noescape
+func Cas64(ptr *uint64, old, new uint64) bool
+
+//go:noescape
+func CasRel(ptr *uint32, old, new uint32) bool
diff --git a/src/runtime/internal/atomic/atomic_s390x.s b/src/runtime/internal/atomic/atomic_s390x.s
new file mode 100644
index 0000000..a0c204b
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_s390x.s
@@ -0,0 +1,248 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Store(ptr *uint32, val uint32)
+TEXT ·Store(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVWZ val+8(FP), R3
+ MOVW R3, 0(R2)
+ SYNC
+ RET
+
+// func Store8(ptr *uint8, val uint8)
+TEXT ·Store8(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVB val+8(FP), R3
+ MOVB R3, 0(R2)
+ SYNC
+ RET
+
+// func Store64(ptr *uint64, val uint64)
+TEXT ·Store64(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVD val+8(FP), R3
+ MOVD R3, 0(R2)
+ SYNC
+ RET
+
+// func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+TEXT ·StorepNoWB(SB), NOSPLIT, $0
+ MOVD ptr+0(FP), R2
+ MOVD val+8(FP), R3
+ MOVD R3, 0(R2)
+ SYNC
+ RET
+
+// func Cas(ptr *uint32, old, new uint32) bool
+// Atomically:
+// if *ptr == old {
+//	*ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Cas(SB), NOSPLIT, $0-17
+ MOVD ptr+0(FP), R3
+ MOVWZ old+8(FP), R4
+ MOVWZ new+12(FP), R5
+ CS R4, R5, 0(R3) // if (R4 == 0(R3)) then 0(R3)= R5
+ BNE cas_fail
+ MOVB $1, ret+16(FP)
+ RET
+cas_fail:
+ MOVB $0, ret+16(FP)
+ RET
+
+// func Cas64(ptr *uint64, old, new uint64) bool
+// Atomically:
+// if *ptr == old {
+// *ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Cas64(SB), NOSPLIT, $0-25
+ MOVD ptr+0(FP), R3
+ MOVD old+8(FP), R4
+ MOVD new+16(FP), R5
+ CSG R4, R5, 0(R3) // if (R4 == 0(R3)) then 0(R3)= R5
+ BNE cas64_fail
+ MOVB $1, ret+24(FP)
+ RET
+cas64_fail:
+ MOVB $0, ret+24(FP)
+ RET
+
+// func Casint32(ptr *int32, old, new int32) bool
+TEXT ·Casint32(SB), NOSPLIT, $0-17
+ BR ·Cas(SB)
+
+// func Casint64(ptr *int64, old, new int64) bool
+TEXT ·Casint64(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// func Casuintptr(ptr *uintptr, old, new uintptr) bool
+TEXT ·Casuintptr(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// func CasRel(ptr *uint32, old, new uint32) bool
+TEXT ·CasRel(SB), NOSPLIT, $0-17
+ BR ·Cas(SB)
+
+// func Loaduintptr(ptr *uintptr) uintptr
+TEXT ·Loaduintptr(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Loaduint(ptr *uint) uint
+TEXT ·Loaduint(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Storeint32(ptr *int32, new int32)
+TEXT ·Storeint32(SB), NOSPLIT, $0-12
+ BR ·Store(SB)
+
+// func Storeint64(ptr *int64, new int64)
+TEXT ·Storeint64(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+// func Storeuintptr(ptr *uintptr, new uintptr)
+TEXT ·Storeuintptr(SB), NOSPLIT, $0-16
+ BR ·Store64(SB)
+
+// func Loadint32(ptr *int32) int32
+TEXT ·Loadint32(SB), NOSPLIT, $0-12
+ BR ·Load(SB)
+
+// func Loadint64(ptr *int64) int64
+TEXT ·Loadint64(SB), NOSPLIT, $0-16
+ BR ·Load64(SB)
+
+// func Xadduintptr(ptr *uintptr, delta uintptr) uintptr
+TEXT ·Xadduintptr(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+// func Xaddint32(ptr *int32, delta int32) int32
+TEXT ·Xaddint32(SB), NOSPLIT, $0-20
+ BR ·Xadd(SB)
+
+// func Xaddint64(ptr *int64, delta int64) int64
+TEXT ·Xaddint64(SB), NOSPLIT, $0-24
+ BR ·Xadd64(SB)
+
+// func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool
+// Atomically:
+// if *ptr == old {
+// *ptr = new
+// return 1
+// } else {
+// return 0
+// }
+TEXT ·Casp1(SB), NOSPLIT, $0-25
+ BR ·Cas64(SB)
+
+// func Xadd(ptr *uint32, delta int32) uint32
+// Atomically:
+// *ptr += delta
+// return *ptr
+TEXT ·Xadd(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW delta+8(FP), R5
+ MOVW (R4), R3
+repeat:
+ ADD R5, R3, R6
+ CS R3, R6, (R4) // if R3==(R4) then (R4)=R6 else R3=(R4)
+ BNE repeat
+ MOVW R6, ret+16(FP)
+ RET
+
+// func Xadd64(ptr *uint64, delta int64) uint64
+TEXT ·Xadd64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD delta+8(FP), R5
+ MOVD (R4), R3
+repeat:
+ ADD R5, R3, R6
+ CSG R3, R6, (R4) // if R3==(R4) then (R4)=R6 else R3=(R4)
+ BNE repeat
+ MOVD R6, ret+16(FP)
+ RET
+
+// func Xchg(ptr *uint32, new uint32) uint32
+TEXT ·Xchg(SB), NOSPLIT, $0-20
+ MOVD ptr+0(FP), R4
+ MOVW new+8(FP), R3
+ MOVW (R4), R6
+repeat:
+ CS R6, R3, (R4) // if R6==(R4) then (R4)=R3 else R6=(R4)
+ BNE repeat
+ MOVW R6, ret+16(FP)
+ RET
+
+// func Xchg64(ptr *uint64, new uint64) uint64
+TEXT ·Xchg64(SB), NOSPLIT, $0-24
+ MOVD ptr+0(FP), R4
+ MOVD new+8(FP), R3
+ MOVD (R4), R6
+repeat:
+ CSG R6, R3, (R4) // if R6==(R4) then (R4)=R3 else R6=(R4)
+ BNE repeat
+ MOVD R6, ret+16(FP)
+ RET
+
+// func Xchgint32(ptr *int32, new int32) int32
+TEXT ·Xchgint32(SB), NOSPLIT, $0-20
+ BR ·Xchg(SB)
+
+// func Xchgint64(ptr *int64, new int64) int64
+TEXT ·Xchgint64(SB), NOSPLIT, $0-24
+ BR ·Xchg64(SB)
+
+// func Xchguintptr(ptr *uintptr, new uintptr) uintptr
+TEXT ·Xchguintptr(SB), NOSPLIT, $0-24
+ BR ·Xchg64(SB)
+
+// func Or8(addr *uint8, v uint8)
+TEXT ·Or8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ // We don't have atomic operations that work on individual bytes so we
+ // need to align addr down to a word boundary and create a mask
+ // containing v to OR with the entire word atomically.
+ MOVD $(3<<3), R5
+ RXSBG $59, $60, $3, R3, R5 // R5 = 24 - ((addr % 4) * 8) = ((addr & 3) << 3) ^ (3 << 3)
+ ANDW $~3, R3 // R3 = floor(addr, 4) = addr &^ 3
+ SLW R5, R4 // R4 = uint32(v) << R5
+ LAO R4, R6, 0(R3) // R6 = *R3; *R3 |= R4; (atomic)
+ RET
+
+// func And8(addr *uint8, v uint8)
+TEXT ·And8(SB), NOSPLIT, $0-9
+ MOVD ptr+0(FP), R3
+ MOVBZ val+8(FP), R4
+ // We don't have atomic operations that work on individual bytes so we
+ // need to align addr down to a word boundary and create a mask
+ // containing v to AND with the entire word atomically.
+ ORW $~0xff, R4 // R4 = uint32(v) | 0xffffff00
+ MOVD $(3<<3), R5
+ RXSBG $59, $60, $3, R3, R5 // R5 = 24 - ((addr % 4) * 8) = ((addr & 3) << 3) ^ (3 << 3)
+ ANDW $~3, R3 // R3 = floor(addr, 4) = addr &^ 3
+ RLL R5, R4, R4 // R4 = rotl(R4, R5)
+ LAN R4, R6, 0(R3) // R6 = *R3; *R3 &= R4; (atomic)
+ RET
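The RXSBG trick above relies on a small identity: on big-endian s390x the byte at offset addr&3 starts 24 - 8*(addr&3) bits into its word, which equals ((addr&3)<<3) XOR (3<<3). A quick standalone check (illustrative only):

    package main

    import "fmt"

    func main() {
        // Big-endian byte-offset identity packed into the RXSBG above.
        for a := uintptr(0); a < 4; a++ {
            direct := 24 - 8*int(a)       // bit offset of byte a in a big-endian word
            viaXor := int((a << 3) ^ (3 << 3)) // what RXSBG leaves in R5
            fmt.Println(direct == viaXor) // prints true four times
        }
    }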
+
+// func Or(addr *uint32, v uint32)
+TEXT ·Or(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LAO R4, R6, 0(R3) // R6 = *R3; *R3 |= R4; (atomic)
+ RET
+
+// func And(addr *uint32, v uint32)
+TEXT ·And(SB), NOSPLIT, $0-12
+ MOVD ptr+0(FP), R3
+ MOVW val+8(FP), R4
+ LAN R4, R6, 0(R3) // R6 = *R3; *R3 &= R4; (atomic)
+ RET
diff --git a/src/runtime/internal/atomic/atomic_test.go b/src/runtime/internal/atomic/atomic_test.go
new file mode 100644
index 0000000..2427bfd
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_test.go
@@ -0,0 +1,386 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic_test
+
+import (
+ "internal/goarch"
+ "runtime"
+ "runtime/internal/atomic"
+ "testing"
+ "unsafe"
+)
+
+func runParallel(N, iter int, f func()) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(int(N)))
+ done := make(chan bool)
+ for i := 0; i < N; i++ {
+ go func() {
+ for j := 0; j < iter; j++ {
+ f()
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < N; i++ {
+ <-done
+ }
+}
+
+func TestXadduintptr(t *testing.T) {
+ N := 20
+ iter := 100000
+ if testing.Short() {
+ N = 10
+ iter = 10000
+ }
+ inc := uintptr(100)
+ total := uintptr(0)
+ runParallel(N, iter, func() {
+ atomic.Xadduintptr(&total, inc)
+ })
+ if want := uintptr(N*iter) * inc; want != total {
+ t.Fatalf("xadduintpr error, want %d, got %d", want, total)
+ }
+ total = 0
+ runParallel(N, iter, func() {
+ atomic.Xadduintptr(&total, inc)
+ atomic.Xadduintptr(&total, uintptr(-int64(inc)))
+ })
+ if total != 0 {
+ t.Fatalf("xadduintpr total error, want %d, got %d", 0, total)
+ }
+}
+
+// Tests that xadduintptr correctly updates 64-bit values. The place where
+// we actually do so is mstats.go, functions mSysStat{Inc,Dec}.
+func TestXadduintptrOnUint64(t *testing.T) {
+ if goarch.BigEndian {
+ // On big endian architectures, we never use xadduintptr to update
+ // 64-bit values and hence we skip the test. (Note that functions
+ // mSysStat{Inc,Dec} in mstats.go have explicit checks for
+ // big-endianness.)
+ t.Skip("skip xadduintptr on big endian architecture")
+ }
+ const inc = 100
+ val := uint64(0)
+ atomic.Xadduintptr((*uintptr)(unsafe.Pointer(&val)), inc)
+ if inc != val {
+ t.Fatalf("xadduintptr should increase lower-order bits, want %d, got %d", inc, val)
+ }
+}
+
+func shouldPanic(t *testing.T, name string, f func()) {
+ defer func() {
+ // Check that all GC maps are sane.
+ runtime.GC()
+
+ err := recover()
+ want := "unaligned 64-bit atomic operation"
+ if err == nil {
+ t.Errorf("%s did not panic", name)
+ } else if s, _ := err.(string); s != want {
+ t.Errorf("%s: wanted panic %q, got %q", name, want, err)
+ }
+ }()
+ f()
+}
+
+// Variant of sync/atomic's TestUnaligned64:
+func TestUnaligned64(t *testing.T) {
+ // Unaligned 64-bit atomics on 32-bit systems are
+ // a continual source of pain. Test that on 32-bit systems they crash
+ // instead of failing silently.
+
+ if unsafe.Sizeof(int(0)) != 4 {
+ t.Skip("test only runs on 32-bit systems")
+ }
+
+ x := make([]uint32, 4)
+	u := unsafe.Pointer(uintptr(unsafe.Pointer(&x[0])) | 4) // force a 4-byte aligned (and thus 8-byte misaligned) address
+
+ up64 := (*uint64)(u) // misaligned
+ p64 := (*int64)(u) // misaligned
+
+ shouldPanic(t, "Load64", func() { atomic.Load64(up64) })
+ shouldPanic(t, "Loadint64", func() { atomic.Loadint64(p64) })
+ shouldPanic(t, "Store64", func() { atomic.Store64(up64, 0) })
+ shouldPanic(t, "Xadd64", func() { atomic.Xadd64(up64, 1) })
+ shouldPanic(t, "Xchg64", func() { atomic.Xchg64(up64, 1) })
+ shouldPanic(t, "Cas64", func() { atomic.Cas64(up64, 1, 2) })
+}
+
+func TestAnd8(t *testing.T) {
+ // Basic sanity check.
+ x := uint8(0xff)
+ for i := uint8(0); i < 8; i++ {
+ atomic.And8(&x, ^(1 << i))
+ if r := uint8(0xff) << (i + 1); x != r {
+ t.Fatalf("clearing bit %#x: want %#x, got %#x", uint8(1<<i), r, x)
+ }
+ }
+
+ // Set every bit in array to 1.
+ a := make([]uint8, 1<<12)
+ for i := range a {
+ a[i] = 0xff
+ }
+
+ // Clear array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := ^uint8(1 << i)
+ go func() {
+ for i := range a {
+ atomic.And8(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint8(0), v)
+ }
+ }
+}
+
+func TestAnd(t *testing.T) {
+ // Basic sanity check.
+ x := uint32(0xffffffff)
+ for i := uint32(0); i < 32; i++ {
+ atomic.And(&x, ^(1 << i))
+ if r := uint32(0xffffffff) << (i + 1); x != r {
+ t.Fatalf("clearing bit %#x: want %#x, got %#x", uint32(1<<i), r, x)
+ }
+ }
+
+ // Set every bit in array to 1.
+ a := make([]uint32, 1<<12)
+ for i := range a {
+ a[i] = 0xffffffff
+ }
+
+ // Clear array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := ^uint32(1 << i)
+ go func() {
+ for i := range a {
+ atomic.And(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint32(0), v)
+ }
+ }
+}
+
+func TestOr8(t *testing.T) {
+ // Basic sanity check.
+ x := uint8(0)
+ for i := uint8(0); i < 8; i++ {
+ atomic.Or8(&x, 1<<i)
+ if r := (uint8(1) << (i + 1)) - 1; x != r {
+ t.Fatalf("setting bit %#x: want %#x, got %#x", uint8(1)<<i, r, x)
+ }
+ }
+
+ // Start with every bit in array set to 0.
+ a := make([]uint8, 1<<12)
+
+ // Set every bit in array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := uint8(1 << i)
+ go func() {
+ for i := range a {
+ atomic.Or8(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally set.
+ for i, v := range a {
+ if v != 0xff {
+ t.Fatalf("a[%v] not fully set: want %#x, got %#x", i, uint8(0xff), v)
+ }
+ }
+}
+
+func TestOr(t *testing.T) {
+ // Basic sanity check.
+ x := uint32(0)
+ for i := uint32(0); i < 32; i++ {
+ atomic.Or(&x, 1<<i)
+ if r := (uint32(1) << (i + 1)) - 1; x != r {
+ t.Fatalf("setting bit %#x: want %#x, got %#x", uint32(1)<<i, r, x)
+ }
+ }
+
+ // Start with every bit in array set to 0.
+ a := make([]uint32, 1<<12)
+
+ // Set every bit in array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := uint32(1 << i)
+ go func() {
+ for i := range a {
+ atomic.Or(&a[i], m)
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally set.
+ for i, v := range a {
+ if v != 0xffffffff {
+ t.Fatalf("a[%v] not fully set: want %#x, got %#x", i, uint32(0xffffffff), v)
+ }
+ }
+}
+
+func TestBitwiseContended8(t *testing.T) {
+ // Start with every bit in array set to 0.
+ a := make([]uint8, 16)
+
+ // Iterations to try.
+ N := 1 << 16
+ if testing.Short() {
+ N = 1 << 10
+ }
+
+ // Set and then clear every bit in the array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 8; i++ {
+ m := uint8(1 << i)
+ go func() {
+ for n := 0; n < N; n++ {
+ for i := range a {
+ atomic.Or8(&a[i], m)
+ if atomic.Load8(&a[i])&m != m {
+ t.Errorf("a[%v] bit %#x not set", i, m)
+ }
+ atomic.And8(&a[i], ^m)
+ if atomic.Load8(&a[i])&m != 0 {
+ t.Errorf("a[%v] bit %#x not clear", i, m)
+ }
+ }
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 8; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint8(0), v)
+ }
+ }
+}
+
+func TestBitwiseContended(t *testing.T) {
+ // Start with every bit in array set to 0.
+ a := make([]uint32, 16)
+
+ // Iterations to try.
+ N := 1 << 16
+ if testing.Short() {
+ N = 1 << 10
+ }
+
+ // Set and then clear every bit in the array bit-by-bit in different goroutines.
+ done := make(chan bool)
+ for i := 0; i < 32; i++ {
+ m := uint32(1 << i)
+ go func() {
+ for n := 0; n < N; n++ {
+ for i := range a {
+ atomic.Or(&a[i], m)
+ if atomic.Load(&a[i])&m != m {
+ t.Errorf("a[%v] bit %#x not set", i, m)
+ }
+ atomic.And(&a[i], ^m)
+ if atomic.Load(&a[i])&m != 0 {
+ t.Errorf("a[%v] bit %#x not clear", i, m)
+ }
+ }
+ }
+ done <- true
+ }()
+ }
+ for i := 0; i < 32; i++ {
+ <-done
+ }
+
+ // Check that the array has been totally cleared.
+ for i, v := range a {
+ if v != 0 {
+ t.Fatalf("a[%v] not cleared: want %#x, got %#x", i, uint32(0), v)
+ }
+ }
+}
+
+func TestCasRel(t *testing.T) {
+ const _magic = 0x5a5aa5a5
+ var x struct {
+ before uint32
+ i uint32
+ after uint32
+ o uint32
+ n uint32
+ }
+
+ x.before = _magic
+ x.after = _magic
+ for j := 0; j < 32; j += 1 {
+ x.i = (1 << j) + 0
+ x.o = (1 << j) + 0
+ x.n = (1 << j) + 1
+ if !atomic.CasRel(&x.i, x.o, x.n) {
+ t.Fatalf("should have swapped %#x %#x", x.o, x.n)
+ }
+
+ if x.i != x.n {
+ t.Fatalf("wrong x.i after swap: x.i=%#x x.n=%#x", x.i, x.n)
+ }
+
+ if x.before != _magic || x.after != _magic {
+ t.Fatalf("wrong magic: %#x _ %#x != %#x _ %#x", x.before, x.after, _magic, _magic)
+ }
+ }
+}
+
+func TestStorepNoWB(t *testing.T) {
+ var p [2]*int
+ for i := range p {
+ atomic.StorepNoWB(unsafe.Pointer(&p[i]), unsafe.Pointer(new(int)))
+ }
+ if p[0] == p[1] {
+ t.Error("Bad escape analysis of StorepNoWB")
+ }
+}
diff --git a/src/runtime/internal/atomic/atomic_wasm.go b/src/runtime/internal/atomic/atomic_wasm.go
new file mode 100644
index 0000000..835fc43
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_wasm.go
@@ -0,0 +1,341 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// TODO(neelance): implement with actual atomic operations as soon as threads are available
+// See https://github.com/WebAssembly/design/issues/1073
+
+// Export some functions via linkname to assembly in sync/atomic.
+//
+//go:linkname Load
+//go:linkname Loadp
+//go:linkname Load64
+//go:linkname Loadint32
+//go:linkname Loadint64
+//go:linkname Loaduintptr
+//go:linkname Xadd
+//go:linkname Xaddint32
+//go:linkname Xaddint64
+//go:linkname Xadd64
+//go:linkname Xadduintptr
+//go:linkname Xchg
+//go:linkname Xchg64
+//go:linkname Xchgint32
+//go:linkname Xchgint64
+//go:linkname Xchguintptr
+//go:linkname Cas
+//go:linkname Cas64
+//go:linkname Casint32
+//go:linkname Casint64
+//go:linkname Casuintptr
+//go:linkname Store
+//go:linkname Store64
+//go:linkname Storeint32
+//go:linkname Storeint64
+//go:linkname Storeuintptr
+
+package atomic
+
+import "unsafe"
+
+//go:nosplit
+//go:noinline
+func Load(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadp(ptr unsafe.Pointer) unsafe.Pointer {
+ return *(*unsafe.Pointer)(ptr)
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq(ptr *uint32) uint32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcq64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func LoadAcquintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load8(ptr *uint8) uint8 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Load64(ptr *uint64) uint64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Xadd(ptr *uint32, delta int32) uint32 {
+ new := *ptr + uint32(delta)
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xadd64(ptr *uint64, delta int64) uint64 {
+ new := *ptr + uint64(delta)
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xadduintptr(ptr *uintptr, delta uintptr) uintptr {
+ new := *ptr + delta
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xchg(ptr *uint32, new uint32) uint32 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchg64(ptr *uint64, new uint64) uint64 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchgint32(ptr *int32, new int32) int32 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchgint64(ptr *int64, new int64) int64 {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func Xchguintptr(ptr *uintptr, new uintptr) uintptr {
+ old := *ptr
+ *ptr = new
+ return old
+}
+
+//go:nosplit
+//go:noinline
+func And8(ptr *uint8, val uint8) {
+ *ptr = *ptr & val
+}
+
+//go:nosplit
+//go:noinline
+func Or8(ptr *uint8, val uint8) {
+ *ptr = *ptr | val
+}
+
+// NOTE: Do not add atomicxor8 (XOR is not idempotent).
+
+//go:nosplit
+//go:noinline
+func And(ptr *uint32, val uint32) {
+ *ptr = *ptr & val
+}
+
+//go:nosplit
+//go:noinline
+func Or(ptr *uint32, val uint32) {
+ *ptr = *ptr | val
+}
+
+//go:nosplit
+//go:noinline
+func Cas64(ptr *uint64, old, new uint64) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Store(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel(ptr *uint32, val uint32) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreRel64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func StoreReluintptr(ptr *uintptr, val uintptr) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func Store8(ptr *uint8, val uint8) {
+ *ptr = val
+}
+
+//go:nosplit
+//go:noinline
+func Store64(ptr *uint64, val uint64) {
+ *ptr = val
+}
+
+// StorepNoWB performs *ptr = val atomically and without a write
+// barrier.
+//
+// NO go:noescape annotation; see atomic_pointer.go.
+func StorepNoWB(ptr unsafe.Pointer, val unsafe.Pointer)
+
+//go:nosplit
+//go:noinline
+func Casint32(ptr *int32, old, new int32) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Casint64(ptr *int64, old, new int64) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Cas(ptr *uint32, old, new uint32) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Casuintptr(ptr *uintptr, old, new uintptr) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func CasRel(ptr *uint32, old, new uint32) bool {
+ if *ptr == old {
+ *ptr = new
+ return true
+ }
+ return false
+}
+
+//go:nosplit
+//go:noinline
+func Storeint32(ptr *int32, new int32) {
+ *ptr = new
+}
+
+//go:nosplit
+//go:noinline
+func Storeint64(ptr *int64, new int64) {
+ *ptr = new
+}
+
+//go:nosplit
+//go:noinline
+func Storeuintptr(ptr *uintptr, new uintptr) {
+ *ptr = new
+}
+
+//go:nosplit
+//go:noinline
+func Loaduintptr(ptr *uintptr) uintptr {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loaduint(ptr *uint) uint {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadint32(ptr *int32) int32 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Loadint64(ptr *int64) int64 {
+ return *ptr
+}
+
+//go:nosplit
+//go:noinline
+func Xaddint32(ptr *int32, delta int32) int32 {
+ new := *ptr + delta
+ *ptr = new
+ return new
+}
+
+//go:nosplit
+//go:noinline
+func Xaddint64(ptr *int64, delta int64) int64 {
+ new := *ptr + delta
+ *ptr = new
+ return new
+}
diff --git a/src/runtime/internal/atomic/atomic_wasm.s b/src/runtime/internal/atomic/atomic_wasm.s
new file mode 100644
index 0000000..1c2d1ce
--- /dev/null
+++ b/src/runtime/internal/atomic/atomic_wasm.s
@@ -0,0 +1,10 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·StorepNoWB(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD val+8(FP), 0(R0)
+ RET
diff --git a/src/runtime/internal/atomic/bench_test.go b/src/runtime/internal/atomic/bench_test.go
new file mode 100644
index 0000000..efc0531
--- /dev/null
+++ b/src/runtime/internal/atomic/bench_test.go
@@ -0,0 +1,195 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic_test
+
+import (
+ "runtime/internal/atomic"
+ "testing"
+)
+
+var sink any
+
+func BenchmarkAtomicLoad64(b *testing.B) {
+ var x uint64
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ _ = atomic.Load64(&x)
+ }
+}
+
+func BenchmarkAtomicStore64(b *testing.B) {
+ var x uint64
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Store64(&x, 0)
+ }
+}
+
+func BenchmarkAtomicLoad(b *testing.B) {
+ var x uint32
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ _ = atomic.Load(&x)
+ }
+}
+
+func BenchmarkAtomicStore(b *testing.B) {
+ var x uint32
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Store(&x, 0)
+ }
+}
+
+func BenchmarkAnd8(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.And8(&x[255], uint8(i))
+ }
+}
+
+func BenchmarkAnd(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.And(&x[63], uint32(i))
+ }
+}
+
+func BenchmarkAnd8Parallel(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint8(0)
+ for pb.Next() {
+ atomic.And8(&x[255], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkAndParallel(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint32(0)
+ for pb.Next() {
+ atomic.And(&x[63], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkOr8(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Or8(&x[255], uint8(i))
+ }
+}
+
+func BenchmarkOr(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ for i := 0; i < b.N; i++ {
+ atomic.Or(&x[63], uint32(i))
+ }
+}
+
+func BenchmarkOr8Parallel(b *testing.B) {
+ var x [512]uint8 // give byte its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint8(0)
+ for pb.Next() {
+ atomic.Or8(&x[255], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkOrParallel(b *testing.B) {
+ var x [128]uint32 // give x its own cache line
+ sink = &x
+ b.RunParallel(func(pb *testing.PB) {
+ i := uint32(0)
+ for pb.Next() {
+ atomic.Or(&x[63], i)
+ i++
+ }
+ })
+}
+
+func BenchmarkXadd(b *testing.B) {
+ var x uint32
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Xadd(ptr, 1)
+ }
+ })
+}
+
+func BenchmarkXadd64(b *testing.B) {
+ var x uint64
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Xadd64(ptr, 1)
+ }
+ })
+}
+
+func BenchmarkCas(b *testing.B) {
+ var x uint32
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Cas(ptr, 1, 0)
+ atomic.Cas(ptr, 0, 1)
+ }
+ })
+}
+
+func BenchmarkCas64(b *testing.B) {
+ var x uint64
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ atomic.Cas64(ptr, 1, 0)
+ atomic.Cas64(ptr, 0, 1)
+ }
+ })
+}
+
+func BenchmarkXchg(b *testing.B) {
+ var x uint32
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ var y uint32
+ y = 1
+ for pb.Next() {
+ y = atomic.Xchg(ptr, y)
+ y += 1
+ }
+ })
+}
+
+func BenchmarkXchg64(b *testing.B) {
+ var x uint64
+ x = 1
+ ptr := &x
+ b.RunParallel(func(pb *testing.PB) {
+ var y uint64
+ y = 1
+ for pb.Next() {
+ y = atomic.Xchg64(ptr, y)
+ y += 1
+ }
+ })
+}
diff --git a/src/runtime/internal/atomic/doc.go b/src/runtime/internal/atomic/doc.go
new file mode 100644
index 0000000..08e6b6c
--- /dev/null
+++ b/src/runtime/internal/atomic/doc.go
@@ -0,0 +1,18 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+Package atomic provides atomic operations, independent of sync/atomic,
+to the runtime.
+
+On most platforms, the compiler is aware of the functions defined
+in this package, and they're replaced with platform-specific intrinsics.
+On other platforms, generic implementations are made available.
+
+Unless otherwise noted, operations defined in this package are sequentially
+consistent across threads with respect to the values they manipulate. More
+specifically, operations that happen in a specific order on one thread
+will always be observed to happen in exactly that order by another thread.
+*/
+package atomic
diff --git a/src/runtime/internal/atomic/stubs.go b/src/runtime/internal/atomic/stubs.go
new file mode 100644
index 0000000..7df8d9c
--- /dev/null
+++ b/src/runtime/internal/atomic/stubs.go
@@ -0,0 +1,59 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !wasm
+
+package atomic
+
+import "unsafe"
+
+//go:noescape
+func Cas(ptr *uint32, old, new uint32) bool
+
+// NO go:noescape annotation; see atomic_pointer.go.
+func Casp1(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool
+
+//go:noescape
+func Casint32(ptr *int32, old, new int32) bool
+
+//go:noescape
+func Casint64(ptr *int64, old, new int64) bool
+
+//go:noescape
+func Casuintptr(ptr *uintptr, old, new uintptr) bool
+
+//go:noescape
+func Storeint32(ptr *int32, new int32)
+
+//go:noescape
+func Storeint64(ptr *int64, new int64)
+
+//go:noescape
+func Storeuintptr(ptr *uintptr, new uintptr)
+
+//go:noescape
+func Loaduintptr(ptr *uintptr) uintptr
+
+//go:noescape
+func Loaduint(ptr *uint) uint
+
+// TODO(matloob): Should these functions have the go:noescape annotation?
+
+//go:noescape
+func Loadint32(ptr *int32) int32
+
+//go:noescape
+func Loadint64(ptr *int64) int64
+
+//go:noescape
+func Xaddint32(ptr *int32, delta int32) int32
+
+//go:noescape
+func Xaddint64(ptr *int64, delta int64) int64
+
+//go:noescape
+func Xchgint32(ptr *int32, new int32) int32
+
+//go:noescape
+func Xchgint64(ptr *int64, new int64) int64
diff --git a/src/runtime/internal/atomic/sys_linux_arm.s b/src/runtime/internal/atomic/sys_linux_arm.s
new file mode 100644
index 0000000..9225df8
--- /dev/null
+++ b/src/runtime/internal/atomic/sys_linux_arm.s
@@ -0,0 +1,134 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// Linux/ARM atomic operations.
+
+// Because there is so much variation in ARM devices,
+// the Linux kernel provides an appropriate compare-and-swap
+// implementation at address 0xffff0fc0. Caller sets:
+// R0 = old value
+// R1 = new value
+// R2 = addr
+// LR = return address
+// The function returns with CS true if the swap happened.
+// http://lxr.linux.no/linux+v2.6.37.2/arch/arm/kernel/entry-armv.S#L850
+//
+// https://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b49c0f24cf6744a3f4fd09289fe7cade349dead5
+//
+TEXT cas<>(SB),NOSPLIT,$0
+ MOVW $0xffff0fc0, R15 // R15 is hardware PC.
+
+TEXT ·Cas(SB),NOSPLIT|NOFRAME,$0
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP ·armcas(SB)
+ JMP kernelcas<>(SB)
+
+TEXT kernelcas<>(SB),NOSPLIT,$0
+ MOVW ptr+0(FP), R2
+ // trigger potential paging fault here,
+ // because we don't know how to traceback through __kuser_cmpxchg
+ MOVW (R2), R0
+ MOVW old+4(FP), R0
+ MOVW new+8(FP), R1
+ BL cas<>(SB)
+ BCC ret0
+ MOVW $1, R0
+ MOVB R0, ret+12(FP)
+ RET
+ret0:
+ MOVW $0, R0
+ MOVB R0, ret+12(FP)
+ RET
+
+// As for cas, memory barriers are complicated on ARM, but the kernel
+// provides a user helper. ARMv5 does not support SMP and has no
+// memory barrier instruction at all. ARMv6 added SMP support and has
+// a memory barrier, but it requires writing to a coprocessor
+// register. ARMv7 introduced the DMB instruction, but it's expensive
+// even on single-core devices. The kernel helper takes care of all of
+// this for us.
+
+// Use the kernel helper version of memory_barrier when compiled with GOARM < 7.
+TEXT memory_barrier<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xffff0fa0, R15 // R15 is hardware PC.
+
+TEXT ·Load(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R0
+ MOVW (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B end
+native_barrier:
+ DMB MB_ISH
+end:
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R1
+ MOVW v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B store
+native_barrier:
+ DMB MB_ISH
+
+store:
+ MOVW R2, (R1)
+
+ CMP $7, R8
+ BGE native_barrier2
+ BL memory_barrier<>(SB)
+ RET
+native_barrier2:
+ DMB MB_ISH
+ RET
+
+TEXT ·Load8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R0
+ MOVB (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B end
+native_barrier:
+ DMB MB_ISH
+end:
+ MOVB R1, ret+4(FP)
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R1
+ MOVB v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BGE native_barrier
+ BL memory_barrier<>(SB)
+ B store
+native_barrier:
+ DMB MB_ISH
+
+store:
+ MOVB R2, (R1)
+
+ CMP $7, R8
+ BGE native_barrier2
+ BL memory_barrier<>(SB)
+ RET
+native_barrier2:
+ DMB MB_ISH
+ RET
diff --git a/src/runtime/internal/atomic/sys_nonlinux_arm.s b/src/runtime/internal/atomic/sys_nonlinux_arm.s
new file mode 100644
index 0000000..b55bf90
--- /dev/null
+++ b/src/runtime/internal/atomic/sys_nonlinux_arm.s
@@ -0,0 +1,79 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !linux
+
+#include "textflag.h"
+
+// TODO(minux): this is only valid for ARMv6+
+// bool armcas(int32 *val, int32 old, int32 new)
+// Atomically:
+// if(*val == old){
+// *val = new;
+// return 1;
+// }else
+// return 0;
+TEXT ·Cas(SB),NOSPLIT,$0
+ JMP ·armcas(SB)
+
+// Non-Linux OSes support only single-processor machines before ARMv7, so we
+// don't need memory barriers if goarm < 7. We fail loudly at startup
+// (runtime.checkgoarm) if the machine is a multiprocessor but goarm < 7.
+
+TEXT ·Load(SB),NOSPLIT|NOFRAME,$0-8
+ MOVW addr+0(FP), R0
+ MOVW (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVW R1, ret+4(FP)
+ RET
+
+TEXT ·Store(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R1
+ MOVW v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVW R2, (R1)
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+ RET
+
+TEXT ·Load8(SB),NOSPLIT|NOFRAME,$0-5
+ MOVW addr+0(FP), R0
+ MOVB (R0), R1
+
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R1, ret+4(FP)
+ RET
+
+TEXT ·Store8(SB),NOSPLIT,$0-5
+ MOVW addr+0(FP), R1
+ MOVB v+4(FP), R2
+
+ MOVB runtime·goarm(SB), R8
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+
+ MOVB R2, (R1)
+
+ CMP $7, R8
+ BLT 2(PC)
+ DMB MB_ISH
+ RET
+
diff --git a/src/runtime/internal/atomic/types.go b/src/runtime/internal/atomic/types.go
new file mode 100644
index 0000000..287742f
--- /dev/null
+++ b/src/runtime/internal/atomic/types.go
@@ -0,0 +1,587 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+import "unsafe"
+
+// Int32 is an atomically accessed int32 value.
+//
+// An Int32 must not be copied.
+type Int32 struct {
+ noCopy noCopy
+ value int32
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (i *Int32) Load() int32 {
+ return Loadint32(&i.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (i *Int32) Store(value int32) {
+ Storeint32(&i.value, value)
+}
+
+// CompareAndSwap atomically compares i's value with old,
+// and if they're equal, swaps i's value with new.
+// It reports whether the swap ran.
+//
+//go:nosplit
+func (i *Int32) CompareAndSwap(old, new int32) bool {
+ return Casint32(&i.value, old, new)
+}
+
+// Swap replaces i's value with new, returning
+// i's value before the replacement.
+//
+//go:nosplit
+func (i *Int32) Swap(new int32) int32 {
+ return Xchgint32(&i.value, new)
+}
+
+// Add adds delta to i atomically, returning
+// the new updated value.
+//
+// This operation wraps around in the usual
+// two's-complement way.
+//
+//go:nosplit
+func (i *Int32) Add(delta int32) int32 {
+ return Xaddint32(&i.value, delta)
+}
+
+// Int64 is an atomically accessed int64 value.
+//
+// 8-byte aligned on all platforms, unlike a regular int64.
+//
+// An Int64 must not be copied.
+type Int64 struct {
+ noCopy noCopy
+ _ align64
+ value int64
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (i *Int64) Load() int64 {
+ return Loadint64(&i.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (i *Int64) Store(value int64) {
+ Storeint64(&i.value, value)
+}
+
+// CompareAndSwap atomically compares i's value with old,
+// and if they're equal, swaps i's value with new.
+// It reports whether the swap ran.
+//
+//go:nosplit
+func (i *Int64) CompareAndSwap(old, new int64) bool {
+ return Casint64(&i.value, old, new)
+}
+
+// Swap replaces i's value with new, returning
+// i's value before the replacement.
+//
+//go:nosplit
+func (i *Int64) Swap(new int64) int64 {
+ return Xchgint64(&i.value, new)
+}
+
+// Add adds delta to i atomically, returning
+// the new updated value.
+//
+// This operation wraps around in the usual
+// two's-complement way.
+//
+//go:nosplit
+func (i *Int64) Add(delta int64) int64 {
+ return Xaddint64(&i.value, delta)
+}
+
+// Uint8 is an atomically accessed uint8 value.
+//
+// A Uint8 must not be copied.
+type Uint8 struct {
+ noCopy noCopy
+ value uint8
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (u *Uint8) Load() uint8 {
+ return Load8(&u.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (u *Uint8) Store(value uint8) {
+ Store8(&u.value, value)
+}
+
+// And takes value and performs a bit-wise
+// "and" operation with the value of u, storing
+// the result into u.
+//
+// The full process is performed atomically.
+//
+//go:nosplit
+func (u *Uint8) And(value uint8) {
+ And8(&u.value, value)
+}
+
+// Or takes value and performs a bit-wise
+// "or" operation with the value of u, storing
+// the result into u.
+//
+// The full process is performed atomically.
+//
+//go:nosplit
+func (u *Uint8) Or(value uint8) {
+ Or8(&u.value, value)
+}
+
+// Bool is an atomically accessed bool value.
+//
+// A Bool must not be copied.
+type Bool struct {
+ // Inherits noCopy from Uint8.
+ u Uint8
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (b *Bool) Load() bool {
+ return b.u.Load() != 0
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (b *Bool) Store(value bool) {
+ s := uint8(0)
+ if value {
+ s = 1
+ }
+ b.u.Store(s)
+}
+
+// Uint32 is an atomically accessed uint32 value.
+//
+// A Uint32 must not be copied.
+type Uint32 struct {
+ noCopy noCopy
+ value uint32
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (u *Uint32) Load() uint32 {
+ return Load(&u.value)
+}
+
+// LoadAcquire is a partially unsynchronized version
+// of Load that relaxes ordering constraints. Other threads
+// may observe operations that precede this operation to
+// occur after it, but no operation that occurs after it
+// on this thread can be observed to occur before it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uint32) LoadAcquire() uint32 {
+ return LoadAcq(&u.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (u *Uint32) Store(value uint32) {
+ Store(&u.value, value)
+}
+
+// StoreRelease is a partially unsynchronized version
+// of Store that relaxes ordering constraints. Other threads
+// may observe operations that occur after this operation to
+// precede it, but no operation that precedes it
+// on this thread can be observed to occur after it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uint32) StoreRelease(value uint32) {
+ StoreRel(&u.value, value)
+}
+
+// CompareAndSwap atomically compares u's value with old,
+// and if they're equal, swaps u's value with new.
+// It reports whether the swap ran.
+//
+//go:nosplit
+func (u *Uint32) CompareAndSwap(old, new uint32) bool {
+ return Cas(&u.value, old, new)
+}
+
+// CompareAndSwapRelease is a partially unsynchronized version
+// of Cas that relaxes ordering constraints. Other threads
+// may observe operations that occur after this operation to
+// precede it, but no operation that precedes it
+// on this thread can be observed to occur after it.
+// It reports whether the swap ran.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uint32) CompareAndSwapRelease(old, new uint32) bool {
+ return CasRel(&u.value, old, new)
+}
+
+// Swap replaces u's value with new, returning
+// u's value before the replacement.
+//
+//go:nosplit
+func (u *Uint32) Swap(value uint32) uint32 {
+ return Xchg(&u.value, value)
+}
+
+// And takes value and performs a bit-wise
+// "and" operation with the value of u, storing
+// the result into u.
+//
+// The full process is performed atomically.
+//
+//go:nosplit
+func (u *Uint32) And(value uint32) {
+ And(&u.value, value)
+}
+
+// Or takes value and performs a bit-wise
+// "or" operation with the value of u, storing
+// the result into u.
+//
+// The full process is performed atomically.
+//
+//go:nosplit
+func (u *Uint32) Or(value uint32) {
+ Or(&u.value, value)
+}
+
+// Add adds delta to u atomically, returning
+// the new updated value.
+//
+// This operation wraps around in the usual
+// two's-complement way.
+//
+//go:nosplit
+func (u *Uint32) Add(delta int32) uint32 {
+ return Xadd(&u.value, delta)
+}
+
+// Uint64 is an atomically accessed uint64 value.
+//
+// 8-byte aligned on all platforms, unlike a regular uint64.
+//
+// A Uint64 must not be copied.
+type Uint64 struct {
+ noCopy noCopy
+ _ align64
+ value uint64
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (u *Uint64) Load() uint64 {
+ return Load64(&u.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (u *Uint64) Store(value uint64) {
+ Store64(&u.value, value)
+}
+
+// CompareAndSwap atomically compares u's value with old,
+// and if they're equal, swaps u's value with new.
+// It reports whether the swap ran.
+//
+//go:nosplit
+func (u *Uint64) CompareAndSwap(old, new uint64) bool {
+ return Cas64(&u.value, old, new)
+}
+
+// Swap replaces u's value with new, returning
+// u's value before the replacement.
+//
+//go:nosplit
+func (u *Uint64) Swap(value uint64) uint64 {
+ return Xchg64(&u.value, value)
+}
+
+// Add adds delta to u atomically, returning
+// the new updated value.
+//
+// This operation wraps around in the usual
+// two's-complement way.
+//
+//go:nosplit
+func (u *Uint64) Add(delta int64) uint64 {
+ return Xadd64(&u.value, delta)
+}
+
+// Uintptr is an atomically accessed uintptr value.
+//
+// A Uintptr must not be copied.
+type Uintptr struct {
+ noCopy noCopy
+ value uintptr
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (u *Uintptr) Load() uintptr {
+ return Loaduintptr(&u.value)
+}
+
+// LoadAcquire is a partially unsynchronized version
+// of Load that relaxes ordering constraints. Other threads
+// may observe operations that precede this operation to
+// occur after it, but no operation that occurs after it
+// on this thread can be observed to occur before it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uintptr) LoadAcquire() uintptr {
+ return LoadAcquintptr(&u.value)
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (u *Uintptr) Store(value uintptr) {
+ Storeuintptr(&u.value, value)
+}
+
+// StoreRelease is a partially unsynchronized version
+// of Store that relaxes ordering constraints. Other threads
+// may observe operations that occur after this operation to
+// precede it, but no operation that precedes it
+// on this thread can be observed to occur after it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uintptr) StoreRelease(value uintptr) {
+ StoreReluintptr(&u.value, value)
+}
+
+// CompareAndSwap atomically compares u's value with old,
+// and if they're equal, swaps u's value with new.
+// It reports whether the swap ran.
+//
+//go:nosplit
+func (u *Uintptr) CompareAndSwap(old, new uintptr) bool {
+ return Casuintptr(&u.value, old, new)
+}
+
+// Swap replaces u's value with new, returning
+// u's value before the replacement.
+//
+//go:nosplit
+func (u *Uintptr) Swap(value uintptr) uintptr {
+ return Xchguintptr(&u.value, value)
+}
+
+// Add adds delta to u atomically, returning
+// the new updated value.
+//
+// This operation wraps around in the usual
+// two's-complement way.
+//
+//go:nosplit
+func (u *Uintptr) Add(delta uintptr) uintptr {
+ return Xadduintptr(&u.value, delta)
+}
+
+// Float64 is an atomically accessed float64 value.
+//
+// 8-byte aligned on all platforms, unlike a regular float64.
+//
+// A Float64 must not be copied.
+type Float64 struct {
+ // Inherits noCopy and align64 from Uint64.
+ u Uint64
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (f *Float64) Load() float64 {
+ r := f.u.Load()
+ return *(*float64)(unsafe.Pointer(&r))
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (f *Float64) Store(value float64) {
+ f.u.Store(*(*uint64)(unsafe.Pointer(&value)))
+}
+
+// UnsafePointer is an atomically accessed unsafe.Pointer value.
+//
+// Note that because of the atomicity guarantees, stores to values
+// of this type never trigger a write barrier, and the relevant
+// methods are suffixed with "NoWB" to indicate that explicitly.
+// As a result, this type should be used carefully, and sparingly,
+// mostly with values that do not live in the Go heap anyway.
+//
+// An UnsafePointer must not be copied.
+type UnsafePointer struct {
+ noCopy noCopy
+ value unsafe.Pointer
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (u *UnsafePointer) Load() unsafe.Pointer {
+ return Loadp(unsafe.Pointer(&u.value))
+}
+
+// StoreNoWB updates the value atomically.
+//
+// WARNING: As the name implies this operation does *not*
+// perform a write barrier on value, and so this operation may
+// hide pointers from the GC. Use with care and sparingly.
+// It is safe to use with values not found in the Go heap.
+// Prefer Store instead.
+//
+//go:nosplit
+func (u *UnsafePointer) StoreNoWB(value unsafe.Pointer) {
+ StorepNoWB(unsafe.Pointer(&u.value), value)
+}
+
+// Store updates the value atomically.
+func (u *UnsafePointer) Store(value unsafe.Pointer) {
+ storePointer(&u.value, value)
+}
+
+// Provided by the runtime.
+//
+//go:linkname storePointer
+func storePointer(ptr *unsafe.Pointer, new unsafe.Pointer)
+
+// CompareAndSwapNoWB atomically (with respect to other methods)
+// compares u's value with old, and if they're equal,
+// swaps u's value with new.
+// It reports whether the swap ran.
+//
+// WARNING: As the name implies this operation does *not*
+// perform a write barrier on value, and so this operation may
+// hide pointers from the GC. Use with care and sparingly.
+// It is safe to use with values not found in the Go heap.
+// Prefer CompareAndSwap instead.
+//
+//go:nosplit
+func (u *UnsafePointer) CompareAndSwapNoWB(old, new unsafe.Pointer) bool {
+ return Casp1(&u.value, old, new)
+}
+
+// CompareAndSwap atomically compares u's value with old,
+// and if they're equal, swaps u's value with new.
+// It reports whether the swap ran.
+func (u *UnsafePointer) CompareAndSwap(old, new unsafe.Pointer) bool {
+ return casPointer(&u.value, old, new)
+}
+
+func casPointer(ptr *unsafe.Pointer, old, new unsafe.Pointer) bool
+
+// Pointer is an atomic pointer of type *T.
+type Pointer[T any] struct {
+ u UnsafePointer
+}
+
+// Load accesses and returns the value atomically.
+//
+//go:nosplit
+func (p *Pointer[T]) Load() *T {
+ return (*T)(p.u.Load())
+}
+
+// StoreNoWB updates the value atomically.
+//
+// WARNING: As the name implies this operation does *not*
+// perform a write barrier on value, and so this operation may
+// hide pointers from the GC. Use with care and sparingly.
+// It is safe to use with values not found in the Go heap.
+// Prefer Store instead.
+//
+//go:nosplit
+func (p *Pointer[T]) StoreNoWB(value *T) {
+ p.u.StoreNoWB(unsafe.Pointer(value))
+}
+
+// Store updates the value atomically.
+//
+//go:nosplit
+func (p *Pointer[T]) Store(value *T) {
+ p.u.Store(unsafe.Pointer(value))
+}
+
+// CompareAndSwapNoWB atomically (with respect to other methods)
+// compares p's value with old, and if they're equal,
+// swaps p's value with new.
+// It reports whether the swap ran.
+//
+// WARNING: As the name implies this operation does *not*
+// perform a write barrier on value, and so this operation may
+// hide pointers from the GC. Use with care and sparingly.
+// It is safe to use with values not found in the Go heap.
+// Prefer CompareAndSwap instead.
+//
+//go:nosplit
+func (p *Pointer[T]) CompareAndSwapNoWB(old, new *T) bool {
+ return p.u.CompareAndSwapNoWB(unsafe.Pointer(old), unsafe.Pointer(new))
+}
+
+// CompareAndSwap atomically (with respect to other methods)
+// compares p's value with old, and if they're equal,
+// swaps p's value with new.
+// It reports whether the swap ran.
+func (p *Pointer[T]) CompareAndSwap(old, new *T) bool {
+ return p.u.CompareAndSwap(unsafe.Pointer(old), unsafe.Pointer(new))
+}
+
+// noCopy may be embedded into structs which must not be copied
+// after the first use.
+//
+// See https://golang.org/issues/8005#issuecomment-190753527
+// for details.
+type noCopy struct{}
+
+// Lock is a no-op used by -copylocks checker from `go vet`.
+func (*noCopy) Lock() {}
+func (*noCopy) Unlock() {}
+
+// align64 may be added to structs that must be 64-bit aligned.
+// This struct is recognized by a special case in the compiler
+// and will not work if copied to any other package.
+type align64 struct{}
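To make the intended use of these wrappers concrete, here is a hypothetical sketch, written as if it lived inside this package, of a lock-free stack push built on Pointer[T]; the node, stack, and push names are invented for illustration and are not part of the runtime:

    // node and stack are invented illustration types, not part of this package.
    type node struct {
        next Pointer[node]
        val  uintptr
    }

    type stack struct {
        head Pointer[node]
    }

    // push inserts n with the usual load/CAS retry loop; CompareAndSwap succeeds
    // only if no other pusher changed head since the Load.
    func (s *stack) push(n *node) {
        for {
            old := s.head.Load()
            n.next.Store(old)
            if s.head.CompareAndSwap(old, n) {
                return
            }
        }
    }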
diff --git a/src/runtime/internal/atomic/types_64bit.go b/src/runtime/internal/atomic/types_64bit.go
new file mode 100644
index 0000000..006e83b
--- /dev/null
+++ b/src/runtime/internal/atomic/types_64bit.go
@@ -0,0 +1,33 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x || wasm
+
+package atomic
+
+// LoadAcquire is a partially unsynchronized version
+// of Load that relaxes ordering constraints. Other threads
+// may observe operations that precede this operation to
+// occur after it, but no operation that occurs after it
+// on this thread can be observed to occur before it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uint64) LoadAcquire() uint64 {
+ return LoadAcq64(&u.value)
+}
+
+// StoreRelease is a partially unsynchronized version
+// of Store that relaxes ordering constraints. Other threads
+// may observe operations that occur after this operation to
+// precede it, but no operation that precedes it
+// on this thread can be observed to occur after it.
+//
+// WARNING: Use sparingly and with great care.
+//
+//go:nosplit
+func (u *Uint64) StoreRelease(value uint64) {
+ StoreRel64(&u.value, value)
+}
diff --git a/src/runtime/internal/atomic/unaligned.go b/src/runtime/internal/atomic/unaligned.go
new file mode 100644
index 0000000..a859de4
--- /dev/null
+++ b/src/runtime/internal/atomic/unaligned.go
@@ -0,0 +1,9 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package atomic
+
+func panicUnaligned() {
+ panic("unaligned 64-bit atomic operation")
+}
diff --git a/src/runtime/internal/math/math.go b/src/runtime/internal/math/math.go
new file mode 100644
index 0000000..c3fac36
--- /dev/null
+++ b/src/runtime/internal/math/math.go
@@ -0,0 +1,40 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package math
+
+import "internal/goarch"
+
+const MaxUintptr = ^uintptr(0)
+
+// MulUintptr returns a * b and whether the multiplication overflowed.
+// On supported platforms this is an intrinsic lowered by the compiler.
+func MulUintptr(a, b uintptr) (uintptr, bool) {
+ if a|b < 1<<(4*goarch.PtrSize) || a == 0 {
+ return a * b, false
+ }
+ overflow := b > MaxUintptr/a
+ return a * b, overflow
+}
+
+// Mul64 returns the 128-bit product of x and y: (hi, lo) = x * y
+// with the product bits' upper half returned in hi and the lower
+// half returned in lo.
+// This is a copy from math/bits.Mul64
+// On supported platforms this is an intrinsic lowered by the compiler.
+func Mul64(x, y uint64) (hi, lo uint64) {
+ const mask32 = 1<<32 - 1
+ x0 := x & mask32
+ x1 := x >> 32
+ y0 := y & mask32
+ y1 := y >> 32
+ w0 := x0 * y0
+ t := x1*y0 + w0>>32
+ w1 := t & mask32
+ w2 := t >> 32
+ w1 += x0 * y1
+ hi = x1*y1 + w2 + w1>>32
+ lo = x * y
+ return
+}
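Since the comment notes Mul64 is a copy of math/bits.Mul64, the (hi, lo) contract can be sanity-checked against big-integer arithmetic in a standalone program (illustrative only):

    package main

    import (
        "fmt"
        "math/big"
        "math/bits"
    )

    func main() {
        x, y := uint64(0xdeadbeefcafebabe), uint64(0x0123456789abcdef)

        // bits.Mul64 uses the same schoolbook decomposition as Mul64 above.
        hi, lo := bits.Mul64(x, y)

        // Reassemble hi:lo as a 128-bit integer and compare with big.Int.
        got := new(big.Int).Lsh(new(big.Int).SetUint64(hi), 64)
        got.Add(got, new(big.Int).SetUint64(lo))
        want := new(big.Int).Mul(new(big.Int).SetUint64(x), new(big.Int).SetUint64(y))

        fmt.Println(got.Cmp(want) == 0) // true
    }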
diff --git a/src/runtime/internal/math/math_test.go b/src/runtime/internal/math/math_test.go
new file mode 100644
index 0000000..303eb63
--- /dev/null
+++ b/src/runtime/internal/math/math_test.go
@@ -0,0 +1,79 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package math_test
+
+import (
+ . "runtime/internal/math"
+ "testing"
+)
+
+const (
+ UintptrSize = 32 << (^uintptr(0) >> 63)
+)
+
+type mulUintptrTest struct {
+ a uintptr
+ b uintptr
+ overflow bool
+}
+
+var mulUintptrTests = []mulUintptrTest{
+ {0, 0, false},
+ {1000, 1000, false},
+ {MaxUintptr, 0, false},
+ {MaxUintptr, 1, false},
+ {MaxUintptr / 2, 2, false},
+ {MaxUintptr / 2, 3, true},
+ {MaxUintptr, 10, true},
+ {MaxUintptr, 100, true},
+ {MaxUintptr / 100, 100, false},
+ {MaxUintptr / 1000, 1001, true},
+ {1<<(UintptrSize/2) - 1, 1<<(UintptrSize/2) - 1, false},
+ {1 << (UintptrSize / 2), 1 << (UintptrSize / 2), true},
+ {MaxUintptr >> 32, MaxUintptr >> 32, false},
+ {MaxUintptr, MaxUintptr, true},
+}
+
+func TestMulUintptr(t *testing.T) {
+ for _, test := range mulUintptrTests {
+ a, b := test.a, test.b
+ for i := 0; i < 2; i++ {
+ mul, overflow := MulUintptr(a, b)
+ if mul != a*b || overflow != test.overflow {
+ t.Errorf("MulUintptr(%v, %v) = %v, %v want %v, %v",
+ a, b, mul, overflow, a*b, test.overflow)
+ }
+ a, b = b, a
+ }
+ }
+}
+
+var SinkUintptr uintptr
+var SinkBool bool
+
+var x, y uintptr
+
+func BenchmarkMulUintptr(b *testing.B) {
+ x, y = 1, 2
+ b.Run("small", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var overflow bool
+ SinkUintptr, overflow = MulUintptr(x, y)
+ if overflow {
+ SinkUintptr = 0
+ }
+ }
+ })
+ x, y = MaxUintptr, MaxUintptr-1
+ b.Run("large", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var overflow bool
+ SinkUintptr, overflow = MulUintptr(x, y)
+ if overflow {
+ SinkUintptr = 0
+ }
+ }
+ })
+}
diff --git a/src/runtime/internal/startlinetest/func_amd64.go b/src/runtime/internal/startlinetest/func_amd64.go
new file mode 100644
index 0000000..ab7063d
--- /dev/null
+++ b/src/runtime/internal/startlinetest/func_amd64.go
@@ -0,0 +1,13 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package startlinetest contains helpers for runtime_test.TestStartLineAsm.
+package startlinetest
+
+// Defined in func_amd64.s, this is a trivial assembly function that calls
+// runtime_test.callerStartLine.
+func AsmFunc() int
+
+// Provided by runtime_test.
+var CallerStartLine func(bool) int
diff --git a/src/runtime/internal/startlinetest/func_amd64.s b/src/runtime/internal/startlinetest/func_amd64.s
new file mode 100644
index 0000000..96982be
--- /dev/null
+++ b/src/runtime/internal/startlinetest/func_amd64.s
@@ -0,0 +1,28 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "funcdata.h"
+#include "textflag.h"
+
+// Assembly function for runtime_test.TestStartLineAsm.
+//
+// Note that this file can't be built directly as part of runtime_test, as assembly
+// files can't declare an alternative package. Building it into runtime is
+// possible, but linkshared complicates things:
+//
+// 1. linkshared mode leaves the function around in the final output of
+// non-test builds.
+// 2. Due to (1), the linker can't resolve the callerStartLine relocation
+// (as runtime_test isn't built for non-test builds).
+//
+// Thus it is simpler to just put this in its own package, imported only by
+// runtime_test. We use ABIInternal as no ABI wrapper is generated for
+// callerStartLine since it is in a different package.
+
+TEXT ·AsmFunc<ABIInternal>(SB),NOSPLIT,$8-0
+ NO_LOCAL_POINTERS
+ MOVQ $0, AX // wantInlined
+ MOVQ ·CallerStartLine(SB), DX
+ CALL (DX)
+ RET
diff --git a/src/runtime/internal/sys/consts.go b/src/runtime/internal/sys/consts.go
new file mode 100644
index 0000000..98c0f09
--- /dev/null
+++ b/src/runtime/internal/sys/consts.go
@@ -0,0 +1,36 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+import (
+ "internal/goarch"
+ "internal/goos"
+)
+
+// AIX requires a larger stack for syscalls.
+// The race build also needs more stack. See issue 54291.
+// This arithmetic must match that in cmd/internal/objabi/stack.go:stackGuardMultiplier.
+const StackGuardMultiplier = 1 + goos.IsAix + isRace
+
+// DefaultPhysPageSize is the default physical page size.
+const DefaultPhysPageSize = goarch.DefaultPhysPageSize
+
+// PCQuantum is the minimal unit for a program counter (1 on x86, 4 on most other systems).
+// The various PC tables record PC deltas pre-divided by PCQuantum.
+const PCQuantum = goarch.PCQuantum
+
+// Int64Align is the required alignment for a 64-bit integer (4 on 32-bit systems, 8 on 64-bit).
+const Int64Align = goarch.PtrSize
+
+// MinFrameSize is the size of the system-reserved words at the bottom
+// of a frame (just above the architectural stack pointer).
+// It is zero on x86 and PtrSize on most non-x86 (LR-based) systems.
+// On PowerPC it is larger, to cover three more reserved words:
+// the compiler word, the link editor word, and the TOC save word.
+const MinFrameSize = goarch.MinFrameSize
+
+// StackAlign is the required alignment of the SP register.
+// The stack must be at least word aligned, but some architectures require more.
+const StackAlign = goarch.StackAlign
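
As a quick cross-check of the arithmetic above, a tiny stand-alone program that evaluates 1 + goos.IsAix + isRace for the combinations of the two flags (shown for completeness, whether or not every combination is actually buildable):

    package main

    import "fmt"

    func main() {
        // Each term of StackGuardMultiplier is 0 or 1.
        configs := []struct {
            name          string
            isAix, isRace int
        }{
            {"linux, no race", 0, 0}, // multiplier 1
            {"aix, no race", 1, 0},   // multiplier 2
            {"linux, race", 0, 1},    // multiplier 2
            {"aix, race", 1, 1},      // multiplier 3 (combination shown for completeness)
        }
        for _, c := range configs {
            fmt.Printf("%-15s -> %d\n", c.name, 1+c.isAix+c.isRace)
        }
    }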
diff --git a/src/runtime/internal/sys/consts_norace.go b/src/runtime/internal/sys/consts_norace.go
new file mode 100644
index 0000000..a9613b8
--- /dev/null
+++ b/src/runtime/internal/sys/consts_norace.go
@@ -0,0 +1,9 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !race
+
+package sys
+
+const isRace = 0
diff --git a/src/runtime/internal/sys/consts_race.go b/src/runtime/internal/sys/consts_race.go
new file mode 100644
index 0000000..f824fb3
--- /dev/null
+++ b/src/runtime/internal/sys/consts_race.go
@@ -0,0 +1,9 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package sys
+
+const isRace = 1
diff --git a/src/runtime/internal/sys/intrinsics.go b/src/runtime/internal/sys/intrinsics.go
new file mode 100644
index 0000000..e6a3758
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics.go
@@ -0,0 +1,208 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+// Copied from math/bits to avoid dependence.
+
+var deBruijn32tab = [32]byte{
+ 0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
+ 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
+}
+
+const deBruijn32 = 0x077CB531
+
+var deBruijn64tab = [64]byte{
+ 0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
+ 62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
+ 63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
+ 54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
+}
+
+const deBruijn64 = 0x03f79d71b4ca8b09
+
+const ntz8tab = "" +
+ "\x08\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x05\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x06\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x05\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x07\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x05\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x06\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x05\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00" +
+ "\x04\x00\x01\x00\x02\x00\x01\x00\x03\x00\x01\x00\x02\x00\x01\x00"
+
+// TrailingZeros32 returns the number of trailing zero bits in x; the result is 32 for x == 0.
+func TrailingZeros32(x uint32) int {
+ if x == 0 {
+ return 32
+ }
+ // see comment in TrailingZeros64
+ return int(deBruijn32tab[(x&-x)*deBruijn32>>(32-5)])
+}
+
+// TrailingZeros64 returns the number of trailing zero bits in x; the result is 64 for x == 0.
+func TrailingZeros64(x uint64) int {
+ if x == 0 {
+ return 64
+ }
+ // If popcount is fast, replace code below with return popcount(^x & (x - 1)).
+ //
+ // x & -x leaves only the right-most bit set in the word. Let k be the
+ // index of that bit. Since only a single bit is set, the value is two
+ // to the power of k. Multiplying by a power of two is equivalent to
+ // left shifting, in this case by k bits. The de Bruijn (64 bit) constant
+// is such that all six-bit consecutive substrings are distinct.
+ // Therefore, if we have a left shifted version of this constant we can
+ // find by how many bits it was shifted by looking at which six bit
+ // substring ended up at the top of the word.
+ // (Knuth, volume 4, section 7.3.1)
+ return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
+}
+
+// TrailingZeros8 returns the number of trailing zero bits in x; the result is 8 for x == 0.
+func TrailingZeros8(x uint8) int {
+ return int(ntz8tab[x])
+}
+
+const len8tab = "" +
+ "\x00\x01\x02\x02\x03\x03\x03\x03\x04\x04\x04\x04\x04\x04\x04\x04" +
+ "\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05\x05" +
+ "\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06" +
+ "\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06\x06" +
+ "\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07" +
+ "\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07" +
+ "\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07" +
+ "\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07\x07" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08" +
+ "\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08"
+
+// Len64 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
+//
+// nosplit because this is used in src/runtime/histogram.go, which may run in sensitive contexts.
+//
+//go:nosplit
+func Len64(x uint64) (n int) {
+ if x >= 1<<32 {
+ x >>= 32
+ n = 32
+ }
+ if x >= 1<<16 {
+ x >>= 16
+ n += 16
+ }
+ if x >= 1<<8 {
+ x >>= 8
+ n += 8
+ }
+ return n + int(len8tab[x])
+}
+
+// --- OnesCount ---
+
+const m0 = 0x5555555555555555 // 01010101 ...
+const m1 = 0x3333333333333333 // 00110011 ...
+const m2 = 0x0f0f0f0f0f0f0f0f // 00001111 ...
+
+// OnesCount64 returns the number of one bits ("population count") in x.
+func OnesCount64(x uint64) int {
+ // Implementation: Parallel summing of adjacent bits.
+ // See "Hacker's Delight", Chap. 5: Counting Bits.
+ // The following pattern shows the general approach:
+ //
+ // x = x>>1&(m0&m) + x&(m0&m)
+ // x = x>>2&(m1&m) + x&(m1&m)
+ // x = x>>4&(m2&m) + x&(m2&m)
+ // x = x>>8&(m3&m) + x&(m3&m)
+ // x = x>>16&(m4&m) + x&(m4&m)
+ // x = x>>32&(m5&m) + x&(m5&m)
+ // return int(x)
+ //
+// Masking (& operations) can be left out when there's no
+ // danger that a field's sum will carry over into the next
+ // field: Since the result cannot be > 64, 8 bits is enough
+ // and we can ignore the masks for the shifts by 8 and up.
+ // Per "Hacker's Delight", the first line can be simplified
+ // more, but it saves at best one instruction, so we leave
+ // it alone for clarity.
+ const m = 1<<64 - 1
+ x = x>>1&(m0&m) + x&(m0&m)
+ x = x>>2&(m1&m) + x&(m1&m)
+ x = (x>>4 + x) & (m2 & m)
+ x += x >> 8
+ x += x >> 16
+ x += x >> 32
+ return int(x) & (1<<7 - 1)
+}
+
+// LeadingZeros64 returns the number of leading zero bits in x; the result is 64 for x == 0.
+func LeadingZeros64(x uint64) int { return 64 - Len64(x) }
+
+// LeadingZeros8 returns the number of leading zero bits in x; the result is 8 for x == 0.
+func LeadingZeros8(x uint8) int { return 8 - Len8(x) }
+
+// Len8 returns the minimum number of bits required to represent x; the result is 0 for x == 0.
+func Len8(x uint8) int {
+ return int(len8tab[x])
+}
+
+// Bswap64 returns its input with byte order reversed
+// 0x0102030405060708 -> 0x0807060504030201
+func Bswap64(x uint64) uint64 {
+ c8 := uint64(0x00ff00ff00ff00ff)
+ a := x >> 8 & c8
+ b := (x & c8) << 8
+ x = a | b
+ c16 := uint64(0x0000ffff0000ffff)
+ a = x >> 16 & c16
+ b = (x & c16) << 16
+ x = a | b
+ c32 := uint64(0x00000000ffffffff)
+ a = x >> 32 & c32
+ b = (x & c32) << 32
+ x = a | b
+ return x
+}
+
+// Bswap32 returns its input with byte order reversed
+// 0x01020304 -> 0x04030201
+func Bswap32(x uint32) uint32 {
+ c8 := uint32(0x00ff00ff)
+ a := x >> 8 & c8
+ b := (x & c8) << 8
+ x = a | b
+ c16 := uint32(0x0000ffff)
+ a = x >> 16 & c16
+ b = (x & c16) << 16
+ x = a | b
+ return x
+}
+
+// Prefetch prefetches data from memory addr into the cache.
+//
+// AMD64: Produces the PREFETCHT0 instruction.
+//
+// ARM64: Produces the PRFM instruction with the PLDL1KEEP option.
+func Prefetch(addr uintptr) {}
+
+// PrefetchStreamed prefetches data from memory addr, with a hint that this data is being streamed.
+// That is, it is likely to be accessed very soon, but only once. If possible, this will avoid polluting the cache.
+//
+// AMD64: Produces the PREFETCHNTA instruction.
+//
+// ARM64: Produces the PRFM instruction with the PLDL1STRM option.
+func PrefetchStreamed(addr uintptr) {}
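
The de Bruijn argument in the TrailingZeros64 comment can be checked directly; the sketch below (ordinary Go) copies the constant and table from this file and compares the lookup against math/bits:

    package main

    import (
        "fmt"
        "math/bits"
    )

    // Copied from intrinsics.go above: every 6-bit window of deBruijn64<<k is
    // distinct, so the top 6 bits of (x&-x)*deBruijn64 identify the shift k.
    const deBruijn64 = 0x03f79d71b4ca8b09

    var deBruijn64tab = [64]byte{
        0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
        62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
        63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
        54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
    }

    func trailingZeros64(x uint64) int {
        if x == 0 {
            return 64
        }
        // x&-x isolates the lowest set bit, i.e. 1<<k.
        return int(deBruijn64tab[(x&-x)*deBruijn64>>(64-6)])
    }

    func main() {
        for _, x := range []uint64{0, 1, 2, 96, 1 << 40, 0xdead0000} {
            // The last two columns agree for every input.
            fmt.Println(x, trailingZeros64(x), bits.TrailingZeros64(x))
        }
    }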
diff --git a/src/runtime/internal/sys/intrinsics_test.go b/src/runtime/internal/sys/intrinsics_test.go
new file mode 100644
index 0000000..bf75f19
--- /dev/null
+++ b/src/runtime/internal/sys/intrinsics_test.go
@@ -0,0 +1,38 @@
+package sys_test
+
+import (
+ "runtime/internal/sys"
+ "testing"
+)
+
+func TestTrailingZeros64(t *testing.T) {
+ for i := 0; i <= 64; i++ {
+ x := uint64(5) << uint(i)
+ if got := sys.TrailingZeros64(x); got != i {
+ t.Errorf("TrailingZeros64(%d)=%d, want %d", x, got, i)
+ }
+ }
+}
+func TestTrailingZeros32(t *testing.T) {
+ for i := 0; i <= 32; i++ {
+ x := uint32(5) << uint(i)
+ if got := sys.TrailingZeros32(x); got != i {
+ t.Errorf("TrailingZeros32(%d)=%d, want %d", x, got, i)
+ }
+ }
+}
+
+func TestBswap64(t *testing.T) {
+ x := uint64(0x1122334455667788)
+ y := sys.Bswap64(x)
+ if y != 0x8877665544332211 {
+ t.Errorf("Bswap(%x)=%x, want 0x8877665544332211", x, y)
+ }
+}
+func TestBswap32(t *testing.T) {
+ x := uint32(0x11223344)
+ y := sys.Bswap32(x)
+ if y != 0x44332211 {
+ t.Errorf("Bswap(%x)=%x, want 0x44332211", x, y)
+ }
+}
diff --git a/src/runtime/internal/sys/nih.go b/src/runtime/internal/sys/nih.go
new file mode 100644
index 0000000..17eab67
--- /dev/null
+++ b/src/runtime/internal/sys/nih.go
@@ -0,0 +1,41 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package sys
+
+// NOTE: keep in sync with cmd/compile/internal/types.CalcSize
+// to make the compiler recognize this as an intrinsic type.
+type nih struct{}
+
+// NotInHeap is a type that must never be allocated from the GC'd heap or on the stack,
+// and is called not-in-heap.
+//
+// Other types can embed NotInHeap to make them not-in-heap. Specifically, pointers
+// to these types must always fail the `runtime.inheap` check. The type may be used
+// for global variables, or for objects in unmanaged memory (e.g., allocated with
+// `sysAlloc`, `persistentalloc`, `fixalloc`, or from a manually-managed span).
+//
+// Specifically:
+//
+// 1. `new(T)`, `make([]T)`, `append([]T, ...)` and implicit heap
+// allocation of T are disallowed. (Though implicit allocations are
+// disallowed in the runtime anyway.)
+//
+// 2. A pointer to a regular type (other than `unsafe.Pointer`) cannot be
+// converted to a pointer to a not-in-heap type, even if they have the
+// same underlying type.
+//
+// 3. Any type containing a not-in-heap type is itself not-in-heap.
+//
+// - Structs and arrays are not-in-heap if their elements are not-in-heap.
+// - Maps and channels containing not-in-heap types are disallowed.
+//
+// 4. Write barriers on pointers to not-in-heap types can be omitted.
+//
+// The last point is the real benefit of NotInHeap. The runtime uses
+// it for low-level internal structures to avoid write barriers in the
+// scheduler and the memory allocator where they are illegal or simply
+// inefficient. This mechanism is reasonably safe and does not compromise
+// the readability of the runtime.
+type NotInHeap struct{ _ nih }
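
A hedged sketch of the embedding pattern the comment describes, written as runtime-internal code (this package cannot be imported from user code); nodeBlock is a hypothetical name, not an existing runtime type:

    package runtime // sketch only

    import "runtime/internal/sys"

    // nodeBlock is a hypothetical not-in-heap structure. Because it contains
    // sys.NotInHeap, the compiler rejects new(nodeBlock) and other implicit
    // heap allocations, forbids converting an ordinary pointer to *nodeBlock,
    // and omits write barriers when storing *nodeBlock fields. Instances must
    // come from unmanaged memory (e.g. sysAlloc or persistentalloc).
    type nodeBlock struct {
        _    sys.NotInHeap
        next *nodeBlock
        data [512]byte
    }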
diff --git a/src/runtime/internal/sys/sys.go b/src/runtime/internal/sys/sys.go
new file mode 100644
index 0000000..694101d
--- /dev/null
+++ b/src/runtime/internal/sys/sys.go
@@ -0,0 +1,7 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package sys contains system-, configuration-, and architecture-specific
+// constants used by the runtime.
+package sys
diff --git a/src/runtime/internal/syscall/asm_linux_386.s b/src/runtime/internal/syscall/asm_linux_386.s
new file mode 100644
index 0000000..15aae4d
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_386.s
@@ -0,0 +1,34 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See ../sys_linux_386.s for the reason why we always use int 0x80
+// instead of the glibc-specific "CALL 0x10(GS)".
+#define INVOKE_SYSCALL INT $0x80
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+//
+// Syscall # in AX, args in BX CX DX SI DI BP, return in AX
+TEXT ·Syscall6(SB),NOSPLIT,$0-40
+ MOVL num+0(FP), AX // syscall entry
+ MOVL a1+4(FP), BX
+ MOVL a2+8(FP), CX
+ MOVL a3+12(FP), DX
+ MOVL a4+16(FP), SI
+ MOVL a5+20(FP), DI
+ MOVL a6+24(FP), BP
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS ok
+ MOVL $-1, r1+28(FP)
+ MOVL $0, r2+32(FP)
+ NEGL AX
+ MOVL AX, errno+36(FP)
+ RET
+ok:
+ MOVL AX, r1+28(FP)
+ MOVL DX, r2+32(FP)
+ MOVL $0, errno+36(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_amd64.s b/src/runtime/internal/syscall/asm_linux_amd64.s
new file mode 100644
index 0000000..3740ef1
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_amd64.s
@@ -0,0 +1,47 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+//
+// We need to convert to the syscall ABI.
+//
+// arg | ABIInternal | Syscall
+// ---------------------------
+// num | AX | AX
+// a1 | BX | DI
+// a2 | CX | SI
+// a3 | DI | DX
+// a4 | SI | R10
+// a5 | R8 | R8
+// a6 | R9 | R9
+//
+// r1 | AX | AX
+// r2 | BX | DX
+// err | CX | part of AX
+//
+// Note that this differs from "standard" ABI convention, which would pass 4th
+// arg in CX, not R10.
+TEXT ·Syscall6<ABIInternal>(SB),NOSPLIT,$0
+ // a6 already in R9.
+ // a5 already in R8.
+ MOVQ SI, R10 // a4
+ MOVQ DI, DX // a3
+ MOVQ CX, SI // a2
+ MOVQ BX, DI // a1
+ // num already in AX.
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS ok
+ NEGQ AX
+ MOVQ AX, CX // errno
+ MOVQ $-1, AX // r1
+ MOVQ $0, BX // r2
+ RET
+ok:
+ // r1 already in AX.
+ MOVQ DX, BX // r2
+ MOVQ $0, CX // errno
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_arm.s b/src/runtime/internal/syscall/asm_linux_arm.s
new file mode 100644
index 0000000..dbf1826
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_arm.s
@@ -0,0 +1,32 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6(SB),NOSPLIT,$0-40
+ MOVW num+0(FP), R7 // syscall entry
+ MOVW a1+4(FP), R0
+ MOVW a2+8(FP), R1
+ MOVW a3+12(FP), R2
+ MOVW a4+16(FP), R3
+ MOVW a5+20(FP), R4
+ MOVW a6+24(FP), R5
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ BLS ok
+ MOVW $-1, R1
+ MOVW R1, r1+28(FP)
+ MOVW $0, R2
+ MOVW R2, r2+32(FP)
+ RSB $0, R0, R0
+ MOVW R0, errno+36(FP)
+ RET
+ok:
+ MOVW R0, r1+28(FP)
+ MOVW R1, r2+32(FP)
+ MOVW $0, R0
+ MOVW R0, errno+36(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_arm64.s b/src/runtime/internal/syscall/asm_linux_arm64.s
new file mode 100644
index 0000000..83e862f
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_arm64.s
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6(SB),NOSPLIT,$0-80
+ MOVD num+0(FP), R8 // syscall entry
+ MOVD a1+8(FP), R0
+ MOVD a2+16(FP), R1
+ MOVD a3+24(FP), R2
+ MOVD a4+32(FP), R3
+ MOVD a5+40(FP), R4
+ MOVD a6+48(FP), R5
+ SVC
+ CMN $4095, R0
+ BCC ok
+ MOVD $-1, R4
+ MOVD R4, r1+56(FP)
+ MOVD ZR, r2+64(FP)
+ NEG R0, R0
+ MOVD R0, errno+72(FP)
+ RET
+ok:
+ MOVD R0, r1+56(FP)
+ MOVD R1, r2+64(FP)
+ MOVD ZR, errno+72(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_loong64.s b/src/runtime/internal/syscall/asm_linux_loong64.s
new file mode 100644
index 0000000..d6a33f9
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_loong64.s
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6(SB),NOSPLIT,$0-80
+ MOVV num+0(FP), R11 // syscall entry
+ MOVV a1+8(FP), R4
+ MOVV a2+16(FP), R5
+ MOVV a3+24(FP), R6
+ MOVV a4+32(FP), R7
+ MOVV a5+40(FP), R8
+ MOVV a6+48(FP), R9
+ SYSCALL
+ MOVW $-4096, R12
+ BGEU R12, R4, ok
+ MOVV $-1, R12
+ MOVV R12, r1+56(FP)
+ MOVV R0, r2+64(FP)
+ SUBVU R4, R0, R4
+ MOVV R4, errno+72(FP)
+ RET
+ok:
+ MOVV R4, r1+56(FP)
+ MOVV R0, r2+64(FP) // r2 is not used. Always set to 0.
+ MOVV R0, errno+72(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_mips64x.s b/src/runtime/internal/syscall/asm_linux_mips64x.s
new file mode 100644
index 0000000..6b7c524
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_mips64x.s
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6(SB),NOSPLIT,$0-80
+ MOVV num+0(FP), R2 // syscall entry
+ MOVV a1+8(FP), R4
+ MOVV a2+16(FP), R5
+ MOVV a3+24(FP), R6
+ MOVV a4+32(FP), R7
+ MOVV a5+40(FP), R8
+ MOVV a6+48(FP), R9
+ MOVV R0, R3 // reset R3 to 0 as 1-ret SYSCALL keeps it
+ SYSCALL
+ BEQ R7, ok
+ MOVV $-1, R1
+ MOVV R1, r1+56(FP)
+ MOVV R0, r2+64(FP)
+ MOVV R2, errno+72(FP)
+ RET
+ok:
+ MOVV R2, r1+56(FP)
+ MOVV R3, r2+64(FP)
+ MOVV R0, errno+72(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_mipsx.s b/src/runtime/internal/syscall/asm_linux_mipsx.s
new file mode 100644
index 0000000..561310f
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_mipsx.s
@@ -0,0 +1,35 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+//
+// The 5th and 6th arg go at sp+16, sp+20.
+// Note that a frame size of 20 means that 24 bytes get reserved on the stack.
+TEXT ·Syscall6(SB),NOSPLIT,$20-40
+ MOVW num+0(FP), R2 // syscall entry
+ MOVW a1+4(FP), R4
+ MOVW a2+8(FP), R5
+ MOVW a3+12(FP), R6
+ MOVW a4+16(FP), R7
+ MOVW a5+20(FP), R8
+ MOVW a6+24(FP), R9
+ MOVW R8, 16(R29)
+ MOVW R9, 20(R29)
+ MOVW R0, R3 // reset R3 to 0 as 1-ret SYSCALL keeps it
+ SYSCALL
+ BEQ R7, ok
+ MOVW $-1, R1
+ MOVW R1, r1+28(FP)
+ MOVW R0, r2+32(FP)
+ MOVW R2, errno+36(FP)
+ RET
+ok:
+ MOVW R2, r1+28(FP)
+ MOVW R3, r2+32(FP)
+ MOVW R0, errno+36(FP)
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_ppc64x.s b/src/runtime/internal/syscall/asm_linux_ppc64x.s
new file mode 100644
index 0000000..3e985ed
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_ppc64x.s
@@ -0,0 +1,23 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6<ABIInternal>(SB),NOSPLIT,$0-80
+ MOVD R3, R10 // Move syscall number to R10. SYSCALL will move it to R0, and restore R0.
+ MOVD R4, R3
+ MOVD R5, R4
+ MOVD R6, R5
+ MOVD R7, R6
+ MOVD R8, R7
+ MOVD R9, R8
+ SYSCALL R10
+ MOVD $-1, R6
+ ISEL CR0SO, R3, R0, R5 // errno = (error) ? R3 : 0
+ ISEL CR0SO, R6, R3, R3 // r1 = (error) ? -1 : r1
+ MOVD $0, R4 // r2 is not used on linux/ppc64
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_riscv64.s b/src/runtime/internal/syscall/asm_linux_riscv64.s
new file mode 100644
index 0000000..15e50ec
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_riscv64.s
@@ -0,0 +1,43 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+//
+// We need to convert to the syscall ABI.
+//
+// arg | ABIInternal | Syscall
+// ---------------------------
+// num | A0 | A7
+// a1 | A1 | A0
+// a2 | A2 | A1
+// a3 | A3 | A2
+// a4 | A4 | A3
+// a5 | A5 | A4
+// a6 | A6 | A5
+//
+// r1 | A0 | A0
+// r2 | A1 | A1
+// err | A2 | part of A0
+TEXT ·Syscall6<ABIInternal>(SB),NOSPLIT,$0-80
+ MOV A0, A7
+ MOV A1, A0
+ MOV A2, A1
+ MOV A3, A2
+ MOV A4, A3
+ MOV A5, A4
+ MOV A6, A5
+ ECALL
+ MOV $-4096, T0
+ BLTU T0, A0, err
+ // r1 already in A0
+ // r2 already in A1
+ MOV ZERO, A2 // errno
+ RET
+err:
+ SUB A0, ZERO, A2 // errno
+ MOV $-1, A0 // r1
+ MOV ZERO, A1 // r2
+ RET
diff --git a/src/runtime/internal/syscall/asm_linux_s390x.s b/src/runtime/internal/syscall/asm_linux_s390x.s
new file mode 100644
index 0000000..1b27f29
--- /dev/null
+++ b/src/runtime/internal/syscall/asm_linux_s390x.s
@@ -0,0 +1,28 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+TEXT ·Syscall6(SB),NOSPLIT,$0-80
+ MOVD num+0(FP), R1 // syscall entry
+ MOVD a1+8(FP), R2
+ MOVD a2+16(FP), R3
+ MOVD a3+24(FP), R4
+ MOVD a4+32(FP), R5
+ MOVD a5+40(FP), R6
+ MOVD a6+48(FP), R7
+ SYSCALL
+ MOVD $0xfffffffffffff001, R8
+ CMPUBLT R2, R8, ok
+ MOVD $-1, r1+56(FP)
+ MOVD $0, r2+64(FP)
+ NEG R2, R2
+ MOVD R2, errno+72(FP)
+ RET
+ok:
+ MOVD R2, r1+56(FP)
+ MOVD R3, r2+64(FP)
+ MOVD $0, errno+72(FP)
+ RET
diff --git a/src/runtime/internal/syscall/defs_linux_386.go b/src/runtime/internal/syscall/defs_linux_386.go
new file mode 100644
index 0000000..dc723a6
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_386.go
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_FCNTL = 55
+ SYS_EPOLL_CTL = 255
+ SYS_EPOLL_PWAIT = 319
+ SYS_EPOLL_CREATE1 = 329
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ Data [8]byte // to match amd64
+}
diff --git a/src/runtime/internal/syscall/defs_linux_amd64.go b/src/runtime/internal/syscall/defs_linux_amd64.go
new file mode 100644
index 0000000..886eb5b
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_amd64.go
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_FCNTL = 72
+ SYS_EPOLL_CTL = 233
+ SYS_EPOLL_PWAIT = 281
+ SYS_EPOLL_CREATE1 = 291
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ Data [8]byte // unaligned uintptr
+}
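
The Data field is an opaque 8-byte payload; declaring it as [8]byte (rather than a uintptr) keeps it at offset 4 to match the kernel's packed layout, so a pointer-sized value stored there is unaligned, which amd64 tolerates. A hedged sketch of stashing and recovering such a value (stash and unstash are illustrative names, not part of the package):

    package syscall // sketch: same package as the definitions above

    import "unsafe"

    // stash stores a pointer-sized value in the opaque Data payload; unstash
    // reads it back. unsafe is required because Data is just raw bytes.
    func stash(ev *EpollEvent, v uintptr) {
        *(*uintptr)(unsafe.Pointer(&ev.Data)) = v
    }

    func unstash(ev *EpollEvent) uintptr {
        return *(*uintptr)(unsafe.Pointer(&ev.Data))
    }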
diff --git a/src/runtime/internal/syscall/defs_linux_arm.go b/src/runtime/internal/syscall/defs_linux_arm.go
new file mode 100644
index 0000000..8f812a2
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_arm.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_FCNTL = 55
+ SYS_EPOLL_CTL = 251
+ SYS_EPOLL_PWAIT = 346
+ SYS_EPOLL_CREATE1 = 357
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ _pad uint32
+ Data [8]byte // to match amd64
+}
diff --git a/src/runtime/internal/syscall/defs_linux_arm64.go b/src/runtime/internal/syscall/defs_linux_arm64.go
new file mode 100644
index 0000000..48e11b0
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_arm64.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_EPOLL_CREATE1 = 20
+ SYS_EPOLL_CTL = 21
+ SYS_EPOLL_PWAIT = 22
+ SYS_FCNTL = 25
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ _pad uint32
+ Data [8]byte // to match amd64
+}
diff --git a/src/runtime/internal/syscall/defs_linux_loong64.go b/src/runtime/internal/syscall/defs_linux_loong64.go
new file mode 100644
index 0000000..b78ef81
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_loong64.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_EPOLL_CREATE1 = 20
+ SYS_EPOLL_CTL = 21
+ SYS_EPOLL_PWAIT = 22
+ SYS_FCNTL = 25
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data [8]byte // unaligned uintptr
+}
diff --git a/src/runtime/internal/syscall/defs_linux_mips64x.go b/src/runtime/internal/syscall/defs_linux_mips64x.go
new file mode 100644
index 0000000..92b49ca
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_mips64x.go
@@ -0,0 +1,32 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+package syscall
+
+const (
+ SYS_FCNTL = 5070
+ SYS_EPOLL_CTL = 5208
+ SYS_EPOLL_PWAIT = 5272
+ SYS_EPOLL_CREATE1 = 5285
+ SYS_EPOLL_PWAIT2 = 5441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data [8]byte // unaligned uintptr
+}
diff --git a/src/runtime/internal/syscall/defs_linux_mipsx.go b/src/runtime/internal/syscall/defs_linux_mipsx.go
new file mode 100644
index 0000000..e28d09c
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_mipsx.go
@@ -0,0 +1,32 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+package syscall
+
+const (
+ SYS_FCNTL = 4055
+ SYS_EPOLL_CTL = 4249
+ SYS_EPOLL_PWAIT = 4313
+ SYS_EPOLL_CREATE1 = 4326
+ SYS_EPOLL_PWAIT2 = 4441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data uint64
+}
diff --git a/src/runtime/internal/syscall/defs_linux_ppc64x.go b/src/runtime/internal/syscall/defs_linux_ppc64x.go
new file mode 100644
index 0000000..a74483e
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_ppc64x.go
@@ -0,0 +1,32 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+package syscall
+
+const (
+ SYS_FCNTL = 55
+ SYS_EPOLL_CTL = 237
+ SYS_EPOLL_PWAIT = 303
+ SYS_EPOLL_CREATE1 = 315
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data [8]byte // unaligned uintptr
+}
diff --git a/src/runtime/internal/syscall/defs_linux_riscv64.go b/src/runtime/internal/syscall/defs_linux_riscv64.go
new file mode 100644
index 0000000..b78ef81
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_riscv64.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_EPOLL_CREATE1 = 20
+ SYS_EPOLL_CTL = 21
+ SYS_EPOLL_PWAIT = 22
+ SYS_FCNTL = 25
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data [8]byte // unaligned uintptr
+}
diff --git a/src/runtime/internal/syscall/defs_linux_s390x.go b/src/runtime/internal/syscall/defs_linux_s390x.go
new file mode 100644
index 0000000..a7bb1ba
--- /dev/null
+++ b/src/runtime/internal/syscall/defs_linux_s390x.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall
+
+const (
+ SYS_FCNTL = 55
+ SYS_EPOLL_CTL = 250
+ SYS_EPOLL_PWAIT = 312
+ SYS_EPOLL_CREATE1 = 327
+ SYS_EPOLL_PWAIT2 = 441
+
+ EPOLLIN = 0x1
+ EPOLLOUT = 0x4
+ EPOLLERR = 0x8
+ EPOLLHUP = 0x10
+ EPOLLRDHUP = 0x2000
+ EPOLLET = 0x80000000
+ EPOLL_CLOEXEC = 0x80000
+ EPOLL_CTL_ADD = 0x1
+ EPOLL_CTL_DEL = 0x2
+ EPOLL_CTL_MOD = 0x3
+)
+
+type EpollEvent struct {
+ Events uint32
+ pad_cgo_0 [4]byte
+ Data [8]byte // unaligned uintptr
+}
diff --git a/src/runtime/internal/syscall/syscall_linux.go b/src/runtime/internal/syscall/syscall_linux.go
new file mode 100644
index 0000000..7209634
--- /dev/null
+++ b/src/runtime/internal/syscall/syscall_linux.go
@@ -0,0 +1,62 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package syscall provides the syscall primitives required for the runtime.
+package syscall
+
+import (
+ "unsafe"
+)
+
+// TODO(https://go.dev/issue/51087): This package is incomplete and currently
+// only contains very minimal support for Linux.
+
+// Syscall6 calls system call number 'num' with arguments a1-6.
+func Syscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr)
+
+// syscall_RawSyscall6 is a push linkname to export Syscall6 as
+// syscall.RawSyscall6.
+//
+// //go:uintptrkeepalive because the uintptr arguments may be pointers converted
+// to uintptr that need to be kept alive in the caller (this is implied for
+// Syscall6 since it has no body).
+//
+// //go:nosplit because stack copying does not account for uintptrkeepalive, so
+// the stack must not grow. Stack copying cannot blindly assume that all
+// uintptr arguments are pointers, because some values may look like pointers,
+// but not really be pointers, and adjusting their value would break the call.
+//
+// This is a separate wrapper because we can't export one function as two
+// names. The assembly implementations name themselves Syscall6 and would not
+// be affected by a linkname.
+//
+//go:uintptrkeepalive
+//go:nosplit
+//go:linkname syscall_RawSyscall6 syscall.RawSyscall6
+func syscall_RawSyscall6(num, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, errno uintptr) {
+ return Syscall6(num, a1, a2, a3, a4, a5, a6)
+}
+
+func EpollCreate1(flags int32) (fd int32, errno uintptr) {
+ r1, _, e := Syscall6(SYS_EPOLL_CREATE1, uintptr(flags), 0, 0, 0, 0, 0)
+ return int32(r1), e
+}
+
+var _zero uintptr
+
+func EpollWait(epfd int32, events []EpollEvent, maxev, waitms int32) (n int32, errno uintptr) {
+ var ev unsafe.Pointer
+ if len(events) > 0 {
+ ev = unsafe.Pointer(&events[0])
+ } else {
+ ev = unsafe.Pointer(&_zero)
+ }
+ r1, _, e := Syscall6(SYS_EPOLL_PWAIT, uintptr(epfd), uintptr(ev), uintptr(maxev), uintptr(waitms), 0, 0)
+ return int32(r1), e
+}
+
+func EpollCtl(epfd, op, fd int32, event *EpollEvent) (errno uintptr) {
+ _, _, e := Syscall6(SYS_EPOLL_CTL, uintptr(epfd), uintptr(op), uintptr(fd), uintptr(unsafe.Pointer(event)), 0, 0)
+ return e
+}
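
A hedged sketch of how the three wrappers combine (written as if inside this package, with a hypothetical pollSketch helper; errno values are raw uintptrs, as returned above):

    package syscall // sketch: same package as the wrappers above

    // pollSketch creates an epoll instance, registers fd for edge-triggered
    // readiness, and blocks until events arrive. fd is assumed to be a valid
    // non-blocking descriptor.
    func pollSketch(fd int32) {
        epfd, errno := EpollCreate1(EPOLL_CLOEXEC)
        if errno != 0 {
            println("epoll_create1 failed, errno =", errno)
            return
        }
        ev := EpollEvent{Events: EPOLLIN | EPOLLRDHUP | EPOLLET}
        // Data is an opaque payload; callers decide what to stash in it.
        if errno := EpollCtl(epfd, EPOLL_CTL_ADD, fd, &ev); errno != 0 {
            println("epoll_ctl failed, errno =", errno)
            return
        }
        var events [128]EpollEvent
        n, errno := EpollWait(epfd, events[:], int32(len(events)), -1)
        if errno != 0 {
            println("epoll_pwait failed, errno =", errno)
            return
        }
        println("ready events:", n)
    }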
diff --git a/src/runtime/internal/syscall/syscall_linux_test.go b/src/runtime/internal/syscall/syscall_linux_test.go
new file mode 100644
index 0000000..1976da5
--- /dev/null
+++ b/src/runtime/internal/syscall/syscall_linux_test.go
@@ -0,0 +1,19 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package syscall_test
+
+import (
+ "runtime/internal/syscall"
+ "testing"
+)
+
+func TestEpollctlErrorSign(t *testing.T) {
+ v := syscall.EpollCtl(-1, 1, -1, &syscall.EpollEvent{})
+
+ const EBADF = 0x09
+ if v != EBADF {
+ t.Errorf("epollctl = %v, want %v", v, EBADF)
+ }
+}
diff --git a/src/runtime/internal/wasitest/host_test.go b/src/runtime/internal/wasitest/host_test.go
new file mode 100644
index 0000000..ca4ef8f
--- /dev/null
+++ b/src/runtime/internal/wasitest/host_test.go
@@ -0,0 +1,14 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package wasi_test
+
+import "flag"
+
+var target string
+
+func init() {
+ // The dist test runner passes -target when running this as a host test.
+ flag.StringVar(&target, "target", "", "")
+}
diff --git a/src/runtime/internal/wasitest/nonblock_test.go b/src/runtime/internal/wasitest/nonblock_test.go
new file mode 100644
index 0000000..3072b96
--- /dev/null
+++ b/src/runtime/internal/wasitest/nonblock_test.go
@@ -0,0 +1,101 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Not all systems have syscall.Mkfifo.
+//go:build !aix && !plan9 && !solaris && !wasm && !windows
+
+package wasi_test
+
+import (
+ "bufio"
+ "fmt"
+ "io"
+ "math/rand"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "syscall"
+ "testing"
+)
+
+// This test creates a set of FIFOs and writes to them in reverse order. It
+// checks that the output order matches the write order. The test binary opens
+// the FIFOs in their original order and spawns a goroutine for each that reads
+// from the FIFO and writes the result to stderr. If I/O were blocking, all
+// goroutines would be blocked waiting for one read call to return, and the
+// output order wouldn't match.
+
+type fifo struct {
+ file *os.File
+ path string
+}
+
+func TestNonblock(t *testing.T) {
+ if target != "wasip1/wasm" {
+ t.Skip()
+ }
+
+ switch os.Getenv("GOWASIRUNTIME") {
+ case "wasmer":
+ t.Skip("wasmer does not support non-blocking I/O")
+ }
+
+ for _, mode := range []string{"os.OpenFile", "os.NewFile"} {
+ t.Run(mode, func(t *testing.T) {
+ args := []string{"run", "./testdata/nonblock.go", mode}
+
+ fifos := make([]*fifo, 8)
+ for i := range fifos {
+ path := filepath.Join(t.TempDir(), fmt.Sprintf("wasip1-nonblock-fifo-%d-%d", rand.Uint32(), i))
+ if err := syscall.Mkfifo(path, 0666); err != nil {
+ t.Fatal(err)
+ }
+
+ file, err := os.OpenFile(path, os.O_RDWR, 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer file.Close()
+
+ args = append(args, path)
+ fifos[len(fifos)-i-1] = &fifo{file, path}
+ }
+
+ subProcess := exec.Command("go", args...)
+
+ subProcess.Env = append(os.Environ(), "GOOS=wasip1", "GOARCH=wasm")
+
+ pr, pw := io.Pipe()
+ defer pw.Close()
+
+ subProcess.Stderr = pw
+
+ if err := subProcess.Start(); err != nil {
+ t.Fatal(err)
+ }
+
+ scanner := bufio.NewScanner(pr)
+ if !scanner.Scan() {
+ t.Fatal("expected line:", scanner.Err())
+ } else if scanner.Text() != "waiting" {
+ t.Fatal("unexpected output:", scanner.Text())
+ }
+
+ for _, fifo := range fifos {
+ if _, err := fifo.file.WriteString(fifo.path + "\n"); err != nil {
+ t.Fatal(err)
+ }
+ if !scanner.Scan() {
+ t.Fatal("expected line:", scanner.Err())
+ } else if scanner.Text() != fifo.path {
+ t.Fatal("unexpected line:", scanner.Text())
+ }
+ }
+
+ if err := subProcess.Wait(); err != nil {
+ t.Fatal(err)
+ }
+ })
+ }
+}
diff --git a/src/runtime/internal/wasitest/tcpecho_test.go b/src/runtime/internal/wasitest/tcpecho_test.go
new file mode 100644
index 0000000..1137395
--- /dev/null
+++ b/src/runtime/internal/wasitest/tcpecho_test.go
@@ -0,0 +1,99 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package wasi_test
+
+import (
+ "bytes"
+ "fmt"
+ "math/rand"
+ "net"
+ "os"
+ "os/exec"
+ "testing"
+ "time"
+)
+
+func TestTCPEcho(t *testing.T) {
+ if target != "wasip1/wasm" {
+ t.Skip()
+ }
+
+ // We're unable to use port 0 here (let the OS choose a spare port).
+ // Although the WASM runtime accepts port 0, and the WASM module listens
+ // successfully, there's no way for this test to query the selected port
+ // so that it can connect to the WASM module. The WASM module itself
+ // cannot access any information about the socket due to limitations
+ // with WASI preview 1 networking, and the WASM runtimes do not log the
+ // port when you pre-open a socket. So, we probe for a free port here.
+ // Given there's an unavoidable race condition, the test is disabled by
+ // default.
+ if os.Getenv("GOWASIENABLERACYTEST") != "1" {
+ t.Skip("skipping WASI test with unavoidable race condition")
+ }
+ var host string
+ port := rand.Intn(10000) + 40000
+ for attempts := 0; attempts < 10; attempts++ {
+ host = fmt.Sprintf("127.0.0.1:%d", port)
+ l, err := net.Listen("tcp", host)
+ if err == nil {
+ l.Close()
+ break
+ }
+ port++
+ }
+
+ subProcess := exec.Command("go", "run", "./testdata/tcpecho.go")
+
+ subProcess.Env = append(os.Environ(), "GOOS=wasip1", "GOARCH=wasm")
+
+ switch os.Getenv("GOWASIRUNTIME") {
+ case "wazero":
+ subProcess.Env = append(subProcess.Env, "GOWASIRUNTIMEARGS=--listen="+host)
+ case "wasmtime", "":
+ subProcess.Env = append(subProcess.Env, "GOWASIRUNTIMEARGS=--tcplisten="+host)
+ default:
+ t.Skip("WASI runtime does not support sockets")
+ }
+
+ var b bytes.Buffer
+ subProcess.Stdout = &b
+ subProcess.Stderr = &b
+
+ if err := subProcess.Start(); err != nil {
+ t.Log(b.String())
+ t.Fatal(err)
+ }
+ defer subProcess.Process.Kill()
+
+ var conn net.Conn
+ var err error
+ for {
+ conn, err = net.Dial("tcp", host)
+ if err == nil {
+ break
+ }
+ time.Sleep(500 * time.Millisecond)
+ }
+ if err != nil {
+ t.Log(b.String())
+ t.Fatal(err)
+ }
+ defer conn.Close()
+
+ payload := []byte("foobar")
+ if _, err := conn.Write(payload); err != nil {
+ t.Fatal(err)
+ }
+ var buf [256]byte
+ n, err := conn.Read(buf[:])
+ if err != nil {
+ t.Fatal(err)
+ }
+ if string(buf[:n]) != string(payload) {
+ t.Error("unexpected payload")
+ t.Logf("expect: %d bytes (%v)", len(payload), payload)
+ t.Logf("actual: %d bytes (%v)", n, buf[:n])
+ }
+}
diff --git a/src/runtime/internal/wasitest/testdata/nonblock.go b/src/runtime/internal/wasitest/testdata/nonblock.go
new file mode 100644
index 0000000..8cbf21b
--- /dev/null
+++ b/src/runtime/internal/wasitest/testdata/nonblock.go
@@ -0,0 +1,65 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "os"
+ "sync"
+ "syscall"
+)
+
+func main() {
+ if len(os.Args) < 2 {
+ panic("usage: nonblock <MODE> [PATH...]")
+ }
+ mode := os.Args[1]
+
+ ready := make(chan struct{})
+
+ var wg sync.WaitGroup
+ for _, path := range os.Args[2:] {
+ f, err := os.Open(path)
+ if err != nil {
+ panic(err)
+ }
+ switch mode {
+ case "os.OpenFile":
+ case "os.NewFile":
+ fd := f.Fd()
+ if err := syscall.SetNonblock(int(fd), true); err != nil {
+ panic(err)
+ }
+ f = os.NewFile(fd, path)
+ default:
+ panic("invalid test mode")
+ }
+
+ spawnWait := make(chan struct{})
+
+ wg.Add(1)
+ go func(f *os.File) {
+ defer f.Close()
+ defer wg.Done()
+
+ close(spawnWait)
+
+ <-ready
+
+ var buf [256]byte
+ n, err := f.Read(buf[:])
+ if err != nil {
+ panic(err)
+ }
+ os.Stderr.Write(buf[:n])
+ }(f)
+
+ // Spawn one goroutine at a time.
+ <-spawnWait
+ }
+
+ println("waiting")
+ close(ready)
+ wg.Wait()
+}
diff --git a/src/runtime/internal/wasitest/testdata/tcpecho.go b/src/runtime/internal/wasitest/testdata/tcpecho.go
new file mode 100644
index 0000000..819e352
--- /dev/null
+++ b/src/runtime/internal/wasitest/testdata/tcpecho.go
@@ -0,0 +1,74 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "errors"
+ "net"
+ "os"
+ "syscall"
+)
+
+func main() {
+ if err := run(); err != nil {
+ println(err)
+ os.Exit(1)
+ }
+}
+
+func run() error {
+ l, err := findListener()
+ if err != nil {
+ return err
+ }
+ if l == nil {
+ return errors.New("no pre-opened sockets available")
+ }
+ defer l.Close()
+
+ c, err := l.Accept()
+ if err != nil {
+ return err
+ }
+ return handleConn(c)
+}
+
+func handleConn(c net.Conn) error {
+ defer c.Close()
+
+ var buf [128]byte
+ n, err := c.Read(buf[:])
+ if err != nil {
+ return err
+ }
+ if _, err := c.Write(buf[:n]); err != nil {
+ return err
+ }
+ if err := c.(*net.TCPConn).CloseWrite(); err != nil {
+ return err
+ }
+ return c.Close()
+}
+
+func findListener() (net.Listener, error) {
+ // We start looking for pre-opened sockets at fd=3 because 0, 1, and 2
+ // are reserved for stdio. Pre-opened directories also start at fd=3, so
+ // we skip fds that aren't sockets. Once we reach EBADF we know there
+ // are no more pre-opens.
+ for preopenFd := uintptr(3); ; preopenFd++ {
+ f := os.NewFile(preopenFd, "")
+ l, err := net.FileListener(f)
+ f.Close()
+
+ var se syscall.Errno
+ switch errors.As(err, &se); se {
+ case syscall.ENOTSOCK:
+ continue
+ case syscall.EBADF:
+ err = nil
+ }
+ return l, err
+ }
+}
diff --git a/src/runtime/lfstack.go b/src/runtime/lfstack.go
new file mode 100644
index 0000000..a91ae64
--- /dev/null
+++ b/src/runtime/lfstack.go
@@ -0,0 +1,77 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Lock-free stack.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// lfstack is the head of a lock-free stack.
+//
+// The zero value of lfstack is an empty list.
+//
+// This stack is intrusive. Nodes must embed lfnode as the first field.
+//
+// The stack does not keep GC-visible pointers to nodes, so the caller
+// must ensure the nodes are allocated outside the Go heap.
+type lfstack uint64
+
+func (head *lfstack) push(node *lfnode) {
+ node.pushcnt++
+ new := lfstackPack(node, node.pushcnt)
+ if node1 := lfstackUnpack(new); node1 != node {
+ print("runtime: lfstack.push invalid packing: node=", node, " cnt=", hex(node.pushcnt), " packed=", hex(new), " -> node=", node1, "\n")
+ throw("lfstack.push")
+ }
+ for {
+ old := atomic.Load64((*uint64)(head))
+ node.next = old
+ if atomic.Cas64((*uint64)(head), old, new) {
+ break
+ }
+ }
+}
+
+func (head *lfstack) pop() unsafe.Pointer {
+ for {
+ old := atomic.Load64((*uint64)(head))
+ if old == 0 {
+ return nil
+ }
+ node := lfstackUnpack(old)
+ next := atomic.Load64(&node.next)
+ if atomic.Cas64((*uint64)(head), old, next) {
+ return unsafe.Pointer(node)
+ }
+ }
+}
+
+func (head *lfstack) empty() bool {
+ return atomic.Load64((*uint64)(head)) == 0
+}
+
+// lfnodeValidate panics if node is not a valid address for use with
+// lfstack.push. This only needs to be called when node is allocated.
+func lfnodeValidate(node *lfnode) {
+ if base, _, _ := findObject(uintptr(unsafe.Pointer(node)), 0, 0); base != 0 {
+ throw("lfstack node allocated from the heap")
+ }
+ if lfstackUnpack(lfstackPack(node, ^uintptr(0))) != node {
+ printlock()
+ println("runtime: bad lfnode address", hex(uintptr(unsafe.Pointer(node))))
+ throw("bad lfnode address")
+ }
+}
+
+func lfstackPack(node *lfnode, cnt uintptr) uint64 {
+ return uint64(taggedPointerPack(unsafe.Pointer(node), cnt))
+}
+
+func lfstackUnpack(val uint64) *lfnode {
+ return (*lfnode)(taggedPointer(val).pointer())
+}
diff --git a/src/runtime/lfstack_test.go b/src/runtime/lfstack_test.go
new file mode 100644
index 0000000..e36297e
--- /dev/null
+++ b/src/runtime/lfstack_test.go
@@ -0,0 +1,137 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math/rand"
+ . "runtime"
+ "testing"
+ "unsafe"
+)
+
+type MyNode struct {
+ LFNode
+ data int
+}
+
+// allocMyNode allocates nodes that are stored in an lfstack
+// outside the Go heap.
+// We require lfstack objects to live outside the heap so that
+// checkptr passes on the unsafe shenanigans used.
+func allocMyNode(data int) *MyNode {
+ n := (*MyNode)(PersistentAlloc(unsafe.Sizeof(MyNode{})))
+ LFNodeValidate(&n.LFNode)
+ n.data = data
+ return n
+}
+
+func fromMyNode(node *MyNode) *LFNode {
+ return (*LFNode)(unsafe.Pointer(node))
+}
+
+func toMyNode(node *LFNode) *MyNode {
+ return (*MyNode)(unsafe.Pointer(node))
+}
+
+var global any
+
+func TestLFStack(t *testing.T) {
+ stack := new(uint64)
+ global = stack // force heap allocation
+
+ // Check the stack is initially empty.
+ if LFStackPop(stack) != nil {
+ t.Fatalf("stack is not empty")
+ }
+
+ // Push one element.
+ node := allocMyNode(42)
+ LFStackPush(stack, fromMyNode(node))
+
+ // Push another.
+ node = allocMyNode(43)
+ LFStackPush(stack, fromMyNode(node))
+
+ // Pop one element.
+ node = toMyNode(LFStackPop(stack))
+ if node == nil {
+ t.Fatalf("stack is empty")
+ }
+ if node.data != 43 {
+ t.Fatalf("no lifo")
+ }
+
+ // Pop another.
+ node = toMyNode(LFStackPop(stack))
+ if node == nil {
+ t.Fatalf("stack is empty")
+ }
+ if node.data != 42 {
+ t.Fatalf("no lifo")
+ }
+
+ // Check the stack is empty again.
+ if LFStackPop(stack) != nil {
+ t.Fatalf("stack is not empty")
+ }
+ if *stack != 0 {
+ t.Fatalf("stack is not empty")
+ }
+}
+
+func TestLFStackStress(t *testing.T) {
+ const K = 100
+ P := 4 * GOMAXPROCS(-1)
+ N := 100000
+ if testing.Short() {
+ N /= 10
+ }
+ // Create 2 stacks.
+ stacks := [2]*uint64{new(uint64), new(uint64)}
+ // Push K elements randomly onto the stacks.
+ sum := 0
+ for i := 0; i < K; i++ {
+ sum += i
+ node := allocMyNode(i)
+ LFStackPush(stacks[i%2], fromMyNode(node))
+ }
+ c := make(chan bool, P)
+ for p := 0; p < P; p++ {
+ go func() {
+ r := rand.New(rand.NewSource(rand.Int63()))
+ // Pop a node from a random stack, then push it onto a random stack.
+ for i := 0; i < N; i++ {
+ node := toMyNode(LFStackPop(stacks[r.Intn(2)]))
+ if node != nil {
+ LFStackPush(stacks[r.Intn(2)], fromMyNode(node))
+ }
+ }
+ c <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-c
+ }
+ // Pop all elements from both stacks, and verify that nothing is lost.
+ sum2 := 0
+ cnt := 0
+ for i := 0; i < 2; i++ {
+ for {
+ node := toMyNode(LFStackPop(stacks[i]))
+ if node == nil {
+ break
+ }
+ cnt++
+ sum2 += node.data
+ node.Next = 0
+ }
+ }
+ if cnt != K {
+ t.Fatalf("Wrong number of nodes %d/%d", cnt, K)
+ }
+ if sum2 != sum {
+ t.Fatalf("Wrong sum %d/%d", sum2, sum)
+ }
+}
diff --git a/src/runtime/libfuzzer.go b/src/runtime/libfuzzer.go
new file mode 100644
index 0000000..0ece035
--- /dev/null
+++ b/src/runtime/libfuzzer.go
@@ -0,0 +1,160 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build libfuzzer
+
+package runtime
+
+import "unsafe"
+
+func libfuzzerCallWithTwoByteBuffers(fn, start, end *byte)
+func libfuzzerCallTraceIntCmp(fn *byte, arg0, arg1, fakePC uintptr)
+func libfuzzerCall4(fn *byte, fakePC uintptr, s1, s2 unsafe.Pointer, result uintptr)
+
+// Keep in sync with the definition of ret_sled in src/runtime/libfuzzer_amd64.s
+const retSledSize = 512
+
+// In libFuzzer mode, the compiler inserts calls to libfuzzerTraceCmpN and libfuzzerTraceConstCmpN
+// (where N can be 1, 2, 4, or 8) for encountered integer comparisons in the code to be instrumented.
+// This may result in these functions having callers that are nosplit. That is why they must be nosplit.
+//
+//go:nosplit
+func libfuzzerTraceCmp1(arg0, arg1 uint8, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_cmp1, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceCmp2(arg0, arg1 uint16, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_cmp2, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceCmp4(arg0, arg1 uint32, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_cmp4, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceCmp8(arg0, arg1 uint64, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_cmp8, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceConstCmp1(arg0, arg1 uint8, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_const_cmp1, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceConstCmp2(arg0, arg1 uint16, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_const_cmp2, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceConstCmp4(arg0, arg1 uint32, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_const_cmp4, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
+
+//go:nosplit
+func libfuzzerTraceConstCmp8(arg0, arg1 uint64, fakePC uint) {
+ fakePC = fakePC % retSledSize
+ libfuzzerCallTraceIntCmp(&__sanitizer_cov_trace_const_cmp8, uintptr(arg0), uintptr(arg1), uintptr(fakePC))
+}
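+
+// Illustrative sketch (not exact compiler output): for a comparison such as
+//
+//	if x == 42 { ... }
+//
+// the instrumented code conceptually also issues a call along the lines of
+//
+//	libfuzzerTraceConstCmp8(42, uint64(x), fakePC)
+//
+// where fakePC is a compile-time pseudo-PC identifying the comparison site
+// (reduced modulo retSledSize above); the exact operand order shown here is
+// an assumption for exposition only.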
+
+var pcTables []byte
+
+func init() {
+ libfuzzerCallWithTwoByteBuffers(&__sanitizer_cov_8bit_counters_init, &__start___sancov_cntrs, &__stop___sancov_cntrs)
+ start := unsafe.Pointer(&__start___sancov_cntrs)
+ end := unsafe.Pointer(&__stop___sancov_cntrs)
+
+ // PC tables are arrays of ptr-sized integers representing pairs [PC,PCFlags] for every instrumented block.
+ // The number of PCs and PCFlags is the same as the number of 8-bit counters. Each PC table entry has
+ // the size of two ptr-sized integers. We allocate one more byte than what we actually need so that we can
+ // get a pointer representing the end of the PC table array.
+ size := (uintptr(end)-uintptr(start))*unsafe.Sizeof(uintptr(0))*2 + 1
+ pcTables = make([]byte, size)
+ libfuzzerCallWithTwoByteBuffers(&__sanitizer_cov_pcs_init, &pcTables[0], &pcTables[size-1])
+}
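+
+// As a worked example of the size computation above: with 3 counters on a
+// 64-bit platform, size = 3*8*2 + 1 = 49 bytes, i.e. 3 [PC,PCFlags] pairs of
+// two 8-byte words each, plus one extra byte so that &pcTables[size-1] can
+// serve as the end pointer passed to __sanitizer_cov_pcs_init.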
+
+// We call libFuzzer's __sanitizer_weak_hook_strcmp function which takes the
+// following four arguments:
+//
+// 1. caller_pc: location of string comparison call site
+// 2. s1: first string used in the comparison
+// 3. s2: second string used in the comparison
+// 4. result: an integer representing the comparison result. 0 indicates
+//    equality (the comparison will be ignored by libFuzzer); non-zero indicates
+//    a difference (the comparison will be taken into consideration).
+//
+//go:nosplit
+func libfuzzerHookStrCmp(s1, s2 string, fakePC int) {
+ if s1 != s2 {
+ libfuzzerCall4(&__sanitizer_weak_hook_strcmp, uintptr(fakePC), cstring(s1), cstring(s2), uintptr(1))
+ }
+	// If s1 == s2 we could call the hook with a last argument of 0, but this is
+	// unnecessary since that case would then be ignored by libFuzzer anyway.
+}
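+
+// Illustrative sketch (assumed instrumentation, not verified compiler output):
+// for a string comparison such as
+//
+//	if input == "secret" { ... }
+//
+// the instrumented code conceptually also calls
+//
+//	libfuzzerHookStrCmp(input, "secret", fakePC)
+//
+// so that libFuzzer records the two compared values and can steer mutations
+// towards inputs that make them equal.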
+
+// This function currently has the same implementation as libfuzzerHookStrCmp because the runtime
+// package lacks better primitives for checking case-insensitive string equality.
+//
+//go:nosplit
+func libfuzzerHookEqualFold(s1, s2 string, fakePC int) {
+ if s1 != s2 {
+ libfuzzerCall4(&__sanitizer_weak_hook_strcmp, uintptr(fakePC), cstring(s1), cstring(s2), uintptr(1))
+ }
+}
+
+//go:linkname __sanitizer_cov_trace_cmp1 __sanitizer_cov_trace_cmp1
+//go:cgo_import_static __sanitizer_cov_trace_cmp1
+var __sanitizer_cov_trace_cmp1 byte
+
+//go:linkname __sanitizer_cov_trace_cmp2 __sanitizer_cov_trace_cmp2
+//go:cgo_import_static __sanitizer_cov_trace_cmp2
+var __sanitizer_cov_trace_cmp2 byte
+
+//go:linkname __sanitizer_cov_trace_cmp4 __sanitizer_cov_trace_cmp4
+//go:cgo_import_static __sanitizer_cov_trace_cmp4
+var __sanitizer_cov_trace_cmp4 byte
+
+//go:linkname __sanitizer_cov_trace_cmp8 __sanitizer_cov_trace_cmp8
+//go:cgo_import_static __sanitizer_cov_trace_cmp8
+var __sanitizer_cov_trace_cmp8 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp1 __sanitizer_cov_trace_const_cmp1
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp1
+var __sanitizer_cov_trace_const_cmp1 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp2 __sanitizer_cov_trace_const_cmp2
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp2
+var __sanitizer_cov_trace_const_cmp2 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp4 __sanitizer_cov_trace_const_cmp4
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp4
+var __sanitizer_cov_trace_const_cmp4 byte
+
+//go:linkname __sanitizer_cov_trace_const_cmp8 __sanitizer_cov_trace_const_cmp8
+//go:cgo_import_static __sanitizer_cov_trace_const_cmp8
+var __sanitizer_cov_trace_const_cmp8 byte
+
+//go:linkname __sanitizer_cov_8bit_counters_init __sanitizer_cov_8bit_counters_init
+//go:cgo_import_static __sanitizer_cov_8bit_counters_init
+var __sanitizer_cov_8bit_counters_init byte
+
+// start, stop markers of counters, set by the linker
+var __start___sancov_cntrs, __stop___sancov_cntrs byte
+
+//go:linkname __sanitizer_cov_pcs_init __sanitizer_cov_pcs_init
+//go:cgo_import_static __sanitizer_cov_pcs_init
+var __sanitizer_cov_pcs_init byte
+
+//go:linkname __sanitizer_weak_hook_strcmp __sanitizer_weak_hook_strcmp
+//go:cgo_import_static __sanitizer_weak_hook_strcmp
+var __sanitizer_weak_hook_strcmp byte
diff --git a/src/runtime/libfuzzer_amd64.s b/src/runtime/libfuzzer_amd64.s
new file mode 100644
index 0000000..e30b768
--- /dev/null
+++ b/src/runtime/libfuzzer_amd64.s
@@ -0,0 +1,158 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build libfuzzer
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// Based on race_amd64.s; see commentary there.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// void runtime·libfuzzerCall4(fn, hookId int, s1, s2 unsafe.Pointer, result uintptr)
+// Calls C function fn from libFuzzer and passes 4 arguments to it.
+TEXT runtime·libfuzzerCall4(SB), NOSPLIT, $0-40
+ MOVQ fn+0(FP), AX
+ MOVQ hookId+8(FP), RARG0
+ MOVQ s1+16(FP), RARG1
+ MOVQ s2+24(FP), RARG2
+ MOVQ result+32(FP), RARG3
+
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
+
+// void runtime·libfuzzerCallTraceIntCmp(fn, arg0, arg1, fakePC uintptr)
+// Calls C function fn from libFuzzer and passes 2 arguments to it after
+// manipulating the return address so that libfuzzer's integer compare hooks
+// work.
+// libFuzzer's compare hooks obtain the caller's address from the compiler
+// builtin __builtin_return_address. Since we invoke the hooks always
+// from the same native function, this builtin would always return the same
+// value. Internally, the libFuzzer hooks call through to the always inlined
+// HandleCmp and thus can't be mimicked without patching libFuzzer.
+//
+// We solve this problem via an inline assembly trampoline construction that
+// translates a runtime argument `fake_pc` in the range [0, 512) into a call to
+// a hook with a fake return address whose lower 9 bits are `fake_pc` up to a
+// constant shift. This is achieved by pushing a return address pointing into
+// 512 ret instructions at offset `fake_pc` onto the stack and then jumping
+// directly to the address of the hook.
+//
+// Note: We only set the lowest 9 bits of the return address since only these
+// bits are used by the libFuzzer value profiling mode for integer compares, see
+// https://github.com/llvm/llvm-project/blob/704d92607d26e696daba596b72cb70effe79a872/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp#L390
+// as well as
+// https://github.com/llvm/llvm-project/blob/704d92607d26e696daba596b72cb70effe79a872/compiler-rt/lib/fuzzer/FuzzerValueBitMap.h#L34
+// ValueProfileMap.AddValue() truncates its argument to 16 bits and shifts the
+// PC to the left by log_2(128)=7, which means that only the lowest 16 - 7 bits
+// of the return address matter. String compare hooks use the lowest 12 bits,
+// but take the return address as an argument and thus don't require the
+// indirection through a trampoline.
+// TODO: Remove the inline assembly trampoline once a PC argument has been added to libfuzzer's int compare hooks.
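+//
+// Rough sketch of the resulting control flow (exposition only):
+//
+//	PUSH $end_of_function   // where the sled's RET eventually lands
+//	PUSH $ret_sled+fakePC   // fake return address observed by the hook
+//	JMP  fn                 // the hook "returns" into the sled, which RETs to end_of_function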
+TEXT runtime·libfuzzerCallTraceIntCmp(SB), NOSPLIT, $0-32
+ MOVQ fn+0(FP), AX
+ MOVQ arg0+8(FP), RARG0
+ MOVQ arg1+16(FP), RARG1
+ MOVQ fakePC+24(FP), R8
+
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ SUBQ $8, SP
+ // Load the address of the end of the function and push it into the stack.
+ // This address will be jumped to after executing the return instruction
+ // from the return sled. There we reset the stack pointer and return.
+ MOVQ $end_of_function<>(SB), BX
+ PUSHQ BX
+ // Load the starting address of the return sled into BX.
+ MOVQ $ret_sled<>(SB), BX
+ // Load the address of the i'th return instruction from the return sled.
+ // The index is given in the fakePC argument.
+ ADDQ R8, BX
+ PUSHQ BX
+ // Call the original function with the fakePC return address on the stack.
+ // Function arguments arg0 and arg1 are passed in the registers specified
+ // by the x64 calling convention.
+ JMP AX
+// This code is never executed; it exists only to satisfy the assembler's
+// check for a balanced stack.
+not_reachable:
+ POPQ BX
+ POPQ BX
+ RET
+
+TEXT end_of_function<>(SB), NOSPLIT, $0-0
+ MOVQ R12, SP
+ RET
+
+#define REPEAT_8(a) a \
+ a \
+ a \
+ a \
+ a \
+ a \
+ a \
+ a
+
+#define REPEAT_512(a) REPEAT_8(REPEAT_8(REPEAT_8(a)))
+
+TEXT ret_sled<>(SB), NOSPLIT, $0-0
+ REPEAT_512(RET)
+
+// void runtime·libfuzzerCallWithTwoByteBuffers(fn, start, end *byte)
+// Calls C function fn from libFuzzer and passes 2 arguments of type *byte to it.
+TEXT runtime·libfuzzerCallWithTwoByteBuffers(SB), NOSPLIT, $0-24
+ MOVQ fn+0(FP), AX
+ MOVQ start+8(FP), RARG0
+ MOVQ end+16(FP), RARG1
+
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
diff --git a/src/runtime/libfuzzer_arm64.s b/src/runtime/libfuzzer_arm64.s
new file mode 100644
index 0000000..37b3517
--- /dev/null
+++ b/src/runtime/libfuzzer_arm64.s
@@ -0,0 +1,115 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build libfuzzer
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// Based on race_arm64.s; see commentary there.
+
+#define RARG0 R0
+#define RARG1 R1
+#define RARG2 R2
+#define RARG3 R3
+
+#define REPEAT_2(a) a a
+#define REPEAT_8(a) REPEAT_2(REPEAT_2(REPEAT_2(a)))
+#define REPEAT_128(a) REPEAT_2(REPEAT_8(REPEAT_8(a)))
+
+// void runtime·libfuzzerCallTraceIntCmp(fn, arg0, arg1, fakePC uintptr)
+// Calls C function fn from libFuzzer and passes 2 arguments to it after
+// manipulating the return address so that libfuzzer's integer compare hooks
+// work.
+// The problem statement and solution are documented in detail in libfuzzer_amd64.s.
+// See commentary there.
+TEXT runtime·libfuzzerCallTraceIntCmp(SB), NOSPLIT, $8-32
+ MOVD fn+0(FP), R9
+ MOVD arg0+8(FP), RARG0
+ MOVD arg1+16(FP), RARG1
+ MOVD fakePC+24(FP), R8
+ // Save the original return address in a local variable
+ MOVD R30, savedRetAddr-8(SP)
+
+ MOVD g_m(g), R10
+
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ // Load address of the ret sled into the default register for the return
+ // address.
+ ADR ret_sled, R30
+ // Clear the lowest 2 bits of fakePC. All ARM64 instructions are four
+ // bytes long, so we cannot get better return address granularity than
+ // multiples of 4.
+ AND $-4, R8, R8
+ // Add the offset of the fake_pc-th ret.
+ ADD R8, R30, R30
+ // Call the function by jumping to it and reusing all registers except
+ // for the modified return address register R30.
+ JMP (R9)
+
+// The ret sled for ARM64 consists of 128 br instructions jumping to the
+// end of the function. Each instruction is 4 bytes long. The sled thus
+// has the same byte length of 4 * 128 = 512 as the x86_64 sled, but
+// coarser granularity.
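+// For example, the AND $-4 above rounds a fakePC of 37 down to 36, which
+// selects the branch at index 9 of the sled; fake PCs are thus only
+// distinguishable in multiples of 4.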
+#define RET_SLED \
+ JMP end_of_function;
+
+ret_sled:
+ REPEAT_128(RET_SLED);
+
+end_of_function:
+ MOVD R19, RSP
+ MOVD savedRetAddr-8(SP), R30
+ RET
+
+// void runtime·libfuzzerCall4(fn, hookId int, s1, s2 unsafe.Pointer, result uintptr)
+// Calls C function fn from libFuzzer and passes 4 arguments to it.
+TEXT runtime·libfuzzerCall4(SB), NOSPLIT, $0-40
+ MOVD fn+0(FP), R9
+ MOVD hookId+8(FP), RARG0
+ MOVD s1+16(FP), RARG1
+ MOVD s2+24(FP), RARG2
+ MOVD result+32(FP), RARG3
+
+ MOVD g_m(g), R10
+
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ BL R9
+ MOVD R19, RSP
+ RET
+
+// void runtime·libfuzzerCallWithTwoByteBuffers(fn, start, end *byte)
+// Calls C function fn from libFuzzer and passes 2 arguments of type *byte to it.
+TEXT runtime·libfuzzerCallWithTwoByteBuffers(SB), NOSPLIT, $0-24
+ MOVD fn+0(FP), R9
+ MOVD start+8(FP), R0
+ MOVD end+16(FP), R1
+
+ MOVD g_m(g), R10
+
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ BL R9
+ MOVD R19, RSP
+ RET
diff --git a/src/runtime/lock_futex.go b/src/runtime/lock_futex.go
new file mode 100644
index 0000000..cc7d465
--- /dev/null
+++ b/src/runtime/lock_futex.go
@@ -0,0 +1,246 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || linux
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This implementation depends on OS-specific implementations of
+//
+// futexsleep(addr *uint32, val uint32, ns int64)
+// Atomically,
+// if *addr == val { sleep }
+// Might be woken up spuriously; that's allowed.
+// Don't sleep longer than ns; ns < 0 means forever.
+//
+// futexwakeup(addr *uint32, cnt uint32)
+// If any procs are sleeping on addr, wake up at most cnt.
+
+const (
+ mutex_unlocked = 0
+ mutex_locked = 1
+ mutex_sleeping = 2
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
+
+// Possible lock states are mutex_unlocked, mutex_locked and mutex_sleeping.
+// mutex_sleeping means that there is presumably at least one sleeping thread.
+// Note that there can be spinning threads during all states; they do not
+// affect the mutex's state.
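+//
+// Illustrative transitions (sketch, not exhaustive):
+//
+//	lock:   mutex_unlocked -> mutex_locked    (fast path Xchg)
+//	lock:   mutex_locked   -> mutex_sleeping  (slow path, then futexsleep)
+//	unlock: mutex_locked   -> mutex_unlocked  (no sleeper to wake)
+//	unlock: mutex_sleeping -> mutex_unlocked  (futexwakeup of one sleeper)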
+
+// We use the uintptr mutex.key and note.key as a uint32.
+//
+//go:nosplit
+func key32(p *uintptr) *uint32 {
+ return (*uint32)(unsafe.Pointer(p))
+}
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ gp := getg()
+
+ if gp.m.locks < 0 {
+ throw("runtime·lock: lock count")
+ }
+ gp.m.locks++
+
+ // Speculative grab for lock.
+ v := atomic.Xchg(key32(&l.key), mutex_locked)
+ if v == mutex_unlocked {
+ return
+ }
+
+ // wait is either MUTEX_LOCKED or MUTEX_SLEEPING
+ // depending on whether there is a thread sleeping
+ // on this mutex. If we ever change l->key from
+ // MUTEX_SLEEPING to some other value, we must be
+ // careful to change it back to MUTEX_SLEEPING before
+ // returning, to ensure that the sleeping thread gets
+ // its wakeup call.
+ wait := v
+
+ // On uniprocessors, no point spinning.
+ // On multiprocessors, spin for ACTIVE_SPIN attempts.
+ spin := 0
+ if ncpu > 1 {
+ spin = active_spin
+ }
+ for {
+ // Try for lock, spinning.
+ for i := 0; i < spin; i++ {
+ for l.key == mutex_unlocked {
+ if atomic.Cas(key32(&l.key), mutex_unlocked, wait) {
+ return
+ }
+ }
+ procyield(active_spin_cnt)
+ }
+
+ // Try for lock, rescheduling.
+ for i := 0; i < passive_spin; i++ {
+ for l.key == mutex_unlocked {
+ if atomic.Cas(key32(&l.key), mutex_unlocked, wait) {
+ return
+ }
+ }
+ osyield()
+ }
+
+ // Sleep.
+ v = atomic.Xchg(key32(&l.key), mutex_sleeping)
+ if v == mutex_unlocked {
+ return
+ }
+ wait = mutex_sleeping
+ futexsleep(key32(&l.key), mutex_sleeping, -1)
+ }
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+func unlock2(l *mutex) {
+ v := atomic.Xchg(key32(&l.key), mutex_unlocked)
+ if v == mutex_unlocked {
+ throw("unlock of unlocked lock")
+ }
+ if v == mutex_sleeping {
+ futexwakeup(key32(&l.key), 1)
+ }
+
+ gp := getg()
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("runtime·unlock: lock count")
+ }
+ if gp.m.locks == 0 && gp.preempt { // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ }
+}
+
+// One-time notifications.
+func noteclear(n *note) {
+ n.key = 0
+}
+
+func notewakeup(n *note) {
+ old := atomic.Xchg(key32(&n.key), 1)
+ if old != 0 {
+ print("notewakeup - double wakeup (", old, ")\n")
+ throw("notewakeup - double wakeup")
+ }
+ futexwakeup(key32(&n.key), 1)
+}
+
+func notesleep(n *note) {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notesleep not on g0")
+ }
+ ns := int64(-1)
+ if *cgo_yield != nil {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ ns = 10e6
+ }
+ for atomic.Load(key32(&n.key)) == 0 {
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ }
+}
+
+// May run with m.p==nil if called from notetsleep, so write barriers
+// are not allowed.
+//
+//go:nosplit
+//go:nowritebarrier
+func notetsleep_internal(n *note, ns int64) bool {
+ gp := getg()
+
+ if ns < 0 {
+ if *cgo_yield != nil {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ ns = 10e6
+ }
+ for atomic.Load(key32(&n.key)) == 0 {
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ }
+ return true
+ }
+
+ if atomic.Load(key32(&n.key)) != 0 {
+ return true
+ }
+
+ deadline := nanotime() + ns
+ for {
+ if *cgo_yield != nil && ns > 10e6 {
+ ns = 10e6
+ }
+ gp.m.blocked = true
+ futexsleep(key32(&n.key), 0, ns)
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ if atomic.Load(key32(&n.key)) != 0 {
+ break
+ }
+ now := nanotime()
+ if now >= deadline {
+ break
+ }
+ ns = deadline - now
+ }
+ return atomic.Load(key32(&n.key)) != 0
+}
+
+func notetsleep(n *note, ns int64) bool {
+ gp := getg()
+ if gp != gp.m.g0 && gp.m.preemptoff != "" {
+ throw("notetsleep not on g0")
+ }
+
+ return notetsleep_internal(n, ns)
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+// calls only nosplit functions between entersyscallblock/exitsyscall.
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+
+ entersyscallblock()
+ ok := notetsleep_internal(n, ns)
+ exitsyscall()
+ return ok
+}
+
+func beforeIdle(int64, int64) (*g, bool) {
+ return nil, false
+}
+
+func checkTimeouts() {}
diff --git a/src/runtime/lock_js.go b/src/runtime/lock_js.go
new file mode 100644
index 0000000..91ad7be
--- /dev/null
+++ b/src/runtime/lock_js.go
@@ -0,0 +1,309 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build js && wasm
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+// js/wasm has no support for threads yet. There is no preemption.
+
+const (
+ mutex_unlocked = 0
+ mutex_locked = 1
+
+ note_cleared = 0
+ note_woken = 1
+ note_timeout = 2
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ if l.key == mutex_locked {
+ // js/wasm is single-threaded so we should never
+ // observe this.
+ throw("self deadlock")
+ }
+ gp := getg()
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ gp.m.locks++
+ l.key = mutex_locked
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+func unlock2(l *mutex) {
+ if l.key == mutex_unlocked {
+ throw("unlock of unlocked lock")
+ }
+ gp := getg()
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ l.key = mutex_unlocked
+}
+
+// One-time notifications.
+
+type noteWithTimeout struct {
+ gp *g
+ deadline int64
+}
+
+var (
+ notes = make(map[*note]*g)
+ notesWithTimeout = make(map[*note]noteWithTimeout)
+)
+
+func noteclear(n *note) {
+ n.key = note_cleared
+}
+
+func notewakeup(n *note) {
+ // gp := getg()
+ if n.key == note_woken {
+ throw("notewakeup - double wakeup")
+ }
+ cleared := n.key == note_cleared
+ n.key = note_woken
+ if cleared {
+ goready(notes[n], 1)
+ }
+}
+
+func notesleep(n *note) {
+ throw("notesleep not supported by js")
+}
+
+func notetsleep(n *note, ns int64) bool {
+ throw("notetsleep not supported by js")
+ return false
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+
+ if ns >= 0 {
+ deadline := nanotime() + ns
+ delay := ns/1000000 + 1 // round up
+ if delay > 1<<31-1 {
+ delay = 1<<31 - 1 // cap to max int32
+ }
+
+ id := scheduleTimeoutEvent(delay)
+ mp := acquirem()
+ notes[n] = gp
+ notesWithTimeout[n] = noteWithTimeout{gp: gp, deadline: deadline}
+ releasem(mp)
+
+ gopark(nil, nil, waitReasonSleep, traceBlockSleep, 1)
+
+ clearTimeoutEvent(id) // note might have woken early, clear timeout
+
+ mp = acquirem()
+ delete(notes, n)
+ delete(notesWithTimeout, n)
+ releasem(mp)
+
+ return n.key == note_woken
+ }
+
+ for n.key != note_woken {
+ mp := acquirem()
+ notes[n] = gp
+ releasem(mp)
+
+ gopark(nil, nil, waitReasonZero, traceBlockGeneric, 1)
+
+ mp = acquirem()
+ delete(notes, n)
+ releasem(mp)
+ }
+ return true
+}
+
+// checkTimeouts resumes goroutines that are waiting on a note which has reached its deadline.
+// TODO(drchase): need to understand if write barriers are really okay in this context.
+//
+//go:yeswritebarrierrec
+func checkTimeouts() {
+ now := nanotime()
+ // TODO: map iteration has the write barriers in it; is that okay?
+ for n, nt := range notesWithTimeout {
+ if n.key == note_cleared && now >= nt.deadline {
+ n.key = note_timeout
+ goready(nt.gp, 1)
+ }
+ }
+}
+
+// events is a stack of calls from JavaScript into Go.
+var events []*event
+
+type event struct {
+ // g was the active goroutine when the call from JavaScript occurred.
+ // It needs to be active when returning to JavaScript.
+ gp *g
+ // returned reports whether the event handler has returned.
+ // When all goroutines are idle and the event handler has returned,
+ // then g gets resumed and returns the execution to JavaScript.
+ returned bool
+}
+
+type timeoutEvent struct {
+ id int32
+ // The time when this timeout will be triggered.
+ time int64
+}
+
+// diff calculates the difference between the event's trigger time and x.
+func (e *timeoutEvent) diff(x int64) int64 {
+ if e == nil {
+ return 0
+ }
+
+ diff := x - idleTimeout.time
+ if diff < 0 {
+ diff = -diff
+ }
+ return diff
+}
+
+// clear cancels this timeout event.
+func (e *timeoutEvent) clear() {
+ if e == nil {
+ return
+ }
+
+ clearTimeoutEvent(e.id)
+}
+
+// The timeout event started by beforeIdle.
+var idleTimeout *timeoutEvent
+
+// beforeIdle gets called by the scheduler if no goroutine is awake.
+// If we are not already handling an event, then we pause for an async event.
+// If an event handler returned, we resume it and it will pause the execution.
+// beforeIdle either returns the specific goroutine to schedule next or
+// indicates with otherReady that some goroutine became ready.
+// TODO(drchase): need to understand if write barriers are really okay in this context.
+//
+//go:yeswritebarrierrec
+func beforeIdle(now, pollUntil int64) (gp *g, otherReady bool) {
+ delay := int64(-1)
+ if pollUntil != 0 {
+ // round up to prevent setTimeout being called early
+ delay = (pollUntil-now-1)/1e6 + 1
+ if delay > 1e9 {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ delay = 1e9
+ }
+ }
+
+ if delay > 0 && (idleTimeout == nil || idleTimeout.diff(pollUntil) > 1e6) {
+ // If the difference is larger than 1 ms, we should reschedule the timeout.
+ idleTimeout.clear()
+
+ idleTimeout = &timeoutEvent{
+ id: scheduleTimeoutEvent(delay),
+ time: pollUntil,
+ }
+ }
+
+ if len(events) == 0 {
+ // TODO: this is the line that requires the yeswritebarrierrec
+ go handleAsyncEvent()
+ return nil, true
+ }
+
+ e := events[len(events)-1]
+ if e.returned {
+ return e.gp, false
+ }
+ return nil, false
+}
+
+var idleStart int64
+
+func handleAsyncEvent() {
+ idleStart = nanotime()
+ pause(getcallersp() - 16)
+}
+
+// clearIdleTimeout clears our record of the timeout started by beforeIdle.
+func clearIdleTimeout() {
+ idleTimeout.clear()
+ idleTimeout = nil
+}
+
+// pause sets SP to newsp and pauses the execution of Go's WebAssembly code until an event is triggered.
+func pause(newsp uintptr)
+
+// scheduleTimeoutEvent tells the WebAssembly environment to trigger an event after ms milliseconds.
+// It returns a timer id that can be used with clearTimeoutEvent.
+//
+//go:wasmimport gojs runtime.scheduleTimeoutEvent
+func scheduleTimeoutEvent(ms int64) int32
+
+// clearTimeoutEvent clears a timeout event scheduled by scheduleTimeoutEvent.
+//
+//go:wasmimport gojs runtime.clearTimeoutEvent
+func clearTimeoutEvent(id int32)
+
+// handleEvent gets invoked on a call from JavaScript into Go. It calls the event handler of the syscall/js package
+// and then parks the handler goroutine to allow other goroutines to run before giving execution back to JavaScript.
+// When no other goroutine is awake any more, beforeIdle resumes the handler goroutine. Now that the same goroutine
+// is running as was running when the call came in from JavaScript, execution can be safely passed back to JavaScript.
+func handleEvent() {
+ sched.idleTime.Add(nanotime() - idleStart)
+
+ e := &event{
+ gp: getg(),
+ returned: false,
+ }
+ events = append(events, e)
+
+ if !eventHandler() {
+ // If we did not handle a window event, the idle timeout was triggered, so we can clear it.
+ clearIdleTimeout()
+ }
+
+ // wait until all goroutines are idle
+ e.returned = true
+ gopark(nil, nil, waitReasonZero, traceBlockGeneric, 1)
+
+ events[len(events)-1] = nil
+ events = events[:len(events)-1]
+
+ // return execution to JavaScript
+ idleStart = nanotime()
+ pause(getcallersp() - 16)
+}
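+
+// Sketch of the resulting round trip (exposition only):
+//
+//	JavaScript event -> handleEvent -> eventHandler() runs the Go callback
+//	                 -> gopark until all goroutines are idle
+//	                 -> beforeIdle resumes this goroutine
+//	                 -> pause() hands execution back to JavaScript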
+
+// eventHandler retrieves and executes handlers for pending JavaScript events.
+// It returns true if an event was handled.
+var eventHandler func() bool
+
+//go:linkname setEventHandler syscall/js.setEventHandler
+func setEventHandler(fn func() bool) {
+ eventHandler = fn
+}
diff --git a/src/runtime/lock_sema.go b/src/runtime/lock_sema.go
new file mode 100644
index 0000000..e15bbf7
--- /dev/null
+++ b/src/runtime/lock_sema.go
@@ -0,0 +1,304 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || darwin || netbsd || openbsd || plan9 || solaris || windows
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This implementation depends on OS-specific implementations of
+//
+// func semacreate(mp *m)
+// Create a semaphore for mp, if it does not already have one.
+//
+// func semasleep(ns int64) int32
+// If ns < 0, acquire m's semaphore and return 0.
+// If ns >= 0, try to acquire m's semaphore for at most ns nanoseconds.
+// Return 0 if the semaphore was acquired, -1 if interrupted or timed out.
+//
+// func semawakeup(mp *m)
+// Wake up mp, which is or will soon be sleeping on its semaphore.
+const (
+ locked uintptr = 1
+
+ active_spin = 4
+ active_spin_cnt = 30
+ passive_spin = 1
+)
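+
+// The semaphore-based mutex keeps all of its state in the single word l.key
+// (a summary of the code below, for exposition): 0 means unlocked with no
+// waiters; otherwise the low bit is the locked flag, and the remaining bits,
+// if nonzero, are the address of the M at the head of the wait list, chained
+// through m.nextwaitm.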
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ gp := getg()
+ if gp.m.locks < 0 {
+ throw("runtime·lock: lock count")
+ }
+ gp.m.locks++
+
+ // Speculative grab for lock.
+ if atomic.Casuintptr(&l.key, 0, locked) {
+ return
+ }
+ semacreate(gp.m)
+
+	// On uniprocessors, no point spinning.
+ // On multiprocessors, spin for ACTIVE_SPIN attempts.
+ spin := 0
+ if ncpu > 1 {
+ spin = active_spin
+ }
+Loop:
+ for i := 0; ; i++ {
+ v := atomic.Loaduintptr(&l.key)
+ if v&locked == 0 {
+ // Unlocked. Try to lock.
+ if atomic.Casuintptr(&l.key, v, v|locked) {
+ return
+ }
+ i = 0
+ }
+ if i < spin {
+ procyield(active_spin_cnt)
+ } else if i < spin+passive_spin {
+ osyield()
+ } else {
+ // Someone else has it.
+ // l->waitm points to a linked list of M's waiting
+ // for this lock, chained through m->nextwaitm.
+ // Queue this M.
+ for {
+ gp.m.nextwaitm = muintptr(v &^ locked)
+ if atomic.Casuintptr(&l.key, v, uintptr(unsafe.Pointer(gp.m))|locked) {
+ break
+ }
+ v = atomic.Loaduintptr(&l.key)
+ if v&locked == 0 {
+ continue Loop
+ }
+ }
+ if v&locked != 0 {
+ // Queued. Wait.
+ semasleep(-1)
+ i = 0
+ }
+ }
+ }
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+// We might not be holding a p in this code.
+//
+//go:nowritebarrier
+func unlock2(l *mutex) {
+ gp := getg()
+ var mp *m
+ for {
+ v := atomic.Loaduintptr(&l.key)
+ if v == locked {
+ if atomic.Casuintptr(&l.key, locked, 0) {
+ break
+ }
+ } else {
+ // Other M's are waiting for the lock.
+ // Dequeue an M.
+ mp = muintptr(v &^ locked).ptr()
+ if atomic.Casuintptr(&l.key, v, uintptr(mp.nextwaitm)) {
+ // Dequeued an M. Wake it.
+ semawakeup(mp)
+ break
+ }
+ }
+ }
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("runtime·unlock: lock count")
+ }
+ if gp.m.locks == 0 && gp.preempt { // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ }
+}
+
+// One-time notifications.
+func noteclear(n *note) {
+ if GOOS == "aix" {
+ // On AIX, semaphores might not synchronize the memory in some
+ // rare cases. See issue #30189.
+ atomic.Storeuintptr(&n.key, 0)
+ } else {
+ n.key = 0
+ }
+}
+
+func notewakeup(n *note) {
+ var v uintptr
+ for {
+ v = atomic.Loaduintptr(&n.key)
+ if atomic.Casuintptr(&n.key, v, locked) {
+ break
+ }
+ }
+
+ // Successfully set waitm to locked.
+ // What was it before?
+ switch {
+ case v == 0:
+ // Nothing was waiting. Done.
+ case v == locked:
+ // Two notewakeups! Not allowed.
+ throw("notewakeup - double wakeup")
+ default:
+ // Must be the waiting m. Wake it up.
+ semawakeup((*m)(unsafe.Pointer(v)))
+ }
+}
+
+func notesleep(n *note) {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notesleep not on g0")
+ }
+ semacreate(gp.m)
+ if !atomic.Casuintptr(&n.key, 0, uintptr(unsafe.Pointer(gp.m))) {
+ // Must be locked (got wakeup).
+ if n.key != locked {
+ throw("notesleep - waitm out of sync")
+ }
+ return
+ }
+ // Queued. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield == nil {
+ semasleep(-1)
+ } else {
+ // Sleep for an arbitrary-but-moderate interval to poll libc interceptors.
+ const ns = 10e6
+ for atomic.Loaduintptr(&n.key) == 0 {
+ semasleep(ns)
+ asmcgocall(*cgo_yield, nil)
+ }
+ }
+ gp.m.blocked = false
+}
+
+//go:nosplit
+func notetsleep_internal(n *note, ns int64, gp *g, deadline int64) bool {
+ // gp and deadline are logically local variables, but they are written
+ // as parameters so that the stack space they require is charged
+ // to the caller.
+ // This reduces the nosplit footprint of notetsleep_internal.
+ gp = getg()
+
+ // Register for wakeup on n->waitm.
+ if !atomic.Casuintptr(&n.key, 0, uintptr(unsafe.Pointer(gp.m))) {
+ // Must be locked (got wakeup).
+ if n.key != locked {
+ throw("notetsleep - waitm out of sync")
+ }
+ return true
+ }
+ if ns < 0 {
+ // Queued. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield == nil {
+ semasleep(-1)
+ } else {
+ // Sleep in arbitrary-but-moderate intervals to poll libc interceptors.
+ const ns = 10e6
+ for semasleep(ns) < 0 {
+ asmcgocall(*cgo_yield, nil)
+ }
+ }
+ gp.m.blocked = false
+ return true
+ }
+
+ deadline = nanotime() + ns
+ for {
+ // Registered. Sleep.
+ gp.m.blocked = true
+ if *cgo_yield != nil && ns > 10e6 {
+ ns = 10e6
+ }
+ if semasleep(ns) >= 0 {
+ gp.m.blocked = false
+ // Acquired semaphore, semawakeup unregistered us.
+ // Done.
+ return true
+ }
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ gp.m.blocked = false
+ // Interrupted or timed out. Still registered. Semaphore not acquired.
+ ns = deadline - nanotime()
+ if ns <= 0 {
+ break
+ }
+ // Deadline hasn't arrived. Keep sleeping.
+ }
+
+ // Deadline arrived. Still registered. Semaphore not acquired.
+ // Want to give up and return, but have to unregister first,
+ // so that any notewakeup racing with the return does not
+ // try to grant us the semaphore when we don't expect it.
+ for {
+ v := atomic.Loaduintptr(&n.key)
+ switch v {
+ case uintptr(unsafe.Pointer(gp.m)):
+ // No wakeup yet; unregister if possible.
+ if atomic.Casuintptr(&n.key, v, 0) {
+ return false
+ }
+ case locked:
+ // Wakeup happened so semaphore is available.
+ // Grab it to avoid getting out of sync.
+ gp.m.blocked = true
+ if semasleep(-1) < 0 {
+ throw("runtime: unable to acquire - semaphore out of sync")
+ }
+ gp.m.blocked = false
+ return true
+ default:
+ throw("runtime: unexpected waitm - semaphore out of sync")
+ }
+ }
+}
+
+func notetsleep(n *note, ns int64) bool {
+ gp := getg()
+ if gp != gp.m.g0 {
+ throw("notetsleep not on g0")
+ }
+ semacreate(gp.m)
+ return notetsleep_internal(n, ns, nil, 0)
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+// calls only nosplit functions between entersyscallblock/exitsyscall.
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+ semacreate(gp.m)
+ entersyscallblock()
+ ok := notetsleep_internal(n, ns, nil, 0)
+ exitsyscall()
+ return ok
+}
+
+func beforeIdle(int64, int64) (*g, bool) {
+ return nil, false
+}
+
+func checkTimeouts() {}
diff --git a/src/runtime/lock_wasip1.go b/src/runtime/lock_wasip1.go
new file mode 100644
index 0000000..c4fc59f
--- /dev/null
+++ b/src/runtime/lock_wasip1.go
@@ -0,0 +1,107 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build wasip1
+
+package runtime
+
+// wasm has no support for threads yet. There is no preemption.
+// See proposal: https://github.com/WebAssembly/threads
+// Waiting for a mutex or timeout is implemented as a busy loop
+// while allowing other goroutines to run.
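+//
+// The waiting pattern used by notetsleepg below is roughly (sketch only):
+//
+//	for !woken && !timedOut {
+//		sched_yield() // let the WASI host make progress
+//		Gosched()     // let other goroutines run
+//	}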
+
+const (
+ mutex_unlocked = 0
+ mutex_locked = 1
+
+ active_spin = 4
+ active_spin_cnt = 30
+)
+
+func lock(l *mutex) {
+ lockWithRank(l, getLockRank(l))
+}
+
+func lock2(l *mutex) {
+ if l.key == mutex_locked {
+ // wasm is single-threaded so we should never
+ // observe this.
+ throw("self deadlock")
+ }
+ gp := getg()
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ gp.m.locks++
+ l.key = mutex_locked
+}
+
+func unlock(l *mutex) {
+ unlockWithRank(l)
+}
+
+func unlock2(l *mutex) {
+ if l.key == mutex_unlocked {
+ throw("unlock of unlocked lock")
+ }
+ gp := getg()
+ gp.m.locks--
+ if gp.m.locks < 0 {
+ throw("lock count")
+ }
+ l.key = mutex_unlocked
+}
+
+// One-time notifications.
+func noteclear(n *note) {
+ n.key = 0
+}
+
+func notewakeup(n *note) {
+ if n.key != 0 {
+ print("notewakeup - double wakeup (", n.key, ")\n")
+ throw("notewakeup - double wakeup")
+ }
+ n.key = 1
+}
+
+func notesleep(n *note) {
+ throw("notesleep not supported by wasi")
+}
+
+func notetsleep(n *note, ns int64) bool {
+ throw("notetsleep not supported by wasi")
+ return false
+}
+
+// same as runtime·notetsleep, but called on user g (not g0)
+func notetsleepg(n *note, ns int64) bool {
+ gp := getg()
+ if gp == gp.m.g0 {
+ throw("notetsleepg on g0")
+ }
+
+ deadline := nanotime() + ns
+ for {
+ if n.key != 0 {
+ return true
+ }
+ if sched_yield() != 0 {
+ throw("sched_yield failed")
+ }
+ Gosched()
+ if ns >= 0 && nanotime() >= deadline {
+ return false
+ }
+ }
+}
+
+func beforeIdle(int64, int64) (*g, bool) {
+ return nil, false
+}
+
+func checkTimeouts() {}
+
+//go:wasmimport wasi_snapshot_preview1 sched_yield
+func sched_yield() errno
diff --git a/src/runtime/lockrank.go b/src/runtime/lockrank.go
new file mode 100644
index 0000000..fb99536
--- /dev/null
+++ b/src/runtime/lockrank.go
@@ -0,0 +1,209 @@
+// Code generated by mklockrank.go; DO NOT EDIT.
+
+package runtime
+
+type lockRank int
+
+// Constants representing the ranks of all non-leaf runtime locks, in rank order.
+// Locks with lower rank must be taken before locks with higher rank,
+// in addition to satisfying the partial order in lockPartialOrder.
+// A few ranks allow self-cycles, which are specified in lockPartialOrder.
+const (
+ lockRankUnknown lockRank = iota
+
+ lockRankSysmon
+ lockRankScavenge
+ lockRankForcegc
+ lockRankDefer
+ lockRankSweepWaiters
+ lockRankAssistQueue
+ lockRankSweep
+ lockRankTestR
+ lockRankTestW
+ lockRankAllocmW
+ lockRankExecW
+ lockRankCpuprof
+ lockRankPollDesc
+ // SCHED
+ lockRankAllocmR
+ lockRankExecR
+ lockRankSched
+ lockRankAllg
+ lockRankAllp
+ lockRankTimers
+ lockRankNetpollInit
+ lockRankHchan
+ lockRankNotifyList
+ lockRankSudog
+ lockRankRoot
+ lockRankItab
+ lockRankReflectOffs
+ lockRankUserArenaState
+ // TRACEGLOBAL
+ lockRankTraceBuf
+ lockRankTraceStrings
+ // MALLOC
+ lockRankFin
+ lockRankSpanSetSpine
+ lockRankMspanSpecial
+ // MPROF
+ lockRankGcBitsArenas
+ lockRankProfInsert
+ lockRankProfBlock
+ lockRankProfMemActive
+ lockRankProfMemFuture
+ // STACKGROW
+ lockRankGscan
+ lockRankStackpool
+ lockRankStackLarge
+ lockRankHchanLeaf
+ // WB
+ lockRankWbufSpans
+ lockRankMheap
+ lockRankMheapSpecial
+ lockRankGlobalAlloc
+ // TRACE
+ lockRankTrace
+ lockRankTraceStackTab
+ lockRankPanic
+ lockRankDeadlock
+ lockRankRaceFini
+ lockRankAllocmRInternal
+ lockRankExecRInternal
+ lockRankTestRInternal
+)
+
+// lockRankLeafRank is the rank of a lock that does not have a declared rank,
+// and hence is a leaf lock.
+const lockRankLeafRank lockRank = 1000
+
+// lockNames gives the names associated with each of the above ranks.
+var lockNames = []string{
+ lockRankSysmon: "sysmon",
+ lockRankScavenge: "scavenge",
+ lockRankForcegc: "forcegc",
+ lockRankDefer: "defer",
+ lockRankSweepWaiters: "sweepWaiters",
+ lockRankAssistQueue: "assistQueue",
+ lockRankSweep: "sweep",
+ lockRankTestR: "testR",
+ lockRankTestW: "testW",
+ lockRankAllocmW: "allocmW",
+ lockRankExecW: "execW",
+ lockRankCpuprof: "cpuprof",
+ lockRankPollDesc: "pollDesc",
+ lockRankAllocmR: "allocmR",
+ lockRankExecR: "execR",
+ lockRankSched: "sched",
+ lockRankAllg: "allg",
+ lockRankAllp: "allp",
+ lockRankTimers: "timers",
+ lockRankNetpollInit: "netpollInit",
+ lockRankHchan: "hchan",
+ lockRankNotifyList: "notifyList",
+ lockRankSudog: "sudog",
+ lockRankRoot: "root",
+ lockRankItab: "itab",
+ lockRankReflectOffs: "reflectOffs",
+ lockRankUserArenaState: "userArenaState",
+ lockRankTraceBuf: "traceBuf",
+ lockRankTraceStrings: "traceStrings",
+ lockRankFin: "fin",
+ lockRankSpanSetSpine: "spanSetSpine",
+ lockRankMspanSpecial: "mspanSpecial",
+ lockRankGcBitsArenas: "gcBitsArenas",
+ lockRankProfInsert: "profInsert",
+ lockRankProfBlock: "profBlock",
+ lockRankProfMemActive: "profMemActive",
+ lockRankProfMemFuture: "profMemFuture",
+ lockRankGscan: "gscan",
+ lockRankStackpool: "stackpool",
+ lockRankStackLarge: "stackLarge",
+ lockRankHchanLeaf: "hchanLeaf",
+ lockRankWbufSpans: "wbufSpans",
+ lockRankMheap: "mheap",
+ lockRankMheapSpecial: "mheapSpecial",
+ lockRankGlobalAlloc: "globalAlloc",
+ lockRankTrace: "trace",
+ lockRankTraceStackTab: "traceStackTab",
+ lockRankPanic: "panic",
+ lockRankDeadlock: "deadlock",
+ lockRankRaceFini: "raceFini",
+ lockRankAllocmRInternal: "allocmRInternal",
+ lockRankExecRInternal: "execRInternal",
+ lockRankTestRInternal: "testRInternal",
+}
+
+func (rank lockRank) String() string {
+ if rank == 0 {
+ return "UNKNOWN"
+ }
+ if rank == lockRankLeafRank {
+ return "LEAF"
+ }
+ if rank < 0 || int(rank) >= len(lockNames) {
+ return "BAD RANK"
+ }
+ return lockNames[rank]
+}
+
+// lockPartialOrder is the transitive closure of the lock rank graph.
+// An entry for rank X lists all of the ranks that can already be held
+// when rank X is acquired.
+//
+// Lock ranks that allow self-cycles list themselves.
+var lockPartialOrder [][]lockRank = [][]lockRank{
+ lockRankSysmon: {},
+ lockRankScavenge: {lockRankSysmon},
+ lockRankForcegc: {lockRankSysmon},
+ lockRankDefer: {},
+ lockRankSweepWaiters: {},
+ lockRankAssistQueue: {},
+ lockRankSweep: {},
+ lockRankTestR: {},
+ lockRankTestW: {},
+ lockRankAllocmW: {},
+ lockRankExecW: {},
+ lockRankCpuprof: {},
+ lockRankPollDesc: {},
+ lockRankAllocmR: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc},
+ lockRankExecR: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc},
+ lockRankSched: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR},
+ lockRankAllg: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched},
+ lockRankAllp: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched},
+ lockRankTimers: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllp, lockRankTimers},
+ lockRankNetpollInit: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllp, lockRankTimers},
+ lockRankHchan: {lockRankSysmon, lockRankScavenge, lockRankSweep, lockRankTestR, lockRankHchan},
+ lockRankNotifyList: {},
+ lockRankSudog: {lockRankSysmon, lockRankScavenge, lockRankSweep, lockRankTestR, lockRankHchan, lockRankNotifyList},
+ lockRankRoot: {},
+ lockRankItab: {},
+ lockRankReflectOffs: {lockRankItab},
+ lockRankUserArenaState: {},
+ lockRankTraceBuf: {lockRankSysmon, lockRankScavenge},
+ lockRankTraceStrings: {lockRankSysmon, lockRankScavenge, lockRankTraceBuf},
+ lockRankFin: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankSpanSetSpine: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankMspanSpecial: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankGcBitsArenas: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankMspanSpecial},
+ lockRankProfInsert: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankProfBlock: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankProfMemActive: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings},
+ lockRankProfMemFuture: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankHchan, lockRankNotifyList, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankProfMemActive},
+ lockRankGscan: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture},
+ lockRankStackpool: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan},
+ lockRankStackLarge: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan},
+ lockRankHchanLeaf: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankHchanLeaf},
+ lockRankWbufSpans: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan},
+ lockRankMheap: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankWbufSpans},
+ lockRankMheapSpecial: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankWbufSpans, lockRankMheap},
+ lockRankGlobalAlloc: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankWbufSpans, lockRankMheap, lockRankMheapSpecial},
+ lockRankTrace: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankWbufSpans, lockRankMheap},
+ lockRankTraceStackTab: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankDefer, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR, lockRankExecR, lockRankSched, lockRankAllg, lockRankAllp, lockRankTimers, lockRankNetpollInit, lockRankHchan, lockRankNotifyList, lockRankSudog, lockRankRoot, lockRankItab, lockRankReflectOffs, lockRankUserArenaState, lockRankTraceBuf, lockRankTraceStrings, lockRankFin, lockRankSpanSetSpine, lockRankMspanSpecial, lockRankGcBitsArenas, lockRankProfInsert, lockRankProfBlock, lockRankProfMemActive, lockRankProfMemFuture, lockRankGscan, lockRankStackpool, lockRankStackLarge, lockRankWbufSpans, lockRankMheap, lockRankTrace},
+ lockRankPanic: {},
+ lockRankDeadlock: {lockRankPanic, lockRankDeadlock},
+ lockRankRaceFini: {lockRankPanic},
+ lockRankAllocmRInternal: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankAllocmW, lockRankCpuprof, lockRankPollDesc, lockRankAllocmR},
+ lockRankExecRInternal: {lockRankSysmon, lockRankScavenge, lockRankForcegc, lockRankSweepWaiters, lockRankAssistQueue, lockRankSweep, lockRankTestR, lockRankExecW, lockRankCpuprof, lockRankPollDesc, lockRankExecR},
+ lockRankTestRInternal: {lockRankTestR, lockRankTestW},
+}
diff --git a/src/runtime/lockrank_off.go b/src/runtime/lockrank_off.go
new file mode 100644
index 0000000..c86726f
--- /dev/null
+++ b/src/runtime/lockrank_off.go
@@ -0,0 +1,68 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !goexperiment.staticlockranking
+
+package runtime
+
+const staticLockRanking = false
+
+// lockRankStruct is embedded in mutex, but is empty when staticlockranking is
+// disabled (the default).
+type lockRankStruct struct {
+}
+
+func lockInit(l *mutex, rank lockRank) {
+}
+
+func getLockRank(l *mutex) lockRank {
+ return 0
+}
+
+func lockWithRank(l *mutex, rank lockRank) {
+ lock2(l)
+}
+
+// This function may be called in nosplit context and thus must be nosplit.
+//
+//go:nosplit
+func acquireLockRank(rank lockRank) {
+}
+
+func unlockWithRank(l *mutex) {
+ unlock2(l)
+}
+
+// This function may be called in nosplit context and thus must be nosplit.
+//
+//go:nosplit
+func releaseLockRank(rank lockRank) {
+}
+
+func lockWithRankMayAcquire(l *mutex, rank lockRank) {
+}
+
+//go:nosplit
+func assertLockHeld(l *mutex) {
+}
+
+//go:nosplit
+func assertRankHeld(r lockRank) {
+}
+
+//go:nosplit
+func worldStopped() {
+}
+
+//go:nosplit
+func worldStarted() {
+}
+
+//go:nosplit
+func assertWorldStopped() {
+}
+
+//go:nosplit
+func assertWorldStoppedOrLockHeld(l *mutex) {
+}
diff --git a/src/runtime/lockrank_on.go b/src/runtime/lockrank_on.go
new file mode 100644
index 0000000..bf530ee
--- /dev/null
+++ b/src/runtime/lockrank_on.go
@@ -0,0 +1,389 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.staticlockranking
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const staticLockRanking = true
+
+// worldIsStopped is accessed atomically to track world-stops. 1 == world
+// stopped.
+var worldIsStopped atomic.Uint32
+
+// lockRankStruct is embedded in mutex
+type lockRankStruct struct {
+ // static lock ranking of the lock
+ rank lockRank
+ // pad field to make sure lockRankStruct is a multiple of 8 bytes, even on
+ // 32-bit systems.
+ pad int
+}
+
+// lockInit(l *mutex, rank lockRank) sets the rank of a lock before it is used.
+// If there is no clear place to initialize a lock, then the rank of a lock can be
+// specified during the lock call itself via lockWithRank(l *mutex, rank lockRank).
+func lockInit(l *mutex, rank lockRank) {
+ l.rank = rank
+}
+
+func getLockRank(l *mutex) lockRank {
+ return l.rank
+}
+
+// lockWithRank is like lock(l), but allows the caller to specify a lock rank
+// when acquiring a non-static lock.
+//
+// Note that we need to be careful about stack splits:
+//
+// This function is not nosplit, thus it may split at function entry. This may
+// introduce a new edge in the lock order, but it is no different from any
+// other (nosplit) call before this call (including the call to lock() itself).
+//
+// However, we switch to the systemstack to record the lock held to ensure that
+// we record an accurate lock ordering. e.g., without systemstack, a stack
+// split on entry to lock2() would record stack split locks as taken after l,
+// even though l is not actually locked yet.
+func lockWithRank(l *mutex, rank lockRank) {
+ if l == &debuglock || l == &paniclk || l == &raceFiniLock {
+ // debuglock is only used for println/printlock(). Don't do lock
+ // rank recording for it, since print/println are used when
+ // printing out a lock ordering problem below.
+ //
+ // paniclk is only used for fatal throw/panic. Don't do lock
+ // ranking recording for it, since we throw after reporting a
+ // lock ordering problem. Additionally, paniclk may be taken
+ // after effectively any lock (anywhere we might panic), which
+ // the partial order doesn't cover.
+ //
+ // raceFiniLock is held while exiting when running
+ // the race detector. Don't do lock rank recording for it,
+ // since we are exiting.
+ lock2(l)
+ return
+ }
+ if rank == 0 {
+ rank = lockRankLeafRank
+ }
+ gp := getg()
+ // Log the new class.
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = uintptr(unsafe.Pointer(l))
+ gp.m.locksHeldLen++
+
+ // i is the index of the lock being acquired
+ if i > 0 {
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ }
+ lock2(l)
+ })
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func printHeldLocks(gp *g) {
+ if gp.m.locksHeldLen == 0 {
+ println("<none>")
+ return
+ }
+
+ for j, held := range gp.m.locksHeld[:gp.m.locksHeldLen] {
+ println(j, ":", held.rank.String(), held.rank, unsafe.Pointer(gp.m.locksHeld[j].lockAddr))
+ }
+}
+
+// acquireLockRank acquires a rank which is not associated with a mutex lock
+//
+// This function may be called in nosplit context and thus must be nosplit.
+//
+//go:nosplit
+func acquireLockRank(rank lockRank) {
+ gp := getg()
+ // Log the new class. See comment on lockWithRank.
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = 0
+ gp.m.locksHeldLen++
+
+ // i is the index of the lock being acquired
+ if i > 0 {
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ }
+ })
+}
+
+// checkRanks checks if goroutine g, which has most recently acquired a lock
+// with rank 'prevRank', can now acquire a lock with rank 'rank'.
+//
+//go:systemstack
+func checkRanks(gp *g, prevRank, rank lockRank) {
+ rankOK := false
+ if rank < prevRank {
+ // If rank < prevRank, then we definitely have a rank error
+ rankOK = false
+ } else if rank == lockRankLeafRank {
+ // If new lock is a leaf lock, then the preceding lock can
+ // be anything except another leaf lock.
+ rankOK = prevRank < lockRankLeafRank
+ } else {
+		// We've now verified the total lock ranking, but we
+		// also enforce the partial ordering specified by
+		// lockPartialOrder. Two locks with the same rank
+		// can only be acquired at the same time if explicitly
+		// listed in the lockPartialOrder table.
+ list := lockPartialOrder[rank]
+ for _, entry := range list {
+ if entry == prevRank {
+ rankOK = true
+ break
+ }
+ }
+ }
+ if !rankOK {
+ printlock()
+ println(gp.m.procid, " ======")
+ printHeldLocks(gp)
+ throw("lock ordering problem")
+ }
+}
+
+// See comment on lockWithRank regarding stack splitting.
+func unlockWithRank(l *mutex) {
+ if l == &debuglock || l == &paniclk || l == &raceFiniLock {
+ // See comment at beginning of lockWithRank.
+ unlock2(l)
+ return
+ }
+ gp := getg()
+ systemstack(func() {
+ found := false
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].lockAddr == uintptr(unsafe.Pointer(l)) {
+ found = true
+ copy(gp.m.locksHeld[i:gp.m.locksHeldLen-1], gp.m.locksHeld[i+1:gp.m.locksHeldLen])
+ gp.m.locksHeldLen--
+ break
+ }
+ }
+ if !found {
+ println(gp.m.procid, ":", l.rank.String(), l.rank, l)
+ throw("unlock without matching lock acquire")
+ }
+ unlock2(l)
+ })
+}
+
+// releaseLockRank releases a rank which is not associated with a mutex lock
+//
+// This function may be called in nosplit context and thus must be nosplit.
+//
+//go:nosplit
+func releaseLockRank(rank lockRank) {
+ gp := getg()
+ systemstack(func() {
+ found := false
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].rank == rank && gp.m.locksHeld[i].lockAddr == 0 {
+ found = true
+ copy(gp.m.locksHeld[i:gp.m.locksHeldLen-1], gp.m.locksHeld[i+1:gp.m.locksHeldLen])
+ gp.m.locksHeldLen--
+ break
+ }
+ }
+ if !found {
+ println(gp.m.procid, ":", rank.String(), rank)
+ throw("lockRank release without matching lockRank acquire")
+ }
+ })
+}
+
+// See comment on lockWithRank regarding stack splitting.
+func lockWithRankMayAcquire(l *mutex, rank lockRank) {
+ gp := getg()
+ if gp.m.locksHeldLen == 0 {
+ // No possibility of lock ordering problem if no other locks held
+ return
+ }
+
+ systemstack(func() {
+ i := gp.m.locksHeldLen
+ if i >= len(gp.m.locksHeld) {
+ throw("too many locks held concurrently for rank checking")
+ }
+ // Temporarily add this lock to the locksHeld list, so
+		// checkRanks() will print out the list, including this lock, if there
+ // is a lock ordering problem.
+ gp.m.locksHeld[i].rank = rank
+ gp.m.locksHeld[i].lockAddr = uintptr(unsafe.Pointer(l))
+ gp.m.locksHeldLen++
+ checkRanks(gp, gp.m.locksHeld[i-1].rank, rank)
+ gp.m.locksHeldLen--
+ })
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func checkLockHeld(gp *g, l *mutex) bool {
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].lockAddr == uintptr(unsafe.Pointer(l)) {
+ return true
+ }
+ }
+ return false
+}
+
+// assertLockHeld throws if l is not held by the caller.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func assertLockHeld(l *mutex) {
+ gp := getg()
+
+ held := checkLockHeld(gp, l)
+ if held {
+ return
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+ print("caller requires lock ", l, " (rank ", l.rank.String(), "), holding:\n")
+ printHeldLocks(gp)
+ throw("not holding required lock!")
+ })
+}
+
+// assertRankHeld throws if a mutex with rank r is not held by the caller.
+//
+// This is less precise than assertLockHeld, but can be used in places where a
+// pointer to the exact mutex is not available.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func assertRankHeld(r lockRank) {
+ gp := getg()
+
+ for i := gp.m.locksHeldLen - 1; i >= 0; i-- {
+ if gp.m.locksHeld[i].rank == r {
+ return
+ }
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+		print("caller requires lock with rank ", r.String(), ", holding:\n")
+ printHeldLocks(gp)
+ throw("not holding required lock!")
+ })
+}
+
+// worldStopped notes that the world is stopped.
+//
+// Caller must hold worldsema.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func worldStopped() {
+ if stopped := worldIsStopped.Add(1); stopped != 1 {
+ systemstack(func() {
+ print("world stop count=", stopped, "\n")
+ throw("recursive world stop")
+ })
+ }
+}
+
+// worldStarted notes that the world is starting.
+//
+// Caller must hold worldsema.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func worldStarted() {
+ if stopped := worldIsStopped.Add(-1); stopped != 0 {
+ systemstack(func() {
+ print("world stop count=", stopped, "\n")
+ throw("released non-stopped world stop")
+ })
+ }
+}
+
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func checkWorldStopped() bool {
+ stopped := worldIsStopped.Load()
+ if stopped > 1 {
+ systemstack(func() {
+ print("inconsistent world stop count=", stopped, "\n")
+ throw("inconsistent world stop count")
+ })
+ }
+
+ return stopped == 1
+}
+
+// assertWorldStopped throws if the world is not stopped. It does not check
+// which M stopped the world.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func assertWorldStopped() {
+ if checkWorldStopped() {
+ return
+ }
+
+ throw("world not stopped")
+}
+
+// assertWorldStoppedOrLockHeld throws if the world is not stopped and the
+// passed lock is not held.
+//
+// nosplit to ensure it can be called in as many contexts as possible.
+//
+//go:nosplit
+func assertWorldStoppedOrLockHeld(l *mutex) {
+ if checkWorldStopped() {
+ return
+ }
+
+ gp := getg()
+ held := checkLockHeld(gp, l)
+ if held {
+ return
+ }
+
+ // Crash from system stack to avoid splits that may cause
+ // additional issues.
+ systemstack(func() {
+ printlock()
+ print("caller requires world stop or lock ", l, " (rank ", l.rank.String(), "), holding:\n")
+ println("<no world stop>")
+ printHeldLocks(gp)
+ throw("no world stop or required lock!")
+ })
+}
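A standalone sketch of the bookkeeping that lockWithRank and unlockWithRank perform, greatly simplified: a plain slice instead of the fixed per-M array, an error instead of throw, and only the total-order check (no lockPartialOrder lookup or systemstack switch). The addresses and ranks below are illustrative.

package main

import "fmt"

type held struct {
	rank int
	addr uintptr
}

type m struct {
	locksHeld []held
}

func (mp *m) acquire(addr uintptr, rank int) error {
	if n := len(mp.locksHeld); n > 0 && rank < mp.locksHeld[n-1].rank {
		return fmt.Errorf("lock ordering problem: rank %d acquired after rank %d", rank, mp.locksHeld[n-1].rank)
	}
	mp.locksHeld = append(mp.locksHeld, held{rank, addr})
	return nil
}

func (mp *m) release(addr uintptr) {
	for i := len(mp.locksHeld) - 1; i >= 0; i-- {
		if mp.locksHeld[i].addr == addr {
			mp.locksHeld = append(mp.locksHeld[:i], mp.locksHeld[i+1:]...)
			return
		}
	}
	panic("unlock without matching lock acquire")
}

func main() {
	var mp m
	fmt.Println(mp.acquire(0x1, 10)) // <nil>
	fmt.Println(mp.acquire(0x2, 20)) // <nil>
	fmt.Println(mp.acquire(0x3, 5))  // ordering problem: 5 after 20
	mp.release(0x2)
}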
diff --git a/src/runtime/lockrank_test.go b/src/runtime/lockrank_test.go
new file mode 100644
index 0000000..a7b1b8d
--- /dev/null
+++ b/src/runtime/lockrank_test.go
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "testing"
+)
+
+// Test that the generated code for the lock rank graph is up-to-date.
+func TestLockRankGenerated(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+ want, err := testenv.CleanCmdEnv(exec.Command(testenv.GoToolPath(t), "run", "mklockrank.go")).CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ got, err := os.ReadFile("lockrank.go")
+ if err != nil {
+ t.Fatal(err)
+ }
+ if !bytes.Equal(want, got) {
+ t.Fatalf("lockrank.go is out of date. Please run go generate.")
+ }
+}
diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go
new file mode 100644
index 0000000..b2026ad
--- /dev/null
+++ b/src/runtime/malloc.go
@@ -0,0 +1,1636 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Memory allocator.
+//
+// This was originally based on tcmalloc, but has diverged quite a bit.
+// http://goog-perftools.sourceforge.net/doc/tcmalloc.html
+
+// The main allocator works in runs of pages.
+// Small allocation sizes (up to and including 32 kB) are
+// rounded to one of about 70 size classes, each of which
+// has its own free set of objects of exactly that size.
+// Any free page of memory can be split into a set of objects
+// of one size class, which are then managed using a free bitmap.
+//
+// The allocator's data structures are:
+//
+// fixalloc: a free-list allocator for fixed-size off-heap objects,
+// used to manage storage used by the allocator.
+// mheap: the malloc heap, managed at page (8192-byte) granularity.
+// mspan: a run of in-use pages managed by the mheap.
+// mcentral: collects all spans of a given size class.
+// mcache: a per-P cache of mspans with free space.
+// mstats: allocation statistics.
+//
+// Allocating a small object proceeds up a hierarchy of caches:
+//
+// 1. Round the size up to one of the small size classes
+// and look in the corresponding mspan in this P's mcache.
+// Scan the mspan's free bitmap to find a free slot.
+// If there is a free slot, allocate it.
+// This can all be done without acquiring a lock.
+//
+// 2. If the mspan has no free slots, obtain a new mspan
+// from the mcentral's list of mspans of the required size
+// class that have free space.
+// Obtaining a whole span amortizes the cost of locking
+// the mcentral.
+//
+// 3. If the mcentral's mspan list is empty, obtain a run
+// of pages from the mheap to use for the mspan.
+//
+// 4. If the mheap is empty or has no page runs large enough,
+// allocate a new group of pages (at least 1MB) from the
+// operating system. Allocating a large run of pages
+// amortizes the cost of talking to the operating system.
+//
+// Sweeping an mspan and freeing objects on it proceeds up a similar
+// hierarchy:
+//
+// 1. If the mspan is being swept in response to allocation, it
+// is returned to the mcache to satisfy the allocation.
+//
+// 2. Otherwise, if the mspan still has allocated objects in it,
+// it is placed on the mcentral free list for the mspan's size
+// class.
+//
+// 3. Otherwise, if all objects in the mspan are free, the mspan's
+// pages are returned to the mheap and the mspan is now dead.
+//
+// Allocating and freeing a large object uses the mheap
+// directly, bypassing the mcache and mcentral.
+//
+// If mspan.needzero is false, then free object slots in the mspan are
+// already zeroed. Otherwise if needzero is true, objects are zeroed as
+// they are allocated. There are various benefits to delaying zeroing
+// this way:
+//
+// 1. Stack frame allocation can avoid zeroing altogether.
+//
+// 2. It exhibits better temporal locality, since the program is
+// probably about to write to the memory.
+//
+// 3. We don't zero pages that never get reused.
+
+// Virtual memory layout
+//
+// The heap consists of a set of arenas, which are 64MB on 64-bit and
+// 4MB on 32-bit (heapArenaBytes). Each arena's start address is also
+// aligned to the arena size.
+//
+// Each arena has an associated heapArena object that stores the
+// metadata for that arena: the heap bitmap for all words in the arena
+// and the span map for all pages in the arena. heapArena objects are
+// themselves allocated off-heap.
+//
+// Since arenas are aligned, the address space can be viewed as a
+// series of arena frames. The arena map (mheap_.arenas) maps from
+// arena frame number to *heapArena, or nil for parts of the address
+// space not backed by the Go heap. The arena map is structured as a
+// two-level array consisting of a "L1" arena map and many "L2" arena
+// maps; however, since arenas are large, on many architectures, the
+// arena map consists of a single, large L2 map.
+//
+// The arena map covers the entire possible address space, allowing
+// the Go heap to use any part of the address space. The allocator
+// attempts to keep arenas contiguous so that large spans (and hence
+// large objects) can cross arenas.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "internal/goos"
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ maxTinySize = _TinySize
+ tinySizeClass = _TinySizeClass
+ maxSmallSize = _MaxSmallSize
+
+ pageShift = _PageShift
+ pageSize = _PageSize
+
+ concurrentSweep = _ConcurrentSweep
+
+ _PageSize = 1 << _PageShift
+ _PageMask = _PageSize - 1
+
+ // _64bit = 1 on 64-bit systems, 0 on 32-bit systems
+ _64bit = 1 << (^uintptr(0) >> 63) / 2
+
+ // Tiny allocator parameters, see "Tiny allocator" comment in malloc.go.
+ _TinySize = 16
+ _TinySizeClass = int8(2)
+
+ _FixAllocChunk = 16 << 10 // Chunk size for FixAlloc
+
+ // Per-P, per order stack segment cache size.
+ _StackCacheSize = 32 * 1024
+
+ // Number of orders that get caching. Order 0 is FixedStack
+ // and each successive order is twice as large.
+ // We want to cache 2KB, 4KB, 8KB, and 16KB stacks. Larger stacks
+ // will be allocated directly.
+ // Since FixedStack is different on different systems, we
+ // must vary NumStackOrders to keep the same maximum cached size.
+ // OS | FixedStack | NumStackOrders
+ // -----------------+------------+---------------
+ // linux/darwin/bsd | 2KB | 4
+ // windows/32 | 4KB | 3
+ // windows/64 | 8KB | 2
+ // plan9 | 4KB | 3
+ _NumStackOrders = 4 - goarch.PtrSize/4*goos.IsWindows - 1*goos.IsPlan9
+
+ // heapAddrBits is the number of bits in a heap address. On
+ // amd64, addresses are sign-extended beyond heapAddrBits. On
+ // other arches, they are zero-extended.
+ //
+ // On most 64-bit platforms, we limit this to 48 bits based on a
+ // combination of hardware and OS limitations.
+ //
+ // amd64 hardware limits addresses to 48 bits, sign-extended
+ // to 64 bits. Addresses where the top 16 bits are not either
+ // all 0 or all 1 are "non-canonical" and invalid. Because of
+ // these "negative" addresses, we offset addresses by 1<<47
+ // (arenaBaseOffset) on amd64 before computing indexes into
+ // the heap arenas index. In 2017, amd64 hardware added
+ // support for 57 bit addresses; however, currently only Linux
+ // supports this extension and the kernel will never choose an
+ // address above 1<<47 unless mmap is called with a hint
+ // address above 1<<47 (which we never do).
+ //
+ // arm64 hardware (as of ARMv8) limits user addresses to 48
+ // bits, in the range [0, 1<<48).
+ //
+ // ppc64, mips64, and s390x support arbitrary 64 bit addresses
+ // in hardware. On Linux, Go leans on stricter OS limits. Based
+ // on Linux's processor.h, the user address space is limited as
+ // follows on 64-bit architectures:
+ //
+ // Architecture Name Maximum Value (exclusive)
+ // ---------------------------------------------------------------------
+ // amd64 TASK_SIZE_MAX 0x007ffffffff000 (47 bit addresses)
+ // arm64 TASK_SIZE_64 0x01000000000000 (48 bit addresses)
+ // ppc64{,le} TASK_SIZE_USER64 0x00400000000000 (46 bit addresses)
+ // mips64{,le} TASK_SIZE64 0x00010000000000 (40 bit addresses)
+ // s390x TASK_SIZE 1<<64 (64 bit addresses)
+ //
+ // These limits may increase over time, but are currently at
+ // most 48 bits except on s390x. On all architectures, Linux
+ // starts placing mmap'd regions at addresses that are
+ // significantly below 48 bits, so even if it's possible to
+ // exceed Go's 48 bit limit, it's extremely unlikely in
+ // practice.
+ //
+ // On 32-bit platforms, we accept the full 32-bit address
+ // space because doing so is cheap.
+ // mips32 only has access to the low 2GB of virtual memory, so
+ // we further limit it to 31 bits.
+ //
+ // On ios/arm64, although 64-bit pointers are presumably
+ // available, pointers are truncated to 33 bits in iOS <14.
+ // Furthermore, only the top 4 GiB of the address space are
+ // actually available to the application. In iOS >=14, more
+ // of the address space is available, and the OS can now
+ // provide addresses outside of those 33 bits. Pick 40 bits
+ // as a reasonable balance between address space usage by the
+ // page allocator, and flexibility for what mmap'd regions
+ // we'll accept for the heap. We can't just move to the full
+ // 48 bits because this uses too much address space for older
+ // iOS versions.
+ // TODO(mknyszek): Once iOS <14 is deprecated, promote ios/arm64
+ // to a 48-bit address space like every other arm64 platform.
+ //
+ // WebAssembly currently has a limit of 4GB linear memory.
+ heapAddrBits = (_64bit*(1-goarch.IsWasm)*(1-goos.IsIos*goarch.IsArm64))*48 + (1-_64bit+goarch.IsWasm)*(32-(goarch.IsMips+goarch.IsMipsle)) + 40*goos.IsIos*goarch.IsArm64
+
+ // maxAlloc is the maximum size of an allocation. On 64-bit,
+ // it's theoretically possible to allocate 1<<heapAddrBits bytes. On
+ // 32-bit, however, this is one less than 1<<32 because the
+ // number of bytes in the address space doesn't actually fit
+ // in a uintptr.
+ maxAlloc = (1 << heapAddrBits) - (1-_64bit)*1
+
+ // The number of bits in a heap address, the size of heap
+ // arenas, and the L1 and L2 arena map sizes are related by
+ //
+ // (1 << addr bits) = arena size * L1 entries * L2 entries
+ //
+ // Currently, we balance these as follows:
+ //
+ // Platform Addr bits Arena size L1 entries L2 entries
+ // -------------- --------- ---------- ---------- -----------
+ // */64-bit 48 64MB 1 4M (32MB)
+ // windows/64-bit 48 4MB 64 1M (8MB)
+ // ios/arm64 33 4MB 1 2048 (8KB)
+ // */32-bit 32 4MB 1 1024 (4KB)
+ // */mips(le) 31 4MB 1 512 (2KB)
+
+ // heapArenaBytes is the size of a heap arena. The heap
+ // consists of mappings of size heapArenaBytes, aligned to
+ // heapArenaBytes. The initial heap mapping is one arena.
+ //
+ // This is currently 64MB on 64-bit non-Windows and 4MB on
+ // 32-bit and on Windows. We use smaller arenas on Windows
+ // because all committed memory is charged to the process,
+ // even if it's not touched. Hence, for processes with small
+ // heaps, the mapped arena space needs to be commensurate.
+ // This is particularly important with the race detector,
+ // since it significantly amplifies the cost of committed
+ // memory.
+ heapArenaBytes = 1 << logHeapArenaBytes
+
+ heapArenaWords = heapArenaBytes / goarch.PtrSize
+
+ // logHeapArenaBytes is log_2 of heapArenaBytes. For clarity,
+ // prefer using heapArenaBytes where possible (we need the
+ // constant to compute some other constants).
+ logHeapArenaBytes = (6+20)*(_64bit*(1-goos.IsWindows)*(1-goarch.IsWasm)*(1-goos.IsIos*goarch.IsArm64)) + (2+20)*(_64bit*goos.IsWindows) + (2+20)*(1-_64bit) + (2+20)*goarch.IsWasm + (2+20)*goos.IsIos*goarch.IsArm64
+
+ // heapArenaBitmapWords is the size of each heap arena's bitmap in uintptrs.
+ heapArenaBitmapWords = heapArenaWords / (8 * goarch.PtrSize)
+
+ pagesPerArena = heapArenaBytes / pageSize
+
+ // arenaL1Bits is the number of bits of the arena number
+ // covered by the first level arena map.
+ //
+ // This number should be small, since the first level arena
+ // map requires PtrSize*(1<<arenaL1Bits) of space in the
+ // binary's BSS. It can be zero, in which case the first level
+ // index is effectively unused. There is a performance benefit
+ // to this, since the generated code can be more efficient,
+ // but comes at the cost of having a large L2 mapping.
+ //
+ // We use the L1 map on 64-bit Windows because the arena size
+ // is small, but the address space is still 48 bits, and
+ // there's a high cost to having a large L2.
+ arenaL1Bits = 6 * (_64bit * goos.IsWindows)
+
+ // arenaL2Bits is the number of bits of the arena number
+ // covered by the second level arena index.
+ //
+ // The size of each arena map allocation is proportional to
+ // 1<<arenaL2Bits, so it's important that this not be too
+ // large. 48 bits leads to 32MB arena index allocations, which
+ // is about the practical threshold.
+ arenaL2Bits = heapAddrBits - logHeapArenaBytes - arenaL1Bits
+
+ // arenaL1Shift is the number of bits to shift an arena frame
+ // number by to compute an index into the first level arena map.
+ arenaL1Shift = arenaL2Bits
+
+ // arenaBits is the total bits in a combined arena map index.
+ // This is split between the index into the L1 arena map and
+ // the L2 arena map.
+ arenaBits = arenaL1Bits + arenaL2Bits
+
+ // arenaBaseOffset is the pointer value that corresponds to
+ // index 0 in the heap arena map.
+ //
+ // On amd64, the address space is 48 bits, sign extended to 64
+ // bits. This offset lets us handle "negative" addresses (or
+ // high addresses if viewed as unsigned).
+ //
+	// On aix/ppc64, this offset allows keeping heapAddrBits at
+	// 48. Otherwise, it would be 60 in order to handle mmap addresses
+	// (in range 0x0a00000000000000 - 0x0afffffffffffff). But in that
+	// case, the memory reserved in (s *pageAlloc).init for chunks
+	// causes significant slowdowns.
+ //
+ // On other platforms, the user address space is contiguous
+ // and starts at 0, so no offset is necessary.
+ arenaBaseOffset = 0xffff800000000000*goarch.IsAmd64 + 0x0a00000000000000*goos.IsAix
+ // A typed version of this constant that will make it into DWARF (for viewcore).
+ arenaBaseOffsetUintptr = uintptr(arenaBaseOffset)
+
+ // Max number of threads to run garbage collection.
+ // 2, 3, and 4 are all plausible maximums depending
+ // on the hardware details of the machine. The garbage
+ // collector scales well to 32 cpus.
+ _MaxGcproc = 32
+
+ // minLegalPointer is the smallest possible legal pointer.
+ // This is the smallest possible architectural page size,
+ // since we assume that the first page is never mapped.
+ //
+ // This should agree with minZeroPage in the compiler.
+ minLegalPointer uintptr = 4096
+
+ // minHeapForMetadataHugePages sets a threshold on when certain kinds of
+ // heap metadata, currently the arenas map L2 entries and page alloc bitmap
+ // mappings, are allowed to be backed by huge pages. If the heap goal ever
+ // exceeds this threshold, then huge pages are enabled.
+ //
+ // These numbers are chosen with the assumption that huge pages are on the
+ // order of a few MiB in size.
+ //
+	// The kinds of metadata this applies to have a very low overhead when compared
+ // to address space used, but their constant overheads for small heaps would
+ // be very high if they were to be backed by huge pages (e.g. a few MiB makes
+ // a huge difference for an 8 MiB heap, but barely any difference for a 1 GiB
+ // heap). The benefit of huge pages is also not worth it for small heaps,
+ // because only a very, very small part of the metadata is used for small heaps.
+ //
+ // N.B. If the heap goal exceeds the threshold then shrinks to a very small size
+ // again, then huge pages will still be enabled for this mapping. The reason is that
+ // there's no point unless we're also returning the physical memory for these
+ // metadata mappings back to the OS. That would be quite complex to do in general
+ // as the heap is likely fragmented after a reduction in heap size.
+ minHeapForMetadataHugePages = 1 << 30
+)
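As a quick arithmetic check of the table above for a typical 64-bit non-Windows target (values hard-coded here purely for illustration): 48 address bits and 64 MiB arenas leave a single-level arena map with 4M entries, i.e. 32 MiB of pointers.

package main

import "fmt"

func main() {
	const (
		heapAddrBits      = 48
		logHeapArenaBytes = 6 + 20 // 64 MiB arenas
		arenaL1Bits       = 0
		arenaL2Bits       = heapAddrBits - logHeapArenaBytes - arenaL1Bits
	)
	fmt.Println("arena size: ", 1<<logHeapArenaBytes)    // 67108864 (64 MiB)
	fmt.Println("L2 entries: ", 1<<arenaL2Bits)          // 4194304 (4M)
	fmt.Println("L2 map size:", (1<<arenaL2Bits)*8, "B") // 33554432 (32 MiB of pointers)
}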
+
+// physPageSize is the size in bytes of the OS's physical pages.
+// Mapping and unmapping operations must be done at multiples of
+// physPageSize.
+//
+// This must be set by the OS init code (typically in osinit) before
+// mallocinit.
+var physPageSize uintptr
+
+// physHugePageSize is the size in bytes of the OS's default physical huge
+// page size whose allocation is opaque to the application. It is assumed
+// and verified to be a power of two.
+//
+// If set, this must be set by the OS init code (typically in osinit) before
+// mallocinit. However, setting it at all is optional, and leaving the default
+// value is always safe (though potentially less efficient).
+//
+// Since physHugePageSize is always assumed to be a power of two,
+// physHugePageShift is defined as physHugePageSize == 1 << physHugePageShift.
+// The purpose of physHugePageShift is to avoid doing divisions in
+// performance critical functions.
+var (
+ physHugePageSize uintptr
+ physHugePageShift uint
+)
+
+func mallocinit() {
+ if class_to_size[_TinySizeClass] != _TinySize {
+ throw("bad TinySizeClass")
+ }
+
+ if heapArenaBitmapWords&(heapArenaBitmapWords-1) != 0 {
+ // heapBits expects modular arithmetic on bitmap
+ // addresses to work.
+ throw("heapArenaBitmapWords not a power of 2")
+ }
+
+ // Check physPageSize.
+ if physPageSize == 0 {
+ // The OS init code failed to fetch the physical page size.
+ throw("failed to get system page size")
+ }
+ if physPageSize > maxPhysPageSize {
+ print("system page size (", physPageSize, ") is larger than maximum page size (", maxPhysPageSize, ")\n")
+ throw("bad system page size")
+ }
+ if physPageSize < minPhysPageSize {
+ print("system page size (", physPageSize, ") is smaller than minimum page size (", minPhysPageSize, ")\n")
+ throw("bad system page size")
+ }
+ if physPageSize&(physPageSize-1) != 0 {
+ print("system page size (", physPageSize, ") must be a power of 2\n")
+ throw("bad system page size")
+ }
+ if physHugePageSize&(physHugePageSize-1) != 0 {
+ print("system huge page size (", physHugePageSize, ") must be a power of 2\n")
+ throw("bad system huge page size")
+ }
+ if physHugePageSize > maxPhysHugePageSize {
+ // physHugePageSize is greater than the maximum supported huge page size.
+		// Don't throw here, like in the other cases, since a system configured
+		// in this way isn't wrong; we just don't have the code to support it.
+ // Instead, silently set the huge page size to zero.
+ physHugePageSize = 0
+ }
+ if physHugePageSize != 0 {
+ // Since physHugePageSize is a power of 2, it suffices to increase
+ // physHugePageShift until 1<<physHugePageShift == physHugePageSize.
+ for 1<<physHugePageShift != physHugePageSize {
+ physHugePageShift++
+ }
+ }
+ if pagesPerArena%pagesPerSpanRoot != 0 {
+ print("pagesPerArena (", pagesPerArena, ") is not divisible by pagesPerSpanRoot (", pagesPerSpanRoot, ")\n")
+ throw("bad pagesPerSpanRoot")
+ }
+ if pagesPerArena%pagesPerReclaimerChunk != 0 {
+ print("pagesPerArena (", pagesPerArena, ") is not divisible by pagesPerReclaimerChunk (", pagesPerReclaimerChunk, ")\n")
+ throw("bad pagesPerReclaimerChunk")
+ }
+
+ if minTagBits > taggedPointerBits {
+ throw("taggedPointerbits too small")
+ }
+
+ // Initialize the heap.
+ mheap_.init()
+ mcache0 = allocmcache()
+ lockInit(&gcBitsArenas.lock, lockRankGcBitsArenas)
+ lockInit(&profInsertLock, lockRankProfInsert)
+ lockInit(&profBlockLock, lockRankProfBlock)
+ lockInit(&profMemActiveLock, lockRankProfMemActive)
+ for i := range profMemFutureLock {
+ lockInit(&profMemFutureLock[i], lockRankProfMemFuture)
+ }
+ lockInit(&globalAlloc.mutex, lockRankGlobalAlloc)
+
+ // Create initial arena growth hints.
+ if goarch.PtrSize == 8 {
+ // On a 64-bit machine, we pick the following hints
+ // because:
+ //
+ // 1. Starting from the middle of the address space
+ // makes it easier to grow out a contiguous range
+ // without running in to some other mapping.
+ //
+ // 2. This makes Go heap addresses more easily
+ // recognizable when debugging.
+ //
+ // 3. Stack scanning in gccgo is still conservative,
+ // so it's important that addresses be distinguishable
+ // from other data.
+ //
+		// Starting at 0x00c0 means that the valid memory addresses
+		// will begin with 0x00c0, 0x00c1, ...
+ // In little-endian, that's c0 00, c1 00, ... None of those are valid
+ // UTF-8 sequences, and they are otherwise as far away from
+ // ff (likely a common byte) as possible. If that fails, we try other 0xXXc0
+ // addresses. An earlier attempt to use 0x11f8 caused out of memory errors
+ // on OS X during thread allocations. 0x00c0 causes conflicts with
+ // AddressSanitizer which reserves all memory up to 0x0100.
+ // These choices reduce the odds of a conservative garbage collector
+ // not collecting memory because some non-pointer block of memory
+ // had a bit pattern that matched a memory address.
+ //
+		// However, on arm64, we ignore the advice above and slam the
+		// allocation at 0x40 << 32 because when using 4k pages with 3-level
+		// translation buffers, the user address space is limited to 39 bits.
+		// On ios/arm64, the address space is even smaller.
+ //
+		// On AIX, mmap starts at 0x0A00000000000000 for 64-bit
+		// processes.
+ //
+ // Space mapped for user arenas comes immediately after the range
+ // originally reserved for the regular heap when race mode is not
+ // enabled because user arena chunks can never be used for regular heap
+ // allocations and we want to avoid fragmenting the address space.
+ //
+ // In race mode we have no choice but to just use the same hints because
+ // the race detector requires that the heap be mapped contiguously.
+ for i := 0x7f; i >= 0; i-- {
+ var p uintptr
+ switch {
+ case raceenabled:
+ // The TSAN runtime requires the heap
+ // to be in the range [0x00c000000000,
+ // 0x00e000000000).
+ p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
+ if p >= uintptrMask&0x00e000000000 {
+ continue
+ }
+ case GOARCH == "arm64" && GOOS == "ios":
+ p = uintptr(i)<<40 | uintptrMask&(0x0013<<28)
+ case GOARCH == "arm64":
+ p = uintptr(i)<<40 | uintptrMask&(0x0040<<32)
+ case GOOS == "aix":
+ if i == 0 {
+ // We don't use addresses directly after 0x0A00000000000000
+				// to avoid collisions with other mmaps done by non-Go programs.
+ continue
+ }
+ p = uintptr(i)<<40 | uintptrMask&(0xa0<<52)
+ default:
+ p = uintptr(i)<<40 | uintptrMask&(0x00c0<<32)
+ }
+ // Switch to generating hints for user arenas if we've gone
+ // through about half the hints. In race mode, take only about
+ // a quarter; we don't have very much space to work with.
+ hintList := &mheap_.arenaHints
+ if (!raceenabled && i > 0x3f) || (raceenabled && i > 0x5f) {
+ hintList = &mheap_.userArena.arenaHints
+ }
+ hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
+ hint.addr = p
+ hint.next, *hintList = *hintList, hint
+ }
+ } else {
+ // On a 32-bit machine, we're much more concerned
+ // about keeping the usable heap contiguous.
+ // Hence:
+ //
+ // 1. We reserve space for all heapArenas up front so
+ // they don't get interleaved with the heap. They're
+ // ~258MB, so this isn't too bad. (We could reserve a
+ // smaller amount of space up front if this is a
+ // problem.)
+ //
+ // 2. We hint the heap to start right above the end of
+ // the binary so we have the best chance of keeping it
+ // contiguous.
+ //
+ // 3. We try to stake out a reasonably large initial
+ // heap reservation.
+
+ const arenaMetaSize = (1 << arenaBits) * unsafe.Sizeof(heapArena{})
+ meta := uintptr(sysReserve(nil, arenaMetaSize))
+ if meta != 0 {
+ mheap_.heapArenaAlloc.init(meta, arenaMetaSize, true)
+ }
+
+ // We want to start the arena low, but if we're linked
+ // against C code, it's possible global constructors
+ // have called malloc and adjusted the process' brk.
+ // Query the brk so we can avoid trying to map the
+ // region over it (which will cause the kernel to put
+ // the region somewhere else, likely at a high
+ // address).
+ procBrk := sbrk0()
+
+ // If we ask for the end of the data segment but the
+ // operating system requires a little more space
+ // before we can start allocating, it will give out a
+ // slightly higher pointer. Except QEMU, which is
+ // buggy, as usual: it won't adjust the pointer
+ // upward. So adjust it upward a little bit ourselves:
+ // 1/4 MB to get away from the running binary image.
+ p := firstmoduledata.end
+ if p < procBrk {
+ p = procBrk
+ }
+ if mheap_.heapArenaAlloc.next <= p && p < mheap_.heapArenaAlloc.end {
+ p = mheap_.heapArenaAlloc.end
+ }
+ p = alignUp(p+(256<<10), heapArenaBytes)
+ // Because we're worried about fragmentation on
+ // 32-bit, we try to make a large initial reservation.
+ arenaSizes := []uintptr{
+ 512 << 20,
+ 256 << 20,
+ 128 << 20,
+ }
+ for _, arenaSize := range arenaSizes {
+ a, size := sysReserveAligned(unsafe.Pointer(p), arenaSize, heapArenaBytes)
+ if a != nil {
+ mheap_.arena.init(uintptr(a), size, false)
+ p = mheap_.arena.end // For hint below
+ break
+ }
+ }
+ hint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
+ hint.addr = p
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+
+ // Place the hint for user arenas just after the large reservation.
+ //
+ // While this potentially competes with the hint above, in practice we probably
+ // aren't going to be getting this far anyway on 32-bit platforms.
+ userArenaHint := (*arenaHint)(mheap_.arenaHintAlloc.alloc())
+ userArenaHint.addr = p
+ userArenaHint.next, mheap_.userArena.arenaHints = mheap_.userArena.arenaHints, userArenaHint
+ }
+ // Initialize the memory limit here because the allocator is going to look at it
+ // but we haven't called gcinit yet and we're definitely going to allocate memory before then.
+ gcController.memoryLimit.Store(maxInt64)
+}
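For the default 64-bit case above (not arm64, aix, or race mode), the hints are simply p = i<<40 | 0x00c0<<32; a quick sketch printing a few of them (the loop direction and count differ from mallocinit, which walks i downward from 0x7f):

package main

import "fmt"

func main() {
	// Same formula as the default case in mallocinit.
	for i := uintptr(0); i < 3; i++ {
		p := i<<40 | 0x00c0<<32
		fmt.Printf("hint %#x\n", p) // 0xc000000000, 0x1c000000000, 0x2c000000000
	}
}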
+
+// sysAlloc allocates heap arena space for at least n bytes. The
+// returned pointer is always heapArenaBytes-aligned and backed by
+// h.arenas metadata. The returned size is always a multiple of
+// heapArenaBytes. sysAlloc returns nil on failure.
+// There is no corresponding free function.
+//
+// hintList is a list of hint addresses for where to allocate new
+// heap arenas. It must be non-nil.
+//
+// register indicates whether the heap arena should be registered
+// in allArenas.
+//
+// sysAlloc returns a memory region in the Reserved state. This region must
+// be transitioned to Prepared and then Ready before use.
+//
+// h must be locked.
+func (h *mheap) sysAlloc(n uintptr, hintList **arenaHint, register bool) (v unsafe.Pointer, size uintptr) {
+ assertLockHeld(&h.lock)
+
+ n = alignUp(n, heapArenaBytes)
+
+ if hintList == &h.arenaHints {
+ // First, try the arena pre-reservation.
+ // Newly-used mappings are considered released.
+ //
+ // Only do this if we're using the regular heap arena hints.
+ // This behavior is only for the heap.
+ v = h.arena.alloc(n, heapArenaBytes, &gcController.heapReleased)
+ if v != nil {
+ size = n
+ goto mapped
+ }
+ }
+
+ // Try to grow the heap at a hint address.
+ for *hintList != nil {
+ hint := *hintList
+ p := hint.addr
+ if hint.down {
+ p -= n
+ }
+ if p+n < p {
+ // We can't use this, so don't ask.
+ v = nil
+ } else if arenaIndex(p+n-1) >= 1<<arenaBits {
+ // Outside addressable heap. Can't use.
+ v = nil
+ } else {
+ v = sysReserve(unsafe.Pointer(p), n)
+ }
+ if p == uintptr(v) {
+ // Success. Update the hint.
+ if !hint.down {
+ p += n
+ }
+ hint.addr = p
+ size = n
+ break
+ }
+ // Failed. Discard this hint and try the next.
+ //
+ // TODO: This would be cleaner if sysReserve could be
+ // told to only return the requested address. In
+ // particular, this is already how Windows behaves, so
+ // it would simplify things there.
+ if v != nil {
+ sysFreeOS(v, n)
+ }
+ *hintList = hint.next
+ h.arenaHintAlloc.free(unsafe.Pointer(hint))
+ }
+
+ if size == 0 {
+ if raceenabled {
+ // The race detector assumes the heap lives in
+ // [0x00c000000000, 0x00e000000000), but we
+ // just ran out of hints in this region. Give
+ // a nice failure.
+ throw("too many address space collisions for -race mode")
+ }
+
+ // All of the hints failed, so we'll take any
+ // (sufficiently aligned) address the kernel will give
+ // us.
+ v, size = sysReserveAligned(nil, n, heapArenaBytes)
+ if v == nil {
+ return nil, 0
+ }
+
+ // Create new hints for extending this region.
+ hint := (*arenaHint)(h.arenaHintAlloc.alloc())
+ hint.addr, hint.down = uintptr(v), true
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ hint = (*arenaHint)(h.arenaHintAlloc.alloc())
+ hint.addr = uintptr(v) + size
+ hint.next, mheap_.arenaHints = mheap_.arenaHints, hint
+ }
+
+ // Check for bad pointers or pointers we can't use.
+ {
+ var bad string
+ p := uintptr(v)
+ if p+size < p {
+ bad = "region exceeds uintptr range"
+ } else if arenaIndex(p) >= 1<<arenaBits {
+ bad = "base outside usable address space"
+ } else if arenaIndex(p+size-1) >= 1<<arenaBits {
+ bad = "end outside usable address space"
+ }
+ if bad != "" {
+ // This should be impossible on most architectures,
+ // but it would be really confusing to debug.
+ print("runtime: memory allocated by OS [", hex(p), ", ", hex(p+size), ") not in usable address space: ", bad, "\n")
+ throw("memory reservation exceeds address space limit")
+ }
+ }
+
+ if uintptr(v)&(heapArenaBytes-1) != 0 {
+ throw("misrounded allocation in sysAlloc")
+ }
+
+mapped:
+ // Create arena metadata.
+ for ri := arenaIndex(uintptr(v)); ri <= arenaIndex(uintptr(v)+size-1); ri++ {
+ l2 := h.arenas[ri.l1()]
+ if l2 == nil {
+ // Allocate an L2 arena map.
+ //
+ // Use sysAllocOS instead of sysAlloc or persistentalloc because there's no
+ // statistic we can comfortably account for this space in. With this structure,
+ // we rely on demand paging to avoid large overheads, but tracking which memory
+ // is paged in is too expensive. Trying to account for the whole region means
+ // that it will appear like an enormous memory overhead in statistics, even though
+ // it is not.
+ l2 = (*[1 << arenaL2Bits]*heapArena)(sysAllocOS(unsafe.Sizeof(*l2)))
+ if l2 == nil {
+ throw("out of memory allocating heap arena map")
+ }
+ if h.arenasHugePages {
+ sysHugePage(unsafe.Pointer(l2), unsafe.Sizeof(*l2))
+ } else {
+ sysNoHugePage(unsafe.Pointer(l2), unsafe.Sizeof(*l2))
+ }
+ atomic.StorepNoWB(unsafe.Pointer(&h.arenas[ri.l1()]), unsafe.Pointer(l2))
+ }
+
+ if l2[ri.l2()] != nil {
+ throw("arena already initialized")
+ }
+ var r *heapArena
+ r = (*heapArena)(h.heapArenaAlloc.alloc(unsafe.Sizeof(*r), goarch.PtrSize, &memstats.gcMiscSys))
+ if r == nil {
+ r = (*heapArena)(persistentalloc(unsafe.Sizeof(*r), goarch.PtrSize, &memstats.gcMiscSys))
+ if r == nil {
+ throw("out of memory allocating heap arena metadata")
+ }
+ }
+
+ // Register the arena in allArenas if requested.
+ if register {
+ if len(h.allArenas) == cap(h.allArenas) {
+ size := 2 * uintptr(cap(h.allArenas)) * goarch.PtrSize
+ if size == 0 {
+ size = physPageSize
+ }
+ newArray := (*notInHeap)(persistentalloc(size, goarch.PtrSize, &memstats.gcMiscSys))
+ if newArray == nil {
+ throw("out of memory allocating allArenas")
+ }
+ oldSlice := h.allArenas
+ *(*notInHeapSlice)(unsafe.Pointer(&h.allArenas)) = notInHeapSlice{newArray, len(h.allArenas), int(size / goarch.PtrSize)}
+ copy(h.allArenas, oldSlice)
+ // Do not free the old backing array because
+ // there may be concurrent readers. Since we
+ // double the array each time, this can lead
+ // to at most 2x waste.
+ }
+ h.allArenas = h.allArenas[:len(h.allArenas)+1]
+ h.allArenas[len(h.allArenas)-1] = ri
+ }
+
+ // Store atomically just in case an object from the
+ // new heap arena becomes visible before the heap lock
+ // is released (which shouldn't happen, but there's
+ // little downside to this).
+ atomic.StorepNoWB(unsafe.Pointer(&l2[ri.l2()]), unsafe.Pointer(r))
+ }
+
+ // Tell the race detector about the new heap memory.
+ if raceenabled {
+ racemapshadow(v, size)
+ }
+
+ return
+}
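The ri.l1()/ri.l2() split used above divides the arena frame number into a small L1 index and a large L2 index. A standalone sketch of that split, using the windows/64-bit shape from the table (the one common configuration where L1 is non-trivial) and ignoring arenaBaseOffset for simplicity; the address is illustrative.

package main

import "fmt"

const (
	heapAddrBits      = 48
	logHeapArenaBytes = 2 + 20 // 4 MiB arenas on windows/64-bit
	arenaL1Bits       = 6
	arenaL2Bits       = heapAddrBits - logHeapArenaBytes - arenaL1Bits
)

func main() {
	p := uintptr(0x00c000400000)    // an address inside the heap
	ri := p >> logHeapArenaBytes    // arena frame number
	l1 := ri >> arenaL2Bits         // index into the L1 arena map
	l2 := ri & (1<<arenaL2Bits - 1) // index into the L2 arena map
	fmt.Println(ri, l1, l2)
}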
+
+// sysReserveAligned is like sysReserve, but the returned pointer is
+// aligned to align bytes. It may reserve either n or n+align bytes,
+// so it returns the size that was reserved.
+func sysReserveAligned(v unsafe.Pointer, size, align uintptr) (unsafe.Pointer, uintptr) {
+ // Since the alignment is rather large in uses of this
+ // function, we're not likely to get it by chance, so we ask
+ // for a larger region and remove the parts we don't need.
+ retries := 0
+retry:
+ p := uintptr(sysReserve(v, size+align))
+ switch {
+ case p == 0:
+ return nil, 0
+ case p&(align-1) == 0:
+ return unsafe.Pointer(p), size + align
+ case GOOS == "windows":
+ // On Windows we can't release pieces of a
+ // reservation, so we release the whole thing and
+ // re-reserve the aligned sub-region. This may race,
+ // so we may have to try again.
+ sysFreeOS(unsafe.Pointer(p), size+align)
+ p = alignUp(p, align)
+ p2 := sysReserve(unsafe.Pointer(p), size)
+ if p != uintptr(p2) {
+ // Must have raced. Try again.
+ sysFreeOS(p2, size)
+ if retries++; retries == 100 {
+ throw("failed to allocate aligned heap memory; too many retries")
+ }
+ goto retry
+ }
+ // Success.
+ return p2, size
+ default:
+ // Trim off the unaligned parts.
+ pAligned := alignUp(p, align)
+ sysFreeOS(unsafe.Pointer(p), pAligned-p)
+ end := pAligned + size
+ endLen := (p + size + align) - end
+ if endLen > 0 {
+ sysFreeOS(unsafe.Pointer(end), endLen)
+ }
+ return unsafe.Pointer(pAligned), size
+ }
+}
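The default branch above reserves size+align bytes and then gives back the unaligned head and tail. A standalone sketch of that arithmetic, with alignUp reimplemented here for power-of-two alignments and an arbitrary unaligned address standing in for what sysReserve might return:

package main

import "fmt"

func alignUp(p, align uintptr) uintptr { return (p + align - 1) &^ (align - 1) }

func main() {
	const align = 1 << 26 // e.g. a 64 MiB arena alignment
	var size uintptr = 4 << 26

	p := uintptr(0x7f1234567000) // some unaligned address the OS handed back
	pAligned := alignUp(p, align)
	end := pAligned + size

	fmt.Printf("trim head: %d bytes\n", pAligned-p)
	fmt.Printf("trim tail: %d bytes\n", (p+size+align)-end)
	fmt.Printf("aligned?  %v\n", pAligned%align == 0)
}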
+
+// enableMetadataHugePages enables huge pages for various sources of heap metadata.
+//
+// A note on latency: for sufficiently small heaps (<10s of GiB) this function will take constant
+// time, but may take time proportional to the size of the mapped heap beyond that.
+//
+// This function is idempotent.
+//
+// The heap lock must not be held over this operation, since it will briefly acquire
+// the heap lock.
+//
+// Must be called on the system stack because it acquires the heap lock.
+//
+//go:systemstack
+func (h *mheap) enableMetadataHugePages() {
+ // Enable huge pages for page structure.
+ h.pages.enableChunkHugePages()
+
+ // Grab the lock and set arenasHugePages if it's not.
+ //
+ // Once arenasHugePages is set, all new L2 entries will be eligible for
+ // huge pages. We'll set all the old entries after we release the lock.
+ lock(&h.lock)
+ if h.arenasHugePages {
+ unlock(&h.lock)
+ return
+ }
+ h.arenasHugePages = true
+ unlock(&h.lock)
+
+ // N.B. The arenas L1 map is quite small on all platforms, so it's fine to
+ // just iterate over the whole thing.
+ for i := range h.arenas {
+ l2 := (*[1 << arenaL2Bits]*heapArena)(atomic.Loadp(unsafe.Pointer(&h.arenas[i])))
+ if l2 == nil {
+ continue
+ }
+ sysHugePage(unsafe.Pointer(l2), unsafe.Sizeof(*l2))
+ }
+}
+
+// base address for all 0-byte allocations
+var zerobase uintptr
+
+// nextFreeFast returns the next free object if one is quickly available.
+// Otherwise it returns 0.
+func nextFreeFast(s *mspan) gclinkptr {
+ theBit := sys.TrailingZeros64(s.allocCache) // Is there a free object in the allocCache?
+ if theBit < 64 {
+ result := s.freeindex + uintptr(theBit)
+ if result < s.nelems {
+ freeidx := result + 1
+ if freeidx%64 == 0 && freeidx != s.nelems {
+ return 0
+ }
+ s.allocCache >>= uint(theBit + 1)
+ s.freeindex = freeidx
+ s.allocCount++
+ return gclinkptr(result*s.elemsize + s.base())
+ }
+ }
+ return 0
+}
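nextFreeFast is a pure bit trick: allocCache caches which of the next 64 slots starting at freeindex are free (1 = free), and the lowest set bit gives the next free object. A standalone sketch using math/bits, with illustrative cache contents and index:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	// Each set bit marks a free object slot, counted from freeindex.
	var allocCache uint64 = 0b10110000
	freeindex := uintptr(32)

	theBit := bits.TrailingZeros64(allocCache) // index of the lowest free slot
	if theBit < 64 {
		result := freeindex + uintptr(theBit)
		allocCache >>= uint(theBit + 1) // consume the slot and everything below it
		freeindex = result + 1
		fmt.Println("allocated slot", result, "new freeindex", freeindex,
			"cache now", fmt.Sprintf("%b", allocCache))
	}
}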
+
+// nextFree returns the next free object from the cached span if one is available.
+// Otherwise it refills the cache with a span with an available object and
+// returns that object along with a flag indicating that this was a heavy
+// weight allocation. If it is a heavy weight allocation the caller must
+// determine whether a new GC cycle needs to be started or if the GC is active
+// whether this goroutine needs to assist the GC.
+//
+// Must run in a non-preemptible context since otherwise the owner of
+// c could change.
+func (c *mcache) nextFree(spc spanClass) (v gclinkptr, s *mspan, shouldhelpgc bool) {
+ s = c.alloc[spc]
+ shouldhelpgc = false
+ freeIndex := s.nextFreeIndex()
+ if freeIndex == s.nelems {
+ // The span is full.
+ if uintptr(s.allocCount) != s.nelems {
+ println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
+ throw("s.allocCount != s.nelems && freeIndex == s.nelems")
+ }
+ c.refill(spc)
+ shouldhelpgc = true
+ s = c.alloc[spc]
+
+ freeIndex = s.nextFreeIndex()
+ }
+
+ if freeIndex >= s.nelems {
+ throw("freeIndex is not valid")
+ }
+
+ v = gclinkptr(freeIndex*s.elemsize + s.base())
+ s.allocCount++
+ if uintptr(s.allocCount) > s.nelems {
+ println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
+ throw("s.allocCount > s.nelems")
+ }
+ return
+}
+
+// Allocate an object of size bytes.
+// Small objects are allocated from the per-P cache's free lists.
+// Large objects (> 32 kB) are allocated straight from the heap.
+func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
+ if gcphase == _GCmarktermination {
+ throw("mallocgc called with gcphase == _GCmarktermination")
+ }
+
+ if size == 0 {
+ return unsafe.Pointer(&zerobase)
+ }
+
+ // It's possible for any malloc to trigger sweeping, which may in
+ // turn queue finalizers. Record this dynamic lock edge.
+ lockRankMayQueueFinalizer()
+
+ userSize := size
+ if asanenabled {
+		// As in the ASan runtime library, where malloc() allocates extra memory,
+		// the redzone, around the user-requested memory region and marks the
+		// redzone as unaddressable, we perform the same operation in Go to detect
+		// overflows and underflows.
+ size += computeRZlog(size)
+ }
+
+ if debug.malloc {
+ if debug.sbrk != 0 {
+ align := uintptr(16)
+ if typ != nil {
+ // TODO(austin): This should be just
+ // align = uintptr(typ.align)
+ // but that's only 4 on 32-bit platforms,
+ // even if there's a uint64 field in typ (see #599).
+ // This causes 64-bit atomic accesses to panic.
+ // Hence, we use stricter alignment that matches
+ // the normal allocator better.
+ if size&7 == 0 {
+ align = 8
+ } else if size&3 == 0 {
+ align = 4
+ } else if size&1 == 0 {
+ align = 2
+ } else {
+ align = 1
+ }
+ }
+ return persistentalloc(size, align, &memstats.other_sys)
+ }
+
+ if inittrace.active && inittrace.id == getg().goid {
+ // Init functions are executed sequentially in a single goroutine.
+ inittrace.allocs += 1
+ }
+ }
+
+ // assistG is the G to charge for this allocation, or nil if
+ // GC is not currently active.
+ assistG := deductAssistCredit(size)
+
+ // Set mp.mallocing to keep from being preempted by GC.
+ mp := acquirem()
+ if mp.mallocing != 0 {
+ throw("malloc deadlock")
+ }
+ if mp.gsignal == getg() {
+ throw("malloc during signal")
+ }
+ mp.mallocing = 1
+
+ shouldhelpgc := false
+ dataSize := userSize
+ c := getMCache(mp)
+ if c == nil {
+ throw("mallocgc called without a P or outside bootstrapping")
+ }
+ var span *mspan
+ var x unsafe.Pointer
+ noscan := typ == nil || typ.PtrBytes == 0
+ // In some cases block zeroing can profitably (for latency reduction purposes)
+ // be delayed till preemption is possible; delayedZeroing tracks that state.
+ delayedZeroing := false
+ if size <= maxSmallSize {
+ if noscan && size < maxTinySize {
+ // Tiny allocator.
+ //
+ // Tiny allocator combines several tiny allocation requests
+ // into a single memory block. The resulting memory block
+ // is freed when all subobjects are unreachable. The subobjects
+ // must be noscan (don't have pointers), this ensures that
+ // the amount of potentially wasted memory is bounded.
+ //
+ // Size of the memory block used for combining (maxTinySize) is tunable.
+ // Current setting is 16 bytes, which relates to 2x worst case memory
+ // wastage (when all but one subobjects are unreachable).
+ // 8 bytes would result in no wastage at all, but provides less
+ // opportunities for combining.
+ // 32 bytes provides more opportunities for combining,
+ // but can lead to 4x worst case wastage.
+			// The best case saving is 8x regardless of block size.
+ //
+ // Objects obtained from tiny allocator must not be freed explicitly.
+ // So when an object will be freed explicitly, we ensure that
+ // its size >= maxTinySize.
+ //
+			// SetFinalizer has a special case for objects potentially coming
+			// from the tiny allocator; in that case it allows setting finalizers
+			// for an inner byte of a memory block.
+ //
+			// The main targets of the tiny allocator are small strings and
+			// standalone escaping variables. On a json benchmark
+			// the allocator reduces the number of allocations by ~12% and
+			// reduces heap size by ~20%.
+ off := c.tinyoffset
+ // Align tiny pointer for required (conservative) alignment.
+ if size&7 == 0 {
+ off = alignUp(off, 8)
+ } else if goarch.PtrSize == 4 && size == 12 {
+ // Conservatively align 12-byte objects to 8 bytes on 32-bit
+ // systems so that objects whose first field is a 64-bit
+ // value is aligned to 8 bytes and does not cause a fault on
+ // atomic access. See issue 37262.
+ // TODO(mknyszek): Remove this workaround if/when issue 36606
+ // is resolved.
+ off = alignUp(off, 8)
+ } else if size&3 == 0 {
+ off = alignUp(off, 4)
+ } else if size&1 == 0 {
+ off = alignUp(off, 2)
+ }
+ if off+size <= maxTinySize && c.tiny != 0 {
+ // The object fits into existing tiny block.
+ x = unsafe.Pointer(c.tiny + off)
+ c.tinyoffset = off + size
+ c.tinyAllocs++
+ mp.mallocing = 0
+ releasem(mp)
+ return x
+ }
+ // Allocate a new maxTinySize block.
+ span = c.alloc[tinySpanClass]
+ v := nextFreeFast(span)
+ if v == 0 {
+ v, span, shouldhelpgc = c.nextFree(tinySpanClass)
+ }
+ x = unsafe.Pointer(v)
+ (*[2]uint64)(x)[0] = 0
+ (*[2]uint64)(x)[1] = 0
+ // See if we need to replace the existing tiny block with the new one
+ // based on amount of remaining free space.
+ if !raceenabled && (size < c.tinyoffset || c.tiny == 0) {
+ // Note: disabled when race detector is on, see comment near end of this function.
+ c.tiny = uintptr(x)
+ c.tinyoffset = size
+ }
+ size = maxTinySize
+ } else {
+ var sizeclass uint8
+ if size <= smallSizeMax-8 {
+ sizeclass = size_to_class8[divRoundUp(size, smallSizeDiv)]
+ } else {
+ sizeclass = size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]
+ }
+ size = uintptr(class_to_size[sizeclass])
+ spc := makeSpanClass(sizeclass, noscan)
+ span = c.alloc[spc]
+ v := nextFreeFast(span)
+ if v == 0 {
+ v, span, shouldhelpgc = c.nextFree(spc)
+ }
+ x = unsafe.Pointer(v)
+ if needzero && span.needzero != 0 {
+ memclrNoHeapPointers(x, size)
+ }
+ }
+ } else {
+ shouldhelpgc = true
+		// For large allocations, keep track of zeroed state so that
+		// bulk zeroing can happen later in a preemptible context.
+ span = c.allocLarge(size, noscan)
+ span.freeindex = 1
+ span.allocCount = 1
+ size = span.elemsize
+ x = unsafe.Pointer(span.base())
+ if needzero && span.needzero != 0 {
+ if noscan {
+ delayedZeroing = true
+ } else {
+ memclrNoHeapPointers(x, size)
+ // We've in theory cleared almost the whole span here,
+ // and could take the extra step of actually clearing
+ // the whole thing. However, don't. Any GC bits for the
+ // uncleared parts will be zero, and it's just going to
+ // be needzero = 1 once freed anyway.
+ }
+ }
+ }
+
+ if !noscan {
+ var scanSize uintptr
+ heapBitsSetType(uintptr(x), size, dataSize, typ)
+ if dataSize > typ.Size_ {
+ // Array allocation. If there are any
+ // pointers, GC has to scan to the last
+ // element.
+ if typ.PtrBytes != 0 {
+ scanSize = dataSize - typ.Size_ + typ.PtrBytes
+ }
+ } else {
+ scanSize = typ.PtrBytes
+ }
+ c.scanAlloc += scanSize
+ }
+
+ // Ensure that the stores above that initialize x to
+ // type-safe memory and set the heap bits occur before
+ // the caller can make x observable to the garbage
+ // collector. Otherwise, on weakly ordered machines,
+ // the garbage collector could follow a pointer to x,
+ // but see uninitialized memory or stale heap bits.
+ publicationBarrier()
+ // As x and the heap bits are initialized, update
+ // freeIndexForScan now so x is seen by the GC
+ // (including conservative scan) as an allocated object.
+ // While this pointer can't escape into user code as a
+ // _live_ pointer until we return, conservative scanning
+ // may find a dead pointer that happens to point into this
+ // object. Delaying this update until now ensures that
+ // conservative scanning considers this pointer dead until
+ // this point.
+ span.freeIndexForScan = span.freeindex
+
+ // Allocate black during GC.
+ // All slots hold nil so no scanning is needed.
+ // This may be racing with GC so do it atomically if there can be
+ // a race marking the bit.
+ if gcphase != _GCoff {
+ gcmarknewobject(span, uintptr(x), size)
+ }
+
+ if raceenabled {
+ racemalloc(x, size)
+ }
+
+ if msanenabled {
+ msanmalloc(x, size)
+ }
+
+ if asanenabled {
+ // We should only read/write the memory with the size asked by the user.
+ // The rest of the allocated memory should be poisoned, so that we can report
+ // errors when accessing poisoned memory.
+		// The allocated memory is larger than the requested userSize; it also
+		// includes the redzone and some other padding bytes.
+ rzBeg := unsafe.Add(x, userSize)
+ asanpoison(rzBeg, size-userSize)
+ asanunpoison(x, userSize)
+ }
+
+ if rate := MemProfileRate; rate > 0 {
+ // Note cache c only valid while m acquired; see #47302
+ if rate != 1 && size < c.nextSample {
+ c.nextSample -= size
+ } else {
+ profilealloc(mp, x, size)
+ }
+ }
+ mp.mallocing = 0
+ releasem(mp)
+
+ // Pointerfree data can be zeroed late in a context where preemption can occur.
+ // x will keep the memory alive.
+ if delayedZeroing {
+ if !noscan {
+ throw("delayed zeroing on data that may contain pointers")
+ }
+ memclrNoHeapPointersChunked(size, x) // This is a possible preemption point: see #47302
+ }
+
+ if debug.malloc {
+ if debug.allocfreetrace != 0 {
+ tracealloc(x, size, typ)
+ }
+
+ if inittrace.active && inittrace.id == getg().goid {
+ // Init functions are executed sequentially in a single goroutine.
+ inittrace.bytes += uint64(size)
+ }
+ }
+
+ if assistG != nil {
+ // Account for internal fragmentation in the assist
+ // debt now that we know it.
+ assistG.gcAssistBytes -= int64(size - dataSize)
+ }
+
+ if shouldhelpgc {
+ if t := (gcTrigger{kind: gcTriggerHeap}); t.test() {
+ gcStart(t)
+ }
+ }
+
+ if raceenabled && noscan && dataSize < maxTinySize {
+ // Pad tinysize allocations so they are aligned with the end
+ // of the tinyalloc region. This ensures that any arithmetic
+ // that goes off the top end of the object will be detectable
+ // by checkptr (issue 38872).
+ // Note that we disable tinyalloc when raceenabled for this to work.
+ // TODO: This padding is only performed when the race detector
+ // is enabled. It would be nice to enable it if any package
+ // was compiled with checkptr, but there's no easy way to
+ // detect that (especially at compile time).
+ // TODO: enable this padding for all allocations, not just
+ // tinyalloc ones. It's tricky because of pointer maps.
+ // Maybe just all noscan objects?
+ x = add(x, size-dataSize)
+ }
+
+ return x
+}
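The tiny-allocation path above bump-allocates small noscan objects out of a shared 16-byte block, first rounding the offset up to the object's conservative alignment. A standalone sketch of just that bump/alignment logic, with a single fixed block and no spans, mcache, or GC interaction:

package main

import "fmt"

const maxTinySize = 16

type tinyAlloc struct {
	block  [maxTinySize]byte // the shared block (contents unused in this sketch)
	offset uintptr
}

func alignUp(n, a uintptr) uintptr { return (n + a - 1) &^ (a - 1) }

// place returns an offset inside the current block, or false if the
// object doesn't fit and a fresh block would be needed.
func (t *tinyAlloc) place(size uintptr) (uintptr, bool) {
	off := t.offset
	switch {
	case size&7 == 0:
		off = alignUp(off, 8)
	case size&3 == 0:
		off = alignUp(off, 4)
	case size&1 == 0:
		off = alignUp(off, 2)
	}
	if off+size > maxTinySize {
		return 0, false
	}
	t.offset = off + size
	return off, true
}

func main() {
	var t tinyAlloc
	fmt.Println(t.place(5)) // 0 true
	fmt.Println(t.place(8)) // 8 true  (offset rounded from 5 up to 8)
	fmt.Println(t.place(8)) // 0 false (block exhausted)
}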
+
+// deductAssistCredit reduces the current G's assist credit
+// by size bytes, and assists the GC if necessary.
+//
+// Caller must be preemptible.
+//
+// Returns the G for which the assist credit was accounted.
+func deductAssistCredit(size uintptr) *g {
+ var assistG *g
+ if gcBlackenEnabled != 0 {
+ // Charge the current user G for this allocation.
+ assistG = getg()
+ if assistG.m.curg != nil {
+ assistG = assistG.m.curg
+ }
+ // Charge the allocation against the G. We'll account
+ // for internal fragmentation at the end of mallocgc.
+ assistG.gcAssistBytes -= int64(size)
+
+ if assistG.gcAssistBytes < 0 {
+ // This G is in debt. Assist the GC to correct
+ // this before allocating. This must happen
+ // before disabling preemption.
+ gcAssistAlloc(assistG)
+ }
+ }
+ return assistG
+}
+
+// memclrNoHeapPointersChunked repeatedly calls memclrNoHeapPointers
+// on chunks of the buffer to be zeroed, with opportunities for preemption
+// along the way. memclrNoHeapPointers contains no safepoints and also
+// cannot be preemptively scheduled, so this provides a still-efficient
+// block clear that can also be preempted at a reasonable granularity.
+//
+// Use this with care; if the data being cleared is tagged to contain
+// pointers, this allows the GC to run before it is all cleared.
+func memclrNoHeapPointersChunked(size uintptr, x unsafe.Pointer) {
+ v := uintptr(x)
+ // got this from benchmarking. 128k is too small, 512k is too large.
+ const chunkBytes = 256 * 1024
+ vsize := v + size
+ for voff := v; voff < vsize; voff = voff + chunkBytes {
+ if getg().preempt {
+ // may hold locks, e.g., profiling
+ goschedguarded()
+ }
+ // clear min(avail, chunkBytes) bytes
+ n := vsize - voff
+ if n > chunkBytes {
+ n = chunkBytes
+ }
+ memclrNoHeapPointers(unsafe.Pointer(voff), n)
+ }
+}
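
Editor's note: the chunk-and-yield pattern above can be sketched in ordinary Go as well. The snippet below is illustrative only; clearChunked and its 256 KiB constant simply mirror the function above, with runtime.Gosched standing in for the runtime-internal goschedguarded.

package main

import "runtime"

const chunkBytes = 256 * 1024 // mirrors the chunk size chosen above

// clearChunked zeroes buf in fixed-size chunks, yielding between chunks
// so other goroutines get a chance to run during a long clear.
func clearChunked(buf []byte) {
	for off := 0; off < len(buf); off += chunkBytes {
		end := off + chunkBytes
		if end > len(buf) {
			end = len(buf)
		}
		clear(buf[off:end]) // the clear builtin zeroes the slice elements
		runtime.Gosched()   // preemption opportunity between chunks
	}
}

func main() {
	clearChunked(make([]byte, 3<<20))
}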
+
+// newobject is the implementation of the new builtin.
+// The compiler (both frontend and SSA backend) knows the signature
+// of this function.
+func newobject(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.Size_, typ, true)
+}
+
+//go:linkname reflect_unsafe_New reflect.unsafe_New
+func reflect_unsafe_New(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.Size_, typ, true)
+}
+
+//go:linkname reflectlite_unsafe_New internal/reflectlite.unsafe_New
+func reflectlite_unsafe_New(typ *_type) unsafe.Pointer {
+ return mallocgc(typ.Size_, typ, true)
+}
+
+// newarray allocates an array of n elements of type typ.
+func newarray(typ *_type, n int) unsafe.Pointer {
+ if n == 1 {
+ return mallocgc(typ.Size_, typ, true)
+ }
+ mem, overflow := math.MulUintptr(typ.Size_, uintptr(n))
+ if overflow || mem > maxAlloc || n < 0 {
+ panic(plainError("runtime: allocation size out of range"))
+ }
+ return mallocgc(mem, typ, true)
+}
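
Editor's note: a user-space sketch of the same overflow-checked size computation that newarray performs via math.MulUintptr. The maxAlloc value below is illustrative, not the runtime's actual limit.

package sketch

import (
	"errors"
	"math/bits"
)

// arraySize returns elemSize*n, rejecting negative counts, multiplication
// overflow, and sizes beyond an assumed allocation cap.
func arraySize(elemSize uint64, n int) (uint64, error) {
	const maxAlloc = 1 << 47 // illustrative cap only
	if n < 0 {
		return 0, errors.New("negative element count")
	}
	hi, lo := bits.Mul64(elemSize, uint64(n))
	if hi != 0 || lo > maxAlloc {
		return 0, errors.New("allocation size out of range")
	}
	return lo, nil
}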
+
+//go:linkname reflect_unsafe_NewArray reflect.unsafe_NewArray
+func reflect_unsafe_NewArray(typ *_type, n int) unsafe.Pointer {
+ return newarray(typ, n)
+}
+
+func profilealloc(mp *m, x unsafe.Pointer, size uintptr) {
+ c := getMCache(mp)
+ if c == nil {
+ throw("profilealloc called without a P or outside bootstrapping")
+ }
+ c.nextSample = nextSample()
+ mProf_Malloc(x, size)
+}
+
+// nextSample returns the next sampling point for heap profiling. The goal is
+// to sample allocations on average every MemProfileRate bytes, but with a
+// completely random distribution over the allocation timeline; this
+// corresponds to a Poisson process with parameter MemProfileRate. In Poisson
+// processes, the distance between two samples follows the exponential
+// distribution (with mean MemProfileRate), so the best return value is a random
+// number taken from an exponential distribution whose mean is MemProfileRate.
+func nextSample() uintptr {
+ if MemProfileRate == 1 {
+ // Callers assign our return value to
+ // mcache.nextSample, but nextSample is not used
+ // when the rate is 1. So avoid the math below and
+ // just return something.
+ return 0
+ }
+ if GOOS == "plan9" {
+ // Plan 9 doesn't support floating point in note handler.
+ if gp := getg(); gp == gp.m.gsignal {
+ return nextSampleNoFP()
+ }
+ }
+
+ return uintptr(fastexprand(MemProfileRate))
+}
+
+// fastexprand returns a random number from an exponential distribution with
+// the specified mean.
+func fastexprand(mean int) int32 {
+ // Avoid overflow. Maximum possible step is
+ // -ln(1/(1<<randomBitCount)) * mean, approximately 20 * mean.
+ switch {
+ case mean > 0x7000000:
+ mean = 0x7000000
+ case mean == 0:
+ return 0
+ }
+
+ // Take a random sample of the exponential distribution with the given mean.
+ // The probability density function is (1/mean)*exp(-x/mean), so the CDF is
+ // p = 1 - exp(-x/mean), so
+ // q = 1 - p == exp(-x/mean)
+ // log_e(q) = -x/mean
+ // x = -log_e(q) * mean
+ // x = log_2(q) * (-log_e(2)) * mean ; Using log_2 for efficiency
+ const randomBitCount = 26
+ q := fastrandn(1<<randomBitCount) + 1
+ qlog := fastlog2(float64(q)) - randomBitCount
+ if qlog > 0 {
+ qlog = 0
+ }
+ const minusLog2 = -0.6931471805599453 // -ln(2)
+ return int32(qlog*(minusLog2*float64(mean))) + 1
+}
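
Editor's note: for comparison, outside the runtime the same sampling scheme can be written directly with the standard library. This is a sketch, not the runtime's code; math/rand.ExpFloat64 draws from an exponential distribution with mean 1, which is then scaled to the desired mean.

package sketch

import "math/rand"

// nextSampleSketch returns the byte distance to the next profiling sample,
// drawn from an exponential distribution with the given mean.
func nextSampleSketch(mean int) uint64 {
	if mean <= 1 {
		return 0
	}
	return uint64(rand.ExpFloat64() * float64(mean))
}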
+
+// nextSampleNoFP is similar to nextSample, but uses older,
+// simpler code to avoid floating point.
+func nextSampleNoFP() uintptr {
+ // Set first allocation sample size.
+ rate := MemProfileRate
+ if rate > 0x3fffffff { // make 2*rate not overflow
+ rate = 0x3fffffff
+ }
+ if rate != 0 {
+ return uintptr(fastrandn(uint32(2 * rate)))
+ }
+ return 0
+}
+
+type persistentAlloc struct {
+ base *notInHeap
+ off uintptr
+}
+
+var globalAlloc struct {
+ mutex
+ persistentAlloc
+}
+
+// persistentChunkSize is the number of bytes we allocate when we grow
+// a persistentAlloc.
+const persistentChunkSize = 256 << 10
+
+// persistentChunks is a list of all the persistent chunks we have
+// allocated. The list is maintained through the first word in the
+// persistent chunk. This is updated atomically.
+var persistentChunks *notInHeap
+
+// Wrapper around sysAlloc that can allocate small chunks.
+// There is no associated free operation.
+// Intended for things like function/type/debug-related persistent data.
+// If align is 0, uses default align (currently 8).
+// The returned memory will be zeroed.
+// sysStat must be non-nil.
+//
+// Consider marking persistentalloc'd types not in heap by embedding
+// runtime/internal/sys.NotInHeap.
+func persistentalloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ var p *notInHeap
+ systemstack(func() {
+ p = persistentalloc1(size, align, sysStat)
+ })
+ return unsafe.Pointer(p)
+}
+
+// Must run on system stack because stack growth can (re)invoke it.
+// See issue 9174.
+//
+//go:systemstack
+func persistentalloc1(size, align uintptr, sysStat *sysMemStat) *notInHeap {
+ const (
+ maxBlock = 64 << 10 // VM reservation granularity is 64K on windows
+ )
+
+ if size == 0 {
+ throw("persistentalloc: size == 0")
+ }
+ if align != 0 {
+ if align&(align-1) != 0 {
+ throw("persistentalloc: align is not a power of 2")
+ }
+ if align > _PageSize {
+ throw("persistentalloc: align is too large")
+ }
+ } else {
+ align = 8
+ }
+
+ if size >= maxBlock {
+ return (*notInHeap)(sysAlloc(size, sysStat))
+ }
+
+ mp := acquirem()
+ var persistent *persistentAlloc
+ if mp != nil && mp.p != 0 {
+ persistent = &mp.p.ptr().palloc
+ } else {
+ lock(&globalAlloc.mutex)
+ persistent = &globalAlloc.persistentAlloc
+ }
+ persistent.off = alignUp(persistent.off, align)
+ if persistent.off+size > persistentChunkSize || persistent.base == nil {
+ persistent.base = (*notInHeap)(sysAlloc(persistentChunkSize, &memstats.other_sys))
+ if persistent.base == nil {
+ if persistent == &globalAlloc.persistentAlloc {
+ unlock(&globalAlloc.mutex)
+ }
+ throw("runtime: cannot allocate memory")
+ }
+
+ // Add the new chunk to the persistentChunks list.
+ for {
+ chunks := uintptr(unsafe.Pointer(persistentChunks))
+ *(*uintptr)(unsafe.Pointer(persistent.base)) = chunks
+ if atomic.Casuintptr((*uintptr)(unsafe.Pointer(&persistentChunks)), chunks, uintptr(unsafe.Pointer(persistent.base))) {
+ break
+ }
+ }
+ persistent.off = alignUp(goarch.PtrSize, align)
+ }
+ p := persistent.base.add(persistent.off)
+ persistent.off += size
+ releasem(mp)
+ if persistent == &globalAlloc.persistentAlloc {
+ unlock(&globalAlloc.mutex)
+ }
+
+ if sysStat != &memstats.other_sys {
+ sysStat.add(int64(size))
+ memstats.other_sys.add(-int64(size))
+ }
+ return p
+}
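
Editor's note: the persistentChunks bookkeeping above is a lock-free prepend to a singly linked list. Below is a standalone sketch of that publish pattern using sync/atomic; the names are illustrative and not part of the runtime.

package sketch

import "sync/atomic"

// chunk is a list cell; chunks is the list head, pushed to with a
// compare-and-swap loop so concurrent publishers never lose an entry.
type chunk struct {
	next *chunk
	data [256 << 10]byte
}

var chunks atomic.Pointer[chunk]

func publish(c *chunk) {
	for {
		old := chunks.Load()
		c.next = old
		if chunks.CompareAndSwap(old, c) {
			return
		}
	}
}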
+
+// inPersistentAlloc reports whether p points to memory allocated by
+// persistentalloc. This must be nosplit because it is called by the
+// cgo checker code, which is called by the write barrier code.
+//
+//go:nosplit
+func inPersistentAlloc(p uintptr) bool {
+ chunk := atomic.Loaduintptr((*uintptr)(unsafe.Pointer(&persistentChunks)))
+ for chunk != 0 {
+ if p >= chunk && p < chunk+persistentChunkSize {
+ return true
+ }
+ chunk = *(*uintptr)(unsafe.Pointer(chunk))
+ }
+ return false
+}
+
+// linearAlloc is a simple linear allocator that pre-reserves a region
+// of memory and then optionally maps that region into the Ready state
+// as needed.
+//
+// The caller is responsible for locking.
+type linearAlloc struct {
+ next uintptr // next free byte
+ mapped uintptr // one byte past end of mapped space
+ end uintptr // end of reserved space
+
+ mapMemory bool // transition memory from Reserved to Ready if true
+}
+
+func (l *linearAlloc) init(base, size uintptr, mapMemory bool) {
+ if base+size < base {
+ // Chop off the last byte. The runtime isn't prepared
+ // to deal with situations where the bounds could overflow.
+ // Leave that memory reserved, though, so we don't map it
+ // later.
+ size -= 1
+ }
+ l.next, l.mapped = base, base
+ l.end = base + size
+ l.mapMemory = mapMemory
+}
+
+func (l *linearAlloc) alloc(size, align uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ p := alignUp(l.next, align)
+ if p+size > l.end {
+ return nil
+ }
+ l.next = p + size
+ if pEnd := alignUp(l.next-1, physPageSize); pEnd > l.mapped {
+ if l.mapMemory {
+ // Transition from Reserved to Prepared to Ready.
+ n := pEnd - l.mapped
+ sysMap(unsafe.Pointer(l.mapped), n, sysStat)
+ sysUsed(unsafe.Pointer(l.mapped), n, n)
+ }
+ l.mapped = pEnd
+ }
+ return unsafe.Pointer(p)
+}
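
Editor's note: a toy version of the same bump-pointer scheme over a plain byte slice, without the Reserved/Prepared/Ready mapping states. Illustrative only.

package sketch

// bumpAlloc hands out aligned sub-slices of buf by advancing a single
// offset; there is no free operation, just like linearAlloc.
type bumpAlloc struct {
	buf  []byte
	next int
}

// alignUpInt rounds n up to a multiple of align; align must be a power of two.
func alignUpInt(n, align int) int {
	return (n + align - 1) &^ (align - 1)
}

// alloc returns a size-byte slice aligned to align, or nil if the
// reserved region is exhausted.
func (b *bumpAlloc) alloc(size, align int) []byte {
	p := alignUpInt(b.next, align)
	if p+size > len(b.buf) {
		return nil
	}
	b.next = p + size
	return b.buf[p : p+size : p+size]
}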
+
+// notInHeap is off-heap memory allocated by a lower-level allocator
+// like sysAlloc or persistentAlloc.
+//
+// In general, it's better to use real types which embed
+// runtime/internal/sys.NotInHeap, but this serves as a generic type
+// for situations where that isn't possible (like in the allocators).
+//
+// TODO: Use this as the return type of sysAlloc, persistentAlloc, etc?
+type notInHeap struct{ _ sys.NotInHeap }
+
+func (p *notInHeap) add(bytes uintptr) *notInHeap {
+ return (*notInHeap)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + bytes))
+}
+
+// computeRZlog computes the size of the redzone.
+// Refer to the implementation in compiler-rt.
+func computeRZlog(userSize uintptr) uintptr {
+ switch {
+ case userSize <= (64 - 16):
+ return 16 << 0
+ case userSize <= (128 - 32):
+ return 16 << 1
+ case userSize <= (512 - 64):
+ return 16 << 2
+ case userSize <= (4096 - 128):
+ return 16 << 3
+ case userSize <= (1<<14)-256:
+ return 16 << 4
+ case userSize <= (1<<15)-512:
+ return 16 << 5
+ case userSize <= (1<<16)-1024:
+ return 16 << 6
+ default:
+ return 16 << 7
+ }
+}
diff --git a/src/runtime/malloc_test.go b/src/runtime/malloc_test.go
new file mode 100644
index 0000000..5b9ce98
--- /dev/null
+++ b/src/runtime/malloc_test.go
@@ -0,0 +1,449 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "flag"
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "reflect"
+ "runtime"
+ . "runtime"
+ "strings"
+ "sync/atomic"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var testMemStatsCount int
+
+func TestMemStats(t *testing.T) {
+ testMemStatsCount++
+
+ // Make sure there's at least one forced GC.
+ GC()
+
+ // Test that MemStats has sane values.
+ st := new(MemStats)
+ ReadMemStats(st)
+
+ nz := func(x any) error {
+ if x != reflect.Zero(reflect.TypeOf(x)).Interface() {
+ return nil
+ }
+ return fmt.Errorf("zero value")
+ }
+ le := func(thresh float64) func(any) error {
+ return func(x any) error {
+ // These sanity tests aren't necessarily valid
+ // with high -test.count values, so only run
+ // them once.
+ if testMemStatsCount > 1 {
+ return nil
+ }
+
+ if reflect.ValueOf(x).Convert(reflect.TypeOf(thresh)).Float() < thresh {
+ return nil
+ }
+ return fmt.Errorf("insanely high value (overflow?); want <= %v", thresh)
+ }
+ }
+ eq := func(x any) func(any) error {
+ return func(y any) error {
+ if x == y {
+ return nil
+ }
+ return fmt.Errorf("want %v", x)
+ }
+ }
+ // Of the uint fields, HeapReleased, HeapIdle can be 0.
+ // PauseTotalNs can be 0 if timer resolution is poor.
+ fields := map[string][]func(any) error{
+ "Alloc": {nz, le(1e10)}, "TotalAlloc": {nz, le(1e11)}, "Sys": {nz, le(1e10)},
+ "Lookups": {eq(uint64(0))}, "Mallocs": {nz, le(1e10)}, "Frees": {nz, le(1e10)},
+ "HeapAlloc": {nz, le(1e10)}, "HeapSys": {nz, le(1e10)}, "HeapIdle": {le(1e10)},
+ "HeapInuse": {nz, le(1e10)}, "HeapReleased": {le(1e10)}, "HeapObjects": {nz, le(1e10)},
+ "StackInuse": {nz, le(1e10)}, "StackSys": {nz, le(1e10)},
+ "MSpanInuse": {nz, le(1e10)}, "MSpanSys": {nz, le(1e10)},
+ "MCacheInuse": {nz, le(1e10)}, "MCacheSys": {nz, le(1e10)},
+ "BuckHashSys": {nz, le(1e10)}, "GCSys": {nz, le(1e10)}, "OtherSys": {nz, le(1e10)},
+ "NextGC": {nz, le(1e10)}, "LastGC": {nz},
+ "PauseTotalNs": {le(1e11)}, "PauseNs": nil, "PauseEnd": nil,
+ "NumGC": {nz, le(1e9)}, "NumForcedGC": {nz, le(1e9)},
+ "GCCPUFraction": {le(0.99)}, "EnableGC": {eq(true)}, "DebugGC": {eq(false)},
+ "BySize": nil,
+ }
+
+ rst := reflect.ValueOf(st).Elem()
+ for i := 0; i < rst.Type().NumField(); i++ {
+ name, val := rst.Type().Field(i).Name, rst.Field(i).Interface()
+ checks, ok := fields[name]
+ if !ok {
+ t.Errorf("unknown MemStats field %s", name)
+ continue
+ }
+ for _, check := range checks {
+ if err := check(val); err != nil {
+ t.Errorf("%s = %v: %s", name, val, err)
+ }
+ }
+ }
+
+ if st.Sys != st.HeapSys+st.StackSys+st.MSpanSys+st.MCacheSys+
+ st.BuckHashSys+st.GCSys+st.OtherSys {
+ t.Fatalf("Bad sys value: %+v", *st)
+ }
+
+ if st.HeapIdle+st.HeapInuse != st.HeapSys {
+ t.Fatalf("HeapIdle(%d) + HeapInuse(%d) should be equal to HeapSys(%d), but isn't.", st.HeapIdle, st.HeapInuse, st.HeapSys)
+ }
+
+ if lpe := st.PauseEnd[int(st.NumGC+255)%len(st.PauseEnd)]; st.LastGC != lpe {
+ t.Fatalf("LastGC(%d) != last PauseEnd(%d)", st.LastGC, lpe)
+ }
+
+ var pauseTotal uint64
+ for _, pause := range st.PauseNs {
+ pauseTotal += pause
+ }
+ if int(st.NumGC) < len(st.PauseNs) {
+ // We have all pauses, so this should be exact.
+ if st.PauseTotalNs != pauseTotal {
+ t.Fatalf("PauseTotalNs(%d) != sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
+ }
+ for i := int(st.NumGC); i < len(st.PauseNs); i++ {
+ if st.PauseNs[i] != 0 {
+ t.Fatalf("Non-zero PauseNs[%d]: %+v", i, st)
+ }
+ if st.PauseEnd[i] != 0 {
+ t.Fatalf("Non-zero PauseEnd[%d]: %+v", i, st)
+ }
+ }
+ } else {
+ if st.PauseTotalNs < pauseTotal {
+ t.Fatalf("PauseTotalNs(%d) < sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
+ }
+ }
+
+ if st.NumForcedGC > st.NumGC {
+ t.Fatalf("NumForcedGC(%d) > NumGC(%d)", st.NumForcedGC, st.NumGC)
+ }
+}
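
Editor's note: the invariants checked above (HeapIdle+HeapInuse == HeapSys, Sys as the sum of the per-subsystem fields) can be observed from any program through the public API. A minimal sketch:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var st runtime.MemStats
	runtime.ReadMemStats(&st)
	fmt.Println("HeapSys:            ", st.HeapSys)
	fmt.Println("HeapIdle+HeapInuse: ", st.HeapIdle+st.HeapInuse) // equals HeapSys
	fmt.Println("NumGC:              ", st.NumGC)
}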
+
+func TestStringConcatenationAllocs(t *testing.T) {
+ n := testing.AllocsPerRun(1e3, func() {
+ b := make([]byte, 10)
+ for i := 0; i < 10; i++ {
+ b[i] = byte(i) + '0'
+ }
+ s := "foo" + string(b)
+ if want := "foo0123456789"; s != want {
+ t.Fatalf("want %v, got %v", want, s)
+ }
+ })
+ // Only string concatenation allocates.
+ if n != 1 {
+ t.Fatalf("want 1 allocation, got %v", n)
+ }
+}
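
Editor's note: testing.AllocsPerRun is the same tool an ordinary package test can use to pin down allocation counts. A small sketch; the sink variable and the expected count of 1 are assumptions for this particular function body, not taken from the test above.

package sketch

import "testing"

var sink []byte

func TestSliceAllocs(t *testing.T) {
	n := testing.AllocsPerRun(1000, func() {
		sink = make([]byte, 64) // escapes via the package-level sink
	})
	if n != 1 {
		t.Fatalf("want 1 allocation per run, got %v", n)
	}
}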
+
+func TestTinyAlloc(t *testing.T) {
+ if runtime.Raceenabled {
+ t.Skip("tinyalloc suppressed when running in race mode")
+ }
+ const N = 16
+ var v [N]unsafe.Pointer
+ for i := range v {
+ v[i] = unsafe.Pointer(new(byte))
+ }
+
+ chunks := make(map[uintptr]bool, N)
+ for _, p := range v {
+ chunks[uintptr(p)&^7] = true
+ }
+
+ if len(chunks) == N {
+ t.Fatal("no bytes allocated within the same 8-byte chunk")
+ }
+}
+
+type obj12 struct {
+ a uint64
+ b uint32
+}
+
+func TestTinyAllocIssue37262(t *testing.T) {
+ if runtime.Raceenabled {
+ t.Skip("tinyalloc suppressed when running in race mode")
+ }
+ // Try to cause an alignment access fault
+ // by atomically accessing the first 64-bit
+ // value of a tiny-allocated object.
+ // See issue 37262 for details.
+
+ // GC twice, once to reach a stable heap state
+ // and again to make sure we finish the sweep phase.
+ runtime.GC()
+ runtime.GC()
+
+ // Disable preemption so we stay on one P's tiny allocator and
+ // nothing else allocates from it.
+ runtime.Acquirem()
+
+ // Make 1-byte allocations until we get a fresh tiny slot.
+ aligned := false
+ for i := 0; i < 16; i++ {
+ x := runtime.Escape(new(byte))
+ if uintptr(unsafe.Pointer(x))&0xf == 0xf {
+ aligned = true
+ break
+ }
+ }
+ if !aligned {
+ runtime.Releasem()
+ t.Fatal("unable to get a fresh tiny slot")
+ }
+
+ // Create a 4-byte object so that the current
+ // tiny slot is partially filled.
+ runtime.Escape(new(uint32))
+
+ // Create a 12-byte object, which fits into the
+ // tiny slot. If it actually gets placed there,
+ // then the field "a" will be improperly aligned
+ // for atomic access on 32-bit architectures.
+ // This won't be true if issue 36606 gets resolved.
+ tinyObj12 := runtime.Escape(new(obj12))
+
+ // Try to atomically access tinyObj12.a.
+ atomic.StoreUint64(&tinyObj12.a, 10)
+
+ runtime.Releasem()
+}
+
+func TestPageCacheLeak(t *testing.T) {
+ defer GOMAXPROCS(GOMAXPROCS(1))
+ leaked := PageCachePagesLeaked()
+ if leaked != 0 {
+ t.Fatalf("found %d leaked pages in page caches", leaked)
+ }
+}
+
+func TestPhysicalMemoryUtilization(t *testing.T) {
+ got := runTestProg(t, "testprog", "GCPhys")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got %q", want, got)
+ }
+}
+
+func TestScavengedBitsCleared(t *testing.T) {
+ var mismatches [128]BitsMismatch
+ if n, ok := CheckScavengedBitsCleared(mismatches[:]); !ok {
+ t.Errorf("uncleared scavenged bits")
+ for _, m := range mismatches[:n] {
+ t.Logf("\t@ address 0x%x", m.Base)
+ t.Logf("\t| got: %064b", m.Got)
+ t.Logf("\t| want: %064b", m.Want)
+ }
+ t.FailNow()
+ }
+}
+
+type acLink struct {
+ x [1 << 20]byte
+}
+
+var arenaCollisionSink []*acLink
+
+func TestArenaCollision(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ // Test that mheap.sysAlloc handles collisions with other
+ // memory mappings.
+ if os.Getenv("TEST_ARENA_COLLISION") != "1" {
+ cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestArenaCollision", "-test.v"))
+ cmd.Env = append(cmd.Env, "TEST_ARENA_COLLISION=1")
+ out, err := cmd.CombinedOutput()
+ if race.Enabled {
+ // This test runs the runtime out of hint
+ // addresses, so it will start mapping the
+ // heap wherever it can. The race detector
+ // doesn't support this, so look for the
+ // expected failure.
+ if want := "too many address space collisions"; !strings.Contains(string(out), want) {
+ t.Fatalf("want %q, got:\n%s", want, string(out))
+ }
+ } else if !strings.Contains(string(out), "PASS\n") || err != nil {
+ t.Fatalf("%s\n(exit status %v)", string(out), err)
+ }
+ return
+ }
+ disallowed := [][2]uintptr{}
+ // Drop all but the next 3 hints. 64-bit has a lot of hints,
+ // so it would take a lot of memory to go through all of them.
+ KeepNArenaHints(3)
+ // Consume these 3 hints and force the runtime to find some
+ // fallback hints.
+ for i := 0; i < 5; i++ {
+ // Reserve memory at the next hint so it can't be used
+ // for the heap.
+ start, end, ok := MapNextArenaHint()
+ if !ok {
+ t.Skipf("failed to reserve memory at next arena hint [%#x, %#x)", start, end)
+ }
+ t.Logf("reserved [%#x, %#x)", start, end)
+ disallowed = append(disallowed, [2]uintptr{start, end})
+ // Allocate until the runtime tries to use the hint we
+ // just mapped over.
+ hint := GetNextArenaHint()
+ for GetNextArenaHint() == hint {
+ ac := new(acLink)
+ arenaCollisionSink = append(arenaCollisionSink, ac)
+ // The allocation must not have fallen into
+ // one of the reserved regions.
+ p := uintptr(unsafe.Pointer(ac))
+ for _, d := range disallowed {
+ if d[0] <= p && p < d[1] {
+ t.Fatalf("allocation %#x in reserved region [%#x, %#x)", p, d[0], d[1])
+ }
+ }
+ }
+ }
+}
+
+func BenchmarkMalloc8(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ p := new(int64)
+ Escape(p)
+ }
+}
+
+func BenchmarkMalloc16(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ p := new([2]int64)
+ Escape(p)
+ }
+}
+
+func BenchmarkMallocTypeInfo8(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ p := new(struct {
+ p [8 / unsafe.Sizeof(uintptr(0))]*int
+ })
+ Escape(p)
+ }
+}
+
+func BenchmarkMallocTypeInfo16(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ p := new(struct {
+ p [16 / unsafe.Sizeof(uintptr(0))]*int
+ })
+ Escape(p)
+ }
+}
+
+type LargeStruct struct {
+ x [16][]byte
+}
+
+func BenchmarkMallocLargeStruct(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ p := make([]LargeStruct, 2)
+ Escape(p)
+ }
+}
+
+var n = flag.Int("n", 1000, "number of goroutines")
+
+func BenchmarkGoroutineSelect(b *testing.B) {
+ quit := make(chan struct{})
+ read := func(ch chan struct{}) {
+ for {
+ select {
+ case _, ok := <-ch:
+ if !ok {
+ return
+ }
+ case <-quit:
+ return
+ }
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func BenchmarkGoroutineBlocking(b *testing.B) {
+ read := func(ch chan struct{}) {
+ for {
+ if _, ok := <-ch; !ok {
+ return
+ }
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func BenchmarkGoroutineForRange(b *testing.B) {
+ read := func(ch chan struct{}) {
+ for range ch {
+ }
+ }
+ benchHelper(b, *n, read)
+}
+
+func benchHelper(b *testing.B, n int, read func(chan struct{})) {
+ m := make([]chan struct{}, n)
+ for i := range m {
+ m[i] = make(chan struct{}, 1)
+ go read(m[i])
+ }
+ b.StopTimer()
+ b.ResetTimer()
+ GC()
+
+ for i := 0; i < b.N; i++ {
+ for _, ch := range m {
+ if ch != nil {
+ ch <- struct{}{}
+ }
+ }
+ time.Sleep(10 * time.Millisecond)
+ b.StartTimer()
+ GC()
+ b.StopTimer()
+ }
+
+ for _, ch := range m {
+ close(ch)
+ }
+ time.Sleep(10 * time.Millisecond)
+}
+
+func BenchmarkGoroutineIdle(b *testing.B) {
+ quit := make(chan struct{})
+ fn := func() {
+ <-quit
+ }
+ for i := 0; i < *n; i++ {
+ go fn()
+ }
+
+ GC()
+ b.ResetTimer()
+
+ for i := 0; i < b.N; i++ {
+ GC()
+ }
+
+ b.StopTimer()
+ close(quit)
+ time.Sleep(10 * time.Millisecond)
+}
diff --git a/src/runtime/map.go b/src/runtime/map.go
new file mode 100644
index 0000000..22aeb86
--- /dev/null
+++ b/src/runtime/map.go
@@ -0,0 +1,1738 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go's map type.
+//
+// A map is just a hash table. The data is arranged
+// into an array of buckets. Each bucket contains up to
+// 8 key/elem pairs. The low-order bits of the hash are
+// used to select a bucket. Each bucket contains a few
+// high-order bits of each hash to distinguish the entries
+// within a single bucket.
+//
+// If more than 8 keys hash to a bucket, we chain on
+// extra buckets.
+//
+// When the hashtable grows, we allocate a new array
+// of buckets twice as big. Buckets are incrementally
+// copied from the old bucket array to the new bucket array.
+//
+// Map iterators walk through the array of buckets and
+// return the keys in walk order (bucket #, then overflow
+// chain order, then bucket index). To maintain iteration
+// semantics, we never move keys within their bucket (if
+// we did, keys might be returned 0 or 2 times). When
+// growing the table, iterators remain iterating through the
+// old table and must check the new table if the bucket
+// they are iterating through has been moved ("evacuated")
+// to the new table.
+
+// Picking loadFactor: too large and we have lots of overflow
+// buckets, too small and we waste a lot of space. I wrote
+// a simple program to check some stats for different loads:
+// (64-bit, 8 byte keys and elems)
+// loadFactor %overflow bytes/entry hitprobe missprobe
+// 4.00 2.13 20.77 3.00 4.00
+// 4.50 4.05 17.30 3.25 4.50
+// 5.00 6.85 14.77 3.50 5.00
+// 5.50 10.55 12.94 3.75 5.50
+// 6.00 15.27 11.67 4.00 6.00
+// 6.50 20.90 10.79 4.25 6.50
+// 7.00 27.14 10.15 4.50 7.00
+// 7.50 34.03 9.73 4.75 7.50
+// 8.00 41.10 9.40 5.00 8.00
+//
+// %overflow = percentage of buckets which have an overflow bucket
+// bytes/entry = overhead bytes used per key/elem pair
+// hitprobe = # of entries to check when looking up a present key
+// missprobe = # of entries to check when looking up an absent key
+//
+// Keep in mind this data is for maximally loaded tables, i.e. just
+// before the table grows. Typical tables will be somewhat less loaded.
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/math"
+ "unsafe"
+)
+
+const (
+ // Maximum number of key/elem pairs a bucket can hold.
+ bucketCntBits = abi.MapBucketCountBits
+ bucketCnt = abi.MapBucketCount
+
+ // Maximum average load of a bucket that triggers growth is bucketCnt*13/16 (about 80% full)
+ // Because of minimum alignment rules, bucketCnt is known to be at least 8.
+ // Represent as loadFactorNum/loadFactorDen, to allow integer math.
+ loadFactorDen = 2
+ loadFactorNum = (bucketCnt * 13 / 16) * loadFactorDen
+
+ // Maximum key or elem size to keep inline (instead of mallocing per element).
+ // Must fit in a uint8.
+ // Fast versions cannot handle big elems - the cutoff size for
+ // fast versions in cmd/compile/internal/gc/walk.go must be at most this elem size.
+ maxKeySize = abi.MapMaxKeyBytes
+ maxElemSize = abi.MapMaxElemBytes
+
+ // data offset should be the size of the bmap struct, but needs to be
+ // aligned correctly. For amd64p32 this means 64-bit alignment
+ // even though pointers are 32 bit.
+ dataOffset = unsafe.Offsetof(struct {
+ b bmap
+ v int64
+ }{}.v)
+
+ // Possible tophash values. We reserve a few possibilities for special marks.
+ // Each bucket (including its overflow buckets, if any) will have either all or none of its
+ // entries in the evacuated* states (except during the evacuate() method, which only happens
+ // during map writes and thus no one else can observe the map during that time).
+ emptyRest = 0 // this cell is empty, and there are no more non-empty cells at higher indexes or overflows.
+ emptyOne = 1 // this cell is empty
+ evacuatedX = 2 // key/elem is valid. Entry has been evacuated to first half of larger table.
+ evacuatedY = 3 // same as above, but evacuated to second half of larger table.
+ evacuatedEmpty = 4 // cell is empty, bucket is evacuated.
+ minTopHash = 5 // minimum tophash for a normal filled cell.
+
+ // flags
+ iterator = 1 // there may be an iterator using buckets
+ oldIterator = 2 // there may be an iterator using oldbuckets
+ hashWriting = 4 // a goroutine is writing to the map
+ sameSizeGrow = 8 // the current map growth is to a new map of the same size
+
+ // sentinel bucket ID for iterator checks
+ noCheck = 1<<(8*goarch.PtrSize) - 1
+)
+
+// isEmpty reports whether the given tophash array entry represents an empty bucket entry.
+func isEmpty(x uint8) bool {
+ return x <= emptyOne
+}
+
+// A header for a Go map.
+type hmap struct {
+ // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
+ // Make sure this stays in sync with the compiler's definition.
+ count int // # live cells == size of map. Must be first (used by len() builtin)
+ flags uint8
+ B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
+ noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
+ hash0 uint32 // hash seed
+
+ buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
+ oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
+ nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)
+
+ extra *mapextra // optional fields
+}
+
+// mapextra holds fields that are not present on all maps.
+type mapextra struct {
+ // If both key and elem do not contain pointers and are inline, then we mark bucket
+ // type as containing no pointers. This avoids scanning such maps.
+ // However, bmap.overflow is a pointer. In order to keep overflow buckets
+ // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
+ // overflow and oldoverflow are only used if key and elem do not contain pointers.
+ // overflow contains overflow buckets for hmap.buckets.
+ // oldoverflow contains overflow buckets for hmap.oldbuckets.
+ // The indirection allows us to store a pointer to the slice in hiter.
+ overflow *[]*bmap
+ oldoverflow *[]*bmap
+
+ // nextOverflow holds a pointer to a free overflow bucket.
+ nextOverflow *bmap
+}
+
+// A bucket for a Go map.
+type bmap struct {
+ // tophash generally contains the top byte of the hash value
+ // for each key in this bucket. If tophash[0] < minTopHash,
+ // tophash[0] is a bucket evacuation state instead.
+ tophash [bucketCnt]uint8
+ // Followed by bucketCnt keys and then bucketCnt elems.
+ // NOTE: packing all the keys together and then all the elems together makes the
+ // code a bit more complicated than alternating key/elem/key/elem/... but it allows
+ // us to eliminate padding which would be needed for, e.g., map[int64]int8.
+ // Followed by an overflow pointer.
+}
+
+// A hash iteration structure.
+// If you modify hiter, also change cmd/compile/internal/reflectdata/reflect.go
+// and reflect/value.go to match the layout of this structure.
+type hiter struct {
+ key unsafe.Pointer // Must be in first position. Write nil to indicate iteration end (see cmd/compile/internal/walk/range.go).
+ elem unsafe.Pointer // Must be in second position (see cmd/compile/internal/walk/range.go).
+ t *maptype
+ h *hmap
+ buckets unsafe.Pointer // bucket ptr at hash_iter initialization time
+ bptr *bmap // current bucket
+ overflow *[]*bmap // keeps overflow buckets of hmap.buckets alive
+ oldoverflow *[]*bmap // keeps overflow buckets of hmap.oldbuckets alive
+ startBucket uintptr // bucket iteration started at
+ offset uint8 // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)
+ wrapped bool // already wrapped around from end of bucket array to beginning
+ B uint8
+ i uint8
+ bucket uintptr
+ checkBucket uintptr
+}
+
+// bucketShift returns 1<<b, optimized for code generation.
+func bucketShift(b uint8) uintptr {
+ // Masking the shift amount allows overflow checks to be elided.
+ return uintptr(1) << (b & (goarch.PtrSize*8 - 1))
+}
+
+// bucketMask returns 1<<b - 1, optimized for code generation.
+func bucketMask(b uint8) uintptr {
+ return bucketShift(b) - 1
+}
+
+// tophash calculates the tophash value for hash.
+func tophash(hash uintptr) uint8 {
+ top := uint8(hash >> (goarch.PtrSize*8 - 8))
+ if top < minTopHash {
+ top += minTopHash
+ }
+ return top
+}
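
Editor's note: putting bucketMask and tophash together, here is a standalone sketch of how a single 64-bit hash value is split by the map code. minTopHash is assumed to be 5 as defined above, and 64-bit pointers are assumed.

package sketch

// splitHash shows how the map code divides one hash value: the low B
// bits select a bucket, and the top byte becomes the tophash, bumped
// past the reserved sentinel values below minTopHash.
func splitHash(hash uint64, B uint8) (bucket uint64, top uint8) {
	const minTopHash = 5
	bucket = hash & (1<<B - 1) // bucketMask(B)
	top = uint8(hash >> 56)    // top byte on a 64-bit platform
	if top < minTopHash {
		top += minTopHash
	}
	return bucket, top
}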
+
+func evacuated(b *bmap) bool {
+ h := b.tophash[0]
+ return h > emptyOne && h < minTopHash
+}
+
+func (b *bmap) overflow(t *maptype) *bmap {
+ return *(**bmap)(add(unsafe.Pointer(b), uintptr(t.BucketSize)-goarch.PtrSize))
+}
+
+func (b *bmap) setoverflow(t *maptype, ovf *bmap) {
+ *(**bmap)(add(unsafe.Pointer(b), uintptr(t.BucketSize)-goarch.PtrSize)) = ovf
+}
+
+func (b *bmap) keys() unsafe.Pointer {
+ return add(unsafe.Pointer(b), dataOffset)
+}
+
+// incrnoverflow increments h.noverflow.
+// noverflow counts the number of overflow buckets.
+// This is used to trigger same-size map growth.
+// See also tooManyOverflowBuckets.
+// To keep hmap small, noverflow is a uint16.
+// When there are few buckets, noverflow is an exact count.
+// When there are many buckets, noverflow is an approximate count.
+func (h *hmap) incrnoverflow() {
+ // We trigger same-size map growth if there are
+ // as many overflow buckets as buckets.
+ // We need to be able to count to 1<<h.B.
+ if h.B < 16 {
+ h.noverflow++
+ return
+ }
+ // Increment with probability 1/(1<<(h.B-15)).
+ // When we reach 1<<15 - 1, we will have approximately
+ // as many overflow buckets as buckets.
+ mask := uint32(1)<<(h.B-15) - 1
+ // Example: if h.B == 18, then mask == 7,
+ // and fastrand & 7 == 0 with probability 1/8.
+ if fastrand()&mask == 0 {
+ h.noverflow++
+ }
+}
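
Editor's note: the probabilistic increment above is a general trick for approximate counting in a small integer. A self-contained sketch, with math/rand standing in for the runtime's fastrand:

package sketch

import "math/rand"

// approxCounter increments n with probability 1/(1<<k), so n*2^k
// approximates the true number of events while n itself stays small.
type approxCounter struct {
	n uint16
	k uint
}

func (c *approxCounter) inc() {
	if c.k == 0 {
		c.n++
		return
	}
	mask := uint32(1)<<c.k - 1
	if rand.Uint32()&mask == 0 {
		c.n++
	}
}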
+
+func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {
+ var ovf *bmap
+ if h.extra != nil && h.extra.nextOverflow != nil {
+ // We have preallocated overflow buckets available.
+ // See makeBucketArray for more details.
+ ovf = h.extra.nextOverflow
+ if ovf.overflow(t) == nil {
+ // We're not at the end of the preallocated overflow buckets. Bump the pointer.
+ h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.BucketSize)))
+ } else {
+ // This is the last preallocated overflow bucket.
+ // Reset the overflow pointer on this bucket,
+ // which was set to a non-nil sentinel value.
+ ovf.setoverflow(t, nil)
+ h.extra.nextOverflow = nil
+ }
+ } else {
+ ovf = (*bmap)(newobject(t.Bucket))
+ }
+ h.incrnoverflow()
+ if t.Bucket.PtrBytes == 0 {
+ h.createOverflow()
+ *h.extra.overflow = append(*h.extra.overflow, ovf)
+ }
+ b.setoverflow(t, ovf)
+ return ovf
+}
+
+func (h *hmap) createOverflow() {
+ if h.extra == nil {
+ h.extra = new(mapextra)
+ }
+ if h.extra.overflow == nil {
+ h.extra.overflow = new([]*bmap)
+ }
+}
+
+func makemap64(t *maptype, hint int64, h *hmap) *hmap {
+ if int64(int(hint)) != hint {
+ hint = 0
+ }
+ return makemap(t, int(hint), h)
+}
+
+// makemap_small implements Go map creation for make(map[k]v) and
+// make(map[k]v, hint) when hint is known to be at most bucketCnt
+// at compile time and the map needs to be allocated on the heap.
+func makemap_small() *hmap {
+ h := new(hmap)
+ h.hash0 = fastrand()
+ return h
+}
+
+// makemap implements Go map creation for make(map[k]v, hint).
+// If the compiler has determined that the map or the first bucket
+// can be created on the stack, h and/or bucket may be non-nil.
+// If h != nil, the map can be created directly in h.
+// If h.buckets != nil, the bucket pointed to can be used as the first bucket.
+func makemap(t *maptype, hint int, h *hmap) *hmap {
+ mem, overflow := math.MulUintptr(uintptr(hint), t.Bucket.Size_)
+ if overflow || mem > maxAlloc {
+ hint = 0
+ }
+
+ // initialize Hmap
+ if h == nil {
+ h = new(hmap)
+ }
+ h.hash0 = fastrand()
+
+ // Find the size parameter B which will hold the requested # of elements.
+ // For hint < 0 overLoadFactor returns false since hint < bucketCnt.
+ B := uint8(0)
+ for overLoadFactor(hint, B) {
+ B++
+ }
+ h.B = B
+
+ // allocate initial hash table
+ // if B == 0, the buckets field is allocated lazily later (in mapassign)
+ // If hint is large zeroing this memory could take a while.
+ if h.B != 0 {
+ var nextOverflow *bmap
+ h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
+ if nextOverflow != nil {
+ h.extra = new(mapextra)
+ h.extra.nextOverflow = nextOverflow
+ }
+ }
+
+ return h
+}
+
+// makeBucketArray initializes a backing array for map buckets.
+// 1<<b is the minimum number of buckets to allocate.
+// dirtyalloc should either be nil or a bucket array previously
+// allocated by makeBucketArray with the same t and b parameters.
+// If dirtyalloc is nil, a new backing array will be allocated;
+// otherwise dirtyalloc will be cleared and reused as the backing array.
+func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {
+ base := bucketShift(b)
+ nbuckets := base
+ // For small b, overflow buckets are unlikely.
+ // Avoid the overhead of the calculation.
+ if b >= 4 {
+ // Add on the estimated number of overflow buckets
+ // required to insert the median number of elements
+ // used with this value of b.
+ nbuckets += bucketShift(b - 4)
+ sz := t.Bucket.Size_ * nbuckets
+ up := roundupsize(sz)
+ if up != sz {
+ nbuckets = up / t.Bucket.Size_
+ }
+ }
+
+ if dirtyalloc == nil {
+ buckets = newarray(t.Bucket, int(nbuckets))
+ } else {
+ // dirtyalloc was previously generated by
+ // the above newarray(t.Bucket, int(nbuckets))
+ // but may not be empty.
+ buckets = dirtyalloc
+ size := t.Bucket.Size_ * nbuckets
+ if t.Bucket.PtrBytes != 0 {
+ memclrHasPointers(buckets, size)
+ } else {
+ memclrNoHeapPointers(buckets, size)
+ }
+ }
+
+ if base != nbuckets {
+ // We preallocated some overflow buckets.
+ // To keep the overhead of tracking these overflow buckets to a minimum,
+ // we use the convention that if a preallocated overflow bucket's overflow
+ // pointer is nil, then there are more available by bumping the pointer.
+ // We need a safe non-nil pointer for the last overflow bucket; just use buckets.
+ nextOverflow = (*bmap)(add(buckets, base*uintptr(t.BucketSize)))
+ last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.BucketSize)))
+ last.setoverflow(t, (*bmap)(buckets))
+ }
+ return buckets, nextOverflow
+}
+
+// mapaccess1 returns a pointer to h[key]. Never returns nil, instead
+// it will return a reference to the zero object for the elem type if
+// the key is not in the map.
+// NOTE: The returned pointer may keep the whole map live, so don't
+// hold onto it for very long.
+func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(mapaccess1)
+ racereadpc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.Key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.Key.Size_)
+ }
+ if asanenabled && h != nil {
+ asanread(key, t.Key.Size_)
+ }
+ if h == nil || h.count == 0 {
+ if t.HashMightPanic() {
+ t.Hasher(key, 0) // see issue 23734
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ hash := t.Hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.Key.Equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return e
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
+
+func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(mapaccess2)
+ racereadpc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.Key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.Key.Size_)
+ }
+ if asanenabled && h != nil {
+ asanread(key, t.Key.Size_)
+ }
+ if h == nil || h.count == 0 {
+ if t.HashMightPanic() {
+ t.Hasher(key, 0) // see issue 23734
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ hash := t.Hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.Key.Equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return e, true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+// mapaccessK returns both key and elem. Used by map iterator.
+func mapaccessK(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, unsafe.Pointer) {
+ if h == nil || h.count == 0 {
+ return nil, nil
+ }
+ hash := t.Hasher(key, uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+bucketloop:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if t.Key.Equal(key, k) {
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ return k, e
+ }
+ }
+ }
+ return nil, nil
+}
+
+func mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer {
+ e := mapaccess1(t, h, key)
+ if e == unsafe.Pointer(&zeroVal[0]) {
+ return zero
+ }
+ return e
+}
+
+func mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool) {
+ e := mapaccess1(t, h, key)
+ if e == unsafe.Pointer(&zeroVal[0]) {
+ return zero, false
+ }
+ return e, true
+}
+
+// Like mapaccess, but allocates a slot for the key if it is not present in the map.
+func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(mapassign)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.Key, key, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(key, t.Key.Size_)
+ }
+ if asanenabled {
+ asanread(key, t.Key.Size_)
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ hash := t.Hasher(key, uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher, since t.hasher may panic,
+ // in which case we have not actually done a write.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.Bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ top := tophash(hash)
+
+ var inserti *uint8
+ var insertk unsafe.Pointer
+ var elem unsafe.Pointer
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if isEmpty(b.tophash[i]) && inserti == nil {
+ inserti = &b.tophash[i]
+ insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if !t.Key.Equal(key, k) {
+ continue
+ }
+ // already have a mapping for key. Update it.
+ if t.NeedKeyUpdate() {
+ typedmemmove(t.Key, k, key)
+ }
+ elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if inserti == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ newb := h.newoverflow(t, b)
+ inserti = &newb.tophash[0]
+ insertk = add(unsafe.Pointer(newb), dataOffset)
+ elem = add(insertk, bucketCnt*uintptr(t.KeySize))
+ }
+
+ // store new key/elem at insert position
+ if t.IndirectKey() {
+ kmem := newobject(t.Key)
+ *(*unsafe.Pointer)(insertk) = kmem
+ insertk = kmem
+ }
+ if t.IndirectElem() {
+ vmem := newobject(t.Elem)
+ *(*unsafe.Pointer)(elem) = vmem
+ }
+ typedmemmove(t.Key, insertk, key)
+ *inserti = top
+ h.count++
+
+done:
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ if t.IndirectElem() {
+ elem = *((*unsafe.Pointer)(elem))
+ }
+ return elem
+}
+
+func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(mapdelete)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ raceReadObjectPC(t.Key, key, callerpc, pc)
+ }
+ if msanenabled && h != nil {
+ msanread(key, t.Key.Size_)
+ }
+ if asanenabled && h != nil {
+ asanread(key, t.Key.Size_)
+ }
+ if h == nil || h.count == 0 {
+ if t.HashMightPanic() {
+ t.Hasher(key, 0) // see issue 23734
+ }
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+
+ hash := t.Hasher(key, uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher, since t.hasher may panic,
+ // in which case we have not actually done a write (delete).
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ bOrig := b
+ top := tophash(hash)
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if b.tophash[i] == emptyRest {
+ break search
+ }
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.KeySize))
+ k2 := k
+ if t.IndirectKey() {
+ k2 = *((*unsafe.Pointer)(k2))
+ }
+ if !t.Key.Equal(key, k2) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ if t.IndirectKey() {
+ *(*unsafe.Pointer)(k) = nil
+ } else if t.Key.PtrBytes != 0 {
+ memclrHasPointers(k, t.Key.Size_)
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ *(*unsafe.Pointer)(e) = nil
+ } else if t.Elem.PtrBytes != 0 {
+ memclrHasPointers(e, t.Elem.Size_)
+ } else {
+ memclrNoHeapPointers(e, t.Elem.Size_)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ // It would be nice to make this a separate function, but
+ // for loops are not currently inlineable.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+// mapiterinit initializes the hiter struct used for ranging over maps.
+// The hiter struct pointed to by 'it' is allocated on the stack
+// by the compiler's order pass or on the heap by reflect_mapiterinit.
+// Both need to have a zeroed hiter since the struct contains pointers.
+func mapiterinit(t *maptype, h *hmap, it *hiter) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapiterinit))
+ }
+
+ it.t = t
+ if h == nil || h.count == 0 {
+ return
+ }
+
+ if unsafe.Sizeof(hiter{})/goarch.PtrSize != 12 {
+ throw("hash_iter size incorrect") // see cmd/compile/internal/reflectdata/reflect.go
+ }
+ it.h = h
+
+ // grab snapshot of bucket state
+ it.B = h.B
+ it.buckets = h.buckets
+ if t.Bucket.PtrBytes == 0 {
+ // Allocate the current slice and remember pointers to both current and old.
+ // This keeps all relevant overflow buckets alive even if
+ // the table grows and/or overflow buckets are added to the table
+ // while we are iterating.
+ h.createOverflow()
+ it.overflow = h.extra.overflow
+ it.oldoverflow = h.extra.oldoverflow
+ }
+
+ // decide where to start
+ var r uintptr
+ if h.B > 31-bucketCntBits {
+ r = uintptr(fastrand64())
+ } else {
+ r = uintptr(fastrand())
+ }
+ it.startBucket = r & bucketMask(h.B)
+ it.offset = uint8(r >> h.B & (bucketCnt - 1))
+
+ // iterator state
+ it.bucket = it.startBucket
+
+ // Remember we have an iterator.
+ // Can run concurrently with another mapiterinit().
+ if old := h.flags; old&(iterator|oldIterator) != iterator|oldIterator {
+ atomic.Or8(&h.flags, iterator|oldIterator)
+ }
+
+ mapiternext(it)
+}
+
+func mapiternext(it *hiter) {
+ h := it.h
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapiternext))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map iteration and map write")
+ }
+ t := it.t
+ bucket := it.bucket
+ b := it.bptr
+ i := it.i
+ checkBucket := it.checkBucket
+
+next:
+ if b == nil {
+ if bucket == it.startBucket && it.wrapped {
+ // end of iteration
+ it.key = nil
+ it.elem = nil
+ return
+ }
+ if h.growing() && it.B == h.B {
+ // Iterator was started in the middle of a grow, and the grow isn't done yet.
+ // If the bucket we're looking at hasn't been filled in yet (i.e. the old
+ // bucket hasn't been evacuated) then we need to iterate through the old
+ // bucket and only return the ones that will be migrated to this bucket.
+ oldbucket := bucket & it.h.oldbucketmask()
+ b = (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))
+ if !evacuated(b) {
+ checkBucket = bucket
+ } else {
+ b = (*bmap)(add(it.buckets, bucket*uintptr(t.BucketSize)))
+ checkBucket = noCheck
+ }
+ } else {
+ b = (*bmap)(add(it.buckets, bucket*uintptr(t.BucketSize)))
+ checkBucket = noCheck
+ }
+ bucket++
+ if bucket == bucketShift(it.B) {
+ bucket = 0
+ it.wrapped = true
+ }
+ i = 0
+ }
+ for ; i < bucketCnt; i++ {
+ offi := (i + it.offset) & (bucketCnt - 1)
+ if isEmpty(b.tophash[offi]) || b.tophash[offi] == evacuatedEmpty {
+ // TODO: emptyRest is hard to use here, as we start iterating
+ // in the middle of a bucket. It's feasible, just tricky.
+ continue
+ }
+ k := add(unsafe.Pointer(b), dataOffset+uintptr(offi)*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+uintptr(offi)*uintptr(t.ValueSize))
+ if checkBucket != noCheck && !h.sameSizeGrow() {
+ // Special case: iterator was started during a grow to a larger size
+ // and the grow is not done yet. We're working on a bucket whose
+ // oldbucket has not been evacuated yet. Or at least, it wasn't
+ // evacuated when we started the bucket. So we're iterating
+ // through the oldbucket, skipping any keys that will go
+ // to the other new bucket (each oldbucket expands to two
+ // buckets during a grow).
+ if t.ReflexiveKey() || t.Key.Equal(k, k) {
+ // If the item in the oldbucket is not destined for
+ // the current new bucket in the iteration, skip it.
+ hash := t.Hasher(k, uintptr(h.hash0))
+ if hash&bucketMask(it.B) != checkBucket {
+ continue
+ }
+ } else {
+ // Hash isn't repeatable if k != k (NaNs). We need a
+ // repeatable and randomish choice of which direction
+ // to send NaNs during evacuation. We'll use the low
+ // bit of tophash to decide which way NaNs go.
+ // NOTE: this case is why we need two evacuate tophash
+ // values, evacuatedX and evacuatedY, that differ in
+ // their low bit.
+ if checkBucket>>(it.B-1) != uintptr(b.tophash[offi]&1) {
+ continue
+ }
+ }
+ }
+ if (b.tophash[offi] != evacuatedX && b.tophash[offi] != evacuatedY) ||
+ !(t.ReflexiveKey() || t.Key.Equal(k, k)) {
+ // This is the golden data; we can return it.
+ // OR
+ // key!=key, so the entry can't be deleted or updated, so we can just return it.
+ // That's lucky for us because when key!=key we can't look it up successfully.
+ it.key = k
+ if t.IndirectElem() {
+ e = *((*unsafe.Pointer)(e))
+ }
+ it.elem = e
+ } else {
+ // The hash table has grown since the iterator was started.
+ // The golden data for this key is now somewhere else.
+ // Check the current hash table for the data.
+ // This code handles the case where the key
+ // has been deleted, updated, or deleted and reinserted.
+ // NOTE: we need to regrab the key as it has potentially been
+ // updated to an equal() but not identical key (e.g. +0.0 vs -0.0).
+ rk, re := mapaccessK(t, h, k)
+ if rk == nil {
+ continue // key has been deleted
+ }
+ it.key = rk
+ it.elem = re
+ }
+ it.bucket = bucket
+ if it.bptr != b { // avoid unnecessary write barrier; see issue 14921
+ it.bptr = b
+ }
+ it.i = i + 1
+ it.checkBucket = checkBucket
+ return
+ }
+ b = b.overflow(t)
+ i = 0
+ goto next
+}
+
+// mapclear deletes all keys from a map.
+func mapclear(t *maptype, h *hmap) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(mapclear)
+ racewritepc(unsafe.Pointer(h), callerpc, pc)
+ }
+
+ if h == nil || h.count == 0 {
+ return
+ }
+
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+
+ h.flags ^= hashWriting
+
+ // Mark buckets empty, so existing iterators can be terminated, see issue #59411.
+ markBucketsEmpty := func(bucket unsafe.Pointer, mask uintptr) {
+ for i := uintptr(0); i <= mask; i++ {
+ b := (*bmap)(add(bucket, i*uintptr(t.BucketSize)))
+ for ; b != nil; b = b.overflow(t) {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ b.tophash[i] = emptyRest
+ }
+ }
+ }
+ }
+ markBucketsEmpty(h.buckets, bucketMask(h.B))
+ if oldBuckets := h.oldbuckets; oldBuckets != nil {
+ markBucketsEmpty(oldBuckets, h.oldbucketmask())
+ }
+
+ h.flags &^= sameSizeGrow
+ h.oldbuckets = nil
+ h.nevacuate = 0
+ h.noverflow = 0
+ h.count = 0
+
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ h.hash0 = fastrand()
+
+ // Keep the mapextra allocation but clear any extra information.
+ if h.extra != nil {
+ *h.extra = mapextra{}
+ }
+
+ // makeBucketArray clears the memory pointed to by h.buckets
+ // and recovers any overflow buckets by generating them
+ // as if h.buckets were newly allocated.
+ _, nextOverflow := makeBucketArray(t, h.B, h.buckets)
+ if nextOverflow != nil {
+ // If overflow buckets are created then h.extra
+ // will have been allocated during initial bucket creation.
+ h.extra.nextOverflow = nextOverflow
+ }
+
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func hashGrow(t *maptype, h *hmap) {
+ // If we've hit the load factor, get bigger.
+ // Otherwise, there are too many overflow buckets,
+ // so keep the same number of buckets and "grow" laterally.
+ bigger := uint8(1)
+ if !overLoadFactor(h.count+1, h.B) {
+ bigger = 0
+ h.flags |= sameSizeGrow
+ }
+ oldbuckets := h.buckets
+ newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)
+
+ flags := h.flags &^ (iterator | oldIterator)
+ if h.flags&iterator != 0 {
+ flags |= oldIterator
+ }
+ // commit the grow (atomic wrt gc)
+ h.B += bigger
+ h.flags = flags
+ h.oldbuckets = oldbuckets
+ h.buckets = newbuckets
+ h.nevacuate = 0
+ h.noverflow = 0
+
+ if h.extra != nil && h.extra.overflow != nil {
+ // Promote current overflow buckets to the old generation.
+ if h.extra.oldoverflow != nil {
+ throw("oldoverflow is not nil")
+ }
+ h.extra.oldoverflow = h.extra.overflow
+ h.extra.overflow = nil
+ }
+ if nextOverflow != nil {
+ if h.extra == nil {
+ h.extra = new(mapextra)
+ }
+ h.extra.nextOverflow = nextOverflow
+ }
+
+ // the actual copying of the hash table data is done incrementally
+ // by growWork() and evacuate().
+}
+
+// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
+func overLoadFactor(count int, B uint8) bool {
+ return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
+}
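
Editor's note: to see the growth points this check implies, the arithmetic can be replayed outside the runtime. bucketCnt is assumed to be 8 here, matching abi.MapBucketCount on current platforms; the sketch reproduces the constants exactly rather than asserting a particular load factor.

package main

import "fmt"

// overLoadFactorSketch mirrors the check above so the element count at
// which a map with 1<<B buckets doubles can be printed directly.
func overLoadFactorSketch(count int, B uint8) bool {
	const bucketCnt = 8
	const loadFactorDen = 2
	const loadFactorNum = (bucketCnt * 13 / 16) * loadFactorDen
	return count > bucketCnt &&
		uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

func main() {
	for B := uint8(3); B <= 6; B++ {
		count := 1
		for !overLoadFactorSketch(count, B) {
			count++
		}
		fmt.Printf("B=%d (%d buckets): grow when count reaches %d\n", B, 1<<B, count)
	}
}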
+
+// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
+// Note that most of these overflow buckets must be in sparse use;
+// if use was dense, then we'd have already triggered regular map growth.
+func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
+ // If the threshold is too low, we do extraneous work.
+ // If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.
+ // "too many" means (approximately) as many overflow buckets as regular buckets.
+ // See incrnoverflow for more details.
+ if B > 15 {
+ B = 15
+ }
+ // The compiler doesn't see here that B < 16; mask B to generate shorter shift code.
+ return noverflow >= uint16(1)<<(B&15)
+}
+
+// growing reports whether h is growing. The growth may be to the same size or bigger.
+func (h *hmap) growing() bool {
+ return h.oldbuckets != nil
+}
+
+// sameSizeGrow reports whether the current growth is to a map of the same size.
+func (h *hmap) sameSizeGrow() bool {
+ return h.flags&sameSizeGrow != 0
+}
+
+// noldbuckets calculates the number of buckets prior to the current map growth.
+func (h *hmap) noldbuckets() uintptr {
+ oldB := h.B
+ if !h.sameSizeGrow() {
+ oldB--
+ }
+ return bucketShift(oldB)
+}
+
+// oldbucketmask provides a mask that can be applied to calculate n % noldbuckets().
+func (h *hmap) oldbucketmask() uintptr {
+ return h.noldbuckets() - 1
+}
+
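A tiny illustration of the mask arithmetic for a doubling grow, assuming the old table had B = 4 (16 buckets): every new-table bucket index maps back to the old bucket it drains with a single AND, which is how growWork below picks the oldbucket to evacuate.

package main

import "fmt"

func main() {
	const oldB = 4
	noldbuckets := uintptr(1) << oldB // 16 old buckets
	oldmask := noldbuckets - 1        // 0b1111, what oldbucketmask() returns

	// New-table buckets 3 and 19 both drain old bucket 3.
	for _, bucket := range []uintptr{3, 19} {
		fmt.Printf("new bucket %2d -> old bucket %d\n", bucket, bucket&oldmask)
	}
}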
+func growWork(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate(t, h, h.nevacuate)
+ }
+}
+
+func bucketEvacuated(t *maptype, h *hmap, bucket uintptr) bool {
+ b := (*bmap)(add(h.oldbuckets, bucket*uintptr(t.BucketSize)))
+ return evacuated(b)
+}
+
+// evacDst is an evacuation destination.
+type evacDst struct {
+ b *bmap // current destination bucket
+ i int // key/elem index into b
+ k unsafe.Pointer // pointer to current key storage
+ e unsafe.Pointer // pointer to current elem storage
+}
+
+func evacuate(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.BucketSize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*uintptr(t.KeySize))
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.BucketSize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*uintptr(t.KeySize))
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*uintptr(t.KeySize))
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, uintptr(t.KeySize)), add(e, uintptr(t.ValueSize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ k2 := k
+ if t.IndirectKey() {
+ k2 = *((*unsafe.Pointer)(k2))
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.Hasher(k2, uintptr(h.hash0))
+ if h.flags&iterator != 0 && !t.ReflexiveKey() && !t.Key.Equal(k2, k2) {
+ // If key != key (NaNs), then the hash could be (and probably
+ // will be) entirely different from the old hash. Moreover,
+ // it isn't reproducible. Reproducibility is required in the
+ // presence of iterators, as our evacuation decision must
+ // match whatever decision the iterator made.
+ // Fortunately, we have the freedom to send these keys either
+ // way. Also, tophash is meaningless for these kinds of keys.
+ // We let the low bit of tophash drive the evacuation decision.
+ // We recompute a new random tophash for the next level so
+ // these keys will get evenly distributed across all buckets
+ // after multiple grows.
+ useY = top & 1
+ top = tophash(hash)
+ } else {
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+ }
+
+ if evacuatedX+1 != evacuatedY || evacuatedX^1 != evacuatedY {
+ throw("bad evacuatedN")
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*uintptr(t.KeySize))
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+ if t.IndirectKey() {
+ *(*unsafe.Pointer)(dst.k) = k2 // copy pointer
+ } else {
+ typedmemmove(t.Key, dst.k, k) // copy key
+ }
+ if t.IndirectElem() {
+ *(*unsafe.Pointer)(dst.e) = *(*unsafe.Pointer)(e)
+ } else {
+ typedmemmove(t.Elem, dst.e, e)
+ }
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, uintptr(t.KeySize))
+ dst.e = add(dst.e, uintptr(t.ValueSize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.Bucket.PtrBytes != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.BucketSize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.BucketSize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
+
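A detached sketch of the x/y split for a doubling grow: with newbit equal to the old bucket count, one extra hash bit decides whether an entry stays at its old index (x) or moves up by newbit (y). maphash here is only a stand-in for the runtime hasher.

package main

import (
	"fmt"
	"hash/maphash"
)

func main() {
	seed := maphash.MakeSeed() // stand-in for h.hash0
	const oldbucket = 2
	newbit := uintptr(8) // old table had 8 buckets; new table has 16

	for _, key := range []string{"alpha", "beta", "gamma", "delta"} {
		hash := uintptr(maphash.String(seed, key))
		dst := oldbucket // x destination: same bucket index
		if hash&newbit != 0 {
			dst = oldbucket + int(newbit) // y destination: high half
		}
		fmt.Printf("%-5s -> bucket %d\n", key, dst)
	}
}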
+func advanceEvacuationMark(h *hmap, t *maptype, newbit uintptr) {
+ h.nevacuate++
+ // Experiments suggest that 1024 is overkill by at least an order of magnitude.
+ // Put it in there as a safeguard anyway, to ensure O(1) behavior.
+ stop := h.nevacuate + 1024
+ if stop > newbit {
+ stop = newbit
+ }
+ for h.nevacuate != stop && bucketEvacuated(t, h, h.nevacuate) {
+ h.nevacuate++
+ }
+ if h.nevacuate == newbit { // newbit == # of oldbuckets
+ // Growing is all done. Free old main bucket array.
+ h.oldbuckets = nil
+ // Can discard old overflow buckets as well.
+ // If they are still referenced by an iterator,
+ // then the iterator holds a pointer to the slice.
+ if h.extra != nil {
+ h.extra.oldoverflow = nil
+ }
+ h.flags &^= sameSizeGrow
+ }
+}
+
+// Reflect stubs. Called from ../reflect/asm_*.s
+
+//go:linkname reflect_makemap reflect.makemap
+func reflect_makemap(t *maptype, cap int) *hmap {
+ // Check invariants and reflect's math.
+ if t.Key.Equal == nil {
+ throw("runtime.reflect_makemap: unsupported map key type")
+ }
+ if t.Key.Size_ > maxKeySize && (!t.IndirectKey() || t.KeySize != uint8(goarch.PtrSize)) ||
+ t.Key.Size_ <= maxKeySize && (t.IndirectKey() || t.KeySize != uint8(t.Key.Size_)) {
+ throw("key size wrong")
+ }
+ if t.Elem.Size_ > maxElemSize && (!t.IndirectElem() || t.ValueSize != uint8(goarch.PtrSize)) ||
+ t.Elem.Size_ <= maxElemSize && (t.IndirectElem() || t.ValueSize != uint8(t.Elem.Size_)) {
+ throw("elem size wrong")
+ }
+ if t.Key.Align_ > bucketCnt {
+ throw("key align too big")
+ }
+ if t.Elem.Align_ > bucketCnt {
+ throw("elem align too big")
+ }
+ if t.Key.Size_%uintptr(t.Key.Align_) != 0 {
+ throw("key size not a multiple of key align")
+ }
+ if t.Elem.Size_%uintptr(t.Elem.Align_) != 0 {
+ throw("elem size not a multiple of elem align")
+ }
+ if bucketCnt < 8 {
+ throw("bucketsize too small for proper alignment")
+ }
+ if dataOffset%uintptr(t.Key.Align_) != 0 {
+ throw("need padding in bucket (key)")
+ }
+ if dataOffset%uintptr(t.Elem.Align_) != 0 {
+ throw("need padding in bucket (elem)")
+ }
+
+ return makemap(t, cap, nil)
+}
+
+//go:linkname reflect_mapaccess reflect.mapaccess
+func reflect_mapaccess(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ elem, ok := mapaccess2(t, h, key)
+ if !ok {
+ // reflect wants nil for a missing element
+ elem = nil
+ }
+ return elem
+}
+
+//go:linkname reflect_mapaccess_faststr reflect.mapaccess_faststr
+func reflect_mapaccess_faststr(t *maptype, h *hmap, key string) unsafe.Pointer {
+ elem, ok := mapaccess2_faststr(t, h, key)
+ if !ok {
+ // reflect wants nil for a missing element
+ elem = nil
+ }
+ return elem
+}
+
+//go:linkname reflect_mapassign reflect.mapassign0
+func reflect_mapassign(t *maptype, h *hmap, key unsafe.Pointer, elem unsafe.Pointer) {
+ p := mapassign(t, h, key)
+ typedmemmove(t.Elem, p, elem)
+}
+
+//go:linkname reflect_mapassign_faststr reflect.mapassign_faststr0
+func reflect_mapassign_faststr(t *maptype, h *hmap, key string, elem unsafe.Pointer) {
+ p := mapassign_faststr(t, h, key)
+ typedmemmove(t.Elem, p, elem)
+}
+
+//go:linkname reflect_mapdelete reflect.mapdelete
+func reflect_mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {
+ mapdelete(t, h, key)
+}
+
+//go:linkname reflect_mapdelete_faststr reflect.mapdelete_faststr
+func reflect_mapdelete_faststr(t *maptype, h *hmap, key string) {
+ mapdelete_faststr(t, h, key)
+}
+
+//go:linkname reflect_mapiterinit reflect.mapiterinit
+func reflect_mapiterinit(t *maptype, h *hmap, it *hiter) {
+ mapiterinit(t, h, it)
+}
+
+//go:linkname reflect_mapiternext reflect.mapiternext
+func reflect_mapiternext(it *hiter) {
+ mapiternext(it)
+}
+
+//go:linkname reflect_mapiterkey reflect.mapiterkey
+func reflect_mapiterkey(it *hiter) unsafe.Pointer {
+ return it.key
+}
+
+//go:linkname reflect_mapiterelem reflect.mapiterelem
+func reflect_mapiterelem(it *hiter) unsafe.Pointer {
+ return it.elem
+}
+
+//go:linkname reflect_maplen reflect.maplen
+func reflect_maplen(h *hmap) int {
+ if h == nil {
+ return 0
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(reflect_maplen))
+ }
+ return h.count
+}
+
+//go:linkname reflect_mapclear reflect.mapclear
+func reflect_mapclear(t *maptype, h *hmap) {
+ mapclear(t, h)
+}
+
+//go:linkname reflectlite_maplen internal/reflectlite.maplen
+func reflectlite_maplen(h *hmap) int {
+ if h == nil {
+ return 0
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(reflect_maplen))
+ }
+ return h.count
+}
+
+const maxZero = 1024 // must match value in reflect/value.go:maxZero cmd/compile/internal/gc/walk.go:zeroValSize
+var zeroVal [maxZero]byte
+
+// mapinitnoop is a no-op function known to the Go linker; if a given global
+// map (of the right size) is determined to be dead, the linker will
+// rewrite the relocation (from the package init func) from the outlined
+// map init function to this symbol. Defined in assembly so as to avoid
+// complications with instrumentation (coverage, etc).
+func mapinitnoop()
+
+// mapclone for implementing maps.Clone
+//
+//go:linkname mapclone maps.clone
+func mapclone(m any) any {
+ e := efaceOf(&m)
+ e.data = unsafe.Pointer(mapclone2((*maptype)(unsafe.Pointer(e._type)), (*hmap)(e.data)))
+ return m
+}
+
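From user code this is typically reached through maps.Clone (new in the Go 1.21 standard library), whose internal clone hook the linkname above binds to runtime.mapclone; a minimal usage sketch:

package main

import (
	"fmt"
	"maps"
)

func main() {
	src := map[string]int{"a": 1, "b": 2}
	dst := maps.Clone(src) // shallow clone

	dst["a"] = 100
	fmt.Println(src["a"], dst["a"]) // prints 1 100; the clone is independent
}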
+// moveToBmap moves a bucket from src to dst. It returns the destination bucket
+// (which may be a newly allocated overflow bucket) and the position at which the
+// next key/elem will be written; pos == bucketCnt means the next write needs an overflow bucket.
+func moveToBmap(t *maptype, h *hmap, dst *bmap, pos int, src *bmap) (*bmap, int) {
+ for i := 0; i < bucketCnt; i++ {
+ if isEmpty(src.tophash[i]) {
+ continue
+ }
+
+ for ; pos < bucketCnt; pos++ {
+ if isEmpty(dst.tophash[pos]) {
+ break
+ }
+ }
+
+ if pos == bucketCnt {
+ dst = h.newoverflow(t, dst)
+ pos = 0
+ }
+
+ srcK := add(unsafe.Pointer(src), dataOffset+uintptr(i)*uintptr(t.KeySize))
+ srcEle := add(unsafe.Pointer(src), dataOffset+bucketCnt*uintptr(t.KeySize)+uintptr(i)*uintptr(t.ValueSize))
+ dstK := add(unsafe.Pointer(dst), dataOffset+uintptr(pos)*uintptr(t.KeySize))
+ dstEle := add(unsafe.Pointer(dst), dataOffset+bucketCnt*uintptr(t.KeySize)+uintptr(pos)*uintptr(t.ValueSize))
+
+ dst.tophash[pos] = src.tophash[i]
+ if t.IndirectKey() {
+ srcK = *(*unsafe.Pointer)(srcK)
+ if t.NeedKeyUpdate() {
+ kStore := newobject(t.Key)
+ typedmemmove(t.Key, kStore, srcK)
+ srcK = kStore
+ }
+ // Note: if NeedKeyUpdate is false, then the memory
+ // used to store the key is immutable, so we can share
+ // it between the original map and its clone.
+ *(*unsafe.Pointer)(dstK) = srcK
+ } else {
+ typedmemmove(t.Key, dstK, srcK)
+ }
+ if t.IndirectElem() {
+ srcEle = *(*unsafe.Pointer)(srcEle)
+ eStore := newobject(t.Elem)
+ typedmemmove(t.Elem, eStore, srcEle)
+ *(*unsafe.Pointer)(dstEle) = eStore
+ } else {
+ typedmemmove(t.Elem, dstEle, srcEle)
+ }
+ pos++
+ h.count++
+ }
+ return dst, pos
+}
+
+func mapclone2(t *maptype, src *hmap) *hmap {
+ dst := makemap(t, src.count, nil)
+ dst.hash0 = src.hash0
+ dst.nevacuate = 0
+ // Flags do not need to be copied here; a freshly made map has no flags set.
+
+ if src.count == 0 {
+ return dst
+ }
+
+ if src.flags&hashWriting != 0 {
+ fatal("concurrent map clone and map write")
+ }
+
+ if src.B == 0 && !(t.IndirectKey() && t.NeedKeyUpdate()) && !t.IndirectElem() {
+ // Quick copy for small maps.
+ dst.buckets = newobject(t.Bucket)
+ dst.count = src.count
+ typedmemmove(t.Bucket, dst.buckets, src.buckets)
+ return dst
+ }
+
+ if dst.B == 0 {
+ dst.buckets = newobject(t.Bucket)
+ }
+ dstArraySize := int(bucketShift(dst.B))
+ srcArraySize := int(bucketShift(src.B))
+ for i := 0; i < dstArraySize; i++ {
+ dstBmap := (*bmap)(add(dst.buckets, uintptr(i*int(t.BucketSize))))
+ pos := 0
+ for j := 0; j < srcArraySize; j += dstArraySize {
+ srcBmap := (*bmap)(add(src.buckets, uintptr((i+j)*int(t.BucketSize))))
+ for srcBmap != nil {
+ dstBmap, pos = moveToBmap(t, dst, dstBmap, pos, srcBmap)
+ srcBmap = srcBmap.overflow(t)
+ }
+ }
+ }
+
+ if src.oldbuckets == nil {
+ return dst
+ }
+
+ oldB := src.B
+ srcOldbuckets := src.oldbuckets
+ if !src.sameSizeGrow() {
+ oldB--
+ }
+ oldSrcArraySize := int(bucketShift(oldB))
+
+ for i := 0; i < oldSrcArraySize; i++ {
+ srcBmap := (*bmap)(add(srcOldbuckets, uintptr(i*int(t.BucketSize))))
+ if evacuated(srcBmap) {
+ continue
+ }
+
+ if oldB >= dst.B { // dst has no more main bucket bits than oldB bits in src
+ dstBmap := (*bmap)(add(dst.buckets, (uintptr(i)&bucketMask(dst.B))*uintptr(t.BucketSize)))
+ for dstBmap.overflow(t) != nil {
+ dstBmap = dstBmap.overflow(t)
+ }
+ pos := 0
+ for srcBmap != nil {
+ dstBmap, pos = moveToBmap(t, dst, dstBmap, pos, srcBmap)
+ srcBmap = srcBmap.overflow(t)
+ }
+ continue
+ }
+
+ // oldB < dst.B, so a single source bucket may go to multiple destination buckets.
+ // Process entries one at a time.
+ for srcBmap != nil {
+ // move from old bucket to new bucket
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(srcBmap.tophash[i]) {
+ continue
+ }
+
+ if src.flags&hashWriting != 0 {
+ fatal("concurrent map clone and map write")
+ }
+
+ srcK := add(unsafe.Pointer(srcBmap), dataOffset+i*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ srcK = *((*unsafe.Pointer)(srcK))
+ }
+
+ srcEle := add(unsafe.Pointer(srcBmap), dataOffset+bucketCnt*uintptr(t.KeySize)+i*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ srcEle = *((*unsafe.Pointer)(srcEle))
+ }
+ dstEle := mapassign(t, dst, srcK)
+ typedmemmove(t.Elem, dstEle, srcEle)
+ }
+ srcBmap = srcBmap.overflow(t)
+ }
+ }
+ return dst
+}
+
+// keys for implementing maps.keys
+//
+//go:linkname keys maps.keys
+func keys(m any, p unsafe.Pointer) {
+ e := efaceOf(&m)
+ t := (*maptype)(unsafe.Pointer(e._type))
+ h := (*hmap)(e.data)
+
+ if h == nil || h.count == 0 {
+ return
+ }
+ s := (*slice)(p)
+ r := int(fastrand())
+ offset := uint8(r >> h.B & (bucketCnt - 1))
+ if h.B == 0 {
+ copyKeys(t, h, (*bmap)(h.buckets), s, offset)
+ return
+ }
+ arraySize := int(bucketShift(h.B))
+ buckets := h.buckets
+ for i := 0; i < arraySize; i++ {
+ bucket := (i + r) & (arraySize - 1)
+ b := (*bmap)(add(buckets, uintptr(bucket)*uintptr(t.BucketSize)))
+ copyKeys(t, h, b, s, offset)
+ }
+
+ if h.growing() {
+ oldArraySize := int(h.noldbuckets())
+ for i := 0; i < oldArraySize; i++ {
+ bucket := (i + r) & (oldArraySize - 1)
+ b := (*bmap)(add(h.oldbuckets, uintptr(bucket)*uintptr(t.BucketSize)))
+ if evacuated(b) {
+ continue
+ }
+ copyKeys(t, h, b, s, offset)
+ }
+ }
+ return
+}
+
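The runtime helper above fills a caller-preallocated slice in randomized bucket order. A rough user-level equivalent (a generic sketch of my own, not an exported API of the maps package) looks like:

package main

import "fmt"

// keysOf collects the keys of m into a slice preallocated to len(m),
// which is how the runtime helper expects its destination to be sized.
func keysOf[K comparable, V any](m map[K]V) []K {
	out := make([]K, 0, len(m))
	for k := range m {
		out = append(out, k) // range order is already randomized
	}
	return out
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	fmt.Println(len(keysOf(m))) // 3
}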
+func copyKeys(t *maptype, h *hmap, b *bmap, s *slice, offset uint8) {
+ for b != nil {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ offi := (i + uintptr(offset)) & (bucketCnt - 1)
+ if isEmpty(b.tophash[offi]) {
+ continue
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ k := add(unsafe.Pointer(b), dataOffset+offi*uintptr(t.KeySize))
+ if t.IndirectKey() {
+ k = *((*unsafe.Pointer)(k))
+ }
+ if s.len >= s.cap {
+ fatal("concurrent map read and map write")
+ }
+ typedmemmove(t.Key, add(s.array, uintptr(s.len)*uintptr(t.KeySize)), k)
+ s.len++
+ }
+ b = b.overflow(t)
+ }
+}
+
+// values for implementing maps.values
+//
+//go:linkname values maps.values
+func values(m any, p unsafe.Pointer) {
+ e := efaceOf(&m)
+ t := (*maptype)(unsafe.Pointer(e._type))
+ h := (*hmap)(e.data)
+ if h == nil || h.count == 0 {
+ return
+ }
+ s := (*slice)(p)
+ r := int(fastrand())
+ offset := uint8(r >> h.B & (bucketCnt - 1))
+ if h.B == 0 {
+ copyValues(t, h, (*bmap)(h.buckets), s, offset)
+ return
+ }
+ arraySize := int(bucketShift(h.B))
+ buckets := h.buckets
+ for i := 0; i < arraySize; i++ {
+ bucket := (i + r) & (arraySize - 1)
+ b := (*bmap)(add(buckets, uintptr(bucket)*uintptr(t.BucketSize)))
+ copyValues(t, h, b, s, offset)
+ }
+
+ if h.growing() {
+ oldArraySize := int(h.noldbuckets())
+ for i := 0; i < oldArraySize; i++ {
+ bucket := (i + r) & (oldArraySize - 1)
+ b := (*bmap)(add(h.oldbuckets, uintptr(bucket)*uintptr(t.BucketSize)))
+ if evacuated(b) {
+ continue
+ }
+ copyValues(t, h, b, s, offset)
+ }
+ }
+ return
+}
+
+func copyValues(t *maptype, h *hmap, b *bmap, s *slice, offset uint8) {
+ for b != nil {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ offi := (i + uintptr(offset)) & (bucketCnt - 1)
+ if isEmpty(b.tophash[offi]) {
+ continue
+ }
+
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+
+ ele := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.KeySize)+offi*uintptr(t.ValueSize))
+ if t.IndirectElem() {
+ ele = *((*unsafe.Pointer)(ele))
+ }
+ if s.len >= s.cap {
+ fatal("concurrent map read and map write")
+ }
+ typedmemmove(t.Elem, add(s.array, uintptr(s.len)*uintptr(t.ValueSize)), ele)
+ s.len++
+ }
+ b = b.overflow(t)
+ }
+}
diff --git a/src/runtime/map_benchmark_test.go b/src/runtime/map_benchmark_test.go
new file mode 100644
index 0000000..b46d2a4
--- /dev/null
+++ b/src/runtime/map_benchmark_test.go
@@ -0,0 +1,535 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ "strconv"
+ "strings"
+ "testing"
+)
+
+const size = 10
+
+func BenchmarkHashStringSpeed(b *testing.B) {
+ strings := make([]string, size)
+ for i := 0; i < size; i++ {
+ strings[i] = fmt.Sprintf("string#%d", i)
+ }
+ sum := 0
+ m := make(map[string]int, size)
+ for i := 0; i < size; i++ {
+ m[strings[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[strings[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+type chunk [17]byte
+
+func BenchmarkHashBytesSpeed(b *testing.B) {
+ // a bunch of chunks, each with a different alignment mod 16
+ var chunks [size]chunk
+ // initialize each to a different value
+ for i := 0; i < size; i++ {
+ chunks[i][0] = byte(i)
+ }
+ // put into a map
+ m := make(map[chunk]int, size)
+ for i, c := range chunks {
+ m[c] = i
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if m[chunks[idx]] != idx {
+ b.Error("bad map entry for chunk")
+ }
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkHashInt32Speed(b *testing.B) {
+ ints := make([]int32, size)
+ for i := 0; i < size; i++ {
+ ints[i] = int32(i)
+ }
+ sum := 0
+ m := make(map[int32]int, size)
+ for i := 0; i < size; i++ {
+ m[ints[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[ints[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkHashInt64Speed(b *testing.B) {
+ ints := make([]int64, size)
+ for i := 0; i < size; i++ {
+ ints[i] = int64(i)
+ }
+ sum := 0
+ m := make(map[int64]int, size)
+ for i := 0; i < size; i++ {
+ m[ints[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[ints[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkHashStringArraySpeed(b *testing.B) {
+ stringpairs := make([][2]string, size)
+ for i := 0; i < size; i++ {
+ for j := 0; j < 2; j++ {
+ stringpairs[i][j] = fmt.Sprintf("string#%d/%d", i, j)
+ }
+ }
+ sum := 0
+ m := make(map[[2]string]int, size)
+ for i := 0; i < size; i++ {
+ m[stringpairs[i]] = 0
+ }
+ idx := 0
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ sum += m[stringpairs[idx]]
+ idx++
+ if idx == size {
+ idx = 0
+ }
+ }
+}
+
+func BenchmarkMegMap(b *testing.B) {
+ m := make(map[string]bool)
+ for suffix := 'A'; suffix <= 'G'; suffix++ {
+ m[strings.Repeat("X", 1<<20-1)+fmt.Sprint(suffix)] = true
+ }
+ key := strings.Repeat("X", 1<<20-1) + "k"
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMegOneMap(b *testing.B) {
+ m := make(map[string]bool)
+ m[strings.Repeat("X", 1<<20)] = true
+ key := strings.Repeat("Y", 1<<20)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMegEqMap(b *testing.B) {
+ m := make(map[string]bool)
+ key1 := strings.Repeat("X", 1<<20)
+ key2 := strings.Repeat("X", 1<<20) // equal but different instance
+ m[key1] = true
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key2]
+ }
+}
+
+func BenchmarkMegEmptyMap(b *testing.B) {
+ m := make(map[string]bool)
+ key := strings.Repeat("X", 1<<20)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkSmallStrMap(b *testing.B) {
+ m := make(map[string]bool)
+ for suffix := 'A'; suffix <= 'G'; suffix++ {
+ m[fmt.Sprint(suffix)] = true
+ }
+ key := "k"
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[key]
+ }
+}
+
+func BenchmarkMapStringKeysEight_16(b *testing.B) { benchmarkMapStringKeysEight(b, 16) }
+func BenchmarkMapStringKeysEight_32(b *testing.B) { benchmarkMapStringKeysEight(b, 32) }
+func BenchmarkMapStringKeysEight_64(b *testing.B) { benchmarkMapStringKeysEight(b, 64) }
+func BenchmarkMapStringKeysEight_1M(b *testing.B) { benchmarkMapStringKeysEight(b, 1<<20) }
+
+func benchmarkMapStringKeysEight(b *testing.B, keySize int) {
+ m := make(map[string]bool)
+ for i := 0; i < 8; i++ {
+ m[strings.Repeat("K", i+1)] = true
+ }
+ key := strings.Repeat("K", keySize)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[key]
+ }
+}
+
+func BenchmarkIntMap(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < 8; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _, _ = m[7]
+ }
+}
+
+func BenchmarkMapFirst(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[0]
+ }
+ })
+ }
+}
+
+func BenchmarkMapMid(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[n>>1]
+ }
+ })
+ }
+}
+
+func BenchmarkMapLast(b *testing.B) {
+ for n := 1; n <= 16; n++ {
+ b.Run(fmt.Sprintf("%d", n), func(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[n-1]
+ }
+ })
+ }
+}
+
+func BenchmarkMapCycle(b *testing.B) {
+ // Arrange map entries to be a permutation, so that
+ // we hit all entries, and one lookup is data dependent
+ // on the previous lookup.
+ const N = 3127
+ p := rand.New(rand.NewSource(1)).Perm(N)
+ m := map[int]int{}
+ for i := 0; i < N; i++ {
+ m[i] = p[i]
+ }
+ b.ResetTimer()
+ j := 0
+ for i := 0; i < b.N; i++ {
+ j = m[j]
+ }
+ sink = uint64(j)
+}
+
+// Accessing the same keys in a row.
+func benchmarkRepeatedLookup(b *testing.B, lookupKeySize int) {
+ m := make(map[string]bool)
+ // At least bigger than a single bucket:
+ for i := 0; i < 64; i++ {
+ m[fmt.Sprintf("some key %d", i)] = true
+ }
+ base := strings.Repeat("x", lookupKeySize-1)
+ key1 := base + "1"
+ key2 := base + "2"
+ b.ResetTimer()
+ for i := 0; i < b.N/4; i++ {
+ _ = m[key1]
+ _ = m[key1]
+ _ = m[key2]
+ _ = m[key2]
+ }
+}
+
+func BenchmarkRepeatedLookupStrMapKey32(b *testing.B) { benchmarkRepeatedLookup(b, 32) }
+func BenchmarkRepeatedLookupStrMapKey1M(b *testing.B) { benchmarkRepeatedLookup(b, 1<<20) }
+
+func BenchmarkMakeMap(b *testing.B) {
+ b.Run("[Byte]Byte", func(b *testing.B) {
+ var m map[byte]byte
+ for i := 0; i < b.N; i++ {
+ m = make(map[byte]byte, 10)
+ }
+ hugeSink = m
+ })
+ b.Run("[Int]Int", func(b *testing.B) {
+ var m map[int]int
+ for i := 0; i < b.N; i++ {
+ m = make(map[int]int, 10)
+ }
+ hugeSink = m
+ })
+}
+
+func BenchmarkNewEmptyMap(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int)
+ }
+}
+
+func BenchmarkNewSmallMap(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ m := make(map[int]int)
+ m[0] = 0
+ m[1] = 1
+ }
+}
+
+func BenchmarkMapIter(b *testing.B) {
+ m := make(map[int]bool)
+ for i := 0; i < 8; i++ {
+ m[i] = true
+ }
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ for range m {
+ }
+ }
+}
+
+func BenchmarkMapIterEmpty(b *testing.B) {
+ m := make(map[int]bool)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ for range m {
+ }
+ }
+}
+
+func BenchmarkSameLengthMap(b *testing.B) {
+ // long strings, same length, differ in first few
+ // and last few bytes.
+ m := make(map[string]bool)
+ s1 := "foo" + strings.Repeat("-", 100) + "bar"
+ s2 := "goo" + strings.Repeat("-", 100) + "ber"
+ m[s1] = true
+ m[s2] = true
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ _ = m[s1]
+ }
+}
+
+type BigKey [3]int64
+
+func BenchmarkBigKeyMap(b *testing.B) {
+ m := make(map[BigKey]bool)
+ k := BigKey{3, 4, 5}
+ m[k] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+type BigVal [3]int64
+
+func BenchmarkBigValMap(b *testing.B) {
+ m := make(map[BigKey]BigVal)
+ k := BigKey{3, 4, 5}
+ m[k] = BigVal{6, 7, 8}
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+func BenchmarkSmallKeyMap(b *testing.B) {
+ m := make(map[int16]bool)
+ m[5] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[5]
+ }
+}
+
+func BenchmarkMapPopulate(b *testing.B) {
+ for size := 1; size < 1000000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ m := make(map[int]bool)
+ for j := 0; j < size; j++ {
+ m[j] = true
+ }
+ }
+ })
+ }
+}
+
+type ComplexAlgKey struct {
+ a, b, c int64
+ _ int
+ d int32
+ _ int
+ e string
+ _ int
+ f, g, h int64
+}
+
+func BenchmarkComplexAlgMap(b *testing.B) {
+ m := make(map[ComplexAlgKey]bool)
+ var k ComplexAlgKey
+ m[k] = true
+ for i := 0; i < b.N; i++ {
+ _ = m[k]
+ }
+}
+
+func BenchmarkGoMapClear(b *testing.B) {
+ b.Run("Reflexive", func(b *testing.B) {
+ for size := 1; size < 100000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ m := make(map[int]int, size)
+ for i := 0; i < b.N; i++ {
+ m[0] = size // Add one element so len(m) != 0, avoiding fast paths.
+ for k := range m {
+ delete(m, k)
+ }
+ }
+ })
+ }
+ })
+ b.Run("NonReflexive", func(b *testing.B) {
+ for size := 1; size < 100000; size *= 10 {
+ b.Run(strconv.Itoa(size), func(b *testing.B) {
+ m := make(map[float64]int, size)
+ for i := 0; i < b.N; i++ {
+ m[1.0] = size // Add one element so len(m) != 0, avoiding fast paths.
+ for k := range m {
+ delete(m, k)
+ }
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkMapStringConversion(b *testing.B) {
+ for _, length := range []int{32, 64} {
+ b.Run(strconv.Itoa(length), func(b *testing.B) {
+ bytes := make([]byte, length)
+ b.Run("simple", func(b *testing.B) {
+ b.ReportAllocs()
+ m := make(map[string]int)
+ m[string(bytes)] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[string(bytes)]
+ }
+ })
+ b.Run("struct", func(b *testing.B) {
+ b.ReportAllocs()
+ type stringstruct struct{ s string }
+ m := make(map[stringstruct]int)
+ m[stringstruct{string(bytes)}] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[stringstruct{string(bytes)}]
+ }
+ })
+ b.Run("array", func(b *testing.B) {
+ b.ReportAllocs()
+ type stringarray [1]string
+ m := make(map[stringarray]int)
+ m[stringarray{string(bytes)}] = 0
+ for i := 0; i < b.N; i++ {
+ _ = m[stringarray{string(bytes)}]
+ }
+ })
+ })
+ }
+}
+
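These sub-benchmarks measure a compiler optimization: converting a []byte to string only to index a map does not allocate, because the temporary string can alias the byte slice for the duration of the lookup. A hedged illustration of the pattern being timed (whether the struct and array wrappers get the same treatment is exactly what the benchmark probes):

package main

import "fmt"

func main() {
	m := map[string]int{"hello": 1}
	b := []byte("hello")

	// The "simple" case: m[string(b)] is recognized and the conversion
	// does not allocate.
	if v, ok := m[string(b)]; ok {
		fmt.Println(v)
	}

	// The "struct" case wraps the converted string in a one-field struct.
	type stringstruct struct{ s string }
	ms := map[stringstruct]int{{s: "hello"}: 2}
	fmt.Println(ms[stringstruct{s: string(b)}])
}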
+var BoolSink bool
+
+func BenchmarkMapInterfaceString(b *testing.B) {
+ m := map[any]bool{}
+
+ for i := 0; i < 100; i++ {
+ m[fmt.Sprintf("%d", i)] = true
+ }
+
+ key := (any)("A")
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ BoolSink = m[key]
+ }
+}
+
+func BenchmarkMapInterfacePtr(b *testing.B) {
+ m := map[any]bool{}
+
+ for i := 0; i < 100; i++ {
+ i := i
+ m[&i] = true
+ }
+
+ key := new(int)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ BoolSink = m[key]
+ }
+}
+
+var (
+ hintLessThan8 = 7
+ hintGreaterThan8 = 32
+)
+
+func BenchmarkNewEmptyMapHintLessThan8(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int, hintLessThan8)
+ }
+}
+
+func BenchmarkNewEmptyMapHintGreaterThan8(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ _ = make(map[int]int, hintGreaterThan8)
+ }
+}
diff --git a/src/runtime/map_fast32.go b/src/runtime/map_fast32.go
new file mode 100644
index 0000000..d10dca3
--- /dev/null
+++ b/src/runtime/map_fast32.go
@@ -0,0 +1,462 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func mapaccess1_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess1_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if *(*uint32)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.ValueSize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
+
+func mapaccess2_fast32(t *maptype, h *hmap, key uint32) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess2_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if *(*uint32)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.ValueSize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
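For context, a sketch of the user code that lands in these fast paths: the compiler routes lookups on maps whose keys are 4-byte types such as uint32 or int32 to the _fast32 variants, so an ordinary comma-ok access is, under the hood, mapaccess2_fast32.

package main

import "fmt"

func main() {
	m := map[uint32]string{7: "seven", 42: "answer"}

	// Two-result lookup: compiled to runtime.mapaccess2_fast32.
	if v, ok := m[42]; ok {
		fmt.Println(v)
	}

	// One-result lookup: compiled to runtime.mapaccess1_fast32.
	fmt.Println(m[7])
}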
+func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapassign_fast32))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ inserti = i
+ insertb = b
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*uint32)(add(unsafe.Pointer(b), dataOffset+i*4)))
+ if k != key {
+ continue
+ }
+ inserti = i
+ insertb = b
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*4)
+ // store new key at insert position
+ *(*uint32)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*4+inserti*uintptr(t.ValueSize))
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapassign_fast32))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ inserti = i
+ insertb = b
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*unsafe.Pointer)(add(unsafe.Pointer(b), dataOffset+i*4)))
+ if k != key {
+ continue
+ }
+ inserti = i
+ insertb = b
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*4)
+ // store new key at insert position
+ *(*unsafe.Pointer)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*4+inserti*uintptr(t.ValueSize))
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapdelete_fast32(t *maptype, h *hmap, key uint32) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapdelete_fast32))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast32(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ bOrig := b
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
+ if key != *(*uint32)(k) || isEmpty(b.tophash[i]) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ // This can only happen if pointers are 32 bits
+ // wide, as 64 bit pointers do not fit into a 32 bit key.
+ if goarch.PtrSize == 4 && t.Key.PtrBytes != 0 {
+ // The key must be a pointer as we checked pointers are
+ // 32 bits wide and the key is 32 bits wide also.
+ *(*unsafe.Pointer)(k) = nil
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.ValueSize))
+ if t.Elem.PtrBytes != 0 {
+ memclrHasPointers(e, t.Elem.Size_)
+ } else {
+ memclrNoHeapPointers(e, t.Elem.Size_)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
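A toy model (detached from the runtime types; the constants are stand-ins, although emptyRest and emptyOne do use the real 0 and 1) of the emptyOne to emptyRest back-propagation after the delete above: if the cell after the deleted slot is already emptyRest, the trailing run of emptyOne cells is upgraded so later lookups can stop early. The real code also walks backwards across overflow buckets; this sketch stays within one bucket.

package main

import "fmt"

const (
	emptyRest = 0 // no more non-empty cells from here to the end of the bucket
	emptyOne  = 1 // this cell is empty; later cells may not be
	used      = 5 // stand-in for any tophash value >= minTopHash
)

// markRest mirrors the walk-back after emptying cell di: if the next cell is
// already emptyRest (or di is the last cell), di and any preceding run of
// emptyOne cells become emptyRest.
func markRest(tophash []uint8, di int) {
	if di != len(tophash)-1 && tophash[di+1] != emptyRest {
		return // later cells may still be occupied
	}
	for i := di; i >= 0 && tophash[i] == emptyOne; i-- {
		tophash[i] = emptyRest
	}
}

func main() {
	b := []uint8{used, emptyOne, emptyOne, used, emptyOne, emptyOne, emptyOne, emptyRest}
	markRest(b, 6) // a delete just emptied cell 6
	fmt.Println(b) // [5 1 1 5 0 0 0 0]; cells 1-2 stay emptyOne because cell 3 is still used
}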
+func growWork_fast32(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_fast32(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_fast32(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_fast32(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.BucketSize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*4)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.BucketSize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*4)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*4)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 4), add(e, uintptr(t.ValueSize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.Hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*4)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ if goarch.PtrSize == 4 && t.Key.PtrBytes != 0 && writeBarrier.enabled {
+ // Write with a write barrier.
+ *(*unsafe.Pointer)(dst.k) = *(*unsafe.Pointer)(k)
+ } else {
+ *(*uint32)(dst.k) = *(*uint32)(k)
+ }
+
+ typedmemmove(t.Elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 4)
+ dst.e = add(dst.e, uintptr(t.ValueSize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.Bucket.PtrBytes != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.BucketSize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.BucketSize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
diff --git a/src/runtime/map_fast64.go b/src/runtime/map_fast64.go
new file mode 100644
index 0000000..d771e0b
--- /dev/null
+++ b/src/runtime/map_fast64.go
@@ -0,0 +1,470 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func mapaccess1_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess1_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.ValueSize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
+
+func mapaccess2_fast64(t *maptype, h *hmap, key uint64) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess2_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ var b *bmap
+ if h.B == 0 {
+ // One-bucket table. No need to hash.
+ b = (*bmap)(h.buckets)
+ } else {
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b = (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ }
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if *(*uint64)(k) == key && !isEmpty(b.tophash[i]) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.ValueSize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapassign_fast64))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*uint64)(add(unsafe.Pointer(b), dataOffset+i*8)))
+ if k != key {
+ continue
+ }
+ insertb = b
+ inserti = i
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*8)
+ // store new key at insert position
+ *(*uint64)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*8+inserti*uintptr(t.ValueSize))
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapassign_fast64))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if isEmpty(b.tophash[i]) {
+ if insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := *((*unsafe.Pointer)(add(unsafe.Pointer(b), dataOffset+i*8)))
+ if k != key {
+ continue
+ }
+ insertb = b
+ inserti = i
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full, allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = tophash(hash) // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*8)
+ // store new key at insert position
+ *(*unsafe.Pointer)(insertk) = key
+
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*8+inserti*uintptr(t.ValueSize))
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
+
+func mapdelete_fast64(t *maptype, h *hmap, key uint64) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapdelete_fast64))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+
+ hash := t.Hasher(noescape(unsafe.Pointer(&key)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_fast64(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ bOrig := b
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 8) {
+ if key != *(*uint64)(k) || isEmpty(b.tophash[i]) {
+ continue
+ }
+ // Only clear key if there are pointers in it.
+ if t.Key.PtrBytes != 0 {
+ if goarch.PtrSize == 8 {
+ *(*unsafe.Pointer)(k) = nil
+ } else {
+ // There are three ways to squeeze one or more 32 bit pointers into 64 bits.
+ // Just call memclrHasPointers instead of trying to handle all cases here.
+ memclrHasPointers(k, 8)
+ }
+ }
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*8+i*uintptr(t.ValueSize))
+ if t.Elem.PtrBytes != 0 {
+ memclrHasPointers(e, t.Elem.Size_)
+ } else {
+ memclrNoHeapPointers(e, t.Elem.Size_)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
+
+func growWork_fast64(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_fast64(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_fast64(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_fast64(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.BucketSize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*8)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.BucketSize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*8)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*8)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 8), add(e, uintptr(t.ValueSize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.Hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*8)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ if t.Key.PtrBytes != 0 && writeBarrier.enabled {
+ if goarch.PtrSize == 8 {
+ // Write with a write barrier.
+ *(*unsafe.Pointer)(dst.k) = *(*unsafe.Pointer)(k)
+ } else {
+ // There are three ways to squeeze at least one 32 bit pointer into 64 bits.
+ // Give up and call typedmemmove.
+ typedmemmove(t.Key, dst.k, k)
+ }
+ } else {
+ *(*uint64)(dst.k) = *(*uint64)(k)
+ }
+
+ typedmemmove(t.Elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 8)
+ dst.e = add(dst.e, uintptr(t.ValueSize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.Bucket.PtrBytes != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.BucketSize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.BucketSize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
diff --git a/src/runtime/map_faststr.go b/src/runtime/map_faststr.go
new file mode 100644
index 0000000..ef71da8
--- /dev/null
+++ b/src/runtime/map_faststr.go
@@ -0,0 +1,485 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func mapaccess1_faststr(t *maptype, h *hmap, ky string) unsafe.Pointer {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess1_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ key := stringStructOf(&ky)
+ if h.B == 0 {
+ // One-bucket table.
+ b := (*bmap)(h.buckets)
+ if key.len < 32 {
+ // short key, doing lots of comparisons is ok
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize))
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+ // long key, try not to do more comparisons than necessary
+ keymaybe := uintptr(bucketCnt)
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize))
+ }
+ // check first 4 bytes
+ if *((*[4]byte)(key.str)) != *((*[4]byte)(k.str)) {
+ continue
+ }
+ // check last 4 bytes
+ if *((*[4]byte)(add(key.str, uintptr(key.len)-4))) != *((*[4]byte)(add(k.str, uintptr(key.len)-4))) {
+ continue
+ }
+ if keymaybe != bucketCnt {
+ // Two keys are potential matches. Use hash to distinguish them.
+ goto dohash
+ }
+ keymaybe = i
+ }
+ if keymaybe != bucketCnt {
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+keymaybe*2*goarch.PtrSize))
+ if memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+keymaybe*uintptr(t.ValueSize))
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+ }
+dohash:
+ hash := t.Hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize))
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0])
+}
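
For string keys of 32 bytes or more, the single-bucket path above avoids full memequal calls where it can: it compares lengths, then the string data pointers, then only the first and last 4 bytes, and defers a full comparison until at most one candidate (keymaybe) survives; a second surviving candidate forces the hashed lookup instead. A standalone sketch of the cheap-rejection idea, written against plain strings rather than the bucket layout (likelyEqual is an illustrative name):

package sketch

// likelyEqual shows the prefix/suffix prune used for long keys above, on plain
// strings. It can say "maybe equal" for unequal strings; the runtime resolves
// such survivors with a single memequal or by falling back to the hashed path.
func likelyEqual(a, b string) bool {
	if len(a) != len(b) {
		return false // different lengths can never match
	}
	if len(a) < 8 {
		return a == b // short keys: a direct compare is already cheap
	}
	// Most non-matches differ in their first or last 4 bytes.
	return a[:4] == b[:4] && a[len(a)-4:] == b[len(b)-4:]
}
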
+
+func mapaccess2_faststr(t *maptype, h *hmap, ky string) (unsafe.Pointer, bool) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racereadpc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapaccess2_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map read and map write")
+ }
+ key := stringStructOf(&ky)
+ if h.B == 0 {
+ // One-bucket table.
+ b := (*bmap)(h.buckets)
+ if key.len < 32 {
+ // short key, doing lots of comparisons is ok
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize)), true
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+ // long key, try not to do more comparisons than necessary
+ keymaybe := uintptr(bucketCnt)
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || isEmpty(b.tophash[i]) {
+ if b.tophash[i] == emptyRest {
+ break
+ }
+ continue
+ }
+ if k.str == key.str {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize)), true
+ }
+ // check first 4 bytes
+ if *((*[4]byte)(key.str)) != *((*[4]byte)(k.str)) {
+ continue
+ }
+ // check last 4 bytes
+ if *((*[4]byte)(add(key.str, uintptr(key.len)-4))) != *((*[4]byte)(add(k.str, uintptr(key.len)-4))) {
+ continue
+ }
+ if keymaybe != bucketCnt {
+ // Two keys are potential matches. Use hash to distinguish them.
+ goto dohash
+ }
+ keymaybe = i
+ }
+ if keymaybe != bucketCnt {
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+keymaybe*2*goarch.PtrSize))
+ if memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+keymaybe*uintptr(t.ValueSize)), true
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+ }
+dohash:
+ hash := t.Hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+ m := bucketMask(h.B)
+ b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.BucketSize)))
+ if c := h.oldbuckets; c != nil {
+ if !h.sameSizeGrow() {
+ // There used to be half as many buckets; mask down one more power of two.
+ m >>= 1
+ }
+ oldb := (*bmap)(add(c, (hash&m)*uintptr(t.BucketSize)))
+ if !evacuated(oldb) {
+ b = oldb
+ }
+ }
+ top := tophash(hash)
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str == key.str || memequal(k.str, key.str, uintptr(key.len)) {
+ return add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize)), true
+ }
+ }
+ }
+ return unsafe.Pointer(&zeroVal[0]), false
+}
+
+func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {
+ if h == nil {
+ panic(plainError("assignment to entry in nil map"))
+ }
+ if raceenabled {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapassign_faststr))
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+ key := stringStructOf(&s)
+ hash := t.Hasher(noescape(unsafe.Pointer(&s)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapassign.
+ h.flags ^= hashWriting
+
+ if h.buckets == nil {
+ h.buckets = newobject(t.Bucket) // newarray(t.bucket, 1)
+ }
+
+again:
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_faststr(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ top := tophash(hash)
+
+ var insertb *bmap
+ var inserti uintptr
+ var insertk unsafe.Pointer
+
+bucketloop:
+ for {
+ for i := uintptr(0); i < bucketCnt; i++ {
+ if b.tophash[i] != top {
+ if isEmpty(b.tophash[i]) && insertb == nil {
+ insertb = b
+ inserti = i
+ }
+ if b.tophash[i] == emptyRest {
+ break bucketloop
+ }
+ continue
+ }
+ k := (*stringStruct)(add(unsafe.Pointer(b), dataOffset+i*2*goarch.PtrSize))
+ if k.len != key.len {
+ continue
+ }
+ if k.str != key.str && !memequal(k.str, key.str, uintptr(key.len)) {
+ continue
+ }
+ // already have a mapping for key. Update it.
+ inserti = i
+ insertb = b
+ // Overwrite existing key, so it can be garbage collected.
+ // The size is already guaranteed to be set correctly.
+ k.str = key.str
+ goto done
+ }
+ ovf := b.overflow(t)
+ if ovf == nil {
+ break
+ }
+ b = ovf
+ }
+
+ // Did not find mapping for key. Allocate new cell & add entry.
+
+ // If we hit the max load factor or we have too many overflow buckets,
+ // and we're not already in the middle of growing, start growing.
+ if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
+ hashGrow(t, h)
+ goto again // Growing the table invalidates everything, so try again
+ }
+
+ if insertb == nil {
+ // The current bucket and all the overflow buckets connected to it are full; allocate a new one.
+ insertb = h.newoverflow(t, b)
+ inserti = 0 // not necessary, but avoids needlessly spilling inserti
+ }
+ insertb.tophash[inserti&(bucketCnt-1)] = top // mask inserti to avoid bounds checks
+
+ insertk = add(unsafe.Pointer(insertb), dataOffset+inserti*2*goarch.PtrSize)
+ // store new key at insert position
+ *((*stringStruct)(insertk)) = *key
+ h.count++
+
+done:
+ elem := add(unsafe.Pointer(insertb), dataOffset+bucketCnt*2*goarch.PtrSize+inserti*uintptr(t.ValueSize))
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+ return elem
+}
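
mapassign_faststr relies on the same writer-detection trick as the generic mapassign: it XORs hashWriting on at entry and, at the done label, requires the bit to still be set before clearing it, so a racing writer that toggled the bit in between is reported as "concurrent map writes". A sketch of that toggle-and-verify pattern under assumed names (mapState, beginWrite, endWrite; the real code calls fatal rather than panic):

package sketch

const hashWriting = 1 << 2 // assumed bit position, mirroring the flag used above

type mapState struct{ flags uint8 }

// beginWrite marks a write in progress; a bit that is already set means
// another writer is active right now.
func beginWrite(s *mapState) {
	if s.flags&hashWriting != 0 {
		panic("concurrent map writes")
	}
	s.flags ^= hashWriting
}

// endWrite verifies the bit survived the write before clearing it; if a racing
// writer toggled it, the state is corrupt and the race is reported.
func endWrite(s *mapState) {
	if s.flags&hashWriting == 0 {
		panic("concurrent map writes")
	}
	s.flags &^= hashWriting
}
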
+
+func mapdelete_faststr(t *maptype, h *hmap, ky string) {
+ if raceenabled && h != nil {
+ callerpc := getcallerpc()
+ racewritepc(unsafe.Pointer(h), callerpc, abi.FuncPCABIInternal(mapdelete_faststr))
+ }
+ if h == nil || h.count == 0 {
+ return
+ }
+ if h.flags&hashWriting != 0 {
+ fatal("concurrent map writes")
+ }
+
+ key := stringStructOf(&ky)
+ hash := t.Hasher(noescape(unsafe.Pointer(&ky)), uintptr(h.hash0))
+
+ // Set hashWriting after calling t.hasher for consistency with mapdelete
+ h.flags ^= hashWriting
+
+ bucket := hash & bucketMask(h.B)
+ if h.growing() {
+ growWork_faststr(t, h, bucket)
+ }
+ b := (*bmap)(add(h.buckets, bucket*uintptr(t.BucketSize)))
+ bOrig := b
+ top := tophash(hash)
+search:
+ for ; b != nil; b = b.overflow(t) {
+ for i, kptr := uintptr(0), b.keys(); i < bucketCnt; i, kptr = i+1, add(kptr, 2*goarch.PtrSize) {
+ k := (*stringStruct)(kptr)
+ if k.len != key.len || b.tophash[i] != top {
+ continue
+ }
+ if k.str != key.str && !memequal(k.str, key.str, uintptr(key.len)) {
+ continue
+ }
+ // Clear key's pointer.
+ k.str = nil
+ e := add(unsafe.Pointer(b), dataOffset+bucketCnt*2*goarch.PtrSize+i*uintptr(t.ValueSize))
+ if t.Elem.PtrBytes != 0 {
+ memclrHasPointers(e, t.Elem.Size_)
+ } else {
+ memclrNoHeapPointers(e, t.Elem.Size_)
+ }
+ b.tophash[i] = emptyOne
+ // If the bucket now ends in a bunch of emptyOne states,
+ // change those to emptyRest states.
+ if i == bucketCnt-1 {
+ if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {
+ goto notLast
+ }
+ } else {
+ if b.tophash[i+1] != emptyRest {
+ goto notLast
+ }
+ }
+ for {
+ b.tophash[i] = emptyRest
+ if i == 0 {
+ if b == bOrig {
+ break // beginning of initial bucket, we're done.
+ }
+ // Find previous bucket, continue at its last entry.
+ c := b
+ for b = bOrig; b.overflow(t) != c; b = b.overflow(t) {
+ }
+ i = bucketCnt - 1
+ } else {
+ i--
+ }
+ if b.tophash[i] != emptyOne {
+ break
+ }
+ }
+ notLast:
+ h.count--
+ // Reset the hash seed to make it more difficult for attackers to
+ // repeatedly trigger hash collisions. See issue 25237.
+ if h.count == 0 {
+ h.hash0 = fastrand()
+ }
+ break search
+ }
+ }
+
+ if h.flags&hashWriting == 0 {
+ fatal("concurrent map writes")
+ }
+ h.flags &^= hashWriting
+}
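
The deletion path above does more than mark the slot emptyOne: when the deleted slot turns out to be the last occupied one before an emptyRest region, it walks backwards and upgrades the whole trailing run of emptyOne slots to emptyRest, which lets later lookups stop early. A sketch of that back-propagation on a single flat tophash slice (markEmptyRest is an illustrative helper; the real code also walks back across overflow buckets):

package sketch

const (
	emptyRest = 0 // slot is empty, and so is everything after it in this chain
	emptyOne  = 1 // slot is empty, later slots may still be in use
)

// markEmptyRest is called right after slot i has been set to emptyOne. If the
// following slot is already emptyRest (or i is the last slot), it converts the
// trailing run of emptyOne slots ending at i into emptyRest.
func markEmptyRest(tophash []uint8, i int) {
	if i+1 < len(tophash) && tophash[i+1] != emptyRest {
		return // a later slot is still in use; nothing to upgrade
	}
	for i >= 0 && tophash[i] == emptyOne {
		tophash[i] = emptyRest
		i--
	}
}
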
+
+func growWork_faststr(t *maptype, h *hmap, bucket uintptr) {
+ // make sure we evacuate the oldbucket corresponding
+ // to the bucket we're about to use
+ evacuate_faststr(t, h, bucket&h.oldbucketmask())
+
+ // evacuate one more oldbucket to make progress on growing
+ if h.growing() {
+ evacuate_faststr(t, h, h.nevacuate)
+ }
+}
+
+func evacuate_faststr(t *maptype, h *hmap, oldbucket uintptr) {
+ b := (*bmap)(add(h.oldbuckets, oldbucket*uintptr(t.BucketSize)))
+ newbit := h.noldbuckets()
+ if !evacuated(b) {
+ // TODO: reuse overflow buckets instead of using new ones, if there
+ // is no iterator using the old buckets. (If !oldIterator.)
+
+ // xy contains the x and y (low and high) evacuation destinations.
+ var xy [2]evacDst
+ x := &xy[0]
+ x.b = (*bmap)(add(h.buckets, oldbucket*uintptr(t.BucketSize)))
+ x.k = add(unsafe.Pointer(x.b), dataOffset)
+ x.e = add(x.k, bucketCnt*2*goarch.PtrSize)
+
+ if !h.sameSizeGrow() {
+ // Only calculate y pointers if we're growing bigger.
+ // Otherwise GC can see bad pointers.
+ y := &xy[1]
+ y.b = (*bmap)(add(h.buckets, (oldbucket+newbit)*uintptr(t.BucketSize)))
+ y.k = add(unsafe.Pointer(y.b), dataOffset)
+ y.e = add(y.k, bucketCnt*2*goarch.PtrSize)
+ }
+
+ for ; b != nil; b = b.overflow(t) {
+ k := add(unsafe.Pointer(b), dataOffset)
+ e := add(k, bucketCnt*2*goarch.PtrSize)
+ for i := 0; i < bucketCnt; i, k, e = i+1, add(k, 2*goarch.PtrSize), add(e, uintptr(t.ValueSize)) {
+ top := b.tophash[i]
+ if isEmpty(top) {
+ b.tophash[i] = evacuatedEmpty
+ continue
+ }
+ if top < minTopHash {
+ throw("bad map state")
+ }
+ var useY uint8
+ if !h.sameSizeGrow() {
+ // Compute hash to make our evacuation decision (whether we need
+ // to send this key/elem to bucket x or bucket y).
+ hash := t.Hasher(k, uintptr(h.hash0))
+ if hash&newbit != 0 {
+ useY = 1
+ }
+ }
+
+ b.tophash[i] = evacuatedX + useY // evacuatedX + 1 == evacuatedY, enforced in makemap
+ dst := &xy[useY] // evacuation destination
+
+ if dst.i == bucketCnt {
+ dst.b = h.newoverflow(t, dst.b)
+ dst.i = 0
+ dst.k = add(unsafe.Pointer(dst.b), dataOffset)
+ dst.e = add(dst.k, bucketCnt*2*goarch.PtrSize)
+ }
+ dst.b.tophash[dst.i&(bucketCnt-1)] = top // mask dst.i as an optimization, to avoid a bounds check
+
+ // Copy key.
+ *(*string)(dst.k) = *(*string)(k)
+
+ typedmemmove(t.Elem, dst.e, e)
+ dst.i++
+ // These updates might push these pointers past the end of the
+ // key or elem arrays. That's ok, as we have the overflow pointer
+ // at the end of the bucket to protect against pointing past the
+ // end of the bucket.
+ dst.k = add(dst.k, 2*goarch.PtrSize)
+ dst.e = add(dst.e, uintptr(t.ValueSize))
+ }
+ }
+ // Unlink the overflow buckets & clear key/elem to help GC.
+ if h.flags&oldIterator == 0 && t.Bucket.PtrBytes != 0 {
+ b := add(h.oldbuckets, oldbucket*uintptr(t.BucketSize))
+ // Preserve b.tophash because the evacuation
+ // state is maintained there.
+ ptr := add(b, dataOffset)
+ n := uintptr(t.BucketSize) - dataOffset
+ memclrHasPointers(ptr, n)
+ }
+ }
+
+ if oldbucket == h.nevacuate {
+ advanceEvacuationMark(h, t, newbit)
+ }
+}
diff --git a/src/runtime/map_test.go b/src/runtime/map_test.go
new file mode 100644
index 0000000..3675106
--- /dev/null
+++ b/src/runtime/map_test.go
@@ -0,0 +1,1258 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/abi"
+ "internal/goarch"
+ "math"
+ "reflect"
+ "runtime"
+ "sort"
+ "strconv"
+ "strings"
+ "sync"
+ "testing"
+)
+
+func TestHmapSize(t *testing.T) {
+ // The structure of hmap is defined in runtime/map.go
+ // and in cmd/compile/internal/gc/reflect.go and must be in sync.
+ // The size of hmap should be 48 bytes on 64 bit and 28 bytes on 32 bit platforms.
+ var hmapSize = uintptr(8 + 5*goarch.PtrSize)
+ if runtime.RuntimeHmapSize != hmapSize {
+ t.Errorf("sizeof(runtime.hmap{})==%d, want %d", runtime.RuntimeHmapSize, hmapSize)
+ }
+
+}
+
+// negative zero is a good test because:
+// 1. 0 and -0 are equal, yet have distinct representations.
+// 2. 0 is represented as all zeros, -0 isn't.
+//
+// I'm not sure the language spec actually requires this behavior,
+// but it's what the current map implementation does.
+func TestNegativeZero(t *testing.T) {
+ m := make(map[float64]bool, 0)
+
+ m[+0.0] = true
+ m[math.Copysign(0.0, -1.0)] = true // should overwrite +0 entry
+
+ if len(m) != 1 {
+ t.Error("length wrong")
+ }
+
+ for k := range m {
+ if math.Copysign(1.0, k) > 0 {
+ t.Error("wrong sign")
+ }
+ }
+
+ m = make(map[float64]bool, 0)
+ m[math.Copysign(0.0, -1.0)] = true
+ m[+0.0] = true // should overwrite -0.0 entry
+
+ if len(m) != 1 {
+ t.Error("length wrong")
+ }
+
+ for k := range m {
+ if math.Copysign(1.0, k) < 0 {
+ t.Error("wrong sign")
+ }
+ }
+}
+
+func testMapNan(t *testing.T, m map[float64]int) {
+ if len(m) != 3 {
+ t.Error("length wrong")
+ }
+ s := 0
+ for k, v := range m {
+ if k == k {
+ t.Error("nan disappeared")
+ }
+ if (v & (v - 1)) != 0 {
+ t.Error("value wrong")
+ }
+ s |= v
+ }
+ if s != 7 {
+ t.Error("values wrong")
+ }
+}
+
+// nan is a good test because nan != nan, and nan has
+// a randomized hash value.
+func TestMapAssignmentNan(t *testing.T) {
+ m := make(map[float64]int, 0)
+ nan := math.NaN()
+
+ // Test assignment.
+ m[nan] = 1
+ m[nan] = 2
+ m[nan] = 4
+ testMapNan(t, m)
+}
+
+// nan is a good test because nan != nan, and nan has
+// a randomized hash value.
+func TestMapOperatorAssignmentNan(t *testing.T) {
+ m := make(map[float64]int, 0)
+ nan := math.NaN()
+
+ // Test assignment operations.
+ m[nan] += 1
+ m[nan] += 2
+ m[nan] += 4
+ testMapNan(t, m)
+}
+
+func TestMapOperatorAssignment(t *testing.T) {
+ m := make(map[int]int, 0)
+
+ // "m[k] op= x" is rewritten into "m[k] = m[k] op x"
+ // differently when op is / or % than when it isn't.
+ // Simple test to make sure they all work as expected.
+ m[0] = 12345
+ m[0] += 67890
+ m[0] /= 123
+ m[0] %= 456
+
+ const want = (12345 + 67890) / 123 % 456
+ if got := m[0]; got != want {
+ t.Errorf("got %d, want %d", got, want)
+ }
+}
+
+var sinkAppend bool
+
+func TestMapAppendAssignment(t *testing.T) {
+ m := make(map[int][]int, 0)
+
+ m[0] = nil
+ m[0] = append(m[0], 12345)
+ m[0] = append(m[0], 67890)
+ sinkAppend, m[0] = !sinkAppend, append(m[0], 123, 456)
+ a := []int{7, 8, 9, 0}
+ m[0] = append(m[0], a...)
+
+ want := []int{12345, 67890, 123, 456, 7, 8, 9, 0}
+ if got := m[0]; !reflect.DeepEqual(got, want) {
+ t.Errorf("got %v, want %v", got, want)
+ }
+}
+
+// Maps aren't actually copied on assignment.
+func TestAlias(t *testing.T) {
+ m := make(map[int]int, 0)
+ m[0] = 5
+ n := m
+ n[0] = 6
+ if m[0] != 6 {
+ t.Error("alias didn't work")
+ }
+}
+
+func TestGrowWithNaN(t *testing.T) {
+ m := make(map[float64]int, 4)
+ nan := math.NaN()
+
+ // Use both assignment and assignment operations as they may
+ // behave differently.
+ m[nan] = 1
+ m[nan] = 2
+ m[nan] += 4
+
+ cnt := 0
+ s := 0
+ growflag := true
+ for k, v := range m {
+ if growflag {
+ // force a hashtable resize
+ for i := 0; i < 50; i++ {
+ m[float64(i)] = i
+ }
+ for i := 50; i < 100; i++ {
+ m[float64(i)] += i
+ }
+ growflag = false
+ }
+ if k != k {
+ cnt++
+ s |= v
+ }
+ }
+ if cnt != 3 {
+ t.Error("NaN keys lost during grow")
+ }
+ if s != 7 {
+ t.Error("NaN values lost during grow")
+ }
+}
+
+type FloatInt struct {
+ x float64
+ y int
+}
+
+func TestGrowWithNegativeZero(t *testing.T) {
+ negzero := math.Copysign(0.0, -1.0)
+ m := make(map[FloatInt]int, 4)
+ m[FloatInt{0.0, 0}] = 1
+ m[FloatInt{0.0, 1}] += 2
+ m[FloatInt{0.0, 2}] += 4
+ m[FloatInt{0.0, 3}] = 8
+ growflag := true
+ s := 0
+ cnt := 0
+ negcnt := 0
+ // The first iteration should return the +0 key.
+ // The subsequent iterations should return the -0 key.
+ // I'm not really sure this is required by the spec,
+ // but it makes sense.
+ // TODO: are we allowed to get the first entry returned again???
+ for k, v := range m {
+ if v == 0 {
+ continue
+ } // ignore entries added to grow table
+ cnt++
+ if math.Copysign(1.0, k.x) < 0 {
+ if v&16 == 0 {
+ t.Error("key/value not updated together 1")
+ }
+ negcnt++
+ s |= v & 15
+ } else {
+ if v&16 == 16 {
+ t.Error("key/value not updated together 2", k, v)
+ }
+ s |= v
+ }
+ if growflag {
+ // force a hashtable resize
+ for i := 0; i < 100; i++ {
+ m[FloatInt{3.0, i}] = 0
+ }
+ // then change all the entries
+ // to negative zero
+ m[FloatInt{negzero, 0}] = 1 | 16
+ m[FloatInt{negzero, 1}] = 2 | 16
+ m[FloatInt{negzero, 2}] = 4 | 16
+ m[FloatInt{negzero, 3}] = 8 | 16
+ growflag = false
+ }
+ }
+ if s != 15 {
+ t.Error("entry missing", s)
+ }
+ if cnt != 4 {
+ t.Error("wrong number of entries returned by iterator", cnt)
+ }
+ if negcnt != 3 {
+ t.Error("update to negzero missed by iteration", negcnt)
+ }
+}
+
+func TestIterGrowAndDelete(t *testing.T) {
+ m := make(map[int]int, 4)
+ for i := 0; i < 100; i++ {
+ m[i] = i
+ }
+ growflag := true
+ for k := range m {
+ if growflag {
+ // grow the table
+ for i := 100; i < 1000; i++ {
+ m[i] = i
+ }
+ // delete all odd keys
+ for i := 1; i < 1000; i += 2 {
+ delete(m, i)
+ }
+ growflag = false
+ } else {
+ if k&1 == 1 {
+ t.Error("odd value returned")
+ }
+ }
+ }
+}
+
+// make sure old bucket arrays don't get GCd while
+// an iterator is still using them.
+func TestIterGrowWithGC(t *testing.T) {
+ m := make(map[int]int, 4)
+ for i := 0; i < 8; i++ {
+ m[i] = i
+ }
+ for i := 8; i < 16; i++ {
+ m[i] += i
+ }
+ growflag := true
+ bitmask := 0
+ for k := range m {
+ if k < 16 {
+ bitmask |= 1 << uint(k)
+ }
+ if growflag {
+ // grow the table
+ for i := 100; i < 1000; i++ {
+ m[i] = i
+ }
+ // trigger a gc
+ runtime.GC()
+ growflag = false
+ }
+ }
+ if bitmask != 1<<16-1 {
+ t.Error("missing key", bitmask)
+ }
+}
+
+func testConcurrentReadsAfterGrowth(t *testing.T, useReflect bool) {
+ t.Parallel()
+ if runtime.GOMAXPROCS(-1) == 1 {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(16))
+ }
+ numLoop := 10
+ numGrowStep := 250
+ numReader := 16
+ if testing.Short() {
+ numLoop, numGrowStep = 2, 100
+ }
+ for i := 0; i < numLoop; i++ {
+ m := make(map[int]int, 0)
+ for gs := 0; gs < numGrowStep; gs++ {
+ m[gs] = gs
+ var wg sync.WaitGroup
+ wg.Add(numReader * 2)
+ for nr := 0; nr < numReader; nr++ {
+ go func() {
+ defer wg.Done()
+ for range m {
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ for key := 0; key < gs; key++ {
+ _ = m[key]
+ }
+ }()
+ if useReflect {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ mv := reflect.ValueOf(m)
+ keys := mv.MapKeys()
+ for _, k := range keys {
+ mv.MapIndex(k)
+ }
+ }()
+ }
+ }
+ wg.Wait()
+ }
+ }
+}
+
+func TestConcurrentReadsAfterGrowth(t *testing.T) {
+ testConcurrentReadsAfterGrowth(t, false)
+}
+
+func TestConcurrentReadsAfterGrowthReflect(t *testing.T) {
+ testConcurrentReadsAfterGrowth(t, true)
+}
+
+func TestBigItems(t *testing.T) {
+ var key [256]string
+ for i := 0; i < 256; i++ {
+ key[i] = "foo"
+ }
+ m := make(map[[256]string][256]string, 4)
+ for i := 0; i < 100; i++ {
+ key[37] = fmt.Sprintf("string%02d", i)
+ m[key] = key
+ }
+ var keys [100]string
+ var values [100]string
+ i := 0
+ for k, v := range m {
+ keys[i] = k[37]
+ values[i] = v[37]
+ i++
+ }
+ sort.Strings(keys[:])
+ sort.Strings(values[:])
+ for i := 0; i < 100; i++ {
+ if keys[i] != fmt.Sprintf("string%02d", i) {
+ t.Errorf("#%d: missing key: %v", i, keys[i])
+ }
+ if values[i] != fmt.Sprintf("string%02d", i) {
+ t.Errorf("#%d: missing value: %v", i, values[i])
+ }
+ }
+}
+
+func TestMapHugeZero(t *testing.T) {
+ type T [4000]byte
+ m := map[int]T{}
+ x := m[0]
+ if x != (T{}) {
+ t.Errorf("map value not zero")
+ }
+ y, ok := m[0]
+ if ok {
+ t.Errorf("map value should be missing")
+ }
+ if y != (T{}) {
+ t.Errorf("map value not zero")
+ }
+}
+
+type empty struct {
+}
+
+func TestEmptyKeyAndValue(t *testing.T) {
+ a := make(map[int]empty, 4)
+ b := make(map[empty]int, 4)
+ c := make(map[empty]empty, 4)
+ a[0] = empty{}
+ b[empty{}] = 0
+ b[empty{}] = 1
+ c[empty{}] = empty{}
+
+ if len(a) != 1 {
+ t.Errorf("empty value insert problem")
+ }
+ if b[empty{}] != 1 {
+ t.Errorf("empty key returned wrong value")
+ }
+}
+
+// Tests a map with a single bucket, with same-length short keys
+// ("quick keys") as well as long keys.
+func TestSingleBucketMapStringKeys_DupLen(t *testing.T) {
+ testMapLookups(t, map[string]string{
+ "x": "x1val",
+ "xx": "x2val",
+ "foo": "fooval",
+ "bar": "barval", // same key length as "foo"
+ "xxxx": "x4val",
+ strings.Repeat("x", 128): "longval1",
+ strings.Repeat("y", 128): "longval2",
+ })
+}
+
+// Tests a map with a single bucket, with all keys having different lengths.
+func TestSingleBucketMapStringKeys_NoDupLen(t *testing.T) {
+ testMapLookups(t, map[string]string{
+ "x": "x1val",
+ "xx": "x2val",
+ "foo": "fooval",
+ "xxxx": "x4val",
+ "xxxxx": "x5val",
+ "xxxxxx": "x6val",
+ strings.Repeat("x", 128): "longval",
+ })
+}
+
+func testMapLookups(t *testing.T, m map[string]string) {
+ for k, v := range m {
+ if m[k] != v {
+ t.Fatalf("m[%q] = %q; want %q", k, m[k], v)
+ }
+ }
+}
+
+// Tests whether the iterator returns the right elements when
+// started in the middle of a grow, when the keys are NaNs.
+func TestMapNanGrowIterator(t *testing.T) {
+ m := make(map[float64]int)
+ nan := math.NaN()
+ const nBuckets = 16
+ // To fill nBuckets buckets takes LOAD * nBuckets keys.
+ nKeys := int(nBuckets * runtime.HashLoad)
+
+ // Get map to full point with nan keys.
+ for i := 0; i < nKeys; i++ {
+ m[nan] = i
+ }
+ // Trigger grow
+ m[1.0] = 1
+ delete(m, 1.0)
+
+ // Run iterator
+ found := make(map[int]struct{})
+ for _, v := range m {
+ if v != -1 {
+ if _, repeat := found[v]; repeat {
+ t.Fatalf("repeat of value %d", v)
+ }
+ found[v] = struct{}{}
+ }
+ if len(found) == nKeys/2 {
+ // Halfway through iteration, finish grow.
+ for i := 0; i < nBuckets; i++ {
+ delete(m, 1.0)
+ }
+ }
+ }
+ if len(found) != nKeys {
+ t.Fatalf("missing value")
+ }
+}
+
+func TestMapIterOrder(t *testing.T) {
+ sizes := []int{3, 7, 9, 15}
+ if abi.MapBucketCountBits >= 5 {
+ // it gets flaky (often only one iteration order) at size 3 when abi.MapBucketCountBits >=5.
+ t.Fatalf("This test becomes flaky if abi.MapBucketCountBits(=%d) is 5 or larger", abi.MapBucketCountBits)
+ }
+ for _, n := range sizes {
+ for i := 0; i < 1000; i++ {
+ // Make m be {0: true, 1: true, ..., n-1: true}.
+ m := make(map[int]bool)
+ for i := 0; i < n; i++ {
+ m[i] = true
+ }
+ // Check that iterating over the map produces at least two different orderings.
+ ord := func() []int {
+ var s []int
+ for key := range m {
+ s = append(s, key)
+ }
+ return s
+ }
+ first := ord()
+ ok := false
+ for try := 0; try < 100; try++ {
+ if !reflect.DeepEqual(first, ord()) {
+ ok = true
+ break
+ }
+ }
+ if !ok {
+ t.Errorf("Map with n=%d elements had consistent iteration order: %v", n, first)
+ break
+ }
+ }
+ }
+}
+
+// Issue 8410
+func TestMapSparseIterOrder(t *testing.T) {
+ // Run several rounds to increase the probability
+ // of failure. One is not enough.
+NextRound:
+ for round := 0; round < 10; round++ {
+ m := make(map[int]bool)
+ // Add 1000 items, remove 980.
+ for i := 0; i < 1000; i++ {
+ m[i] = true
+ }
+ for i := 20; i < 1000; i++ {
+ delete(m, i)
+ }
+
+ var first []int
+ for i := range m {
+ first = append(first, i)
+ }
+
+ // 800 chances to get a different iteration order.
+ // See bug 8736 for why we need so many tries.
+ for n := 0; n < 800; n++ {
+ idx := 0
+ for i := range m {
+ if i != first[idx] {
+ // iteration order changed.
+ continue NextRound
+ }
+ idx++
+ }
+ }
+ t.Fatalf("constant iteration order on round %d: %v", round, first)
+ }
+}
+
+func TestMapStringBytesLookup(t *testing.T) {
+ // Use large string keys to avoid small-allocation coalescing,
+ // which can cause AllocsPerRun to report lower counts than it should.
+ m := map[string]int{
+ "1000000000000000000000000000000000000000000000000": 1,
+ "2000000000000000000000000000000000000000000000000": 2,
+ }
+ buf := []byte("1000000000000000000000000000000000000000000000000")
+ if x := m[string(buf)]; x != 1 {
+ t.Errorf(`m[string([]byte("1"))] = %d, want 1`, x)
+ }
+ buf[0] = '2'
+ if x := m[string(buf)]; x != 2 {
+ t.Errorf(`m[string([]byte("2"))] = %d, want 2`, x)
+ }
+
+ var x int
+ n := testing.AllocsPerRun(100, func() {
+ x += m[string(buf)]
+ })
+ if n != 0 {
+ t.Errorf("AllocsPerRun for m[string(buf)] = %v, want 0", n)
+ }
+
+ x = 0
+ n = testing.AllocsPerRun(100, func() {
+ y, ok := m[string(buf)]
+ if !ok {
+ panic("!ok")
+ }
+ x += y
+ })
+ if n != 0 {
+ t.Errorf("AllocsPerRun for x,ok = m[string(buf)] = %v, want 0", n)
+ }
+}
+
+func TestMapLargeKeyNoPointer(t *testing.T) {
+ const (
+ I = 1000
+ N = 64
+ )
+ type T [N]int
+ m := make(map[T]int)
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ m[v] = i
+ }
+ runtime.GC()
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ if m[v] != i {
+ t.Fatalf("corrupted map: want %+v, got %+v", i, m[v])
+ }
+ }
+}
+
+func TestMapLargeValNoPointer(t *testing.T) {
+ const (
+ I = 1000
+ N = 64
+ )
+ type T [N]int
+ m := make(map[int]T)
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ m[i] = v
+ }
+ runtime.GC()
+ for i := 0; i < I; i++ {
+ var v T
+ for j := 0; j < N; j++ {
+ v[j] = i + j
+ }
+ v1 := m[i]
+ for j := 0; j < N; j++ {
+ if v1[j] != v[j] {
+ t.Fatalf("corrupted map: want %+v, got %+v", v, v1)
+ }
+ }
+ }
+}
+
+// Test that making a map with a large or invalid hint
+// doesn't panic. (Issue 19926).
+func TestIgnoreBogusMapHint(t *testing.T) {
+ for _, hint := range []int64{-1, 1 << 62} {
+ _ = make(map[int]int, hint)
+ }
+}
+
+const bs = abi.MapBucketCount
+
+// belowOverflow should be a pretty-full pair of buckets;
+// atOverflow is bs/8 larger, i.e. 13/8 of a bucket's worth of entries, or two
+// buckets that are each 13/16 full, which is exactly the overflow boundary.
+// Adding one to that should ensure overflow to the next higher size.
+const (
+ belowOverflow = bs * 3 / 2 // 1.5 bs = 2 buckets @ 75%
+ atOverflow = belowOverflow + bs/8 // 2 buckets at 13/16 fill.
+)
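
Concretely, with the default bucket size of 8 (abi.MapBucketCount in ordinary builds), belowOverflow is 8*3/2 = 12 and atOverflow is 12 + 8/8 = 13. The load-factor limit is 6.5 entries per bucket, so 13 entries spread over two buckets sit exactly at the boundary, and the table entry {atOverflow + 1, 4, 4} expects the 14th insertion to double the table to 4 buckets. The same arithmetic, spelled out (values assume bs = 8):

package sketch

const (
	bs            = 8          // default bucket size assumed for this example
	belowOverflow = bs * 3 / 2 // 12 entries: two buckets at 75% load
	atOverflow    = belowOverflow + bs/8
	// atOverflow == 13: exactly 6.5 entries per bucket across two buckets,
	// the boundary beyond which the next insertion forces growth to 4 buckets.
)
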
+
+var mapBucketTests = [...]struct {
+ n int // n is the number of map elements
+ noescape int // number of expected buckets for non-escaping map
+ escape int // number of expected buckets for escaping map
+}{
+ {-(1 << 30), 1, 1},
+ {-1, 1, 1},
+ {0, 1, 1},
+ {1, 1, 1},
+ {bs, 1, 1},
+ {bs + 1, 2, 2},
+ {belowOverflow, 2, 2}, // 1.5 bs = 2 buckets @ 75%
+ {atOverflow + 1, 4, 4}, // 13/8 bs + 1 == overflow to 4
+
+ {2 * belowOverflow, 4, 4}, // 3 bs = 4 buckets @75%
+ {2*atOverflow + 1, 8, 8}, // 13/4 bs + 1 = overflow to 8
+
+ {4 * belowOverflow, 8, 8}, // 6 bs = 8 buckets @ 75%
+ {4*atOverflow + 1, 16, 16}, // 13/2 bs + 1 = overflow to 16
+}
+
+func TestMapBuckets(t *testing.T) {
+ // Test that maps of different sizes have the right number of buckets.
+ // Non-escaping maps with small buckets (like map[int]int) never
+ // have a nil bucket pointer due to starting with preallocated buckets
+ // on the stack. Escaping maps start with a non-nil bucket pointer if
+ // hint size is above bucketCnt and thereby have more than one bucket.
+ // These tests depend on bucketCnt and loadFactor* in map.go.
+ t.Run("mapliteral", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := map[int]int{}
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := runtime.Escape(map[int]int{})
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ }
+ })
+ t.Run("nohint", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int)
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := runtime.Escape(make(map[int]int))
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ }
+ })
+ t.Run("makemap", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int, tt.n)
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := runtime.Escape(make(map[int]int, tt.n))
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ }
+ })
+ t.Run("makemap64", func(t *testing.T) {
+ for _, tt := range mapBucketTests {
+ localMap := make(map[int]int, int64(tt.n))
+ if runtime.MapBucketsPointerIsNil(localMap) {
+ t.Errorf("no escape: buckets pointer is nil for non-escaping map")
+ }
+ for i := 0; i < tt.n; i++ {
+ localMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(localMap); got != tt.noescape {
+ t.Errorf("no escape: n=%d want %d buckets, got %d", tt.n, tt.noescape, got)
+ }
+ escapingMap := runtime.Escape(make(map[int]int, tt.n))
+ if count := runtime.MapBucketsCount(escapingMap); count > 1 && runtime.MapBucketsPointerIsNil(escapingMap) {
+ t.Errorf("escape: buckets pointer is nil for n=%d buckets", count)
+ }
+ for i := 0; i < tt.n; i++ {
+ escapingMap[i] = i
+ }
+ if got := runtime.MapBucketsCount(escapingMap); got != tt.escape {
+ t.Errorf("escape: n=%d want %d buckets, got %d", tt.n, tt.escape, got)
+ }
+ }
+ })
+
+}
+
+func benchmarkMapPop(b *testing.B, n int) {
+ m := map[int]int{}
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < n; j++ {
+ m[j] = j
+ }
+ for j := 0; j < n; j++ {
+ // Use iterator to pop an element.
+ // We want this to be fast, see issue 8412.
+ for k := range m {
+ delete(m, k)
+ break
+ }
+ }
+ }
+}
+
+func BenchmarkMapPop100(b *testing.B) { benchmarkMapPop(b, 100) }
+func BenchmarkMapPop1000(b *testing.B) { benchmarkMapPop(b, 1000) }
+func BenchmarkMapPop10000(b *testing.B) { benchmarkMapPop(b, 10000) }
+
+var testNonEscapingMapVariable int = 8
+
+func TestNonEscapingMap(t *testing.T) {
+ n := testing.AllocsPerRun(1000, func() {
+ m := map[int]int{}
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("mapliteral: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("no hint: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int, 8)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("with small hint: want 0 allocs, got %v", n)
+ }
+ n = testing.AllocsPerRun(1000, func() {
+ m := make(map[int]int, testNonEscapingMapVariable)
+ m[0] = 0
+ })
+ if n != 0 {
+ t.Fatalf("with variable hint: want 0 allocs, got %v", n)
+ }
+
+}
+
+func benchmarkMapAssignInt32(b *testing.B, n int) {
+ a := make(map[int32]int)
+ for i := 0; i < b.N; i++ {
+ a[int32(i&(n-1))] = i
+ }
+}
+
+func benchmarkMapOperatorAssignInt32(b *testing.B, n int) {
+ a := make(map[int32]int)
+ for i := 0; i < b.N; i++ {
+ a[int32(i&(n-1))] += i
+ }
+}
+
+func benchmarkMapAppendAssignInt32(b *testing.B, n int) {
+ a := make(map[int32][]int)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := int32(i & (n - 1))
+ a[key] = append(a[key], i)
+ }
+}
+
+func benchmarkMapDeleteInt32(b *testing.B, n int) {
+ a := make(map[int32]int, n)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := i; j < i+n; j++ {
+ a[int32(j)] = j
+ }
+ b.StartTimer()
+ }
+ delete(a, int32(i))
+ }
+}
+
+func benchmarkMapAssignInt64(b *testing.B, n int) {
+ a := make(map[int64]int)
+ for i := 0; i < b.N; i++ {
+ a[int64(i&(n-1))] = i
+ }
+}
+
+func benchmarkMapOperatorAssignInt64(b *testing.B, n int) {
+ a := make(map[int64]int)
+ for i := 0; i < b.N; i++ {
+ a[int64(i&(n-1))] += i
+ }
+}
+
+func benchmarkMapAppendAssignInt64(b *testing.B, n int) {
+ a := make(map[int64][]int)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := int64(i & (n - 1))
+ a[key] = append(a[key], i)
+ }
+}
+
+func benchmarkMapDeleteInt64(b *testing.B, n int) {
+ a := make(map[int64]int, n)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := i; j < i+n; j++ {
+ a[int64(j)] = j
+ }
+ b.StartTimer()
+ }
+ delete(a, int64(i))
+ }
+}
+
+func benchmarkMapAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ b.ResetTimer()
+ a := make(map[string]int)
+ for i := 0; i < b.N; i++ {
+ a[k[i&(n-1)]] = i
+ }
+}
+
+func benchmarkMapOperatorAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ b.ResetTimer()
+ a := make(map[string]string)
+ for i := 0; i < b.N; i++ {
+ key := k[i&(n-1)]
+ a[key] += key
+ }
+}
+
+func benchmarkMapAppendAssignStr(b *testing.B, n int) {
+ k := make([]string, n)
+ for i := 0; i < len(k); i++ {
+ k[i] = strconv.Itoa(i)
+ }
+ a := make(map[string][]string)
+ b.ReportAllocs()
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ key := k[i&(n-1)]
+ a[key] = append(a[key], key)
+ }
+}
+
+func benchmarkMapDeleteStr(b *testing.B, n int) {
+ i2s := make([]string, n)
+ for i := 0; i < n; i++ {
+ i2s[i] = strconv.Itoa(i)
+ }
+ a := make(map[string]int, n)
+ b.ResetTimer()
+ k := 0
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := 0; j < n; j++ {
+ a[i2s[j]] = j
+ }
+ k = i
+ b.StartTimer()
+ }
+ delete(a, i2s[i-k])
+ }
+}
+
+func benchmarkMapDeletePointer(b *testing.B, n int) {
+ i2p := make([]*int, n)
+ for i := 0; i < n; i++ {
+ i2p[i] = new(int)
+ }
+ a := make(map[*int]int, n)
+ b.ResetTimer()
+ k := 0
+ for i := 0; i < b.N; i++ {
+ if len(a) == 0 {
+ b.StopTimer()
+ for j := 0; j < n; j++ {
+ a[i2p[j]] = j
+ }
+ k = i
+ b.StartTimer()
+ }
+ delete(a, i2p[i-k])
+ }
+}
+
+func runWith(f func(*testing.B, int), v ...int) func(*testing.B) {
+ return func(b *testing.B) {
+ for _, n := range v {
+ b.Run(strconv.Itoa(n), func(b *testing.B) { f(b, n) })
+ }
+ }
+}
+
+func BenchmarkMapAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapOperatorAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapOperatorAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapOperatorAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapOperatorAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapAppendAssign(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapAppendAssignInt32, 1<<8, 1<<16))
+ b.Run("Int64", runWith(benchmarkMapAppendAssignInt64, 1<<8, 1<<16))
+ b.Run("Str", runWith(benchmarkMapAppendAssignStr, 1<<8, 1<<16))
+}
+
+func BenchmarkMapDelete(b *testing.B) {
+ b.Run("Int32", runWith(benchmarkMapDeleteInt32, 100, 1000, 10000))
+ b.Run("Int64", runWith(benchmarkMapDeleteInt64, 100, 1000, 10000))
+ b.Run("Str", runWith(benchmarkMapDeleteStr, 100, 1000, 10000))
+ b.Run("Pointer", runWith(benchmarkMapDeletePointer, 100, 1000, 10000))
+}
+
+func TestDeferDeleteSlow(t *testing.T) {
+ ks := []complex128{0, 1, 2, 3}
+
+ m := make(map[any]int)
+ for i, k := range ks {
+ m[k] = i
+ }
+ if len(m) != len(ks) {
+ t.Errorf("want %d elements, got %d", len(ks), len(m))
+ }
+
+ func() {
+ for _, k := range ks {
+ defer delete(m, k)
+ }
+ }()
+ if len(m) != 0 {
+ t.Errorf("want 0 elements, got %d", len(m))
+ }
+}
+
+// TestIncrementAfterDeleteValueInt and the tests that follow exercise Issue 25936.
+// Value types int, int32, int64 are affected. Value type string
+// works as expected.
+func TestIncrementAfterDeleteValueInt(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteValueInt32(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int32)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteValueInt64(t *testing.T) {
+ const key1 = 12
+ const key2 = 13
+
+ m := make(map[int]int64)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteKeyStringValueInt(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]int)
+ m[key1] = 99
+ delete(m, key1)
+ m[key2] += 1
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestIncrementAfterDeleteKeyValueString(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]string)
+ m[key1] = "99"
+ delete(m, key1)
+ m[key2] += "1"
+ if n2 := m[key2]; n2 != "1" {
+ t.Errorf("appended '1' to empty (nil) string, got %s", n2)
+ }
+}
+
+// TestIncrementAfterBulkClearKeyStringValueInt tests that map bulk
+// deletion (mapclear) still works as expected. Note that it was not
+// affected by Issue 25936.
+func TestIncrementAfterBulkClearKeyStringValueInt(t *testing.T) {
+ const key1 = ""
+ const key2 = "x"
+
+ m := make(map[string]int)
+ m[key1] = 99
+ for k := range m {
+ delete(m, k)
+ }
+ m[key2]++
+ if n2 := m[key2]; n2 != 1 {
+ t.Errorf("incremented 0 to %d", n2)
+ }
+}
+
+func TestMapTombstones(t *testing.T) {
+ m := map[int]int{}
+ const N = 10000
+ // Fill a map.
+ for i := 0; i < N; i++ {
+ m[i] = i
+ }
+ runtime.MapTombstoneCheck(m)
+ // Delete half of the entries.
+ for i := 0; i < N; i += 2 {
+ delete(m, i)
+ }
+ runtime.MapTombstoneCheck(m)
+ // Add new entries to fill in holes.
+ for i := N; i < 3*N/2; i++ {
+ m[i] = i
+ }
+ runtime.MapTombstoneCheck(m)
+ // Delete everything.
+ for i := 0; i < 3*N/2; i++ {
+ delete(m, i)
+ }
+ runtime.MapTombstoneCheck(m)
+}
+
+type canString int
+
+func (c canString) String() string {
+ return fmt.Sprintf("%d", int(c))
+}
+
+func TestMapInterfaceKey(t *testing.T) {
+ // Test all the special cases in runtime.typehash.
+ type GrabBag struct {
+ f32 float32
+ f64 float64
+ c64 complex64
+ c128 complex128
+ s string
+ i0 any
+ i1 interface {
+ String() string
+ }
+ a [4]string
+ }
+
+ m := map[any]bool{}
+ // Put a bunch of data in m, so that a bad hash is likely to
+ // lead to a bad bucket, which will lead to a missed lookup.
+ for i := 0; i < 1000; i++ {
+ m[i] = true
+ }
+ m[GrabBag{f32: 1.0}] = true
+ if !m[GrabBag{f32: 1.0}] {
+ panic("f32 not found")
+ }
+ m[GrabBag{f64: 1.0}] = true
+ if !m[GrabBag{f64: 1.0}] {
+ panic("f64 not found")
+ }
+ m[GrabBag{c64: 1.0i}] = true
+ if !m[GrabBag{c64: 1.0i}] {
+ panic("c64 not found")
+ }
+ m[GrabBag{c128: 1.0i}] = true
+ if !m[GrabBag{c128: 1.0i}] {
+ panic("c128 not found")
+ }
+ m[GrabBag{s: "foo"}] = true
+ if !m[GrabBag{s: "foo"}] {
+ panic("string not found")
+ }
+ m[GrabBag{i0: "foo"}] = true
+ if !m[GrabBag{i0: "foo"}] {
+ panic("interface{} not found")
+ }
+ m[GrabBag{i1: canString(5)}] = true
+ if !m[GrabBag{i1: canString(5)}] {
+ panic("interface{String() string} not found")
+ }
+ m[GrabBag{a: [4]string{"foo", "bar", "baz", "bop"}}] = true
+ if !m[GrabBag{a: [4]string{"foo", "bar", "baz", "bop"}}] {
+ panic("array not found")
+ }
+}
diff --git a/src/runtime/mbarrier.go b/src/runtime/mbarrier.go
new file mode 100644
index 0000000..159a298
--- /dev/null
+++ b/src/runtime/mbarrier.go
@@ -0,0 +1,347 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: write barriers.
+//
+// For the concurrent garbage collector, the Go compiler implements
+// updates to pointer-valued fields that may be in heap objects by
+// emitting calls to write barriers. The main write barrier for
+// individual pointer writes is gcWriteBarrier and is implemented in
+// assembly. This file contains write barrier entry points for bulk
+// operations. See also mwbbuf.go.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "internal/goexperiment"
+ "unsafe"
+)
+
+// Go uses a hybrid barrier that combines a Yuasa-style deletion
+// barrier—which shades the object whose reference is being
+// overwritten—with a Dijkstra-style insertion barrier—which shades the object
+// whose reference is being written. The insertion part of the barrier
+// is necessary while the calling goroutine's stack is grey. In
+// pseudocode, the barrier is:
+//
+// writePointer(slot, ptr):
+// shade(*slot)
+// if current stack is grey:
+// shade(ptr)
+// *slot = ptr
+//
+// slot is the destination in Go code.
+// ptr is the value that goes into the slot in Go code.
+//
+// Shade indicates that it has seen a white pointer by adding the referent
+// to wbuf as well as marking it.
+//
+// The two shades and the condition work together to prevent a mutator
+// from hiding an object from the garbage collector:
+//
+// 1. shade(*slot) prevents a mutator from hiding an object by moving
+// the sole pointer to it from the heap to its stack. If it attempts
+// to unlink an object from the heap, this will shade it.
+//
+// 2. shade(ptr) prevents a mutator from hiding an object by moving
+// the sole pointer to it from its stack into a black object in the
+// heap. If it attempts to install the pointer into a black object,
+// this will shade it.
+//
+// 3. Once a goroutine's stack is black, the shade(ptr) becomes
+// unnecessary. shade(ptr) prevents hiding an object by moving it from
+// the stack to the heap, but this requires first having a pointer
+// hidden on the stack. Immediately after a stack is scanned, it only
+// points to shaded objects, so it's not hiding anything, and the
+// shade(*slot) prevents it from hiding any other pointers on its
+// stack.
+//
+// For a detailed description of this barrier and proof of
+// correctness, see https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md
+//
+//
+//
+// Dealing with memory ordering:
+//
+// Both the Yuasa and Dijkstra barriers can be made conditional on the
+// color of the object containing the slot. We chose not to make these
+// conditional because the cost of ensuring that the object holding
+// the slot doesn't concurrently change color without the mutator
+// noticing seems prohibitive.
+//
+// Consider the following example where the mutator writes into
+// a slot and then loads the slot's mark bit while the GC thread
+// writes to the slot's mark bit and then as part of scanning reads
+// the slot.
+//
+// Initially both [slot] and [slotmark] are 0 (nil)
+// Mutator thread GC thread
+// st [slot], ptr st [slotmark], 1
+//
+// ld r1, [slotmark] ld r2, [slot]
+//
+// Without an expensive memory barrier between the st and the ld, the final
+// result on most HW (including 386/amd64) can be r1==r2==0. This is a classic
+// example of what can happen when loads are allowed to be reordered with older
+// stores (avoiding such reorderings lies at the heart of the classic
+// Peterson/Dekker algorithms for mutual exclusion). Rather than require memory
+// barriers, which will slow down both the mutator and the GC, we always grey
+// the ptr object regardless of the slot's color.
+//
+// Another place where we intentionally omit memory barriers is when
+// accessing mheap_.arena_used to check if a pointer points into the
+// heap. On relaxed memory machines, it's possible for a mutator to
+// extend the size of the heap by updating arena_used, allocate an
+// object from this new region, and publish a pointer to that object,
+// but for tracing running on another processor to observe the pointer
+// but use the old value of arena_used. In this case, tracing will not
+// mark the object, even though it's reachable. However, the mutator
+// is guaranteed to execute a write barrier when it publishes the
+// pointer, so it will take care of marking the object. A general
+// consequence of this is that the garbage collector may cache the
+// value of mheap_.arena_used. (See issue #9984.)
+//
+//
+// Stack writes:
+//
+// The compiler omits write barriers for writes to the current frame,
+// but if a stack pointer has been passed down the call stack, the
+// compiler will generate a write barrier for writes through that
+// pointer (because it doesn't know it's not a heap pointer).
+//
+//
+// Global writes:
+//
+// The Go garbage collector requires write barriers when heap pointers
+// are stored in globals. Many garbage collectors ignore writes to
+// globals and instead pick up global -> heap pointers during
+// termination. This increases pause time, so we instead rely on write
+// barriers for writes to globals so that we don't have to rescan
+// globals during mark termination.
+//
+//
+// Publication ordering:
+//
+// The write barrier is *pre-publication*, meaning that the write
+// barrier happens prior to the *slot = ptr write that may make ptr
+// reachable by some goroutine that currently cannot reach it.
+//
+//
+// Signal handler pointer writes:
+//
+// In general, the signal handler cannot safely invoke the write
+// barrier because it may run without a P or even during the write
+// barrier.
+//
+// There is exactly one exception: profbuf.go omits a barrier during
+// signal handler profile logging. That's safe only because of the
+// deletion barrier. See profbuf.go for a detailed argument. If we
+// remove the deletion barrier, we'll have to work out a new way to
+// handle the profile logging.
+
+// typedmemmove copies a value of type typ to dst from src.
+// Must be nosplit, see #16026.
+//
+// TODO: Perfect for go:nosplitrec since we can't have a safe point
+// anywhere in the bulk barrier or memmove.
+//
+//go:nosplit
+func typedmemmove(typ *abi.Type, dst, src unsafe.Pointer) {
+ if dst == src {
+ return
+ }
+ if writeBarrier.needed && typ.PtrBytes != 0 {
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.PtrBytes)
+ }
+ // There's a race here: if some other goroutine can write to
+ // src, it may change some pointer in src after we've
+ // performed the write barrier but before we perform the
+ // memory copy. This is safe because the write performed by that
+ // other goroutine must also be accompanied by a write
+ // barrier, so at worst we've unnecessarily greyed the old
+ // pointer that was in src.
+ memmove(dst, src, typ.Size_)
+ if goexperiment.CgoCheck2 {
+ cgoCheckMemmove2(typ, dst, src, 0, typ.Size_)
+ }
+}
+
+// wbZero performs the write barrier operations necessary before
+// zeroing a region of memory at address dst of type typ.
+// Does not actually do the zeroing.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wbZero(typ *_type, dst unsafe.Pointer) {
+ bulkBarrierPreWrite(uintptr(dst), 0, typ.PtrBytes)
+}
+
+// wbMove performs the write barrier operations necessary before
+// copying a region of memory from src to dst of type typ.
+// Does not actually do the copying.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wbMove(typ *_type, dst, src unsafe.Pointer) {
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.PtrBytes)
+}
+
+//go:linkname reflect_typedmemmove reflect.typedmemmove
+func reflect_typedmemmove(typ *_type, dst, src unsafe.Pointer) {
+ if raceenabled {
+ raceWriteObjectPC(typ, dst, getcallerpc(), abi.FuncPCABIInternal(reflect_typedmemmove))
+ raceReadObjectPC(typ, src, getcallerpc(), abi.FuncPCABIInternal(reflect_typedmemmove))
+ }
+ if msanenabled {
+ msanwrite(dst, typ.Size_)
+ msanread(src, typ.Size_)
+ }
+ if asanenabled {
+ asanwrite(dst, typ.Size_)
+ asanread(src, typ.Size_)
+ }
+ typedmemmove(typ, dst, src)
+}
+
+//go:linkname reflectlite_typedmemmove internal/reflectlite.typedmemmove
+func reflectlite_typedmemmove(typ *_type, dst, src unsafe.Pointer) {
+ reflect_typedmemmove(typ, dst, src)
+}
+
+// reflectcallmove is invoked by reflectcall to copy the return values
+// out of the stack and into the heap, invoking the necessary write
+// barriers. dst, src, and size describe the return value area to
+// copy. typ describes the entire frame (not just the return values).
+// typ may be nil, which indicates write barriers are not needed.
+//
+// It must be nosplit and must only call nosplit functions because the
+// stack map of reflectcall is wrong.
+//
+//go:nosplit
+func reflectcallmove(typ *_type, dst, src unsafe.Pointer, size uintptr, regs *abi.RegArgs) {
+ if writeBarrier.needed && typ != nil && typ.PtrBytes != 0 && size >= goarch.PtrSize {
+ bulkBarrierPreWrite(uintptr(dst), uintptr(src), size)
+ }
+ memmove(dst, src, size)
+
+ // Move pointers returned in registers to a place where the GC can see them.
+ for i := range regs.Ints {
+ if regs.ReturnIsPtr.Get(i) {
+ regs.Ptrs[i] = unsafe.Pointer(regs.Ints[i])
+ }
+ }
+}
+
+//go:nosplit
+func typedslicecopy(typ *_type, dstPtr unsafe.Pointer, dstLen int, srcPtr unsafe.Pointer, srcLen int) int {
+ n := dstLen
+ if n > srcLen {
+ n = srcLen
+ }
+ if n == 0 {
+ return 0
+ }
+
+ // The compiler emits calls to typedslicecopy before
+ // instrumentation runs, so unlike the other copying and
+ // assignment operations, it's not instrumented in the calling
+ // code and needs its own instrumentation.
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(slicecopy)
+ racewriterangepc(dstPtr, uintptr(n)*typ.Size_, callerpc, pc)
+ racereadrangepc(srcPtr, uintptr(n)*typ.Size_, callerpc, pc)
+ }
+ if msanenabled {
+ msanwrite(dstPtr, uintptr(n)*typ.Size_)
+ msanread(srcPtr, uintptr(n)*typ.Size_)
+ }
+ if asanenabled {
+ asanwrite(dstPtr, uintptr(n)*typ.Size_)
+ asanread(srcPtr, uintptr(n)*typ.Size_)
+ }
+
+ if goexperiment.CgoCheck2 {
+ cgoCheckSliceCopy(typ, dstPtr, srcPtr, n)
+ }
+
+ if dstPtr == srcPtr {
+ return n
+ }
+
+ // Note: No point in checking typ.PtrBytes here:
+ // compiler only emits calls to typedslicecopy for types with pointers,
+ // and growslice and reflect_typedslicecopy check for pointers
+ // before calling typedslicecopy.
+ size := uintptr(n) * typ.Size_
+ if writeBarrier.needed {
+ pwsize := size - typ.Size_ + typ.PtrBytes
+ bulkBarrierPreWrite(uintptr(dstPtr), uintptr(srcPtr), pwsize)
+ }
+ // See typedmemmove for a discussion of the race between the
+ // barrier and memmove.
+ memmove(dstPtr, srcPtr, size)
+ return n
+}
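
The pwsize computation above (size - typ.Size_ + typ.PtrBytes) trims the barrier work to end at the last pointer of the last element: the final element's trailing scalar bytes are still moved by memmove but need no barrier. A worked example with an invented element layout (a 24-byte element whose only pointer occupies its first 8 bytes):

package sketch

// Hypothetical layout: elemSize and ptrBytes are assumptions for illustration.
const (
	elemSize = 24 // typ.Size_: one pointer word followed by two scalar words
	ptrBytes = 8  // typ.PtrBytes: bytes up to and including the last pointer
	n        = 4  // elements copied
	size     = n * elemSize               // 96 bytes handed to memmove
	pwsize   = size - elemSize + ptrBytes // 80 bytes covered by the bulk barrier
	// The last element's 16 trailing scalar bytes fall outside pwsize.
)
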
+
+//go:linkname reflect_typedslicecopy reflect.typedslicecopy
+func reflect_typedslicecopy(elemType *_type, dst, src slice) int {
+ if elemType.PtrBytes == 0 {
+ return slicecopy(dst.array, dst.len, src.array, src.len, elemType.Size_)
+ }
+ return typedslicecopy(elemType, dst.array, dst.len, src.array, src.len)
+}
+
+// typedmemclr clears the typed memory at ptr with type typ. The
+// memory at ptr must already be initialized (and hence in type-safe
+// state). If the memory is being initialized for the first time, see
+// memclrNoHeapPointers.
+//
+// If the caller knows that typ has pointers, it can alternatively
+// call memclrHasPointers.
+//
+// TODO: A "go:nosplitrec" annotation would be perfect for this.
+//
+//go:nosplit
+func typedmemclr(typ *_type, ptr unsafe.Pointer) {
+ if writeBarrier.needed && typ.PtrBytes != 0 {
+ bulkBarrierPreWrite(uintptr(ptr), 0, typ.PtrBytes)
+ }
+ memclrNoHeapPointers(ptr, typ.Size_)
+}
+
+//go:linkname reflect_typedmemclr reflect.typedmemclr
+func reflect_typedmemclr(typ *_type, ptr unsafe.Pointer) {
+ typedmemclr(typ, ptr)
+}
+
+//go:linkname reflect_typedmemclrpartial reflect.typedmemclrpartial
+func reflect_typedmemclrpartial(typ *_type, ptr unsafe.Pointer, off, size uintptr) {
+ if writeBarrier.needed && typ.PtrBytes != 0 {
+ bulkBarrierPreWrite(uintptr(ptr), 0, size)
+ }
+ memclrNoHeapPointers(ptr, size)
+}
+
+//go:linkname reflect_typedarrayclear reflect.typedarrayclear
+func reflect_typedarrayclear(typ *_type, ptr unsafe.Pointer, len int) {
+ size := typ.Size_ * uintptr(len)
+ if writeBarrier.needed && typ.PtrBytes != 0 {
+ bulkBarrierPreWrite(uintptr(ptr), 0, size)
+ }
+ memclrNoHeapPointers(ptr, size)
+}
+
+// memclrHasPointers clears n bytes of typed memory starting at ptr.
+// The caller must ensure that the type of the object at ptr has
+// pointers, usually by checking typ.PtrBytes. However, ptr
+// does not have to point to the start of the allocation.
+//
+//go:nosplit
+func memclrHasPointers(ptr unsafe.Pointer, n uintptr) {
+ bulkBarrierPreWrite(uintptr(ptr), 0, n)
+ memclrNoHeapPointers(ptr, n)
+}
diff --git a/src/runtime/mbitmap.go b/src/runtime/mbitmap.go
new file mode 100644
index 0000000..a242872
--- /dev/null
+++ b/src/runtime/mbitmap.go
@@ -0,0 +1,1501 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: type and heap bitmaps.
+//
+// Stack, data, and bss bitmaps
+//
+// Stack frames and global variables in the data and bss sections are
+// described by bitmaps with 1 bit per pointer-sized word. A "1" bit
+// means the word is a live pointer to be visited by the GC (referred to
+// as "pointer"). A "0" bit means the word should be ignored by GC
+// (referred to as "scalar", though it could be a dead pointer value).
+//
+// Heap bitmap
+//
+// The heap bitmap comprises 1 bit for each pointer-sized word in the heap,
+// recording whether a pointer is stored in that word or not. This bitmap
+// is stored in the heapArena metadata backing each heap arena.
+// That is, if ha is the heapArena for the arena starting at "start",
+// then ha.bitmap[0] holds the 64 bits for the 64 words "start"
+// through start+63*ptrSize, ha.bitmap[1] holds the entries for
+// start+64*ptrSize through start+127*ptrSize, and so on.
+// Bits correspond to words in little-endian order. ha.bitmap[0]&1 represents
+// the word at "start", ha.bitmap[0]>>1&1 represents the word at start+8, etc.
+// (For 32-bit platforms, s/64/32/.)
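+//
+// For example (illustrative), on a 64-bit platform the word at address
+// start+200*ptrSize is the arena's 200th word, so its pointer bit is
+// bit 200%64 = 8 of ha.bitmap[200/64] = ha.bitmap[3], i.e. ha.bitmap[3]>>8&1.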
+//
+// We also keep a noMorePtrs bitmap which allows us to stop scanning
+// the heap bitmap early in certain situations. If ha.noMorePtrs[i]>>j&1
+// is 1, then the object containing the last word described by ha.bitmap[8*i+j]
+// has no more pointers beyond those described by ha.bitmap[8*i+j].
+// If ha.noMorePtrs[i]>>j&1 is set, the entries in ha.bitmap[8*i+j+1] and
+// beyond must all be zero until the start of the next object.
+//
+// The bitmap for noscan spans is set to all zero at span allocation time.
+//
+// The bitmap for unallocated objects in scannable spans is not maintained
+// (can be junk).
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// addb returns the byte pointer p+n.
+//
+//go:nowritebarrier
+//go:nosplit
+func addb(p *byte, n uintptr) *byte {
+ // Note: wrote out full expression instead of calling add(p, n)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + n))
+}
+
+// subtractb returns the byte pointer p-n.
+//
+//go:nowritebarrier
+//go:nosplit
+func subtractb(p *byte, n uintptr) *byte {
+ // Note: wrote out full expression instead of calling add(p, -n)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - n))
+}
+
+// add1 returns the byte pointer p+1.
+//
+//go:nowritebarrier
+//go:nosplit
+func add1(p *byte) *byte {
+ // Note: wrote out full expression instead of calling addb(p, 1)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + 1))
+}
+
+// subtract1 returns the byte pointer p-1.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//
+//go:nowritebarrier
+//go:nosplit
+func subtract1(p *byte) *byte {
+ // Note: wrote out full expression instead of calling subtractb(p, 1)
+ // to reduce the number of temporaries generated by the
+ // compiler for this trivial expression during inlining.
+ return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - 1))
+}
+
+// markBits provides access to the mark bit for an object in the heap.
+// bytep points to the byte holding the mark bit.
+// mask is a byte with a single bit set that can be &ed with *bytep
+// to see if the bit has been set.
+// *m.byte&m.mask != 0 indicates the mark bit is set.
+// index can be used along with span information to generate
+// the address of the object in the heap.
+// We maintain one set of mark bits for allocation and one for
+// marking purposes.
+type markBits struct {
+ bytep *uint8
+ mask uint8
+ index uintptr
+}
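+
+// A typical (illustrative) use during marking combines the accessors
+// defined below:
+//
+//	mb := s.markBitsForIndex(objIndex)
+//	if !mb.isMarked() {
+//		mb.setMarked()
+//	}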
+
+//go:nosplit
+func (s *mspan) allocBitsForIndex(allocBitIndex uintptr) markBits {
+ bytep, mask := s.allocBits.bitp(allocBitIndex)
+ return markBits{bytep, mask, allocBitIndex}
+}
+
+// refillAllocCache takes 8 bytes of s.allocBits starting at whichByte
+// and negates them so that ctz (count trailing zeros) instructions
+// can be used. It then places these 8 bytes into the cached 64 bit
+// s.allocCache.
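+// For example (illustrative): if the 8 bytes of s.allocBits starting at
+// whichByte are 0x07, 0, 0, 0, 0, 0, 0, 0 (objects 0 through 2 allocated,
+// the rest free), s.allocCache becomes ^0x07 = 0xfffffffffffffff8, and
+// sys.TrailingZeros64 of that value is 3, the index of the first free slot.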
+func (s *mspan) refillAllocCache(whichByte uintptr) {
+ bytes := (*[8]uint8)(unsafe.Pointer(s.allocBits.bytep(whichByte)))
+ aCache := uint64(0)
+ aCache |= uint64(bytes[0])
+ aCache |= uint64(bytes[1]) << (1 * 8)
+ aCache |= uint64(bytes[2]) << (2 * 8)
+ aCache |= uint64(bytes[3]) << (3 * 8)
+ aCache |= uint64(bytes[4]) << (4 * 8)
+ aCache |= uint64(bytes[5]) << (5 * 8)
+ aCache |= uint64(bytes[6]) << (6 * 8)
+ aCache |= uint64(bytes[7]) << (7 * 8)
+ s.allocCache = ^aCache
+}
+
+// nextFreeIndex returns the index of the next free object in s at
+// or after s.freeindex.
+// There are hardware instructions that can be used to make this
+// faster if profiling warrants it.
+func (s *mspan) nextFreeIndex() uintptr {
+ sfreeindex := s.freeindex
+ snelems := s.nelems
+ if sfreeindex == snelems {
+ return sfreeindex
+ }
+ if sfreeindex > snelems {
+ throw("s.freeindex > s.nelems")
+ }
+
+ aCache := s.allocCache
+
+ bitIndex := sys.TrailingZeros64(aCache)
+ for bitIndex == 64 {
+ // Move index to start of next cached bits.
+ sfreeindex = (sfreeindex + 64) &^ (64 - 1)
+ if sfreeindex >= snelems {
+ s.freeindex = snelems
+ return snelems
+ }
+ whichByte := sfreeindex / 8
+ // Refill s.allocCache with the next 64 alloc bits.
+ s.refillAllocCache(whichByte)
+ aCache = s.allocCache
+ bitIndex = sys.TrailingZeros64(aCache)
+ // nothing available in cached bits
+ // grab the next 8 bytes and try again.
+ }
+ result := sfreeindex + uintptr(bitIndex)
+ if result >= snelems {
+ s.freeindex = snelems
+ return snelems
+ }
+
+ s.allocCache >>= uint(bitIndex + 1)
+ sfreeindex = result + 1
+
+ if sfreeindex%64 == 0 && sfreeindex != snelems {
+ // We just incremented s.freeindex so it isn't 0.
+ // As each 1 in s.allocCache was encountered and used for allocation
+ // it was shifted away. At this point s.allocCache contains all 0s.
+ // Refill s.allocCache so that it corresponds
+ // to the bits at s.allocBits starting at s.freeindex.
+ whichByte := sfreeindex / 8
+ s.refillAllocCache(whichByte)
+ }
+ s.freeindex = sfreeindex
+ return result
+}
+
+// isFree reports whether the index'th object in s is unallocated.
+//
+// The caller must ensure s.state is mSpanInUse, and there must have
+// been no preemption points since ensuring this (which could allow a
+// GC transition, which would allow the state to change).
+func (s *mspan) isFree(index uintptr) bool {
+ if index < s.freeIndexForScan {
+ return false
+ }
+ bytep, mask := s.allocBits.bitp(index)
+ return *bytep&mask == 0
+}
+
+// divideByElemSize returns n/s.elemsize.
+// n must be within [0, s.npages*_PageSize),
+// or may be exactly s.npages*_PageSize
+// if s.elemsize is from sizeclasses.go.
+//
+// nosplit, because it is called by objIndex, which is nosplit
+//
+//go:nosplit
+func (s *mspan) divideByElemSize(n uintptr) uintptr {
+ const doubleCheck = false
+
+ // See explanation in mksizeclasses.go's computeDivMagic.
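+ // For example (illustrative): with s.elemsize = 48, s.divMul is
+ // roughly 2^32/48 ≈ 89478486, so for n = 96 the product is
+ // 96*89478486 = 8589934656 and shifting right by 32 gives 2 = 96/48.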
+ q := uintptr((uint64(n) * uint64(s.divMul)) >> 32)
+
+ if doubleCheck && q != n/s.elemsize {
+ println(n, "/", s.elemsize, "should be", n/s.elemsize, "but got", q)
+ throw("bad magic division")
+ }
+ return q
+}
+
+// nosplit, because it is called by other nosplit code like findObject
+//
+//go:nosplit
+func (s *mspan) objIndex(p uintptr) uintptr {
+ return s.divideByElemSize(p - s.base())
+}
+
+func markBitsForAddr(p uintptr) markBits {
+ s := spanOf(p)
+ objIndex := s.objIndex(p)
+ return s.markBitsForIndex(objIndex)
+}
+
+func (s *mspan) markBitsForIndex(objIndex uintptr) markBits {
+ bytep, mask := s.gcmarkBits.bitp(objIndex)
+ return markBits{bytep, mask, objIndex}
+}
+
+func (s *mspan) markBitsForBase() markBits {
+ return markBits{&s.gcmarkBits.x, uint8(1), 0}
+}
+
+// isMarked reports whether mark bit m is set.
+func (m markBits) isMarked() bool {
+ return *m.bytep&m.mask != 0
+}
+
+// setMarked sets the marked bit in the markbits, atomically.
+func (m markBits) setMarked() {
+ // Might be racing with other updates, so use atomic update always.
+ // We used to be clever here and use a non-atomic update in certain
+ // cases, but it's not worth the risk.
+ atomic.Or8(m.bytep, m.mask)
+}
+
+// setMarkedNonAtomic sets the marked bit in the markbits, non-atomically.
+func (m markBits) setMarkedNonAtomic() {
+ *m.bytep |= m.mask
+}
+
+// clearMarked clears the marked bit in the markbits, atomically.
+func (m markBits) clearMarked() {
+ // Might be racing with other updates, so use atomic update always.
+ // We used to be clever here and use a non-atomic update in certain
+ // cases, but it's not worth the risk.
+ atomic.And8(m.bytep, ^m.mask)
+}
+
+// markBitsForSpan returns the markBits for the span base address base.
+func markBitsForSpan(base uintptr) (mbits markBits) {
+ mbits = markBitsForAddr(base)
+ if mbits.mask != 1 {
+ throw("markBitsForSpan: unaligned start")
+ }
+ return mbits
+}
+
+// advance advances the markBits to the next object in the span.
+func (m *markBits) advance() {
+ if m.mask == 1<<7 {
+ m.bytep = (*uint8)(unsafe.Pointer(uintptr(unsafe.Pointer(m.bytep)) + 1))
+ m.mask = 1
+ } else {
+ m.mask = m.mask << 1
+ }
+ m.index++
+}
+
+// clobberdeadPtr is a special value that is used by the compiler to
+// clobber dead stack slots, when -clobberdead flag is set.
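+// On 64-bit platforms ^uintptr(0)>>63 is 1, so the constant below is
+// 0xdeaddeaddeaddead; on 32-bit platforms the shift count is 0 and the
+// value is simply 0xdeaddead.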
+const clobberdeadPtr = uintptr(0xdeaddead | 0xdeaddead<<((^uintptr(0)>>63)*32))
+
+// badPointer prints diagnostics for a bad pointer found in the Go heap and throws.
+func badPointer(s *mspan, p, refBase, refOff uintptr) {
+ // Typically this indicates an incorrect use
+ // of unsafe or cgo to store a bad pointer in
+ // the Go heap. It may also indicate a runtime
+ // bug.
+ //
+ // TODO(austin): We could be more aggressive
+ // and detect pointers to unallocated objects
+ // in allocated spans.
+ printlock()
+ print("runtime: pointer ", hex(p))
+ if s != nil {
+ state := s.state.get()
+ if state != mSpanInUse {
+ print(" to unallocated span")
+ } else {
+ print(" to unused region of span")
+ }
+ print(" span.base()=", hex(s.base()), " span.limit=", hex(s.limit), " span.state=", state)
+ }
+ print("\n")
+ if refBase != 0 {
+ print("runtime: found in object at *(", hex(refBase), "+", hex(refOff), ")\n")
+ gcDumpObject("object", refBase, refOff)
+ }
+ getg().m.traceback = 2
+ throw("found bad pointer in Go heap (incorrect use of unsafe or cgo?)")
+}
+
+// findObject returns the base address for the heap object containing
+// the address p, the object's span, and the index of the object in s.
+// If p does not point into a heap object, it returns base == 0.
+//
+// If p points to an invalid heap pointer and debug.invalidptr != 0,
+// findObject panics.
+//
+// refBase and refOff optionally give the base address of the object
+// in which the pointer p was found and the byte offset at which it
+// was found. These are used for error reporting.
+//
+// It is nosplit so it is safe for p to be a pointer to the current goroutine's stack.
+// Since p is a uintptr, it would not be adjusted if the stack were to move.
+//
+//go:nosplit
+func findObject(p, refBase, refOff uintptr) (base uintptr, s *mspan, objIndex uintptr) {
+ s = spanOf(p)
+ // If s is nil, the virtual address has never been part of the heap.
+ // This pointer may be to some mmap'd region, so we allow it.
+ if s == nil {
+ if (GOARCH == "amd64" || GOARCH == "arm64") && p == clobberdeadPtr && debug.invalidptr != 0 {
+ // Crash if clobberdeadPtr is seen. Only on AMD64 and ARM64 for now,
+ // as they are the only platform where compiler's clobberdead mode is
+ // implemented. On these platforms clobberdeadPtr cannot be a valid address.
+ badPointer(s, p, refBase, refOff)
+ }
+ return
+ }
+ // If p is a bad pointer, it may not be in s's bounds.
+ //
+ // Check s.state to synchronize with span initialization
+ // before checking other fields. See also spanOfHeap.
+ if state := s.state.get(); state != mSpanInUse || p < s.base() || p >= s.limit {
+ // Pointers into stacks are also ok, the runtime manages these explicitly.
+ if state == mSpanManual {
+ return
+ }
+ // The following ensures that we are rigorous about what data
+ // structures hold valid pointers.
+ if debug.invalidptr != 0 {
+ badPointer(s, p, refBase, refOff)
+ }
+ return
+ }
+
+ objIndex = s.objIndex(p)
+ base = s.base() + objIndex*s.elemsize
+ return
+}
+
+// reflect_verifyNotInHeapPtr reports whether converting the not-in-heap pointer into an unsafe.Pointer is ok.
+//
+//go:linkname reflect_verifyNotInHeapPtr reflect.verifyNotInHeapPtr
+func reflect_verifyNotInHeapPtr(p uintptr) bool {
+ // Conversion to a pointer is ok as long as findObject above does not call badPointer.
+ // Since we're already promised that p doesn't point into the heap, just disallow heap
+ // pointers and the special clobbered pointer.
+ return spanOf(p) == nil && p != clobberdeadPtr
+}
+
+const ptrBits = 8 * goarch.PtrSize
+
+// heapBits provides access to the bitmap bits for a single heap word.
+// The methods on heapBits take value receivers so that the compiler
+// can more easily inline calls to those methods and registerize the
+// struct fields independently.
+type heapBits struct {
+ // heapBits will report on pointers in the range [addr,addr+size).
+ // The low bit of mask contains the pointerness of the word at addr
+ // (assuming valid>0).
+ addr, size uintptr
+
+ // The next few pointer bits representing words starting at addr.
+ // Those bits already returned by next() are zeroed.
+ mask uintptr
+ // Number of bits in mask that are valid. mask is always less than 1<<valid.
+ valid uintptr
+}
+
+// heapBitsForAddr returns the heapBits for the address addr.
+// The caller must ensure [addr,addr+size) is in an allocated span.
+// In particular, be careful not to point past the end of an object.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//
+//go:nosplit
+func heapBitsForAddr(addr, size uintptr) heapBits {
+ // Find arena
+ ai := arenaIndex(addr)
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+
+ // Word index in arena.
+ word := addr / goarch.PtrSize % heapArenaWords
+
+ // Word index and bit offset in bitmap array.
+ idx := word / ptrBits
+ off := word % ptrBits
+
+ // Grab relevant bits of bitmap.
+ mask := ha.bitmap[idx] >> off
+ valid := ptrBits - off
+
+ // Process depending on where the object ends.
+ nptr := size / goarch.PtrSize
+ if nptr < valid {
+ // Bits for this object end before the end of this bitmap word.
+ // Squash bits for the following objects.
+ mask &= 1<<(nptr&(ptrBits-1)) - 1
+ valid = nptr
+ } else if nptr == valid {
+ // Bits for this object end at exactly the end of this bitmap word.
+ // All good.
+ } else {
+ // Bits for this object extend into the next bitmap word. See if there
+ // may be any pointers recorded there.
+ if uintptr(ha.noMorePtrs[idx/8])>>(idx%8)&1 != 0 {
+ // No more pointers in this object after this bitmap word.
+ // Update size so we know not to look there.
+ size = valid * goarch.PtrSize
+ }
+ }
+
+ return heapBits{addr: addr, size: size, mask: mask, valid: valid}
+}
+
+// Returns the (absolute) address of the next known pointer and
+// a heapBits iterator representing any remaining pointers.
+// If there are no more pointers, returns address 0.
+// Note that next does not modify h. The caller must record the result.
+//
+// nosplit because it is used during write barriers and must not be preempted.
+//
+//go:nosplit
+func (h heapBits) next() (heapBits, uintptr) {
+ for {
+ if h.mask != 0 {
+ var i int
+ if goarch.PtrSize == 8 {
+ i = sys.TrailingZeros64(uint64(h.mask))
+ } else {
+ i = sys.TrailingZeros32(uint32(h.mask))
+ }
+ h.mask ^= uintptr(1) << (i & (ptrBits - 1))
+ return h, h.addr + uintptr(i)*goarch.PtrSize
+ }
+
+ // Skip words that we've already processed.
+ h.addr += h.valid * goarch.PtrSize
+ h.size -= h.valid * goarch.PtrSize
+ if h.size == 0 {
+ return h, 0 // no more pointers
+ }
+
+ // Grab more bits and try again.
+ h = heapBitsForAddr(h.addr, h.size)
+ }
+}
+
+// nextFast is like next, but can return 0 even when there are more pointers
+// to be found. Callers should call next if nextFast returns 0 as its second
+// return value.
+//
+// if addr, h = h.nextFast(); addr == 0 {
+// if addr, h = h.next(); addr == 0 {
+// ... no more pointers ...
+// }
+// }
+// ... process pointer at addr ...
+//
+// nextFast is designed to be inlineable.
+//
+//go:nosplit
+func (h heapBits) nextFast() (heapBits, uintptr) {
+ // TESTQ/JEQ
+ if h.mask == 0 {
+ return h, 0
+ }
+ // BSFQ
+ var i int
+ if goarch.PtrSize == 8 {
+ i = sys.TrailingZeros64(uint64(h.mask))
+ } else {
+ i = sys.TrailingZeros32(uint32(h.mask))
+ }
+ // BTCQ
+ h.mask ^= uintptr(1) << (i & (ptrBits - 1))
+ // LEAQ (XX)(XX*8)
+ return h, h.addr + uintptr(i)*goarch.PtrSize
+}
+
+// bulkBarrierPreWrite executes a write barrier
+// for every pointer slot in the memory range [src, src+size),
+// using pointer/scalar information from [dst, dst+size).
+// This executes the write barriers necessary before a memmove.
+// src, dst, and size must be pointer-aligned.
+// The range [dst, dst+size) must lie within a single object.
+// It does not perform the actual writes.
+//
+// As a special case, src == 0 indicates that this is being used for a
+// memclr. bulkBarrierPreWrite will pass 0 for the src of each write
+// barrier.
+//
+// Callers should call bulkBarrierPreWrite immediately before
+// calling memmove(dst, src, size). This function is marked nosplit
+// to avoid being preempted; the GC must not stop the goroutine
+// between the memmove and the execution of the barriers.
+// The caller is also responsible for cgo pointer checks if this
+// may be writing Go pointers into non-Go memory.
+//
+// The pointer bitmap is not maintained for allocations containing
+// no pointers at all; any caller of bulkBarrierPreWrite must first
+// make sure the underlying allocation contains pointers, usually
+// by checking typ.PtrBytes.
+//
+// Callers must perform cgo checks if goexperiment.CgoCheck2.
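+//
+// A typical caller (see typedslicecopy above) pairs the barrier with the
+// memmove it protects, for example:
+//
+//	if writeBarrier.needed {
+//		bulkBarrierPreWrite(uintptr(dstPtr), uintptr(srcPtr), pwsize)
+//	}
+//	memmove(dstPtr, srcPtr, size)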
+//
+//go:nosplit
+func bulkBarrierPreWrite(dst, src, size uintptr) {
+ if (dst|src|size)&(goarch.PtrSize-1) != 0 {
+ throw("bulkBarrierPreWrite: unaligned arguments")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ if s := spanOf(dst); s == nil {
+ // If dst is a global, use the data or BSS bitmaps to
+ // execute write barriers.
+ for _, datap := range activeModules() {
+ if datap.data <= dst && dst < datap.edata {
+ bulkBarrierBitmap(dst, src, size, dst-datap.data, datap.gcdatamask.bytedata)
+ return
+ }
+ }
+ for _, datap := range activeModules() {
+ if datap.bss <= dst && dst < datap.ebss {
+ bulkBarrierBitmap(dst, src, size, dst-datap.bss, datap.gcbssmask.bytedata)
+ return
+ }
+ }
+ return
+ } else if s.state.get() != mSpanInUse || dst < s.base() || s.limit <= dst {
+ // dst was heap memory at some point, but isn't now.
+ // It can't be a global. It must be either our stack,
+ // or in the case of direct channel sends, it could be
+ // another stack. Either way, no need for barriers.
+ // This will also catch if dst is in a freed span,
+ // though that should never happen.
+ return
+ }
+
+ buf := &getg().m.p.ptr().wbBuf
+ h := heapBitsForAddr(dst, size)
+ if src == 0 {
+ for {
+ var addr uintptr
+ if h, addr = h.next(); addr == 0 {
+ break
+ }
+ dstx := (*uintptr)(unsafe.Pointer(addr))
+ p := buf.get1()
+ p[0] = *dstx
+ }
+ } else {
+ for {
+ var addr uintptr
+ if h, addr = h.next(); addr == 0 {
+ break
+ }
+ dstx := (*uintptr)(unsafe.Pointer(addr))
+ srcx := (*uintptr)(unsafe.Pointer(src + (addr - dst)))
+ p := buf.get2()
+ p[0] = *dstx
+ p[1] = *srcx
+ }
+ }
+}
+
+// bulkBarrierPreWriteSrcOnly is like bulkBarrierPreWrite but
+// does not execute write barriers for [dst, dst+size).
+//
+// In addition to the requirements of bulkBarrierPreWrite
+// callers need to ensure [dst, dst+size) is zeroed.
+//
+// This is used for special cases where e.g. dst was just
+// created and zeroed with malloc.
+//
+//go:nosplit
+func bulkBarrierPreWriteSrcOnly(dst, src, size uintptr) {
+ if (dst|src|size)&(goarch.PtrSize-1) != 0 {
+ throw("bulkBarrierPreWrite: unaligned arguments")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ buf := &getg().m.p.ptr().wbBuf
+ h := heapBitsForAddr(dst, size)
+ for {
+ var addr uintptr
+ if h, addr = h.next(); addr == 0 {
+ break
+ }
+ srcx := (*uintptr)(unsafe.Pointer(addr - dst + src))
+ p := buf.get1()
+ p[0] = *srcx
+ }
+}
+
+// bulkBarrierBitmap executes write barriers for copying from [src,
+// src+size) to [dst, dst+size) using a 1-bit pointer bitmap. src is
+// assumed to start maskOffset bytes into the data covered by the
+// bitmap in bits (which may not be a multiple of 8).
+//
+// This is used by bulkBarrierPreWrite for writes to data and BSS.
+//
+//go:nosplit
+func bulkBarrierBitmap(dst, src, size, maskOffset uintptr, bits *uint8) {
+ word := maskOffset / goarch.PtrSize
+ bits = addb(bits, word/8)
+ mask := uint8(1) << (word % 8)
+
+ buf := &getg().m.p.ptr().wbBuf
+ for i := uintptr(0); i < size; i += goarch.PtrSize {
+ if mask == 0 {
+ bits = addb(bits, 1)
+ if *bits == 0 {
+ // Skip 8 words.
+ i += 7 * goarch.PtrSize
+ continue
+ }
+ mask = 1
+ }
+ if *bits&mask != 0 {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ if src == 0 {
+ p := buf.get1()
+ p[0] = *dstx
+ } else {
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ p := buf.get2()
+ p[0] = *dstx
+ p[1] = *srcx
+ }
+ }
+ mask <<= 1
+ }
+}
+
+// typeBitsBulkBarrier executes a write barrier for every
+// pointer that would be copied from [src, src+size) to [dst,
+// dst+size) by a memmove using the type bitmap to locate those
+// pointer slots.
+//
+// The type typ must correspond exactly to [src, src+size) and [dst, dst+size).
+// dst, src, and size must be pointer-aligned.
+// The type typ must have a plain bitmap, not a GC program.
+// The only use of this function is in channel sends, and the
+// 64 kB channel element limit takes care of this for us.
+//
+// Must not be preempted because it typically runs right before memmove,
+// and the GC must observe them as an atomic action.
+//
+// Callers must perform cgo checks if goexperiment.CgoCheck2.
+//
+//go:nosplit
+func typeBitsBulkBarrier(typ *_type, dst, src, size uintptr) {
+ if typ == nil {
+ throw("runtime: typeBitsBulkBarrier without type")
+ }
+ if typ.Size_ != size {
+ println("runtime: typeBitsBulkBarrier with type ", toRType(typ).string(), " of size ", typ.Size_, " but memory size", size)
+ throw("runtime: invalid typeBitsBulkBarrier")
+ }
+ if typ.Kind_&kindGCProg != 0 {
+ println("runtime: typeBitsBulkBarrier with type ", toRType(typ).string(), " with GC prog")
+ throw("runtime: invalid typeBitsBulkBarrier")
+ }
+ if !writeBarrier.needed {
+ return
+ }
+ ptrmask := typ.GCData
+ buf := &getg().m.p.ptr().wbBuf
+ var bits uint32
+ for i := uintptr(0); i < typ.PtrBytes; i += goarch.PtrSize {
+ if i&(goarch.PtrSize*8-1) == 0 {
+ bits = uint32(*ptrmask)
+ ptrmask = addb(ptrmask, 1)
+ } else {
+ bits = bits >> 1
+ }
+ if bits&1 != 0 {
+ dstx := (*uintptr)(unsafe.Pointer(dst + i))
+ srcx := (*uintptr)(unsafe.Pointer(src + i))
+ p := buf.get2()
+ p[0] = *dstx
+ p[1] = *srcx
+ }
+ }
+}
+
+// initHeapBits initializes the heap bitmap for a span.
+// If this is a span of single pointer allocations, it initializes all
+// words to pointer. If forceClear is true, it clears all bits.
+func (s *mspan) initHeapBits(forceClear bool) {
+ if forceClear || s.spanclass.noscan() {
+ // Set all the pointer bits to zero. We do this once
+ // when the span is allocated so we don't have to do it
+ // for each object allocation.
+ base := s.base()
+ size := s.npages * pageSize
+ h := writeHeapBitsForAddr(base)
+ h.flush(base, size)
+ return
+ }
+ isPtrs := goarch.PtrSize == 8 && s.elemsize == goarch.PtrSize
+ if !isPtrs {
+ return // nothing to do
+ }
+ h := writeHeapBitsForAddr(s.base())
+ size := s.npages * pageSize
+ nptrs := size / goarch.PtrSize
+ for i := uintptr(0); i < nptrs; i += ptrBits {
+ h = h.write(^uintptr(0), ptrBits)
+ }
+ h.flush(s.base(), size)
+}
+
+// countAlloc returns the number of objects allocated in span s by
+// scanning the allocation bitmap.
+func (s *mspan) countAlloc() int {
+ count := 0
+ bytes := divRoundUp(s.nelems, 8)
+ // Iterate over each 8-byte chunk and count allocations
+ // with an intrinsic. Note that newMarkBits guarantees that
+ // gcmarkBits will be 8-byte aligned, so we don't have to
+ // worry about edge cases; irrelevant bits will simply be zero.
+ for i := uintptr(0); i < bytes; i += 8 {
+ // Extract 64 bits from the byte pointer and get an OnesCount.
+ // Note that the unsafe cast here doesn't preserve endianness,
+ // but that's OK. We only care about how many bits are 1, not
+ // about the order we discover them in.
+ mrkBits := *(*uint64)(unsafe.Pointer(s.gcmarkBits.bytep(i)))
+ count += sys.OnesCount64(mrkBits)
+ }
+ return count
+}
+
+type writeHeapBits struct {
+ addr uintptr // address that the low bit of mask represents the pointer state of.
+ mask uintptr // some pointer bits starting at the address addr.
+ valid uintptr // number of bits in buf that are valid (including low)
+ low uintptr // number of low-order bits to not overwrite
+}
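+
+// Illustrative use (this mirrors the single-small-element path of
+// heapBitsSetType below): to record the pointer bits of an object at x
+// of size bytes whose ptrs-word pointer prefix is described by the
+// 1-bit mask m:
+//
+//	h := writeHeapBitsForAddr(x)
+//	h = h.write(m, ptrs)
+//	h.flush(x, size)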
+
+func writeHeapBitsForAddr(addr uintptr) (h writeHeapBits) {
+ // We start writing bits maybe in the middle of a heap bitmap word.
+ // Remember how many bits into the word we started, so we can be sure
+ // not to overwrite the previous bits.
+ h.low = addr / goarch.PtrSize % ptrBits
+
+ // round down to heap word that starts the bitmap word.
+ h.addr = addr - h.low*goarch.PtrSize
+
+ // We don't have any bits yet.
+ h.mask = 0
+ h.valid = h.low
+
+ return
+}
+
+// write appends the pointerness of the next valid pointer slots
+// using the low valid bits of bits. 1=pointer, 0=scalar.
+func (h writeHeapBits) write(bits, valid uintptr) writeHeapBits {
+ if h.valid+valid <= ptrBits {
+ // Fast path - just accumulate the bits.
+ h.mask |= bits << h.valid
+ h.valid += valid
+ return h
+ }
+ // Too many bits to fit in this word. Write the current word
+ // out and move on to the next word.
+
+ data := h.mask | bits<<h.valid // mask for this word
+ h.mask = bits >> (ptrBits - h.valid) // leftover for next word
+ h.valid += valid - ptrBits // have h.valid+valid bits, writing ptrBits of them
+
+ // Flush mask to the memory bitmap.
+ // TODO: figure out how to cache arena lookup.
+ ai := arenaIndex(h.addr)
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ idx := h.addr / (ptrBits * goarch.PtrSize) % heapArenaBitmapWords
+ m := uintptr(1)<<h.low - 1
+ ha.bitmap[idx] = ha.bitmap[idx]&m | data
+ // Note: no synchronization required for this write because
+ // the allocator has exclusive access to the page, and the bitmap
+ // entries are all for a single page. Also, visibility of these
+ // writes is guaranteed by the publication barrier in mallocgc.
+
+ // Clear noMorePtrs bit, since we're going to be writing bits
+ // into the following word.
+ ha.noMorePtrs[idx/8] &^= uint8(1) << (idx % 8)
+ // Note: same as above
+
+ // Move to next word of bitmap.
+ h.addr += ptrBits * goarch.PtrSize
+ h.low = 0
+ return h
+}
+
+// Add padding of size bytes.
+func (h writeHeapBits) pad(size uintptr) writeHeapBits {
+ if size == 0 {
+ return h
+ }
+ words := size / goarch.PtrSize
+ for words > ptrBits {
+ h = h.write(0, ptrBits)
+ words -= ptrBits
+ }
+ return h.write(0, words)
+}
+
+// Flush the bits that have been written, and add zeros as needed
+// to cover the full object [addr, addr+size).
+func (h writeHeapBits) flush(addr, size uintptr) {
+ // zeros counts the number of bits needed to represent the object minus the
+ // number of bits we've already written. This is the number of 0 bits
+ // that need to be added.
+ zeros := (addr+size-h.addr)/goarch.PtrSize - h.valid
+
+ // Add zero bits up to the bitmap word boundary
+ if zeros > 0 {
+ z := ptrBits - h.valid
+ if z > zeros {
+ z = zeros
+ }
+ h.valid += z
+ zeros -= z
+ }
+
+ // Find word in bitmap that we're going to write.
+ ai := arenaIndex(h.addr)
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ idx := h.addr / (ptrBits * goarch.PtrSize) % heapArenaBitmapWords
+
+ // Write remaining bits.
+ if h.valid != h.low {
+ m := uintptr(1)<<h.low - 1 // don't clear existing bits below "low"
+ m |= ^(uintptr(1)<<h.valid - 1) // don't clear existing bits above "valid"
+ ha.bitmap[idx] = ha.bitmap[idx]&m | h.mask
+ }
+ if zeros == 0 {
+ return
+ }
+
+ // Record in the noMorePtrs map that there won't be any more 1 bits,
+ // so readers can stop early.
+ ha.noMorePtrs[idx/8] |= uint8(1) << (idx % 8)
+
+ // Advance to next bitmap word.
+ h.addr += ptrBits * goarch.PtrSize
+
+ // Continue on writing zeros for the rest of the object.
+ // For standard use of the ptr bits this is not required, as
+ // the bits are read from the beginning of the object. Some uses,
+ // like noscan spans, oblets, bulk write barriers, and cgocheck, might
+ // start mid-object, so these writes are still required.
+ for {
+ // Write zero bits.
+ ai := arenaIndex(h.addr)
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ idx := h.addr / (ptrBits * goarch.PtrSize) % heapArenaBitmapWords
+ if zeros < ptrBits {
+ ha.bitmap[idx] &^= uintptr(1)<<zeros - 1
+ break
+ } else if zeros == ptrBits {
+ ha.bitmap[idx] = 0
+ break
+ } else {
+ ha.bitmap[idx] = 0
+ zeros -= ptrBits
+ }
+ ha.noMorePtrs[idx/8] |= uint8(1) << (idx % 8)
+ h.addr += ptrBits * goarch.PtrSize
+ }
+}
+
+// Read the bytes starting at the aligned pointer p into a uintptr.
+// Read is little-endian.
+func readUintptr(p *byte) uintptr {
+ x := *(*uintptr)(unsafe.Pointer(p))
+ if goarch.BigEndian {
+ if goarch.PtrSize == 8 {
+ return uintptr(sys.Bswap64(uint64(x)))
+ }
+ return uintptr(sys.Bswap32(uint32(x)))
+ }
+ return x
+}
+
+// heapBitsSetType records that the new allocation [x, x+size)
+// holds in [x, x+dataSize) one or more values of type typ.
+// (The number of values is given by dataSize / typ.Size.)
+// If dataSize < size, the fragment [x+dataSize, x+size) is
+// recorded as non-pointer data.
+// It is known that the type has pointers somewhere;
+// malloc does not call heapBitsSetType when there are no pointers,
+// because all free objects are marked as noscan during
+// heapBitsSweepSpan.
+//
+// There can only be one allocation from a given span active at a time,
+// and the bitmap for a span always falls on word boundaries,
+// so there are no write-write races for access to the heap bitmap.
+// Hence, heapBitsSetType can access the bitmap without atomics.
+//
+// There can be read-write races between heapBitsSetType and things
+// that read the heap bitmap like scanobject. However, since
+// heapBitsSetType is only used for objects that have not yet been
+// made reachable, readers will ignore bits being modified by this
+// function. This does mean this function cannot transiently modify
+// bits that belong to neighboring objects. Also, on weakly-ordered
+// machines, callers must execute a store/store (publication) barrier
+// between calling this function and making the object reachable.
+func heapBitsSetType(x, size, dataSize uintptr, typ *_type) {
+ const doubleCheck = false // slow but helpful; enable to test modifications to this code
+
+ if doubleCheck && dataSize%typ.Size_ != 0 {
+ throw("heapBitsSetType: dataSize not a multiple of typ.Size")
+ }
+
+ if goarch.PtrSize == 8 && size == goarch.PtrSize {
+ // It's one word and it has pointers, it must be a pointer.
+ // Since all allocated one-word objects are pointers
+ // (non-pointers are aggregated into tinySize allocations),
+ // (*mspan).initHeapBits sets the pointer bits for us.
+ // Nothing to do here.
+ if doubleCheck {
+ h, addr := heapBitsForAddr(x, size).next()
+ if addr != x {
+ throw("heapBitsSetType: pointer bit missing")
+ }
+ _, addr = h.next()
+ if addr != 0 {
+ throw("heapBitsSetType: second pointer bit found")
+ }
+ }
+ return
+ }
+
+ h := writeHeapBitsForAddr(x)
+
+ // Handle GC program.
+ if typ.Kind_&kindGCProg != 0 {
+ // Expand the gc program into the storage we're going to use for the actual object.
+ obj := (*uint8)(unsafe.Pointer(x))
+ n := runGCProg(addb(typ.GCData, 4), obj)
+ // Use the expanded program to set the heap bits.
+ for i := uintptr(0); true; i += typ.Size_ {
+ // Copy expanded program to heap bitmap.
+ p := obj
+ j := n
+ for j > 8 {
+ h = h.write(uintptr(*p), 8)
+ p = add1(p)
+ j -= 8
+ }
+ h = h.write(uintptr(*p), j)
+
+ if i+typ.Size_ == dataSize {
+ break // no padding after last element
+ }
+
+ // Pad with zeros to the start of the next element.
+ h = h.pad(typ.Size_ - n*goarch.PtrSize)
+ }
+
+ h.flush(x, size)
+
+ // Erase the expanded GC program.
+ memclrNoHeapPointers(unsafe.Pointer(obj), (n+7)/8)
+ return
+ }
+
+ // Note about sizes:
+ //
+ // typ.Size is the number of bytes in the object,
+ // and typ.PtrBytes is the number of bytes in the prefix
+ // of the object that contains pointers. That is, the final
+ // typ.Size - typ.PtrBytes bytes contain no pointers.
+ // This allows optimization of a common pattern where
+ // an object has a small header followed by a large scalar
+ // buffer. If we know the pointers are over, we don't have
+ // to scan the buffer's heap bitmap at all.
+ // The 1-bit ptrmasks are sized to contain only bits for
+ // the typ.PtrBytes prefix, zero padded out to a full byte
+ // of bitmap. If there is more room in the allocated object,
+ // that space is pointerless. The noMorePtrs bitmap will prevent
+ // scanning large pointerless tails of an object.
+ //
+ // Replicated copies are not as nice: if there is an array of
+ // objects with scalar tails, all but the last tail does have to
+ // be initialized, because there is no way to say "skip forward".
+
+ ptrs := typ.PtrBytes / goarch.PtrSize
+ if typ.Size_ == dataSize { // Single element
+ if ptrs <= ptrBits { // Single small element
+ m := readUintptr(typ.GCData)
+ h = h.write(m, ptrs)
+ } else { // Single large element
+ p := typ.GCData
+ for {
+ h = h.write(readUintptr(p), ptrBits)
+ p = addb(p, ptrBits/8)
+ ptrs -= ptrBits
+ if ptrs <= ptrBits {
+ break
+ }
+ }
+ m := readUintptr(p)
+ h = h.write(m, ptrs)
+ }
+ } else { // Repeated element
+ words := typ.Size_ / goarch.PtrSize // total words, including scalar tail
+ if words <= ptrBits { // Repeated small element
+ n := dataSize / typ.Size_
+ m := readUintptr(typ.GCData)
+ // Make larger unit to repeat
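+ // For example (illustrative): a 2-word element with a 1-word pointer
+ // prefix repeated 4 times on 64-bit starts with ptrs=1, words=2, n=4,
+ // m=0b01. The loop doubles the unit twice (m becomes 0b0101 then
+ // 0b01010101, ptrs becomes 3 then 7) and n reaches 1, so the single
+ // 7-bit write below emits the whole pattern; flush supplies the
+ // trailing scalar bit of the last element.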
+ for words <= ptrBits/2 {
+ if n&1 != 0 {
+ h = h.write(m, words)
+ }
+ n /= 2
+ m |= m << words
+ ptrs += words
+ words *= 2
+ if n == 1 {
+ break
+ }
+ }
+ for n > 1 {
+ h = h.write(m, words)
+ n--
+ }
+ h = h.write(m, ptrs)
+ } else { // Repeated large element
+ for i := uintptr(0); true; i += typ.Size_ {
+ p := typ.GCData
+ j := ptrs
+ for j > ptrBits {
+ h = h.write(readUintptr(p), ptrBits)
+ p = addb(p, ptrBits/8)
+ j -= ptrBits
+ }
+ m := readUintptr(p)
+ h = h.write(m, j)
+ if i+typ.Size_ == dataSize {
+ break // don't need the trailing nonptr bits on the last element.
+ }
+ // Pad with zeros to the start of the next element.
+ h = h.pad(typ.Size_ - typ.PtrBytes)
+ }
+ }
+ }
+ h.flush(x, size)
+
+ if doubleCheck {
+ h := heapBitsForAddr(x, size)
+ for i := uintptr(0); i < size; i += goarch.PtrSize {
+ // Compute the pointer bit we want at offset i.
+ want := false
+ if i < dataSize {
+ off := i % typ.Size_
+ if off < typ.PtrBytes {
+ j := off / goarch.PtrSize
+ want = *addb(typ.GCData, j/8)>>(j%8)&1 != 0
+ }
+ }
+ if want {
+ var addr uintptr
+ h, addr = h.next()
+ if addr != x+i {
+ throw("heapBitsSetType: pointer entry not correct")
+ }
+ }
+ }
+ if _, addr := h.next(); addr != 0 {
+ throw("heapBitsSetType: extra pointer")
+ }
+ }
+}
+
+var debugPtrmask struct {
+ lock mutex
+ data *byte
+}
+
+// progToPointerMask returns the 1-bit pointer mask output by the GC program prog.
+// size is the size of the region described by prog, in bytes.
+// The resulting bitvector will have no more than size/goarch.PtrSize bits.
+func progToPointerMask(prog *byte, size uintptr) bitvector {
+ n := (size/goarch.PtrSize + 7) / 8
+ x := (*[1 << 30]byte)(persistentalloc(n+1, 1, &memstats.buckhash_sys))[:n+1]
+ x[len(x)-1] = 0xa1 // overflow check sentinel
+ n = runGCProg(prog, &x[0])
+ if x[len(x)-1] != 0xa1 {
+ throw("progToPointerMask: overflow")
+ }
+ return bitvector{int32(n), &x[0]}
+}
+
+// Packed GC pointer bitmaps, aka GC programs.
+//
+// For large types containing arrays, the type information has a
+// natural repetition that can be encoded to save space in the
+// binary and in the memory representation of the type information.
+//
+// The encoding is a simple Lempel-Ziv style bytecode machine
+// with the following instructions:
+//
+// 00000000: stop
+// 0nnnnnnn: emit n bits copied from the next (n+7)/8 bytes
+// 10000000 n c: repeat the previous n bits c times; n, c are varints
+// 1nnnnnnn c: repeat the previous n bits c times; c is a varint
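+//
+// For example (illustrative), the program bytes
+//
+//	0x03 0x05   emit 3 bits (the low bits of 0x05): 1,0,1
+//	0x83 0x03   repeat the previous 3 bits 3 times
+//	0x00        stop
+//
+// expand to the 12-bit mask 1,0,1 repeated four times.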
+
+// runGCProg returns the number of 1-bit entries written to memory.
+func runGCProg(prog, dst *byte) uintptr {
+ dstStart := dst
+
+ // Bits waiting to be written to memory.
+ var bits uintptr
+ var nbits uintptr
+
+ p := prog
+Run:
+ for {
+ // Flush accumulated full bytes.
+ // The rest of the loop assumes that nbits <= 7.
+ for ; nbits >= 8; nbits -= 8 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+
+ // Process one instruction.
+ inst := uintptr(*p)
+ p = add1(p)
+ n := inst & 0x7F
+ if inst&0x80 == 0 {
+ // Literal bits; n == 0 means end of program.
+ if n == 0 {
+ // Program is over.
+ break Run
+ }
+ nbyte := n / 8
+ for i := uintptr(0); i < nbyte; i++ {
+ bits |= uintptr(*p) << nbits
+ p = add1(p)
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+ if n %= 8; n > 0 {
+ bits |= uintptr(*p) << nbits
+ p = add1(p)
+ nbits += n
+ }
+ continue Run
+ }
+
+ // Repeat. If n == 0, it is encoded in a varint in the next bytes.
+ if n == 0 {
+ for off := uint(0); ; off += 7 {
+ x := uintptr(*p)
+ p = add1(p)
+ n |= (x & 0x7F) << off
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ }
+
+ // Count is encoded in a varint in the next bytes.
+ c := uintptr(0)
+ for off := uint(0); ; off += 7 {
+ x := uintptr(*p)
+ p = add1(p)
+ c |= (x & 0x7F) << off
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ c *= n // now total number of bits to copy
+
+ // If the number of bits being repeated is small, load them
+ // into a register and use that register for the entire loop
+ // instead of repeatedly reading from memory.
+ // Handling fewer than 8 bits here makes the general loop simpler.
+ // The cutoff is goarch.PtrSize*8 - 7 to guarantee that when we add
+ // the pattern to a bit buffer holding at most 7 bits (a partial byte)
+ // it will not overflow.
+ src := dst
+ const maxBits = goarch.PtrSize*8 - 7
+ if n <= maxBits {
+ // Start with bits in output buffer.
+ pattern := bits
+ npattern := nbits
+
+ // If we need more bits, fetch them from memory.
+ src = subtract1(src)
+ for npattern < n {
+ pattern <<= 8
+ pattern |= uintptr(*src)
+ src = subtract1(src)
+ npattern += 8
+ }
+
+ // We started with the whole bit output buffer,
+ // and then we loaded bits from whole bytes.
+ // Either way, we might now have too many instead of too few.
+ // Discard the extra.
+ if npattern > n {
+ pattern >>= npattern - n
+ npattern = n
+ }
+
+ // Replicate pattern to at most maxBits.
+ if npattern == 1 {
+ // One bit being repeated.
+ // If the bit is 1, make the pattern all 1s.
+ // If the bit is 0, the pattern is already all 0s,
+ // but we can claim that the number of bits
+ // in the word is equal to the number we need (c),
+ // because right shift of bits will zero fill.
+ if pattern == 1 {
+ pattern = 1<<maxBits - 1
+ npattern = maxBits
+ } else {
+ npattern = c
+ }
+ } else {
+ b := pattern
+ nb := npattern
+ if nb+nb <= maxBits {
+ // Double pattern until the whole uintptr is filled.
+ for nb <= goarch.PtrSize*8 {
+ b |= b << nb
+ nb += nb
+ }
+ // Trim away incomplete copy of original pattern in high bits.
+ // TODO(rsc): Replace with table lookup or loop on systems without divide?
+ nb = maxBits / npattern * npattern
+ b &= 1<<nb - 1
+ pattern = b
+ npattern = nb
+ }
+ }
+
+ // Add pattern to bit buffer and flush bit buffer, c/npattern times.
+ // Since pattern contains >8 bits, there will be full bytes to flush
+ // on each iteration.
+ for ; c >= npattern; c -= npattern {
+ bits |= pattern << nbits
+ nbits += npattern
+ for nbits >= 8 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ nbits -= 8
+ }
+ }
+
+ // Add final fragment to bit buffer.
+ if c > 0 {
+ pattern &= 1<<c - 1
+ bits |= pattern << nbits
+ nbits += c
+ }
+ continue Run
+ }
+
+ // Repeat; n too large to fit in a register.
+ // Since nbits <= 7, we know the first few bytes of repeated data
+ // are already written to memory.
+ off := n - nbits // n > nbits because n > maxBits and nbits <= 7
+ // Leading src fragment.
+ src = subtractb(src, (off+7)/8)
+ if frag := off & 7; frag != 0 {
+ bits |= uintptr(*src) >> (8 - frag) << nbits
+ src = add1(src)
+ nbits += frag
+ c -= frag
+ }
+ // Main loop: load one byte, write another.
+ // The bits are rotating through the bit buffer.
+ for i := c / 8; i > 0; i-- {
+ bits |= uintptr(*src) << nbits
+ src = add1(src)
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+ // Final src fragment.
+ if c %= 8; c > 0 {
+ bits |= (uintptr(*src) & (1<<c - 1)) << nbits
+ nbits += c
+ }
+ }
+
+ // Write any final bits out, using full-byte writes, even for the final byte.
+ totalBits := (uintptr(unsafe.Pointer(dst))-uintptr(unsafe.Pointer(dstStart)))*8 + nbits
+ nbits += -nbits & 7
+ for ; nbits > 0; nbits -= 8 {
+ *dst = uint8(bits)
+ dst = add1(dst)
+ bits >>= 8
+ }
+ return totalBits
+}
+
+// materializeGCProg allocates space for the (1-bit) pointer bitmask
+// for an object of size ptrdata. Then it fills that space with the
+// pointer bitmask specified by the program prog.
+// The bitmask starts at s.startAddr.
+// The result must be deallocated with dematerializeGCProg.
+func materializeGCProg(ptrdata uintptr, prog *byte) *mspan {
+ // Each word of ptrdata needs one bit in the bitmap.
+ bitmapBytes := divRoundUp(ptrdata, 8*goarch.PtrSize)
+ // Compute the number of pages needed for bitmapBytes.
+ pages := divRoundUp(bitmapBytes, pageSize)
+ s := mheap_.allocManual(pages, spanAllocPtrScalarBits)
+ runGCProg(addb(prog, 4), (*byte)(unsafe.Pointer(s.startAddr)))
+ return s
+}
+func dematerializeGCProg(s *mspan) {
+ mheap_.freeManual(s, spanAllocPtrScalarBits)
+}
+
+func dumpGCProg(p *byte) {
+ nptr := 0
+ for {
+ x := *p
+ p = add1(p)
+ if x == 0 {
+ print("\t", nptr, " end\n")
+ break
+ }
+ if x&0x80 == 0 {
+ print("\t", nptr, " lit ", x, ":")
+ n := int(x+7) / 8
+ for i := 0; i < n; i++ {
+ print(" ", hex(*p))
+ p = add1(p)
+ }
+ print("\n")
+ nptr += int(x)
+ } else {
+ nbit := int(x &^ 0x80)
+ if nbit == 0 {
+ for nb := uint(0); ; nb += 7 {
+ x := *p
+ p = add1(p)
+ nbit |= int(x&0x7f) << nb
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ }
+ count := 0
+ for nb := uint(0); ; nb += 7 {
+ x := *p
+ p = add1(p)
+ count |= int(x&0x7f) << nb
+ if x&0x80 == 0 {
+ break
+ }
+ }
+ print("\t", nptr, " repeat ", nbit, " × ", count, "\n")
+ nptr += nbit * count
+ }
+ }
+}
+
+// Testing.
+
+// reflect_gcbits returns the GC type info for x, for testing.
+// The result is the bitmap entries (0 or 1), one entry per byte.
+//
+//go:linkname reflect_gcbits reflect.gcbits
+func reflect_gcbits(x any) []byte {
+ return getgcmask(x)
+}
+
+// Returns GC type info for the pointer stored in ep for testing.
+// If ep points to the stack, only static live information will be returned
+// (i.e. not for objects which are only dynamically live stack objects).
+func getgcmask(ep any) (mask []byte) {
+ e := *efaceOf(&ep)
+ p := e.data
+ t := e._type
+ // data or bss
+ for _, datap := range activeModules() {
+ // data
+ if datap.data <= uintptr(p) && uintptr(p) < datap.edata {
+ bitmap := datap.gcdatamask.bytedata
+ n := (*ptrtype)(unsafe.Pointer(t)).Elem.Size_
+ mask = make([]byte, n/goarch.PtrSize)
+ for i := uintptr(0); i < n; i += goarch.PtrSize {
+ off := (uintptr(p) + i - datap.data) / goarch.PtrSize
+ mask[i/goarch.PtrSize] = (*addb(bitmap, off/8) >> (off % 8)) & 1
+ }
+ return
+ }
+
+ // bss
+ if datap.bss <= uintptr(p) && uintptr(p) < datap.ebss {
+ bitmap := datap.gcbssmask.bytedata
+ n := (*ptrtype)(unsafe.Pointer(t)).Elem.Size_
+ mask = make([]byte, n/goarch.PtrSize)
+ for i := uintptr(0); i < n; i += goarch.PtrSize {
+ off := (uintptr(p) + i - datap.bss) / goarch.PtrSize
+ mask[i/goarch.PtrSize] = (*addb(bitmap, off/8) >> (off % 8)) & 1
+ }
+ return
+ }
+ }
+
+ // heap
+ if base, s, _ := findObject(uintptr(p), 0, 0); base != 0 {
+ if s.spanclass.noscan() {
+ return nil
+ }
+ n := s.elemsize
+ hbits := heapBitsForAddr(base, n)
+ mask = make([]byte, n/goarch.PtrSize)
+ for {
+ var addr uintptr
+ if hbits, addr = hbits.next(); addr == 0 {
+ break
+ }
+ mask[(addr-base)/goarch.PtrSize] = 1
+ }
+ // Callers expect this mask to end at the last pointer.
+ for len(mask) > 0 && mask[len(mask)-1] == 0 {
+ mask = mask[:len(mask)-1]
+ }
+ return
+ }
+
+ // stack
+ if gp := getg(); gp.m.curg.stack.lo <= uintptr(p) && uintptr(p) < gp.m.curg.stack.hi {
+ found := false
+ var u unwinder
+ for u.initAt(gp.m.curg.sched.pc, gp.m.curg.sched.sp, 0, gp.m.curg, 0); u.valid(); u.next() {
+ if u.frame.sp <= uintptr(p) && uintptr(p) < u.frame.varp {
+ found = true
+ break
+ }
+ }
+ if found {
+ locals, _, _ := u.frame.getStackMap(nil, false)
+ if locals.n == 0 {
+ return
+ }
+ size := uintptr(locals.n) * goarch.PtrSize
+ n := (*ptrtype)(unsafe.Pointer(t)).Elem.Size_
+ mask = make([]byte, n/goarch.PtrSize)
+ for i := uintptr(0); i < n; i += goarch.PtrSize {
+ off := (uintptr(p) + i - u.frame.varp + size) / goarch.PtrSize
+ mask[i/goarch.PtrSize] = locals.ptrbit(off)
+ }
+ }
+ return
+ }
+
+ // otherwise, not something the GC knows about.
+ // possibly read-only data, like malloc(0).
+ // must not have pointers
+ return
+}
diff --git a/src/runtime/mcache.go b/src/runtime/mcache.go
new file mode 100644
index 0000000..acfd99b
--- /dev/null
+++ b/src/runtime/mcache.go
@@ -0,0 +1,331 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Per-thread (in Go, per-P) cache for small objects.
+// This includes a small object cache and local allocation stats.
+// No locking needed because it is per-thread (per-P).
+//
+// mcaches are allocated from non-GC'd memory, so any heap pointers
+// must be specially handled.
+type mcache struct {
+ _ sys.NotInHeap
+
+ // The following members are accessed on every malloc,
+ // so they are grouped here for better caching.
+ nextSample uintptr // trigger heap sample after allocating this many bytes
+ scanAlloc uintptr // bytes of scannable heap allocated
+
+ // Allocator cache for tiny objects w/o pointers.
+ // See "Tiny allocator" comment in malloc.go.
+
+ // tiny points to the beginning of the current tiny block, or
+ // nil if there is no current tiny block.
+ //
+ // tiny is a heap pointer. Since mcache is in non-GC'd memory,
+ // we handle it by clearing it in releaseAll during mark
+ // termination.
+ //
+ // tinyAllocs is the number of tiny allocations performed
+ // by the P that owns this mcache.
+ tiny uintptr
+ tinyoffset uintptr
+ tinyAllocs uintptr
+
+ // The rest is not accessed on every malloc.
+
+ alloc [numSpanClasses]*mspan // spans to allocate from, indexed by spanClass
+
+ stackcache [_NumStackOrders]stackfreelist
+
+ // flushGen indicates the sweepgen during which this mcache
+ // was last flushed. If flushGen != mheap_.sweepgen, the spans
+ // in this mcache are stale and need to be flushed so they
+ // can be swept. This is done in acquirep.
+ flushGen atomic.Uint32
+}
+
+// A gclink is a node in a linked list of blocks, like mlink,
+// but it is opaque to the garbage collector.
+// The GC does not trace the pointers during collection,
+// and the compiler does not emit write barriers for assignments
+// of gclinkptr values. Code should store references to gclinks
+// as gclinkptr, not as *gclink.
+type gclink struct {
+ next gclinkptr
+}
+
+// A gclinkptr is a pointer to a gclink, but it is opaque
+// to the garbage collector.
+type gclinkptr uintptr
+
+// ptr returns the *gclink form of p.
+// The result should be used for accessing fields, not stored
+// in other data structures.
+func (p gclinkptr) ptr() *gclink {
+ return (*gclink)(unsafe.Pointer(p))
+}
+
+type stackfreelist struct {
+ list gclinkptr // linked list of free stacks
+ size uintptr // total size of stacks in list
+}
+
+// dummy mspan that contains no free objects.
+var emptymspan mspan
+
+func allocmcache() *mcache {
+ var c *mcache
+ systemstack(func() {
+ lock(&mheap_.lock)
+ c = (*mcache)(mheap_.cachealloc.alloc())
+ c.flushGen.Store(mheap_.sweepgen)
+ unlock(&mheap_.lock)
+ })
+ for i := range c.alloc {
+ c.alloc[i] = &emptymspan
+ }
+ c.nextSample = nextSample()
+ return c
+}
+
+// freemcache releases resources associated with this
+// mcache and puts the object onto a free list.
+//
+// In some cases there is no way to simply release
+// resources, such as statistics, so donate them to
+// a different mcache (the recipient).
+func freemcache(c *mcache) {
+ systemstack(func() {
+ c.releaseAll()
+ stackcache_clear(c)
+
+ // NOTE(rsc,rlh): If gcworkbuffree comes back, we need to coordinate
+ // with the stealing of gcworkbufs during garbage collection to avoid
+ // a race where the workbuf is double-freed.
+ // gcworkbuffree(c.gcworkbuf)
+
+ lock(&mheap_.lock)
+ mheap_.cachealloc.free(unsafe.Pointer(c))
+ unlock(&mheap_.lock)
+ })
+}
+
+// getMCache is a convenience function which tries to obtain an mcache.
+//
+// Returns nil if we're not bootstrapping and we don't have a P. The caller's
+// P must not change, so we must be in a non-preemptible state.
+func getMCache(mp *m) *mcache {
+ // Grab the mcache, since that's where stats live.
+ pp := mp.p.ptr()
+ var c *mcache
+ if pp == nil {
+ // We will be called without a P while bootstrapping,
+ // in which case we use mcache0, which is set in mallocinit.
+ // mcache0 is cleared when bootstrapping is complete,
+ // by procresize.
+ c = mcache0
+ } else {
+ c = pp.mcache
+ }
+ return c
+}
+
+// refill acquires a new span of span class spc for c. This span will
+// have at least one free object. The current span in c must be full.
+//
+// Must run in a non-preemptible context since otherwise the owner of
+// c could change.
+func (c *mcache) refill(spc spanClass) {
+ // Return the current cached span to the central lists.
+ s := c.alloc[spc]
+
+ if uintptr(s.allocCount) != s.nelems {
+ throw("refill of span with free space remaining")
+ }
+ if s != &emptymspan {
+ // Mark this span as no longer cached.
+ if s.sweepgen != mheap_.sweepgen+3 {
+ throw("bad sweepgen in refill")
+ }
+ mheap_.central[spc].mcentral.uncacheSpan(s)
+
+ // Count up how many slots were used and record it.
+ stats := memstats.heapStats.acquire()
+ slotsUsed := int64(s.allocCount) - int64(s.allocCountBeforeCache)
+ atomic.Xadd64(&stats.smallAllocCount[spc.sizeclass()], slotsUsed)
+
+ // Flush tinyAllocs.
+ if spc == tinySpanClass {
+ atomic.Xadd64(&stats.tinyAllocCount, int64(c.tinyAllocs))
+ c.tinyAllocs = 0
+ }
+ memstats.heapStats.release()
+
+ // Count the allocs in inconsistent, internal stats.
+ bytesAllocated := slotsUsed * int64(s.elemsize)
+ gcController.totalAlloc.Add(bytesAllocated)
+
+ // Clear the second allocCount just to be safe.
+ s.allocCountBeforeCache = 0
+ }
+
+ // Get a new cached span from the central lists.
+ s = mheap_.central[spc].mcentral.cacheSpan()
+ if s == nil {
+ throw("out of memory")
+ }
+
+ if uintptr(s.allocCount) == s.nelems {
+ throw("span has no free space")
+ }
+
+ // Indicate that this span is cached and prevent asynchronous
+ // sweeping in the next sweep phase.
+ s.sweepgen = mheap_.sweepgen + 3
+
+ // Store the current alloc count for accounting later.
+ s.allocCountBeforeCache = s.allocCount
+
+ // Update heapLive and flush scanAlloc.
+ //
+ // We have not yet allocated anything new into the span, but we
+ // assume that all of its slots will get used, so this makes
+ // heapLive an overestimate.
+ //
+ // When the span gets uncached, we'll fix up this overestimate
+ // if necessary (see releaseAll).
+ //
+ // We pick an overestimate here because an underestimate leads
+ // the pacer to believe that it's in better shape than it is,
+ // which appears to lead to more memory used. See #53738 for
+ // more details.
+ usedBytes := uintptr(s.allocCount) * s.elemsize
+ gcController.update(int64(s.npages*pageSize)-int64(usedBytes), int64(c.scanAlloc))
+ c.scanAlloc = 0
+
+ c.alloc[spc] = s
+}
+
+// allocLarge allocates a span for a large object.
+func (c *mcache) allocLarge(size uintptr, noscan bool) *mspan {
+ if size+_PageSize < size {
+ throw("out of memory")
+ }
+ npages := size >> _PageShift
+ if size&_PageMask != 0 {
+ npages++
+ }
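+ // For example (illustrative), with 8 KiB pages a 70 KiB request yields
+ // npages = 9: eight full pages plus one for the 6 KiB remainder.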
+
+ // Deduct credit for this span allocation and sweep if
+ // necessary. mHeap_Alloc will also sweep npages, so this only
+ // pays the debt down to npages pages.
+ deductSweepCredit(npages*_PageSize, npages)
+
+ spc := makeSpanClass(0, noscan)
+ s := mheap_.alloc(npages, spc)
+ if s == nil {
+ throw("out of memory")
+ }
+
+ // Count the alloc in consistent, external stats.
+ stats := memstats.heapStats.acquire()
+ atomic.Xadd64(&stats.largeAlloc, int64(npages*pageSize))
+ atomic.Xadd64(&stats.largeAllocCount, 1)
+ memstats.heapStats.release()
+
+ // Count the alloc in inconsistent, internal stats.
+ gcController.totalAlloc.Add(int64(npages * pageSize))
+
+ // Update heapLive.
+ gcController.update(int64(s.npages*pageSize), 0)
+
+ // Put the large span in the mcentral swept list so that it's
+ // visible to the background sweeper.
+ mheap_.central[spc].mcentral.fullSwept(mheap_.sweepgen).push(s)
+ s.limit = s.base() + size
+ s.initHeapBits(false)
+ return s
+}
+
+func (c *mcache) releaseAll() {
+ // Take this opportunity to flush scanAlloc.
+ scanAlloc := int64(c.scanAlloc)
+ c.scanAlloc = 0
+
+ sg := mheap_.sweepgen
+ dHeapLive := int64(0)
+ for i := range c.alloc {
+ s := c.alloc[i]
+ if s != &emptymspan {
+ slotsUsed := int64(s.allocCount) - int64(s.allocCountBeforeCache)
+ s.allocCountBeforeCache = 0
+
+ // Adjust smallAllocCount for whatever was allocated.
+ stats := memstats.heapStats.acquire()
+ atomic.Xadd64(&stats.smallAllocCount[spanClass(i).sizeclass()], slotsUsed)
+ memstats.heapStats.release()
+
+ // Adjust the actual allocs in inconsistent, internal stats.
+ // We assumed earlier that the full span gets allocated.
+ gcController.totalAlloc.Add(slotsUsed * int64(s.elemsize))
+
+ if s.sweepgen != sg+1 {
+ // refill conservatively counted unallocated slots in gcController.heapLive.
+ // Undo this.
+ //
+ // If this span was cached before sweep, then gcController.heapLive was totally
+ // recomputed since caching this span, so we don't do this for stale spans.
+ dHeapLive -= int64(uintptr(s.nelems)-uintptr(s.allocCount)) * int64(s.elemsize)
+ }
+
+ // Release the span to the mcentral.
+ mheap_.central[i].mcentral.uncacheSpan(s)
+ c.alloc[i] = &emptymspan
+ }
+ }
+ // Clear tinyalloc pool.
+ c.tiny = 0
+ c.tinyoffset = 0
+
+ // Flush tinyAllocs.
+ stats := memstats.heapStats.acquire()
+ atomic.Xadd64(&stats.tinyAllocCount, int64(c.tinyAllocs))
+ c.tinyAllocs = 0
+ memstats.heapStats.release()
+
+ // Update heapLive and heapScan.
+ gcController.update(dHeapLive, scanAlloc)
+}
+
+// prepareForSweep flushes c if the system has entered a new sweep phase
+// since c was populated. This must happen between the sweep phase
+// starting and the first allocation from c.
+func (c *mcache) prepareForSweep() {
+ // Alternatively, instead of making sure we do this on every P
+ // between starting the world and allocating on that P, we
+ // could leave allocate-black on, allow allocation to continue
+ // as usual, use a ragged barrier at the beginning of sweep to
+ // ensure all cached spans are swept, and then disable
+ // allocate-black. However, with this approach it's difficult
+ // to avoid spilling mark bits into the *next* GC cycle.
+ sg := mheap_.sweepgen
+ flushGen := c.flushGen.Load()
+ if flushGen == sg {
+ return
+ } else if flushGen != sg-2 {
+ println("bad flushGen", flushGen, "in prepareForSweep; sweepgen", sg)
+ throw("bad flushGen")
+ }
+ c.releaseAll()
+ stackcache_clear(c)
+ c.flushGen.Store(mheap_.sweepgen) // Synchronizes with gcStart
+}
diff --git a/src/runtime/mcentral.go b/src/runtime/mcentral.go
new file mode 100644
index 0000000..7861199
--- /dev/null
+++ b/src/runtime/mcentral.go
@@ -0,0 +1,257 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Central free lists.
+//
+// See malloc.go for an overview.
+//
+// The mcentral doesn't actually contain the list of free objects; the mspan does.
+// Each mcentral holds two sets of mspans: those with free objects (c.partial)
+// and those that are completely allocated (c.full).
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+)
+
+// Central list of free objects of a given size.
+type mcentral struct {
+ _ sys.NotInHeap
+ spanclass spanClass
+
+ // partial and full contain two mspan sets: one of swept in-use
+ // spans, and one of unswept in-use spans. These two trade
+ // roles on each GC cycle. The unswept set is drained either by
+ // allocation or by the background sweeper in every GC cycle,
+ // so only two roles are necessary.
+ //
+ // sweepgen is increased by 2 on each GC cycle, so the swept
+ // spans are in partial[sweepgen/2%2] and the unswept spans are in
+ // partial[1-sweepgen/2%2]. Sweeping pops spans from the
+ // unswept set and pushes spans that are still in-use on the
+ // swept set. Likewise, allocating an in-use span pushes it
+ // on the swept set.
+ //
+ // Some parts of the sweeper can sweep arbitrary spans, and hence
+ // can't remove them from the unswept set, but will add the span
+ // to the appropriate swept list. As a result, the parts of the
+ // sweeper and mcentral that do consume from the unswept list may
+ // encounter swept spans, and these should be ignored.
+ partial [2]spanSet // list of spans with a free object
+ full [2]spanSet // list of spans with no free objects
+}
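+
+// An illustrative sketch (not from the upstream source): how the swept and
+// unswept roles of the two sets flip as sweepgen advances by 2 each cycle.
+// The same indexing applies to both partial and full.
+//
+//	sweepgen 4: swept = [4/2%2] = [0], unswept = [1-4/2%2] = [1]
+//	sweepgen 6: swept = [6/2%2] = [1], unswept = [1-6/2%2] = [0]
+//	sweepgen 8: swept = [8/2%2] = [0], unswept = [1-8/2%2] = [1]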
+
+// Initialize a single central free list.
+func (c *mcentral) init(spc spanClass) {
+ c.spanclass = spc
+ lockInit(&c.partial[0].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.partial[1].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.full[0].spineLock, lockRankSpanSetSpine)
+ lockInit(&c.full[1].spineLock, lockRankSpanSetSpine)
+}
+
+// partialUnswept returns the spanSet which holds partially-filled
+// unswept spans for this sweepgen.
+func (c *mcentral) partialUnswept(sweepgen uint32) *spanSet {
+ return &c.partial[1-sweepgen/2%2]
+}
+
+// partialSwept returns the spanSet which holds partially-filled
+// swept spans for this sweepgen.
+func (c *mcentral) partialSwept(sweepgen uint32) *spanSet {
+ return &c.partial[sweepgen/2%2]
+}
+
+// fullUnswept returns the spanSet which holds unswept spans without any
+// free slots for this sweepgen.
+func (c *mcentral) fullUnswept(sweepgen uint32) *spanSet {
+ return &c.full[1-sweepgen/2%2]
+}
+
+// fullSwept returns the spanSet which holds swept spans without any
+// free slots for this sweepgen.
+func (c *mcentral) fullSwept(sweepgen uint32) *spanSet {
+ return &c.full[sweepgen/2%2]
+}
+
+// Allocate a span to use in an mcache.
+func (c *mcentral) cacheSpan() *mspan {
+ // Deduct credit for this span allocation and sweep if necessary.
+ spanBytes := uintptr(class_to_allocnpages[c.spanclass.sizeclass()]) * _PageSize
+ deductSweepCredit(spanBytes, 0)
+
+ traceDone := false
+ if traceEnabled() {
+ traceGCSweepStart()
+ }
+
+ // If we sweep spanBudget spans without finding any free
+ // space, just allocate a fresh span. This limits the amount
+ // of time we can spend trying to find free space and
+ // amortizes the cost of small object sweeping over the
+ // benefit of having a full free span to allocate from. By
+ // setting this to 100, we limit the space overhead to 1%.
+ //
+ // TODO(austin,mknyszek): This still has bad worst-case
+ // throughput. For example, this could find just one free slot
+ // on the 100th swept span. That limits allocation latency, but
+ // still has very poor throughput. We could instead keep a
+ // running free-to-used budget and switch to fresh span
+ // allocation if the budget runs low.
+ spanBudget := 100
+
+ var s *mspan
+ var sl sweepLocker
+
+ // Try partial swept spans first.
+ sg := mheap_.sweepgen
+ if s = c.partialSwept(sg).pop(); s != nil {
+ goto havespan
+ }
+
+ sl = sweep.active.begin()
+ if sl.valid {
+ // Now try partial unswept spans.
+ for ; spanBudget >= 0; spanBudget-- {
+ s = c.partialUnswept(sg).pop()
+ if s == nil {
+ break
+ }
+ if s, ok := sl.tryAcquire(s); ok {
+ // We got ownership of the span, so let's sweep it and use it.
+ s.sweep(true)
+ sweep.active.end(sl)
+ goto havespan
+ }
+ // We failed to get ownership of the span, which means it's being or
+ // has been swept by an asynchronous sweeper that just couldn't remove it
+ // from the unswept list. That sweeper took ownership of the span and
+ // responsibility for either freeing it to the heap or putting it on the
+ // right swept list. Either way, we should just ignore it (and it's unsafe
+ // for us to do anything else).
+ }
+ // Now try full unswept spans, sweeping them and putting them into the
+ // right list if we fail to get a span.
+ for ; spanBudget >= 0; spanBudget-- {
+ s = c.fullUnswept(sg).pop()
+ if s == nil {
+ break
+ }
+ if s, ok := sl.tryAcquire(s); ok {
+ // We got ownership of the span, so let's sweep it.
+ s.sweep(true)
+ // Check if there's any free space.
+ freeIndex := s.nextFreeIndex()
+ if freeIndex != s.nelems {
+ s.freeindex = freeIndex
+ sweep.active.end(sl)
+ goto havespan
+ }
+ // Add it to the swept list, because sweeping didn't give us any free space.
+ c.fullSwept(sg).push(s.mspan)
+ }
+ // See comment for partial unswept spans.
+ }
+ sweep.active.end(sl)
+ }
+ if traceEnabled() {
+ traceGCSweepDone()
+ traceDone = true
+ }
+
+ // We failed to get a span from the mcentral so get one from mheap.
+ s = c.grow()
+ if s == nil {
+ return nil
+ }
+
+ // At this point s is a span that should have free slots.
+havespan:
+ if traceEnabled() && !traceDone {
+ traceGCSweepDone()
+ }
+ n := int(s.nelems) - int(s.allocCount)
+ if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
+ throw("span has no free objects")
+ }
+ freeByteBase := s.freeindex &^ (64 - 1)
+ whichByte := freeByteBase / 8
+ // Init alloc bits cache.
+ s.refillAllocCache(whichByte)
+
+ // Adjust the allocCache so that s.freeindex corresponds to the low bit in
+ // s.allocCache.
+ s.allocCache >>= s.freeindex % 64
+
+ return s
+}
+
+// Return span from an mcache.
+//
+// s must have a span class corresponding to this
+// mcentral and it must not be empty.
+func (c *mcentral) uncacheSpan(s *mspan) {
+ if s.allocCount == 0 {
+ throw("uncaching span but s.allocCount == 0")
+ }
+
+ sg := mheap_.sweepgen
+ stale := s.sweepgen == sg+1
+
+ // Fix up sweepgen.
+ if stale {
+ // Span was cached before sweep began. It's our
+ // responsibility to sweep it.
+ //
+ // Set sweepgen to indicate it's not cached but needs
+ // sweeping and can't be allocated from. sweep will
+ // set s.sweepgen to indicate s is swept.
+ atomic.Store(&s.sweepgen, sg-1)
+ } else {
+ // Indicate that s is no longer cached.
+ atomic.Store(&s.sweepgen, sg)
+ }
+
+ // Put the span in the appropriate place.
+ if stale {
+ // It's stale, so just sweep it. Sweeping will put it on
+ // the right list.
+ //
+ // We don't use a sweepLocker here. Stale cached spans
+ // aren't in the global sweep lists, so mark termination
+ // itself holds up sweep completion until all mcaches
+ // have been swept.
+ ss := sweepLocked{s}
+ ss.sweep(false)
+ } else {
+ if int(s.nelems)-int(s.allocCount) > 0 {
+ // Put it back on the partial swept list.
+ c.partialSwept(sg).push(s)
+ } else {
+ // There's no free space and it's not stale, so put it on the
+ // full swept list.
+ c.fullSwept(sg).push(s)
+ }
+ }
+}
+
+// grow allocates a new empty span from the heap and initializes it for c's size class.
+func (c *mcentral) grow() *mspan {
+ npages := uintptr(class_to_allocnpages[c.spanclass.sizeclass()])
+ size := uintptr(class_to_size[c.spanclass.sizeclass()])
+
+ s := mheap_.alloc(npages, c.spanclass)
+ if s == nil {
+ return nil
+ }
+
+ // Use division by multiplication and shifts to quickly compute:
+ // n := (npages << _PageShift) / size
+ n := s.divideByElemSize(npages << _PageShift)
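+	// Illustrative example (not from the upstream source): for the 96-byte
+	// size class, npages is 1 and size is 96, so n = 8192/96 = 85 objects
+	// fit in the span.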
+ s.limit = s.base() + size*n
+ s.initHeapBits(false)
+ return s
+}
diff --git a/src/runtime/mcheckmark.go b/src/runtime/mcheckmark.go
new file mode 100644
index 0000000..73c1a10
--- /dev/null
+++ b/src/runtime/mcheckmark.go
@@ -0,0 +1,104 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// GC checkmarks
+//
+// In a concurrent garbage collector, one worries about failing to mark
+// a live object due to mutations without write barriers or bugs in the
+// collector implementation. As a sanity check, the GC has a 'checkmark'
+// mode that retraverses the object graph with the world stopped, to make
+// sure that everything that should be marked is marked.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// A checkmarksMap stores the GC marks in "checkmarks" mode. It is a
+// per-arena bitmap with a bit for every word in the arena. The mark
+// is stored on the bit corresponding to the first word of the marked
+// allocation.
+type checkmarksMap struct {
+ _ sys.NotInHeap
+ b [heapArenaBytes / goarch.PtrSize / 8]uint8
+}
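+
+// An illustrative sketch (not from the upstream source): how an address obj
+// maps to its checkmark bit, mirroring the indexing in setCheckmark below.
+// Arenas are heapArenaBytes-aligned, so reducing modulo len(b) is equivalent
+// to taking obj's offset within its arena.
+//
+//	word := (obj % heapArenaBytes) / goarch.PtrSize // word index within the arena
+//	bytep := &arena.checkmarks.b[word/8]
+//	mask := byte(1) << (word % 8)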
+
+// If useCheckmark is true, marking of an object uses the checkmark
+// bits instead of the standard mark bits.
+var useCheckmark = false
+
+// startCheckmarks prepares for the checkmarks phase.
+//
+// The world must be stopped.
+func startCheckmarks() {
+ assertWorldStopped()
+
+ // Clear all checkmarks.
+ for _, ai := range mheap_.allArenas {
+ arena := mheap_.arenas[ai.l1()][ai.l2()]
+ bitmap := arena.checkmarks
+
+ if bitmap == nil {
+ // Allocate bitmap on first use.
+ bitmap = (*checkmarksMap)(persistentalloc(unsafe.Sizeof(*bitmap), 0, &memstats.gcMiscSys))
+ if bitmap == nil {
+ throw("out of memory allocating checkmarks bitmap")
+ }
+ arena.checkmarks = bitmap
+ } else {
+ // Otherwise clear the existing bitmap.
+ for i := range bitmap.b {
+ bitmap.b[i] = 0
+ }
+ }
+ }
+ // Enable checkmarking.
+ useCheckmark = true
+}
+
+// endCheckmarks ends the checkmarks phase.
+func endCheckmarks() {
+ if gcMarkWorkAvailable(nil) {
+ throw("GC work not flushed")
+ }
+ useCheckmark = false
+}
+
+// setCheckmark throws if marking object is a checkmarks violation,
+// and otherwise sets obj's checkmark. It returns true if obj was
+// already checkmarked.
+func setCheckmark(obj, base, off uintptr, mbits markBits) bool {
+ if !mbits.isMarked() {
+ printlock()
+ print("runtime: checkmarks found unexpected unmarked object obj=", hex(obj), "\n")
+ print("runtime: found obj at *(", hex(base), "+", hex(off), ")\n")
+
+ // Dump the source (base) object
+ gcDumpObject("base", base, off)
+
+ // Dump the object
+ gcDumpObject("obj", obj, ^uintptr(0))
+
+ getg().m.traceback = 2
+ throw("checkmark found unmarked object")
+ }
+
+ ai := arenaIndex(obj)
+ arena := mheap_.arenas[ai.l1()][ai.l2()]
+	arenaWord := (obj / goarch.PtrSize / 8) % uintptr(len(arena.checkmarks.b))
+	mask := byte(1 << ((obj / goarch.PtrSize) % 8))
+ bytep := &arena.checkmarks.b[arenaWord]
+
+ if atomic.Load8(bytep)&mask != 0 {
+ // Already checkmarked.
+ return true
+ }
+
+ atomic.Or8(bytep, mask)
+ return false
+}
diff --git a/src/runtime/mem.go b/src/runtime/mem.go
new file mode 100644
index 0000000..22688d5
--- /dev/null
+++ b/src/runtime/mem.go
@@ -0,0 +1,156 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// OS memory management abstraction layer
+//
+// Regions of the address space managed by the runtime may be in one of four
+// states at any given time:
+// 1) None - Unreserved and unmapped, the default state of any region.
+// 2) Reserved - Owned by the runtime, but accessing it would cause a fault.
+// Does not count against the process' memory footprint.
+// 3) Prepared - Reserved, intended not to be backed by physical memory (though
+// an OS may implement this lazily). Can transition efficiently to
+// Ready. Accessing memory in such a region is undefined (may
+// fault, may give back unexpected zeroes, etc.).
+// 4) Ready - may be accessed safely.
+//
+// This set of states is more than is strictly necessary to support all the
+// currently supported platforms. One could get by with just None, Reserved, and
+// Ready. However, the Prepared state gives us flexibility for performance
+// purposes. For example, on POSIX-y operating systems, Reserved is usually a
+// private anonymous mmap'd region with PROT_NONE set, and to transition
+// to Ready would require setting PROT_READ|PROT_WRITE. However the
+// underspecification of Prepared lets us use just MADV_FREE to transition from
+// Ready to Prepared. Thus with the Prepared state we can set the permission
+// bits just once early on, and efficiently tell the OS that it's free to
+// take pages away from us when we don't strictly need them.
+//
+// This file defines a cross-OS interface for a common set of helpers
+// that transition memory regions between these states. The helpers call into
+// OS-specific implementations that handle errors, while the interface boundary
+// implements cross-OS functionality, like updating runtime accounting.
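+//
+// An illustrative sketch (not from the upstream source) of how a heap region
+// typically moves through these helpers on a POSIX-like platform, where stat
+// stands in for whichever *sysMemStat the caller charges:
+//
+//	v := sysReserve(nil, n) // None     -> Reserved
+//	sysMap(v, n, stat)      // Reserved -> Prepared
+//	sysUsed(v, n, n)        // Prepared -> Ready
+//	sysUnused(v, n)         // Ready    -> Prepared
+//	sysFree(v, n, stat)     // any      -> None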
+
+// sysAlloc transitions an OS-chosen region of memory from None to Ready.
+// More specifically, it obtains a large chunk of zeroed memory from the
+// operating system, typically on the order of a hundred kilobytes
+// or a megabyte. This memory is always immediately available for use.
+//
+// sysStat must be non-nil.
+//
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAlloc(n uintptr, sysStat *sysMemStat) unsafe.Pointer {
+ sysStat.add(int64(n))
+ gcController.mappedReady.Add(int64(n))
+ return sysAllocOS(n)
+}
+
+// sysUnused transitions a memory region from Ready to Prepared. It notifies the
+// operating system that the physical pages backing this memory region are no
+// longer needed and can be reused for other purposes. The contents of a
+// sysUnused memory region are considered forfeit and the region must not be
+// accessed again until sysUsed is called.
+func sysUnused(v unsafe.Pointer, n uintptr) {
+ gcController.mappedReady.Add(-int64(n))
+ sysUnusedOS(v, n)
+}
+
+// sysUsed transitions a memory region from Prepared to Ready. It notifies the
+// operating system that the memory region is needed and ensures that the region
+// may be safely accessed. This is typically a no-op on systems that don't have
+// an explicit commit step and hard over-commit limits, but is critical on
+// Windows, for example.
+//
+// This operation is idempotent for memory already in the Prepared state, so
+// it is safe to refer, with v and n, to a range of memory that includes both
+// Prepared and Ready memory. However, the caller must provide the exact amount
+// of Prepared memory for accounting purposes.
+func sysUsed(v unsafe.Pointer, n, prepared uintptr) {
+ gcController.mappedReady.Add(int64(prepared))
+ sysUsedOS(v, n)
+}
+
+// sysHugePage does not transition memory regions, but instead provides a
+// hint to the OS that it would be more efficient to back this memory region
+// with pages of a larger size transparently.
+func sysHugePage(v unsafe.Pointer, n uintptr) {
+ sysHugePageOS(v, n)
+}
+
+// sysNoHugePage does not transition memory regions, but instead provides a
+// hint to the OS that it would be less efficient to back this memory region
+// with pages of a larger size transparently.
+func sysNoHugePage(v unsafe.Pointer, n uintptr) {
+ sysNoHugePageOS(v, n)
+}
+
+// sysHugePageCollapse attempts to immediately back the provided memory region
+// with huge pages. It is best-effort and may fail silently.
+func sysHugePageCollapse(v unsafe.Pointer, n uintptr) {
+ sysHugePageCollapseOS(v, n)
+}
+
+// sysFree transitions a memory region from any state to None. Therefore, it
+// returns memory unconditionally. It is used if an out-of-memory error has been
+// detected midway through an allocation or to carve out an aligned section of
+// the address space. It is okay if sysFree is a no-op only if sysReserve always
+// returns a memory region aligned to the heap allocator's alignment
+// restrictions.
+//
+// sysStat must be non-nil.
+//
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFree(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(-int64(n))
+ gcController.mappedReady.Add(-int64(n))
+ sysFreeOS(v, n)
+}
+
+// sysFault transitions a memory region from Ready to Reserved. It
+// marks a region such that it will always fault if accessed. Used only for
+// debugging the runtime.
+//
+// TODO(mknyszek): Currently it's true that all uses of sysFault transition
+// memory from Ready to Reserved, but this may not be true in the future
+// since on every platform the operation is much more general than that.
+// If a transition from Prepared is ever introduced, create a new function
+// that elides the Ready state accounting.
+func sysFault(v unsafe.Pointer, n uintptr) {
+ gcController.mappedReady.Add(-int64(n))
+ sysFaultOS(v, n)
+}
+
+// sysReserve transitions a memory region from None to Reserved. It reserves
+// address space in such a way that it would cause a fatal fault upon access
+// (either via permissions or not committing the memory). Such a reservation is
+// thus never backed by physical memory.
+//
+// If the pointer passed to it is non-nil, the caller wants the
+// reservation there, but sysReserve can still choose another
+// location if that one is unavailable.
+//
+// NOTE: sysReserve returns OS-aligned memory, but the heap allocator
+// may use larger alignment, so the caller must be careful to realign the
+// memory obtained by sysReserve.
+func sysReserve(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ return sysReserveOS(v, n)
+}
+
+// sysMap transitions a memory region from Reserved to Prepared. It ensures the
+// memory region can be efficiently transitioned to Ready.
+//
+// sysStat must be non-nil.
+func sysMap(v unsafe.Pointer, n uintptr, sysStat *sysMemStat) {
+ sysStat.add(int64(n))
+ sysMapOS(v, n)
+}
diff --git a/src/runtime/mem_aix.go b/src/runtime/mem_aix.go
new file mode 100644
index 0000000..dff2756
--- /dev/null
+++ b/src/runtime/mem_aix.go
@@ -0,0 +1,81 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this method may be invoked without a valid G, which
+// prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ p, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ if err == _EACCES {
+ print("runtime: mmap: access denied\n")
+ exit(2)
+ }
+ if err == _EAGAIN {
+ print("runtime: mmap: too much locked memory (check 'ulimit -l').\n")
+ exit(2)
+ }
+ return nil
+ }
+ return p
+}
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+ madvise(v, n, _MADV_DONTNEED)
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ munmap(v, n)
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+ // AIX does not allow mapping a range that is already mapped.
+ // So, call mprotect to change permissions.
+ // Note that sysMap is always called with a non-nil pointer
+ // since it transitions a Reserved memory region to Prepared,
+ // so mprotect is always possible.
+ _, err := mprotect(v, n, _PROT_READ|_PROT_WRITE)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if err != 0 {
+ print("runtime: mprotect(", v, ", ", n, ") returned ", err, "\n")
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_bsd.go b/src/runtime/mem_bsd.go
new file mode 100644
index 0000000..78128ae
--- /dev/null
+++ b/src/runtime/mem_bsd.go
@@ -0,0 +1,87 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || netbsd || openbsd || solaris
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ v, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return v
+}
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+ if debug.madvdontneed != 0 {
+ madvise(v, n, _MADV_DONTNEED)
+ } else {
+ madvise(v, n, _MADV_FREE)
+ }
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ munmap(v, n)
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+// Indicates not to reserve swap space for the mapping.
+const _sunosMAP_NORESERVE = 0x40
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ flags := int32(_MAP_ANON | _MAP_PRIVATE)
+ if GOOS == "solaris" || GOOS == "illumos" {
+ // Be explicit that we don't want to reserve swap space
+ // for PROT_NONE anonymous mappings. This avoids an issue
+ // wherein large mappings can cause fork to fail.
+ flags |= _sunosMAP_NORESERVE
+ }
+ p, err := mmap(v, n, _PROT_NONE, flags, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+const _sunosEAGAIN = 11
+const _ENOMEM = 12
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM || ((GOOS == "solaris" || GOOS == "illumos") && err == _sunosEAGAIN) {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ print("runtime: mmap(", v, ", ", n, ") returned ", p, ", ", err, "\n")
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_darwin.go b/src/runtime/mem_darwin.go
new file mode 100644
index 0000000..ae84871
--- /dev/null
+++ b/src/runtime/mem_darwin.go
@@ -0,0 +1,76 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ v, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return v
+}
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+ // MADV_FREE_REUSABLE is like MADV_FREE except it also propagates
+ // accounting information about the process to task_info.
+ madvise(v, n, _MADV_FREE_REUSABLE)
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+ // MADV_FREE_REUSE is necessary to keep the kernel's accounting
+ // accurate. If called on any memory region that hasn't been
+ // MADV_FREE_REUSABLE'd, it's a no-op.
+ madvise(v, n, _MADV_FREE_REUSE)
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ munmap(v, n)
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+const _ENOMEM = 12
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ print("runtime: mmap(", v, ", ", n, ") returned ", p, ", ", err, "\n")
+ throw("runtime: cannot map pages in arena address space")
+ }
+}
diff --git a/src/runtime/mem_js.go b/src/runtime/mem_js.go
new file mode 100644
index 0000000..080b1ab
--- /dev/null
+++ b/src/runtime/mem_js.go
@@ -0,0 +1,13 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build js
+
+package runtime
+
+// resetMemoryDataView signals the JS front-end that WebAssembly's memory.grow instruction has been used.
+// This allows the front-end to replace the old DataView object with a new one.
+//
+//go:wasmimport gojs runtime.resetMemoryDataView
+func resetMemoryDataView()
diff --git a/src/runtime/mem_linux.go b/src/runtime/mem_linux.go
new file mode 100644
index 0000000..d63c38c
--- /dev/null
+++ b/src/runtime/mem_linux.go
@@ -0,0 +1,181 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _EACCES = 13
+ _EINVAL = 22
+)
+
+// Don't split the stack as this method may be invoked without a valid G, which
+// prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ p, err := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ if err == _EACCES {
+ print("runtime: mmap: access denied\n")
+ exit(2)
+ }
+ if err == _EAGAIN {
+ print("runtime: mmap: too much locked memory (check 'ulimit -l').\n")
+ exit(2)
+ }
+ return nil
+ }
+ return p
+}
+
+var adviseUnused = uint32(_MADV_FREE)
+
+const madviseUnsupported = 0
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+ if uintptr(v)&(physPageSize-1) != 0 || n&(physPageSize-1) != 0 {
+ // madvise will round this to any physical page
+ // *covered* by this range, so an unaligned madvise
+ // will release more memory than intended.
+ throw("unaligned sysUnused")
+ }
+
+ advise := atomic.Load(&adviseUnused)
+ if debug.madvdontneed != 0 && advise != madviseUnsupported {
+ advise = _MADV_DONTNEED
+ }
+ switch advise {
+ case _MADV_FREE:
+ if madvise(v, n, _MADV_FREE) == 0 {
+ break
+ }
+ atomic.Store(&adviseUnused, _MADV_DONTNEED)
+ fallthrough
+ case _MADV_DONTNEED:
+ // MADV_FREE was added in Linux 4.5. Fall back on MADV_DONTNEED if it's
+ // not supported.
+ if madvise(v, n, _MADV_DONTNEED) == 0 {
+ break
+ }
+ atomic.Store(&adviseUnused, madviseUnsupported)
+ fallthrough
+ case madviseUnsupported:
+ // Since Linux 3.18, support for madvise is optional.
+ // Fall back on mmap if it's not supported.
+ // _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE will unmap all the
+ // pages in the old mapping, and remap the memory region.
+ mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ }
+
+ if debug.harddecommit > 0 {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if p != v || err != 0 {
+ throw("runtime: cannot disable permissions in address space")
+ }
+ }
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+ if debug.harddecommit > 0 {
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ throw("runtime: cannot remap pages in address space")
+ }
+ return
+ }
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+ if physHugePageSize != 0 {
+ // Round v up to a huge page boundary.
+ beg := alignUp(uintptr(v), physHugePageSize)
+ // Round v+n down to a huge page boundary.
+ end := alignDown(uintptr(v)+n, physHugePageSize)
+
+ if beg < end {
+ madvise(unsafe.Pointer(beg), end-beg, _MADV_HUGEPAGE)
+ }
+ }
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+ if uintptr(v)&(physPageSize-1) != 0 {
+ // The Linux implementation requires that the address
+ // addr be page-aligned, and allows length to be zero.
+ throw("unaligned sysNoHugePageOS")
+ }
+ madvise(v, n, _MADV_NOHUGEPAGE)
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+ if uintptr(v)&(physPageSize-1) != 0 {
+ // The Linux implementation requires that the address
+ // addr be page-aligned, and allows length to be zero.
+ throw("unaligned sysHugePageCollapseOS")
+ }
+ if physHugePageSize == 0 {
+ return
+ }
+ // N.B. If you find yourself debugging this code, note that
+ // this call can fail with EAGAIN because it's best-effort.
+ // Also, when it returns an error, it's only for the last
+ // huge page in the region requested.
+ //
+ // It can also sometimes return EINVAL if the corresponding
+ // region hasn't been backed by physical memory. This is
+ // difficult to guarantee in general, and it also means
+ // there's no way to distinguish whether this syscall is
+ // actually available. Oops.
+ //
+ // Anyway, that's why this call just doesn't bother checking
+ // any errors.
+ madvise(v, n, _MADV_COLLAPSE)
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ munmap(v, n)
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+ mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE|_MAP_FIXED, -1, 0)
+}
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ p, err := mmap(v, n, _PROT_NONE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return nil
+ }
+ return p
+}
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+ p, err := mmap(v, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_FIXED|_MAP_PRIVATE, -1, 0)
+ if err == _ENOMEM {
+ throw("runtime: out of memory")
+ }
+ if p != v || err != 0 {
+ print("runtime: mmap(", v, ", ", n, ") returned ", p, ", ", err, "\n")
+ throw("runtime: cannot map pages in arena address space")
+ }
+
+ // Disable huge pages if the GODEBUG for it is set.
+ //
+ // Note that there are a few sysHugePage calls that can override this, but
+ // they're all for GC metadata.
+ if debug.disablethp != 0 {
+ sysNoHugePageOS(v, n)
+ }
+}
diff --git a/src/runtime/mem_plan9.go b/src/runtime/mem_plan9.go
new file mode 100644
index 0000000..9b18a29
--- /dev/null
+++ b/src/runtime/mem_plan9.go
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func sbrk(n uintptr) unsafe.Pointer {
+ // Plan 9 sbrk from /sys/src/libc/9sys/sbrk.c
+ bl := bloc
+ n = memRound(n)
+ if bl+n > blocMax {
+ if brk_(unsafe.Pointer(bl+n)) < 0 {
+ return nil
+ }
+ blocMax = bl + n
+ }
+ bloc += n
+ return unsafe.Pointer(bl)
+}
diff --git a/src/runtime/mem_sbrk.go b/src/runtime/mem_sbrk.go
new file mode 100644
index 0000000..dc0a764
--- /dev/null
+++ b/src/runtime/mem_sbrk.go
@@ -0,0 +1,189 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build plan9 || wasm
+
+package runtime
+
+import "unsafe"
+
+const memDebug = false
+
+var bloc uintptr
+var blocMax uintptr
+var memlock mutex
+
+type memHdr struct {
+ next memHdrPtr
+ size uintptr
+}
+
+var memFreelist memHdrPtr // sorted in ascending order
+
+type memHdrPtr uintptr
+
+func (p memHdrPtr) ptr() *memHdr { return (*memHdr)(unsafe.Pointer(p)) }
+func (p *memHdrPtr) set(x *memHdr) { *p = memHdrPtr(unsafe.Pointer(x)) }
+
+func memAlloc(n uintptr) unsafe.Pointer {
+ n = memRound(n)
+ var prevp *memHdr
+ for p := memFreelist.ptr(); p != nil; p = p.next.ptr() {
+ if p.size >= n {
+ if p.size == n {
+ if prevp != nil {
+ prevp.next = p.next
+ } else {
+ memFreelist = p.next
+ }
+ } else {
+ p.size -= n
+ p = (*memHdr)(add(unsafe.Pointer(p), p.size))
+ }
+ *p = memHdr{}
+ return unsafe.Pointer(p)
+ }
+ prevp = p
+ }
+ return sbrk(n)
+}
+
+func memFree(ap unsafe.Pointer, n uintptr) {
+ n = memRound(n)
+ memclrNoHeapPointers(ap, n)
+ bp := (*memHdr)(ap)
+ bp.size = n
+ bpn := uintptr(ap)
+ if memFreelist == 0 {
+ bp.next = 0
+ memFreelist.set(bp)
+ return
+ }
+ p := memFreelist.ptr()
+ if bpn < uintptr(unsafe.Pointer(p)) {
+ memFreelist.set(bp)
+ if bpn+bp.size == uintptr(unsafe.Pointer(p)) {
+ bp.size += p.size
+ bp.next = p.next
+ *p = memHdr{}
+ } else {
+ bp.next.set(p)
+ }
+ return
+ }
+ for ; p.next != 0; p = p.next.ptr() {
+ if bpn > uintptr(unsafe.Pointer(p)) && bpn < uintptr(unsafe.Pointer(p.next)) {
+ break
+ }
+ }
+ if bpn+bp.size == uintptr(unsafe.Pointer(p.next)) {
+ bp.size += p.next.ptr().size
+ bp.next = p.next.ptr().next
+ *p.next.ptr() = memHdr{}
+ } else {
+ bp.next = p.next
+ }
+ if uintptr(unsafe.Pointer(p))+p.size == bpn {
+ p.size += bp.size
+ p.next = bp.next
+ *bp = memHdr{}
+ } else {
+ p.next.set(bp)
+ }
+}
+
+func memCheck() {
+ if !memDebug {
+ return
+ }
+ for p := memFreelist.ptr(); p != nil && p.next != 0; p = p.next.ptr() {
+ if uintptr(unsafe.Pointer(p)) == uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), " == ", unsafe.Pointer(p.next), "\n")
+ throw("mem: infinite loop")
+ }
+ if uintptr(unsafe.Pointer(p)) > uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), " > ", unsafe.Pointer(p.next), "\n")
+ throw("mem: unordered list")
+ }
+ if uintptr(unsafe.Pointer(p))+p.size > uintptr(unsafe.Pointer(p.next)) {
+ print("runtime: ", unsafe.Pointer(p), "+", p.size, " > ", unsafe.Pointer(p.next), "\n")
+ throw("mem: overlapping blocks")
+ }
+ for b := add(unsafe.Pointer(p), unsafe.Sizeof(memHdr{})); uintptr(b) < uintptr(unsafe.Pointer(p))+p.size; b = add(b, 1) {
+ if *(*byte)(b) != 0 {
+ print("runtime: value at addr ", b, " with offset ", uintptr(b)-uintptr(unsafe.Pointer(p)), " in block ", p, " of size ", p.size, " is not zero\n")
+ throw("mem: uninitialised memory")
+ }
+ }
+ }
+}
+
+func memRound(p uintptr) uintptr {
+ return alignUp(p, physPageSize)
+}
+
+func initBloc() {
+ bloc = memRound(firstmoduledata.end)
+ blocMax = bloc
+}
+
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ lock(&memlock)
+ p := memAlloc(n)
+ memCheck()
+ unlock(&memlock)
+ return p
+}
+
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ lock(&memlock)
+ if uintptr(v)+n == bloc {
+ // Address range being freed is at the end of memory,
+ // so record a new lower value for end of memory.
+ // Can't actually shrink address space because segment is shared.
+ memclrNoHeapPointers(v, n)
+ bloc -= n
+ } else {
+ memFree(v, n)
+ memCheck()
+ }
+ unlock(&memlock)
+}
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ lock(&memlock)
+ var p unsafe.Pointer
+ if uintptr(v) == bloc {
+ // Address hint is the current end of memory,
+ // so try to extend the address space.
+ p = sbrk(n)
+ }
+ if p == nil && v == nil {
+ p = memAlloc(n)
+ memCheck()
+ }
+ unlock(&memlock)
+ return p
+}
diff --git a/src/runtime/mem_wasip1.go b/src/runtime/mem_wasip1.go
new file mode 100644
index 0000000..41ffa0d
--- /dev/null
+++ b/src/runtime/mem_wasip1.go
@@ -0,0 +1,13 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build wasip1
+
+package runtime
+
+func resetMemoryDataView() {
+	// This function is a no-op on WASI; it is only used to notify the browser
+ // that its view of the WASM memory needs to be updated when compiling for
+ // GOOS=js.
+}
diff --git a/src/runtime/mem_wasm.go b/src/runtime/mem_wasm.go
new file mode 100644
index 0000000..d9d3270
--- /dev/null
+++ b/src/runtime/mem_wasm.go
@@ -0,0 +1,20 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func sbrk(n uintptr) unsafe.Pointer {
+ grow := divRoundUp(n, physPageSize)
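+	// Illustrative example (not from the upstream source): wasm linear memory
+	// grows in 64 KiB pages, so a 1 MiB request grows memory by 16 pages.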
+ size := growMemory(int32(grow))
+ if size < 0 {
+ return nil
+ }
+ resetMemoryDataView()
+ return unsafe.Pointer(uintptr(size) * physPageSize)
+}
+
+// Implemented in src/runtime/sys_wasm.s
+func growMemory(pages int32) int32
diff --git a/src/runtime/mem_windows.go b/src/runtime/mem_windows.go
new file mode 100644
index 0000000..477d898
--- /dev/null
+++ b/src/runtime/mem_windows.go
@@ -0,0 +1,134 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const (
+ _MEM_COMMIT = 0x1000
+ _MEM_RESERVE = 0x2000
+ _MEM_DECOMMIT = 0x4000
+ _MEM_RELEASE = 0x8000
+
+ _PAGE_READWRITE = 0x0004
+ _PAGE_NOACCESS = 0x0001
+
+ _ERROR_NOT_ENOUGH_MEMORY = 8
+ _ERROR_COMMITMENT_LIMIT = 1455
+)
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysAllocOS(n uintptr) unsafe.Pointer {
+ return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_COMMIT|_MEM_RESERVE, _PAGE_READWRITE))
+}
+
+func sysUnusedOS(v unsafe.Pointer, n uintptr) {
+ r := stdcall3(_VirtualFree, uintptr(v), n, _MEM_DECOMMIT)
+ if r != 0 {
+ return
+ }
+
+ // Decommit failed. Usual reason is that we've merged memory from two different
+ // VirtualAlloc calls, and Windows will only let each VirtualFree handle pages from
+ // a single VirtualAlloc. It is okay to specify a subset of the pages from a single alloc,
+ // just not pages from multiple allocs. This is a rare case, arising only when we're
+ // trying to give memory back to the operating system, which happens on a time
+ // scale of minutes. It doesn't have to be terribly fast. Instead of extra bookkeeping
+ // on all our VirtualAlloc calls, try freeing successively smaller pieces until
+ // we manage to free something, and then repeat. This ends up being O(n log n)
+ // in the worst case, but that's fast enough.
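+	// Illustrative example (not from the upstream source): if decommitting a
+	// 96 KiB run fails, the inner loop below retries 48 KiB, then 24 KiB, and
+	// so on (rounded down to 4 KiB multiples) until some prefix succeeds; the
+	// outer loop then advances past that prefix and repeats.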
+ for n > 0 {
+ small := n
+ for small >= 4096 && stdcall3(_VirtualFree, uintptr(v), small, _MEM_DECOMMIT) == 0 {
+ small /= 2
+ small &^= 4096 - 1
+ }
+ if small < 4096 {
+ print("runtime: VirtualFree of ", small, " bytes failed with errno=", getlasterror(), "\n")
+ throw("runtime: failed to decommit pages")
+ }
+ v = add(v, small)
+ n -= small
+ }
+}
+
+func sysUsedOS(v unsafe.Pointer, n uintptr) {
+ p := stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_COMMIT, _PAGE_READWRITE)
+ if p == uintptr(v) {
+ return
+ }
+
+ // Commit failed. See SysUnused.
+ // Hold on to n here so we can give back a better error message
+ // for certain cases.
+ k := n
+ for k > 0 {
+ small := k
+ for small >= 4096 && stdcall4(_VirtualAlloc, uintptr(v), small, _MEM_COMMIT, _PAGE_READWRITE) == 0 {
+ small /= 2
+ small &^= 4096 - 1
+ }
+ if small < 4096 {
+ errno := getlasterror()
+ switch errno {
+ case _ERROR_NOT_ENOUGH_MEMORY, _ERROR_COMMITMENT_LIMIT:
+ print("runtime: VirtualAlloc of ", n, " bytes failed with errno=", errno, "\n")
+ throw("out of memory")
+ default:
+ print("runtime: VirtualAlloc of ", small, " bytes failed with errno=", errno, "\n")
+ throw("runtime: failed to commit pages")
+ }
+ }
+ v = add(v, small)
+ k -= small
+ }
+}
+
+func sysHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysNoHugePageOS(v unsafe.Pointer, n uintptr) {
+}
+
+func sysHugePageCollapseOS(v unsafe.Pointer, n uintptr) {
+}
+
+// Don't split the stack as this function may be invoked without a valid G,
+// which prevents us from allocating more stack.
+//
+//go:nosplit
+func sysFreeOS(v unsafe.Pointer, n uintptr) {
+ r := stdcall3(_VirtualFree, uintptr(v), 0, _MEM_RELEASE)
+ if r == 0 {
+ print("runtime: VirtualFree of ", n, " bytes failed with errno=", getlasterror(), "\n")
+ throw("runtime: failed to release pages")
+ }
+}
+
+func sysFaultOS(v unsafe.Pointer, n uintptr) {
+ // SysUnused makes the memory inaccessible and prevents its reuse
+ sysUnusedOS(v, n)
+}
+
+func sysReserveOS(v unsafe.Pointer, n uintptr) unsafe.Pointer {
+ // v is just a hint.
+ // First try at v.
+ // This will fail if any of [v, v+n) is already reserved.
+ v = unsafe.Pointer(stdcall4(_VirtualAlloc, uintptr(v), n, _MEM_RESERVE, _PAGE_READWRITE))
+ if v != nil {
+ return v
+ }
+
+ // Next let the kernel choose the address.
+ return unsafe.Pointer(stdcall4(_VirtualAlloc, 0, n, _MEM_RESERVE, _PAGE_READWRITE))
+}
+
+func sysMapOS(v unsafe.Pointer, n uintptr) {
+}
diff --git a/src/runtime/memclr_386.s b/src/runtime/memclr_386.s
new file mode 100644
index 0000000..a72e5f2
--- /dev/null
+++ b/src/runtime/memclr_386.s
@@ -0,0 +1,137 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), DI
+ MOVL n+4(FP), BX
+ XORL AX, AX
+
+ // MOVOU seems always faster than REP STOSL.
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTL BX, BX
+ JEQ _0
+ CMPL BX, $2
+ JBE _1or2
+ CMPL BX, $4
+ JB _3
+ JE _4
+ CMPL BX, $8
+ JBE _5through8
+ CMPL BX, $16
+ JBE _9through16
+#ifdef GO386_softfloat
+ JMP nosse2
+#endif
+ PXOR X0, X0
+ CMPL BX, $32
+ JBE _17through32
+ CMPL BX, $64
+ JBE _33through64
+ CMPL BX, $128
+ JBE _65through128
+ CMPL BX, $256
+ JBE _129through256
+
+loop:
+ MOVOU X0, 0(DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, 128(DI)
+ MOVOU X0, 144(DI)
+ MOVOU X0, 160(DI)
+ MOVOU X0, 176(DI)
+ MOVOU X0, 192(DI)
+ MOVOU X0, 208(DI)
+ MOVOU X0, 224(DI)
+ MOVOU X0, 240(DI)
+ SUBL $256, BX
+ ADDL $256, DI
+ CMPL BX, $256
+ JAE loop
+ JMP tail
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3:
+ MOVW AX, (DI)
+ MOVB AX, 2(DI)
+ RET
+_4:
+ // We need a separate case for 4 to make sure we clear pointers atomically.
+ MOVL AX, (DI)
+ RET
+_5through8:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_9through16:
+ MOVL AX, (DI)
+ MOVL AX, 4(DI)
+ MOVL AX, -8(DI)(BX*1)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_17through32:
+ MOVOU X0, (DI)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_33through64:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_65through128:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+_129through256:
+ MOVOU X0, (DI)
+ MOVOU X0, 16(DI)
+ MOVOU X0, 32(DI)
+ MOVOU X0, 48(DI)
+ MOVOU X0, 64(DI)
+ MOVOU X0, 80(DI)
+ MOVOU X0, 96(DI)
+ MOVOU X0, 112(DI)
+ MOVOU X0, -128(DI)(BX*1)
+ MOVOU X0, -112(DI)(BX*1)
+ MOVOU X0, -96(DI)(BX*1)
+ MOVOU X0, -80(DI)(BX*1)
+ MOVOU X0, -64(DI)(BX*1)
+ MOVOU X0, -48(DI)(BX*1)
+ MOVOU X0, -32(DI)(BX*1)
+ MOVOU X0, -16(DI)(BX*1)
+ RET
+nosse2:
+ MOVL BX, CX
+ SHRL $2, CX
+ REP
+ STOSL
+ ANDL $3, BX
+ JNE tail
+ RET
diff --git a/src/runtime/memclr_amd64.s b/src/runtime/memclr_amd64.s
new file mode 100644
index 0000000..19bfa6f
--- /dev/null
+++ b/src/runtime/memclr_amd64.s
@@ -0,0 +1,218 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "asm_amd64.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+// ABIInternal for performance.
+TEXT runtime·memclrNoHeapPointers<ABIInternal>(SB), NOSPLIT, $0-16
+ // AX = ptr
+ // BX = n
+ MOVQ AX, DI // DI = ptr
+ XORQ AX, AX
+
+ // MOVOU seems always faster than REP STOSQ when Enhanced REP STOSQ is not available.
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTQ BX, BX
+ JEQ _0
+ CMPQ BX, $2
+ JBE _1or2
+ CMPQ BX, $4
+ JBE _3or4
+ CMPQ BX, $8
+ JB _5through7
+ JE _8
+ CMPQ BX, $16
+ JBE _9through16
+ CMPQ BX, $32
+ JBE _17through32
+ CMPQ BX, $64
+ JBE _33through64
+ CMPQ BX, $128
+ JBE _65through128
+ CMPQ BX, $256
+ JBE _129through256
+
+ CMPB internal∕cpu·X86+const_offsetX86HasERMS(SB), $1 // enhanced REP MOVSB/STOSB
+ JNE skip_erms
+
+ // If the size is less than 2kb, do not use ERMS as it has a big start-up cost.
+ // Table 3-4. Relative Performance of Memcpy() Using ERMSB Vs. 128-bit AVX
+ // in the Intel Optimization Guide shows better performance for ERMSB starting
+ // from 2KB. Benchmarks show the similar threshold for REP STOS vs AVX.
+ CMPQ BX, $2048
+ JAE loop_preheader_erms
+
+skip_erms:
+#ifndef hasAVX2
+ CMPB internal∕cpu·X86+const_offsetX86HasAVX2(SB), $1
+ JE loop_preheader_avx2
+ // TODO: for really big clears, use MOVNTDQ, even without AVX2.
+
+loop:
+ MOVOU X15, 0(DI)
+ MOVOU X15, 16(DI)
+ MOVOU X15, 32(DI)
+ MOVOU X15, 48(DI)
+ MOVOU X15, 64(DI)
+ MOVOU X15, 80(DI)
+ MOVOU X15, 96(DI)
+ MOVOU X15, 112(DI)
+ MOVOU X15, 128(DI)
+ MOVOU X15, 144(DI)
+ MOVOU X15, 160(DI)
+ MOVOU X15, 176(DI)
+ MOVOU X15, 192(DI)
+ MOVOU X15, 208(DI)
+ MOVOU X15, 224(DI)
+ MOVOU X15, 240(DI)
+ SUBQ $256, BX
+ ADDQ $256, DI
+ CMPQ BX, $256
+ JAE loop
+ JMP tail
+#endif
+
+loop_preheader_avx2:
+ VPXOR X0, X0, X0
+ // For smaller sizes MOVNTDQ may be faster or slower depending on hardware.
+ // For larger sizes it is always faster, even on dual Xeons with 30M cache.
+ // TODO take into account actual LLC size. E. g. glibc uses LLC size/2.
+ CMPQ BX, $0x2000000
+ JAE loop_preheader_avx2_huge
+
+loop_avx2:
+ VMOVDQU Y0, 0(DI)
+ VMOVDQU Y0, 32(DI)
+ VMOVDQU Y0, 64(DI)
+ VMOVDQU Y0, 96(DI)
+ SUBQ $128, BX
+ ADDQ $128, DI
+ CMPQ BX, $128
+ JAE loop_avx2
+ VMOVDQU Y0, -32(DI)(BX*1)
+ VMOVDQU Y0, -64(DI)(BX*1)
+ VMOVDQU Y0, -96(DI)(BX*1)
+ VMOVDQU Y0, -128(DI)(BX*1)
+ VZEROUPPER
+ RET
+
+loop_preheader_erms:
+#ifndef hasAVX2
+ CMPB internal∕cpu·X86+const_offsetX86HasAVX2(SB), $1
+ JNE loop_erms
+#endif
+
+ VPXOR X0, X0, X0
+	// At this point both ERMS and AVX2 are supported. While REP STOS can use a no-RFO
+	// write protocol, ERMS could show the same or slower performance compared to
+ // Non-Temporal Stores when the size is bigger than LLC depending on hardware.
+ CMPQ BX, $0x2000000
+ JAE loop_preheader_avx2_huge
+
+loop_erms:
+ // STOSQ is used to guarantee that the whole zeroed pointer-sized word is visible
+ // for a memory subsystem as the GC requires this.
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+ REP; STOSQ
+ JMP tail
+
+loop_preheader_avx2_huge:
+ // Align to 32 byte boundary
+ VMOVDQU Y0, 0(DI)
+ MOVQ DI, SI
+ ADDQ $32, DI
+ ANDQ $~31, DI
+ SUBQ DI, SI
+ ADDQ SI, BX
+loop_avx2_huge:
+ VMOVNTDQ Y0, 0(DI)
+ VMOVNTDQ Y0, 32(DI)
+ VMOVNTDQ Y0, 64(DI)
+ VMOVNTDQ Y0, 96(DI)
+ SUBQ $128, BX
+ ADDQ $128, DI
+ CMPQ BX, $128
+ JAE loop_avx2_huge
+ // In the description of MOVNTDQ in [1]
+ // "... fencing operation implemented with the SFENCE or MFENCE instruction
+ // should be used in conjunction with MOVNTDQ instructions..."
+ // [1] 64-ia-32-architectures-software-developer-manual-325462.pdf
+ SFENCE
+ VMOVDQU Y0, -32(DI)(BX*1)
+ VMOVDQU Y0, -64(DI)(BX*1)
+ VMOVDQU Y0, -96(DI)(BX*1)
+ VMOVDQU Y0, -128(DI)(BX*1)
+ VZEROUPPER
+ RET
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3or4:
+ MOVW AX, (DI)
+ MOVW AX, -2(DI)(BX*1)
+ RET
+_5through7:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_8:
+ // We need a separate case for 8 to make sure we clear pointers atomically.
+ MOVQ AX, (DI)
+ RET
+_9through16:
+ MOVQ AX, (DI)
+ MOVQ AX, -8(DI)(BX*1)
+ RET
+_17through32:
+ MOVOU X15, (DI)
+ MOVOU X15, -16(DI)(BX*1)
+ RET
+_33through64:
+ MOVOU X15, (DI)
+ MOVOU X15, 16(DI)
+ MOVOU X15, -32(DI)(BX*1)
+ MOVOU X15, -16(DI)(BX*1)
+ RET
+_65through128:
+ MOVOU X15, (DI)
+ MOVOU X15, 16(DI)
+ MOVOU X15, 32(DI)
+ MOVOU X15, 48(DI)
+ MOVOU X15, -64(DI)(BX*1)
+ MOVOU X15, -48(DI)(BX*1)
+ MOVOU X15, -32(DI)(BX*1)
+ MOVOU X15, -16(DI)(BX*1)
+ RET
+_129through256:
+ MOVOU X15, (DI)
+ MOVOU X15, 16(DI)
+ MOVOU X15, 32(DI)
+ MOVOU X15, 48(DI)
+ MOVOU X15, 64(DI)
+ MOVOU X15, 80(DI)
+ MOVOU X15, 96(DI)
+ MOVOU X15, 112(DI)
+ MOVOU X15, -128(DI)(BX*1)
+ MOVOU X15, -112(DI)(BX*1)
+ MOVOU X15, -96(DI)(BX*1)
+ MOVOU X15, -80(DI)(BX*1)
+ MOVOU X15, -64(DI)(BX*1)
+ MOVOU X15, -48(DI)(BX*1)
+ MOVOU X15, -32(DI)(BX*1)
+ MOVOU X15, -16(DI)(BX*1)
+ RET
diff --git a/src/runtime/memclr_arm.s b/src/runtime/memclr_arm.s
new file mode 100644
index 0000000..f02d058
--- /dev/null
+++ b/src/runtime/memclr_arm.s
@@ -0,0 +1,91 @@
+// Inferno's libkern/memset-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memset-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+#define TO R8
+#define TOE R11
+#define N R12
+#define TMP R12 /* N and TMP don't overlap */
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+// Also called from assembly in sys_windows_arm.s without g (but using Go stack convention).
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-8
+ MOVW ptr+0(FP), TO
+ MOVW n+4(FP), N
+ MOVW $0, R0
+
+ ADD N, TO, TOE /* to end pointer */
+
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _1tail
+
+_4align: /* align on 4 */
+ AND.S $3, TO, TMP
+ BEQ _4aligned
+
+ MOVBU.P R0, 1(TO) /* implicit write back */
+ B _4align
+
+_4aligned:
+ SUB $31, TOE, TMP /* do 32-byte chunks if possible */
+ CMP TMP, TO
+ BHS _4tail
+
+ MOVW R0, R1 /* replicate */
+ MOVW R0, R2
+ MOVW R0, R3
+ MOVW R0, R4
+ MOVW R0, R5
+ MOVW R0, R6
+ MOVW R0, R7
+
+_f32loop:
+ CMP TMP, TO
+ BHS _4tail
+
+ MOVM.IA.W [R0-R7], (TO)
+ B _f32loop
+
+_4tail:
+ SUB $3, TOE, TMP /* do remaining words if possible */
+_4loop:
+ CMP TMP, TO
+ BHS _1tail
+
+ MOVW.P R0, 4(TO) /* implicit write back */
+ B _4loop
+
+_1tail:
+ CMP TO, TOE
+ BEQ _return
+
+ MOVBU.P R0, 1(TO) /* implicit write back */
+ B _1tail
+
+_return:
+ RET
diff --git a/src/runtime/memclr_arm64.s b/src/runtime/memclr_arm64.s
new file mode 100644
index 0000000..1c35dfe
--- /dev/null
+++ b/src/runtime/memclr_arm64.s
@@ -0,0 +1,182 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+// Also called from assembly in sys_windows_arm64.s without g (but using Go stack convention).
+TEXT runtime·memclrNoHeapPointers<ABIInternal>(SB),NOSPLIT,$0-16
+ CMP $16, R1
+ // If n is equal to 16 bytes, use zero_exact_16 to zero
+ BEQ zero_exact_16
+
+ // If n is greater than 16 bytes, use zero_by_16 to zero
+ BHI zero_by_16
+
+ // n is less than 16 bytes
+ ADD R1, R0, R7
+ TBZ $3, R1, less_than_8
+ MOVD ZR, (R0)
+ MOVD ZR, -8(R7)
+ RET
+
+less_than_8:
+ TBZ $2, R1, less_than_4
+ MOVW ZR, (R0)
+ MOVW ZR, -4(R7)
+ RET
+
+less_than_4:
+ CBZ R1, ending
+ MOVB ZR, (R0)
+ TBZ $1, R1, ending
+ MOVH ZR, -2(R7)
+
+ending:
+ RET
+
+zero_exact_16:
+ // n is exactly 16 bytes
+ STP (ZR, ZR), (R0)
+ RET
+
+zero_by_16:
+ // n greater than 16 bytes, check if the start address is aligned
+ NEG R0, R4
+ ANDS $15, R4, R4
+ // Try zeroing using zva if the start address is aligned with 16
+ BEQ try_zva
+
+ // Non-aligned store
+ STP (ZR, ZR), (R0)
+ // Make the destination aligned
+ SUB R4, R1, R1
+ ADD R4, R0, R0
+ B try_zva
+
+tail_maybe_long:
+ CMP $64, R1
+ BHS no_zva
+
+tail63:
+ ANDS $48, R1, R3
+ BEQ last16
+ CMPW $32, R3
+ BEQ last48
+ BLT last32
+ STP.P (ZR, ZR), 16(R0)
+last48:
+ STP.P (ZR, ZR), 16(R0)
+last32:
+ STP.P (ZR, ZR), 16(R0)
+ // The last store length is at most 16, so it is safe to use
+ // stp to write last 16 bytes
+last16:
+ ANDS $15, R1, R1
+ CBZ R1, last_end
+ ADD R1, R0, R0
+ STP (ZR, ZR), -16(R0)
+last_end:
+ RET
+
+no_zva:
+ SUB $16, R0, R0
+ SUB $64, R1, R1
+
+loop_64:
+ STP (ZR, ZR), 16(R0)
+ STP (ZR, ZR), 32(R0)
+ STP (ZR, ZR), 48(R0)
+ STP.W (ZR, ZR), 64(R0)
+ SUBS $64, R1, R1
+ BGE loop_64
+ ANDS $63, R1, ZR
+ ADD $16, R0, R0
+ BNE tail63
+ RET
+
+try_zva:
+ // Try using the ZVA feature to zero entire cache lines
+ // It is not meaningful to use ZVA if the block size is less than 64,
+ // so make sure that n is greater than or equal to 64
+ CMP $63, R1
+ BLE tail63
+
+ CMP $128, R1
+ // Ensure n is at least 128 bytes, so that there is enough to copy after
+ // alignment.
+ BLT no_zva
+ // Check if ZVA is allowed from user code, and if so get the block size
+ MOVW block_size<>(SB), R5
+ TBNZ $31, R5, no_zva
+ CBNZ R5, zero_by_line
+ // DCZID_EL0 bit assignments
+ // [63:5] Reserved
+ // [4] DZP, if bit set DC ZVA instruction is prohibited, else permitted
+ // [3:0] log2 of the block size in words, eg. if it returns 0x4 then block size is 16 words
+ MRS DCZID_EL0, R3
+ TBZ $4, R3, init
+ // ZVA not available
+ MOVW $~0, R5
+ MOVW R5, block_size<>(SB)
+ B no_zva
+
+init:
+ MOVW $4, R9
+ ANDW $15, R3, R5
+ LSLW R5, R9, R5
+ MOVW R5, block_size<>(SB)
+
+ ANDS $63, R5, R9
+ // Block size is less than 64.
+ BNE no_zva
+
+zero_by_line:
+ CMP R5, R1
+ // Not enough memory to reach alignment
+ BLO no_zva
+ SUB $1, R5, R6
+ NEG R0, R4
+ ANDS R6, R4, R4
+ // Already aligned
+ BEQ aligned
+
+ // check there is enough to copy after alignment
+ SUB R4, R1, R3
+
+ // Check that the remaining length to ZVA after alignment
+ // is greater than 64.
+ CMP $64, R3
+ CCMP GE, R3, R5, $10 // condition code GE, NZCV=0b1010
+ BLT no_zva
+
+ // We now have at least 64 bytes to zero, update n
+ MOVD R3, R1
+
+loop_zva_prolog:
+ STP (ZR, ZR), (R0)
+ STP (ZR, ZR), 16(R0)
+ STP (ZR, ZR), 32(R0)
+ SUBS $64, R4, R4
+ STP (ZR, ZR), 48(R0)
+ ADD $64, R0, R0
+ BGE loop_zva_prolog
+
+ ADD R4, R0, R0
+
+aligned:
+ SUB R5, R1, R1
+
+loop_zva:
+ WORD $0xd50b7420 // DC ZVA, R0
+ ADD R5, R0, R0
+ SUBS R5, R1, R1
+ BHS loop_zva
+ ANDS R6, R1, R1
+ BNE tail_maybe_long
+ RET
+
+GLOBL block_size<>(SB), NOPTR, $8
diff --git a/src/runtime/memclr_loong64.s b/src/runtime/memclr_loong64.s
new file mode 100644
index 0000000..7bb6f3d
--- /dev/null
+++ b/src/runtime/memclr_loong64.s
@@ -0,0 +1,42 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVV ptr+0(FP), R6
+ MOVV n+8(FP), R7
+ ADDV R6, R7, R4
+
+ // if less than 8 bytes, do one byte at a time
+ SGTU $8, R7, R8
+ BNE R8, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R6, R8
+ BEQ R8, words
+ MOVB R0, (R6)
+ ADDV $1, R6
+ JMP -4(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R4, R7
+
+ PCALIGN $16
+ SGTU R7, R6, R8
+ BEQ R8, out
+ MOVV R0, (R6)
+ ADDV $8, R6
+ JMP -4(PC)
+
+out:
+ BEQ R6, R4, done
+ MOVB R0, (R6)
+ ADDV $1, R6
+ JMP -3(PC)
+done:
+ RET
diff --git a/src/runtime/memclr_mips64x.s b/src/runtime/memclr_mips64x.s
new file mode 100644
index 0000000..cf3a9c4
--- /dev/null
+++ b/src/runtime/memclr_mips64x.s
@@ -0,0 +1,99 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVV ptr+0(FP), R1
+ MOVV n+8(FP), R2
+ ADDV R1, R2, R4
+
+ // if less than 16 bytes or no MSA, do words check
+ SGTU $16, R2, R3
+ BNE R3, no_msa
+ MOVBU internal∕cpu·MIPS64X+const_offsetMIPS64XHasMSA(SB), R3
+ BEQ R3, R0, no_msa
+
+ VMOVB $0, W0
+
+ SGTU $128, R2, R3
+ BEQ R3, msa_large
+
+ AND $15, R2, R5
+ XOR R2, R5, R6
+ ADDVU R1, R6
+
+msa_small:
+ VMOVB W0, (R1)
+ ADDVU $16, R1
+ SGTU R6, R1, R3
+ BNE R3, R0, msa_small
+ BEQ R5, R0, done
+ VMOVB W0, -16(R4)
+ JMP done
+
+msa_large:
+ AND $127, R2, R5
+ XOR R2, R5, R6
+ ADDVU R1, R6
+
+msa_large_loop:
+ VMOVB W0, (R1)
+ VMOVB W0, 16(R1)
+ VMOVB W0, 32(R1)
+ VMOVB W0, 48(R1)
+ VMOVB W0, 64(R1)
+ VMOVB W0, 80(R1)
+ VMOVB W0, 96(R1)
+ VMOVB W0, 112(R1)
+
+ ADDVU $128, R1
+ SGTU R6, R1, R3
+ BNE R3, R0, msa_large_loop
+ BEQ R5, R0, done
+ VMOVB W0, -128(R4)
+ VMOVB W0, -112(R4)
+ VMOVB W0, -96(R4)
+ VMOVB W0, -80(R4)
+ VMOVB W0, -64(R4)
+ VMOVB W0, -48(R4)
+ VMOVB W0, -32(R4)
+ VMOVB W0, -16(R4)
+ JMP done
+
+no_msa:
+ // if less than 8 bytes, do one byte at a time
+ SGTU $8, R2, R3
+ BNE R3, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R1, R3
+ BEQ R3, words
+ MOVB R0, (R1)
+ ADDV $1, R1
+ JMP -4(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R4, R2
+
+ SGTU R2, R1, R3
+ BEQ R3, out
+ MOVV R0, (R1)
+ ADDV $8, R1
+ JMP -4(PC)
+
+out:
+ BEQ R1, R4, done
+ MOVB R0, (R1)
+ ADDV $1, R1
+ JMP -3(PC)
+done:
+ RET
diff --git a/src/runtime/memclr_mipsx.s b/src/runtime/memclr_mipsx.s
new file mode 100644
index 0000000..ee3009d
--- /dev/null
+++ b/src/runtime/memclr_mipsx.s
@@ -0,0 +1,73 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "textflag.h"
+
+#ifdef GOARCH_mips
+#define MOVWHI MOVWL
+#define MOVWLO MOVWR
+#else
+#define MOVWHI MOVWR
+#define MOVWLO MOVWL
+#endif
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-8
+ MOVW n+4(FP), R2
+ MOVW ptr+0(FP), R1
+
+ SGTU $4, R2, R3
+ ADDU R2, R1, R4
+ BNE R3, small_zero
+
+ptr_align:
+ AND $3, R1, R3
+ BEQ R3, setup
+ SUBU R1, R0, R3
+ AND $3, R3 // R3 contains number of bytes needed to align ptr
+ MOVWHI R0, 0(R1) // MOVWHI will write zeros up to next word boundary
+ SUBU R3, R2
+ ADDU R3, R1
+
+setup:
+ AND $31, R2, R6
+ AND $3, R2, R5
+ SUBU R6, R4, R6 // end pointer for 32-byte chunks
+ SUBU R5, R4, R5 // end pointer for 4-byte chunks
+
+large:
+ BEQ R1, R6, words
+ MOVW R0, 0(R1)
+ MOVW R0, 4(R1)
+ MOVW R0, 8(R1)
+ MOVW R0, 12(R1)
+ MOVW R0, 16(R1)
+ MOVW R0, 20(R1)
+ MOVW R0, 24(R1)
+ MOVW R0, 28(R1)
+ ADDU $32, R1
+ JMP large
+
+words:
+ BEQ R1, R5, tail
+ MOVW R0, 0(R1)
+ ADDU $4, R1
+ JMP words
+
+tail:
+ BEQ R1, R4, ret
+ MOVWLO R0, -1(R4)
+
+ret:
+ RET
+
+small_zero:
+ BEQ R1, R4, ret
+ MOVB R0, 0(R1)
+ ADDU $1, R1
+ JMP small_zero
diff --git a/src/runtime/memclr_plan9_386.s b/src/runtime/memclr_plan9_386.s
new file mode 100644
index 0000000..54701a9
--- /dev/null
+++ b/src/runtime/memclr_plan9_386.s
@@ -0,0 +1,58 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-8
+ MOVL ptr+0(FP), DI
+ MOVL n+4(FP), BX
+ XORL AX, AX
+
+tail:
+ TESTL BX, BX
+ JEQ _0
+ CMPL BX, $2
+ JBE _1or2
+ CMPL BX, $4
+ JB _3
+ JE _4
+ CMPL BX, $8
+ JBE _5through8
+ CMPL BX, $16
+ JBE _9through16
+ MOVL BX, CX
+ SHRL $2, CX
+ REP
+ STOSL
+ ANDL $3, BX
+ JNE tail
+ RET
+
+_1or2:
+ MOVB AX, (DI)
+ MOVB AX, -1(DI)(BX*1)
+ RET
+_0:
+ RET
+_3:
+ MOVW AX, (DI)
+ MOVB AX, 2(DI)
+ RET
+_4:
+ // We need a separate case for 4 to make sure we clear pointers atomically.
+ MOVL AX, (DI)
+ RET
+_5through8:
+ MOVL AX, (DI)
+ MOVL AX, -4(DI)(BX*1)
+ RET
+_9through16:
+ MOVL AX, (DI)
+ MOVL AX, 4(DI)
+ MOVL AX, -8(DI)(BX*1)
+ MOVL AX, -4(DI)(BX*1)
+ RET
diff --git a/src/runtime/memclr_plan9_amd64.s b/src/runtime/memclr_plan9_amd64.s
new file mode 100644
index 0000000..8c6a1cc
--- /dev/null
+++ b/src/runtime/memclr_plan9_amd64.s
@@ -0,0 +1,23 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT,$0-16
+ MOVQ ptr+0(FP), DI
+ MOVQ n+8(FP), CX
+ MOVQ CX, BX
+ ANDQ $7, BX
+ SHRQ $3, CX
+ MOVQ $0, AX
+ CLD
+ REP
+ STOSQ
+ MOVQ BX, CX
+ REP
+ STOSB
+ RET
diff --git a/src/runtime/memclr_ppc64x.s b/src/runtime/memclr_ppc64x.s
new file mode 100644
index 0000000..bc4b3fc
--- /dev/null
+++ b/src/runtime/memclr_ppc64x.s
@@ -0,0 +1,190 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-16
+ // R3 = ptr
+ // R4 = n
+
+ // Determine if there are doublewords to clear
+check:
+ ANDCC $7, R4, R5 // R5: leftover bytes to clear
+ SRD $3, R4, R6 // R6: double words to clear
+ CMP R6, $0, CR1 // CR1[EQ] set if no double words
+
+ BC 12, 6, nozerolarge // only single bytes
+ CMP R4, $512
+ BLT under512 // special case for < 512
+ ANDCC $127, R3, R8 // check for 128 alignment of address
+ BEQ zero512setup
+
+ ANDCC $7, R3, R15
+ BEQ zero512xsetup // at least 8 byte aligned
+
+ // zero bytes up to 8 byte alignment
+
+ ANDCC $1, R3, R15 // check for byte alignment
+ BEQ byte2
+ MOVB R0, 0(R3) // zero 1 byte
+ ADD $1, R3 // bump ptr by 1
+ ADD $-1, R4
+
+byte2:
+ ANDCC $2, R3, R15 // check for 2 byte alignment
+ BEQ byte4
+ MOVH R0, 0(R3) // zero 2 bytes
+ ADD $2, R3 // bump ptr by 2
+ ADD $-2, R4
+
+byte4:
+ ANDCC $4, R3, R15 // check for 4 byte alignment
+ BEQ zero512xsetup
+ MOVW R0, 0(R3) // zero 4 bytes
+ ADD $4, R3 // bump ptr by 4
+ ADD $-4, R4
+ BR zero512xsetup // ptr should now be 8 byte aligned
+
+under512:
+ SRDCC $3, R6, R7 // 64 byte chunks?
+ XXLXOR VS32, VS32, VS32 // clear VS32 (V0)
+ BEQ lt64gt8
+
+ // Prepare to clear 64 bytes at a time.
+
+zero64setup:
+ DCBTST (R3) // prepare data cache
+ MOVD R7, CTR // number of 64 byte chunks
+ MOVD $16, R8
+ MOVD $32, R16
+ MOVD $48, R17
+
+zero64:
+ STXVD2X VS32, (R3+R0) // store 16 bytes
+ STXVD2X VS32, (R3+R8)
+ STXVD2X VS32, (R3+R16)
+ STXVD2X VS32, (R3+R17)
+ ADD $64, R3
+ ADD $-64, R4
+ BDNZ zero64 // dec ctr, br zero64 if ctr not 0
+ SRDCC $3, R4, R6 // remaining doublewords
+ BEQ nozerolarge
+
+lt64gt8:
+ CMP R4, $32
+ BLT lt32gt8
+ MOVD $16, R8
+ STXVD2X VS32, (R3+R0)
+ STXVD2X VS32, (R3+R8)
+ ADD $-32, R4
+ ADD $32, R3
+lt32gt8:
+ CMP R4, $16
+ BLT lt16gt8
+ STXVD2X VS32, (R3+R0)
+ ADD $16, R3
+ ADD $-16, R4
+lt16gt8:
+#ifdef GOPPC64_power10
+ SLD $56, R4, R7
+ STXVL V0, R3, R7
+ RET
+#else
+ CMP R4, $8
+ BLT nozerolarge
+ MOVD R0, 0(R3)
+ ADD $8, R3
+ ADD $-8, R4
+#endif
+nozerolarge:
+ ANDCC $7, R4, R5 // any remaining bytes
+ BC 4, 1, LR // ble lr
+#ifdef GOPPC64_power10
+ XXLXOR VS32, VS32, VS32 // clear VS32 (V0)
+ SLD $56, R5, R7
+ STXVL V0, R3, R7
+ RET
+#else
+ CMP R5, $4
+ BLT next2
+ MOVW R0, 0(R3)
+ ADD $4, R3
+ ADD $-4, R5
+next2:
+ CMP R5, $2
+ BLT next1
+ MOVH R0, 0(R3)
+ ADD $2, R3
+ ADD $-2, R5
+next1:
+ CMP R5, $0
+ BC 12, 2, LR // beqlr
+ MOVB R0, 0(R3)
+ RET
+#endif
+
+zero512xsetup: // 512 chunk with extra needed
+ ANDCC $8, R3, R11 // 8 byte alignment?
+ BEQ zero512setup16
+ MOVD R0, 0(R3) // clear 8 bytes
+ ADD $8, R3 // update ptr to next 8
+ ADD $-8, R4 // dec count by 8
+
+zero512setup16:
+ ANDCC $127, R3, R14 // < 128 byte alignment
+ BEQ zero512setup // handle 128 byte alignment
+ MOVD $128, R15
+ SUB R14, R15, R14 // find increment to 128 alignment
+ SRD $4, R14, R15 // number of 16 byte chunks
+ MOVD R15, CTR // loop counter of 16 bytes
+ XXLXOR VS32, VS32, VS32 // clear VS32 (V0)
+
+zero512preloop: // clear up to 128 alignment
+ STXVD2X VS32, (R3+R0) // clear 16 bytes
+ ADD $16, R3 // update ptr
+ ADD $-16, R4 // dec count
+ BDNZ zero512preloop
+
+zero512setup: // setup for dcbz loop
+ CMP R4, $512 // check if at least 512
+ BLT remain
+ SRD $9, R4, R8 // loop count for 512 chunks
+ MOVD R8, CTR // set up counter
+ MOVD $128, R9 // index regs for 128 bytes
+ MOVD $256, R10
+ MOVD $384, R11
+ PCALIGN $16
+zero512:
+ DCBZ (R3+R0) // clear first chunk
+ DCBZ (R3+R9) // clear second chunk
+ DCBZ (R3+R10) // clear third chunk
+ DCBZ (R3+R11) // clear fourth chunk
+ ADD $512, R3
+ BDNZ zero512
+ ANDCC $511, R4
+
+remain:
+ CMP R4, $128 // check if 128 byte chunks left
+ BLT smaller
+ DCBZ (R3+R0) // clear 128
+ ADD $128, R3
+ ADD $-128, R4
+ BR remain
+
+smaller:
+ ANDCC $127, R4, R7 // find leftovers
+ BEQ done
+ CMP R7, $64 // more than 64, do 64 at a time
+ XXLXOR VS32, VS32, VS32
+ BLT lt64gt8 // less than 64
+ SRD $6, R7, R7 // set up counter for 64
+ BR zero64setup
+
+done:
+ RET
diff --git a/src/runtime/memclr_riscv64.s b/src/runtime/memclr_riscv64.s
new file mode 100644
index 0000000..1c1e6ab
--- /dev/null
+++ b/src/runtime/memclr_riscv64.s
@@ -0,0 +1,104 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// void runtime·memclrNoHeapPointers(void*, uintptr)
+TEXT runtime·memclrNoHeapPointers<ABIInternal>(SB),NOSPLIT,$0-16
+ // X10 = ptr
+ // X11 = n
+
+ // If less than 8 bytes, do single byte zeroing.
+ MOV $8, X9
+ BLT X11, X9, check4
+
+ // Check alignment
+ AND $7, X10, X5
+ BEQZ X5, aligned
+
+ // Zero one byte at a time until we reach 8 byte alignment.
+ SUB X5, X9, X5
+ SUB X5, X11, X11
+align:
+ ADD $-1, X5
+ MOVB ZERO, 0(X10)
+ ADD $1, X10
+ BNEZ X5, align
+
+aligned:
+ // X9 already contains $8
+ BLT X11, X9, check4
+ MOV $16, X9
+ BLT X11, X9, zero8
+ MOV $32, X9
+ BLT X11, X9, zero16
+ MOV $64, X9
+ BLT X11, X9, zero32
+loop64:
+ MOV ZERO, 0(X10)
+ MOV ZERO, 8(X10)
+ MOV ZERO, 16(X10)
+ MOV ZERO, 24(X10)
+ MOV ZERO, 32(X10)
+ MOV ZERO, 40(X10)
+ MOV ZERO, 48(X10)
+ MOV ZERO, 56(X10)
+ ADD $64, X10
+ ADD $-64, X11
+ BGE X11, X9, loop64
+ BEQZ X11, done
+
+check32:
+ MOV $32, X9
+ BLT X11, X9, check16
+zero32:
+ MOV ZERO, 0(X10)
+ MOV ZERO, 8(X10)
+ MOV ZERO, 16(X10)
+ MOV ZERO, 24(X10)
+ ADD $32, X10
+ ADD $-32, X11
+ BEQZ X11, done
+
+check16:
+ MOV $16, X9
+ BLT X11, X9, check8
+zero16:
+ MOV ZERO, 0(X10)
+ MOV ZERO, 8(X10)
+ ADD $16, X10
+ ADD $-16, X11
+ BEQZ X11, done
+
+check8:
+ MOV $8, X9
+ BLT X11, X9, check4
+zero8:
+ MOV ZERO, 0(X10)
+ ADD $8, X10
+ ADD $-8, X11
+ BEQZ X11, done
+
+check4:
+ MOV $4, X9
+ BLT X11, X9, loop1
+zero4:
+ MOVB ZERO, 0(X10)
+ MOVB ZERO, 1(X10)
+ MOVB ZERO, 2(X10)
+ MOVB ZERO, 3(X10)
+ ADD $4, X10
+ ADD $-4, X11
+
+loop1:
+ BEQZ X11, done
+ MOVB ZERO, 0(X10)
+ ADD $1, X10
+ ADD $-1, X11
+ JMP loop1
+
+done:
+ RET
diff --git a/src/runtime/memclr_s390x.s b/src/runtime/memclr_s390x.s
new file mode 100644
index 0000000..fa657ef
--- /dev/null
+++ b/src/runtime/memclr_s390x.s
@@ -0,0 +1,124 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB),NOSPLIT|NOFRAME,$0-16
+ MOVD ptr+0(FP), R4
+ MOVD n+8(FP), R5
+
+start:
+ CMPBLE R5, $3, clear0to3
+ CMPBLE R5, $7, clear4to7
+ CMPBLE R5, $11, clear8to11
+ CMPBLE R5, $15, clear12to15
+ CMP R5, $32
+ BGE clearmt32
+ MOVD $0, 0(R4)
+ MOVD $0, 8(R4)
+ ADD $16, R4
+ SUB $16, R5
+ BR start
+
+clear0to3:
+ CMPBEQ R5, $0, done
+ CMPBNE R5, $1, clear2
+ MOVB $0, 0(R4)
+ RET
+clear2:
+ CMPBNE R5, $2, clear3
+ MOVH $0, 0(R4)
+ RET
+clear3:
+ MOVH $0, 0(R4)
+ MOVB $0, 2(R4)
+ RET
+
+clear4to7:
+ CMPBNE R5, $4, clear5
+ MOVW $0, 0(R4)
+ RET
+clear5:
+ CMPBNE R5, $5, clear6
+ MOVW $0, 0(R4)
+ MOVB $0, 4(R4)
+ RET
+clear6:
+ CMPBNE R5, $6, clear7
+ MOVW $0, 0(R4)
+ MOVH $0, 4(R4)
+ RET
+clear7:
+ MOVW $0, 0(R4)
+ MOVH $0, 4(R4)
+ MOVB $0, 6(R4)
+ RET
+
+clear8to11:
+ CMPBNE R5, $8, clear9
+ MOVD $0, 0(R4)
+ RET
+clear9:
+ CMPBNE R5, $9, clear10
+ MOVD $0, 0(R4)
+ MOVB $0, 8(R4)
+ RET
+clear10:
+ CMPBNE R5, $10, clear11
+ MOVD $0, 0(R4)
+ MOVH $0, 8(R4)
+ RET
+clear11:
+ MOVD $0, 0(R4)
+ MOVH $0, 8(R4)
+ MOVB $0, 10(R4)
+ RET
+
+clear12to15:
+ CMPBNE R5, $12, clear13
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ RET
+clear13:
+ CMPBNE R5, $13, clear14
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVB $0, 12(R4)
+ RET
+clear14:
+ CMPBNE R5, $14, clear15
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVH $0, 12(R4)
+ RET
+clear15:
+ MOVD $0, 0(R4)
+ MOVW $0, 8(R4)
+ MOVH $0, 12(R4)
+ MOVB $0, 14(R4)
+ RET
+
+clearmt32:
+ CMP R5, $256
+ BLT clearlt256
+ XC $256, 0(R4), 0(R4)
+ ADD $256, R4
+ ADD $-256, R5
+ BR clearmt32
+clearlt256:
+ CMPBEQ R5, $0, done
+ ADD $-1, R5
+ EXRL $memclr_exrl_xc<>(SB), R5
+done:
+ RET
+
+// DO NOT CALL - target for exrl (execute relative long) instruction.
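+// EXRL executes this XC with its length field replaced by the low byte of R5,
+// so "XC $1" effectively becomes a clear of R5+1 bytes (1..256). A rough
+// Go-level picture of the call site above (illustrative only; xcClear is a
+// hypothetical name):
+//
+//	if n > 0 {
+//	        xcClear(ptr, n-1) // executed via EXRL; clears n bytes
+//	}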
+TEXT memclr_exrl_xc<>(SB),NOSPLIT|NOFRAME,$0-0
+ XC $1, 0(R4), 0(R4)
+ MOVD $0, 0(R0)
+ RET
+
diff --git a/src/runtime/memclr_wasm.s b/src/runtime/memclr_wasm.s
new file mode 100644
index 0000000..19d08ff
--- /dev/null
+++ b/src/runtime/memclr_wasm.s
@@ -0,0 +1,20 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memclrNoHeapPointers Go doc for important implementation constraints.
+
+// func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+TEXT runtime·memclrNoHeapPointers(SB), NOSPLIT, $0-16
+ MOVD ptr+0(FP), R0
+ MOVD n+8(FP), R1
+
+ Get R0
+ I32WrapI64
+ I32Const $0
+ Get R1
+ I32WrapI64
+ MemoryFill
+ RET
diff --git a/src/runtime/memmove_386.s b/src/runtime/memmove_386.s
new file mode 100644
index 0000000..6d7e17f
--- /dev/null
+++ b/src/runtime/memmove_386.s
@@ -0,0 +1,204 @@
+// Inferno's libkern/memmove-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+//go:build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-12
+ MOVL to+0(FP), DI
+ MOVL from+4(FP), SI
+ MOVL n+8(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSL instruction is really fast
+ // for large sizes. The cutover is approximately 1K. We implement up to
+ // 128 because that is the maximum SSE register load (loading all data
+ // into registers lets us ignore copy direction).
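+	//
+	// A rough Go-level sketch of the dispatch below (illustrative only;
+	// the helper names are hypothetical, not part of the runtime):
+	//
+	//	switch {
+	//	case n == 0:
+	//	        // nothing to do
+	//	case n <= 16:
+	//	        moveSmall(to, from, n) // overlapping scalar loads/stores
+	//	case n <= 128 && haveSSE2:
+	//	        moveSSE(to, from, n)   // load everything into X0..X7, then store
+	//	default:
+	//	        moveREP(to, from, n)   // REP MOVSL, forward or backward
+	//	}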
+tail:
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
+ TESTL BX, BX
+ JEQ move_0
+ CMPL BX, $2
+ JBE move_1or2
+ CMPL BX, $4
+ JB move_3
+ JE move_4
+ CMPL BX, $8
+ JBE move_5through8
+ CMPL BX, $16
+ JBE move_9through16
+#ifdef GO386_softfloat
+ JMP nosse2
+#endif
+ CMPL BX, $32
+ JBE move_17through32
+ CMPL BX, $64
+ JBE move_33through64
+ CMPL BX, $128
+ JBE move_65through128
+
+nosse2:
+/*
+ * check and set for backwards
+ */
+ CMPL SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ // If REP MOVSB isn't fast, don't use it
+ CMPB internal∕cpu·X86+const_offsetX86HasERMS(SB), $1 // enhanced REP MOVSB/STOSB
+ JNE fwdBy4
+
+ // Check alignment
+ MOVL SI, AX
+ ORL DI, AX
+ TESTL $3, AX
+ JEQ fwdBy4
+
+ // Do 1 byte at a time
+ MOVL BX, CX
+ REP; MOVSB
+ RET
+
+fwdBy4:
+ // Do 4 bytes at a time
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+ REP; MOVSL
+ JMP tail
+
+/*
+ * check overlap
+ */
+back:
+ MOVL SI, CX
+ ADDL BX, CX
+ CMPL CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+
+ ADDL BX, DI
+ ADDL BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ SUBL $4, DI
+ SUBL $4, SI
+ REP; MOVSL
+
+ CLD
+ ADDL $4, DI
+ ADDL $4, SI
+ SUBL BX, DI
+ SUBL BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_4:
+ // We need a separate case for 4 to make sure we write pointers atomically.
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_5through8:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_9through16:
+ MOVL (SI), AX
+ MOVL 4(SI), CX
+ MOVL -8(SI)(BX*1), DX
+ MOVL -4(SI)(BX*1), BP
+ MOVL AX, (DI)
+ MOVL CX, 4(DI)
+ MOVL DX, -8(DI)(BX*1)
+ MOVL BP, -4(DI)(BX*1)
+ RET
+move_17through32:
+ MOVOU (SI), X0
+ MOVOU -16(SI)(BX*1), X1
+ MOVOU X0, (DI)
+ MOVOU X1, -16(DI)(BX*1)
+ RET
+move_33through64:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU -32(SI)(BX*1), X2
+ MOVOU -16(SI)(BX*1), X3
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, -32(DI)(BX*1)
+ MOVOU X3, -16(DI)(BX*1)
+ RET
+move_65through128:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU -64(SI)(BX*1), X4
+ MOVOU -48(SI)(BX*1), X5
+ MOVOU -32(SI)(BX*1), X6
+ MOVOU -16(SI)(BX*1), X7
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, -64(DI)(BX*1)
+ MOVOU X5, -48(DI)(BX*1)
+ MOVOU X6, -32(DI)(BX*1)
+ MOVOU X7, -16(DI)(BX*1)
+ RET
diff --git a/src/runtime/memmove_amd64.s b/src/runtime/memmove_amd64.s
new file mode 100644
index 0000000..018bb0b
--- /dev/null
+++ b/src/runtime/memmove_amd64.s
@@ -0,0 +1,532 @@
+// Derived from Inferno's libkern/memmove-386.s (adapted for amd64)
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+//go:build !plan9
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+// ABIInternal for performance.
+TEXT runtime·memmove<ABIInternal>(SB), NOSPLIT, $0-24
+ // AX = to
+ // BX = from
+ // CX = n
+ MOVQ AX, DI
+ MOVQ BX, SI
+ MOVQ CX, BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSQ instruction is really fast
+ // for large sizes. The cutover is approximately 2K.
+tail:
+ // move_129through256 or smaller work whether or not the source and the
+ // destination memory regions overlap because they load all data into
+ // registers before writing it back. move_256through2048 on the other
+ // hand can be used only when the memory regions don't overlap or the copy
+ // direction is forward.
+ //
+ // BSR+branch table make almost all memmove/memclr benchmarks worse. Not worth doing.
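+	//
+	// Why the small cases are overlap-safe, as a Go-level sketch (illustrative
+	// only): every load is issued before any store, e.g. for 17..32 bytes:
+	//
+	//	a := *(*[16]byte)(from)
+	//	b := *(*[16]byte)(unsafe.Add(from, n-16))
+	//	*(*[16]byte)(to) = a
+	//	*(*[16]byte)(unsafe.Add(to, n-16)) = b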
+ TESTQ BX, BX
+ JEQ move_0
+ CMPQ BX, $2
+ JBE move_1or2
+ CMPQ BX, $4
+ JB move_3
+ JBE move_4
+ CMPQ BX, $8
+ JB move_5through7
+ JE move_8
+ CMPQ BX, $16
+ JBE move_9through16
+ CMPQ BX, $32
+ JBE move_17through32
+ CMPQ BX, $64
+ JBE move_33through64
+ CMPQ BX, $128
+ JBE move_65through128
+ CMPQ BX, $256
+ JBE move_129through256
+
+ TESTB $1, runtime·useAVXmemmove(SB)
+ JNZ avxUnaligned
+
+/*
+ * check and set for backwards
+ */
+ CMPQ SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ CMPQ BX, $2048
+ JLS move_256through2048
+
+ // If REP MOVSB isn't fast, don't use it
+ CMPB internal∕cpu·X86+const_offsetX86HasERMS(SB), $1 // enhanced REP MOVSB/STOSB
+ JNE fwdBy8
+
+ // Check alignment
+ MOVL SI, AX
+ ORL DI, AX
+ TESTL $7, AX
+ JEQ fwdBy8
+
+ // Do 1 byte at a time
+ MOVQ BX, CX
+ REP; MOVSB
+ RET
+
+fwdBy8:
+ // Do 8 bytes at a time
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+ REP; MOVSQ
+ JMP tail
+
+back:
+/*
+ * check overlap
+ */
+ MOVQ SI, CX
+ ADDQ BX, CX
+ CMPQ CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+ ADDQ BX, DI
+ ADDQ BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ SUBQ $8, DI
+ SUBQ $8, SI
+ REP; MOVSQ
+
+ CLD
+ ADDQ $8, DI
+ ADDQ $8, SI
+ SUBQ BX, DI
+ SUBQ BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_4:
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_5through7:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_8:
+ // We need a separate case for 8 to make sure we write pointers atomically.
+ MOVQ (SI), AX
+ MOVQ AX, (DI)
+ RET
+move_9through16:
+ MOVQ (SI), AX
+ MOVQ -8(SI)(BX*1), CX
+ MOVQ AX, (DI)
+ MOVQ CX, -8(DI)(BX*1)
+ RET
+move_17through32:
+ MOVOU (SI), X0
+ MOVOU -16(SI)(BX*1), X1
+ MOVOU X0, (DI)
+ MOVOU X1, -16(DI)(BX*1)
+ RET
+move_33through64:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU -32(SI)(BX*1), X2
+ MOVOU -16(SI)(BX*1), X3
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, -32(DI)(BX*1)
+ MOVOU X3, -16(DI)(BX*1)
+ RET
+move_65through128:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU -64(SI)(BX*1), X4
+ MOVOU -48(SI)(BX*1), X5
+ MOVOU -32(SI)(BX*1), X6
+ MOVOU -16(SI)(BX*1), X7
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, -64(DI)(BX*1)
+ MOVOU X5, -48(DI)(BX*1)
+ MOVOU X6, -32(DI)(BX*1)
+ MOVOU X7, -16(DI)(BX*1)
+ RET
+move_129through256:
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU 64(SI), X4
+ MOVOU 80(SI), X5
+ MOVOU 96(SI), X6
+ MOVOU 112(SI), X7
+ MOVOU -128(SI)(BX*1), X8
+ MOVOU -112(SI)(BX*1), X9
+ MOVOU -96(SI)(BX*1), X10
+ MOVOU -80(SI)(BX*1), X11
+ MOVOU -64(SI)(BX*1), X12
+ MOVOU -48(SI)(BX*1), X13
+ MOVOU -32(SI)(BX*1), X14
+ MOVOU -16(SI)(BX*1), X15
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, 64(DI)
+ MOVOU X5, 80(DI)
+ MOVOU X6, 96(DI)
+ MOVOU X7, 112(DI)
+ MOVOU X8, -128(DI)(BX*1)
+ MOVOU X9, -112(DI)(BX*1)
+ MOVOU X10, -96(DI)(BX*1)
+ MOVOU X11, -80(DI)(BX*1)
+ MOVOU X12, -64(DI)(BX*1)
+ MOVOU X13, -48(DI)(BX*1)
+ MOVOU X14, -32(DI)(BX*1)
+ MOVOU X15, -16(DI)(BX*1)
+ // X15 must be zero on return
+ PXOR X15, X15
+ RET
+move_256through2048:
+ SUBQ $256, BX
+ MOVOU (SI), X0
+ MOVOU 16(SI), X1
+ MOVOU 32(SI), X2
+ MOVOU 48(SI), X3
+ MOVOU 64(SI), X4
+ MOVOU 80(SI), X5
+ MOVOU 96(SI), X6
+ MOVOU 112(SI), X7
+ MOVOU 128(SI), X8
+ MOVOU 144(SI), X9
+ MOVOU 160(SI), X10
+ MOVOU 176(SI), X11
+ MOVOU 192(SI), X12
+ MOVOU 208(SI), X13
+ MOVOU 224(SI), X14
+ MOVOU 240(SI), X15
+ MOVOU X0, (DI)
+ MOVOU X1, 16(DI)
+ MOVOU X2, 32(DI)
+ MOVOU X3, 48(DI)
+ MOVOU X4, 64(DI)
+ MOVOU X5, 80(DI)
+ MOVOU X6, 96(DI)
+ MOVOU X7, 112(DI)
+ MOVOU X8, 128(DI)
+ MOVOU X9, 144(DI)
+ MOVOU X10, 160(DI)
+ MOVOU X11, 176(DI)
+ MOVOU X12, 192(DI)
+ MOVOU X13, 208(DI)
+ MOVOU X14, 224(DI)
+ MOVOU X15, 240(DI)
+ CMPQ BX, $256
+ LEAQ 256(SI), SI
+ LEAQ 256(DI), DI
+ JGE move_256through2048
+ // X15 must be zero on return
+ PXOR X15, X15
+ JMP tail
+
+avxUnaligned:
+	// There are two implementations of the move algorithm:
+	// one for non-overlapping memory regions (it copies forward),
+	// and one for overlapping regions (it copies backward).
+ MOVQ DI, CX
+ SUBQ SI, CX
+ // Now CX contains distance between SRC and DEST
+ CMPQ CX, BX
+	// If the distance is less than the region length, the regions overlap
+ JC copy_backward
+
+ // Non-temporal copy would be better for big sizes.
+ CMPQ BX, $0x100000
+ JAE gobble_big_data_fwd
+
+ // Memory layout on the source side
+ // SI CX
+ // |<---------BX before correction--------->|
+ // | |<--BX corrected-->| |
+ // | | |<--- AX --->|
+ // |<-R11->| |<-128 bytes->|
+ // +----------------------------------------+
+ // | Head | Body | Tail |
+ // +-------+------------------+-------------+
+ // ^ ^ ^
+ // | | |
+ // Save head into Y4 Save tail into X5..X12
+ // |
+ // SI+R11, where R11 = ((DI & -32) + 32) - DI
+ // Algorithm:
+ // 1. Unaligned save of the tail's 128 bytes
+ // 2. Unaligned save of the head's 32 bytes
+ // 3. Destination-aligned copying of body (128 bytes per iteration)
+	// 4. Put the head in its new place
+	// 5. Put the tail in its new place
+	// For small sizes it is important to satisfy the processor's pipeline
+	// requirements, because the cost of copying the unaligned head and tail
+	// is comparable with the cost of the main loop, so the code here is
+	// slightly interleaved. A cleaner implementation of the same algorithm,
+	// for bigger sizes where the cost of the unaligned parts is negligible,
+	// follows the gobble_big_data_fwd label.
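+	//
+	// A Go-level sketch of steps 1-5 above (illustrative only; the helpers
+	// and the names dstOrig/nOrig for the original arguments are hypothetical,
+	// and exact register bookkeeping is omitted):
+	//
+	//	tail := load128(src + n - 128)          // 1. save the unaligned tail
+	//	head := load32(src)                     // 2. save the unaligned head
+	//	d := (dst &^ 31) + 32                   // align destination up to 32
+	//	delta := d - dst
+	//	src, n = src+delta, n-delta-128
+	//	for n > 0 {                             // 3. aligned 128-byte body copies
+	//	        store128aligned(d, load128(src))
+	//	        d, src, n = d+128, src+128, n-128
+	//	}
+	//	store32(dstOrig, head)                  // 4. put the head in place
+	//	store128(dstOrig+nOrig-128, tail)       // 5. put the tail in place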
+ LEAQ (SI)(BX*1), CX
+ MOVQ DI, R10
+	// CX points to the end of the buffer, so we go back slightly and use negative offsets from there.
+ MOVOU -0x80(CX), X5
+ MOVOU -0x70(CX), X6
+ MOVQ $0x80, AX
+ // Align destination address
+ ANDQ $-32, DI
+ ADDQ $32, DI
+ // Continue tail saving.
+ MOVOU -0x60(CX), X7
+ MOVOU -0x50(CX), X8
+	// Make R11 the delta between the aligned and unaligned destination addresses.
+ MOVQ DI, R11
+ SUBQ R10, R11
+ // Continue tail saving.
+ MOVOU -0x40(CX), X9
+ MOVOU -0x30(CX), X10
+	// Adjust the bytes-to-copy value now that the unaligned part has been prepared for copying.
+ SUBQ R11, BX
+ // Continue tail saving.
+ MOVOU -0x20(CX), X11
+ MOVOU -0x10(CX), X12
+	// The tail will be put in place after the main body copying.
+	// It's time for the unaligned head part.
+ VMOVDQU (SI), Y4
+ // Adjust source address to point past head.
+ ADDQ R11, SI
+ SUBQ AX, BX
+	// Aligned memory copying follows
+gobble_128_loop:
+ VMOVDQU (SI), Y0
+ VMOVDQU 0x20(SI), Y1
+ VMOVDQU 0x40(SI), Y2
+ VMOVDQU 0x60(SI), Y3
+ ADDQ AX, SI
+ VMOVDQA Y0, (DI)
+ VMOVDQA Y1, 0x20(DI)
+ VMOVDQA Y2, 0x40(DI)
+ VMOVDQA Y3, 0x60(DI)
+ ADDQ AX, DI
+ SUBQ AX, BX
+ JA gobble_128_loop
+ // Now we can store unaligned parts.
+ ADDQ AX, BX
+ ADDQ DI, BX
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, -0x80(BX)
+ MOVOU X6, -0x70(BX)
+ MOVOU X7, -0x60(BX)
+ MOVOU X8, -0x50(BX)
+ MOVOU X9, -0x40(BX)
+ MOVOU X10, -0x30(BX)
+ MOVOU X11, -0x20(BX)
+ MOVOU X12, -0x10(BX)
+ RET
+
+gobble_big_data_fwd:
+	// This is the forward copy path for big regions.
+	// It uses non-temporal mov instructions.
+	// The details of this algorithm are described above for small sizes.
+ LEAQ (SI)(BX*1), CX
+ MOVOU -0x80(SI)(BX*1), X5
+ MOVOU -0x70(CX), X6
+ MOVOU -0x60(CX), X7
+ MOVOU -0x50(CX), X8
+ MOVOU -0x40(CX), X9
+ MOVOU -0x30(CX), X10
+ MOVOU -0x20(CX), X11
+ MOVOU -0x10(CX), X12
+ VMOVDQU (SI), Y4
+ MOVQ DI, R8
+ ANDQ $-32, DI
+ ADDQ $32, DI
+ MOVQ DI, R10
+ SUBQ R8, R10
+ SUBQ R10, BX
+ ADDQ R10, SI
+ LEAQ (DI)(BX*1), CX
+ SUBQ $0x80, BX
+gobble_mem_fwd_loop:
+ PREFETCHNTA 0x1C0(SI)
+ PREFETCHNTA 0x280(SI)
+ // Prefetch values were chosen empirically.
+ // Approach for prefetch usage as in 9.5.6 of [1]
+ // [1] 64-ia-32-architectures-optimization-manual.pdf
+ // https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
+ VMOVDQU (SI), Y0
+ VMOVDQU 0x20(SI), Y1
+ VMOVDQU 0x40(SI), Y2
+ VMOVDQU 0x60(SI), Y3
+ ADDQ $0x80, SI
+ VMOVNTDQ Y0, (DI)
+ VMOVNTDQ Y1, 0x20(DI)
+ VMOVNTDQ Y2, 0x40(DI)
+ VMOVNTDQ Y3, 0x60(DI)
+ ADDQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_mem_fwd_loop
+	// NT instructions don't follow the normal cache-coherency rules,
+	// so we need an SFENCE here to make the copied data visible in a timely manner.
+ SFENCE
+ VMOVDQU Y4, (R8)
+ VZEROUPPER
+ MOVOU X5, -0x80(CX)
+ MOVOU X6, -0x70(CX)
+ MOVOU X7, -0x60(CX)
+ MOVOU X8, -0x50(CX)
+ MOVOU X9, -0x40(CX)
+ MOVOU X10, -0x30(CX)
+ MOVOU X11, -0x20(CX)
+ MOVOU X12, -0x10(CX)
+ RET
+
+copy_backward:
+ MOVQ DI, AX
+	// Backward copying works much like the forward case.
+	// First we load the unaligned tail at the beginning of the region.
+ MOVOU (SI), X5
+ MOVOU 0x10(SI), X6
+ ADDQ BX, DI
+ MOVOU 0x20(SI), X7
+ MOVOU 0x30(SI), X8
+ LEAQ -0x20(DI), R10
+ MOVQ DI, R11
+ MOVOU 0x40(SI), X9
+ MOVOU 0x50(SI), X10
+ ANDQ $0x1F, R11
+ MOVOU 0x60(SI), X11
+ MOVOU 0x70(SI), X12
+ XORQ R11, DI
+ // Let's point SI to the end of region
+ ADDQ BX, SI
+ // and load unaligned head into X4.
+ VMOVDQU -0x20(SI), Y4
+ SUBQ R11, SI
+ SUBQ R11, BX
+	// If there is enough data for non-temporal moves, go to the special loop
+ CMPQ BX, $0x100000
+ JA gobble_big_data_bwd
+ SUBQ $0x80, BX
+gobble_mem_bwd_loop:
+ VMOVDQU -0x20(SI), Y0
+ VMOVDQU -0x40(SI), Y1
+ VMOVDQU -0x60(SI), Y2
+ VMOVDQU -0x80(SI), Y3
+ SUBQ $0x80, SI
+ VMOVDQA Y0, -0x20(DI)
+ VMOVDQA Y1, -0x40(DI)
+ VMOVDQA Y2, -0x60(DI)
+ VMOVDQA Y3, -0x80(DI)
+ SUBQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_mem_bwd_loop
+ // Let's store unaligned data
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, (AX)
+ MOVOU X6, 0x10(AX)
+ MOVOU X7, 0x20(AX)
+ MOVOU X8, 0x30(AX)
+ MOVOU X9, 0x40(AX)
+ MOVOU X10, 0x50(AX)
+ MOVOU X11, 0x60(AX)
+ MOVOU X12, 0x70(AX)
+ RET
+
+gobble_big_data_bwd:
+ SUBQ $0x80, BX
+gobble_big_mem_bwd_loop:
+ PREFETCHNTA -0x1C0(SI)
+ PREFETCHNTA -0x280(SI)
+ VMOVDQU -0x20(SI), Y0
+ VMOVDQU -0x40(SI), Y1
+ VMOVDQU -0x60(SI), Y2
+ VMOVDQU -0x80(SI), Y3
+ SUBQ $0x80, SI
+ VMOVNTDQ Y0, -0x20(DI)
+ VMOVNTDQ Y1, -0x40(DI)
+ VMOVNTDQ Y2, -0x60(DI)
+ VMOVNTDQ Y3, -0x80(DI)
+ SUBQ $0x80, DI
+ SUBQ $0x80, BX
+ JA gobble_big_mem_bwd_loop
+ SFENCE
+ VMOVDQU Y4, (R10)
+ VZEROUPPER
+ MOVOU X5, (AX)
+ MOVOU X6, 0x10(AX)
+ MOVOU X7, 0x20(AX)
+ MOVOU X8, 0x30(AX)
+ MOVOU X9, 0x40(AX)
+ MOVOU X10, 0x50(AX)
+ MOVOU X11, 0x60(AX)
+ MOVOU X12, 0x70(AX)
+ RET
diff --git a/src/runtime/memmove_arm.s b/src/runtime/memmove_arm.s
new file mode 100644
index 0000000..43d53fa
--- /dev/null
+++ b/src/runtime/memmove_arm.s
@@ -0,0 +1,264 @@
+// Inferno's libkern/memmove-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// TE or TS are spilled to the stack during bulk register moves.
+#define TS R0
+#define TE R8
+
+// Warning: the linker will use R11 to synthesize certain instructions. Please
+// take care and double check with objdump.
+#define FROM R11
+#define N R12
+#define TMP R12 /* N and TMP don't overlap */
+#define TMP1 R5
+
+#define RSHIFT R5
+#define LSHIFT R6
+#define OFFSET R7
+
+#define BR0 R0 /* shared with TS */
+#define BW0 R1
+#define BR1 R1
+#define BW1 R2
+#define BR2 R2
+#define BW2 R3
+#define BR3 R3
+#define BW3 R4
+
+#define FW0 R1
+#define FR0 R2
+#define FW1 R2
+#define FR1 R3
+#define FW2 R3
+#define FR2 R4
+#define FW3 R4
+#define FR3 R8 /* shared with TE */
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $4-12
+_memmove:
+ MOVW to+0(FP), TS
+ MOVW from+4(FP), FROM
+ MOVW n+8(FP), N
+
+ ADD N, TS, TE /* to end pointer */
+
+ CMP FROM, TS
+ BLS _forward
+
+_back:
+ ADD N, FROM /* from end pointer */
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _b1tail
+
+_b4align: /* align destination on 4 */
+ AND.S $3, TE, TMP
+ BEQ _b4aligned
+
+ MOVBU.W -1(FROM), TMP /* pre-indexed */
+ MOVBU.W TMP, -1(TE) /* pre-indexed */
+ B _b4align
+
+_b4aligned: /* is source now aligned? */
+ AND.S $3, FROM, TMP
+ BNE _bunaligned
+
+ ADD $31, TS, TMP /* do 32-byte chunks if possible */
+ MOVW TS, savedts-4(SP)
+_b32loop:
+ CMP TMP, TE
+ BLS _b4tail
+
+ MOVM.DB.W (FROM), [R0-R7]
+ MOVM.DB.W [R0-R7], (TE)
+ B _b32loop
+
+_b4tail: /* do remaining words if possible */
+ MOVW savedts-4(SP), TS
+ ADD $3, TS, TMP
+_b4loop:
+ CMP TMP, TE
+ BLS _b1tail
+
+ MOVW.W -4(FROM), TMP1 /* pre-indexed */
+ MOVW.W TMP1, -4(TE) /* pre-indexed */
+ B _b4loop
+
+_b1tail: /* remaining bytes */
+ CMP TE, TS
+ BEQ _return
+
+ MOVBU.W -1(FROM), TMP /* pre-indexed */
+ MOVBU.W TMP, -1(TE) /* pre-indexed */
+ B _b1tail
+
+_forward:
+ CMP $4, N /* need at least 4 bytes to copy */
+ BLT _f1tail
+
+_f4align: /* align destination on 4 */
+ AND.S $3, TS, TMP
+ BEQ _f4aligned
+
+ MOVBU.P 1(FROM), TMP /* implicit write back */
+ MOVBU.P TMP, 1(TS) /* implicit write back */
+ B _f4align
+
+_f4aligned: /* is source now aligned? */
+ AND.S $3, FROM, TMP
+ BNE _funaligned
+
+ SUB $31, TE, TMP /* do 32-byte chunks if possible */
+ MOVW TE, savedte-4(SP)
+_f32loop:
+ CMP TMP, TS
+ BHS _f4tail
+
+ MOVM.IA.W (FROM), [R1-R8]
+ MOVM.IA.W [R1-R8], (TS)
+ B _f32loop
+
+_f4tail:
+ MOVW savedte-4(SP), TE
+ SUB $3, TE, TMP /* do remaining words if possible */
+_f4loop:
+ CMP TMP, TS
+ BHS _f1tail
+
+ MOVW.P 4(FROM), TMP1 /* implicit write back */
+ MOVW.P TMP1, 4(TS) /* implicit write back */
+ B _f4loop
+
+_f1tail:
+ CMP TS, TE
+ BEQ _return
+
+ MOVBU.P 1(FROM), TMP /* implicit write back */
+ MOVBU.P TMP, 1(TS) /* implicit write back */
+ B _f1tail
+
+_return:
+ MOVW to+0(FP), R0
+ RET
+
+_bunaligned:
+ CMP $2, TMP /* is TMP < 2 ? */
+
+ MOVW.LT $8, RSHIFT /* (R(n)<<24)|(R(n-1)>>8) */
+ MOVW.LT $24, LSHIFT
+ MOVW.LT $1, OFFSET
+
+ MOVW.EQ $16, RSHIFT /* (R(n)<<16)|(R(n-1)>>16) */
+ MOVW.EQ $16, LSHIFT
+ MOVW.EQ $2, OFFSET
+
+ MOVW.GT $24, RSHIFT /* (R(n)<<8)|(R(n-1)>>24) */
+ MOVW.GT $8, LSHIFT
+ MOVW.GT $3, OFFSET
+
+ ADD $16, TS, TMP /* do 16-byte chunks if possible */
+ CMP TMP, TE
+ BLS _b1tail
+
+ BIC $3, FROM /* align source */
+ MOVW TS, savedts-4(SP)
+ MOVW (FROM), BR0 /* prime first block register */
+
+_bu16loop:
+ CMP TMP, TE
+ BLS _bu1tail
+
+ MOVW BR0<<LSHIFT, BW3
+ MOVM.DB.W (FROM), [BR0-BR3]
+ ORR BR3>>RSHIFT, BW3
+
+ MOVW BR3<<LSHIFT, BW2
+ ORR BR2>>RSHIFT, BW2
+
+ MOVW BR2<<LSHIFT, BW1
+ ORR BR1>>RSHIFT, BW1
+
+ MOVW BR1<<LSHIFT, BW0
+ ORR BR0>>RSHIFT, BW0
+
+ MOVM.DB.W [BW0-BW3], (TE)
+ B _bu16loop
+
+_bu1tail:
+ MOVW savedts-4(SP), TS
+ ADD OFFSET, FROM
+ B _b1tail
+
+_funaligned:
+ CMP $2, TMP
+
+ MOVW.LT $8, RSHIFT /* (R(n+1)<<24)|(R(n)>>8) */
+ MOVW.LT $24, LSHIFT
+ MOVW.LT $3, OFFSET
+
+ MOVW.EQ $16, RSHIFT /* (R(n+1)<<16)|(R(n)>>16) */
+ MOVW.EQ $16, LSHIFT
+ MOVW.EQ $2, OFFSET
+
+ MOVW.GT $24, RSHIFT /* (R(n+1)<<8)|(R(n)>>24) */
+ MOVW.GT $8, LSHIFT
+ MOVW.GT $1, OFFSET
+
+ SUB $16, TE, TMP /* do 16-byte chunks if possible */
+ CMP TMP, TS
+ BHS _f1tail
+
+ BIC $3, FROM /* align source */
+ MOVW TE, savedte-4(SP)
+ MOVW.P 4(FROM), FR3 /* prime last block register, implicit write back */
+
+_fu16loop:
+ CMP TMP, TS
+ BHS _fu1tail
+
+ MOVW FR3>>RSHIFT, FW0
+ MOVM.IA.W (FROM), [FR0,FR1,FR2,FR3]
+ ORR FR0<<LSHIFT, FW0
+
+ MOVW FR0>>RSHIFT, FW1
+ ORR FR1<<LSHIFT, FW1
+
+ MOVW FR1>>RSHIFT, FW2
+ ORR FR2<<LSHIFT, FW2
+
+ MOVW FR2>>RSHIFT, FW3
+ ORR FR3<<LSHIFT, FW3
+
+ MOVM.IA.W [FW0,FW1,FW2,FW3], (TS)
+ B _fu16loop
+
+_fu1tail:
+ MOVW savedte-4(SP), TE
+ SUB OFFSET, FROM
+ B _f1tail
diff --git a/src/runtime/memmove_arm64.s b/src/runtime/memmove_arm64.s
new file mode 100644
index 0000000..8ec3ed8
--- /dev/null
+++ b/src/runtime/memmove_arm64.s
@@ -0,0 +1,238 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// Register map
+//
+// dstin R0
+// src R1
+// count R2
+// dst R3 (same as R0, but gets modified in unaligned cases)
+// srcend R4
+// dstend R5
+// data R6-R17
+// tmp1 R14
+
+// Copies are split into 3 main cases: small copies of up to 32 bytes, medium
+// copies of up to 128 bytes, and large copies. The overhead of the overlap
+// check is negligible since it is only required for large copies.
+//
+// Large copies use a software pipelined loop processing 64 bytes per iteration.
+// The destination pointer is 16-byte aligned to minimize unaligned accesses.
+// The loop tail is handled by always copying 64 bytes from the end.
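+//
+// A rough Go-level sketch of the case split (illustrative only; the helper
+// names are hypothetical):
+//
+//	switch {
+//	case n == 0:
+//	        // nothing to do
+//	case n <= 32:
+//	        copySmall(to, from, n)  // overlapping loads then stores
+//	case n <= 128:
+//	        copyMedium(to, from, n)
+//	default:
+//	        if uintptr(to)-uintptr(from) < n { // unsigned: overlap that needs a backward copy
+//	                copyLongBackward(to, from, n)
+//	        } else {
+//	                copyLongForward(to, from, n)
+//	        }
+//	}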
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-24
+ CBZ R2, copy0
+
+ // Small copies: 1..16 bytes
+ CMP $16, R2
+ BLE copy16
+
+ // Large copies
+ CMP $128, R2
+ BHI copy_long
+ CMP $32, R2
+ BHI copy32_128
+
+ // Small copies: 17..32 bytes.
+ LDP (R1), (R6, R7)
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ LDP -16(R4), (R12, R13)
+ STP (R6, R7), (R0)
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ STP (R12, R13), -16(R5)
+ RET
+
+// Small copies: 1..16 bytes.
+copy16:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ CMP $8, R2
+ BLT copy7
+ MOVD (R1), R6
+ MOVD -8(R4), R7
+ MOVD R6, (R0)
+ MOVD R7, -8(R5)
+ RET
+
+copy7:
+ TBZ $2, R2, copy3
+ MOVWU (R1), R6
+ MOVWU -4(R4), R7
+ MOVW R6, (R0)
+ MOVW R7, -4(R5)
+ RET
+
+copy3:
+ TBZ $1, R2, copy1
+ MOVHU (R1), R6
+ MOVHU -2(R4), R7
+ MOVH R6, (R0)
+ MOVH R7, -2(R5)
+ RET
+
+copy1:
+ MOVBU (R1), R6
+ MOVB R6, (R0)
+
+copy0:
+ RET
+
+ // Medium copies: 33..128 bytes.
+copy32_128:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ LDP (R1), (R6, R7)
+ LDP 16(R1), (R8, R9)
+ LDP -32(R4), (R10, R11)
+ LDP -16(R4), (R12, R13)
+ CMP $64, R2
+ BHI copy128
+ STP (R6, R7), (R0)
+ STP (R8, R9), 16(R0)
+ STP (R10, R11), -32(R5)
+ STP (R12, R13), -16(R5)
+ RET
+
+ // Copy 65..128 bytes.
+copy128:
+ LDP 32(R1), (R14, R15)
+ LDP 48(R1), (R16, R17)
+ CMP $96, R2
+ BLS copy96
+ LDP -64(R4), (R2, R3)
+ LDP -48(R4), (R1, R4)
+ STP (R2, R3), -64(R5)
+ STP (R1, R4), -48(R5)
+
+copy96:
+ STP (R6, R7), (R0)
+ STP (R8, R9), 16(R0)
+ STP (R14, R15), 32(R0)
+ STP (R16, R17), 48(R0)
+ STP (R10, R11), -32(R5)
+ STP (R12, R13), -16(R5)
+ RET
+
+ // Copy more than 128 bytes.
+copy_long:
+ ADD R1, R2, R4 // R4 points just past the last source byte
+ ADD R0, R2, R5 // R5 points just past the last destination byte
+ MOVD ZR, R7
+ MOVD ZR, R8
+
+ CMP $1024, R2
+ BLT backward_check
+ // feature detect to decide how to align
+ MOVBU runtime·arm64UseAlignedLoads(SB), R6
+ CBNZ R6, use_aligned_loads
+ MOVD R0, R7
+ MOVD R5, R8
+ B backward_check
+use_aligned_loads:
+ MOVD R1, R7
+ MOVD R4, R8
+ // R7 and R8 are used here for the realignment calculation. In
+	// the use_aligned_loads case, R7 is the src pointer and R8 is
+	// the srcend pointer, which is used in the backward copy case.
+ // When doing aligned stores, R7 is the dst pointer and R8 is
+ // the dstend pointer.
+
+backward_check:
+ // Use backward copy if there is an overlap.
+ SUB R1, R0, R14
+ CBZ R14, copy0
+ CMP R2, R14
+ BCC copy_long_backward
+
+ // Copy 16 bytes and then align src (R1) or dst (R0) to 16-byte alignment.
+ LDP (R1), (R12, R13) // Load A
+ AND $15, R7, R14 // Calculate the realignment offset
+ SUB R14, R1, R1
+ SUB R14, R0, R3 // move dst back same amount as src
+ ADD R14, R2, R2
+ LDP 16(R1), (R6, R7) // Load B
+ STP (R12, R13), (R0) // Store A
+ LDP 32(R1), (R8, R9) // Load C
+ LDP 48(R1), (R10, R11) // Load D
+ LDP.W 64(R1), (R12, R13) // Load E
+ // 80 bytes have been loaded; if less than 80+64 bytes remain, copy from the end
+ SUBS $144, R2, R2
+ BLS copy64_from_end
+
+loop64:
+ STP (R6, R7), 16(R3) // Store B
+ LDP 16(R1), (R6, R7) // Load B (next iteration)
+ STP (R8, R9), 32(R3) // Store C
+ LDP 32(R1), (R8, R9) // Load C
+ STP (R10, R11), 48(R3) // Store D
+ LDP 48(R1), (R10, R11) // Load D
+ STP.W (R12, R13), 64(R3) // Store E
+ LDP.W 64(R1), (R12, R13) // Load E
+ SUBS $64, R2, R2
+ BHI loop64
+
+ // Write the last iteration and copy 64 bytes from the end.
+copy64_from_end:
+ LDP -64(R4), (R14, R15) // Load F
+ STP (R6, R7), 16(R3) // Store B
+ LDP -48(R4), (R6, R7) // Load G
+ STP (R8, R9), 32(R3) // Store C
+ LDP -32(R4), (R8, R9) // Load H
+ STP (R10, R11), 48(R3) // Store D
+ LDP -16(R4), (R10, R11) // Load I
+ STP (R12, R13), 64(R3) // Store E
+ STP (R14, R15), -64(R5) // Store F
+ STP (R6, R7), -48(R5) // Store G
+ STP (R8, R9), -32(R5) // Store H
+ STP (R10, R11), -16(R5) // Store I
+ RET
+
+ // Large backward copy for overlapping copies.
+ // Copy 16 bytes and then align srcend (R4) or dstend (R5) to 16-byte alignment.
+copy_long_backward:
+ LDP -16(R4), (R12, R13)
+ AND $15, R8, R14
+ SUB R14, R4, R4
+ SUB R14, R2, R2
+ LDP -16(R4), (R6, R7)
+ STP (R12, R13), -16(R5)
+ LDP -32(R4), (R8, R9)
+ LDP -48(R4), (R10, R11)
+ LDP.W -64(R4), (R12, R13)
+ SUB R14, R5, R5
+ SUBS $128, R2, R2
+ BLS copy64_from_start
+
+loop64_backward:
+ STP (R6, R7), -16(R5)
+ LDP -16(R4), (R6, R7)
+ STP (R8, R9), -32(R5)
+ LDP -32(R4), (R8, R9)
+ STP (R10, R11), -48(R5)
+ LDP -48(R4), (R10, R11)
+ STP.W (R12, R13), -64(R5)
+ LDP.W -64(R4), (R12, R13)
+ SUBS $64, R2, R2
+ BHI loop64_backward
+
+ // Write the last iteration and copy 64 bytes from the start.
+copy64_from_start:
+ LDP 48(R1), (R2, R3)
+ STP (R6, R7), -16(R5)
+ LDP 32(R1), (R6, R7)
+ STP (R8, R9), -32(R5)
+ LDP 16(R1), (R8, R9)
+ STP (R10, R11), -48(R5)
+ LDP (R1), (R10, R11)
+ STP (R12, R13), -64(R5)
+ STP (R2, R3), 48(R0)
+ STP (R6, R7), 32(R0)
+ STP (R8, R9), 16(R0)
+ STP (R10, R11), (R0)
+ RET
diff --git a/src/runtime/memmove_linux_amd64_test.go b/src/runtime/memmove_linux_amd64_test.go
new file mode 100644
index 0000000..5f90062
--- /dev/null
+++ b/src/runtime/memmove_linux_amd64_test.go
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "os"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+// TestMemmoveOverflow maps 3GB of memory and calls memmove on
+// the corresponding slice.
+func TestMemmoveOverflow(t *testing.T) {
+ t.Parallel()
+ // Create a temporary file.
+ tmp, err := os.CreateTemp("", "go-memmovetest")
+ if err != nil {
+ t.Fatal(err)
+ }
+ _, err = tmp.Write(make([]byte, 65536))
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.Remove(tmp.Name())
+ defer tmp.Close()
+
+ // Set up mappings.
+ base, _, errno := syscall.Syscall6(syscall.SYS_MMAP,
+ 0xa0<<32, 3<<30, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS, ^uintptr(0), 0)
+ if errno != 0 {
+ t.Skipf("could not create memory mapping: %s", errno)
+ }
+ syscall.Syscall(syscall.SYS_MUNMAP, base, 3<<30, 0)
+
+ for off := uintptr(0); off < 3<<30; off += 65536 {
+ _, _, errno := syscall.Syscall6(syscall.SYS_MMAP,
+ base+off, 65536, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED|syscall.MAP_FIXED, tmp.Fd(), 0)
+ if errno != 0 {
+ t.Skipf("could not map a page at requested 0x%x: %s", base+off, errno)
+ }
+ defer syscall.Syscall(syscall.SYS_MUNMAP, base+off, 65536, 0)
+ }
+
+ s := unsafe.Slice((*byte)(unsafe.Pointer(base)), 3<<30)
+ n := copy(s[1:], s)
+ if n != 3<<30-1 {
+ t.Fatalf("copied %d bytes, expected %d", n, 3<<30-1)
+ }
+ n = copy(s, s[1:])
+ if n != 3<<30-1 {
+ t.Fatalf("copied %d bytes, expected %d", n, 3<<30-1)
+ }
+}
diff --git a/src/runtime/memmove_loong64.s b/src/runtime/memmove_loong64.s
new file mode 100644
index 0000000..0f139bc
--- /dev/null
+++ b/src/runtime/memmove_loong64.s
@@ -0,0 +1,107 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT|NOFRAME, $0-24
+ MOVV to+0(FP), R4
+ MOVV from+8(FP), R5
+ MOVV n+16(FP), R6
+ BNE R6, check
+ RET
+
+check:
+ SGTU R4, R5, R7
+ BNE R7, backward
+
+ ADDV R4, R6, R9 // end pointer
+
+	// if the two pointers do not have the same alignment, do byte copying
+ SUBVU R5, R4, R7
+ AND $7, R7
+ BNE R7, out
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R6, R7
+ BNE R7, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R4, R8
+ BEQ R8, words
+ MOVB (R5), R7
+ ADDV $1, R5
+ MOVB R7, (R4)
+ ADDV $1, R4
+ JMP -6(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R9, R6 // R6 is end pointer-7
+
+ PCALIGN $16
+ SGTU R6, R4, R8
+ BEQ R8, out
+ MOVV (R5), R7
+ ADDV $8, R5
+ MOVV R7, (R4)
+ ADDV $8, R4
+ JMP -6(PC)
+
+out:
+ BEQ R4, R9, done
+ MOVB (R5), R7
+ ADDV $1, R5
+ MOVB R7, (R4)
+ ADDV $1, R4
+ JMP -5(PC)
+done:
+ RET
+
+backward:
+ ADDV R6, R5 // from-end pointer
+ ADDV R4, R6, R9 // to-end pointer
+
+	// if the two pointers do not have the same alignment, do byte copying
+ SUBVU R9, R5, R7
+ AND $7, R7
+ BNE R7, out1
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R6, R7
+ BNE R7, out1
+
+ // do one byte at a time until 8-aligned
+ AND $7, R9, R8
+ BEQ R8, words1
+ ADDV $-1, R5
+ MOVB (R5), R7
+ ADDV $-1, R9
+ MOVB R7, (R9)
+ JMP -6(PC)
+
+words1:
+ // do 8 bytes at a time if there is room
+ ADDV $7, R4, R6 // R6 is start pointer+7
+
+ PCALIGN $16
+ SGTU R9, R6, R8
+ BEQ R8, out1
+ ADDV $-8, R5
+ MOVV (R5), R7
+ ADDV $-8, R9
+ MOVV R7, (R9)
+ JMP -6(PC)
+
+out1:
+ BEQ R4, R9, done1
+ ADDV $-1, R5
+ MOVB (R5), R7
+ ADDV $-1, R9
+ MOVB R7, (R9)
+ JMP -5(PC)
+done1:
+ RET
diff --git a/src/runtime/memmove_mips64x.s b/src/runtime/memmove_mips64x.s
new file mode 100644
index 0000000..b69178c
--- /dev/null
+++ b/src/runtime/memmove_mips64x.s
@@ -0,0 +1,107 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT|NOFRAME, $0-24
+ MOVV to+0(FP), R1
+ MOVV from+8(FP), R2
+ MOVV n+16(FP), R3
+ BNE R3, check
+ RET
+
+check:
+ SGTU R1, R2, R4
+ BNE R4, backward
+
+ ADDV R1, R3, R6 // end pointer
+
+	// if the two pointers do not have the same alignment, do byte copying
+ SUBVU R2, R1, R4
+ AND $7, R4
+ BNE R4, out
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R3, R4
+ BNE R4, out
+
+ // do one byte at a time until 8-aligned
+ AND $7, R1, R5
+ BEQ R5, words
+ MOVB (R2), R4
+ ADDV $1, R2
+ MOVB R4, (R1)
+ ADDV $1, R1
+ JMP -6(PC)
+
+words:
+ // do 8 bytes at a time if there is room
+ ADDV $-7, R6, R3 // R3 is end pointer-7
+
+ SGTU R3, R1, R5
+ BEQ R5, out
+ MOVV (R2), R4
+ ADDV $8, R2
+ MOVV R4, (R1)
+ ADDV $8, R1
+ JMP -6(PC)
+
+out:
+ BEQ R1, R6, done
+ MOVB (R2), R4
+ ADDV $1, R2
+ MOVB R4, (R1)
+ ADDV $1, R1
+ JMP -5(PC)
+done:
+ RET
+
+backward:
+ ADDV R3, R2 // from-end pointer
+ ADDV R1, R3, R6 // to-end pointer
+
+	// if the two pointers do not have the same alignment, do byte copying
+ SUBVU R6, R2, R4
+ AND $7, R4
+ BNE R4, out1
+
+ // if less than 8 bytes, do byte copying
+ SGTU $8, R3, R4
+ BNE R4, out1
+
+ // do one byte at a time until 8-aligned
+ AND $7, R6, R5
+ BEQ R5, words1
+ ADDV $-1, R2
+ MOVB (R2), R4
+ ADDV $-1, R6
+ MOVB R4, (R6)
+ JMP -6(PC)
+
+words1:
+ // do 8 bytes at a time if there is room
+ ADDV $7, R1, R3 // R3 is start pointer+7
+
+ SGTU R6, R3, R5
+ BEQ R5, out1
+ ADDV $-8, R2
+ MOVV (R2), R4
+ ADDV $-8, R6
+ MOVV R4, (R6)
+ JMP -6(PC)
+
+out1:
+ BEQ R1, R6, done1
+ ADDV $-1, R2
+ MOVB (R2), R4
+ ADDV $-1, R6
+ MOVB R4, (R6)
+ JMP -5(PC)
+done1:
+ RET
diff --git a/src/runtime/memmove_mipsx.s b/src/runtime/memmove_mipsx.s
new file mode 100644
index 0000000..494288c
--- /dev/null
+++ b/src/runtime/memmove_mipsx.s
@@ -0,0 +1,260 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "textflag.h"
+
+#ifdef GOARCH_mips
+#define MOVWHI MOVWL
+#define MOVWLO MOVWR
+#else
+#define MOVWHI MOVWR
+#define MOVWLO MOVWL
+#endif
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB),NOSPLIT,$-0-12
+ MOVW n+8(FP), R3
+ MOVW from+4(FP), R2
+ MOVW to+0(FP), R1
+
+ ADDU R3, R2, R4 // end pointer for source
+ ADDU R3, R1, R5 // end pointer for destination
+
+ // if destination is ahead of source, start at the end of the buffer and go backward.
+ SGTU R1, R2, R6
+ BNE R6, backward
+
+ // if less than 4 bytes, use byte by byte copying
+ SGTU $4, R3, R6
+ BNE R6, f_small_copy
+
+ // align destination to 4 bytes
+ AND $3, R1, R6
+ BEQ R6, f_dest_aligned
+ SUBU R1, R0, R6
+ AND $3, R6
+ MOVWHI 0(R2), R7
+ SUBU R6, R3
+ MOVWLO 3(R2), R7
+ ADDU R6, R2
+ MOVWHI R7, 0(R1)
+ ADDU R6, R1
+
+f_dest_aligned:
+ AND $31, R3, R7
+ AND $3, R3, R6
+ SUBU R7, R5, R7 // end pointer for 32-byte chunks
+ SUBU R6, R5, R6 // end pointer for 4-byte chunks
+
+ // if source is not aligned, use unaligned reads
+ AND $3, R2, R8
+ BNE R8, f_large_ua
+
+f_large:
+ BEQ R1, R7, f_words
+ ADDU $32, R1
+ MOVW 0(R2), R8
+ MOVW 4(R2), R9
+ MOVW 8(R2), R10
+ MOVW 12(R2), R11
+ MOVW 16(R2), R12
+ MOVW 20(R2), R13
+ MOVW 24(R2), R14
+ MOVW 28(R2), R15
+ ADDU $32, R2
+ MOVW R8, -32(R1)
+ MOVW R9, -28(R1)
+ MOVW R10, -24(R1)
+ MOVW R11, -20(R1)
+ MOVW R12, -16(R1)
+ MOVW R13, -12(R1)
+ MOVW R14, -8(R1)
+ MOVW R15, -4(R1)
+ JMP f_large
+
+f_words:
+ BEQ R1, R6, f_tail
+ ADDU $4, R1
+ MOVW 0(R2), R8
+ ADDU $4, R2
+ MOVW R8, -4(R1)
+ JMP f_words
+
+f_tail:
+ BEQ R1, R5, ret
+ MOVWLO -1(R4), R8
+ MOVWLO R8, -1(R5)
+
+ret:
+ RET
+
+f_large_ua:
+ BEQ R1, R7, f_words_ua
+ ADDU $32, R1
+ MOVWHI 0(R2), R8
+ MOVWHI 4(R2), R9
+ MOVWHI 8(R2), R10
+ MOVWHI 12(R2), R11
+ MOVWHI 16(R2), R12
+ MOVWHI 20(R2), R13
+ MOVWHI 24(R2), R14
+ MOVWHI 28(R2), R15
+ MOVWLO 3(R2), R8
+ MOVWLO 7(R2), R9
+ MOVWLO 11(R2), R10
+ MOVWLO 15(R2), R11
+ MOVWLO 19(R2), R12
+ MOVWLO 23(R2), R13
+ MOVWLO 27(R2), R14
+ MOVWLO 31(R2), R15
+ ADDU $32, R2
+ MOVW R8, -32(R1)
+ MOVW R9, -28(R1)
+ MOVW R10, -24(R1)
+ MOVW R11, -20(R1)
+ MOVW R12, -16(R1)
+ MOVW R13, -12(R1)
+ MOVW R14, -8(R1)
+ MOVW R15, -4(R1)
+ JMP f_large_ua
+
+f_words_ua:
+ BEQ R1, R6, f_tail_ua
+ MOVWHI 0(R2), R8
+ ADDU $4, R1
+ MOVWLO 3(R2), R8
+ ADDU $4, R2
+ MOVW R8, -4(R1)
+ JMP f_words_ua
+
+f_tail_ua:
+ BEQ R1, R5, ret
+ MOVWHI -4(R4), R8
+ MOVWLO -1(R4), R8
+ MOVWLO R8, -1(R5)
+ JMP ret
+
+f_small_copy:
+ BEQ R1, R5, ret
+ ADDU $1, R1
+ MOVB 0(R2), R6
+ ADDU $1, R2
+ MOVB R6, -1(R1)
+ JMP f_small_copy
+
+backward:
+ SGTU $4, R3, R6
+ BNE R6, b_small_copy
+
+ AND $3, R5, R6
+ BEQ R6, b_dest_aligned
+ MOVWHI -4(R4), R7
+ SUBU R6, R3
+ MOVWLO -1(R4), R7
+ SUBU R6, R4
+ MOVWLO R7, -1(R5)
+ SUBU R6, R5
+
+b_dest_aligned:
+ AND $31, R3, R7
+ AND $3, R3, R6
+ ADDU R7, R1, R7
+ ADDU R6, R1, R6
+
+ AND $3, R4, R8
+ BNE R8, b_large_ua
+
+b_large:
+ BEQ R5, R7, b_words
+ ADDU $-32, R5
+ MOVW -4(R4), R8
+ MOVW -8(R4), R9
+ MOVW -12(R4), R10
+ MOVW -16(R4), R11
+ MOVW -20(R4), R12
+ MOVW -24(R4), R13
+ MOVW -28(R4), R14
+ MOVW -32(R4), R15
+ ADDU $-32, R4
+ MOVW R8, 28(R5)
+ MOVW R9, 24(R5)
+ MOVW R10, 20(R5)
+ MOVW R11, 16(R5)
+ MOVW R12, 12(R5)
+ MOVW R13, 8(R5)
+ MOVW R14, 4(R5)
+ MOVW R15, 0(R5)
+ JMP b_large
+
+b_words:
+ BEQ R5, R6, b_tail
+ ADDU $-4, R5
+ MOVW -4(R4), R8
+ ADDU $-4, R4
+ MOVW R8, 0(R5)
+ JMP b_words
+
+b_tail:
+ BEQ R5, R1, ret
+ MOVWHI 0(R2), R8 // R2 and R1 have the same alignment so we don't need to load a whole word
+ MOVWHI R8, 0(R1)
+ JMP ret
+
+b_large_ua:
+ BEQ R5, R7, b_words_ua
+ ADDU $-32, R5
+ MOVWHI -4(R4), R8
+ MOVWHI -8(R4), R9
+ MOVWHI -12(R4), R10
+ MOVWHI -16(R4), R11
+ MOVWHI -20(R4), R12
+ MOVWHI -24(R4), R13
+ MOVWHI -28(R4), R14
+ MOVWHI -32(R4), R15
+ MOVWLO -1(R4), R8
+ MOVWLO -5(R4), R9
+ MOVWLO -9(R4), R10
+ MOVWLO -13(R4), R11
+ MOVWLO -17(R4), R12
+ MOVWLO -21(R4), R13
+ MOVWLO -25(R4), R14
+ MOVWLO -29(R4), R15
+ ADDU $-32, R4
+ MOVW R8, 28(R5)
+ MOVW R9, 24(R5)
+ MOVW R10, 20(R5)
+ MOVW R11, 16(R5)
+ MOVW R12, 12(R5)
+ MOVW R13, 8(R5)
+ MOVW R14, 4(R5)
+ MOVW R15, 0(R5)
+ JMP b_large_ua
+
+b_words_ua:
+ BEQ R5, R6, b_tail_ua
+ MOVWHI -4(R4), R8
+ ADDU $-4, R5
+ MOVWLO -1(R4), R8
+ ADDU $-4, R4
+ MOVW R8, 0(R5)
+ JMP b_words_ua
+
+b_tail_ua:
+ BEQ R5, R1, ret
+ MOVWHI (R2), R8
+ MOVWLO 3(R2), R8
+ MOVWHI R8, 0(R1)
+ JMP ret
+
+b_small_copy:
+ BEQ R5, R1, ret
+ ADDU $-1, R5
+ MOVB -1(R4), R6
+ ADDU $-1, R4
+ MOVB R6, 0(R5)
+ JMP b_small_copy
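
Every memmove port added in this commit makes the same top-level decision the MIPS code above makes with SGTU/BNE: copy backward only when the destination starts above the source, so that a forward pass cannot clobber source bytes it has not read yet, and copy forward otherwise. A minimal Go sketch of that rule, illustrative only (none of these names come from the runtime; it assumes len(dst) >= len(src)):

	// overlap_sketch.go — illustrative only, not runtime code.
	package sketch

	import "unsafe"

	// moveOverlapAware mirrors the direction choice the assembly makes: backward
	// when dst starts above src, forward otherwise.
	func moveOverlapAware(dst, src []byte) {
		n := len(src)
		if n == 0 {
			return
		}
		d := uintptr(unsafe.Pointer(&dst[0]))
		s := uintptr(unsafe.Pointer(&src[0]))
		if d > s { // destination ahead of source: go backward
			for i := n - 1; i >= 0; i-- {
				dst[i] = src[i]
			}
			return
		}
		for i := 0; i < n; i++ { // otherwise forward is safe
			dst[i] = src[i]
		}
	}
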
diff --git a/src/runtime/memmove_plan9_386.s b/src/runtime/memmove_plan9_386.s
new file mode 100644
index 0000000..cfce0e9
--- /dev/null
+++ b/src/runtime/memmove_plan9_386.s
@@ -0,0 +1,137 @@
+// Inferno's libkern/memmove-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-12
+ MOVL to+0(FP), DI
+ MOVL from+4(FP), SI
+ MOVL n+8(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSL instruction is really fast
+ // for large sizes. The cutover is approximately 1K.
+tail:
+ TESTL BX, BX
+ JEQ move_0
+ CMPL BX, $2
+ JBE move_1or2
+ CMPL BX, $4
+ JB move_3
+ JE move_4
+ CMPL BX, $8
+ JBE move_5through8
+ CMPL BX, $16
+ JBE move_9through16
+
+/*
+ * check and set for backwards
+ */
+ CMPL SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ REP; MOVSL
+ JMP tail
+/*
+ * check overlap
+ */
+back:
+ MOVL SI, CX
+ ADDL BX, CX
+ CMPL CX, DI
+ JLS forward
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+
+ ADDL BX, DI
+ ADDL BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVL BX, CX
+ SHRL $2, CX
+ ANDL $3, BX
+
+ SUBL $4, DI
+ SUBL $4, SI
+ REP; MOVSL
+
+ CLD
+ ADDL $4, DI
+ ADDL $4, SI
+ SUBL BX, DI
+ SUBL BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3:
+ MOVW (SI), AX
+ MOVB 2(SI), CX
+ MOVW AX, (DI)
+ MOVB CX, 2(DI)
+ RET
+move_4:
+ // We need a separate case for 4 to make sure we write pointers atomically.
+ MOVL (SI), AX
+ MOVL AX, (DI)
+ RET
+move_5through8:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_9through16:
+ MOVL (SI), AX
+ MOVL 4(SI), CX
+ MOVL -8(SI)(BX*1), DX
+ MOVL -4(SI)(BX*1), BP
+ MOVL AX, (DI)
+ MOVL CX, 4(DI)
+ MOVL DX, -8(DI)(BX*1)
+ MOVL BP, -4(DI)(BX*1)
+ RET
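
The comment at the top of this routine motivates the straightline small-size cases by the startup cost of REP and puts the cutover near 1K. That figure can be probed from ordinary Go with a benchmark over copy sizes; memmove_test.go later in this commit does this more thoroughly (BenchmarkMemmove), but a minimal sketch looks like this (the sizes are arbitrary probe points, not taken from the source):

	// copycutover_test.go — illustrative benchmark, not part of the runtime.
	package sketch

	import (
		"fmt"
		"testing"
	)

	// BenchmarkCopyCutover reports copy throughput around the ~1K cutover the
	// comment above mentions. Run with: go test -bench CopyCutover
	func BenchmarkCopyCutover(b *testing.B) {
		for _, n := range []int{64, 256, 1024, 4096} {
			src := make([]byte, n)
			dst := make([]byte, n)
			b.Run(fmt.Sprint(n), func(b *testing.B) {
				b.SetBytes(int64(n))
				for i := 0; i < b.N; i++ {
					copy(dst, src)
				}
			})
		}
	}
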
diff --git a/src/runtime/memmove_plan9_amd64.s b/src/runtime/memmove_plan9_amd64.s
new file mode 100644
index 0000000..217aa60
--- /dev/null
+++ b/src/runtime/memmove_plan9_amd64.s
@@ -0,0 +1,135 @@
+// Derived from Inferno's libkern/memmove-386.s (adapted for amd64)
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/memmove-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-24
+
+ MOVQ to+0(FP), DI
+ MOVQ from+8(FP), SI
+ MOVQ n+16(FP), BX
+
+ // REP instructions have a high startup cost, so we handle small sizes
+ // with some straightline code. The REP MOVSQ instruction is really fast
+ // for large sizes. The cutover is approximately 1K.
+tail:
+ TESTQ BX, BX
+ JEQ move_0
+ CMPQ BX, $2
+ JBE move_1or2
+ CMPQ BX, $4
+ JBE move_3or4
+ CMPQ BX, $8
+ JB move_5through7
+ JE move_8
+ CMPQ BX, $16
+ JBE move_9through16
+
+/*
+ * check and set for backwards
+ */
+ CMPQ SI, DI
+ JLS back
+
+/*
+ * forward copy loop
+ */
+forward:
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ REP; MOVSQ
+ JMP tail
+
+back:
+/*
+ * check overlap
+ */
+ MOVQ SI, CX
+ ADDQ BX, CX
+ CMPQ CX, DI
+ JLS forward
+
+/*
+ * whole thing backwards has
+ * adjusted addresses
+ */
+ ADDQ BX, DI
+ ADDQ BX, SI
+ STD
+
+/*
+ * copy
+ */
+ MOVQ BX, CX
+ SHRQ $3, CX
+ ANDQ $7, BX
+
+ SUBQ $8, DI
+ SUBQ $8, SI
+ REP; MOVSQ
+
+ CLD
+ ADDQ $8, DI
+ ADDQ $8, SI
+ SUBQ BX, DI
+ SUBQ BX, SI
+ JMP tail
+
+move_1or2:
+ MOVB (SI), AX
+ MOVB -1(SI)(BX*1), CX
+ MOVB AX, (DI)
+ MOVB CX, -1(DI)(BX*1)
+ RET
+move_0:
+ RET
+move_3or4:
+ MOVW (SI), AX
+ MOVW -2(SI)(BX*1), CX
+ MOVW AX, (DI)
+ MOVW CX, -2(DI)(BX*1)
+ RET
+move_5through7:
+ MOVL (SI), AX
+ MOVL -4(SI)(BX*1), CX
+ MOVL AX, (DI)
+ MOVL CX, -4(DI)(BX*1)
+ RET
+move_8:
+ // We need a separate case for 8 to make sure we write pointers atomically.
+ MOVQ (SI), AX
+ MOVQ AX, (DI)
+ RET
+move_9through16:
+ MOVQ (SI), AX
+ MOVQ -8(SI)(BX*1), CX
+ MOVQ AX, (DI)
+ MOVQ CX, -8(DI)(BX*1)
+ RET
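
The amd64 variant has the same shape but works in 8-byte quanta: REP MOVSQ moves n>>3 quadwords, the n&7 leftover falls to the straightline cases, and the exact-8 case is kept separate so a pointer-sized value is written with a single store (the property TestMemmoveAtomicity in memmove_test.go below exercises). A trivial sketch of the split, with illustrative names that are not runtime identifiers:

	package sketch

	// quadSplit returns how many 8-byte quadwords the REP MOVSQ loop handles and
	// how many tail bytes remain for the small-size cases.
	// quadSplit(20) == (2, 4); quadSplit(1024) == (128, 0).
	func quadSplit(n uintptr) (quads, tail uintptr) {
		return n >> 3, n & 7
	}
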
diff --git a/src/runtime/memmove_ppc64x.s b/src/runtime/memmove_ppc64x.s
new file mode 100644
index 0000000..18b9c85
--- /dev/null
+++ b/src/runtime/memmove_ppc64x.s
@@ -0,0 +1,220 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+
+// target address
+#define TGT R3
+// source address
+#define SRC R4
+// length to move
+#define LEN R5
+// number of doublewords
+#define DWORDS R6
+// number of bytes < 8
+#define BYTES R7
+// const 16 used as index
+#define IDX16 R8
+// temp used for copies, etc.
+#define TMP R9
+// number of 32 byte chunks
+#define QWORDS R10
+// index values
+#define IDX32 R14
+#define IDX48 R15
+#define OCTWORDS R16 // number of 64 byte chunks
+
+TEXT runtime·memmove<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-24
+ // R3 = TGT = to
+ // R4 = SRC = from
+ // R5 = LEN = n
+
+ // Determine if there are doublewords to
+ // copy so a more efficient move can be done
+check:
+#ifdef GOPPC64_power10
+ CMP LEN, $16
+ BGT mcopy
+ SLD $56, LEN, TMP
+ LXVL SRC, TMP, V0
+ STXVL V0, TGT, TMP
+ RET
+#endif
+mcopy:
+ ANDCC $7, LEN, BYTES // R7: bytes to copy
+ SRD $3, LEN, DWORDS // R6: double words to copy
+ MOVFL CR0, CR3 // save CR from ANDCC
+ CMP DWORDS, $0, CR1 // CR1[EQ] set if no double words to copy
+
+	// Determine overlap by computing dest - src and comparing it against the
+	// length. This also catches the cases where src and dest are in different
+	// kinds of storage, such as stack and static data, so a backward move is
+	// only done when actually necessary.
+
+ SUB SRC, TGT, TMP // dest - src
+ CMPU TMP, LEN, CR2 // < len?
+ BC 12, 8, backward // BLT CR2 backward
+
+ // Copying forward if no overlap.
+
+ BC 12, 6, checkbytes // BEQ CR1, checkbytes
+ SRDCC $3, DWORDS, OCTWORDS // 64 byte chunks?
+ MOVD $16, IDX16
+ BEQ lt64gt8 // < 64 bytes
+
+ // Prepare for moves of 64 bytes at a time.
+
+forward64setup:
+ DCBTST (TGT) // prepare data cache
+ DCBT (SRC)
+ MOVD OCTWORDS, CTR // Number of 64 byte chunks
+ MOVD $32, IDX32
+ MOVD $48, IDX48
+ PCALIGN $16
+
+forward64:
+ LXVD2X (R0)(SRC), VS32 // load 64 bytes
+ LXVD2X (IDX16)(SRC), VS33
+ LXVD2X (IDX32)(SRC), VS34
+ LXVD2X (IDX48)(SRC), VS35
+ ADD $64, SRC
+ STXVD2X VS32, (R0)(TGT) // store 64 bytes
+ STXVD2X VS33, (IDX16)(TGT)
+ STXVD2X VS34, (IDX32)(TGT)
+ STXVD2X VS35, (IDX48)(TGT)
+ ADD $64,TGT // bump up for next set
+ BC 16, 0, forward64 // continue
+ ANDCC $7, DWORDS // remaining doublewords
+ BEQ checkbytes // only bytes remain
+
+lt64gt8:
+ CMP DWORDS, $4
+ BLT lt32gt8
+ LXVD2X (R0)(SRC), VS32
+ LXVD2X (IDX16)(SRC), VS33
+ ADD $-4, DWORDS
+ STXVD2X VS32, (R0)(TGT)
+ STXVD2X VS33, (IDX16)(TGT)
+ ADD $32, SRC
+ ADD $32, TGT
+
+lt32gt8:
+ // At this point >= 8 and < 32
+ // Move 16 bytes if possible
+ CMP DWORDS, $2
+ BLT lt16
+ LXVD2X (R0)(SRC), VS32
+ ADD $-2, DWORDS
+ STXVD2X VS32, (R0)(TGT)
+ ADD $16, SRC
+ ADD $16, TGT
+
+lt16: // Move 8 bytes if possible
+ CMP DWORDS, $1
+ BLT checkbytes
+#ifdef GOPPC64_power10
+ ADD $8, BYTES
+ SLD $56, BYTES, TMP
+ LXVL SRC, TMP, V0
+ STXVL V0, TGT, TMP
+ RET
+#endif
+
+ MOVD 0(SRC), TMP
+ ADD $8, SRC
+ MOVD TMP, 0(TGT)
+ ADD $8, TGT
+checkbytes:
+ BC 12, 14, LR // BEQ lr
+#ifdef GOPPC64_power10
+ SLD $56, BYTES, TMP
+ LXVL SRC, TMP, V0
+ STXVL V0, TGT, TMP
+ RET
+#endif
+lt8: // Move word if possible
+ CMP BYTES, $4
+ BLT lt4
+ MOVWZ 0(SRC), TMP
+ ADD $-4, BYTES
+ MOVW TMP, 0(TGT)
+ ADD $4, SRC
+ ADD $4, TGT
+lt4: // Move halfword if possible
+ CMP BYTES, $2
+ BLT lt2
+ MOVHZ 0(SRC), TMP
+ ADD $-2, BYTES
+ MOVH TMP, 0(TGT)
+ ADD $2, SRC
+ ADD $2, TGT
+lt2: // Move last byte if 1 left
+ CMP BYTES, $1
+ BC 12, 0, LR // ble lr
+ MOVBZ 0(SRC), TMP
+ MOVBZ TMP, 0(TGT)
+ RET
+
+backward:
+ // Copying backwards proceeds by copying R7 bytes then copying R6 double words.
+ // R3 and R4 are advanced to the end of the destination/source buffers
+ // respectively and moved back as we copy.
+
+ ADD LEN, SRC, SRC // end of source
+ ADD TGT, LEN, TGT // end of dest
+
+ BEQ nobackwardtail // earlier condition
+
+ MOVD BYTES, CTR // bytes to move
+
+backwardtailloop:
+ MOVBZ -1(SRC), TMP // point to last byte
+ SUB $1,SRC
+ MOVBZ TMP, -1(TGT)
+ SUB $1,TGT
+ BDNZ backwardtailloop
+
+nobackwardtail:
+ BC 4, 5, LR // blelr cr1, return if DWORDS == 0
+ SRDCC $2,DWORDS,QWORDS // Compute number of 32B blocks and compare to 0
+ BNE backward32setup // If QWORDS != 0, start the 32B copy loop.
+
+backward24:
+	// DWORDS is a value between 1 and 3.
+ CMP DWORDS, $2
+
+ MOVD -8(SRC), TMP
+ MOVD TMP, -8(TGT)
+ BC 12, 0, LR // bltlr, return if DWORDS == 1
+
+ MOVD -16(SRC), TMP
+ MOVD TMP, -16(TGT)
+ BC 12, 2, LR // beqlr, return if DWORDS == 2
+
+ MOVD -24(SRC), TMP
+ MOVD TMP, -24(TGT)
+ RET
+
+backward32setup:
+ ANDCC $3,DWORDS // Compute remaining DWORDS and compare to 0
+ MOVD QWORDS, CTR // set up loop ctr
+ MOVD $16, IDX16 // 32 bytes at a time
+ PCALIGN $16
+
+backward32loop:
+ SUB $32, TGT
+ SUB $32, SRC
+ LXVD2X (R0)(SRC), VS32 // load 16x2 bytes
+ LXVD2X (IDX16)(SRC), VS33
+ STXVD2X VS32, (R0)(TGT) // store 16x2 bytes
+ STXVD2X VS33, (IDX16)(TGT)
+ BDNZ backward32loop
+ BC 12, 2, LR // beqlr, return if DWORDS == 0
+ BR backward24
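
The overlap check above ("dest - src" compared with CMPU against the length) works because the subtraction is treated as unsigned: the difference is below n exactly when the destination lies inside [src, src+n), which is the only case that forces a backward copy. A hedged Go restatement of the predicate, illustrative only:

	package sketch

	// needsBackward mirrors the ppc64x overlap test. The unsigned subtraction
	// wraps to a huge value when dst < src, so the comparison is true only when
	// dst falls inside the source range [src, src+n).
	func needsBackward(dst, src, n uintptr) bool {
		return dst-src < n
	}
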
diff --git a/src/runtime/memmove_riscv64.s b/src/runtime/memmove_riscv64.s
new file mode 100644
index 0000000..f5db865
--- /dev/null
+++ b/src/runtime/memmove_riscv64.s
@@ -0,0 +1,319 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// void runtime·memmove(void*, void*, uintptr)
+TEXT runtime·memmove<ABIInternal>(SB),NOSPLIT,$-0-24
+ // X10 = to
+ // X11 = from
+ // X12 = n
+ BEQ X10, X11, done
+ BEQZ X12, done
+
+ // If the destination is ahead of the source, start at the end of the
+ // buffer and go backward.
+ BGTU X10, X11, backward
+
+ // If less than 8 bytes, do single byte copies.
+ MOV $8, X9
+ BLT X12, X9, f_loop4_check
+
+ // Check alignment - if alignment differs we have to do one byte at a time.
+ AND $7, X10, X5
+ AND $7, X11, X6
+ BNE X5, X6, f_loop8_unaligned_check
+ BEQZ X5, f_loop_check
+
+ // Move one byte at a time until we reach 8 byte alignment.
+ SUB X5, X9, X5
+ SUB X5, X12, X12
+f_align:
+ ADD $-1, X5
+ MOVB 0(X11), X14
+ MOVB X14, 0(X10)
+ ADD $1, X10
+ ADD $1, X11
+ BNEZ X5, f_align
+
+f_loop_check:
+ MOV $16, X9
+ BLT X12, X9, f_loop8_check
+ MOV $32, X9
+ BLT X12, X9, f_loop16_check
+ MOV $64, X9
+ BLT X12, X9, f_loop32_check
+f_loop64:
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV 16(X11), X16
+ MOV 24(X11), X17
+ MOV 32(X11), X18
+ MOV 40(X11), X19
+ MOV 48(X11), X20
+ MOV 56(X11), X21
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ MOV X16, 16(X10)
+ MOV X17, 24(X10)
+ MOV X18, 32(X10)
+ MOV X19, 40(X10)
+ MOV X20, 48(X10)
+ MOV X21, 56(X10)
+ ADD $64, X10
+ ADD $64, X11
+ ADD $-64, X12
+ BGE X12, X9, f_loop64
+ BEQZ X12, done
+
+f_loop32_check:
+ MOV $32, X9
+ BLT X12, X9, f_loop16_check
+f_loop32:
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV 16(X11), X16
+ MOV 24(X11), X17
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ MOV X16, 16(X10)
+ MOV X17, 24(X10)
+ ADD $32, X10
+ ADD $32, X11
+ ADD $-32, X12
+ BGE X12, X9, f_loop32
+ BEQZ X12, done
+
+f_loop16_check:
+ MOV $16, X9
+ BLT X12, X9, f_loop8_check
+f_loop16:
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ ADD $16, X10
+ ADD $16, X11
+ ADD $-16, X12
+ BGE X12, X9, f_loop16
+ BEQZ X12, done
+
+f_loop8_check:
+ MOV $8, X9
+ BLT X12, X9, f_loop4_check
+f_loop8:
+ MOV 0(X11), X14
+ MOV X14, 0(X10)
+ ADD $8, X10
+ ADD $8, X11
+ ADD $-8, X12
+ BGE X12, X9, f_loop8
+ BEQZ X12, done
+ JMP f_loop4_check
+
+f_loop8_unaligned_check:
+ MOV $8, X9
+ BLT X12, X9, f_loop4_check
+f_loop8_unaligned:
+ MOVB 0(X11), X14
+ MOVB 1(X11), X15
+ MOVB 2(X11), X16
+ MOVB 3(X11), X17
+ MOVB 4(X11), X18
+ MOVB 5(X11), X19
+ MOVB 6(X11), X20
+ MOVB 7(X11), X21
+ MOVB X14, 0(X10)
+ MOVB X15, 1(X10)
+ MOVB X16, 2(X10)
+ MOVB X17, 3(X10)
+ MOVB X18, 4(X10)
+ MOVB X19, 5(X10)
+ MOVB X20, 6(X10)
+ MOVB X21, 7(X10)
+ ADD $8, X10
+ ADD $8, X11
+ ADD $-8, X12
+ BGE X12, X9, f_loop8_unaligned
+
+f_loop4_check:
+ MOV $4, X9
+ BLT X12, X9, f_loop1
+f_loop4:
+ MOVB 0(X11), X14
+ MOVB 1(X11), X15
+ MOVB 2(X11), X16
+ MOVB 3(X11), X17
+ MOVB X14, 0(X10)
+ MOVB X15, 1(X10)
+ MOVB X16, 2(X10)
+ MOVB X17, 3(X10)
+ ADD $4, X10
+ ADD $4, X11
+ ADD $-4, X12
+ BGE X12, X9, f_loop4
+
+f_loop1:
+ BEQZ X12, done
+ MOVB 0(X11), X14
+ MOVB X14, 0(X10)
+ ADD $1, X10
+ ADD $1, X11
+ ADD $-1, X12
+ JMP f_loop1
+
+backward:
+ ADD X10, X12, X10
+ ADD X11, X12, X11
+
+ // If less than 8 bytes, do single byte copies.
+ MOV $8, X9
+ BLT X12, X9, b_loop4_check
+
+ // Check alignment - if alignment differs we have to do one byte at a time.
+ AND $7, X10, X5
+ AND $7, X11, X6
+ BNE X5, X6, b_loop8_unaligned_check
+ BEQZ X5, b_loop_check
+
+ // Move one byte at a time until we reach 8 byte alignment.
+ SUB X5, X12, X12
+b_align:
+ ADD $-1, X5
+ ADD $-1, X10
+ ADD $-1, X11
+ MOVB 0(X11), X14
+ MOVB X14, 0(X10)
+ BNEZ X5, b_align
+
+b_loop_check:
+ MOV $16, X9
+ BLT X12, X9, b_loop8_check
+ MOV $32, X9
+ BLT X12, X9, b_loop16_check
+ MOV $64, X9
+ BLT X12, X9, b_loop32_check
+b_loop64:
+ ADD $-64, X10
+ ADD $-64, X11
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV 16(X11), X16
+ MOV 24(X11), X17
+ MOV 32(X11), X18
+ MOV 40(X11), X19
+ MOV 48(X11), X20
+ MOV 56(X11), X21
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ MOV X16, 16(X10)
+ MOV X17, 24(X10)
+ MOV X18, 32(X10)
+ MOV X19, 40(X10)
+ MOV X20, 48(X10)
+ MOV X21, 56(X10)
+ ADD $-64, X12
+ BGE X12, X9, b_loop64
+ BEQZ X12, done
+
+b_loop32_check:
+ MOV $32, X9
+ BLT X12, X9, b_loop16_check
+b_loop32:
+ ADD $-32, X10
+ ADD $-32, X11
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV 16(X11), X16
+ MOV 24(X11), X17
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ MOV X16, 16(X10)
+ MOV X17, 24(X10)
+ ADD $-32, X12
+ BGE X12, X9, b_loop32
+ BEQZ X12, done
+
+b_loop16_check:
+ MOV $16, X9
+ BLT X12, X9, b_loop8_check
+b_loop16:
+ ADD $-16, X10
+ ADD $-16, X11
+ MOV 0(X11), X14
+ MOV 8(X11), X15
+ MOV X14, 0(X10)
+ MOV X15, 8(X10)
+ ADD $-16, X12
+ BGE X12, X9, b_loop16
+ BEQZ X12, done
+
+b_loop8_check:
+ MOV $8, X9
+ BLT X12, X9, b_loop4_check
+b_loop8:
+ ADD $-8, X10
+ ADD $-8, X11
+ MOV 0(X11), X14
+ MOV X14, 0(X10)
+ ADD $-8, X12
+ BGE X12, X9, b_loop8
+ BEQZ X12, done
+ JMP b_loop4_check
+
+b_loop8_unaligned_check:
+ MOV $8, X9
+ BLT X12, X9, b_loop4_check
+b_loop8_unaligned:
+ ADD $-8, X10
+ ADD $-8, X11
+ MOVB 0(X11), X14
+ MOVB 1(X11), X15
+ MOVB 2(X11), X16
+ MOVB 3(X11), X17
+ MOVB 4(X11), X18
+ MOVB 5(X11), X19
+ MOVB 6(X11), X20
+ MOVB 7(X11), X21
+ MOVB X14, 0(X10)
+ MOVB X15, 1(X10)
+ MOVB X16, 2(X10)
+ MOVB X17, 3(X10)
+ MOVB X18, 4(X10)
+ MOVB X19, 5(X10)
+ MOVB X20, 6(X10)
+ MOVB X21, 7(X10)
+ ADD $-8, X12
+ BGE X12, X9, b_loop8_unaligned
+
+b_loop4_check:
+ MOV $4, X9
+ BLT X12, X9, b_loop1
+b_loop4:
+ ADD $-4, X10
+ ADD $-4, X11
+ MOVB 0(X11), X14
+ MOVB 1(X11), X15
+ MOVB 2(X11), X16
+ MOVB 3(X11), X17
+ MOVB X14, 0(X10)
+ MOVB X15, 1(X10)
+ MOVB X16, 2(X10)
+ MOVB X17, 3(X10)
+ ADD $-4, X12
+ BGE X12, X9, b_loop4
+
+b_loop1:
+ BEQZ X12, done
+ ADD $-1, X10
+ ADD $-1, X11
+ MOVB 0(X11), X14
+ MOVB X14, 0(X10)
+ ADD $-1, X12
+ JMP b_loop1
+
+done:
+ RET
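
When source and destination share the same misalignment, the riscv64 code first copies single bytes in f_align until the destination reaches an 8-byte boundary, then switches to the 64/32/16/8-byte loops. The head length it computes is 8 minus the low three bits of the destination; a small Go sketch of that computation (illustrative only):

	package sketch

	// alignHead returns how many single bytes to copy before dst is 8-byte
	// aligned, matching the f_align preamble above (0 when already aligned).
	func alignHead(dst uintptr) uintptr {
		return (8 - dst&7) & 7
	}
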
diff --git a/src/runtime/memmove_s390x.s b/src/runtime/memmove_s390x.s
new file mode 100644
index 0000000..f4c2b87
--- /dev/null
+++ b/src/runtime/memmove_s390x.s
@@ -0,0 +1,191 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD to+0(FP), R6
+ MOVD from+8(FP), R4
+ MOVD n+16(FP), R5
+
+ CMPBEQ R6, R4, done
+
+start:
+ CMPBLE R5, $3, move0to3
+ CMPBLE R5, $7, move4to7
+ CMPBLE R5, $11, move8to11
+ CMPBLE R5, $15, move12to15
+ CMPBNE R5, $16, movemt16
+ MOVD 0(R4), R7
+ MOVD 8(R4), R8
+ MOVD R7, 0(R6)
+ MOVD R8, 8(R6)
+ RET
+
+movemt16:
+ CMPBGT R4, R6, forwards
+ ADD R5, R4, R7
+ CMPBLE R7, R6, forwards
+ ADD R5, R6, R8
+backwards:
+ MOVD -8(R7), R3
+ MOVD R3, -8(R8)
+ MOVD -16(R7), R3
+ MOVD R3, -16(R8)
+ ADD $-16, R5
+ ADD $-16, R7
+ ADD $-16, R8
+ CMP R5, $16
+ BGE backwards
+ BR start
+
+forwards:
+ CMPBGT R5, $64, forwards_fast
+ MOVD 0(R4), R3
+ MOVD R3, 0(R6)
+ MOVD 8(R4), R3
+ MOVD R3, 8(R6)
+ ADD $16, R4
+ ADD $16, R6
+ ADD $-16, R5
+ CMP R5, $16
+ BGE forwards
+ BR start
+
+forwards_fast:
+ CMP R5, $256
+ BLE forwards_small
+ MVC $256, 0(R4), 0(R6)
+ ADD $256, R4
+ ADD $256, R6
+ ADD $-256, R5
+ BR forwards_fast
+
+forwards_small:
+ CMPBEQ R5, $0, done
+ ADD $-1, R5
+ EXRL $memmove_exrl_mvc<>(SB), R5
+ RET
+
+move0to3:
+ CMPBEQ R5, $0, done
+move1:
+ CMPBNE R5, $1, move2
+ MOVB 0(R4), R3
+ MOVB R3, 0(R6)
+ RET
+move2:
+ CMPBNE R5, $2, move3
+ MOVH 0(R4), R3
+ MOVH R3, 0(R6)
+ RET
+move3:
+ MOVH 0(R4), R3
+ MOVB 2(R4), R7
+ MOVH R3, 0(R6)
+ MOVB R7, 2(R6)
+ RET
+
+move4to7:
+ CMPBNE R5, $4, move5
+ MOVW 0(R4), R3
+ MOVW R3, 0(R6)
+ RET
+move5:
+ CMPBNE R5, $5, move6
+ MOVW 0(R4), R3
+ MOVB 4(R4), R7
+ MOVW R3, 0(R6)
+ MOVB R7, 4(R6)
+ RET
+move6:
+ CMPBNE R5, $6, move7
+ MOVW 0(R4), R3
+ MOVH 4(R4), R7
+ MOVW R3, 0(R6)
+ MOVH R7, 4(R6)
+ RET
+move7:
+ MOVW 0(R4), R3
+ MOVH 4(R4), R7
+ MOVB 6(R4), R8
+ MOVW R3, 0(R6)
+ MOVH R7, 4(R6)
+ MOVB R8, 6(R6)
+ RET
+
+move8to11:
+ CMPBNE R5, $8, move9
+ MOVD 0(R4), R3
+ MOVD R3, 0(R6)
+ RET
+move9:
+ CMPBNE R5, $9, move10
+ MOVD 0(R4), R3
+ MOVB 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVB R7, 8(R6)
+ RET
+move10:
+ CMPBNE R5, $10, move11
+ MOVD 0(R4), R3
+ MOVH 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVH R7, 8(R6)
+ RET
+move11:
+ MOVD 0(R4), R3
+ MOVH 8(R4), R7
+ MOVB 10(R4), R8
+ MOVD R3, 0(R6)
+ MOVH R7, 8(R6)
+ MOVB R8, 10(R6)
+ RET
+
+move12to15:
+ CMPBNE R5, $12, move13
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ RET
+move13:
+ CMPBNE R5, $13, move14
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVB 12(R4), R8
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVB R8, 12(R6)
+ RET
+move14:
+ CMPBNE R5, $14, move15
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVH 12(R4), R8
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVH R8, 12(R6)
+ RET
+move15:
+ MOVD 0(R4), R3
+ MOVW 8(R4), R7
+ MOVH 12(R4), R8
+ MOVB 14(R4), R10
+ MOVD R3, 0(R6)
+ MOVW R7, 8(R6)
+ MOVH R8, 12(R6)
+ MOVB R10, 14(R6)
+done:
+ RET
+
+// DO NOT CALL - target for exrl (execute relative long) instruction.
+TEXT memmove_exrl_mvc<>(SB),NOSPLIT|NOFRAME,$0-0
+ MVC $1, 0(R4), 0(R6)
+ MOVD R0, 0(R0)
+ RET
+
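
forwards_fast above moves data in fixed 256-byte MVC blocks while more than 256 bytes remain, and forwards_small finishes with a single MVC whose length field is patched at run time via EXRL (MVC encodes length-1, hence the ADD $-1 before the EXRL). A sketch of how the byte count splits, purely illustrative:

	package sketch

	// mvcChunks mirrors the s390x forward fast path: full 256-byte MVC blocks
	// while more than 256 bytes remain, then one residual MVC of 1..256 bytes
	// (executed via EXRL with length-1 in the instruction's length field).
	func mvcChunks(n int) (full, tail int) {
		if n == 0 {
			return 0, 0
		}
		full = (n - 1) / 256
		tail = n - full*256 // always in 1..256 here
		return full, tail
	}
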
diff --git a/src/runtime/memmove_test.go b/src/runtime/memmove_test.go
new file mode 100644
index 0000000..21236d1
--- /dev/null
+++ b/src/runtime/memmove_test.go
@@ -0,0 +1,1124 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "crypto/rand"
+ "encoding/binary"
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ . "runtime"
+ "sync/atomic"
+ "testing"
+ "unsafe"
+)
+
+func TestMemmove(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ t.Parallel()
+ size := 256
+ if testing.Short() {
+ size = 128 + 16
+ }
+ src := make([]byte, size)
+ dst := make([]byte, size)
+ for i := 0; i < size; i++ {
+ src[i] = byte(128 + (i & 127))
+ }
+ for i := 0; i < size; i++ {
+ dst[i] = byte(i & 127)
+ }
+ for n := 0; n <= size; n++ {
+ for x := 0; x <= size-n; x++ { // offset in src
+ for y := 0; y <= size-n; y++ { // offset in dst
+ copy(dst[y:y+n], src[x:x+n])
+ for i := 0; i < y; i++ {
+ if dst[i] != byte(i&127) {
+ t.Fatalf("prefix dst[%d] = %d", i, dst[i])
+ }
+ }
+ for i := y; i < y+n; i++ {
+ if dst[i] != byte(128+((i-y+x)&127)) {
+ t.Fatalf("copied dst[%d] = %d", i, dst[i])
+ }
+ dst[i] = byte(i & 127) // reset dst
+ }
+ for i := y + n; i < size; i++ {
+ if dst[i] != byte(i&127) {
+ t.Fatalf("suffix dst[%d] = %d", i, dst[i])
+ }
+ }
+ }
+ }
+ }
+}
+
+func TestMemmoveAlias(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ t.Parallel()
+ size := 256
+ if testing.Short() {
+ size = 128 + 16
+ }
+ buf := make([]byte, size)
+ for i := 0; i < size; i++ {
+ buf[i] = byte(i)
+ }
+ for n := 0; n <= size; n++ {
+ for x := 0; x <= size-n; x++ { // src offset
+ for y := 0; y <= size-n; y++ { // dst offset
+ copy(buf[y:y+n], buf[x:x+n])
+ for i := 0; i < y; i++ {
+ if buf[i] != byte(i) {
+ t.Fatalf("prefix buf[%d] = %d", i, buf[i])
+ }
+ }
+ for i := y; i < y+n; i++ {
+ if buf[i] != byte(i-y+x) {
+ t.Fatalf("copied buf[%d] = %d", i, buf[i])
+ }
+ buf[i] = byte(i) // reset buf
+ }
+ for i := y + n; i < size; i++ {
+ if buf[i] != byte(i) {
+ t.Fatalf("suffix buf[%d] = %d", i, buf[i])
+ }
+ }
+ }
+ }
+ }
+}
+
+func TestMemmoveLarge0x180000(t *testing.T) {
+ if testing.Short() && testenv.Builder() == "" {
+ t.Skip("-short")
+ }
+
+ t.Parallel()
+ if race.Enabled {
+ t.Skip("skipping large memmove test under race detector")
+ }
+ testSize(t, 0x180000)
+}
+
+func TestMemmoveOverlapLarge0x120000(t *testing.T) {
+ if testing.Short() && testenv.Builder() == "" {
+ t.Skip("-short")
+ }
+
+ t.Parallel()
+ if race.Enabled {
+ t.Skip("skipping large memmove test under race detector")
+ }
+ testOverlap(t, 0x120000)
+}
+
+func testSize(t *testing.T, size int) {
+ src := make([]byte, size)
+ dst := make([]byte, size)
+ _, _ = rand.Read(src)
+ _, _ = rand.Read(dst)
+
+ ref := make([]byte, size)
+ copyref(ref, dst)
+
+ for n := size - 50; n > 1; n >>= 1 {
+ for x := 0; x <= size-n; x = x*7 + 1 { // offset in src
+ for y := 0; y <= size-n; y = y*9 + 1 { // offset in dst
+ copy(dst[y:y+n], src[x:x+n])
+ copyref(ref[y:y+n], src[x:x+n])
+ p := cmpb(dst, ref)
+ if p >= 0 {
+ t.Fatalf("Copy failed, copying from src[%d:%d] to dst[%d:%d].\nOffset %d is different, %v != %v", x, x+n, y, y+n, p, dst[p], ref[p])
+ }
+ }
+ }
+ }
+}
+
+func testOverlap(t *testing.T, size int) {
+ src := make([]byte, size)
+ test := make([]byte, size)
+ ref := make([]byte, size)
+ _, _ = rand.Read(src)
+
+ for n := size - 50; n > 1; n >>= 1 {
+ for x := 0; x <= size-n; x = x*7 + 1 { // offset in src
+ for y := 0; y <= size-n; y = y*9 + 1 { // offset in dst
+ // Reset input
+ copyref(test, src)
+ copyref(ref, src)
+ copy(test[y:y+n], test[x:x+n])
+ if y <= x {
+ copyref(ref[y:y+n], ref[x:x+n])
+ } else {
+ copybw(ref[y:y+n], ref[x:x+n])
+ }
+ p := cmpb(test, ref)
+ if p >= 0 {
+ t.Fatalf("Copy failed, copying from src[%d:%d] to dst[%d:%d].\nOffset %d is different, %v != %v", x, x+n, y, y+n, p, test[p], ref[p])
+ }
+ }
+ }
+ }
+
+}
+
+// Forward copy.
+func copyref(dst, src []byte) {
+ for i, v := range src {
+ dst[i] = v
+ }
+}
+
+// Backwards copy.
+func copybw(dst, src []byte) {
+ if len(src) == 0 {
+ return
+ }
+ for i := len(src) - 1; i >= 0; i-- {
+ dst[i] = src[i]
+ }
+}
+
+// matchLen returns the offset of the first difference between a[:max] and b[:max], or max if there is none.
+func matchLen(a, b []byte, max int) int {
+ a = a[:max]
+ b = b[:max]
+ for i, av := range a {
+ if b[i] != av {
+ return i
+ }
+ }
+ return max
+}
+
+func cmpb(a, b []byte) int {
+ l := matchLen(a, b, len(a))
+ if l == len(a) {
+ return -1
+ }
+ return l
+}
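
matchLen and cmpb are the comparison helpers that testSize and testOverlap above rely on. A throwaway usage sketch that would compile inside this test file (fmt is already imported above); hypothetical, not part of the commit:

	// exampleCmpbSketch: cmpb returns -1 for equal slices, otherwise the index
	// of the first differing byte.
	func exampleCmpbSketch() {
		fmt.Println(cmpb([]byte("abcd"), []byte("abcd"))) // -1: no difference
		fmt.Println(cmpb([]byte("abcd"), []byte("abXd"))) // 2: first differing index
	}
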
+
+// Ensure that memmove writes pointers atomically, so the GC won't
+// observe a partially updated pointer.
+func TestMemmoveAtomicity(t *testing.T) {
+ if race.Enabled {
+ t.Skip("skip under the race detector -- this test is intentionally racy")
+ }
+
+ var x int
+
+ for _, backward := range []bool{true, false} {
+ for _, n := range []int{3, 4, 5, 6, 7, 8, 9, 10, 15, 25, 49} {
+ n := n
+
+ // test copying [N]*int.
+ sz := uintptr(n * PtrSize)
+ name := fmt.Sprint(sz)
+ if backward {
+ name += "-backward"
+ } else {
+ name += "-forward"
+ }
+ t.Run(name, func(t *testing.T) {
+ // Use overlapping src and dst to force forward/backward copy.
+ var s [100]*int
+ src := s[n-1 : 2*n-1]
+ dst := s[:n]
+ if backward {
+ src, dst = dst, src
+ }
+ for i := range src {
+ src[i] = &x
+ }
+ for i := range dst {
+ dst[i] = nil
+ }
+
+ var ready atomic.Uint32
+ go func() {
+ sp := unsafe.Pointer(&src[0])
+ dp := unsafe.Pointer(&dst[0])
+ ready.Store(1)
+ for i := 0; i < 10000; i++ {
+ Memmove(dp, sp, sz)
+ MemclrNoHeapPointers(dp, sz)
+ }
+ ready.Store(2)
+ }()
+
+ for ready.Load() == 0 {
+ Gosched()
+ }
+
+ for ready.Load() != 2 {
+ for i := range dst {
+ p := dst[i]
+ if p != nil && p != &x {
+ t.Fatalf("got partially updated pointer %p at dst[%d], want either nil or %p", p, i, &x)
+ }
+ }
+ }
+ })
+ }
+ }
+}
+
+func benchmarkSizes(b *testing.B, sizes []int, fn func(b *testing.B, n int)) {
+ for _, n := range sizes {
+ b.Run(fmt.Sprint(n), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ fn(b, n)
+ })
+ }
+}
+
+var bufSizes = []int{
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+ 32, 64, 128, 256, 512, 1024, 2048, 4096,
+}
+var bufSizesOverlap = []int{
+ 32, 64, 128, 256, 512, 1024, 2048, 4096,
+}
+
+func BenchmarkMemmove(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ y := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ copy(x, y)
+ }
+ })
+}
+
+func BenchmarkMemmoveOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+16)
+ for i := 0; i < b.N; i++ {
+ copy(x[16:n+16], x[:n])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedDst(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n+1)
+ y := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ copy(x[1:], y)
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedDstOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+16)
+ for i := 0; i < b.N; i++ {
+ copy(x[16:n+16], x[1:n+1])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedSrc(b *testing.B) {
+ benchmarkSizes(b, bufSizes, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ y := make([]byte, n+1)
+ for i := 0; i < b.N; i++ {
+ copy(x, y[1:])
+ }
+ })
+}
+
+func BenchmarkMemmoveUnalignedSrcDst(b *testing.B) {
+ for _, n := range []int{16, 64, 256, 4096, 65536} {
+ buf := make([]byte, (n+8)*2)
+ x := buf[:len(buf)/2]
+ y := buf[len(buf)/2:]
+ for _, off := range []int{0, 1, 4, 7} {
+ b.Run(fmt.Sprint("f_", n, off), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ for i := 0; i < b.N; i++ {
+ copy(x[off:n+off], y[off:n+off])
+ }
+ })
+
+ b.Run(fmt.Sprint("b_", n, off), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ for i := 0; i < b.N; i++ {
+ copy(y[off:n+off], x[off:n+off])
+ }
+ })
+ }
+ }
+}
+
+func BenchmarkMemmoveUnalignedSrcOverlap(b *testing.B) {
+ benchmarkSizes(b, bufSizesOverlap, func(b *testing.B, n int) {
+ x := make([]byte, n+1)
+ for i := 0; i < b.N; i++ {
+ copy(x[1:n+1], x[:n])
+ }
+ })
+}
+
+func TestMemclr(t *testing.T) {
+ size := 512
+ if testing.Short() {
+ size = 128 + 16
+ }
+ mem := make([]byte, size)
+ for i := 0; i < size; i++ {
+ mem[i] = 0xee
+ }
+ for n := 0; n < size; n++ {
+ for x := 0; x <= size-n; x++ { // offset in mem
+ MemclrBytes(mem[x : x+n])
+ for i := 0; i < x; i++ {
+ if mem[i] != 0xee {
+ t.Fatalf("overwrite prefix mem[%d] = %d", i, mem[i])
+ }
+ }
+ for i := x; i < x+n; i++ {
+ if mem[i] != 0 {
+ t.Fatalf("failed clear mem[%d] = %d", i, mem[i])
+ }
+ mem[i] = 0xee
+ }
+ for i := x + n; i < size; i++ {
+ if mem[i] != 0xee {
+ t.Fatalf("overwrite suffix mem[%d] = %d", i, mem[i])
+ }
+ }
+ }
+ }
+}
+
+func BenchmarkMemclr(b *testing.B) {
+ for _, n := range []int{5, 16, 64, 256, 4096, 65536} {
+ x := make([]byte, n)
+ b.Run(fmt.Sprint(n), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x)
+ }
+ })
+ }
+ for _, m := range []int{1, 4, 8, 16, 64} {
+ x := make([]byte, m<<20)
+ b.Run(fmt.Sprint(m, "M"), func(b *testing.B) {
+ b.SetBytes(int64(m << 20))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x)
+ }
+ })
+ }
+}
+
+func BenchmarkMemclrUnaligned(b *testing.B) {
+ for _, off := range []int{0, 1, 4, 7} {
+ for _, n := range []int{5, 16, 64, 256, 4096, 65536} {
+ x := make([]byte, n+off)
+ b.Run(fmt.Sprint(off, n), func(b *testing.B) {
+ b.SetBytes(int64(n))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x[off:])
+ }
+ })
+ }
+ }
+
+ for _, off := range []int{0, 1, 4, 7} {
+ for _, m := range []int{1, 4, 8, 16, 64} {
+ x := make([]byte, (m<<20)+off)
+ b.Run(fmt.Sprint(off, m, "M"), func(b *testing.B) {
+ b.SetBytes(int64(m << 20))
+ for i := 0; i < b.N; i++ {
+ MemclrBytes(x[off:])
+ }
+ })
+ }
+ }
+}
+
+func BenchmarkGoMemclr(b *testing.B) {
+ benchmarkSizes(b, []int{5, 16, 64, 256}, func(b *testing.B, n int) {
+ x := make([]byte, n)
+ for i := 0; i < b.N; i++ {
+ for j := range x {
+ x[j] = 0
+ }
+ }
+ })
+}
+
+func BenchmarkMemclrRange(b *testing.B) {
+ type RunData struct {
+ data []int
+ }
+
+ benchSizes := []RunData{
+ {[]int{1043, 1078, 1894, 1582, 1044, 1165, 1467, 1100, 1919, 1562, 1932, 1645,
+ 1412, 1038, 1576, 1200, 1029, 1336, 1095, 1494, 1350, 1025, 1502, 1548, 1316, 1296,
+ 1868, 1639, 1546, 1626, 1642, 1308, 1726, 1665, 1678, 1187, 1515, 1598, 1353, 1237,
+ 1977, 1452, 2012, 1914, 1514, 1136, 1975, 1618, 1536, 1695, 1600, 1733, 1392, 1099,
+ 1358, 1996, 1224, 1783, 1197, 1838, 1460, 1556, 1554, 2020}}, // 1kb-2kb
+ {[]int{3964, 5139, 6573, 7775, 6553, 2413, 3466, 5394, 2469, 7336, 7091, 6745,
+ 4028, 5643, 6164, 3475, 4138, 6908, 7559, 3335, 5660, 4122, 3945, 2082, 7564, 6584,
+ 5111, 2288, 6789, 2797, 4928, 7986, 5163, 5447, 2999, 4968, 3174, 3202, 7908, 8137,
+ 4735, 6161, 4646, 7592, 3083, 5329, 3687, 2754, 3599, 7231, 6455, 2549, 8063, 2189,
+ 7121, 5048, 4277, 6626, 6306, 2815, 7473, 3963, 7549, 7255}}, // 2kb-8kb
+ {[]int{16304, 15936, 15760, 4736, 9136, 11184, 10160, 5952, 14560, 15744,
+ 6624, 5872, 13088, 14656, 14192, 10304, 4112, 10384, 9344, 4496, 11392, 7024,
+ 5200, 10064, 14784, 5808, 13504, 10480, 8512, 4896, 13264, 5600}}, // 4kb-16kb
+ {[]int{164576, 233136, 220224, 183280, 214112, 217248, 228560, 201728}}, // 128kb-256kb
+ }
+
+ for _, t := range benchSizes {
+ total := 0
+ minLen := 0
+ maxLen := 0
+
+ for _, clrLen := range t.data {
+ if clrLen > maxLen {
+ maxLen = clrLen
+ }
+ if clrLen < minLen || minLen == 0 {
+ minLen = clrLen
+ }
+ total += clrLen
+ }
+ buffer := make([]byte, maxLen)
+
+ text := ""
+ if minLen >= (1 << 20) {
+ text = fmt.Sprint(minLen>>20, "M ", (maxLen+(1<<20-1))>>20, "M")
+ } else if minLen >= (1 << 10) {
+ text = fmt.Sprint(minLen>>10, "K ", (maxLen+(1<<10-1))>>10, "K")
+ } else {
+ text = fmt.Sprint(minLen, " ", maxLen)
+ }
+ b.Run(text, func(b *testing.B) {
+ b.SetBytes(int64(total))
+ for i := 0; i < b.N; i++ {
+ for _, clrLen := range t.data {
+ MemclrBytes(buffer[:clrLen])
+ }
+ }
+ })
+ }
+}
+
+func BenchmarkClearFat7(b *testing.B) {
+ p := new([7]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [7]byte{}
+ }
+}
+
+func BenchmarkClearFat8(b *testing.B) {
+ p := new([8 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [8 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat11(b *testing.B) {
+ p := new([11]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [11]byte{}
+ }
+}
+
+func BenchmarkClearFat12(b *testing.B) {
+ p := new([12 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [12 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat13(b *testing.B) {
+ p := new([13]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [13]byte{}
+ }
+}
+
+func BenchmarkClearFat14(b *testing.B) {
+ p := new([14]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [14]byte{}
+ }
+}
+
+func BenchmarkClearFat15(b *testing.B) {
+ p := new([15]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [15]byte{}
+ }
+}
+
+func BenchmarkClearFat16(b *testing.B) {
+ p := new([16 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [16 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat24(b *testing.B) {
+ p := new([24 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [24 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat32(b *testing.B) {
+ p := new([32 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [32 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat40(b *testing.B) {
+ p := new([40 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [40 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat48(b *testing.B) {
+ p := new([48 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [48 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat56(b *testing.B) {
+ p := new([56 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [56 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat64(b *testing.B) {
+ p := new([64 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [64 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat72(b *testing.B) {
+ p := new([72 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [72 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat128(b *testing.B) {
+ p := new([128 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [128 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat256(b *testing.B) {
+ p := new([256 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [256 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat512(b *testing.B) {
+ p := new([512 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [512 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat1024(b *testing.B) {
+ p := new([1024 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [1024 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat1032(b *testing.B) {
+ p := new([1032 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [1032 / 4]uint32{}
+ }
+}
+
+func BenchmarkClearFat1040(b *testing.B) {
+ p := new([1040 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = [1040 / 4]uint32{}
+ }
+}
+
+func BenchmarkCopyFat7(b *testing.B) {
+ var x [7]byte
+ p := new([7]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat8(b *testing.B) {
+ var x [8 / 4]uint32
+ p := new([8 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat11(b *testing.B) {
+ var x [11]byte
+ p := new([11]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat12(b *testing.B) {
+ var x [12 / 4]uint32
+ p := new([12 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat13(b *testing.B) {
+ var x [13]byte
+ p := new([13]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat14(b *testing.B) {
+ var x [14]byte
+ p := new([14]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat15(b *testing.B) {
+ var x [15]byte
+ p := new([15]byte)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat16(b *testing.B) {
+ var x [16 / 4]uint32
+ p := new([16 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat24(b *testing.B) {
+ var x [24 / 4]uint32
+ p := new([24 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat32(b *testing.B) {
+ var x [32 / 4]uint32
+ p := new([32 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat64(b *testing.B) {
+ var x [64 / 4]uint32
+ p := new([64 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat72(b *testing.B) {
+ var x [72 / 4]uint32
+ p := new([72 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat128(b *testing.B) {
+ var x [128 / 4]uint32
+ p := new([128 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat256(b *testing.B) {
+ var x [256 / 4]uint32
+ p := new([256 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat512(b *testing.B) {
+ var x [512 / 4]uint32
+ p := new([512 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat520(b *testing.B) {
+ var x [520 / 4]uint32
+ p := new([520 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat1024(b *testing.B) {
+ var x [1024 / 4]uint32
+ p := new([1024 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat1032(b *testing.B) {
+ var x [1032 / 4]uint32
+ p := new([1032 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+func BenchmarkCopyFat1040(b *testing.B) {
+ var x [1040 / 4]uint32
+ p := new([1040 / 4]uint32)
+ Escape(p)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ *p = x
+ }
+}
+
+// BenchmarkIssue18740 ensures that memmove uses 4 and 8 byte load/store to move 4 and 8 bytes.
+// It used to do 2 2-byte load/stores, which leads to a pipeline stall
+// when we try to read the result with one 4-byte load.
+func BenchmarkIssue18740(b *testing.B) {
+ benchmarks := []struct {
+ name string
+ nbyte int
+ f func([]byte) uint64
+ }{
+ {"2byte", 2, func(buf []byte) uint64 { return uint64(binary.LittleEndian.Uint16(buf)) }},
+ {"4byte", 4, func(buf []byte) uint64 { return uint64(binary.LittleEndian.Uint32(buf)) }},
+ {"8byte", 8, func(buf []byte) uint64 { return binary.LittleEndian.Uint64(buf) }},
+ }
+
+ var g [4096]byte
+ for _, bm := range benchmarks {
+ buf := make([]byte, bm.nbyte)
+ b.Run(bm.name, func(b *testing.B) {
+ for j := 0; j < b.N; j++ {
+ for i := 0; i < 4096; i += bm.nbyte {
+ copy(buf[:], g[i:])
+ sink += bm.f(buf[:])
+ }
+ }
+ })
+ }
+}
+
+var memclrSink []int8
+
+func BenchmarkMemclrKnownSize1(b *testing.B) {
+ var x [1]int8
+
+ b.SetBytes(1)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize2(b *testing.B) {
+ var x [2]int8
+
+ b.SetBytes(2)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize4(b *testing.B) {
+ var x [4]int8
+
+ b.SetBytes(4)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize8(b *testing.B) {
+ var x [8]int8
+
+ b.SetBytes(8)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize16(b *testing.B) {
+ var x [16]int8
+
+ b.SetBytes(16)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize32(b *testing.B) {
+ var x [32]int8
+
+ b.SetBytes(32)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize64(b *testing.B) {
+ var x [64]int8
+
+ b.SetBytes(64)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize112(b *testing.B) {
+ var x [112]int8
+
+ b.SetBytes(112)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+
+func BenchmarkMemclrKnownSize128(b *testing.B) {
+ var x [128]int8
+
+ b.SetBytes(128)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+
+func BenchmarkMemclrKnownSize192(b *testing.B) {
+ var x [192]int8
+
+ b.SetBytes(192)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+
+func BenchmarkMemclrKnownSize248(b *testing.B) {
+ var x [248]int8
+
+ b.SetBytes(248)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+
+func BenchmarkMemclrKnownSize256(b *testing.B) {
+ var x [256]int8
+
+ b.SetBytes(256)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize512(b *testing.B) {
+ var x [512]int8
+
+ b.SetBytes(512)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize1024(b *testing.B) {
+ var x [1024]int8
+
+ b.SetBytes(1024)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize4096(b *testing.B) {
+ var x [4096]int8
+
+ b.SetBytes(4096)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
+func BenchmarkMemclrKnownSize512KiB(b *testing.B) {
+ var x [524288]int8
+
+ b.SetBytes(524288)
+ for i := 0; i < b.N; i++ {
+ for a := range x {
+ x[a] = 0
+ }
+ }
+
+ memclrSink = x[:]
+}
diff --git a/src/runtime/memmove_wasm.s b/src/runtime/memmove_wasm.s
new file mode 100644
index 0000000..1be8487
--- /dev/null
+++ b/src/runtime/memmove_wasm.s
@@ -0,0 +1,22 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// See memmove Go doc for important implementation constraints.
+
+// func memmove(to, from unsafe.Pointer, n uintptr)
+TEXT runtime·memmove(SB), NOSPLIT, $0-24
+ MOVD to+0(FP), R0
+ MOVD from+8(FP), R1
+ MOVD n+16(FP), R2
+
+ Get R0
+ I32WrapI64
+ Get R1
+ I32WrapI64
+ Get R2
+ I32WrapI64
+ MemoryCopy
+ RET
diff --git a/src/runtime/metrics.go b/src/runtime/metrics.go
new file mode 100644
index 0000000..3d0f174
--- /dev/null
+++ b/src/runtime/metrics.go
@@ -0,0 +1,855 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Metrics implementation exported to runtime/metrics.
+
+import (
+ "internal/godebugs"
+ "unsafe"
+)
+
+var (
+ // metrics is a map of runtime/metrics keys to data used by the runtime
+ // to sample each metric's value. metricsInit indicates it has been
+ // initialized.
+ //
+ // These fields are protected by metricsSema which should be
+ // locked/unlocked with metricsLock() / metricsUnlock().
+ metricsSema uint32 = 1
+ metricsInit bool
+ metrics map[string]metricData
+
+ sizeClassBuckets []float64
+ timeHistBuckets []float64
+)
+
+type metricData struct {
+ // deps is the set of runtime statistics that this metric
+ // depends on. Before compute is called, the statAggregate
+ // which will be passed must ensure() these dependencies.
+ deps statDepSet
+
+ // compute is a function that populates a metricValue
+ // given a populated statAggregate structure.
+ compute func(in *statAggregate, out *metricValue)
+}
+
+func metricsLock() {
+ // Acquire the metricsSema but with handoff. Operations are typically
+ // expensive enough that queueing up goroutines and handing off between
+ // them will be noticeably better-behaved.
+ semacquire1(&metricsSema, true, 0, 0, waitReasonSemacquire)
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&metricsSema))
+ }
+}
+
+func metricsUnlock() {
+ if raceenabled {
+ racerelease(unsafe.Pointer(&metricsSema))
+ }
+ semrelease(&metricsSema)
+}
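
These two functions bracket every access to metricsInit and the metrics map; the handoff flag on the semaphore is chosen because sampling is expensive enough that queueing goroutines behaves better than letting them race to reacquire. The calling pattern, sketched with a hypothetical function name (the code below would compile inside package runtime; the real consumers of this lock live elsewhere in the runtime):

	// sampleSketch shows the locking discipline only; it is not a runtime function.
	func sampleSketch() {
		metricsLock()
		initMetrics() // requires metricsSema to be held, as its doc comment says
		// ... read or sample from the metrics map here ...
		metricsUnlock()
	}
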
+
+// initMetrics initializes the metrics map if it hasn't been yet.
+//
+// metricsSema must be held.
+func initMetrics() {
+ if metricsInit {
+ return
+ }
+
+ sizeClassBuckets = make([]float64, _NumSizeClasses, _NumSizeClasses+1)
+ // Skip size class 0 which is a stand-in for large objects, but large
+ // objects are tracked separately (and they actually get placed in
+ // the last bucket, not the first).
+ sizeClassBuckets[0] = 1 // The smallest allocation is 1 byte in size.
+ for i := 1; i < _NumSizeClasses; i++ {
+ // Size classes have an inclusive upper-bound
+ // and exclusive lower bound (e.g. 48-byte size class is
+		// (32, 48]) whereas we want an inclusive lower-bound
+ // and exclusive upper-bound (e.g. 48-byte size class is
+ // [33, 49)). We can achieve this by shifting all bucket
+ // boundaries up by 1.
+ //
+ // Also, a float64 can precisely represent integers with
+ // value up to 2^53 and size classes are relatively small
+ // (nowhere near 2^48 even) so this will give us exact
+ // boundaries.
+ sizeClassBuckets[i] = float64(class_to_size[i] + 1)
+ }
+ sizeClassBuckets = append(sizeClassBuckets, float64Inf())
+
+ timeHistBuckets = timeHistogramMetricsBuckets()
+ metrics = map[string]metricData{
+ "/cgo/go-to-c-calls:calls": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(NumCgoCall())
+ },
+ },
+ "/cpu/classes/gc/mark/assist:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.gcAssistTime))
+ },
+ },
+ "/cpu/classes/gc/mark/dedicated:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.gcDedicatedTime))
+ },
+ },
+ "/cpu/classes/gc/mark/idle:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.gcIdleTime))
+ },
+ },
+ "/cpu/classes/gc/pause:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.gcPauseTime))
+ },
+ },
+ "/cpu/classes/gc/total:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.gcTotalTime))
+ },
+ },
+ "/cpu/classes/idle:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.idleTime))
+ },
+ },
+ "/cpu/classes/scavenge/assist:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.scavengeAssistTime))
+ },
+ },
+ "/cpu/classes/scavenge/background:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.scavengeBgTime))
+ },
+ },
+ "/cpu/classes/scavenge/total:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.scavengeTotalTime))
+ },
+ },
+ "/cpu/classes/total:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.totalTime))
+ },
+ },
+ "/cpu/classes/user:cpu-seconds": {
+ deps: makeStatDepSet(cpuStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(in.cpuStats.userTime))
+ },
+ },
+ "/gc/cycles/automatic:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesDone - in.sysStats.gcCyclesForced
+ },
+ },
+ "/gc/cycles/forced:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesForced
+ },
+ },
+ "/gc/cycles/total:gc-cycles": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.gcCyclesDone
+ },
+ },
+ "/gc/scan/globals:bytes": {
+ deps: makeStatDepSet(gcStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.gcStats.globalsScan
+ },
+ },
+ "/gc/scan/heap:bytes": {
+ deps: makeStatDepSet(gcStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.gcStats.heapScan
+ },
+ },
+ "/gc/scan/stack:bytes": {
+ deps: makeStatDepSet(gcStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.gcStats.stackScan
+ },
+ },
+ "/gc/scan/total:bytes": {
+ deps: makeStatDepSet(gcStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.gcStats.totalScan
+ },
+ },
+ "/gc/heap/allocs-by-size:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(sizeClassBuckets)
+ hist.counts[len(hist.counts)-1] = uint64(in.heapStats.largeAllocCount)
+ // Cut off the first index which is ostensibly for size class 0,
+ // but large objects are tracked separately so it's actually unused.
+ for i, count := range in.heapStats.smallAllocCount[1:] {
+ hist.counts[i] = uint64(count)
+ }
+ },
+ },
+ "/gc/heap/allocs:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.totalAllocated
+ },
+ },
+ "/gc/heap/allocs:objects": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.totalAllocs
+ },
+ },
+ "/gc/heap/frees-by-size:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(sizeClassBuckets)
+ hist.counts[len(hist.counts)-1] = uint64(in.heapStats.largeFreeCount)
+ // Cut off the first index which is ostensibly for size class 0,
+ // but large objects are tracked separately so it's actually unused.
+ for i, count := range in.heapStats.smallFreeCount[1:] {
+ hist.counts[i] = uint64(count)
+ }
+ },
+ },
+ "/gc/heap/frees:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.totalFreed
+ },
+ },
+ "/gc/heap/frees:objects": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.totalFrees
+ },
+ },
+ "/gc/heap/goal:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.heapGoal
+ },
+ },
+ "/gc/gomemlimit:bytes": {
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gcController.memoryLimit.Load())
+ },
+ },
+ "/gc/gogc:percent": {
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gcController.gcPercent.Load())
+ },
+ },
+ "/gc/heap/live:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = gcController.heapMarked
+ },
+ },
+ "/gc/heap/objects:objects": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.numObjects
+ },
+ },
+ "/gc/heap/tiny/allocs:objects": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.tinyAllocCount)
+ },
+ },
+ "/gc/limiter/last-enabled:gc-cycle": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gcCPULimiter.lastEnabledCycle.Load())
+ },
+ },
+ "/gc/pauses:seconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(timeHistBuckets)
+				// The bottom-most bucket, containing negative values, is tracked
+				// separately as underflow, so fill that in manually and then
+ // iterate over the rest.
+ hist.counts[0] = memstats.gcPauseDist.underflow.Load()
+ for i := range memstats.gcPauseDist.counts {
+ hist.counts[i+1] = memstats.gcPauseDist.counts[i].Load()
+ }
+ hist.counts[len(hist.counts)-1] = memstats.gcPauseDist.overflow.Load()
+ },
+ },
+ "/gc/stack/starting-size:bytes": {
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(startingStackSize)
+ },
+ },
+ "/memory/classes/heap/free:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.committed - in.heapStats.inHeap -
+ in.heapStats.inStacks - in.heapStats.inWorkBufs -
+ in.heapStats.inPtrScalarBits)
+ },
+ },
+ "/memory/classes/heap/objects:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.heapStats.inObjects
+ },
+ },
+ "/memory/classes/heap/released:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.released)
+ },
+ },
+ "/memory/classes/heap/stacks:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inStacks)
+ },
+ },
+ "/memory/classes/heap/unused:bytes": {
+ deps: makeStatDepSet(heapStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inHeap) - in.heapStats.inObjects
+ },
+ },
+ "/memory/classes/metadata/mcache/free:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mCacheSys - in.sysStats.mCacheInUse
+ },
+ },
+ "/memory/classes/metadata/mcache/inuse:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mCacheInUse
+ },
+ },
+ "/memory/classes/metadata/mspan/free:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mSpanSys - in.sysStats.mSpanInUse
+ },
+ },
+ "/memory/classes/metadata/mspan/inuse:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.mSpanInUse
+ },
+ },
+ "/memory/classes/metadata/other:bytes": {
+ deps: makeStatDepSet(heapStatsDep, sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.inWorkBufs+in.heapStats.inPtrScalarBits) + in.sysStats.gcMiscSys
+ },
+ },
+ "/memory/classes/os-stacks:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.stacksSys
+ },
+ },
+ "/memory/classes/other:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.otherSys
+ },
+ },
+ "/memory/classes/profiling/buckets:bytes": {
+ deps: makeStatDepSet(sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = in.sysStats.buckHashSys
+ },
+ },
+ "/memory/classes/total:bytes": {
+ deps: makeStatDepSet(heapStatsDep, sysStatsDep),
+ compute: func(in *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(in.heapStats.committed+in.heapStats.released) +
+ in.sysStats.stacksSys + in.sysStats.mSpanSys +
+ in.sysStats.mCacheSys + in.sysStats.buckHashSys +
+ in.sysStats.gcMiscSys + in.sysStats.otherSys
+ },
+ },
+ "/sched/gomaxprocs:threads": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gomaxprocs)
+ },
+ },
+ "/sched/goroutines:goroutines": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = uint64(gcount())
+ },
+ },
+ "/sched/latencies:seconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ hist := out.float64HistOrInit(timeHistBuckets)
+ hist.counts[0] = sched.timeToRun.underflow.Load()
+ for i := range sched.timeToRun.counts {
+ hist.counts[i+1] = sched.timeToRun.counts[i].Load()
+ }
+ hist.counts[len(hist.counts)-1] = sched.timeToRun.overflow.Load()
+ },
+ },
+ "/sync/mutex/wait/total:seconds": {
+ compute: func(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindFloat64
+ out.scalar = float64bits(nsToSec(sched.totalMutexWaitTime.Load()))
+ },
+ },
+ }
+
+ for _, info := range godebugs.All {
+ if !info.Opaque {
+ metrics["/godebug/non-default-behavior/"+info.Name+":events"] = metricData{compute: compute0}
+ }
+ }
+
+ metricsInit = true
+}
+
+func compute0(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = 0
+}
+
+type metricReader func() uint64
+
+func (f metricReader) compute(_ *statAggregate, out *metricValue) {
+ out.kind = metricKindUint64
+ out.scalar = f()
+}
+
+//go:linkname godebug_registerMetric internal/godebug.registerMetric
+func godebug_registerMetric(name string, read func() uint64) {
+ metricsLock()
+ initMetrics()
+ d, ok := metrics[name]
+ if !ok {
+ throw("runtime: unexpected metric registration for " + name)
+ }
+ d.compute = metricReader(read).compute
+ metrics[name] = d
+ metricsUnlock()
+}
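
metricReader above is a small adapter: a named function type whose compute method lets a plain func() uint64 callback stand in for a metric's compute function, which is what godebug_registerMetric swaps into the table. A standalone sketch of the same adapter pattern, using illustrative names that are not part of this patch:

	package main

	import "fmt"

	type value struct{ scalar uint64 }

	// reader adapts a plain counter callback to the compute-style signature.
	type reader func() uint64

	func (f reader) compute(out *value) { out.scalar = f() }

	func main() {
		// A registry keyed by metric name, loosely like the runtime's metrics map.
		registry := map[string]func(*value){}

		var events uint64
		read := func() uint64 { return events }

		// Swap the callback in, much like godebug_registerMetric replaces
		// d.compute with metricReader(read).compute.
		registry["/example/non-default-behavior:events"] = reader(read).compute

		events = 3
		var v value
		registry["/example/non-default-behavior:events"](&v)
		fmt.Println(v.scalar) // 3
	}
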
+
+// statDep is a dependency on a group of statistics
+// that a metric might have.
+type statDep uint
+
+const (
+ heapStatsDep statDep = iota // corresponds to heapStatsAggregate
+ sysStatsDep // corresponds to sysStatsAggregate
+ cpuStatsDep // corresponds to cpuStatsAggregate
+ gcStatsDep // corresponds to gcStatsAggregate
+ numStatsDeps
+)
+
+// statDepSet represents a set of statDeps.
+//
+// Under the hood, it's a bitmap.
+type statDepSet [1]uint64
+
+// makeStatDepSet creates a new statDepSet from a list of statDeps.
+func makeStatDepSet(deps ...statDep) statDepSet {
+ var s statDepSet
+ for _, d := range deps {
+ s[d/64] |= 1 << (d % 64)
+ }
+ return s
+}
+
+// difference returns set difference of s from b as a new set.
+func (s statDepSet) difference(b statDepSet) statDepSet {
+ var c statDepSet
+ for i := range s {
+ c[i] = s[i] &^ b[i]
+ }
+ return c
+}
+
+// union returns the union of the two sets as a new set.
+func (s statDepSet) union(b statDepSet) statDepSet {
+ var c statDepSet
+ for i := range s {
+ c[i] = s[i] | b[i]
+ }
+ return c
+}
+
+// empty returns true if there are no dependencies in the set.
+func (s *statDepSet) empty() bool {
+ for _, c := range s {
+ if c != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+// has returns true if the set contains a given statDep.
+func (s *statDepSet) has(d statDep) bool {
+ return s[d/64]&(1<<(d%64)) != 0
+}
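
statDepSet is a fixed-size bitmap over the statDep enumeration, which is why its set operations reduce to word-wise masking. A minimal, self-contained sketch of the same bitmap-set idea (names here are illustrative, not the runtime's):

	package main

	import "fmt"

	type dep uint

	const (
		heapDep dep = iota
		sysDep
		cpuDep
		gcDep
	)

	// depSet mirrors statDepSet: one 64-bit word holding a bit per dep.
	type depSet [1]uint64

	func makeDepSet(deps ...dep) depSet {
		var s depSet
		for _, d := range deps {
			s[d/64] |= 1 << (d % 64)
		}
		return s
	}

	func (s depSet) has(d dep) bool { return s[d/64]&(1<<(d%64)) != 0 }

	func (s depSet) union(b depSet) depSet {
		var c depSet
		for i := range s {
			c[i] = s[i] | b[i]
		}
		return c
	}

	func (s depSet) difference(b depSet) depSet {
		var c depSet
		for i := range s {
			c[i] = s[i] &^ b[i]
		}
		return c
	}

	func main() {
		want := makeDepSet(heapDep, sysDep)
		done := makeDepSet(heapDep)
		missing := want.difference(done)
		fmt.Println(missing.has(sysDep), missing.has(heapDep)) // true false
		fmt.Println(done.union(missing) == want)               // true
	}
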
+
+// heapStatsAggregate represents memory stats obtained from the
+// runtime. This set of stats is grouped together because they
+// depend on each other in some way to make sense of the runtime's
+// current heap memory use. They're also sharded across Ps, so it
+// makes sense to grab them all at once.
+type heapStatsAggregate struct {
+ heapStatsDelta
+
+ // Derived from values in heapStatsDelta.
+
+ // inObjects is the bytes of memory occupied by live objects in the heap.
+ inObjects uint64
+
+ // numObjects is the number of live objects in the heap.
+ numObjects uint64
+
+ // totalAllocated is the total bytes of heap objects allocated
+ // over the lifetime of the program.
+ totalAllocated uint64
+
+ // totalFreed is the total bytes of heap objects freed
+ // over the lifetime of the program.
+ totalFreed uint64
+
+ // totalAllocs is the number of heap objects allocated over
+ // the lifetime of the program.
+ totalAllocs uint64
+
+ // totalFrees is the number of heap objects freed over
+ // the lifetime of the program.
+ totalFrees uint64
+}
+
+// compute populates the heapStatsAggregate with values from the runtime.
+func (a *heapStatsAggregate) compute() {
+ memstats.heapStats.read(&a.heapStatsDelta)
+
+ // Calculate derived stats.
+ a.totalAllocs = a.largeAllocCount
+ a.totalFrees = a.largeFreeCount
+ a.totalAllocated = a.largeAlloc
+ a.totalFreed = a.largeFree
+ for i := range a.smallAllocCount {
+ na := a.smallAllocCount[i]
+ nf := a.smallFreeCount[i]
+ a.totalAllocs += na
+ a.totalFrees += nf
+ a.totalAllocated += na * uint64(class_to_size[i])
+ a.totalFreed += nf * uint64(class_to_size[i])
+ }
+ a.inObjects = a.totalAllocated - a.totalFreed
+ a.numObjects = a.totalAllocs - a.totalFrees
+}
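
The derived totals above follow a simple rule: large objects contribute their exact byte counts, while small objects contribute count × size for each size class. A tiny arithmetic sketch with made-up class sizes and counts (not the runtime's real tables):

	package main

	import "fmt"

	func main() {
		// Hypothetical size-class table and per-class tallies; index 0 is unused,
		// matching how the runtime reserves size class 0.
		classToSize := []uint64{0, 8, 16, 32}
		smallAllocCount := []uint64{0, 10, 5, 2}
		smallFreeCount := []uint64{0, 4, 1, 0}
		largeAlloc, largeFree := uint64(4096), uint64(1024)

		totalAllocated, totalFreed := largeAlloc, largeFree
		for i := range smallAllocCount {
			totalAllocated += smallAllocCount[i] * classToSize[i]
			totalFreed += smallFreeCount[i] * classToSize[i]
		}
		inObjects := totalAllocated - totalFreed

		fmt.Println(totalAllocated, totalFreed, inObjects) // 4320 1072 3248
	}
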
+
+// sysStatsAggregate represents system memory stats obtained
+// from the runtime. This set of stats is grouped together because
+// they're all relatively cheap to acquire and generally independent
+// of one another and other runtime memory stats. The fact that they
+// may be acquired at different times, especially with respect to
+// heapStatsAggregate, means there could be some skew, but because
+// these stats are independent, there's no real consistency issue here.
+type sysStatsAggregate struct {
+ stacksSys uint64
+ mSpanSys uint64
+ mSpanInUse uint64
+ mCacheSys uint64
+ mCacheInUse uint64
+ buckHashSys uint64
+ gcMiscSys uint64
+ otherSys uint64
+ heapGoal uint64
+ gcCyclesDone uint64
+ gcCyclesForced uint64
+}
+
+// compute populates the sysStatsAggregate with values from the runtime.
+func (a *sysStatsAggregate) compute() {
+ a.stacksSys = memstats.stacks_sys.load()
+ a.buckHashSys = memstats.buckhash_sys.load()
+ a.gcMiscSys = memstats.gcMiscSys.load()
+ a.otherSys = memstats.other_sys.load()
+ a.heapGoal = gcController.heapGoal()
+ a.gcCyclesDone = uint64(memstats.numgc)
+ a.gcCyclesForced = uint64(memstats.numforcedgc)
+
+ systemstack(func() {
+ lock(&mheap_.lock)
+ a.mSpanSys = memstats.mspan_sys.load()
+ a.mSpanInUse = uint64(mheap_.spanalloc.inuse)
+ a.mCacheSys = memstats.mcache_sys.load()
+ a.mCacheInUse = uint64(mheap_.cachealloc.inuse)
+ unlock(&mheap_.lock)
+ })
+}
+
+// cpuStatsAggregate represents CPU stats obtained from the runtime
+// acquired together to avoid skew and inconsistencies.
+type cpuStatsAggregate struct {
+ cpuStats
+}
+
+// compute populates the cpuStatsAggregate with values from the runtime.
+func (a *cpuStatsAggregate) compute() {
+ a.cpuStats = work.cpuStats
+ // TODO(mknyszek): Update the CPU stats again so that we're not
+ // just relying on the STW snapshot. The issue here is that currently
+ // this will cause non-monotonicity in the "user" CPU time metric.
+ //
+ // a.cpuStats.accumulate(nanotime(), gcphase == _GCmark)
+}
+
+// gcStatsAggregate represents various GC stats obtained from the runtime
+// acquired together to avoid skew and inconsistencies.
+type gcStatsAggregate struct {
+ heapScan uint64
+ stackScan uint64
+ globalsScan uint64
+ totalScan uint64
+}
+
+// compute populates the gcStatsAggregate with values from the runtime.
+func (a *gcStatsAggregate) compute() {
+ a.heapScan = gcController.heapScan.Load()
+ a.stackScan = uint64(gcController.lastStackScan.Load())
+ a.globalsScan = gcController.globalsScan.Load()
+ a.totalScan = a.heapScan + a.stackScan + a.globalsScan
+}
+
+// nsToSec takes a duration in nanoseconds and converts it to seconds as
+// a float64.
+func nsToSec(ns int64) float64 {
+ return float64(ns) / 1e9
+}
+
+// statAggregate is the main driver of the metrics implementation.
+//
+// It contains multiple aggregates of runtime statistics, as well
+// as a set of these aggregates that it has populated. The aggregates
+// are populated lazily by its ensure method.
+type statAggregate struct {
+ ensured statDepSet
+ heapStats heapStatsAggregate
+ sysStats sysStatsAggregate
+ cpuStats cpuStatsAggregate
+ gcStats gcStatsAggregate
+}
+
+// ensure populates statistics aggregates determined by deps if they
+// haven't yet been populated.
+func (a *statAggregate) ensure(deps *statDepSet) {
+ missing := deps.difference(a.ensured)
+ if missing.empty() {
+ return
+ }
+ for i := statDep(0); i < numStatsDeps; i++ {
+ if !missing.has(i) {
+ continue
+ }
+ switch i {
+ case heapStatsDep:
+ a.heapStats.compute()
+ case sysStatsDep:
+ a.sysStats.compute()
+ case cpuStatsDep:
+ a.cpuStats.compute()
+ case gcStatsDep:
+ a.gcStats.compute()
+ }
+ }
+ a.ensured = a.ensured.union(missing)
+}
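
ensure is the lazy-evaluation core of the sampler: it computes only the aggregate groups a metric depends on and that have not already been computed during this read. A standalone sketch of that compute-once pattern, with illustrative names:

	package main

	import "fmt"

	type aggregate struct {
		ensured  map[string]bool
		computed []string // records which groups actually ran
	}

	func (a *aggregate) ensure(deps ...string) {
		for _, d := range deps {
			if a.ensured[d] {
				continue // already populated for this read; skip the work
			}
			a.computed = append(a.computed, d)
			a.ensured[d] = true
		}
	}

	func main() {
		a := &aggregate{ensured: make(map[string]bool)}
		a.ensure("heapStats")
		a.ensure("heapStats", "sysStats") // heapStats is not recomputed
		fmt.Println(a.computed)           // [heapStats sysStats]
	}
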
+
+// metricKind is a runtime copy of runtime/metrics.ValueKind and
+// must be kept structurally identical to that type.
+type metricKind int
+
+const (
+ // These values must be kept identical to their corresponding Kind* values
+ // in the runtime/metrics package.
+ metricKindBad metricKind = iota
+ metricKindUint64
+ metricKindFloat64
+ metricKindFloat64Histogram
+)
+
+// metricSample is a runtime copy of runtime/metrics.Sample and
+// must be kept structurally identical to that type.
+type metricSample struct {
+ name string
+ value metricValue
+}
+
+// metricValue is a runtime copy of runtime/metrics.Value and
+// must be kept structurally identical to that type.
+type metricValue struct {
+ kind metricKind
+ scalar uint64 // contains scalar values for scalar Kinds.
+ pointer unsafe.Pointer // contains non-scalar values.
+}
+
+// float64HistOrInit tries to pull out an existing float64Histogram
+// from the value, but if none exists, then it allocates one with
+// the given buckets.
+func (v *metricValue) float64HistOrInit(buckets []float64) *metricFloat64Histogram {
+ var hist *metricFloat64Histogram
+ if v.kind == metricKindFloat64Histogram && v.pointer != nil {
+ hist = (*metricFloat64Histogram)(v.pointer)
+ } else {
+ v.kind = metricKindFloat64Histogram
+ hist = new(metricFloat64Histogram)
+ v.pointer = unsafe.Pointer(hist)
+ }
+ hist.buckets = buckets
+ if len(hist.counts) != len(hist.buckets)-1 {
+ hist.counts = make([]uint64, len(buckets)-1)
+ }
+ return hist
+}
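
float64HistOrInit lets callers reuse a previously returned histogram: if the counts slice already has the right length for the bucket layout, no allocation happens. A minimal sketch of that reuse-or-allocate pattern (illustrative types, not the runtime's):

	package main

	import "fmt"

	type hist struct {
		counts  []uint64
		buckets []float64
	}

	// histOrInit reuses h's counts when they already match the bucket layout,
	// allocating a fresh slice only on shape mismatch.
	func histOrInit(h *hist, buckets []float64) *hist {
		if h == nil {
			h = new(hist)
		}
		h.buckets = buckets
		if len(h.counts) != len(buckets)-1 {
			h.counts = make([]uint64, len(buckets)-1)
		}
		return h
	}

	func main() {
		b := []float64{0, 1, 2, 4}
		h := histOrInit(nil, b)
		h.counts[0] = 7
		h2 := histOrInit(h, b) // same backing array is kept
		fmt.Println(&h2.counts[0] == &h.counts[0], h2.counts) // true [7 0 0]
	}
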
+
+// metricFloat64Histogram is a runtime copy of runtime/metrics.Float64Histogram
+// and must be kept structurally identical to that type.
+type metricFloat64Histogram struct {
+ counts []uint64
+ buckets []float64
+}
+
+// agg is used by readMetrics, and is protected by metricsSema.
+//
+// Managed as a global variable because its pointer will be
+// an argument to a dynamically-defined function, and we'd
+// like to avoid it escaping to the heap.
+var agg statAggregate
+
+type metricName struct {
+ name string
+ kind metricKind
+}
+
+// readMetricNames is the implementation of runtime/metrics.readMetricNames,
+// used by the runtime/metrics test and otherwise unreferenced.
+//
+//go:linkname readMetricNames runtime/metrics_test.runtime_readMetricNames
+func readMetricNames() []string {
+ metricsLock()
+ initMetrics()
+ n := len(metrics)
+ metricsUnlock()
+
+ list := make([]string, 0, n)
+
+ metricsLock()
+ for name := range metrics {
+ list = append(list, name)
+ }
+ metricsUnlock()
+
+ return list
+}
+
+// readMetrics is the implementation of runtime/metrics.Read.
+//
+//go:linkname readMetrics runtime/metrics.runtime_readMetrics
+func readMetrics(samplesp unsafe.Pointer, len int, cap int) {
+ // Construct a slice from the args.
+ sl := slice{samplesp, len, cap}
+ samples := *(*[]metricSample)(unsafe.Pointer(&sl))
+
+ metricsLock()
+
+ // Ensure the map is initialized.
+ initMetrics()
+
+ // Clear agg defensively.
+ agg = statAggregate{}
+
+ // Sample.
+ for i := range samples {
+ sample := &samples[i]
+ data, ok := metrics[sample.name]
+ if !ok {
+ sample.value.kind = metricKindBad
+ continue
+ }
+ // Ensure we have all the stats we need.
+ // agg is populated lazily.
+ agg.ensure(&data.deps)
+
+ // Compute the value based on the stats we have.
+ data.compute(&agg, &sample.value)
+ }
+
+ metricsUnlock()
+}
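
readMetrics rebuilds a []metricSample from the raw pointer, length, and capacity passed across the linkname boundary. Outside the runtime, the closest user-level equivalent of turning a pointer and length back into a slice is unsafe.Slice; a loosely analogous sketch, not the mechanism used above:

	package main

	import (
		"fmt"
		"unsafe"
	)

	func main() {
		arr := [3]uint64{1, 2, 3}

		// Rebuild a slice header from a raw element pointer and a length.
		s := unsafe.Slice(&arr[0], len(arr))

		s[1] = 42
		fmt.Println(s, arr) // [1 42 3] [1 42 3]
	}
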
diff --git a/src/runtime/metrics/description.go b/src/runtime/metrics/description.go
new file mode 100644
index 0000000..745691b
--- /dev/null
+++ b/src/runtime/metrics/description.go
@@ -0,0 +1,453 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+import "internal/godebugs"
+
+// Description describes a runtime metric.
+type Description struct {
+ // Name is the full name of the metric which includes the unit.
+ //
+ // The format of the metric name may be described by the following regular expression.
+ //
+ // ^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$
+ //
+ // The format splits the name into two components, separated by a colon: a path which always
+ // starts with a /, and a machine-parseable unit. The name may contain any valid Unicode
+ // codepoint in between / characters, but by convention will try to stick to lowercase
+ // characters and hyphens. An example of such a path might be "/memory/heap/free".
+ //
+ // The unit is by convention a series of lowercase English unit names (singular or plural)
+ // without prefixes delimited by '*' or '/'. The unit names may contain any valid Unicode
+ // codepoint that is not a delimiter.
+ // Examples of units might be "seconds", "bytes", "bytes/second", "cpu-seconds",
+ // "byte*cpu-seconds", and "bytes/second/second".
+ //
+ // For histograms, multiple units may apply. For instance, the units of the buckets and
+ // the count. By convention, for histograms, the units of the count are always "samples"
+ // with the type of sample evident by the metric's name, while the unit in the name
+ // specifies the buckets' unit.
+ //
+ // A complete name might look like "/memory/heap/free:bytes".
+ Name string
+
+ // Description is an English language sentence describing the metric.
+ Description string
+
+ // Kind is the kind of value for this metric.
+ //
+ // The purpose of this field is to allow users to filter out metrics whose values are
+ // types which their application may not understand.
+ Kind ValueKind
+
+ // Cumulative is whether or not the metric is cumulative. If a cumulative metric is just
+ // a single number, then it increases monotonically. If the metric is a distribution,
+ // then each bucket count increases monotonically.
+ //
+ // This flag thus indicates whether or not it's useful to compute a rate from this value.
+ Cumulative bool
+}
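
As a quick check of the name format documented for the Name field, the following standalone sketch splits a metric key into its path and unit using the same regular expression (the key chosen here is just an example):

	package main

	import (
		"fmt"
		"regexp"
	)

	// The pattern documented for Description.Name.
	var nameRe = regexp.MustCompile(`^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$`)

	func main() {
		m := nameRe.FindStringSubmatch("/memory/classes/heap/free:bytes")
		if m == nil {
			fmt.Println("not a valid metric name")
			return
		}
		fmt.Println("path:", m[1]) // path: /memory/classes/heap/free
		fmt.Println("unit:", m[2]) // unit: bytes
	}
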
+
+// The English language descriptions below must be kept in sync with the
+// descriptions of each metric in doc.go by running 'go generate'.
+var allDesc = []Description{
+ {
+ Name: "/cgo/go-to-c-calls:calls",
+ Description: "Count of calls made from Go to C by the current process.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/gc/mark/assist:cpu-seconds",
+ Description: "Estimated total CPU time goroutines spent performing GC tasks " +
+ "to assist the GC and prevent it from falling behind the application. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/gc/mark/dedicated:cpu-seconds",
+ Description: "Estimated total CPU time spent performing GC tasks on " +
+ "processors (as defined by GOMAXPROCS) dedicated to those tasks. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/gc/mark/idle:cpu-seconds",
+ Description: "Estimated total CPU time spent performing GC tasks on " +
+ "spare CPU resources that the Go scheduler could not otherwise find " +
+ "a use for. This should be subtracted from the total GC CPU time to " +
+ "obtain a measure of compulsory GC CPU time. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/gc/pause:cpu-seconds",
+ Description: "Estimated total CPU time spent with the application paused by " +
+ "the GC. Even if only one thread is running during the pause, this is " +
+ "computed as GOMAXPROCS times the pause latency because nothing else " +
+ "can be executing. This is the exact sum of samples in /gc/pause:seconds " +
+ "if each sample is multiplied by GOMAXPROCS at the time it is taken. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/gc/total:cpu-seconds",
+ Description: "Estimated total CPU time spent performing GC tasks. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics. Sum of all metrics in /cpu/classes/gc.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/idle:cpu-seconds",
+ Description: "Estimated total available CPU time not spent executing any Go or Go runtime code. " +
+ "In other words, the part of /cpu/classes/total:cpu-seconds that was unused. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/scavenge/assist:cpu-seconds",
+ Description: "Estimated total CPU time spent returning unused memory to the " +
+ "underlying platform in response eagerly in response to memory pressure. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/scavenge/background:cpu-seconds",
+ Description: "Estimated total CPU time spent performing background tasks " +
+ "to return unused memory to the underlying platform. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/scavenge/total:cpu-seconds",
+ Description: "Estimated total CPU time spent performing tasks that return " +
+ "unused memory to the underlying platform. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics. Sum of all metrics in /cpu/classes/scavenge.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/total:cpu-seconds",
+ Description: "Estimated total available CPU time for user Go code " +
+ "or the Go runtime, as defined by GOMAXPROCS. In other words, GOMAXPROCS " +
+ "integrated over the wall-clock duration this process has been executing for. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics. Sum of all metrics in /cpu/classes.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/cpu/classes/user:cpu-seconds",
+ Description: "Estimated total CPU time spent running user Go code. This may " +
+ "also include some small amount of time spent in the Go runtime. " +
+ "This metric is an overestimate, and not directly comparable to " +
+ "system CPU time measurements. Compare only with other /cpu/classes " +
+ "metrics.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/cycles/automatic:gc-cycles",
+ Description: "Count of completed GC cycles generated by the Go runtime.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/cycles/forced:gc-cycles",
+ Description: "Count of completed GC cycles forced by the application.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/cycles/total:gc-cycles",
+ Description: "Count of all completed GC cycles.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/gogc:percent",
+ Description: "Heap size target percentage configured by the user, otherwise 100. This " +
+ "value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent " +
+ "function.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/gomemlimit:bytes",
+ Description: "Go runtime memory limit configured by the user, otherwise " +
+ "math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and " +
+ "the runtime/debug.SetMemoryLimit function.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/heap/allocs-by-size:bytes",
+ Description: "Distribution of heap allocations by approximate size. " +
+ "Bucket counts increase monotonically. " +
+ "Note that this does not include tiny objects as defined by " +
+ "/gc/heap/tiny/allocs:objects, only tiny blocks.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/allocs:bytes",
+ Description: "Cumulative sum of memory allocated to the heap by the application.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/allocs:objects",
+ Description: "Cumulative count of heap allocations triggered by the application. " +
+ "Note that this does not include tiny objects as defined by " +
+ "/gc/heap/tiny/allocs:objects, only tiny blocks.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/frees-by-size:bytes",
+ Description: "Distribution of freed heap allocations by approximate size. " +
+ "Bucket counts increase monotonically. " +
+ "Note that this does not include tiny objects as defined by " +
+ "/gc/heap/tiny/allocs:objects, only tiny blocks.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/frees:bytes",
+ Description: "Cumulative sum of heap memory freed by the garbage collector.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/frees:objects",
+ Description: "Cumulative count of heap allocations whose storage was freed " +
+ "by the garbage collector. " +
+ "Note that this does not include tiny objects as defined by " +
+ "/gc/heap/tiny/allocs:objects, only tiny blocks.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/heap/goal:bytes",
+ Description: "Heap size target for the end of the GC cycle.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/heap/live:bytes",
+ Description: "Heap memory occupied by live objects that were marked by the previous GC.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/heap/objects:objects",
+ Description: "Number of objects, live or unswept, occupying heap memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/heap/tiny/allocs:objects",
+ Description: "Count of small allocations that are packed together into blocks. " +
+ "These allocations are counted separately from other allocations " +
+ "because each individual allocation is not tracked by the runtime, " +
+ "only their block. Each block is already accounted for in " +
+ "allocs-by-size and frees-by-size.",
+ Kind: KindUint64,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/limiter/last-enabled:gc-cycle",
+ Description: "GC cycle the last time the GC CPU limiter was enabled. " +
+ "This metric is useful for diagnosing the root cause of an out-of-memory " +
+ "error, because the limiter trades memory for CPU time when the GC's CPU " +
+ "time gets too high. This is most likely to occur with use of SetMemoryLimit. " +
+ "The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/pauses:seconds",
+ Description: "Distribution of individual GC-related stop-the-world pause latencies. Bucket counts increase monotonically.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/gc/scan/globals:bytes",
+ Description: "The total amount of global variable space that is scannable.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/scan/heap:bytes",
+ Description: "The total amount of heap space that is scannable.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/scan/stack:bytes",
+ Description: "The number of bytes of stack that were scanned last GC cycle.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/scan/total:bytes",
+ Description: "The total amount space that is scannable. Sum of all metrics in /gc/scan.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/gc/stack/starting-size:bytes",
+ Description: "The stack size of new goroutines.",
+ Kind: KindUint64,
+ Cumulative: false,
+ },
+ {
+ Name: "/memory/classes/heap/free:bytes",
+ Description: "Memory that is completely free and eligible to be returned to the underlying system, " +
+ "but has not been. This metric is the runtime's estimate of free address space that is backed by " +
+ "physical memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/objects:bytes",
+ Description: "Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/released:bytes",
+ Description: "Memory that is completely free and has been returned to the underlying system. This " +
+ "metric is the runtime's estimate of free address space that is still mapped into the process, " +
+ "but is not backed by physical memory.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/stacks:bytes",
+ Description: "Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. " +
+ "Currently, this represents all stack memory for goroutines. It also includes all OS thread stacks in non-cgo programs. " +
+ "Note that stacks may be allocated differently in the future, and this may change.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/heap/unused:bytes",
+ Description: "Memory that is reserved for heap objects but is not currently used to hold heap objects.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mcache/free:bytes",
+ Description: "Memory that is reserved for runtime mcache structures, but not in-use.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mcache/inuse:bytes",
+ Description: "Memory that is occupied by runtime mcache structures that are currently being used.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mspan/free:bytes",
+ Description: "Memory that is reserved for runtime mspan structures, but not in-use.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/mspan/inuse:bytes",
+ Description: "Memory that is occupied by runtime mspan structures that are currently being used.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/metadata/other:bytes",
+ Description: "Memory that is reserved for or used to hold runtime metadata.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/os-stacks:bytes",
+ Description: "Stack memory allocated by the underlying operating system. " +
+ "In non-cgo programs this metric is currently zero. This may change in the future." +
+ "In cgo programs this metric includes OS thread stacks allocated directly from the OS. " +
+ "Currently, this only accounts for one stack in c-shared and c-archive build modes, " +
+ "and other sources of stacks from the OS are not measured. This too may change in the future.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/other:bytes",
+ Description: "Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/profiling/buckets:bytes",
+ Description: "Memory that is used by the stack trace hash map used for profiling.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/memory/classes/total:bytes",
+ Description: "All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/sched/gomaxprocs:threads",
+ Description: "The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/sched/goroutines:goroutines",
+ Description: "Count of live goroutines.",
+ Kind: KindUint64,
+ },
+ {
+ Name: "/sched/latencies:seconds",
+ Description: "Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.",
+ Kind: KindFloat64Histogram,
+ Cumulative: true,
+ },
+ {
+ Name: "/sync/mutex/wait/total:seconds",
+ Description: "Approximate cumulative time goroutines have spent blocked on a sync.Mutex or sync.RWMutex. This metric is useful for identifying global changes in lock contention. Collect a mutex or block profile using the runtime/pprof package for more detailed contention data.",
+ Kind: KindFloat64,
+ Cumulative: true,
+ },
+}
+
+func init() {
+ // Insert all the non-default-reporting GODEBUGs into the table,
+ // preserving the overall sort order.
+ i := 0
+ for i < len(allDesc) && allDesc[i].Name < "/godebug/" {
+ i++
+ }
+ more := make([]Description, i, len(allDesc)+len(godebugs.All))
+ copy(more, allDesc)
+ for _, info := range godebugs.All {
+ if !info.Opaque {
+ more = append(more, Description{
+ Name: "/godebug/non-default-behavior/" + info.Name + ":events",
+ Description: "The number of non-default behaviors executed by the " +
+ info.Package + " package " + "due to a non-default " +
+ "GODEBUG=" + info.Name + "=... setting.",
+ Kind: KindUint64,
+ Cumulative: true,
+ })
+ }
+ }
+ allDesc = append(more, allDesc[i:]...)
+}
+
+// All returns a slice containing metric descriptions for all supported metrics.
+func All() []Description {
+ return allDesc
+}
diff --git a/src/runtime/metrics/description_test.go b/src/runtime/metrics/description_test.go
new file mode 100644
index 0000000..4fc6523
--- /dev/null
+++ b/src/runtime/metrics/description_test.go
@@ -0,0 +1,158 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics_test
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "go/ast"
+ "go/doc"
+ "go/doc/comment"
+ "go/format"
+ "go/parser"
+ "go/token"
+ "internal/diff"
+ "os"
+ "regexp"
+ "runtime/metrics"
+ "sort"
+ "strings"
+ "testing"
+ _ "unsafe"
+)
+
+// Implemented in the runtime.
+//
+//go:linkname runtime_readMetricNames
+func runtime_readMetricNames() []string
+
+func TestNames(t *testing.T) {
+ // Note that this regexp is promised in the package docs for Description. Do not change.
+ r := regexp.MustCompile("^(?P<name>/[^:]+):(?P<unit>[^:*/]+(?:[*/][^:*/]+)*)$")
+ all := metrics.All()
+ for i, d := range all {
+ if !r.MatchString(d.Name) {
+ t.Errorf("name %q does not match regexp %#q", d.Name, r)
+ }
+ if i > 0 && all[i-1].Name >= all[i].Name {
+ t.Fatalf("allDesc not sorted: %s ≥ %s", all[i-1].Name, all[i].Name)
+ }
+ }
+
+ names := runtime_readMetricNames()
+ sort.Strings(names)
+ samples := make([]metrics.Sample, len(names))
+ for i, name := range names {
+ samples[i].Name = name
+ }
+ metrics.Read(samples)
+
+ for _, d := range all {
+ for len(samples) > 0 && samples[0].Name < d.Name {
+ t.Errorf("%s: reported by runtime but not listed in All", samples[0].Name)
+ samples = samples[1:]
+ }
+ if len(samples) == 0 || d.Name < samples[0].Name {
+ t.Errorf("%s: listed in All but not reported by runtime", d.Name)
+ continue
+ }
+ if samples[0].Value.Kind() != d.Kind {
+ t.Errorf("%s: runtime reports %v but All reports %v", d.Name, samples[0].Value.Kind(), d.Kind)
+ }
+ samples = samples[1:]
+ }
+}
+
+func wrap(prefix, text string, width int) string {
+ doc := &comment.Doc{Content: []comment.Block{&comment.Paragraph{Text: []comment.Text{comment.Plain(text)}}}}
+ pr := &comment.Printer{TextPrefix: prefix, TextWidth: width}
+ return string(pr.Text(doc))
+}
+
+func formatDesc(t *testing.T) string {
+ var b strings.Builder
+ for i, d := range metrics.All() {
+ if i > 0 {
+ fmt.Fprintf(&b, "\n")
+ }
+ fmt.Fprintf(&b, "%s\n", d.Name)
+ fmt.Fprintf(&b, "%s", wrap("\t", d.Description, 80-2*8))
+ }
+ return b.String()
+}
+
+var generate = flag.Bool("generate", false, "update doc.go for go generate")
+
+func TestDocs(t *testing.T) {
+ want := formatDesc(t)
+
+ src, err := os.ReadFile("doc.go")
+ if err != nil {
+ t.Fatal(err)
+ }
+ fset := token.NewFileSet()
+ f, err := parser.ParseFile(fset, "doc.go", src, parser.ParseComments)
+ if err != nil {
+ t.Fatal(err)
+ }
+ fdoc := f.Doc
+ if fdoc == nil {
+ t.Fatal("no doc comment in doc.go")
+ }
+ pkg, err := doc.NewFromFiles(fset, []*ast.File{f}, "runtime/metrics")
+ if err != nil {
+ t.Fatal(err)
+ }
+ if pkg.Doc == "" {
+ t.Fatal("doc.NewFromFiles lost doc comment")
+ }
+ doc := new(comment.Parser).Parse(pkg.Doc)
+ expectCode := false
+ foundCode := false
+ updated := false
+ for _, block := range doc.Content {
+ switch b := block.(type) {
+ case *comment.Heading:
+ expectCode = false
+ if b.Text[0] == comment.Plain("Supported metrics") {
+ expectCode = true
+ }
+ case *comment.Code:
+ if expectCode {
+ foundCode = true
+ if b.Text != want {
+ if !*generate {
+ t.Fatalf("doc comment out of date; use go generate to rebuild\n%s", diff.Diff("old", []byte(b.Text), "want", []byte(want)))
+ }
+ b.Text = want
+ updated = true
+ }
+ }
+ }
+ }
+
+ if !foundCode {
+ t.Fatalf("did not find Supported metrics list in doc.go")
+ }
+ if updated {
+ fmt.Fprintf(os.Stderr, "go test -generate: writing new doc.go\n")
+ var buf bytes.Buffer
+ buf.Write(src[:fdoc.Pos()-f.FileStart])
+ buf.WriteString("/*\n")
+ buf.Write(new(comment.Printer).Comment(doc))
+ buf.WriteString("*/")
+ buf.Write(src[fdoc.End()-f.FileStart:])
+ src, err := format.Source(buf.Bytes())
+ if err != nil {
+ t.Fatal(err)
+ }
+ if err := os.WriteFile("doc.go", src, 0666); err != nil {
+ t.Fatal(err)
+ }
+ } else if *generate {
+ fmt.Fprintf(os.Stderr, "go test -generate: doc.go already up-to-date\n")
+ }
+}
diff --git a/src/runtime/metrics/doc.go b/src/runtime/metrics/doc.go
new file mode 100644
index 0000000..55d1f65
--- /dev/null
+++ b/src/runtime/metrics/doc.go
@@ -0,0 +1,400 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Note: run 'go generate' (which will run 'go test -generate') to update the "Supported metrics" list.
+//go:generate go test -run=Docs -generate
+
+/*
+Package metrics provides a stable interface to access implementation-defined
+metrics exported by the Go runtime. This package is similar to existing functions
+like [runtime.ReadMemStats] and [debug.ReadGCStats], but significantly more general.
+
+The set of metrics defined by this package may evolve as the runtime itself
+evolves, and also enables variation across Go implementations, whose relevant
+metric sets may not intersect.
+
+# Interface
+
+Metrics are designated by a string key, rather than, for example, a field name in
+a struct. The full list of supported metrics is always available in the slice of
+Descriptions returned by All. Each Description also includes useful information
+about the metric.
+
+Thus, users of this API are encouraged to sample supported metrics defined by the
+slice returned by All to remain compatible across Go versions. Of course, situations
+arise where reading specific metrics is critical. For these cases, users are
+encouraged to use build tags, and although metrics may be deprecated and removed,
+users should consider this to be an exceptional and rare event, coinciding with a
+very large change in a particular Go implementation.
+
+Each metric key also has a "kind" that describes the format of the metric's value.
+In the interest of not breaking users of this package, the "kind" for a given metric
+is guaranteed not to change. If it must change, then a new metric will be introduced
+with a new key and a new "kind."
+
+# Metric key format
+
+As mentioned earlier, metric keys are strings. Their format is simple and well-defined,
+designed to be both human and machine readable. It is split into two components,
+separated by a colon: a rooted path and a unit. The choice to include the unit in
+the key is motivated by compatibility: if a metric's unit changes, its semantics likely
+did also, and a new key should be introduced.
+
+For more details on the precise definition of the metric key's path and unit formats, see
+the documentation of the Name field of the Description struct.
+
+# A note about floats
+
+This package supports metrics whose values have a floating-point representation. In
+order to improve ease-of-use, this package promises to never produce the following
+classes of floating-point values: NaN, infinity.
+
+# Supported metrics
+
+Below is the full list of supported metrics, ordered lexicographically.
+
+ /cgo/go-to-c-calls:calls
+ Count of calls made from Go to C by the current process.
+
+ /cpu/classes/gc/mark/assist:cpu-seconds
+ Estimated total CPU time goroutines spent performing GC
+ tasks to assist the GC and prevent it from falling behind the
+ application. This metric is an overestimate, and not directly
+ comparable to system CPU time measurements. Compare only with
+ other /cpu/classes metrics.
+
+ /cpu/classes/gc/mark/dedicated:cpu-seconds
+ Estimated total CPU time spent performing GC tasks on processors
+ (as defined by GOMAXPROCS) dedicated to those tasks. This metric
+ is an overestimate, and not directly comparable to system CPU
+ time measurements. Compare only with other /cpu/classes metrics.
+
+ /cpu/classes/gc/mark/idle:cpu-seconds
+ Estimated total CPU time spent performing GC tasks on spare CPU
+ resources that the Go scheduler could not otherwise find a use
+ for. This should be subtracted from the total GC CPU time to
+ obtain a measure of compulsory GC CPU time. This metric is an
+ overestimate, and not directly comparable to system CPU time
+ measurements. Compare only with other /cpu/classes metrics.
+
+ /cpu/classes/gc/pause:cpu-seconds
+ Estimated total CPU time spent with the application paused by
+ the GC. Even if only one thread is running during the pause,
+ this is computed as GOMAXPROCS times the pause latency because
+ nothing else can be executing. This is the exact sum of samples
+ in /gc/pause:seconds if each sample is multiplied by GOMAXPROCS
+ at the time it is taken. This metric is an overestimate,
+ and not directly comparable to system CPU time measurements.
+ Compare only with other /cpu/classes metrics.
+
+ /cpu/classes/gc/total:cpu-seconds
+ Estimated total CPU time spent performing GC tasks. This metric
+ is an overestimate, and not directly comparable to system CPU
+ time measurements. Compare only with other /cpu/classes metrics.
+ Sum of all metrics in /cpu/classes/gc.
+
+ /cpu/classes/idle:cpu-seconds
+ Estimated total available CPU time not spent executing
+ any Go or Go runtime code. In other words, the part of
+ /cpu/classes/total:cpu-seconds that was unused. This metric is
+ an overestimate, and not directly comparable to system CPU time
+ measurements. Compare only with other /cpu/classes metrics.
+
+ /cpu/classes/scavenge/assist:cpu-seconds
+ Estimated total CPU time spent returning unused memory to the
+ underlying platform eagerly in response to memory
+ pressure. This metric is an overestimate, and not directly
+ comparable to system CPU time measurements. Compare only with
+ other /cpu/classes metrics.
+
+ /cpu/classes/scavenge/background:cpu-seconds
+ Estimated total CPU time spent performing background tasks to
+ return unused memory to the underlying platform. This metric is
+ an overestimate, and not directly comparable to system CPU time
+ measurements. Compare only with other /cpu/classes metrics.
+
+ /cpu/classes/scavenge/total:cpu-seconds
+ Estimated total CPU time spent performing tasks that return
+ unused memory to the underlying platform. This metric is an
+ overestimate, and not directly comparable to system CPU time
+ measurements. Compare only with other /cpu/classes metrics.
+ Sum of all metrics in /cpu/classes/scavenge.
+
+ /cpu/classes/total:cpu-seconds
+ Estimated total available CPU time for user Go code or the Go
+ runtime, as defined by GOMAXPROCS. In other words, GOMAXPROCS
+ integrated over the wall-clock duration this process has been
+ executing for. This metric is an overestimate, and not directly
+ comparable to system CPU time measurements. Compare only with
+ other /cpu/classes metrics. Sum of all metrics in /cpu/classes.
+
+ /cpu/classes/user:cpu-seconds
+ Estimated total CPU time spent running user Go code. This may
+ also include some small amount of time spent in the Go runtime.
+ This metric is an overestimate, and not directly comparable
+ to system CPU time measurements. Compare only with other
+ /cpu/classes metrics.
+
+ /gc/cycles/automatic:gc-cycles
+ Count of completed GC cycles generated by the Go runtime.
+
+ /gc/cycles/forced:gc-cycles
+ Count of completed GC cycles forced by the application.
+
+ /gc/cycles/total:gc-cycles
+ Count of all completed GC cycles.
+
+ /gc/gogc:percent
+ Heap size target percentage configured by the user, otherwise
+ 100. This value is set by the GOGC environment variable, and the
+ runtime/debug.SetGCPercent function.
+
+ /gc/gomemlimit:bytes
+ Go runtime memory limit configured by the user, otherwise
+ math.MaxInt64. This value is set by the GOMEMLIMIT environment
+ variable, and the runtime/debug.SetMemoryLimit function.
+
+ /gc/heap/allocs-by-size:bytes
+ Distribution of heap allocations by approximate size.
+ Bucket counts increase monotonically. Note that this does not
+ include tiny objects as defined by /gc/heap/tiny/allocs:objects,
+ only tiny blocks.
+
+ /gc/heap/allocs:bytes
+ Cumulative sum of memory allocated to the heap by the
+ application.
+
+ /gc/heap/allocs:objects
+ Cumulative count of heap allocations triggered by the
+ application. Note that this does not include tiny objects as
+ defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
+
+ /gc/heap/frees-by-size:bytes
+ Distribution of freed heap allocations by approximate size.
+ Bucket counts increase monotonically. Note that this does not
+ include tiny objects as defined by /gc/heap/tiny/allocs:objects,
+ only tiny blocks.
+
+ /gc/heap/frees:bytes
+ Cumulative sum of heap memory freed by the garbage collector.
+
+ /gc/heap/frees:objects
+ Cumulative count of heap allocations whose storage was freed
+ by the garbage collector. Note that this does not include tiny
+ objects as defined by /gc/heap/tiny/allocs:objects, only tiny
+ blocks.
+
+ /gc/heap/goal:bytes
+ Heap size target for the end of the GC cycle.
+
+ /gc/heap/live:bytes
+ Heap memory occupied by live objects that were marked by the
+ previous GC.
+
+ /gc/heap/objects:objects
+ Number of objects, live or unswept, occupying heap memory.
+
+ /gc/heap/tiny/allocs:objects
+ Count of small allocations that are packed together into blocks.
+ These allocations are counted separately from other allocations
+ because each individual allocation is not tracked by the
+ runtime, only their block. Each block is already accounted for
+ in allocs-by-size and frees-by-size.
+
+ /gc/limiter/last-enabled:gc-cycle
+ GC cycle the last time the GC CPU limiter was enabled.
+ This metric is useful for diagnosing the root cause of an
+ out-of-memory error, because the limiter trades memory for CPU
+ time when the GC's CPU time gets too high. This is most likely
+ to occur with use of SetMemoryLimit. The first GC cycle is cycle
+ 1, so a value of 0 indicates that it was never enabled.
+
+ /gc/pauses:seconds
+ Distribution of individual GC-related stop-the-world pause
+ latencies. Bucket counts increase monotonically.
+
+ /gc/scan/globals:bytes
+ The total amount of global variable space that is scannable.
+
+ /gc/scan/heap:bytes
+ The total amount of heap space that is scannable.
+
+ /gc/scan/stack:bytes
+ The number of bytes of stack that were scanned last GC cycle.
+
+ /gc/scan/total:bytes
+ The total amount of space that is scannable. Sum of all metrics in
+ /gc/scan.
+
+ /gc/stack/starting-size:bytes
+ The stack size of new goroutines.
+
+ /godebug/non-default-behavior/execerrdot:events
+ The number of non-default behaviors executed by the os/exec
+ package due to a non-default GODEBUG=execerrdot=... setting.
+
+ /godebug/non-default-behavior/gocachehash:events
+ The number of non-default behaviors executed by the cmd/go
+ package due to a non-default GODEBUG=gocachehash=... setting.
+
+ /godebug/non-default-behavior/gocachetest:events
+ The number of non-default behaviors executed by the cmd/go
+ package due to a non-default GODEBUG=gocachetest=... setting.
+
+ /godebug/non-default-behavior/gocacheverify:events
+ The number of non-default behaviors executed by the cmd/go
+ package due to a non-default GODEBUG=gocacheverify=... setting.
+
+ /godebug/non-default-behavior/http2client:events
+ The number of non-default behaviors executed by the net/http
+ package due to a non-default GODEBUG=http2client=... setting.
+
+ /godebug/non-default-behavior/http2server:events
+ The number of non-default behaviors executed by the net/http
+ package due to a non-default GODEBUG=http2server=... setting.
+
+ /godebug/non-default-behavior/installgoroot:events
+ The number of non-default behaviors executed by the go/build
+ package due to a non-default GODEBUG=installgoroot=... setting.
+
+ /godebug/non-default-behavior/jstmpllitinterp:events
+ The number of non-default behaviors executed by
+ the html/template package due to a non-default
+ GODEBUG=jstmpllitinterp=... setting.
+
+ /godebug/non-default-behavior/multipartmaxheaders:events
+ The number of non-default behaviors executed by
+ the mime/multipart package due to a non-default
+ GODEBUG=multipartmaxheaders=... setting.
+
+ /godebug/non-default-behavior/multipartmaxparts:events
+ The number of non-default behaviors executed by
+ the mime/multipart package due to a non-default
+ GODEBUG=multipartmaxparts=... setting.
+
+ /godebug/non-default-behavior/multipathtcp:events
+ The number of non-default behaviors executed by the net package
+ due to a non-default GODEBUG=multipathtcp=... setting.
+
+ /godebug/non-default-behavior/panicnil:events
+ The number of non-default behaviors executed by the runtime
+ package due to a non-default GODEBUG=panicnil=... setting.
+
+ /godebug/non-default-behavior/randautoseed:events
+ The number of non-default behaviors executed by the math/rand
+ package due to a non-default GODEBUG=randautoseed=... setting.
+
+ /godebug/non-default-behavior/tarinsecurepath:events
+ The number of non-default behaviors executed by the archive/tar
+ package due to a non-default GODEBUG=tarinsecurepath=...
+ setting.
+
+ /godebug/non-default-behavior/tlsmaxrsasize:events
+ The number of non-default behaviors executed by the crypto/tls
+ package due to a non-default GODEBUG=tlsmaxrsasize=... setting.
+
+ /godebug/non-default-behavior/x509sha1:events
+ The number of non-default behaviors executed by the crypto/x509
+ package due to a non-default GODEBUG=x509sha1=... setting.
+
+ /godebug/non-default-behavior/x509usefallbackroots:events
+ The number of non-default behaviors executed by the crypto/x509
+ package due to a non-default GODEBUG=x509usefallbackroots=...
+ setting.
+
+ /godebug/non-default-behavior/zipinsecurepath:events
+ The number of non-default behaviors executed by the archive/zip
+ package due to a non-default GODEBUG=zipinsecurepath=...
+ setting.
+
+ /memory/classes/heap/free:bytes
+ Memory that is completely free and eligible to be returned to
+ the underlying system, but has not been. This metric is the
+ runtime's estimate of free address space that is backed by
+ physical memory.
+
+ /memory/classes/heap/objects:bytes
+ Memory occupied by live objects and dead objects that have not
+ yet been marked free by the garbage collector.
+
+ /memory/classes/heap/released:bytes
+ Memory that is completely free and has been returned to the
+ underlying system. This metric is the runtime's estimate of free
+ address space that is still mapped into the process, but is not
+ backed by physical memory.
+
+ /memory/classes/heap/stacks:bytes
+ Memory allocated from the heap that is reserved for stack space,
+ whether or not it is currently in-use. Currently, this
+ represents all stack memory for goroutines. It also includes all
+ OS thread stacks in non-cgo programs. Note that stacks may be
+ allocated differently in the future, and this may change.
+
+ /memory/classes/heap/unused:bytes
+ Memory that is reserved for heap objects but is not currently
+ used to hold heap objects.
+
+ /memory/classes/metadata/mcache/free:bytes
+ Memory that is reserved for runtime mcache structures, but not
+ in-use.
+
+ /memory/classes/metadata/mcache/inuse:bytes
+ Memory that is occupied by runtime mcache structures that are
+ currently being used.
+
+ /memory/classes/metadata/mspan/free:bytes
+ Memory that is reserved for runtime mspan structures, but not
+ in-use.
+
+ /memory/classes/metadata/mspan/inuse:bytes
+ Memory that is occupied by runtime mspan structures that are
+ currently being used.
+
+ /memory/classes/metadata/other:bytes
+ Memory that is reserved for or used to hold runtime metadata.
+
+ /memory/classes/os-stacks:bytes
+ Stack memory allocated by the underlying operating system.
+ In non-cgo programs this metric is currently zero. This may
+ change in the future. In cgo programs this metric includes
+ OS thread stacks allocated directly from the OS. Currently,
+ this only accounts for one stack in c-shared and c-archive build
+ modes, and other sources of stacks from the OS are not measured.
+ This too may change in the future.
+
+ /memory/classes/other:bytes
+ Memory used by execution trace buffers, structures for debugging
+ the runtime, finalizer and profiler specials, and more.
+
+ /memory/classes/profiling/buckets:bytes
+ Memory that is used by the stack trace hash map used for
+ profiling.
+
+ /memory/classes/total:bytes
+ All memory mapped by the Go runtime into the current process
+ as read-write. Note that this does not include memory mapped
+ by code called via cgo or via the syscall package. Sum of all
+ metrics in /memory/classes.
+
+ /sched/gomaxprocs:threads
+ The current runtime.GOMAXPROCS setting, or the number of
+ operating system threads that can execute user-level Go code
+ simultaneously.
+
+ /sched/goroutines:goroutines
+ Count of live goroutines.
+
+ /sched/latencies:seconds
+ Distribution of the time goroutines have spent in the scheduler
+ in a runnable state before actually running. Bucket counts
+ increase monotonically.
+
+ /sync/mutex/wait/total:seconds
+ Approximate cumulative time goroutines have spent blocked
+ on a sync.Mutex or sync.RWMutex. This metric is useful for
+ identifying global changes in lock contention. Collect a mutex
+ or block profile using the runtime/pprof package for more
+ detailed contention data.
+*/
+package metrics
diff --git a/src/runtime/metrics/example_test.go b/src/runtime/metrics/example_test.go
new file mode 100644
index 0000000..624d9d8
--- /dev/null
+++ b/src/runtime/metrics/example_test.go
@@ -0,0 +1,96 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics_test
+
+import (
+ "fmt"
+ "runtime/metrics"
+)
+
+func ExampleRead_readingOneMetric() {
+ // Name of the metric we want to read.
+ const myMetric = "/memory/classes/heap/free:bytes"
+
+ // Create a sample for the metric.
+ sample := make([]metrics.Sample, 1)
+ sample[0].Name = myMetric
+
+ // Sample the metric.
+ metrics.Read(sample)
+
+ // Check if the metric is actually supported.
+ // If it's not, the resulting value will always have
+ // kind KindBad.
+ if sample[0].Value.Kind() == metrics.KindBad {
+ panic(fmt.Sprintf("metric %q no longer supported", myMetric))
+ }
+
+ // Handle the result.
+ //
+ // It's OK to assume a particular Kind for a metric;
+ // they're guaranteed not to change.
+ freeBytes := sample[0].Value.Uint64()
+
+ fmt.Printf("free but not released memory: %d\n", freeBytes)
+}
+
+func ExampleRead_readingAllMetrics() {
+ // Get descriptions for all supported metrics.
+ descs := metrics.All()
+
+ // Create a sample for each metric.
+ samples := make([]metrics.Sample, len(descs))
+ for i := range samples {
+ samples[i].Name = descs[i].Name
+ }
+
+ // Sample the metrics. Re-use the samples slice if you can!
+ metrics.Read(samples)
+
+ // Iterate over all results.
+ for _, sample := range samples {
+ // Pull out the name and value.
+ name, value := sample.Name, sample.Value
+
+ // Handle each sample.
+ switch value.Kind() {
+ case metrics.KindUint64:
+ fmt.Printf("%s: %d\n", name, value.Uint64())
+ case metrics.KindFloat64:
+ fmt.Printf("%s: %f\n", name, value.Float64())
+ case metrics.KindFloat64Histogram:
+ // The histogram may be quite large, so let's just pull out
+ // a crude estimate for the median for the sake of this example.
+ fmt.Printf("%s: %f\n", name, medianBucket(value.Float64Histogram()))
+ case metrics.KindBad:
+ // This should never happen because all metrics are supported
+ // by construction.
+ panic("bug in runtime/metrics package!")
+ default:
+ // This may happen as new metrics get added.
+ //
+ // The safest thing to do here is to simply log it somewhere
+ // as something to look into, but ignore it for now.
+ // In the worst case, you might temporarily miss out on a new metric.
+ fmt.Printf("%s: unexpected metric Kind: %v\n", name, value.Kind())
+ }
+ }
+}
+
+func medianBucket(h *metrics.Float64Histogram) float64 {
+ total := uint64(0)
+ for _, count := range h.Counts {
+ total += count
+ }
+ thresh := total / 2
+ total = 0
+ for i, count := range h.Counts {
+ total += count
+ if total >= thresh {
+ return h.Buckets[i]
+ }
+ }
+ panic("should not happen")
+}
diff --git a/src/runtime/metrics/histogram.go b/src/runtime/metrics/histogram.go
new file mode 100644
index 0000000..956422b
--- /dev/null
+++ b/src/runtime/metrics/histogram.go
@@ -0,0 +1,33 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+// Float64Histogram represents a distribution of float64 values.
+type Float64Histogram struct {
+ // Counts contains the weights for each histogram bucket.
+ //
+ // Given N buckets, Counts[n] is the weight of the range
+ // [Buckets[n], Buckets[n+1]), for 0 <= n < N.
+ Counts []uint64
+
+ // Buckets contains the boundaries of the histogram buckets, in increasing order.
+ //
+ // Buckets[0] is the inclusive lower bound of the minimum bucket while
+ // Buckets[len(Buckets)-1] is the exclusive upper bound of the maximum bucket.
+ // Hence, there are len(Buckets)-1 counts. Furthermore, len(Buckets) != 1, always,
+ // since at least two boundaries are required to describe one bucket (and 0
+ // boundaries are used to describe 0 buckets).
+ //
+ // Buckets[0] is permitted to have value -Inf and Buckets[len(Buckets)-1] is
+ // permitted to have value Inf.
+ //
+ // For a given metric name, the value of Buckets is guaranteed not to change
+ // between calls until program exit.
+ //
+ // This slice value is permitted to alias with other Float64Histograms' Buckets
+ // fields, so the values within should only ever be read. If they need to be
+ // modified, the user must make a copy.
+ Buckets []float64
+}
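
To make the Buckets/Counts relationship above concrete, here is a small sketch where printBuckets is a hypothetical helper, not part of the package. It prints each bucket's half-open range alongside its weight, relying on len(Buckets) == len(Counts)+1.

package main

import (
	"fmt"
	"runtime/metrics"
)

// printBuckets is an illustrative helper (not part of the package). It prints
// each bucket range [Buckets[i], Buckets[i+1]) and its weight Counts[i];
// the outermost boundaries may be -Inf and +Inf.
func printBuckets(h *metrics.Float64Histogram) {
	for i, count := range h.Counts {
		fmt.Printf("[%g, %g): %d\n", h.Buckets[i], h.Buckets[i+1], count)
	}
}

func main() {
	s := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() == metrics.KindFloat64Histogram {
		printBuckets(s[0].Value.Float64Histogram())
	}
}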
diff --git a/src/runtime/metrics/sample.go b/src/runtime/metrics/sample.go
new file mode 100644
index 0000000..4cf8cdf
--- /dev/null
+++ b/src/runtime/metrics/sample.go
@@ -0,0 +1,47 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+import (
+ _ "runtime" // depends on the runtime via a linkname'd function
+ "unsafe"
+)
+
+// Sample captures a single metric sample.
+type Sample struct {
+ // Name is the name of the metric sampled.
+ //
+ // It must correspond to a name in one of the metric descriptions
+ // returned by All.
+ Name string
+
+ // Value is the value of the metric sample.
+ Value Value
+}
+
+// Implemented in the runtime.
+func runtime_readMetrics(unsafe.Pointer, int, int)
+
+// Read populates each Value field in the given slice of metric samples.
+//
+// Desired metrics should be present in the slice with the appropriate name.
+// The user of this API is encouraged to re-use the same slice between calls for
+// efficiency, but is not required to do so.
+//
+// Note that re-use has some caveats. Notably, Values should not be read or
+// manipulated while a Read with that value is outstanding; that is a data race.
+// This property includes pointer-typed Values (for example, Float64Histogram)
+// whose underlying storage will be reused by Read when possible. To safely use
+// such values in a concurrent setting, all data must be deep-copied.
+//
+// It is safe to execute multiple Read calls concurrently, but their arguments
+// must share no underlying memory. When in doubt, create a new []Sample from
+// scratch, which is always safe, though it may be inefficient.
+//
+// Sample values with names not appearing in All will have their Value populated
+// as KindBad to indicate that the name is unknown.
+func Read(m []Sample) {
+ runtime_readMetrics(unsafe.Pointer(&m[0]), len(m), cap(m))
+}
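
Because Read may reuse the underlying storage of pointer-typed Values, as the documentation above warns, a caller that retains a histogram across calls should deep-copy it first. Below is a minimal sketch of that pattern; cloneHist is a hypothetical helper and the chosen metric is assumed to be supported.

package main

import (
	"fmt"
	"runtime/metrics"
)

// cloneHist is an illustrative helper (not part of the package). It deep-copies
// a Float64Histogram so the copy remains valid even after the originating
// Sample slice is passed to Read again.
func cloneHist(h *metrics.Float64Histogram) *metrics.Float64Histogram {
	c := &metrics.Float64Histogram{
		Counts:  make([]uint64, len(h.Counts)),
		Buckets: make([]float64, len(h.Buckets)),
	}
	copy(c.Counts, h.Counts)
	copy(c.Buckets, h.Buckets)
	return c
}

func main() {
	// Assumes /sched/latencies:seconds is supported; it is a histogram metric.
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)

	snapshot := cloneHist(s[0].Value.Float64Histogram()) // safe to retain or share
	metrics.Read(s)                                      // may overwrite the original histogram's storage

	var total uint64
	for _, c := range snapshot.Counts {
		total += c
	}
	fmt.Printf("scheduler latency samples so far: %d\n", total)
}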
diff --git a/src/runtime/metrics/value.go b/src/runtime/metrics/value.go
new file mode 100644
index 0000000..ed9a33d
--- /dev/null
+++ b/src/runtime/metrics/value.go
@@ -0,0 +1,69 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package metrics
+
+import (
+ "math"
+ "unsafe"
+)
+
+// ValueKind is a tag for a metric Value which indicates its type.
+type ValueKind int
+
+const (
+ // KindBad indicates that the Value has no type and should not be used.
+ KindBad ValueKind = iota
+
+ // KindUint64 indicates that the type of the Value is a uint64.
+ KindUint64
+
+ // KindFloat64 indicates that the type of the Value is a float64.
+ KindFloat64
+
+ // KindFloat64Histogram indicates that the type of the Value is a *Float64Histogram.
+ KindFloat64Histogram
+)
+
+// Value represents a metric value returned by the runtime.
+type Value struct {
+ kind ValueKind
+ scalar uint64 // contains scalar values for scalar Kinds.
+ pointer unsafe.Pointer // contains non-scalar values.
+}
+
+// Kind returns the tag representing the kind of value this is.
+func (v Value) Kind() ValueKind {
+ return v.kind
+}
+
+// Uint64 returns the internal uint64 value for the metric.
+//
+// If v.Kind() != KindUint64, this method panics.
+func (v Value) Uint64() uint64 {
+ if v.kind != KindUint64 {
+ panic("called Uint64 on non-uint64 metric value")
+ }
+ return v.scalar
+}
+
+// Float64 returns the internal float64 value for the metric.
+//
+// If v.Kind() != KindFloat64, this method panics.
+func (v Value) Float64() float64 {
+ if v.kind != KindFloat64 {
+ panic("called Float64 on non-float64 metric value")
+ }
+ return math.Float64frombits(v.scalar)
+}
+
+// Float64Histogram returns the internal *Float64Histogram value for the metric.
+//
+// If v.Kind() != KindFloat64Histogram, this method panics.
+func (v Value) Float64Histogram() *Float64Histogram {
+ if v.kind != KindFloat64Histogram {
+ panic("called Float64Histogram on non-Float64Histogram metric value")
+ }
+ return (*Float64Histogram)(v.pointer)
+}
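
Since each accessor above panics when the Kind does not match, a defensive caller can dispatch on Kind() before extracting anything. The valueString helper below is only an illustrative sketch, not part of the package.

package main

import (
	"fmt"
	"runtime/metrics"
)

// valueString is an illustrative helper (not part of the package). It formats
// a metric Value without risking the accessor panics by switching on its Kind.
func valueString(v metrics.Value) string {
	switch v.Kind() {
	case metrics.KindUint64:
		return fmt.Sprintf("%d", v.Uint64())
	case metrics.KindFloat64:
		return fmt.Sprintf("%f", v.Float64())
	case metrics.KindFloat64Histogram:
		return fmt.Sprintf("histogram with %d buckets", len(v.Float64Histogram().Counts))
	case metrics.KindBad:
		return "unsupported metric"
	default:
		return "unknown kind"
	}
}

func main() {
	s := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(s)
	fmt.Println(s[0].Name, "=", valueString(s[0].Value))
}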
diff --git a/src/runtime/metrics_test.go b/src/runtime/metrics_test.go
new file mode 100644
index 0000000..cfb09a3
--- /dev/null
+++ b/src/runtime/metrics_test.go
@@ -0,0 +1,763 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "runtime/debug"
+ "runtime/metrics"
+ "sort"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func prepareAllMetricsSamples() (map[string]metrics.Description, []metrics.Sample) {
+ all := metrics.All()
+ samples := make([]metrics.Sample, len(all))
+ descs := make(map[string]metrics.Description)
+ for i := range all {
+ samples[i].Name = all[i].Name
+ descs[all[i].Name] = all[i]
+ }
+ return descs, samples
+}
+
+func TestReadMetrics(t *testing.T) {
+ // Run a GC cycle to get some of the stats to be non-zero.
+ runtime.GC()
+
+ // Set an arbitrary memory limit to check the metric for it
+ limit := int64(512 * 1024 * 1024)
+ oldLimit := debug.SetMemoryLimit(limit)
+ defer debug.SetMemoryLimit(oldLimit)
+
+ // Set a GC percent to check the metric for it
+ gcPercent := 99
+ oldGCPercent := debug.SetGCPercent(gcPercent)
+ defer debug.SetGCPercent(oldGCPercent)
+
+ // Tests whether readMetrics produces values aligning
+ // with ReadMemStats while the world is stopped.
+ var mstats runtime.MemStats
+ _, samples := prepareAllMetricsSamples()
+ runtime.ReadMetricsSlow(&mstats, unsafe.Pointer(&samples[0]), len(samples), cap(samples))
+
+ checkUint64 := func(t *testing.T, m string, got, want uint64) {
+ t.Helper()
+ if got != want {
+ t.Errorf("metric %q: got %d, want %d", m, got, want)
+ }
+ }
+
+ // Check to make sure the values we read line up with other values we read.
+ var allocsBySize *metrics.Float64Histogram
+ var tinyAllocs uint64
+ var mallocs, frees uint64
+ for i := range samples {
+ switch name := samples[i].Name; name {
+ case "/cgo/go-to-c-calls:calls":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(runtime.NumCgoCall()))
+ case "/memory/classes/heap/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapIdle-mstats.HeapReleased)
+ case "/memory/classes/heap/released:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapReleased)
+ case "/memory/classes/heap/objects:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapAlloc)
+ case "/memory/classes/heap/unused:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapInuse-mstats.HeapAlloc)
+ case "/memory/classes/heap/stacks:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.StackInuse)
+ case "/memory/classes/metadata/mcache/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MCacheSys-mstats.MCacheInuse)
+ case "/memory/classes/metadata/mcache/inuse:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MCacheInuse)
+ case "/memory/classes/metadata/mspan/free:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MSpanSys-mstats.MSpanInuse)
+ case "/memory/classes/metadata/mspan/inuse:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.MSpanInuse)
+ case "/memory/classes/metadata/other:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.GCSys)
+ case "/memory/classes/os-stacks:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.StackSys-mstats.StackInuse)
+ case "/memory/classes/other:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.OtherSys)
+ case "/memory/classes/profiling/buckets:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.BuckHashSys)
+ case "/memory/classes/total:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.Sys)
+ case "/gc/heap/allocs-by-size:bytes":
+ hist := samples[i].Value.Float64Histogram()
+ // Skip size class 0 in BySize, because it's always empty and not represented
+ // in the histogram.
+ for i, sc := range mstats.BySize[1:] {
+ if b, s := hist.Buckets[i+1], float64(sc.Size+1); b != s {
+ t.Errorf("bucket does not match size class: got %f, want %f", b, s)
+ // The rest of the checks aren't expected to work anyway.
+ continue
+ }
+ if c, m := hist.Counts[i], sc.Mallocs; c != m {
+ t.Errorf("histogram counts do not match BySize for class %d: got %d, want %d", i, c, m)
+ }
+ }
+ allocsBySize = hist
+ case "/gc/heap/allocs:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.TotalAlloc)
+ case "/gc/heap/frees-by-size:bytes":
+ hist := samples[i].Value.Float64Histogram()
+ // Skip size class 0 in BySize, because it's always empty and not represented
+ // in the histogram.
+ for i, sc := range mstats.BySize[1:] {
+ if b, s := hist.Buckets[i+1], float64(sc.Size+1); b != s {
+ t.Errorf("bucket does not match size class: got %f, want %f", b, s)
+ // The rest of the checks aren't expected to work anyway.
+ continue
+ }
+ if c, f := hist.Counts[i], sc.Frees; c != f {
+ t.Errorf("histogram counts do not match BySize for class %d: got %d, want %d", i, c, f)
+ }
+ }
+ case "/gc/heap/frees:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.TotalAlloc-mstats.HeapAlloc)
+ case "/gc/heap/tiny/allocs:objects":
+ // Currently, MemStats adds tiny alloc count to both Mallocs AND Frees.
+ // The reason for this is because MemStats couldn't be extended at the time
+ // but there was a desire to have Mallocs at least be a little more representative,
+ // while having Mallocs - Frees still represent a live object count.
+ // Unfortunately, MemStats doesn't actually export a large allocation count,
+ // so it's impossible to pull this number out directly.
+ //
+ // Check tiny allocation count outside of this loop, by using the allocs-by-size
+ // histogram in order to figure out how many large objects there are.
+ tinyAllocs = samples[i].Value.Uint64()
+ // Because the next two metrics tests are checking against Mallocs and Frees,
+ // we can't check them directly for the same reason: we need to account for tiny
+ // allocations included in Mallocs and Frees.
+ case "/gc/heap/allocs:objects":
+ mallocs = samples[i].Value.Uint64()
+ case "/gc/heap/frees:objects":
+ frees = samples[i].Value.Uint64()
+ case "/gc/heap/live:bytes":
+ // Check for "obviously wrong" values. We can't check a stronger invariant,
+ // such as live <= HeapAlloc, because live is not 100% accurate. It's computed
+ // under racy conditions, and some objects may be double-counted (this is
+ // intentional and necessary for GC performance).
+ //
+ // Instead, check against a much more reasonable upper-bound: the amount of
+ // mapped heap memory. We can't possibly overcount to the point of exceeding
+ // total mapped heap memory, except if there's an accounting bug.
+ if live := samples[i].Value.Uint64(); live > mstats.HeapSys {
+ t.Errorf("live bytes: %d > heap sys: %d", live, mstats.HeapSys)
+ } else if live == 0 {
+ // Might happen if we don't call runtime.GC() above.
+ t.Error("live bytes is 0")
+ }
+ case "/gc/gomemlimit:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(limit))
+ case "/gc/heap/objects:objects":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.HeapObjects)
+ case "/gc/heap/goal:bytes":
+ checkUint64(t, name, samples[i].Value.Uint64(), mstats.NextGC)
+ case "/gc/gogc:percent":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(gcPercent))
+ case "/gc/cycles/automatic:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumGC-mstats.NumForcedGC))
+ case "/gc/cycles/forced:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumForcedGC))
+ case "/gc/cycles/total:gc-cycles":
+ checkUint64(t, name, samples[i].Value.Uint64(), uint64(mstats.NumGC))
+ }
+ }
+
+ // Check tinyAllocs.
+ nonTinyAllocs := uint64(0)
+ for _, c := range allocsBySize.Counts {
+ nonTinyAllocs += c
+ }
+ checkUint64(t, "/gc/heap/tiny/allocs:objects", tinyAllocs, mstats.Mallocs-nonTinyAllocs)
+
+ // Check allocation and free counts.
+ checkUint64(t, "/gc/heap/allocs:objects", mallocs, mstats.Mallocs-tinyAllocs)
+ checkUint64(t, "/gc/heap/frees:objects", frees, mstats.Frees-tinyAllocs)
+}
+
+func TestReadMetricsConsistency(t *testing.T) {
+ // Tests whether readMetrics produces consistent, sensible values.
+ // The values are read concurrently with the runtime doing other
+ // things (e.g. allocating) so what we read can't reasonably be compared
+ // to other runtime values (e.g. MemStats).
+
+ // Run a few GC cycles to get some of the stats to be non-zero.
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+
+ // Set GOMAXPROCS high then sleep briefly to ensure we generate
+ // some idle time.
+ oldmaxprocs := runtime.GOMAXPROCS(10)
+ time.Sleep(time.Millisecond)
+ runtime.GOMAXPROCS(oldmaxprocs)
+
+ // Read all the supported metrics through the metrics package.
+ descs, samples := prepareAllMetricsSamples()
+ metrics.Read(samples)
+
+ // Check to make sure the values we read make sense.
+ var totalVirtual struct {
+ got, want uint64
+ }
+ var objects struct {
+ alloc, free *metrics.Float64Histogram
+ allocs, frees uint64
+ allocdBytes, freedBytes uint64
+ total, totalBytes uint64
+ }
+ var gc struct {
+ numGC uint64
+ pauses uint64
+ }
+ var totalScan struct {
+ got, want uint64
+ }
+ var cpu struct {
+ gcAssist float64
+ gcDedicated float64
+ gcIdle float64
+ gcPause float64
+ gcTotal float64
+
+ idle float64
+ user float64
+
+ scavengeAssist float64
+ scavengeBg float64
+ scavengeTotal float64
+
+ total float64
+ }
+ for i := range samples {
+ kind := samples[i].Value.Kind()
+ if want := descs[samples[i].Name].Kind; kind != want {
+ t.Errorf("supported metric %q has unexpected kind: got %d, want %d", samples[i].Name, kind, want)
+ continue
+ }
+ if samples[i].Name != "/memory/classes/total:bytes" && strings.HasPrefix(samples[i].Name, "/memory/classes") {
+ v := samples[i].Value.Uint64()
+ totalVirtual.want += v
+
+ // None of these stats should ever get this big.
+ // If they do, there's probably overflow involved,
+ // usually due to bad accounting.
+ if int64(v) < 0 {
+ t.Errorf("%q has high/negative value: %d", samples[i].Name, v)
+ }
+ }
+ switch samples[i].Name {
+ case "/cpu/classes/gc/mark/assist:cpu-seconds":
+ cpu.gcAssist = samples[i].Value.Float64()
+ case "/cpu/classes/gc/mark/dedicated:cpu-seconds":
+ cpu.gcDedicated = samples[i].Value.Float64()
+ case "/cpu/classes/gc/mark/idle:cpu-seconds":
+ cpu.gcIdle = samples[i].Value.Float64()
+ case "/cpu/classes/gc/pause:cpu-seconds":
+ cpu.gcPause = samples[i].Value.Float64()
+ case "/cpu/classes/gc/total:cpu-seconds":
+ cpu.gcTotal = samples[i].Value.Float64()
+ case "/cpu/classes/idle:cpu-seconds":
+ cpu.idle = samples[i].Value.Float64()
+ case "/cpu/classes/scavenge/assist:cpu-seconds":
+ cpu.scavengeAssist = samples[i].Value.Float64()
+ case "/cpu/classes/scavenge/background:cpu-seconds":
+ cpu.scavengeBg = samples[i].Value.Float64()
+ case "/cpu/classes/scavenge/total:cpu-seconds":
+ cpu.scavengeTotal = samples[i].Value.Float64()
+ case "/cpu/classes/total:cpu-seconds":
+ cpu.total = samples[i].Value.Float64()
+ case "/cpu/classes/user:cpu-seconds":
+ cpu.user = samples[i].Value.Float64()
+ case "/memory/classes/total:bytes":
+ totalVirtual.got = samples[i].Value.Uint64()
+ case "/memory/classes/heap/objects:bytes":
+ objects.totalBytes = samples[i].Value.Uint64()
+ case "/gc/heap/objects:objects":
+ objects.total = samples[i].Value.Uint64()
+ case "/gc/heap/allocs:bytes":
+ objects.allocdBytes = samples[i].Value.Uint64()
+ case "/gc/heap/allocs:objects":
+ objects.allocs = samples[i].Value.Uint64()
+ case "/gc/heap/allocs-by-size:bytes":
+ objects.alloc = samples[i].Value.Float64Histogram()
+ case "/gc/heap/frees:bytes":
+ objects.freedBytes = samples[i].Value.Uint64()
+ case "/gc/heap/frees:objects":
+ objects.frees = samples[i].Value.Uint64()
+ case "/gc/heap/frees-by-size:bytes":
+ objects.free = samples[i].Value.Float64Histogram()
+ case "/gc/cycles:gc-cycles":
+ gc.numGC = samples[i].Value.Uint64()
+ case "/gc/pauses:seconds":
+ h := samples[i].Value.Float64Histogram()
+ gc.pauses = 0
+ for i := range h.Counts {
+ gc.pauses += h.Counts[i]
+ }
+ case "/gc/scan/heap:bytes":
+ totalScan.want += samples[i].Value.Uint64()
+ case "/gc/scan/globals:bytes":
+ totalScan.want += samples[i].Value.Uint64()
+ case "/gc/scan/stack:bytes":
+ totalScan.want += samples[i].Value.Uint64()
+ case "/gc/scan/total:bytes":
+ totalScan.got = samples[i].Value.Uint64()
+ case "/sched/gomaxprocs:threads":
+ if got, want := samples[i].Value.Uint64(), uint64(runtime.GOMAXPROCS(-1)); got != want {
+ t.Errorf("gomaxprocs doesn't match runtime.GOMAXPROCS: got %d, want %d", got, want)
+ }
+ case "/sched/goroutines:goroutines":
+ if samples[i].Value.Uint64() < 1 {
+ t.Error("number of goroutines is less than one")
+ }
+ }
+ }
+ // Only check this on Linux where we can be reasonably sure we have a high-resolution timer.
+ if runtime.GOOS == "linux" {
+ if cpu.gcDedicated <= 0 && cpu.gcAssist <= 0 && cpu.gcIdle <= 0 {
+ t.Errorf("found no time spent on GC work: %#v", cpu)
+ }
+ if cpu.gcPause <= 0 {
+ t.Errorf("found no GC pauses: %f", cpu.gcPause)
+ }
+ if cpu.idle <= 0 {
+ t.Errorf("found no idle time: %f", cpu.idle)
+ }
+ if total := cpu.gcDedicated + cpu.gcAssist + cpu.gcIdle + cpu.gcPause; !withinEpsilon(cpu.gcTotal, total, 0.01) {
+ t.Errorf("calculated total GC CPU not within 1%% of sampled total: %f vs. %f", total, cpu.gcTotal)
+ }
+ if total := cpu.scavengeAssist + cpu.scavengeBg; !withinEpsilon(cpu.scavengeTotal, total, 0.01) {
+ t.Errorf("calculated total scavenge CPU not within 1%% of sampled total: %f vs. %f", total, cpu.scavengeTotal)
+ }
+ if cpu.total <= 0 {
+ t.Errorf("found no total CPU time passed")
+ }
+ if cpu.user <= 0 {
+ t.Errorf("found no user time passed")
+ }
+ if total := cpu.gcTotal + cpu.scavengeTotal + cpu.user + cpu.idle; !withinEpsilon(cpu.total, total, 0.02) {
+ t.Errorf("calculated total CPU not within 2%% of sampled total: %f vs. %f", total, cpu.total)
+ }
+ }
+ if totalVirtual.got != totalVirtual.want {
+ t.Errorf(`"/memory/classes/total:bytes" does not match sum of /memory/classes/**: got %d, want %d`, totalVirtual.got, totalVirtual.want)
+ }
+ if got, want := objects.allocs-objects.frees, objects.total; got != want {
+ t.Errorf("mismatch between object alloc/free tallies and total: got %d, want %d", got, want)
+ }
+ if got, want := objects.allocdBytes-objects.freedBytes, objects.totalBytes; got != want {
+ t.Errorf("mismatch between allocated/freed byte tallies and total: got %d, want %d", got, want)
+ }
+ if b, c := len(objects.alloc.Buckets), len(objects.alloc.Counts); b != c+1 {
+ t.Errorf("allocs-by-size has wrong bucket or counts length: %d buckets, %d counts", b, c)
+ }
+ if b, c := len(objects.free.Buckets), len(objects.free.Counts); b != c+1 {
+ t.Errorf("frees-by-size has wrong bucket or counts length: %d buckets, %d counts", b, c)
+ }
+ if len(objects.alloc.Buckets) != len(objects.free.Buckets) {
+ t.Error("allocs-by-size and frees-by-size buckets don't match in length")
+ } else if len(objects.alloc.Counts) != len(objects.free.Counts) {
+ t.Error("allocs-by-size and frees-by-size counts don't match in length")
+ } else {
+ for i := range objects.alloc.Buckets {
+ ba := objects.alloc.Buckets[i]
+ bf := objects.free.Buckets[i]
+ if ba != bf {
+ t.Errorf("bucket %d is different for alloc and free hists: %f != %f", i, ba, bf)
+ }
+ }
+ if !t.Failed() {
+ var gotAlloc, gotFree uint64
+ want := objects.total
+ for i := range objects.alloc.Counts {
+ if objects.alloc.Counts[i] < objects.free.Counts[i] {
+ t.Errorf("found more allocs than frees in object dist bucket %d", i)
+ continue
+ }
+ gotAlloc += objects.alloc.Counts[i]
+ gotFree += objects.free.Counts[i]
+ }
+ if got := gotAlloc - gotFree; got != want {
+ t.Errorf("object distribution counts don't match count of live objects: got %d, want %d", got, want)
+ }
+ if gotAlloc != objects.allocs {
+ t.Errorf("object distribution counts don't match total allocs: got %d, want %d", gotAlloc, objects.allocs)
+ }
+ if gotFree != objects.frees {
+ t.Errorf("object distribution counts don't match total frees: got %d, want %d", gotFree, objects.frees)
+ }
+ }
+ }
+ // The current GC has at least 2 pauses per cycle.
+ // Check to see if that value makes sense.
+ if gc.pauses < gc.numGC*2 {
+ t.Errorf("fewer pauses than expected: got %d, want at least %d", gc.pauses, gc.numGC*2)
+ }
+ if totalScan.got <= 0 {
+ t.Errorf("scannable GC space is empty: %d", totalScan.got)
+ }
+ if totalScan.got != totalScan.want {
+ t.Errorf("/gc/scan/total:bytes doesn't line up with sum of /gc/scan*: total %d vs. sum %d", totalScan.got, totalScan.want)
+ }
+}
+
+func BenchmarkReadMetricsLatency(b *testing.B) {
+ stop := applyGCLoad(b)
+
+ // Accumulate latency measurements here.
+ latencies := make([]time.Duration, 0, 1024)
+ _, samples := prepareAllMetricsSamples()
+
+ // Hit metrics.Read continuously and measure.
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ start := time.Now()
+ metrics.Read(samples)
+ latencies = append(latencies, time.Since(start))
+ }
+ // Make sure to stop the timer before we wait! The load created above
+ // is very heavy-weight and not easy to stop, so we could end up
+ // confusing the benchmarking framework for small b.N.
+ b.StopTimer()
+ stop()
+
+ // Disable the default */op metrics.
+ // ns/op doesn't mean much here because it's an average, and the
+ // heavy GC load applied above skews it significantly.
+ b.ReportMetric(0, "ns/op")
+ b.ReportMetric(0, "B/op")
+ b.ReportMetric(0, "allocs/op")
+
+ // Sort latencies then report percentiles.
+ sort.Slice(latencies, func(i, j int) bool {
+ return latencies[i] < latencies[j]
+ })
+ b.ReportMetric(float64(latencies[len(latencies)*50/100]), "p50-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*90/100]), "p90-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*99/100]), "p99-ns")
+}
+
+var readMetricsSink [1024]interface{}
+
+func TestReadMetricsCumulative(t *testing.T) {
+ // Set up the set of metrics marked cumulative.
+ descs := metrics.All()
+ var samples [2][]metrics.Sample
+ samples[0] = make([]metrics.Sample, len(descs))
+ samples[1] = make([]metrics.Sample, len(descs))
+ total := 0
+ for i := range samples[0] {
+ if !descs[i].Cumulative {
+ continue
+ }
+ samples[0][total].Name = descs[i].Name
+ total++
+ }
+ samples[0] = samples[0][:total]
+ samples[1] = samples[1][:total]
+ copy(samples[1], samples[0])
+
+ // Start some noise in the background.
+ var wg sync.WaitGroup
+ wg.Add(1)
+ done := make(chan struct{})
+ go func() {
+ defer wg.Done()
+ for {
+ // Add more things here that could influence metrics.
+ for i := 0; i < len(readMetricsSink); i++ {
+ readMetricsSink[i] = make([]byte, 1024)
+ select {
+ case <-done:
+ return
+ default:
+ }
+ }
+ runtime.GC()
+ }
+ }()
+
+ sum := func(us []uint64) uint64 {
+ total := uint64(0)
+ for _, u := range us {
+ total += u
+ }
+ return total
+ }
+
+ // Populate the first generation.
+ metrics.Read(samples[0])
+
+ // Check to make sure that these metrics only grow monotonically.
+ for gen := 1; gen < 10; gen++ {
+ metrics.Read(samples[gen%2])
+ for i := range samples[gen%2] {
+ name := samples[gen%2][i].Name
+ vNew, vOld := samples[gen%2][i].Value, samples[1-(gen%2)][i].Value
+
+ switch vNew.Kind() {
+ case metrics.KindUint64:
+ new := vNew.Uint64()
+ old := vOld.Uint64()
+ if new < old {
+ t.Errorf("%s decreased: %d < %d", name, new, old)
+ }
+ case metrics.KindFloat64:
+ new := vNew.Float64()
+ old := vOld.Float64()
+ if new < old {
+ t.Errorf("%s decreased: %f < %f", name, new, old)
+ }
+ case metrics.KindFloat64Histogram:
+ new := sum(vNew.Float64Histogram().Counts)
+ old := sum(vOld.Float64Histogram().Counts)
+ if new < old {
+ t.Errorf("%s counts decreased: %d < %d", name, new, old)
+ }
+ }
+ }
+ }
+ close(done)
+
+ wg.Wait()
+}
+
+func withinEpsilon(v1, v2, e float64) bool {
+ return v2-v2*e <= v1 && v1 <= v2+v2*e
+}
+
+func TestMutexWaitTimeMetric(t *testing.T) {
+ var sample [1]metrics.Sample
+ sample[0].Name = "/sync/mutex/wait/total:seconds"
+
+ locks := []locker2{
+ new(mutex),
+ new(rwmutexWrite),
+ new(rwmutexReadWrite),
+ new(rwmutexWriteRead),
+ }
+ for _, lock := range locks {
+ t.Run(reflect.TypeOf(lock).Elem().Name(), func(t *testing.T) {
+ metrics.Read(sample[:])
+ before := time.Duration(sample[0].Value.Float64() * 1e9)
+
+ minMutexWaitTime := generateMutexWaitTime(lock)
+
+ metrics.Read(sample[:])
+ after := time.Duration(sample[0].Value.Float64() * 1e9)
+
+ if wt := after - before; wt < minMutexWaitTime {
+ t.Errorf("too little mutex wait time: got %s, want %s", wt, minMutexWaitTime)
+ }
+ })
+ }
+}
+
+// locker2 represents an API surface of two concurrent goroutines
+// locking the same resource, but through different APIs. It's intended
+// to abstract over the relationship of two Lock calls or an RLock
+// and a Lock call.
+type locker2 interface {
+ Lock1()
+ Unlock1()
+ Lock2()
+ Unlock2()
+}
+
+type mutex struct {
+ mu sync.Mutex
+}
+
+func (m *mutex) Lock1() { m.mu.Lock() }
+func (m *mutex) Unlock1() { m.mu.Unlock() }
+func (m *mutex) Lock2() { m.mu.Lock() }
+func (m *mutex) Unlock2() { m.mu.Unlock() }
+
+type rwmutexWrite struct {
+ mu sync.RWMutex
+}
+
+func (m *rwmutexWrite) Lock1() { m.mu.Lock() }
+func (m *rwmutexWrite) Unlock1() { m.mu.Unlock() }
+func (m *rwmutexWrite) Lock2() { m.mu.Lock() }
+func (m *rwmutexWrite) Unlock2() { m.mu.Unlock() }
+
+type rwmutexReadWrite struct {
+ mu sync.RWMutex
+}
+
+func (m *rwmutexReadWrite) Lock1() { m.mu.RLock() }
+func (m *rwmutexReadWrite) Unlock1() { m.mu.RUnlock() }
+func (m *rwmutexReadWrite) Lock2() { m.mu.Lock() }
+func (m *rwmutexReadWrite) Unlock2() { m.mu.Unlock() }
+
+type rwmutexWriteRead struct {
+ mu sync.RWMutex
+}
+
+func (m *rwmutexWriteRead) Lock1() { m.mu.Lock() }
+func (m *rwmutexWriteRead) Unlock1() { m.mu.Unlock() }
+func (m *rwmutexWriteRead) Lock2() { m.mu.RLock() }
+func (m *rwmutexWriteRead) Unlock2() { m.mu.RUnlock() }
+
+// generateMutexWaitTime blocks a goroutine on the given lock for a
+// fixed duration, returning the minimum amount of time that should
+// then be visible in the /sync/mutex/wait/total:seconds metric.
+func generateMutexWaitTime(mu locker2) time.Duration {
+ // Set up the runtime to always track casgstatus transitions for metrics.
+ *runtime.CasGStatusAlwaysTrack = true
+
+ mu.Lock1()
+
+ // Start up a goroutine to wait on the lock.
+ gc := make(chan *runtime.G)
+ done := make(chan bool)
+ go func() {
+ gc <- runtime.Getg()
+
+ for {
+ mu.Lock2()
+ mu.Unlock2()
+ if <-done {
+ return
+ }
+ }
+ }()
+ gp := <-gc
+
+ // Set the block time high enough so that it will always show up, even
+ // on systems with coarse timer granularity.
+ const blockTime = 100 * time.Millisecond
+
+ // Make sure the goroutine spawned above actually blocks on the lock.
+ for {
+ if runtime.GIsWaitingOnMutex(gp) {
+ break
+ }
+ runtime.Gosched()
+ }
+
+ // Let some amount of time pass.
+ time.Sleep(blockTime)
+
+ // Let the other goroutine acquire the lock.
+ mu.Unlock1()
+ done <- true
+
+ // Reset flag.
+ *runtime.CasGStatusAlwaysTrack = false
+ return blockTime
+}
+
+// See issue #60276.
+func TestCPUMetricsSleep(t *testing.T) {
+ if runtime.GOOS == "wasip1" {
+ // Since wasip1 busy-waits in the scheduler, there's no meaningful idle
+ // time. This is accurately reflected in the metrics, but it means this
+ // test is basically meaningless on this platform.
+ t.Skip("wasip1 currently busy-waits in idle time; test not applicable")
+ }
+
+ names := []string{
+ "/cpu/classes/idle:cpu-seconds",
+
+ "/cpu/classes/gc/mark/assist:cpu-seconds",
+ "/cpu/classes/gc/mark/dedicated:cpu-seconds",
+ "/cpu/classes/gc/mark/idle:cpu-seconds",
+ "/cpu/classes/gc/pause:cpu-seconds",
+ "/cpu/classes/gc/total:cpu-seconds",
+ "/cpu/classes/scavenge/assist:cpu-seconds",
+ "/cpu/classes/scavenge/background:cpu-seconds",
+ "/cpu/classes/scavenge/total:cpu-seconds",
+ "/cpu/classes/total:cpu-seconds",
+ "/cpu/classes/user:cpu-seconds",
+ }
+ prep := func() []metrics.Sample {
+ mm := make([]metrics.Sample, len(names))
+ for i := range names {
+ mm[i].Name = names[i]
+ }
+ return mm
+ }
+ m1, m2 := prep(), prep()
+
+ const (
+ // Expected time spent idle.
+ dur = 100 * time.Millisecond
+
+ // maxFailures is the number of consecutive failures required to cause the test to fail.
+ maxFailures = 10
+ )
+
+ failureIdleTimes := make([]float64, 0, maxFailures)
+
+ // If the bug we expect is happening, then the Sleep CPU time will be accounted for
+ // as user time rather than idle time. In an ideal world we'd expect the whole application
+ // to go instantly idle the moment this goroutine goes to sleep, and stay asleep for that
+ // duration. However, the Go runtime can easily eat into idle time while this goroutine is
+ // blocked in a sleep. For example, slow platforms might spend more time than expected in the
+ // scheduler. Another example is that a Go runtime background goroutine could run while
+ // everything else is idle. Lastly, if a running goroutine is descheduled by the OS, enough
+ // time may pass such that the goroutine is ready to wake, even though the runtime couldn't
+ // observe itself as idle with nanotime.
+ //
+ // To deal with all this, we give a half-proc's worth of leniency.
+ //
+ // We also retry multiple times to deal with the fact that the OS might deschedule us before
+ // we yield and go idle. That has a rare enough chance that retries should resolve it.
+ // If the issue we expect is happening, it should be persistent.
+ minIdleCPUSeconds := dur.Seconds() * (float64(runtime.GOMAXPROCS(-1)) - 0.5)
+
+ // Let's make sure there's no background scavenge work to do.
+ //
+ // The runtime.GC calls below ensure the background sweeper
+ // will not run during the idle period.
+ debug.FreeOSMemory()
+
+ for retries := 0; retries < maxFailures; retries++ {
+ // Read 1.
+ runtime.GC() // Update /cpu/classes metrics.
+ metrics.Read(m1)
+
+ // Sleep.
+ time.Sleep(dur)
+
+ // Read 2.
+ runtime.GC() // Update /cpu/classes metrics.
+ metrics.Read(m2)
+
+ dt := m2[0].Value.Float64() - m1[0].Value.Float64()
+ if dt >= minIdleCPUSeconds {
+ // All is well. Test passed.
+ return
+ }
+ failureIdleTimes = append(failureIdleTimes, dt)
+ // Try again.
+ }
+
+ // We couldn't observe the expected idle time even once.
+ for i, dt := range failureIdleTimes {
+ t.Logf("try %2d: idle time = %.5fs\n", i+1, dt)
+ }
+ t.Logf("try %d breakdown:\n", len(failureIdleTimes))
+ for i := range names {
+ if m1[i].Value.Kind() == metrics.KindBad {
+ continue
+ }
+ t.Logf("\t%s %0.3f\n", names[i], m2[i].Value.Float64()-m1[i].Value.Float64())
+ }
+ t.Errorf(`time.Sleep did not contribute enough to "idle" class: minimum idle time = %.5fs`, minIdleCPUSeconds)
+}
diff --git a/src/runtime/mfinal.go b/src/runtime/mfinal.go
new file mode 100644
index 0000000..650db18
--- /dev/null
+++ b/src/runtime/mfinal.go
@@ -0,0 +1,525 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: finalizers and block profiling.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// finblock is an array of finalizers to be executed. finblocks are
+// arranged in a linked list for the finalizer queue.
+//
+// finblock is allocated from non-GC'd memory, so any heap pointers
+// must be specially handled. GC currently assumes that the finalizer
+// queue does not grow during marking (but it can shrink).
+type finblock struct {
+ _ sys.NotInHeap
+ alllink *finblock
+ next *finblock
+ cnt uint32
+ _ int32
+ fin [(_FinBlockSize - 2*goarch.PtrSize - 2*4) / unsafe.Sizeof(finalizer{})]finalizer
+}
+
+var fingStatus atomic.Uint32
+
+// finalizer goroutine status.
+const (
+ fingUninitialized uint32 = iota
+ fingCreated uint32 = 1 << (iota - 1)
+ fingRunningFinalizer
+ fingWait
+ fingWake
+)
+
+var finlock mutex // protects the following variables
+var fing *g // goroutine that runs finalizers
+var finq *finblock // list of finalizers that are to be executed
+var finc *finblock // cache of free blocks
+var finptrmask [_FinBlockSize / goarch.PtrSize / 8]byte
+
+var allfin *finblock // list of all blocks
+
+// NOTE: Layout known to queuefinalizer.
+type finalizer struct {
+ fn *funcval // function to call (may be a heap pointer)
+ arg unsafe.Pointer // ptr to object (may be a heap pointer)
+ nret uintptr // bytes of return values from fn
+ fint *_type // type of first argument of fn
+ ot *ptrtype // type of ptr to object (may be a heap pointer)
+}
+
+var finalizer1 = [...]byte{
+ // Each Finalizer is 5 words, ptr ptr INT ptr ptr (INT = uintptr here)
+ // Each byte describes 8 words.
+ // Need 8 Finalizers described by 5 bytes before pattern repeats:
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // ptr ptr INT ptr ptr
+ // aka
+ //
+ // ptr ptr INT ptr ptr ptr ptr INT
+ // ptr ptr ptr ptr INT ptr ptr ptr
+ // ptr INT ptr ptr ptr ptr INT ptr
+ // ptr ptr ptr INT ptr ptr ptr ptr
+ // INT ptr ptr ptr ptr INT ptr ptr
+ //
+ // Assumptions about Finalizer layout checked below.
+ 1<<0 | 1<<1 | 0<<2 | 1<<3 | 1<<4 | 1<<5 | 1<<6 | 0<<7,
+ 1<<0 | 1<<1 | 1<<2 | 1<<3 | 0<<4 | 1<<5 | 1<<6 | 1<<7,
+ 1<<0 | 0<<1 | 1<<2 | 1<<3 | 1<<4 | 1<<5 | 0<<6 | 1<<7,
+ 1<<0 | 1<<1 | 1<<2 | 0<<3 | 1<<4 | 1<<5 | 1<<6 | 1<<7,
+ 0<<0 | 1<<1 | 1<<2 | 1<<3 | 1<<4 | 0<<5 | 1<<6 | 1<<7,
+}
+
+// lockRankMayQueueFinalizer records the lock ranking effects of a
+// function that may call queuefinalizer.
+func lockRankMayQueueFinalizer() {
+ lockWithRankMayAcquire(&finlock, getLockRank(&finlock))
+}
+
+func queuefinalizer(p unsafe.Pointer, fn *funcval, nret uintptr, fint *_type, ot *ptrtype) {
+ if gcphase != _GCoff {
+ // Currently we assume that the finalizer queue won't
+ // grow during marking so we don't have to rescan it
+ // during mark termination. If we ever need to lift
+ // this assumption, we can do it by adding the
+ // necessary barriers to queuefinalizer (which it may
+ // have automatically).
+ throw("queuefinalizer during GC")
+ }
+
+ lock(&finlock)
+ if finq == nil || finq.cnt == uint32(len(finq.fin)) {
+ if finc == nil {
+ finc = (*finblock)(persistentalloc(_FinBlockSize, 0, &memstats.gcMiscSys))
+ finc.alllink = allfin
+ allfin = finc
+ if finptrmask[0] == 0 {
+ // Build pointer mask for Finalizer array in block.
+ // Check assumptions made in finalizer1 array above.
+ if (unsafe.Sizeof(finalizer{}) != 5*goarch.PtrSize ||
+ unsafe.Offsetof(finalizer{}.fn) != 0 ||
+ unsafe.Offsetof(finalizer{}.arg) != goarch.PtrSize ||
+ unsafe.Offsetof(finalizer{}.nret) != 2*goarch.PtrSize ||
+ unsafe.Offsetof(finalizer{}.fint) != 3*goarch.PtrSize ||
+ unsafe.Offsetof(finalizer{}.ot) != 4*goarch.PtrSize) {
+ throw("finalizer out of sync")
+ }
+ for i := range finptrmask {
+ finptrmask[i] = finalizer1[i%len(finalizer1)]
+ }
+ }
+ }
+ block := finc
+ finc = block.next
+ block.next = finq
+ finq = block
+ }
+ f := &finq.fin[finq.cnt]
+ atomic.Xadd(&finq.cnt, +1) // Sync with markroots
+ f.fn = fn
+ f.nret = nret
+ f.fint = fint
+ f.ot = ot
+ f.arg = p
+ unlock(&finlock)
+ fingStatus.Or(fingWake)
+}
+
+//go:nowritebarrier
+func iterate_finq(callback func(*funcval, unsafe.Pointer, uintptr, *_type, *ptrtype)) {
+ for fb := allfin; fb != nil; fb = fb.alllink {
+ for i := uint32(0); i < fb.cnt; i++ {
+ f := &fb.fin[i]
+ callback(f.fn, f.arg, f.nret, f.fint, f.ot)
+ }
+ }
+}
+
+func wakefing() *g {
+ if ok := fingStatus.CompareAndSwap(fingCreated|fingWait|fingWake, fingCreated); ok {
+ return fing
+ }
+ return nil
+}
+
+func createfing() {
+ // start the finalizer goroutine exactly once
+ if fingStatus.Load() == fingUninitialized && fingStatus.CompareAndSwap(fingUninitialized, fingCreated) {
+ go runfinq()
+ }
+}
+
+func finalizercommit(gp *g, lock unsafe.Pointer) bool {
+ unlock((*mutex)(lock))
+ // fingStatus should be modified after fing is put into a waiting state
+ // to avoid waking fing in running state, even if it is about to be parked.
+ fingStatus.Or(fingWait)
+ return true
+}
+
+// This is the goroutine that runs all of the finalizers.
+func runfinq() {
+ var (
+ frame unsafe.Pointer
+ framecap uintptr
+ argRegs int
+ )
+
+ gp := getg()
+ lock(&finlock)
+ fing = gp
+ unlock(&finlock)
+
+ for {
+ lock(&finlock)
+ fb := finq
+ finq = nil
+ if fb == nil {
+ gopark(finalizercommit, unsafe.Pointer(&finlock), waitReasonFinalizerWait, traceBlockSystemGoroutine, 1)
+ continue
+ }
+ argRegs = intArgRegs
+ unlock(&finlock)
+ if raceenabled {
+ racefingo()
+ }
+ for fb != nil {
+ for i := fb.cnt; i > 0; i-- {
+ f := &fb.fin[i-1]
+
+ var regs abi.RegArgs
+ // The args may be passed in registers or on stack. Even for
+ // the register case, we still need the spill slots.
+ // TODO: revisit if we remove spill slots.
+ //
+ // Unfortunately, because we can have an arbitrary
+ // number of return values and it would be complex to try to
+ // figure out how many of those could be passed in registers,
+ // we just conservatively assume none of them are.
+ framesz := unsafe.Sizeof((any)(nil)) + f.nret
+ if framecap < framesz {
+ // The frame does not contain pointers interesting for GC,
+ // all not yet finalized objects are stored in finq.
+ // If we do not mark it as FlagNoScan,
+ // the last finalized object is not collected.
+ frame = mallocgc(framesz, nil, true)
+ framecap = framesz
+ }
+
+ if f.fint == nil {
+ throw("missing type in runfinq")
+ }
+ r := frame
+ if argRegs > 0 {
+ r = unsafe.Pointer(&regs.Ints)
+ } else {
+ // frame is effectively uninitialized
+ // memory. That means we have to clear
+ // it before writing to it to avoid
+ // confusing the write barrier.
+ *(*[2]uintptr)(frame) = [2]uintptr{}
+ }
+ switch f.fint.Kind_ & kindMask {
+ case kindPtr:
+ // direct use of pointer
+ *(*unsafe.Pointer)(r) = f.arg
+ case kindInterface:
+ ityp := (*interfacetype)(unsafe.Pointer(f.fint))
+ // set up with empty interface
+ (*eface)(r)._type = &f.ot.Type
+ (*eface)(r).data = f.arg
+ if len(ityp.Methods) != 0 {
+ // convert to interface with methods
+ // this conversion is guaranteed to succeed - we checked in SetFinalizer
+ (*iface)(r).tab = assertE2I(ityp, (*eface)(r)._type)
+ }
+ default:
+ throw("bad kind in runfinq")
+ }
+ fingStatus.Or(fingRunningFinalizer)
+ reflectcall(nil, unsafe.Pointer(f.fn), frame, uint32(framesz), uint32(framesz), uint32(framesz), &regs)
+ fingStatus.And(^fingRunningFinalizer)
+
+ // Drop finalizer queue heap references
+ // before hiding them from markroot.
+ // This also ensures these will be
+ // clear if we reuse the finalizer.
+ f.fn = nil
+ f.arg = nil
+ f.ot = nil
+ atomic.Store(&fb.cnt, i-1)
+ }
+ next := fb.next
+ lock(&finlock)
+ fb.next = finc
+ finc = fb
+ unlock(&finlock)
+ fb = next
+ }
+ }
+}
+
+func isGoPointerWithoutSpan(p unsafe.Pointer) bool {
+ // 0-length objects are okay.
+ if p == unsafe.Pointer(&zerobase) {
+ return true
+ }
+
+ // Global initializers might be linker-allocated.
+ // var Foo = &Object{}
+ // func main() {
+ // runtime.SetFinalizer(Foo, nil)
+ // }
+ // The relevant segments are: noptrdata, data, bss, noptrbss.
+ // We cannot assume they are in any order or even contiguous,
+ // due to external linking.
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.noptrdata <= uintptr(p) && uintptr(p) < datap.enoptrdata ||
+ datap.data <= uintptr(p) && uintptr(p) < datap.edata ||
+ datap.bss <= uintptr(p) && uintptr(p) < datap.ebss ||
+ datap.noptrbss <= uintptr(p) && uintptr(p) < datap.enoptrbss {
+ return true
+ }
+ }
+ return false
+}
+
+// SetFinalizer sets the finalizer associated with obj to the provided
+// finalizer function. When the garbage collector finds an unreachable block
+// with an associated finalizer, it clears the association and runs
+// finalizer(obj) in a separate goroutine. This makes obj reachable again,
+// but now without an associated finalizer. Assuming that SetFinalizer
+// is not called again, the next time the garbage collector sees
+// that obj is unreachable, it will free obj.
+//
+// SetFinalizer(obj, nil) clears any finalizer associated with obj.
+//
+// The argument obj must be a pointer to an object allocated by calling
+// new, by taking the address of a composite literal, or by taking the
+// address of a local variable.
+// The argument finalizer must be a function that takes a single argument
+// to which obj's type can be assigned, and can have arbitrary ignored return
+// values. If either of these is not true, SetFinalizer may abort the
+// program.
+//
+// Finalizers are run in dependency order: if A points at B, both have
+// finalizers, and they are otherwise unreachable, only the finalizer
+// for A runs; once A is freed, the finalizer for B can run.
+// If a cyclic structure includes a block with a finalizer, that
+// cycle is not guaranteed to be garbage collected and the finalizer
+// is not guaranteed to run, because there is no ordering that
+// respects the dependencies.
+//
+// The finalizer is scheduled to run at some arbitrary time after the
+// program can no longer reach the object to which obj points.
+// There is no guarantee that finalizers will run before a program exits,
+// so typically they are useful only for releasing non-memory resources
+// associated with an object during a long-running program.
+// For example, an os.File object could use a finalizer to close the
+// associated operating system file descriptor when a program discards
+// an os.File without calling Close, but it would be a mistake
+// to depend on a finalizer to flush an in-memory I/O buffer such as a
+// bufio.Writer, because the buffer would not be flushed at program exit.
+//
+// It is not guaranteed that a finalizer will run if the size of *obj is
+// zero bytes, because it may share the same address with other zero-size
+// objects in memory. See https://go.dev/ref/spec#Size_and_alignment_guarantees.
+//
+// It is not guaranteed that a finalizer will run for objects allocated
+// in initializers for package-level variables. Such objects may be
+// linker-allocated, not heap-allocated.
+//
+// Note that because finalizers may execute arbitrarily far into the future
+// after an object is no longer referenced, the runtime is allowed to perform
+// a space-saving optimization that batches objects together in a single
+// allocation slot. The finalizer for an unreferenced object in such an
+// allocation may never run if it always exists in the same batch as a
+// referenced object. Typically, this batching only happens for tiny
+// (on the order of 16 bytes or less) and pointer-free objects.
+//
+// A finalizer may run as soon as an object becomes unreachable.
+// In order to use finalizers correctly, the program must ensure that
+// the object is reachable until it is no longer required.
+// Objects stored in global variables, or that can be found by tracing
+// pointers from a global variable, are reachable. For other objects,
+// pass the object to a call of the KeepAlive function to mark the
+// last point in the function where the object must be reachable.
+//
+// For example, if p points to a struct, such as os.File, that contains
+// a file descriptor d, and p has a finalizer that closes that file
+// descriptor, and if the last use of p in a function is a call to
+// syscall.Write(p.d, buf, size), then p may be unreachable as soon as
+// the program enters syscall.Write. The finalizer may run at that moment,
+// closing p.d, causing syscall.Write to fail because it is writing to
+// a closed file descriptor (or, worse, to an entirely different
+// file descriptor opened by a different goroutine). To avoid this problem,
+// call KeepAlive(p) after the call to syscall.Write.
+//
+// A single goroutine runs all finalizers for a program, sequentially.
+// If a finalizer must run for a long time, it should do so by starting
+// a new goroutine.
+//
+// In the terminology of the Go memory model, a call
+// SetFinalizer(x, f) “synchronizes before” the finalization call f(x).
+// However, there is no guarantee that KeepAlive(x) or any other use of x
+// “synchronizes before” f(x), so in general a finalizer should use a mutex
+// or other synchronization mechanism if it needs to access mutable state in x.
+// For example, consider a finalizer that inspects a mutable field in x
+// that is modified from time to time in the main program before x
+// becomes unreachable and the finalizer is invoked.
+// The modifications in the main program and the inspection in the finalizer
+// need to use appropriate synchronization, such as mutexes or atomic updates,
+// to avoid read-write races.
+func SetFinalizer(obj any, finalizer any) {
+ if debug.sbrk != 0 {
+ // debug.sbrk never frees memory, so no finalizers run
+ // (and we don't have the data structures to record them).
+ return
+ }
+ e := efaceOf(&obj)
+ etyp := e._type
+ if etyp == nil {
+ throw("runtime.SetFinalizer: first argument is nil")
+ }
+ if etyp.Kind_&kindMask != kindPtr {
+ throw("runtime.SetFinalizer: first argument is " + toRType(etyp).string() + ", not pointer")
+ }
+ ot := (*ptrtype)(unsafe.Pointer(etyp))
+ if ot.Elem == nil {
+ throw("nil elem type!")
+ }
+
+ if inUserArenaChunk(uintptr(e.data)) {
+ // Arena-allocated objects are not eligible for finalizers.
+ throw("runtime.SetFinalizer: first argument was allocated into an arena")
+ }
+
+ // find the containing object
+ base, _, _ := findObject(uintptr(e.data), 0, 0)
+
+ if base == 0 {
+ if isGoPointerWithoutSpan(e.data) {
+ return
+ }
+ throw("runtime.SetFinalizer: pointer not in allocated block")
+ }
+
+ if uintptr(e.data) != base {
+ // As an implementation detail, we allow setting finalizers on an inner byte
+ // of an object if it could come from a tiny alloc (see mallocgc for details).
+ if ot.Elem == nil || ot.Elem.PtrBytes != 0 || ot.Elem.Size_ >= maxTinySize {
+ throw("runtime.SetFinalizer: pointer not at beginning of allocated block")
+ }
+ }
+
+ f := efaceOf(&finalizer)
+ ftyp := f._type
+ if ftyp == nil {
+ // switch to system stack and remove finalizer
+ systemstack(func() {
+ removefinalizer(e.data)
+ })
+ return
+ }
+
+ if ftyp.Kind_&kindMask != kindFunc {
+ throw("runtime.SetFinalizer: second argument is " + toRType(ftyp).string() + ", not a function")
+ }
+ ft := (*functype)(unsafe.Pointer(ftyp))
+ if ft.IsVariadic() {
+ throw("runtime.SetFinalizer: cannot pass " + toRType(etyp).string() + " to finalizer " + toRType(ftyp).string() + " because dotdotdot")
+ }
+ if ft.InCount != 1 {
+ throw("runtime.SetFinalizer: cannot pass " + toRType(etyp).string() + " to finalizer " + toRType(ftyp).string())
+ }
+ fint := ft.InSlice()[0]
+ switch {
+ case fint == etyp:
+ // ok - same type
+ goto okarg
+ case fint.Kind_&kindMask == kindPtr:
+ if (fint.Uncommon() == nil || etyp.Uncommon() == nil) && (*ptrtype)(unsafe.Pointer(fint)).Elem == ot.Elem {
+ // ok - not same type, but both pointers,
+ // one or the other is unnamed, and same element type, so assignable.
+ goto okarg
+ }
+ case fint.Kind_&kindMask == kindInterface:
+ ityp := (*interfacetype)(unsafe.Pointer(fint))
+ if len(ityp.Methods) == 0 {
+ // ok - satisfies empty interface
+ goto okarg
+ }
+ if iface := assertE2I2(ityp, *efaceOf(&obj)); iface.tab != nil {
+ goto okarg
+ }
+ }
+ throw("runtime.SetFinalizer: cannot pass " + toRType(etyp).string() + " to finalizer " + toRType(ftyp).string())
+okarg:
+ // compute size needed for return parameters
+ nret := uintptr(0)
+ for _, t := range ft.OutSlice() {
+ nret = alignUp(nret, uintptr(t.Align_)) + uintptr(t.Size_)
+ }
+ nret = alignUp(nret, goarch.PtrSize)
+
+ // make sure we have a finalizer goroutine
+ createfing()
+
+ systemstack(func() {
+ if !addfinalizer(e.data, (*funcval)(f.data), nret, fint, ot) {
+ throw("runtime.SetFinalizer: finalizer already set")
+ }
+ })
+}
+
+// Mark KeepAlive as noinline so that it is easily detectable as an intrinsic.
+//
+//go:noinline
+
+// KeepAlive marks its argument as currently reachable.
+// This ensures that the object is not freed, and its finalizer is not run,
+// before the point in the program where KeepAlive is called.
+//
+// A very simplified example showing where KeepAlive is required:
+//
+// type File struct { d int }
+// d, err := syscall.Open("/file/path", syscall.O_RDONLY, 0)
+// // ... do something if err != nil ...
+// p := &File{d}
+// runtime.SetFinalizer(p, func(p *File) { syscall.Close(p.d) })
+// var buf [10]byte
+// n, err := syscall.Read(p.d, buf[:])
+// // Ensure p is not finalized until Read returns.
+// runtime.KeepAlive(p)
+// // No more uses of p after this point.
+//
+// Without the KeepAlive call, the finalizer could run at the start of
+// syscall.Read, closing the file descriptor before syscall.Read makes
+// the actual system call.
+//
+// Note: KeepAlive should only be used to prevent finalizers from
+// running prematurely. In particular, when used with unsafe.Pointer,
+// the rules for valid uses of unsafe.Pointer still apply.
+func KeepAlive(x any) {
+ // Introduce a use of x that the compiler can't eliminate.
+ // This makes sure x is alive on entry. We need x to be alive
+ // on entry for "defer runtime.KeepAlive(x)"; see issue 21402.
+ if cgoAlwaysFalse {
+ println(x)
+ }
+}
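
Tying the SetFinalizer and KeepAlive documentation above together, here is a hedged, Unix-only sketch of the documented pattern for a hypothetical fileHandle wrapper around a raw descriptor; it is an illustration of the usage described above, not code from this patch.

package main

import (
	"runtime"
	"syscall"
)

// fileHandle is a hypothetical wrapper around a raw file descriptor.
type fileHandle struct {
	fd int
}

func open(path string) (*fileHandle, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	h := &fileHandle{fd: fd}
	// Best-effort safety net: close the descriptor if the handle is dropped
	// without an explicit Close. Callers should still close explicitly;
	// finalizers are not guaranteed to run before program exit.
	runtime.SetFinalizer(h, func(h *fileHandle) { syscall.Close(h.fd) })
	return h, nil
}

func main() {
	h, err := open("/etc/hostname")
	if err != nil {
		return
	}
	var buf [64]byte
	syscall.Read(h.fd, buf[:])
	// Keep h reachable until Read has returned, so the finalizer cannot
	// close h.fd while the system call is still using it.
	runtime.KeepAlive(h)
}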
diff --git a/src/runtime/mfinal_test.go b/src/runtime/mfinal_test.go
new file mode 100644
index 0000000..87d31c4
--- /dev/null
+++ b/src/runtime/mfinal_test.go
@@ -0,0 +1,250 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type Tintptr *int // assignable to *int
+type Tint int // *Tint implements Tinter, interface{}
+
+func (t *Tint) m() {}
+
+type Tinter interface {
+ m()
+}
+
+func TestFinalizerType(t *testing.T) {
+ ch := make(chan bool, 10)
+ finalize := func(x *int) {
+ if *x != 97531 {
+ t.Errorf("finalizer %d, want %d", *x, 97531)
+ }
+ ch <- true
+ }
+
+ var finalizerTests = []struct {
+ convert func(*int) any
+ finalizer any
+ }{
+ {func(x *int) any { return x }, func(v *int) { finalize(v) }},
+ {func(x *int) any { return Tintptr(x) }, func(v Tintptr) { finalize(v) }},
+ {func(x *int) any { return Tintptr(x) }, func(v *int) { finalize(v) }},
+ {func(x *int) any { return (*Tint)(x) }, func(v *Tint) { finalize((*int)(v)) }},
+ {func(x *int) any { return (*Tint)(x) }, func(v Tinter) { finalize((*int)(v.(*Tint))) }},
+ // Test case for argument spill slot.
+ // If the spill slot was not counted for the frame size, it will (incorrectly) choose
+ // call32 as the result has (exactly) 32 bytes. When the argument actually spills,
+ // it clobbers the caller's frame (likely the return PC).
+ {func(x *int) any { return x }, func(v any) [4]int64 {
+ print() // force spill
+ finalize(v.(*int))
+ return [4]int64{}
+ }},
+ }
+
+ for _, tt := range finalizerTests {
+ done := make(chan bool, 1)
+ go func() {
+ // allocate struct with pointer to avoid hitting tinyalloc.
+ // Otherwise we can't be sure when the allocation will
+ // be freed.
+ type T struct {
+ v int
+ p unsafe.Pointer
+ }
+ v := &new(T).v
+ *v = 97531
+ runtime.SetFinalizer(tt.convert(v), tt.finalizer)
+ v = nil
+ done <- true
+ }()
+ <-done
+ runtime.GC()
+ <-ch
+ }
+}
+
+type bigValue struct {
+ fill uint64
+ it bool
+ up string
+}
+
+func TestFinalizerInterfaceBig(t *testing.T) {
+ ch := make(chan bool)
+ done := make(chan bool, 1)
+ go func() {
+ v := &bigValue{0xDEADBEEFDEADBEEF, true, "It matters not how strait the gate"}
+ old := *v
+ runtime.SetFinalizer(v, func(v any) {
+ i, ok := v.(*bigValue)
+ if !ok {
+ t.Errorf("finalizer called with type %T, want *bigValue", v)
+ }
+ if *i != old {
+ t.Errorf("finalizer called with %+v, want %+v", *i, old)
+ }
+ close(ch)
+ })
+ v = nil
+ done <- true
+ }()
+ <-done
+ runtime.GC()
+ <-ch
+}
+
+func fin(v *int) {
+}
+
+// Verify we don't crash at least. golang.org/issue/6857
+func TestFinalizerZeroSizedStruct(t *testing.T) {
+ type Z struct{}
+ z := new(Z)
+ runtime.SetFinalizer(z, func(*Z) {})
+}
+
+func BenchmarkFinalizer(b *testing.B) {
+ const Batch = 1000
+ b.RunParallel(func(pb *testing.PB) {
+ var data [Batch]*int
+ for i := 0; i < Batch; i++ {
+ data[i] = new(int)
+ }
+ for pb.Next() {
+ for i := 0; i < Batch; i++ {
+ runtime.SetFinalizer(data[i], fin)
+ }
+ for i := 0; i < Batch; i++ {
+ runtime.SetFinalizer(data[i], nil)
+ }
+ }
+ })
+}
+
+func BenchmarkFinalizerRun(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ v := new(int)
+ runtime.SetFinalizer(v, fin)
+ }
+ })
+}
+
+// One chunk must be exactly one sizeclass in size.
+// It should be a sizeclass not used much by others, so we
+// have a greater chance of finding adjacent ones.
+// size class 19: 320 byte objects, 25 per page, 1 page alloc at a time
+const objsize = 320
+
+type objtype [objsize]byte
+
+func adjChunks() (*objtype, *objtype) {
+ var s []*objtype
+
+ for {
+ c := new(objtype)
+ for _, d := range s {
+ if uintptr(unsafe.Pointer(c))+unsafe.Sizeof(*c) == uintptr(unsafe.Pointer(d)) {
+ return c, d
+ }
+ if uintptr(unsafe.Pointer(d))+unsafe.Sizeof(*c) == uintptr(unsafe.Pointer(c)) {
+ return d, c
+ }
+ }
+ s = append(s, c)
+ }
+}
+
+// Make sure an empty slice on the stack doesn't pin the next object in memory.
+func TestEmptySlice(t *testing.T) {
+ x, y := adjChunks()
+
+ // the pointer inside xs points to y.
+ xs := x[objsize:] // change objsize to objsize-1 and the test passes
+
+ fin := make(chan bool, 1)
+ runtime.SetFinalizer(y, func(z *objtype) { fin <- true })
+ runtime.GC()
+ <-fin
+ xsglobal = xs // keep empty slice alive until here
+}
+
+var xsglobal []byte
+
+func adjStringChunk() (string, *objtype) {
+ b := make([]byte, objsize)
+ for {
+ s := string(b)
+ t := new(objtype)
+ p := *(*uintptr)(unsafe.Pointer(&s))
+ q := uintptr(unsafe.Pointer(t))
+ if p+objsize == q {
+ return s, t
+ }
+ }
+}
+
+// Make sure an empty string on the stack doesn't pin the next object in memory.
+func TestEmptyString(t *testing.T) {
+ x, y := adjStringChunk()
+
+ ss := x[objsize:] // change objsize to objsize-1 and the test passes
+ fin := make(chan bool, 1)
+ // set finalizer on string contents of y
+ runtime.SetFinalizer(y, func(z *objtype) { fin <- true })
+ runtime.GC()
+ <-fin
+ ssglobal = ss // keep 0-length string live until here
+}
+
+var ssglobal string
+
+// Test for issue 7656.
+func TestFinalizerOnGlobal(t *testing.T) {
+ runtime.SetFinalizer(Foo1, func(p *Object1) {})
+ runtime.SetFinalizer(Foo2, func(p *Object2) {})
+ runtime.SetFinalizer(Foo1, nil)
+ runtime.SetFinalizer(Foo2, nil)
+}
+
+type Object1 struct {
+ Something []byte
+}
+
+type Object2 struct {
+ Something byte
+}
+
+var (
+ Foo2 = &Object2{}
+ Foo1 = &Object1{}
+)
+
+func TestDeferKeepAlive(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ // See issue 21402.
+ t.Parallel()
+ type T *int // needs to be a pointer base type to avoid tinyalloc and its never-finalized behavior.
+ x := new(T)
+ finRun := false
+ runtime.SetFinalizer(x, func(x *T) {
+ finRun = true
+ })
+ defer runtime.KeepAlive(x)
+ runtime.GC()
+ time.Sleep(time.Second)
+ if finRun {
+ t.Errorf("finalizer ran prematurely")
+ }
+}
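+
+// Illustrative sketch (not part of the tests above): the pattern that
+// TestDeferKeepAlive is guarding. Without the deferred KeepAlive, x may become
+// unreachable as soon as its last use ends, so a finalizer could run while the
+// resource it guards is still in use. The resource type and doWork helper here
+// are hypothetical.
+//
+//	func useResource() {
+//		x := new(resource)
+//		runtime.SetFinalizer(x, func(r *resource) { r.close() })
+//		defer runtime.KeepAlive(x) // x stays reachable until the function returns
+//		doWork(x.handle)           // uses only the raw handle, not x itself
+//	}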
diff --git a/src/runtime/mfixalloc.go b/src/runtime/mfixalloc.go
new file mode 100644
index 0000000..1a249e5
--- /dev/null
+++ b/src/runtime/mfixalloc.go
@@ -0,0 +1,111 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Fixed-size object allocator. Returned memory is zeroed by default; see the zero flag on fixalloc.
+//
+// See malloc.go for overview.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// fixalloc is a simple free-list allocator for fixed size objects.
+// Malloc uses a FixAlloc wrapped around sysAlloc to manage its
+// mcache and mspan objects.
+//
+// Memory returned by fixalloc.alloc is zeroed by default, but the
+// caller may take responsibility for zeroing allocations by setting
+// the zero flag to false. This is only safe if the memory never
+// contains heap pointers.
+//
+// The caller is responsible for locking around FixAlloc calls.
+// Callers can keep state in the object but the first word is
+// smashed by freeing and reallocating.
+//
+// Consider marking fixalloc'd types not in heap by embedding
+// runtime/internal/sys.NotInHeap.
+type fixalloc struct {
+ size uintptr
+ first func(arg, p unsafe.Pointer) // called first time p is returned
+ arg unsafe.Pointer
+ list *mlink
+ chunk uintptr // use uintptr instead of unsafe.Pointer to avoid write barriers
+ nchunk uint32 // bytes remaining in current chunk
+ nalloc uint32 // size of new chunks in bytes
+ inuse uintptr // in-use bytes now
+ stat *sysMemStat
+ zero bool // zero allocations
+}
+
+// A generic linked list of blocks. (Typically the block is bigger than sizeof(MLink).)
+// Since assignments to mlink.next will result in a write barrier being performed
+// this cannot be used by some of the internal GC structures. For example when
+// the sweeper is placing an unmarked object on the free list it does not want the
+// write barrier to be called since that could result in the object being reachable.
+type mlink struct {
+ _ sys.NotInHeap
+ next *mlink
+}
+
+// Initialize f to allocate objects of the given size,
+// using the allocator to obtain chunks of memory.
+func (f *fixalloc) init(size uintptr, first func(arg, p unsafe.Pointer), arg unsafe.Pointer, stat *sysMemStat) {
+ if size > _FixAllocChunk {
+ throw("runtime: fixalloc size too large")
+ }
+ if min := unsafe.Sizeof(mlink{}); size < min {
+ size = min
+ }
+
+ f.size = size
+ f.first = first
+ f.arg = arg
+ f.list = nil
+ f.chunk = 0
+ f.nchunk = 0
+ f.nalloc = uint32(_FixAllocChunk / size * size) // Round _FixAllocChunk down to an exact multiple of size to eliminate tail waste
+ f.inuse = 0
+ f.stat = stat
+ f.zero = true
+}
+
+func (f *fixalloc) alloc() unsafe.Pointer {
+ if f.size == 0 {
+ print("runtime: use of FixAlloc_Alloc before FixAlloc_Init\n")
+ throw("runtime: internal error")
+ }
+
+ if f.list != nil {
+ v := unsafe.Pointer(f.list)
+ f.list = f.list.next
+ f.inuse += f.size
+ if f.zero {
+ memclrNoHeapPointers(v, f.size)
+ }
+ return v
+ }
+ if uintptr(f.nchunk) < f.size {
+ f.chunk = uintptr(persistentalloc(uintptr(f.nalloc), 0, f.stat))
+ f.nchunk = f.nalloc
+ }
+
+ v := unsafe.Pointer(f.chunk)
+ if f.first != nil {
+ f.first(f.arg, v)
+ }
+ f.chunk = f.chunk + f.size
+ f.nchunk -= uint32(f.size)
+ f.inuse += f.size
+ return v
+}
+
+func (f *fixalloc) free(p unsafe.Pointer) {
+ f.inuse -= f.size
+ v := (*mlink)(p)
+ v.next = f.list
+ f.list = v
+}
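+
+// Illustrative sketch (not part of this file): the intended calling pattern for
+// fixalloc, per the type comment above. The owner struct, lock, and stat are
+// hypothetical stand-ins for what real callers (such as mheap) provide.
+//
+//	type owner struct {
+//		lock      mutex
+//		spanalloc fixalloc
+//		stat      sysMemStat
+//	}
+//
+//	func (o *owner) init() {
+//		// No first-use callback is needed in this example, so pass nil.
+//		o.spanalloc.init(unsafe.Sizeof(mspan{}), nil, nil, &o.stat)
+//	}
+//
+//	func (o *owner) allocSpan() *mspan {
+//		lock(&o.lock) // the caller is responsible for locking
+//		s := (*mspan)(o.spanalloc.alloc())
+//		unlock(&o.lock)
+//		return s
+//	}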
diff --git a/src/runtime/mgc.go b/src/runtime/mgc.go
new file mode 100644
index 0000000..a12dbfe
--- /dev/null
+++ b/src/runtime/mgc.go
@@ -0,0 +1,1818 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector (GC).
+//
+// The GC runs concurrently with mutator threads, is type accurate (aka precise), and allows multiple
+// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
+// non-generational and non-compacting. Allocation is done using size segregated per P allocation
+// areas to minimize fragmentation while eliminating locks in the common case.
+//
+// The algorithm decomposes into several steps.
+// This is a high level description of the algorithm being used. For an overview of GC a good
+// place to start is Richard Jones' gchandbook.org.
+//
+// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
+// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
+// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
+// 966-975.
+// For journal quality proofs that these steps are complete, correct, and terminate see
+// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
+// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
+//
+// 1. GC performs sweep termination.
+//
+// a. Stop the world. This causes all Ps to reach a GC safe-point.
+//
+// b. Sweep any unswept spans. There will only be unswept spans if
+// this GC cycle was forced before the expected time.
+//
+// 2. GC performs the mark phase.
+//
+// a. Prepare for the mark phase by setting gcphase to _GCmark
+// (from _GCoff), enabling the write barrier, enabling mutator
+// assists, and enqueueing root mark jobs. No objects may be
+// scanned until all Ps have enabled the write barrier, which is
+// accomplished using STW.
+//
+// b. Start the world. From this point, GC work is done by mark
+// workers started by the scheduler and by assists performed as
+// part of allocation. The write barrier shades both the
+// overwritten pointer and the new pointer value for any pointer
+// writes (see mbarrier.go for details). Newly allocated objects
+// are immediately marked black.
+//
+// c. GC performs root marking jobs. This includes scanning all
+// stacks, shading all globals, and shading any heap pointers in
+// off-heap runtime data structures. Scanning a stack stops a
+// goroutine, shades any pointers found on its stack, and then
+// resumes the goroutine.
+//
+// d. GC drains the work queue of grey objects, scanning each grey
+// object to black and shading all pointers found in the object
+// (which in turn may add those pointers to the work queue).
+//
+// e. Because GC work is spread across local caches, GC uses a
+// distributed termination algorithm to detect when there are no
+// more root marking jobs or grey objects (see gcMarkDone). At this
+// point, GC transitions to mark termination.
+//
+// 3. GC performs mark termination.
+//
+// a. Stop the world.
+//
+// b. Set gcphase to _GCmarktermination, and disable workers and
+// assists.
+//
+// c. Perform housekeeping like flushing mcaches.
+//
+// 4. GC performs the sweep phase.
+//
+// a. Prepare for the sweep phase by setting gcphase to _GCoff,
+// setting up sweep state and disabling the write barrier.
+//
+// b. Start the world. From this point on, newly allocated objects
+// are white, and allocating sweeps spans before use if necessary.
+//
+// c. GC does concurrent sweeping in the background and in response
+// to allocation. See description below.
+//
+// 5. When sufficient allocation has taken place, replay the sequence
+// starting with 1 above. See discussion of GC rate below.
+
+// Concurrent sweep.
+//
+// The sweep phase proceeds concurrently with normal program execution.
+// The heap is swept span-by-span both lazily (when a goroutine needs another span)
+// and concurrently in a background goroutine (this helps programs that are not CPU bound).
+// At the end of STW mark termination all spans are marked as "needs sweeping".
+//
+// The background sweeper goroutine simply sweeps spans one-by-one.
+//
+// To avoid requesting more OS memory while there are unswept spans, when a
+// goroutine needs another span, it first attempts to reclaim that much memory
+// by sweeping. When a goroutine needs to allocate a new small-object span, it
+// sweeps small-object spans for the same object size until it frees at least
+// one object. When a goroutine needs to allocate a large-object span from the heap,
+// it sweeps spans until it frees at least that many pages into the heap. There is
+// one case where this may not suffice: if a goroutine sweeps and frees two
+// nonadjacent one-page spans to the heap, it will allocate a new two-page
+// span, but there can still be other one-page unswept spans which could be
+// combined into a two-page span.
+//
+// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
+// mark bits in GC bitmap). During GC all mcaches are flushed into the central cache,
+// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
+// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
+// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
+// The finalizer goroutine is kicked off only when all spans are swept.
+// When the next GC starts, it sweeps all not-yet-swept spans (if any).
+
+// GC rate.
+// Next GC is after we've allocated an extra amount of memory proportional to
+// the amount already in use. The proportion is controlled by GOGC environment variable
+// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
+// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
+// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
+// (and also the amount of extra memory used).
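+//
+// Worked example of that rule (illustrative only; the goal computed by
+// gcController also accounts for stack and global scan work): with 4M live and
+// GOGC=100 the next goal is 4M + 4M*100/100 = 8M, and with GOGC=50 it is 6M.
+//
+//	func goalFor(heapLive uint64, gogc int64) uint64 {
+//		if gogc < 0 { // GOGC=off: no heap-growth trigger
+//			return ^uint64(0)
+//		}
+//		return heapLive + heapLive*uint64(gogc)/100
+//	}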
+
+// Oblets.
+//
+// In order to prevent long pauses while scanning large objects and to
+// improve parallelism, the garbage collector breaks up scan jobs for
+// objects larger than maxObletBytes into "oblets" of at most
+// maxObletBytes. When scanning encounters the beginning of a large
+// object, it scans only the first oblet and enqueues the remaining
+// oblets as new scan jobs.
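+//
+// Worked example (illustrative; assumes maxObletBytes is 128 KiB, as defined in
+// mgcmark.go): a 1 MiB object with pointers is scanned as 8 oblets. The first
+// 128 KiB is scanned when the object is first encountered, and the remaining
+// 7 oblets are enqueued as separate scan jobs.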
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _DebugGC = 0
+ _ConcurrentSweep = true
+ _FinBlockSize = 4 * 1024
+
+ // debugScanConservative enables debug logging for stack
+ // frames that are scanned conservatively.
+ debugScanConservative = false
+
+ // sweepMinHeapDistance is a lower bound on the heap distance
+ // (in bytes) reserved for concurrent sweeping between GC
+ // cycles.
+ sweepMinHeapDistance = 1024 * 1024
+)
+
+// heapObjectsCanMove always returns false in the current garbage collector.
+// It exists for go4.org/unsafe/assume-no-moving-gc, which is an
+// unfortunate idea that had an even more unfortunate implementation.
+// Every time a new Go release happened, the package stopped building,
+// and the authors had to add a new file with a new //go:build line, and
+// then the entire ecosystem of packages with that as a dependency had to
+// explicitly update to the new version. Many packages depend on
+// assume-no-moving-gc transitively, through paths like
+// inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc.
+// This was causing a significant amount of friction around each new
+// release, so we added this bool for the package to //go:linkname
+// instead. The bool is still unfortunate, but it's not as bad as
+// breaking the ecosystem on every new release.
+//
+// If the Go garbage collector ever does move heap objects, we can set
+// this to true to break all the programs using assume-no-moving-gc.
+//
+//go:linkname heapObjectsCanMove
+func heapObjectsCanMove() bool {
+ return false
+}
+
+func gcinit() {
+ if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
+ throw("size of Workbuf is suboptimal")
+ }
+ // No sweep on the first cycle.
+ sweep.active.state.Store(sweepDrainedMask)
+
+ // Initialize GC pacer state.
+ // Use the environment variable GOGC for the initial gcPercent value.
+ // Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
+ gcController.init(readGOGC(), readGOMEMLIMIT())
+
+ work.startSema = 1
+ work.markDoneSema = 1
+ lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
+ lockInit(&work.assistQueue.lock, lockRankAssistQueue)
+ lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
+}
+
+// gcenable is called after the bulk of the runtime initialization,
+// just before we're about to start letting user code run.
+// It kicks off the background sweeper goroutine, the background
+// scavenger goroutine, and enables GC.
+func gcenable() {
+ // Kick off sweeping and scavenging.
+ c := make(chan int, 2)
+ go bgsweep(c)
+ go bgscavenge(c)
+ <-c
+ <-c
+ memstats.enablegc = true // now that runtime is initialized, GC is okay
+}
+
+// Garbage collector phase.
+// Indicates to write barrier and synchronization task to perform.
+var gcphase uint32
+
+// The compiler knows about this variable.
+// If you change it, you must change builtin/runtime.go, too.
+// If you change the first four bytes, you must also change the write
+// barrier insertion code.
+var writeBarrier struct {
+ enabled bool // compiler emits a check of this before calling write barrier
+ pad [3]byte // compiler uses 32-bit load for "enabled" field
+ needed bool // identical to enabled, for now (TODO: dedup)
+ alignme uint64 // guarantee alignment so that compiler can use a 32 or 64-bit load
+}
+
+// gcBlackenEnabled is 1 if mutator assists and background mark
+// workers are allowed to blacken objects. This must only be set when
+// gcphase == _GCmark.
+var gcBlackenEnabled uint32
+
+const (
+ _GCoff = iota // GC not running; sweeping in background, write barrier disabled
+ _GCmark // GC marking roots and workbufs: allocate black, write barrier ENABLED
+ _GCmarktermination // GC mark termination: allocate black, P's help GC, write barrier ENABLED
+)
+
+//go:nosplit
+func setGCPhase(x uint32) {
+ atomic.Store(&gcphase, x)
+ writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
+ writeBarrier.enabled = writeBarrier.needed
+}
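+
+// Illustrative sketch of why flipping writeBarrier.enabled is sufficient: a
+// heap pointer store compiles, conceptually, into a guarded barrier call. See
+// mbarrier.go for the precise semantics; this is not the literal generated code.
+//
+//	// writePointer(slot, ptr):
+//	if writeBarrier.enabled {
+//		shade(*slot) // old value (deletion half of the barrier)
+//		shade(ptr)   // new value (insertion half of the barrier)
+//	}
+//	*slot = ptr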
+
+// gcMarkWorkerMode represents the mode that a concurrent mark worker
+// should operate in.
+//
+// Concurrent marking happens through four different mechanisms. One
+// is mutator assists, which happen in response to allocations and are
+// not scheduled. The other three are variations in the per-P mark
+// workers and are distinguished by gcMarkWorkerMode.
+type gcMarkWorkerMode int
+
+const (
+ // gcMarkWorkerNotWorker indicates that the next scheduled G is not
+ // starting work and the mode should be ignored.
+ gcMarkWorkerNotWorker gcMarkWorkerMode = iota
+
+ // gcMarkWorkerDedicatedMode indicates that the P of a mark
+ // worker is dedicated to running that mark worker. The mark
+ // worker should run without preemption.
+ gcMarkWorkerDedicatedMode
+
+ // gcMarkWorkerFractionalMode indicates that a P is currently
+ // running the "fractional" mark worker. The fractional worker
+ // is necessary when GOMAXPROCS*gcBackgroundUtilization is not
+ // an integer and using only dedicated workers would result in
+ // utilization too far from the target of gcBackgroundUtilization.
+ // The fractional worker should run until it is preempted and
+ // will be scheduled to pick up the fractional part of
+ // GOMAXPROCS*gcBackgroundUtilization.
+ gcMarkWorkerFractionalMode
+
+ // gcMarkWorkerIdleMode indicates that a P is running the mark
+ // worker because it has nothing else to do. The idle worker
+ // should run until it is preempted and account its time
+ // against gcController.idleMarkTime.
+ gcMarkWorkerIdleMode
+)
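+
+// Worked example (illustrative; assumes the usual 25% background utilization
+// target): with GOMAXPROCS=6 the target is 6*0.25 = 1.5 Ps of mark work, which
+// is not an integer, so one dedicated worker runs and the fractional mode picks
+// up the remaining half P of utilization.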
+
+// gcMarkWorkerModeStrings are the strings labels of gcMarkWorkerModes
+// to use in execution traces.
+var gcMarkWorkerModeStrings = [...]string{
+ "Not worker",
+ "GC (dedicated)",
+ "GC (fractional)",
+ "GC (idle)",
+}
+
+// pollFractionalWorkerExit reports whether a fractional mark worker
+// should self-preempt. It assumes it is called from the fractional
+// worker.
+func pollFractionalWorkerExit() bool {
+ // This should be kept in sync with the fractional worker
+ // scheduler logic in findRunnableGCWorker.
+ now := nanotime()
+ delta := now - gcController.markStartTime
+ if delta <= 0 {
+ return true
+ }
+ p := getg().m.p.ptr()
+ selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
+ // Add some slack to the utilization goal so that the
+ // fractional worker isn't behind again the instant it exits.
+ return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
+}
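+
+// Worked example for the check above (illustrative numbers): with a fractional
+// utilization goal of 0.05 and delta = 100ms since the mark phase started, the
+// worker self-preempts once its selfTime exceeds 1.2 * 0.05 * 100ms = 6ms.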
+
+var work workType
+
+type workType struct {
+ full lfstack // lock-free list of full blocks workbuf
+ _ cpu.CacheLinePad // prevents false-sharing between full and empty
+ empty lfstack // lock-free list of empty blocks workbuf
+ _ cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait
+
+ wbufSpans struct {
+ lock mutex
+ // free is a list of spans dedicated to workbufs, but
+ // that don't currently contain any workbufs.
+ free mSpanList
+ // busy is a list of all spans containing workbufs on
+ // one of the workbuf lists.
+ busy mSpanList
+ }
+
+ // Restore 64-bit alignment on 32-bit.
+ _ uint32
+
+ // bytesMarked is the number of bytes marked this cycle. This
+ // includes bytes blackened in scanned objects, noscan objects
+ // that go straight to black, and permagrey objects scanned by
+ // markroot during the concurrent scan phase. This is updated
+ // atomically during the cycle. Updates may be batched
+ // arbitrarily, since the value is only read at the end of the
+ // cycle.
+ //
+ // Because of benign races during marking, this number may not
+ // be the exact number of marked bytes, but it should be very
+ // close.
+ //
+ // Put this field here because it needs 64-bit atomic access
+ // (and thus 8-byte alignment even on 32-bit architectures).
+ bytesMarked uint64
+
+ markrootNext uint32 // next markroot job
+ markrootJobs uint32 // number of markroot jobs
+
+ nproc uint32
+ tstart int64
+ nwait uint32
+
+ // Number of roots of various root types. Set by gcMarkRootPrepare.
+ //
+ // nStackRoots == len(stackRoots), but we have nStackRoots for
+ // consistency.
+ nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int
+
+ // Base indexes of each root type. Set by gcMarkRootPrepare.
+ baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32
+
+ // stackRoots is a snapshot of all of the Gs that existed
+ // before the beginning of concurrent marking. The backing
+ // store of this must not be modified because it might be
+ // shared with allgs.
+ stackRoots []*g
+
+ // Each type of GC state transition is protected by a lock.
+ // Since multiple threads can simultaneously detect the state
+ // transition condition, any thread that detects a transition
+ // condition must acquire the appropriate transition lock,
+ // re-check the transition condition and return if it no
+ // longer holds or perform the transition if it does.
+ // Likewise, any transition must invalidate the transition
+ // condition before releasing the lock. This ensures that each
+ // transition is performed by exactly one thread and threads
+ // that need the transition to happen block until it has
+ // happened.
+ //
+ // startSema protects the transition from "off" to mark or
+ // mark termination.
+ startSema uint32
+ // markDoneSema protects transitions from mark to mark termination.
+ markDoneSema uint32
+
+ bgMarkReady note // signal background mark worker has started
+ bgMarkDone uint32 // cas to 1 when at a background mark completion point
+ // Background mark completion signaling
+
+ // mode is the concurrency mode of the current GC cycle.
+ mode gcMode
+
+ // userForced indicates the current GC cycle was forced by an
+ // explicit user call.
+ userForced bool
+
+ // initialHeapLive is the value of gcController.heapLive at the
+ // beginning of this GC cycle.
+ initialHeapLive uint64
+
+ // assistQueue is a queue of assists that are blocked because
+	// there was neither enough credit to steal nor enough work to
+ // do.
+ assistQueue struct {
+ lock mutex
+ q gQueue
+ }
+
+ // sweepWaiters is a list of blocked goroutines to wake when
+ // we transition from mark termination to sweep.
+ sweepWaiters struct {
+ lock mutex
+ list gList
+ }
+
+ // cycles is the number of completed GC cycles, where a GC
+ // cycle is sweep termination, mark, mark termination, and
+ // sweep. This differs from memstats.numgc, which is
+ // incremented at mark termination.
+ cycles atomic.Uint32
+
+ // Timing/utilization stats for this cycle.
+ stwprocs, maxprocs int32
+ tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start
+
+ pauseNS int64 // total STW time this cycle
+ pauseStart int64 // nanotime() of last STW
+
+ // debug.gctrace heap sizes for this cycle.
+ heap0, heap1, heap2 uint64
+
+ // Cumulative estimated CPU usage.
+ cpuStats
+}
+
+// GC runs a garbage collection and blocks the caller until the
+// garbage collection is complete. It may also block the entire
+// program.
+func GC() {
+ // We consider a cycle to be: sweep termination, mark, mark
+ // termination, and sweep. This function shouldn't return
+ // until a full cycle has been completed, from beginning to
+ // end. Hence, we always want to finish up the current cycle
+ // and start a new one. That means:
+ //
+ // 1. In sweep termination, mark, or mark termination of cycle
+ // N, wait until mark termination N completes and transitions
+ // to sweep N.
+ //
+ // 2. In sweep N, help with sweep N.
+ //
+ // At this point we can begin a full cycle N+1.
+ //
+ // 3. Trigger cycle N+1 by starting sweep termination N+1.
+ //
+ // 4. Wait for mark termination N+1 to complete.
+ //
+ // 5. Help with sweep N+1 until it's done.
+ //
+ // This all has to be written to deal with the fact that the
+ // GC may move ahead on its own. For example, when we block
+ // until mark termination N, we may wake up in cycle N+2.
+
+ // Wait until the current sweep termination, mark, and mark
+ // termination complete.
+ n := work.cycles.Load()
+ gcWaitOnMark(n)
+
+ // We're now in sweep N or later. Trigger GC cycle N+1, which
+ // will first finish sweep N if necessary and then enter sweep
+ // termination N+1.
+ gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
+
+ // Wait for mark termination N+1 to complete.
+ gcWaitOnMark(n + 1)
+
+ // Finish sweep N+1 before returning. We do this both to
+ // complete the cycle and because runtime.GC() is often used
+ // as part of tests and benchmarks to get the system into a
+ // relatively stable and isolated state.
+ for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ Gosched()
+ }
+
+ // Callers may assume that the heap profile reflects the
+ // just-completed cycle when this returns (historically this
+ // happened because this was a STW GC), but right now the
+ // profile still reflects mark termination N, not N+1.
+ //
+ // As soon as all of the sweep frees from cycle N+1 are done,
+ // we can go ahead and publish the heap profile.
+ //
+ // First, wait for sweeping to finish. (We know there are no
+ // more spans on the sweep queue, but we may be concurrently
+ // sweeping spans, so we have to wait.)
+ for work.cycles.Load() == n+1 && !isSweepDone() {
+ Gosched()
+ }
+
+ // Now we're really done with sweeping, so we can publish the
+ // stable heap profile. Only do this if we haven't already hit
+ // another mark termination.
+ mp := acquirem()
+ cycle := work.cycles.Load()
+ if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
+ mProf_PostSweep()
+ }
+ releasem(mp)
+}
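+
+// Illustrative usage from a client package (not part of the runtime): the
+// test/benchmark pattern the comment above refers to, which relies on GC
+// completing a full cycle, including sweep, before returning.
+//
+//	func measureLiveHeap() uint64 {
+//		runtime.GC() // blocks until the full cycle, including sweep, completes
+//		var ms runtime.MemStats
+//		runtime.ReadMemStats(&ms)
+//		return ms.HeapInuse
+//	}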
+
+// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
+// already completed this mark phase, it returns immediately.
+func gcWaitOnMark(n uint32) {
+ for {
+ // Disable phase transitions.
+ lock(&work.sweepWaiters.lock)
+ nMarks := work.cycles.Load()
+ if gcphase != _GCmark {
+ // We've already completed this cycle's mark.
+ nMarks++
+ }
+ if nMarks > n {
+ // We're done.
+ unlock(&work.sweepWaiters.lock)
+ return
+ }
+
+ // Wait until sweep termination, mark, and mark
+ // termination of cycle N complete.
+ work.sweepWaiters.list.push(getg())
+ goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1)
+ }
+}
+
+// gcMode indicates how concurrent a GC cycle should be.
+type gcMode int
+
+const (
+ gcBackgroundMode gcMode = iota // concurrent GC and sweep
+ gcForceMode // stop-the-world GC now, concurrent sweep
+ gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user)
+)
+
+// A gcTrigger is a predicate for starting a GC cycle. Specifically,
+// it is an exit condition for the _GCoff phase.
+type gcTrigger struct {
+ kind gcTriggerKind
+ now int64 // gcTriggerTime: current time
+ n uint32 // gcTriggerCycle: cycle number to start
+}
+
+type gcTriggerKind int
+
+const (
+ // gcTriggerHeap indicates that a cycle should be started when
+ // the heap size reaches the trigger heap size computed by the
+ // controller.
+ gcTriggerHeap gcTriggerKind = iota
+
+ // gcTriggerTime indicates that a cycle should be started when
+ // it's been more than forcegcperiod nanoseconds since the
+ // previous GC cycle.
+ gcTriggerTime
+
+ // gcTriggerCycle indicates that a cycle should be started if
+ // we have not yet started cycle number gcTrigger.n (relative
+ // to work.cycles).
+ gcTriggerCycle
+)
+
+// test reports whether the trigger condition is satisfied, meaning
+// that the exit condition for the _GCoff phase has been met. The exit
+// condition should be tested when allocating.
+func (t gcTrigger) test() bool {
+ if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff {
+ return false
+ }
+ switch t.kind {
+ case gcTriggerHeap:
+ // Non-atomic access to gcController.heapLive for performance. If
+ // we are going to trigger on this, this thread just
+ // atomically wrote gcController.heapLive anyway and we'll see our
+ // own write.
+ trigger, _ := gcController.trigger()
+ return gcController.heapLive.Load() >= trigger
+ case gcTriggerTime:
+ if gcController.gcPercent.Load() < 0 {
+ return false
+ }
+ lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
+ return lastgc != 0 && t.now-lastgc > forcegcperiod
+ case gcTriggerCycle:
+ // t.n > work.cycles, but accounting for wraparound.
+ return int32(t.n-work.cycles.Load()) > 0
+ }
+ return true
+}
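+
+// Illustrative sketch of how a trigger is used (the cycle-based form appears in
+// GC above): a caller constructs a gcTrigger value, and gcStart below re-checks
+// test() under the start semaphore, so a stale trigger simply starts no cycle.
+//
+//	t := gcTrigger{kind: gcTriggerTime, now: nanotime()}
+//	if t.test() {
+//		gcStart(t)
+//	}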
+
+// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
+// debug.gcstoptheworld == 0) or performs all of GC (if
+// debug.gcstoptheworld != 0).
+//
+// This may return without performing this transition in some cases,
+// such as when called on a system stack or with locks held.
+func gcStart(trigger gcTrigger) {
+ // Since this is called from malloc and malloc is called in
+ // the guts of a number of libraries that might be holding
+ // locks, don't attempt to start GC in non-preemptible or
+ // potentially unstable situations.
+ mp := acquirem()
+ if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
+ releasem(mp)
+ return
+ }
+ releasem(mp)
+ mp = nil
+
+ // Pick up the remaining unswept/not being swept spans concurrently
+ //
+ // This shouldn't happen if we're being invoked in background
+ // mode since proportional sweep should have just finished
+ // sweeping everything, but rounding errors, etc, may leave a
+ // few spans unswept. In forced mode, this is necessary since
+ // GC can be forced at any point in the sweeping cycle.
+ //
+ // We check the transition condition continuously here in case
+	// this G gets delayed into the next GC cycle.
+ for trigger.test() && sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ }
+
+ // Perform GC initialization and the sweep termination
+ // transition.
+ semacquire(&work.startSema)
+ // Re-check transition condition under transition lock.
+ if !trigger.test() {
+ semrelease(&work.startSema)
+ return
+ }
+
+ // In gcstoptheworld debug mode, upgrade the mode accordingly.
+ // We do this after re-checking the transition condition so
+ // that multiple goroutines that detect the heap trigger don't
+ // start multiple STW GCs.
+ mode := gcBackgroundMode
+ if debug.gcstoptheworld == 1 {
+ mode = gcForceMode
+ } else if debug.gcstoptheworld == 2 {
+ mode = gcForceBlockMode
+ }
+
+ // Ok, we're doing it! Stop everybody else
+ semacquire(&gcsema)
+ semacquire(&worldsema)
+
+ // For stats, check if this GC was forced by the user.
+ // Update it under gcsema to avoid gctrace getting wrong values.
+ work.userForced = trigger.kind == gcTriggerCycle
+
+ if traceEnabled() {
+ traceGCStart()
+ }
+
+ // Check that all Ps have finished deferred mcache flushes.
+ for _, p := range allp {
+ if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen {
+ println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
+ throw("p mcache not flushed")
+ }
+ }
+
+ gcBgMarkStartWorkers()
+
+ systemstack(gcResetMarkState)
+
+ work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
+ if work.stwprocs > ncpu {
+ // This is used to compute CPU time of the STW phases,
+ // so it can't be more than ncpu, even if GOMAXPROCS is.
+ work.stwprocs = ncpu
+ }
+ work.heap0 = gcController.heapLive.Load()
+ work.pauseNS = 0
+ work.mode = mode
+
+ now := nanotime()
+ work.tSweepTerm = now
+ work.pauseStart = now
+ systemstack(func() { stopTheWorldWithSema(stwGCSweepTerm) })
+ // Finish sweep before we start concurrent scan.
+ systemstack(func() {
+ finishsweep_m()
+ })
+
+	// clearpools before we start the GC. If we wait, the memory will not be
+ // reclaimed until the next GC cycle.
+ clearpools()
+
+ work.cycles.Add(1)
+
+ // Assists and workers can start the moment we start
+ // the world.
+ gcController.startCycle(now, int(gomaxprocs), trigger)
+
+ // Notify the CPU limiter that assists may begin.
+ gcCPULimiter.startGCTransition(true, now)
+
+ // In STW mode, disable scheduling of user Gs. This may also
+ // disable scheduling of this goroutine, so it may block as
+ // soon as we start the world again.
+ if mode != gcBackgroundMode {
+ schedEnableUser(false)
+ }
+
+ // Enter concurrent mark phase and enable
+ // write barriers.
+ //
+ // Because the world is stopped, all Ps will
+ // observe that write barriers are enabled by
+ // the time we start the world and begin
+ // scanning.
+ //
+ // Write barriers must be enabled before assists are
+ // enabled because they must be enabled before
+ // any non-leaf heap objects are marked. Since
+ // allocations are blocked until assists can
+	// happen, we want to enable assists as early as
+ // possible.
+ setGCPhase(_GCmark)
+
+ gcBgMarkPrepare() // Must happen before assist enable.
+ gcMarkRootPrepare()
+
+ // Mark all active tinyalloc blocks. Since we're
+ // allocating from these, they need to be black like
+ // other allocations. The alternative is to blacken
+ // the tiny block on every allocation from it, which
+ // would slow down the tiny allocator.
+ gcMarkTinyAllocs()
+
+ // At this point all Ps have enabled the write
+ // barrier, thus maintaining the no white to
+ // black invariant. Enable mutator assists to
+ // put back-pressure on fast allocating
+ // mutators.
+ atomic.Store(&gcBlackenEnabled, 1)
+
+ // In STW mode, we could block the instant systemstack
+ // returns, so make sure we're not preemptible.
+ mp = acquirem()
+
+ // Concurrent mark.
+ systemstack(func() {
+ now = startTheWorldWithSema()
+ work.pauseNS += now - work.pauseStart
+ work.tMark = now
+ memstats.gcPauseDist.record(now - work.pauseStart)
+
+ sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
+ work.cpuStats.gcPauseTime += sweepTermCpu
+ work.cpuStats.gcTotalTime += sweepTermCpu
+
+ // Release the CPU limiter.
+ gcCPULimiter.finishGCTransition(now)
+ })
+
+ // Release the world sema before Gosched() in STW mode
+ // because we will need to reacquire it later but before
+ // this goroutine becomes runnable again, and we could
+ // self-deadlock otherwise.
+ semrelease(&worldsema)
+ releasem(mp)
+
+ // Make sure we block instead of returning to user code
+ // in STW mode.
+ if mode != gcBackgroundMode {
+ Gosched()
+ }
+
+ semrelease(&work.startSema)
+}
+
+// gcMarkDoneFlushed counts the number of P's with flushed work.
+//
+// Ideally this would be a captured local in gcMarkDone, but forEachP
+// escapes its callback closure, so it can't capture anything.
+//
+// This is protected by markDoneSema.
+var gcMarkDoneFlushed uint32
+
+// gcMarkDone transitions the GC from mark to mark termination if all
+// reachable objects have been marked (that is, there are no grey
+// objects and can be no more in the future). Otherwise, it flushes
+// all local work to the global queues where it can be discovered by
+// other workers.
+//
+// This should be called when all local mark work has been drained and
+// there are no remaining workers. Specifically, when
+//
+// work.nwait == work.nproc && !gcMarkWorkAvailable(p)
+//
+// The calling context must be preemptible.
+//
+// Flushing local work is important because idle Ps may have local
+// work queued. This is the only way to make that work visible and
+// drive GC to completion.
+//
+// It is explicitly okay to have write barriers in this function. If
+// it does transition to mark termination, then all reachable objects
+// have been marked, so the write barrier cannot shade any more
+// objects.
+func gcMarkDone() {
+ // Ensure only one thread is running the ragged barrier at a
+ // time.
+ semacquire(&work.markDoneSema)
+
+top:
+ // Re-check transition condition under transition lock.
+ //
+ // It's critical that this checks the global work queues are
+ // empty before performing the ragged barrier. Otherwise,
+ // there could be global work that a P could take after the P
+ // has passed the ragged barrier.
+ if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
+ semrelease(&work.markDoneSema)
+ return
+ }
+
+ // forEachP needs worldsema to execute, and we'll need it to
+ // stop the world later, so acquire worldsema now.
+ semacquire(&worldsema)
+
+ // Flush all local buffers and collect flushedWork flags.
+ gcMarkDoneFlushed = 0
+ systemstack(func() {
+ gp := getg().m.curg
+ // Mark the user stack as preemptible so that it may be scanned.
+ // Otherwise, our attempt to force all P's to a safepoint could
+ // result in a deadlock as we attempt to preempt a worker that's
+ // trying to preempt us (e.g. for a stack scan).
+ casGToWaiting(gp, _Grunning, waitReasonGCMarkTermination)
+ forEachP(func(pp *p) {
+ // Flush the write barrier buffer, since this may add
+ // work to the gcWork.
+ wbBufFlush1(pp)
+
+ // Flush the gcWork, since this may create global work
+ // and set the flushedWork flag.
+ //
+ // TODO(austin): Break up these workbufs to
+ // better distribute work.
+ pp.gcw.dispose()
+ // Collect the flushedWork flag.
+ if pp.gcw.flushedWork {
+ atomic.Xadd(&gcMarkDoneFlushed, 1)
+ pp.gcw.flushedWork = false
+ }
+ })
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+
+ if gcMarkDoneFlushed != 0 {
+ // More grey objects were discovered since the
+ // previous termination check, so there may be more
+ // work to do. Keep going. It's possible the
+ // transition condition became true again during the
+ // ragged barrier, so re-check it.
+ semrelease(&worldsema)
+ goto top
+ }
+
+ // There was no global work, no local work, and no Ps
+ // communicated work since we took markDoneSema. Therefore
+ // there are no grey objects and no more objects can be
+ // shaded. Transition to mark termination.
+ now := nanotime()
+ work.tMarkTerm = now
+ work.pauseStart = now
+ getg().m.preemptoff = "gcing"
+ systemstack(func() { stopTheWorldWithSema(stwGCMarkTerm) })
+	// The gcphase is _GCmark; it will transition to _GCmarktermination
+ // below. The important thing is that the wb remains active until
+ // all marking is complete. This includes writes made by the GC.
+
+ // There is sometimes work left over when we enter mark termination due
+ // to write barriers performed after the completion barrier above.
+ // Detect this and resume concurrent mark. This is obviously
+ // unfortunate.
+ //
+ // See issue #27993 for details.
+ //
+ // Switch to the system stack to call wbBufFlush1, though in this case
+ // it doesn't matter because we're non-preemptible anyway.
+ restart := false
+ systemstack(func() {
+ for _, p := range allp {
+ wbBufFlush1(p)
+ if !p.gcw.empty() {
+ restart = true
+ break
+ }
+ }
+ })
+ if restart {
+ getg().m.preemptoff = ""
+ systemstack(func() {
+ now := startTheWorldWithSema()
+ work.pauseNS += now - work.pauseStart
+ memstats.gcPauseDist.record(now - work.pauseStart)
+ })
+ semrelease(&worldsema)
+ goto top
+ }
+
+ gcComputeStartingStackSize()
+
+ // Disable assists and background workers. We must do
+ // this before waking blocked assists.
+ atomic.Store(&gcBlackenEnabled, 0)
+
+ // Notify the CPU limiter that GC assists will now cease.
+ gcCPULimiter.startGCTransition(false, now)
+
+ // Wake all blocked assists. These will run when we
+ // start the world again.
+ gcWakeAllAssists()
+
+ // Likewise, release the transition lock. Blocked
+ // workers and assists will run when we start the
+ // world again.
+ semrelease(&work.markDoneSema)
+
+ // In STW mode, re-enable user goroutines. These will be
+ // queued to run after we start the world.
+ schedEnableUser(true)
+
+ // endCycle depends on all gcWork cache stats being flushed.
+	// The termination algorithm above ensured that they are flushed,
+	// up to allocations made since the ragged barrier.
+ gcController.endCycle(now, int(gomaxprocs), work.userForced)
+
+ // Perform mark termination. This will restart the world.
+ gcMarkTermination()
+}
+
+// World must be stopped and mark assists and background workers must be
+// disabled.
+func gcMarkTermination() {
+ // Start marktermination (write barrier remains enabled for now).
+ setGCPhase(_GCmarktermination)
+
+ work.heap1 = gcController.heapLive.Load()
+ startTime := nanotime()
+
+ mp := acquirem()
+ mp.preemptoff = "gcing"
+ mp.traceback = 2
+ curgp := mp.curg
+ casGToWaiting(curgp, _Grunning, waitReasonGarbageCollection)
+
+ // Run gc on the g0 stack. We do this so that the g stack
+ // we're currently running on will no longer change. Cuts
+ // the root set down a bit (g0 stacks are not scanned, and
+ // we don't need to scan gc's internal state). We also
+ // need to switch to g0 so we can shrink the stack.
+ systemstack(func() {
+ gcMark(startTime)
+ // Must return immediately.
+ // The outer function's stack may have moved
+ // during gcMark (it shrinks stacks, including the
+ // outer function's stack), so we must not refer
+ // to any of its variables. Return back to the
+ // non-system stack to pick up the new addresses
+ // before continuing.
+ })
+
+ systemstack(func() {
+ work.heap2 = work.bytesMarked
+ if debug.gccheckmark > 0 {
+ // Run a full non-parallel, stop-the-world
+ // mark using checkmark bits, to check that we
+ // didn't forget to mark anything during the
+ // concurrent mark process.
+ startCheckmarks()
+ gcResetMarkState()
+ gcw := &getg().m.p.ptr().gcw
+ gcDrain(gcw, 0)
+ wbBufFlush1(getg().m.p.ptr())
+ gcw.dispose()
+ endCheckmarks()
+ }
+
+ // marking is complete so we can turn the write barrier off
+ setGCPhase(_GCoff)
+ gcSweep(work.mode)
+ })
+
+ mp.traceback = 0
+ casgstatus(curgp, _Gwaiting, _Grunning)
+
+ if traceEnabled() {
+ traceGCDone()
+ }
+
+ // all done
+ mp.preemptoff = ""
+
+ if gcphase != _GCoff {
+ throw("gc done but gcphase != _GCoff")
+ }
+
+ // Record heapInUse for scavenger.
+ memstats.lastHeapInUse = gcController.heapInUse.load()
+
+ // Update GC trigger and pacing, as well as downstream consumers
+ // of this pacing information, for the next cycle.
+ systemstack(gcControllerCommit)
+
+ // Update timing memstats
+ now := nanotime()
+ sec, nsec, _ := time_now()
+ unixNow := sec*1e9 + int64(nsec)
+ work.pauseNS += now - work.pauseStart
+ work.tEnd = now
+ memstats.gcPauseDist.record(now - work.pauseStart)
+ atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
+ atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
+ memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
+ memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
+ memstats.pause_total_ns += uint64(work.pauseNS)
+
+ markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
+ work.cpuStats.gcPauseTime += markTermCpu
+ work.cpuStats.gcTotalTime += markTermCpu
+
+ // Accumulate CPU stats.
+ //
+ // Pass gcMarkPhase=true so we can get all the latest GC CPU stats in there too.
+ work.cpuStats.accumulate(now, true)
+
+ // Compute overall GC CPU utilization.
+ // Omit idle marking time from the overall utilization here since it's "free".
+ memstats.gc_cpu_fraction = float64(work.cpuStats.gcTotalTime-work.cpuStats.gcIdleTime) / float64(work.cpuStats.totalTime)
+
+ // Reset assist time and background time stats.
+ //
+ // Do this now, instead of at the start of the next GC cycle, because
+ // these two may keep accumulating even if the GC is not active.
+ scavenge.assistTime.Store(0)
+ scavenge.backgroundTime.Store(0)
+
+ // Reset idle time stat.
+ sched.idleTime.Store(0)
+
+ // Reset sweep state.
+ sweep.nbgsweep = 0
+ sweep.npausesweep = 0
+
+ if work.userForced {
+ memstats.numforcedgc++
+ }
+
+ // Bump GC cycle count and wake goroutines waiting on sweep.
+ lock(&work.sweepWaiters.lock)
+ memstats.numgc++
+ injectglist(&work.sweepWaiters.list)
+ unlock(&work.sweepWaiters.lock)
+
+ // Increment the scavenge generation now.
+ //
+ // This moment represents peak heap in use because we're
+ // about to start sweeping.
+ mheap_.pages.scav.index.nextGen()
+
+ // Release the CPU limiter.
+ gcCPULimiter.finishGCTransition(now)
+
+ // Finish the current heap profiling cycle and start a new
+ // heap profiling cycle. We do this before starting the world
+ // so events don't leak into the wrong cycle.
+ mProf_NextCycle()
+
+ // There may be stale spans in mcaches that need to be swept.
+ // Those aren't tracked in any sweep lists, so we need to
+ // count them against sweep completion until we ensure all
+ // those spans have been forced out.
+ sl := sweep.active.begin()
+ if !sl.valid {
+ throw("failed to set sweep barrier")
+ }
+
+ systemstack(func() { startTheWorldWithSema() })
+
+ // Flush the heap profile so we can start a new cycle next GC.
+ // This is relatively expensive, so we don't do it with the
+ // world stopped.
+ mProf_Flush()
+
+ // Prepare workbufs for freeing by the sweeper. We do this
+ // asynchronously because it can take non-trivial time.
+ prepareFreeWorkbufs()
+
+ // Free stack spans. This must be done between GC cycles.
+ systemstack(freeStackSpans)
+
+ // Ensure all mcaches are flushed. Each P will flush its own
+ // mcache before allocating, but idle Ps may not. Since this
+ // is necessary to sweep all spans, we need to ensure all
+ // mcaches are flushed before we start the next GC cycle.
+ //
+ // While we're here, flush the page cache for idle Ps to avoid
+ // having pages get stuck on them. These pages are hidden from
+ // the scavenger, so in small idle heaps a significant amount
+ // of additional memory might be held onto.
+ //
+ // Also, flush the pinner cache, to avoid leaking that memory
+ // indefinitely.
+ systemstack(func() {
+ forEachP(func(pp *p) {
+ pp.mcache.prepareForSweep()
+ if pp.status == _Pidle {
+ systemstack(func() {
+ lock(&mheap_.lock)
+ pp.pcache.flush(&mheap_.pages)
+ unlock(&mheap_.lock)
+ })
+ }
+ pp.pinnerCache = nil
+ })
+ })
+ // Now that we've swept stale spans in mcaches, they don't
+ // count against unswept spans.
+ sweep.active.end(sl)
+
+ // Print gctrace before dropping worldsema. As soon as we drop
+ // worldsema another cycle could start and smash the stats
+ // we're trying to print.
+ if debug.gctrace > 0 {
+ util := int(memstats.gc_cpu_fraction * 100)
+
+ var sbuf [24]byte
+ printlock()
+ print("gc ", memstats.numgc,
+ " @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
+ util, "%: ")
+ prev := work.tSweepTerm
+ for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
+ if i != 0 {
+ print("+")
+ }
+ print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
+ prev = ns
+ }
+ print(" ms clock, ")
+ for i, ns := range []int64{
+ int64(work.stwprocs) * (work.tMark - work.tSweepTerm),
+ gcController.assistTime.Load(),
+ gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(),
+ gcController.idleMarkTime.Load(),
+ markTermCpu,
+ } {
+ if i == 2 || i == 3 {
+ // Separate mark time components with /.
+ print("/")
+ } else if i != 0 {
+ print("+")
+ }
+ print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
+ }
+ print(" ms cpu, ",
+ work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
+ gcController.lastHeapGoal>>20, " MB goal, ",
+ gcController.lastStackScan.Load()>>20, " MB stacks, ",
+ gcController.globalsScan.Load()>>20, " MB globals, ",
+ work.maxprocs, " P")
+ if work.userForced {
+ print(" (forced)")
+ }
+ print("\n")
+ printunlock()
+ }
+
+ // Set any arena chunks that were deferred to fault.
+ lock(&userArenaState.lock)
+ faultList := userArenaState.fault
+ userArenaState.fault = nil
+ unlock(&userArenaState.lock)
+ for _, lc := range faultList {
+ lc.mspan.setUserArenaChunkToFault()
+ }
+
+ // Enable huge pages on some metadata if we cross a heap threshold.
+ if gcController.heapGoal() > minHeapForMetadataHugePages {
+ systemstack(func() {
+ mheap_.enableMetadataHugePages()
+ })
+ }
+
+ semrelease(&worldsema)
+ semrelease(&gcsema)
+ // Careful: another GC cycle may start now.
+
+ releasem(mp)
+ mp = nil
+
+ // now that gc is done, kick off finalizer thread if needed
+ if !concurrentSweep {
+ // give the queued finalizers, if any, a chance to run
+ Gosched()
+ }
+}
+
+// gcBgMarkStartWorkers prepares background mark worker goroutines. These
+// goroutines will not run until the mark phase, but they must be started while
+// the world is not stopped and from a regular G stack. The caller must hold
+// worldsema.
+func gcBgMarkStartWorkers() {
+ // Background marking is performed by per-P G's. Ensure that each P has
+ // a background GC G.
+ //
+ // Worker Gs don't exit if gomaxprocs is reduced. If it is raised
+ // again, we can reuse the old workers; no need to create new workers.
+ for gcBgMarkWorkerCount < gomaxprocs {
+ go gcBgMarkWorker()
+
+ notetsleepg(&work.bgMarkReady, -1)
+ noteclear(&work.bgMarkReady)
+ // The worker is now guaranteed to be added to the pool before
+ // its P's next findRunnableGCWorker.
+
+ gcBgMarkWorkerCount++
+ }
+}
+
+// gcBgMarkPrepare sets up state for background marking.
+// Mutator assists must not yet be enabled.
+func gcBgMarkPrepare() {
+ // Background marking will stop when the work queues are empty
+ // and there are no more workers (note that, since this is
+ // concurrent, this may be a transient state, but mark
+ // termination will clean it up). Between background workers
+ // and assists, we don't really know how many workers there
+ // will be, so we pretend to have an arbitrarily large number
+ // of workers, almost all of which are "waiting". While a
+ // worker is working it decrements nwait. If nproc == nwait,
+ // there are no workers.
+ work.nproc = ^uint32(0)
+ work.nwait = ^uint32(0)
+}
+
+// gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single
+// gcBgMarkWorker goroutine.
+type gcBgMarkWorkerNode struct {
+ // Unused workers are managed in a lock-free stack. This field must be first.
+ node lfnode
+
+ // The g of this worker.
+ gp guintptr
+
+ // Release this m on park. This is used to communicate with the unlock
+ // function, which cannot access the G's stack. It is unused outside of
+ // gcBgMarkWorker().
+ m muintptr
+}
+
+func gcBgMarkWorker() {
+ gp := getg()
+
+ // We pass node to a gopark unlock function, so it can't be on
+ // the stack (see gopark). Prevent deadlock from recursively
+ // starting GC by disabling preemption.
+ gp.m.preemptoff = "GC worker init"
+ node := new(gcBgMarkWorkerNode)
+ gp.m.preemptoff = ""
+
+ node.gp.set(gp)
+
+ node.m.set(acquirem())
+ notewakeup(&work.bgMarkReady)
+ // After this point, the background mark worker is generally scheduled
+ // cooperatively by gcController.findRunnableGCWorker. While performing
+ // work on the P, preemption is disabled because we are working on
+ // P-local work buffers. When the preempt flag is set, this puts itself
+ // into _Gwaiting to be woken up by gcController.findRunnableGCWorker
+ // at the appropriate time.
+ //
+ // When preemption is enabled (e.g., while in gcMarkDone), this worker
+ // may be preempted and schedule as a _Grunnable G from a runq. That is
+ // fine; it will eventually gopark again for further scheduling via
+ // findRunnableGCWorker.
+ //
+ // Since we disable preemption before notifying bgMarkReady, we
+ // guarantee that this G will be in the worker pool for the next
+ // findRunnableGCWorker. This isn't strictly necessary, but it reduces
+ // latency between _GCmark starting and the workers starting.
+
+ for {
+ // Go to sleep until woken by
+ // gcController.findRunnableGCWorker.
+ gopark(func(g *g, nodep unsafe.Pointer) bool {
+ node := (*gcBgMarkWorkerNode)(nodep)
+
+ if mp := node.m.ptr(); mp != nil {
+ // The worker G is no longer running; release
+ // the M.
+ //
+ // N.B. it is _safe_ to release the M as soon
+ // as we are no longer performing P-local mark
+ // work.
+ //
+ // However, since we cooperatively stop work
+ // when gp.preempt is set, if we releasem in
+ // the loop then the following call to gopark
+ // would immediately preempt the G. This is
+ // also safe, but inefficient: the G must
+ // schedule again only to enter gopark and park
+ // again. Thus, we defer the release until
+ // after parking the G.
+ releasem(mp)
+ }
+
+ // Release this G to the pool.
+ gcBgMarkWorkerPool.push(&node.node)
+ // Note that at this point, the G may immediately be
+ // rescheduled and may be running.
+ return true
+ }, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0)
+
+ // Preemption must not occur here, or another G might see
+ // p.gcMarkWorkerMode.
+
+ // Disable preemption so we can use the gcw. If the
+ // scheduler wants to preempt us, we'll stop draining,
+ // dispose the gcw, and then preempt.
+ node.m.set(acquirem())
+ pp := gp.m.p.ptr() // P can't change with preemption disabled.
+
+ if gcBlackenEnabled == 0 {
+ println("worker mode", pp.gcMarkWorkerMode)
+ throw("gcBgMarkWorker: blackening not enabled")
+ }
+
+ if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker {
+ throw("gcBgMarkWorker: mode not set")
+ }
+
+ startTime := nanotime()
+ pp.gcMarkWorkerStartTime = startTime
+ var trackLimiterEvent bool
+ if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode {
+ trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime)
+ }
+
+ decnwait := atomic.Xadd(&work.nwait, -1)
+ if decnwait == work.nproc {
+ println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
+ throw("work.nwait was > work.nproc")
+ }
+
+ systemstack(func() {
+ // Mark our goroutine preemptible so its stack
+ // can be scanned. This lets two mark workers
+ // scan each other (otherwise, they would
+ // deadlock). We must not modify anything on
+ // the G stack. However, stack shrinking is
+ // disabled for mark workers, so it is safe to
+ // read from the G stack.
+ casGToWaiting(gp, _Grunning, waitReasonGCWorkerActive)
+ switch pp.gcMarkWorkerMode {
+ default:
+ throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
+ case gcMarkWorkerDedicatedMode:
+ gcDrain(&pp.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ if gp.preempt {
+ // We were preempted. This is
+ // a useful signal to kick
+ // everything out of the run
+ // queue so it can run
+ // somewhere else.
+ if drainQ, n := runqdrain(pp); n > 0 {
+ lock(&sched.lock)
+ globrunqputbatch(&drainQ, int32(n))
+ unlock(&sched.lock)
+ }
+ }
+ // Go back to draining, this time
+ // without preemption.
+ gcDrain(&pp.gcw, gcDrainFlushBgCredit)
+ case gcMarkWorkerFractionalMode:
+ gcDrain(&pp.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ case gcMarkWorkerIdleMode:
+ gcDrain(&pp.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
+ }
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+
+ // Account for time and mark us as stopped.
+ now := nanotime()
+ duration := now - startTime
+ gcController.markWorkerStop(pp.gcMarkWorkerMode, duration)
+ if trackLimiterEvent {
+ pp.limiterEvent.stop(limiterEventIdleMarkWork, now)
+ }
+ if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode {
+ atomic.Xaddint64(&pp.gcFractionalMarkTime, duration)
+ }
+
+ // Was this the last worker and did we run out
+ // of work?
+ incnwait := atomic.Xadd(&work.nwait, +1)
+ if incnwait > work.nproc {
+ println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode,
+ "work.nwait=", incnwait, "work.nproc=", work.nproc)
+ throw("work.nwait > work.nproc")
+ }
+
+ // We'll releasem after this point and thus this P may run
+ // something else. We must clear the worker mode to avoid
+ // attributing the mode to a different (non-worker) G in
+ // traceGoStart.
+ pp.gcMarkWorkerMode = gcMarkWorkerNotWorker
+
+ // If this worker reached a background mark completion
+ // point, signal the main GC goroutine.
+ if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
+ // We don't need the P-local buffers here, allow
+ // preemption because we may schedule like a regular
+ // goroutine in gcMarkDone (block on locks, etc).
+ releasem(node.m.ptr())
+ node.m.set(nil)
+
+ gcMarkDone()
+ }
+ }
+}
+
+// gcMarkWorkAvailable reports whether executing a mark worker
+// on p is potentially useful. p may be nil, in which case it only
+// checks the global sources of work.
+func gcMarkWorkAvailable(p *p) bool {
+ if p != nil && !p.gcw.empty() {
+ return true
+ }
+ if !work.full.empty() {
+ return true // global work available
+ }
+ if work.markrootNext < work.markrootJobs {
+ return true // root scan work available
+ }
+ return false
+}
+
+// gcMark runs the mark (or, for concurrent GC, mark termination).
+// All gcWork caches must be empty.
+// STW is in effect at this point.
+func gcMark(startTime int64) {
+ if debug.allocfreetrace > 0 {
+ tracegc()
+ }
+
+ if gcphase != _GCmarktermination {
+ throw("in gcMark expecting to see gcphase as _GCmarktermination")
+ }
+ work.tstart = startTime
+
+ // Check that there's no marking work remaining.
+ if work.full != 0 || work.markrootNext < work.markrootJobs {
+ print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
+ panic("non-empty mark queue after concurrent mark")
+ }
+
+ if debug.gccheckmark > 0 {
+ // This is expensive when there's a large number of
+ // Gs, so only do it if checkmark is also enabled.
+ gcMarkRootCheck()
+ }
+
+ // Drop allg snapshot. allgs may have grown, in which case
+ // this is the only reference to the old backing store and
+ // there's no need to keep it around.
+ work.stackRoots = nil
+
+ // Clear out buffers and double-check that all gcWork caches
+ // are empty. This should be ensured by gcMarkDone before we
+ // enter mark termination.
+ //
+ // TODO: We could clear out buffers just before mark if this
+ // has a non-negligible impact on STW time.
+ for _, p := range allp {
+ // The write barrier may have buffered pointers since
+ // the gcMarkDone barrier. However, since the barrier
+ // ensured all reachable objects were marked, all of
+ // these must be pointers to black objects. Hence we
+ // can just discard the write barrier buffer.
+ if debug.gccheckmark > 0 {
+ // For debugging, flush the buffer and make
+ // sure it really was all marked.
+ wbBufFlush1(p)
+ } else {
+ p.wbBuf.reset()
+ }
+
+ gcw := &p.gcw
+ if !gcw.empty() {
+ printlock()
+ print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
+ if gcw.wbuf1 == nil {
+ print(" wbuf1=<nil>")
+ } else {
+ print(" wbuf1.n=", gcw.wbuf1.nobj)
+ }
+ if gcw.wbuf2 == nil {
+ print(" wbuf2=<nil>")
+ } else {
+ print(" wbuf2.n=", gcw.wbuf2.nobj)
+ }
+ print("\n")
+ throw("P has cached GC work at end of mark termination")
+ }
+ // There may still be cached empty buffers, which we
+ // need to flush since we're going to free them. Also,
+ // there may be non-zero stats because we allocated
+ // black after the gcMarkDone barrier.
+ gcw.dispose()
+ }
+
+ // Flush scanAlloc from each mcache since we're about to modify
+ // heapScan directly. If we were to flush this later, then scanAlloc
+ // might have incorrect information.
+ //
+ // Note that it's not important to retain this information; we know
+ // exactly what heapScan is at this point via scanWork.
+ for _, p := range allp {
+ c := p.mcache
+ if c == nil {
+ continue
+ }
+ c.scanAlloc = 0
+ }
+
+ // Reset controller state.
+ gcController.resetLive(work.bytesMarked)
+}
+
+// gcSweep must be called on the system stack because it acquires the heap
+// lock. See mheap for details.
+//
+// The world must be stopped.
+//
+//go:systemstack
+func gcSweep(mode gcMode) {
+ assertWorldStopped()
+
+ if gcphase != _GCoff {
+ throw("gcSweep being done but phase is not GCoff")
+ }
+
+ lock(&mheap_.lock)
+ mheap_.sweepgen += 2
+ sweep.active.reset()
+ mheap_.pagesSwept.Store(0)
+ mheap_.sweepArenas = mheap_.allArenas
+ mheap_.reclaimIndex.Store(0)
+ mheap_.reclaimCredit.Store(0)
+ unlock(&mheap_.lock)
+
+ sweep.centralIndex.clear()
+
+ if !_ConcurrentSweep || mode == gcForceBlockMode {
+ // Special case synchronous sweep.
+ // Record that no proportional sweeping has to happen.
+ lock(&mheap_.lock)
+ mheap_.sweepPagesPerByte = 0
+ unlock(&mheap_.lock)
+ // Sweep all spans eagerly.
+ for sweepone() != ^uintptr(0) {
+ sweep.npausesweep++
+ }
+ // Free workbufs eagerly.
+ prepareFreeWorkbufs()
+ for freeSomeWbufs(false) {
+ }
+ // All "free" events for this mark/sweep cycle have
+ // now happened, so we can make this profile cycle
+ // available immediately.
+ mProf_NextCycle()
+ mProf_Flush()
+ return
+ }
+
+ // Background sweep.
+ lock(&sweep.lock)
+ if sweep.parked {
+ sweep.parked = false
+ ready(sweep.g, 0, true)
+ }
+ unlock(&sweep.lock)
+}
+
+// gcResetMarkState resets global state prior to marking (concurrent
+// or STW) and resets the stack scan state of all Gs.
+//
+// This is safe to do without the world stopped because any Gs created
+// during or after this will start out in the reset state.
+//
+// gcResetMarkState must be called on the system stack because it acquires
+// the heap lock. See mheap for details.
+//
+//go:systemstack
+func gcResetMarkState() {
+ // This may be called during a concurrent phase, so lock to make sure
+ // allgs doesn't change.
+ forEachG(func(gp *g) {
+ gp.gcscandone = false // set to true in gcphasework
+ gp.gcAssistBytes = 0
+ })
+
+ // Clear page marks. This is just 1MB per 64GB of heap, so the
+ // time here is pretty trivial.
+ lock(&mheap_.lock)
+ arenas := mheap_.allArenas
+ unlock(&mheap_.lock)
+ for _, ai := range arenas {
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ for i := range ha.pageMarks {
+ ha.pageMarks[i] = 0
+ }
+ }
+
+ work.bytesMarked = 0
+ work.initialHeapLive = gcController.heapLive.Load()
+}
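+
+// A quick sanity check of the "1MB per 64GB" figure above, with illustrative
+// numbers only: pageMarks stores one bit per 8KiB page, so a 64GiB heap spans
+// 64GiB / 8KiB = 8Mi pages, and 8Mi bits / 8 = 1MiB of mark bytes to clear.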
+
+// Hooks for other packages
+
+var poolcleanup func()
+var boringCaches []unsafe.Pointer // for crypto/internal/boring
+
+//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
+func sync_runtime_registerPoolCleanup(f func()) {
+ poolcleanup = f
+}
+
+//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
+func boring_registerCache(p unsafe.Pointer) {
+ boringCaches = append(boringCaches, p)
+}
+
+func clearpools() {
+ // clear sync.Pools
+ if poolcleanup != nil {
+ poolcleanup()
+ }
+
+ // clear boringcrypto caches
+ for _, p := range boringCaches {
+ atomicstorep(p, nil)
+ }
+
+ // Clear central sudog cache.
+ // Leave per-P caches alone, they have strictly bounded size.
+ // Disconnect cached list before dropping it on the floor,
+ // so that a dangling ref to one entry does not pin all of them.
+ lock(&sched.sudoglock)
+ var sg, sgnext *sudog
+ for sg = sched.sudogcache; sg != nil; sg = sgnext {
+ sgnext = sg.next
+ sg.next = nil
+ }
+ sched.sudogcache = nil
+ unlock(&sched.sudoglock)
+
+ // Clear central defer pool.
+ // Leave per-P pools alone, they have strictly bounded size.
+ lock(&sched.deferlock)
+ // disconnect cached list before dropping it on the floor,
+ // so that a dangling ref to one entry does not pin all of them.
+ var d, dlink *_defer
+ for d = sched.deferpool; d != nil; d = dlink {
+ dlink = d.link
+ d.link = nil
+ }
+ sched.deferpool = nil
+ unlock(&sched.deferlock)
+}
+
+// Timing
+
+// itoaDiv formats val/(10**dec) into buf.
+func itoaDiv(buf []byte, val uint64, dec int) []byte {
+ i := len(buf) - 1
+ idec := i - dec
+ for val >= 10 || i >= idec {
+ buf[i] = byte(val%10 + '0')
+ i--
+ if i == idec {
+ buf[i] = '.'
+ i--
+ }
+ val /= 10
+ }
+ buf[i] = byte(val + '0')
+ return buf[i:]
+}
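+
+// Worked example (illustrative): itoaDiv(buf, 12345, 3) writes digits from the
+// end of buf, inserts '.' three places in, and returns the slice "12.345".
+// With dec=0 no '.' is written and the result is the plain decimal "12345".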
+
+// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
+func fmtNSAsMS(buf []byte, ns uint64) []byte {
+ if ns >= 10e6 {
+ // Format as whole milliseconds.
+ return itoaDiv(buf, ns/1e6, 0)
+ }
+ // Format two digits of precision, with at most three decimal places.
+ x := ns / 1e3
+ if x == 0 {
+ buf[0] = '0'
+ return buf[:1]
+ }
+ dec := 3
+ for x >= 100 {
+ x /= 10
+ dec--
+ }
+ return itoaDiv(buf, x, dec)
+}
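+
+// Worked examples (illustrative): 123456789ns is at least 10ms, so it prints
+// as whole milliseconds: "123". 2345678ns first truncates to 2345µs, is then
+// scaled down until it has two digits (23, with dec=1), and prints as "2.3".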
+
+// Helpers for testing GC.
+
+// gcTestMoveStackOnNextCall causes the stack to be moved on a call
+// immediately following the call to this. It may not work correctly
+// if any other work appears after this call (such as returning).
+// Typically the following call should be marked go:noinline so it
+// performs a stack check.
+//
+// In rare cases this may not cause the stack to move, specifically if
+// there's a preemption between this call and the next.
+func gcTestMoveStackOnNextCall() {
+ gp := getg()
+ gp.stackguard0 = stackForceMove
+}
+
+// gcTestIsReachable performs a GC and returns a bit set where bit i
+// is set if ptrs[i] is reachable.
+func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
+ // This takes the pointers as unsafe.Pointers in order to keep
+ // them live long enough for us to attach specials. After
+ // that, we drop our references to them.
+
+ if len(ptrs) > 64 {
+ panic("too many pointers for uint64 mask")
+ }
+
+ // Block GC while we attach specials and drop our references
+ // to ptrs. Otherwise, if a GC is in progress, it could mark
+ // them reachable via this function before we have a chance to
+ // drop them.
+ semacquire(&gcsema)
+
+ // Create reachability specials for ptrs.
+ specials := make([]*specialReachable, len(ptrs))
+ for i, p := range ptrs {
+ lock(&mheap_.speciallock)
+ s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
+ unlock(&mheap_.speciallock)
+ s.special.kind = _KindSpecialReachable
+ if !addspecial(p, &s.special) {
+ throw("already have a reachable special (duplicate pointer?)")
+ }
+ specials[i] = s
+ // Make sure we don't retain ptrs.
+ ptrs[i] = nil
+ }
+
+ semrelease(&gcsema)
+
+ // Force a full GC and sweep.
+ GC()
+
+ // Process specials.
+ for i, s := range specials {
+ if !s.done {
+ printlock()
+ println("runtime: object", i, "was not swept")
+ throw("IsReachable failed")
+ }
+ if s.reachable {
+ mask |= 1 << i
+ }
+ lock(&mheap_.speciallock)
+ mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+ }
+
+ return mask
+}
+
+// gcTestPointerClass returns the category of what p points to, one of:
+// "heap", "stack", "data", "bss", "other". This is useful for checking
+// that a test is doing what it's intended to do.
+//
+// This is nosplit simply to avoid extra pointer shuffling that may
+// complicate a test.
+//
+//go:nosplit
+func gcTestPointerClass(p unsafe.Pointer) string {
+ p2 := uintptr(noescape(p))
+ gp := getg()
+ if gp.stack.lo <= p2 && p2 < gp.stack.hi {
+ return "stack"
+ }
+ if base, _, _ := findObject(p2, 0, 0); base != 0 {
+ return "heap"
+ }
+ for _, datap := range activeModules() {
+ if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
+ return "data"
+ }
+ if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 <= datap.enoptrbss {
+ return "bss"
+ }
+ }
+ KeepAlive(p)
+ return "other"
+}
diff --git a/src/runtime/mgclimit.go b/src/runtime/mgclimit.go
new file mode 100644
index 0000000..ef3cc08
--- /dev/null
+++ b/src/runtime/mgclimit.go
@@ -0,0 +1,484 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "runtime/internal/atomic"
+
+// gcCPULimiter is a mechanism to limit GC CPU utilization in situations
+// where it might become excessive and inhibit application progress (e.g.
+// a death spiral).
+//
+// The core of the limiter is a leaky bucket mechanism that fills with GC
+// CPU time and drains with mutator time. Because the bucket fills and
+// drains with time directly (i.e. without any weighting), this effectively
+// sets a very conservative limit of 50%. This limit could be enforced directly;
+// the purpose of the bucket, however, is to accommodate spikes in GC CPU
+// utilization without hurting throughput.
+//
+// Note that the bucket in the leaky bucket mechanism can never go negative,
+// so the GC never gets credit for a lot of CPU time spent without the GC
+// running. This is intentional, as an application that stays idle for, say,
+// an entire day, could build up enough credit to fail to prevent a death
+// spiral the following day. The bucket's capacity is the GC's only leeway.
+//
+// The capacity thus also sets the window the limiter considers. For example,
+// if the capacity of the bucket is 1 cpu-second, then the limiter will not
+// kick in until at least 1 full cpu-second in the last 2 cpu-second window
+// is spent on GC CPU time.
+var gcCPULimiter gcCPULimiterState
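+
+// For concreteness, with illustrative numbers: at GOMAXPROCS=8 the bucket's
+// capacity is 8 cpu-seconds (see capacityPerProc below). The fill only grows
+// while GC CPU time exceeds mutator CPU time, i.e. while the GC uses more
+// than 50% of all non-idle CPU, and the limiter only engages once that
+// surplus has accumulated to the full 8 cpu-seconds.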
+
+type gcCPULimiterState struct {
+ lock atomic.Uint32
+
+ enabled atomic.Bool
+ bucket struct {
+ // Invariants:
+ // - fill >= 0
+ // - capacity >= 0
+ // - fill <= capacity
+ fill, capacity uint64
+ }
+ // overflow is the cumulative amount of GC CPU time that we tried to fill the
+ // bucket with but exceeded its capacity.
+ overflow uint64
+
+ // gcEnabled is an internal copy of gcBlackenEnabled that determines
+ // whether the limiter tracks total assist time.
+ //
+ // gcBlackenEnabled isn't used directly so as to keep this structure
+ // unit-testable.
+ gcEnabled bool
+
+ // transitioning is true when the GC is in a STW and transitioning between
+ // the mark and sweep phases.
+ transitioning bool
+
+ // assistTimePool is the accumulated assist time since the last update.
+ assistTimePool atomic.Int64
+
+ // idleMarkTimePool is the accumulated idle mark time since the last update.
+ idleMarkTimePool atomic.Int64
+
+ // idleTimePool is the accumulated time Ps spent on the idle list since the last update.
+ idleTimePool atomic.Int64
+
+ // lastUpdate is the nanotime timestamp of the last time update was called.
+ //
+ // Updated under lock, but may be read concurrently.
+ lastUpdate atomic.Int64
+
+ // lastEnabledCycle is the GC cycle that last had the limiter enabled.
+ lastEnabledCycle atomic.Uint32
+
+ // nprocs is an internal copy of gomaxprocs, used to determine total available
+ // CPU time.
+ //
+ // gomaxprocs isn't used directly so as to keep this structure unit-testable.
+ nprocs int32
+
+ // test indicates whether this instance of the struct was made for testing purposes.
+ test bool
+}
+
+// limiting returns true if the CPU limiter is currently enabled, meaning the Go GC
+// should take action to limit CPU utilization.
+//
+// It is safe to call concurrently with other operations.
+func (l *gcCPULimiterState) limiting() bool {
+ return l.enabled.Load()
+}
+
+// startGCTransition notifies the limiter of a GC transition.
+//
+// This call takes ownership of the limiter and disables all other means of
+// updating the limiter. Release ownership by calling finishGCTransition.
+//
+// It is safe to call concurrently with other operations.
+func (l *gcCPULimiterState) startGCTransition(enableGC bool, now int64) {
+ if !l.tryLock() {
+ // This must happen during a STW, so we can't fail to acquire the lock.
+ // If we did, something went wrong. Throw.
+ throw("failed to acquire lock to start a GC transition")
+ }
+ if l.gcEnabled == enableGC {
+ throw("transitioning GC to the same state as before?")
+ }
+ // Flush whatever was left between the last update and now.
+ l.updateLocked(now)
+ l.gcEnabled = enableGC
+ l.transitioning = true
+ // N.B. finishGCTransition releases the lock.
+ //
+ // We don't release here to increase the chance that if there's a failure
+ // to finish the transition, that we throw on failing to acquire the lock.
+}
+
+// finishGCTransition notifies the limiter that the GC transition is complete
+// and releases ownership of it. It also accumulates STW time in the bucket.
+// now must be the timestamp from the end of the STW pause.
+func (l *gcCPULimiterState) finishGCTransition(now int64) {
+ if !l.transitioning {
+ throw("finishGCTransition called without starting one?")
+ }
+ // Count the full nprocs set of CPU time because the world is stopped
+ // between startGCTransition and finishGCTransition. Even though the GC
+ // isn't running on all CPUs, it is preventing user code from doing so,
+ // so it might as well be.
+ if lastUpdate := l.lastUpdate.Load(); now >= lastUpdate {
+ l.accumulate(0, (now-lastUpdate)*int64(l.nprocs))
+ }
+ l.lastUpdate.Store(now)
+ l.transitioning = false
+ l.unlock()
+}
+
+// gcCPULimiterUpdatePeriod dictates the maximum amount of wall-clock time
+// we can go before updating the limiter.
+const gcCPULimiterUpdatePeriod = 10e6 // 10ms
+
+// needUpdate returns true if the limiter's maximum update period has been
+// exceeded, and so would benefit from an update.
+func (l *gcCPULimiterState) needUpdate(now int64) bool {
+ return now-l.lastUpdate.Load() > gcCPULimiterUpdatePeriod
+}
+
+// addAssistTime notifies the limiter of additional assist time. It will be
+// included in the next update.
+func (l *gcCPULimiterState) addAssistTime(t int64) {
+ l.assistTimePool.Add(t)
+}
+
+// addIdleTime notifies the limiter of additional time a P spent on the idle list. It will be
+// subtracted from the total CPU time in the next update.
+func (l *gcCPULimiterState) addIdleTime(t int64) {
+ l.idleTimePool.Add(t)
+}
+
+// update updates the bucket given runtime-specific information. now is the
+// current monotonic time in nanoseconds.
+//
+// This is safe to call concurrently with other operations, except *GCTransition.
+func (l *gcCPULimiterState) update(now int64) {
+ if !l.tryLock() {
+ // We failed to acquire the lock, which means something else is currently
+ // updating. Just drop our update, the next one to update will include
+ // our total assist time.
+ return
+ }
+ if l.transitioning {
+ throw("update during transition")
+ }
+ l.updateLocked(now)
+ l.unlock()
+}
+
+// updateLocked is the implementation of update. l.lock must be held.
+func (l *gcCPULimiterState) updateLocked(now int64) {
+ lastUpdate := l.lastUpdate.Load()
+ if now < lastUpdate {
+ // Defensively avoid overflow. This isn't even the latest update anyway.
+ return
+ }
+ windowTotalTime := (now - lastUpdate) * int64(l.nprocs)
+ l.lastUpdate.Store(now)
+
+ // Drain the pool of assist time.
+ assistTime := l.assistTimePool.Load()
+ if assistTime != 0 {
+ l.assistTimePool.Add(-assistTime)
+ }
+
+ // Drain the pool of idle time.
+ idleTime := l.idleTimePool.Load()
+ if idleTime != 0 {
+ l.idleTimePool.Add(-idleTime)
+ }
+
+ if !l.test {
+ // Consume time from in-flight events. Make sure we're not preemptible so allp can't change.
+ //
+ // The reason we do this instead of just waiting for those events to finish and push updates
+ // is to ensure that all the time we're accounting for happened sometime between lastUpdate
+ // and now. This dramatically simplifies reasoning about the limiter because we're not at
+		// risk of accounting for more time in this window than actually happened in it,
+		// which would lead to all sorts of weird transient behavior.
+ mp := acquirem()
+ for _, pp := range allp {
+ typ, duration := pp.limiterEvent.consume(now)
+ switch typ {
+ case limiterEventIdleMarkWork:
+ fallthrough
+ case limiterEventIdle:
+ idleTime += duration
+ sched.idleTime.Add(duration)
+ case limiterEventMarkAssist:
+ fallthrough
+ case limiterEventScavengeAssist:
+ assistTime += duration
+ case limiterEventNone:
+ break
+ default:
+ throw("invalid limiter event type found")
+ }
+ }
+ releasem(mp)
+ }
+
+ // Compute total GC time.
+ windowGCTime := assistTime
+ if l.gcEnabled {
+ windowGCTime += int64(float64(windowTotalTime) * gcBackgroundUtilization)
+ }
+
+ // Subtract out all idle time from the total time. Do this after computing
+ // GC time, because the background utilization is dependent on the *real*
+ // total time, not the total time after idle time is subtracted.
+ //
+ // Idle time is counted as any time that a P is on the P idle list plus idle mark
+ // time. Idle mark workers soak up time that the application spends idle.
+ //
+ // On a heavily undersubscribed system, any additional idle time can skew GC CPU
+ // utilization, because the GC might be executing continuously and thrashing,
+ // yet the CPU utilization with respect to GOMAXPROCS will be quite low, so
+ // the limiter fails to turn on. By subtracting idle time, we're removing time that
+ // we know the application was idle giving a more accurate picture of whether
+ // the GC is thrashing.
+ //
+ // Note that this can cause the limiter to turn on even if it's not needed. For
+ // instance, on a system with 32 Ps but only 1 running goroutine, each GC will have
+ // 8 dedicated GC workers. Assuming the GC cycle is half mark phase and half sweep
+ // phase, then the GC CPU utilization over that cycle, with idle time removed, will
+ // be 8/(8+2) = 80%. Even though the limiter turns on, though, assist should be
+ // unnecessary, as the GC has way more CPU time to outpace the 1 goroutine that's
+ // running.
+ windowTotalTime -= idleTime
+
+ l.accumulate(windowTotalTime-windowGCTime, windowGCTime)
+}
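+
+// To make the accounting above concrete (illustrative numbers): with nprocs=8
+// and 10ms of wall clock since the last update, windowTotalTime starts at
+// 80ms of CPU. With the GC enabled at the 25% background utilization implied
+// by the 32-P example above, windowGCTime is assistTime + 20ms. If 4 Ps sat
+// idle for the whole window, 40ms of idle time is subtracted, leaving 40ms of
+// total CPU, so the window sits exactly at the 50% threshold and any assist
+// time on top of the dedicated workers starts filling the bucket.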
+
+// accumulate adds time to the bucket and signals whether the limiter is enabled.
+//
+// This is an internal function that deals just with the bucket. Prefer update.
+// l.lock must be held.
+func (l *gcCPULimiterState) accumulate(mutatorTime, gcTime int64) {
+ headroom := l.bucket.capacity - l.bucket.fill
+ enabled := headroom == 0
+
+ // Let's be careful about three things here:
+ // 1. The addition and subtraction, for the invariants.
+ // 2. Overflow.
+ // 3. Excessive mutation of l.enabled, which is accessed
+ // by all assists, potentially more than once.
+ change := gcTime - mutatorTime
+
+ // Handle limiting case.
+ if change > 0 && headroom <= uint64(change) {
+ l.overflow += uint64(change) - headroom
+ l.bucket.fill = l.bucket.capacity
+ if !enabled {
+ l.enabled.Store(true)
+ l.lastEnabledCycle.Store(memstats.numgc + 1)
+ }
+ return
+ }
+
+ // Handle non-limiting cases.
+ if change < 0 && l.bucket.fill <= uint64(-change) {
+ // Bucket emptied.
+ l.bucket.fill = 0
+ } else {
+ // All other cases.
+ l.bucket.fill -= uint64(-change)
+ }
+ if change != 0 && enabled {
+ l.enabled.Store(false)
+ }
+}
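+
+// A rough sketch of the same fill/drain arithmetic, for illustration only
+// (toyBucket and toyAccumulate are made-up names; the enabled/overflow
+// bookkeeping above is omitted):
+//
+//	type toyBucket struct{ fill, capacity uint64 }
+//
+//	func toyAccumulate(b *toyBucket, mutatorTime, gcTime int64) {
+//		change := gcTime - mutatorTime
+//		switch {
+//		case change > 0 && uint64(change) >= b.capacity-b.fill:
+//			b.fill = b.capacity // saturate; the excess is the overflow
+//		case change < 0 && b.fill <= uint64(-change):
+//			b.fill = 0 // drained completely
+//		case change >= 0:
+//			b.fill += uint64(change)
+//		default:
+//			b.fill -= uint64(-change)
+//		}
+//	}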
+
+// tryLock attempts to lock l. Returns true on success.
+func (l *gcCPULimiterState) tryLock() bool {
+ return l.lock.CompareAndSwap(0, 1)
+}
+
+// unlock releases the lock on l. Must be called if tryLock returns true.
+func (l *gcCPULimiterState) unlock() {
+ old := l.lock.Swap(0)
+ if old != 1 {
+ throw("double unlock")
+ }
+}
+
+// capacityPerProc is the limiter's bucket capacity for each P in GOMAXPROCS.
+const capacityPerProc = 1e9 // 1 second in nanoseconds
+
+// resetCapacity updates the capacity based on GOMAXPROCS. Must not be called
+// while the GC is enabled.
+//
+// It is safe to call concurrently with other operations.
+func (l *gcCPULimiterState) resetCapacity(now int64, nprocs int32) {
+ if !l.tryLock() {
+ // This must happen during a STW, so we can't fail to acquire the lock.
+ // If we did, something went wrong. Throw.
+ throw("failed to acquire lock to reset capacity")
+ }
+ // Flush the rest of the time for this period.
+ l.updateLocked(now)
+ l.nprocs = nprocs
+
+ l.bucket.capacity = uint64(nprocs) * capacityPerProc
+ if l.bucket.fill > l.bucket.capacity {
+ l.bucket.fill = l.bucket.capacity
+ l.enabled.Store(true)
+ l.lastEnabledCycle.Store(memstats.numgc + 1)
+ } else if l.bucket.fill < l.bucket.capacity {
+ l.enabled.Store(false)
+ }
+ l.unlock()
+}
+
+// limiterEventType indicates the type of an event occurring on some P.
+//
+// These events represent the full set of events that the GC CPU limiter tracks
+// to execute its function.
+//
+// This type may use no more than limiterEventBits bits of information.
+type limiterEventType uint8
+
+const (
+ limiterEventNone limiterEventType = iota // None of the following events.
+ limiterEventIdleMarkWork // Refers to an idle mark worker (see gcMarkWorkerMode).
+ limiterEventMarkAssist // Refers to mark assist (see gcAssistAlloc).
+ limiterEventScavengeAssist // Refers to a scavenge assist (see allocSpan).
+ limiterEventIdle // Refers to time a P spent on the idle list.
+
+ limiterEventBits = 3
+)
+
+// limiterEventTypeMask is a mask for the bits in p.limiterEventStart that represent
+// the event type. The rest of the bits of that field represent a timestamp.
+const (
+ limiterEventTypeMask = uint64((1<<limiterEventBits)-1) << (64 - limiterEventBits)
+ limiterEventStampNone = limiterEventStamp(0)
+)
+
+// limiterEventStamp is a nanotime timestamp packed with a limiterEventType.
+type limiterEventStamp uint64
+
+// makeLimiterEventStamp creates a new stamp from the event type and the current timestamp.
+func makeLimiterEventStamp(typ limiterEventType, now int64) limiterEventStamp {
+ return limiterEventStamp(uint64(typ)<<(64-limiterEventBits) | (uint64(now) &^ limiterEventTypeMask))
+}
+
+// duration computes the difference between now and the start time stored in the stamp.
+//
+// Returns 0 if the difference is negative, which may happen if now is stale or if the
+// before and after timestamps cross a 2^(64-limiterEventBits) boundary.
+func (s limiterEventStamp) duration(now int64) int64 {
+ // The top limiterEventBits bits of the timestamp are derived from the current time
+ // when computing a duration.
+ start := int64((uint64(now) & limiterEventTypeMask) | (uint64(s) &^ limiterEventTypeMask))
+ if now < start {
+ return 0
+ }
+ return now - start
+}
+
+// type extracts the event type from the stamp.
+func (s limiterEventStamp) typ() limiterEventType {
+ return limiterEventType(s >> (64 - limiterEventBits))
+}
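+
+// Worked example (illustrative): with limiterEventBits = 3, a stamp packs the
+// event type into the top 3 bits and the low 61 bits of the nanotime value
+// into the rest. duration then rebuilds the start time by borrowing the top
+// 3 bits from now, which is valid as long as start and now fall within the
+// same 2^61ns (roughly 73-year) window of the clock.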
+
+// limiterEvent represents tracking state for an event tracked by the GC CPU limiter.
+type limiterEvent struct {
+ stamp atomic.Uint64 // Stores a limiterEventStamp.
+}
+
+// start begins tracking a new limiter event of the current type. If an event
+// is already in flight, then a new event cannot begin because the current time is
+// already being attributed to that event. In this case, this function returns false.
+// Otherwise, it returns true.
+//
+// The caller must be non-preemptible until at least stop is called or this function
+// returns false. Because this is trying to measure "on-CPU" time of some event, getting
+// scheduled away during it can mean that whatever we're measuring isn't a reflection
+// of "on-CPU" time. The OS could deschedule us at any time, but we want to maintain as
+// close of an approximation as we can.
+func (e *limiterEvent) start(typ limiterEventType, now int64) bool {
+ if limiterEventStamp(e.stamp.Load()).typ() != limiterEventNone {
+ return false
+ }
+ e.stamp.Store(uint64(makeLimiterEventStamp(typ, now)))
+ return true
+}
+
+// consume acquires the partial event CPU time from any in-flight event.
+// It achieves this by storing the current time as the new event time.
+//
+// Returns the type of the in-flight event, as well as how long it's currently been
+// executing for. Returns limiterEventNone if no event is active.
+func (e *limiterEvent) consume(now int64) (typ limiterEventType, duration int64) {
+ // Read the limiter event timestamp and update it to now.
+ for {
+ old := limiterEventStamp(e.stamp.Load())
+ typ = old.typ()
+ if typ == limiterEventNone {
+ // There's no in-flight event, so just push that up.
+ return
+ }
+ duration = old.duration(now)
+ if duration == 0 {
+ // We might have a stale now value, or this crossed the
+ // 2^(64-limiterEventBits) boundary in the clock readings.
+ // Just ignore it.
+ return limiterEventNone, 0
+ }
+ new := makeLimiterEventStamp(typ, now)
+ if e.stamp.CompareAndSwap(uint64(old), uint64(new)) {
+ break
+ }
+ }
+ return
+}
+
+// stop stops the active limiter event. Throws if the in-flight event's type
+// does not match typ.
+//
+// The caller must be non-preemptible across the event. See start as to why.
+func (e *limiterEvent) stop(typ limiterEventType, now int64) {
+ var stamp limiterEventStamp
+ for {
+ stamp = limiterEventStamp(e.stamp.Load())
+ if stamp.typ() != typ {
+ print("runtime: want=", typ, " got=", stamp.typ(), "\n")
+ throw("limiterEvent.stop: found wrong event in p's limiter event slot")
+ }
+ if e.stamp.CompareAndSwap(uint64(stamp), uint64(limiterEventStampNone)) {
+ break
+ }
+ }
+ duration := stamp.duration(now)
+ if duration == 0 {
+ // It's possible that we're missing time because we crossed a
+ // 2^(64-limiterEventBits) boundary between the start and end.
+ // In this case, we're dropping that information. This is OK because
+ // at worst it'll cause a transient hiccup that will quickly resolve
+ // itself as all new timestamps begin on the other side of the boundary.
+ // Such a hiccup should be incredibly rare.
+ return
+ }
+ // Account for the event.
+ switch typ {
+ case limiterEventIdleMarkWork:
+ gcCPULimiter.addIdleTime(duration)
+ case limiterEventIdle:
+ gcCPULimiter.addIdleTime(duration)
+ sched.idleTime.Add(duration)
+ case limiterEventMarkAssist:
+ fallthrough
+ case limiterEventScavengeAssist:
+ gcCPULimiter.addAssistTime(duration)
+ default:
+ throw("limiterEvent.stop: invalid limiter event type found")
+ }
+}
diff --git a/src/runtime/mgclimit_test.go b/src/runtime/mgclimit_test.go
new file mode 100644
index 0000000..124da03
--- /dev/null
+++ b/src/runtime/mgclimit_test.go
@@ -0,0 +1,255 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "testing"
+ "time"
+)
+
+func TestGCCPULimiter(t *testing.T) {
+ const procs = 14
+
+ // Create mock time.
+ ticks := int64(0)
+ advance := func(d time.Duration) int64 {
+ t.Helper()
+ ticks += int64(d)
+ return ticks
+ }
+
+ // assistTime computes the CPU time for assists using frac of GOMAXPROCS
+ // over the wall-clock duration d.
+ assistTime := func(d time.Duration, frac float64) int64 {
+ t.Helper()
+ return int64(frac * float64(d) * procs)
+ }
+
+ l := NewGCCPULimiter(ticks, procs)
+
+ // Do the whole test twice to make sure state doesn't leak across.
+ var baseOverflow uint64 // Track total overflow across iterations.
+ for i := 0; i < 2; i++ {
+ t.Logf("Iteration %d", i+1)
+
+ if l.Capacity() != procs*CapacityPerProc {
+ t.Fatalf("unexpected capacity: %d", l.Capacity())
+ }
+ if l.Fill() != 0 {
+ t.Fatalf("expected empty bucket to start")
+ }
+
+ // Test filling the bucket with just mutator time.
+
+ l.Update(advance(10 * time.Millisecond))
+ l.Update(advance(1 * time.Second))
+ l.Update(advance(1 * time.Hour))
+ if l.Fill() != 0 {
+ t.Fatalf("expected empty bucket from only accumulating mutator time, got fill of %d cpu-ns", l.Fill())
+ }
+
+ // Test needUpdate.
+
+ if l.NeedUpdate(advance(GCCPULimiterUpdatePeriod / 2)) {
+ t.Fatal("need update even though updated half a period ago")
+ }
+ if !l.NeedUpdate(advance(GCCPULimiterUpdatePeriod)) {
+ t.Fatal("doesn't need update even though updated 1.5 periods ago")
+ }
+ l.Update(advance(0))
+ if l.NeedUpdate(advance(0)) {
+ t.Fatal("need update even though just updated")
+ }
+
+ // Test transitioning the bucket to enable the GC.
+
+ l.StartGCTransition(true, advance(109*time.Millisecond))
+ l.FinishGCTransition(advance(2*time.Millisecond + 1*time.Microsecond))
+
+ if expect := uint64((2*time.Millisecond + 1*time.Microsecond) * procs); l.Fill() != expect {
+ t.Fatalf("expected fill of %d, got %d cpu-ns", expect, l.Fill())
+ }
+
+ // Test passing time without assists during a GC. Specifically, just enough to drain the bucket to
+ // exactly procs nanoseconds (easier to get to because of rounding).
+ //
+ // The window we need to drain the bucket is 1/(1-2*gcBackgroundUtilization) times the current fill:
+ //
+ // fill + (window * procs * gcBackgroundUtilization - window * procs * (1-gcBackgroundUtilization)) = n
+ // fill = n - (window * procs * gcBackgroundUtilization - window * procs * (1-gcBackgroundUtilization))
+ // fill = n + window * procs * ((1-gcBackgroundUtilization) - gcBackgroundUtilization)
+ // fill = n + window * procs * (1-2*gcBackgroundUtilization)
+		//   window = (fill - n) / (procs * (1-2*gcBackgroundUtilization))
+ //
+ // And here we want n=procs:
+ factor := (1 / (1 - 2*GCBackgroundUtilization))
+ fill := (2*time.Millisecond + 1*time.Microsecond) * procs
+ l.Update(advance(time.Duration(factor * float64(fill-procs) / procs)))
+ if l.Fill() != procs {
+ t.Fatalf("expected fill %d cpu-ns from draining after a GC started, got fill of %d cpu-ns", procs, l.Fill())
+ }
+
+ // Drain to zero for the rest of the test.
+ l.Update(advance(2 * procs * CapacityPerProc))
+ if l.Fill() != 0 {
+ t.Fatalf("expected empty bucket from draining, got fill of %d cpu-ns", l.Fill())
+ }
+
+ // Test filling up the bucket with 50% total GC work (so, not moving the bucket at all).
+ l.AddAssistTime(assistTime(10*time.Millisecond, 0.5-GCBackgroundUtilization))
+ l.Update(advance(10 * time.Millisecond))
+ if l.Fill() != 0 {
+ t.Fatalf("expected empty bucket from 50%% GC work, got fill of %d cpu-ns", l.Fill())
+ }
+
+ // Test adding to the bucket overall with 100% GC work.
+ l.AddAssistTime(assistTime(time.Millisecond, 1.0-GCBackgroundUtilization))
+ l.Update(advance(time.Millisecond))
+ if expect := uint64(procs * time.Millisecond); l.Fill() != expect {
+ t.Errorf("expected %d fill from 100%% GC CPU, got fill of %d cpu-ns", expect, l.Fill())
+ }
+ if l.Limiting() {
+ t.Errorf("limiter is enabled after filling bucket but shouldn't be")
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Test filling the bucket exactly full.
+ l.AddAssistTime(assistTime(CapacityPerProc-time.Millisecond, 1.0-GCBackgroundUtilization))
+ l.Update(advance(CapacityPerProc - time.Millisecond))
+ if l.Fill() != l.Capacity() {
+ t.Errorf("expected bucket filled to capacity %d, got %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is not enabled after filling bucket but should be")
+ }
+ if l.Overflow() != 0+baseOverflow {
+ t.Errorf("bucket filled exactly should not have overflow, found %d", l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Test adding with a delta of exactly zero. That is, GC work is exactly 50% of all resources.
+ // Specifically, the limiter should still be on, and no overflow should accumulate.
+ l.AddAssistTime(assistTime(1*time.Second, 0.5-GCBackgroundUtilization))
+ l.Update(advance(1 * time.Second))
+ if l.Fill() != l.Capacity() {
+ t.Errorf("expected bucket filled to capacity %d, got %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is not enabled after filling bucket but should be")
+ }
+ if l.Overflow() != 0+baseOverflow {
+ t.Errorf("bucket filled exactly should not have overflow, found %d", l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Drain the bucket by half.
+ l.AddAssistTime(assistTime(CapacityPerProc, 0))
+ l.Update(advance(CapacityPerProc))
+ if expect := l.Capacity() / 2; l.Fill() != expect {
+ t.Errorf("failed to drain to %d, got fill %d", expect, l.Fill())
+ }
+ if l.Limiting() {
+ t.Errorf("limiter is enabled after draining bucket but shouldn't be")
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Test overfilling the bucket.
+ l.AddAssistTime(assistTime(CapacityPerProc, 1.0-GCBackgroundUtilization))
+ l.Update(advance(CapacityPerProc))
+ if l.Fill() != l.Capacity() {
+ t.Errorf("failed to fill to capacity %d, got fill %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is not enabled after overfill but should be")
+ }
+ if expect := uint64(CapacityPerProc * procs / 2); l.Overflow() != expect+baseOverflow {
+ t.Errorf("bucket overfilled should have overflow %d, found %d", expect, l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Test ending the cycle with some assists left over.
+ l.AddAssistTime(assistTime(1*time.Millisecond, 1.0-GCBackgroundUtilization))
+ l.StartGCTransition(false, advance(1*time.Millisecond))
+ if l.Fill() != l.Capacity() {
+ t.Errorf("failed to maintain fill to capacity %d, got fill %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is not enabled after overfill but should be")
+ }
+ if expect := uint64((CapacityPerProc/2 + time.Millisecond) * procs); l.Overflow() != expect+baseOverflow {
+ t.Errorf("bucket overfilled should have overflow %d, found %d", expect, l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Make sure the STW adds to the bucket.
+ l.FinishGCTransition(advance(5 * time.Millisecond))
+ if l.Fill() != l.Capacity() {
+ t.Errorf("failed to maintain fill to capacity %d, got fill %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is not enabled after overfill but should be")
+ }
+ if expect := uint64((CapacityPerProc/2 + 6*time.Millisecond) * procs); l.Overflow() != expect+baseOverflow {
+ t.Errorf("bucket overfilled should have overflow %d, found %d", expect, l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Resize procs up and make sure limiting stops.
+ expectFill := l.Capacity()
+ l.ResetCapacity(advance(0), procs+10)
+ if l.Fill() != expectFill {
+ t.Errorf("failed to maintain fill at old capacity %d, got fill %d", expectFill, l.Fill())
+ }
+ if l.Limiting() {
+ t.Errorf("limiter is enabled after resetting capacity higher")
+ }
+ if expect := uint64((CapacityPerProc/2 + 6*time.Millisecond) * procs); l.Overflow() != expect+baseOverflow {
+ t.Errorf("bucket overflow %d should have remained constant, found %d", expect, l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Resize procs down and make sure limiting begins again.
+ // Also make sure resizing doesn't affect overflow. This isn't
+ // a case where we want to report overflow, because we're not
+ // actively doing work to achieve it. It's that we have fewer
+ // CPU resources now.
+ l.ResetCapacity(advance(0), procs-10)
+ if l.Fill() != l.Capacity() {
+ t.Errorf("failed lower fill to new capacity %d, got fill %d", l.Capacity(), l.Fill())
+ }
+ if !l.Limiting() {
+ t.Errorf("limiter is disabled after resetting capacity lower")
+ }
+ if expect := uint64((CapacityPerProc/2 + 6*time.Millisecond) * procs); l.Overflow() != expect+baseOverflow {
+ t.Errorf("bucket overflow %d should have remained constant, found %d", expect, l.Overflow())
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+
+ // Get back to a zero state. The top of the loop will double check.
+ l.ResetCapacity(advance(CapacityPerProc*procs), procs)
+
+ // Track total overflow for future iterations.
+ baseOverflow += uint64((CapacityPerProc/2 + 6*time.Millisecond) * procs)
+ }
+}
diff --git a/src/runtime/mgcmark.go b/src/runtime/mgcmark.go
new file mode 100644
index 0000000..2ed411a
--- /dev/null
+++ b/src/runtime/mgcmark.go
@@ -0,0 +1,1598 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: marking and scanning
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ fixedRootFinalizers = iota
+ fixedRootFreeGStacks
+ fixedRootCount
+
+ // rootBlockBytes is the number of bytes to scan per data or
+ // BSS root.
+ rootBlockBytes = 256 << 10
+
+ // maxObletBytes is the maximum bytes of an object to scan at
+ // once. Larger objects will be split up into "oblets" of at
+ // most this size. Since we can scan 1–2 MB/ms, 128 KB bounds
+ // scan preemption at ~100 µs.
+ //
+ // This must be > _MaxSmallSize so that the object base is the
+ // span base.
+ maxObletBytes = 128 << 10
+
+ // drainCheckThreshold specifies how many units of work to do
+ // between self-preemption checks in gcDrain. Assuming a scan
+ // rate of 1 MB/ms, this is ~100 µs. Lower values have higher
+ // overhead in the scan loop (the scheduler check may perform
+ // a syscall, so its overhead is nontrivial). Higher values
+ // make the system less responsive to incoming work.
+ drainCheckThreshold = 100000
+
+ // pagesPerSpanRoot indicates how many pages to scan from a span root
+ // at a time. Used by special root marking.
+ //
+ // Higher values improve throughput by increasing locality, but
+ // increase the minimum latency of a marking operation.
+ //
+ // Must be a multiple of the pageInUse bitmap element size and
+ // must also evenly divide pagesPerArena.
+ pagesPerSpanRoot = 512
+)
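+
+// Spelling out the arithmetic behind these choices (illustrative): at a scan
+// rate of 1–2 MB/ms, a 128KiB oblet takes roughly 64–128µs, and 100000 units
+// of drain work take about 100µs, which is where the ~100µs preemption bounds
+// above come from. A span-root shard of 512 pages covers 512 * 8KiB = 4MiB of
+// the heap per marking operation.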
+
+// gcMarkRootPrepare queues root scanning jobs (stacks, globals, and
+// some miscellany) and initializes scanning-related state.
+//
+// The world must be stopped.
+func gcMarkRootPrepare() {
+ assertWorldStopped()
+
+ // Compute how many data and BSS root blocks there are.
+ nBlocks := func(bytes uintptr) int {
+ return int(divRoundUp(bytes, rootBlockBytes))
+ }
+
+ work.nDataRoots = 0
+ work.nBSSRoots = 0
+
+ // Scan globals.
+ for _, datap := range activeModules() {
+ nDataRoots := nBlocks(datap.edata - datap.data)
+ if nDataRoots > work.nDataRoots {
+ work.nDataRoots = nDataRoots
+ }
+ }
+
+ for _, datap := range activeModules() {
+ nBSSRoots := nBlocks(datap.ebss - datap.bss)
+ if nBSSRoots > work.nBSSRoots {
+ work.nBSSRoots = nBSSRoots
+ }
+ }
+
+ // Scan span roots for finalizer specials.
+ //
+ // We depend on addfinalizer to mark objects that get
+ // finalizers after root marking.
+ //
+ // We're going to scan the whole heap (that was available at the time the
+ // mark phase started, i.e. markArenas) for in-use spans which have specials.
+ //
+ // Break up the work into arenas, and further into chunks.
+ //
+ // Snapshot allArenas as markArenas. This snapshot is safe because allArenas
+ // is append-only.
+ mheap_.markArenas = mheap_.allArenas[:len(mheap_.allArenas):len(mheap_.allArenas)]
+ work.nSpanRoots = len(mheap_.markArenas) * (pagesPerArena / pagesPerSpanRoot)
+
+ // Scan stacks.
+ //
+ // Gs may be created after this point, but it's okay that we
+ // ignore them because they begin life without any roots, so
+ // there's nothing to scan, and any roots they create during
+ // the concurrent phase will be caught by the write barrier.
+ work.stackRoots = allGsSnapshot()
+ work.nStackRoots = len(work.stackRoots)
+
+ work.markrootNext = 0
+ work.markrootJobs = uint32(fixedRootCount + work.nDataRoots + work.nBSSRoots + work.nSpanRoots + work.nStackRoots)
+
+ // Calculate base indexes of each root type
+ work.baseData = uint32(fixedRootCount)
+ work.baseBSS = work.baseData + uint32(work.nDataRoots)
+ work.baseSpans = work.baseBSS + uint32(work.nBSSRoots)
+ work.baseStacks = work.baseSpans + uint32(work.nSpanRoots)
+ work.baseEnd = work.baseStacks + uint32(work.nStackRoots)
+}
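+
+// As an illustrative example of the index layout built above: with 3 data
+// blocks, 2 BSS blocks, 64 span shards, and 100 stack roots, markrootJobs is
+// 2+3+2+64+100 = 171, and markroot interprets indices 0-1 as the fixed roots,
+// [2, 5) as data, [5, 7) as BSS, [7, 71) as spans, and [71, 171) as stacks.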
+
+// gcMarkRootCheck checks that all roots have been scanned. It is
+// purely for debugging.
+func gcMarkRootCheck() {
+ if work.markrootNext < work.markrootJobs {
+ print(work.markrootNext, " of ", work.markrootJobs, " markroot jobs done\n")
+ throw("left over markroot jobs")
+ }
+
+ // Check that stacks have been scanned.
+ //
+ // We only check the first nStackRoots Gs that we should have scanned.
+ // Since we don't care about newer Gs (see comment in
+ // gcMarkRootPrepare), no locking is required.
+ i := 0
+ forEachGRace(func(gp *g) {
+ if i >= work.nStackRoots {
+ return
+ }
+
+ if !gp.gcscandone {
+ println("gp", gp, "goid", gp.goid,
+ "status", readgstatus(gp),
+ "gcscandone", gp.gcscandone)
+ throw("scan missed a g")
+ }
+
+ i++
+ })
+}
+
+// ptrmask for an allocation containing a single pointer.
+var oneptrmask = [...]uint8{1}
+
+// markroot scans the i'th root.
+//
+// Preemption must be disabled (because this uses a gcWork).
+//
+// Returns the amount of GC work credit produced by the operation.
+// If flushBgCredit is true, then that credit is also flushed
+// to the background credit pool.
+//
+// nowritebarrier is only advisory here.
+//
+//go:nowritebarrier
+func markroot(gcw *gcWork, i uint32, flushBgCredit bool) int64 {
+ // Note: if you add a case here, please also update heapdump.go:dumproots.
+ var workDone int64
+ var workCounter *atomic.Int64
+ switch {
+ case work.baseData <= i && i < work.baseBSS:
+ workCounter = &gcController.globalsScanWork
+ for _, datap := range activeModules() {
+ workDone += markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-work.baseData))
+ }
+
+ case work.baseBSS <= i && i < work.baseSpans:
+ workCounter = &gcController.globalsScanWork
+ for _, datap := range activeModules() {
+ workDone += markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-work.baseBSS))
+ }
+
+ case i == fixedRootFinalizers:
+ for fb := allfin; fb != nil; fb = fb.alllink {
+ cnt := uintptr(atomic.Load(&fb.cnt))
+ scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw, nil)
+ }
+
+ case i == fixedRootFreeGStacks:
+ // Switch to the system stack so we can call
+ // stackfree.
+ systemstack(markrootFreeGStacks)
+
+ case work.baseSpans <= i && i < work.baseStacks:
+ // mark mspan.specials
+ markrootSpans(gcw, int(i-work.baseSpans))
+
+ default:
+ // the rest is scanning goroutine stacks
+ workCounter = &gcController.stackScanWork
+ if i < work.baseStacks || work.baseEnd <= i {
+ printlock()
+ print("runtime: markroot index ", i, " not in stack roots range [", work.baseStacks, ", ", work.baseEnd, ")\n")
+ throw("markroot: bad index")
+ }
+ gp := work.stackRoots[i-work.baseStacks]
+
+ // remember when we've first observed the G blocked
+ // needed only to output in traceback
+ status := readgstatus(gp) // We are not in a scan state
+ if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
+ gp.waitsince = work.tstart
+ }
+
+ // scanstack must be done on the system stack in case
+ // we're trying to scan our own stack.
+ systemstack(func() {
+ // If this is a self-scan, put the user G in
+ // _Gwaiting to prevent self-deadlock. It may
+ // already be in _Gwaiting if this is a mark
+ // worker or we're in mark termination.
+ userG := getg().m.curg
+ selfScan := gp == userG && readgstatus(userG) == _Grunning
+ if selfScan {
+ casGToWaiting(userG, _Grunning, waitReasonGarbageCollectionScan)
+ }
+
+ // TODO: suspendG blocks (and spins) until gp
+ // stops, which may take a while for
+ // running goroutines. Consider doing this in
+ // two phases where the first is non-blocking:
+ // we scan the stacks we can and ask running
+ // goroutines to scan themselves; and the
+ // second blocks.
+ stopped := suspendG(gp)
+ if stopped.dead {
+ gp.gcscandone = true
+ return
+ }
+ if gp.gcscandone {
+ throw("g already scanned")
+ }
+ workDone += scanstack(gp, gcw)
+ gp.gcscandone = true
+ resumeG(stopped)
+
+ if selfScan {
+ casgstatus(userG, _Gwaiting, _Grunning)
+ }
+ })
+ }
+ if workCounter != nil && workDone != 0 {
+ workCounter.Add(workDone)
+ if flushBgCredit {
+ gcFlushBgCredit(workDone)
+ }
+ }
+ return workDone
+}
+
+// markrootBlock scans the shard'th shard of the block of memory [b0,
+// b0+n0), with the given pointer mask.
+//
+// Returns the amount of work done.
+//
+//go:nowritebarrier
+func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) int64 {
+ if rootBlockBytes%(8*goarch.PtrSize) != 0 {
+ // This is necessary to pick byte offsets in ptrmask0.
+ throw("rootBlockBytes must be a multiple of 8*ptrSize")
+ }
+
+ // Note that if b0 is toward the end of the address space,
+ // then b0 + rootBlockBytes might wrap around.
+ // These tests are written to avoid any possible overflow.
+ off := uintptr(shard) * rootBlockBytes
+ if off >= n0 {
+ return 0
+ }
+ b := b0 + off
+ ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*goarch.PtrSize))))
+ n := uintptr(rootBlockBytes)
+ if off+n > n0 {
+ n = n0 - off
+ }
+
+ // Scan this shard.
+ scanblock(b, n, ptrmask, gcw, nil)
+ return int64(n)
+}
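+
+// Worked example (illustrative, 64-bit): each ptrmask byte covers 8 words, or
+// 64 bytes of data, so shard k starts at b0 + k*256KiB, advances the mask by
+// 256KiB/64 = 4096 bytes, and scans up to 256KiB (less for the final, partial
+// shard).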
+
+// markrootFreeGStacks frees stacks of dead Gs.
+//
+// This does not free stacks of dead Gs cached on Ps, but having a few
+// cached stacks around isn't a problem.
+func markrootFreeGStacks() {
+ // Take list of dead Gs with stacks.
+ lock(&sched.gFree.lock)
+ list := sched.gFree.stack
+ sched.gFree.stack = gList{}
+ unlock(&sched.gFree.lock)
+ if list.empty() {
+ return
+ }
+
+ // Free stacks.
+ q := gQueue{list.head, list.head}
+ for gp := list.head.ptr(); gp != nil; gp = gp.schedlink.ptr() {
+ stackfree(gp.stack)
+ gp.stack.lo = 0
+ gp.stack.hi = 0
+ // Manipulate the queue directly since the Gs are
+ // already all linked the right way.
+ q.tail.set(gp)
+ }
+
+ // Put Gs back on the free list.
+ lock(&sched.gFree.lock)
+ sched.gFree.noStack.pushAll(q)
+ unlock(&sched.gFree.lock)
+}
+
+// markrootSpans marks roots for one shard of markArenas.
+//
+//go:nowritebarrier
+func markrootSpans(gcw *gcWork, shard int) {
+ // Objects with finalizers have two GC-related invariants:
+ //
+ // 1) Everything reachable from the object must be marked.
+ // This ensures that when we pass the object to its finalizer,
+ // everything the finalizer can reach will be retained.
+ //
+ // 2) Finalizer specials (which are not in the garbage
+ // collected heap) are roots. In practice, this means the fn
+ // field must be scanned.
+ sg := mheap_.sweepgen
+
+ // Find the arena and page index into that arena for this shard.
+ ai := mheap_.markArenas[shard/(pagesPerArena/pagesPerSpanRoot)]
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ arenaPage := uint(uintptr(shard) * pagesPerSpanRoot % pagesPerArena)
+
+ // Construct slice of bitmap which we'll iterate over.
+ specialsbits := ha.pageSpecials[arenaPage/8:]
+ specialsbits = specialsbits[:pagesPerSpanRoot/8]
+ for i := range specialsbits {
+ // Find set bits, which correspond to spans with specials.
+ specials := atomic.Load8(&specialsbits[i])
+ if specials == 0 {
+ continue
+ }
+ for j := uint(0); j < 8; j++ {
+ if specials&(1<<j) == 0 {
+ continue
+ }
+ // Find the span for this bit.
+ //
+ // This value is guaranteed to be non-nil because having
+ // specials implies that the span is in-use, and since we're
+ // currently marking we can be sure that we don't have to worry
+ // about the span being freed and re-used.
+ s := ha.spans[arenaPage+uint(i)*8+j]
+
+ // The state must be mSpanInUse if the specials bit is set, so
+ // sanity check that.
+ if state := s.state.get(); state != mSpanInUse {
+ print("s.state = ", state, "\n")
+ throw("non in-use span found with specials bit set")
+ }
+ // Check that this span was swept (it may be cached or uncached).
+ if !useCheckmark && !(s.sweepgen == sg || s.sweepgen == sg+3) {
+ // sweepgen was updated (+2) during non-checkmark GC pass
+ print("sweep ", s.sweepgen, " ", sg, "\n")
+ throw("gc: unswept span")
+ }
+
+ // Lock the specials to prevent a special from being
+ // removed from the list while we're traversing it.
+ lock(&s.speciallock)
+ for sp := s.specials; sp != nil; sp = sp.next {
+ if sp.kind != _KindSpecialFinalizer {
+ continue
+ }
+ // don't mark finalized object, but scan it so we
+ // retain everything it points to.
+ spf := (*specialfinalizer)(unsafe.Pointer(sp))
+ // A finalizer can be set for an inner byte of an object, find object beginning.
+				// A finalizer can be set for an inner byte of an object; find the object's beginning.
+
+ // Mark everything that can be reached from
+ // the object (but *not* the object itself or
+ // we'll never collect it).
+ if !s.spanclass.noscan() {
+ scanobject(p, gcw)
+ }
+
+ // The special itself is a root.
+ scanblock(uintptr(unsafe.Pointer(&spf.fn)), goarch.PtrSize, &oneptrmask[0], gcw, nil)
+ }
+ unlock(&s.speciallock)
+ }
+ }
+}
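+
+// The offset rounding above finds the start of the object that owns the
+// special. For example (illustrative numbers): with elemsize = 48 and
+// special.offset = 112, 112/48*48 = 96, so the scan starts at s.base()+96,
+// the beginning of the third object in the span.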
+
+// gcAssistAlloc performs GC work to make gp's assist debt positive.
+// gp must be the calling user goroutine.
+//
+// This must be called with preemption enabled.
+func gcAssistAlloc(gp *g) {
+ // Don't assist in non-preemptible contexts. These are
+ // generally fragile and won't allow the assist to block.
+ if getg() == gp.m.g0 {
+ return
+ }
+ if mp := getg().m; mp.locks > 0 || mp.preemptoff != "" {
+ return
+ }
+
+ traced := false
+retry:
+ if gcCPULimiter.limiting() {
+ // If the CPU limiter is enabled, intentionally don't
+ // assist to reduce the amount of CPU time spent in the GC.
+ if traced {
+ traceGCMarkAssistDone()
+ }
+ return
+ }
+ // Compute the amount of scan work we need to do to make the
+ // balance positive. When the required amount of work is low,
+ // we over-assist to build up credit for future allocations
+ // and amortize the cost of assisting.
+ assistWorkPerByte := gcController.assistWorkPerByte.Load()
+ assistBytesPerWork := gcController.assistBytesPerWork.Load()
+ debtBytes := -gp.gcAssistBytes
+ scanWork := int64(assistWorkPerByte * float64(debtBytes))
+ if scanWork < gcOverAssistWork {
+ scanWork = gcOverAssistWork
+ debtBytes = int64(assistBytesPerWork * float64(scanWork))
+ }
+
+ // Steal as much credit as we can from the background GC's
+ // scan credit. This is racy and may drop the background
+ // credit below 0 if two mutators steal at the same time. This
+ // will just cause steals to fail until credit is accumulated
+ // again, so in the long run it doesn't really matter, but we
+ // do have to handle the negative credit case.
+ bgScanCredit := gcController.bgScanCredit.Load()
+ stolen := int64(0)
+ if bgScanCredit > 0 {
+ if bgScanCredit < scanWork {
+ stolen = bgScanCredit
+ gp.gcAssistBytes += 1 + int64(assistBytesPerWork*float64(stolen))
+ } else {
+ stolen = scanWork
+ gp.gcAssistBytes += debtBytes
+ }
+ gcController.bgScanCredit.Add(-stolen)
+
+ scanWork -= stolen
+
+ if scanWork == 0 {
+ // We were able to steal all of the credit we
+ // needed.
+ if traced {
+ traceGCMarkAssistDone()
+ }
+ return
+ }
+ }
+
+ if traceEnabled() && !traced {
+ traced = true
+ traceGCMarkAssistStart()
+ }
+
+ // Perform assist work
+ systemstack(func() {
+ gcAssistAlloc1(gp, scanWork)
+ // The user stack may have moved, so this can't touch
+ // anything on it until it returns from systemstack.
+ })
+
+ completed := gp.param != nil
+ gp.param = nil
+ if completed {
+ gcMarkDone()
+ }
+
+ if gp.gcAssistBytes < 0 {
+		// We were unable to steal enough credit or perform
+ // enough work to pay off the assist debt. We need to
+ // do one of these before letting the mutator allocate
+ // more to prevent over-allocation.
+ //
+ // If this is because we were preempted, reschedule
+ // and try some more.
+ if gp.preempt {
+ Gosched()
+ goto retry
+ }
+
+ // Add this G to an assist queue and park. When the GC
+ // has more background credit, it will satisfy queued
+ // assists before flushing to the global credit pool.
+ //
+ // Note that this does *not* get woken up when more
+ // work is added to the work list. The theory is that
+ // there wasn't enough work to do anyway, so we might
+ // as well let background marking take care of the
+ // work that is available.
+ if !gcParkAssist() {
+ goto retry
+ }
+
+ // At this point either background GC has satisfied
+ // this G's assist debt, or the GC cycle is over.
+ }
+ if traced {
+ traceGCMarkAssistDone()
+ }
+}
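+
+// To sketch the debt math with illustrative numbers: if assistWorkPerByte is
+// 0.5 and the goroutine owes 1MiB, the required scanWork is 524288 units. If
+// that falls below gcOverAssistWork, the assist is rounded up to
+// gcOverAssistWork and the debt is recomputed, so the extra work becomes
+// credit against future allocations. If the background credit pool holds,
+// say, 200000 units, the assist steals min(200000, scanWork) of it before
+// doing any scanning of its own.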
+
+// gcAssistAlloc1 is the part of gcAssistAlloc that runs on the system
+// stack. This is a separate function to make it easier to see that
+// we're not capturing anything from the user stack, since the user
+// stack may move while we're in this function.
+//
+// gcAssistAlloc1 indicates whether this assist completed the mark
+// phase by setting gp.param to non-nil. This can't be communicated on
+// the stack since it may move.
+//
+//go:systemstack
+func gcAssistAlloc1(gp *g, scanWork int64) {
+ // Clear the flag indicating that this assist completed the
+ // mark phase.
+ gp.param = nil
+
+ if atomic.Load(&gcBlackenEnabled) == 0 {
+ // The gcBlackenEnabled check in malloc races with the
+ // store that clears it but an atomic check in every malloc
+ // would be a performance hit.
+ // Instead we recheck it here on the non-preemptable system
+ // stack to determine if we should perform an assist.
+
+ // GC is done, so ignore any remaining debt.
+ gp.gcAssistBytes = 0
+ return
+ }
+ // Track time spent in this assist. Since we're on the
+ // system stack, this is non-preemptible, so we can
+ // just measure start and end time.
+ //
+ // Limiter event tracking might be disabled if we end up here
+ // while on a mark worker.
+ startTime := nanotime()
+ trackLimiterEvent := gp.m.p.ptr().limiterEvent.start(limiterEventMarkAssist, startTime)
+
+ decnwait := atomic.Xadd(&work.nwait, -1)
+ if decnwait == work.nproc {
+ println("runtime: work.nwait =", decnwait, "work.nproc=", work.nproc)
+ throw("nwait > work.nprocs")
+ }
+
+ // gcDrainN requires the caller to be preemptible.
+ casGToWaiting(gp, _Grunning, waitReasonGCAssistMarking)
+
+ // drain own cached work first in the hopes that it
+ // will be more cache friendly.
+ gcw := &getg().m.p.ptr().gcw
+ workDone := gcDrainN(gcw, scanWork)
+
+ casgstatus(gp, _Gwaiting, _Grunning)
+
+ // Record that we did this much scan work.
+ //
+ // Back out the number of bytes of assist credit that
+ // this scan work counts for. The "1+" is a poor man's
+ // round-up, to ensure this adds credit even if
+ // assistBytesPerWork is very low.
+ assistBytesPerWork := gcController.assistBytesPerWork.Load()
+ gp.gcAssistBytes += 1 + int64(assistBytesPerWork*float64(workDone))
+
+ // If this is the last worker and we ran out of work,
+ // signal a completion point.
+ incnwait := atomic.Xadd(&work.nwait, +1)
+ if incnwait > work.nproc {
+ println("runtime: work.nwait=", incnwait,
+ "work.nproc=", work.nproc)
+ throw("work.nwait > work.nproc")
+ }
+
+ if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
+ // This has reached a background completion point. Set
+ // gp.param to a non-nil value to indicate this. It
+ // doesn't matter what we set it to (it just has to be
+ // a valid pointer).
+ gp.param = unsafe.Pointer(gp)
+ }
+ now := nanotime()
+ duration := now - startTime
+ pp := gp.m.p.ptr()
+ pp.gcAssistTime += duration
+ if trackLimiterEvent {
+ pp.limiterEvent.stop(limiterEventMarkAssist, now)
+ }
+ if pp.gcAssistTime > gcAssistTimeSlack {
+ gcController.assistTime.Add(pp.gcAssistTime)
+ gcCPULimiter.update(now)
+ pp.gcAssistTime = 0
+ }
+}
+
+// gcWakeAllAssists wakes all currently blocked assists. This is used
+// at the end of a GC cycle. gcBlackenEnabled must be false to prevent
+// new assists from going to sleep after this point.
+func gcWakeAllAssists() {
+ lock(&work.assistQueue.lock)
+ list := work.assistQueue.q.popList()
+ injectglist(&list)
+ unlock(&work.assistQueue.lock)
+}
+
+// gcParkAssist puts the current goroutine on the assist queue and parks.
+//
+// gcParkAssist reports whether the assist is now satisfied. If it
+// returns false, the caller must retry the assist.
+func gcParkAssist() bool {
+ lock(&work.assistQueue.lock)
+ // If the GC cycle finished while we were getting the lock,
+ // exit the assist. The cycle can't finish while we hold the
+ // lock.
+ if atomic.Load(&gcBlackenEnabled) == 0 {
+ unlock(&work.assistQueue.lock)
+ return true
+ }
+
+ gp := getg()
+ oldList := work.assistQueue.q
+ work.assistQueue.q.pushBack(gp)
+
+ // Recheck for background credit now that this G is in
+ // the queue, but can still back out. This avoids a
+ // race in case background marking has flushed more
+ // credit since we checked above.
+ if gcController.bgScanCredit.Load() > 0 {
+ work.assistQueue.q = oldList
+ if oldList.tail != 0 {
+ oldList.tail.ptr().schedlink.set(nil)
+ }
+ unlock(&work.assistQueue.lock)
+ return false
+ }
+ // Park.
+ goparkunlock(&work.assistQueue.lock, waitReasonGCAssistWait, traceBlockGCMarkAssist, 2)
+ return true
+}
+
+// gcFlushBgCredit flushes scanWork units of background scan work
+// credit. This first satisfies blocked assists on the
+// work.assistQueue and then flushes any remaining credit to
+// gcController.bgScanCredit.
+//
+// Write barriers are disallowed because this is used by gcDrain after
+// it has ensured that all work is drained and this must preserve that
+// condition.
+//
+//go:nowritebarrierrec
+func gcFlushBgCredit(scanWork int64) {
+ if work.assistQueue.q.empty() {
+ // Fast path; there are no blocked assists. There's a
+ // small window here where an assist may add itself to
+ // the blocked queue and park. If that happens, we'll
+ // just get it on the next flush.
+ gcController.bgScanCredit.Add(scanWork)
+ return
+ }
+
+ assistBytesPerWork := gcController.assistBytesPerWork.Load()
+ scanBytes := int64(float64(scanWork) * assistBytesPerWork)
+
+ lock(&work.assistQueue.lock)
+ for !work.assistQueue.q.empty() && scanBytes > 0 {
+ gp := work.assistQueue.q.pop()
+ // Note that gp.gcAssistBytes is negative because gp
+ // is in debt. Think carefully about the signs below.
+ if scanBytes+gp.gcAssistBytes >= 0 {
+ // Satisfy this entire assist debt.
+ scanBytes += gp.gcAssistBytes
+ gp.gcAssistBytes = 0
+ // It's important that we *not* put gp in
+ // runnext. Otherwise, it's possible for user
+ // code to exploit the GC worker's high
+ // scheduler priority to get itself always run
+ // before other goroutines and always in the
+ // fresh quantum started by GC.
+ ready(gp, 0, false)
+ } else {
+ // Partially satisfy this assist.
+ gp.gcAssistBytes += scanBytes
+ scanBytes = 0
+ // As a heuristic, we move this assist to the
+ // back of the queue so that large assists
+ // can't clog up the assist queue and
+ // substantially delay small assists.
+ work.assistQueue.q.pushBack(gp)
+ break
+ }
+ }
+
+ if scanBytes > 0 {
+ // Convert from scan bytes back to work.
+ assistWorkPerByte := gcController.assistWorkPerByte.Load()
+ scanWork = int64(float64(scanBytes) * assistWorkPerByte)
+ gcController.bgScanCredit.Add(scanWork)
+ }
+ unlock(&work.assistQueue.lock)
+}
+
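+// For example (illustrative numbers only): suppose assistBytesPerWork is 2.0
+// and gcFlushBgCredit above is called with scanWork = 1000, so scanBytes =
+// 2000. If the first queued assist has gcAssistBytes = -1500 (1500 bytes of
+// debt), it is fully satisfied and readied, leaving scanBytes = 500. A second
+// assist with gcAssistBytes = -3000 is only paid down to -2500 and is moved
+// to the back of the queue; nothing remains to flush to bgScanCredit.
+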
+// scanstack scans gp's stack, greying all pointers found on the stack.
+//
+// Returns the amount of scan work performed, but doesn't update
+// gcController.stackScanWork or flush any credit. Any background credit produced
+// by this function should be flushed by its caller. scanstack itself can't
+// safely flush because it may result in trying to wake up a goroutine that
+// was just scanned, resulting in a self-deadlock.
+//
+// scanstack will also shrink the stack if it is safe to do so. If it
+// is not, it schedules a stack shrink for the next synchronous safe
+// point.
+//
+// scanstack is marked go:systemstack because it must not be preempted
+// while using a workbuf.
+//
+//go:nowritebarrier
+//go:systemstack
+func scanstack(gp *g, gcw *gcWork) int64 {
+ if readgstatus(gp)&_Gscan == 0 {
+ print("runtime:scanstack: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", hex(readgstatus(gp)), "\n")
+ throw("scanstack - bad status")
+ }
+
+ switch readgstatus(gp) &^ _Gscan {
+ default:
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ throw("mark - bad status")
+ case _Gdead:
+ return 0
+ case _Grunning:
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ throw("scanstack: goroutine not stopped")
+ case _Grunnable, _Gsyscall, _Gwaiting:
+ // ok
+ }
+
+ if gp == getg() {
+ throw("can't scan our own stack")
+ }
+
+ // scannedSize is the amount of work we'll be reporting.
+ //
+ // It is less than the allocated size (which is hi-lo).
+ var sp uintptr
+ if gp.syscallsp != 0 {
+ sp = gp.syscallsp // If in a system call this is the stack pointer (gp.sched.sp can be 0 in this case on Windows).
+ } else {
+ sp = gp.sched.sp
+ }
+ scannedSize := gp.stack.hi - sp
+
+ // Keep statistics for initial stack size calculation.
+ // Note that this accumulates the scanned size, not the allocated size.
+ p := getg().m.p.ptr()
+ p.scannedStackSize += uint64(scannedSize)
+ p.scannedStacks++
+
+ if isShrinkStackSafe(gp) {
+ // Shrink the stack if not much of it is being used.
+ shrinkstack(gp)
+ } else {
+ // Otherwise, shrink the stack at the next sync safe point.
+ gp.preemptShrink = true
+ }
+
+ var state stackScanState
+ state.stack = gp.stack
+
+ if stackTraceDebug {
+ println("stack trace goroutine", gp.goid)
+ }
+
+ if debugScanConservative && gp.asyncSafePoint {
+ print("scanning async preempted goroutine ", gp.goid, " stack [", hex(gp.stack.lo), ",", hex(gp.stack.hi), ")\n")
+ }
+
+ // Scan the saved context register. This is effectively a live
+ // register that gets moved back and forth between the
+ // register and sched.ctxt without a write barrier.
+ if gp.sched.ctxt != nil {
+ scanblock(uintptr(unsafe.Pointer(&gp.sched.ctxt)), goarch.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+
+ // Scan the stack. Accumulate a list of stack objects.
+ var u unwinder
+ for u.init(gp, 0); u.valid(); u.next() {
+ scanframeworker(&u.frame, &state, gcw)
+ }
+
+ // Find additional pointers that point into the stack from the heap.
+ // Currently this includes defers and panics. See also function copystack.
+
+ // Find and trace other pointers in defer records.
+ for d := gp._defer; d != nil; d = d.link {
+ if d.fn != nil {
+ // Scan the func value, which could be a stack allocated closure.
+ // See issue 30453.
+ scanblock(uintptr(unsafe.Pointer(&d.fn)), goarch.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+ if d.link != nil {
+ // The link field of a stack-allocated defer record might point
+ // to a heap-allocated defer record. Keep that heap record live.
+ scanblock(uintptr(unsafe.Pointer(&d.link)), goarch.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+		// Retain defer records themselves.
+ // Defer records might not be reachable from the G through regular heap
+ // tracing because the defer linked list might weave between the stack and the heap.
+ if d.heap {
+ scanblock(uintptr(unsafe.Pointer(&d)), goarch.PtrSize, &oneptrmask[0], gcw, &state)
+ }
+ }
+ if gp._panic != nil {
+ // Panics are always stack allocated.
+ state.putPtr(uintptr(unsafe.Pointer(gp._panic)), false)
+ }
+
+ // Find and scan all reachable stack objects.
+ //
+ // The state's pointer queue prioritizes precise pointers over
+ // conservative pointers so that we'll prefer scanning stack
+ // objects precisely.
+ state.buildIndex()
+ for {
+ p, conservative := state.getPtr()
+ if p == 0 {
+ break
+ }
+ obj := state.findObject(p)
+ if obj == nil {
+ continue
+ }
+ r := obj.r
+ if r == nil {
+ // We've already scanned this object.
+ continue
+ }
+ obj.setRecord(nil) // Don't scan it again.
+ if stackTraceDebug {
+ printlock()
+ print(" live stkobj at", hex(state.stack.lo+uintptr(obj.off)), "of size", obj.size)
+ if conservative {
+ print(" (conservative)")
+ }
+ println()
+ printunlock()
+ }
+ gcdata := r.gcdata()
+ var s *mspan
+ if r.useGCProg() {
+			// This path is pretty unlikely: an object large enough
+			// to have a GC program, allocated on the stack.
+ // We need some space to unpack the program into a straight
+ // bitmask, which we allocate/free here.
+ // TODO: it would be nice if there were a way to run a GC
+ // program without having to store all its bits. We'd have
+ // to change from a Lempel-Ziv style program to something else.
+ // Or we can forbid putting objects on stacks if they require
+ // a gc program (see issue 27447).
+ s = materializeGCProg(r.ptrdata(), gcdata)
+ gcdata = (*byte)(unsafe.Pointer(s.startAddr))
+ }
+
+ b := state.stack.lo + uintptr(obj.off)
+ if conservative {
+ scanConservative(b, r.ptrdata(), gcdata, gcw, &state)
+ } else {
+ scanblock(b, r.ptrdata(), gcdata, gcw, &state)
+ }
+
+ if s != nil {
+ dematerializeGCProg(s)
+ }
+ }
+
+ // Deallocate object buffers.
+ // (Pointer buffers were all deallocated in the loop above.)
+ for state.head != nil {
+ x := state.head
+ state.head = x.next
+ if stackTraceDebug {
+ for i := 0; i < x.nobj; i++ {
+ obj := &x.obj[i]
+ if obj.r == nil { // reachable
+ continue
+ }
+ println(" dead stkobj at", hex(gp.stack.lo+uintptr(obj.off)), "of size", obj.r.size)
+ // Note: not necessarily really dead - only reachable-from-ptr dead.
+ }
+ }
+ x.nobj = 0
+ putempty((*workbuf)(unsafe.Pointer(x)))
+ }
+ if state.buf != nil || state.cbuf != nil || state.freeBuf != nil {
+ throw("remaining pointer buffers")
+ }
+ return int64(scannedSize)
+}
+
+// Scan a stack frame: local variables and function arguments/results.
+//
+//go:nowritebarrier
+func scanframeworker(frame *stkframe, state *stackScanState, gcw *gcWork) {
+ if _DebugGC > 1 && frame.continpc != 0 {
+ print("scanframe ", funcname(frame.fn), "\n")
+ }
+
+ isAsyncPreempt := frame.fn.valid() && frame.fn.funcID == abi.FuncID_asyncPreempt
+ isDebugCall := frame.fn.valid() && frame.fn.funcID == abi.FuncID_debugCallV2
+ if state.conservative || isAsyncPreempt || isDebugCall {
+ if debugScanConservative {
+ println("conservatively scanning function", funcname(frame.fn), "at PC", hex(frame.continpc))
+ }
+
+ // Conservatively scan the frame. Unlike the precise
+ // case, this includes the outgoing argument space
+ // since we may have stopped while this function was
+ // setting up a call.
+ //
+ // TODO: We could narrow this down if the compiler
+ // produced a single map per function of stack slots
+ // and registers that ever contain a pointer.
+ if frame.varp != 0 {
+ size := frame.varp - frame.sp
+ if size > 0 {
+ scanConservative(frame.sp, size, nil, gcw, state)
+ }
+ }
+
+ // Scan arguments to this frame.
+ if n := frame.argBytes(); n != 0 {
+ // TODO: We could pass the entry argument map
+ // to narrow this down further.
+ scanConservative(frame.argp, n, nil, gcw, state)
+ }
+
+ if isAsyncPreempt || isDebugCall {
+ // This function's frame contained the
+ // registers for the asynchronously stopped
+ // parent frame. Scan the parent
+ // conservatively.
+ state.conservative = true
+ } else {
+ // We only wanted to scan those two frames
+ // conservatively. Clear the flag for future
+ // frames.
+ state.conservative = false
+ }
+ return
+ }
+
+ locals, args, objs := frame.getStackMap(&state.cache, false)
+
+ // Scan local variables if stack frame has been allocated.
+ if locals.n > 0 {
+ size := uintptr(locals.n) * goarch.PtrSize
+ scanblock(frame.varp-size, size, locals.bytedata, gcw, state)
+ }
+
+ // Scan arguments.
+ if args.n > 0 {
+ scanblock(frame.argp, uintptr(args.n)*goarch.PtrSize, args.bytedata, gcw, state)
+ }
+
+ // Add all stack objects to the stack object list.
+ if frame.varp != 0 {
+ // varp is 0 for defers, where there are no locals.
+ // In that case, there can't be a pointer to its args, either.
+ // (And all args would be scanned above anyway.)
+ for i := range objs {
+ obj := &objs[i]
+ off := obj.off
+ base := frame.varp // locals base pointer
+ if off >= 0 {
+ base = frame.argp // arguments and return values base pointer
+ }
+ ptr := base + uintptr(off)
+ if ptr < frame.sp {
+ // object hasn't been allocated in the frame yet.
+ continue
+ }
+ if stackTraceDebug {
+ println("stkobj at", hex(ptr), "of size", obj.size)
+ }
+ state.addObject(ptr, obj)
+ }
+ }
+}
+
+type gcDrainFlags int
+
+const (
+ gcDrainUntilPreempt gcDrainFlags = 1 << iota
+ gcDrainFlushBgCredit
+ gcDrainIdle
+ gcDrainFractional
+)
+
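+// For example, a worker that should stop when preempted and feed assist
+// credit as it goes would pass gcDrainUntilPreempt|gcDrainFlushBgCredit, and
+// an idle-priority worker would additionally set gcDrainIdle. (The exact
+// combinations used by the mark workers are chosen in gcBgMarkWorker in
+// mgc.go; the lines here only define the flag bits.)
+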
+// gcDrain scans roots and objects in work buffers, blackening grey
+// objects until it is unable to get more work. It may return before
+// GC is done; it's the caller's responsibility to balance work from
+// other Ps.
+//
+// If flags&gcDrainUntilPreempt != 0, gcDrain returns when g.preempt
+// is set.
+//
+// If flags&gcDrainIdle != 0, gcDrain returns when there is other work
+// to do.
+//
+// If flags&gcDrainFractional != 0, gcDrain self-preempts when
+// pollFractionalWorkerExit() returns true. This implies
+// gcDrainNoBlock.
+//
+// If flags&gcDrainFlushBgCredit != 0, gcDrain flushes scan work
+// credit to gcController.bgScanCredit every gcCreditSlack units of
+// scan work.
+//
+// gcDrain will always return if there is a pending STW.
+//
+//go:nowritebarrier
+func gcDrain(gcw *gcWork, flags gcDrainFlags) {
+ if !writeBarrier.needed {
+ throw("gcDrain phase incorrect")
+ }
+
+ gp := getg().m.curg
+ preemptible := flags&gcDrainUntilPreempt != 0
+ flushBgCredit := flags&gcDrainFlushBgCredit != 0
+ idle := flags&gcDrainIdle != 0
+
+ initScanWork := gcw.heapScanWork
+
+ // checkWork is the scan work before performing the next
+ // self-preempt check.
+ checkWork := int64(1<<63 - 1)
+ var check func() bool
+ if flags&(gcDrainIdle|gcDrainFractional) != 0 {
+ checkWork = initScanWork + drainCheckThreshold
+ if idle {
+ check = pollWork
+ } else if flags&gcDrainFractional != 0 {
+ check = pollFractionalWorkerExit
+ }
+ }
+
+ // Drain root marking jobs.
+ if work.markrootNext < work.markrootJobs {
+ // Stop if we're preemptible or if someone wants to STW.
+ for !(gp.preempt && (preemptible || sched.gcwaiting.Load())) {
+ job := atomic.Xadd(&work.markrootNext, +1) - 1
+ if job >= work.markrootJobs {
+ break
+ }
+ markroot(gcw, job, flushBgCredit)
+ if check != nil && check() {
+ goto done
+ }
+ }
+ }
+
+ // Drain heap marking jobs.
+ // Stop if we're preemptible or if someone wants to STW.
+ for !(gp.preempt && (preemptible || sched.gcwaiting.Load())) {
+ // Try to keep work available on the global queue. We used to
+ // check if there were waiting workers, but it's better to
+ // just keep work available than to make workers wait. In the
+ // worst case, we'll do O(log(_WorkbufSize)) unnecessary
+ // balances.
+ if work.full == 0 {
+ gcw.balance()
+ }
+
+ b := gcw.tryGetFast()
+ if b == 0 {
+ b = gcw.tryGet()
+ if b == 0 {
+ // Flush the write barrier
+ // buffer; this may create
+ // more work.
+ wbBufFlush()
+ b = gcw.tryGet()
+ }
+ }
+ if b == 0 {
+ // Unable to get work.
+ break
+ }
+ scanobject(b, gcw)
+
+ // Flush background scan work credit to the global
+ // account if we've accumulated enough locally so
+ // mutator assists can draw on it.
+ if gcw.heapScanWork >= gcCreditSlack {
+ gcController.heapScanWork.Add(gcw.heapScanWork)
+ if flushBgCredit {
+ gcFlushBgCredit(gcw.heapScanWork - initScanWork)
+ initScanWork = 0
+ }
+ checkWork -= gcw.heapScanWork
+ gcw.heapScanWork = 0
+
+ if checkWork <= 0 {
+ checkWork += drainCheckThreshold
+ if check != nil && check() {
+ break
+ }
+ }
+ }
+ }
+
+done:
+ // Flush remaining scan work credit.
+ if gcw.heapScanWork > 0 {
+ gcController.heapScanWork.Add(gcw.heapScanWork)
+ if flushBgCredit {
+ gcFlushBgCredit(gcw.heapScanWork - initScanWork)
+ }
+ gcw.heapScanWork = 0
+ }
+}
+
+// gcDrainN blackens grey objects until it has performed roughly
+// scanWork units of scan work or the G is preempted. This is
+// best-effort, so it may perform less work if it fails to get a work
+// buffer. Otherwise, it will perform at least scanWork units of work, but
+// may perform more because scanning is always done in whole object
+// increments. It returns the amount of scan work performed.
+//
+// The caller goroutine must be in a preemptible state (e.g.,
+// _Gwaiting) to prevent deadlocks during stack scanning. As a
+// consequence, this must be called on the system stack.
+//
+//go:nowritebarrier
+//go:systemstack
+func gcDrainN(gcw *gcWork, scanWork int64) int64 {
+ if !writeBarrier.needed {
+ throw("gcDrainN phase incorrect")
+ }
+
+ // There may already be scan work on the gcw, which we don't
+ // want to claim was done by this call.
+ workFlushed := -gcw.heapScanWork
+
+ // In addition to backing out because of a preemption, back out
+ // if the GC CPU limiter is enabled.
+ gp := getg().m.curg
+ for !gp.preempt && !gcCPULimiter.limiting() && workFlushed+gcw.heapScanWork < scanWork {
+ // See gcDrain comment.
+ if work.full == 0 {
+ gcw.balance()
+ }
+
+ b := gcw.tryGetFast()
+ if b == 0 {
+ b = gcw.tryGet()
+ if b == 0 {
+ // Flush the write barrier buffer;
+ // this may create more work.
+ wbBufFlush()
+ b = gcw.tryGet()
+ }
+ }
+
+ if b == 0 {
+ // Try to do a root job.
+ if work.markrootNext < work.markrootJobs {
+ job := atomic.Xadd(&work.markrootNext, +1) - 1
+ if job < work.markrootJobs {
+ workFlushed += markroot(gcw, job, false)
+ continue
+ }
+ }
+ // No heap or root jobs.
+ break
+ }
+
+ scanobject(b, gcw)
+
+ // Flush background scan work credit.
+ if gcw.heapScanWork >= gcCreditSlack {
+ gcController.heapScanWork.Add(gcw.heapScanWork)
+ workFlushed += gcw.heapScanWork
+ gcw.heapScanWork = 0
+ }
+ }
+
+ // Unlike gcDrain, there's no need to flush remaining work
+ // here because this never flushes to bgScanCredit and
+ // gcw.dispose will flush any remaining work to scanWork.
+
+ return workFlushed + gcw.heapScanWork
+}
+
+// scanblock scans b as scanobject would, but using an explicit
+// pointer bitmap instead of the heap bitmap.
+//
+// This is used to scan non-heap roots, so it does not update
+// gcw.bytesMarked or gcw.heapScanWork.
+//
+// If stk != nil, possible stack pointers are also reported to stk.putPtr.
+//
+//go:nowritebarrier
+func scanblock(b0, n0 uintptr, ptrmask *uint8, gcw *gcWork, stk *stackScanState) {
+ // Use local copies of original parameters, so that a stack trace
+ // due to one of the throws below shows the original block
+ // base and extent.
+ b := b0
+ n := n0
+
+ for i := uintptr(0); i < n; {
+ // Find bits for the next word.
+ bits := uint32(*addb(ptrmask, i/(goarch.PtrSize*8)))
+ if bits == 0 {
+ i += goarch.PtrSize * 8
+ continue
+ }
+ for j := 0; j < 8 && i < n; j++ {
+ if bits&1 != 0 {
+ // Same work as in scanobject; see comments there.
+ p := *(*uintptr)(unsafe.Pointer(b + i))
+ if p != 0 {
+ if obj, span, objIndex := findObject(p, b, i); obj != 0 {
+ greyobject(obj, b, i, span, gcw, objIndex)
+ } else if stk != nil && p >= stk.stack.lo && p < stk.stack.hi {
+ stk.putPtr(p, false)
+ }
+ }
+ }
+ bits >>= 1
+ i += goarch.PtrSize
+ }
+ }
+}
+
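+// For example, with 8-byte pointers each ptrmask byte in scanblock above
+// covers 8 consecutive words (64 bytes): bit 0 describes the word at b, bit 1
+// the word at b+8, and so on. That layout is why a zero mask byte lets the
+// loop skip goarch.PtrSize*8 bytes at once.
+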
+// scanobject scans the object starting at b, adding pointers to gcw.
+// b must point to the beginning of a heap object or an oblet.
+// scanobject consults the GC bitmap for the pointer mask and the
+// spans for the size of the object.
+//
+//go:nowritebarrier
+func scanobject(b uintptr, gcw *gcWork) {
+ // Prefetch object before we scan it.
+ //
+ // This will overlap fetching the beginning of the object with initial
+ // setup before we start scanning the object.
+ sys.Prefetch(b)
+
+ // Find the bits for b and the size of the object at b.
+ //
+ // b is either the beginning of an object, in which case this
+ // is the size of the object to scan, or it points to an
+ // oblet, in which case we compute the size to scan below.
+ s := spanOfUnchecked(b)
+ n := s.elemsize
+ if n == 0 {
+ throw("scanobject n == 0")
+ }
+ if s.spanclass.noscan() {
+ // Correctness-wise this is ok, but it's inefficient
+ // if noscan objects reach here.
+ throw("scanobject of a noscan object")
+ }
+
+ if n > maxObletBytes {
+ // Large object. Break into oblets for better
+ // parallelism and lower latency.
+ if b == s.base() {
+ // Enqueue the other oblets to scan later.
+ // Some oblets may be in b's scalar tail, but
+ // these will be marked as "no more pointers",
+ // so we'll drop out immediately when we go to
+ // scan those.
+ for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
+ if !gcw.putFast(oblet) {
+ gcw.put(oblet)
+ }
+ }
+ }
+
+ // Compute the size of the oblet. Since this object
+ // must be a large object, s.base() is the beginning
+ // of the object.
+ n = s.base() + s.elemsize - b
+ if n > maxObletBytes {
+ n = maxObletBytes
+ }
+ }
+
+ hbits := heapBitsForAddr(b, n)
+ var scanSize uintptr
+ for {
+ var addr uintptr
+ if hbits, addr = hbits.nextFast(); addr == 0 {
+ if hbits, addr = hbits.next(); addr == 0 {
+ break
+ }
+ }
+
+ // Keep track of farthest pointer we found, so we can
+ // update heapScanWork. TODO: is there a better metric,
+ // now that we can skip scalar portions pretty efficiently?
+ scanSize = addr - b + goarch.PtrSize
+
+ // Work here is duplicated in scanblock and above.
+ // If you make changes here, make changes there too.
+ obj := *(*uintptr)(unsafe.Pointer(addr))
+
+ // At this point we have extracted the next potential pointer.
+ // Quickly filter out nil and pointers back to the current object.
+ if obj != 0 && obj-b >= n {
+ // Test if obj points into the Go heap and, if so,
+ // mark the object.
+ //
+ // Note that it's possible for findObject to
+ // fail if obj points to a just-allocated heap
+ // object because of a race with growing the
+ // heap. In this case, we know the object was
+ // just allocated and hence will be marked by
+ // allocation itself.
+ if obj, span, objIndex := findObject(obj, b, addr-b); obj != 0 {
+ greyobject(obj, b, addr-b, span, gcw, objIndex)
+ }
+ }
+ }
+ gcw.bytesMarked += uint64(n)
+ gcw.heapScanWork += int64(scanSize)
+}
+
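+// For example (illustrative sizes, assuming maxObletBytes is 128 KiB): for a
+// large object whose span has elemsize of 320 KiB, the call that reaches the
+// object's base scans only [base, base+128KiB) and enqueues base+128KiB and
+// base+256KiB as separate oblets; the final oblet's size is trimmed to the
+// remaining 64 KiB by the size computation in scanobject above.
+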
+// scanConservative scans block [b, b+n) conservatively, treating any
+// pointer-like value in the block as a pointer.
+//
+// If ptrmask != nil, only words that are marked in ptrmask are
+// considered as potential pointers.
+//
+// If state != nil, it's assumed that [b, b+n) is a block in the stack
+// and may contain pointers to stack objects.
+func scanConservative(b, n uintptr, ptrmask *uint8, gcw *gcWork, state *stackScanState) {
+ if debugScanConservative {
+ printlock()
+ print("conservatively scanning [", hex(b), ",", hex(b+n), ")\n")
+ hexdumpWords(b, b+n, func(p uintptr) byte {
+ if ptrmask != nil {
+ word := (p - b) / goarch.PtrSize
+ bits := *addb(ptrmask, word/8)
+ if (bits>>(word%8))&1 == 0 {
+ return '$'
+ }
+ }
+
+ val := *(*uintptr)(unsafe.Pointer(p))
+ if state != nil && state.stack.lo <= val && val < state.stack.hi {
+ return '@'
+ }
+
+ span := spanOfHeap(val)
+ if span == nil {
+ return ' '
+ }
+ idx := span.objIndex(val)
+ if span.isFree(idx) {
+ return ' '
+ }
+ return '*'
+ })
+ printunlock()
+ }
+
+ for i := uintptr(0); i < n; i += goarch.PtrSize {
+ if ptrmask != nil {
+ word := i / goarch.PtrSize
+ bits := *addb(ptrmask, word/8)
+ if bits == 0 {
+ // Skip 8 words (the loop increment will do the 8th)
+ //
+ // This must be the first time we've
+ // seen this word of ptrmask, so i
+ // must be 8-word-aligned, but check
+ // our reasoning just in case.
+ if i%(goarch.PtrSize*8) != 0 {
+ throw("misaligned mask")
+ }
+ i += goarch.PtrSize*8 - goarch.PtrSize
+ continue
+ }
+ if (bits>>(word%8))&1 == 0 {
+ continue
+ }
+ }
+
+ val := *(*uintptr)(unsafe.Pointer(b + i))
+
+ // Check if val points into the stack.
+ if state != nil && state.stack.lo <= val && val < state.stack.hi {
+ // val may point to a stack object. This
+ // object may be dead from last cycle and
+ // hence may contain pointers to unallocated
+ // objects, but unlike heap objects we can't
+ // tell if it's already dead. Hence, if all
+ // pointers to this object are from
+ // conservative scanning, we have to scan it
+ // defensively, too.
+ state.putPtr(val, true)
+ continue
+ }
+
+ // Check if val points to a heap span.
+ span := spanOfHeap(val)
+ if span == nil {
+ continue
+ }
+
+ // Check if val points to an allocated object.
+ idx := span.objIndex(val)
+ if span.isFree(idx) {
+ continue
+ }
+
+ // val points to an allocated object. Mark it.
+ obj := span.base() + idx*span.elemsize
+ greyobject(obj, b, i, span, gcw, idx)
+ }
+}
+
+// Shade the object if it isn't already.
+// The object is not nil and known to be in the heap.
+// Preemption must be disabled.
+//
+//go:nowritebarrier
+func shade(b uintptr) {
+ if obj, span, objIndex := findObject(b, 0, 0); obj != 0 {
+ gcw := &getg().m.p.ptr().gcw
+ greyobject(obj, 0, 0, span, gcw, objIndex)
+ }
+}
+
+// obj is the start of an object with mark mbits.
+// If it isn't already marked, mark it and enqueue into gcw.
+// base and off are for debugging only and could be removed.
+//
+// See also wbBufFlush1, which partially duplicates this logic.
+//
+//go:nowritebarrierrec
+func greyobject(obj, base, off uintptr, span *mspan, gcw *gcWork, objIndex uintptr) {
+ // obj should be start of allocation, and so must be at least pointer-aligned.
+ if obj&(goarch.PtrSize-1) != 0 {
+ throw("greyobject: obj not pointer-aligned")
+ }
+ mbits := span.markBitsForIndex(objIndex)
+
+ if useCheckmark {
+ if setCheckmark(obj, base, off, mbits) {
+ // Already marked.
+ return
+ }
+ } else {
+ if debug.gccheckmark > 0 && span.isFree(objIndex) {
+ print("runtime: marking free object ", hex(obj), " found at *(", hex(base), "+", hex(off), ")\n")
+ gcDumpObject("base", base, off)
+ gcDumpObject("obj", obj, ^uintptr(0))
+ getg().m.traceback = 2
+ throw("marking free object")
+ }
+
+ // If marked we have nothing to do.
+ if mbits.isMarked() {
+ return
+ }
+ mbits.setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ // If this is a noscan object, fast-track it to black
+ // instead of greying it.
+ if span.spanclass.noscan() {
+ gcw.bytesMarked += uint64(span.elemsize)
+ return
+ }
+ }
+
+ // We're adding obj to P's local workbuf, so it's likely
+ // this object will be processed soon by the same P.
+ // Even if the workbuf gets flushed, there will likely still be
+ // some benefit on platforms with inclusive shared caches.
+ sys.Prefetch(obj)
+ // Queue the obj for scanning.
+ if !gcw.putFast(obj) {
+ gcw.put(obj)
+ }
+}
+
+// gcDumpObject dumps the contents of obj for debugging and marks the
+// field at byte offset off in obj.
+func gcDumpObject(label string, obj, off uintptr) {
+ s := spanOf(obj)
+ print(label, "=", hex(obj))
+ if s == nil {
+ print(" s=nil\n")
+ return
+ }
+ print(" s.base()=", hex(s.base()), " s.limit=", hex(s.limit), " s.spanclass=", s.spanclass, " s.elemsize=", s.elemsize, " s.state=")
+ if state := s.state.get(); 0 <= state && int(state) < len(mSpanStateNames) {
+ print(mSpanStateNames[state], "\n")
+ } else {
+ print("unknown(", state, ")\n")
+ }
+
+ skipped := false
+ size := s.elemsize
+ if s.state.get() == mSpanManual && size == 0 {
+ // We're printing something from a stack frame. We
+		// don't know how big it is, so just show up to and
+		// including off.
+ size = off + goarch.PtrSize
+ }
+ for i := uintptr(0); i < size; i += goarch.PtrSize {
+ // For big objects, just print the beginning (because
+ // that usually hints at the object's type) and the
+ // fields around off.
+ if !(i < 128*goarch.PtrSize || off-16*goarch.PtrSize < i && i < off+16*goarch.PtrSize) {
+ skipped = true
+ continue
+ }
+ if skipped {
+ print(" ...\n")
+ skipped = false
+ }
+ print(" *(", label, "+", i, ") = ", hex(*(*uintptr)(unsafe.Pointer(obj + i))))
+ if i == off {
+ print(" <==")
+ }
+ print("\n")
+ }
+ if skipped {
+ print(" ...\n")
+ }
+}
+
+// gcmarknewobject marks a newly allocated object black. obj must
+// not contain any non-nil pointers.
+//
+// This is nosplit so it can manipulate a gcWork without preemption.
+//
+//go:nowritebarrier
+//go:nosplit
+func gcmarknewobject(span *mspan, obj, size uintptr) {
+ if useCheckmark { // The world should be stopped so this should not happen.
+ throw("gcmarknewobject called while doing checkmark")
+ }
+
+ // Mark object.
+ objIndex := span.objIndex(obj)
+ span.markBitsForIndex(objIndex).setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ gcw := &getg().m.p.ptr().gcw
+ gcw.bytesMarked += uint64(size)
+}
+
+// gcMarkTinyAllocs greys all active tiny alloc blocks.
+//
+// The world must be stopped.
+func gcMarkTinyAllocs() {
+ assertWorldStopped()
+
+ for _, p := range allp {
+ c := p.mcache
+ if c == nil || c.tiny == 0 {
+ continue
+ }
+ _, span, objIndex := findObject(c.tiny, 0, 0)
+ gcw := &p.gcw
+ greyobject(c.tiny, 0, 0, span, gcw, objIndex)
+ }
+}
diff --git a/src/runtime/mgcpacer.go b/src/runtime/mgcpacer.go
new file mode 100644
index 0000000..32e19f9
--- /dev/null
+++ b/src/runtime/mgcpacer.go
@@ -0,0 +1,1444 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "internal/goexperiment"
+ "runtime/internal/atomic"
+ _ "unsafe" // for go:linkname
+)
+
+const (
+ // gcGoalUtilization is the goal CPU utilization for
+ // marking as a fraction of GOMAXPROCS.
+ //
+ // Increasing the goal utilization will shorten GC cycles as the GC
+ // has more resources behind it, lessening costs from the write barrier,
+ // but comes at the cost of increasing mutator latency.
+ gcGoalUtilization = gcBackgroundUtilization
+
+ // gcBackgroundUtilization is the fixed CPU utilization for background
+ // marking. It must be <= gcGoalUtilization. The difference between
+ // gcGoalUtilization and gcBackgroundUtilization will be made up by
+ // mark assists. The scheduler will aim to use within 50% of this
+ // goal.
+ //
+ // As a general rule, there's little reason to set gcBackgroundUtilization
+ // < gcGoalUtilization. One reason might be in mostly idle applications,
+ // where goroutines are unlikely to assist at all, so the actual
+	// utilization will be lower than the goal. But this is a moot point
+ // because the idle mark workers already soak up idle CPU resources.
+ // These two values are still kept separate however because they are
+ // distinct conceptually, and in previous iterations of the pacer the
+ // distinction was more important.
+ gcBackgroundUtilization = 0.25
+
+ // gcCreditSlack is the amount of scan work credit that can
+ // accumulate locally before updating gcController.heapScanWork and,
+ // optionally, gcController.bgScanCredit. Lower values give a more
+ // accurate assist ratio and make it more likely that assists will
+ // successfully steal background credit. Higher values reduce memory
+ // contention.
+ gcCreditSlack = 2000
+
+ // gcAssistTimeSlack is the nanoseconds of mutator assist time that
+ // can accumulate on a P before updating gcController.assistTime.
+ gcAssistTimeSlack = 5000
+
+ // gcOverAssistWork determines how many extra units of scan work a GC
+ // assist does when an assist happens. This amortizes the cost of an
+ // assist by pre-paying for this many bytes of future allocations.
+ gcOverAssistWork = 64 << 10
+
+ // defaultHeapMinimum is the value of heapMinimum for GOGC==100.
+ defaultHeapMinimum = (goexperiment.HeapMinimum512KiBInt)*(512<<10) +
+ (1-goexperiment.HeapMinimum512KiBInt)*(4<<20)
+
+ // maxStackScanSlack is the bytes of stack space allocated or freed
+ // that can accumulate on a P before updating gcController.stackSize.
+ maxStackScanSlack = 8 << 10
+
+ // memoryLimitMinHeapGoalHeadroom is the minimum amount of headroom the
+ // pacer gives to the heap goal when operating in the memory-limited regime.
+ // That is, it'll reduce the heap goal by this many extra bytes off of the
+ // base calculation, at minimum.
+ memoryLimitMinHeapGoalHeadroom = 1 << 20
+
+	// memoryLimitHeapGoalHeadroomPercent is how much headroom the memory-limit-based
+ // heap goal should have as a percent of the maximum possible heap goal allowed
+ // to maintain the memory limit.
+ memoryLimitHeapGoalHeadroomPercent = 3
+)
+
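+// For example, the constants above mean that with GOMAXPROCS=8 the background
+// mark goal corresponds to about two dedicated workers' worth of CPU
+// (8 * 0.25), each assist pre-pays for 64 KiB of future allocation
+// (gcOverAssistWork), and the heap minimum is 4 MiB unless the
+// HeapMinimum512KiB experiment lowers it to 512 KiB.
+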
+// gcController implements the GC pacing controller that determines
+// when to trigger concurrent garbage collection and how much marking
+// work to do in mutator assists and background marking.
+//
+// It calculates the ratio between the allocation rate (in terms of CPU
+// time) and the GC scan throughput to determine the heap size at which to
+// trigger a GC cycle such that no GC assists are required to finish on time.
+// This algorithm thus optimizes GC CPU utilization to the dedicated background
+// mark utilization of 25% of GOMAXPROCS by minimizing GC assists.
+// The high-level design of this algorithm is documented
+// at https://github.com/golang/proposal/blob/master/design/44167-gc-pacer-redesign.md.
+// See https://golang.org/s/go15gcpacing for additional historical context.
+var gcController gcControllerState
+
+type gcControllerState struct {
+ // Initialized from GOGC. GOGC=off means no GC.
+ gcPercent atomic.Int32
+
+ // memoryLimit is the soft memory limit in bytes.
+ //
+ // Initialized from GOMEMLIMIT. GOMEMLIMIT=off is equivalent to MaxInt64
+ // which means no soft memory limit in practice.
+ //
+ // This is an int64 instead of a uint64 to more easily maintain parity with
+ // the SetMemoryLimit API, which sets a maximum at MaxInt64. This value
+ // should never be negative.
+ memoryLimit atomic.Int64
+
+ // heapMinimum is the minimum heap size at which to trigger GC.
+ // For small heaps, this overrides the usual GOGC*live set rule.
+ //
+ // When there is a very small live set but a lot of allocation, simply
+ // collecting when the heap reaches GOGC*live results in many GC
+ // cycles and high total per-GC overhead. This minimum amortizes this
+ // per-GC overhead while keeping the heap reasonably small.
+ //
+ // During initialization this is set to 4MB*GOGC/100. In the case of
+ // GOGC==0, this will set heapMinimum to 0, resulting in constant
+ // collection even when the heap size is small, which is useful for
+ // debugging.
+ heapMinimum uint64
+
+ // runway is the amount of runway in heap bytes allocated by the
+ // application that we want to give the GC once it starts.
+ //
+ // This is computed from consMark during mark termination.
+ runway atomic.Uint64
+
+ // consMark is the estimated per-CPU consMark ratio for the application.
+ //
+ // It represents the ratio between the application's allocation
+ // rate, as bytes allocated per CPU-time, and the GC's scan rate,
+ // as bytes scanned per CPU-time.
+ // The units of this ratio are (B / cpu-ns) / (B / cpu-ns).
+ //
+ // At a high level, this value is computed as the bytes of memory
+ // allocated (cons) per unit of scan work completed (mark) in a GC
+ // cycle, divided by the CPU time spent on each activity.
+ //
+ // Updated at the end of each GC cycle, in endCycle.
+ consMark float64
+
+	// lastConsMark holds the measured cons/mark values for the previous 4 GC
+	// cycles. Note that these are *not* the last values of consMark, but the
+	// cons/mark values measured directly in endCycle.
+ lastConsMark [4]float64
+
+ // gcPercentHeapGoal is the goal heapLive for when next GC ends derived
+ // from gcPercent.
+ //
+ // Set to ^uint64(0) if gcPercent is disabled.
+ gcPercentHeapGoal atomic.Uint64
+
+ // sweepDistMinTrigger is the minimum trigger to ensure a minimum
+ // sweep distance.
+ //
+ // This bound is also special because it applies to both the trigger
+ // *and* the goal (all other trigger bounds must be based *on* the goal).
+ //
+ // It is computed ahead of time, at commit time. The theory is that,
+ // absent a sudden change to a parameter like gcPercent, the trigger
+ // will be chosen to always give the sweeper enough headroom. However,
+ // such a change might dramatically and suddenly move up the trigger,
+ // in which case we need to ensure the sweeper still has enough headroom.
+ sweepDistMinTrigger atomic.Uint64
+
+ // triggered is the point at which the current GC cycle actually triggered.
+ // Only valid during the mark phase of a GC cycle, otherwise set to ^uint64(0).
+ //
+ // Updated while the world is stopped.
+ triggered uint64
+
+ // lastHeapGoal is the value of heapGoal at the moment the last GC
+ // ended. Note that this is distinct from the last value heapGoal had,
+ // because it could change if e.g. gcPercent changes.
+ //
+ // Read and written with the world stopped or with mheap_.lock held.
+ lastHeapGoal uint64
+
+ // heapLive is the number of bytes considered live by the GC.
+ // That is: retained by the most recent GC plus allocated
+ // since then. heapLive ≤ memstats.totalAlloc-memstats.totalFree, since
+ // heapAlloc includes unmarked objects that have not yet been swept (and
+ // hence goes up as we allocate and down as we sweep) while heapLive
+ // excludes these objects (and hence only goes up between GCs).
+ //
+ // To reduce contention, this is updated only when obtaining a span
+ // from an mcentral and at this point it counts all of the unallocated
+ // slots in that span (which will be allocated before that mcache
+ // obtains another span from that mcentral). Hence, it slightly
+ // overestimates the "true" live heap size. It's better to overestimate
+ // than to underestimate because 1) this triggers the GC earlier than
+ // necessary rather than potentially too late and 2) this leads to a
+ // conservative GC rate rather than a GC rate that is potentially too
+ // low.
+ //
+ // Whenever this is updated, call traceHeapAlloc() and
+ // this gcControllerState's revise() method.
+ heapLive atomic.Uint64
+
+ // heapScan is the number of bytes of "scannable" heap. This is the
+ // live heap (as counted by heapLive), but omitting no-scan objects and
+ // no-scan tails of objects.
+ //
+ // This value is fixed at the start of a GC cycle. It represents the
+ // maximum scannable heap.
+ heapScan atomic.Uint64
+
+ // lastHeapScan is the number of bytes of heap that were scanned
+ // last GC cycle. It is the same as heapMarked, but only
+ // includes the "scannable" parts of objects.
+ //
+ // Updated when the world is stopped.
+ lastHeapScan uint64
+
+ // lastStackScan is the number of bytes of stack that were scanned
+ // last GC cycle.
+ lastStackScan atomic.Uint64
+
+ // maxStackScan is the amount of allocated goroutine stack space in
+ // use by goroutines.
+ //
+ // This number tracks allocated goroutine stack space rather than used
+ // goroutine stack space (i.e. what is actually scanned) because used
+ // goroutine stack space is much harder to measure cheaply. By using
+ // allocated space, we make an overestimate; this is OK, it's better
+ // to conservatively overcount than undercount.
+ maxStackScan atomic.Uint64
+
+ // globalsScan is the total amount of global variable space
+ // that is scannable.
+ globalsScan atomic.Uint64
+
+ // heapMarked is the number of bytes marked by the previous
+ // GC. After mark termination, heapLive == heapMarked, but
+ // unlike heapLive, heapMarked does not change until the
+ // next mark termination.
+ heapMarked uint64
+
+ // heapScanWork is the total heap scan work performed this cycle.
+ // stackScanWork is the total stack scan work performed this cycle.
+ // globalsScanWork is the total globals scan work performed this cycle.
+ //
+ // These are updated atomically during the cycle. Updates occur in
+ // bounded batches, since they are both written and read
+ // throughout the cycle. At the end of the cycle, heapScanWork is how
+ // much of the retained heap is scannable.
+ //
+ // Currently these are measured in bytes. For most uses, this is an
+ // opaque unit of work, but for estimation the definition is important.
+ //
+ // Note that stackScanWork includes only stack space scanned, not all
+ // of the allocated stack.
+ heapScanWork atomic.Int64
+ stackScanWork atomic.Int64
+ globalsScanWork atomic.Int64
+
+ // bgScanCredit is the scan work credit accumulated by the concurrent
+ // background scan. This credit is accumulated by the background scan
+ // and stolen by mutator assists. Updates occur in bounded batches,
+ // since it is both written and read throughout the cycle.
+ bgScanCredit atomic.Int64
+
+ // assistTime is the nanoseconds spent in mutator assists
+ // during this cycle. This is updated atomically, and must also
+ // be updated atomically even during a STW, because it is read
+ // by sysmon. Updates occur in bounded batches, since it is both
+ // written and read throughout the cycle.
+ assistTime atomic.Int64
+
+ // dedicatedMarkTime is the nanoseconds spent in dedicated mark workers
+ // during this cycle. This is updated at the end of the concurrent mark
+ // phase.
+ dedicatedMarkTime atomic.Int64
+
+ // fractionalMarkTime is the nanoseconds spent in the fractional mark
+ // worker during this cycle. This is updated throughout the cycle and
+ // will be up-to-date if the fractional mark worker is not currently
+ // running.
+ fractionalMarkTime atomic.Int64
+
+ // idleMarkTime is the nanoseconds spent in idle marking during this
+ // cycle. This is updated throughout the cycle.
+ idleMarkTime atomic.Int64
+
+ // markStartTime is the absolute start time in nanoseconds
+ // that assists and background mark workers started.
+ markStartTime int64
+
+ // dedicatedMarkWorkersNeeded is the number of dedicated mark workers
+ // that need to be started. This is computed at the beginning of each
+ // cycle and decremented as dedicated mark workers get started.
+ dedicatedMarkWorkersNeeded atomic.Int64
+
+ // idleMarkWorkers is two packed int32 values in a single uint64.
+ // These two values are always updated simultaneously.
+ //
+ // The bottom int32 is the current number of idle mark workers executing.
+ //
+ // The top int32 is the maximum number of idle mark workers allowed to
+ // execute concurrently. Normally, this number is just gomaxprocs. However,
+ // during periodic GC cycles it is set to 0 because the system is idle
+ // anyway; there's no need to go full blast on all of GOMAXPROCS.
+ //
+ // The maximum number of idle mark workers is used to prevent new workers
+ // from starting, but it is not a hard maximum. It is possible (but
+ // exceedingly rare) for the current number of idle mark workers to
+ // transiently exceed the maximum. This could happen if the maximum changes
+	// just after a GC ends, racing with an M that has no P.
+ //
+	// Note that if we have no dedicated mark workers, we set this value to
+	// 1, because in that case we only have fractional GC workers, which aren't
+	// scheduled strictly enough to ensure GC progress. As a result, idle-priority
+	// mark workers are vital to GC progress in these situations.
+ //
+ // For example, consider a situation in which goroutines block on the GC
+ // (such as via runtime.GOMAXPROCS) and only fractional mark workers are
+ // scheduled (e.g. GOMAXPROCS=1). Without idle-priority mark workers, the
+ // last running M might skip scheduling a fractional mark worker if its
+ // utilization goal is met, such that once it goes to sleep (because there's
+ // nothing to do), there will be nothing else to spin up a new M for the
+ // fractional worker in the future, stalling GC progress and causing a
+ // deadlock. However, idle-priority workers will *always* run when there is
+ // nothing left to do, ensuring the GC makes progress.
+ //
+ // See github.com/golang/go/issues/44163 for more details.
+ idleMarkWorkers atomic.Uint64
+
+ // assistWorkPerByte is the ratio of scan work to allocated
+ // bytes that should be performed by mutator assists. This is
+ // computed at the beginning of each cycle and updated every
+ // time heapScan is updated.
+ assistWorkPerByte atomic.Float64
+
+ // assistBytesPerWork is 1/assistWorkPerByte.
+ //
+ // Note that because this is read and written independently
+ // from assistWorkPerByte users may notice a skew between
+ // the two values, and such a state should be safe.
+ assistBytesPerWork atomic.Float64
+
+ // fractionalUtilizationGoal is the fraction of wall clock
+ // time that should be spent in the fractional mark worker on
+ // each P that isn't running a dedicated worker.
+ //
+ // For example, if the utilization goal is 25% and there are
+ // no dedicated workers, this will be 0.25. If the goal is
+ // 25%, there is one dedicated worker, and GOMAXPROCS is 5,
+ // this will be 0.05 to make up the missing 5%.
+ //
+ // If this is zero, no fractional workers are needed.
+ fractionalUtilizationGoal float64
+
+ // These memory stats are effectively duplicates of fields from
+ // memstats.heapStats but are updated atomically or with the world
+ // stopped and don't provide the same consistency guarantees.
+ //
+ // Because the runtime is responsible for managing a memory limit, it's
+ // useful to couple these stats more tightly to the gcController, which
+ // is intimately connected to how that memory limit is maintained.
+ heapInUse sysMemStat // bytes in mSpanInUse spans
+ heapReleased sysMemStat // bytes released to the OS
+ heapFree sysMemStat // bytes not in any span, but not released to the OS
+ totalAlloc atomic.Uint64 // total bytes allocated
+ totalFree atomic.Uint64 // total bytes freed
+ mappedReady atomic.Uint64 // total virtual memory in the Ready state (see mem.go).
+
+ // test indicates that this is a test-only copy of gcControllerState.
+ test bool
+
+ _ cpu.CacheLinePad
+}
+
+func (c *gcControllerState) init(gcPercent int32, memoryLimit int64) {
+ c.heapMinimum = defaultHeapMinimum
+ c.triggered = ^uint64(0)
+ c.setGCPercent(gcPercent)
+ c.setMemoryLimit(memoryLimit)
+ c.commit(true) // No sweep phase in the first GC cycle.
+ // N.B. Don't bother calling traceHeapGoal. Tracing is never enabled at
+ // initialization time.
+ // N.B. No need to call revise; there's no GC enabled during
+ // initialization.
+}
+
+// startCycle resets the GC controller's state and computes estimates
+// for a new GC cycle. The caller must hold worldsema and the world
+// must be stopped.
+func (c *gcControllerState) startCycle(markStartTime int64, procs int, trigger gcTrigger) {
+ c.heapScanWork.Store(0)
+ c.stackScanWork.Store(0)
+ c.globalsScanWork.Store(0)
+ c.bgScanCredit.Store(0)
+ c.assistTime.Store(0)
+ c.dedicatedMarkTime.Store(0)
+ c.fractionalMarkTime.Store(0)
+ c.idleMarkTime.Store(0)
+ c.markStartTime = markStartTime
+ c.triggered = c.heapLive.Load()
+
+ // Compute the background mark utilization goal. In general,
+ // this may not come out exactly. We round the number of
+ // dedicated workers so that the utilization is closest to
+ // 25%. For small GOMAXPROCS, this would introduce too much
+ // error, so we add fractional workers in that case.
+ totalUtilizationGoal := float64(procs) * gcBackgroundUtilization
+ dedicatedMarkWorkersNeeded := int64(totalUtilizationGoal + 0.5)
+ utilError := float64(dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
+ const maxUtilError = 0.3
+ if utilError < -maxUtilError || utilError > maxUtilError {
+ // Rounding put us more than 30% off our goal. With
+ // gcBackgroundUtilization of 25%, this happens for
+ // GOMAXPROCS<=3 or GOMAXPROCS=6. Enable fractional
+ // workers to compensate.
+ if float64(dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
+ // Too many dedicated workers.
+ dedicatedMarkWorkersNeeded--
+ }
+ c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(dedicatedMarkWorkersNeeded)) / float64(procs)
+ } else {
+ c.fractionalUtilizationGoal = 0
+ }
+
+ // In STW mode, we just want dedicated workers.
+ if debug.gcstoptheworld > 0 {
+ dedicatedMarkWorkersNeeded = int64(procs)
+ c.fractionalUtilizationGoal = 0
+ }
+
+ // Clear per-P state
+ for _, p := range allp {
+ p.gcAssistTime = 0
+ p.gcFractionalMarkTime = 0
+ }
+
+ if trigger.kind == gcTriggerTime {
+ // During a periodic GC cycle, reduce the number of idle mark workers
+ // required. However, we need at least one dedicated mark worker or
+ // idle GC worker to ensure GC progress in some scenarios (see comment
+ // on maxIdleMarkWorkers).
+ if dedicatedMarkWorkersNeeded > 0 {
+ c.setMaxIdleMarkWorkers(0)
+ } else {
+ // TODO(mknyszek): The fundamental reason why we need this is because
+ // we can't count on the fractional mark worker to get scheduled.
+ // Fix that by ensuring it gets scheduled according to its quota even
+ // if the rest of the application is idle.
+ c.setMaxIdleMarkWorkers(1)
+ }
+ } else {
+ // N.B. gomaxprocs and dedicatedMarkWorkersNeeded are guaranteed not to
+ // change during a GC cycle.
+ c.setMaxIdleMarkWorkers(int32(procs) - int32(dedicatedMarkWorkersNeeded))
+ }
+
+ // Compute initial values for controls that are updated
+ // throughout the cycle.
+ c.dedicatedMarkWorkersNeeded.Store(dedicatedMarkWorkersNeeded)
+ c.revise()
+
+ if debug.gcpacertrace > 0 {
+ heapGoal := c.heapGoal()
+ assistRatio := c.assistWorkPerByte.Load()
+ print("pacer: assist ratio=", assistRatio,
+ " (scan ", gcController.heapScan.Load()>>20, " MB in ",
+ work.initialHeapLive>>20, "->",
+ heapGoal>>20, " MB)",
+ " workers=", dedicatedMarkWorkersNeeded,
+ "+", c.fractionalUtilizationGoal, "\n")
+ }
+}
+
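+// For example, with procs = 6 in startCycle above, totalUtilizationGoal is
+// 6 * 0.25 = 1.5, which rounds to 2 dedicated workers. The utilization error
+// 2/1.5 - 1 ≈ +33% exceeds maxUtilError, so the count is reduced to 1 and
+// fractionalUtilizationGoal becomes (1.5 - 1)/6 ≈ 0.083: each P not running a
+// dedicated worker should spend roughly 8.3% of its time in the fractional
+// worker.
+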
+// revise updates the assist ratio during the GC cycle to account for
+// improved estimates. This should be called whenever gcController.heapScan,
+// gcController.heapLive, or if any inputs to gcController.heapGoal are
+// updated. It is safe to call concurrently, but it may race with other
+// calls to revise.
+//
+// The result of this race is that the two assist ratio values may not line
+// up or may be stale. In practice this is OK because the assist ratio
+// moves slowly throughout a GC cycle, and the assist ratio is a best-effort
+// heuristic anyway. Furthermore, no part of the heuristic depends on
+// the two assist ratio values being exact reciprocals of one another, since
+// the two values are used to convert values from different sources.
+//
+// The worst case result of this raciness is that we may miss a larger shift
+// in the ratio (say, if we decide to pace more aggressively against the
+// hard heap goal) but even this "hard goal" is best-effort (see #40460).
+// The dedicated GC should ensure we don't exceed the hard goal by too much
+// in the rare case we do exceed it.
+//
+// It should only be called when gcBlackenEnabled != 0 (because this
+// is when assists are enabled and the necessary statistics are
+// available).
+func (c *gcControllerState) revise() {
+ gcPercent := c.gcPercent.Load()
+ if gcPercent < 0 {
+ // If GC is disabled but we're running a forced GC,
+ // act like GOGC is huge for the below calculations.
+ gcPercent = 100000
+ }
+ live := c.heapLive.Load()
+ scan := c.heapScan.Load()
+ work := c.heapScanWork.Load() + c.stackScanWork.Load() + c.globalsScanWork.Load()
+
+ // Assume we're under the soft goal. Pace GC to complete at
+ // heapGoal assuming the heap is in steady-state.
+ heapGoal := int64(c.heapGoal())
+
+ // The expected scan work is computed as the amount of bytes scanned last
+ // GC cycle (both heap and stack), plus our estimate of globals work for this cycle.
+ scanWorkExpected := int64(c.lastHeapScan + c.lastStackScan.Load() + c.globalsScan.Load())
+
+ // maxScanWork is a worst-case estimate of the amount of scan work that
+ // needs to be performed in this GC cycle. Specifically, it represents
+ // the case where *all* scannable memory turns out to be live, and
+ // *all* allocated stack space is scannable.
+ maxStackScan := c.maxStackScan.Load()
+ maxScanWork := int64(scan + maxStackScan + c.globalsScan.Load())
+ if work > scanWorkExpected {
+ // We've already done more scan work than expected. Because our expectation
+ // is based on a steady-state scannable heap size, we assume this means our
+ // heap is growing. Compute a new heap goal that takes our existing runway
+ // computed for scanWorkExpected and extrapolates it to maxScanWork, the worst-case
+ // scan work. This keeps our assist ratio stable if the heap continues to grow.
+ //
+ // The effect of this mechanism is that assists stay flat in the face of heap
+ // growths. It's OK to use more memory this cycle to scan all the live heap,
+ // because the next GC cycle is inevitably going to use *at least* that much
+ // memory anyway.
+ extHeapGoal := int64(float64(heapGoal-int64(c.triggered))/float64(scanWorkExpected)*float64(maxScanWork)) + int64(c.triggered)
+ scanWorkExpected = maxScanWork
+
+ // hardGoal is a hard limit on the amount that we're willing to push back the
+ // heap goal, and that's twice the heap goal (i.e. if GOGC=100 and the heap and/or
+ // stacks and/or globals grow to twice their size, this limits the current GC cycle's
+ // growth to 4x the original live heap's size).
+ //
+ // This maintains the invariant that we use no more memory than the next GC cycle
+ // will anyway.
+ hardGoal := int64((1.0 + float64(gcPercent)/100.0) * float64(heapGoal))
+ if extHeapGoal > hardGoal {
+ extHeapGoal = hardGoal
+ }
+ heapGoal = extHeapGoal
+ }
+ if int64(live) > heapGoal {
+ // We're already past our heap goal, even the extrapolated one.
+ // Leave ourselves some extra runway, so in the worst case we
+ // finish by that point.
+ const maxOvershoot = 1.1
+ heapGoal = int64(float64(heapGoal) * maxOvershoot)
+
+ // Compute the upper bound on the scan work remaining.
+ scanWorkExpected = maxScanWork
+ }
+
+ // Compute the remaining scan work estimate.
+ //
+ // Note that we currently count allocations during GC as both
+ // scannable heap (heapScan) and scan work completed
+ // (scanWork), so allocation will change this difference
+ // slowly in the soft regime and not at all in the hard
+ // regime.
+ scanWorkRemaining := scanWorkExpected - work
+ if scanWorkRemaining < 1000 {
+ // We set a somewhat arbitrary lower bound on
+ // remaining scan work since if we aim a little high,
+ // we can miss by a little.
+ //
+ // We *do* need to enforce that this is at least 1,
+ // since marking is racy and double-scanning objects
+ // may legitimately make the remaining scan work
+ // negative, even in the hard goal regime.
+ scanWorkRemaining = 1000
+ }
+
+ // Compute the heap distance remaining.
+ heapRemaining := heapGoal - int64(live)
+ if heapRemaining <= 0 {
+ // This shouldn't happen, but if it does, avoid
+ // dividing by zero or setting the assist negative.
+ heapRemaining = 1
+ }
+
+ // Compute the mutator assist ratio so by the time the mutator
+ // allocates the remaining heap bytes up to heapGoal, it will
+ // have done (or stolen) the remaining amount of scan work.
+ // Note that the assist ratio values are updated atomically
+ // but not together. This means there may be some degree of
+ // skew between the two values. This is generally OK as the
+ // values shift relatively slowly over the course of a GC
+ // cycle.
+ assistWorkPerByte := float64(scanWorkRemaining) / float64(heapRemaining)
+ assistBytesPerWork := float64(heapRemaining) / float64(scanWorkRemaining)
+ c.assistWorkPerByte.Store(assistWorkPerByte)
+ c.assistBytesPerWork.Store(assistBytesPerWork)
+}
+
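+// For example (illustrative numbers only): if revise above ends up with
+// scanWorkRemaining = 4 MiB of scan work and heapRemaining = 16 MiB of
+// allocation runway, then assistWorkPerByte = 0.25 and assistBytesPerWork =
+// 4, so every byte allocated by a mutator during the cycle obligates it to
+// perform (or steal) a quarter of a unit of scan work.
+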
+// endCycle computes the consMark estimate for the next cycle.
+// userForced indicates whether the current GC cycle was forced
+// by the application.
+func (c *gcControllerState) endCycle(now int64, procs int, userForced bool) {
+ // Record last heap goal for the scavenger.
+ // We'll be updating the heap goal soon.
+ gcController.lastHeapGoal = c.heapGoal()
+
+ // Compute the duration of time for which assists were turned on.
+ assistDuration := now - c.markStartTime
+
+ // Assume background mark hit its utilization goal.
+ utilization := gcBackgroundUtilization
+ // Add assist utilization; avoid divide by zero.
+ if assistDuration > 0 {
+ utilization += float64(c.assistTime.Load()) / float64(assistDuration*int64(procs))
+ }
+
+ if c.heapLive.Load() <= c.triggered {
+ // Shouldn't happen, but let's be very safe about this in case the
+ // GC is somehow extremely short.
+ //
+ // In this case though, the only reasonable value for c.heapLive-c.triggered
+ // would be 0, which isn't really all that useful, i.e. the GC was so short
+ // that it didn't matter.
+ //
+ // Ignore this case and don't update anything.
+ return
+ }
+ idleUtilization := 0.0
+ if assistDuration > 0 {
+ idleUtilization = float64(c.idleMarkTime.Load()) / float64(assistDuration*int64(procs))
+ }
+ // Determine the cons/mark ratio.
+ //
+ // The units we want for the numerator and denominator are both B / cpu-ns.
+ // We get this by taking the bytes allocated or scanned, and divide by the amount of
+ // CPU time it took for those operations. For allocations, that CPU time is
+ //
+ // assistDuration * procs * (1 - utilization)
+ //
+ // Where utilization includes just background GC workers and assists. It does *not*
+ // include idle GC work time, because in theory the mutator is free to take that at
+ // any point.
+ //
+ // For scanning, that CPU time is
+ //
+ // assistDuration * procs * (utilization + idleUtilization)
+ //
+ // In this case, we *include* idle utilization, because that is additional CPU time that
+ // the GC had available to it.
+ //
+ // In effect, idle GC time is sort of double-counted here, but it's very weird compared
+ // to other kinds of GC work, because of how fluid it is. Namely, because the mutator is
+ // *always* free to take it.
+ //
+ // So this calculation is really:
+ // (heapLive-trigger) / (assistDuration * procs * (1-utilization)) /
+ // (scanWork) / (assistDuration * procs * (utilization+idleUtilization))
+ //
+ // Note that because we only care about the ratio, assistDuration and procs cancel out.
+ scanWork := c.heapScanWork.Load() + c.stackScanWork.Load() + c.globalsScanWork.Load()
+ currentConsMark := (float64(c.heapLive.Load()-c.triggered) * (utilization + idleUtilization)) /
+ (float64(scanWork) * (1 - utilization))
+
+ // Update our cons/mark estimate. This is the maximum of the value we just computed and the last
+ // 4 cons/mark values we measured. The reason we take the maximum here is to bias a noisy
+ // cons/mark measurement toward fewer assists at the expense of additional GC cycles (starting
+ // earlier).
+ oldConsMark := c.consMark
+ c.consMark = currentConsMark
+ for i := range c.lastConsMark {
+ if c.lastConsMark[i] > c.consMark {
+ c.consMark = c.lastConsMark[i]
+ }
+ }
+ copy(c.lastConsMark[:], c.lastConsMark[1:])
+ c.lastConsMark[len(c.lastConsMark)-1] = currentConsMark
+
+ if debug.gcpacertrace > 0 {
+ printlock()
+ goal := gcGoalUtilization * 100
+ print("pacer: ", int(utilization*100), "% CPU (", int(goal), " exp.) for ")
+ print(c.heapScanWork.Load(), "+", c.stackScanWork.Load(), "+", c.globalsScanWork.Load(), " B work (", c.lastHeapScan+c.lastStackScan.Load()+c.globalsScan.Load(), " B exp.) ")
+ live := c.heapLive.Load()
+ print("in ", c.triggered, " B -> ", live, " B (∆goal ", int64(live)-int64(c.lastHeapGoal), ", cons/mark ", oldConsMark, ")")
+ println()
+ printunlock()
+ }
+}
+
+// enlistWorker encourages another dedicated mark worker to start on
+// another P if there are spare worker slots. It is used by putfull
+// when more work is made available.
+//
+//go:nowritebarrier
+func (c *gcControllerState) enlistWorker() {
+ // If there are idle Ps, wake one so it will run an idle worker.
+ // NOTE: This is suspected of causing deadlocks. See golang.org/issue/19112.
+ //
+ // if sched.npidle.Load() != 0 && sched.nmspinning.Load() == 0 {
+ // wakep()
+ // return
+ // }
+
+ // There are no idle Ps. If we need more dedicated workers,
+ // try to preempt a running P so it will switch to a worker.
+ if c.dedicatedMarkWorkersNeeded.Load() <= 0 {
+ return
+ }
+ // Pick a random other P to preempt.
+ if gomaxprocs <= 1 {
+ return
+ }
+ gp := getg()
+ if gp == nil || gp.m == nil || gp.m.p == 0 {
+ return
+ }
+ myID := gp.m.p.ptr().id
+ for tries := 0; tries < 5; tries++ {
+ id := int32(fastrandn(uint32(gomaxprocs - 1)))
+ if id >= myID {
+ id++
+ }
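+ // fastrandn above returns an id in [0, gomaxprocs-1); bumping any id at
+ // or above ours maps the result uniformly onto every P other than our own.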
+ p := allp[id]
+ if p.status != _Prunning {
+ continue
+ }
+ if preemptone(p) {
+ return
+ }
+ }
+}
+
+// findRunnableGCWorker returns a background mark worker for pp if it
+// should be run. This must only be called when gcBlackenEnabled != 0.
+func (c *gcControllerState) findRunnableGCWorker(pp *p, now int64) (*g, int64) {
+ if gcBlackenEnabled == 0 {
+ throw("gcControllerState.findRunnable: blackening not enabled")
+ }
+
+ // Since we have the current time, check if the GC CPU limiter
+ // hasn't had an update in a while. This check is necessary in
+ // case the limiter is on but hasn't been checked in a while and
+ // so may have left sufficient headroom to turn off again.
+ if now == 0 {
+ now = nanotime()
+ }
+ if gcCPULimiter.needUpdate(now) {
+ gcCPULimiter.update(now)
+ }
+
+ if !gcMarkWorkAvailable(pp) {
+ // No work to be done right now. This can happen at
+ // the end of the mark phase when there are still
+ // assists tapering off. Don't bother running a worker
+ // now because it'll just return immediately.
+ return nil, now
+ }
+
+ // Grab a worker before we commit to running below.
+ node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node == nil {
+ // There is at least one worker per P, so normally there are
+ // enough workers to run on all Ps, if necessary. However, once
+ // a worker enters gcMarkDone it may park without rejoining the
+ // pool, thus freeing a P with no corresponding worker.
+ // gcMarkDone never depends on another worker doing work, so it
+ // is safe to simply do nothing here.
+ //
+ // If gcMarkDone bails out without completing the mark phase,
+ // it will always do so with queued global work. Thus, that P
+ // will be immediately eligible to re-run the worker G it was
+ // just using, ensuring work can complete.
+ return nil, now
+ }
+
+ decIfPositive := func(val *atomic.Int64) bool {
+ for {
+ v := val.Load()
+ if v <= 0 {
+ return false
+ }
+
+ if val.CompareAndSwap(v, v-1) {
+ return true
+ }
+ }
+ }
+
+ if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
+ // This P is now dedicated to marking until the end of
+ // the concurrent mark phase.
+ pp.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
+ } else if c.fractionalUtilizationGoal == 0 {
+ // No need for fractional workers.
+ gcBgMarkWorkerPool.push(&node.node)
+ return nil, now
+ } else {
+ // Is this P behind on the fractional utilization
+ // goal?
+ //
+ // This should be kept in sync with pollFractionalWorkerExit.
+ delta := now - c.markStartTime
+ if delta > 0 && float64(pp.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
+ // Nope. No need to run a fractional worker.
+ gcBgMarkWorkerPool.push(&node.node)
+ return nil, now
+ }
+ // Run a fractional worker.
+ pp.gcMarkWorkerMode = gcMarkWorkerFractionalMode
+ }
+
+ // Run the background mark worker.
+ gp := node.gp.ptr()
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, now
+}
+
+// resetLive sets up the controller state for the next mark phase after the end
+// of the previous one. Must be called after endCycle and before commit, before
+// the world is started.
+//
+// The world must be stopped.
+func (c *gcControllerState) resetLive(bytesMarked uint64) {
+ c.heapMarked = bytesMarked
+ c.heapLive.Store(bytesMarked)
+ c.heapScan.Store(uint64(c.heapScanWork.Load()))
+ c.lastHeapScan = uint64(c.heapScanWork.Load())
+ c.lastStackScan.Store(uint64(c.stackScanWork.Load()))
+ c.triggered = ^uint64(0) // Reset triggered.
+
+ // heapLive was updated, so emit a trace event.
+ if traceEnabled() {
+ traceHeapAlloc(bytesMarked)
+ }
+}
+
+// markWorkerStop must be called whenever a mark worker stops executing.
+//
+// It updates mark work accounting in the controller by a duration of
+// work in nanoseconds and other bookkeeping.
+//
+// Safe to execute at any time.
+func (c *gcControllerState) markWorkerStop(mode gcMarkWorkerMode, duration int64) {
+ switch mode {
+ case gcMarkWorkerDedicatedMode:
+ c.dedicatedMarkTime.Add(duration)
+ c.dedicatedMarkWorkersNeeded.Add(1)
+ case gcMarkWorkerFractionalMode:
+ c.fractionalMarkTime.Add(duration)
+ case gcMarkWorkerIdleMode:
+ c.idleMarkTime.Add(duration)
+ c.removeIdleMarkWorker()
+ default:
+ throw("markWorkerStop: unknown mark worker mode")
+ }
+}
+
+func (c *gcControllerState) update(dHeapLive, dHeapScan int64) {
+ if dHeapLive != 0 {
+ live := gcController.heapLive.Add(dHeapLive)
+ if traceEnabled() {
+ // gcController.heapLive changed.
+ traceHeapAlloc(live)
+ }
+ }
+ if gcBlackenEnabled == 0 {
+ // Update heapScan when we're not in a current GC. It is fixed
+ // at the beginning of a cycle.
+ if dHeapScan != 0 {
+ gcController.heapScan.Add(dHeapScan)
+ }
+ } else {
+ // gcController.heapLive changed.
+ c.revise()
+ }
+}
+
+func (c *gcControllerState) addScannableStack(pp *p, amount int64) {
+ if pp == nil {
+ c.maxStackScan.Add(amount)
+ return
+ }
+ pp.maxStackScanDelta += amount
+ if pp.maxStackScanDelta >= maxStackScanSlack || pp.maxStackScanDelta <= -maxStackScanSlack {
+ c.maxStackScan.Add(pp.maxStackScanDelta)
+ pp.maxStackScanDelta = 0
+ }
+}
+
+func (c *gcControllerState) addGlobals(amount int64) {
+ c.globalsScan.Add(amount)
+}
+
+// heapGoal returns the current heap goal.
+func (c *gcControllerState) heapGoal() uint64 {
+ goal, _ := c.heapGoalInternal()
+ return goal
+}
+
+// heapGoalInternal is the implementation of heapGoal which returns additional
+// information that is necessary for computing the trigger.
+//
+// The returned minTrigger is always <= goal.
+func (c *gcControllerState) heapGoalInternal() (goal, minTrigger uint64) {
+ // Start with the goal calculated for gcPercent.
+ goal = c.gcPercentHeapGoal.Load()
+
+ // Check if the memory-limit-based goal is smaller, and if so, pick that.
+ if newGoal := c.memoryLimitHeapGoal(); newGoal < goal {
+ goal = newGoal
+ } else {
+ // We're not limited by the memory limit goal, so perform a series of
+ // adjustments that might move the goal forward in a variety of circumstances.
+
+ sweepDistTrigger := c.sweepDistMinTrigger.Load()
+ if sweepDistTrigger > goal {
+ // Set the goal to maintain a minimum sweep distance since
+ // the last call to commit. Note that we never want to do this
+ // if we're in the memory limit regime, because it could push
+ // the goal up.
+ goal = sweepDistTrigger
+ }
+ // Since we ignore the sweep distance trigger in the memory
+ // limit regime, we need to ensure we don't propagate it to
+ // the trigger, because it could cause a violation of the
+ // invariant that the trigger < goal.
+ minTrigger = sweepDistTrigger
+
+ // Ensure that the heap goal is at least a little larger than
+ // the point at which we triggered. This may not be the case if GC
+ // start is delayed or if the allocation that pushed gcController.heapLive
+ // over trigger is large or if the trigger is really close to
+ // GOGC. Assist is proportional to this distance, so enforce a
+ // minimum distance, even if it means going over the GOGC goal
+ // by a tiny bit.
+ //
+ // Ignore this if we're in the memory limit regime: we'd prefer to
+ // have the GC respond hard about how close we are to the goal than to
+ // push the goal back in such a manner that it could cause us to exceed
+ // the memory limit.
+ const minRunway = 64 << 10
+ if c.triggered != ^uint64(0) && goal < c.triggered+minRunway {
+ goal = c.triggered + minRunway
+ }
+ }
+ return
+}
+
+// memoryLimitHeapGoal returns a heap goal derived from memoryLimit.
+func (c *gcControllerState) memoryLimitHeapGoal() uint64 {
+ // Start by pulling out some values we'll need. Be careful about overflow.
+ var heapFree, heapAlloc, mappedReady uint64
+ for {
+ heapFree = c.heapFree.load() // Free and unscavenged memory.
+ heapAlloc = c.totalAlloc.Load() - c.totalFree.Load() // Heap object bytes in use.
+ mappedReady = c.mappedReady.Load() // Total unreleased mapped memory.
+ if heapFree+heapAlloc <= mappedReady {
+ break
+ }
+ // It is impossible for total unreleased mapped memory to exceed heap memory, but
+ // because these stats are updated independently, we may observe a partial update
+ // including only some values. Thus, we appear to break the invariant. However,
+ // this condition is necessarily transient, so just try again. In the case of a
+ // persistent accounting error, we'll deadlock here.
+ }
+
+ // Below we compute a goal from memoryLimit. There are a few things to be aware of.
+ // Firstly, the memoryLimit does not easily compare to the heap goal: the former
+ // is total mapped memory by the runtime that hasn't been released, while the latter is
+ // only heap object memory. Intuitively, the way we convert from one to the other is to
+ // subtract everything from memoryLimit that both contributes to the memory limit (so,
+ // ignore scavenged memory) and doesn't contain heap objects. This isn't quite what
+ // lines up with reality, but it's a good starting point.
+ //
+ // In practice this computation looks like the following:
+ //
+ // goal := memoryLimit - ((mappedReady - heapFree - heapAlloc) + max(mappedReady - memoryLimit, 0))
+ // ^1 ^2
+ // goal -= goal / 100 * memoryLimitHeapGoalHeadroomPercent
+ // ^3
+ //
+ // Let's break this down.
+ //
+ // The first term (marker 1) is everything that contributes to the memory limit and isn't
+ // or couldn't become heap objects. It represents, broadly speaking, non-heap overheads.
+ // One oddity you may have noticed is that we also subtract out heapFree, i.e. unscavenged
+ // memory that may contain heap objects in the future.
+ //
+ // Let's take a step back. In an ideal world, this term would look something like just
+ // the heap goal. That is, we "reserve" enough space for the heap to grow to the heap
+ // goal, and subtract out everything else. This is of course impossible; the definition
+ // is circular! However, this impossible definition contains a key insight: the amount
+ // we're *going* to use matters just as much as whatever we're currently using.
+ //
+ // Consider if the heap shrinks to 1/10th its size, leaving behind lots of free and
+ // unscavenged memory. mappedReady - heapAlloc will be quite large, because of that free
+ // and unscavenged memory, pushing the goal down significantly.
+ //
+ // heapFree is also safe to exclude from the memory limit because in the steady-state, it's
+ // just a pool of memory for future heap allocations, and making new allocations from heapFree
+ // memory doesn't increase overall memory use. In transient states, the scavenger and the
+ // allocator actively manage the pool of heapFree memory to maintain the memory limit.
+ //
+ // The second term (marker 2) is the amount of memory we've exceeded the limit by, and is
+ // intended to help recover from such a situation. By pushing the heap goal down, we also
+ // push the trigger down, triggering and finishing a GC sooner in order to make room for
+ // other memory sources. Note that since we're effectively reducing the heap goal by X bytes,
+ // we're actually giving more than X bytes of headroom back, because the heap goal is in
+ // terms of heap objects, but it takes more than X bytes (e.g. due to fragmentation) to store
+ // X bytes worth of objects.
+ //
+ // The final adjustment (marker 3) reduces the maximum possible memory limit heap goal by
+ // memoryLimitHeapGoalPercent. As the name implies, this is to provide additional headroom in
+ // the face of pacing inaccuracies, and also to leave a buffer of unscavenged memory so the
+ // allocator isn't constantly scavenging. The reduction amount also has a fixed minimum
+ // (memoryLimitMinHeapGoalHeadroom, not pictured) because the aforementioned pacing inaccuracies
+ // disproportionately affect small heaps: as heaps get smaller, the pacer's inputs get fuzzier.
+ // Shorter GC cycles and less GC work means noisy external factors like the OS scheduler have a
+ // greater impact.
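+ //
+ // As a rough illustrative example (made-up numbers): with a 1 GiB memory
+ // limit, mappedReady = 900 MiB, heapFree = 100 MiB, and heapAlloc = 600 MiB,
+ // term 1 is 900 - 100 - 600 = 200 MiB and term 2 is 0 (we are under the
+ // limit), so the goal starts at 1024 - 200 = 824 MiB before the headroom
+ // computed below is subtracted.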
+
+ memoryLimit := uint64(c.memoryLimit.Load())
+
+ // Compute term 1.
+ nonHeapMemory := mappedReady - heapFree - heapAlloc
+
+ // Compute term 2.
+ var overage uint64
+ if mappedReady > memoryLimit {
+ overage = mappedReady - memoryLimit
+ }
+
+ if nonHeapMemory+overage >= memoryLimit {
+ // We're at a point where non-heap memory exceeds the memory limit on its own.
+ // There's honestly not much we can do here but just trigger GCs continuously
+ // and let the CPU limiter rein that in. Something has to give at this point.
+ // Set it to heapMarked, the lowest possible goal.
+ return c.heapMarked
+ }
+
+ // Compute the goal.
+ goal := memoryLimit - (nonHeapMemory + overage)
+
+ // Apply some headroom to the goal to account for pacing inaccuracies and to reduce
+ // the impact of scavenging at allocation time in response to a high allocation rate
+ // when GOGC=off. See issue #57069. Also, be careful about small limits.
+ headroom := goal / 100 * memoryLimitHeapGoalHeadroomPercent
+ if headroom < memoryLimitMinHeapGoalHeadroom {
+ // Set a fixed minimum to deal with the particularly large effect pacing inaccuracies
+ // have for smaller heaps.
+ headroom = memoryLimitMinHeapGoalHeadroom
+ }
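+ // If subtracting the headroom would underflow, or would leave less than
+ // headroom bytes of goal behind, just use the headroom itself as the goal.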
+ if goal < headroom || goal-headroom < headroom {
+ goal = headroom
+ } else {
+ goal = goal - headroom
+ }
+ // Don't let us go below the live heap. A heap goal below the live heap doesn't make sense.
+ if goal < c.heapMarked {
+ goal = c.heapMarked
+ }
+ return goal
+}
+
+const (
+ // These constants determine the bounds on the GC trigger as a fraction
+ // of heap bytes allocated between the start of a GC (heapLive == heapMarked)
+ // and the end of a GC (heapLive == heapGoal).
+ //
+ // The constants are obscured in this way for efficiency. The denominator
+ // of the fraction is always a power-of-two for a quick division, so that
+ // the numerator is a single constant integer multiplication.
+ triggerRatioDen = 64
+
+ // The minimum trigger constant was chosen empirically: given a sufficiently
+ // fast/scalable allocator with 48 Ps that could drive the trigger ratio
+ // to <0.05, this constant causes applications to retain the same peak
+ // RSS compared to not having this allocator.
+ minTriggerRatioNum = 45 // ~0.7
+
+ // The maximum trigger constant is chosen somewhat arbitrarily, but the
+ // current constant has served us well over the years.
+ maxTriggerRatioNum = 61 // ~0.95
+)
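+
+// As an illustrative example of these bounds (made-up numbers): with
+// heapMarked = 64 MiB and a heap goal of 128 MiB, the trigger computed by
+// trigger below is kept roughly between 64 MiB + 64 MiB*45/64 ≈ 109 MiB and
+// 64 MiB + 64 MiB*61/64 ≈ 125 MiB, before any further adjustment for large
+// heaps.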
+
+// trigger returns the current point at which a GC should trigger along with
+// the heap goal.
+//
+// The returned value may be compared against heapLive to determine whether
+// the GC should trigger. Thus, the GC trigger condition should be (but may
+// not be, in the case of small movements for efficiency) checked whenever
+// the heap goal may change.
+func (c *gcControllerState) trigger() (uint64, uint64) {
+ goal, minTrigger := c.heapGoalInternal()
+
+ // Invariant: the trigger must always be less than the heap goal.
+ //
+ // Note that the memory limit sets a hard maximum on our heap goal,
+ // but the live heap may grow beyond it.
+
+ if c.heapMarked >= goal {
+ // The goal should never be smaller than heapMarked, but let's be
+ // defensive about it. The only reasonable trigger here is one that
+ // causes a continuous GC cycle at heapMarked, but respect the goal
+ // if it came out as smaller than that.
+ return goal, goal
+ }
+
+ // Below this point, c.heapMarked < goal.
+
+ // heapMarked is our absolute minimum, and it's possible the trigger
+ // bound we get from heapGoalInternal is less than that.
+ if minTrigger < c.heapMarked {
+ minTrigger = c.heapMarked
+ }
+
+ // If we let the trigger go too low, then if the application
+ // is allocating very rapidly we might end up in a situation
+ // where we're allocating black during a nearly always-on GC.
+ // The result of this is a growing heap and ultimately an
+ // increase in RSS. By capping us at a point >0, we're essentially
+ // saying that we're OK using more CPU during the GC to prevent
+ // this growth in RSS.
+ triggerLowerBound := uint64(((goal-c.heapMarked)/triggerRatioDen)*minTriggerRatioNum) + c.heapMarked
+ if minTrigger < triggerLowerBound {
+ minTrigger = triggerLowerBound
+ }
+
+ // For small heaps, set the max trigger point at maxTriggerRatio of the way
+ // from the live heap to the heap goal. This ensures we always have *some*
+ // headroom when the GC actually starts. For larger heaps, set the max trigger
+ // point at the goal, minus the minimum heap size.
+ //
+ // This choice follows from the fact that the minimum heap size is chosen
+ // to reflect the costs of a GC with no work to do. With a large heap but
+ // very little scan work to perform, this gives us exactly as much runway
+ // as we would need, in the worst case.
+ maxTrigger := uint64(((goal-c.heapMarked)/triggerRatioDen)*maxTriggerRatioNum) + c.heapMarked
+ if goal > defaultHeapMinimum && goal-defaultHeapMinimum > maxTrigger {
+ maxTrigger = goal - defaultHeapMinimum
+ }
+ if maxTrigger < minTrigger {
+ maxTrigger = minTrigger
+ }
+
+ // Compute the trigger from our bounds and the runway stored by commit.
+ var trigger uint64
+ runway := c.runway.Load()
+ if runway > goal {
+ trigger = minTrigger
+ } else {
+ trigger = goal - runway
+ }
+ if trigger < minTrigger {
+ trigger = minTrigger
+ }
+ if trigger > maxTrigger {
+ trigger = maxTrigger
+ }
+ if trigger > goal {
+ print("trigger=", trigger, " heapGoal=", goal, "\n")
+ print("minTrigger=", minTrigger, " maxTrigger=", maxTrigger, "\n")
+ throw("produced a trigger greater than the heap goal")
+ }
+ return trigger, goal
+}
+
+// commit recomputes all pacing parameters needed to derive the
+// trigger and the heap goal. Namely, the gcPercent-based heap goal,
+// and the amount of runway we want to give the GC this cycle.
+//
+// This can be called any time. If the GC is in the middle of a
+// concurrent phase, it will adjust the pacing of that phase.
+//
+// isSweepDone should be the result of calling isSweepDone(),
+// unless we're testing or we know we're executing during a GC cycle.
+//
+// This depends on gcPercent, gcController.heapMarked, and
+// gcController.heapLive. These must be up to date.
+//
+// Callers must call gcControllerState.revise after calling this
+// function if the GC is enabled.
+//
+// mheap_.lock must be held or the world must be stopped.
+func (c *gcControllerState) commit(isSweepDone bool) {
+ if !c.test {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+ }
+
+ if isSweepDone {
+ // The sweep is done, so there aren't any restrictions on the trigger
+ // we need to think about.
+ c.sweepDistMinTrigger.Store(0)
+ } else {
+ // Concurrent sweep happens in the heap growth
+ // from gcController.heapLive to trigger. Make sure we
+ // give the sweeper some runway if it doesn't have enough.
+ c.sweepDistMinTrigger.Store(c.heapLive.Load() + sweepMinHeapDistance)
+ }
+
+ // Compute the next GC goal, which is when the allocated heap
+ // has grown by GOGC/100 over where it started the last cycle,
+ // plus additional runway for non-heap sources of GC work.
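+ // For example (illustrative numbers): with heapMarked = 64 MiB, 2 MiB of
+ // stack and globals scan work, and GOGC = 100, this comes out to
+ // 64 MiB + (64+2) MiB = 130 MiB.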
+ gcPercentHeapGoal := ^uint64(0)
+ if gcPercent := c.gcPercent.Load(); gcPercent >= 0 {
+ gcPercentHeapGoal = c.heapMarked + (c.heapMarked+c.lastStackScan.Load()+c.globalsScan.Load())*uint64(gcPercent)/100
+ }
+ // Apply the minimum heap size here. It's defined in terms of gcPercent
+ // and is only updated by functions that call commit.
+ if gcPercentHeapGoal < c.heapMinimum {
+ gcPercentHeapGoal = c.heapMinimum
+ }
+ c.gcPercentHeapGoal.Store(gcPercentHeapGoal)
+
+ // Compute the amount of runway we want the GC to have by using our
+ // estimate of the cons/mark ratio.
+ //
+ // The idea is to take our expected scan work, and multiply it by
+ // the cons/mark ratio to determine how long it'll take to complete
+ // that scan work in terms of bytes allocated. This gives us our GC's
+ // runway.
+ //
+ // However, the cons/mark ratio is a ratio of rates per CPU-second, but
+ // here we care about the relative rates for some division of CPU
+ // resources among the mutator and the GC.
+ //
+ // To summarize, we have B / cpu-ns, and we want B / ns. We get that
+ // by multiplying by our desired division of CPU resources. We choose
+ // to express CPU resources as GOMAXPROCS*fraction. Note that because
+ // we're working with a ratio here, we can omit the number of CPU cores,
+ // because they'll appear in the numerator and denominator and cancel out.
+ // As a result, this is basically just "weighing" the cons/mark ratio by
+ // our desired division of resources.
+ //
+ // Furthermore, by setting the runway so that CPU resources are divided
+ // this way, assuming that the cons/mark ratio is correct, we make that
+ // division a reality.
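+ //
+ // As an illustrative sketch (made-up numbers): with cons/mark = 2, a goal
+ // utilization of 0.25, and 10 MiB of expected scan work, the runway is
+ // 2 * (0.75/0.25) * 10 MiB = 60 MiB of allocation before the goal.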
+ c.runway.Store(uint64((c.consMark * (1 - gcGoalUtilization) / (gcGoalUtilization)) * float64(c.lastHeapScan+c.lastStackScan.Load()+c.globalsScan.Load())))
+}
+
+// setGCPercent updates gcPercent. commit must be called after.
+// Returns the old value of gcPercent.
+//
+// The world must be stopped, or mheap_.lock must be held.
+func (c *gcControllerState) setGCPercent(in int32) int32 {
+ if !c.test {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+ }
+
+ out := c.gcPercent.Load()
+ if in < 0 {
+ in = -1
+ }
+ c.heapMinimum = defaultHeapMinimum * uint64(in) / 100
+ c.gcPercent.Store(in)
+
+ return out
+}
+
+//go:linkname setGCPercent runtime/debug.setGCPercent
+func setGCPercent(in int32) (out int32) {
+ // Run on the system stack since we grab the heap lock.
+ systemstack(func() {
+ lock(&mheap_.lock)
+ out = gcController.setGCPercent(in)
+ gcControllerCommit()
+ unlock(&mheap_.lock)
+ })
+
+ // If we just disabled GC, wait for any concurrent GC mark to
+ // finish so we always return with no GC running.
+ if in < 0 {
+ gcWaitOnMark(work.cycles.Load())
+ }
+
+ return out
+}
+
+func readGOGC() int32 {
+ p := gogetenv("GOGC")
+ if p == "off" {
+ return -1
+ }
+ if n, ok := atoi32(p); ok {
+ return n
+ }
+ return 100
+}
+
+// setMemoryLimit updates memoryLimit. commit must be called after.
+// Returns the old value of memoryLimit.
+//
+// The world must be stopped, or mheap_.lock must be held.
+func (c *gcControllerState) setMemoryLimit(in int64) int64 {
+ if !c.test {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+ }
+
+ out := c.memoryLimit.Load()
+ if in >= 0 {
+ c.memoryLimit.Store(in)
+ }
+
+ return out
+}
+
+//go:linkname setMemoryLimit runtime/debug.setMemoryLimit
+func setMemoryLimit(in int64) (out int64) {
+ // Run on the system stack since we grab the heap lock.
+ systemstack(func() {
+ lock(&mheap_.lock)
+ out = gcController.setMemoryLimit(in)
+ if in < 0 || out == in {
+ // If we're just checking the value or not changing
+ // it, there's no point in doing the rest.
+ unlock(&mheap_.lock)
+ return
+ }
+ gcControllerCommit()
+ unlock(&mheap_.lock)
+ })
+ return out
+}
+
+func readGOMEMLIMIT() int64 {
+ p := gogetenv("GOMEMLIMIT")
+ if p == "" || p == "off" {
+ return maxInt64
+ }
+ n, ok := parseByteCount(p)
+ if !ok {
+ print("GOMEMLIMIT=", p, "\n")
+ throw("malformed GOMEMLIMIT; see `go doc runtime/debug.SetMemoryLimit`")
+ }
+ return n
+}
+
+// addIdleMarkWorker attempts to add a new idle mark worker.
+//
+// If this returns true, the caller must become an idle mark worker unless
+// there are no background mark worker goroutines in the pool. This case is
+// harmless because there are already background mark workers running.
+// If this returns false, the caller must NOT become an idle mark worker.
+//
+// nosplit because it may be called without a P.
+//
+//go:nosplit
+func (c *gcControllerState) addIdleMarkWorker() bool {
+ for {
+ old := c.idleMarkWorkers.Load()
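+ // idleMarkWorkers packs the current number of idle workers into the low
+ // 32 bits and the maximum allowed into the high 32 bits, so both can be
+ // read and updated in a single atomic operation.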
+ n, max := int32(old&uint64(^uint32(0))), int32(old>>32)
+ if n >= max {
+ // See the comment on idleMarkWorkers for why
+ // n > max is tolerated.
+ return false
+ }
+ if n < 0 {
+ print("n=", n, " max=", max, "\n")
+ throw("negative idle mark workers")
+ }
+ new := uint64(uint32(n+1)) | (uint64(max) << 32)
+ if c.idleMarkWorkers.CompareAndSwap(old, new) {
+ return true
+ }
+ }
+}
+
+// needIdleMarkWorker is a hint as to whether another idle mark worker is needed.
+//
+// The caller must still call addIdleMarkWorker to become one. This is mainly
+// useful for a quick check before an expensive operation.
+//
+// nosplit because it may be called without a P.
+//
+//go:nosplit
+func (c *gcControllerState) needIdleMarkWorker() bool {
+ p := c.idleMarkWorkers.Load()
+ n, max := int32(p&uint64(^uint32(0))), int32(p>>32)
+ return n < max
+}
+
+// removeIdleMarkWorker must be called when an idle mark worker stops executing.
+func (c *gcControllerState) removeIdleMarkWorker() {
+ for {
+ old := c.idleMarkWorkers.Load()
+ n, max := int32(old&uint64(^uint32(0))), int32(old>>32)
+ if n-1 < 0 {
+ print("n=", n, " max=", max, "\n")
+ throw("negative idle mark workers")
+ }
+ new := uint64(uint32(n-1)) | (uint64(max) << 32)
+ if c.idleMarkWorkers.CompareAndSwap(old, new) {
+ return
+ }
+ }
+}
+
+// setMaxIdleMarkWorkers sets the maximum number of idle mark workers allowed.
+//
+// This method is optimistic in that it does not wait for the number of
+// idle mark workers to reduce to max before returning; it assumes the workers
+// will deschedule themselves.
+func (c *gcControllerState) setMaxIdleMarkWorkers(max int32) {
+ for {
+ old := c.idleMarkWorkers.Load()
+ n := int32(old & uint64(^uint32(0)))
+ if n < 0 {
+ print("n=", n, " max=", max, "\n")
+ throw("negative idle mark workers")
+ }
+ new := uint64(uint32(n)) | (uint64(max) << 32)
+ if c.idleMarkWorkers.CompareAndSwap(old, new) {
+ return
+ }
+ }
+}
+
+// gcControllerCommit is gcController.commit, but passes arguments from live
+// (non-test) data. It also updates any consumers of the GC pacing, such as
+// sweep pacing and the background scavenger.
+//
+// Calls gcController.commit.
+//
+// The heap lock must be held, so this must be executed on the system stack.
+//
+//go:systemstack
+func gcControllerCommit() {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+
+ gcController.commit(isSweepDone())
+
+ // Update mark pacing.
+ if gcphase != _GCoff {
+ gcController.revise()
+ }
+
+ // TODO(mknyszek): This isn't really accurate any longer because the heap
+ // goal is computed dynamically. Still useful to snapshot, but not as useful.
+ if traceEnabled() {
+ traceHeapGoal()
+ }
+
+ trigger, heapGoal := gcController.trigger()
+ gcPaceSweeper(trigger)
+ gcPaceScavenger(gcController.memoryLimit.Load(), heapGoal, gcController.lastHeapGoal)
+}
diff --git a/src/runtime/mgcpacer_test.go b/src/runtime/mgcpacer_test.go
new file mode 100644
index 0000000..ef1483d
--- /dev/null
+++ b/src/runtime/mgcpacer_test.go
@@ -0,0 +1,1097 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math"
+ "math/rand"
+ . "runtime"
+ "testing"
+ "time"
+)
+
+func TestGcPacer(t *testing.T) {
+ t.Parallel()
+
+ const initialHeapBytes = 256 << 10
+ for _, e := range []*gcExecTest{
+ {
+ // The most basic test case: a steady-state heap.
+ // Growth to an O(MiB) heap, then constant heap size, alloc/scan rates.
+ name: "Steady",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n >= 25 {
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // Same as the steady-state case, but lots of stacks to scan relative to the heap size.
+ name: "SteadyBigStacks",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(132.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(2048).sum(ramp(128<<20, 8)),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ // Check the same conditions as the steady-state case, except the old pacer can't
+ // really handle this well, so don't check the goal ratio for it.
+ n := len(c)
+ if n >= 25 {
+ // For the pacer redesign, assert something even stronger: at this alloc/scan rate,
+ // it should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ }
+ },
+ },
+ {
+ // Same as the steady-state case, but lots of globals to scan relative to the heap size.
+ name: "SteadyBigGlobals",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 128 << 20,
+ nCores: 8,
+ allocRate: constant(132.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ // Check the same conditions as the steady-state case, except the old pacer can't
+ // really handle this well, so don't check the goal ratio for it.
+ n := len(c)
+ if n >= 25 {
+ // For the pacer redesign, assert something even stronger: at this alloc/scan rate,
+ // it should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ }
+ },
+ },
+ {
+ // This tests the GC pacer's response to a small change in allocation rate.
+ name: "StepAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0).sum(ramp(66.0, 1).delay(50)),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 100,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if (n >= 25 && n < 50) || n >= 75 {
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles
+ // and then is able to settle again after a significant jump in allocation rate.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // This tests the GC pacer's response to a large change in allocation rate.
+ name: "HeavyStepAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33).sum(ramp(330, 1).delay(50)),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 100,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if (n >= 25 && n < 50) || n >= 75 {
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles
+ // and then is able to settle again after a significant jump in allocation rate.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // This tests the GC pacer's response to a change in the fraction of the scannable heap.
+ name: "StepScannableFrac",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(128.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(0.2).sum(unit(0.5).delay(50)),
+ stackBytes: constant(8192),
+ length: 100,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if (n >= 25 && n < 50) || n >= 75 {
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles
+ // and then is able to settle again after a significant jump in allocation rate.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // Tests the pacer for a high GOGC value with a large heap growth happening
+ // in the middle. The purpose of the large heap growth is to check if GC
+ // utilization ends up sensitive to a large, sudden change in the live heap size.
+ name: "HighGOGC",
+ gcPercent: 1500,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: random(7, 0x53).offset(165),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12), random(0.01, 0x1), unit(14).delay(25)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 12 {
+ if n == 26 {
+ // In the 26th cycle there's a heap growth. Overshoot is expected to maintain
+ // a stable utilization, but we should *never* overshoot more than GOGC of
+ // the next cycle.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.90, 15)
+ } else {
+ // Give a wider goal range here. With such a high GOGC value we're going to be
+ // forced to undershoot.
+ //
+ // TODO(mknyszek): Instead of placing a 0.95 limit on the trigger, make the limit
+ // based on absolute bytes, that's based somewhat in how the minimum heap size
+ // is determined.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.90, 1.05)
+ }
+
+ // Ensure utilization remains stable despite a growth in live heap size
+ // at GC #25. This test fails prior to the GC pacer redesign.
+ //
+ // Because GOGC is so large, we should also be really close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, GCGoalUtilization+0.03)
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.03)
+ }
+ },
+ },
+ {
+ // This test makes sure that in the face of a varying (in this case, oscillating) allocation
+ // rate, the pacer does a reasonably good job of staying abreast of the changes.
+ name: "OscAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: oscillate(13, 0, 8).offset(67),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 12 {
+ // After the 12th GC, the heap will stop growing. Now, just make sure that:
+ // 1. Utilization isn't varying _too_ much, and
+ // 2. The pacer is mostly keeping up with the goal.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.3)
+ }
+ },
+ },
+ {
+ // This test is the same as OscAlloc, but instead of oscillating, the allocation rate is jittery.
+ name: "JitterAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: random(13, 0xf).offset(132),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12), random(0.01, 0xe)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 12 {
+ // After the 12th GC, the heap will stop growing. Now, just make sure that:
+ // 1. Utilization isn't varying _too_ much, and
+ // 2. The pacer is mostly keeping up with the goal.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.025)
+ assertInRange(t, "GC utilization", c[n-1].gcUtilization, 0.25, 0.275)
+ }
+ },
+ },
+ {
+ // This test is the same as JitterAlloc, but with a much higher allocation rate.
+ // The jitter is proportionally the same.
+ name: "HeavyJitterAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: random(33.0, 0x0).offset(330),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12), random(0.01, 0x152)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 13 {
+ // After the 12th GC, the heap will stop growing. Now, just make sure that:
+ // 1. Utilization isn't varying _too_ much, and
+ // 2. The pacer is mostly keeping up with the goal.
+ // We start at the 13th here because we want to use the 12th as a reference.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ // Unlike the other tests, GC utilization here will vary more and tend higher.
+ // Just make sure it's not going too crazy.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.05)
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[11].gcUtilization, 0.05)
+ }
+ },
+ },
+ {
+ // This test sets a slow allocation rate and a small heap (close to the minimum heap size)
+ // to try to minimize the difference between the trigger and the goal.
+ name: "SmallHeapSlowAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(1.0),
+ scanRate: constant(2048.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 3)),
+ scannableFrac: constant(0.01),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 4 {
+ // After the 4th GC, the heap will stop growing.
+ // First, let's make sure we're finishing near the goal, with some extra
+ // room because we're probably going to be triggering early.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.925, 1.025)
+ // Next, let's make sure there's some minimum distance between the goal
+ // and the trigger. It should be proportional to the runway (hence the
+ // trigger ratio check, instead of a check against the runway).
+ assertInRange(t, "trigger ratio", c[n-1].triggerRatio(), 0.925, 0.975)
+ }
+ if n > 25 {
+ // Double-check that GC utilization looks OK.
+
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+ // Make sure GC utilization has mostly levelled off.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.05)
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[11].gcUtilization, 0.05)
+ }
+ },
+ },
+ {
+ // This test sets a slow allocation rate and a medium heap (around 10x the min heap size)
+ // to try to minimize the difference between the trigger and the goal.
+ name: "MediumHeapSlowAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(1.0),
+ scanRate: constant(2048.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 8)),
+ scannableFrac: constant(0.01),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 9 {
+ // After the 9th GC, the heap will stop growing.
+ // First, let's make sure we're finishing near the goal, with some extra
+ // room because we're probably going to be triggering early.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.925, 1.025)
+ // Next, let's make sure there's some minimum distance between the goal
+ // and the trigger. It should be proportional to the runway (hence the
+ // trigger ratio check, instead of a check against the runway).
+ assertInRange(t, "trigger ratio", c[n-1].triggerRatio(), 0.925, 0.975)
+ }
+ if n > 25 {
+ // Double-check that GC utilization looks OK.
+
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+ // Make sure GC utilization has mostly levelled off.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.05)
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[11].gcUtilization, 0.05)
+ }
+ },
+ },
+ {
+ // This test sets a slow allocation rate and a large heap to try to minimize the
+ // difference between the trigger and the goal.
+ name: "LargeHeapSlowAlloc",
+ gcPercent: 100,
+ memoryLimit: math.MaxInt64,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(1.0),
+ scanRate: constant(2048.0),
+ growthRate: constant(4.0).sum(ramp(-3.0, 12)),
+ scannableFrac: constant(0.01),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 13 {
+ // After the 13th GC, the heap will stop growing.
+ // First, let's make sure we're finishing near the goal.
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ // Next, let's make sure there's some minimum distance between the goal
+ // and the trigger. It should be around the default minimum heap size.
+ assertInRange(t, "runway", c[n-1].runway(), DefaultHeapMinimum-64<<10, DefaultHeapMinimum+64<<10)
+ }
+ if n > 25 {
+ // Double-check that GC utilization looks OK.
+
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+ // Make sure GC utilization has mostly levelled off.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.05)
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[11].gcUtilization, 0.05)
+ }
+ },
+ },
+ {
+ // The most basic test case with a memory limit: a steady-state heap.
+ // Growth to an O(MiB) heap, then constant heap size, alloc/scan rates.
+ // Provide a lot of room for the limit. Essentially, this should behave just like
+ // the "Steady" test. Note that we don't simulate non-heap overheads, so the
+ // memory limit and the heap limit are identical.
+ name: "SteadyMemoryLimit",
+ gcPercent: 100,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if peak := c[n-1].heapPeak; peak >= applyMemoryLimitHeapGoalHeadroom(512<<20) {
+ t.Errorf("peak heap size reaches heap limit: %d", peak)
+ }
+ if n >= 25 {
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // This is the same as the previous test, but gcPercent = -1, so the heap *should* grow
+ // all the way to the peak.
+ name: "SteadyMemoryLimitNoGCPercent",
+ gcPercent: -1,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(2.0).sum(ramp(-1.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if goal := c[n-1].heapGoal; goal != applyMemoryLimitHeapGoalHeadroom(512<<20) {
+ t.Errorf("heap goal is not the heap limit: %d", goal)
+ }
+ if n >= 25 {
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // This test ensures that the pacer doesn't fall over even when the live heap exceeds
+ // the memory limit. It also makes sure GC utilization actually rises to push back.
+ name: "ExceedMemoryLimit",
+ gcPercent: 100,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(3.5).sum(ramp(-2.5, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 12 {
+ // We're way over the memory limit, so we want to make sure our goal is set
+ // as low as it possibly can be.
+ if goal, live := c[n-1].heapGoal, c[n-1].heapLive; goal != live {
+ t.Errorf("heap goal is not equal to live heap: %d != %d", goal, live)
+ }
+ }
+ if n >= 25 {
+ // Due to memory pressure, we should scale to 100% GC CPU utilization.
+ // Note that in practice this won't actually happen because of the CPU limiter,
+ // but it's not the pacer's job to limit CPU usage.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, 1.0, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ // In this case, that just means it's not wavering around a whole bunch.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ }
+ },
+ },
+ {
+ // Same as the previous test, but with gcPercent = -1.
+ name: "ExceedMemoryLimitNoGCPercent",
+ gcPercent: -1,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(3.5).sum(ramp(-2.5, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n < 10 {
+ if goal := c[n-1].heapGoal; goal != applyMemoryLimitHeapGoalHeadroom(512<<20) {
+ t.Errorf("heap goal is not the heap limit: %d", goal)
+ }
+ }
+ if n > 12 {
+ // We're way over the memory limit, so we want to make sure our goal is set
+ // as low as it possibly can be.
+ if goal, live := c[n-1].heapGoal, c[n-1].heapLive; goal != live {
+ t.Errorf("heap goal is not equal to live heap: %d != %d", goal, live)
+ }
+ }
+ if n >= 25 {
+ // Due to memory pressure, we should scale to 100% GC CPU utilization.
+ // Note that in practice this won't actually happen because of the CPU limiter,
+ // but it's not the pacer's job to limit CPU usage.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, 1.0, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles.
+ // In this case, that just means it's not wavering around a whole bunch.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ }
+ },
+ },
+ {
+ // This test ensures that the pacer maintains the memory limit as the heap grows.
+ name: "MaintainMemoryLimit",
+ gcPercent: 100,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(3.0).sum(ramp(-2.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if n > 12 {
+ // We're trying to saturate the memory limit.
+ if goal := c[n-1].heapGoal; goal != applyMemoryLimitHeapGoalHeadroom(512<<20) {
+ t.Errorf("heap goal is not the heap limit: %d", goal)
+ }
+ }
+ if n >= 25 {
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization,
+ // even with the additional memory pressure.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles and
+ // that it's meeting its goal.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ {
+ // Same as the previous test, but with gcPercent = -1.
+ name: "MaintainMemoryLimitNoGCPercent",
+ gcPercent: -1,
+ memoryLimit: 512 << 20,
+ globalsBytes: 32 << 10,
+ nCores: 8,
+ allocRate: constant(33.0),
+ scanRate: constant(1024.0),
+ growthRate: constant(3.0).sum(ramp(-2.0, 12)),
+ scannableFrac: constant(1.0),
+ stackBytes: constant(8192),
+ length: 50,
+ checker: func(t *testing.T, c []gcCycleResult) {
+ n := len(c)
+ if goal := c[n-1].heapGoal; goal != applyMemoryLimitHeapGoalHeadroom(512<<20) {
+ t.Errorf("heap goal is not the heap limit: %d", goal)
+ }
+ if n >= 25 {
+ // At this alloc/scan rate, the pacer should be extremely close to the goal utilization,
+ // even with the additional memory pressure.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, GCGoalUtilization, 0.005)
+
+ // Make sure the pacer settles into a non-degenerate state in at least 25 GC cycles and
+ // that it's meeting its goal.
+ assertInEpsilon(t, "GC utilization", c[n-1].gcUtilization, c[n-2].gcUtilization, 0.005)
+ assertInRange(t, "goal ratio", c[n-1].goalRatio(), 0.95, 1.05)
+ }
+ },
+ },
+ // TODO(mknyszek): Write a test that exercises the pacer's hard goal.
+ // This is difficult in the idealized model this testing framework places
+ // the pacer in, because the calculated overshoot is directly proportional
+ // to the runway for the case of the expected work.
+ // However, it is still possible to trigger this case if something exceptional
+ // happens between calls to revise; the framework just doesn't support this yet.
+ } {
+ e := e
+ t.Run(e.name, func(t *testing.T) {
+ t.Parallel()
+
+ c := NewGCController(e.gcPercent, e.memoryLimit)
+ var bytesAllocatedBlackLast int64
+ results := make([]gcCycleResult, 0, e.length)
+ for i := 0; i < e.length; i++ {
+ cycle := e.next()
+ c.StartCycle(cycle.stackBytes, e.globalsBytes, cycle.scannableFrac, e.nCores)
+
+ // Update pacer incrementally as we complete scan work.
+ const (
+ revisePeriod = 500 * time.Microsecond
+ rateConv = 1024 * float64(revisePeriod) / float64(time.Millisecond)
+ )
+ var nextHeapMarked int64
+ if i == 0 {
+ nextHeapMarked = initialHeapBytes
+ } else {
+ nextHeapMarked = int64(float64(int64(c.HeapMarked())-bytesAllocatedBlackLast) * cycle.growthRate)
+ }
+ globalsScanWorkLeft := int64(e.globalsBytes)
+ stackScanWorkLeft := int64(cycle.stackBytes)
+ heapScanWorkLeft := int64(float64(nextHeapMarked) * cycle.scannableFrac)
+ doWork := func(work int64) (int64, int64, int64) {
+ var deltas [3]int64
+
+ // Do globals work first, then stacks, then heap.
+ for i, workLeft := range []*int64{&globalsScanWorkLeft, &stackScanWorkLeft, &heapScanWorkLeft} {
+ if *workLeft == 0 {
+ continue
+ }
+ if *workLeft > work {
+ deltas[i] += work
+ *workLeft -= work
+ work = 0
+ break
+ } else {
+ deltas[i] += *workLeft
+ work -= *workLeft
+ *workLeft = 0
+ }
+ }
+ return deltas[0], deltas[1], deltas[2]
+ }
+ var (
+ gcDuration int64
+ assistTime int64
+ bytesAllocatedBlack int64
+ )
+ for heapScanWorkLeft+stackScanWorkLeft+globalsScanWorkLeft > 0 {
+ // Simulate GC assist pacing.
+ //
+ // Note that this is an idealized view of the GC assist pacing
+ // mechanism.
+
+ // From the assist ratio and the alloc and scan rates, we can idealize what
+ // the GC CPU utilization looks like.
+ //
+ // We start with assistRatio = (bytes of scan work) / (bytes of runway) (by definition).
+ //
+ // Over revisePeriod, we can also calculate how many bytes are scanned and
+ // allocated, given some GC CPU utilization u:
+ //
+ // bytesScanned = scanRate * rateConv * nCores * u
+ // bytesAllocated = allocRate * rateConv * nCores * (1 - u)
+ //
+ // During revisePeriod, assistRatio is kept constant, and GC assists kick in to
+ // maintain it. Specifically, they act to prevent too many bytes being allocated
+ // compared to how many bytes are scanned. It directly defines the ratio of
+ // bytesScanned to bytesAllocated over this period, hence:
+ //
+ // assistRatio = bytesScanned / bytesAllocated
+ //
+ // From this, we can solve for utilization, because everything else has already
+ // been determined:
+ //
+ // assistRatio = (scanRate * rateConv * nCores * u) / (allocRate * rateConv * nCores * (1 - u))
+ // assistRatio = (scanRate * u) / (allocRate * (1 - u))
+ // assistRatio * allocRate * (1-u) = scanRate * u
+ // assistRatio * allocRate - assistRatio * allocRate * u = scanRate * u
+ // assistRatio * allocRate = assistRatio * allocRate * u + scanRate * u
+ // assistRatio * allocRate = (assistRatio * allocRate + scanRate) * u
+ // u = (assistRatio * allocRate) / (assistRatio * allocRate + scanRate)
+ //
+ // Note that this may give a utilization that is _less_ than GCBackgroundUtilization,
+ // which isn't possible in practice because of dedicated workers. Thus, this case
+ // must be interpreted as GC assists not kicking in at all, and just round up. All
+ // downstream values will then have this accounted for.
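+ //
+ // For example (illustrative numbers): with assistRatio = 0.5,
+ // allocRate = 33, and scanRate = 1024, u = (0.5*33)/(0.5*33+1024)
+ // ≈ 0.016, which is below GCBackgroundUtilization and is therefore
+ // rounded up to it below.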
+ assistRatio := c.AssistWorkPerByte()
+ utilization := assistRatio * cycle.allocRate / (assistRatio*cycle.allocRate + cycle.scanRate)
+ if utilization < GCBackgroundUtilization {
+ utilization = GCBackgroundUtilization
+ }
+
+ // Knowing the utilization, calculate bytesScanned and bytesAllocated.
+ bytesScanned := int64(cycle.scanRate * rateConv * float64(e.nCores) * utilization)
+ bytesAllocated := int64(cycle.allocRate * rateConv * float64(e.nCores) * (1 - utilization))
+
+ // Subtract work from our model.
+ globalsScanned, stackScanned, heapScanned := doWork(bytesScanned)
+
+ // doWork may not use all of bytesScanned.
+ // In this case, the GC actually ends sometime in this period.
+ // Let's figure out when, exactly, and adjust bytesAllocated too.
+ actualElapsed := revisePeriod
+ actualAllocated := bytesAllocated
+ if actualScanned := globalsScanned + stackScanned + heapScanned; actualScanned < bytesScanned {
+ // actualScanned = scanRate * rateConv * (t / revisePeriod) * nCores * u
+ // => t = actualScanned * revisePeriod / (scanRate * rateConv * nCores * u)
+ actualElapsed = time.Duration(float64(actualScanned) * float64(revisePeriod) / (cycle.scanRate * rateConv * float64(e.nCores) * utilization))
+ actualAllocated = int64(cycle.allocRate * rateConv * float64(actualElapsed) / float64(revisePeriod) * float64(e.nCores) * (1 - utilization))
+ }
+
+ // Ask the pacer to revise.
+ c.Revise(GCControllerReviseDelta{
+ HeapLive: actualAllocated,
+ HeapScan: int64(float64(actualAllocated) * cycle.scannableFrac),
+ HeapScanWork: heapScanned,
+ StackScanWork: stackScanned,
+ GlobalsScanWork: globalsScanned,
+ })
+
+ // Accumulate variables.
+ assistTime += int64(float64(actualElapsed) * float64(e.nCores) * (utilization - GCBackgroundUtilization))
+ gcDuration += int64(actualElapsed)
+ bytesAllocatedBlack += actualAllocated
+ }
+
+			// Put together the results, log them, and append them to the running list.
+ result := gcCycleResult{
+ cycle: i + 1,
+ heapLive: c.HeapMarked(),
+ heapScannable: int64(float64(int64(c.HeapMarked())-bytesAllocatedBlackLast) * cycle.scannableFrac),
+ heapTrigger: c.Triggered(),
+ heapPeak: c.HeapLive(),
+ heapGoal: c.HeapGoal(),
+ gcUtilization: float64(assistTime)/(float64(gcDuration)*float64(e.nCores)) + GCBackgroundUtilization,
+ }
+ t.Log("GC", result.String())
+ results = append(results, result)
+
+ // Run the checker for this test.
+ e.check(t, results)
+
+ c.EndCycle(uint64(nextHeapMarked+bytesAllocatedBlack), assistTime, gcDuration, e.nCores)
+
+ bytesAllocatedBlackLast = bytesAllocatedBlack
+ }
+ })
+ }
+}
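
The closed form derived in the comment inside the loop above, u = (assistRatio * allocRate) / (assistRatio * allocRate + scanRate) clamped from below at GCBackgroundUtilization, is easy to sanity-check in isolation. The following standalone sketch is not part of the patch: the rates are made up and 0.25 is only an illustrative stand-in for the background utilization.

package main

import "fmt"

// utilization evaluates the idealized GC CPU utilization:
// u = (assistRatio*allocRate) / (assistRatio*allocRate + scanRate),
// clamped from below at the background utilization, since dedicated
// workers always contribute at least that much.
func utilization(assistRatio, allocRate, scanRate, background float64) float64 {
	u := assistRatio * allocRate / (assistRatio*allocRate + scanRate)
	if u < background {
		u = background
	}
	return u
}

func main() {
	const background = 0.25 // illustrative stand-in for GCBackgroundUtilization
	// Plenty of runway relative to scan work: assists are barely needed.
	fmt.Println(utilization(0.1, 1.0, 31.0, background)) // clamps to 0.25
	// Little runway left: assists dominate the cycle.
	fmt.Println(utilization(10, 1.0, 1.0, background)) // ~0.91
}

With ample runway the clamp dominates, which is exactly the "assists don't kick in at all" case the comment describes.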
+
+type gcExecTest struct {
+ name string
+
+ gcPercent int
+ memoryLimit int64
+ globalsBytes uint64
+ nCores int
+
+ allocRate float64Stream // > 0, KiB / cpu-ms
+ scanRate float64Stream // > 0, KiB / cpu-ms
+ growthRate float64Stream // > 0
+ scannableFrac float64Stream // Clamped to [0, 1]
+ stackBytes float64Stream // Multiple of 2048.
+ length int
+
+ checker func(*testing.T, []gcCycleResult)
+}
+
+// minRate is an arbitrary minimum for allocRate, scanRate, and growthRate.
+// These values just cannot be zero.
+const minRate = 0.0001
+
+func (e *gcExecTest) next() gcCycle {
+ return gcCycle{
+ allocRate: e.allocRate.min(minRate)(),
+ scanRate: e.scanRate.min(minRate)(),
+ growthRate: e.growthRate.min(minRate)(),
+ scannableFrac: e.scannableFrac.limit(0, 1)(),
+ stackBytes: uint64(e.stackBytes.quantize(2048).min(0)()),
+ }
+}
+
+func (e *gcExecTest) check(t *testing.T, results []gcCycleResult) {
+ t.Helper()
+
+ // Do some basic general checks first.
+ n := len(results)
+ switch n {
+ case 0:
+ t.Fatal("no results passed to check")
+ return
+ case 1:
+ if results[0].cycle != 1 {
+ t.Error("first cycle has incorrect number")
+ }
+ default:
+ if results[n-1].cycle != results[n-2].cycle+1 {
+ t.Error("cycle numbers out of order")
+ }
+ }
+ if u := results[n-1].gcUtilization; u < 0 || u > 1 {
+ t.Fatal("GC utilization not within acceptable bounds")
+ }
+ if s := results[n-1].heapScannable; s < 0 {
+ t.Fatal("heapScannable is negative")
+ }
+ if e.checker == nil {
+ t.Fatal("test-specific checker is missing")
+ }
+
+ // Run the test-specific checker.
+ e.checker(t, results)
+}
+
+type gcCycle struct {
+ allocRate float64
+ scanRate float64
+ growthRate float64
+ scannableFrac float64
+ stackBytes uint64
+}
+
+type gcCycleResult struct {
+ cycle int
+
+ // These come directly from the pacer, so uint64.
+ heapLive uint64
+ heapTrigger uint64
+ heapGoal uint64
+ heapPeak uint64
+
+ // These are produced by the simulation, so int64 and
+ // float64 are more appropriate, so that we can check for
+ // bad states in the simulation.
+ heapScannable int64
+ gcUtilization float64
+}
+
+func (r *gcCycleResult) goalRatio() float64 {
+ return float64(r.heapPeak) / float64(r.heapGoal)
+}
+
+func (r *gcCycleResult) runway() float64 {
+ return float64(r.heapGoal - r.heapTrigger)
+}
+
+func (r *gcCycleResult) triggerRatio() float64 {
+ return float64(r.heapTrigger-r.heapLive) / float64(r.heapGoal-r.heapLive)
+}
+
+func (r *gcCycleResult) String() string {
+ return fmt.Sprintf("%d %2.1f%% %d->%d->%d (goal: %d)", r.cycle, r.gcUtilization*100, r.heapLive, r.heapTrigger, r.heapPeak, r.heapGoal)
+}
+
+func assertInEpsilon(t *testing.T, name string, a, b, epsilon float64) {
+ t.Helper()
+ assertInRange(t, name, a, b-epsilon, b+epsilon)
+}
+
+func assertInRange(t *testing.T, name string, a, min, max float64) {
+ t.Helper()
+ if a < min || a > max {
+ t.Errorf("%s not in range (%f, %f): %f", name, min, max, a)
+ }
+}
+
+// float64Stream is a function that generates an infinite stream of
+// float64 values when called repeatedly.
+type float64Stream func() float64
+
+// constant returns a stream that generates the value c.
+func constant(c float64) float64Stream {
+ return func() float64 {
+ return c
+ }
+}
+
+// unit returns a stream that generates a single peak with
+// amplitude amp, followed by zeroes.
+//
+// In another manner of speaking, this is the Kronecker delta.
+func unit(amp float64) float64Stream {
+ dropped := false
+ return func() float64 {
+ if dropped {
+ return 0
+ }
+ dropped = true
+ return amp
+ }
+}
+
+// oscillate returns a stream that oscillates sinusoidally
+// with the given amplitude, phase, and period.
+func oscillate(amp, phase float64, period int) float64Stream {
+ var cycle int
+ return func() float64 {
+ p := float64(cycle)/float64(period)*2*math.Pi + phase
+ cycle++
+ if cycle == period {
+ cycle = 0
+ }
+ return math.Sin(p) * amp
+ }
+}
+
+// ramp returns a stream that moves from zero to height
+// over the course of length steps.
+func ramp(height float64, length int) float64Stream {
+ var cycle int
+ return func() float64 {
+ h := height * float64(cycle) / float64(length)
+ if cycle < length {
+ cycle++
+ }
+ return h
+ }
+}
+
+// random returns a stream that generates random numbers
+// between -amp and amp.
+func random(amp float64, seed int64) float64Stream {
+ r := rand.New(rand.NewSource(seed))
+ return func() float64 {
+ return ((r.Float64() - 0.5) * 2) * amp
+ }
+}
+
+// delay returns a new stream which is a delayed version
+// of f: it returns zero for cycles steps, followed by f.
+func (f float64Stream) delay(cycles int) float64Stream {
+ zeroes := 0
+ return func() float64 {
+ if zeroes < cycles {
+ zeroes++
+ return 0
+ }
+ return f()
+ }
+}
+
+// scale returns a new stream that is f scaled by the
+// constant factor amt at each step.
+func (f float64Stream) scale(amt float64) float64Stream {
+ return func() float64 {
+ return f() * amt
+ }
+}
+
+// offset returns a new stream that is f but offset by amt
+// at each step.
+func (f float64Stream) offset(amt float64) float64Stream {
+ return func() float64 {
+ old := f()
+ return old + amt
+ }
+}
+
+// sum returns a new stream that is the sum of all input streams
+// at each step.
+func (f float64Stream) sum(fs ...float64Stream) float64Stream {
+ return func() float64 {
+ sum := f()
+ for _, s := range fs {
+ sum += s()
+ }
+ return sum
+ }
+}
+
+// quantize returns a new stream that truncates f toward zero
+// to a multiple of mult at each step.
+func (f float64Stream) quantize(mult float64) float64Stream {
+ return func() float64 {
+ r := f() / mult
+ if r < 0 {
+ return math.Ceil(r) * mult
+ }
+ return math.Floor(r) * mult
+ }
+}
+
+// min returns a new stream that replaces all values produced
+// by f lower than min with min.
+func (f float64Stream) min(min float64) float64Stream {
+ return func() float64 {
+ return math.Max(min, f())
+ }
+}
+
+// max returns a new stream that replaces all values produced
+// by f higher than max with max.
+func (f float64Stream) max(max float64) float64Stream {
+ return func() float64 {
+ return math.Min(max, f())
+ }
+}
+
+// limit returns a new stream that replaces all values produced
+// by f lower than min with min and higher than max with max.
+func (f float64Stream) limit(min, max float64) float64Stream {
+ return func() float64 {
+ v := f()
+ if v < min {
+ v = min
+ } else if v > max {
+ v = max
+ }
+ return v
+ }
+}
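
Because every combinator above returns another float64Stream, test workloads are assembled by chaining them. Below is a self-contained sketch (the amplitude, offset, and floor are arbitrary) of how a composed stream evaluates step by step.

package main

import (
	"fmt"
	"math"
)

// float64Stream mirrors the test helper above: a generator of values.
type float64Stream func() float64

func oscillate(amp, phase float64, period int) float64Stream {
	var cycle int
	return func() float64 {
		p := float64(cycle)/float64(period)*2*math.Pi + phase
		cycle++
		if cycle == period {
			cycle = 0
		}
		return math.Sin(p) * amp
	}
}

func (f float64Stream) offset(amt float64) float64Stream {
	return func() float64 { return f() + amt }
}

func (f float64Stream) min(min float64) float64Stream {
	return func() float64 { return math.Max(min, f()) }
}

func main() {
	// An allocation rate that oscillates around 10, never dropping below 8.
	allocRate := oscillate(3, 0, 8).offset(10).min(8)
	for i := 0; i < 8; i++ {
		fmt.Printf("%.2f ", allocRate()) // 10.00 12.12 13.00 12.12 10.00 8.00 8.00 8.00
	}
	fmt.Println()
}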
+
+func applyMemoryLimitHeapGoalHeadroom(goal uint64) uint64 {
+ headroom := goal / 100 * MemoryLimitHeapGoalHeadroomPercent
+ if headroom < MemoryLimitMinHeapGoalHeadroom {
+ headroom = MemoryLimitMinHeapGoalHeadroom
+ }
+ if goal < headroom || goal-headroom < headroom {
+ goal = headroom
+ } else {
+ goal -= headroom
+ }
+ return goal
+}
+
+func TestIdleMarkWorkerCount(t *testing.T) {
+ const workers = 10
+ c := NewGCController(100, math.MaxInt64)
+ c.SetMaxIdleMarkWorkers(workers)
+ for i := 0; i < workers; i++ {
+ if !c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to need idle mark workers: i=%d", i)
+ }
+ if !c.AddIdleMarkWorker() {
+ t.Fatalf("expected to be able to add an idle mark worker: i=%d", i)
+ }
+ }
+ if c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to not need idle mark workers")
+ }
+ if c.AddIdleMarkWorker() {
+ t.Fatalf("expected to not be able to add an idle mark worker")
+ }
+ for i := 0; i < workers; i++ {
+ c.RemoveIdleMarkWorker()
+ if !c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to need idle mark workers after removal: i=%d", i)
+ }
+ }
+ for i := 0; i < workers-1; i++ {
+ if !c.AddIdleMarkWorker() {
+ t.Fatalf("expected to be able to add idle mark workers after adding again: i=%d", i)
+ }
+ }
+ for i := 0; i < 10; i++ {
+ if !c.AddIdleMarkWorker() {
+ t.Fatalf("expected to be able to add idle mark workers interleaved: i=%d", i)
+ }
+ if c.AddIdleMarkWorker() {
+ t.Fatalf("expected to not be able to add idle mark workers interleaved: i=%d", i)
+ }
+ c.RemoveIdleMarkWorker()
+ }
+ // Support the max being below the count.
+ c.SetMaxIdleMarkWorkers(0)
+ if c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to not need idle mark workers after capacity set to 0")
+ }
+ if c.AddIdleMarkWorker() {
+ t.Fatalf("expected to not be able to add idle mark workers after capacity set to 0")
+ }
+ for i := 0; i < workers-1; i++ {
+ c.RemoveIdleMarkWorker()
+ }
+ if c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to not need idle mark workers after capacity set to 0")
+ }
+ if c.AddIdleMarkWorker() {
+ t.Fatalf("expected to not be able to add idle mark workers after capacity set to 0")
+ }
+ c.SetMaxIdleMarkWorkers(1)
+ if !c.NeedIdleMarkWorker() {
+ t.Fatalf("expected to need idle mark workers after capacity set to 1")
+ }
+ if !c.AddIdleMarkWorker() {
+ t.Fatalf("expected to be able to add idle mark workers after capacity set to 1")
+ }
+}
diff --git a/src/runtime/mgcscavenge.go b/src/runtime/mgcscavenge.go
new file mode 100644
index 0000000..659ca8d
--- /dev/null
+++ b/src/runtime/mgcscavenge.go
@@ -0,0 +1,1463 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Scavenging free pages.
+//
+// This file implements scavenging (the release of physical pages backing mapped
+// memory) of free and unused pages in the heap as a way to deal with page-level
+// fragmentation and reduce the RSS of Go applications.
+//
+// Scavenging in Go happens on two fronts: there's the background
+// (asynchronous) scavenger and the allocation-time (synchronous) scavenger.
+//
+// The former happens on a goroutine much like the background sweeper which is
+// soft-capped at using scavengePercent of the mutator's time, based on
+// order-of-magnitude estimates of the costs of scavenging. The latter happens
+// when allocating pages from the heap.
+//
+// The scavenger's primary goal is to bring the estimated heap RSS of the
+// application down to a goal.
+//
+// Before we consider what this looks like, we need to split the world into two
+// halves. One in which a memory limit is not set, and one in which it is.
+//
+// For the former, the goal is defined as:
+// (retainExtraPercent+100) / 100 * (heapGoal / lastHeapGoal) * lastHeapInUse
+//
+// Essentially, we wish to have the application's RSS track the heap goal, but
+// the heap goal is defined in terms of bytes of objects, rather than pages like
+// RSS. As a result, we need to account for fragmentation internal to
+// spans. heapGoal / lastHeapGoal defines the ratio between the current heap goal
+// and the last heap goal, which tells us by how much the heap is growing and
+// shrinking. We estimate what the heap will grow to in terms of pages by taking
+// this ratio and multiplying it by heapInUse at the end of the last GC, which
+// allows us to account for this additional fragmentation. Note that this
+// procedure makes the assumption that the degree of fragmentation won't change
+// dramatically over the next GC cycle. Overestimating the amount of
+// fragmentation simply results in higher memory use, which will be accounted
+// for by the next pacing update. Underestimating the fragmentation, however,
+// could lead to performance degradation. Handling this case is not within the
+// scope of the scavenger. Situations where the amount of fragmentation balloons
+// over the course of a single GC cycle should be considered pathologies,
+// flagged as bugs, and fixed appropriately.
+//
+// An additional factor of retainExtraPercent is added as a buffer to help ensure
+// that there's more unscavenged memory to allocate out of, since each allocation
+// out of scavenged memory incurs a potentially expensive page fault.
+//
+// If a memory limit is set, then we wish to pick a scavenge goal that maintains
+// that memory limit. For that, we look at total memory that has been committed
+// (memstats.mappedReady) and try to bring that down below the limit. In this case,
+// we want to give buffer space in the *opposite* direction. When the application
+// is close to the limit, we want to make sure we push harder to keep it under, so
+// if we target below the memory limit, we ensure that the background scavenger is
+// giving the situation the urgency it deserves.
+//
+// In this case, the goal is defined as:
+// (100-reduceExtraPercent) / 100 * memoryLimit
+//
+// We compute both of these goals, and check whether either of them have been met.
+// The background scavenger continues operating as long as either one of the goals
+// has not been met.
+//
+// The goals are updated after each GC.
+//
+// Synchronous scavenging happens for one of two reasons: if an allocation would
+// exceed the memory limit or whenever the heap grows in size, for some
+// definition of heap-growth. The intuition behind this second reason is that the
+// application had to grow the heap because existing fragments were not sufficiently
+// large to satisfy a page-level memory allocation, so we scavenge those fragments
+// eagerly to offset the growth in RSS that results.
+//
+// Lastly, not all pages are available for scavenging at all times and in all cases.
+// The background scavenger and heap-growth scavenger only release memory in chunks
+// that have not been densely-allocated for at least 1 full GC cycle. The reason
+// behind this is likelihood of reuse: the Go heap is allocated in a first-fit order
+// and by the end of the GC mark phase, the heap tends to be densely packed. Releasing
+// memory in these densely packed chunks while they're being packed is counter-productive,
+// and worse, it breaks up huge pages on systems that support them. The scavenger (invoked
+// during memory allocation) further ensures that chunks it identifies as "dense" are
+// immediately eligible for being backed by huge pages. Note that for the most part these
+// density heuristics are best-effort heuristics. It's totally possible (but unlikely)
+// that a chunk that just became dense is scavenged in the case of a race between memory
+// allocation and scavenging.
+//
+// When synchronously scavenging for the memory limit or for debug.FreeOSMemory, these
+// "dense" packing heuristics are ignored (in other words, scavenging is "forced") because
+// in these scenarios returning memory to the OS is more important than keeping CPU
+// overheads low.
+
+package runtime
+
+import (
+ "internal/goos"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // The background scavenger is paced according to these parameters.
+ //
+ // scavengePercent represents the portion of mutator time we're willing
+ // to spend on scavenging in percent.
+ scavengePercent = 1 // 1%
+
+ // retainExtraPercent represents the amount of memory over the heap goal
+ // that the scavenger should keep as a buffer space for the allocator.
+ // This constant is used when we do not have a memory limit set.
+ //
+ // The purpose of maintaining this overhead is to have a greater pool of
+ // unscavenged memory available for allocation (since using scavenged memory
+ // incurs an additional cost), to account for heap fragmentation and
+ // the ever-changing layout of the heap.
+ retainExtraPercent = 10
+
+ // reduceExtraPercent represents the amount of memory under the limit
+ // that the scavenger should target. For example, 5 means we target 95%
+ // of the limit.
+ //
+ // The purpose of shooting lower than the limit is to ensure that, once
+ // close to the limit, the scavenger is working hard to maintain it. If
+ // we have a memory limit set but are far away from it, there's no harm
+ // in leaving up to 100-retainExtraPercent live, and it's more efficient
+ // anyway, for the same reasons that retainExtraPercent exists.
+ reduceExtraPercent = 5
+
+ // maxPagesPerPhysPage is the maximum number of supported runtime pages per
+ // physical page, based on maxPhysPageSize.
+ maxPagesPerPhysPage = maxPhysPageSize / pageSize
+
+ // scavengeCostRatio is the approximate ratio between the costs of using previously
+ // scavenged memory and scavenging memory.
+ //
+ // For most systems the cost of scavenging greatly outweighs the costs
+ // associated with using scavenged memory, making this constant 0. On other systems
+ // (especially ones where "sysUsed" is not just a no-op) this cost is non-trivial.
+ //
+ // This ratio is used as part of multiplicative factor to help the scavenger account
+ // for the additional costs of using scavenged memory in its pacing.
+ scavengeCostRatio = 0.7 * (goos.IsDarwin + goos.IsIos)
+
+	// scavChunkHiOccFrac indicates the fraction of pages that need to be allocated
+	// in the chunk in a single GC cycle for it to be considered high density.
+ scavChunkHiOccFrac = 0.96875
+ scavChunkHiOccPages = uint16(scavChunkHiOccFrac * pallocChunkPages)
+)
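
To make the two goals described at the top of the file concrete, here is a standalone sketch with invented byte counts that mirrors the arithmetic gcPaceScavenger performs below: scale lastHeapInUse by heapGoal/lastHeapGoal and add retainExtraPercent on top for the gcPercent goal, and take (100-reduceExtraPercent)% of the limit for the memory-limit goal.

package main

import "fmt"

const (
	retainExtraPercent = 10 // same values as the constants above
	reduceExtraPercent = 5
)

func main() {
	// Illustrative inputs, not taken from a real run.
	var (
		heapGoal      uint64 = 120 << 20 // 120 MiB
		lastHeapGoal  uint64 = 100 << 20
		lastHeapInUse uint64 = 110 << 20 // includes span-internal fragmentation
		memoryLimit   uint64 = 256 << 20
	)

	// gcPercent goal: (retainExtraPercent+100)/100 * (heapGoal/lastHeapGoal) * lastHeapInUse.
	goalRatio := float64(heapGoal) / float64(lastHeapGoal)
	gcPercentGoal := uint64(float64(lastHeapInUse) * goalRatio)
	gcPercentGoal += gcPercentGoal * retainExtraPercent / 100

	// memoryLimit goal: (100-reduceExtraPercent)/100 * memoryLimit.
	memoryLimitGoal := memoryLimit * (100 - reduceExtraPercent) / 100

	fmt.Printf("gcPercent goal:   %d MiB\n", gcPercentGoal>>20)   // ~145 MiB
	fmt.Printf("memoryLimit goal: %d MiB\n", memoryLimitGoal>>20) // ~243 MiB
}

The background scavenger keeps running as long as heapRetained exceeds the first number or mappedReady exceeds the second.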
+
+// heapRetained returns an estimate of the current heap RSS.
+func heapRetained() uint64 {
+ return gcController.heapInUse.load() + gcController.heapFree.load()
+}
+
+// gcPaceScavenger updates the scavenger's pacing, particularly
+// its rate and RSS goal. For this, it requires the current heapGoal,
+// and the heapGoal for the previous GC cycle.
+//
+// The RSS goal is based on the current heap goal with a small overhead
+// to accommodate non-determinism in the allocator.
+//
+// The pacing is based on scavengePageRate, which applies to both regular and
+// huge pages. See that constant for more information.
+//
+// Must be called whenever GC pacing is updated.
+//
+// mheap_.lock must be held or the world must be stopped.
+func gcPaceScavenger(memoryLimit int64, heapGoal, lastHeapGoal uint64) {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+
+ // As described at the top of this file, there are two scavenge goals here: one
+ // for gcPercent and one for memoryLimit. Let's handle the latter first because
+ // it's simpler.
+
+ // We want to target retaining (100-reduceExtraPercent)% of the heap.
+	memoryLimitGoal := uint64(float64(memoryLimit) * (100.0 - reduceExtraPercent) / 100.0)
+
+ // mappedReady is comparable to memoryLimit, and represents how much total memory
+ // the Go runtime has committed now (estimated).
+ mappedReady := gcController.mappedReady.Load()
+
+	// If we're below the goal already, indicate that we don't need the background
+	// scavenger for the memory limit. This may seem worrisome at first, but note
+ // that the allocator will assist the background scavenger in the face of a memory
+ // limit, so we'll be safe even if we stop the scavenger when we shouldn't have.
+ if mappedReady <= memoryLimitGoal {
+ scavenge.memoryLimitGoal.Store(^uint64(0))
+ } else {
+ scavenge.memoryLimitGoal.Store(memoryLimitGoal)
+ }
+
+ // Now handle the gcPercent goal.
+
+ // If we're called before the first GC completed, disable scavenging.
+ // We never scavenge before the 2nd GC cycle anyway (we don't have enough
+ // information about the heap yet) so this is fine, and avoids a fault
+ // or garbage data later.
+ if lastHeapGoal == 0 {
+ scavenge.gcPercentGoal.Store(^uint64(0))
+ return
+ }
+ // Compute our scavenging goal.
+ goalRatio := float64(heapGoal) / float64(lastHeapGoal)
+ gcPercentGoal := uint64(float64(memstats.lastHeapInUse) * goalRatio)
+ // Add retainExtraPercent overhead to retainedGoal. This calculation
+ // looks strange but the purpose is to arrive at an integer division
+ // (e.g. if retainExtraPercent = 12.5, then we get a divisor of 8)
+ // that also avoids the overflow from a multiplication.
+ gcPercentGoal += gcPercentGoal / (1.0 / (retainExtraPercent / 100.0))
+ // Align it to a physical page boundary to make the following calculations
+ // a bit more exact.
+ gcPercentGoal = (gcPercentGoal + uint64(physPageSize) - 1) &^ (uint64(physPageSize) - 1)
+
+ // Represents where we are now in the heap's contribution to RSS in bytes.
+ //
+ // Guaranteed to always be a multiple of physPageSize on systems where
+ // physPageSize <= pageSize since we map new heap memory at a size larger than
+	// any physPageSize and release memory in multiples of the physPageSize.
+ //
+ // However, certain functions recategorize heap memory as other stats (e.g.
+ // stacks) and this happens in multiples of pageSize, so on systems
+ // where physPageSize > pageSize the calculations below will not be exact.
+ // Generally this is OK since we'll be off by at most one regular
+ // physical page.
+ heapRetainedNow := heapRetained()
+
+ // If we're already below our goal, or within one page of our goal, then indicate
+ // that we don't need the background scavenger for maintaining a memory overhead
+ // proportional to the heap goal.
+ if heapRetainedNow <= gcPercentGoal || heapRetainedNow-gcPercentGoal < uint64(physPageSize) {
+ scavenge.gcPercentGoal.Store(^uint64(0))
+ } else {
+ scavenge.gcPercentGoal.Store(gcPercentGoal)
+ }
+}
+
+var scavenge struct {
+ // gcPercentGoal is the amount of retained heap memory (measured by
+ // heapRetained) that the runtime will try to maintain by returning
+ // memory to the OS. This goal is derived from gcController.gcPercent
+ // by choosing to retain enough memory to allocate heap memory up to
+ // the heap goal.
+ gcPercentGoal atomic.Uint64
+
+	// memoryLimitGoal is the amount of memory retained by the runtime
+	// (measured by gcController.mappedReady) that the runtime will try to
+ // maintain by returning memory to the OS. This goal is derived from
+ // gcController.memoryLimit by choosing to target the memory limit or
+ // some lower target to keep the scavenger working.
+ memoryLimitGoal atomic.Uint64
+
+ // assistTime is the time spent by the allocator scavenging in the last GC cycle.
+ //
+ // This is reset once a GC cycle ends.
+ assistTime atomic.Int64
+
+ // backgroundTime is the time spent by the background scavenger in the last GC cycle.
+ //
+ // This is reset once a GC cycle ends.
+ backgroundTime atomic.Int64
+}
+
+const (
+ // It doesn't really matter what value we start at, but we can't be zero, because
+ // that'll cause divide-by-zero issues. Pick something conservative which we'll
+ // also use as a fallback.
+ startingScavSleepRatio = 0.001
+
+ // Spend at least 1 ms scavenging, otherwise the corresponding
+ // sleep time to maintain our desired utilization is too low to
+ // be reliable.
+ minScavWorkTime = 1e6
+)
+
+// Sleep/wait state of the background scavenger.
+var scavenger scavengerState
+
+type scavengerState struct {
+ // lock protects all fields below.
+ lock mutex
+
+ // g is the goroutine the scavenger is bound to.
+ g *g
+
+ // parked is whether or not the scavenger is parked.
+ parked bool
+
+ // timer is the timer used for the scavenger to sleep.
+ timer *timer
+
+ // sysmonWake signals to sysmon that it should wake the scavenger.
+ sysmonWake atomic.Uint32
+
+ // targetCPUFraction is the target CPU overhead for the scavenger.
+ targetCPUFraction float64
+
+ // sleepRatio is the ratio of time spent doing scavenging work to
+ // time spent sleeping. This is used to decide how long the scavenger
+ // should sleep for in between batches of work. It is set by
+ // critSleepController in order to maintain a CPU overhead of
+ // targetCPUFraction.
+ //
+ // Lower means more sleep, higher means more aggressive scavenging.
+ sleepRatio float64
+
+ // sleepController controls sleepRatio.
+ //
+ // See sleepRatio for more details.
+ sleepController piController
+
+	// controllerCooldown is the time left in nanoseconds during which we avoid
+ // using the controller and we hold sleepRatio at a conservative
+ // value. Used if the controller's assumptions fail to hold.
+ controllerCooldown int64
+
+ // printControllerReset instructs printScavTrace to signal that
+ // the controller was reset.
+ printControllerReset bool
+
+ // sleepStub is a stub used for testing to avoid actually having
+ // the scavenger sleep.
+ //
+	// Unlike the other stubs, this is not populated if left nil.
+ // Instead, it is called when non-nil because any valid implementation
+ // of this function basically requires closing over this scavenger
+ // state, and allocating a closure is not allowed in the runtime as
+ // a matter of policy.
+ sleepStub func(n int64) int64
+
+ // scavenge is a function that scavenges n bytes of memory.
+ // Returns how many bytes of memory it actually scavenged, as
+ // well as the time it took in nanoseconds. Usually mheap.pages.scavenge
+ // with nanotime called around it, but stubbed out for testing.
+ // Like mheap.pages.scavenge, if it scavenges less than n bytes of
+ // memory, the caller may assume the heap is exhausted of scavengable
+ // memory for now.
+ //
+ // If this is nil, it is populated with the real thing in init.
+ scavenge func(n uintptr) (uintptr, int64)
+
+ // shouldStop is a callback called in the work loop and provides a
+ // point that can force the scavenger to stop early, for example because
+ // the scavenge policy dictates too much has been scavenged already.
+ //
+ // If this is nil, it is populated with the real thing in init.
+ shouldStop func() bool
+
+ // gomaxprocs returns the current value of gomaxprocs. Stub for testing.
+ //
+ // If this is nil, it is populated with the real thing in init.
+ gomaxprocs func() int32
+}
+
+// init initializes a scavenger state and wires to the current G.
+//
+// Must be called from a regular goroutine that can allocate.
+func (s *scavengerState) init() {
+ if s.g != nil {
+ throw("scavenger state is already wired")
+ }
+ lockInit(&s.lock, lockRankScavenge)
+ s.g = getg()
+
+ s.timer = new(timer)
+ s.timer.arg = s
+ s.timer.f = func(s any, _ uintptr) {
+ s.(*scavengerState).wake()
+ }
+
+ // input: fraction of CPU time actually used.
+ // setpoint: ideal CPU fraction.
+ // output: ratio of time worked to time slept (determines sleep time).
+ //
+ // The output of this controller is somewhat indirect to what we actually
+ // want to achieve: how much time to sleep for. The reason for this definition
+ // is to ensure that the controller's outputs have a direct relationship with
+ // its inputs (as opposed to an inverse relationship), making it somewhat
+ // easier to reason about for tuning purposes.
+ s.sleepController = piController{
+ // Tuned loosely via Ziegler-Nichols process.
+ kp: 0.3375,
+ ti: 3.2e6,
+ tt: 1e9, // 1 second reset time.
+
+ // These ranges seem wide, but we want to give the controller plenty of
+ // room to hunt for the optimal value.
+ min: 0.001, // 1:1000
+ max: 1000.0, // 1000:1
+ }
+ s.sleepRatio = startingScavSleepRatio
+
+ // Install real functions if stubs aren't present.
+ if s.scavenge == nil {
+ s.scavenge = func(n uintptr) (uintptr, int64) {
+ start := nanotime()
+ r := mheap_.pages.scavenge(n, nil, false)
+ end := nanotime()
+ if start >= end {
+ return r, 0
+ }
+ scavenge.backgroundTime.Add(end - start)
+ return r, end - start
+ }
+ }
+ if s.shouldStop == nil {
+ s.shouldStop = func() bool {
+ // If background scavenging is disabled or if there's no work to do just stop.
+ return heapRetained() <= scavenge.gcPercentGoal.Load() &&
+ gcController.mappedReady.Load() <= scavenge.memoryLimitGoal.Load()
+ }
+ }
+ if s.gomaxprocs == nil {
+ s.gomaxprocs = func() int32 {
+ return gomaxprocs
+ }
+ }
+}
+
+// park parks the scavenger goroutine.
+func (s *scavengerState) park() {
+ lock(&s.lock)
+ if getg() != s.g {
+ throw("tried to park scavenger from another goroutine")
+ }
+ s.parked = true
+ goparkunlock(&s.lock, waitReasonGCScavengeWait, traceBlockSystemGoroutine, 2)
+}
+
+// ready signals to sysmon that the scavenger should be awoken.
+func (s *scavengerState) ready() {
+ s.sysmonWake.Store(1)
+}
+
+// wake immediately unparks the scavenger if necessary.
+//
+// Safe to run without a P.
+func (s *scavengerState) wake() {
+ lock(&s.lock)
+ if s.parked {
+ // Unset sysmonWake, since the scavenger is now being awoken.
+ s.sysmonWake.Store(0)
+
+ // s.parked is unset to prevent a double wake-up.
+ s.parked = false
+
+ // Ready the goroutine by injecting it. We use injectglist instead
+ // of ready or goready in order to allow us to run this function
+ // without a P. injectglist also avoids placing the goroutine in
+ // the current P's runnext slot, which is desirable to prevent
+ // the scavenger from interfering with user goroutine scheduling
+ // too much.
+ var list gList
+ list.push(s.g)
+ injectglist(&list)
+ }
+ unlock(&s.lock)
+}
+
+// sleep puts the scavenger to sleep based on the amount of time that it worked
+// in nanoseconds.
+//
+// Note that this function should only be called by the scavenger.
+//
+// The scavenger may be woken up earlier by a pacing change, and it may not go
+// to sleep at all if there's a pending pacing change.
+func (s *scavengerState) sleep(worked float64) {
+ lock(&s.lock)
+ if getg() != s.g {
+ throw("tried to sleep scavenger from another goroutine")
+ }
+
+ if worked < minScavWorkTime {
+ // This means there wasn't enough work to actually fill up minScavWorkTime.
+ // That's fine; we shouldn't try to do anything with this information
+		// because it's going to result in a short enough sleep request that things
+ // will get messy. Just assume we did at least this much work.
+ // All this means is that we'll sleep longer than we otherwise would have.
+ worked = minScavWorkTime
+ }
+
+ // Multiply the critical time by 1 + the ratio of the costs of using
+ // scavenged memory vs. scavenging memory. This forces us to pay down
+ // the cost of reusing this memory eagerly by sleeping for a longer period
+ // of time and scavenging less frequently. More concretely, we avoid situations
+ // where we end up scavenging so often that we hurt allocation performance
+ // because of the additional overheads of using scavenged memory.
+ worked *= 1 + scavengeCostRatio
+
+ // sleepTime is the amount of time we're going to sleep, based on the amount
+ // of time we worked, and the sleepRatio.
+ sleepTime := int64(worked / s.sleepRatio)
+
+ var slept int64
+ if s.sleepStub == nil {
+ // Set the timer.
+ //
+ // This must happen here instead of inside gopark
+ // because we can't close over any variables without
+ // failing escape analysis.
+ start := nanotime()
+ resetTimer(s.timer, start+sleepTime)
+
+ // Mark ourselves as asleep and go to sleep.
+ s.parked = true
+ goparkunlock(&s.lock, waitReasonSleep, traceBlockSleep, 2)
+
+ // How long we actually slept for.
+ slept = nanotime() - start
+
+ lock(&s.lock)
+ // Stop the timer here because s.wake is unable to do it for us.
+ // We don't really care if we succeed in stopping the timer. One
+ // reason we might fail is that we've already woken up, but the timer
+ // might be in the process of firing on some other P; essentially we're
+ // racing with it. That's totally OK. Double wake-ups are perfectly safe.
+ stopTimer(s.timer)
+ unlock(&s.lock)
+ } else {
+ unlock(&s.lock)
+ slept = s.sleepStub(sleepTime)
+ }
+
+ // Stop here if we're cooling down from the controller.
+ if s.controllerCooldown > 0 {
+ // worked and slept aren't exact measures of time, but it's OK to be a bit
+ // sloppy here. We're just hoping we're avoiding some transient bad behavior.
+ t := slept + int64(worked)
+ if t > s.controllerCooldown {
+ s.controllerCooldown = 0
+ } else {
+ s.controllerCooldown -= t
+ }
+ return
+ }
+
+ // idealFraction is the ideal % of overall application CPU time that we
+ // spend scavenging.
+ idealFraction := float64(scavengePercent) / 100.0
+
+ // Calculate the CPU time spent.
+ //
+ // This may be slightly inaccurate with respect to GOMAXPROCS, but we're
+ // recomputing this often enough relative to GOMAXPROCS changes in general
+ // (it only changes when the world is stopped, and not during a GC) that
+ // that small inaccuracy is in the noise.
+ cpuFraction := worked / ((float64(slept) + worked) * float64(s.gomaxprocs()))
+
+ // Update the critSleepRatio, adjusting until we reach our ideal fraction.
+ var ok bool
+ s.sleepRatio, ok = s.sleepController.next(cpuFraction, idealFraction, float64(slept)+worked)
+ if !ok {
+ // The core assumption of the controller, that we can get a proportional
+ // response, broke down. This may be transient, so temporarily switch to
+ // sleeping a fixed, conservative amount.
+ s.sleepRatio = startingScavSleepRatio
+ s.controllerCooldown = 5e9 // 5 seconds.
+
+ // Signal the scav trace printer to output this.
+ s.controllerFailed()
+ }
+}
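
The pacing arithmetic in sleep reduces to sleepTime = worked / sleepRatio and cpuFraction = worked / ((slept + worked) * gomaxprocs). The standalone sketch below uses invented numbers (2 ms of work, the conservative startingScavSleepRatio, 8 Ps) just to show the magnitudes the controller operates on; it is not runtime code.

package main

import "fmt"

func main() {
	const (
		worked     = 2e6   // 2 ms of scavenging work, in nanoseconds
		sleepRatio = 0.001 // startingScavSleepRatio: work 1 unit, sleep 1000
		gomaxprocs = 8
	)

	// How long the scavenger asks to sleep after that much work.
	sleepTime := worked / sleepRatio // 2e9 ns = 2 s

	// The CPU fraction the controller would observe if it slept that long,
	// to be compared against scavengePercent/100 (0.01).
	cpuFraction := worked / ((sleepTime + worked) * gomaxprocs)

	fmt.Printf("sleep for %.1f s, CPU fraction %.5f\n", sleepTime/1e9, cpuFraction)
}

Here the observed fraction (~0.0001) is well under the 1% target, so the controller would raise sleepRatio and scavenge more aggressively on subsequent rounds.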
+
+// controllerFailed indicates that the scavenger's scheduling
+// controller failed.
+func (s *scavengerState) controllerFailed() {
+ lock(&s.lock)
+ s.printControllerReset = true
+ unlock(&s.lock)
+}
+
+// run is the body of the main scavenging loop.
+//
+// Returns the number of bytes released and the estimated time spent
+// releasing those bytes.
+//
+// Must be run on the scavenger goroutine.
+func (s *scavengerState) run() (released uintptr, worked float64) {
+ lock(&s.lock)
+ if getg() != s.g {
+ throw("tried to run scavenger from another goroutine")
+ }
+ unlock(&s.lock)
+
+ for worked < minScavWorkTime {
+ // If something from outside tells us to stop early, stop.
+ if s.shouldStop() {
+ break
+ }
+
+ // scavengeQuantum is the amount of memory we try to scavenge
+ // in one go. A smaller value means the scavenger is more responsive
+ // to the scheduler in case of e.g. preemption. A larger value means
+ // that the overheads of scavenging are better amortized, so better
+ // scavenging throughput.
+ //
+ // The current value is chosen assuming a cost of ~10µs/physical page
+ // (this is somewhat pessimistic), which implies a worst-case latency of
+ // about 160µs for 4 KiB physical pages. The current value is biased
+ // toward latency over throughput.
+ const scavengeQuantum = 64 << 10
+
+ // Accumulate the amount of time spent scavenging.
+ r, duration := s.scavenge(scavengeQuantum)
+
+		// On some platforms we may see start >= end if the time it takes to scavenge
+ // memory is less than the minimum granularity of its clock (e.g. Windows) or
+ // due to clock bugs.
+ //
+ // In this case, just assume scavenging takes 10 µs per regular physical page
+ // (determined empirically), and conservatively ignore the impact of huge pages
+ // on timing.
+ const approxWorkedNSPerPhysicalPage = 10e3
+ if duration == 0 {
+ worked += approxWorkedNSPerPhysicalPage * float64(r/physPageSize)
+ } else {
+ // TODO(mknyszek): If duration is small compared to worked, it could be
+ // rounded down to zero. Probably not a problem in practice because the
+ // values are all within a few orders of magnitude of each other but maybe
+ // worth worrying about.
+ worked += float64(duration)
+ }
+ released += r
+
+ // scavenge does not return until it either finds the requisite amount of
+ // memory to scavenge, or exhausts the heap. If we haven't found enough
+ // to scavenge, then the heap must be exhausted.
+ if r < scavengeQuantum {
+ break
+ }
+ // When using fake time just do one loop.
+ if faketime != 0 {
+ break
+ }
+ }
+ if released > 0 && released < physPageSize {
+ // If this happens, it means that we may have attempted to release part
+ // of a physical page, but the likely effect of that is that it released
+ // the whole physical page, some of which may have still been in-use.
+ // This could lead to memory corruption. Throw.
+ throw("released less than one physical page of memory")
+ }
+ return
+}
+
+// Background scavenger.
+//
+// The background scavenger maintains the RSS of the application below
+// the line described by the proportional scavenging statistics in
+// the mheap struct.
+func bgscavenge(c chan int) {
+ scavenger.init()
+
+ c <- 1
+ scavenger.park()
+
+ for {
+ released, workTime := scavenger.run()
+ if released == 0 {
+ scavenger.park()
+ continue
+ }
+ mheap_.pages.scav.releasedBg.Add(released)
+ scavenger.sleep(workTime)
+ }
+}
+
+// scavenge scavenges nbytes worth of free pages, starting with the
+// highest address first. Successive calls continue from where it left
+// off until the heap is exhausted. force makes all memory available to
+// scavenge, ignoring huge page heuristics.
+//
+// Returns the amount of memory scavenged in bytes.
+//
+// scavenge always tries to scavenge nbytes worth of memory, and will
+// only fail to do so if the heap is exhausted for now.
+func (p *pageAlloc) scavenge(nbytes uintptr, shouldStop func() bool, force bool) uintptr {
+ released := uintptr(0)
+ for released < nbytes {
+ ci, pageIdx := p.scav.index.find(force)
+ if ci == 0 {
+ break
+ }
+ systemstack(func() {
+ released += p.scavengeOne(ci, pageIdx, nbytes-released)
+ })
+ if shouldStop != nil && shouldStop() {
+ break
+ }
+ }
+ return released
+}
+
+// printScavTrace prints a scavenge trace line to standard error.
+//
+// released should be the amount of memory released since the last time this
+// was called, and forced indicates whether the scavenge was forced by the
+// application.
+//
+// scavenger.lock must be held.
+func printScavTrace(releasedBg, releasedEager uintptr, forced bool) {
+ assertLockHeld(&scavenger.lock)
+
+ printlock()
+ print("scav ",
+ releasedBg>>10, " KiB work (bg), ",
+ releasedEager>>10, " KiB work (eager), ",
+ gcController.heapReleased.load()>>10, " KiB now, ",
+ (gcController.heapInUse.load()*100)/heapRetained(), "% util",
+ )
+ if forced {
+ print(" (forced)")
+ } else if scavenger.printControllerReset {
+ print(" [controller reset]")
+ scavenger.printControllerReset = false
+ }
+ println()
+ printunlock()
+}
+
+// scavengeOne walks over the chunk at chunk index ci and searches for
+// a contiguous run of pages to scavenge. It will try to scavenge
+// at most max bytes at once, but may scavenge more to avoid
+// breaking huge pages. Once it scavenges some memory it returns
+// how much it scavenged in bytes.
+//
+// searchIdx is the page index to start searching from in ci.
+//
+// Returns the number of bytes scavenged.
+//
+// Must run on the systemstack because it acquires p.mheapLock.
+//
+//go:systemstack
+func (p *pageAlloc) scavengeOne(ci chunkIdx, searchIdx uint, max uintptr) uintptr {
+ // Calculate the maximum number of pages to scavenge.
+ //
+ // This should be alignUp(max, pageSize) / pageSize but max can and will
+ // be ^uintptr(0), so we need to be very careful not to overflow here.
+ // Rather than use alignUp, calculate the number of pages rounded down
+ // first, then add back one if necessary.
+ maxPages := max / pageSize
+ if max%pageSize != 0 {
+ maxPages++
+ }
+
+ // Calculate the minimum number of pages we can scavenge.
+ //
+ // Because we can only scavenge whole physical pages, we must
+ // ensure that we scavenge at least minPages each time, aligned
+ // to minPages*pageSize.
+ minPages := physPageSize / pageSize
+ if minPages < 1 {
+ minPages = 1
+ }
+
+ lock(p.mheapLock)
+ if p.summary[len(p.summary)-1][ci].max() >= uint(minPages) {
+		// We only bother looking for a candidate if there are at least
+ // minPages free pages at all.
+ base, npages := p.chunkOf(ci).findScavengeCandidate(searchIdx, minPages, maxPages)
+
+ // If we found something, scavenge it and return!
+ if npages != 0 {
+ // Compute the full address for the start of the range.
+ addr := chunkBase(ci) + uintptr(base)*pageSize
+
+ // Mark the range we're about to scavenge as allocated, because
+ // we don't want any allocating goroutines to grab it while
+ // the scavenging is in progress. Be careful here -- just do the
+ // bare minimum to avoid stepping on our own scavenging stats.
+ p.chunkOf(ci).allocRange(base, npages)
+ p.update(addr, uintptr(npages), true, true)
+
+ // Grab whether the chunk is hugepage backed and if it is,
+ // clear it. We're about to break up this huge page.
+ p.scav.index.setNoHugePage(ci)
+
+ // With that done, it's safe to unlock.
+ unlock(p.mheapLock)
+
+ if !p.test {
+ pageTraceScav(getg().m.p.ptr(), 0, addr, uintptr(npages))
+
+ // Only perform sys* operations if we're not in a test.
+ // It's dangerous to do so otherwise.
+ sysUnused(unsafe.Pointer(addr), uintptr(npages)*pageSize)
+
+ // Update global accounting only when not in test, otherwise
+ // the runtime's accounting will be wrong.
+ nbytes := int64(npages * pageSize)
+ gcController.heapReleased.add(nbytes)
+ gcController.heapFree.add(-nbytes)
+
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.committed, -nbytes)
+ atomic.Xaddint64(&stats.released, nbytes)
+ memstats.heapStats.release()
+ }
+
+ // Relock the heap, because now we need to make these pages
+			// available for allocation. Free them back to the page allocator.
+ lock(p.mheapLock)
+ if b := (offAddr{addr}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+ p.chunkOf(ci).free(base, npages)
+ p.update(addr, uintptr(npages), true, false)
+
+ // Mark the range as scavenged.
+ p.chunkOf(ci).scavenged.setRange(base, npages)
+ unlock(p.mheapLock)
+
+ return uintptr(npages) * pageSize
+ }
+ }
+ // Mark this chunk as having no free pages.
+ p.scav.index.setEmpty(ci)
+ unlock(p.mheapLock)
+
+ return 0
+}
+
+// fillAligned returns x but with all zeroes in m-aligned
+// groups of m bits set to 1 if any bit in the group is non-zero.
+//
+// For example, fillAligned(0x0100a3, 8) == 0xff00ff.
+//
+// Note that if m == 1, this is a no-op.
+//
+// m must be a power of 2 <= maxPagesPerPhysPage.
+func fillAligned(x uint64, m uint) uint64 {
+ apply := func(x uint64, c uint64) uint64 {
+		// The technique used here is derived from
+ // https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
+ // and extended for more than just bytes (like nibbles
+ // and uint16s) by using an appropriate constant.
+ //
+ // To summarize the technique, quoting from that page:
+ // "[It] works by first zeroing the high bits of the [8]
+ // bytes in the word. Subsequently, it adds a number that
+ // will result in an overflow to the high bit of a byte if
+ // any of the low bits were initially set. Next the high
+ // bits of the original word are ORed with these values;
+ // thus, the high bit of a byte is set iff any bit in the
+ // byte was set. Finally, we determine if any of these high
+ // bits are zero by ORing with ones everywhere except the
+ // high bits and inverting the result."
+ return ^((((x & c) + c) | x) | c)
+ }
+ // Transform x to contain a 1 bit at the top of each m-aligned
+ // group of m zero bits.
+ switch m {
+ case 1:
+ return x
+ case 2:
+ x = apply(x, 0x5555555555555555)
+ case 4:
+ x = apply(x, 0x7777777777777777)
+ case 8:
+ x = apply(x, 0x7f7f7f7f7f7f7f7f)
+ case 16:
+ x = apply(x, 0x7fff7fff7fff7fff)
+ case 32:
+ x = apply(x, 0x7fffffff7fffffff)
+ case 64: // == maxPagesPerPhysPage
+ x = apply(x, 0x7fffffffffffffff)
+ default:
+ throw("bad m value")
+ }
+ // Now, the top bit of each m-aligned group in x is set
+	// iff that group was all zero in the original x.
+
+ // From each group of m bits subtract 1.
+ // Because we know only the top bits of each
+ // m-aligned group are set, we know this will
+ // set each group to have all the bits set except
+ // the top bit, so just OR with the original
+ // result to set all the bits.
+ return ^((x - (x >> (m - 1))) | x)
+}
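
The example in the doc comment, fillAligned(0x0100a3, 8) == 0xff00ff, can be checked with a small standalone specialization of the m == 8 case. This sketch is for illustration only and is not the runtime function itself.

package main

import "fmt"

// fill8 is fillAligned specialized to m = 8: any byte with a nonzero bit
// becomes 0xff, and all-zero bytes stay 0x00.
func fill8(x uint64) uint64 {
	const c = 0x7f7f7f7f7f7f7f7f
	// Set the top bit of each all-zero byte (the "apply" step above)...
	x = ^((((x & c) + c) | x) | c)
	// ...then expand each mark: bytes that were all zero come out as 0x00,
	// bytes that had any bit set come out as 0xff.
	return ^((x - (x >> 7)) | x)
}

func main() {
	fmt.Printf("%#x\n", fill8(0x0100a3)) // 0xff00ff, matching the doc comment
	fmt.Printf("%#x\n", fill8(0))        // 0x0: no group has a bit set
	fmt.Printf("%#x\n", fill8(1<<63))    // 0xff00000000000000
}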
+
+// findScavengeCandidate returns a start index and a size for this pallocData
+// segment which represents a contiguous region of free and unscavenged memory.
+//
+// searchIdx indicates the page index within this chunk to start the search, but
+// note that findScavengeCandidate searches backwards through the pallocData. As
+// a result, it will return the highest scavenge candidate in address order.
+//
+// min indicates a hard minimum size and alignment for runs of pages. That is,
+// findScavengeCandidate will not return a region smaller than min pages in size,
+// or that is min pages or greater in size but not aligned to min. min must be
+// a non-zero power of 2 <= maxPagesPerPhysPage.
+//
+// max is a hint for how big of a region is desired. If max >= pallocChunkPages, then
+// findScavengeCandidate effectively returns entire free and unscavenged regions.
+// If max < pallocChunkPages, it may truncate the returned region such that size is
+// max. However, findScavengeCandidate may still return a larger region if, for
+// example, it chooses to preserve huge pages, or if max is not aligned to min (it
+// will round up). That is, even if max is small, the returned size is not guaranteed
+// to be equal to max. max is allowed to be less than min, in which case it is as if
+// max == min.
+func (m *pallocData) findScavengeCandidate(searchIdx uint, min, max uintptr) (uint, uint) {
+ if min&(min-1) != 0 || min == 0 {
+ print("runtime: min = ", min, "\n")
+ throw("min must be a non-zero power of 2")
+ } else if min > maxPagesPerPhysPage {
+ print("runtime: min = ", min, "\n")
+ throw("min too large")
+ }
+ // max may not be min-aligned, so we might accidentally truncate to
+ // a max value which causes us to return a non-min-aligned value.
+ // To prevent this, align max up to a multiple of min (which is always
+ // a power of 2). This also prevents max from ever being less than
+ // min, unless it's zero, so handle that explicitly.
+ if max == 0 {
+ max = min
+ } else {
+ max = alignUp(max, min)
+ }
+
+ i := int(searchIdx / 64)
+ // Start by quickly skipping over blocks of non-free or scavenged pages.
+ for ; i >= 0; i-- {
+ // 1s are scavenged OR non-free => 0s are unscavenged AND free
+ x := fillAligned(m.scavenged[i]|m.pallocBits[i], uint(min))
+ if x != ^uint64(0) {
+ break
+ }
+ }
+ if i < 0 {
+ // Failed to find any free/unscavenged pages.
+ return 0, 0
+ }
+ // We have something in the 64-bit chunk at i, but it could
+ // extend further. Loop until we find the extent of it.
+
+ // 1s are scavenged OR non-free => 0s are unscavenged AND free
+ x := fillAligned(m.scavenged[i]|m.pallocBits[i], uint(min))
+ z1 := uint(sys.LeadingZeros64(^x))
+ run, end := uint(0), uint(i)*64+(64-z1)
+ if x<<z1 != 0 {
+ // After shifting out z1 bits, we still have 1s,
+ // so the run ends inside this word.
+ run = uint(sys.LeadingZeros64(x << z1))
+ } else {
+ // After shifting out z1 bits, we have no more 1s.
+ // This means the run extends to the bottom of the
+ // word so it may extend into further words.
+ run = 64 - z1
+ for j := i - 1; j >= 0; j-- {
+ x := fillAligned(m.scavenged[j]|m.pallocBits[j], uint(min))
+ run += uint(sys.LeadingZeros64(x))
+ if x != 0 {
+ // The run stopped in this word.
+ break
+ }
+ }
+ }
+
+ // Split the run we found if it's larger than max but hold on to
+ // our original length, since we may need it later.
+ size := run
+ if size > uint(max) {
+ size = uint(max)
+ }
+ start := end - size
+
+ // Each huge page is guaranteed to fit in a single palloc chunk.
+ //
+ // TODO(mknyszek): Support larger huge page sizes.
+ // TODO(mknyszek): Consider taking pages-per-huge-page as a parameter
+ // so we can write tests for this.
+ if physHugePageSize > pageSize && physHugePageSize > physPageSize {
+ // We have huge pages, so let's ensure we don't break one by scavenging
+ // over a huge page boundary. If the range [start, start+size) overlaps with
+ // a free-and-unscavenged huge page, we want to grow the region we scavenge
+ // to include that huge page.
+
+ // Compute the huge page boundary above our candidate.
+ pagesPerHugePage := uintptr(physHugePageSize / pageSize)
+ hugePageAbove := uint(alignUp(uintptr(start), pagesPerHugePage))
+
+ // If that boundary is within our current candidate, then we may be breaking
+ // a huge page.
+ if hugePageAbove <= end {
+ // Compute the huge page boundary below our candidate.
+ hugePageBelow := uint(alignDown(uintptr(start), pagesPerHugePage))
+
+ if hugePageBelow >= end-run {
+ // We're in danger of breaking apart a huge page since start+size crosses
+ // a huge page boundary and rounding down start to the nearest huge
+ // page boundary is included in the full run we found. Include the entire
+ // huge page in the bound by rounding down to the huge page size.
+ size = size + (start - hugePageBelow)
+ start = hugePageBelow
+ }
+ }
+ }
+ return start, size
+}
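
The backward search above is easier to see in a stripped-down model. The sketch below handles only min == 1, ignores max truncation and the huge-page adjustment, and operates on a plain []uint64 bitmap where a 1 bit means allocated-or-scavenged; it illustrates the shape of the search, not a drop-in replacement.

package main

import (
	"fmt"
	"math/bits"
)

// highestFreeRun returns the (start, size) in bits of the highest-addressed
// run of zero bits in bm, mirroring the backward search in
// findScavengeCandidate with min == 1.
func highestFreeRun(bm []uint64) (start, size uint) {
	// Skip whole words with no free bits, highest address first.
	i := len(bm) - 1
	for ; i >= 0; i-- {
		if bm[i] != ^uint64(0) {
			break
		}
	}
	if i < 0 {
		return 0, 0
	}
	x := bm[i]
	z1 := uint(bits.LeadingZeros64(^x)) // 1 bits above the highest zero bit
	end := uint(i)*64 + (64 - z1)
	var run uint
	if x<<z1 != 0 {
		// There are more 1 bits below, so the run ends inside this word.
		run = uint(bits.LeadingZeros64(x << z1))
	} else {
		// The run reaches the bottom of this word and may continue lower.
		run = 64 - z1
		for j := i - 1; j >= 0; j-- {
			run += uint(bits.LeadingZeros64(bm[j]))
			if bm[j] != 0 {
				break
			}
		}
	}
	return end - run, run
}

func main() {
	// 1 bits mean "allocated or already scavenged"; 0 bits are candidates.
	bm := []uint64{^uint64(0), 0xff00000000000fff}
	fmt.Println(highestFreeRun(bm)) // 76 44: the free run covers pages [76, 120)
}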
+
+// scavengeIndex is a structure for efficiently managing which pageAlloc chunks have
+// memory available to scavenge.
+type scavengeIndex struct {
+ // chunks is a scavChunkData-per-chunk structure that indicates the presence of pages
+ // available for scavenging. Updates to the index are serialized by the pageAlloc lock.
+ //
+ // It tracks chunk occupancy and a generation counter per chunk. If a chunk's occupancy
+ // never exceeds pallocChunkDensePages over the course of a single GC cycle, the chunk
+ // becomes eligible for scavenging on the next cycle. If a chunk ever hits this density
+ // threshold it immediately becomes unavailable for scavenging in the current cycle as
+ // well as the next.
+ //
+ // [min, max) represents the range of chunks that is safe to access (i.e. will not cause
+ // a fault). As an optimization minHeapIdx represents the true minimum chunk that has been
+ // mapped, since min is likely rounded down to include the system page containing minHeapIdx.
+ //
+ // For a chunk size of 4 MiB this structure will only use 2 MiB for a 1 TiB contiguous heap.
+ chunks []atomicScavChunkData
+ min, max atomic.Uintptr
+ minHeapIdx atomic.Uintptr
+
+ // searchAddr* is the maximum address (in the offset address space, so we have a linear
+ // view of the address space; see mranges.go:offAddr) containing memory available to
+ // scavenge. It is a hint to the find operation to avoid O(n^2) behavior in repeated lookups.
+ //
+ // searchAddr* is always inclusive and should be the base address of the highest runtime
+ // page available for scavenging.
+ //
+ // searchAddrForce is managed by find and free.
+ // searchAddrBg is managed by find and nextGen.
+ //
+ // Normally, find monotonically decreases searchAddr* as it finds no more free pages to
+ // scavenge. However, mark, when marking a new chunk at an index greater than the current
+ // searchAddr, sets searchAddr to the *negative* index into chunks of that page. The trick here
+ // is that concurrent calls to find will fail to monotonically decrease searchAddr*, and so they
+ // won't barge over new memory becoming available to scavenge. Furthermore, this ensures
+ // that some future caller of find *must* observe the new high index. That caller
+ // (or any other racing with it), then makes searchAddr positive before continuing, bringing
+ // us back to our monotonically decreasing steady-state.
+ //
+ // A pageAlloc lock serializes updates between min, max, and searchAddr, so abs(searchAddr)
+ // is always guaranteed to be >= min and < max (converted to heap addresses).
+ //
+ // searchAddrBg is increased only on each new generation and is mainly used by the
+ // background scavenger and heap-growth scavenging. searchAddrForce is increased continuously
+ // as memory gets freed and is mainly used by eager memory reclaim such as debug.FreeOSMemory
+ // and scavenging to maintain the memory limit.
+ searchAddrBg atomicOffAddr
+ searchAddrForce atomicOffAddr
+
+ // freeHWM is the highest address (in offset address space) that was freed
+ // this generation.
+ freeHWM offAddr
+
+ // Generation counter. Updated by nextGen at the end of each mark phase.
+ gen uint32
+
+ // test indicates whether or not we're in a test.
+ test bool
+}
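
The searchAddr marking scheme described in the comments can be modeled with a single atomic integer whose sign carries the mark. The sketch below is a deliberately simplified toy (the runtime's atomicOffAddr works in the offset address space and has different method signatures); it only shows why storing a raised address negated keeps a stale, concurrent decrease from sliding past it.

package main

import (
	"fmt"
	"sync/atomic"
)

// markedAddr stores an address that is normally decreased monotonically;
// a raise is stored negated (marked) so searchers cannot miss it.
type markedAddr struct {
	v atomic.Int64
}

// Load returns the address and whether it is currently marked.
func (m *markedAddr) Load() (int64, bool) {
	v := m.v.Load()
	if v < 0 {
		return -v, true
	}
	return v, false
}

// StoreMarked raises the address, leaving it marked (negative).
func (m *markedAddr) StoreMarked(addr int64) { m.v.Store(-addr) }

// StoreMin lowers the address, but only if it is still the unmarked value
// we last observed; a concurrent StoreMarked wins.
func (m *markedAddr) StoreMin(old, new int64) bool {
	return m.v.CompareAndSwap(old, new)
}

// StoreUnmark flips a marked value back to positive, but only if no further
// raise happened in the meantime.
func (m *markedAddr) StoreUnmark(addr int64) bool {
	return m.v.CompareAndSwap(-addr, addr)
}

func main() {
	var m markedAddr
	m.v.Store(1 << 20)     // a searcher is working downward from 1 MiB
	m.StoreMarked(4 << 20) // a free above raises it, stored marked

	addr, marked := m.Load()
	fmt.Println(addr, marked) // 4194304 true: the searcher must observe the raise

	// A stale attempt to keep decreasing from the old value fails...
	fmt.Println(m.StoreMin(1<<20, 1<<19)) // false

	// ...so the searcher unmarks and resumes from the new high address.
	fmt.Println(m.StoreUnmark(4 << 20)) // true
}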
+
+// init initializes the scavengeIndex.
+//
+// Returns the amount added to sysStat.
+func (s *scavengeIndex) init(test bool, sysStat *sysMemStat) uintptr {
+ s.searchAddrBg.Clear()
+ s.searchAddrForce.Clear()
+ s.freeHWM = minOffAddr
+ s.test = test
+ return s.sysInit(test, sysStat)
+}
+
+// grow updates the index's backing store in response to a heap growth.
+//
+// Returns the amount of memory added to sysStat.
+func (s *scavengeIndex) grow(base, limit uintptr, sysStat *sysMemStat) uintptr {
+ // Update minHeapIdx. Note that even if there's no mapping work to do,
+ // we may still have a new, lower minimum heap address.
+ minHeapIdx := s.minHeapIdx.Load()
+ if baseIdx := uintptr(chunkIndex(base)); minHeapIdx == 0 || baseIdx < minHeapIdx {
+ s.minHeapIdx.Store(baseIdx)
+ }
+ return s.sysGrow(base, limit, sysStat)
+}
+
+// find returns the highest chunk index that may contain pages available to scavenge.
+// It also returns an offset to start searching in the highest chunk.
+func (s *scavengeIndex) find(force bool) (chunkIdx, uint) {
+ cursor := &s.searchAddrBg
+ if force {
+ cursor = &s.searchAddrForce
+ }
+ searchAddr, marked := cursor.Load()
+ if searchAddr == minOffAddr.addr() {
+ // We got a cleared search addr.
+ return 0, 0
+ }
+
+ // Starting from searchAddr's chunk, iterate until we find a chunk with pages to scavenge.
+ gen := s.gen
+ min := chunkIdx(s.minHeapIdx.Load())
+ start := chunkIndex(uintptr(searchAddr))
+	// N.B. We'll never map the 0'th chunk, so minHeapIdx ensures this loop doesn't overflow.
+ for i := start; i >= min; i-- {
+		// Skip over chunks that shouldn't be scavenged right now.
+ if !s.chunks[i].load().shouldScavenge(gen, force) {
+ continue
+ }
+ // We're still scavenging this chunk.
+ if i == start {
+ return i, chunkPageIndex(uintptr(searchAddr))
+ }
+ // Try to reduce searchAddr to newSearchAddr.
+ newSearchAddr := chunkBase(i) + pallocChunkBytes - pageSize
+ if marked {
+ // Attempt to be the first one to decrease the searchAddr
+ // after an increase. If we fail, that means there was another
+ // increase, or somebody else got to it before us. Either way,
+ // it doesn't matter. We may lose some performance having an
+ // incorrect search address, but it's far more important that
+ // we don't miss updates.
+ cursor.StoreUnmark(searchAddr, newSearchAddr)
+ } else {
+ // Decrease searchAddr.
+ cursor.StoreMin(newSearchAddr)
+ }
+ return i, pallocChunkPages - 1
+ }
+ // Clear searchAddr, because we've exhausted the heap.
+ cursor.Clear()
+ return 0, 0
+}
+
+// alloc updates metadata for chunk at index ci with the fact that
+// an allocation of npages occurred. It also eagerly attempts to collapse
+// the chunk's memory into hugepage if the chunk has become sufficiently
+// dense and we're not allocating the whole chunk at once (which suggests
+// the allocation is part of a bigger one and it's probably not worth
+// eagerly collapsing).
+//
+// alloc may only run concurrently with find.
+func (s *scavengeIndex) alloc(ci chunkIdx, npages uint) {
+ sc := s.chunks[ci].load()
+ sc.alloc(npages, s.gen)
+ if !sc.isHugePage() && sc.inUse > scavChunkHiOccPages {
+ // Mark that we're considering this chunk as backed by huge pages.
+ sc.setHugePage()
+
+ // TODO(mknyszek): Consider eagerly backing memory with huge pages
+ // here. In the past we've attempted to use sysHugePageCollapse
+		// (which uses MADV_COLLAPSE on Linux, and is unsupported elsewhere)
+ // for this purpose, but that caused performance issues in production
+ // environments.
+ }
+ s.chunks[ci].store(sc)
+}
+
+// free updates metadata for chunk at index ci with the fact that
+// a free of npages occurred.
+//
+// free may only run concurrently with find.
+func (s *scavengeIndex) free(ci chunkIdx, page, npages uint) {
+ sc := s.chunks[ci].load()
+ sc.free(npages, s.gen)
+ s.chunks[ci].store(sc)
+
+ // Update scavenge search addresses.
+ addr := chunkBase(ci) + uintptr(page+npages-1)*pageSize
+ if s.freeHWM.lessThan(offAddr{addr}) {
+ s.freeHWM = offAddr{addr}
+ }
+ // N.B. Because free is serialized, it's not necessary to do a
+ // full CAS here. free only ever increases searchAddr, while
+ // find only ever decreases it. Since we only ever race with
+ // decreases, even if the value we loaded is stale, the actual
+ // value will never be larger.
+ searchAddr, _ := s.searchAddrForce.Load()
+ if (offAddr{searchAddr}).lessThan(offAddr{addr}) {
+ s.searchAddrForce.StoreMarked(addr)
+ }
+}
+
+// nextGen moves the scavenger forward one generation. Must be called
+// once per GC cycle, but may be called more often to force more memory
+// to be released.
+//
+// nextGen may only run concurrently with find.
+func (s *scavengeIndex) nextGen() {
+ s.gen++
+ searchAddr, _ := s.searchAddrBg.Load()
+ if (offAddr{searchAddr}).lessThan(s.freeHWM) {
+ s.searchAddrBg.StoreMarked(s.freeHWM.addr())
+ }
+ s.freeHWM = minOffAddr
+}
+
+// setEmpty marks that the scavenger has finished looking at ci
+// for now to prevent the scavenger from getting stuck looking
+// at the same chunk.
+//
+// setEmpty may only run concurrently with find.
+func (s *scavengeIndex) setEmpty(ci chunkIdx) {
+ val := s.chunks[ci].load()
+ val.setEmpty()
+ s.chunks[ci].store(val)
+}
+
+// setNoHugePage updates the backed-by-hugepages status of a particular chunk.
+// It is a no-op if the chunk is already marked as not backed by huge pages.
+//
+// setNoHugePage may only run concurrently with find.
+func (s *scavengeIndex) setNoHugePage(ci chunkIdx) {
+ val := s.chunks[ci].load()
+ if !val.isHugePage() {
+ return
+ }
+ val.setNoHugePage()
+ s.chunks[ci].store(val)
+}
+
+// atomicScavChunkData is an atomic wrapper around a scavChunkData
+// that stores it in its packed form.
+type atomicScavChunkData struct {
+ value atomic.Uint64
+}
+
+// load loads and unpacks a scavChunkData.
+func (sc *atomicScavChunkData) load() scavChunkData {
+ return unpackScavChunkData(sc.value.Load())
+}
+
+// store packs and writes a new scavChunkData. store must be serialized
+// with other calls to store.
+func (sc *atomicScavChunkData) store(ssc scavChunkData) {
+ sc.value.Store(ssc.pack())
+}
+
+// scavChunkData tracks information about a palloc chunk for
+// scavenging. It packs well into 64 bits.
+//
+// The zero value always represents a valid newly-grown chunk.
+type scavChunkData struct {
+ // inUse indicates how many pages in this chunk are currently
+ // allocated.
+ //
+ // Only the first 10 bits are used.
+ inUse uint16
+
+ // lastInUse indicates how many pages in this chunk were allocated
+ // when we transitioned from gen-1 to gen.
+ //
+ // Only the first 10 bits are used.
+ lastInUse uint16
+
+ // gen is the generation counter from a scavengeIndex from the
+ // last time this scavChunkData was updated.
+ gen uint32
+
+	// scavChunkFlags represents additional flags.
+ //
+ // Note: only 6 bits are available.
+ scavChunkFlags
+}
+
+// unpackScavChunkData unpacks a scavChunkData from a uint64.
+func unpackScavChunkData(sc uint64) scavChunkData {
+ return scavChunkData{
+ inUse: uint16(sc),
+ lastInUse: uint16(sc>>16) & scavChunkInUseMask,
+ gen: uint32(sc >> 32),
+ scavChunkFlags: scavChunkFlags(uint8(sc>>(16+logScavChunkInUseMax)) & scavChunkFlagsMask),
+ }
+}
+
+// pack returns sc packed into a uint64.
+func (sc scavChunkData) pack() uint64 {
+ return uint64(sc.inUse) |
+ (uint64(sc.lastInUse) << 16) |
+ (uint64(sc.scavChunkFlags) << (16 + logScavChunkInUseMax)) |
+ (uint64(sc.gen) << 32)
+}
+
+const (
+ // scavChunkHasFree indicates whether the chunk has anything left to
+ // scavenge. This is the opposite of "empty," used elsewhere in this
+ // file. The reason we say "HasFree" here is so the zero value is
+ // correct for a newly-grown chunk. (New memory is scavenged.)
+ scavChunkHasFree scavChunkFlags = 1 << iota
+ // scavChunkNoHugePage indicates whether this chunk has had any huge
+ // pages broken by the scavenger.
+	//
+ // The negative here is unfortunate, but necessary to make it so that
+ // the zero value of scavChunkData accurately represents the state of
+ // a newly-grown chunk. (New memory is marked as backed by huge pages.)
+ scavChunkNoHugePage
+
+ // scavChunkMaxFlags is the maximum number of flags we can have, given how
+ // a scavChunkData is packed into 8 bytes.
+ scavChunkMaxFlags = 6
+ scavChunkFlagsMask = (1 << scavChunkMaxFlags) - 1
+
+ // logScavChunkInUseMax is the number of bits needed to represent the number
+ // of pages allocated in a single chunk. This is 1 more than log2 of the
+ // number of pages in the chunk because we need to represent a fully-allocated
+ // chunk.
+ logScavChunkInUseMax = logPallocChunkPages + 1
+ scavChunkInUseMask = (1 << logScavChunkInUseMax) - 1
+)
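+
+// The pack/unpack pair above implies the following 64-bit layout (assuming
+// 512-page chunks, so logScavChunkInUseMax is 10): bits 0-15 hold inUse,
+// bits 16-25 hold lastInUse, bits 26-31 hold the flags, and bits 32-63 hold
+// gen. packSketch and unpackSketch below are an illustrative, standalone
+// restatement of that round trip with plain integers; they are not runtime
+// code, and the names and the 10-bit width are assumptions made for the sketch.
+func packSketch(inUse, lastInUse uint16, flags uint8, gen uint32) uint64 {
+	const logInUseMax = 10 // logPallocChunkPages + 1 for a 512-page chunk
+	return uint64(inUse) |
+		uint64(lastInUse)<<16 |
+		uint64(flags)<<(16+logInUseMax) |
+		uint64(gen)<<32
+}
+
+func unpackSketch(v uint64) (inUse, lastInUse uint16, flags uint8, gen uint32) {
+	const logInUseMax = 10
+	inUse = uint16(v)                                // bits 0-15 (only 10 used)
+	lastInUse = uint16(v>>16) & (1<<logInUseMax - 1) // bits 16-25
+	flags = uint8(v>>(16+logInUseMax)) & (1<<6 - 1)  // bits 26-31
+	gen = uint32(v >> 32)                            // bits 32-63
+	return
+}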
+
+// scavChunkFlags is a set of bit-flags for the scavenger for each palloc chunk.
+type scavChunkFlags uint8
+
+// isEmpty returns true if the hasFree flag is unset.
+func (sc *scavChunkFlags) isEmpty() bool {
+ return (*sc)&scavChunkHasFree == 0
+}
+
+// setEmpty clears the hasFree flag.
+func (sc *scavChunkFlags) setEmpty() {
+ *sc &^= scavChunkHasFree
+}
+
+// setNonEmpty sets the hasFree flag.
+func (sc *scavChunkFlags) setNonEmpty() {
+ *sc |= scavChunkHasFree
+}
+
+// isHugePage returns false if the noHugePage flag is set.
+func (sc *scavChunkFlags) isHugePage() bool {
+ return (*sc)&scavChunkNoHugePage == 0
+}
+
+// setHugePage clears the noHugePage flag.
+func (sc *scavChunkFlags) setHugePage() {
+ *sc &^= scavChunkNoHugePage
+}
+
+// setNoHugePage sets the noHugePage flag.
+func (sc *scavChunkFlags) setNoHugePage() {
+ *sc |= scavChunkNoHugePage
+}
+
+// shouldScavenge returns true if the corresponding chunk should be interrogated
+// by the scavenger.
+func (sc scavChunkData) shouldScavenge(currGen uint32, force bool) bool {
+ if sc.isEmpty() {
+ // Nothing to scavenge.
+ return false
+ }
+ if force {
+ // We're forcing the memory to be scavenged.
+ return true
+ }
+ if sc.gen == currGen {
+ // In the current generation, if either the current or last generation
+ // is dense, then skip scavenging. Inverting that, we should scavenge
+ // if both the current and last generation were not dense.
+ return sc.inUse < scavChunkHiOccPages && sc.lastInUse < scavChunkHiOccPages
+ }
+ // If we're one or more generations ahead, we know inUse represents the current
+ // state of the chunk, since otherwise it would've been updated already.
+ return sc.inUse < scavChunkHiOccPages
+}
+
+// alloc updates sc given that npages were allocated in the corresponding chunk.
+func (sc *scavChunkData) alloc(npages uint, newGen uint32) {
+ if uint(sc.inUse)+npages > pallocChunkPages {
+ print("runtime: inUse=", sc.inUse, " npages=", npages, "\n")
+ throw("too many pages allocated in chunk?")
+ }
+ if sc.gen != newGen {
+ sc.lastInUse = sc.inUse
+ sc.gen = newGen
+ }
+ sc.inUse += uint16(npages)
+ if sc.inUse == pallocChunkPages {
+ // There's nothing for the scavenger to take from here.
+ sc.setEmpty()
+ }
+}
+
+// free updates sc given that npages were freed in the corresponding chunk.
+func (sc *scavChunkData) free(npages uint, newGen uint32) {
+ if uint(sc.inUse) < npages {
+ print("runtime: inUse=", sc.inUse, " npages=", npages, "\n")
+ throw("allocated pages below zero?")
+ }
+ if sc.gen != newGen {
+ sc.lastInUse = sc.inUse
+ sc.gen = newGen
+ }
+ sc.inUse -= uint16(npages)
+ // The scavenger can no longer be done with this chunk now that
+ // new memory has been freed into it.
+ sc.setNonEmpty()
+}
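+
+// The gen check in alloc and free above is what keeps lastInUse meaningful:
+// the first update of a chunk in a new generation snapshots the occupancy the
+// chunk ended the previous generation with. The standalone sketch below is
+// illustrative only (not runtime code; the type and the numbers in the example
+// are made up) and walks through that rotation.
+type chunkGenSketch struct {
+	inUse, lastInUse uint16
+	gen              uint32
+}
+
+// touch rotates the snapshot if this is the first update in generation newGen.
+func (c *chunkGenSketch) touch(newGen uint32) {
+	if c.gen != newGen {
+		c.lastInUse = c.inUse
+		c.gen = newGen
+	}
+}
+
+// For example, allocating 100 pages in generation 1 and freeing 40 in
+// generation 2:
+//
+//	c := chunkGenSketch{}
+//	c.touch(1); c.inUse += 100 // inUse=100, lastInUse=0
+//	c.touch(2)                 // lastInUse=100 (snapshot of generation 1)
+//	c.inUse -= 40              // inUse=60, lastInUse=100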
+
+type piController struct {
+ kp float64 // Proportional constant.
+ ti float64 // Integral time constant.
+ tt float64 // Reset time.
+
+ min, max float64 // Output boundaries.
+
+ // PI controller state.
+
+ errIntegral float64 // Integral of the error from t=0 to now.
+
+ // Error flags.
+ errOverflow bool // Set if errIntegral ever overflowed.
+ inputOverflow bool // Set if an operation with the input overflowed.
+}
+
+// next provides a new sample to the controller.
+//
+// input is the sample, setpoint is the desired point, and period is how much
+// time (in whatever unit makes the most sense) has passed since the last sample.
+//
+// Returns a new value for the variable it's controlling, and whether the operation
+// completed successfully. One reason this might fail is if the error has been growing
+// in an unbounded manner, to the point of overflow.
+//
+// In the specific case where an error overflow occurs, the errOverflow field will be
+// set and the rest of the controller's internal state will be fully reset.
+func (c *piController) next(input, setpoint, period float64) (float64, bool) {
+ // Compute the raw output value.
+ prop := c.kp * (setpoint - input)
+ rawOutput := prop + c.errIntegral
+
+ // Clamp rawOutput into output.
+ output := rawOutput
+ if isInf(output) || isNaN(output) {
+		// The input had a large enough magnitude that either it had already
+		// overflowed, or some operation with it overflowed.
+ // Set a flag and reset. That's the safest thing to do.
+ c.reset()
+ c.inputOverflow = true
+ return c.min, false
+ }
+ if output < c.min {
+ output = c.min
+ } else if output > c.max {
+ output = c.max
+ }
+
+ // Update the controller's state.
+ if c.ti != 0 && c.tt != 0 {
+ c.errIntegral += (c.kp*period/c.ti)*(setpoint-input) + (period/c.tt)*(output-rawOutput)
+ if isInf(c.errIntegral) || isNaN(c.errIntegral) {
+ // So much error has accumulated that we managed to overflow.
+ // The assumptions around the controller have likely broken down.
+ // Set a flag and reset. That's the safest thing to do.
+ c.reset()
+ c.errOverflow = true
+ return c.min, false
+ }
+ }
+ return output, true
+}
+
+// reset resets the controller state, except for controller error flags.
+func (c *piController) reset() {
+ c.errIntegral = 0
+}
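+
+// piSketch is an illustrative, standalone sketch (not runtime code) of the
+// proportional+integral update implemented by next above, driving a toy
+// first-order system toward a setpoint. The constants and the toy plant are
+// made-up illustration values, not the runtime's.
+func piSketch() float64 {
+	const (
+		kp             = 0.5  // proportional constant
+		ti             = 2.0  // integral time constant
+		tt             = 10.0 // reset (anti-windup) time
+		minOut, maxOut = 0.0, 10.0
+	)
+	errIntegral := 0.0
+	state := 0.0 // the quantity being controlled
+	for i := 0; i < 100; i++ {
+		setpoint, input, period := 5.0, state, 1.0
+		raw := kp*(setpoint-input) + errIntegral
+		out := raw
+		if out < minOut {
+			out = minOut
+		} else if out > maxOut {
+			out = maxOut
+		}
+		// Integral update with back-calculation anti-windup, as in next: the
+		// (out - raw) term bleeds off the integral whenever the output had to
+		// be clamped.
+		errIntegral += (kp*period/ti)*(setpoint-input) + (period/tt)*(out-raw)
+		state += 0.5 * (out - state) // toy plant: relaxes halfway toward the output
+	}
+	return state // settles very close to the setpoint (5.0)
+}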
diff --git a/src/runtime/mgcscavenge_test.go b/src/runtime/mgcscavenge_test.go
new file mode 100644
index 0000000..d7624d6
--- /dev/null
+++ b/src/runtime/mgcscavenge_test.go
@@ -0,0 +1,884 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/goos"
+ "math"
+ "math/rand"
+ . "runtime"
+ "runtime/internal/atomic"
+ "testing"
+ "time"
+)
+
+// makePallocData produces an initialized PallocData by setting
+// the ranges described in alloc and scavenged.
+func makePallocData(alloc, scavenged []BitRange) *PallocData {
+ b := new(PallocData)
+ for _, v := range alloc {
+ if v.N == 0 {
+ // Skip N==0. It's harmless and allocRange doesn't
+ // handle this case.
+ continue
+ }
+ b.AllocRange(v.I, v.N)
+ }
+ for _, v := range scavenged {
+ if v.N == 0 {
+ // See the previous loop.
+ continue
+ }
+ b.ScavengedSetRange(v.I, v.N)
+ }
+ return b
+}
+
+func TestFillAligned(t *testing.T) {
+ fillAlignedSlow := func(x uint64, m uint) uint64 {
+ if m == 1 {
+ return x
+ }
+ out := uint64(0)
+ for i := uint(0); i < 64; i += m {
+ for j := uint(0); j < m; j++ {
+ if x&(uint64(1)<<(i+j)) != 0 {
+ out |= ((uint64(1) << m) - 1) << i
+ break
+ }
+ }
+ }
+ return out
+ }
+ check := func(x uint64, m uint) {
+ want := fillAlignedSlow(x, m)
+ if got := FillAligned(x, m); got != want {
+ t.Logf("got: %064b", got)
+ t.Logf("want: %064b", want)
+ t.Errorf("bad fillAligned(%016x, %d)", x, m)
+ }
+ }
+ for m := uint(1); m <= 64; m *= 2 {
+ tests := []uint64{
+ 0x0000000000000000,
+ 0x00000000ffffffff,
+ 0xffffffff00000000,
+ 0x8000000000000001,
+ 0xf00000000000000f,
+ 0xf00000010050000f,
+ 0xffffffffffffffff,
+ 0x0000000000000001,
+ 0x0000000000000002,
+ 0x0000000000000008,
+ uint64(1) << (m - 1),
+ uint64(1) << m,
+ // Try a few fixed arbitrary examples.
+ 0xb02b9effcf137016,
+ 0x3975a076a9fbff18,
+ 0x0f8c88ec3b81506e,
+ 0x60f14d80ef2fa0e6,
+ }
+ for _, test := range tests {
+ check(test, m)
+ }
+ for i := 0; i < 1000; i++ {
+			// Try pseudo-random numbers.
+ check(rand.Uint64(), m)
+
+ if m > 1 {
+ // For m != 1, let's construct a slightly more interesting
+ // random test. Generate a bitmap which is either 0 or
+ // randomly set bits for each m-aligned group of m bits.
+ val := uint64(0)
+ for n := uint(0); n < 64; n += m {
+ // For each group of m bits, flip a coin:
+ // * Leave them as zero.
+ // * Set them randomly.
+ if rand.Uint64()%2 == 0 {
+ val |= (rand.Uint64() & ((1 << m) - 1)) << n
+ }
+ }
+ check(val, m)
+ }
+ }
+ }
+}
+
+func TestPallocDataFindScavengeCandidate(t *testing.T) {
+ type test struct {
+ alloc, scavenged []BitRange
+ min, max uintptr
+ want BitRange
+ }
+ tests := map[string]test{
+ "MixedMin1": {
+ alloc: []BitRange{{0, 40}, {42, PallocChunkPages - 42}},
+ scavenged: []BitRange{{0, 41}, {42, PallocChunkPages - 42}},
+ min: 1,
+ max: PallocChunkPages,
+ want: BitRange{41, 1},
+ },
+ "MultiMin1": {
+ alloc: []BitRange{{0, 63}, {65, 20}, {87, PallocChunkPages - 87}},
+ scavenged: []BitRange{{86, 1}},
+ min: 1,
+ max: PallocChunkPages,
+ want: BitRange{85, 1},
+ },
+ }
+ // Try out different page minimums.
+ for m := uintptr(1); m <= 64; m *= 2 {
+ suffix := fmt.Sprintf("Min%d", m)
+ tests["AllFree"+suffix] = test{
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, PallocChunkPages},
+ }
+ tests["AllScavenged"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ tests["NoneFree"+suffix] = test{
+ alloc: []BitRange{{0, PallocChunkPages}},
+ scavenged: []BitRange{{PallocChunkPages / 2, PallocChunkPages / 2}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ tests["StartFree"+suffix] = test{
+ alloc: []BitRange{{uint(m), PallocChunkPages - uint(m)}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, uint(m)},
+ }
+ tests["EndFree"+suffix] = test{
+ alloc: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ tests["Straddle64"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64 + uint(m), PallocChunkPages - (64 + uint(m))}},
+ min: m,
+ max: 2 * m,
+ want: BitRange{64 - uint(m), 2 * uint(m)},
+ }
+ tests["BottomEdge64WithFull"+suffix] = test{
+ alloc: []BitRange{{64, 64}, {128 + 3*uint(m), PallocChunkPages - (128 + 3*uint(m))}},
+ scavenged: []BitRange{{1, 10}},
+ min: m,
+ max: 3 * m,
+ want: BitRange{128, 3 * uint(m)},
+ }
+ tests["BottomEdge64WithPocket"+suffix] = test{
+ alloc: []BitRange{{64, 62}, {127, 1}, {128 + 3*uint(m), PallocChunkPages - (128 + 3*uint(m))}},
+ scavenged: []BitRange{{1, 10}},
+ min: m,
+ max: 3 * m,
+ want: BitRange{128, 3 * uint(m)},
+ }
+ tests["Max0"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: 0,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ if m <= 8 {
+ tests["OneFree"] = test{
+ alloc: []BitRange{{0, 40}, {40 + uint(m), PallocChunkPages - (40 + uint(m))}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{40, uint(m)},
+ }
+ tests["OneScavenged"] = test{
+ alloc: []BitRange{{0, 40}, {40 + uint(m), PallocChunkPages - (40 + uint(m))}},
+ scavenged: []BitRange{{40, 1}},
+ min: m,
+ max: PallocChunkPages,
+ want: BitRange{0, 0},
+ }
+ }
+ if m > 1 {
+ tests["MaxUnaligned"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m*2-1)}},
+ min: m,
+ max: m - 2,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ tests["SkipSmall"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64, 5}, {70, 11}, {82, PallocChunkPages - 82}},
+ min: m,
+ max: m,
+ want: BitRange{64 - uint(m), uint(m)},
+ }
+ tests["SkipMisaligned"+suffix] = test{
+ alloc: []BitRange{{0, 64 - uint(m)}, {64, 63}, {127 + uint(m), PallocChunkPages - (127 + uint(m))}},
+ min: m,
+ max: m,
+ want: BitRange{64 - uint(m), uint(m)},
+ }
+ tests["MaxLessThan"+suffix] = test{
+ scavenged: []BitRange{{0, PallocChunkPages - uint(m)}},
+ min: m,
+ max: 1,
+ want: BitRange{PallocChunkPages - uint(m), uint(m)},
+ }
+ }
+ }
+ if PhysHugePageSize > uintptr(PageSize) {
+ // Check hugepage preserving behavior.
+ bits := uint(PhysHugePageSize / uintptr(PageSize))
+ if bits < PallocChunkPages {
+ tests["PreserveHugePageBottom"] = test{
+ alloc: []BitRange{{bits + 2, PallocChunkPages - (bits + 2)}},
+ min: 1,
+ max: 3, // Make it so that max would have us try to break the huge page.
+ want: BitRange{0, bits + 2},
+ }
+ if 3*bits < PallocChunkPages {
+ // We need at least 3 huge pages in a chunk for this test to make sense.
+ tests["PreserveHugePageMiddle"] = test{
+ alloc: []BitRange{{0, bits - 10}, {2*bits + 10, PallocChunkPages - (2*bits + 10)}},
+ min: 1,
+ max: 12, // Make it so that max would have us try to break the huge page.
+ want: BitRange{bits, bits + 10},
+ }
+ }
+ tests["PreserveHugePageTop"] = test{
+ alloc: []BitRange{{0, PallocChunkPages - bits}},
+ min: 1,
+ max: 1, // Even one page would break a huge page in this case.
+ want: BitRange{PallocChunkPages - bits, bits},
+ }
+ } else if bits == PallocChunkPages {
+ tests["PreserveHugePageAll"] = test{
+ min: 1,
+ max: 1, // Even one page would break a huge page in this case.
+ want: BitRange{0, PallocChunkPages},
+ }
+ } else {
+ // The huge page size is greater than pallocChunkPages, so it should
+			// be effectively disabled. There's no way we can possibly scavenge
+ // a huge page out of this bitmap chunk.
+ tests["PreserveHugePageNone"] = test{
+ min: 1,
+ max: 1,
+ want: BitRange{PallocChunkPages - 1, 1},
+ }
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocData(v.alloc, v.scavenged)
+ start, size := b.FindScavengeCandidate(PallocChunkPages-1, v.min, v.max)
+ got := BitRange{start, size}
+ if !(got.N == 0 && v.want.N == 0) && got != v.want {
+ t.Fatalf("candidate mismatch: got %v, want %v", got, v.want)
+ }
+ })
+ }
+}
+
+// Tests end-to-end scavenging on a pageAlloc.
+func TestPageAllocScavenge(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ request, expect uintptr
+ }
+ minPages := PhysPageSize / PageSize
+ if minPages < 1 {
+ minPages = 1
+ }
+ type setup struct {
+ beforeAlloc map[ChunkIdx][]BitRange
+ beforeScav map[ChunkIdx][]BitRange
+ expect []test
+ afterScav map[ChunkIdx][]BitRange
+ }
+ tests := map[string]setup{
+ "AllFreeUnscavExhaust": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ },
+ expect: []test{
+ {^uintptr(0), 3 * PallocChunkPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NoneFreeUnscavExhaust": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ expect: []test{
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ },
+ "ScavHighestPageFirst": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {1, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(minPages)}},
+ },
+ },
+ "ScavMultiple": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {minPages * PageSize, minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "ScavMultiple2": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {2 * minPages * PageSize, 2 * minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ {minPages * PageSize, minPages * PageSize},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "ScavDiscontiguous": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0xe: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{uint(minPages), PallocChunkPages - uint(2*minPages)}},
+ BaseChunkIdx + 0xe: {{uint(2 * minPages), PallocChunkPages - uint(2*minPages)}},
+ },
+ expect: []test{
+ {2 * minPages * PageSize, 2 * minPages * PageSize},
+ {^uintptr(0), 2 * minPages * PageSize},
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xe: {{0, PallocChunkPages}},
+ },
+ },
+ }
+ // Disable these tests on iOS since we have a small address space.
+ // See #46860.
+ if PageAlloc64Bit != 0 && goos.IsIos == 0 {
+ tests["ScavAllVeryDiscontiguous"] = setup{
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x1000: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x1000: {},
+ },
+ expect: []test{
+ {^uintptr(0), 2 * PallocChunkPages * PageSize},
+ {^uintptr(0), 0},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x1000: {{0, PallocChunkPages}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.beforeAlloc, v.beforeScav)
+ defer FreePageAlloc(b)
+
+ for iter, h := range v.expect {
+ if got := b.Scavenge(h.request); got != h.expect {
+ t.Fatalf("bad scavenge #%d: want %d, got %d", iter+1, h.expect, got)
+ }
+ }
+ want := NewPageAlloc(v.beforeAlloc, v.afterScav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestScavenger(t *testing.T) {
+ // workedTime is a standard conversion of bytes of scavenge
+ // work to time elapsed.
+ workedTime := func(bytes uintptr) int64 {
+ return int64((bytes+4095)/4096) * int64(10*time.Microsecond)
+ }
+
+ // Set up a bunch of state that we're going to track and verify
+ // throughout the test.
+ totalWork := uint64(64<<20 - 3*PhysPageSize)
+ var totalSlept, totalWorked atomic.Int64
+ var availableWork atomic.Uint64
+ var stopAt atomic.Uint64 // How much available work to stop at.
+
+ // Set up the scavenger.
+ var s Scavenger
+ s.Sleep = func(ns int64) int64 {
+ totalSlept.Add(ns)
+ return ns
+ }
+ s.Scavenge = func(bytes uintptr) (uintptr, int64) {
+ avail := availableWork.Load()
+ if uint64(bytes) > avail {
+ bytes = uintptr(avail)
+ }
+ t := workedTime(bytes)
+ if bytes != 0 {
+ availableWork.Add(-int64(bytes))
+ totalWorked.Add(t)
+ }
+ return bytes, t
+ }
+ s.ShouldStop = func() bool {
+ if availableWork.Load() <= stopAt.Load() {
+ return true
+ }
+ return false
+ }
+ s.GoMaxProcs = func() int32 {
+ return 1
+ }
+
+ // Define a helper for verifying that various properties hold.
+ verifyScavengerState := func(t *testing.T, expWork uint64) {
+ t.Helper()
+
+ // Check to make sure it did the amount of work we expected.
+ if workDone := uint64(s.Released()); workDone != expWork {
+ t.Errorf("want %d bytes of work done, got %d", expWork, workDone)
+ }
+ // Check to make sure the scavenger is meeting its CPU target.
+ idealFraction := float64(ScavengePercent) / 100.0
+ cpuFraction := float64(totalWorked.Load()) / float64(totalWorked.Load()+totalSlept.Load())
+ if cpuFraction < idealFraction-0.005 || cpuFraction > idealFraction+0.005 {
+ t.Errorf("want %f CPU fraction, got %f", idealFraction, cpuFraction)
+ }
+ }
+
+ // Start the scavenger.
+ s.Start()
+
+ // Set up some work and let the scavenger run to completion.
+ availableWork.Store(totalWork)
+ s.Wake()
+ if !s.BlockUntilParked(2e9 /* 2 seconds */) {
+ t.Fatal("timed out waiting for scavenger to run to completion")
+ }
+ // Run a check.
+ verifyScavengerState(t, totalWork)
+
+ // Now let's do it again and see what happens when we have no work to do.
+ // It should've gone right back to sleep.
+ s.Wake()
+ if !s.BlockUntilParked(2e9 /* 2 seconds */) {
+ t.Fatal("timed out waiting for scavenger to run to completion")
+ }
+ // Run another check.
+ verifyScavengerState(t, totalWork)
+
+ // One more time, this time doing the same amount of work as the first time.
+ // Let's see if we can get the scavenger to continue.
+ availableWork.Store(totalWork)
+ s.Wake()
+ if !s.BlockUntilParked(2e9 /* 2 seconds */) {
+ t.Fatal("timed out waiting for scavenger to run to completion")
+ }
+ // Run another check.
+ verifyScavengerState(t, 2*totalWork)
+
+ // This time, let's stop after a certain amount of work.
+ //
+ // Pick a stopping point such that when subtracted from totalWork
+ // we get a multiple of a relatively large power of 2. verifyScavengerState
+ // always makes an exact check, but the scavenger might go a little over,
+ // which is OK. If this breaks often or gets annoying to maintain, modify
+ // verifyScavengerState.
+ availableWork.Store(totalWork)
+ stoppingPoint := uint64(1<<20 - 3*PhysPageSize)
+ stopAt.Store(stoppingPoint)
+ s.Wake()
+ if !s.BlockUntilParked(2e9 /* 2 seconds */) {
+ t.Fatal("timed out waiting for scavenger to run to completion")
+ }
+ // Run another check.
+ verifyScavengerState(t, 2*totalWork+(totalWork-stoppingPoint))
+
+ // Clean up.
+ s.Stop()
+}
+
+func TestScavengeIndex(t *testing.T) {
+ // This test suite tests the scavengeIndex data structure.
+
+ // markFunc is a function that makes the address range [base, limit)
+ // available for scavenging in a test index.
+ type markFunc func(base, limit uintptr)
+
+ // findFunc is a function that searches for the next available page
+ // to scavenge in the index. It asserts that the page is found in
+ // chunk "ci" at page "offset."
+ type findFunc func(ci ChunkIdx, offset uint)
+
+ // The structure of the tests below is as follows:
+ //
+ // setup creates a fake scavengeIndex that can be mutated and queried by
+ // the functions it returns. Those functions capture the testing.T that
+ // setup is called with, so they're bound to the subtest they're created in.
+ //
+ // Tests are then organized into test cases which mark some pages as
+ // scavenge-able then try to find them. Tests expect that the initial
+ // state of the scavengeIndex has all of the chunks as dense in the last
+ // generation and empty to the scavenger.
+ //
+ // There are a few additional tests that interleave mark and find operations,
+ // so they're defined separately, but use the same infrastructure.
+ setup := func(t *testing.T, force bool) (mark markFunc, find findFunc, nextGen func()) {
+ t.Helper()
+
+ // Pick some reasonable bounds. We don't need a huge range just to test.
+ si := NewScavengeIndex(BaseChunkIdx, BaseChunkIdx+64)
+
+ // Initialize all the chunks as dense and empty.
+ //
+ // Also, reset search addresses so that we can get page offsets.
+ si.AllocRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+64, 0))
+ si.NextGen()
+ si.FreeRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+64, 0))
+ for ci := BaseChunkIdx; ci < BaseChunkIdx+64; ci++ {
+ si.SetEmpty(ci)
+ }
+ si.ResetSearchAddrs()
+
+ // Create and return test functions.
+ mark = func(base, limit uintptr) {
+ t.Helper()
+
+ si.AllocRange(base, limit)
+ si.FreeRange(base, limit)
+ }
+ find = func(want ChunkIdx, wantOffset uint) {
+ t.Helper()
+
+ got, gotOffset := si.Find(force)
+ if want != got {
+ t.Errorf("find: wanted chunk index %d, got %d", want, got)
+ }
+ if wantOffset != gotOffset {
+ t.Errorf("find: wanted page offset %d, got %d", wantOffset, gotOffset)
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+ si.SetEmpty(got)
+ }
+ nextGen = func() {
+ t.Helper()
+
+ si.NextGen()
+ }
+ return
+ }
+
+ // Each of these test cases calls mark and then find once.
+ type testCase struct {
+ name string
+ mark func(markFunc)
+ find func(findFunc)
+ }
+ for _, test := range []testCase{
+ {
+ name: "Uninitialized",
+ mark: func(_ markFunc) {},
+ find: func(_ findFunc) {},
+ },
+ {
+ name: "OnePage",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 3), PageBase(BaseChunkIdx, 4))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, 3)
+ },
+ },
+ {
+ name: "FirstPage",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx, 1))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, 0)
+ },
+ },
+ {
+ name: "SeveralPages",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 9), PageBase(BaseChunkIdx, 14))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, 13)
+ },
+ },
+ {
+ name: "WholeChunk",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, PallocChunkPages-1)
+ },
+ },
+ {
+ name: "LastPage",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, PallocChunkPages-1), PageBase(BaseChunkIdx+1, 0))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, PallocChunkPages-1)
+ },
+ },
+ {
+ name: "TwoChunks",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 128), PageBase(BaseChunkIdx+1, 128))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx+1, 127)
+ find(BaseChunkIdx, PallocChunkPages-1)
+ },
+ },
+ {
+ name: "TwoChunksOffset",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx+7, 128), PageBase(BaseChunkIdx+8, 129))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx+8, 128)
+ find(BaseChunkIdx+7, PallocChunkPages-1)
+ },
+ },
+ {
+ name: "SevenChunksOffset",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx+6, 11), PageBase(BaseChunkIdx+13, 15))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx+13, 14)
+ for i := BaseChunkIdx + 12; i >= BaseChunkIdx+6; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ {
+ name: "ThirtyTwoChunks",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+32, 0))
+ },
+ find: func(find findFunc) {
+ for i := BaseChunkIdx + 31; i >= BaseChunkIdx; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ {
+ name: "ThirtyTwoChunksOffset",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx+3, 0), PageBase(BaseChunkIdx+35, 0))
+ },
+ find: func(find findFunc) {
+ for i := BaseChunkIdx + 34; i >= BaseChunkIdx+3; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ {
+ name: "Mark",
+ mark: func(mark markFunc) {
+ for i := BaseChunkIdx; i < BaseChunkIdx+32; i++ {
+ mark(PageBase(i, 0), PageBase(i+1, 0))
+ }
+ },
+ find: func(find findFunc) {
+ for i := BaseChunkIdx + 31; i >= BaseChunkIdx; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ {
+ name: "MarkIdempotentOneChunk",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0))
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0))
+ },
+ find: func(find findFunc) {
+ find(BaseChunkIdx, PallocChunkPages-1)
+ },
+ },
+ {
+ name: "MarkIdempotentThirtyTwoChunks",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+32, 0))
+ mark(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+32, 0))
+ },
+ find: func(find findFunc) {
+ for i := BaseChunkIdx + 31; i >= BaseChunkIdx; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ {
+ name: "MarkIdempotentThirtyTwoChunksOffset",
+ mark: func(mark markFunc) {
+ mark(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+31, 0))
+ mark(PageBase(BaseChunkIdx+5, 0), PageBase(BaseChunkIdx+36, 0))
+ },
+ find: func(find findFunc) {
+ for i := BaseChunkIdx + 35; i >= BaseChunkIdx+4; i-- {
+ find(i, PallocChunkPages-1)
+ }
+ },
+ },
+ } {
+ test := test
+ t.Run("Bg/"+test.name, func(t *testing.T) {
+ mark, find, nextGen := setup(t, false)
+ test.mark(mark)
+ find(0, 0) // Make sure we find nothing at this point.
+ nextGen() // Move to the next generation.
+ test.find(find) // Now we should be able to find things.
+ find(0, 0) // The test should always fully exhaust the index.
+ })
+ t.Run("Force/"+test.name, func(t *testing.T) {
+ mark, find, _ := setup(t, true)
+ test.mark(mark)
+ test.find(find) // Finding should always work when forced.
+ find(0, 0) // The test should always fully exhaust the index.
+ })
+ }
+ t.Run("Bg/MarkInterleaved", func(t *testing.T) {
+ mark, find, nextGen := setup(t, false)
+ for i := BaseChunkIdx; i < BaseChunkIdx+32; i++ {
+ mark(PageBase(i, 0), PageBase(i+1, 0))
+ nextGen()
+ find(i, PallocChunkPages-1)
+ }
+ find(0, 0)
+ })
+ t.Run("Force/MarkInterleaved", func(t *testing.T) {
+ mark, find, _ := setup(t, true)
+ for i := BaseChunkIdx; i < BaseChunkIdx+32; i++ {
+ mark(PageBase(i, 0), PageBase(i+1, 0))
+ find(i, PallocChunkPages-1)
+ }
+ find(0, 0)
+ })
+}
+
+func TestScavChunkDataPack(t *testing.T) {
+ if !CheckPackScavChunkData(1918237402, 512, 512, 0b11) {
+ t.Error("failed pack/unpack check for scavChunkData 1")
+ }
+ if !CheckPackScavChunkData(^uint32(0), 12, 0, 0b00) {
+ t.Error("failed pack/unpack check for scavChunkData 2")
+ }
+}
+
+func FuzzPIController(f *testing.F) {
+ isNormal := func(x float64) bool {
+ return !math.IsInf(x, 0) && !math.IsNaN(x)
+ }
+ isPositive := func(x float64) bool {
+ return isNormal(x) && x > 0
+ }
+ // Seed with constants from controllers in the runtime.
+ // It's not critical that we keep these in sync, they're just
+ // reasonable seed inputs.
+ f.Add(0.3375, 3.2e6, 1e9, 0.001, 1000.0, 0.01)
+ f.Add(0.9, 4.0, 1000.0, -1000.0, 1000.0, 0.84)
+ f.Fuzz(func(t *testing.T, kp, ti, tt, min, max, setPoint float64) {
+ // Ignore uninteresting invalid parameters. These parameters
+ // are constant, so in practice surprising values will be documented
+		// or will be otherwise immediately visible.
+ //
+ // We just want to make sure that given a non-Inf, non-NaN input,
+ // we always get a non-Inf, non-NaN output.
+ if !isPositive(kp) || !isPositive(ti) || !isPositive(tt) {
+ return
+ }
+ if !isNormal(min) || !isNormal(max) || min > max {
+ return
+ }
+ // Use a random source, but make it deterministic.
+ rs := rand.New(rand.NewSource(800))
+ randFloat64 := func() float64 {
+ return math.Float64frombits(rs.Uint64())
+ }
+ p := NewPIController(kp, ti, tt, min, max)
+ state := float64(0)
+ for i := 0; i < 100; i++ {
+ input := randFloat64()
+ // Ignore the "ok" parameter. We're just trying to break it.
+ // state is intentionally completely uncorrelated with the input.
+ var ok bool
+ state, ok = p.Next(input, setPoint, 1.0)
+ if !isNormal(state) {
+ t.Fatalf("got NaN or Inf result from controller: %f %v", state, ok)
+ }
+ }
+ })
+}
diff --git a/src/runtime/mgcstack.go b/src/runtime/mgcstack.go
new file mode 100644
index 0000000..6b55220
--- /dev/null
+++ b/src/runtime/mgcstack.go
@@ -0,0 +1,350 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: stack objects and stack tracing
+// See the design doc at https://docs.google.com/document/d/1un-Jn47yByHL7I0aVIP_uVCMxjdM5mpelJhiKlIqxkE/edit?usp=sharing
+// Also see issue 22350.
+
+// Stack tracing solves the problem of determining which parts of the
+// stack are live and should be scanned. It runs as part of scanning
+// a single goroutine stack.
+//
+// Normally determining which parts of the stack are live is easy to
+// do statically, as user code has explicit references (reads and
+// writes) to stack variables. The compiler can do a simple dataflow
+// analysis to determine liveness of stack variables at every point in
+// the code. See cmd/compile/internal/gc/plive.go for that analysis.
+//
+// However, when we take the address of a stack variable, determining
+// whether that variable is still live is less clear. We can still
+// look for static accesses, but accesses through a pointer to the
+// variable are difficult in general to track statically. That pointer
+// can be passed among functions on the stack, conditionally retained,
+// etc.
+//
+// Instead, we will track pointers to stack variables dynamically.
+// All pointers to stack-allocated variables will themselves be on the
+// stack somewhere (or in associated locations, like defer records), so
+// we can find them all efficiently.
+//
+// Stack tracing is organized as a mini garbage collection tracing
+// pass. The objects in this garbage collection are all the variables
+// on the stack whose address is taken, and which themselves contain a
+// pointer. We call these variables "stack objects".
+//
+// We begin by determining all the stack objects on the stack and all
+// the statically live pointers that may point into the stack. We then
+// process each pointer to see if it points to a stack object. If it
+// does, we scan that stack object. It may contain pointers into the
+// heap, in which case those pointers are passed to the main garbage
+// collection. It may also contain pointers into the stack, in which
+// case we add them to our set of stack pointers.
+//
+// Once we're done processing all the pointers (including the ones we
+// added during processing), we've found all the stack objects that
+// are live. Any dead stack objects are not scanned and their contents
+// will not keep heap objects live. Unlike the main garbage
+// collection, we can't sweep the dead stack objects; they live on in
+// a moribund state until the stack frame that contains them is
+// popped.
+//
+// A stack can look like this:
+//
+// +----------+
+// | foo() |
+// | +------+ |
+// | | A | | <---\
+// | +------+ | |
+// | | |
+// | +------+ | |
+// | | B | | |
+// | +------+ | |
+// | | |
+// +----------+ |
+// | bar() | |
+// | +------+ | |
+// | | C | | <-\ |
+// | +----|-+ | | |
+// | | | | |
+// | +----v-+ | | |
+// | | D ---------/
+// | +------+ | |
+// | | |
+// +----------+ |
+// | baz() | |
+// | +------+ | |
+// | | E -------/
+// | +------+ |
+// | ^ |
+// | F: --/ |
+// | |
+// +----------+
+//
+// foo() calls bar() calls baz(). Each has a frame on the stack.
+// foo() has stack objects A and B.
+// bar() has stack objects C and D, with C pointing to D and D pointing to A.
+// baz() has a stack object E pointing to C, and a local variable F pointing to E.
+//
+// Starting from the pointer in local variable F, we will eventually
+// scan all of E, C, D, and A (in that order). B is never scanned
+// because there is no live pointer to it. If B is also statically
+// dead (meaning that foo() never accesses B again after it calls
+// bar()), then B's pointers into the heap are not considered live.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const stackTraceDebug = false
+
+// Buffer for pointers found during stack tracing.
+// Must be smaller than or equal to workbuf.
+type stackWorkBuf struct {
+ _ sys.NotInHeap
+ stackWorkBufHdr
+ obj [(_WorkbufSize - unsafe.Sizeof(stackWorkBufHdr{})) / goarch.PtrSize]uintptr
+}
+
+// Header declaration must come after the buf declaration above, because of issue #14620.
+type stackWorkBufHdr struct {
+ _ sys.NotInHeap
+ workbufhdr
+ next *stackWorkBuf // linked list of workbufs
+ // Note: we could theoretically repurpose lfnode.next as this next pointer.
+ // It would save 1 word, but that probably isn't worth busting open
+ // the lfnode API.
+}
+
+// Buffer for stack objects found on a goroutine stack.
+// Must be smaller than or equal to workbuf.
+type stackObjectBuf struct {
+ _ sys.NotInHeap
+ stackObjectBufHdr
+ obj [(_WorkbufSize - unsafe.Sizeof(stackObjectBufHdr{})) / unsafe.Sizeof(stackObject{})]stackObject
+}
+
+type stackObjectBufHdr struct {
+ _ sys.NotInHeap
+ workbufhdr
+ next *stackObjectBuf
+}
+
+func init() {
+ if unsafe.Sizeof(stackWorkBuf{}) > unsafe.Sizeof(workbuf{}) {
+ panic("stackWorkBuf too big")
+ }
+ if unsafe.Sizeof(stackObjectBuf{}) > unsafe.Sizeof(workbuf{}) {
+ panic("stackObjectBuf too big")
+ }
+}
+
+// A stackObject represents a variable on the stack that has had
+// its address taken.
+type stackObject struct {
+ _ sys.NotInHeap
+ off uint32 // offset above stack.lo
+ size uint32 // size of object
+ r *stackObjectRecord // info of the object (for ptr/nonptr bits). nil if object has been scanned.
+ left *stackObject // objects with lower addresses
+ right *stackObject // objects with higher addresses
+}
+
+// obj.r = r, but with no write barrier.
+//
+//go:nowritebarrier
+func (obj *stackObject) setRecord(r *stackObjectRecord) {
+ // Types of stack objects are always in read-only memory, not the heap.
+ // So not using a write barrier is ok.
+ *(*uintptr)(unsafe.Pointer(&obj.r)) = uintptr(unsafe.Pointer(r))
+}
+
+// A stackScanState keeps track of the state used during the GC walk
+// of a goroutine.
+type stackScanState struct {
+ cache pcvalueCache
+
+ // stack limits
+ stack stack
+
+ // conservative indicates that the next frame must be scanned conservatively.
+ // This applies only to the innermost frame at an async safe-point.
+ conservative bool
+
+ // buf contains the set of possible pointers to stack objects.
+ // Organized as a LIFO linked list of buffers.
+ // All buffers except possibly the head buffer are full.
+ buf *stackWorkBuf
+ freeBuf *stackWorkBuf // keep around one free buffer for allocation hysteresis
+
+ // cbuf contains conservative pointers to stack objects. If
+ // all pointers to a stack object are obtained via
+ // conservative scanning, then the stack object may be dead
+ // and may contain dead pointers, so it must be scanned
+ // defensively.
+ cbuf *stackWorkBuf
+
+ // list of stack objects
+ // Objects are in increasing address order.
+ head *stackObjectBuf
+ tail *stackObjectBuf
+ nobjs int
+
+ // root of binary tree for fast object lookup by address
+ // Initialized by buildIndex.
+ root *stackObject
+}
+
+// Add p as a potential pointer to a stack object.
+// p must be a stack address.
+func (s *stackScanState) putPtr(p uintptr, conservative bool) {
+ if p < s.stack.lo || p >= s.stack.hi {
+ throw("address not a stack address")
+ }
+ head := &s.buf
+ if conservative {
+ head = &s.cbuf
+ }
+ buf := *head
+ if buf == nil {
+ // Initial setup.
+ buf = (*stackWorkBuf)(unsafe.Pointer(getempty()))
+ buf.nobj = 0
+ buf.next = nil
+ *head = buf
+ } else if buf.nobj == len(buf.obj) {
+ if s.freeBuf != nil {
+ buf = s.freeBuf
+ s.freeBuf = nil
+ } else {
+ buf = (*stackWorkBuf)(unsafe.Pointer(getempty()))
+ }
+ buf.nobj = 0
+ buf.next = *head
+ *head = buf
+ }
+ buf.obj[buf.nobj] = p
+ buf.nobj++
+}
+
+// Remove and return a potential pointer to a stack object.
+// Returns 0 if there are no more pointers available.
+//
+// This prefers non-conservative pointers so we scan stack objects
+// precisely if there are any non-conservative pointers to them.
+func (s *stackScanState) getPtr() (p uintptr, conservative bool) {
+ for _, head := range []**stackWorkBuf{&s.buf, &s.cbuf} {
+ buf := *head
+ if buf == nil {
+ // Never had any data.
+ continue
+ }
+ if buf.nobj == 0 {
+ if s.freeBuf != nil {
+ // Free old freeBuf.
+ putempty((*workbuf)(unsafe.Pointer(s.freeBuf)))
+ }
+ // Move buf to the freeBuf.
+ s.freeBuf = buf
+ buf = buf.next
+ *head = buf
+ if buf == nil {
+ // No more data in this list.
+ continue
+ }
+ }
+ buf.nobj--
+ return buf.obj[buf.nobj], head == &s.cbuf
+ }
+ // No more data in either list.
+ if s.freeBuf != nil {
+ putempty((*workbuf)(unsafe.Pointer(s.freeBuf)))
+ s.freeBuf = nil
+ }
+ return 0, false
+}
+
+// addObject adds a stack object at addr of type typ to the set of stack objects.
+func (s *stackScanState) addObject(addr uintptr, r *stackObjectRecord) {
+ x := s.tail
+ if x == nil {
+ // initial setup
+ x = (*stackObjectBuf)(unsafe.Pointer(getempty()))
+ x.next = nil
+ s.head = x
+ s.tail = x
+ }
+ if x.nobj > 0 && uint32(addr-s.stack.lo) < x.obj[x.nobj-1].off+x.obj[x.nobj-1].size {
+ throw("objects added out of order or overlapping")
+ }
+ if x.nobj == len(x.obj) {
+ // full buffer - allocate a new buffer, add to end of linked list
+ y := (*stackObjectBuf)(unsafe.Pointer(getempty()))
+ y.next = nil
+ x.next = y
+ s.tail = y
+ x = y
+ }
+ obj := &x.obj[x.nobj]
+ x.nobj++
+ obj.off = uint32(addr - s.stack.lo)
+ obj.size = uint32(r.size)
+ obj.setRecord(r)
+ // obj.left and obj.right will be initialized by buildIndex before use.
+ s.nobjs++
+}
+
+// buildIndex initializes s.root to a binary search tree.
+// It should be called after all addObject calls but before
+// any call of findObject.
+func (s *stackScanState) buildIndex() {
+ s.root, _, _ = binarySearchTree(s.head, 0, s.nobjs)
+}
+
+// Build a binary search tree with the n objects in the list
+// x.obj[idx], x.obj[idx+1], ..., x.next.obj[0], ...
+// Returns the root of that tree, and the buf+idx of the nth object after x.obj[idx].
+// (The first object that was not included in the binary search tree.)
+// If n == 0, returns nil, x, idx.
+func binarySearchTree(x *stackObjectBuf, idx int, n int) (root *stackObject, restBuf *stackObjectBuf, restIdx int) {
+ if n == 0 {
+ return nil, x, idx
+ }
+ var left, right *stackObject
+ left, x, idx = binarySearchTree(x, idx, n/2)
+ root = &x.obj[idx]
+ idx++
+ if idx == len(x.obj) {
+ x = x.next
+ idx = 0
+ }
+ right, x, idx = binarySearchTree(x, idx, n-n/2-1)
+ root.left = left
+ root.right = right
+ return root, x, idx
+}
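+
+// For illustration only: binarySearchTree above is the linked-buffer form of
+// the usual "balanced BST from a sorted sequence" recursion. The standalone
+// sketch below (not runtime code; the type and names are made up) shows the
+// same idea over a plain sorted slice: the first n/2 elements form the left
+// subtree, the next element becomes the root, and the remainder forms the
+// right subtree.
+type bstNodeSketch struct {
+	val         uintptr
+	left, right *bstNodeSketch
+}
+
+func buildBSTSketch(sorted []uintptr) *bstNodeSketch {
+	if len(sorted) == 0 {
+		return nil
+	}
+	mid := len(sorted) / 2
+	return &bstNodeSketch{
+		val:   sorted[mid],
+		left:  buildBSTSketch(sorted[:mid]),
+		right: buildBSTSketch(sorted[mid+1:]),
+	}
+}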
+
+// findObject returns the stack object containing address a, if any.
+// Must have called buildIndex previously.
+func (s *stackScanState) findObject(a uintptr) *stackObject {
+ off := uint32(a - s.stack.lo)
+ obj := s.root
+ for {
+ if obj == nil {
+ return nil
+ }
+ if off < obj.off {
+ obj = obj.left
+ continue
+ }
+ if off >= obj.off+obj.size {
+ obj = obj.right
+ continue
+ }
+ return obj
+ }
+}
diff --git a/src/runtime/mgcsweep.go b/src/runtime/mgcsweep.go
new file mode 100644
index 0000000..68f1aae
--- /dev/null
+++ b/src/runtime/mgcsweep.go
@@ -0,0 +1,982 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Garbage collector: sweeping
+
+// The sweeper consists of two different algorithms:
+//
+// * The object reclaimer finds and frees unmarked slots in spans. It
+// can free a whole span if none of the objects are marked, but that
+// isn't its goal. This can be driven either synchronously by
+// mcentral.cacheSpan for mcentral spans, or asynchronously by
+// sweepone, which looks at all the mcentral lists.
+//
+// * The span reclaimer looks for spans that contain no marked objects
+// and frees whole spans. This is a separate algorithm because
+// freeing whole spans is the hardest task for the object reclaimer,
+// but is critical when allocating new spans. The entry point for
+// this is mheap_.reclaim and it's driven by a sequential scan of
+// the page marks bitmap in the heap arenas.
+//
+// Both algorithms ultimately call mspan.sweep, which sweeps a single
+// heap span.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+var sweep sweepdata
+
+// State of background sweep.
+type sweepdata struct {
+ lock mutex
+ g *g
+ parked bool
+
+ nbgsweep uint32
+ npausesweep uint32
+
+ // active tracks outstanding sweepers and the sweep
+ // termination condition.
+ active activeSweep
+
+ // centralIndex is the current unswept span class.
+ // It represents an index into the mcentral span
+ // sets. Accessed and updated via its load and
+ // update methods. Not protected by a lock.
+ //
+ // Reset at mark termination.
+ // Used by mheap.nextSpanForSweep.
+ centralIndex sweepClass
+}
+
+// sweepClass is a spanClass and one bit to represent whether we're currently
+// sweeping partial or full spans.
+type sweepClass uint32
+
+const (
+ numSweepClasses = numSpanClasses * 2
+ sweepClassDone sweepClass = sweepClass(^uint32(0))
+)
+
+func (s *sweepClass) load() sweepClass {
+ return sweepClass(atomic.Load((*uint32)(s)))
+}
+
+func (s *sweepClass) update(sNew sweepClass) {
+ // Only update *s if its current value is less than sNew,
+ // since *s increases monotonically.
+ sOld := s.load()
+ for sOld < sNew && !atomic.Cas((*uint32)(s), uint32(sOld), uint32(sNew)) {
+ sOld = s.load()
+ }
+ // TODO(mknyszek): This isn't the only place we have
+ // an atomic monotonically increasing counter. It would
+ // be nice to have an "atomic max" which is just implemented
+ // as the above on most architectures. Some architectures
+ // like RISC-V however have native support for an atomic max.
+}
+
+func (s *sweepClass) clear() {
+ atomic.Store((*uint32)(s), 0)
+}
+
+// split returns the underlying span class as well as
+// whether we're interested in the full or partial
+// unswept lists for that class, indicated as a boolean
+// (true means "full").
+func (s sweepClass) split() (spc spanClass, full bool) {
+ return spanClass(s >> 1), s&1 == 0
+}
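+
+// For illustration only: split above implies that a sweepClass is a span class
+// shifted left by one, with the low bit selecting the full (0) or partial (1)
+// unswept list, so the two lists for a span class are visited back to back as
+// the index increases. The standalone sketch below (not runtime code; plain
+// uint32s stand in for the real types) shows the encoding in both directions.
+func joinSweepClassSketch(class uint32, full bool) uint32 {
+	sc := class << 1
+	if !full {
+		sc |= 1 // the partial-unswept list sorts after the full-unswept one
+	}
+	return sc
+}
+
+func splitSweepClassSketch(sc uint32) (class uint32, full bool) {
+	return sc >> 1, sc&1 == 0
+}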
+
+// nextSpanForSweep finds and pops the next span for sweeping from the
+// central sweep buffers. It returns ownership of the span to the caller.
+// Returns nil if no such span exists.
+func (h *mheap) nextSpanForSweep() *mspan {
+ sg := h.sweepgen
+ for sc := sweep.centralIndex.load(); sc < numSweepClasses; sc++ {
+ spc, full := sc.split()
+ c := &h.central[spc].mcentral
+ var s *mspan
+ if full {
+ s = c.fullUnswept(sg).pop()
+ } else {
+ s = c.partialUnswept(sg).pop()
+ }
+ if s != nil {
+ // Write down that we found something so future sweepers
+ // can start from here.
+ sweep.centralIndex.update(sc)
+ return s
+ }
+ }
+ // Write down that we found nothing.
+ sweep.centralIndex.update(sweepClassDone)
+ return nil
+}
+
+const sweepDrainedMask = 1 << 31
+
+// activeSweep is a type that captures whether sweeping
+// is done, and whether there are any outstanding sweepers.
+//
+// Every potential sweeper must call begin() before they look
+// for work, and end() after they've finished sweeping.
+type activeSweep struct {
+ // state is divided into two parts.
+ //
+ // The top bit (masked by sweepDrainedMask) is a boolean
+ // value indicating whether all the sweep work has been
+ // drained from the queue.
+ //
+ // The rest of the bits are a counter, indicating the
+ // number of outstanding concurrent sweepers.
+ state atomic.Uint32
+}
+
+// begin registers a new sweeper. Returns a sweepLocker
+// for acquiring spans for sweeping. Any outstanding sweeper blocks
+// sweep termination.
+//
+// If the sweepLocker is invalid, the caller can be sure that all
+// outstanding sweep work has been drained, so there is nothing left
+// to sweep. Note that there may be sweepers currently running, so
+// this does not indicate that all sweeping has completed.
+//
+// Even if the sweepLocker is invalid, its sweepGen is always valid.
+func (a *activeSweep) begin() sweepLocker {
+ for {
+ state := a.state.Load()
+ if state&sweepDrainedMask != 0 {
+ return sweepLocker{mheap_.sweepgen, false}
+ }
+ if a.state.CompareAndSwap(state, state+1) {
+ return sweepLocker{mheap_.sweepgen, true}
+ }
+ }
+}
+
+// end deregisters a sweeper. Must be called once for each time
+// begin is called if the sweepLocker is valid.
+func (a *activeSweep) end(sl sweepLocker) {
+ if sl.sweepGen != mheap_.sweepgen {
+ throw("sweeper left outstanding across sweep generations")
+ }
+ for {
+ state := a.state.Load()
+ if (state&^sweepDrainedMask)-1 >= sweepDrainedMask {
+ throw("mismatched begin/end of activeSweep")
+ }
+ if a.state.CompareAndSwap(state, state-1) {
+ if state != sweepDrainedMask {
+ return
+ }
+ if debug.gcpacertrace > 0 {
+ live := gcController.heapLive.Load()
+ print("pacer: sweep done at heap size ", live>>20, "MB; allocated ", (live-mheap_.sweepHeapLiveBasis)>>20, "MB during sweep; swept ", mheap_.pagesSwept.Load(), " pages at ", mheap_.sweepPagesPerByte, " pages/byte\n")
+ }
+ return
+ }
+ }
+}
+
+// markDrained marks the active sweep cycle as having drained
+// all remaining work. This is safe to be called concurrently
+// with all other methods of activeSweep, though may race.
+//
+// Returns true if this call was the one that actually performed
+// the mark.
+func (a *activeSweep) markDrained() bool {
+ for {
+ state := a.state.Load()
+ if state&sweepDrainedMask != 0 {
+ return false
+ }
+ if a.state.CompareAndSwap(state, state|sweepDrainedMask) {
+ return true
+ }
+ }
+}
+
+// sweepers returns the current number of active sweepers.
+func (a *activeSweep) sweepers() uint32 {
+ return a.state.Load() &^ sweepDrainedMask
+}
+
+// isDone returns true if all sweep work has been drained and no more
+// outstanding sweepers exist. That is, when the sweep phase is
+// completely done.
+func (a *activeSweep) isDone() bool {
+ return a.state.Load() == sweepDrainedMask
+}
+
+// reset sets up the activeSweep for the next sweep cycle.
+//
+// The world must be stopped.
+func (a *activeSweep) reset() {
+ assertWorldStopped()
+ a.state.Store(0)
+}
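+
+// For illustration only: the methods above all manipulate one packed state
+// word. The standalone helpers below (not runtime code; names are made up)
+// restate that encoding, with a few example values:
+//
+//	0x00000002 -> not drained, 2 active sweepers
+//	0x80000001 -> drained, 1 sweeper still finishing up
+//	0x80000000 -> drained with no sweepers: the sweep phase is done
+func sweeperCountSketch(state uint32) uint32 { return state &^ (1 << 31) }
+func isDrainedSketch(state uint32) bool      { return state&(1<<31) != 0 }
+func sweepDoneSketch(state uint32) bool      { return state == 1<<31 }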
+
+// finishsweep_m ensures that all spans are swept.
+//
+// The world must be stopped. This ensures there are no sweeps in
+// progress.
+//
+//go:nowritebarrier
+func finishsweep_m() {
+ assertWorldStopped()
+
+ // Sweeping must be complete before marking commences, so
+ // sweep any unswept spans. If this is a concurrent GC, there
+ // shouldn't be any spans left to sweep, so this should finish
+ // instantly. If GC was forced before the concurrent sweep
+ // finished, there may be spans to sweep.
+ for sweepone() != ^uintptr(0) {
+ sweep.npausesweep++
+ }
+
+ // Make sure there aren't any outstanding sweepers left.
+ // At this point, with the world stopped, it means one of two
+	// things: either we were able to preempt a sweeper, or a sweeper
+	// didn't call sweep.active.end when it should have.
+ // Both cases indicate a bug, so throw.
+ if sweep.active.sweepers() != 0 {
+ throw("active sweepers found at start of mark phase")
+ }
+
+ // Reset all the unswept buffers, which should be empty.
+ // Do this in sweep termination as opposed to mark termination
+ // so that we can catch unswept spans and reclaim blocks as
+ // soon as possible.
+ sg := mheap_.sweepgen
+ for i := range mheap_.central {
+ c := &mheap_.central[i].mcentral
+ c.partialUnswept(sg).reset()
+ c.fullUnswept(sg).reset()
+ }
+
+ // Sweeping is done, so there won't be any new memory to
+ // scavenge for a bit.
+ //
+ // If the scavenger isn't already awake, wake it up. There's
+ // definitely work for it to do at this point.
+ scavenger.wake()
+
+ nextMarkBitArenaEpoch()
+}
+
+func bgsweep(c chan int) {
+ sweep.g = getg()
+
+ lockInit(&sweep.lock, lockRankSweep)
+ lock(&sweep.lock)
+ sweep.parked = true
+ c <- 1
+ goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceBlockGCSweep, 1)
+
+ for {
+ // bgsweep attempts to be a "low priority" goroutine by intentionally
+ // yielding time. It's OK if it doesn't run, because goroutines allocating
+ // memory will sweep and ensure that all spans are swept before the next
+ // GC cycle. We really only want to run when we're idle.
+ //
+ // However, calling Gosched after each span swept produces a tremendous
+ // amount of tracing events, sometimes up to 50% of events in a trace. It's
+ // also inefficient to call into the scheduler so much because sweeping a
+ // single span is in general a very fast operation, taking as little as 30 ns
+ // on modern hardware. (See #54767.)
+ //
+ // As a result, bgsweep sweeps in batches, and only calls into the scheduler
+ // at the end of every batch. Furthermore, it only yields its time if there
+ // isn't spare idle time available on other cores. If there's available idle
+ // time, helping to sweep can reduce allocation latencies by getting ahead of
+ // the proportional sweeper and having spans ready to go for allocation.
+ const sweepBatchSize = 10
+ nSwept := 0
+ for sweepone() != ^uintptr(0) {
+ sweep.nbgsweep++
+ nSwept++
+ if nSwept%sweepBatchSize == 0 {
+ goschedIfBusy()
+ }
+ }
+ for freeSomeWbufs(true) {
+ // N.B. freeSomeWbufs is already batched internally.
+ goschedIfBusy()
+ }
+ lock(&sweep.lock)
+ if !isSweepDone() {
+ // This can happen if a GC runs between
+			// sweepone returning ^0 above
+ // and the lock being acquired.
+ unlock(&sweep.lock)
+ continue
+ }
+ sweep.parked = true
+ goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceBlockGCSweep, 1)
+ }
+}
+
+// sweepLocker acquires sweep ownership of spans.
+type sweepLocker struct {
+ // sweepGen is the sweep generation of the heap.
+ sweepGen uint32
+ valid bool
+}
+
+// sweepLocked represents sweep ownership of a span.
+type sweepLocked struct {
+ *mspan
+}
+
+// tryAcquire attempts to acquire sweep ownership of span s. If it
+// successfully acquires ownership, it blocks sweep completion.
+func (l *sweepLocker) tryAcquire(s *mspan) (sweepLocked, bool) {
+ if !l.valid {
+ throw("use of invalid sweepLocker")
+ }
+ // Check before attempting to CAS.
+ if atomic.Load(&s.sweepgen) != l.sweepGen-2 {
+ return sweepLocked{}, false
+ }
+ // Attempt to acquire sweep ownership of s.
+ if !atomic.Cas(&s.sweepgen, l.sweepGen-2, l.sweepGen-1) {
+ return sweepLocked{}, false
+ }
+ return sweepLocked{s}, true
+}
+
+// sweepone sweeps some unswept heap span and returns the number of pages returned
+// to the heap, or ^uintptr(0) if there was nothing to sweep.
+func sweepone() uintptr {
+ gp := getg()
+
+ // Increment locks to ensure that the goroutine is not preempted
+ // in the middle of a sweep, thus leaving the span in an inconsistent state for the next GC.
+ gp.m.locks++
+
+ // TODO(austin): sweepone is almost always called in a loop;
+ // lift the sweepLocker into its callers.
+ sl := sweep.active.begin()
+ if !sl.valid {
+ gp.m.locks--
+ return ^uintptr(0)
+ }
+
+ // Find a span to sweep.
+ npages := ^uintptr(0)
+ var noMoreWork bool
+ for {
+ s := mheap_.nextSpanForSweep()
+ if s == nil {
+ noMoreWork = sweep.active.markDrained()
+ break
+ }
+ if state := s.state.get(); state != mSpanInUse {
+ // This can happen if direct sweeping already
+ // swept this span, but in that case the sweep
+ // generation should always be up-to-date.
+ if !(s.sweepgen == sl.sweepGen || s.sweepgen == sl.sweepGen+3) {
+ print("runtime: bad span s.state=", state, " s.sweepgen=", s.sweepgen, " sweepgen=", sl.sweepGen, "\n")
+ throw("non in-use span in unswept list")
+ }
+ continue
+ }
+ if s, ok := sl.tryAcquire(s); ok {
+ // Sweep the span we found.
+ npages = s.npages
+ if s.sweep(false) {
+ // Whole span was freed. Count it toward the
+ // page reclaimer credit since these pages can
+ // now be used for span allocation.
+ mheap_.reclaimCredit.Add(npages)
+ } else {
+ // Span is still in-use, so this returned no
+ // pages to the heap and the span needs to
+ // move to the swept in-use list.
+ npages = 0
+ }
+ break
+ }
+ }
+ sweep.active.end(sl)
+
+ if noMoreWork {
+ // The sweep list is empty. There may still be
+ // concurrent sweeps running, but we're at least very
+ // close to done sweeping.
+
+ // Move the scavenge gen forward (signaling
+ // that there's new work to do) and wake the scavenger.
+ //
+ // The scavenger is signaled by the last sweeper because once
+ // sweeping is done, we will definitely have useful work for
+ // the scavenger to do, since the scavenger only runs over the
+ // heap once per GC cycle. This update is not done during sweep
+ // termination because in some cases there may be a long delay
+ // between sweep done and sweep termination (e.g. not enough
+ // allocations to trigger a GC) which would be nice to fill in
+ // with scavenging work.
+ if debug.scavtrace > 0 {
+ systemstack(func() {
+ lock(&mheap_.lock)
+
+ // Get released stats.
+ releasedBg := mheap_.pages.scav.releasedBg.Load()
+ releasedEager := mheap_.pages.scav.releasedEager.Load()
+
+ // Print the line.
+ printScavTrace(releasedBg, releasedEager, false)
+
+ // Update the stats.
+ mheap_.pages.scav.releasedBg.Add(-releasedBg)
+ mheap_.pages.scav.releasedEager.Add(-releasedEager)
+ unlock(&mheap_.lock)
+ })
+ }
+ scavenger.ready()
+ }
+
+ gp.m.locks--
+ return npages
+}
+
+// isSweepDone reports whether all spans are swept.
+//
+// Note that this condition may transition from false to true at any
+// time as the sweeper runs. It may transition from true to false if a
+// GC runs; to prevent that the caller must be non-preemptible or must
+// somehow block GC progress.
+func isSweepDone() bool {
+ return sweep.active.isDone()
+}
+
+// Returns only when span s has been swept.
+//
+//go:nowritebarrier
+func (s *mspan) ensureSwept() {
+ // Caller must disable preemption.
+ // Otherwise when this function returns the span can become unswept again
+ // (if GC is triggered on another goroutine).
+ gp := getg()
+ if gp.m.locks == 0 && gp.m.mallocing == 0 && gp != gp.m.g0 {
+ throw("mspan.ensureSwept: m is not locked")
+ }
+
+ // If this operation fails, it means that there are
+ // no more spans to be swept. In this case, either s has already
+ // been swept, or it is about to be acquired for sweeping and swept.
+ sl := sweep.active.begin()
+ if sl.valid {
+ // The caller must be sure that the span is a mSpanInUse span.
+ if s, ok := sl.tryAcquire(s); ok {
+ s.sweep(false)
+ sweep.active.end(sl)
+ return
+ }
+ sweep.active.end(sl)
+ }
+
+ // Unfortunately we can't sweep the span ourselves. Somebody else
+ // got to it first. We don't have efficient means to wait, but that's
+ // OK, it will be swept fairly soon.
+ for {
+ spangen := atomic.Load(&s.sweepgen)
+ if spangen == sl.sweepGen || spangen == sl.sweepGen+3 {
+ break
+ }
+ osyield()
+ }
+}
+
+// sweep frees or collects finalizers for blocks not marked in the mark phase.
+// It clears the mark bits in preparation for the next GC round.
+// Returns true if the span was returned to the heap.
+// If preserve=true, don't return it to the heap or relink it in mcentral lists;
+// the caller takes care of it.
+func (sl *sweepLocked) sweep(preserve bool) bool {
+ // It's critical that we enter this function with preemption disabled,
+ // GC must not start while we are in the middle of this function.
+ gp := getg()
+ if gp.m.locks == 0 && gp.m.mallocing == 0 && gp != gp.m.g0 {
+ throw("mspan.sweep: m is not locked")
+ }
+
+ s := sl.mspan
+ if !preserve {
+ // We'll release ownership of this span. Nil it out to
+ // prevent the caller from accidentally using it.
+ sl.mspan = nil
+ }
+
+ sweepgen := mheap_.sweepgen
+ if state := s.state.get(); state != mSpanInUse || s.sweepgen != sweepgen-1 {
+ print("mspan.sweep: state=", state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
+ throw("mspan.sweep: bad span state")
+ }
+
+ if traceEnabled() {
+ traceGCSweepSpan(s.npages * _PageSize)
+ }
+
+ mheap_.pagesSwept.Add(int64(s.npages))
+
+ spc := s.spanclass
+ size := s.elemsize
+
+ // The allocBits indicate which unmarked objects don't need to be
+ // processed since they were free at the end of the last GC cycle
+ // and were not allocated since then.
+ // If the allocBits index is >= s.freeindex and the bit
+ // is not marked then the object remains unallocated
+ // since the last GC.
+ // This situation is analogous to being on a freelist.
+
+ // Unlink & free special records for any objects we're about to free.
+ // Two complications here:
+ // 1. An object can have both finalizer and profile special records.
+ // In such a case we need to queue the finalizer for execution,
+ // mark the object as live, and preserve the profile special.
+ // 2. A tiny object can have several finalizers set up for different offsets.
+ // If such an object is not marked, we need to queue all finalizers at once.
+ // Both 1 and 2 are possible at the same time.
+ hadSpecials := s.specials != nil
+ siter := newSpecialsIter(s)
+ for siter.valid() {
+ // A finalizer can be set for an inner byte of an object; find the object's beginning.
+ objIndex := uintptr(siter.s.offset) / size
+ p := s.base() + objIndex*size
+ mbits := s.markBitsForIndex(objIndex)
+ if !mbits.isMarked() {
+ // This object is not marked and has at least one special record.
+ // Pass 1: see if it has at least one finalizer.
+ hasFin := false
+ endOffset := p - s.base() + size
+ for tmp := siter.s; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
+ if tmp.kind == _KindSpecialFinalizer {
+ // Stop freeing of object if it has a finalizer.
+ mbits.setMarkedNonAtomic()
+ hasFin = true
+ break
+ }
+ }
+ // Pass 2: queue all finalizers _or_ handle profile record.
+ for siter.valid() && uintptr(siter.s.offset) < endOffset {
+ // Find the exact byte for which the special was set up
+ // (as opposed to the object's beginning).
+ special := siter.s
+ p := s.base() + uintptr(special.offset)
+ if special.kind == _KindSpecialFinalizer || !hasFin {
+ siter.unlinkAndNext()
+ freeSpecial(special, unsafe.Pointer(p), size)
+ } else {
+ // The object has finalizers, so we're keeping it alive.
+ // All other specials only apply when an object is freed,
+ // so just keep the special record.
+ siter.next()
+ }
+ }
+ } else {
+ // object is still live
+ if siter.s.kind == _KindSpecialReachable {
+ special := siter.unlinkAndNext()
+ (*specialReachable)(unsafe.Pointer(special)).reachable = true
+ freeSpecial(special, unsafe.Pointer(p), size)
+ } else {
+ // keep special record
+ siter.next()
+ }
+ }
+ }
+ if hadSpecials && s.specials == nil {
+ spanHasNoSpecials(s)
+ }
+
+ if debug.allocfreetrace != 0 || debug.clobberfree != 0 || raceenabled || msanenabled || asanenabled {
+ // Find all newly freed objects. This doesn't have to be
+ // efficient; allocfreetrace has massive overhead.
+ mbits := s.markBitsForBase()
+ abits := s.allocBitsForIndex(0)
+ for i := uintptr(0); i < s.nelems; i++ {
+ if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
+ x := s.base() + i*s.elemsize
+ if debug.allocfreetrace != 0 {
+ tracefree(unsafe.Pointer(x), size)
+ }
+ if debug.clobberfree != 0 {
+ clobberfree(unsafe.Pointer(x), size)
+ }
+ // User arenas are handled on explicit free.
+ if raceenabled && !s.isUserArenaChunk {
+ racefree(unsafe.Pointer(x), size)
+ }
+ if msanenabled && !s.isUserArenaChunk {
+ msanfree(unsafe.Pointer(x), size)
+ }
+ if asanenabled && !s.isUserArenaChunk {
+ asanpoison(unsafe.Pointer(x), size)
+ }
+ }
+ mbits.advance()
+ abits.advance()
+ }
+ }
+
+ // Check for zombie objects.
+ if s.freeindex < s.nelems {
+ // Everything < freeindex is allocated and hence
+ // cannot be zombies.
+ //
+ // Check the first bitmap byte, where we have to be
+ // careful with freeindex.
+ obj := s.freeindex
+ if (*s.gcmarkBits.bytep(obj / 8)&^*s.allocBits.bytep(obj / 8))>>(obj%8) != 0 {
+ s.reportZombies()
+ }
+ // Check remaining bytes.
+ for i := obj/8 + 1; i < divRoundUp(s.nelems, 8); i++ {
+ if *s.gcmarkBits.bytep(i)&^*s.allocBits.bytep(i) != 0 {
+ s.reportZombies()
+ }
+ }
+ }
+
+ // Count the number of free objects in this span.
+ nalloc := uint16(s.countAlloc())
+ nfreed := s.allocCount - nalloc
+ if nalloc > s.allocCount {
+ // The zombie check above should have caught this in
+ // more detail.
+ print("runtime: nelems=", s.nelems, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
+ throw("sweep increased allocation count")
+ }
+
+ s.allocCount = nalloc
+ s.freeindex = 0 // reset allocation index to start of span.
+ s.freeIndexForScan = 0
+ if traceEnabled() {
+ getg().m.p.ptr().trace.reclaimed += uintptr(nfreed) * s.elemsize
+ }
+
+ // gcmarkBits becomes the allocBits.
+ // Get a fresh, cleared gcmarkBits in preparation for the next GC.
+ s.allocBits = s.gcmarkBits
+ s.gcmarkBits = newMarkBits(s.nelems)
+
+ // Refresh pinnerBits if they exist.
+ if s.pinnerBits != nil {
+ s.refreshPinnerBits()
+ }
+
+ // Initialize alloc bits cache.
+ s.refillAllocCache(0)
+
+ // The span must be in our exclusive ownership until we update sweepgen;
+ // check for potential races.
+ if state := s.state.get(); state != mSpanInUse || s.sweepgen != sweepgen-1 {
+ print("mspan.sweep: state=", state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
+ throw("mspan.sweep: bad span state after sweep")
+ }
+ if s.sweepgen == sweepgen+1 || s.sweepgen == sweepgen+3 {
+ throw("swept cached span")
+ }
+
+ // We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
+ // because of the potential for a concurrent free/SetFinalizer.
+ //
+ // But we need to set it before we make the span available for allocation
+ // (return it to heap or mcentral), because allocation code assumes that a
+ // span is already swept if available for allocation.
+ //
+ // Serialization point.
+ // At this point the mark bits are cleared and allocation ready
+ // to go so release the span.
+ atomic.Store(&s.sweepgen, sweepgen)
+
+ if s.isUserArenaChunk {
+ if preserve {
+ // This is a case that should never be handled by a sweeper that
+ // preserves the span for reuse.
+ throw("sweep: tried to preserve a user arena span")
+ }
+ if nalloc > 0 {
+ // There still exist pointers into the span or the span hasn't been
+ // freed yet. It's not ready to be reused. Put it back on the
+ // full swept list for the next cycle.
+ mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
+ return false
+ }
+
+ // It's only at this point that the sweeper doesn't actually need to look
+ // at this arena anymore, so subtract from pagesInUse now.
+ mheap_.pagesInUse.Add(-s.npages)
+ s.state.set(mSpanDead)
+
+ // The arena is ready to be recycled. Remove it from the quarantine list
+ // and place it on the ready list. Don't add it back to any sweep lists.
+ systemstack(func() {
+ // It's the arena code's responsibility to get the chunk on the quarantine
+ // list by the time all references to the chunk are gone.
+ if s.list != &mheap_.userArena.quarantineList {
+ throw("user arena span is on the wrong list")
+ }
+ lock(&mheap_.lock)
+ mheap_.userArena.quarantineList.remove(s)
+ mheap_.userArena.readyList.insert(s)
+ unlock(&mheap_.lock)
+ })
+ return false
+ }
+
+ if spc.sizeclass() != 0 {
+ // Handle spans for small objects.
+ if nfreed > 0 {
+ // Only mark the span as needing zeroing if we've freed any
+ // objects, because a fresh span that had been allocated into,
+ // wasn't totally filled, but then swept, still has all of its
+ // free slots zeroed.
+ s.needzero = 1
+ stats := memstats.heapStats.acquire()
+ atomic.Xadd64(&stats.smallFreeCount[spc.sizeclass()], int64(nfreed))
+ memstats.heapStats.release()
+
+ // Count the frees in the inconsistent, internal stats.
+ gcController.totalFree.Add(int64(nfreed) * int64(s.elemsize))
+ }
+ if !preserve {
+ // The caller may not have removed this span from whatever
+ // unswept set it's on, but it has taken ownership of the span for
+ // sweeping by updating sweepgen. If this span is still in
+ // an unswept set, then the mcentral will pop it off the
+ // set, check its sweepgen, and ignore it.
+ if nalloc == 0 {
+ // Free totally free span directly back to the heap.
+ mheap_.freeSpan(s)
+ return true
+ }
+ // Return span back to the right mcentral list.
+ if uintptr(nalloc) == s.nelems {
+ mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
+ } else {
+ mheap_.central[spc].mcentral.partialSwept(sweepgen).push(s)
+ }
+ }
+ } else if !preserve {
+ // Handle spans for large objects.
+ if nfreed != 0 {
+ // Free large object span to heap.
+
+ // NOTE(rsc,dvyukov): The original implementation of efence
+ // in CL 22060046 used sysFree instead of sysFault, so that
+ // the operating system would eventually give the memory
+ // back to us again, so that an efence program could run
+ // longer without running out of memory. Unfortunately,
+ // calling sysFree here without any kind of adjustment of the
+ // heap data structures means that when the memory does
+ // come back to us, we have the wrong metadata for it, either in
+ // the mspan structures or in the garbage collection bitmap.
+ // Using sysFault here means that the program will run out of
+ // memory fairly quickly in efence mode, but at least it won't
+ // have mysterious crashes due to confused memory reuse.
+ // It should be possible to switch back to sysFree if we also
+ // implement and then call some kind of mheap.deleteSpan.
+ if debug.efence > 0 {
+ s.limit = 0 // prevent mlookup from finding this span
+ sysFault(unsafe.Pointer(s.base()), size)
+ } else {
+ mheap_.freeSpan(s)
+ }
+
+ // Count the free in the consistent, external stats.
+ stats := memstats.heapStats.acquire()
+ atomic.Xadd64(&stats.largeFreeCount, 1)
+ atomic.Xadd64(&stats.largeFree, int64(size))
+ memstats.heapStats.release()
+
+ // Count the free in the inconsistent, internal stats.
+ gcController.totalFree.Add(int64(size))
+
+ return true
+ }
+
+ // Add a large span directly onto the full+swept list.
+ mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
+ }
+ return false
+}
+
+// reportZombies reports any marked but free objects in s and throws.
+//
+// This generally means one of the following:
+//
+// 1. User code converted a pointer to a uintptr and then back
+// unsafely, and a GC ran while the uintptr was the only reference to
+// an object.
+//
+// 2. User code (or a compiler bug) constructed a bad pointer that
+// points to a free slot, often a past-the-end pointer.
+//
+// 3. The GC two cycles ago missed a pointer and freed a live object,
+// but it was still live in the last cycle, so this GC cycle found a
+// pointer to that object and marked it.
+func (s *mspan) reportZombies() {
+ printlock()
+ print("runtime: marked free object in span ", s, ", elemsize=", s.elemsize, " freeindex=", s.freeindex, " (bad use of unsafe.Pointer? try -d=checkptr)\n")
+ mbits := s.markBitsForBase()
+ abits := s.allocBitsForIndex(0)
+ for i := uintptr(0); i < s.nelems; i++ {
+ addr := s.base() + i*s.elemsize
+ print(hex(addr))
+ alloc := i < s.freeindex || abits.isMarked()
+ if alloc {
+ print(" alloc")
+ } else {
+ print(" free ")
+ }
+ if mbits.isMarked() {
+ print(" marked ")
+ } else {
+ print(" unmarked")
+ }
+ zombie := mbits.isMarked() && !alloc
+ if zombie {
+ print(" zombie")
+ }
+ print("\n")
+ if zombie {
+ length := s.elemsize
+ if length > 1024 {
+ length = 1024
+ }
+ hexdumpWords(addr, addr+length, nil)
+ }
+ mbits.advance()
+ abits.advance()
+ }
+ throw("found pointer to free object")
+}
+
+// deductSweepCredit deducts sweep credit for allocating a span of
+// size spanBytes. This must be performed *before* the span is
+// allocated to ensure the system has enough credit. If necessary, it
+// performs sweeping to prevent going into debt. If the caller will
+// also sweep pages (e.g., for a large allocation), it can pass a
+// non-zero callerSweepPages to leave that many pages unswept.
+//
+// deductSweepCredit makes a worst-case assumption that all spanBytes
+// bytes of the ultimately allocated span will be available for object
+// allocation.
+//
+// deductSweepCredit is the core of the "proportional sweep" system.
+// It uses statistics gathered by the garbage collector to perform
+// enough sweeping so that all pages are swept during the concurrent
+// sweep phase between GC cycles.
+//
+// mheap_ must NOT be locked.
+func deductSweepCredit(spanBytes uintptr, callerSweepPages uintptr) {
+ if mheap_.sweepPagesPerByte == 0 {
+ // Proportional sweep is done or disabled.
+ return
+ }
+
+ if traceEnabled() {
+ traceGCSweepStart()
+ }
+
+ // Fix debt if necessary.
+retry:
+ sweptBasis := mheap_.pagesSweptBasis.Load()
+ live := gcController.heapLive.Load()
+ liveBasis := mheap_.sweepHeapLiveBasis
+ newHeapLive := spanBytes
+ if liveBasis < live {
+ // Only do this subtraction when we don't overflow. Otherwise, pagesTarget
+ // might be computed as something really huge, causing us to get stuck
+ // sweeping here until the next mark phase.
+ //
+ // Overflow can happen here if gcPaceSweeper is called concurrently with
+ // sweeping (i.e. not during a STW, like it usually is) because this code
+ // is intentionally racy. A concurrent call to gcPaceSweeper can happen
+ // if a GC tuning parameter is modified and we read an older value of
+ // heapLive than what was used to set the basis.
+ //
+ // This state should be transient, so it's fine to just let newHeapLive
+ // be a relatively small number. We'll probably just skip this attempt to
+ // sweep.
+ //
+ // See issue #57523.
+ newHeapLive += uintptr(live - liveBasis)
+ }
+ pagesTarget := int64(mheap_.sweepPagesPerByte*float64(newHeapLive)) - int64(callerSweepPages)
+ for pagesTarget > int64(mheap_.pagesSwept.Load()-sweptBasis) {
+ if sweepone() == ^uintptr(0) {
+ mheap_.sweepPagesPerByte = 0
+ break
+ }
+ if mheap_.pagesSweptBasis.Load() != sweptBasis {
+ // Sweep pacing changed. Recompute debt.
+ goto retry
+ }
+ }
+
+ if traceEnabled() {
+ traceGCSweepDone()
+ }
+}
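+// As an illustrative example (hypothetical numbers, not taken from any
+// real program): with sweepPagesPerByte = 0.0001, sweepHeapLiveBasis =
+// 100<<20, heapLive = 104<<20, and spanBytes = 64<<10, newHeapLive works
+// out to 64<<10 + 4<<20 = 4259840 bytes, so the caller owes about
+// 0.0001 * 4259840 ~ 425 pages of sweeping (minus callerSweepPages)
+// beyond pagesSweptBasis before it may allocate the span.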
+
+// clobberfree sets the memory content at x to bad content, for debugging
+// purposes.
+func clobberfree(x unsafe.Pointer, size uintptr) {
+ // size (span.elemsize) is always a multiple of 4.
+ for i := uintptr(0); i < size; i += 4 {
+ *(*uint32)(add(x, i)) = 0xdeadbeef
+ }
+}
+
+// gcPaceSweeper updates the sweeper's pacing parameters.
+//
+// Must be called whenever the GC's pacing is updated.
+//
+// The world must be stopped, or mheap_.lock must be held.
+func gcPaceSweeper(trigger uint64) {
+ assertWorldStoppedOrLockHeld(&mheap_.lock)
+
+ // Update sweep pacing.
+ if isSweepDone() {
+ mheap_.sweepPagesPerByte = 0
+ } else {
+ // Concurrent sweep needs to sweep all of the in-use
+ // pages by the time the allocated heap reaches the GC
+ // trigger. Compute the ratio of in-use pages to sweep
+ // per byte allocated, accounting for the fact that
+ // some might already be swept.
+ heapLiveBasis := gcController.heapLive.Load()
+ heapDistance := int64(trigger) - int64(heapLiveBasis)
+ // Add a little margin so rounding errors and
+ // concurrent sweep are less likely to leave pages
+ // unswept when GC starts.
+ heapDistance -= 1024 * 1024
+ if heapDistance < _PageSize {
+ // Avoid setting the sweep ratio extremely high
+ heapDistance = _PageSize
+ }
+ pagesSwept := mheap_.pagesSwept.Load()
+ pagesInUse := mheap_.pagesInUse.Load()
+ sweepDistancePages := int64(pagesInUse) - int64(pagesSwept)
+ if sweepDistancePages <= 0 {
+ mheap_.sweepPagesPerByte = 0
+ } else {
+ mheap_.sweepPagesPerByte = float64(sweepDistancePages) / float64(heapDistance)
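+ // Illustrative numbers only: with sweepDistancePages = 10000 and a
+ // heapDistance of 100<<20 bytes, this works out to roughly
+ // 10000/(100<<20) ~ 0.0001 pages of sweeping owed per byte allocated.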
+ mheap_.sweepHeapLiveBasis = heapLiveBasis
+ // Write pagesSweptBasis last, since this
+ // signals concurrent sweeps to recompute
+ // their debt.
+ mheap_.pagesSweptBasis.Store(pagesSwept)
+ }
+ }
+}
diff --git a/src/runtime/mgcwork.go b/src/runtime/mgcwork.go
new file mode 100644
index 0000000..7ab8975
--- /dev/null
+++ b/src/runtime/mgcwork.go
@@ -0,0 +1,489 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ _WorkbufSize = 2048 // in bytes; larger values result in less contention
+
+ // workbufAlloc is the number of bytes to allocate at a time
+ // for new workbufs. This must be a multiple of pageSize and
+ // should be a multiple of _WorkbufSize.
+ //
+ // Larger values reduce workbuf allocation overhead. Smaller
+ // values reduce heap fragmentation.
+ workbufAlloc = 32 << 10
+)
+
+func init() {
+ if workbufAlloc%pageSize != 0 || workbufAlloc%_WorkbufSize != 0 {
+ throw("bad workbufAlloc")
+ }
+}
+
+// Garbage collector work pool abstraction.
+//
+// This implements a producer/consumer model for pointers to grey
+// objects. A grey object is one that is marked and on a work
+// queue. A black object is marked and not on a work queue.
+//
+// Write barriers, root discovery, stack scanning, and object scanning
+// produce pointers to grey objects. Scanning consumes pointers to
+// grey objects, thus blackening them, and then scans them,
+// potentially producing new pointers to grey objects.
+
+// A gcWork provides the interface to produce and consume work for the
+// garbage collector.
+//
+// A gcWork can be used on the stack as follows:
+//
+// (preemption must be disabled)
+// gcw := &getg().m.p.ptr().gcw
+// .. call gcw.put() to produce and gcw.tryGet() to consume ..
+//
+// It's important that any use of gcWork during the mark phase prevent
+// the garbage collector from transitioning to mark termination since
+// gcWork may locally hold GC work buffers. This can be done by
+// disabling preemption (systemstack or acquirem).
+type gcWork struct {
+ // wbuf1 and wbuf2 are the primary and secondary work buffers.
+ //
+ // This can be thought of as a stack of both work buffers'
+ // pointers concatenated. When we pop the last pointer, we
+ // shift the stack up by one work buffer by bringing in a new
+ // full buffer and discarding an empty one. When we fill both
+ // buffers, we shift the stack down by one work buffer by
+ // bringing in a new empty buffer and discarding a full one.
+ // This way we have one buffer's worth of hysteresis, which
+ // amortizes the cost of getting or putting a work buffer over
+ // at least one buffer of work and reduces contention on the
+ // global work lists.
+ //
+ // wbuf1 is always the buffer we're currently pushing to and
+ // popping from and wbuf2 is the buffer that will be discarded
+ // next.
+ //
+ // Invariant: Both wbuf1 and wbuf2 are nil or neither are.
+ wbuf1, wbuf2 *workbuf
+
+ // Bytes marked (blackened) on this gcWork. This is aggregated
+ // into work.bytesMarked by dispose.
+ bytesMarked uint64
+
+ // Heap scan work performed on this gcWork. This is aggregated into
+ // gcController by dispose and may also be flushed by callers.
+ // Other types of scan work are flushed immediately.
+ heapScanWork int64
+
+ // flushedWork indicates that a non-empty work buffer was
+ // flushed to the global work list since the last gcMarkDone
+ // termination check. Specifically, this indicates that this
+ // gcWork may have communicated work to another gcWork.
+ flushedWork bool
+}
+
+// Most of the methods of gcWork are go:nowritebarrierrec because the
+// write barrier itself can invoke gcWork methods but the methods are
+// not generally re-entrant. Hence, if a gcWork method invoked the
+// write barrier while the gcWork was in an inconsistent state, and
+// the write barrier in turn invoked a gcWork method, it could
+// permanently corrupt the gcWork.
+
+func (w *gcWork) init() {
+ w.wbuf1 = getempty()
+ wbuf2 := trygetfull()
+ if wbuf2 == nil {
+ wbuf2 = getempty()
+ }
+ w.wbuf2 = wbuf2
+}
+
+// put enqueues a pointer for the garbage collector to trace.
+// obj must point to the beginning of a heap object or an oblet.
+//
+//go:nowritebarrierrec
+func (w *gcWork) put(obj uintptr) {
+ flushed := false
+ wbuf := w.wbuf1
+ // Record that this may acquire the wbufSpans or heap lock to
+ // allocate a workbuf.
+ lockWithRankMayAcquire(&work.wbufSpans.lock, lockRankWbufSpans)
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ // wbuf is empty at this point.
+ } else if wbuf.nobj == len(wbuf.obj) {
+ w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
+ wbuf = w.wbuf1
+ if wbuf.nobj == len(wbuf.obj) {
+ putfull(wbuf)
+ w.flushedWork = true
+ wbuf = getempty()
+ w.wbuf1 = wbuf
+ flushed = true
+ }
+ }
+
+ wbuf.obj[wbuf.nobj] = obj
+ wbuf.nobj++
+
+ // If we put a buffer on full, let the GC controller know so
+ // it can encourage more workers to run. We delay this until
+ // the end of put so that w is in a consistent state, since
+ // enlistWorker may itself manipulate w.
+ if flushed && gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// putFast does a put and reports whether it can be done quickly;
+// otherwise it returns false and the caller needs to call put.
+//
+//go:nowritebarrierrec
+func (w *gcWork) putFast(obj uintptr) bool {
+ wbuf := w.wbuf1
+ if wbuf == nil || wbuf.nobj == len(wbuf.obj) {
+ return false
+ }
+
+ wbuf.obj[wbuf.nobj] = obj
+ wbuf.nobj++
+ return true
+}
+
+// putBatch performs a put on every pointer in obj. See put for
+// constraints on these pointers.
+//
+//go:nowritebarrierrec
+func (w *gcWork) putBatch(obj []uintptr) {
+ if len(obj) == 0 {
+ return
+ }
+
+ flushed := false
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ }
+
+ for len(obj) > 0 {
+ for wbuf.nobj == len(wbuf.obj) {
+ putfull(wbuf)
+ w.flushedWork = true
+ w.wbuf1, w.wbuf2 = w.wbuf2, getempty()
+ wbuf = w.wbuf1
+ flushed = true
+ }
+ n := copy(wbuf.obj[wbuf.nobj:], obj)
+ wbuf.nobj += n
+ obj = obj[n:]
+ }
+
+ if flushed && gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// tryGet dequeues a pointer for the garbage collector to trace.
+//
+// If there are no pointers remaining in this gcWork or in the global
+// queue, tryGet returns 0. Note that there may still be pointers in
+// other gcWork instances or other caches.
+//
+//go:nowritebarrierrec
+func (w *gcWork) tryGet() uintptr {
+ wbuf := w.wbuf1
+ if wbuf == nil {
+ w.init()
+ wbuf = w.wbuf1
+ // wbuf is empty at this point.
+ }
+ if wbuf.nobj == 0 {
+ w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
+ wbuf = w.wbuf1
+ if wbuf.nobj == 0 {
+ owbuf := wbuf
+ wbuf = trygetfull()
+ if wbuf == nil {
+ return 0
+ }
+ putempty(owbuf)
+ w.wbuf1 = wbuf
+ }
+ }
+
+ wbuf.nobj--
+ return wbuf.obj[wbuf.nobj]
+}
+
+// tryGetFast dequeues a pointer for the garbage collector to trace
+// if one is readily available. Otherwise it returns 0 and
+// the caller is expected to call tryGet().
+//
+//go:nowritebarrierrec
+func (w *gcWork) tryGetFast() uintptr {
+ wbuf := w.wbuf1
+ if wbuf == nil || wbuf.nobj == 0 {
+ return 0
+ }
+
+ wbuf.nobj--
+ return wbuf.obj[wbuf.nobj]
+}
+
+// dispose returns any cached pointers to the global queue.
+// The buffers are being put on the full queue so that the
+// write barriers will not simply reacquire them before the
+// GC can inspect them. This helps reduce the mutator's
+// ability to hide pointers during the concurrent mark phase.
+//
+//go:nowritebarrierrec
+func (w *gcWork) dispose() {
+ if wbuf := w.wbuf1; wbuf != nil {
+ if wbuf.nobj == 0 {
+ putempty(wbuf)
+ } else {
+ putfull(wbuf)
+ w.flushedWork = true
+ }
+ w.wbuf1 = nil
+
+ wbuf = w.wbuf2
+ if wbuf.nobj == 0 {
+ putempty(wbuf)
+ } else {
+ putfull(wbuf)
+ w.flushedWork = true
+ }
+ w.wbuf2 = nil
+ }
+ if w.bytesMarked != 0 {
+ // dispose happens relatively infrequently. If this
+ // atomic becomes a problem, we should first try to
+ // dispose less and if necessary aggregate in a per-P
+ // counter.
+ atomic.Xadd64(&work.bytesMarked, int64(w.bytesMarked))
+ w.bytesMarked = 0
+ }
+ if w.heapScanWork != 0 {
+ gcController.heapScanWork.Add(w.heapScanWork)
+ w.heapScanWork = 0
+ }
+}
+
+// balance moves some work that's cached in this gcWork back on the
+// global queue.
+//
+//go:nowritebarrierrec
+func (w *gcWork) balance() {
+ if w.wbuf1 == nil {
+ return
+ }
+ if wbuf := w.wbuf2; wbuf.nobj != 0 {
+ putfull(wbuf)
+ w.flushedWork = true
+ w.wbuf2 = getempty()
+ } else if wbuf := w.wbuf1; wbuf.nobj > 4 {
+ w.wbuf1 = handoff(wbuf)
+ w.flushedWork = true // handoff did putfull
+ } else {
+ return
+ }
+ // We flushed a buffer to the full list, so wake a worker.
+ if gcphase == _GCmark {
+ gcController.enlistWorker()
+ }
+}
+
+// empty reports whether w has no mark work available.
+//
+//go:nowritebarrierrec
+func (w *gcWork) empty() bool {
+ return w.wbuf1 == nil || (w.wbuf1.nobj == 0 && w.wbuf2.nobj == 0)
+}
+
+// Internally, the GC work pool is kept in arrays in work buffers.
+// The gcWork interface caches a work buffer until full (or empty) to
+// avoid contending on the global work buffer lists.
+
+type workbufhdr struct {
+ node lfnode // must be first
+ nobj int
+}
+
+type workbuf struct {
+ _ sys.NotInHeap
+ workbufhdr
+ // account for the above fields
+ obj [(_WorkbufSize - unsafe.Sizeof(workbufhdr{})) / goarch.PtrSize]uintptr
+}
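+// On 64-bit, and assuming a 16-byte lfnode, workbufhdr above is 24 bytes,
+// so each workbuf holds (2048-24)/8 = 253 object pointers.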
+
+// workbuf factory routines. These funcs are used to manage the
+// workbufs.
+// If the GC asks for some work, these are the only routines that
+// make wbufs available to the GC.
+
+func (b *workbuf) checknonempty() {
+ if b.nobj == 0 {
+ throw("workbuf is empty")
+ }
+}
+
+func (b *workbuf) checkempty() {
+ if b.nobj != 0 {
+ throw("workbuf is not empty")
+ }
+}
+
+// getempty pops an empty work buffer off the work.empty list,
+// allocating new buffers if none are available.
+//
+//go:nowritebarrier
+func getempty() *workbuf {
+ var b *workbuf
+ if work.empty != 0 {
+ b = (*workbuf)(work.empty.pop())
+ if b != nil {
+ b.checkempty()
+ }
+ }
+ // Record that this may acquire the wbufSpans or heap lock to
+ // allocate a workbuf.
+ lockWithRankMayAcquire(&work.wbufSpans.lock, lockRankWbufSpans)
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if b == nil {
+ // Allocate more workbufs.
+ var s *mspan
+ if work.wbufSpans.free.first != nil {
+ lock(&work.wbufSpans.lock)
+ s = work.wbufSpans.free.first
+ if s != nil {
+ work.wbufSpans.free.remove(s)
+ work.wbufSpans.busy.insert(s)
+ }
+ unlock(&work.wbufSpans.lock)
+ }
+ if s == nil {
+ systemstack(func() {
+ s = mheap_.allocManual(workbufAlloc/pageSize, spanAllocWorkBuf)
+ })
+ if s == nil {
+ throw("out of memory")
+ }
+ // Record the new span in the busy list.
+ lock(&work.wbufSpans.lock)
+ work.wbufSpans.busy.insert(s)
+ unlock(&work.wbufSpans.lock)
+ }
+ // Slice up the span into new workbufs. Return one and
+ // put the rest on the empty list.
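+ // With the defaults above (workbufAlloc = 32<<10, _WorkbufSize = 2048)
+ // this carves the span into 16 workbufs: the first is returned to the
+ // caller and the remaining 15 go onto the empty list.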
+ for i := uintptr(0); i+_WorkbufSize <= workbufAlloc; i += _WorkbufSize {
+ newb := (*workbuf)(unsafe.Pointer(s.base() + i))
+ newb.nobj = 0
+ lfnodeValidate(&newb.node)
+ if i == 0 {
+ b = newb
+ } else {
+ putempty(newb)
+ }
+ }
+ }
+ return b
+}
+
+// putempty puts a workbuf onto the work.empty list.
+// Upon entry this goroutine owns b. The lfstack.push relinquishes ownership.
+//
+//go:nowritebarrier
+func putempty(b *workbuf) {
+ b.checkempty()
+ work.empty.push(&b.node)
+}
+
+// putfull puts the workbuf on the work.full list for the GC.
+// putfull accepts partially full buffers so the GC can avoid competing
+// with the mutators for ownership of partially full buffers.
+//
+//go:nowritebarrier
+func putfull(b *workbuf) {
+ b.checknonempty()
+ work.full.push(&b.node)
+}
+
+// trygetfull tries to get a full or partially empty workbuf.
+// If one is not immediately available, it returns nil.
+//
+//go:nowritebarrier
+func trygetfull() *workbuf {
+ b := (*workbuf)(work.full.pop())
+ if b != nil {
+ b.checknonempty()
+ return b
+ }
+ return b
+}
+
+//go:nowritebarrier
+func handoff(b *workbuf) *workbuf {
+ // Make new buffer with half of b's pointers.
+ b1 := getempty()
+ n := b.nobj / 2
+ b.nobj -= n
+ b1.nobj = n
+ memmove(unsafe.Pointer(&b1.obj[0]), unsafe.Pointer(&b.obj[b.nobj]), uintptr(n)*unsafe.Sizeof(b1.obj[0]))
+
+ // Put b on full list - let first half of b get stolen.
+ putfull(b)
+ return b1
+}
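+// For example, a buffer holding 8 pointers keeps its first 4 and goes on
+// the full list to be stolen, while the last 4 are copied into the fresh
+// buffer returned to the caller (illustrative count only).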
+
+// prepareFreeWorkbufs moves busy workbuf spans to the free list so they
+// can be freed to the heap. This must only be called when all
+// workbufs are on the empty list.
+func prepareFreeWorkbufs() {
+ lock(&work.wbufSpans.lock)
+ if work.full != 0 {
+ throw("cannot free workbufs when work.full != 0")
+ }
+ // Since all workbufs are on the empty list, we don't care
+ // which ones are in which spans. We can wipe the entire empty
+ // list and move all workbuf spans to the free list.
+ work.empty = 0
+ work.wbufSpans.free.takeAll(&work.wbufSpans.busy)
+ unlock(&work.wbufSpans.lock)
+}
+
+// freeSomeWbufs frees some workbufs back to the heap and returns
+// true if it should be called again to free more.
+func freeSomeWbufs(preemptible bool) bool {
+ const batchSize = 64 // ~1–2 µs per span.
+ lock(&work.wbufSpans.lock)
+ if gcphase != _GCoff || work.wbufSpans.free.isEmpty() {
+ unlock(&work.wbufSpans.lock)
+ return false
+ }
+ systemstack(func() {
+ gp := getg().m.curg
+ for i := 0; i < batchSize && !(preemptible && gp.preempt); i++ {
+ span := work.wbufSpans.free.first
+ if span == nil {
+ break
+ }
+ work.wbufSpans.free.remove(span)
+ mheap_.freeManual(span, spanAllocWorkBuf)
+ }
+ })
+ more := !work.wbufSpans.free.isEmpty()
+ unlock(&work.wbufSpans.lock)
+ return more
+}
diff --git a/src/runtime/mheap.go b/src/runtime/mheap.go
new file mode 100644
index 0000000..f836d91
--- /dev/null
+++ b/src/runtime/mheap.go
@@ -0,0 +1,2260 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Page heap.
+//
+// See malloc.go for overview.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ // minPhysPageSize is a lower-bound on the physical page size. The
+ // true physical page size may be larger than this. In contrast,
+ // sys.PhysPageSize is an upper-bound on the physical page size.
+ minPhysPageSize = 4096
+
+ // maxPhysPageSize is the maximum page size the runtime supports.
+ maxPhysPageSize = 512 << 10
+
+ // maxPhysHugePageSize sets an upper-bound on the maximum huge page size
+ // that the runtime supports.
+ maxPhysHugePageSize = pallocChunkBytes
+
+ // pagesPerReclaimerChunk indicates how many pages to scan from the
+ // pageInUse bitmap at a time. Used by the page reclaimer.
+ //
+ // Higher values reduce contention on scanning indexes (such as
+ // h.reclaimIndex), but increase the minimum latency of the
+ // operation.
+ //
+ // The time required to scan this many pages can vary a lot depending
+ // on how many spans are actually freed. Experimentally, it can
+ // scan for pages at ~300 GB/ms on a 2.6GHz Core i7, but can only
+ // free spans at ~32 MB/ms. Using 512 pages bounds this at
+ // roughly 100µs.
+ //
+ // Must be a multiple of the pageInUse bitmap element size and
+ // must also evenly divide pagesPerArena.
+ pagesPerReclaimerChunk = 512
+
+ // physPageAlignedStacks indicates whether stack allocations must be
+ // physical page aligned. This is a requirement for MAP_STACK on
+ // OpenBSD.
+ physPageAlignedStacks = GOOS == "openbsd"
+)
+
+// Main malloc heap.
+// The heap itself is the "free" and "scav" treaps,
+// but all the other global data is here too.
+//
+// mheap must not be heap-allocated because it contains mSpanLists,
+// which must not be heap-allocated.
+type mheap struct {
+ _ sys.NotInHeap
+
+ // lock must only be acquired on the system stack, otherwise a g
+ // could self-deadlock if its stack grows with the lock held.
+ lock mutex
+
+ pages pageAlloc // page allocation data structure
+
+ sweepgen uint32 // sweep generation, see comment in mspan; written during STW
+
+ // allspans is a slice of all mspans ever created. Each mspan
+ // appears exactly once.
+ //
+ // The memory for allspans is manually managed and can be
+ // reallocated and moved as the heap grows.
+ //
+ // In general, allspans is protected by mheap_.lock, which
+ // prevents concurrent access as well as freeing the backing
+ // store. Accesses during STW might not hold the lock, but
+ // must ensure that allocation cannot happen around the
+ // access (since that may free the backing store).
+ allspans []*mspan // all spans out there
+
+ // Proportional sweep
+ //
+ // These parameters represent a linear function from gcController.heapLive
+ // to page sweep count. The proportional sweep system works to
+ // stay in the black by keeping the current page sweep count
+ // above this line at the current gcController.heapLive.
+ //
+ // The line has slope sweepPagesPerByte and passes through a
+ // basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
+ // any given time, the system is at (gcController.heapLive,
+ // pagesSwept) in this space.
+ //
+ // It is important that the line pass through a point we
+ // control rather than simply starting at a 0,0 origin
+ // because that lets us adjust sweep pacing at any time while
+ // accounting for current progress. If we could only adjust
+ // the slope, it would create a discontinuity in debt if any
+ // progress has already been made.
+ pagesInUse atomic.Uintptr // pages of spans in stats mSpanInUse
+ pagesSwept atomic.Uint64 // pages swept this cycle
+ pagesSweptBasis atomic.Uint64 // pagesSwept to use as the origin of the sweep ratio
+ sweepHeapLiveBasis uint64 // value of gcController.heapLive to use as the origin of sweep ratio; written with lock, read without
+ sweepPagesPerByte float64 // proportional sweep ratio; written with lock, read without
+
+ // Page reclaimer state
+
+ // reclaimIndex is the page index in allArenas of the next page to
+ // reclaim. Specifically, it refers to page (i %
+ // pagesPerArena) of arena allArenas[i / pagesPerArena].
+ //
+ // If this is >= 1<<63, the page reclaimer is done scanning
+ // the page marks.
+ reclaimIndex atomic.Uint64
+
+ // reclaimCredit is spare credit for extra pages swept. Since
+ // the page reclaimer works in large chunks, it may reclaim
+ // more than requested. Any spare pages released go to this
+ // credit pool.
+ reclaimCredit atomic.Uintptr
+
+ // arenas is the heap arena map. It points to the metadata for
+ // the heap for every arena frame of the entire usable virtual
+ // address space.
+ //
+ // Use arenaIndex to compute indexes into this array.
+ //
+ // For regions of the address space that are not backed by the
+ // Go heap, the arena map contains nil.
+ //
+ // Modifications are protected by mheap_.lock. Reads can be
+ // performed without locking; however, a given entry can
+ // transition from nil to non-nil at any time when the lock
+ // isn't held. (Entries never transition back to nil.)
+ //
+ // In general, this is a two-level mapping consisting of an L1
+ // map and possibly many L2 maps. This saves space when there
+ // are a huge number of arena frames. However, on many
+ // platforms (even 64-bit), arenaL1Bits is 0, making this
+ // effectively a single-level map. In this case, arenas[0]
+ // will never be nil.
+ arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena
+
+ // arenasHugePages indicates whether arenas' L2 entries are eligible
+ // to be backed by huge pages.
+ arenasHugePages bool
+
+ // heapArenaAlloc is pre-reserved space for allocating heapArena
+ // objects. This is only used on 32-bit, where we pre-reserve
+ // this space to avoid interleaving it with the heap itself.
+ heapArenaAlloc linearAlloc
+
+ // arenaHints is a list of addresses at which to attempt to
+ // add more heap arenas. This is initially populated with a
+ // set of general hint addresses, and grown with the bounds of
+ // actual heap arena ranges.
+ arenaHints *arenaHint
+
+ // arena is a pre-reserved space for allocating heap arenas
+ // (the actual arenas). This is only used on 32-bit.
+ arena linearAlloc
+
+ // allArenas is the arenaIndex of every mapped arena. This can
+ // be used to iterate through the address space.
+ //
+ // Access is protected by mheap_.lock. However, since this is
+ // append-only and old backing arrays are never freed, it is
+ // safe to acquire mheap_.lock, copy the slice header, and
+ // then release mheap_.lock.
+ allArenas []arenaIdx
+
+ // sweepArenas is a snapshot of allArenas taken at the
+ // beginning of the sweep cycle. This can be read safely by
+ // simply blocking GC (by disabling preemption).
+ sweepArenas []arenaIdx
+
+ // markArenas is a snapshot of allArenas taken at the beginning
+ // of the mark cycle. Because allArenas is append-only, neither
+ // this slice nor its contents will change during the mark, so
+ // it can be read safely.
+ markArenas []arenaIdx
+
+ // curArena is the arena that the heap is currently growing
+ // into. This should always be physPageSize-aligned.
+ curArena struct {
+ base, end uintptr
+ }
+
+ // central free lists for small size classes.
+ // the padding makes sure that the mcentrals are
+ // spaced CacheLinePadSize bytes apart, so that each mcentral.lock
+ // gets its own cache line.
+ // central is indexed by spanClass.
+ central [numSpanClasses]struct {
+ mcentral mcentral
+ pad [(cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
+ }
+
+ spanalloc fixalloc // allocator for span*
+ cachealloc fixalloc // allocator for mcache*
+ specialfinalizeralloc fixalloc // allocator for specialfinalizer*
+ specialprofilealloc fixalloc // allocator for specialprofile*
+ specialReachableAlloc fixalloc // allocator for specialReachable
+ specialPinCounterAlloc fixalloc // allocator for specialPinCounter
+ speciallock mutex // lock for special record allocators.
+ arenaHintAlloc fixalloc // allocator for arenaHints
+
+ // User arena state.
+ //
+ // Protected by mheap_.lock.
+ userArena struct {
+ // arenaHints is a list of addresses at which to attempt to
+ // add more heap arenas for user arena chunks. This is initially
+ // populated with a set of general hint addresses, and grown with
+ // the bounds of actual heap arena ranges.
+ arenaHints *arenaHint
+
+ // quarantineList is a list of user arena spans that have been set to fault, but
+ // are waiting for all pointers into them to go away. Sweeping handles
+ // identifying when this is true, and moves the span to the ready list.
+ quarantineList mSpanList
+
+ // readyList is a list of empty user arena spans that are ready for reuse.
+ readyList mSpanList
+ }
+
+ unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
+}
+
+var mheap_ mheap
+
+// A heapArena stores metadata for a heap arena. heapArenas are stored
+// outside of the Go heap and accessed via the mheap_.arenas index.
+type heapArena struct {
+ _ sys.NotInHeap
+
+ // bitmap stores the pointer/scalar bitmap for the words in
+ // this arena. See mbitmap.go for a description.
+ // This array uses 1 bit per word of heap, or 1.6% of the heap size (for 64-bit).
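+ // (One bit per 8-byte word is 1/64 of the heap, i.e. roughly 1.6%.)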
+ bitmap [heapArenaBitmapWords]uintptr
+
+ // If the ith bit of noMorePtrs is true, then there are no more
+ // pointers for the object containing the word described by the
+ // high bit of bitmap[i].
+ // In that case, bitmap[i+1], ... must be zero until the start
+ // of the next object.
+ // We never operate on these entries using bit-parallel techniques,
+ // so it is ok if they are small. Also, they can't be bigger than
+ // uint16 because at that size a single noMorePtrs entry
+ // represents 8K of memory, the minimum size of a span. Any larger
+ // and we'd have to worry about concurrent updates.
+ // This array uses 1 bit per word of bitmap, or .024% of the heap size (for 64-bit).
+ noMorePtrs [heapArenaBitmapWords / 8]uint8
+
+ // spans maps from virtual address page ID within this arena to *mspan.
+ // For allocated spans, their pages map to the span itself.
+ // For free spans, only the lowest and highest pages map to the span itself.
+ // Internal pages map to an arbitrary span.
+ // For pages that have never been allocated, spans entries are nil.
+ //
+ // Modifications are protected by mheap.lock. Reads can be
+ // performed without locking, but ONLY from indexes that are
+ // known to contain in-use or stack spans. This means there
+ // must not be a safe-point between establishing that an
+ // address is live and looking it up in the spans array.
+ spans [pagesPerArena]*mspan
+
+ // pageInUse is a bitmap that indicates which spans are in
+ // state mSpanInUse. This bitmap is indexed by page number,
+ // but only the bit corresponding to the first page in each
+ // span is used.
+ //
+ // Reads and writes are atomic.
+ pageInUse [pagesPerArena / 8]uint8
+
+ // pageMarks is a bitmap that indicates which spans have any
+ // marked objects on them. Like pageInUse, only the bit
+ // corresponding to the first page in each span is used.
+ //
+ // Writes are done atomically during marking. Reads are
+ // non-atomic and lock-free since they only occur during
+ // sweeping (and hence never race with writes).
+ //
+ // This is used to quickly find whole spans that can be freed.
+ //
+ // TODO(austin): It would be nice if this was uint64 for
+ // faster scanning, but we don't have 64-bit atomic bit
+ // operations.
+ pageMarks [pagesPerArena / 8]uint8
+
+ // pageSpecials is a bitmap that indicates which spans have
+ // specials (finalizers or other). Like pageInUse, only the bit
+ // corresponding to the first page in each span is used.
+ //
+ // Writes are done atomically whenever a special is added to
+ // a span and whenever the last special is removed from a span.
+ // Reads are done atomically to find spans containing specials
+ // during marking.
+ pageSpecials [pagesPerArena / 8]uint8
+
+ // checkmarks stores the debug.gccheckmark state. It is only
+ // used if debug.gccheckmark > 0.
+ checkmarks *checkmarksMap
+
+ // zeroedBase marks the first byte of the first page in this
+ // arena which hasn't been used yet and is therefore already
+ // zero. zeroedBase is relative to the arena base.
+ // Increases monotonically until it hits heapArenaBytes.
+ //
+ // This field is sufficient to determine if an allocation
+ // needs to be zeroed because the page allocator follows an
+ // address-ordered first-fit policy.
+ //
+ // Read atomically and written with an atomic CAS.
+ zeroedBase uintptr
+}
+
+// arenaHint is a hint for where to grow the heap arenas. See
+// mheap_.arenaHints.
+type arenaHint struct {
+ _ sys.NotInHeap
+ addr uintptr
+ down bool
+ next *arenaHint
+}
+
+// An mspan is a run of pages.
+//
+// When a mspan is in the heap free treap, state == mSpanFree
+// and heapmap(s->start) == span, heapmap(s->start+s->npages-1) == span.
+// If the mspan is in the heap scav treap, then in addition to the
+// above scavenged == true. scavenged == false in all other cases.
+//
+// When a mspan is allocated, state == mSpanInUse or mSpanManual
+// and heapmap(i) == span for all s->start <= i < s->start+s->npages.
+
+// Every mspan is in one doubly-linked list, either in the mheap's
+// busy list or one of the mcentral's span lists.
+
+// An mspan representing actual memory has state mSpanInUse,
+// mSpanManual, or mSpanFree. Transitions between these states are
+// constrained as follows:
+//
+// - A span may transition from free to in-use or manual during any GC
+// phase.
+//
+// - During sweeping (gcphase == _GCoff), a span may transition from
+// in-use to free (as a result of sweeping) or manual to free (as a
+// result of stacks being freed).
+//
+// - During GC (gcphase != _GCoff), a span *must not* transition from
+// manual or in-use to free. Because concurrent GC may read a pointer
+// and then look up its span, the span state must be monotonic.
+//
+// Setting mspan.state to mSpanInUse or mSpanManual must be done
+// atomically and only after all other span fields are valid.
+// Likewise, if inspecting a span is contingent on it being
+// mSpanInUse, the state should be loaded atomically and checked
+// before depending on other fields. This allows the garbage collector
+// to safely deal with potentially invalid pointers, since resolving
+// such pointers may race with a span being allocated.
+type mSpanState uint8
+
+const (
+ mSpanDead mSpanState = iota
+ mSpanInUse // allocated for garbage collected heap
+ mSpanManual // allocated for manual management (e.g., stack allocator)
+)
+
+// mSpanStateNames are the names of the span states, indexed by
+// mSpanState.
+var mSpanStateNames = []string{
+ "mSpanDead",
+ "mSpanInUse",
+ "mSpanManual",
+}
+
+// mSpanStateBox holds an atomic.Uint8 to provide atomic operations on
+// an mSpanState. This is a separate type to disallow accidental comparison
+// or assignment with mSpanState.
+type mSpanStateBox struct {
+ s atomic.Uint8
+}
+
+// It is nosplit to match get, below.
+
+//go:nosplit
+func (b *mSpanStateBox) set(s mSpanState) {
+ b.s.Store(uint8(s))
+}
+
+// It is nosplit because it's called indirectly by typedmemclr,
+// which must not be preempted.
+
+//go:nosplit
+func (b *mSpanStateBox) get() mSpanState {
+ return mSpanState(b.s.Load())
+}
+
+// mSpanList heads a linked list of spans.
+type mSpanList struct {
+ _ sys.NotInHeap
+ first *mspan // first span in list, or nil if none
+ last *mspan // last span in list, or nil if none
+}
+
+type mspan struct {
+ _ sys.NotInHeap
+ next *mspan // next span in list, or nil if none
+ prev *mspan // previous span in list, or nil if none
+ list *mSpanList // For debugging. TODO: Remove.
+
+ startAddr uintptr // address of first byte of span aka s.base()
+ npages uintptr // number of pages in span
+
+ manualFreeList gclinkptr // list of free objects in mSpanManual spans
+
+ // freeindex is the slot index between 0 and nelems at which to begin scanning
+ // for the next free object in this span.
+ // Each allocation scans allocBits starting at freeindex until it encounters a 0
+ // indicating a free object. freeindex is then adjusted so that subsequent scans begin
+ // just past the newly discovered free object.
+ //
+ // If freeindex == nelems, this span has no free objects.
+ //
+ // allocBits is a bitmap of objects in this span.
+ // If n >= freeindex and allocBits[n/8] & (1<<(n%8)) is 0
+ // then object n is free;
+ // otherwise, object n is allocated. Bits starting at nelems are
+ // undefined and should never be referenced.
+ //
+ // Object n starts at address n*elemsize + (start << pageShift).
+ freeindex uintptr
+ // TODO: Look up nelems from sizeclass and remove this field if it
+ // helps performance.
+ nelems uintptr // number of objects in the span.
+
+ // Cache of the allocBits at freeindex. allocCache is shifted
+ // such that the lowest bit corresponds to the bit freeindex.
+ // allocCache holds the complement of allocBits, thus allowing
+ // ctz (count trailing zero) to use it directly.
+ // allocCache may contain bits beyond s.nelems; the caller must ignore
+ // these.
+ allocCache uint64
+
+ // allocBits and gcmarkBits hold pointers to a span's mark and
+ // allocation bits. The pointers are 8 byte aligned.
+ // There are three arenas where this data is held.
+ // free: Dirty arenas that are no longer accessed
+ // and can be reused.
+ // next: Holds information to be used in the next GC cycle.
+ // current: Information being used during this GC cycle.
+ // previous: Information being used during the last GC cycle.
+ // A new GC cycle starts with the call to finishsweep_m.
+ // finishsweep_m moves the previous arena to the free arena,
+ // the current arena to the previous arena, and
+ // the next arena to the current arena.
+ // The next arena is populated as the spans request
+ // memory to hold gcmarkBits for the next GC cycle as well
+ // as allocBits for newly allocated spans.
+ //
+ // The pointer arithmetic is done "by hand" instead of using
+ // arrays to avoid bounds checks along critical performance
+ // paths.
+ // The sweep will free the old allocBits and set allocBits to the
+ // gcmarkBits. The gcmarkBits are replaced with fresh, zeroed-out
+ // memory.
+ allocBits *gcBits
+ gcmarkBits *gcBits
+ pinnerBits *gcBits // bitmap for pinned objects; accessed atomically
+
+ // sweep generation:
+ // if sweepgen == h->sweepgen - 2, the span needs sweeping
+ // if sweepgen == h->sweepgen - 1, the span is currently being swept
+ // if sweepgen == h->sweepgen, the span is swept and ready to use
+ // if sweepgen == h->sweepgen + 1, the span was cached before sweep began and is still cached, and needs sweeping
+ // if sweepgen == h->sweepgen + 3, the span was swept and then cached and is still cached
+ // h->sweepgen is incremented by 2 after every GC
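+ // For example (illustrative values only): if h->sweepgen is 6, a span
+ // with sweepgen 4 needs sweeping, 5 is being swept, 6 is swept and
+ // ready to use, 7 is cached and still needs sweeping, and 9 was swept
+ // and then cached.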
+
+ sweepgen uint32
+ divMul uint32 // for divide by elemsize
+ allocCount uint16 // number of allocated objects
+ spanclass spanClass // size class and noscan (uint8)
+ state mSpanStateBox // mSpanInUse etc; accessed atomically (get/set methods)
+ needzero uint8 // needs to be zeroed before allocation
+ isUserArenaChunk bool // whether or not this span represents a user arena
+ allocCountBeforeCache uint16 // a copy of allocCount that is stored just before this span is cached
+ elemsize uintptr // computed from sizeclass or from npages
+ limit uintptr // end of data in span
+ speciallock mutex // guards specials list and changes to pinnerBits
+ specials *special // linked list of special records sorted by offset.
+ userArenaChunkFree addrRange // interval for managing chunk allocation
+
+ // freeIndexForScan is like freeindex, except that freeindex is
+ // used by the allocator whereas freeIndexForScan is used by the
+ // GC scanner. They are two fields so that the GC sees the object
+ // is allocated only when the object and the heap bits are
+ // initialized (see also the assignment of freeIndexForScan in
+ // mallocgc, and issue 54596).
+ freeIndexForScan uintptr
+}
+
+func (s *mspan) base() uintptr {
+ return s.startAddr
+}
+
+func (s *mspan) layout() (size, n, total uintptr) {
+ total = s.npages << _PageShift
+ size = s.elemsize
+ if size > 0 {
+ n = total / size
+ }
+ return
+}
+
+// recordspan adds a newly allocated span to h.allspans.
+//
+// This only happens the first time a span is allocated from
+// mheap.spanalloc (it is not called when a span is reused).
+//
+// Write barriers are disallowed here because it can be called from
+// gcWork when allocating new workbufs. However, because it's an
+// indirect call from the fixalloc initializer, the compiler can't see
+// this.
+//
+// The heap lock must be held.
+//
+//go:nowritebarrierrec
+func recordspan(vh unsafe.Pointer, p unsafe.Pointer) {
+ h := (*mheap)(vh)
+ s := (*mspan)(p)
+
+ assertLockHeld(&h.lock)
+
+ if len(h.allspans) >= cap(h.allspans) {
+ n := 64 * 1024 / goarch.PtrSize
+ if n < cap(h.allspans)*3/2 {
+ n = cap(h.allspans) * 3 / 2
+ }
+ var new []*mspan
+ sp := (*slice)(unsafe.Pointer(&new))
+ sp.array = sysAlloc(uintptr(n)*goarch.PtrSize, &memstats.other_sys)
+ if sp.array == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ sp.len = len(h.allspans)
+ sp.cap = n
+ if len(h.allspans) > 0 {
+ copy(new, h.allspans)
+ }
+ oldAllspans := h.allspans
+ *(*notInHeapSlice)(unsafe.Pointer(&h.allspans)) = *(*notInHeapSlice)(unsafe.Pointer(&new))
+ if len(oldAllspans) != 0 {
+ sysFree(unsafe.Pointer(&oldAllspans[0]), uintptr(cap(oldAllspans))*unsafe.Sizeof(oldAllspans[0]), &memstats.other_sys)
+ }
+ }
+ h.allspans = h.allspans[:len(h.allspans)+1]
+ h.allspans[len(h.allspans)-1] = s
+}
+
+// A spanClass represents the size class and noscan-ness of a span.
+//
+// Each size class has a noscan spanClass and a scan spanClass. The
+// noscan spanClass contains only noscan objects, which do not contain
+// pointers and thus do not need to be scanned by the garbage
+// collector.
+type spanClass uint8
+
+const (
+ numSpanClasses = _NumSizeClasses << 1
+ tinySpanClass = spanClass(tinySizeClass<<1 | 1)
+)
+
+func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
+ return spanClass(sizeclass<<1) | spanClass(bool2int(noscan))
+}
+
+func (sc spanClass) sizeclass() int8 {
+ return int8(sc >> 1)
+}
+
+func (sc spanClass) noscan() bool {
+ return sc&1 != 0
+}
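+
+// Worked example (editorial sketch, not part of the upstream source):
+// makeSpanClass(5, true) encodes size class 5 with the noscan bit set,
+// i.e. spanClass(5<<1|1) == 11, for which sizeclass() == 5 and
+// noscan() == true. The low bit carries the noscan flag and the
+// remaining bits carry the size class.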
+
+// arenaIndex returns the index into mheap_.arenas of the arena
+// containing metadata for p. This index combines an index into the
+// L1 map with an index into the L2 map and should be used as
+// mheap_.arenas[ai.l1()][ai.l2()].
+//
+// If p is outside the range of valid heap addresses, either l1() or
+// l2() will be out of bounds.
+//
+// It is nosplit because it's called by spanOf and several other
+// nosplit functions.
+//
+//go:nosplit
+func arenaIndex(p uintptr) arenaIdx {
+ return arenaIdx((p - arenaBaseOffset) / heapArenaBytes)
+}
+
+// arenaBase returns the low address of the region covered by heap
+// arena i.
+func arenaBase(i arenaIdx) uintptr {
+ return uintptr(i)*heapArenaBytes + arenaBaseOffset
+}
+
+type arenaIdx uint
+
+// l1 returns the "l1" portion of an arenaIdx.
+//
+// Marked nosplit because it's called by spanOf and other nosplit
+// functions.
+//
+//go:nosplit
+func (i arenaIdx) l1() uint {
+ if arenaL1Bits == 0 {
+ // Let the compiler optimize this away if there's no
+ // L1 map.
+ return 0
+ } else {
+ return uint(i) >> arenaL1Shift
+ }
+}
+
+// l2 returns the "l2" portion of an arenaIdx.
+//
+// Marked nosplit because it's called by spanOf and other nosplit
+// functions.
+//
+//go:nosplit
+func (i arenaIdx) l2() uint {
+ if arenaL1Bits == 0 {
+ return uint(i)
+ } else {
+ return uint(i) & (1<<arenaL2Bits - 1)
+ }
+}
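+
+// Editorial illustration (hypothetical values, not upstream text):
+// assuming arenaBaseOffset == 0 and heapArenaBytes == 4MiB, a pointer
+// at 10MiB has arenaIndex(p) == 2, and arenaBase(2) == 8MiB is the
+// start of the [8MiB, 12MiB) region that arena 2 covers; l1() and l2()
+// then split that index according to the platform's two-level map shape.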
+
+// inheap reports whether b is a pointer into a (potentially dead) heap object.
+// It returns false for pointers into mSpanManual spans.
+// Non-preemptible because it is used by write barriers.
+//
+//go:nowritebarrier
+//go:nosplit
+func inheap(b uintptr) bool {
+ return spanOfHeap(b) != nil
+}
+
+// inHeapOrStack is a variant of inheap that returns true for pointers
+// into any allocated heap span.
+//
+//go:nowritebarrier
+//go:nosplit
+func inHeapOrStack(b uintptr) bool {
+ s := spanOf(b)
+ if s == nil || b < s.base() {
+ return false
+ }
+ switch s.state.get() {
+ case mSpanInUse, mSpanManual:
+ return b < s.limit
+ default:
+ return false
+ }
+}
+
+// spanOf returns the span of p. If p does not point into the heap
+// arena or no span has ever contained p, spanOf returns nil.
+//
+// If p does not point to allocated memory, this may return a non-nil
+// span that does *not* contain p. If this is a possibility, the
+// caller should either call spanOfHeap or check the span bounds
+// explicitly.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOf(p uintptr) *mspan {
+ // This function looks big, but we use a lot of constant
+ // folding around arenaL1Bits to get it under the inlining
+ // budget. Also, many of the checks here are safety checks
+ // that Go needs to do anyway, so the generated code is quite
+ // short.
+ ri := arenaIndex(p)
+ if arenaL1Bits == 0 {
+ // If there's no L1, then ri.l1() can't be out of bounds but ri.l2() can.
+ if ri.l2() >= uint(len(mheap_.arenas[0])) {
+ return nil
+ }
+ } else {
+ // If there's an L1, then ri.l1() can be out of bounds but ri.l2() can't.
+ if ri.l1() >= uint(len(mheap_.arenas)) {
+ return nil
+ }
+ }
+ l2 := mheap_.arenas[ri.l1()]
+ if arenaL1Bits != 0 && l2 == nil { // Should never happen if there's no L1.
+ return nil
+ }
+ ha := l2[ri.l2()]
+ if ha == nil {
+ return nil
+ }
+ return ha.spans[(p/pageSize)%pagesPerArena]
+}
+
+// spanOfUnchecked is equivalent to spanOf, but the caller must ensure
+// that p points into an allocated heap arena.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOfUnchecked(p uintptr) *mspan {
+ ai := arenaIndex(p)
+ return mheap_.arenas[ai.l1()][ai.l2()].spans[(p/pageSize)%pagesPerArena]
+}
+
+// spanOfHeap is like spanOf, but returns nil if p does not point to a
+// heap object.
+//
+// Must be nosplit because it has callers that are nosplit.
+//
+//go:nosplit
+func spanOfHeap(p uintptr) *mspan {
+ s := spanOf(p)
+ // s is nil if it's never been allocated. Otherwise, we check
+ // its state first because we don't trust this pointer, so we
+ // have to synchronize with span initialization. Then, it's
+ // still possible we picked up a stale span pointer, so we
+ // have to check the span's bounds.
+ if s == nil || s.state.get() != mSpanInUse || p < s.base() || p >= s.limit {
+ return nil
+ }
+ return s
+}
+
+// pageIndexOf returns the arena, page index, and page mask for pointer p.
+// The caller must ensure p is in the heap.
+func pageIndexOf(p uintptr) (arena *heapArena, pageIdx uintptr, pageMask uint8) {
+ ai := arenaIndex(p)
+ arena = mheap_.arenas[ai.l1()][ai.l2()]
+ pageIdx = ((p / pageSize) / 8) % uintptr(len(arena.pageInUse))
+ pageMask = byte(1 << ((p / pageSize) % 8))
+ return
+}
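+
+// Editorial sketch (hypothetical values, not upstream text): for a
+// pointer whose page number p/pageSize is 21, pageIdx selects byte
+// 21/8 == 2 of arena.pageInUse and pageMask is 1<<(21%8) == 1<<5,
+// so the page's in-use state is bit 5 of that byte.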
+
+// Initialize the heap.
+func (h *mheap) init() {
+ lockInit(&h.lock, lockRankMheap)
+ lockInit(&h.speciallock, lockRankMheapSpecial)
+
+ h.spanalloc.init(unsafe.Sizeof(mspan{}), recordspan, unsafe.Pointer(h), &memstats.mspan_sys)
+ h.cachealloc.init(unsafe.Sizeof(mcache{}), nil, nil, &memstats.mcache_sys)
+ h.specialfinalizeralloc.init(unsafe.Sizeof(specialfinalizer{}), nil, nil, &memstats.other_sys)
+ h.specialprofilealloc.init(unsafe.Sizeof(specialprofile{}), nil, nil, &memstats.other_sys)
+ h.specialReachableAlloc.init(unsafe.Sizeof(specialReachable{}), nil, nil, &memstats.other_sys)
+ h.specialPinCounterAlloc.init(unsafe.Sizeof(specialPinCounter{}), nil, nil, &memstats.other_sys)
+ h.arenaHintAlloc.init(unsafe.Sizeof(arenaHint{}), nil, nil, &memstats.other_sys)
+
+ // Don't zero mspan allocations. Background sweeping can
+ // inspect a span concurrently with allocating it, so it's
+ // important that the span's sweepgen survive across freeing
+ // and re-allocating a span to prevent background sweeping
+ // from improperly cas'ing it from 0.
+ //
+ // This is safe because mspan contains no heap pointers.
+ h.spanalloc.zero = false
+
+ // h->mapcache needs no init
+
+ for i := range h.central {
+ h.central[i].mcentral.init(spanClass(i))
+ }
+
+ h.pages.init(&h.lock, &memstats.gcMiscSys, false)
+}
+
+// reclaim sweeps and reclaims at least npage pages into the heap.
+// It is called before allocating npage pages to keep growth in check.
+//
+// reclaim implements the page-reclaimer half of the sweeper.
+//
+// h.lock must NOT be held.
+func (h *mheap) reclaim(npage uintptr) {
+ // TODO(austin): Half of the time spent freeing spans is in
+ // locking/unlocking the heap (even with low contention). We
+ // could make the slow path here several times faster by
+ // batching heap frees.
+
+ // Bail early if there's no more reclaim work.
+ if h.reclaimIndex.Load() >= 1<<63 {
+ return
+ }
+
+ // Disable preemption so the GC can't start while we're
+ // sweeping, so we can read h.sweepArenas, and so
+ // traceGCSweepStart/Done pair on the P.
+ mp := acquirem()
+
+ if traceEnabled() {
+ traceGCSweepStart()
+ }
+
+ arenas := h.sweepArenas
+ locked := false
+ for npage > 0 {
+ // Pull from accumulated credit first.
+ if credit := h.reclaimCredit.Load(); credit > 0 {
+ take := credit
+ if take > npage {
+ // Take only what we need.
+ take = npage
+ }
+ if h.reclaimCredit.CompareAndSwap(credit, credit-take) {
+ npage -= take
+ }
+ continue
+ }
+
+ // Claim a chunk of work.
+ idx := uintptr(h.reclaimIndex.Add(pagesPerReclaimerChunk) - pagesPerReclaimerChunk)
+ if idx/pagesPerArena >= uintptr(len(arenas)) {
+ // Page reclaiming is done.
+ h.reclaimIndex.Store(1 << 63)
+ break
+ }
+
+ if !locked {
+ // Lock the heap for reclaimChunk.
+ lock(&h.lock)
+ locked = true
+ }
+
+ // Scan this chunk.
+ nfound := h.reclaimChunk(arenas, idx, pagesPerReclaimerChunk)
+ if nfound <= npage {
+ npage -= nfound
+ } else {
+ // Put spare pages toward global credit.
+ h.reclaimCredit.Add(nfound - npage)
+ npage = 0
+ }
+ }
+ if locked {
+ unlock(&h.lock)
+ }
+
+ if traceEnabled() {
+ traceGCSweepDone()
+ }
+ releasem(mp)
+}
+
+// reclaimChunk sweeps unmarked spans that start at page indexes [pageIdx, pageIdx+n).
+// It returns the number of pages returned to the heap.
+//
+// h.lock must be held and the caller must be non-preemptible. Note: h.lock may be
+// temporarily unlocked and re-locked in order to do sweeping or if tracing is
+// enabled.
+func (h *mheap) reclaimChunk(arenas []arenaIdx, pageIdx, n uintptr) uintptr {
+ // The heap lock must be held because this accesses the
+ // heapArena.spans arrays using potentially non-live pointers.
+ // In particular, if a span were freed and merged concurrently
+ // with this probing heapArena.spans, it would be possible to
+ // observe arbitrary, stale span pointers.
+ assertLockHeld(&h.lock)
+
+ n0 := n
+ var nFreed uintptr
+ sl := sweep.active.begin()
+ if !sl.valid {
+ return 0
+ }
+ for n > 0 {
+ ai := arenas[pageIdx/pagesPerArena]
+ ha := h.arenas[ai.l1()][ai.l2()]
+
+ // Get a chunk of the bitmap to work on.
+ arenaPage := uint(pageIdx % pagesPerArena)
+ inUse := ha.pageInUse[arenaPage/8:]
+ marked := ha.pageMarks[arenaPage/8:]
+ if uintptr(len(inUse)) > n/8 {
+ inUse = inUse[:n/8]
+ marked = marked[:n/8]
+ }
+
+ // Scan this bitmap chunk for spans that are in-use
+ // but have no marked objects on them.
+ for i := range inUse {
+ inUseUnmarked := atomic.Load8(&inUse[i]) &^ marked[i]
+ if inUseUnmarked == 0 {
+ continue
+ }
+
+ for j := uint(0); j < 8; j++ {
+ if inUseUnmarked&(1<<j) != 0 {
+ s := ha.spans[arenaPage+uint(i)*8+j]
+ if s, ok := sl.tryAcquire(s); ok {
+ npages := s.npages
+ unlock(&h.lock)
+ if s.sweep(false) {
+ nFreed += npages
+ }
+ lock(&h.lock)
+ // Reload inUse. It's possible nearby
+ // spans were freed when we dropped the
+ // lock and we don't want to get stale
+ // pointers from the spans array.
+ inUseUnmarked = atomic.Load8(&inUse[i]) &^ marked[i]
+ }
+ }
+ }
+ }
+
+ // Advance.
+ pageIdx += uintptr(len(inUse) * 8)
+ n -= uintptr(len(inUse) * 8)
+ }
+ sweep.active.end(sl)
+ if traceEnabled() {
+ unlock(&h.lock)
+ // Account for pages scanned but not reclaimed.
+ traceGCSweepSpan((n0 - nFreed) * pageSize)
+ lock(&h.lock)
+ }
+
+ assertLockHeld(&h.lock) // Must be locked on return.
+ return nFreed
+}
+
+// spanAllocType represents the type of allocation to make, or
+// the type of allocation to be freed.
+type spanAllocType uint8
+
+const (
+ spanAllocHeap spanAllocType = iota // heap span
+ spanAllocStack // stack span
+ spanAllocPtrScalarBits // unrolled GC prog bitmap span
+ spanAllocWorkBuf // work buf span
+)
+
+// manual returns true if the span allocation is manually managed.
+func (s spanAllocType) manual() bool {
+ return s != spanAllocHeap
+}
+
+// alloc allocates a new span of npage pages from the GC'd heap.
+//
+// spanclass indicates the span's size class and scannability.
+//
+// Returns a span that has been fully initialized. span.needzero indicates
+// whether the span has been zeroed. Note that it may not be.
+func (h *mheap) alloc(npages uintptr, spanclass spanClass) *mspan {
+ // Don't do any operations that lock the heap on the G stack.
+ // It might trigger stack growth, and the stack growth code needs
+ // to be able to allocate heap.
+ var s *mspan
+ systemstack(func() {
+ // To prevent excessive heap growth, before allocating n pages
+ // we need to sweep and reclaim at least n pages.
+ if !isSweepDone() {
+ h.reclaim(npages)
+ }
+ s = h.allocSpan(npages, spanAllocHeap, spanclass)
+ })
+ return s
+}
+
+// allocManual allocates a manually-managed span of npage pages.
+// allocManual returns nil if allocation fails.
+//
+// allocManual adds the bytes used to *stat, which should be a
+// memstats in-use field. Unlike allocations in the GC'd heap, the
+// allocation does *not* count toward heapInUse.
+//
+// The memory backing the returned span may not be zeroed if
+// span.needzero is set.
+//
+// allocManual must be called on the system stack because it may
+// acquire the heap lock via allocSpan. See mheap for details.
+//
+// If new code is written to call allocManual, do NOT use an
+// existing spanAllocType value and instead declare a new one.
+//
+//go:systemstack
+func (h *mheap) allocManual(npages uintptr, typ spanAllocType) *mspan {
+ if !typ.manual() {
+ throw("manual span allocation called with non-manually-managed type")
+ }
+ return h.allocSpan(npages, typ, 0)
+}
+
+// setSpans modifies the span map so [spanOf(base), spanOf(base+npage*pageSize))
+// is s.
+func (h *mheap) setSpans(base, npage uintptr, s *mspan) {
+ p := base / pageSize
+ ai := arenaIndex(base)
+ ha := h.arenas[ai.l1()][ai.l2()]
+ for n := uintptr(0); n < npage; n++ {
+ i := (p + n) % pagesPerArena
+ if i == 0 {
+ ai = arenaIndex(base + n*pageSize)
+ ha = h.arenas[ai.l1()][ai.l2()]
+ }
+ ha.spans[i] = s
+ }
+}
+
+// allocNeedsZero checks if the region of address space [base, base+npage*pageSize),
+// assumed to be allocated, needs to be zeroed, updating heap arena metadata for
+// future allocations.
+//
+// This must be called each time pages are allocated from the heap, even if the page
+// allocator can otherwise prove the memory it's allocating is already zero because
+// the pages are fresh from the operating system. It updates heapArena metadata that is
+// critical for future page allocations.
+//
+// There are no locking constraints on this method.
+func (h *mheap) allocNeedsZero(base, npage uintptr) (needZero bool) {
+ for npage > 0 {
+ ai := arenaIndex(base)
+ ha := h.arenas[ai.l1()][ai.l2()]
+
+ zeroedBase := atomic.Loaduintptr(&ha.zeroedBase)
+ arenaBase := base % heapArenaBytes
+ if arenaBase < zeroedBase {
+ // We extended into the non-zeroed part of the
+ // arena, so this region needs to be zeroed before use.
+ //
+ // zeroedBase is monotonically increasing, so if we see this now then
+ // we can be sure we need to zero this memory region.
+ //
+ // We still need to update zeroedBase for this arena, and
+ // potentially more arenas.
+ needZero = true
+ }
+ // We may observe arenaBase > zeroedBase if we're racing with one or more
+ // allocations which are acquiring memory directly before us in the address
+ // space. But, because we know no one else is acquiring *this* memory, it's
+ // still safe to not zero.
+
+  // Compute how far we extend into the arena, capped
+ // at heapArenaBytes.
+ arenaLimit := arenaBase + npage*pageSize
+ if arenaLimit > heapArenaBytes {
+ arenaLimit = heapArenaBytes
+ }
+ // Increase ha.zeroedBase so it's >= arenaLimit.
+ // We may be racing with other updates.
+ for arenaLimit > zeroedBase {
+ if atomic.Casuintptr(&ha.zeroedBase, zeroedBase, arenaLimit) {
+ break
+ }
+ zeroedBase = atomic.Loaduintptr(&ha.zeroedBase)
+ // Double check basic conditions of zeroedBase.
+ if zeroedBase <= arenaLimit && zeroedBase > arenaBase {
+ // The zeroedBase moved into the space we were trying to
+ // claim. That's very bad, and indicates someone allocated
+ // the same region we did.
+ throw("potentially overlapping in-use allocations detected")
+ }
+ }
+
+ // Move base forward and subtract from npage to move into
+ // the next arena, or finish.
+ base += arenaLimit - arenaBase
+ npage -= (arenaLimit - arenaBase) / pageSize
+ }
+ return
+}
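+
+// Editorial illustration (hypothetical values, not upstream text): if
+// ha.zeroedBase is 1MiB into an arena and a new allocation spans
+// [512KiB, 2MiB) of that arena, arenaBase (512KiB) is below zeroedBase,
+// so needZero is reported true, and the CAS loop then advances
+// zeroedBase to 2MiB so later allocations above it can skip zeroing.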
+
+// tryAllocMSpan attempts to allocate an mspan object from
+// the P-local cache, but may fail.
+//
+// h.lock need not be held.
+//
+// The caller must ensure that its P won't change underneath
+// it during this function. Currently we enforce this by requiring
+// that the function run on the system stack, because that's
+// the only place it is used now. In the future, this requirement
+// may be relaxed if its use is necessary elsewhere.
+//
+//go:systemstack
+func (h *mheap) tryAllocMSpan() *mspan {
+ pp := getg().m.p.ptr()
+ // If we don't have a p or the cache is empty, we can't do
+ // anything here.
+ if pp == nil || pp.mspancache.len == 0 {
+ return nil
+ }
+ // Pull off the last entry in the cache.
+ s := pp.mspancache.buf[pp.mspancache.len-1]
+ pp.mspancache.len--
+ return s
+}
+
+// allocMSpanLocked allocates an mspan object.
+//
+// h.lock must be held.
+//
+// allocMSpanLocked must be called on the system stack because
+// its caller holds the heap lock. See mheap for details.
+// Running on the system stack also ensures that we won't
+// switch Ps during this function. See tryAllocMSpan for details.
+//
+//go:systemstack
+func (h *mheap) allocMSpanLocked() *mspan {
+ assertLockHeld(&h.lock)
+
+ pp := getg().m.p.ptr()
+ if pp == nil {
+ // We don't have a p so just do the normal thing.
+ return (*mspan)(h.spanalloc.alloc())
+ }
+ // Refill the cache if necessary.
+ if pp.mspancache.len == 0 {
+ const refillCount = len(pp.mspancache.buf) / 2
+ for i := 0; i < refillCount; i++ {
+ pp.mspancache.buf[i] = (*mspan)(h.spanalloc.alloc())
+ }
+ pp.mspancache.len = refillCount
+ }
+ // Pull off the last entry in the cache.
+ s := pp.mspancache.buf[pp.mspancache.len-1]
+ pp.mspancache.len--
+ return s
+}
+
+// freeMSpanLocked frees an mspan object.
+//
+// h.lock must be held.
+//
+// freeMSpanLocked must be called on the system stack because
+// its caller holds the heap lock. See mheap for details.
+// Running on the system stack also ensures that we won't
+// switch Ps during this function. See tryAllocMSpan for details.
+//
+//go:systemstack
+func (h *mheap) freeMSpanLocked(s *mspan) {
+ assertLockHeld(&h.lock)
+
+ pp := getg().m.p.ptr()
+ // First try to free the mspan directly to the cache.
+ if pp != nil && pp.mspancache.len < len(pp.mspancache.buf) {
+ pp.mspancache.buf[pp.mspancache.len] = s
+ pp.mspancache.len++
+ return
+ }
+ // Failing that (or if we don't have a p), just free it to
+ // the heap.
+ h.spanalloc.free(unsafe.Pointer(s))
+}
+
+// allocSpan allocates an mspan which owns npages worth of memory.
+//
+// If typ.manual() == false, allocSpan allocates a heap span of class spanclass
+// and updates heap accounting. If manual == true, allocSpan allocates a
+// manually-managed span (spanclass is ignored), and the caller is
+// responsible for any accounting related to its use of the span. Either
+// way, allocSpan will atomically add the bytes in the newly allocated
+// span to *sysStat.
+//
+// The returned span is fully initialized.
+//
+// h.lock must not be held.
+//
+// allocSpan must be called on the system stack both because it acquires
+// the heap lock and because it must block GC transitions.
+//
+//go:systemstack
+func (h *mheap) allocSpan(npages uintptr, typ spanAllocType, spanclass spanClass) (s *mspan) {
+ // Function-global state.
+ gp := getg()
+ base, scav := uintptr(0), uintptr(0)
+ growth := uintptr(0)
+
+ // On some platforms we need to provide physical page aligned stack
+ // allocations. Where the page size is less than the physical page
+ // size, we already manage to do this by default.
+ needPhysPageAlign := physPageAlignedStacks && typ == spanAllocStack && pageSize < physPageSize
+
+ // If the allocation is small enough, try the page cache!
+ // The page cache does not support aligned allocations, so we cannot use
+ // it if we need to provide a physical page aligned stack allocation.
+ pp := gp.m.p.ptr()
+ if !needPhysPageAlign && pp != nil && npages < pageCachePages/4 {
+ c := &pp.pcache
+
+ // If the cache is empty, refill it.
+ if c.empty() {
+ lock(&h.lock)
+ *c = h.pages.allocToCache()
+ unlock(&h.lock)
+ }
+
+ // Try to allocate from the cache.
+ base, scav = c.alloc(npages)
+ if base != 0 {
+ s = h.tryAllocMSpan()
+ if s != nil {
+ goto HaveSpan
+ }
+ // We have a base but no mspan, so we need
+ // to lock the heap.
+ }
+ }
+
+ // For one reason or another, we couldn't get the
+ // whole job done without the heap lock.
+ lock(&h.lock)
+
+ if needPhysPageAlign {
+ // Overallocate by a physical page to allow for later alignment.
+ extraPages := physPageSize / pageSize
+
+ // Find a big enough region first, but then only allocate the
+ // aligned portion. We can't just allocate and then free the
+ // edges because we need to account for scavenged memory, and
+ // that's difficult with alloc.
+ //
+ // Note that we skip updates to searchAddr here. It's OK if
+ // it's stale and higher than normal; it'll operate correctly,
+ // just come with a performance cost.
+ base, _ = h.pages.find(npages + extraPages)
+ if base == 0 {
+ var ok bool
+ growth, ok = h.grow(npages + extraPages)
+ if !ok {
+ unlock(&h.lock)
+ return nil
+ }
+ base, _ = h.pages.find(npages + extraPages)
+ if base == 0 {
+ throw("grew heap, but no adequate free space found")
+ }
+ }
+ base = alignUp(base, physPageSize)
+ scav = h.pages.allocRange(base, npages)
+ }
+
+ if base == 0 {
+ // Try to acquire a base address.
+ base, scav = h.pages.alloc(npages)
+ if base == 0 {
+ var ok bool
+ growth, ok = h.grow(npages)
+ if !ok {
+ unlock(&h.lock)
+ return nil
+ }
+ base, scav = h.pages.alloc(npages)
+ if base == 0 {
+ throw("grew heap, but no adequate free space found")
+ }
+ }
+ }
+ if s == nil {
+ // We failed to get an mspan earlier, so grab
+ // one now that we have the heap lock.
+ s = h.allocMSpanLocked()
+ }
+ unlock(&h.lock)
+
+HaveSpan:
+ // Decide if we need to scavenge in response to what we just allocated.
+ // Specifically, we track the maximum amount of memory to scavenge of all
+ // the alternatives below, assuming that the maximum satisfies *all*
+ // conditions we check (e.g. if we need to scavenge X to satisfy the
+ // memory limit and Y to satisfy heap-growth scavenging, and Y > X, then
+ // it's fine to pick Y, because the memory limit is still satisfied).
+ //
+ // It's fine to do this after allocating because we expect any scavenged
+ // pages not to get touched until we return. Simultaneously, it's important
+ // to do this before calling sysUsed because that may commit address space.
+ bytesToScavenge := uintptr(0)
+ forceScavenge := false
+ if limit := gcController.memoryLimit.Load(); !gcCPULimiter.limiting() {
+ // Assist with scavenging to maintain the memory limit by the amount
+ // that we expect to page in.
+ inuse := gcController.mappedReady.Load()
+ // Be careful about overflow, especially with uintptrs. Even on 32-bit platforms
+ // someone can set a really big memory limit that isn't maxInt64.
+ if uint64(scav)+inuse > uint64(limit) {
+ bytesToScavenge = uintptr(uint64(scav) + inuse - uint64(limit))
+ forceScavenge = true
+ }
+ }
+ if goal := scavenge.gcPercentGoal.Load(); goal != ^uint64(0) && growth > 0 {
+ // We just caused a heap growth, so scavenge down what will soon be used.
+ // By scavenging inline we deal with the failure to allocate out of
+ // memory fragments by scavenging the memory fragments that are least
+ // likely to be re-used.
+ //
+ // Only bother with this because we're not using a memory limit. We don't
+ // care about heap growths as long as we're under the memory limit, and the
+  // previous check for scavenging already handles that.
+ if retained := heapRetained(); retained+uint64(growth) > goal {
+ // The scavenging algorithm requires the heap lock to be dropped so it
+ // can acquire it only sparingly. This is a potentially expensive operation
+ // so it frees up other goroutines to allocate in the meanwhile. In fact,
+ // they can make use of the growth we just created.
+ todo := growth
+ if overage := uintptr(retained + uint64(growth) - goal); todo > overage {
+ todo = overage
+ }
+ if todo > bytesToScavenge {
+ bytesToScavenge = todo
+ }
+ }
+ }
+ // There are a few very limited circumstances where we won't have a P here.
+ // It's OK to simply skip scavenging in these cases. Something else will notice
+ // and pick up the tab.
+ var now int64
+ if pp != nil && bytesToScavenge > 0 {
+ // Measure how long we spent scavenging and add that measurement to the assist
+ // time so we can track it for the GC CPU limiter.
+ //
+ // Limiter event tracking might be disabled if we end up here
+ // while on a mark worker.
+ start := nanotime()
+ track := pp.limiterEvent.start(limiterEventScavengeAssist, start)
+
+ // Scavenge, but back out if the limiter turns on.
+ released := h.pages.scavenge(bytesToScavenge, func() bool {
+ return gcCPULimiter.limiting()
+ }, forceScavenge)
+
+ mheap_.pages.scav.releasedEager.Add(released)
+
+ // Finish up accounting.
+ now = nanotime()
+ if track {
+ pp.limiterEvent.stop(limiterEventScavengeAssist, now)
+ }
+ scavenge.assistTime.Add(now - start)
+ }
+
+ // Initialize the span.
+ h.initSpan(s, typ, spanclass, base, npages)
+
+ // Commit and account for any scavenged memory that the span now owns.
+ nbytes := npages * pageSize
+ if scav != 0 {
+ // sysUsed all the pages that are actually available
+ // in the span since some of them might be scavenged.
+ sysUsed(unsafe.Pointer(base), nbytes, scav)
+ gcController.heapReleased.add(-int64(scav))
+ }
+ // Update stats.
+ gcController.heapFree.add(-int64(nbytes - scav))
+ if typ == spanAllocHeap {
+ gcController.heapInUse.add(int64(nbytes))
+ }
+ // Update consistent stats.
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.committed, int64(scav))
+ atomic.Xaddint64(&stats.released, -int64(scav))
+ switch typ {
+ case spanAllocHeap:
+ atomic.Xaddint64(&stats.inHeap, int64(nbytes))
+ case spanAllocStack:
+ atomic.Xaddint64(&stats.inStacks, int64(nbytes))
+ case spanAllocPtrScalarBits:
+ atomic.Xaddint64(&stats.inPtrScalarBits, int64(nbytes))
+ case spanAllocWorkBuf:
+ atomic.Xaddint64(&stats.inWorkBufs, int64(nbytes))
+ }
+ memstats.heapStats.release()
+
+ pageTraceAlloc(pp, now, base, npages)
+ return s
+}
+
+// initSpan initializes a blank span s which will represent the range
+// [base, base+npages*pageSize). typ is the type of span being allocated.
+func (h *mheap) initSpan(s *mspan, typ spanAllocType, spanclass spanClass, base, npages uintptr) {
+ // At this point, both s != nil and base != 0, and the heap
+ // lock is no longer held. Initialize the span.
+ s.init(base, npages)
+ if h.allocNeedsZero(base, npages) {
+ s.needzero = 1
+ }
+ nbytes := npages * pageSize
+ if typ.manual() {
+ s.manualFreeList = 0
+ s.nelems = 0
+ s.limit = s.base() + s.npages*pageSize
+ s.state.set(mSpanManual)
+ } else {
+ // We must set span properties before the span is published anywhere
+ // since we're not holding the heap lock.
+ s.spanclass = spanclass
+ if sizeclass := spanclass.sizeclass(); sizeclass == 0 {
+ s.elemsize = nbytes
+ s.nelems = 1
+ s.divMul = 0
+ } else {
+ s.elemsize = uintptr(class_to_size[sizeclass])
+ s.nelems = nbytes / s.elemsize
+ s.divMul = class_to_divmagic[sizeclass]
+ }
+
+ // Initialize mark and allocation structures.
+ s.freeindex = 0
+ s.freeIndexForScan = 0
+ s.allocCache = ^uint64(0) // all 1s indicating all free.
+ s.gcmarkBits = newMarkBits(s.nelems)
+ s.allocBits = newAllocBits(s.nelems)
+
+ // It's safe to access h.sweepgen without the heap lock because it's
+ // only ever updated with the world stopped and we run on the
+ // systemstack which blocks a STW transition.
+ atomic.Store(&s.sweepgen, h.sweepgen)
+
+ // Now that the span is filled in, set its state. This
+ // is a publication barrier for the other fields in
+ // the span. While valid pointers into this span
+ // should never be visible until the span is returned,
+ // if the garbage collector finds an invalid pointer,
+ // access to the span may race with initialization of
+ // the span. We resolve this race by atomically
+ // setting the state after the span is fully
+ // initialized, and atomically checking the state in
+ // any situation where a pointer is suspect.
+ s.state.set(mSpanInUse)
+ }
+
+ // Publish the span in various locations.
+
+ // This is safe to call without the lock held because the slots
+ // related to this span will only ever be read or modified by
+ // this thread until pointers into the span are published (and
+ // we execute a publication barrier at the end of this function
+ // before that happens) or pageInUse is updated.
+ h.setSpans(s.base(), npages, s)
+
+ if !typ.manual() {
+ // Mark in-use span in arena page bitmap.
+ //
+ // This publishes the span to the page sweeper, so
+ // it's imperative that the span be completely initialized
+ // prior to this line.
+ arena, pageIdx, pageMask := pageIndexOf(s.base())
+ atomic.Or8(&arena.pageInUse[pageIdx], pageMask)
+
+ // Update related page sweeper stats.
+ h.pagesInUse.Add(npages)
+ }
+
+ // Make sure the newly allocated span will be observed
+ // by the GC before pointers into the span are published.
+ publicationBarrier()
+}
+
+// Try to add at least npage pages of memory to the heap,
+// returning how much the heap grew by and whether it worked.
+//
+// h.lock must be held.
+func (h *mheap) grow(npage uintptr) (uintptr, bool) {
+ assertLockHeld(&h.lock)
+
+ // We must grow the heap in whole palloc chunks.
+ // We call sysMap below, but because we round up to
+ // pallocChunkPages, which is on the order of MiB
+ // (generally at least the huge page size), we
+ // won't be calling it too often.
+ ask := alignUp(npage, pallocChunkPages) * pageSize
+
+ totalGrowth := uintptr(0)
+ // This may overflow because ask could be very large
+ // and is otherwise unrelated to h.curArena.base.
+ end := h.curArena.base + ask
+ nBase := alignUp(end, physPageSize)
+ if nBase > h.curArena.end || /* overflow */ end < h.curArena.base {
+ // Not enough room in the current arena. Allocate more
+ // arena space. This may not be contiguous with the
+ // current arena, so we have to request the full ask.
+ av, asize := h.sysAlloc(ask, &h.arenaHints, true)
+ if av == nil {
+ inUse := gcController.heapFree.load() + gcController.heapReleased.load() + gcController.heapInUse.load()
+ print("runtime: out of memory: cannot allocate ", ask, "-byte block (", inUse, " in use)\n")
+ return 0, false
+ }
+
+ if uintptr(av) == h.curArena.end {
+ // The new space is contiguous with the old
+ // space, so just extend the current space.
+ h.curArena.end = uintptr(av) + asize
+ } else {
+ // The new space is discontiguous. Track what
+ // remains of the current space and switch to
+ // the new space. This should be rare.
+ if size := h.curArena.end - h.curArena.base; size != 0 {
+ // Transition this space from Reserved to Prepared and mark it
+ // as released since we'll be able to start using it after updating
+ // the page allocator and releasing the lock at any time.
+ sysMap(unsafe.Pointer(h.curArena.base), size, &gcController.heapReleased)
+ // Update stats.
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.released, int64(size))
+ memstats.heapStats.release()
+ // Update the page allocator's structures to make this
+ // space ready for allocation.
+ h.pages.grow(h.curArena.base, size)
+ totalGrowth += size
+ }
+ // Switch to the new space.
+ h.curArena.base = uintptr(av)
+ h.curArena.end = uintptr(av) + asize
+ }
+
+ // Recalculate nBase.
+ // We know this won't overflow, because sysAlloc returned
+ // a valid region starting at h.curArena.base which is at
+ // least ask bytes in size.
+ nBase = alignUp(h.curArena.base+ask, physPageSize)
+ }
+
+ // Grow into the current arena.
+ v := h.curArena.base
+ h.curArena.base = nBase
+
+ // Transition the space we're going to use from Reserved to Prepared.
+ //
+ // The allocation is always aligned to the heap arena
+ // size which is always > physPageSize, so it's safe to
+ // just add directly to heapReleased.
+ sysMap(unsafe.Pointer(v), nBase-v, &gcController.heapReleased)
+
+ // The memory just allocated counts as both released
+ // and idle, even though it's not yet backed by spans.
+ stats := memstats.heapStats.acquire()
+ atomic.Xaddint64(&stats.released, int64(nBase-v))
+ memstats.heapStats.release()
+
+ // Update the page allocator's structures to make this
+ // space ready for allocation.
+ h.pages.grow(v, nBase-v)
+ totalGrowth += nBase - v
+ return totalGrowth, true
+}
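+
+// Editorial illustration (hypothetical values, not upstream text):
+// assuming pallocChunkPages == 512 and an 8KiB page, a request for
+// npage == 3 rounds up to alignUp(3, 512) == 512 pages, so ask is
+// 512*8KiB == 4MiB and the heap grows by at least one full chunk.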
+
+// Free the span back into the heap.
+func (h *mheap) freeSpan(s *mspan) {
+ systemstack(func() {
+ pageTraceFree(getg().m.p.ptr(), 0, s.base(), s.npages)
+
+ lock(&h.lock)
+ if msanenabled {
+ // Tell msan that this entire span is no longer in use.
+ base := unsafe.Pointer(s.base())
+ bytes := s.npages << _PageShift
+ msanfree(base, bytes)
+ }
+ if asanenabled {
+ // Tell asan that this entire span is no longer in use.
+ base := unsafe.Pointer(s.base())
+ bytes := s.npages << _PageShift
+ asanpoison(base, bytes)
+ }
+ h.freeSpanLocked(s, spanAllocHeap)
+ unlock(&h.lock)
+ })
+}
+
+// freeManual frees a manually-managed span returned by allocManual.
+// typ must be the same as the spanAllocType passed to the allocManual that
+// allocated s.
+//
+// This must only be called when gcphase == _GCoff. See mSpanState for
+// an explanation.
+//
+// freeManual must be called on the system stack because it acquires
+// the heap lock. See mheap for details.
+//
+//go:systemstack
+func (h *mheap) freeManual(s *mspan, typ spanAllocType) {
+ pageTraceFree(getg().m.p.ptr(), 0, s.base(), s.npages)
+
+ s.needzero = 1
+ lock(&h.lock)
+ h.freeSpanLocked(s, typ)
+ unlock(&h.lock)
+}
+
+func (h *mheap) freeSpanLocked(s *mspan, typ spanAllocType) {
+ assertLockHeld(&h.lock)
+
+ switch s.state.get() {
+ case mSpanManual:
+ if s.allocCount != 0 {
+ throw("mheap.freeSpanLocked - invalid stack free")
+ }
+ case mSpanInUse:
+ if s.isUserArenaChunk {
+ throw("mheap.freeSpanLocked - invalid free of user arena chunk")
+ }
+ if s.allocCount != 0 || s.sweepgen != h.sweepgen {
+ print("mheap.freeSpanLocked - span ", s, " ptr ", hex(s.base()), " allocCount ", s.allocCount, " sweepgen ", s.sweepgen, "/", h.sweepgen, "\n")
+ throw("mheap.freeSpanLocked - invalid free")
+ }
+ h.pagesInUse.Add(-s.npages)
+
+ // Clear in-use bit in arena page bitmap.
+ arena, pageIdx, pageMask := pageIndexOf(s.base())
+ atomic.And8(&arena.pageInUse[pageIdx], ^pageMask)
+ default:
+ throw("mheap.freeSpanLocked - invalid span state")
+ }
+
+ // Update stats.
+ //
+ // Mirrors the code in allocSpan.
+ nbytes := s.npages * pageSize
+ gcController.heapFree.add(int64(nbytes))
+ if typ == spanAllocHeap {
+ gcController.heapInUse.add(-int64(nbytes))
+ }
+ // Update consistent stats.
+ stats := memstats.heapStats.acquire()
+ switch typ {
+ case spanAllocHeap:
+ atomic.Xaddint64(&stats.inHeap, -int64(nbytes))
+ case spanAllocStack:
+ atomic.Xaddint64(&stats.inStacks, -int64(nbytes))
+ case spanAllocPtrScalarBits:
+ atomic.Xaddint64(&stats.inPtrScalarBits, -int64(nbytes))
+ case spanAllocWorkBuf:
+ atomic.Xaddint64(&stats.inWorkBufs, -int64(nbytes))
+ }
+ memstats.heapStats.release()
+
+ // Mark the space as free.
+ h.pages.free(s.base(), s.npages)
+
+ // Free the span structure. We no longer have a use for it.
+ s.state.set(mSpanDead)
+ h.freeMSpanLocked(s)
+}
+
+// scavengeAll acquires the heap lock (blocking any additional
+// manipulation of the page allocator) and iterates over the whole
+// heap, scavenging every free page available.
+//
+// Must run on the system stack because it acquires the heap lock.
+//
+//go:systemstack
+func (h *mheap) scavengeAll() {
+ // Disallow malloc or panic while holding the heap lock. We do
+ // this here because this is a non-mallocgc entry-point to
+ // the mheap API.
+ gp := getg()
+ gp.m.mallocing++
+
+ // Force scavenge everything.
+ released := h.pages.scavenge(^uintptr(0), nil, true)
+
+ gp.m.mallocing--
+
+ if debug.scavtrace > 0 {
+ printScavTrace(0, released, true)
+ }
+}
+
+//go:linkname runtime_debug_freeOSMemory runtime/debug.freeOSMemory
+func runtime_debug_freeOSMemory() {
+ GC()
+ systemstack(func() { mheap_.scavengeAll() })
+}
+
+// Initialize a new span with the given start and npages.
+func (span *mspan) init(base uintptr, npages uintptr) {
+ // span is *not* zeroed.
+ span.next = nil
+ span.prev = nil
+ span.list = nil
+ span.startAddr = base
+ span.npages = npages
+ span.allocCount = 0
+ span.spanclass = 0
+ span.elemsize = 0
+ span.speciallock.key = 0
+ span.specials = nil
+ span.needzero = 0
+ span.freeindex = 0
+ span.freeIndexForScan = 0
+ span.allocBits = nil
+ span.gcmarkBits = nil
+ span.pinnerBits = nil
+ span.state.set(mSpanDead)
+ lockInit(&span.speciallock, lockRankMspanSpecial)
+}
+
+func (span *mspan) inList() bool {
+ return span.list != nil
+}
+
+// Initialize an empty doubly-linked list.
+func (list *mSpanList) init() {
+ list.first = nil
+ list.last = nil
+}
+
+func (list *mSpanList) remove(span *mspan) {
+ if span.list != list {
+ print("runtime: failed mSpanList.remove span.npages=", span.npages,
+ " span=", span, " prev=", span.prev, " span.list=", span.list, " list=", list, "\n")
+ throw("mSpanList.remove")
+ }
+ if list.first == span {
+ list.first = span.next
+ } else {
+ span.prev.next = span.next
+ }
+ if list.last == span {
+ list.last = span.prev
+ } else {
+ span.next.prev = span.prev
+ }
+ span.next = nil
+ span.prev = nil
+ span.list = nil
+}
+
+func (list *mSpanList) isEmpty() bool {
+ return list.first == nil
+}
+
+func (list *mSpanList) insert(span *mspan) {
+ if span.next != nil || span.prev != nil || span.list != nil {
+ println("runtime: failed mSpanList.insert", span, span.next, span.prev, span.list)
+ throw("mSpanList.insert")
+ }
+ span.next = list.first
+ if list.first != nil {
+ // The list contains at least one span; link it in.
+ // The last span in the list doesn't change.
+ list.first.prev = span
+ } else {
+ // The list contains no spans, so this is also the last span.
+ list.last = span
+ }
+ list.first = span
+ span.list = list
+}
+
+func (list *mSpanList) insertBack(span *mspan) {
+ if span.next != nil || span.prev != nil || span.list != nil {
+ println("runtime: failed mSpanList.insertBack", span, span.next, span.prev, span.list)
+ throw("mSpanList.insertBack")
+ }
+ span.prev = list.last
+ if list.last != nil {
+ // The list contains at least one span.
+ list.last.next = span
+ } else {
+ // The list contains no spans, so this is also the first span.
+ list.first = span
+ }
+ list.last = span
+ span.list = list
+}
+
+// takeAll removes all spans from other and inserts them at the front
+// of list.
+func (list *mSpanList) takeAll(other *mSpanList) {
+ if other.isEmpty() {
+ return
+ }
+
+ // Reparent everything in other to list.
+ for s := other.first; s != nil; s = s.next {
+ s.list = list
+ }
+
+ // Concatenate the lists.
+ if list.isEmpty() {
+ *list = *other
+ } else {
+ // Neither list is empty. Put other before list.
+ other.last.next = list.first
+ list.first.prev = other.last
+ list.first = other.first
+ }
+
+ other.first, other.last = nil, nil
+}
+
+const (
+ _KindSpecialFinalizer = 1
+ _KindSpecialProfile = 2
+ // _KindSpecialReachable is a special used for tracking
+ // reachability during testing.
+ _KindSpecialReachable = 3
+ // _KindSpecialPinCounter is a special used for objects that are pinned
+ // multiple times
+ _KindSpecialPinCounter = 4
+ // Note: The finalizer special must be first because if we're freeing
+ // an object, a finalizer special will cause the freeing operation
+ // to abort, and we want to keep the other special records around
+ // if that happens.
+)
+
+type special struct {
+ _ sys.NotInHeap
+ next *special // linked list in span
+ offset uint16 // span offset of object
+ kind byte // kind of special
+}
+
+// spanHasSpecials marks a span as having specials in the arena bitmap.
+func spanHasSpecials(s *mspan) {
+ arenaPage := (s.base() / pageSize) % pagesPerArena
+ ai := arenaIndex(s.base())
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ atomic.Or8(&ha.pageSpecials[arenaPage/8], uint8(1)<<(arenaPage%8))
+}
+
+// spanHasNoSpecials marks a span as having no specials in the arena bitmap.
+func spanHasNoSpecials(s *mspan) {
+ arenaPage := (s.base() / pageSize) % pagesPerArena
+ ai := arenaIndex(s.base())
+ ha := mheap_.arenas[ai.l1()][ai.l2()]
+ atomic.And8(&ha.pageSpecials[arenaPage/8], ^(uint8(1) << (arenaPage % 8)))
+}
+
+// Adds the special record s to the list of special records for
+// the object p. All fields of s should be filled in except for
+// offset & next, which this routine will fill in.
+// Returns true if the special was successfully added, false otherwise.
+// (The add will fail only if a record with the same p and s->kind
+// already exists.)
+func addspecial(p unsafe.Pointer, s *special) bool {
+ span := spanOfHeap(uintptr(p))
+ if span == nil {
+ throw("addspecial on invalid pointer")
+ }
+
+ // Ensure that the span is swept.
+ // Sweeping accesses the specials list w/o locks, so we have
+ // to synchronize with it. And it's just much safer.
+ mp := acquirem()
+ span.ensureSwept()
+
+ offset := uintptr(p) - span.base()
+ kind := s.kind
+
+ lock(&span.speciallock)
+
+ // Find splice point, check for existing record.
+ iter, exists := span.specialFindSplicePoint(offset, kind)
+ if !exists {
+ // Splice in record, fill in offset.
+ s.offset = uint16(offset)
+ s.next = *iter
+ *iter = s
+ spanHasSpecials(span)
+ }
+
+ unlock(&span.speciallock)
+ releasem(mp)
+ return !exists // already exists
+}
+
+// Removes the Special record of the given kind for the object p.
+// Returns the record if the record existed, nil otherwise.
+// The caller must FixAlloc_Free the result.
+func removespecial(p unsafe.Pointer, kind uint8) *special {
+ span := spanOfHeap(uintptr(p))
+ if span == nil {
+ throw("removespecial on invalid pointer")
+ }
+
+ // Ensure that the span is swept.
+ // Sweeping accesses the specials list w/o locks, so we have
+ // to synchronize with it. And it's just much safer.
+ mp := acquirem()
+ span.ensureSwept()
+
+ offset := uintptr(p) - span.base()
+
+ var result *special
+ lock(&span.speciallock)
+
+ iter, exists := span.specialFindSplicePoint(offset, kind)
+ if exists {
+ s := *iter
+ *iter = s.next
+ result = s
+ }
+ if span.specials == nil {
+ spanHasNoSpecials(span)
+ }
+ unlock(&span.speciallock)
+ releasem(mp)
+ return result
+}
+
+// Find a splice point in the sorted list and check for an already existing
+// record. Returns a pointer to the next-reference in the list predecessor.
+// Returns true if the referenced item is an exact match.
+func (span *mspan) specialFindSplicePoint(offset uintptr, kind byte) (**special, bool) {
+ // Find splice point, check for existing record.
+ iter := &span.specials
+ found := false
+ for {
+ s := *iter
+ if s == nil {
+ break
+ }
+ if offset == uintptr(s.offset) && kind == s.kind {
+ found = true
+ break
+ }
+ if offset < uintptr(s.offset) || (offset == uintptr(s.offset) && kind < s.kind) {
+ break
+ }
+ iter = &s.next
+ }
+ return iter, found
+}
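+
+// Editorial illustration (hypothetical values, not upstream text): the
+// specials list is kept sorted by (offset, kind), so searching for
+// offset 64, kind 2 in a list holding (64,1) and (128,1) stops at the
+// (128,1) entry and returns the splice point between the two, with
+// found == false because no exact (64,2) record exists yet.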
+
+// The described object has a finalizer set for it.
+//
+// specialfinalizer is allocated from non-GC'd memory, so any heap
+// pointers must be specially handled.
+type specialfinalizer struct {
+ _ sys.NotInHeap
+ special special
+ fn *funcval // May be a heap pointer.
+ nret uintptr
+ fint *_type // May be a heap pointer, but always live.
+ ot *ptrtype // May be a heap pointer, but always live.
+}
+
+// Adds a finalizer to the object p. Returns true if it succeeded.
+func addfinalizer(p unsafe.Pointer, f *funcval, nret uintptr, fint *_type, ot *ptrtype) bool {
+ lock(&mheap_.speciallock)
+ s := (*specialfinalizer)(mheap_.specialfinalizeralloc.alloc())
+ unlock(&mheap_.speciallock)
+ s.special.kind = _KindSpecialFinalizer
+ s.fn = f
+ s.nret = nret
+ s.fint = fint
+ s.ot = ot
+ if addspecial(p, &s.special) {
+ // This is responsible for maintaining the same
+ // GC-related invariants as markrootSpans in any
+ // situation where it's possible that markrootSpans
+ // has already run but mark termination hasn't yet.
+ if gcphase != _GCoff {
+ base, span, _ := findObject(uintptr(p), 0, 0)
+ mp := acquirem()
+ gcw := &mp.p.ptr().gcw
+ // Mark everything reachable from the object
+ // so it's retained for the finalizer.
+ if !span.spanclass.noscan() {
+ scanobject(base, gcw)
+ }
+ // Mark the finalizer itself, since the
+ // special isn't part of the GC'd heap.
+ scanblock(uintptr(unsafe.Pointer(&s.fn)), goarch.PtrSize, &oneptrmask[0], gcw, nil)
+ releasem(mp)
+ }
+ return true
+ }
+
+ // There was an old finalizer
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+ return false
+}
+
+// Removes the finalizer (if any) from the object p.
+func removefinalizer(p unsafe.Pointer) {
+ s := (*specialfinalizer)(unsafe.Pointer(removespecial(p, _KindSpecialFinalizer)))
+ if s == nil {
+ return // there wasn't a finalizer to remove
+ }
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+}
+
+// The described object is being heap profiled.
+type specialprofile struct {
+ _ sys.NotInHeap
+ special special
+ b *bucket
+}
+
+// Set the heap profile bucket associated with addr to b.
+func setprofilebucket(p unsafe.Pointer, b *bucket) {
+ lock(&mheap_.speciallock)
+ s := (*specialprofile)(mheap_.specialprofilealloc.alloc())
+ unlock(&mheap_.speciallock)
+ s.special.kind = _KindSpecialProfile
+ s.b = b
+ if !addspecial(p, &s.special) {
+ throw("setprofilebucket: profile already set")
+ }
+}
+
+// specialReachable tracks whether an object is reachable on the next
+// GC cycle. This is used by testing.
+type specialReachable struct {
+ special special
+ done bool
+ reachable bool
+}
+
+// specialPinCounter tracks whether an object is pinned multiple times.
+type specialPinCounter struct {
+ special special
+ counter uintptr
+}
+
+// specialsIter helps iterate over specials lists.
+type specialsIter struct {
+ pprev **special
+ s *special
+}
+
+func newSpecialsIter(span *mspan) specialsIter {
+ return specialsIter{&span.specials, span.specials}
+}
+
+func (i *specialsIter) valid() bool {
+ return i.s != nil
+}
+
+func (i *specialsIter) next() {
+ i.pprev = &i.s.next
+ i.s = *i.pprev
+}
+
+// unlinkAndNext removes the current special from the list and moves
+// the iterator to the next special. It returns the unlinked special.
+func (i *specialsIter) unlinkAndNext() *special {
+ cur := i.s
+ i.s = cur.next
+ *i.pprev = i.s
+ return cur
+}
+
+// freeSpecial performs any cleanup on special s and deallocates it.
+// s must already be unlinked from the specials list.
+func freeSpecial(s *special, p unsafe.Pointer, size uintptr) {
+ switch s.kind {
+ case _KindSpecialFinalizer:
+ sf := (*specialfinalizer)(unsafe.Pointer(s))
+ queuefinalizer(p, sf.fn, sf.nret, sf.fint, sf.ot)
+ lock(&mheap_.speciallock)
+ mheap_.specialfinalizeralloc.free(unsafe.Pointer(sf))
+ unlock(&mheap_.speciallock)
+ case _KindSpecialProfile:
+ sp := (*specialprofile)(unsafe.Pointer(s))
+ mProf_Free(sp.b, size)
+ lock(&mheap_.speciallock)
+ mheap_.specialprofilealloc.free(unsafe.Pointer(sp))
+ unlock(&mheap_.speciallock)
+ case _KindSpecialReachable:
+ sp := (*specialReachable)(unsafe.Pointer(s))
+ sp.done = true
+ // The creator frees these.
+ case _KindSpecialPinCounter:
+ lock(&mheap_.speciallock)
+ mheap_.specialPinCounterAlloc.free(unsafe.Pointer(s))
+ unlock(&mheap_.speciallock)
+ default:
+ throw("bad special kind")
+ panic("not reached")
+ }
+}
+
+// gcBits is an alloc/mark bitmap. This is always used as gcBits.x.
+type gcBits struct {
+ _ sys.NotInHeap
+ x uint8
+}
+
+// bytep returns a pointer to the n'th byte of b.
+func (b *gcBits) bytep(n uintptr) *uint8 {
+ return addb(&b.x, n)
+}
+
+// bitp returns a pointer to the byte containing bit n and a mask for
+// selecting that bit from *bytep.
+func (b *gcBits) bitp(n uintptr) (bytep *uint8, mask uint8) {
+ return b.bytep(n / 8), 1 << (n % 8)
+}
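+
+// Editorial illustration (not upstream text): bitp(10) returns a
+// pointer to byte 10/8 == 1 and mask 1<<(10%8) == 4, so bit 10 of the
+// bitmap is (*bytep & 4) != 0.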
+
+const gcBitsChunkBytes = uintptr(64 << 10)
+const gcBitsHeaderBytes = unsafe.Sizeof(gcBitsHeader{})
+
+type gcBitsHeader struct {
+ free uintptr // free is the index into bits of the next free byte.
+ next uintptr // *gcBits triggers recursive type bug. (issue 14620)
+}
+
+type gcBitsArena struct {
+ _ sys.NotInHeap
+ // gcBitsHeader // side step recursive type bug (issue 14620) by including fields by hand.
+ free uintptr // free is the index into bits of the next free byte; read/write atomically
+ next *gcBitsArena
+ bits [gcBitsChunkBytes - gcBitsHeaderBytes]gcBits
+}
+
+var gcBitsArenas struct {
+ lock mutex
+ free *gcBitsArena
+ next *gcBitsArena // Read atomically. Write atomically under lock.
+ current *gcBitsArena
+ previous *gcBitsArena
+}
+
+// tryAlloc allocates from b or returns nil if b does not have enough room.
+// This is safe to call concurrently.
+func (b *gcBitsArena) tryAlloc(bytes uintptr) *gcBits {
+ if b == nil || atomic.Loaduintptr(&b.free)+bytes > uintptr(len(b.bits)) {
+ return nil
+ }
+ // Try to allocate from this block.
+ end := atomic.Xadduintptr(&b.free, bytes)
+ if end > uintptr(len(b.bits)) {
+ return nil
+ }
+ // There was enough room.
+ start := end - bytes
+ return &b.bits[start]
+}
+
+// newMarkBits returns a pointer to 8 byte aligned bytes
+// to be used for a span's mark bits.
+func newMarkBits(nelems uintptr) *gcBits {
+ blocksNeeded := uintptr((nelems + 63) / 64)
+ bytesNeeded := blocksNeeded * 8
+
+ // Try directly allocating from the current head arena.
+ head := (*gcBitsArena)(atomic.Loadp(unsafe.Pointer(&gcBitsArenas.next)))
+ if p := head.tryAlloc(bytesNeeded); p != nil {
+ return p
+ }
+
+ // There's not enough room in the head arena. We may need to
+ // allocate a new arena.
+ lock(&gcBitsArenas.lock)
+ // Try the head arena again, since it may have changed. Now
+ // that we hold the lock, the list head can't change, but its
+ // free position still can.
+ if p := gcBitsArenas.next.tryAlloc(bytesNeeded); p != nil {
+ unlock(&gcBitsArenas.lock)
+ return p
+ }
+
+ // Allocate a new arena. This may temporarily drop the lock.
+ fresh := newArenaMayUnlock()
+ // If newArenaMayUnlock dropped the lock, another thread may
+ // have put a fresh arena on the "next" list. Try allocating
+ // from next again.
+ if p := gcBitsArenas.next.tryAlloc(bytesNeeded); p != nil {
+ // Put fresh back on the free list.
+ // TODO: Mark it "already zeroed"
+ fresh.next = gcBitsArenas.free
+ gcBitsArenas.free = fresh
+ unlock(&gcBitsArenas.lock)
+ return p
+ }
+
+ // Allocate from the fresh arena. We haven't linked it in yet, so
+ // this cannot race and is guaranteed to succeed.
+ p := fresh.tryAlloc(bytesNeeded)
+ if p == nil {
+ throw("markBits overflow")
+ }
+
+ // Add the fresh arena to the "next" list.
+ fresh.next = gcBitsArenas.next
+ atomic.StorepNoWB(unsafe.Pointer(&gcBitsArenas.next), unsafe.Pointer(fresh))
+
+ unlock(&gcBitsArenas.lock)
+ return p
+}
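+
+// Editorial illustration (hypothetical values, not upstream text): for
+// a span with nelems == 100, blocksNeeded is (100+63)/64 == 2 and
+// bytesNeeded is 16, i.e. mark bits are handed out in 8-byte (64-bit)
+// blocks rounded up to cover every object in the span.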
+
+// newAllocBits returns a pointer to 8 byte aligned bytes
+// to be used for this span's alloc bits.
+// newAllocBits is used to provide newly initialized spans with
+// allocation bits. For spans that are not being initialized, the
+// mark bits are repurposed as allocation bits when
+// the span is swept.
+func newAllocBits(nelems uintptr) *gcBits {
+ return newMarkBits(nelems)
+}
+
+// nextMarkBitArenaEpoch establishes a new epoch for the arenas
+// holding the mark bits. The arenas are named relative to the
+// current GC cycle, which is demarcated by the call to finishsweep_m.
+//
+// All current spans have been swept.
+// During that sweep each span allocated room for its gcmarkBits in
+// the gcBitsArenas.next block. gcBitsArenas.next becomes gcBitsArenas.current,
+// where the GC will mark objects, and after each span is swept these bits
+// will be used to allocate objects.
+// gcBitsArenas.current becomes gcBitsArenas.previous where the span's
+// gcAllocBits live until all the spans have been swept during this GC cycle.
+// The span's sweep extinguishes all the references to gcBitsArenas.previous
+// by pointing gcAllocBits into the gcBitsArenas.current.
+// The gcBitsArenas.previous is released to the gcBitsArenas.free list.
+func nextMarkBitArenaEpoch() {
+ lock(&gcBitsArenas.lock)
+ if gcBitsArenas.previous != nil {
+ if gcBitsArenas.free == nil {
+ gcBitsArenas.free = gcBitsArenas.previous
+ } else {
+ // Find end of previous arenas.
+ last := gcBitsArenas.previous
+ for last = gcBitsArenas.previous; last.next != nil; last = last.next {
+ }
+ last.next = gcBitsArenas.free
+ gcBitsArenas.free = gcBitsArenas.previous
+ }
+ }
+ gcBitsArenas.previous = gcBitsArenas.current
+ gcBitsArenas.current = gcBitsArenas.next
+ atomic.StorepNoWB(unsafe.Pointer(&gcBitsArenas.next), nil) // newMarkBits calls newArena when needed
+ unlock(&gcBitsArenas.lock)
+}
+
+// newArenaMayUnlock allocates and zeroes a gcBits arena.
+// The caller must hold gcBitsArena.lock. This may temporarily release it.
+func newArenaMayUnlock() *gcBitsArena {
+ var result *gcBitsArena
+ if gcBitsArenas.free == nil {
+ unlock(&gcBitsArenas.lock)
+ result = (*gcBitsArena)(sysAlloc(gcBitsChunkBytes, &memstats.gcMiscSys))
+ if result == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ lock(&gcBitsArenas.lock)
+ } else {
+ result = gcBitsArenas.free
+ gcBitsArenas.free = gcBitsArenas.free.next
+ memclrNoHeapPointers(unsafe.Pointer(result), gcBitsChunkBytes)
+ }
+ result.next = nil
+ // If result.bits is not 8 byte aligned adjust index so
+ // that &result.bits[result.free] is 8 byte aligned.
+ if uintptr(unsafe.Offsetof(gcBitsArena{}.bits))&7 == 0 {
+ result.free = 0
+ } else {
+ result.free = 8 - (uintptr(unsafe.Pointer(&result.bits[0])) & 7)
+ }
+ return result
+}
diff --git a/src/runtime/minmax.go b/src/runtime/minmax.go
new file mode 100644
index 0000000..e5efc65
--- /dev/null
+++ b/src/runtime/minmax.go
@@ -0,0 +1,72 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func strmin(x, y string) string {
+ if y < x {
+ return y
+ }
+ return x
+}
+
+func strmax(x, y string) string {
+ if y > x {
+ return y
+ }
+ return x
+}
+
+func fmin32(x, y float32) float32 { return fmin(x, y) }
+func fmin64(x, y float64) float64 { return fmin(x, y) }
+func fmax32(x, y float32) float32 { return fmax(x, y) }
+func fmax64(x, y float64) float64 { return fmax(x, y) }
+
+type floaty interface{ ~float32 | ~float64 }
+
+func fmin[F floaty](x, y F) F {
+ if y != y || y < x {
+ return y
+ }
+ if x != x || x < y || x != 0 {
+ return x
+ }
+ // x and y are both ±0
+ // if either is -0, return -0; else return +0
+ return forbits(x, y)
+}
+
+func fmax[F floaty](x, y F) F {
+ if y != y || y > x {
+ return y
+ }
+ if x != x || x > y || x != 0 {
+ return x
+ }
+ // x and y are both ±0
+ // if both are -0, return -0; else return +0
+ return fandbits(x, y)
+}
+
+func forbits[F floaty](x, y F) F {
+ switch unsafe.Sizeof(x) {
+ case 4:
+ *(*uint32)(unsafe.Pointer(&x)) |= *(*uint32)(unsafe.Pointer(&y))
+ case 8:
+ *(*uint64)(unsafe.Pointer(&x)) |= *(*uint64)(unsafe.Pointer(&y))
+ }
+ return x
+}
+
+func fandbits[F floaty](x, y F) F {
+ switch unsafe.Sizeof(x) {
+ case 4:
+ *(*uint32)(unsafe.Pointer(&x)) &= *(*uint32)(unsafe.Pointer(&y))
+ case 8:
+ *(*uint64)(unsafe.Pointer(&x)) &= *(*uint64)(unsafe.Pointer(&y))
+ }
+ return x
+}
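+
+// Editorial illustration (not upstream text): fmin(-0.0, +0.0) falls
+// through to forbits, which ORs the bit patterns so the sign bit
+// survives and -0 is returned; fmax(-0.0, +0.0) uses fandbits, which
+// ANDs them so the result is +0 unless both inputs are -0.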
diff --git a/src/runtime/minmax_test.go b/src/runtime/minmax_test.go
new file mode 100644
index 0000000..e0bc28f
--- /dev/null
+++ b/src/runtime/minmax_test.go
@@ -0,0 +1,129 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+var (
+ zero = math.Copysign(0, +1)
+ negZero = math.Copysign(0, -1)
+ inf = math.Inf(+1)
+ negInf = math.Inf(-1)
+ nan = math.NaN()
+)
+
+var tests = []struct{ min, max float64 }{
+ {1, 2},
+ {-2, 1},
+ {negZero, zero},
+ {zero, inf},
+ {negInf, zero},
+ {negInf, inf},
+ {1, inf},
+ {negInf, 1},
+}
+
+var all = []float64{1, 2, -1, -2, zero, negZero, inf, negInf, nan}
+
+func eq(x, y float64) bool {
+ return x == y && math.Signbit(x) == math.Signbit(y)
+}
+
+func TestMinFloat(t *testing.T) {
+ for _, tt := range tests {
+ if z := min(tt.min, tt.max); !eq(z, tt.min) {
+ t.Errorf("min(%v, %v) = %v, want %v", tt.min, tt.max, z, tt.min)
+ }
+ if z := min(tt.max, tt.min); !eq(z, tt.min) {
+ t.Errorf("min(%v, %v) = %v, want %v", tt.max, tt.min, z, tt.min)
+ }
+ }
+ for _, x := range all {
+ if z := min(nan, x); !math.IsNaN(z) {
+ t.Errorf("min(%v, %v) = %v, want %v", nan, x, z, nan)
+ }
+ if z := min(x, nan); !math.IsNaN(z) {
+   t.Errorf("min(%v, %v) = %v, want %v", x, nan, z, nan)
+ }
+ }
+}
+
+func TestMaxFloat(t *testing.T) {
+ for _, tt := range tests {
+ if z := max(tt.min, tt.max); !eq(z, tt.max) {
+ t.Errorf("max(%v, %v) = %v, want %v", tt.min, tt.max, z, tt.max)
+ }
+ if z := max(tt.max, tt.min); !eq(z, tt.max) {
+ t.Errorf("max(%v, %v) = %v, want %v", tt.max, tt.min, z, tt.max)
+ }
+ }
+ for _, x := range all {
+ if z := max(nan, x); !math.IsNaN(z) {
+			t.Errorf("max(%v, %v) = %v, want %v", nan, x, z, nan)
+ }
+ if z := max(x, nan); !math.IsNaN(z) {
+			t.Errorf("max(%v, %v) = %v, want %v", x, nan, z, nan)
+ }
+ }
+}
+
+// testMinMax tests that min/max behave correctly on every pair of
+// values in vals.
+//
+// vals should be a sequence of values in strictly ascending order.
+func testMinMax[T int | uint8 | string](t *testing.T, vals ...T) {
+ for i, x := range vals {
+ for _, y := range vals[i+1:] {
+ if !(x < y) {
+ t.Fatalf("values out of order: !(%v < %v)", x, y)
+ }
+
+ if z := min(x, y); z != x {
+ t.Errorf("min(%v, %v) = %v, want %v", x, y, z, x)
+ }
+ if z := min(y, x); z != x {
+ t.Errorf("min(%v, %v) = %v, want %v", y, x, z, x)
+ }
+
+ if z := max(x, y); z != y {
+ t.Errorf("max(%v, %v) = %v, want %v", x, y, z, y)
+ }
+ if z := max(y, x); z != y {
+ t.Errorf("max(%v, %v) = %v, want %v", y, x, z, y)
+ }
+ }
+ }
+}
+
+func TestMinMaxInt(t *testing.T) { testMinMax[int](t, -7, 0, 9) }
+func TestMinMaxUint8(t *testing.T) { testMinMax[uint8](t, 0, 1, 2, 4, 7) }
+func TestMinMaxString(t *testing.T) { testMinMax[string](t, "a", "b", "c") }
+
+// TestMinMaxStringTies ensures that min(a, b) returns a when a == b.
+func TestMinMaxStringTies(t *testing.T) {
+ s := "xxx"
+ x := strings.Split(s, "")
+
+ test := func(i, j, k int) {
+ if z := min(x[i], x[j], x[k]); unsafe.StringData(z) != unsafe.StringData(x[i]) {
+ t.Errorf("min(x[%v], x[%v], x[%v]) = %p, want %p", i, j, k, unsafe.StringData(z), unsafe.StringData(x[i]))
+ }
+ if z := max(x[i], x[j], x[k]); unsafe.StringData(z) != unsafe.StringData(x[i]) {
+ t.Errorf("max(x[%v], x[%v], x[%v]) = %p, want %p", i, j, k, unsafe.StringData(z), unsafe.StringData(x[i]))
+ }
+ }
+
+ test(0, 1, 2)
+ test(0, 2, 1)
+ test(1, 0, 2)
+ test(1, 2, 0)
+ test(2, 0, 1)
+ test(2, 1, 0)
+}
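
TestMinMaxStringTies relies on the built-ins returning their first operand when string values compare equal, which is observable through unsafe.StringData because strings.Split hands back three equal one-byte strings with distinct backing pointers. A standalone restatement of that property (a sketch, assuming Go 1.21+):

package main

import (
	"fmt"
	"strings"
	"unsafe"
)

func main() {
	parts := strings.Split("xxx", "") // equal values, distinct data pointers
	z := min(parts[1], parts[0], parts[2])
	// With all operands equal, min returns its first operand unchanged,
	// so the data pointer matches parts[1].
	fmt.Println(unsafe.StringData(z) == unsafe.StringData(parts[1])) // true
}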
diff --git a/src/runtime/mkduff.go b/src/runtime/mkduff.go
new file mode 100644
index 0000000..cc58558
--- /dev/null
+++ b/src/runtime/mkduff.go
@@ -0,0 +1,286 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// runtime·duffzero is a Duff's device for zeroing memory.
+// The compiler jumps to computed addresses within
+// the routine to zero chunks of memory.
+// Do not change duffzero without also
+// changing the uses in cmd/compile/internal/*/*.go.
+
+// runtime·duffcopy is a Duff's device for copying memory.
+// The compiler jumps to computed addresses within
+// the routine to copy chunks of memory.
+// Source and destination must not overlap.
+// Do not change duffcopy without also
+// changing the uses in cmd/compile/internal/*/*.go.
+
+// See the zero* and copy* generators below
+// for architecture-specific comments.
+
+// mkduff generates duff_*.s.
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "io"
+ "log"
+ "os"
+)
+
+func main() {
+ gen("amd64", notags, zeroAMD64, copyAMD64)
+ gen("386", notags, zero386, copy386)
+ gen("arm", notags, zeroARM, copyARM)
+ gen("arm64", notags, zeroARM64, copyARM64)
+ gen("loong64", notags, zeroLOONG64, copyLOONG64)
+ gen("ppc64x", tagsPPC64x, zeroPPC64x, copyPPC64x)
+ gen("mips64x", tagsMIPS64x, zeroMIPS64x, copyMIPS64x)
+ gen("riscv64", notags, zeroRISCV64, copyRISCV64)
+}
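
The generators below all follow the pattern described in the header comments: emit one big unrolled body and let the compiler jump partway into it, so the remaining iterations handle exactly the size it needs. Go itself has no computed goto, but the same idea can be sketched with a switch that falls through (a hypothetical helper, purely for illustration; it is not how the runtime enters duffzero):

package main

import "fmt"

// zeroLast zeroes the last n elements of b (0 <= n <= 4) by "entering" an
// unrolled sequence at the right case, the way compiled code enters
// duffzero at a computed offset from its start.
func zeroLast(b []byte, n int) {
	i := len(b) - n
	switch n {
	case 4:
		b[i] = 0
		i++
		fallthrough
	case 3:
		b[i] = 0
		i++
		fallthrough
	case 2:
		b[i] = 0
		i++
		fallthrough
	case 1:
		b[i] = 0
	}
}

func main() {
	b := []byte{1, 2, 3, 4, 5, 6}
	zeroLast(b, 3)
	fmt.Println(b) // [1 2 3 0 0 0]
}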
+
+func gen(arch string, tags, zero, copy func(io.Writer)) {
+ var buf bytes.Buffer
+
+ fmt.Fprintln(&buf, "// Code generated by mkduff.go; DO NOT EDIT.")
+ fmt.Fprintln(&buf, "// Run go generate from src/runtime to update.")
+ fmt.Fprintln(&buf, "// See mkduff.go for comments.")
+ tags(&buf)
+ fmt.Fprintln(&buf, "#include \"textflag.h\"")
+ fmt.Fprintln(&buf)
+ zero(&buf)
+ fmt.Fprintln(&buf)
+ copy(&buf)
+
+ if err := os.WriteFile("duff_"+arch+".s", buf.Bytes(), 0644); err != nil {
+ log.Fatalln(err)
+ }
+}
+
+func notags(w io.Writer) { fmt.Fprintln(w) }
+
+func zeroAMD64(w io.Writer) {
+ // X15: zero
+ // DI: ptr to memory to be zeroed
+ // DI is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 16; i++ {
+ fmt.Fprintln(w, "\tMOVUPS\tX15,(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX15,16(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX15,32(DI)")
+ fmt.Fprintln(w, "\tMOVUPS\tX15,48(DI)")
+ fmt.Fprintln(w, "\tLEAQ\t64(DI),DI") // We use lea instead of add, to avoid clobbering flags
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyAMD64(w io.Writer) {
+ // SI: ptr to source memory
+ // DI: ptr to destination memory
+ // SI and DI are updated as a side effect.
+ //
+ // This is equivalent to a sequence of MOVSQ but
+ // for some reason that is 3.5x slower than this code.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 64; i++ {
+ fmt.Fprintln(w, "\tMOVUPS\t(SI), X0")
+ fmt.Fprintln(w, "\tADDQ\t$16, SI")
+ fmt.Fprintln(w, "\tMOVUPS\tX0, (DI)")
+ fmt.Fprintln(w, "\tADDQ\t$16, DI")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zero386(w io.Writer) {
+ // AX: zero
+ // DI: ptr to memory to be zeroed
+ // DI is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tSTOSL")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copy386(w io.Writer) {
+ // SI: ptr to source memory
+ // DI: ptr to destination memory
+ // SI and DI are updated as a side effect.
+ //
+ // This is equivalent to a sequence of MOVSL but
+ // for some reason MOVSL is really slow.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVL\t(SI), CX")
+ fmt.Fprintln(w, "\tADDL\t$4, SI")
+ fmt.Fprintln(w, "\tMOVL\tCX, (DI)")
+ fmt.Fprintln(w, "\tADDL\t$4, DI")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroARM(w io.Writer) {
+ // R0: zero
+ // R1: ptr to memory to be zeroed
+ // R1 is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVW.P\tR0, 4(R1)")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyARM(w io.Writer) {
+ // R0: scratch space
+ // R1: ptr to source memory
+ // R2: ptr to destination memory
+ // R1 and R2 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVW.P\t4(R1), R0")
+ fmt.Fprintln(w, "\tMOVW.P\tR0, 4(R2)")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroARM64(w io.Writer) {
+ // ZR: always zero
+ // R20: ptr to memory to be zeroed
+ // On return, R20 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 63; i++ {
+ fmt.Fprintln(w, "\tSTP.P\t(ZR, ZR), 16(R20)")
+ }
+ fmt.Fprintln(w, "\tSTP\t(ZR, ZR), (R20)")
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyARM64(w io.Writer) {
+ // R20: ptr to source memory
+ // R21: ptr to destination memory
+ // R26, R27 (aka REGTMP): scratch space
+ // R20 and R21 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+
+ for i := 0; i < 64; i++ {
+ fmt.Fprintln(w, "\tLDP.P\t16(R20), (R26, R27)")
+ fmt.Fprintln(w, "\tSTP.P\t(R26, R27), 16(R21)")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroLOONG64(w io.Writer) {
+ // R0: always zero
+ // R19 (aka REGRT1): ptr to memory to be zeroed - 8
+ // On return, R19 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\tR0, 8(R19)")
+ fmt.Fprintln(w, "\tADDV\t$8, R19")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyLOONG64(w io.Writer) {
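+	// R19: ptr to source memory
+	// R20: ptr to destination memory
+	// R30 (aka REGTMP): scratch space
+	// R19 and R20 are updated as a side effect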
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\t(R19), R30")
+ fmt.Fprintln(w, "\tADDV\t$8, R19")
+ fmt.Fprintln(w, "\tMOVV\tR30, (R20)")
+ fmt.Fprintln(w, "\tADDV\t$8, R20")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func tagsPPC64x(w io.Writer) {
+ fmt.Fprintln(w)
+ fmt.Fprintln(w, "//go:build ppc64 || ppc64le")
+ fmt.Fprintln(w)
+}
+
+func zeroPPC64x(w io.Writer) {
+ // R0: always zero
+	// R20 (aka REGRT1): ptr to memory to be zeroed - 8
+	// On return, R20 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVDU\tR0, 8(R20)")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyPPC64x(w io.Writer) {
+ // duffcopy is not used on PPC64.
+ fmt.Fprintln(w, "TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVDU\t8(R20), R5")
+ fmt.Fprintln(w, "\tMOVDU\tR5, 8(R21)")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func tagsMIPS64x(w io.Writer) {
+ fmt.Fprintln(w)
+ fmt.Fprintln(w, "//go:build mips64 || mips64le")
+ fmt.Fprintln(w)
+}
+
+func zeroMIPS64x(w io.Writer) {
+ // R0: always zero
+ // R1 (aka REGRT1): ptr to memory to be zeroed - 8
+ // On return, R1 points to the last zeroed dword.
+ fmt.Fprintln(w, "TEXT runtime·duffzero(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\tR0, 8(R1)")
+ fmt.Fprintln(w, "\tADDV\t$8, R1")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyMIPS64x(w io.Writer) {
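+	// R1: ptr to source memory
+	// R2: ptr to destination memory
+	// R23 (aka REGTMP): scratch space
+	// R1 and R2 are updated as a side effect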
+ fmt.Fprintln(w, "TEXT runtime·duffcopy(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOVV\t(R1), R23")
+ fmt.Fprintln(w, "\tADDV\t$8, R1")
+ fmt.Fprintln(w, "\tMOVV\tR23, (R2)")
+ fmt.Fprintln(w, "\tADDV\t$8, R2")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func zeroRISCV64(w io.Writer) {
+ // ZERO: always zero
+ // X25: ptr to memory to be zeroed
+ // X25 is updated as a side effect.
+ fmt.Fprintln(w, "TEXT runtime·duffzero<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOV\tZERO, (X25)")
+ fmt.Fprintln(w, "\tADD\t$8, X25")
+ }
+ fmt.Fprintln(w, "\tRET")
+}
+
+func copyRISCV64(w io.Writer) {
+ // X24: ptr to source memory
+ // X25: ptr to destination memory
+ // X24 and X25 are updated as a side effect
+ fmt.Fprintln(w, "TEXT runtime·duffcopy<ABIInternal>(SB), NOSPLIT|NOFRAME, $0-0")
+ for i := 0; i < 128; i++ {
+ fmt.Fprintln(w, "\tMOV\t(X24), X31")
+ fmt.Fprintln(w, "\tADD\t$8, X24")
+ fmt.Fprintln(w, "\tMOV\tX31, (X25)")
+ fmt.Fprintln(w, "\tADD\t$8, X25")
+ fmt.Fprintln(w)
+ }
+ fmt.Fprintln(w, "\tRET")
+}
diff --git a/src/runtime/mkfastlog2table.go b/src/runtime/mkfastlog2table.go
new file mode 100644
index 0000000..614d1f7
--- /dev/null
+++ b/src/runtime/mkfastlog2table.go
@@ -0,0 +1,109 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// fastlog2Table contains log2 approximations for 5 binary digits.
+// This is used to implement fastlog2, which is used for heap sampling.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "log"
+ "math"
+ "os"
+)
+
+func main() {
+ var buf bytes.Buffer
+
+ fmt.Fprintln(&buf, "// Code generated by mkfastlog2table.go; DO NOT EDIT.")
+ fmt.Fprintln(&buf, "// Run go generate from src/runtime to update.")
+ fmt.Fprintln(&buf, "// See mkfastlog2table.go for comments.")
+ fmt.Fprintln(&buf)
+ fmt.Fprintln(&buf, "package runtime")
+ fmt.Fprintln(&buf)
+ fmt.Fprintln(&buf, "const fastlogNumBits =", fastlogNumBits)
+ fmt.Fprintln(&buf)
+
+ fmt.Fprintln(&buf, "var fastlog2Table = [1<<fastlogNumBits + 1]float64{")
+ table := computeTable()
+ for _, t := range table {
+ fmt.Fprintf(&buf, "\t%v,\n", t)
+ }
+ fmt.Fprintln(&buf, "}")
+
+ if err := os.WriteFile("fastlog2table.go", buf.Bytes(), 0644); err != nil {
+ log.Fatalln(err)
+ }
+}
+
+const fastlogNumBits = 5
+
+func computeTable() []float64 {
+ fastlog2Table := make([]float64, 1<<fastlogNumBits+1)
+ for i := 0; i <= (1 << fastlogNumBits); i++ {
+ fastlog2Table[i] = log2(1.0 + float64(i)/(1<<fastlogNumBits))
+ }
+ return fastlog2Table
+}
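
The generated table holds log2(1 + i/32) for i = 0..32, so an approximate log2 only needs the binary exponent plus a linear interpolation between two adjacent entries. A standalone sketch of that lookup (the grid is recomputed on the fly with math.Log2 here; the real runtime indexes fastlog2Table and its exact scaling may differ):

package main

import (
	"fmt"
	"math"
)

const fastlogNumBits = 5

// approxLog2 approximates log2(x) for x > 0: split off the exponent with
// Frexp, then interpolate log2 of the mantissa from a 2^fastlogNumBits-entry
// grid -- the same shape of computation the runtime performs with fastlog2Table.
func approxLog2(x float64) float64 {
	frac, exp := math.Frexp(x) // x = frac * 2^exp, frac in [0.5, 1)
	frac *= 2                  // mantissa in [1, 2)
	exp--

	scaled := (frac - 1) * (1 << fastlogNumBits) // position in [0, 32)
	i := int(scaled)
	t := scaled - float64(i)

	lo := math.Log2(1 + float64(i)/(1<<fastlogNumBits))
	hi := math.Log2(1 + float64(i+1)/(1<<fastlogNumBits))
	return float64(exp) + lo + t*(hi-lo)
}

func main() {
	for _, x := range []float64{1, 3, 10, 1000} {
		fmt.Printf("x=%-5g approx=%.4f exact=%.4f\n", x, approxLog2(x), math.Log2(x))
	}
}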
+
+// log2 is a local copy of math.Log2 with an explicit float64 conversion
+// to disable FMA. This lets us generate the same output on all platforms.
+func log2(x float64) float64 {
+ frac, exp := math.Frexp(x)
+ // Make sure exact powers of two give an exact answer.
+ // Don't depend on Log(0.5)*(1/Ln2)+exp being exactly exp-1.
+ if frac == 0.5 {
+ return float64(exp - 1)
+ }
+ return float64(nlog(frac)*(1/math.Ln2)) + float64(exp)
+}
+
+// nlog is a local copy of math.Log with explicit float64 conversions
+// to disable FMA. This lets us generate the same output on all platforms.
+func nlog(x float64) float64 {
+ const (
+ Ln2Hi = 6.93147180369123816490e-01 /* 3fe62e42 fee00000 */
+ Ln2Lo = 1.90821492927058770002e-10 /* 3dea39ef 35793c76 */
+ L1 = 6.666666666666735130e-01 /* 3FE55555 55555593 */
+ L2 = 3.999999999940941908e-01 /* 3FD99999 9997FA04 */
+ L3 = 2.857142874366239149e-01 /* 3FD24924 94229359 */
+ L4 = 2.222219843214978396e-01 /* 3FCC71C5 1D8E78AF */
+ L5 = 1.818357216161805012e-01 /* 3FC74664 96CB03DE */
+ L6 = 1.531383769920937332e-01 /* 3FC39A09 D078C69F */
+ L7 = 1.479819860511658591e-01 /* 3FC2F112 DF3E5244 */
+ )
+
+ // special cases
+ switch {
+ case math.IsNaN(x) || math.IsInf(x, 1):
+ return x
+ case x < 0:
+ return math.NaN()
+ case x == 0:
+ return math.Inf(-1)
+ }
+
+ // reduce
+ f1, ki := math.Frexp(x)
+ if f1 < math.Sqrt2/2 {
+ f1 *= 2
+ ki--
+ }
+ f := f1 - 1
+ k := float64(ki)
+
+ // compute
+ s := float64(f / (2 + f))
+ s2 := float64(s * s)
+ s4 := float64(s2 * s2)
+ t1 := s2 * float64(L1+float64(s4*float64(L3+float64(s4*float64(L5+float64(s4*L7))))))
+ t2 := s4 * float64(L2+float64(s4*float64(L4+float64(s4*L6))))
+ R := float64(t1 + t2)
+ hfsq := float64(0.5 * f * f)
+ return float64(k*Ln2Hi) - ((hfsq - (float64(s*float64(hfsq+R)) + float64(k*Ln2Lo))) - f)
+}
diff --git a/src/runtime/mklockrank.go b/src/runtime/mklockrank.go
new file mode 100644
index 0000000..744dc92
--- /dev/null
+++ b/src/runtime/mklockrank.go
@@ -0,0 +1,401 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// mklockrank records the static rank graph of the locks in the
+// runtime and generates the rank checking structures in lockrank.go.
+package main
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "go/format"
+ "internal/dag"
+ "io"
+ "log"
+ "os"
+ "strings"
+)
+
+// ranks describes the lock rank graph. See "go doc internal/dag" for
+// the syntax.
+//
+// "a < b" means a must be acquired before b if both are held
+// (or, if b is held, a cannot be acquired).
+//
+// "NONE < a" means no locks may be held when a is acquired.
+//
+// If a lock is not given a rank, then it is assumed to be a leaf
+// lock, which means no other lock can be acquired while it is held.
+// Therefore, leaf locks do not need to be given an explicit rank.
+//
+// Ranks in all caps are pseudo-nodes that help define order, but do
+// not actually define a rank.
+//
+// TODO: It's often hard to correlate rank names to locks. Change
+// these to be more consistent with the locks they label.
+const ranks = `
+# Sysmon
+NONE
+< sysmon
+< scavenge, forcegc;
+
+# Defer
+NONE < defer;
+
+# GC
+NONE <
+ sweepWaiters,
+ assistQueue,
+ sweep;
+
+# Test only
+NONE < testR, testW;
+
+# Scheduler, timers, netpoll
+NONE <
+ allocmW,
+ execW,
+ cpuprof,
+ pollDesc;
+assistQueue,
+ cpuprof,
+ forcegc,
+ pollDesc, # pollDesc can interact with timers, which can lock sched.
+ scavenge,
+ sweep,
+ sweepWaiters,
+ testR
+# Above SCHED are things that can call into the scheduler.
+< SCHED
+# Below SCHED is the scheduler implementation.
+< allocmR,
+ execR
+< sched;
+sched < allg, allp;
+allp < timers;
+timers < netpollInit;
+
+# Channels
+scavenge, sweep, testR < hchan;
+NONE < notifyList;
+hchan, notifyList < sudog;
+
+# Semaphores
+NONE < root;
+
+# Itabs
+NONE
+< itab
+< reflectOffs;
+
+# User arena state
+NONE < userArenaState;
+
+# Tracing without a P uses a global trace buffer.
+scavenge
+# Above TRACEGLOBAL can emit a trace event without a P.
+< TRACEGLOBAL
+# Below TRACEGLOBAL manages the global tracing buffer.
+# Note that traceBuf eventually chains to MALLOC, but we never get that far
+# in the situation where there's no P.
+< traceBuf;
+# Starting/stopping tracing traces strings.
+traceBuf < traceStrings;
+
+# Malloc
+allg,
+ allocmR,
+ execR, # May grow stack
+ execW, # May allocate after BeforeFork
+ hchan,
+ notifyList,
+ reflectOffs,
+ timers,
+ traceStrings,
+ userArenaState
+# Above MALLOC are things that can allocate memory.
+< MALLOC
+# Below MALLOC is the malloc implementation.
+< fin,
+ spanSetSpine,
+ mspanSpecial,
+ MPROF;
+
+# We can acquire gcBitsArenas for pinner bits, and
+# it's guarded by mspanSpecial.
+MALLOC, mspanSpecial < gcBitsArenas;
+
+# Memory profiling
+MPROF < profInsert, profBlock, profMemActive;
+profMemActive < profMemFuture;
+
+# Stack allocation and copying
+gcBitsArenas,
+ netpollInit,
+ profBlock,
+ profInsert,
+ profMemFuture,
+ spanSetSpine,
+ fin,
+ root
+# Anything that can grow the stack can acquire STACKGROW.
+# (Most higher layers imply STACKGROW, like MALLOC.)
+< STACKGROW
+# Below STACKGROW is the stack allocator/copying implementation.
+< gscan;
+gscan < stackpool;
+gscan < stackLarge;
+# Generally, hchan must be acquired before gscan. But in one case,
+# where we suspend a G and then shrink its stack, syncadjustsudogs
+# can acquire hchan locks while holding gscan. To allow this case,
+# we use hchanLeaf instead of hchan.
+gscan < hchanLeaf;
+
+# Write barrier
+defer,
+ gscan,
+ mspanSpecial,
+ sudog
+# Anything that can have write barriers can acquire WB.
+# Above WB, we can have write barriers.
+< WB
+# Below WB is the write barrier implementation.
+< wbufSpans;
+
+# Span allocator
+stackLarge,
+ stackpool,
+ wbufSpans
+# Above mheap is anything that can call the span allocator.
+< mheap;
+# Below mheap is the span allocator implementation.
+#
+# Specials: we're allowed to allocate a special while holding
+# an mspanSpecial lock, and they're part of the malloc implementation.
+# Pinner bits might be freed by the span allocator.
+mheap, mspanSpecial < mheapSpecial;
+mheap, mheapSpecial < globalAlloc;
+
+# Execution tracer events (with a P)
+hchan,
+ mheap,
+ root,
+ sched,
+ traceStrings,
+ notifyList,
+ fin
+# Above TRACE is anything that can create a trace event
+< TRACE
+< trace
+< traceStackTab;
+
+# panic is handled specially. It is implicitly below all other locks.
+NONE < panic;
+# deadlock is not acquired while holding panic, but it also needs to be
+# below all other locks.
+panic < deadlock;
+# raceFini is only held while exiting.
+panic < raceFini;
+
+# RWMutex internal read lock
+
+allocmR,
+ allocmW
+< allocmRInternal;
+
+execR,
+ execW
+< execRInternal;
+
+testR,
+ testW
+< testRInternal;
+`
+
+// cyclicRanks lists lock ranks that allow multiple locks of the same
+// rank to be acquired simultaneously. The runtime enforces ordering
+// within these ranks using a separate mechanism.
+var cyclicRanks = map[string]bool{
+ // Multiple timers are locked simultaneously in destroy().
+ "timers": true,
+ // Multiple hchans are acquired in hchan.sortkey() order in
+ // select.
+ "hchan": true,
+ // Multiple hchanLeafs are acquired in hchan.sortkey() order in
+ // syncadjustsudogs().
+ "hchanLeaf": true,
+ // The point of the deadlock lock is to deadlock.
+ "deadlock": true,
+}
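
The generated lockPartialOrder table is what the rank checker consults when lock ranking is enabled: a lock may be acquired only if every lock already held appears in the new lock's predecessor list, and the cyclic ranks above additionally list themselves. A deliberately simplified standalone sketch of that check with made-up ranks; the real checker (lockrank_on.go, behind the staticlockranking GOEXPERIMENT) also enforces the numeric rank order and special-cases leaf locks:

package main

import "fmt"

type lockRank int

const (
	rankSched lockRank = iota
	rankAllg
	rankHchan
)

// partialOrder[r] lists the ranks that may already be held when a lock of
// rank r is acquired, mirroring the shape of the generated lockPartialOrder.
var partialOrder = map[lockRank][]lockRank{
	rankSched: {},
	rankAllg:  {rankSched}, // sched < allg
	rankHchan: {rankHchan}, // hchan allows self-cycles (sortkey order)
}

// mayAcquire reports whether a lock of rank next may be taken while a lock
// of rank held is already held.
func mayAcquire(held, next lockRank) bool {
	for _, r := range partialOrder[next] {
		if r == held {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mayAcquire(rankSched, rankAllg))  // true:  sched < allg
	fmt.Println(mayAcquire(rankAllg, rankSched))  // false: would invert the order
	fmt.Println(mayAcquire(rankHchan, rankHchan)) // true:  hchan is a cyclic rank
}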
+
+func main() {
+ flagO := flag.String("o", "", "write to `file` instead of stdout")
+ flagDot := flag.Bool("dot", false, "emit graphviz output instead of Go")
+ flag.Parse()
+ if flag.NArg() != 0 {
+ fmt.Fprintf(os.Stderr, "too many arguments")
+ os.Exit(2)
+ }
+
+ g, err := dag.Parse(ranks)
+ if err != nil {
+ log.Fatal(err)
+ }
+
+ var out []byte
+ if *flagDot {
+ var b bytes.Buffer
+ g.TransitiveReduction()
+ // Add cyclic edges for visualization.
+ for k := range cyclicRanks {
+ g.AddEdge(k, k)
+ }
+ // Reverse the graph. It's much easier to read this as
+		// a "<" partial order than a ">" partial order. This
+		// way, locks are acquired from the top going down
+ // and time moves forward over the edges instead of
+ // backward.
+ g.Transpose()
+ generateDot(&b, g)
+ out = b.Bytes()
+ } else {
+ var b bytes.Buffer
+ generateGo(&b, g)
+ out, err = format.Source(b.Bytes())
+ if err != nil {
+ log.Fatal(err)
+ }
+ }
+
+ if *flagO != "" {
+ err = os.WriteFile(*flagO, out, 0666)
+ } else {
+ _, err = os.Stdout.Write(out)
+ }
+ if err != nil {
+ log.Fatal(err)
+ }
+}
+
+func generateGo(w io.Writer, g *dag.Graph) {
+ fmt.Fprintf(w, `// Code generated by mklockrank.go; DO NOT EDIT.
+
+package runtime
+
+type lockRank int
+
+`)
+
+ // Create numeric ranks.
+ topo := g.Topo()
+ for i, j := 0, len(topo)-1; i < j; i, j = i+1, j-1 {
+ topo[i], topo[j] = topo[j], topo[i]
+ }
+ fmt.Fprintf(w, `
+// Constants representing the ranks of all non-leaf runtime locks, in rank order.
+// Locks with lower rank must be taken before locks with higher rank,
+// in addition to satisfying the partial order in lockPartialOrder.
+// A few ranks allow self-cycles, which are specified in lockPartialOrder.
+const (
+ lockRankUnknown lockRank = iota
+
+`)
+ for _, rank := range topo {
+ if isPseudo(rank) {
+ fmt.Fprintf(w, "\t// %s\n", rank)
+ } else {
+ fmt.Fprintf(w, "\t%s\n", cname(rank))
+ }
+ }
+ fmt.Fprintf(w, `)
+
+// lockRankLeafRank is the rank of a lock that does not have a declared rank,
+// and hence is a leaf lock.
+const lockRankLeafRank lockRank = 1000
+`)
+
+ // Create string table.
+ fmt.Fprintf(w, `
+// lockNames gives the names associated with each of the above ranks.
+var lockNames = []string{
+`)
+ for _, rank := range topo {
+ if !isPseudo(rank) {
+ fmt.Fprintf(w, "\t%s: %q,\n", cname(rank), rank)
+ }
+ }
+ fmt.Fprintf(w, `}
+
+func (rank lockRank) String() string {
+ if rank == 0 {
+ return "UNKNOWN"
+ }
+ if rank == lockRankLeafRank {
+ return "LEAF"
+ }
+ if rank < 0 || int(rank) >= len(lockNames) {
+ return "BAD RANK"
+ }
+ return lockNames[rank]
+}
+`)
+
+ // Create partial order structure.
+ fmt.Fprintf(w, `
+// lockPartialOrder is the transitive closure of the lock rank graph.
+// An entry for rank X lists all of the ranks that can already be held
+// when rank X is acquired.
+//
+// Lock ranks that allow self-cycles list themselves.
+var lockPartialOrder [][]lockRank = [][]lockRank{
+`)
+ for _, rank := range topo {
+ if isPseudo(rank) {
+ continue
+ }
+ list := []string{}
+ for _, before := range g.Edges(rank) {
+ if !isPseudo(before) {
+ list = append(list, cname(before))
+ }
+ }
+ if cyclicRanks[rank] {
+ list = append(list, cname(rank))
+ }
+
+ fmt.Fprintf(w, "\t%s: {%s},\n", cname(rank), strings.Join(list, ", "))
+ }
+ fmt.Fprintf(w, "}\n")
+}
+
+// cname returns the Go const name for the given lock rank label.
+func cname(label string) string {
+ return "lockRank" + strings.ToUpper(label[:1]) + label[1:]
+}
+
+func isPseudo(label string) bool {
+ return strings.ToUpper(label) == label
+}
+
+// generateDot emits a Graphviz dot representation of g to w.
+func generateDot(w io.Writer, g *dag.Graph) {
+ fmt.Fprintf(w, "digraph g {\n")
+
+ // Define all nodes.
+ for _, node := range g.Nodes {
+ fmt.Fprintf(w, "%q;\n", node)
+ }
+
+ // Create edges.
+ for _, node := range g.Nodes {
+ for _, to := range g.Edges(node) {
+ fmt.Fprintf(w, "%q -> %q;\n", node, to)
+ }
+ }
+
+ fmt.Fprintf(w, "}\n")
+}
diff --git a/src/runtime/mkpreempt.go b/src/runtime/mkpreempt.go
new file mode 100644
index 0000000..0bfbd37
--- /dev/null
+++ b/src/runtime/mkpreempt.go
@@ -0,0 +1,630 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// mkpreempt generates the asyncPreempt functions for each
+// architecture.
+package main
+
+import (
+ "flag"
+ "fmt"
+ "io"
+ "log"
+ "os"
+ "strings"
+)
+
+// Copied from cmd/compile/internal/ssa/gen/*Ops.go
+
+var regNames386 = []string{
+ "AX",
+ "CX",
+ "DX",
+ "BX",
+ "SP",
+ "BP",
+ "SI",
+ "DI",
+ "X0",
+ "X1",
+ "X2",
+ "X3",
+ "X4",
+ "X5",
+ "X6",
+ "X7",
+}
+
+var regNamesAMD64 = []string{
+ "AX",
+ "CX",
+ "DX",
+ "BX",
+ "SP",
+ "BP",
+ "SI",
+ "DI",
+ "R8",
+ "R9",
+ "R10",
+ "R11",
+ "R12",
+ "R13",
+ "R14",
+ "R15",
+ "X0",
+ "X1",
+ "X2",
+ "X3",
+ "X4",
+ "X5",
+ "X6",
+ "X7",
+ "X8",
+ "X9",
+ "X10",
+ "X11",
+ "X12",
+ "X13",
+ "X14",
+ "X15",
+}
+
+var out io.Writer
+
+var arches = map[string]func(){
+ "386": gen386,
+ "amd64": genAMD64,
+ "arm": genARM,
+ "arm64": genARM64,
+ "loong64": genLoong64,
+ "mips64x": func() { genMIPS(true) },
+ "mipsx": func() { genMIPS(false) },
+ "ppc64x": genPPC64,
+ "riscv64": genRISCV64,
+ "s390x": genS390X,
+ "wasm": genWasm,
+}
+var beLe = map[string]bool{"mips64x": true, "mipsx": true, "ppc64x": true}
+
+func main() {
+ flag.Parse()
+ if flag.NArg() > 0 {
+ out = os.Stdout
+ for _, arch := range flag.Args() {
+ gen, ok := arches[arch]
+ if !ok {
+ log.Fatalf("unknown arch %s", arch)
+ }
+ header(arch)
+ gen()
+ }
+ return
+ }
+
+ for arch, gen := range arches {
+ f, err := os.Create(fmt.Sprintf("preempt_%s.s", arch))
+ if err != nil {
+ log.Fatal(err)
+ }
+ out = f
+ header(arch)
+ gen()
+ if err := f.Close(); err != nil {
+ log.Fatal(err)
+ }
+ }
+}
+
+func header(arch string) {
+ fmt.Fprintf(out, "// Code generated by mkpreempt.go; DO NOT EDIT.\n\n")
+ if beLe[arch] {
+ base := arch[:len(arch)-1]
+ fmt.Fprintf(out, "//go:build %s || %sle\n\n", base, base)
+ }
+ fmt.Fprintf(out, "#include \"go_asm.h\"\n")
+ if arch == "amd64" {
+ fmt.Fprintf(out, "#include \"asm_amd64.h\"\n")
+ }
+ fmt.Fprintf(out, "#include \"textflag.h\"\n\n")
+ fmt.Fprintf(out, "TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0\n")
+}
+
+func p(f string, args ...any) {
+ fmted := fmt.Sprintf(f, args...)
+ fmt.Fprintf(out, "\t%s\n", strings.ReplaceAll(fmted, "\n", "\n\t"))
+}
+
+func label(l string) {
+ fmt.Fprintf(out, "%s\n", l)
+}
+
+type layout struct {
+ stack int
+ regs []regPos
+ sp string // stack pointer register
+}
+
+type regPos struct {
+ pos int
+
+ saveOp string
+ restoreOp string
+ reg string
+
+ // If this register requires special save and restore, these
+ // give those operations with a %d placeholder for the stack
+ // offset.
+ save, restore string
+}
+
+func (l *layout) add(op, reg string, size int) {
+ l.regs = append(l.regs, regPos{saveOp: op, restoreOp: op, reg: reg, pos: l.stack})
+ l.stack += size
+}
+
+func (l *layout) add2(sop, rop, reg string, size int) {
+ l.regs = append(l.regs, regPos{saveOp: sop, restoreOp: rop, reg: reg, pos: l.stack})
+ l.stack += size
+}
+
+func (l *layout) addSpecial(save, restore string, size int) {
+ l.regs = append(l.regs, regPos{save: save, restore: restore, pos: l.stack})
+ l.stack += size
+}
+
+func (l *layout) save() {
+ for _, reg := range l.regs {
+ if reg.save != "" {
+ p(reg.save, reg.pos)
+ } else {
+ p("%s %s, %d(%s)", reg.saveOp, reg.reg, reg.pos, l.sp)
+ }
+ }
+}
+
+func (l *layout) restore() {
+ for i := len(l.regs) - 1; i >= 0; i-- {
+ reg := l.regs[i]
+ if reg.restore != "" {
+ p(reg.restore, reg.pos)
+ } else {
+ p("%s %d(%s), %s", reg.restoreOp, reg.pos, l.sp, reg.reg)
+ }
+ }
+}
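
layout is essentially a small offset allocator: each add/add2/addSpecial reserves size bytes at the current stack offset, and save/restore later replay the slots forward and in reverse. A trimmed standalone sketch (hypothetical registers, printing the would-be instructions instead of writing an assembly file):

package main

import "fmt"

type regPos struct {
	pos int
	op  string
	reg string
}

type layout struct {
	stack int
	regs  []regPos
	sp    string
}

// add reserves size bytes for reg at the current offset, like mkpreempt's layout.add.
func (l *layout) add(op, reg string, size int) {
	l.regs = append(l.regs, regPos{pos: l.stack, op: op, reg: reg})
	l.stack += size
}

func main() {
	l := layout{sp: "SP"}
	l.add("MOVQ", "AX", 8)
	l.add("MOVQ", "BX", 8)
	l.add("MOVUPS", "X0", 16)

	for _, r := range l.regs { // save order; restore would walk this backwards
		fmt.Printf("%s %s, %d(%s)\n", r.op, r.reg, r.pos, l.sp)
	}
	fmt.Println("frame size:", l.stack) // 32
}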
+
+func gen386() {
+ p("PUSHFL")
+ // Save general purpose registers.
+ var l = layout{sp: "SP"}
+ for _, reg := range regNames386 {
+ if reg == "SP" || strings.HasPrefix(reg, "X") {
+ continue
+ }
+ l.add("MOVL", reg, 4)
+ }
+
+ softfloat := "GO386_softfloat"
+
+ // Save SSE state only if supported.
+ lSSE := layout{stack: l.stack, sp: "SP"}
+ for i := 0; i < 8; i++ {
+ lSSE.add("MOVUPS", fmt.Sprintf("X%d", i), 16)
+ }
+
+ p("ADJSP $%d", lSSE.stack)
+ p("NOP SP")
+ l.save()
+ p("#ifndef %s", softfloat)
+ lSSE.save()
+ p("#endif")
+ p("CALL ·asyncPreempt2(SB)")
+ p("#ifndef %s", softfloat)
+ lSSE.restore()
+ p("#endif")
+ l.restore()
+ p("ADJSP $%d", -lSSE.stack)
+
+ p("POPFL")
+ p("RET")
+}
+
+func genAMD64() {
+ // Assign stack offsets.
+ var l = layout{sp: "SP"}
+ for _, reg := range regNamesAMD64 {
+ if reg == "SP" || reg == "BP" {
+ continue
+ }
+ if !strings.HasPrefix(reg, "X") {
+ l.add("MOVQ", reg, 8)
+ }
+ }
+ lSSE := layout{stack: l.stack, sp: "SP"}
+ for _, reg := range regNamesAMD64 {
+ if strings.HasPrefix(reg, "X") {
+ lSSE.add("MOVUPS", reg, 16)
+ }
+ }
+
+ // TODO: MXCSR register?
+
+ p("PUSHQ BP")
+ p("MOVQ SP, BP")
+ p("// Save flags before clobbering them")
+ p("PUSHFQ")
+ p("// obj doesn't understand ADD/SUB on SP, but does understand ADJSP")
+ p("ADJSP $%d", lSSE.stack)
+ p("// But vet doesn't know ADJSP, so suppress vet stack checking")
+ p("NOP SP")
+
+ l.save()
+
+ // Apparently, the signal handling code path in darwin kernel leaves
+ // the upper bits of Y registers in a dirty state, which causes
+	// many SSE operations (128-bit and narrower) to become much slower.
+ // Clear the upper bits to get to a clean state. See issue #37174.
+	// It is safe here as Go code doesn't use the upper bits of Y registers.
+ p("#ifdef GOOS_darwin")
+ p("#ifndef hasAVX")
+ p("CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $0")
+ p("JE 2(PC)")
+ p("#endif")
+ p("VZEROUPPER")
+ p("#endif")
+
+ lSSE.save()
+ p("CALL ·asyncPreempt2(SB)")
+ lSSE.restore()
+ l.restore()
+ p("ADJSP $%d", -lSSE.stack)
+ p("POPFQ")
+ p("POPQ BP")
+ p("RET")
+}
+
+func genARM() {
+ // Add integer registers R0-R12.
+ // R13 (SP), R14 (LR), R15 (PC) are special and not saved here.
+ var l = layout{sp: "R13", stack: 4} // add LR slot
+ for i := 0; i <= 12; i++ {
+ reg := fmt.Sprintf("R%d", i)
+ if i == 10 {
+ continue // R10 is g register, no need to save/restore
+ }
+ l.add("MOVW", reg, 4)
+ }
+ // Add flag register.
+ l.addSpecial(
+ "MOVW CPSR, R0\nMOVW R0, %d(R13)",
+ "MOVW %d(R13), R0\nMOVW R0, CPSR",
+ 4)
+
+ // Add floating point registers F0-F15 and flag register.
+ var lfp = layout{stack: l.stack, sp: "R13"}
+ lfp.addSpecial(
+ "MOVW FPCR, R0\nMOVW R0, %d(R13)",
+ "MOVW %d(R13), R0\nMOVW R0, FPCR",
+ 4)
+ for i := 0; i <= 15; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ lfp.add("MOVD", reg, 8)
+ }
+
+ p("MOVW.W R14, -%d(R13)", lfp.stack) // allocate frame, save LR
+ l.save()
+ p("MOVB ·goarm(SB), R0\nCMP $6, R0\nBLT nofp") // test goarm, and skip FP registers if goarm=5.
+ lfp.save()
+ label("nofp:")
+ p("CALL ·asyncPreempt2(SB)")
+ p("MOVB ·goarm(SB), R0\nCMP $6, R0\nBLT nofp2") // test goarm, and skip FP registers if goarm=5.
+ lfp.restore()
+ label("nofp2:")
+ l.restore()
+
+ p("MOVW %d(R13), R14", lfp.stack) // sigctxt.pushCall pushes LR on stack, restore it
+ p("MOVW.P %d(R13), R15", lfp.stack+4) // load PC, pop frame (including the space pushed by sigctxt.pushCall)
+ p("UNDEF") // shouldn't get here
+}
+
+func genARM64() {
+ // Add integer registers R0-R26
+ // R27 (REGTMP), R28 (g), R29 (FP), R30 (LR), R31 (SP) are special
+ // and not saved here.
+ var l = layout{sp: "RSP", stack: 8} // add slot to save PC of interrupted instruction
+ for i := 0; i < 26; i += 2 {
+ if i == 18 {
+ i--
+ continue // R18 is not used, skip
+ }
+ reg := fmt.Sprintf("(R%d, R%d)", i, i+1)
+ l.add2("STP", "LDP", reg, 16)
+ }
+ // Add flag registers.
+ l.addSpecial(
+ "MOVD NZCV, R0\nMOVD R0, %d(RSP)",
+ "MOVD %d(RSP), R0\nMOVD R0, NZCV",
+ 8)
+ l.addSpecial(
+ "MOVD FPSR, R0\nMOVD R0, %d(RSP)",
+ "MOVD %d(RSP), R0\nMOVD R0, FPSR",
+ 8)
+ // TODO: FPCR? I don't think we'll change it, so no need to save.
+ // Add floating point registers F0-F31.
+ for i := 0; i < 31; i += 2 {
+ reg := fmt.Sprintf("(F%d, F%d)", i, i+1)
+ l.add2("FSTPD", "FLDPD", reg, 16)
+ }
+ if l.stack%16 != 0 {
+ l.stack += 8 // SP needs 16-byte alignment
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR)
+ p("MOVD R30, %d(RSP)", -l.stack)
+ p("SUB $%d, RSP", l.stack)
+ p("MOVD R29, -8(RSP)") // save frame pointer (only used on Linux)
+ p("SUB $8, RSP, R29") // set up new frame pointer
+ // On iOS, save the LR again after decrementing SP. We run the
+ // signal handler on the G stack (as it doesn't support sigaltstack),
+ // so any writes below SP may be clobbered.
+ p("#ifdef GOOS_ios")
+ p("MOVD R30, (RSP)")
+ p("#endif")
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(RSP), R30", l.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p("MOVD -8(RSP), R29") // restore frame pointer
+ p("MOVD (RSP), R27") // load PC to REGTMP
+ p("ADD $%d, RSP", l.stack+16) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (R27)")
+}
+
+func genMIPS(_64bit bool) {
+ mov := "MOVW"
+ movf := "MOVF"
+ add := "ADD"
+ sub := "SUB"
+ r28 := "R28"
+ regsize := 4
+ softfloat := "GOMIPS_softfloat"
+ if _64bit {
+ mov = "MOVV"
+ movf = "MOVD"
+ add = "ADDV"
+ sub = "SUBV"
+ r28 = "RSB"
+ regsize = 8
+ softfloat = "GOMIPS64_softfloat"
+ }
+
+ // Add integer registers R1-R22, R24-R25, R28
+ // R0 (zero), R23 (REGTMP), R29 (SP), R30 (g), R31 (LR) are special,
+ // and not saved here. R26 and R27 are reserved by kernel and not used.
+ var l = layout{sp: "R29", stack: regsize} // add slot to save PC of interrupted instruction (in LR)
+ for i := 1; i <= 25; i++ {
+ if i == 23 {
+ continue // R23 is REGTMP
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add(mov, reg, regsize)
+ }
+ l.add(mov, r28, regsize)
+ l.addSpecial(
+ mov+" HI, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, HI",
+ regsize)
+ l.addSpecial(
+ mov+" LO, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, LO",
+ regsize)
+
+ // Add floating point control/status register FCR31 (FCR0-FCR30 are irrelevant)
+ var lfp = layout{sp: "R29", stack: l.stack}
+ lfp.addSpecial(
+ mov+" FCR31, R1\n"+mov+" R1, %d(R29)",
+ mov+" %d(R29), R1\n"+mov+" R1, FCR31",
+ regsize)
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ lfp.add(movf, reg, regsize)
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR)
+ p(mov+" R31, -%d(R29)", lfp.stack)
+ p(sub+" $%d, R29", lfp.stack)
+
+ l.save()
+ p("#ifndef %s", softfloat)
+ lfp.save()
+ p("#endif")
+ p("CALL ·asyncPreempt2(SB)")
+ p("#ifndef %s", softfloat)
+ lfp.restore()
+ p("#endif")
+ l.restore()
+
+ p(mov+" %d(R29), R31", lfp.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p(mov + " (R29), R23") // load PC to REGTMP
+ p(add+" $%d, R29", lfp.stack+regsize) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (R23)")
+}
+
+func genLoong64() {
+ mov := "MOVV"
+ movf := "MOVD"
+ add := "ADDV"
+ sub := "SUBV"
+ regsize := 8
+
+	// Add integer registers R4-R21, R23-R29, R31.
+	// R0 (zero), R30 (REGTMP), R2 (tp), R3 (SP), R22 (g), R1 (LR) are special,
+	// and not saved here.
+ var l = layout{sp: "R3", stack: regsize} // add slot to save PC of interrupted instruction (in LR)
+ for i := 4; i <= 31; i++ {
+ if i == 22 || i == 30 {
+ continue
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add(mov, reg, regsize)
+ }
+
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add(movf, reg, regsize)
+ }
+
+ // save/restore FCC0
+ l.addSpecial(
+ mov+" FCC0, R4\n"+mov+" R4, %d(R3)",
+ mov+" %d(R3), R4\n"+mov+" R4, FCC0",
+ regsize)
+
+ // allocate frame, save PC of interrupted instruction (in LR)
+ p(mov+" R1, -%d(R3)", l.stack)
+ p(sub+" $%d, R3", l.stack)
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p(mov+" %d(R3), R1", l.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p(mov + " (R3), R30") // load PC to REGTMP
+ p(add+" $%d, R3", l.stack+regsize) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (R30)")
+}
+
+func genPPC64() {
+ // Add integer registers R3-R29
+ // R0 (zero), R1 (SP), R30 (g) are special and not saved here.
+ // R2 (TOC pointer in PIC mode), R12 (function entry address in PIC mode) have been saved in sigctxt.pushCall.
+ // R31 (REGTMP) will be saved manually.
+ var l = layout{sp: "R1", stack: 32 + 8} // MinFrameSize on PPC64, plus one word for saving R31
+ for i := 3; i <= 29; i++ {
+ if i == 12 || i == 13 {
+ // R12 has been saved in sigctxt.pushCall.
+ // R13 is TLS pointer, not used by Go code. we must NOT
+ // restore it, otherwise if we parked and resumed on a
+ // different thread we'll mess up TLS addresses.
+ continue
+ }
+ reg := fmt.Sprintf("R%d", i)
+ l.add("MOVD", reg, 8)
+ }
+ l.addSpecial(
+ "MOVW CR, R31\nMOVW R31, %d(R1)",
+ "MOVW %d(R1), R31\nMOVFL R31, $0xff", // this is MOVW R31, CR
+ 8) // CR is 4-byte wide, but just keep the alignment
+ l.addSpecial(
+ "MOVD XER, R31\nMOVD R31, %d(R1)",
+ "MOVD %d(R1), R31\nMOVD R31, XER",
+ 8)
+ // Add floating point registers F0-F31.
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("FMOVD", reg, 8)
+ }
+ // Add floating point control/status register FPSCR.
+ l.addSpecial(
+ "MOVFL FPSCR, F0\nFMOVD F0, %d(R1)",
+ "FMOVD %d(R1), F0\nMOVFL F0, FPSCR",
+ 8)
+
+ p("MOVD R31, -%d(R1)", l.stack-32) // save R31 first, we'll use R31 for saving LR
+ p("MOVD LR, R31")
+ p("MOVDU R31, -%d(R1)", l.stack) // allocate frame, save PC of interrupted instruction (in LR)
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(R1), R31", l.stack) // sigctxt.pushCall has pushed LR, R2, R12 (at interrupt) on stack, restore them
+ p("MOVD R31, LR")
+ p("MOVD %d(R1), R2", l.stack+8)
+ p("MOVD %d(R1), R12", l.stack+16)
+ p("MOVD (R1), R31") // load PC to CTR
+ p("MOVD R31, CTR")
+ p("MOVD 32(R1), R31") // restore R31
+ p("ADD $%d, R1", l.stack+32) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("JMP (CTR)")
+}
+
+func genRISCV64() {
+ // X0 (zero), X1 (LR), X2 (SP), X3 (GP), X4 (TP), X27 (g), X31 (TMP) are special.
+ var l = layout{sp: "X2", stack: 8}
+
+ // Add integer registers (X5-X26, X28-30).
+ for i := 5; i < 31; i++ {
+ if i == 27 {
+ continue
+ }
+ reg := fmt.Sprintf("X%d", i)
+ l.add("MOV", reg, 8)
+ }
+
+ // Add floating point registers (F0-F31).
+ for i := 0; i <= 31; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("MOVD", reg, 8)
+ }
+
+ p("MOV X1, -%d(X2)", l.stack)
+ p("ADD $-%d, X2", l.stack)
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+ p("MOV %d(X2), X1", l.stack)
+ p("MOV (X2), X31")
+ p("ADD $%d, X2", l.stack+8)
+ p("JMP (X31)")
+}
+
+func genS390X() {
+ // Add integer registers R0-R12
+ // R13 (g), R14 (LR), R15 (SP) are special, and not saved here.
+ // Saving R10 (REGTMP) is not necessary, but it is saved anyway.
+ var l = layout{sp: "R15", stack: 16} // add slot to save PC of interrupted instruction and flags
+ l.addSpecial(
+ "STMG R0, R12, %d(R15)",
+ "LMG %d(R15), R0, R12",
+ 13*8)
+	// Add floating point registers F0-F15.
+ for i := 0; i <= 15; i++ {
+ reg := fmt.Sprintf("F%d", i)
+ l.add("FMOVD", reg, 8)
+ }
+
+ // allocate frame, save PC of interrupted instruction (in LR) and flags (condition code)
+ p("IPM R10") // save flags upfront, as ADD will clobber flags
+ p("MOVD R14, -%d(R15)", l.stack)
+ p("ADD $-%d, R15", l.stack)
+ p("MOVW R10, 8(R15)") // save flags
+
+ l.save()
+ p("CALL ·asyncPreempt2(SB)")
+ l.restore()
+
+ p("MOVD %d(R15), R14", l.stack) // sigctxt.pushCall has pushed LR (at interrupt) on stack, restore it
+ p("ADD $%d, R15", l.stack+8) // pop frame (including the space pushed by sigctxt.pushCall)
+ p("MOVWZ -%d(R15), R10", l.stack) // load flags to REGTMP
+ p("TMLH R10, $(3<<12)") // restore flags
+ p("MOVD -%d(R15), R10", l.stack+8) // load PC to REGTMP
+ p("JMP (R10)")
+}
+
+func genWasm() {
+ p("// No async preemption on wasm")
+ p("UNDEF")
+}
+
+func notImplemented() {
+ p("// Not implemented yet")
+ p("JMP ·abort(SB)")
+}
diff --git a/src/runtime/mksizeclasses.go b/src/runtime/mksizeclasses.go
new file mode 100644
index 0000000..156e613
--- /dev/null
+++ b/src/runtime/mksizeclasses.go
@@ -0,0 +1,357 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// Generate tables for small malloc size classes.
+//
+// See malloc.go for overview.
+//
+// The size classes are chosen so that rounding an allocation
+// request up to the next size class wastes at most 12.5% (1.125x).
+//
+// Each size class has its own page count that gets allocated
+// and chopped up when new objects of the size class are needed.
+// That page count is chosen so that chopping up the run of
+// pages into objects of the given size wastes at most 12.5% (1.125x)
+// of the memory. It is not necessary that the cutoff here be
+// the same as above.
+//
+// The two sources of waste multiply, so the worst possible case
+// for the above constraints would be that allocations of some
+// size might have a 26.6% (1.266x) overhead.
+// In practice, only one of the wastes comes into play for a
+// given size (sizes < 512 waste mainly on the round-up,
+// sizes > 512 waste mainly on the page chopping).
+// For really small sizes, alignment constraints force the
+// overhead higher.
+
+package main
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "go/format"
+ "io"
+ "log"
+ "math"
+ "math/bits"
+ "os"
+)
+
+// Generate msize.go
+
+var stdout = flag.Bool("stdout", false, "write to stdout instead of sizeclasses.go")
+
+func main() {
+ flag.Parse()
+
+ var b bytes.Buffer
+ fmt.Fprintln(&b, "// Code generated by mksizeclasses.go; DO NOT EDIT.")
+ fmt.Fprintln(&b, "//go:generate go run mksizeclasses.go")
+ fmt.Fprintln(&b)
+ fmt.Fprintln(&b, "package runtime")
+ classes := makeClasses()
+
+ printComment(&b, classes)
+
+ printClasses(&b, classes)
+
+ out, err := format.Source(b.Bytes())
+ if err != nil {
+ log.Fatal(err)
+ }
+ if *stdout {
+ _, err = os.Stdout.Write(out)
+ } else {
+ err = os.WriteFile("sizeclasses.go", out, 0666)
+ }
+ if err != nil {
+ log.Fatal(err)
+ }
+}
+
+const (
+ // Constants that we use and will transfer to the runtime.
+ maxSmallSize = 32 << 10
+ smallSizeDiv = 8
+ smallSizeMax = 1024
+ largeSizeDiv = 128
+ pageShift = 13
+
+ // Derived constants.
+ pageSize = 1 << pageShift
+)
+
+type class struct {
+ size int // max size
+ npages int // number of pages
+}
+
+func powerOfTwo(x int) bool {
+ return x != 0 && x&(x-1) == 0
+}
+
+func makeClasses() []class {
+ var classes []class
+
+ classes = append(classes, class{}) // class #0 is a dummy entry
+
+ align := 8
+ for size := align; size <= maxSmallSize; size += align {
+ if powerOfTwo(size) { // bump alignment once in a while
+ if size >= 2048 {
+ align = 256
+ } else if size >= 128 {
+ align = size / 8
+ } else if size >= 32 {
+ align = 16 // heap bitmaps assume 16 byte alignment for allocations >= 32 bytes.
+ }
+ }
+ if !powerOfTwo(align) {
+ panic("incorrect alignment")
+ }
+
+ // Make the allocnpages big enough that
+ // the leftover is less than 1/8 of the total,
+ // so wasted space is at most 12.5%.
+ allocsize := pageSize
+ for allocsize%size > allocsize/8 {
+ allocsize += pageSize
+ }
+ npages := allocsize / pageSize
+
+ // If the previous sizeclass chose the same
+ // allocation size and fit the same number of
+ // objects into the page, we might as well
+ // use just this size instead of having two
+ // different sizes.
+ if len(classes) > 1 && npages == classes[len(classes)-1].npages && allocsize/size == allocsize/classes[len(classes)-1].size {
+ classes[len(classes)-1].size = size
+ continue
+ }
+ classes = append(classes, class{size: size, npages: npages})
+ }
+
+ // Increase object sizes if we can fit the same number of larger objects
+ // into the same number of pages. For example, we choose size 8448 above
+ // with 6 objects in 7 pages. But we can well use object size 9472,
+ // which is also 6 objects in 7 pages but +1024 bytes (+12.12%).
+ // We need to preserve at least largeSizeDiv alignment otherwise
+ // sizeToClass won't work.
+ for i := range classes {
+ if i == 0 {
+ continue
+ }
+ c := &classes[i]
+ psize := c.npages * pageSize
+ new_size := (psize / (psize / c.size)) &^ (largeSizeDiv - 1)
+ if new_size > c.size {
+ c.size = new_size
+ }
+ }
+
+ if len(classes) != 68 {
+ panic("number of size classes has changed")
+ }
+
+ for i := range classes {
+ computeDivMagic(&classes[i])
+ }
+
+ return classes
+}
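
The 8448-byte example in the comment above checks out: 6 objects of 8448 bytes fill 7 pages with 6656 bytes of tail waste, and bumping the size to the largest largeSizeDiv-aligned value that still fits 6 objects gives 9472 bytes (+1024) with only 512 bytes of tail waste. A standalone recomputation of those numbers:

package main

import "fmt"

func main() {
	const (
		pageSize     = 8192
		largeSizeDiv = 128
	)

	// The class discussed in the comment: 6 objects of 8448 bytes in 7 pages.
	npages, size := 7, 8448
	span := npages * pageSize
	objs := span / size
	fmt.Println(objs, "objects, tail waste:", span-objs*size) // 6 objects, tail waste: 6656

	// Grow the object size to fill the span better, keeping largeSizeDiv alignment.
	newSize := (span / objs) &^ (largeSizeDiv - 1)
	fmt.Println("bumped size:", newSize, "tail waste:", span-objs*newSize) // 9472, 512
}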
+
+// computeDivMagic checks that the division required to compute object
+// index from span offset can be computed using 32-bit multiplication.
+// n / c.size is implemented as (n * (^uint32(0)/uint32(c.size) + 1)) >> 32
+// for all 0 <= n <= c.npages * pageSize
+func computeDivMagic(c *class) {
+ // divisor
+ d := c.size
+ if d == 0 {
+ return
+ }
+
+ // maximum input value for which the formula needs to work.
+ max := c.npages * pageSize
+
+ // As reported in [1], if n and d are unsigned N-bit integers, we
+ // can compute n / d as ⌊n * c / 2^F⌋, where c is ⌈2^F / d⌉ and F is
+ // computed with:
+ //
+ // Algorithm 2: Algorithm to select the number of fractional bits
+ // and the scaled approximate reciprocal in the case of unsigned
+ // integers.
+ //
+ // if d is a power of two then
+ // Let F ← log₂(d) and c = 1.
+ // else
+ // Let F ← N + L where L is the smallest integer
+ // such that d ≤ (2^(N+L) mod d) + 2^L.
+ // end if
+ //
+ // [1] "Faster Remainder by Direct Computation: Applications to
+ // Compilers and Software Libraries" Daniel Lemire, Owen Kaser,
+ // Nathan Kurz arXiv:1902.01961
+ //
+ // To minimize the risk of introducing errors, we implement the
+ // algorithm exactly as stated, rather than trying to adapt it to
+ // fit typical Go idioms.
+ N := bits.Len(uint(max))
+ var F int
+ if powerOfTwo(d) {
+ F = int(math.Log2(float64(d)))
+ if d != 1<<F {
+ panic("imprecise log2")
+ }
+ } else {
+ for L := 0; ; L++ {
+ if d <= ((1<<(N+L))%d)+(1<<L) {
+ F = N + L
+ break
+ }
+ }
+ }
+
+ // Also, noted in the paper, F is the smallest number of fractional
+ // bits required. We use 32 bits, because it works for all size
+ // classes and is fast on all CPU architectures that we support.
+ if F > 32 {
+ fmt.Printf("d=%d max=%d N=%d F=%d\n", c.size, max, N, F)
+ panic("size class requires more than 32 bits of precision")
+ }
+
+ // Brute force double-check with the exact computation that will be
+ // done by the runtime.
+ m := ^uint32(0)/uint32(c.size) + 1
+ for n := 0; n <= max; n++ {
+ if uint32((uint64(n)*uint64(m))>>32) != uint32(n/c.size) {
+ fmt.Printf("d=%d max=%d m=%d n=%d\n", d, max, m, n)
+ panic("bad 32-bit multiply magic")
+ }
+ }
+}
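
The brute-force loop above is the whole guarantee: the class_to_divmagic entry for a class is just ^uint32(0)/size + 1, so object-index = offset/size becomes a 32-bit multiply and a shift in the runtime. A standalone spot check for one class-sized divisor (48 bytes in a single 8 KiB page; treat the concrete numbers as an illustration rather than a statement about the final table):

package main

import "fmt"

func main() {
	const (
		size     = 48   // object size of some small class
		spanSize = 8192 // npages * pageSize for that class (one page here)
	)

	m := ^uint32(0)/size + 1 // the reciprocal the generator emits

	for n := 0; n <= spanSize; n++ {
		if uint32((uint64(n)*uint64(m))>>32) != uint32(n/size) {
			fmt.Println("magic fails at n =", n)
			return
		}
	}
	fmt.Printf("n/%d == (n*%d)>>32 for all n in [0, %d]\n", size, m, spanSize)
}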
+
+func printComment(w io.Writer, classes []class) {
+ fmt.Fprintf(w, "// %-5s %-9s %-10s %-7s %-10s %-9s %-9s\n", "class", "bytes/obj", "bytes/span", "objects", "tail waste", "max waste", "min align")
+ prevSize := 0
+ var minAligns [pageShift + 1]int
+ for i, c := range classes {
+ if i == 0 {
+ continue
+ }
+ spanSize := c.npages * pageSize
+ objects := spanSize / c.size
+ tailWaste := spanSize - c.size*(spanSize/c.size)
+ maxWaste := float64((c.size-prevSize-1)*objects+tailWaste) / float64(spanSize)
+ alignBits := bits.TrailingZeros(uint(c.size))
+ if alignBits > pageShift {
+ // object alignment is capped at page alignment
+ alignBits = pageShift
+ }
+ for i := range minAligns {
+ if i > alignBits {
+ minAligns[i] = 0
+ } else if minAligns[i] == 0 {
+ minAligns[i] = c.size
+ }
+ }
+ prevSize = c.size
+ fmt.Fprintf(w, "// %5d %9d %10d %7d %10d %8.2f%% %9d\n", i, c.size, spanSize, objects, tailWaste, 100*maxWaste, 1<<alignBits)
+ }
+ fmt.Fprintf(w, "\n")
+
+ fmt.Fprintf(w, "// %-9s %-4s %-12s\n", "alignment", "bits", "min obj size")
+ for bits, size := range minAligns {
+ if size == 0 {
+ break
+ }
+ if bits+1 < len(minAligns) && size == minAligns[bits+1] {
+ continue
+ }
+ fmt.Fprintf(w, "// %9d %4d %12d\n", 1<<bits, bits, size)
+ }
+ fmt.Fprintf(w, "\n")
+}
+
+func maxObjsPerSpan(classes []class) int {
+ max := 0
+ for _, c := range classes[1:] {
+ n := c.npages * pageSize / c.size
+ if n > max {
+ max = n
+ }
+ }
+ return max
+}
+
+func printClasses(w io.Writer, classes []class) {
+ fmt.Fprintln(w, "const (")
+ fmt.Fprintf(w, "_MaxSmallSize = %d\n", maxSmallSize)
+ fmt.Fprintf(w, "smallSizeDiv = %d\n", smallSizeDiv)
+ fmt.Fprintf(w, "smallSizeMax = %d\n", smallSizeMax)
+ fmt.Fprintf(w, "largeSizeDiv = %d\n", largeSizeDiv)
+ fmt.Fprintf(w, "_NumSizeClasses = %d\n", len(classes))
+ fmt.Fprintf(w, "_PageShift = %d\n", pageShift)
+ fmt.Fprintf(w, "maxObjsPerSpan = %d\n", maxObjsPerSpan(classes))
+ fmt.Fprintln(w, ")")
+
+ fmt.Fprint(w, "var class_to_size = [_NumSizeClasses]uint16 {")
+ for _, c := range classes {
+ fmt.Fprintf(w, "%d,", c.size)
+ }
+ fmt.Fprintln(w, "}")
+
+ fmt.Fprint(w, "var class_to_allocnpages = [_NumSizeClasses]uint8 {")
+ for _, c := range classes {
+ fmt.Fprintf(w, "%d,", c.npages)
+ }
+ fmt.Fprintln(w, "}")
+
+ fmt.Fprint(w, "var class_to_divmagic = [_NumSizeClasses]uint32 {")
+ for _, c := range classes {
+ if c.size == 0 {
+ fmt.Fprintf(w, "0,")
+ continue
+ }
+ fmt.Fprintf(w, "^uint32(0)/%d+1,", c.size)
+ }
+ fmt.Fprintln(w, "}")
+
+ // map from size to size class, for small sizes.
+ sc := make([]int, smallSizeMax/smallSizeDiv+1)
+ for i := range sc {
+ size := i * smallSizeDiv
+ for j, c := range classes {
+ if c.size >= size {
+ sc[i] = j
+ break
+ }
+ }
+ }
+ fmt.Fprint(w, "var size_to_class8 = [smallSizeMax/smallSizeDiv+1]uint8 {")
+ for _, v := range sc {
+ fmt.Fprintf(w, "%d,", v)
+ }
+ fmt.Fprintln(w, "}")
+
+ // map from size to size class, for large sizes.
+ sc = make([]int, (maxSmallSize-smallSizeMax)/largeSizeDiv+1)
+ for i := range sc {
+ size := smallSizeMax + i*largeSizeDiv
+ for j, c := range classes {
+ if c.size >= size {
+ sc[i] = j
+ break
+ }
+ }
+ }
+ fmt.Fprint(w, "var size_to_class128 = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv+1]uint8 {")
+ for _, v := range sc {
+ fmt.Fprintf(w, "%d,", v)
+ }
+ fmt.Fprintln(w, "}")
+}
diff --git a/src/runtime/mmap.go b/src/runtime/mmap.go
new file mode 100644
index 0000000..f0183f6
--- /dev/null
+++ b/src/runtime/mmap.go
@@ -0,0 +1,19 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !aix && !darwin && !js && (!linux || !amd64) && (!linux || !arm64) && (!freebsd || !amd64) && !openbsd && !plan9 && !solaris && !windows
+
+package runtime
+
+import "unsafe"
+
+// mmap calls the mmap system call. It is implemented in assembly.
+// We only pass the lower 32 bits of file offset to the
+// assembly routine; the higher bits (if required), should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// munmap calls the munmap system call. It is implemented in assembly.
+func munmap(addr unsafe.Pointer, n uintptr)
diff --git a/src/runtime/mpagealloc.go b/src/runtime/mpagealloc.go
new file mode 100644
index 0000000..2861fa9
--- /dev/null
+++ b/src/runtime/mpagealloc.go
@@ -0,0 +1,1081 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Page allocator.
+//
+// The page allocator manages mapped pages (defined by pageSize, NOT
+// physPageSize) for allocation and re-use. It is embedded into mheap.
+//
+// Pages are managed using a bitmap that is sharded into chunks.
+// In the bitmap, 1 means in-use, and 0 means free. The bitmap spans the
+// process's address space. Chunks are managed in a sparse-array-style structure
+// similar to mheap.arenas, since the bitmap may be large on some systems.
+//
+// The bitmap is efficiently searched by using a radix tree in combination
+// with fast bit-wise intrinsics. Allocation is performed using an address-ordered
+// first-fit approach.
+//
+// Each entry in the radix tree is a summary that describes three properties of
+// a particular region of the address space: the number of contiguous free pages
+// at the start and end of the region it represents, and the maximum number of
+// contiguous free pages found anywhere in that region.
+//
+// Each level of the radix tree is stored as one contiguous array, which represents
+// a different granularity of subdivision of the process's address space. Thus, this
+// radix tree is actually implicit in these large arrays, as opposed to having explicit
+// dynamically-allocated pointer-based node structures. Naturally, these arrays may be
+// quite large for systems with large address spaces, so in these cases they are mapped
+// into memory as needed. The leaf summaries of the tree correspond to a bitmap chunk.
+//
+// The root level (referred to as L0 and index 0 in pageAlloc.summary) has each
+// summary represent the largest section of address space (16 GiB on 64-bit systems),
+// with each subsequent level representing successively smaller subsections until we
+// reach the finest granularity at the leaves, a chunk.
+//
+// More specifically, each summary in each level (except for leaf summaries)
+// represents some number of entries in the following level. For example, each
+// summary in the root level may represent a 16 GiB region of address space,
+// and in the next level there could be 8 corresponding entries which represent 2
+// GiB subsections of that 16 GiB region, each of which could correspond to 8
+// entries in the next level which each represent 256 MiB regions, and so on.
+//
+// Thus, this design scales only up to a bounded heap size, but can always be extended to
+// larger heaps by simply adding levels to the radix tree, which mostly costs
+// additional virtual address space. The choice of managing large arrays also means
+// that a large amount of virtual address space may be reserved by the runtime.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ // The size of a bitmap chunk, i.e. the amount of bits (that is, pages) to consider
+ // in the bitmap at once.
+ pallocChunkPages = 1 << logPallocChunkPages
+ pallocChunkBytes = pallocChunkPages * pageSize
+ logPallocChunkPages = 9
+ logPallocChunkBytes = logPallocChunkPages + pageShift
+
+ // The number of radix bits for each level.
+ //
+ // The value of 3 is chosen such that the block of summaries we need to scan at
+ // each level fits in 64 bytes (2^3 summaries * 8 bytes per summary), which is
+ // close to the L1 cache line width on many systems. Also, a value of 3 fits 4 tree
+ // levels perfectly into the 21-bit pallocBits summary field at the root level.
+ //
+ // The following equation explains how each of the constants relate:
+ // summaryL0Bits + (summaryLevels-1)*summaryLevelBits + logPallocChunkBytes = heapAddrBits
+ //
+ // summaryLevels is an architecture-dependent value defined in mpagealloc_*.go.
+ summaryLevelBits = 3
+ summaryL0Bits = heapAddrBits - logPallocChunkBytes - (summaryLevels-1)*summaryLevelBits
+
+ // pallocChunksL2Bits is the number of bits of the chunk index number
+ // covered by the second level of the chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this change.
+ pallocChunksL2Bits = heapAddrBits - logPallocChunkBytes - pallocChunksL1Bits
+ pallocChunksL1Shift = pallocChunksL2Bits
+)
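
Plugging typical 64-bit values into the equation in the comment makes it concrete: logPallocChunkBytes = 9 + 13 = 22, and with heapAddrBits = 48 and summaryLevels = 5 (both platform-dependent, taken here as assumptions), summaryL0Bits comes out to 48 - 22 - 4*3 = 14, i.e. 2^14 root summaries each covering 2^34 bytes = 16 GiB. A checkable restatement:

package main

import "fmt"

func main() {
	const (
		pageShift           = 13
		logPallocChunkPages = 9
		logPallocChunkBytes = logPallocChunkPages + pageShift // 22

		// Assumed values for a typical 64-bit platform.
		heapAddrBits     = 48
		summaryLevels    = 5
		summaryLevelBits = 3

		summaryL0Bits = heapAddrBits - logPallocChunkBytes - (summaryLevels-1)*summaryLevelBits
	)

	fmt.Println("summaryL0Bits:", summaryL0Bits)                                 // 14
	fmt.Println("bytes per L0 summary:", int64(1)<<(heapAddrBits-summaryL0Bits)) // 1<<34 = 16 GiB
}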
+
+// maxSearchAddr returns the maximum searchAddr value, which indicates
+// that the heap has no free space.
+//
+// This function exists just to make it clear that this is the maximum address
+// for the page allocator's search space. See maxOffAddr for details.
+//
+// It's a function (rather than a variable) because it needs to be
+// usable before package runtime's dynamic initialization is complete.
+// See #51913 for details.
+func maxSearchAddr() offAddr { return maxOffAddr }
+
+// Global chunk index.
+//
+// Represents an index into the leaf level of the radix tree.
+// Similar to arenaIndex, except instead of arenas, it divides the address
+// space into chunks.
+type chunkIdx uint
+
+// chunkIndex returns the global index of the palloc chunk containing the
+// pointer p.
+func chunkIndex(p uintptr) chunkIdx {
+ return chunkIdx((p - arenaBaseOffset) / pallocChunkBytes)
+}
+
+// chunkBase returns the base address of the palloc chunk at index ci.
+func chunkBase(ci chunkIdx) uintptr {
+ return uintptr(ci)*pallocChunkBytes + arenaBaseOffset
+}
+
+// chunkPageIndex computes the index of the page that contains p,
+// relative to the chunk which contains p.
+func chunkPageIndex(p uintptr) uint {
+ return uint(p % pallocChunkBytes / pageSize)
+}
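
With the usual 8 KiB pages and 512-page chunks, pallocChunkBytes is 4 MiB, and these helpers simply split an (offset) address into a chunk number and a page index within that chunk. A standalone numeric sketch, taking arenaBaseOffset as 0 (true only on some platforms; an assumption here):

package main

import "fmt"

const (
	pageSize         = 8192
	pallocChunkPages = 512
	pallocChunkBytes = pallocChunkPages * pageSize // 4 MiB
	arenaBaseOffset  = 0                           // assumed 0 for this sketch
)

func chunkIndex(p uintptr) uint     { return uint((p - arenaBaseOffset) / pallocChunkBytes) }
func chunkBase(ci uint) uintptr     { return uintptr(ci)*pallocChunkBytes + arenaBaseOffset }
func chunkPageIndex(p uintptr) uint { return uint(p % pallocChunkBytes / pageSize) }

func main() {
	p := uintptr(0x1302000) // an arbitrary page-aligned address
	ci := chunkIndex(p)
	pi := chunkPageIndex(p)
	fmt.Println("chunk:", ci, "page in chunk:", pi)                        // chunk: 4 page in chunk: 385
	fmt.Println("round trip ok:", chunkBase(ci)+uintptr(pi)*pageSize == p) // true
}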
+
+// l1 returns the index into the first level of (*pageAlloc).chunks.
+func (i chunkIdx) l1() uint {
+ if pallocChunksL1Bits == 0 {
+ // Let the compiler optimize this away if there's no
+ // L1 map.
+ return 0
+ } else {
+ return uint(i) >> pallocChunksL1Shift
+ }
+}
+
+// l2 returns the index into the second level of (*pageAlloc).chunks.
+func (i chunkIdx) l2() uint {
+ if pallocChunksL1Bits == 0 {
+ return uint(i)
+ } else {
+ return uint(i) & (1<<pallocChunksL2Bits - 1)
+ }
+}
+
+// offAddrToLevelIndex converts an address in the offset address space
+// to the index into summary[level] containing addr.
+func offAddrToLevelIndex(level int, addr offAddr) int {
+ return int((addr.a - arenaBaseOffset) >> levelShift[level])
+}
+
+// levelIndexToOffAddr converts an index into summary[level] into
+// the corresponding address in the offset address space.
+func levelIndexToOffAddr(level, idx int) offAddr {
+ return offAddr{(uintptr(idx) << levelShift[level]) + arenaBaseOffset}
+}
+
+// addrsToSummaryRange converts base and limit pointers into a range
+// of entries for the given summary level.
+//
+// The returned range is inclusive on the lower bound and exclusive on
+// the upper bound.
+func addrsToSummaryRange(level int, base, limit uintptr) (lo int, hi int) {
+ // This is slightly more nuanced than just a shift for the exclusive
+ // upper-bound. Note that the exclusive upper bound may be within a
+ // summary at this level, meaning if we just do the obvious computation
+ // hi will end up being an inclusive upper bound. Unfortunately, just
+ // adding 1 to that is too broad since we might be on the very edge
+ // of a summary's max page count boundary for this level
+ // (1 << levelLogPages[level]). So, make limit an inclusive upper bound
+ // then shift, then add 1, so we get an exclusive upper bound at the end.
+ lo = int((base - arenaBaseOffset) >> levelShift[level])
+ hi = int(((limit-1)-arenaBaseOffset)>>levelShift[level]) + 1
+ return
+}
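+
+// For example, take the leaf level on 64-bit, where levelShift[level] = 22
+// (one summary per 4 MiB chunk), and base-arenaBaseOffset = 4 MiB. If
+// limit-arenaBaseOffset = 10 MiB (mid-chunk), hi = ((10 MiB - 1) >> 22) + 1 = 3.
+// If limit-arenaBaseOffset = 12 MiB (exactly chunk-aligned), hi is still
+// ((12 MiB - 1) >> 22) + 1 = 3, whereas the naive (limit >> 22) + 1 = 4 would
+// pull in a summary the range doesn't touch. Both cases yield [1, 3).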
+
+// blockAlignSummaryRange aligns indices into the given level to that
+// level's block width (1 << levelBits[level]). It assumes lo is inclusive
+// and hi is exclusive, and so aligns them down and up respectively.
+func blockAlignSummaryRange(level int, lo, hi int) (int, int) {
+ e := uintptr(1) << levelBits[level]
+ return int(alignDown(uintptr(lo), e)), int(alignUp(uintptr(hi), e))
+}
+
+type pageAlloc struct {
+ // Radix tree of summaries.
+ //
+ // Each slice's cap represents the whole memory reservation.
+ // Each slice's len reflects the allocator's maximum known
+ // mapped heap address for that level.
+ //
+ // The backing store of each summary level is reserved in init
+ // and may or may not be committed in grow (small address spaces
+ // may commit all the memory in init).
+ //
+ // The purpose of keeping len <= cap is to enforce bounds checks
+ // on the top end of the slice so that instead of an unknown
+ // runtime segmentation fault, we get a much friendlier out-of-bounds
+ // error.
+ //
+ // To iterate over a summary level, use inUse to determine which ranges
+ // are currently available. Otherwise one might try to access
+ // memory which is only Reserved, which may result in a hard fault.
+ //
+ // We may still get segmentation faults < len since some of that
+ // memory may not be committed yet.
+ summary [summaryLevels][]pallocSum
+
+ // chunks is a slice of bitmap chunks.
+ //
+ // The total size of chunks is quite large on most 64-bit platforms
+ // (O(GiB) or more) if flattened, so rather than making one large mapping
+ // (which has problems on some platforms, even when PROT_NONE) we use a
+ // two-level sparse array approach similar to the arena index in mheap.
+ //
+ // To find the chunk containing a memory address `a`, do:
+ // chunkOf(chunkIndex(a))
+ //
+ // Below is a table describing the configuration for chunks for various
+ // heapAddrBits supported by the runtime.
+ //
+ // heapAddrBits | L1 Bits | L2 Bits | L2 Entry Size
+ // ------------------------------------------------
+ // 32 | 0 | 10 | 128 KiB
+ // 33 (iOS) | 0 | 11 | 256 KiB
+ // 48 | 13 | 13 | 1 MiB
+ //
+ // There's no reason to use the L1 part of chunks on 32-bit: the
+ // address space is small, so the L2 is small. For platforms with a
+ // 48-bit address space, we pick the L1 such that the L2 is 1 MiB
+ // in size, which strikes a good balance between low granularity and
+ // keeping the impact on BSS low (note the L1 is stored directly
+ // in pageAlloc).
+ //
+ // To iterate over the bitmap, use inUse to determine which ranges
+ // are currently available. Otherwise one might iterate over unused
+ // ranges.
+ //
+ // Protected by mheapLock.
+ //
+ // TODO(mknyszek): Consider changing the definition of the bitmap
+ // such that 1 means free and 0 means in-use so that summaries and
+ // the bitmaps align better on zero-values.
+ chunks [1 << pallocChunksL1Bits]*[1 << pallocChunksL2Bits]pallocData
+
+ // The address to start an allocation search with. It must never
+ // point to any memory that is not contained in inUse, i.e.
+ // inUse.contains(searchAddr.addr()) must always be true. The one
+ // exception to this rule is that it may take on the value of
+ // maxOffAddr to indicate that the heap is exhausted.
+ //
+ // We guarantee that all valid heap addresses below this value
+ // are allocated and not worth searching.
+ searchAddr offAddr
+
+ // start and end represent the chunk indices
+ // which pageAlloc knows about. It assumes
+ // chunks in the range [start, end) are
+ // currently ready to use.
+ start, end chunkIdx
+
+ // inUse is a slice of ranges of address space which are
+ // known by the page allocator to be currently in-use (passed
+ // to grow).
+ //
+ // We care much more about having a contiguous heap in these cases
+ // and take additional measures to ensure that, so in nearly all
+ // cases this should have just 1 element.
+ //
+ // All access is protected by the mheapLock.
+ inUse addrRanges
+
+ // scav stores the scavenger state.
+ scav struct {
+ // index is an efficient index of chunks that have pages available to
+ // scavenge.
+ index scavengeIndex
+
+ // releasedBg is the amount of memory released in the background this
+ // scavenge cycle.
+ releasedBg atomic.Uintptr
+
+ // releasedEager is the amount of memory released eagerly this scavenge
+ // cycle.
+ releasedEager atomic.Uintptr
+ }
+
+ // mheap_.lock. This level of indirection makes it possible
+ // to test pageAlloc independently of the runtime allocator.
+ mheapLock *mutex
+
+ // sysStat is the runtime memstat to update when new system
+ // memory is committed by the pageAlloc for allocation metadata.
+ sysStat *sysMemStat
+
+ // summaryMappedReady is the number of bytes mapped in the Ready state
+ // in the summary structure. Used only for testing currently.
+ //
+ // Protected by mheapLock.
+ summaryMappedReady uintptr
+
+ // chunkHugePages indicates whether page bitmap chunks should be backed
+ // by huge pages.
+ chunkHugePages bool
+
+ // Whether or not this struct is being used in tests.
+ test bool
+}
+
+func (p *pageAlloc) init(mheapLock *mutex, sysStat *sysMemStat, test bool) {
+ if levelLogPages[0] > logMaxPackedValue {
+ // We can't represent 1<<levelLogPages[0] pages, the maximum number
+ // of pages we need to represent at the root level, in a summary, which
+ // is a big problem. Throw.
+ print("runtime: root level max pages = ", 1<<levelLogPages[0], "\n")
+ print("runtime: summary max pages = ", maxPackedValue, "\n")
+ throw("root level max pages doesn't fit in summary")
+ }
+ p.sysStat = sysStat
+
+ // Initialize p.inUse.
+ p.inUse.init(sysStat)
+
+ // System-dependent initialization.
+ p.sysInit(test)
+
+ // Start with the searchAddr in a state indicating there's no free memory.
+ p.searchAddr = maxSearchAddr()
+
+ // Set the mheapLock.
+ p.mheapLock = mheapLock
+
+ // Initialize the scavenge index.
+ p.summaryMappedReady += p.scav.index.init(test, sysStat)
+
+ // Set if we're in a test.
+ p.test = test
+}
+
+// tryChunkOf returns the bitmap data for the given chunk.
+//
+// Returns nil if the chunk data has not been mapped.
+func (p *pageAlloc) tryChunkOf(ci chunkIdx) *pallocData {
+ l2 := p.chunks[ci.l1()]
+ if l2 == nil {
+ return nil
+ }
+ return &l2[ci.l2()]
+}
+
+// chunkOf returns the chunk at the given chunk index.
+//
+// The chunk index must be valid or this method may throw.
+func (p *pageAlloc) chunkOf(ci chunkIdx) *pallocData {
+ return &p.chunks[ci.l1()][ci.l2()]
+}
+
+// grow sets up the metadata for the address range [base, base+size).
+// It may allocate metadata, in which case *p.sysStat will be updated.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) grow(base, size uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // Round up to chunks, since we can't deal with increments smaller
+ // than chunks. Also, sysGrow expects aligned values.
+ limit := alignUp(base+size, pallocChunkBytes)
+ base = alignDown(base, pallocChunkBytes)
+
+ // Grow the summary levels in a system-dependent manner.
+ // We just update a bunch of additional metadata here.
+ p.sysGrow(base, limit)
+
+ // Grow the scavenge index.
+ p.summaryMappedReady += p.scav.index.grow(base, limit, p.sysStat)
+
+ // Update p.start and p.end.
+ // If no growth happened yet, start == 0. This is generally
+ // safe since the zero page is unmapped.
+ firstGrowth := p.start == 0
+ start, end := chunkIndex(base), chunkIndex(limit)
+ if firstGrowth || start < p.start {
+ p.start = start
+ }
+ if end > p.end {
+ p.end = end
+ }
+ // Note that [base, limit) will never overlap with any existing
+ // range inUse because grow only ever adds never-used memory
+ // regions to the page allocator.
+ p.inUse.add(makeAddrRange(base, limit))
+
+ // A grow operation is a lot like a free operation, so if our
+ // chunk ends up below p.searchAddr, update p.searchAddr to the
+ // new address, just like in free.
+ if b := (offAddr{base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+
+ // Add entries into chunks, which is sparse, if needed. Then,
+ // initialize the bitmap.
+ //
+ // Newly-grown memory is always considered scavenged.
+ // Set all the bits in the scavenged bitmaps high.
+ for c := chunkIndex(base); c < chunkIndex(limit); c++ {
+ if p.chunks[c.l1()] == nil {
+ // Create the necessary l2 entry.
+ const l2Size = unsafe.Sizeof(*p.chunks[0])
+ r := sysAlloc(l2Size, p.sysStat)
+ if r == nil {
+ throw("pageAlloc: out of memory")
+ }
+ if !p.test {
+ // Make the chunk mapping eligible or ineligible
+ // for huge pages, depending on what our current
+ // state is.
+ if p.chunkHugePages {
+ sysHugePage(r, l2Size)
+ } else {
+ sysNoHugePage(r, l2Size)
+ }
+ }
+ // Store the new chunk block but avoid a write barrier.
+ // grow is used in call chains that disallow write barriers.
+ *(*uintptr)(unsafe.Pointer(&p.chunks[c.l1()])) = uintptr(r)
+ }
+ p.chunkOf(c).scavenged.setRange(0, pallocChunkPages)
+ }
+
+ // Update summaries accordingly. The grow acts like a free, so
+ // we need to ensure this newly-free memory is visible in the
+ // summaries.
+ p.update(base, size/pageSize, true, false)
+}
+
+// enableChunkHugePages enables huge pages for the chunk bitmap mappings (disabled by default).
+//
+// This function is idempotent.
+//
+// A note on latency: for sufficiently small heaps (<10s of GiB) this function will take constant
+// time, but may take time proportional to the size of the mapped heap beyond that.
+//
+// The heap lock must not be held over this operation, since it will briefly acquire
+// the heap lock.
+//
+// Must be called on the system stack because it acquires the heap lock.
+//
+//go:systemstack
+func (p *pageAlloc) enableChunkHugePages() {
+ // Grab the heap lock to turn on huge pages for new chunks and clone the current
+ // heap address space ranges.
+ //
+ // After the lock is released, we can be sure that bitmaps for any new chunks may
+ // be backed with huge pages, and we have the address space for the rest of the
+ // chunks. At the end of this function, all chunk metadata should be backed by huge
+ // pages.
+ lock(&mheap_.lock)
+ if p.chunkHugePages {
+ unlock(&mheap_.lock)
+ return
+ }
+ p.chunkHugePages = true
+ var inUse addrRanges
+ inUse.sysStat = p.sysStat
+ p.inUse.cloneInto(&inUse)
+ unlock(&mheap_.lock)
+
+ // This might seem like a lot of work, but all these loops are for generality.
+ //
+ // For a 1 GiB contiguous heap, a 48-bit address space, 13 L1 bits, a palloc chunk size
+ // of 4 MiB, and adherence to the default set of heap address hints, this will result in
+ // exactly 1 call to sysHugePage.
+ for _, r := range p.inUse.ranges {
+ for i := chunkIndex(r.base.addr()).l1(); i < chunkIndex(r.limit.addr()-1).l1(); i++ {
+ // N.B. We can assume that p.chunks[i] is non-nil and in a mapped part of p.chunks
+ // because it's derived from inUse, which never shrinks.
+ sysHugePage(unsafe.Pointer(p.chunks[i]), unsafe.Sizeof(*p.chunks[0]))
+ }
+ }
+}
+
+// update updates heap metadata. It must be called each time the bitmap
+// is updated.
+//
+// If contig is true, update does some optimizations assuming that there was
+// a contiguous allocation or free between addr and addr+npages. alloc indicates
+// whether the operation performed was an allocation or a free.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) update(base, npages uintptr, contig, alloc bool) {
+ assertLockHeld(p.mheapLock)
+
+ // base, limit, start, and end are inclusive.
+ limit := base + npages*pageSize - 1
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+
+ // Handle updating the lowest level first.
+ if sc == ec {
+ // Fast path: the allocation doesn't span more than one chunk,
+ // so update this one and if the summary didn't change, return.
+ x := p.summary[len(p.summary)-1][sc]
+ y := p.chunkOf(sc).summarize()
+ if x == y {
+ return
+ }
+ p.summary[len(p.summary)-1][sc] = y
+ } else if contig {
+ // Slow contiguous path: the allocation spans more than one chunk
+ // and at least one summary is guaranteed to change.
+ summary := p.summary[len(p.summary)-1]
+
+ // Update the summary for chunk sc.
+ summary[sc] = p.chunkOf(sc).summarize()
+
+ // Update the summaries for chunks in between, which are
+ // either totally allocated or freed.
+ whole := p.summary[len(p.summary)-1][sc+1 : ec]
+ if alloc {
+ // Should optimize into a memclr.
+ for i := range whole {
+ whole[i] = 0
+ }
+ } else {
+ for i := range whole {
+ whole[i] = freeChunkSum
+ }
+ }
+
+ // Update the summary for chunk ec.
+ summary[ec] = p.chunkOf(ec).summarize()
+ } else {
+ // Slow general path: the allocation spans more than one chunk
+ // and at least one summary is guaranteed to change.
+ //
+ // We can't assume a contiguous allocation happened, so walk over
+ // every chunk in the range and manually recompute the summary.
+ summary := p.summary[len(p.summary)-1]
+ for c := sc; c <= ec; c++ {
+ summary[c] = p.chunkOf(c).summarize()
+ }
+ }
+
+ // Walk up the radix tree and update the summaries appropriately.
+ changed := true
+ for l := len(p.summary) - 2; l >= 0 && changed; l-- {
+ // Update summaries at level l from summaries at level l+1.
+ changed = false
+
+ // "Constants" for the previous level which we
+ // need to compute the summary from that level.
+ logEntriesPerBlock := levelBits[l+1]
+ logMaxPages := levelLogPages[l+1]
+
+ // lo and hi describe all the parts of the level we need to look at.
+ lo, hi := addrsToSummaryRange(l, base, limit+1)
+
+ // Iterate over each block, updating the corresponding summary in the less-granular level.
+ for i := lo; i < hi; i++ {
+ children := p.summary[l+1][i<<logEntriesPerBlock : (i+1)<<logEntriesPerBlock]
+ sum := mergeSummaries(children, logMaxPages)
+ old := p.summary[l][i]
+ if old != sum {
+ changed = true
+ p.summary[l][i] = sum
+ }
+ }
+ }
+}
+
+// allocRange marks the range of memory [base, base+npages*pageSize) as
+// allocated. It also updates the summaries to reflect the newly-updated
+// bitmap.
+//
+// Returns the amount of scavenged memory in bytes present in the
+// allocated range.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) allocRange(base, npages uintptr) uintptr {
+ assertLockHeld(p.mheapLock)
+
+ limit := base + npages*pageSize - 1
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit)
+
+ scav := uint(0)
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ chunk := p.chunkOf(sc)
+ scav += chunk.scavenged.popcntRange(si, ei+1-si)
+ chunk.allocRange(si, ei+1-si)
+ p.scav.index.alloc(sc, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ chunk := p.chunkOf(sc)
+ scav += chunk.scavenged.popcntRange(si, pallocChunkPages-si)
+ chunk.allocRange(si, pallocChunkPages-si)
+ p.scav.index.alloc(sc, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ chunk := p.chunkOf(c)
+ scav += chunk.scavenged.popcntRange(0, pallocChunkPages)
+ chunk.allocAll()
+ p.scav.index.alloc(c, pallocChunkPages)
+ }
+ chunk = p.chunkOf(ec)
+ scav += chunk.scavenged.popcntRange(0, ei+1)
+ chunk.allocRange(0, ei+1)
+ p.scav.index.alloc(ec, ei+1)
+ }
+ p.update(base, npages, true, true)
+ return uintptr(scav) * pageSize
+}
+
+// findMappedAddr returns the smallest mapped offAddr that is
+// >= addr. That is, if addr refers to mapped memory, then it is
+// returned. If addr is higher than any mapped region, then
+// it returns maxOffAddr.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) findMappedAddr(addr offAddr) offAddr {
+ assertLockHeld(p.mheapLock)
+
+ // If we're not in a test, validate first by checking mheap_.arenas.
+ // This is a fast path which is only safe to use outside of testing.
+ ai := arenaIndex(addr.addr())
+ if p.test || mheap_.arenas[ai.l1()] == nil || mheap_.arenas[ai.l1()][ai.l2()] == nil {
+ vAddr, ok := p.inUse.findAddrGreaterEqual(addr.addr())
+ if ok {
+ return offAddr{vAddr}
+ } else {
+ // The candidate search address is greater than any
+ // known address, which means we definitely have no
+ // free memory left.
+ return maxOffAddr
+ }
+ }
+ return addr
+}
+
+// find searches for the first (address-ordered) contiguous free region of
+// npages in size and returns a base address for that region.
+//
+// It uses p.searchAddr to prune its search and assumes that no palloc chunks
+// below chunkIndex(p.searchAddr) contain any free memory at all.
+//
+// find also computes and returns a candidate p.searchAddr, which may or
+// may not prune more of the address space than p.searchAddr already does.
+// This candidate is always a valid p.searchAddr.
+//
+// find represents the slow path and the full radix tree search.
+//
+// Returns a base address of 0 on failure, in which case the candidate
+// searchAddr returned is invalid and must be ignored.
+//
+// p.mheapLock must be held.
+func (p *pageAlloc) find(npages uintptr) (uintptr, offAddr) {
+ assertLockHeld(p.mheapLock)
+
+ // Search algorithm.
+ //
+ // This algorithm walks each level l of the radix tree from the root level
+ // to the leaf level. It iterates over at most 1 << levelBits[l] of entries
+ // in a given level in the radix tree, and uses the summary information to
+ // find either:
+ // 1) That a given subtree contains a large enough contiguous region, at
+ // which point it continues iterating on the next level, or
+ // 2) That there are enough contiguous boundary-crossing bits to satisfy
+ // the allocation, at which point it knows exactly where to start
+ // allocating from.
+ //
+ // i tracks the index into the current level l's structure for the
+ // contiguous 1 << levelBits[l] entries we're actually interested in.
+ //
+ // NOTE: Technically this search could allocate a region which crosses
+ // the arenaBaseOffset boundary, which, when arenaBaseOffset != 0, is
+ // a discontinuity. However, the only way this could happen is if the
+ // page at the zero address is mapped, and this is impossible on
+ // every system we support where arenaBaseOffset != 0. So, the
+ // discontinuity is already encoded in the fact that the OS will never
+ // map the zero page for us, and this function doesn't try to handle
+ // this case in any way.
+
+ // i is the beginning of the block of entries we're searching at the
+ // current level.
+ i := 0
+
+ // firstFree is the region of address space in which we are certain to
+ // find the first free page in the heap. base and bound are the inclusive
+ // bounds of this window, and both are addresses in the linearized, contiguous
+ // view of the address space (with arenaBaseOffset pre-added). At each level,
+ // this window is narrowed as we find the memory region containing the
+ // first free page of memory. To begin with, the range reflects the
+ // full process address space.
+ //
+ // firstFree is updated by calling foundFree each time free space in the
+ // heap is discovered.
+ //
+ // At the end of the search, base.addr() is the best new
+ // searchAddr we could deduce in this search.
+ firstFree := struct {
+ base, bound offAddr
+ }{
+ base: minOffAddr,
+ bound: maxOffAddr,
+ }
+ // foundFree takes the given address range [addr, addr+size) and
+ // updates firstFree if it is a narrower range. The input range must
+ // either be fully contained within firstFree or not overlap with it
+ // at all.
+ //
+ // This way, we'll record the first summary we find with any free
+ // pages on the root level and narrow that down if we descend into
+ // that summary. But as soon as we need to iterate beyond that summary
+ // in a level to find a large enough range, we'll stop narrowing.
+ foundFree := func(addr offAddr, size uintptr) {
+ if firstFree.base.lessEqual(addr) && addr.add(size-1).lessEqual(firstFree.bound) {
+ // This range fits within the current firstFree window, so narrow
+ // down the firstFree window to the base and bound of this range.
+ firstFree.base = addr
+ firstFree.bound = addr.add(size - 1)
+ } else if !(addr.add(size-1).lessThan(firstFree.base) || firstFree.bound.lessThan(addr)) {
+ // This range only partially overlaps with the firstFree range,
+ // so throw.
+ print("runtime: addr = ", hex(addr.addr()), ", size = ", size, "\n")
+ print("runtime: base = ", hex(firstFree.base.addr()), ", bound = ", hex(firstFree.bound.addr()), "\n")
+ throw("range partially overlaps")
+ }
+ }
+
+ // lastSum is the summary which we saw on the previous level that made us
+ // move on to the next level. Used to print additional information in the
+ // case of a catastrophic failure.
+ // lastSumIdx is that summary's index in the previous level.
+ lastSum := packPallocSum(0, 0, 0)
+ lastSumIdx := -1
+
+nextLevel:
+ for l := 0; l < len(p.summary); l++ {
+ // For the root level, entriesPerBlock is the whole level.
+ entriesPerBlock := 1 << levelBits[l]
+ logMaxPages := levelLogPages[l]
+
+ // We've moved into a new level, so let's update i to our new
+ // starting index. This is a no-op for level 0.
+ i <<= levelBits[l]
+
+ // Slice out the block of entries we care about.
+ entries := p.summary[l][i : i+entriesPerBlock]
+
+ // Determine j0, the first index we should start iterating from.
+ // The searchAddr may help us eliminate iterations if we followed the
+ // searchAddr on the previous level or we're on the root level, in which
+ // case the searchAddr should be the same as i after levelShift.
+ j0 := 0
+ if searchIdx := offAddrToLevelIndex(l, p.searchAddr); searchIdx&^(entriesPerBlock-1) == i {
+ j0 = searchIdx & (entriesPerBlock - 1)
+ }
+
+ // Run over the level entries looking for
+ // a contiguous run of at least npages either
+ // within an entry or across entries.
+ //
+ // base contains the page index (relative to
+ // the first entry's first page) of the currently
+ // considered run of consecutive pages.
+ //
+ // size contains the size of the currently considered
+ // run of consecutive pages.
+ var base, size uint
+ for j := j0; j < len(entries); j++ {
+ sum := entries[j]
+ if sum == 0 {
+ // A full entry means we broke any streak and
+ // that we should skip it altogether.
+ size = 0
+ continue
+ }
+
+ // We've encountered a non-zero summary which means
+ // free memory, so update firstFree.
+ foundFree(levelIndexToOffAddr(l, i+j), (uintptr(1)<<logMaxPages)*pageSize)
+
+ s := sum.start()
+ if size+s >= uint(npages) {
+ // If size == 0 we don't have a run yet,
+ // which means base isn't valid. So, set
+ // base to the first page in this block.
+ if size == 0 {
+ base = uint(j) << logMaxPages
+ }
+ // We hit npages; we're done!
+ size += s
+ break
+ }
+ if sum.max() >= uint(npages) {
+ // The entry itself contains npages contiguous
+ // free pages, so continue on the next level
+ // to find that run.
+ i += j
+ lastSumIdx = i
+ lastSum = sum
+ continue nextLevel
+ }
+ if size == 0 || s < 1<<logMaxPages {
+ // We either don't have a current run started, or this entry
+ // isn't totally free (meaning we can't continue the current
+ // one), so try to begin a new run by setting size and base
+ // based on sum.end.
+ size = sum.end()
+ base = uint(j+1)<<logMaxPages - size
+ continue
+ }
+ // The entry is completely free, so continue the run.
+ size += 1 << logMaxPages
+ }
+ if size >= uint(npages) {
+ // We found a sufficiently large run of free pages straddling
+ // some boundary, so compute the address and return it.
+ addr := levelIndexToOffAddr(l, i).add(uintptr(base) * pageSize).addr()
+ return addr, p.findMappedAddr(firstFree.base)
+ }
+ if l == 0 {
+ // We're at level zero, so that means we've exhausted our search.
+ return 0, maxSearchAddr()
+ }
+
+ // We're not at level zero, and we exhausted the level we were looking in.
+ // This means that either our calculations were wrong or the level above
+ // lied to us. In either case, dump some useful state and throw.
+ print("runtime: summary[", l-1, "][", lastSumIdx, "] = ", lastSum.start(), ", ", lastSum.max(), ", ", lastSum.end(), "\n")
+ print("runtime: level = ", l, ", npages = ", npages, ", j0 = ", j0, "\n")
+ print("runtime: p.searchAddr = ", hex(p.searchAddr.addr()), ", i = ", i, "\n")
+ print("runtime: levelShift[level] = ", levelShift[l], ", levelBits[level] = ", levelBits[l], "\n")
+ for j := 0; j < len(entries); j++ {
+ sum := entries[j]
+ print("runtime: summary[", l, "][", i+j, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
+ }
+ throw("bad summary data")
+ }
+
+ // Since we've gotten to this point, that means we haven't found a
+ // sufficiently-sized free region straddling some boundary (chunk or larger).
+ // This means the last summary we inspected must have had a large enough "max"
+ // value, so look inside the chunk to find a suitable run.
+ //
+ // After iterating over all levels, i must contain a chunk index which
+ // is what the final level represents.
+ ci := chunkIdx(i)
+ j, searchIdx := p.chunkOf(ci).find(npages, 0)
+ if j == ^uint(0) {
+ // We couldn't find any space in this chunk despite the summaries telling
+ // us it should be there. There's likely a bug, so dump some state and throw.
+ sum := p.summary[len(p.summary)-1][i]
+ print("runtime: summary[", len(p.summary)-1, "][", i, "] = (", sum.start(), ", ", sum.max(), ", ", sum.end(), ")\n")
+ print("runtime: npages = ", npages, "\n")
+ throw("bad summary data")
+ }
+
+ // Compute the address at which the free space starts.
+ addr := chunkBase(ci) + uintptr(j)*pageSize
+
+ // Since we actually searched the chunk, we may have
+ // found an even narrower free window.
+ searchAddr := chunkBase(ci) + uintptr(searchIdx)*pageSize
+ foundFree(offAddr{searchAddr}, chunkBase(ci+1)-searchAddr)
+ return addr, p.findMappedAddr(firstFree.base)
+}
+
+// alloc allocates npages worth of memory from the page heap, returning the base
+// address for the allocation and the amount of scavenged memory in bytes
+// contained in the region [base address, base address + npages*pageSize).
+//
+// Returns a 0 base address on failure, in which case other returned values
+// should be ignored.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) alloc(npages uintptr) (addr uintptr, scav uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // If the searchAddr refers to a region which has a higher address than
+ // any known chunk, then we know we're out of memory.
+ if chunkIndex(p.searchAddr.addr()) >= p.end {
+ return 0, 0
+ }
+
+ // If npages has a chance of fitting in the chunk where the searchAddr is,
+ // search it directly.
+ searchAddr := minOffAddr
+ if pallocChunkPages-chunkPageIndex(p.searchAddr.addr()) >= uint(npages) {
+ // npages is guaranteed to be no greater than pallocChunkPages here.
+ i := chunkIndex(p.searchAddr.addr())
+ if max := p.summary[len(p.summary)-1][i].max(); max >= uint(npages) {
+ j, searchIdx := p.chunkOf(i).find(npages, chunkPageIndex(p.searchAddr.addr()))
+ if j == ^uint(0) {
+ print("runtime: max = ", max, ", npages = ", npages, "\n")
+ print("runtime: searchIdx = ", chunkPageIndex(p.searchAddr.addr()), ", p.searchAddr = ", hex(p.searchAddr.addr()), "\n")
+ throw("bad summary data")
+ }
+ addr = chunkBase(i) + uintptr(j)*pageSize
+ searchAddr = offAddr{chunkBase(i) + uintptr(searchIdx)*pageSize}
+ goto Found
+ }
+ }
+ // We failed to use a searchAddr for one reason or another, so try
+ // the slow path.
+ addr, searchAddr = p.find(npages)
+ if addr == 0 {
+ if npages == 1 {
+ // We failed to find a single free page, the smallest unit
+ // of allocation. This means we know the heap is completely
+ // exhausted. Otherwise, the heap still might have free
+ // space in it, just not enough contiguous space to
+ // accommodate npages.
+ p.searchAddr = maxSearchAddr()
+ }
+ return 0, 0
+ }
+Found:
+ // Go ahead and actually mark the bits now that we have an address.
+ scav = p.allocRange(addr, npages)
+
+ // If we found a higher searchAddr, we know that all the
+ // heap memory before that searchAddr in an offset address space is
+ // allocated, so bump p.searchAddr up to the new one.
+ if p.searchAddr.lessThan(searchAddr) {
+ p.searchAddr = searchAddr
+ }
+ return addr, scav
+}
+
+// free returns npages worth of memory starting at base back to the page heap.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) free(base, npages uintptr) {
+ assertLockHeld(p.mheapLock)
+
+ // If we're freeing pages below p.searchAddr, update searchAddr.
+ if b := (offAddr{base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+ limit := base + npages*pageSize - 1
+ if npages == 1 {
+ // Fast path: we're clearing a single bit, and we know exactly
+ // where it is, so mark it directly.
+ i := chunkIndex(base)
+ pi := chunkPageIndex(base)
+ p.chunkOf(i).free1(pi)
+ p.scav.index.free(i, pi, 1)
+ } else {
+ // Slow path: we're clearing more bits so we may need to iterate.
+ sc, ec := chunkIndex(base), chunkIndex(limit)
+ si, ei := chunkPageIndex(base), chunkPageIndex(limit)
+
+ if sc == ec {
+ // The range doesn't cross any chunk boundaries.
+ p.chunkOf(sc).free(si, ei+1-si)
+ p.scav.index.free(sc, si, ei+1-si)
+ } else {
+ // The range crosses at least one chunk boundary.
+ p.chunkOf(sc).free(si, pallocChunkPages-si)
+ p.scav.index.free(sc, si, pallocChunkPages-si)
+ for c := sc + 1; c < ec; c++ {
+ p.chunkOf(c).freeAll()
+ p.scav.index.free(c, 0, pallocChunkPages)
+ }
+ p.chunkOf(ec).free(0, ei+1)
+ p.scav.index.free(ec, 0, ei+1)
+ }
+ }
+ p.update(base, npages, true, false)
+}
+
+const (
+ pallocSumBytes = unsafe.Sizeof(pallocSum(0))
+
+ // maxPackedValue is the maximum value that any of the three fields in
+ // the pallocSum may take on.
+ maxPackedValue = 1 << logMaxPackedValue
+ logMaxPackedValue = logPallocChunkPages + (summaryLevels-1)*summaryLevelBits
+
+ freeChunkSum = pallocSum(uint64(pallocChunkPages) |
+ uint64(pallocChunkPages<<logMaxPackedValue) |
+ uint64(pallocChunkPages<<(2*logMaxPackedValue)))
+)
+
+// pallocSum is a packed summary type which packs three numbers: start, max,
+// and end into a single 8-byte value. Each of these values is a summary of
+// a bitmap and is thus a count, each of which may have a maximum value of
+// 2^21 - 1, or all three may be equal to 2^21. The latter case is represented
+// by just setting the 64th bit.
+type pallocSum uint64
+
+// packPallocSum takes a start, max, and end value and produces a pallocSum.
+func packPallocSum(start, max, end uint) pallocSum {
+ if max == maxPackedValue {
+ return pallocSum(uint64(1 << 63))
+ }
+ return pallocSum((uint64(start) & (maxPackedValue - 1)) |
+ ((uint64(max) & (maxPackedValue - 1)) << logMaxPackedValue) |
+ ((uint64(end) & (maxPackedValue - 1)) << (2 * logMaxPackedValue)))
+}
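+
+// Laid out, a pallocSum on a typical 64-bit configuration
+// (logMaxPackedValue = 9 + 4*3 = 21) uses bits [0, 21) for start,
+// [21, 42) for max, [42, 63) for end, and bit 63 as the "completely free"
+// flag meaning all three fields equal maxPackedValue. For instance,
+// packPallocSum(1, 3, 2) = 1 | 3<<21 | 2<<42.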
+
+// start extracts the start value from a packed sum.
+func (p pallocSum) start() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint(uint64(p) & (maxPackedValue - 1))
+}
+
+// max extracts the max value from a packed sum.
+func (p pallocSum) max() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1))
+}
+
+// end extracts the end value from a packed sum.
+func (p pallocSum) end() uint {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue
+ }
+ return uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
+}
+
+// unpack unpacks all three values from the summary.
+func (p pallocSum) unpack() (uint, uint, uint) {
+ if uint64(p)&uint64(1<<63) != 0 {
+ return maxPackedValue, maxPackedValue, maxPackedValue
+ }
+ return uint(uint64(p) & (maxPackedValue - 1)),
+ uint((uint64(p) >> logMaxPackedValue) & (maxPackedValue - 1)),
+ uint((uint64(p) >> (2 * logMaxPackedValue)) & (maxPackedValue - 1))
+}
+
+// mergeSummaries merges consecutive summaries, each of which may represent
+// at most 1 << logMaxPagesPerSum pages, into one.
+func mergeSummaries(sums []pallocSum, logMaxPagesPerSum uint) pallocSum {
+ // Merge the summaries in sums into one.
+ //
+ // We do this by keeping a running summary representing the merged
+ // summaries of sums[:i] in start, max, and end.
+ start, max, end := sums[0].unpack()
+ for i := 1; i < len(sums); i++ {
+ // Merge in sums[i].
+ si, mi, ei := sums[i].unpack()
+
+ // Merge in sums[i].start only if the running summary is
+ // completely free, otherwise this summary's start
+ // plays no role in the combined sum.
+ if start == uint(i)<<logMaxPagesPerSum {
+ start += si
+ }
+
+ // Recompute the max value of the running sum by looking
+ // across the boundary between the running sum and sums[i]
+ // and at the max sums[i], taking the greatest of those two
+ // and the max of the running sum.
+ if end+si > max {
+ max = end + si
+ }
+ if mi > max {
+ max = mi
+ }
+
+ // Merge in end by checking if this new summary is totally
+ // free. If it is, then we want to extend the running sum's
+ // end by the new summary. If not, then we have some alloc'd
+ // pages in there and we just want to take the end value in
+ // sums[i].
+ if ei == 1<<logMaxPagesPerSum {
+ end += 1 << logMaxPagesPerSum
+ } else {
+ end = ei
+ }
+ }
+ return packPallocSum(start, max, end)
+}
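+
+// A quick illustration with logMaxPagesPerSum = 9 (leaf summaries of 512-page
+// chunks): merging sums[0] = (start 0, max 3, end 2) with
+// sums[1] = (start 1, max 4, end 0) gives (0, 4, 0); the run crossing the
+// boundary is 2+1 = 3 pages, but sums[1] already contains a run of 4. Merging
+// two completely free summaries, (512, 512, 512) each, gives (1024, 1024, 1024).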
diff --git a/src/runtime/mpagealloc_32bit.go b/src/runtime/mpagealloc_32bit.go
new file mode 100644
index 0000000..900146e
--- /dev/null
+++ b/src/runtime/mpagealloc_32bit.go
@@ -0,0 +1,140 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build 386 || arm || mips || mipsle || wasm
+
+// wasm is treated as a 32-bit architecture for the purposes of the page
+// allocator, even though it has 64-bit pointers. This is because any wasm
+// pointer always has its top 32 bits as zero, so the effective heap address
+// space is only 2^32 bytes in size (see heapAddrBits).
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const (
+ // The number of levels in the radix tree.
+ summaryLevels = 4
+
+ // Constants for testing.
+ pageAlloc32Bit = 1
+ pageAlloc64Bit = 0
+
+ // Number of bits needed to represent all indices into the L1 of the
+ // chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this number change.
+ pallocChunksL1Bits = 0
+)
+
+// See comment in mpagealloc_64bit.go.
+var levelBits = [summaryLevels]uint{
+ summaryL0Bits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+}
+
+// See comment in mpagealloc_64bit.go.
+var levelShift = [summaryLevels]uint{
+ heapAddrBits - summaryL0Bits,
+ heapAddrBits - summaryL0Bits - 1*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 2*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 3*summaryLevelBits,
+}
+
+// See comment in mpagealloc_64bit.go.
+var levelLogPages = [summaryLevels]uint{
+ logPallocChunkPages + 3*summaryLevelBits,
+ logPallocChunkPages + 2*summaryLevelBits,
+ logPallocChunkPages + 1*summaryLevelBits,
+ logPallocChunkPages,
+}
+
+// scavengeIndexArray is the backing store for p.scav.index.chunks.
+// On 32-bit platforms, it's small enough to just be a global.
+var scavengeIndexArray [(1 << heapAddrBits) / pallocChunkBytes]atomicScavChunkData
+
+// See mpagealloc_64bit.go for details.
+func (p *pageAlloc) sysInit(test bool) {
+ // Calculate how much memory all our entries will take up.
+ //
+ // This should be around 12 KiB or less.
+ totalSize := uintptr(0)
+ for l := 0; l < summaryLevels; l++ {
+ totalSize += (uintptr(1) << (heapAddrBits - levelShift[l])) * pallocSumBytes
+ }
+ totalSize = alignUp(totalSize, physPageSize)
+
+ // Reserve memory for all levels in one go. There shouldn't be much for 32-bit.
+ reservation := sysReserve(nil, totalSize)
+ if reservation == nil {
+ throw("failed to reserve page summary memory")
+ }
+ // There isn't much. Just map it and mark it as used immediately.
+ sysMap(reservation, totalSize, p.sysStat)
+ sysUsed(reservation, totalSize, totalSize)
+ p.summaryMappedReady += totalSize
+
+ // Iterate over the reservation and cut it up into slices.
+ //
+ // Maintain i as the byte offset from reservation where
+ // the new slice should start.
+ for l, shift := range levelShift {
+ entries := 1 << (heapAddrBits - shift)
+
+ // Put this reservation into a slice.
+ sl := notInHeapSlice{(*notInHeap)(reservation), 0, entries}
+ p.summary[l] = *(*[]pallocSum)(unsafe.Pointer(&sl))
+
+ reservation = add(reservation, uintptr(entries)*pallocSumBytes)
+ }
+}
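+
+// As a sanity check of the "around 12 KiB" figure, assuming heapAddrBits = 32
+// and 4 MiB chunks: summaryL0Bits = 32 - 22 - 3*3 = 1, so the levels have
+// 2, 16, 128, and 1024 entries respectively, or 1170 summaries in total.
+// At 8 bytes each that's 9360 bytes, rounded up to physPageSize.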
+
+// See mpagealloc_64bit.go for details.
+func (p *pageAlloc) sysGrow(base, limit uintptr) {
+ if base%pallocChunkBytes != 0 || limit%pallocChunkBytes != 0 {
+ print("runtime: base = ", hex(base), ", limit = ", hex(limit), "\n")
+ throw("sysGrow bounds not aligned to pallocChunkBytes")
+ }
+
+ // Walk up the tree and update the summary slices.
+ for l := len(p.summary) - 1; l >= 0; l-- {
+ // Figure out what part of the summary array this new address space needs.
+ // Note that we need to align the ranges to the block width (1<<levelBits[l])
+ // at this level because the full block is needed to compute the summary for
+ // the next level.
+ lo, hi := addrsToSummaryRange(l, base, limit)
+ _, hi = blockAlignSummaryRange(l, lo, hi)
+ if hi > len(p.summary[l]) {
+ p.summary[l] = p.summary[l][:hi]
+ }
+ }
+}
+
+// sysInit initializes the scavengeIndex's chunks array.
+//
+// Returns the amount of memory added to sysStat.
+func (s *scavengeIndex) sysInit(test bool, sysStat *sysMemStat) (mappedReady uintptr) {
+ if test {
+ // Set up the scavenge index via sysAlloc so the test can free it later.
+ scavIndexSize := uintptr(len(scavengeIndexArray)) * unsafe.Sizeof(atomicScavChunkData{})
+ s.chunks = ((*[(1 << heapAddrBits) / pallocChunkBytes]atomicScavChunkData)(sysAlloc(scavIndexSize, sysStat)))[:]
+ mappedReady = scavIndexSize
+ } else {
+ // Set up the scavenge index.
+ s.chunks = scavengeIndexArray[:]
+ }
+ s.min.Store(1) // The 0th chunk is never going to be mapped for the heap.
+ s.max.Store(uintptr(len(s.chunks)))
+ return
+}
+
+// sysGrow is a no-op on 32-bit platforms.
+func (s *scavengeIndex) sysGrow(base, limit uintptr, sysStat *sysMemStat) uintptr {
+ return 0
+}
diff --git a/src/runtime/mpagealloc_64bit.go b/src/runtime/mpagealloc_64bit.go
new file mode 100644
index 0000000..1418831
--- /dev/null
+++ b/src/runtime/mpagealloc_64bit.go
@@ -0,0 +1,258 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const (
+ // The number of levels in the radix tree.
+ summaryLevels = 5
+
+ // Constants for testing.
+ pageAlloc32Bit = 0
+ pageAlloc64Bit = 1
+
+ // Number of bits needed to represent all indices into the L1 of the
+ // chunks map.
+ //
+ // See (*pageAlloc).chunks for more details. Update the documentation
+ // there should this number change.
+ pallocChunksL1Bits = 13
+)
+
+// levelBits is the number of bits in the radix for a given level in the super summary
+// structure.
+//
+// The sum of all the entries of levelBits, plus logPallocChunkBytes, should
+// equal heapAddrBits.
+var levelBits = [summaryLevels]uint{
+ summaryL0Bits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+ summaryLevelBits,
+}
+
+// levelShift is the number of bits to shift to acquire the radix for a given level
+// in the super summary structure.
+//
+// With levelShift, one can compute the index of the summary at level l related to a
+// pointer p by doing:
+//
+// p >> levelShift[l]
+var levelShift = [summaryLevels]uint{
+ heapAddrBits - summaryL0Bits,
+ heapAddrBits - summaryL0Bits - 1*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 2*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 3*summaryLevelBits,
+ heapAddrBits - summaryL0Bits - 4*summaryLevelBits,
+}
+
+// levelLogPages is the log2 of the maximum number of runtime pages in the
+// address space that a summary at the given level represents.
+//
+// The leaf level entry is always logPallocChunkPages, i.e. exactly 1 chunk's
+// worth of pages.
+var levelLogPages = [summaryLevels]uint{
+ logPallocChunkPages + 4*summaryLevelBits,
+ logPallocChunkPages + 3*summaryLevelBits,
+ logPallocChunkPages + 2*summaryLevelBits,
+ logPallocChunkPages + 1*summaryLevelBits,
+ logPallocChunkPages,
+}
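+
+// For heapAddrBits = 48, and assuming 8 KiB pages, these tables evaluate to
+// levelShift = {34, 31, 28, 25, 22} and levelLogPages = {21, 18, 15, 12, 9}:
+// a root-level summary covers 2^34 bytes (16 GiB) of address space, i.e. up
+// to 2^21 pages, while a leaf summary covers one 4 MiB chunk of 2^9 pages.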
+
+// sysInit performs architecture-dependent initialization of fields
+// in pageAlloc. pageAlloc should be uninitialized except for sysStat
+// if any runtime statistic should be updated.
+func (p *pageAlloc) sysInit(test bool) {
+ // Reserve memory for each level. This will get mapped in
+ // as R/W by setArenas.
+ for l, shift := range levelShift {
+ entries := 1 << (heapAddrBits - shift)
+
+ // Reserve b bytes of memory anywhere in the address space.
+ b := alignUp(uintptr(entries)*pallocSumBytes, physPageSize)
+ r := sysReserve(nil, b)
+ if r == nil {
+ throw("failed to reserve page summary memory")
+ }
+
+ // Put this reservation into a slice.
+ sl := notInHeapSlice{(*notInHeap)(r), 0, entries}
+ p.summary[l] = *(*[]pallocSum)(unsafe.Pointer(&sl))
+ }
+}
+
+// sysGrow performs architecture-dependent operations on heap
+// growth for the page allocator, such as mapping in new memory
+// for summaries. It also updates the length of the slices in
+// p.summary.
+//
+// base is the base of the newly-added heap memory and limit is
+// the first address past the end of the newly-added heap memory.
+// Both must be aligned to pallocChunkBytes.
+//
+// The caller must update p.start and p.end after calling sysGrow.
+func (p *pageAlloc) sysGrow(base, limit uintptr) {
+ if base%pallocChunkBytes != 0 || limit%pallocChunkBytes != 0 {
+ print("runtime: base = ", hex(base), ", limit = ", hex(limit), "\n")
+ throw("sysGrow bounds not aligned to pallocChunkBytes")
+ }
+
+ // addrRangeToSummaryRange converts a range of addresses into a range
+ // of summary indices which must be mapped to support those addresses
+ // in the summary range.
+ addrRangeToSummaryRange := func(level int, r addrRange) (int, int) {
+ sumIdxBase, sumIdxLimit := addrsToSummaryRange(level, r.base.addr(), r.limit.addr())
+ return blockAlignSummaryRange(level, sumIdxBase, sumIdxLimit)
+ }
+
+ // summaryRangeToSumAddrRange converts a range of indices in any
+ // level of p.summary into page-aligned addresses which cover that
+ // range of indices.
+ summaryRangeToSumAddrRange := func(level, sumIdxBase, sumIdxLimit int) addrRange {
+ baseOffset := alignDown(uintptr(sumIdxBase)*pallocSumBytes, physPageSize)
+ limitOffset := alignUp(uintptr(sumIdxLimit)*pallocSumBytes, physPageSize)
+ base := unsafe.Pointer(&p.summary[level][0])
+ return addrRange{
+ offAddr{uintptr(add(base, baseOffset))},
+ offAddr{uintptr(add(base, limitOffset))},
+ }
+ }
+
+ // addrRangeToSumAddrRange is a convenience function that converts
+ // an address range r to the address range of the given summary level
+ // that stores the summaries for r.
+ addrRangeToSumAddrRange := func(level int, r addrRange) addrRange {
+ sumIdxBase, sumIdxLimit := addrRangeToSummaryRange(level, r)
+ return summaryRangeToSumAddrRange(level, sumIdxBase, sumIdxLimit)
+ }
+
+ // Find the first inUse index which is strictly greater than base.
+ //
+ // Because this function will never be asked to remap the same memory
+ // twice, this index is effectively the index at which we would insert
+ // this new growth, and base will never overlap/be contained within
+ // any existing range.
+ //
+ // This will be used to look at what memory in the summary array is already
+ // mapped before and after this new range.
+ inUseIndex := p.inUse.findSucc(base)
+
+ // Walk up the radix tree and map summaries in as needed.
+ for l := range p.summary {
+ // Figure out what part of the summary array this new address space needs.
+ needIdxBase, needIdxLimit := addrRangeToSummaryRange(l, makeAddrRange(base, limit))
+
+ // Update the summary slices with a new upper-bound. This ensures
+ // we get tight bounds checks on at least the top bound.
+ //
+ // We must do this regardless of whether we map new memory.
+ if needIdxLimit > len(p.summary[l]) {
+ p.summary[l] = p.summary[l][:needIdxLimit]
+ }
+
+ // Compute the needed address range in the summary array for level l.
+ need := summaryRangeToSumAddrRange(l, needIdxBase, needIdxLimit)
+
+ // Prune need down to what needs to be newly mapped. Some parts of it may
+ // already be mapped by what inUse describes due to page alignment requirements
+ // for mapping. Because this function will never be asked to remap the same
+ // memory twice, it should never be possible to prune in such a way that causes
+ // need to be split.
+ if inUseIndex > 0 {
+ need = need.subtract(addrRangeToSumAddrRange(l, p.inUse.ranges[inUseIndex-1]))
+ }
+ if inUseIndex < len(p.inUse.ranges) {
+ need = need.subtract(addrRangeToSumAddrRange(l, p.inUse.ranges[inUseIndex]))
+ }
+ // It's possible that after our pruning above, there's nothing new to map.
+ if need.size() == 0 {
+ continue
+ }
+
+ // Map and commit need.
+ sysMap(unsafe.Pointer(need.base.addr()), need.size(), p.sysStat)
+ sysUsed(unsafe.Pointer(need.base.addr()), need.size(), need.size())
+ p.summaryMappedReady += need.size()
+ }
+
+ // Update the scavenge index.
+ p.summaryMappedReady += p.scav.index.sysGrow(base, limit, p.sysStat)
+}
+
+// sysGrow increases the index's backing store in response to a heap growth.
+//
+// Returns the amount of memory added to sysStat.
+func (s *scavengeIndex) sysGrow(base, limit uintptr, sysStat *sysMemStat) uintptr {
+ if base%pallocChunkBytes != 0 || limit%pallocChunkBytes != 0 {
+ print("runtime: base = ", hex(base), ", limit = ", hex(limit), "\n")
+ throw("sysGrow bounds not aligned to pallocChunkBytes")
+ }
+ scSize := unsafe.Sizeof(atomicScavChunkData{})
+ // Map and commit the pieces of chunks that we need.
+ //
+ // We always map the full range of the minimum heap address to the
+ // maximum heap address. We don't do this for the summary structure
+ // because it's quite large and a discontiguous heap could cause a
+ // lot of memory to be used. In this situation, the worst case overhead
+ // is in the single-digit MiB if we map the whole thing.
+ //
+ // The base address of the backing store is always page-aligned,
+ // because it comes from the OS, so it's sufficient to align the
+ // index.
+ haveMin := s.min.Load()
+ haveMax := s.max.Load()
+ needMin := alignDown(uintptr(chunkIndex(base)), physPageSize/scSize)
+ needMax := alignUp(uintptr(chunkIndex(limit)), physPageSize/scSize)
+ // Extend the range down to what we have, if there's no overlap.
+ if needMax < haveMin {
+ needMax = haveMin
+ }
+ if haveMax != 0 && needMin > haveMax {
+ needMin = haveMax
+ }
+ have := makeAddrRange(
+ // Avoid a panic from indexing one past the last element.
+ uintptr(unsafe.Pointer(&s.chunks[0]))+haveMin*scSize,
+ uintptr(unsafe.Pointer(&s.chunks[0]))+haveMax*scSize,
+ )
+ need := makeAddrRange(
+ // Avoid a panic from indexing one past the last element.
+ uintptr(unsafe.Pointer(&s.chunks[0]))+needMin*scSize,
+ uintptr(unsafe.Pointer(&s.chunks[0]))+needMax*scSize,
+ )
+ // Subtract any overlap from rounding. We can't re-map memory because
+ // it'll be zeroed.
+ need = need.subtract(have)
+
+ // If we've got something to map, map it, and update the slice bounds.
+ if need.size() != 0 {
+ sysMap(unsafe.Pointer(need.base.addr()), need.size(), sysStat)
+ sysUsed(unsafe.Pointer(need.base.addr()), need.size(), need.size())
+ // Update the indices only after the new memory is valid.
+ if haveMin == 0 || needMin < haveMin {
+ s.min.Store(needMin)
+ }
+ if haveMax == 0 || needMax > haveMax {
+ s.max.Store(needMax)
+ }
+ }
+ return need.size()
+}
+
+// sysInit initializes the scavengeIndex's chunks array.
+//
+// Returns the amount of memory added to sysStat.
+func (s *scavengeIndex) sysInit(test bool, sysStat *sysMemStat) uintptr {
+ n := uintptr(1<<heapAddrBits) / pallocChunkBytes
+ nbytes := n * unsafe.Sizeof(atomicScavChunkData{})
+ r := sysReserve(nil, nbytes)
+ sl := notInHeapSlice{(*notInHeap)(r), int(n), int(n)}
+ s.chunks = *(*[]atomicScavChunkData)(unsafe.Pointer(&sl))
+ return 0 // All memory above is mapped Reserved.
+}
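+
+// To give a sense of scale: with heapAddrBits = 48 and 4 MiB chunks, n is
+// 2^26, and assuming atomicScavChunkData packs into a single uint64, the
+// reservation is 2^26 * 8 B = 512 MiB of address space, none of it mapped
+// until sysGrow commits the pieces the heap actually uses.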
diff --git a/src/runtime/mpagealloc_test.go b/src/runtime/mpagealloc_test.go
new file mode 100644
index 0000000..f2b82e3
--- /dev/null
+++ b/src/runtime/mpagealloc_test.go
@@ -0,0 +1,1040 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/goos"
+ . "runtime"
+ "testing"
+)
+
+func checkPageAlloc(t *testing.T, want, got *PageAlloc) {
+ // Ensure start and end are correct.
+ wantStart, wantEnd := want.Bounds()
+ gotStart, gotEnd := got.Bounds()
+ if gotStart != wantStart {
+ t.Fatalf("start values not equal: got %d, want %d", gotStart, wantStart)
+ }
+ if gotEnd != wantEnd {
+ t.Fatalf("end values not equal: got %d, want %d", gotEnd, wantEnd)
+ }
+
+ for i := gotStart; i < gotEnd; i++ {
+ // Check the bitmaps. Note that we may have nil data.
+ gb, wb := got.PallocData(i), want.PallocData(i)
+ if gb == nil && wb == nil {
+ continue
+ }
+ if (gb == nil && wb != nil) || (gb != nil && wb == nil) {
+ t.Errorf("chunk %d nilness mismatch", i)
+ }
+ if !checkPallocBits(t, gb.PallocBits(), wb.PallocBits()) {
+ t.Logf("in chunk %d (mallocBits)", i)
+ }
+ if !checkPallocBits(t, gb.Scavenged(), wb.Scavenged()) {
+ t.Logf("in chunk %d (scavenged)", i)
+ }
+ }
+ // TODO(mknyszek): Verify summaries too?
+}
+
+func TestPageAllocGrow(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ chunks []ChunkIdx
+ inUse []AddrRange
+ }
+ tests := map[string]test{
+ "One": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ },
+ },
+ "Contiguous2": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+2, 0)),
+ },
+ },
+ "Contiguous5": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 3,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "Discontiguous": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+2, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "Mixed": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 2,
+ BaseChunkIdx + 4,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ },
+ },
+ "WildlyDiscontiguous": {
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 1,
+ BaseChunkIdx + 0x10,
+ BaseChunkIdx + 0x21,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+2, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x10, 0), PageBase(BaseChunkIdx+0x11, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x21, 0), PageBase(BaseChunkIdx+0x22, 0)),
+ },
+ },
+ "ManyDiscontiguous": {
+ // The initial cap is 16. Test 33 ranges, to exercise the growth path (twice).
+ chunks: []ChunkIdx{
+ BaseChunkIdx, BaseChunkIdx + 2, BaseChunkIdx + 4, BaseChunkIdx + 6,
+ BaseChunkIdx + 8, BaseChunkIdx + 10, BaseChunkIdx + 12, BaseChunkIdx + 14,
+ BaseChunkIdx + 16, BaseChunkIdx + 18, BaseChunkIdx + 20, BaseChunkIdx + 22,
+ BaseChunkIdx + 24, BaseChunkIdx + 26, BaseChunkIdx + 28, BaseChunkIdx + 30,
+ BaseChunkIdx + 32, BaseChunkIdx + 34, BaseChunkIdx + 36, BaseChunkIdx + 38,
+ BaseChunkIdx + 40, BaseChunkIdx + 42, BaseChunkIdx + 44, BaseChunkIdx + 46,
+ BaseChunkIdx + 48, BaseChunkIdx + 50, BaseChunkIdx + 52, BaseChunkIdx + 54,
+ BaseChunkIdx + 56, BaseChunkIdx + 58, BaseChunkIdx + 60, BaseChunkIdx + 62,
+ BaseChunkIdx + 64,
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+2, 0), PageBase(BaseChunkIdx+3, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+4, 0), PageBase(BaseChunkIdx+5, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+6, 0), PageBase(BaseChunkIdx+7, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+8, 0), PageBase(BaseChunkIdx+9, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+10, 0), PageBase(BaseChunkIdx+11, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+12, 0), PageBase(BaseChunkIdx+13, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+14, 0), PageBase(BaseChunkIdx+15, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+16, 0), PageBase(BaseChunkIdx+17, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+18, 0), PageBase(BaseChunkIdx+19, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+20, 0), PageBase(BaseChunkIdx+21, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+22, 0), PageBase(BaseChunkIdx+23, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+24, 0), PageBase(BaseChunkIdx+25, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+26, 0), PageBase(BaseChunkIdx+27, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+28, 0), PageBase(BaseChunkIdx+29, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+30, 0), PageBase(BaseChunkIdx+31, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+32, 0), PageBase(BaseChunkIdx+33, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+34, 0), PageBase(BaseChunkIdx+35, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+36, 0), PageBase(BaseChunkIdx+37, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+38, 0), PageBase(BaseChunkIdx+39, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+40, 0), PageBase(BaseChunkIdx+41, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+42, 0), PageBase(BaseChunkIdx+43, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+44, 0), PageBase(BaseChunkIdx+45, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+46, 0), PageBase(BaseChunkIdx+47, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+48, 0), PageBase(BaseChunkIdx+49, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+50, 0), PageBase(BaseChunkIdx+51, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+52, 0), PageBase(BaseChunkIdx+53, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+54, 0), PageBase(BaseChunkIdx+55, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+56, 0), PageBase(BaseChunkIdx+57, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+58, 0), PageBase(BaseChunkIdx+59, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+60, 0), PageBase(BaseChunkIdx+61, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+62, 0), PageBase(BaseChunkIdx+63, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+64, 0), PageBase(BaseChunkIdx+65, 0)),
+ },
+ },
+ }
+ // Disable these tests on iOS since we have a small address space.
+ // See #46860.
+ if PageAlloc64Bit != 0 && goos.IsIos == 0 {
+ tests["ExtremelyDiscontiguous"] = test{
+ chunks: []ChunkIdx{
+ BaseChunkIdx,
+ BaseChunkIdx + 0x100000, // constant translates to O(TiB)
+ },
+ inUse: []AddrRange{
+ MakeAddrRange(PageBase(BaseChunkIdx, 0), PageBase(BaseChunkIdx+1, 0)),
+ MakeAddrRange(PageBase(BaseChunkIdx+0x100000, 0), PageBase(BaseChunkIdx+0x100001, 0)),
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ // By creating a new pageAlloc, we will
+ // grow it for each chunk defined in x.
+ x := make(map[ChunkIdx][]BitRange)
+ for _, c := range v.chunks {
+ x[c] = []BitRange{}
+ }
+ b := NewPageAlloc(x, nil)
+ defer FreePageAlloc(b)
+
+ got := b.InUse()
+ want := v.inUse
+
+ // Check for mismatches.
+ if len(got) != len(want) {
+ t.Fail()
+ } else {
+ for i := range want {
+ if !want[i].Equals(got[i]) {
+ t.Fail()
+ break
+ }
+ }
+ }
+ if t.Failed() {
+ t.Logf("found inUse mismatch")
+ t.Logf("got:")
+ for i, r := range got {
+ t.Logf("\t#%d [0x%x, 0x%x)", i, r.Base(), r.Limit())
+ }
+ t.Logf("want:")
+ for i, r := range want {
+ t.Logf("\t#%d [0x%x, 0x%x)", i, r.Base(), r.Limit())
+ }
+ }
+ })
+ }
+}
+
+func TestPageAllocAlloc(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type hit struct {
+ npages, base, scav uintptr
+ }
+ type test struct {
+ scav map[ChunkIdx][]BitRange
+ before map[ChunkIdx][]BitRange
+ after map[ChunkIdx][]BitRange
+ hits []hit
+ }
+ tests := map[string]test{
+ "AllFree1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {2, 2}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx, 0), PageSize},
+ {1, PageBase(BaseChunkIdx, 1), 0},
+ {1, PageBase(BaseChunkIdx, 2), PageSize},
+ {1, PageBase(BaseChunkIdx, 3), PageSize},
+ {1, PageBase(BaseChunkIdx, 4), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 5}},
+ },
+ },
+ "ManyArena1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 1}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx+2, PallocChunkPages-1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguous1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 0}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {1, PageBase(BaseChunkIdx+0xff, 0), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 1}},
+ },
+ },
+ "AllFree2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 3}, {7, 1}},
+ },
+ hits: []hit{
+ {2, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {2, PageBase(BaseChunkIdx, 2), PageSize},
+ {2, PageBase(BaseChunkIdx, 4), 0},
+ {2, PageBase(BaseChunkIdx, 6), PageSize},
+ {2, PageBase(BaseChunkIdx, 8), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 10}},
+ },
+ },
+ "Straddle2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages - 1}},
+ BaseChunkIdx + 1: {{1, PallocChunkPages - 1}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages - 1, 1}},
+ BaseChunkIdx + 1: {},
+ },
+ hits: []hit{
+ {2, PageBase(BaseChunkIdx, PallocChunkPages-1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFree5": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 8}, {9, 1}, {17, 5}},
+ },
+ hits: []hit{
+ {5, PageBase(BaseChunkIdx, 0), 5 * PageSize},
+ {5, PageBase(BaseChunkIdx, 5), 4 * PageSize},
+ {5, PageBase(BaseChunkIdx, 10), 0},
+ {5, PageBase(BaseChunkIdx, 15), 3 * PageSize},
+ {5, PageBase(BaseChunkIdx, 20), 2 * PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 25}},
+ },
+ },
+ "AllFree64": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{21, 1}, {63, 65}},
+ },
+ hits: []hit{
+ {64, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {64, PageBase(BaseChunkIdx, 64), 64 * PageSize},
+ {64, PageBase(BaseChunkIdx, 128), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 192}},
+ },
+ },
+ "AllFree65": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{129, 1}},
+ },
+ hits: []hit{
+ {65, PageBase(BaseChunkIdx, 0), 0},
+ {65, PageBase(BaseChunkIdx, 65), PageSize},
+ {65, PageBase(BaseChunkIdx, 130), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 195}},
+ },
+ },
+ "ExhaustPallocChunkPages-3": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{10, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages - 3, PageBase(BaseChunkIdx, 0), PageSize},
+ {PallocChunkPages - 3, 0, 0},
+ {1, PageBase(BaseChunkIdx, PallocChunkPages-3), 0},
+ {2, PageBase(BaseChunkIdx, PallocChunkPages-2), 0},
+ {1, 0, 0},
+ {PallocChunkPages - 3, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFreePallocChunkPages": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {PallocChunkPages - 1, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages, PageBase(BaseChunkIdx, 0), 2 * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {{3, 100}},
+ },
+ hits: []hit{
+ {PallocChunkPages, PageBase(BaseChunkIdx, PallocChunkPages/2), 100 * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages+1": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ hits: []hit{
+ {PallocChunkPages + 1, PageBase(BaseChunkIdx, PallocChunkPages/2), (PallocChunkPages + 1) * PageSize},
+ {PallocChunkPages, 0, 0},
+ {1, PageBase(BaseChunkIdx+1, PallocChunkPages/2+1), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages/2 + 2}},
+ },
+ },
+ "AllFreePallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx, 0), 0},
+ {PallocChunkPages * 2, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguousPallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 0x40: {},
+ BaseChunkIdx + 0x41: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x40: {},
+ BaseChunkIdx + 0x41: {},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx+0x40, 0), 0},
+ {21, PageBase(BaseChunkIdx, 0), 21 * PageSize},
+ {1, PageBase(BaseChunkIdx, 21), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 22}},
+ BaseChunkIdx + 0x40: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0x41: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages*2": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 7}},
+ BaseChunkIdx + 1: {{3, 5}, {121, 10}},
+ BaseChunkIdx + 2: {{PallocChunkPages/2 + 12, 2}},
+ },
+ hits: []hit{
+ {PallocChunkPages * 2, PageBase(BaseChunkIdx, PallocChunkPages/2), 15 * PageSize},
+ {PallocChunkPages * 2, 0, 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "StraddlePallocChunkPages*5/4": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages * 3 / 4}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages * 3 / 4}},
+ BaseChunkIdx + 3: {{0, 0}},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{PallocChunkPages / 2, PallocChunkPages/4 + 1}},
+ BaseChunkIdx + 2: {{PallocChunkPages / 3, 1}},
+ BaseChunkIdx + 3: {{PallocChunkPages * 2 / 3, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages * 5 / 4, PageBase(BaseChunkIdx+2, PallocChunkPages*3/4), PageSize},
+ {PallocChunkPages * 5 / 4, 0, 0},
+ {1, PageBase(BaseChunkIdx+1, PallocChunkPages*3/4), PageSize},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages*3/4 + 1}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ },
+ },
+ "AllFreePallocChunkPages*7+5": {
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{50, 1}},
+ BaseChunkIdx + 1: {{31, 1}},
+ BaseChunkIdx + 2: {{7, 1}},
+ BaseChunkIdx + 3: {{200, 1}},
+ BaseChunkIdx + 4: {{3, 1}},
+ BaseChunkIdx + 5: {{51, 1}},
+ BaseChunkIdx + 6: {{20, 1}},
+ BaseChunkIdx + 7: {{1, 1}},
+ },
+ hits: []hit{
+ {PallocChunkPages*7 + 5, PageBase(BaseChunkIdx, 0), 8 * PageSize},
+ {PallocChunkPages*7 + 5, 0, 0},
+ {1, PageBase(BaseChunkIdx+7, 5), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ BaseChunkIdx + 4: {{0, PallocChunkPages}},
+ BaseChunkIdx + 5: {{0, PallocChunkPages}},
+ BaseChunkIdx + 6: {{0, PallocChunkPages}},
+ BaseChunkIdx + 7: {{0, 6}},
+ },
+ },
+ }
+ // Disable these tests on iOS since we have a small address space.
+ // See #46860.
+ if PageAlloc64Bit != 0 && goos.IsIos == 0 {
+ const chunkIdxBigJump = 0x100000 // chunk index offset which translates to O(TiB)
+
+ // This test attempts to trigger a bug wherein we look at unmapped summary
+ // memory in a case other than heap exhaustion.
+ //
+ // It achieves this by placing a chunk such that its summary will be
+ // at the very end of a physical page. It then also places another chunk
+ // much further up in the address space, such that any allocations into the
+ // first chunk do not exhaust the heap and the second chunk's summary is not in the
+ // page immediately adjacent to the first chunk's summary's page.
+ // Allocating into this first chunk to exhaustion and then into the second
+ // chunk may then trigger a check in the allocator which erroneously looks at
+ // unmapped summary memory and crashes.
+
+ // Figure out how many chunks are in a physical page, then align BaseChunkIdx
+ // to a physical page in the chunk summary array. Here we only assume that
+ // each summary array is aligned to some physical page.
+ sumsPerPhysPage := ChunkIdx(PhysPageSize / PallocSumBytes)
+ baseChunkIdx := BaseChunkIdx &^ (sumsPerPhysPage - 1)
+ tests["DiscontiguousMappedSumBoundary"] = test{
+ before: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ hits: []hit{
+ {PallocChunkPages - 1, PageBase(baseChunkIdx+sumsPerPhysPage-1, 0), 0},
+ {1, PageBase(baseChunkIdx+sumsPerPhysPage-1, PallocChunkPages-1), 0},
+ {1, PageBase(baseChunkIdx+chunkIdxBigJump, 0), 0},
+ {PallocChunkPages - 1, PageBase(baseChunkIdx+chunkIdxBigJump, 1), 0},
+ {1, 0, 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages}},
+ baseChunkIdx + chunkIdxBigJump: {{0, PallocChunkPages}},
+ },
+ }
+
+ // Test to check for issue #40191. Essentially, the candidate searchAddr
+ // discovered by find may not point to mapped memory, so we need to handle
+ // that explicitly.
+ //
+ // chunkIdxSmallOffset is an offset intended to be used within chunkIdxBigJump.
+ // It is far enough within chunkIdxBigJump that the summaries at the beginning
+ // of an address range the size of chunkIdxBigJump will not be mapped in.
+ const chunkIdxSmallOffset = 0x503
+ tests["DiscontiguousBadSearchAddr"] = test{
+ before: map[ChunkIdx][]BitRange{
+ // The mechanism for the bug involves three chunks, A, B, and C, which are
+ // far apart in the address space. In particular, B is chunkIdxBigJump +
+ // chunkIdxSmallOffset chunks away from A, and C is 2*chunkIdxBigJump chunks
+ // away from A. A has 1 page free, B has several (NOT at the end of B), and
+ // C is totally free.
+ // Note that B's free memory must not be at the end of B because the fast
+ // path in the page allocator will check if the searchAddr even gives us
+ // enough space to place the allocation in a chunk before accessing the
+ // summary.
+ BaseChunkIdx + chunkIdxBigJump*0: {{0, PallocChunkPages - 1}},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {
+ {0, PallocChunkPages - 10},
+ {PallocChunkPages - 1, 1},
+ },
+ BaseChunkIdx + chunkIdxBigJump*2: {},
+ },
+ scav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx + chunkIdxBigJump*0: {},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {},
+ BaseChunkIdx + chunkIdxBigJump*2: {},
+ },
+ hits: []hit{
+ // We first allocate into A to set the page allocator's searchAddr to the
+ // end of that chunk. That is the only purpose A serves.
+ {1, PageBase(BaseChunkIdx, PallocChunkPages-1), 0},
+ // Then, we make a big allocation that doesn't fit into B, and so must be
+ // fulfilled by C.
+ //
+ // On the way to fulfilling the allocation into C, we estimate searchAddr
+ // using the summary structure, but that will give us a searchAddr of
+ // B's base address minus chunkIdxSmallOffset chunks. These chunks will
+ // not be mapped.
+ {100, PageBase(baseChunkIdx+chunkIdxBigJump*2, 0), 0},
+ // Now we try to make a smaller allocation that can be fulfilled by B.
+ // In an older implementation of the page allocator, this would segfault,
+ // because this last allocation will first try to access the summary
+ // for B's base address minus chunkIdxSmallOffset chunks in the fast path,
+ // and this will not be mapped.
+ {9, PageBase(baseChunkIdx+chunkIdxBigJump*1+chunkIdxSmallOffset, PallocChunkPages-10), 0},
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx + chunkIdxBigJump*0: {{0, PallocChunkPages}},
+ BaseChunkIdx + chunkIdxBigJump*1 + chunkIdxSmallOffset: {{0, PallocChunkPages}},
+ BaseChunkIdx + chunkIdxBigJump*2: {{0, 100}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.before, v.scav)
+ defer FreePageAlloc(b)
+
+ for iter, i := range v.hits {
+ a, s := b.Alloc(i.npages)
+ if a != i.base {
+ t.Fatalf("bad alloc #%d: want base 0x%x, got 0x%x", iter+1, i.base, a)
+ }
+ if s != i.scav {
+ t.Fatalf("bad alloc #%d: want scav %d, got %d", iter+1, i.scav, s)
+ }
+ }
+ want := NewPageAlloc(v.after, v.scav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocExhaust(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ for _, npages := range []uintptr{1, 2, 3, 4, 5, 8, 16, 64, 1024, 1025, 2048, 2049} {
+ npages := npages
+ t.Run(fmt.Sprintf("%d", npages), func(t *testing.T) {
+ // Construct b.
+ bDesc := make(map[ChunkIdx][]BitRange)
+ for i := ChunkIdx(0); i < 4; i++ {
+ bDesc[BaseChunkIdx+i] = []BitRange{}
+ }
+ b := NewPageAlloc(bDesc, nil)
+ defer FreePageAlloc(b)
+
+ // Allocate into b with npages until we've exhausted the heap.
+ nAlloc := (PallocChunkPages * 4) / int(npages)
+ for i := 0; i < nAlloc; i++ {
+ addr := PageBase(BaseChunkIdx, uint(i)*uint(npages))
+ if a, _ := b.Alloc(npages); a != addr {
+ t.Fatalf("bad alloc #%d: want 0x%x, got 0x%x", i+1, addr, a)
+ }
+ }
+
+ // Check to make sure the next allocation fails.
+ if a, _ := b.Alloc(npages); a != 0 {
+ t.Fatalf("bad alloc #%d: want 0, got 0x%x", nAlloc, a)
+ }
+
+ // Construct what we want the heap to look like now.
+ allocPages := nAlloc * int(npages)
+ wantDesc := make(map[ChunkIdx][]BitRange)
+ for i := ChunkIdx(0); i < 4; i++ {
+ if allocPages >= PallocChunkPages {
+ wantDesc[BaseChunkIdx+i] = []BitRange{{0, PallocChunkPages}}
+ allocPages -= PallocChunkPages
+ } else if allocPages > 0 {
+ wantDesc[BaseChunkIdx+i] = []BitRange{{0, uint(allocPages)}}
+ allocPages = 0
+ } else {
+ wantDesc[BaseChunkIdx+i] = []BitRange{}
+ }
+ }
+ want := NewPageAlloc(wantDesc, nil)
+ defer FreePageAlloc(want)
+
+ // Check to make sure the heap b matches what we want.
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocFree(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ tests := map[string]struct {
+ before map[ChunkIdx][]BitRange
+ after map[ChunkIdx][]BitRange
+ npages uintptr
+ frees []uintptr
+ }{
+ "Free1": {
+ npages: 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 1),
+ PageBase(BaseChunkIdx, 2),
+ PageBase(BaseChunkIdx, 3),
+ PageBase(BaseChunkIdx, 4),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{5, PallocChunkPages - 5}},
+ },
+ },
+ "ManyArena1": {
+ npages: 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ PageBase(BaseChunkIdx+1, 0),
+ PageBase(BaseChunkIdx+2, PallocChunkPages-1),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}, {PallocChunkPages/2 + 1, PallocChunkPages/2 - 1}},
+ BaseChunkIdx + 1: {{1, PallocChunkPages - 1}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 1}},
+ },
+ },
+ "Free2": {
+ npages: 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 2),
+ PageBase(BaseChunkIdx, 4),
+ PageBase(BaseChunkIdx, 6),
+ PageBase(BaseChunkIdx, 8),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{10, PallocChunkPages - 10}},
+ },
+ },
+ "Straddle2": {
+ npages: 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages - 1, 1}},
+ BaseChunkIdx + 1: {{0, 1}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages-1),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "Free5": {
+ npages: 5,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 5),
+ PageBase(BaseChunkIdx, 10),
+ PageBase(BaseChunkIdx, 15),
+ PageBase(BaseChunkIdx, 20),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{25, PallocChunkPages - 25}},
+ },
+ },
+ "Free64": {
+ npages: 64,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 64),
+ PageBase(BaseChunkIdx, 128),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{192, PallocChunkPages - 192}},
+ },
+ },
+ "Free65": {
+ npages: 65,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ PageBase(BaseChunkIdx, 65),
+ PageBase(BaseChunkIdx, 130),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{195, PallocChunkPages - 195}},
+ },
+ },
+ "FreePallocChunkPages": {
+ npages: PallocChunkPages,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ },
+ "StraddlePallocChunkPages": {
+ npages: PallocChunkPages,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages / 2}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "StraddlePallocChunkPages+1": {
+ npages: PallocChunkPages + 1,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {{PallocChunkPages/2 + 1, PallocChunkPages/2 - 1}},
+ },
+ },
+ "FreePallocChunkPages*2": {
+ npages: PallocChunkPages * 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ },
+ },
+ "StraddlePallocChunkPages*2": {
+ npages: PallocChunkPages * 2,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, PallocChunkPages/2),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages / 2}},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {{PallocChunkPages / 2, PallocChunkPages / 2}},
+ },
+ },
+ "AllFreePallocChunkPages*7+5": {
+ npages: PallocChunkPages*7 + 5,
+ before: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ BaseChunkIdx + 3: {{0, PallocChunkPages}},
+ BaseChunkIdx + 4: {{0, PallocChunkPages}},
+ BaseChunkIdx + 5: {{0, PallocChunkPages}},
+ BaseChunkIdx + 6: {{0, PallocChunkPages}},
+ BaseChunkIdx + 7: {{0, PallocChunkPages}},
+ },
+ frees: []uintptr{
+ PageBase(BaseChunkIdx, 0),
+ },
+ after: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {{5, PallocChunkPages - 5}},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.before, nil)
+ defer FreePageAlloc(b)
+
+ for _, addr := range v.frees {
+ b.Free(addr, v.npages)
+ }
+ want := NewPageAlloc(v.after, nil)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
+
+func TestPageAllocAllocAndFree(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type hit struct {
+ alloc bool
+ npages uintptr
+ base uintptr
+ }
+ tests := map[string]struct {
+ init map[ChunkIdx][]BitRange
+ hits []hit
+ }{
+ // TODO(mknyszek): Write more tests here.
+ "Chunks8": {
+ init: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ BaseChunkIdx + 1: {},
+ BaseChunkIdx + 2: {},
+ BaseChunkIdx + 3: {},
+ BaseChunkIdx + 4: {},
+ BaseChunkIdx + 5: {},
+ BaseChunkIdx + 6: {},
+ BaseChunkIdx + 7: {},
+ },
+ hits: []hit{
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {false, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ {true, 1, PageBase(BaseChunkIdx, 0)},
+ {false, 1, PageBase(BaseChunkIdx, 0)},
+ {true, PallocChunkPages * 8, PageBase(BaseChunkIdx, 0)},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.init, nil)
+ defer FreePageAlloc(b)
+
+ for iter, i := range v.hits {
+ if i.alloc {
+ if a, _ := b.Alloc(i.npages); a != i.base {
+ t.Fatalf("bad alloc #%d: want 0x%x, got 0x%x", iter+1, i.base, a)
+ }
+ } else {
+ b.Free(i.base, i.npages)
+ }
+ }
+ })
+ }
+}
diff --git a/src/runtime/mpagecache.go b/src/runtime/mpagecache.go
new file mode 100644
index 0000000..245b0cb
--- /dev/null
+++ b/src/runtime/mpagecache.go
@@ -0,0 +1,183 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const pageCachePages = 8 * unsafe.Sizeof(pageCache{}.cache)
+
+// pageCache represents a per-p cache of pages the allocator can
+// allocate from without a lock. More specifically, it represents
+// a pageCachePages*pageSize chunk of memory with 0 or more free
+// pages in it.
+type pageCache struct {
+ base uintptr // base address of the chunk
+ cache uint64 // 64-bit bitmap representing free pages (1 means free)
+ scav uint64 // 64-bit bitmap representing scavenged pages (1 means scavenged)
+}
+
+// empty reports whether the page cache has no free pages.
+func (c *pageCache) empty() bool {
+ return c.cache == 0
+}
+
+// alloc allocates npages from the page cache and is the main entry
+// point for allocation.
+//
+// Returns a base address and the amount of scavenged memory in the
+// allocated region in bytes.
+//
+// Returns a base address of zero on failure, in which case the
+// amount of scavenged memory should be ignored.
+func (c *pageCache) alloc(npages uintptr) (uintptr, uintptr) {
+ if c.cache == 0 {
+ return 0, 0
+ }
+ if npages == 1 {
+ i := uintptr(sys.TrailingZeros64(c.cache))
+ scav := (c.scav >> i) & 1
+ c.cache &^= 1 << i // clear bit to mark in-use
+ c.scav &^= 1 << i // clear bit to mark unscavenged
+ return c.base + i*pageSize, uintptr(scav) * pageSize
+ }
+ return c.allocN(npages)
+}
+
+// allocN is a helper which attempts to allocate npages worth of pages
+// from the cache. It represents the general case for allocating from
+// the page cache.
+//
+// Returns a base address and the amount of scavenged memory in the
+// allocated region in bytes.
+func (c *pageCache) allocN(npages uintptr) (uintptr, uintptr) {
+ i := findBitRange64(c.cache, uint(npages))
+ if i >= 64 {
+ return 0, 0
+ }
+ mask := ((uint64(1) << npages) - 1) << i
+ scav := sys.OnesCount64(c.scav & mask)
+ c.cache &^= mask // mark in-use bits
+ c.scav &^= mask // clear scavenged bits
+ return c.base + uintptr(i*pageSize), uintptr(scav) * pageSize
+}
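
As an aside for readers following the bit manipulation: the sketch below mirrors the npages == 1 fast path above outside the runtime, using math/bits in place of runtime/internal/sys; allocOne and the sample bitmaps are illustrative assumptions, not runtime code.

package main

import (
	"fmt"
	"math/bits"
)

// allocOne mirrors the single-page fast path: locate the lowest set (free)
// bit with TrailingZeros64, clear it to mark the page in-use, and clear the
// matching scavenged bit.
func allocOne(cache, scav *uint64) (idx uint, wasScav bool, ok bool) {
	if *cache == 0 {
		return 0, false, false
	}
	i := uint(bits.TrailingZeros64(*cache))
	wasScav = (*scav>>i)&1 != 0
	*cache &^= 1 << i // page i is now in-use
	*scav &^= 1 << i  // page i is no longer scavenged
	return i, wasScav, true
}

func main() {
	cache, scav := uint64(0b1100), uint64(0b0100) // pages 2 and 3 free; page 2 scavenged
	for {
		i, s, ok := allocOne(&cache, &scav)
		if !ok {
			break
		}
		fmt.Printf("allocated page %d (scavenged=%v)\n", i, s)
	}
}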
+
+// flush empties out unallocated free pages in the given cache
+ // into p. Then, it clears the cache, such that empty returns
+// true.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (c *pageCache) flush(p *pageAlloc) {
+ assertLockHeld(p.mheapLock)
+
+ if c.empty() {
+ return
+ }
+ ci := chunkIndex(c.base)
+ pi := chunkPageIndex(c.base)
+
+ // This method is called very infrequently, so just do the
+ // slower, safer thing by iterating over each bit individually.
+ for i := uint(0); i < 64; i++ {
+ if c.cache&(1<<i) != 0 {
+ p.chunkOf(ci).free1(pi + i)
+
+ // Update density statistics.
+ p.scav.index.free(ci, pi+i, 1)
+ }
+ if c.scav&(1<<i) != 0 {
+ p.chunkOf(ci).scavenged.setRange(pi+i, 1)
+ }
+ }
+
+ // Since this is a lot like a free, we need to make sure
+ // we update the searchAddr just like free does.
+ if b := (offAddr{c.base}); b.lessThan(p.searchAddr) {
+ p.searchAddr = b
+ }
+ p.update(c.base, pageCachePages, false, false)
+ *c = pageCache{}
+}
+
+// allocToCache acquires a pageCachePages-aligned chunk of free pages which
+// may not be contiguous, and returns a pageCache structure which owns the
+// chunk.
+//
+// p.mheapLock must be held.
+//
+// Must run on the system stack because p.mheapLock must be held.
+//
+//go:systemstack
+func (p *pageAlloc) allocToCache() pageCache {
+ assertLockHeld(p.mheapLock)
+
+ // If the searchAddr refers to a region which has a higher address than
+ // any known chunk, then we know we're out of memory.
+ if chunkIndex(p.searchAddr.addr()) >= p.end {
+ return pageCache{}
+ }
+ c := pageCache{}
+ ci := chunkIndex(p.searchAddr.addr()) // chunk index
+ var chunk *pallocData
+ if p.summary[len(p.summary)-1][ci] != 0 {
+ // Fast path: there are free pages at or near the searchAddr address.
+ chunk = p.chunkOf(ci)
+ j, _ := chunk.find(1, chunkPageIndex(p.searchAddr.addr()))
+ if j == ^uint(0) {
+ throw("bad summary data")
+ }
+ c = pageCache{
+ base: chunkBase(ci) + alignDown(uintptr(j), 64)*pageSize,
+ cache: ^chunk.pages64(j),
+ scav: chunk.scavenged.block64(j),
+ }
+ } else {
+ // Slow path: the searchAddr address had nothing there, so go find
+ // the first free page the slow way.
+ addr, _ := p.find(1)
+ if addr == 0 {
+ // We failed to find adequate free space, so mark the searchAddr as OoM
+ // and return an empty pageCache.
+ p.searchAddr = maxSearchAddr()
+ return pageCache{}
+ }
+ ci = chunkIndex(addr)
+ chunk = p.chunkOf(ci)
+ c = pageCache{
+ base: alignDown(addr, 64*pageSize),
+ cache: ^chunk.pages64(chunkPageIndex(addr)),
+ scav: chunk.scavenged.block64(chunkPageIndex(addr)),
+ }
+ }
+
+ // Set the page bits as allocated and clear the scavenged bits, but
+ // be careful to only set and clear the relevant bits.
+ cpi := chunkPageIndex(c.base)
+ chunk.allocPages64(cpi, c.cache)
+ chunk.scavenged.clearBlock64(cpi, c.cache&c.scav /* free and scavenged */)
+
+ // Update as an allocation, but note that it's not contiguous.
+ p.update(c.base, pageCachePages, false, true)
+
+ // Update density statistics.
+ p.scav.index.alloc(ci, uint(sys.OnesCount64(c.cache)))
+
+ // Set the search address to the last page represented by the cache.
+ // Since all of the pages in this block are going to the cache, and we
+ // searched for the first free page, we can confidently start at the
+ // next page.
+ //
+ // However, p.searchAddr is not allowed to point into unmapped heap memory
+ // unless it is maxSearchAddr, so make it the last page as opposed to
+ // the page after.
+ p.searchAddr = offAddr{c.base + pageSize*(pageCachePages-1)}
+ return c
+}
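
allocToCache always aligns the cache base down to a 64-page boundary so the 64-bit cache word lines up with one block of the chunk's bitmap. The sketch below illustrates that alignment with a local alignDown helper and an assumed 8 KiB page size; both are stand-ins, not the runtime's own definitions.

package main

import "fmt"

const pageSize = 8192 // assumed value for illustration; the runtime defines this per platform

// alignDown rounds addr down to a multiple of align, which must be a power of two.
func alignDown(addr, align uintptr) uintptr {
	return addr &^ (align - 1)
}

func main() {
	addr := uintptr(0x12345678)
	base := alignDown(addr, 64*pageSize)
	fmt.Printf("address %#x belongs to the 64-page block starting at %#x\n", addr, base)
}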
diff --git a/src/runtime/mpagecache_test.go b/src/runtime/mpagecache_test.go
new file mode 100644
index 0000000..6cb0620
--- /dev/null
+++ b/src/runtime/mpagecache_test.go
@@ -0,0 +1,424 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/goos"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+func checkPageCache(t *testing.T, got, want PageCache) {
+ if got.Base() != want.Base() {
+ t.Errorf("bad pageCache base: got 0x%x, want 0x%x", got.Base(), want.Base())
+ }
+ if got.Cache() != want.Cache() {
+ t.Errorf("bad pageCache bits: got %016x, want %016x", got.Base(), want.Base())
+ }
+ if got.Scav() != want.Scav() {
+ t.Errorf("bad pageCache scav: got %016x, want %016x", got.Scav(), want.Scav())
+ }
+}
+
+func TestPageCacheAlloc(t *testing.T) {
+ base := PageBase(BaseChunkIdx, 0)
+ type hit struct {
+ npages uintptr
+ base uintptr
+ scav uintptr
+ }
+ tests := map[string]struct {
+ cache PageCache
+ hits []hit
+ }{
+ "Empty": {
+ cache: NewPageCache(base, 0, 0),
+ hits: []hit{
+ {1, 0, 0},
+ {2, 0, 0},
+ {3, 0, 0},
+ {4, 0, 0},
+ {5, 0, 0},
+ {11, 0, 0},
+ {12, 0, 0},
+ {16, 0, 0},
+ {27, 0, 0},
+ {32, 0, 0},
+ {43, 0, 0},
+ {57, 0, 0},
+ {64, 0, 0},
+ {121, 0, 0},
+ },
+ },
+ "Lo1": {
+ cache: NewPageCache(base, 0x1, 0x1),
+ hits: []hit{
+ {1, base, PageSize},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Hi1": {
+ cache: NewPageCache(base, 0x1<<63, 0x1),
+ hits: []hit{
+ {1, base + 63*PageSize, 0},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Swiss1": {
+ cache: NewPageCache(base, 0x20005555, 0x5505),
+ hits: []hit{
+ {2, 0, 0},
+ {1, base, PageSize},
+ {1, base + 2*PageSize, PageSize},
+ {1, base + 4*PageSize, 0},
+ {1, base + 6*PageSize, 0},
+ {1, base + 8*PageSize, PageSize},
+ {1, base + 10*PageSize, PageSize},
+ {1, base + 12*PageSize, PageSize},
+ {1, base + 14*PageSize, PageSize},
+ {1, base + 29*PageSize, 0},
+ {1, 0, 0},
+ {10, 0, 0},
+ },
+ },
+ "Lo2": {
+ cache: NewPageCache(base, 0x3, 0x2<<62),
+ hits: []hit{
+ {2, base, 0},
+ {2, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Hi2": {
+ cache: NewPageCache(base, 0x3<<62, 0x3<<62),
+ hits: []hit{
+ {2, base + 62*PageSize, 2 * PageSize},
+ {2, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Swiss2": {
+ cache: NewPageCache(base, 0x3333<<31, 0x3030<<31),
+ hits: []hit{
+ {2, base + 31*PageSize, 0},
+ {2, base + 35*PageSize, 2 * PageSize},
+ {2, base + 39*PageSize, 0},
+ {2, base + 43*PageSize, 2 * PageSize},
+ {2, 0, 0},
+ },
+ },
+ "Hi53": {
+ cache: NewPageCache(base, ((uint64(1)<<53)-1)<<10, ((uint64(1)<<16)-1)<<10),
+ hits: []hit{
+ {53, base + 10*PageSize, 16 * PageSize},
+ {53, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "Full53": {
+ cache: NewPageCache(base, ^uint64(0), ((uint64(1)<<16)-1)<<10),
+ hits: []hit{
+ {53, base, 16 * PageSize},
+ {53, 0, 0},
+ {1, base + 53*PageSize, 0},
+ },
+ },
+ "Full64": {
+ cache: NewPageCache(base, ^uint64(0), ^uint64(0)),
+ hits: []hit{
+ {64, base, 64 * PageSize},
+ {64, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ "FullMixed": {
+ cache: NewPageCache(base, ^uint64(0), ^uint64(0)),
+ hits: []hit{
+ {5, base, 5 * PageSize},
+ {7, base + 5*PageSize, 7 * PageSize},
+ {1, base + 12*PageSize, 1 * PageSize},
+ {23, base + 13*PageSize, 23 * PageSize},
+ {63, 0, 0},
+ {3, base + 36*PageSize, 3 * PageSize},
+ {3, base + 39*PageSize, 3 * PageSize},
+ {3, base + 42*PageSize, 3 * PageSize},
+ {12, base + 45*PageSize, 12 * PageSize},
+ {11, 0, 0},
+ {4, base + 57*PageSize, 4 * PageSize},
+ {4, 0, 0},
+ {6, 0, 0},
+ {36, 0, 0},
+ {2, base + 61*PageSize, 2 * PageSize},
+ {3, 0, 0},
+ {1, base + 63*PageSize, 1 * PageSize},
+ {4, 0, 0},
+ {2, 0, 0},
+ {62, 0, 0},
+ {1, 0, 0},
+ },
+ },
+ }
+ for name, test := range tests {
+ test := test
+ t.Run(name, func(t *testing.T) {
+ c := test.cache
+ for i, h := range test.hits {
+ b, s := c.Alloc(h.npages)
+ if b != h.base {
+ t.Fatalf("bad alloc base #%d: got 0x%x, want 0x%x", i, b, h.base)
+ }
+ if s != h.scav {
+ t.Fatalf("bad alloc scav #%d: got %d, want %d", i, s, h.scav)
+ }
+ }
+ })
+ }
+}
+
+func TestPageCacheFlush(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ bits64ToBitRanges := func(bits uint64, base uint) []BitRange {
+ var ranges []BitRange
+ start, size := uint(0), uint(0)
+ for i := 0; i < 64; i++ {
+ if bits&(1<<i) != 0 {
+ if size == 0 {
+ start = uint(i) + base
+ }
+ size++
+ } else {
+ if size != 0 {
+ ranges = append(ranges, BitRange{start, size})
+ size = 0
+ }
+ }
+ }
+ if size != 0 {
+ ranges = append(ranges, BitRange{start, size})
+ }
+ return ranges
+ }
+ runTest := func(t *testing.T, base uint, cache, scav uint64) {
+ // Set up the before state.
+ beforeAlloc := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{base, 64}},
+ }
+ beforeScav := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ }
+ b := NewPageAlloc(beforeAlloc, beforeScav)
+ defer FreePageAlloc(b)
+
+ // Create and flush the cache.
+ c := NewPageCache(PageBase(BaseChunkIdx, base), cache, scav)
+ c.Flush(b)
+ if !c.Empty() {
+ t.Errorf("pageCache flush did not clear cache")
+ }
+
+ // Set up the expected after state.
+ afterAlloc := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: bits64ToBitRanges(^cache, base),
+ }
+ afterScav := map[ChunkIdx][]BitRange{
+ BaseChunkIdx: bits64ToBitRanges(scav, base),
+ }
+ want := NewPageAlloc(afterAlloc, afterScav)
+ defer FreePageAlloc(want)
+
+ // Check to see if it worked.
+ checkPageAlloc(t, want, b)
+ }
+
+ // Empty.
+ runTest(t, 0, 0, 0)
+
+ // Full.
+ runTest(t, 0, ^uint64(0), ^uint64(0))
+
+ // Random.
+ for i := 0; i < 100; i++ {
+ // Generate random valid base within a chunk.
+ base := uint(rand.Intn(PallocChunkPages/64)) * 64
+
+ // Generate random cache.
+ cache := rand.Uint64()
+ scav := rand.Uint64() & cache
+
+ // Run the test.
+ runTest(t, base, cache, scav)
+ }
+}
+
+func TestPageAllocAllocToCache(t *testing.T) {
+ if GOOS == "openbsd" && testing.Short() {
+ t.Skip("skipping because virtual memory is limited; see #36210")
+ }
+ type test struct {
+ beforeAlloc map[ChunkIdx][]BitRange
+ beforeScav map[ChunkIdx][]BitRange
+ hits []PageCache // expected base addresses and patterns
+ afterAlloc map[ChunkIdx][]BitRange
+ afterScav map[ChunkIdx][]BitRange
+ }
+ tests := map[string]test{
+ "AllFree": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{1, 1}, {64, 64}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx, 0), ^uint64(0), 0x2),
+ NewPageCache(PageBase(BaseChunkIdx, 64), ^uint64(0), ^uint64(0)),
+ NewPageCache(PageBase(BaseChunkIdx, 128), ^uint64(0), 0),
+ NewPageCache(PageBase(BaseChunkIdx, 192), ^uint64(0), 0),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 256}},
+ },
+ },
+ "ManyArena": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages - 64}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx+2, PallocChunkPages-64), ^uint64(0), 0),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 1: {{0, PallocChunkPages}},
+ BaseChunkIdx + 2: {{0, PallocChunkPages}},
+ },
+ },
+ "NotContiguous": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 0}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{31, 67}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx+0xff, 0), ^uint64(0), ((uint64(1)<<33)-1)<<31),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{0, 64}},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ BaseChunkIdx + 0xff: {{64, 34}},
+ },
+ },
+ "First": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 32}, {33, 31}, {96, 32}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{1, 4}, {31, 5}, {66, 2}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx, 0), 1<<32, 1<<32),
+ NewPageCache(PageBase(BaseChunkIdx, 64), (uint64(1)<<32)-1, 0x3<<2),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 128}},
+ },
+ },
+ "Fail": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ hits: []PageCache{
+ NewPageCache(0, 0, 0),
+ NewPageCache(0, 0, 0),
+ NewPageCache(0, 0, 0),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, PallocChunkPages}},
+ },
+ },
+ "RetainScavBits": {
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {10, 2}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 4}, {11, 1}},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(BaseChunkIdx, 0), ^uint64(0x1|(0x3<<10)), 0x7<<1),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 64}},
+ },
+ afterScav: map[ChunkIdx][]BitRange{
+ BaseChunkIdx: {{0, 1}, {11, 1}},
+ },
+ },
+ }
+ // Disable these tests on iOS since we have a small address space.
+ // See #46860.
+ if PageAlloc64Bit != 0 && goos.IsIos == 0 {
+ const chunkIdxBigJump = 0x100000 // chunk index offset which translates to O(TiB)
+
+ // This test is similar to the one with the same name for
+ // pageAlloc.alloc and serves the same purpose.
+ // See mpagealloc_test.go for details.
+ sumsPerPhysPage := ChunkIdx(PhysPageSize / PallocSumBytes)
+ baseChunkIdx := BaseChunkIdx &^ (sumsPerPhysPage - 1)
+ tests["DiscontiguousMappedSumBoundary"] = test{
+ beforeAlloc: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages - 1}},
+ baseChunkIdx + chunkIdxBigJump: {{1, PallocChunkPages - 1}},
+ },
+ beforeScav: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {},
+ baseChunkIdx + chunkIdxBigJump: {},
+ },
+ hits: []PageCache{
+ NewPageCache(PageBase(baseChunkIdx+sumsPerPhysPage-1, PallocChunkPages-64), 1<<63, 0),
+ NewPageCache(PageBase(baseChunkIdx+chunkIdxBigJump, 0), 1, 0),
+ NewPageCache(0, 0, 0),
+ },
+ afterAlloc: map[ChunkIdx][]BitRange{
+ baseChunkIdx + sumsPerPhysPage - 1: {{0, PallocChunkPages}},
+ baseChunkIdx + chunkIdxBigJump: {{0, PallocChunkPages}},
+ },
+ }
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := NewPageAlloc(v.beforeAlloc, v.beforeScav)
+ defer FreePageAlloc(b)
+
+ for _, expect := range v.hits {
+ checkPageCache(t, b.AllocToCache(), expect)
+ if t.Failed() {
+ return
+ }
+ }
+ want := NewPageAlloc(v.afterAlloc, v.afterScav)
+ defer FreePageAlloc(want)
+
+ checkPageAlloc(t, want, b)
+ })
+ }
+}
diff --git a/src/runtime/mpallocbits.go b/src/runtime/mpallocbits.go
new file mode 100644
index 0000000..2f35ce0
--- /dev/null
+++ b/src/runtime/mpallocbits.go
@@ -0,0 +1,446 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+)
+
+// pageBits is a bitmap representing one bit per page in a palloc chunk.
+type pageBits [pallocChunkPages / 64]uint64
+
+// get returns the value of the i'th bit in the bitmap.
+func (b *pageBits) get(i uint) uint {
+ return uint((b[i/64] >> (i % 64)) & 1)
+}
+
+// block64 returns the 64-bit aligned block of bits containing the i'th bit.
+func (b *pageBits) block64(i uint) uint64 {
+ return b[i/64]
+}
+
+// set sets bit i of pageBits.
+func (b *pageBits) set(i uint) {
+ b[i/64] |= 1 << (i % 64)
+}
+
+// setRange sets bits in the range [i, i+n).
+func (b *pageBits) setRange(i, n uint) {
+ _ = b[i/64]
+ if n == 1 {
+ // Fast path for the n == 1 case.
+ b.set(i)
+ return
+ }
+ // Set bits [i, j].
+ j := i + n - 1
+ if i/64 == j/64 {
+ b[i/64] |= ((uint64(1) << n) - 1) << (i % 64)
+ return
+ }
+ _ = b[j/64]
+ // Set leading bits.
+ b[i/64] |= ^uint64(0) << (i % 64)
+ for k := i/64 + 1; k < j/64; k++ {
+ b[k] = ^uint64(0)
+ }
+ // Set trailing bits.
+ b[j/64] |= (uint64(1) << (j%64 + 1)) - 1
+}
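
The single-word case of setRange relies on the mask ((1<<n)-1)<<(i%64). Below is a small standalone check of that mask against a naive bit-by-bit loop, with arbitrary example values; it is illustrative only.

package main

import "fmt"

// setRangeWord sets bits [i, i+n) of w, assuming the whole range fits in one word.
func setRangeWord(w uint64, i, n uint) uint64 {
	return w | ((uint64(1)<<n)-1)<<i
}

// setRangeNaive is the obvious reference implementation.
func setRangeNaive(w uint64, i, n uint) uint64 {
	for k := i; k < i+n; k++ {
		w |= 1 << k
	}
	return w
}

func main() {
	w, i, n := uint64(0), uint(5), uint(7)
	fmt.Printf("mask: %#x, naive: %#x\n", setRangeWord(w, i, n), setRangeNaive(w, i, n))
}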
+
+// setAll sets all the bits of b.
+func (b *pageBits) setAll() {
+ for i := range b {
+ b[i] = ^uint64(0)
+ }
+}
+
+// setBlock64 sets, within the 64-bit aligned block of bits containing the
+// i'th bit, the bits that are set in v.
+func (b *pageBits) setBlock64(i uint, v uint64) {
+ b[i/64] |= v
+}
+
+// clear clears bit i of pageBits.
+func (b *pageBits) clear(i uint) {
+ b[i/64] &^= 1 << (i % 64)
+}
+
+// clearRange clears bits in the range [i, i+n).
+func (b *pageBits) clearRange(i, n uint) {
+ _ = b[i/64]
+ if n == 1 {
+ // Fast path for the n == 1 case.
+ b.clear(i)
+ return
+ }
+ // Clear bits [i, j].
+ j := i + n - 1
+ if i/64 == j/64 {
+ b[i/64] &^= ((uint64(1) << n) - 1) << (i % 64)
+ return
+ }
+ _ = b[j/64]
+ // Clear leading bits.
+ b[i/64] &^= ^uint64(0) << (i % 64)
+ for k := i/64 + 1; k < j/64; k++ {
+ b[k] = 0
+ }
+ // Clear trailing bits.
+ b[j/64] &^= (uint64(1) << (j%64 + 1)) - 1
+}
+
+// clearAll clears all the bits of b.
+func (b *pageBits) clearAll() {
+ for i := range b {
+ b[i] = 0
+ }
+}
+
+// clearBlock64 clears, within the 64-bit aligned block of bits containing the
+// i'th bit, the bits that are set in v.
+func (b *pageBits) clearBlock64(i uint, v uint64) {
+ b[i/64] &^= v
+}
+
+// popcntRange counts the number of set bits in the
+// range [i, i+n).
+func (b *pageBits) popcntRange(i, n uint) (s uint) {
+ if n == 1 {
+ return uint((b[i/64] >> (i % 64)) & 1)
+ }
+ _ = b[i/64]
+ j := i + n - 1
+ if i/64 == j/64 {
+ return uint(sys.OnesCount64((b[i/64] >> (i % 64)) & ((1 << n) - 1)))
+ }
+ _ = b[j/64]
+ s += uint(sys.OnesCount64(b[i/64] >> (i % 64)))
+ for k := i/64 + 1; k < j/64; k++ {
+ s += uint(sys.OnesCount64(b[k]))
+ }
+ s += uint(sys.OnesCount64(b[j/64] & ((1 << (j%64 + 1)) - 1)))
+ return
+}
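
When the counted range stays within one word, popcntRange shifts the range to the bottom, masks off n bits, and popcounts. A hedged standalone sketch of just that step, using math/bits as a stand-in for runtime/internal/sys:

package main

import (
	"fmt"
	"math/bits"
)

// popcntRangeWord counts set bits in [i, i+n) of w when the whole range lies
// within a single 64-bit word.
func popcntRangeWord(w uint64, i, n uint) uint {
	return uint(bits.OnesCount64((w >> i) & ((1 << n) - 1)))
}

func main() {
	w := uint64(0b1011_0110)
	fmt.Println(popcntRangeWord(w, 1, 5)) // bits 1..5 are 1,1,0,1,1 -> prints 4
}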
+
+// pallocBits is a bitmap that tracks page allocations for at most one
+// palloc chunk.
+//
+// The precise representation is an implementation detail, but for the
+// sake of documentation, 0s are free pages and 1s are allocated pages.
+type pallocBits pageBits
+
+// summarize returns a packed summary of the bitmap in pallocBits.
+func (b *pallocBits) summarize() pallocSum {
+ var start, max, cur uint
+ const notSetYet = ^uint(0) // sentinel for start value
+ start = notSetYet
+ for i := 0; i < len(b); i++ {
+ x := b[i]
+ if x == 0 {
+ cur += 64
+ continue
+ }
+ t := uint(sys.TrailingZeros64(x))
+ l := uint(sys.LeadingZeros64(x))
+
+ // Finish any region spanning the uint64s
+ cur += t
+ if start == notSetYet {
+ start = cur
+ }
+ if cur > max {
+ max = cur
+ }
+ // Final region that might span to next uint64
+ cur = l
+ }
+ if start == notSetYet {
+ // Made it all the way through without finding a single 1 bit.
+ const n = uint(64 * len(b))
+ return packPallocSum(n, n, n)
+ }
+ if cur > max {
+ max = cur
+ }
+ if max >= 64-2 {
+ // There is no way an internal run of zeros could beat max.
+ return packPallocSum(start, max, cur)
+ }
+ // Now look inside each uint64 for runs of zeros.
+ // All uint64s must be nonzero, or we would have aborted above.
+outer:
+ for i := 0; i < len(b); i++ {
+ x := b[i]
+
+ // Look inside this uint64. We have a pattern like
+ // 000000 1xxxxx1 000000
+ // We need to look inside the 1xxxxx1 for any contiguous
+ // region of zeros.
+
+ // We already know the trailing zeros are no larger than max. Remove them.
+ x >>= sys.TrailingZeros64(x) & 63
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue
+ }
+
+ // Strategy: shrink all runs of zeros by max. If any runs of zero
+ // remain, then we've identified a larger maximum zero run.
+ p := max // number of zeros we still need to shrink by.
+ k := uint(1) // current minimum length of runs of ones in x.
+ for {
+ // Shrink all runs of zeros by p places (except the top zeros).
+ for p > 0 {
+ if p <= k {
+ // Shift p ones down into the top of each run of zeros.
+ x |= x >> (p & 63)
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ break
+ }
+ // Shift k ones down into the top of each run of zeros.
+ x |= x >> (k & 63)
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ p -= k
+ // We've just doubled the minimum length of 1-runs.
+ // This allows us to shift farther in the next iteration.
+ k *= 2
+ }
+
+ // The length of the lowest-order zero run is an increment to our maximum.
+ j := uint(sys.TrailingZeros64(^x)) // count contiguous trailing ones
+ x >>= j & 63 // remove trailing ones
+ j = uint(sys.TrailingZeros64(x)) // count contiguous trailing zeros
+ x >>= j & 63 // remove zeros
+ max += j // we have a new maximum!
+ if x&(x+1) == 0 { // no more zeros (except at the top).
+ continue outer
+ }
+ p = j // remove j more zeros from each zero run.
+ }
+ }
+ return packPallocSum(start, max, cur)
+}
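
A packed summary describes the chunk's free (0) pages by three numbers: the free run at the start, the longest free run anywhere, and the free run at the end. The naive reference sketch below computes the same triple page by page; it is illustrative only and deliberately avoids the bit tricks the runtime uses above.

package main

import "fmt"

// naiveSummary returns (start, max, end): the length of the leading free run,
// the longest free run anywhere, and the trailing free run of the bitmap,
// where a 0 bit means a free page.
func naiveSummary(bm []uint64) (start, max, end uint) {
	n := uint(64 * len(bm))
	get := func(i uint) uint { return uint((bm[i/64] >> (i % 64)) & 1) }
	cur := uint(0)
	sawOne := false
	for i := uint(0); i < n; i++ {
		if get(i) == 0 {
			cur++
			continue
		}
		if !sawOne {
			start = cur
			sawOne = true
		}
		if cur > max {
			max = cur
		}
		cur = 0
	}
	if !sawOne {
		// Entirely free: all three values are the full length.
		return n, n, n
	}
	if cur > max {
		max = cur
	}
	end = cur
	return
}

func main() {
	bm := []uint64{0xff, 0, 1 << 63} // pages 0-7 and page 191 allocated
	fmt.Println(naiveSummary(bm))    // prints 0 183 0
}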
+
+// find searches for npages contiguous free pages in pallocBits and returns
+// the index where that run starts, as well as the index of the first free page
+// it found in the search. searchIdx represents the first known free page and
+// where to begin the next search from.
+//
+// If find fails to find any free space, it returns an index of ^uint(0) and
+// the new searchIdx should be ignored.
+//
+// Note that if npages == 1, the two returned values will always be identical.
+func (b *pallocBits) find(npages uintptr, searchIdx uint) (uint, uint) {
+ if npages == 1 {
+ addr := b.find1(searchIdx)
+ return addr, addr
+ } else if npages <= 64 {
+ return b.findSmallN(npages, searchIdx)
+ }
+ return b.findLargeN(npages, searchIdx)
+}
+
+// find1 is a helper for find which searches for a single free page
+// in the pallocBits and returns the index.
+//
+// See find for an explanation of the searchIdx parameter.
+func (b *pallocBits) find1(searchIdx uint) uint {
+ _ = b[0] // lift nil check out of loop
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ x := b[i]
+ if ^x == 0 {
+ continue
+ }
+ return i*64 + uint(sys.TrailingZeros64(^x))
+ }
+ return ^uint(0)
+}
+
+// findSmallN is a helper for find which searches for npages contiguous free pages
+// in this pallocBits and returns the index where that run of contiguous pages
+// starts as well as the index of the first free page it finds in its search.
+//
+// See find for an explanation of the searchIdx parameter.
+//
+// Returns a ^uint(0) index on failure and the new searchIdx should be ignored.
+//
+// findSmallN assumes npages <= 64, where any such contiguous run of pages
+// crosses at most one aligned 64-bit boundary in the bits.
+func (b *pallocBits) findSmallN(npages uintptr, searchIdx uint) (uint, uint) {
+ end, newSearchIdx := uint(0), ^uint(0)
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ bi := b[i]
+ if ^bi == 0 {
+ end = 0
+ continue
+ }
+ // First see if we can pack our allocation in the trailing
+ // zeros plus the end of the last 64 bits.
+ if newSearchIdx == ^uint(0) {
+ // The new searchIdx is going to be at these 64 bits after any
+ // 1s we find, so count trailing 1s.
+ newSearchIdx = i*64 + uint(sys.TrailingZeros64(^bi))
+ }
+ start := uint(sys.TrailingZeros64(bi))
+ if end+start >= uint(npages) {
+ return i*64 - end, newSearchIdx
+ }
+ // Next, check the interior of the 64-bit chunk.
+ j := findBitRange64(^bi, uint(npages))
+ if j < 64 {
+ return i*64 + j, newSearchIdx
+ }
+ end = uint(sys.LeadingZeros64(bi))
+ }
+ return ^uint(0), newSearchIdx
+}
+
+// findLargeN is a helper for find which searches for npages contiguous free pages
+// in this pallocBits and returns the index where that run starts, as well as the
+// index of the first free page it found in its search.
+//
+// See find for an explanation of the searchIdx parameter.
+//
+// Returns a ^uint(0) index on failure and the new searchIdx should be ignored.
+//
+// findLargeN assumes npages > 64, where any such run of free pages
+// crosses at least one aligned 64-bit boundary in the bits.
+func (b *pallocBits) findLargeN(npages uintptr, searchIdx uint) (uint, uint) {
+ start, size, newSearchIdx := ^uint(0), uint(0), ^uint(0)
+ for i := searchIdx / 64; i < uint(len(b)); i++ {
+ x := b[i]
+ if x == ^uint64(0) {
+ size = 0
+ continue
+ }
+ if newSearchIdx == ^uint(0) {
+ // The new searchIdx is going to be at these 64 bits after any
+ // 1s we find, so count trailing 1s.
+ newSearchIdx = i*64 + uint(sys.TrailingZeros64(^x))
+ }
+ if size == 0 {
+ size = uint(sys.LeadingZeros64(x))
+ start = i*64 + 64 - size
+ continue
+ }
+ s := uint(sys.TrailingZeros64(x))
+ if s+size >= uint(npages) {
+ size += s
+ return start, newSearchIdx
+ }
+ if s < 64 {
+ size = uint(sys.LeadingZeros64(x))
+ start = i*64 + 64 - size
+ continue
+ }
+ size += 64
+ }
+ if size < uint(npages) {
+ return ^uint(0), newSearchIdx
+ }
+ return start, newSearchIdx
+}
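
Both find helpers above implement, with word-at-a-time bit tricks, the same search that the naive page-by-page reference below performs. This sketch is standalone and illustrative only; findNaive is not a runtime function.

package main

import "fmt"

// findNaive returns the page index of the first run of n free (0) bits in
// the bitmap, or ^uint(0) if no such run exists.
func findNaive(bm []uint64, n uint) uint {
	run, start := uint(0), uint(0)
	for i := uint(0); i < uint(64*len(bm)); i++ {
		if bm[i/64]&(1<<(i%64)) == 0 {
			if run == 0 {
				start = i
			}
			run++
			if run == n {
				return start
			}
		} else {
			run = 0
		}
	}
	return ^uint(0)
}

func main() {
	bm := []uint64{^uint64(0) >> 4, 0xf} // free pages: 60-63 and 68-127
	fmt.Println(findNaive(bm, 8))        // prints 68: pages 60-63 are too few
}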
+
+// allocRange allocates the range [i, i+n).
+func (b *pallocBits) allocRange(i, n uint) {
+ (*pageBits)(b).setRange(i, n)
+}
+
+// allocAll allocates all the bits of b.
+func (b *pallocBits) allocAll() {
+ (*pageBits)(b).setAll()
+}
+
+// free1 frees a single page in the pallocBits at i.
+func (b *pallocBits) free1(i uint) {
+ (*pageBits)(b).clear(i)
+}
+
+// free frees the range [i, i+n) of pages in the pallocBits.
+func (b *pallocBits) free(i, n uint) {
+ (*pageBits)(b).clearRange(i, n)
+}
+
+// freeAll frees all the bits of b.
+func (b *pallocBits) freeAll() {
+ (*pageBits)(b).clearAll()
+}
+
+// pages64 returns a 64-bit bitmap representing a block of 64 pages aligned
+// to 64 pages. The returned block of pages is the one containing the i'th
+// page in this pallocBits. Each bit represents whether the page is in-use.
+func (b *pallocBits) pages64(i uint) uint64 {
+ return (*pageBits)(b).block64(i)
+}
+
+// allocPages64 allocates a 64-bit block of 64 pages aligned to 64 pages according
+// to the bits set in alloc. The block set is the one containing the i'th page.
+func (b *pallocBits) allocPages64(i uint, alloc uint64) {
+ (*pageBits)(b).setBlock64(i, alloc)
+}
+
+// findBitRange64 returns the bit index of the first set of
+// n consecutive 1 bits. If no consecutive set of 1 bits of
+// size n may be found in c, then it returns an integer >= 64.
+// n must be > 0.
+func findBitRange64(c uint64, n uint) uint {
+ // This implementation is based on shrinking the length of
+ // runs of contiguous 1 bits. We remove the top n-1 1 bits
+ // from each run of 1s, then look for the first remaining 1 bit.
+ p := n - 1 // number of 1s we want to remove.
+ k := uint(1) // current minimum width of runs of 0 in c.
+ for p > 0 {
+ if p <= k {
+ // Shift p 0s down into the top of each run of 1s.
+ c &= c >> (p & 63)
+ break
+ }
+ // Shift k 0s down into the top of each run of 1s.
+ c &= c >> (k & 63)
+ if c == 0 {
+ return 64
+ }
+ p -= k
+ // We've just doubled the minimum length of 0-runs.
+ // This allows us to shift farther in the next iteration.
+ k *= 2
+ }
+ // Find first remaining 1.
+ // Since we shrunk from the top down, the first 1 is in
+ // its correct original position.
+ return uint(sys.TrailingZeros64(c))
+}
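
To see the run-shrinking idea in isolation, the sketch below reimplements the same approach next to a naive scan and compares them on one sample word. It uses math/bits rather than runtime/internal/sys and is illustrative only, not the runtime's code path.

package main

import (
	"fmt"
	"math/bits"
)

// findRunNaive returns the lowest bit index of n consecutive 1 bits in c,
// or 64 if there is none.
func findRunNaive(c uint64, n uint) uint {
	run := uint(0)
	for i := uint(0); i < 64; i++ {
		if c&(1<<i) != 0 {
			run++
			if run == n {
				return i + 1 - n
			}
		} else {
			run = 0
		}
	}
	return 64
}

// findRunShrink removes the top n-1 bits from every run of 1s, then takes
// the first surviving 1 bit, mirroring the doubling trick used above.
func findRunShrink(c uint64, n uint) uint {
	p, k := n-1, uint(1)
	for p > 0 {
		if p <= k {
			c &= c >> (p & 63)
			break
		}
		c &= c >> (k & 63)
		if c == 0 {
			return 64
		}
		p -= k
		k *= 2
	}
	return uint(bits.TrailingZeros64(c))
}

func main() {
	c := uint64(0b1110_0111_1100) // runs of 1s at bits 2-6 and 9-11
	for _, n := range []uint{1, 2, 3, 4, 5, 6} {
		fmt.Printf("n=%d shrink=%d naive=%d\n", n, findRunShrink(c, n), findRunNaive(c, n))
	}
}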
+
+// pallocData encapsulates pallocBits and a bitmap for
+// whether or not a given page is scavenged in a single
+// structure. It's effectively a pallocBits with
+// additional functionality.
+//
+// Update the comment on (*pageAlloc).chunks should this
+// structure change.
+type pallocData struct {
+ pallocBits
+ scavenged pageBits
+}
+
+// allocRange sets bits [i, i+n) in the bitmap to 1 and
+// updates the scavenged bits appropriately.
+func (m *pallocData) allocRange(i, n uint) {
+ // Clear the scavenged bits when we alloc the range.
+ m.pallocBits.allocRange(i, n)
+ m.scavenged.clearRange(i, n)
+}
+
+// allocAll sets every bit in the bitmap to 1 and updates
+// the scavenged bits appropriately.
+func (m *pallocData) allocAll() {
+ // Clear the scavenged bits when we alloc the range.
+ m.pallocBits.allocAll()
+ m.scavenged.clearAll()
+}
diff --git a/src/runtime/mpallocbits_test.go b/src/runtime/mpallocbits_test.go
new file mode 100644
index 0000000..5095e24
--- /dev/null
+++ b/src/runtime/mpallocbits_test.go
@@ -0,0 +1,551 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+// Ensures that got and want are the same, and if not, reports
+// detailed diff information.
+func checkPallocBits(t *testing.T, got, want *PallocBits) bool {
+ d := DiffPallocBits(got, want)
+ if len(d) != 0 {
+ t.Errorf("%d range(s) different", len(d))
+ for _, bits := range d {
+ t.Logf("\t@ bit index %d", bits.I)
+ t.Logf("\t| got: %s", StringifyPallocBits(got, bits))
+ t.Logf("\t| want: %s", StringifyPallocBits(want, bits))
+ }
+ return false
+ }
+ return true
+}
+
+// makePallocBits produces an initialized PallocBits by setting
+// the ranges in s to 1 and the rest to zero.
+func makePallocBits(s []BitRange) *PallocBits {
+ b := new(PallocBits)
+ for _, v := range s {
+ b.AllocRange(v.I, v.N)
+ }
+ return b
+}
+
+// Ensures that PallocBits.AllocRange works, which is a fundamental
+// method used for testing and initialization since it's used by
+// makePallocBits.
+func TestPallocBitsAllocRange(t *testing.T) {
+ test := func(t *testing.T, i, n uint, want *PallocBits) {
+ checkPallocBits(t, makePallocBits([]BitRange{{i, n}}), want)
+ }
+ t.Run("OneLow", func(t *testing.T) {
+ want := new(PallocBits)
+ want[0] = 0x1
+ test(t, 0, 1, want)
+ })
+ t.Run("OneHigh", func(t *testing.T) {
+ want := new(PallocBits)
+ want[PallocChunkPages/64-1] = 1 << 63
+ test(t, PallocChunkPages-1, 1, want)
+ })
+ t.Run("Inner", func(t *testing.T) {
+ want := new(PallocBits)
+ want[2] = 0x3e
+ test(t, 129, 5, want)
+ })
+ t.Run("Aligned", func(t *testing.T) {
+ want := new(PallocBits)
+ want[2] = ^uint64(0)
+ want[3] = ^uint64(0)
+ test(t, 128, 128, want)
+ })
+ t.Run("Begin", func(t *testing.T) {
+ want := new(PallocBits)
+ want[0] = ^uint64(0)
+ want[1] = ^uint64(0)
+ want[2] = ^uint64(0)
+ want[3] = ^uint64(0)
+ want[4] = ^uint64(0)
+ want[5] = 0x1
+ test(t, 0, 321, want)
+ })
+ t.Run("End", func(t *testing.T) {
+ want := new(PallocBits)
+ want[PallocChunkPages/64-1] = ^uint64(0)
+ want[PallocChunkPages/64-2] = ^uint64(0)
+ want[PallocChunkPages/64-3] = ^uint64(0)
+ want[PallocChunkPages/64-4] = 1 << 63
+ test(t, PallocChunkPages-(64*3+1), 64*3+1, want)
+ })
+ t.Run("All", func(t *testing.T) {
+ want := new(PallocBits)
+ for i := range want {
+ want[i] = ^uint64(0)
+ }
+ test(t, 0, PallocChunkPages, want)
+ })
+}
+
+// Inverts every bit in the PallocBits.
+func invertPallocBits(b *PallocBits) {
+ for i := range b {
+ b[i] = ^b[i]
+ }
+}
+
+// Ensures two packed summaries are identical, and reports a detailed description
+// of the difference if they're not.
+func checkPallocSum(t testing.TB, got, want PallocSum) {
+ if got.Start() != want.Start() {
+ t.Errorf("inconsistent start: got %d, want %d", got.Start(), want.Start())
+ }
+ if got.Max() != want.Max() {
+ t.Errorf("inconsistent max: got %d, want %d", got.Max(), want.Max())
+ }
+ if got.End() != want.End() {
+ t.Errorf("inconsistent end: got %d, want %d", got.End(), want.End())
+ }
+}
+
+func TestMallocBitsPopcntRange(t *testing.T) {
+ type test struct {
+ i, n uint // bit range to popcnt over.
+ want uint // expected popcnt result on that range.
+ }
+ tests := map[string]struct {
+ init []BitRange // bit ranges to set to 1 in the bitmap.
+ tests []test // a set of popcnt tests to run over the bitmap.
+ }{
+ "None": {
+ tests: []test{
+ {0, 1, 0},
+ {5, 3, 0},
+ {2, 11, 0},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, 0},
+ {0, PallocChunkPages, 0},
+ },
+ },
+ "All": {
+ init: []BitRange{{0, PallocChunkPages}},
+ tests: []test{
+ {0, 1, 1},
+ {5, 3, 3},
+ {2, 11, 11},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, PallocChunkPages / 2},
+ {0, PallocChunkPages, PallocChunkPages},
+ },
+ },
+ "Half": {
+ init: []BitRange{{PallocChunkPages / 2, PallocChunkPages / 2}},
+ tests: []test{
+ {0, 1, 0},
+ {5, 3, 0},
+ {2, 11, 0},
+ {PallocChunkPages/2 - 1, 1, 0},
+ {PallocChunkPages / 2, 1, 1},
+ {PallocChunkPages/2 + 10, 1, 1},
+ {PallocChunkPages/2 - 1, 2, 1},
+ {PallocChunkPages / 4, PallocChunkPages / 4, 0},
+ {PallocChunkPages / 4, PallocChunkPages/4 + 1, 1},
+ {PallocChunkPages/4 + 1, PallocChunkPages / 2, PallocChunkPages/4 + 1},
+ {0, PallocChunkPages, PallocChunkPages / 2},
+ },
+ },
+ "OddBound": {
+ init: []BitRange{{0, 111}},
+ tests: []test{
+ {0, 1, 1},
+ {5, 3, 3},
+ {2, 11, 11},
+ {110, 2, 1},
+ {99, 50, 12},
+ {110, 1, 1},
+ {111, 1, 0},
+ {99, 1, 1},
+ {120, 1, 0},
+ {PallocChunkPages / 2, PallocChunkPages / 2, 0},
+ {0, PallocChunkPages, 111},
+ },
+ },
+ "Scattered": {
+ init: []BitRange{
+ {1, 3}, {5, 1}, {7, 1}, {10, 2}, {13, 1}, {15, 4},
+ {21, 1}, {23, 1}, {26, 2}, {30, 5}, {36, 2}, {40, 3},
+ {44, 6}, {51, 1}, {53, 2}, {58, 3}, {63, 1}, {67, 2},
+ {71, 10}, {84, 1}, {89, 7}, {99, 2}, {103, 1}, {107, 2},
+ {111, 1}, {113, 1}, {115, 1}, {118, 1}, {120, 2}, {125, 5},
+ },
+ tests: []test{
+ {0, 11, 6},
+ {0, 64, 39},
+ {13, 64, 40},
+ {64, 64, 34},
+ {0, 128, 73},
+ {1, 128, 74},
+ {0, PallocChunkPages, 75},
+ },
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.init)
+ for _, h := range v.tests {
+ if got := b.PopcntRange(h.i, h.n); got != h.want {
+ t.Errorf("bad popcnt (i=%d, n=%d): got %d, want %d", h.i, h.n, got, h.want)
+ }
+ }
+ })
+ }
+}
+
+// Ensures computing bit summaries works as expected by generating random
+// bitmaps and checking against a reference implementation.
+func TestPallocBitsSummarizeRandom(t *testing.T) {
+ b := new(PallocBits)
+ for i := 0; i < 1000; i++ {
+ // Randomize bitmap.
+ for i := range b {
+ b[i] = rand.Uint64()
+ }
+ // Check summary against reference implementation.
+ checkPallocSum(t, b.Summarize(), SummarizeSlow(b))
+ }
+}
+
+// Ensures computing bit summaries works as expected.
+func TestPallocBitsSummarize(t *testing.T) {
+ var emptySum = PackPallocSum(PallocChunkPages, PallocChunkPages, PallocChunkPages)
+ type test struct {
+ free []BitRange // Ranges of free (zero) bits.
+ hits []PallocSum
+ }
+ tests := make(map[string]test)
+ tests["NoneFree"] = test{
+ free: []BitRange{},
+ hits: []PallocSum{
+ PackPallocSum(0, 0, 0),
+ },
+ }
+ tests["OnlyStart"] = test{
+ free: []BitRange{{0, 10}},
+ hits: []PallocSum{
+ PackPallocSum(10, 10, 0),
+ },
+ }
+ tests["OnlyEnd"] = test{
+ free: []BitRange{{PallocChunkPages - 40, 40}},
+ hits: []PallocSum{
+ PackPallocSum(0, 40, 40),
+ },
+ }
+ tests["StartAndEnd"] = test{
+ free: []BitRange{{0, 11}, {PallocChunkPages - 23, 23}},
+ hits: []PallocSum{
+ PackPallocSum(11, 23, 23),
+ },
+ }
+ tests["StartMaxEnd"] = test{
+ free: []BitRange{{0, 4}, {50, 100}, {PallocChunkPages - 4, 4}},
+ hits: []PallocSum{
+ PackPallocSum(4, 100, 4),
+ },
+ }
+ tests["OnlyMax"] = test{
+ free: []BitRange{{1, 20}, {35, 241}, {PallocChunkPages - 50, 30}},
+ hits: []PallocSum{
+ PackPallocSum(0, 241, 0),
+ },
+ }
+ tests["MultiMax"] = test{
+ free: []BitRange{{35, 2}, {40, 5}, {100, 5}},
+ hits: []PallocSum{
+ PackPallocSum(0, 5, 0),
+ },
+ }
+ tests["One"] = test{
+ free: []BitRange{{2, 1}},
+ hits: []PallocSum{
+ PackPallocSum(0, 1, 0),
+ },
+ }
+ tests["AllFree"] = test{
+ free: []BitRange{{0, PallocChunkPages}},
+ hits: []PallocSum{
+ emptySum,
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.free)
+ // In the PallocBits we create, 1s represent free spots, but in the actual
+ // PallocBits a 1 means not free, so invert.
+ invertPallocBits(b)
+ for _, h := range v.hits {
+ checkPallocSum(t, b.Summarize(), h)
+ }
+ })
+ }
+}
+
+// Benchmarks how quickly we can summarize a PallocBits.
+func BenchmarkPallocBitsSummarize(b *testing.B) {
+ patterns := []uint64{
+ 0,
+ ^uint64(0),
+ 0xaa,
+ 0xaaaaaaaaaaaaaaaa,
+ 0x80000000aaaaaaaa,
+ 0xaaaaaaaa00000001,
+ 0xbbbbbbbbbbbbbbbb,
+ 0x80000000bbbbbbbb,
+ 0xbbbbbbbb00000001,
+ 0xcccccccccccccccc,
+ 0x4444444444444444,
+ 0x4040404040404040,
+ 0x4000400040004000,
+ 0x1000404044ccaaff,
+ }
+ for _, p := range patterns {
+ buf := new(PallocBits)
+ for i := 0; i < len(buf); i++ {
+ buf[i] = p
+ }
+ b.Run(fmt.Sprintf("Unpacked%02X", p), func(b *testing.B) {
+ checkPallocSum(b, buf.Summarize(), SummarizeSlow(buf))
+ for i := 0; i < b.N; i++ {
+ buf.Summarize()
+ }
+ })
+ }
+}
+
+// Ensures page allocation works.
+func TestPallocBitsAlloc(t *testing.T) {
+ tests := map[string]struct {
+ before []BitRange
+ after []BitRange
+ npages uintptr
+ hits []uint
+ }{
+ "AllFree1": {
+ npages: 1,
+ hits: []uint{0, 1, 2, 3, 4, 5},
+ after: []BitRange{{0, 6}},
+ },
+ "AllFree2": {
+ npages: 2,
+ hits: []uint{0, 2, 4, 6, 8, 10},
+ after: []BitRange{{0, 12}},
+ },
+ "AllFree5": {
+ npages: 5,
+ hits: []uint{0, 5, 10, 15, 20},
+ after: []BitRange{{0, 25}},
+ },
+ "AllFree64": {
+ npages: 64,
+ hits: []uint{0, 64, 128},
+ after: []BitRange{{0, 192}},
+ },
+ "AllFree65": {
+ npages: 65,
+ hits: []uint{0, 65, 130},
+ after: []BitRange{{0, 195}},
+ },
+ "SomeFree64": {
+ before: []BitRange{{0, 32}, {64, 32}, {100, PallocChunkPages - 100}},
+ npages: 64,
+ hits: []uint{^uint(0)},
+ after: []BitRange{{0, 32}, {64, 32}, {100, PallocChunkPages - 100}},
+ },
+ "NoneFree1": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 1,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree2": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 2,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree5": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 5,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "NoneFree65": {
+ before: []BitRange{{0, PallocChunkPages}},
+ npages: 65,
+ hits: []uint{^uint(0), ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit1": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 - 2, PallocChunkPages/2 + 2}},
+ npages: 1,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit2": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 - 1, PallocChunkPages/2 + 1}},
+ npages: 2,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit5": {
+ before: []BitRange{{0, PallocChunkPages/2 - 3}, {PallocChunkPages/2 + 2, PallocChunkPages/2 - 2}},
+ npages: 5,
+ hits: []uint{PallocChunkPages/2 - 3, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "ExactFit65": {
+ before: []BitRange{{0, PallocChunkPages/2 - 31}, {PallocChunkPages/2 + 34, PallocChunkPages/2 - 34}},
+ npages: 65,
+ hits: []uint{PallocChunkPages/2 - 31, ^uint(0)},
+ after: []BitRange{{0, PallocChunkPages}},
+ },
+ "SomeFree161": {
+ before: []BitRange{{0, 185}, {331, 1}},
+ npages: 161,
+ hits: []uint{332},
+ after: []BitRange{{0, 185}, {331, 162}},
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.before)
+ for iter, i := range v.hits {
+ a, _ := b.Find(v.npages, 0)
+ if i != a {
+ t.Fatalf("find #%d picked wrong index: want %d, got %d", iter+1, i, a)
+ }
+ if i != ^uint(0) {
+ b.AllocRange(a, uint(v.npages))
+ }
+ }
+ want := makePallocBits(v.after)
+ checkPallocBits(t, b, want)
+ })
+ }
+}
+
+// Ensures page freeing works.
+func TestPallocBitsFree(t *testing.T) {
+ tests := map[string]struct {
+ beforeInv []BitRange
+ afterInv []BitRange
+ frees []uint
+ npages uintptr
+ }{
+ "SomeFree": {
+ npages: 1,
+ beforeInv: []BitRange{{0, 32}, {64, 32}, {100, 1}},
+ frees: []uint{32},
+ afterInv: []BitRange{{0, 33}, {64, 32}, {100, 1}},
+ },
+ "NoneFree1": {
+ npages: 1,
+ frees: []uint{0, 1, 2, 3, 4, 5},
+ afterInv: []BitRange{{0, 6}},
+ },
+ "NoneFree2": {
+ npages: 2,
+ frees: []uint{0, 2, 4, 6, 8, 10},
+ afterInv: []BitRange{{0, 12}},
+ },
+ "NoneFree5": {
+ npages: 5,
+ frees: []uint{0, 5, 10, 15, 20},
+ afterInv: []BitRange{{0, 25}},
+ },
+ "NoneFree64": {
+ npages: 64,
+ frees: []uint{0, 64, 128},
+ afterInv: []BitRange{{0, 192}},
+ },
+ "NoneFree65": {
+ npages: 65,
+ frees: []uint{0, 65, 130},
+ afterInv: []BitRange{{0, 195}},
+ },
+ }
+ for name, v := range tests {
+ v := v
+ t.Run(name, func(t *testing.T) {
+ b := makePallocBits(v.beforeInv)
+ invertPallocBits(b)
+ for _, i := range v.frees {
+ b.Free(i, uint(v.npages))
+ }
+ want := makePallocBits(v.afterInv)
+ invertPallocBits(want)
+ checkPallocBits(t, b, want)
+ })
+ }
+}
+
+func TestFindBitRange64(t *testing.T) {
+ check := func(x uint64, n uint, result uint) {
+ i := FindBitRange64(x, n)
+ if result == ^uint(0) && i < 64 {
+ t.Errorf("case (%016x, %d): got %d, want failure", x, n, i)
+ } else if result != ^uint(0) && i != result {
+ t.Errorf("case (%016x, %d): got %d, want %d", x, n, i, result)
+ }
+ }
+ for i := uint(1); i <= 64; i++ {
+ check(^uint64(0), i, 0)
+ }
+ for i := uint(1); i <= 64; i++ {
+ check(0, i, ^uint(0))
+ }
+ check(0x8000000000000000, 1, 63)
+ check(0xc000010001010000, 2, 62)
+ check(0xc000010001030000, 2, 16)
+ check(0xe000030001030000, 3, 61)
+ check(0xe000030001070000, 3, 16)
+ check(0xffff03ff01070000, 16, 48)
+ check(0xffff03ff0107ffff, 16, 0)
+ check(0x0fff03ff01079fff, 16, ^uint(0))
+}
+
+func BenchmarkFindBitRange64(b *testing.B) {
+ patterns := []uint64{
+ 0,
+ ^uint64(0),
+ 0xaa,
+ 0xaaaaaaaaaaaaaaaa,
+ 0x80000000aaaaaaaa,
+ 0xaaaaaaaa00000001,
+ 0xbbbbbbbbbbbbbbbb,
+ 0x80000000bbbbbbbb,
+ 0xbbbbbbbb00000001,
+ 0xcccccccccccccccc,
+ 0x4444444444444444,
+ 0x4040404040404040,
+ 0x4000400040004000,
+ }
+ sizes := []uint{
+ 2, 8, 32,
+ }
+ for _, pattern := range patterns {
+ for _, size := range sizes {
+ b.Run(fmt.Sprintf("Pattern%02XSize%d", pattern, size), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ FindBitRange64(pattern, size)
+ }
+ })
+ }
+ }
+}
diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
new file mode 100644
index 0000000..308ebae
--- /dev/null
+++ b/src/runtime/mprof.go
@@ -0,0 +1,1278 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Malloc profiling.
+// Patterned after tcmalloc's algorithms; shorter code.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// NOTE(rsc): Everything here could use cas if contention became an issue.
+var (
+ // profInsertLock protects changes to the start of all *bucket linked lists
+ profInsertLock mutex
+ // profBlockLock protects the contents of every blockRecord struct
+ profBlockLock mutex
+ // profMemActiveLock protects the active field of every memRecord struct
+ profMemActiveLock mutex
+ // profMemFutureLock is a set of locks that protect the respective elements
+ // of the future array of every memRecord struct
+ profMemFutureLock [len(memRecord{}.future)]mutex
+)
+
+// All memory allocations are local and do not escape outside of the profiler.
+// The profiler is forbidden from referring to garbage-collected memory.
+
+const (
+ // profile types
+ memProfile bucketType = 1 + iota
+ blockProfile
+ mutexProfile
+
+ // size of bucket hash table
+ buckHashSize = 179999
+
+ // max depth of stack to record in bucket
+ maxStack = 32
+)
+
+type bucketType int
+
+// A bucket holds per-call-stack profiling information.
+// The representation is a bit sleazy, inherited from C.
+// This struct defines the bucket header. It is followed in
+// memory by the stack words and then the actual record
+// data, either a memRecord or a blockRecord.
+//
+// Per-call-stack profiling information.
+// Lookup by hashing call stack into a linked-list hash table.
+//
+// None of the fields in this bucket header are modified after
+// creation, including its next and allnext links.
+//
+// No heap pointers.
+type bucket struct {
+ _ sys.NotInHeap
+ next *bucket
+ allnext *bucket
+ typ bucketType // memBucket or blockBucket (includes mutexProfile)
+ hash uintptr
+ size uintptr
+ nstk uintptr
+}
+
+// A memRecord is the bucket data for a bucket of type memProfile,
+// part of the memory profile.
+type memRecord struct {
+ // The following complex 3-stage scheme of stats accumulation
+ // is required to obtain a consistent picture of mallocs and frees
+ // for some point in time.
+ // The problem is that mallocs come in real time, while frees
+ // come only after a GC during concurrent sweeping. So if we counted
+ // them naively, we would get a skew toward mallocs.
+ //
+ // Hence, we delay information to get consistent snapshots as
+ // of mark termination. Allocations count toward the next mark
+ // termination's snapshot, while sweep frees count toward the
+ // previous mark termination's snapshot:
+ //
+ // MT MT MT MT
+ // .·| .·| .·| .·|
+ // .·˙ | .·˙ | .·˙ | .·˙ |
+ // .·˙ | .·˙ | .·˙ | .·˙ |
+ // .·˙ |.·˙ |.·˙ |.·˙ |
+ //
+ // alloc → ▲ ← free
+ // ┠┅┅┅┅┅┅┅┅┅┅┅P
+ // C+2 → C+1 → C
+ //
+ // alloc → ▲ ← free
+ // ┠┅┅┅┅┅┅┅┅┅┅┅P
+ // C+2 → C+1 → C
+ //
+ // Since we can't publish a consistent snapshot until all of
+ // the sweep frees are accounted for, we wait until the next
+ // mark termination ("MT" above) to publish the previous mark
+ // termination's snapshot ("P" above). To do this, allocation
+ // and free events are accounted to *future* heap profile
+ // cycles ("C+n" above) and we only publish a cycle once all
+ // of the events from that cycle are known to be done. Specifically:
+ //
+ // Mallocs are accounted to cycle C+2.
+ // Explicit frees are accounted to cycle C+2.
+ // GC frees (done during sweeping) are accounted to cycle C+1.
+ //
+ // After mark termination, we increment the global heap
+ // profile cycle counter and accumulate the stats from cycle C
+ // into the active profile.
+
+ // active is the currently published profile. A profiling
+ // cycle can be accumulated into active once it's complete.
+ active memRecordCycle
+
+ // future records the profile events we're counting for cycles
+ // that have not yet been published. This is a ring buffer
+ // indexed by the global heap profile cycle C and stores
+ // cycles C, C+1, and C+2. Unlike active, these counts are
+ // only for a single cycle; they are not cumulative across
+ // cycles.
+ //
+ // We store cycle C here because there's a window between when
+ // C becomes the active cycle and when we've flushed it to
+ // active.
+ future [3]memRecordCycle
+}
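The C / C+1 / C+2 bookkeeping described above boils down to three modular indices into the future array. A small sketch of that arithmetic, mirroring the index computations in mProf_Malloc, mProf_Free, and mProf_Flush further down (the helper names and the written-out ring length are illustrative only):

// futureLen matches len(memRecord{}.future).
const futureLen = 3

// Given the current global heap profile cycle c:
func allocCycleIndex(c uint32) uint32 { return (c + 2) % futureLen } // mallocs and explicit frees land here
func sweepCycleIndex(c uint32) uint32 { return (c + 1) % futureLen } // frees found during sweep land here
func flushCycleIndex(c uint32) uint32 { return c % futureLen }       // flushed into the active profile by mProf_Flush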
+
+// memRecordCycle is the set of allocation and free counts (events and bytes)
+// for a single heap profile cycle.
+type memRecordCycle struct {
+ allocs, frees uintptr
+ alloc_bytes, free_bytes uintptr
+}
+
+// add accumulates b into a. It does not zero b.
+func (a *memRecordCycle) add(b *memRecordCycle) {
+ a.allocs += b.allocs
+ a.frees += b.frees
+ a.alloc_bytes += b.alloc_bytes
+ a.free_bytes += b.free_bytes
+}
+
+// A blockRecord is the bucket data for a bucket of type blockProfile,
+// which is used in blocking and mutex profiles.
+type blockRecord struct {
+ count float64
+ cycles int64
+}
+
+var (
+ mbuckets atomic.UnsafePointer // *bucket, memory profile buckets
+ bbuckets atomic.UnsafePointer // *bucket, blocking profile buckets
+ xbuckets atomic.UnsafePointer // *bucket, mutex profile buckets
+ buckhash atomic.UnsafePointer // *buckhashArray
+
+ mProfCycle mProfCycleHolder
+)
+
+type buckhashArray [buckHashSize]atomic.UnsafePointer // *bucket
+
+const mProfCycleWrap = uint32(len(memRecord{}.future)) * (2 << 24)
+
+// mProfCycleHolder holds the global heap profile cycle number (wrapped at
+// mProfCycleWrap, stored starting at bit 1), and a flag (stored at bit 0) to
+// indicate whether future[cycle] in all buckets has been queued to flush into
+// the active profile.
+type mProfCycleHolder struct {
+ value atomic.Uint32
+}
+
+// read returns the current cycle count.
+func (c *mProfCycleHolder) read() (cycle uint32) {
+ v := c.value.Load()
+ cycle = v >> 1
+ return cycle
+}
+
+// setFlushed sets the flushed flag. It returns the current cycle count and the
+// previous value of the flushed flag.
+func (c *mProfCycleHolder) setFlushed() (cycle uint32, alreadyFlushed bool) {
+ for {
+ prev := c.value.Load()
+ cycle = prev >> 1
+ alreadyFlushed = (prev & 0x1) != 0
+ next := prev | 0x1
+ if c.value.CompareAndSwap(prev, next) {
+ return cycle, alreadyFlushed
+ }
+ }
+}
+
+// increment increases the cycle count by one, wrapping the value at
+// mProfCycleWrap. It clears the flushed flag.
+func (c *mProfCycleHolder) increment() {
+ // We explicitly wrap mProfCycle rather than depending on
+ // uint wraparound because the memRecord.future ring does not
+ // itself wrap at a power of two.
+ for {
+ prev := c.value.Load()
+ cycle := prev >> 1
+ cycle = (cycle + 1) % mProfCycleWrap
+ next := cycle << 1
+ if c.value.CompareAndSwap(prev, next) {
+ break
+ }
+ }
+}
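As the holder's doc comment says, the cycle count lives in bits 1 and up and the flushed flag in bit 0. A tiny sketch of the encoding that read, setFlushed, and increment operate on (helper names are illustrative only):

// packCycle and unpackCycle show the encoding: value = cycle<<1 | flushedBit.
func packCycle(cycle uint32, flushed bool) uint32 {
	v := cycle << 1
	if flushed {
		v |= 1
	}
	return v
}

func unpackCycle(v uint32) (cycle uint32, flushed bool) {
	return v >> 1, v&1 != 0
}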
+
+// newBucket allocates a bucket with the given type and number of stack entries.
+func newBucket(typ bucketType, nstk int) *bucket {
+ size := unsafe.Sizeof(bucket{}) + uintptr(nstk)*unsafe.Sizeof(uintptr(0))
+ switch typ {
+ default:
+ throw("invalid profile bucket type")
+ case memProfile:
+ size += unsafe.Sizeof(memRecord{})
+ case blockProfile, mutexProfile:
+ size += unsafe.Sizeof(blockRecord{})
+ }
+
+ b := (*bucket)(persistentalloc(size, 0, &memstats.buckhash_sys))
+ b.typ = typ
+ b.nstk = uintptr(nstk)
+ return b
+}
+
+// stk returns the slice in b holding the stack.
+func (b *bucket) stk() []uintptr {
+ stk := (*[maxStack]uintptr)(add(unsafe.Pointer(b), unsafe.Sizeof(*b)))
+ return stk[:b.nstk:b.nstk]
+}
+
+// mp returns the memRecord associated with the memProfile bucket b.
+func (b *bucket) mp() *memRecord {
+ if b.typ != memProfile {
+ throw("bad use of bucket.mp")
+ }
+ data := add(unsafe.Pointer(b), unsafe.Sizeof(*b)+b.nstk*unsafe.Sizeof(uintptr(0)))
+ return (*memRecord)(data)
+}
+
+// bp returns the blockRecord associated with the blockProfile bucket b.
+func (b *bucket) bp() *blockRecord {
+ if b.typ != blockProfile && b.typ != mutexProfile {
+ throw("bad use of bucket.bp")
+ }
+ data := add(unsafe.Pointer(b), unsafe.Sizeof(*b)+b.nstk*unsafe.Sizeof(uintptr(0)))
+ return (*blockRecord)(data)
+}
+
+// Return the bucket for stk[0:nstk], allocating a new bucket if needed.
+func stkbucket(typ bucketType, size uintptr, stk []uintptr, alloc bool) *bucket {
+ bh := (*buckhashArray)(buckhash.Load())
+ if bh == nil {
+ lock(&profInsertLock)
+ // check again under the lock
+ bh = (*buckhashArray)(buckhash.Load())
+ if bh == nil {
+ bh = (*buckhashArray)(sysAlloc(unsafe.Sizeof(buckhashArray{}), &memstats.buckhash_sys))
+ if bh == nil {
+ throw("runtime: cannot allocate memory")
+ }
+ buckhash.StoreNoWB(unsafe.Pointer(bh))
+ }
+ unlock(&profInsertLock)
+ }
+
+ // Hash stack.
+ var h uintptr
+ for _, pc := range stk {
+ h += pc
+ h += h << 10
+ h ^= h >> 6
+ }
+ // hash in size
+ h += size
+ h += h << 10
+ h ^= h >> 6
+ // finalize
+ h += h << 3
+ h ^= h >> 11
+
+ i := int(h % buckHashSize)
+ // first check optimistically, without the lock
+ for b := (*bucket)(bh[i].Load()); b != nil; b = b.next {
+ if b.typ == typ && b.hash == h && b.size == size && eqslice(b.stk(), stk) {
+ return b
+ }
+ }
+
+ if !alloc {
+ return nil
+ }
+
+ lock(&profInsertLock)
+ // check again under the insertion lock
+ for b := (*bucket)(bh[i].Load()); b != nil; b = b.next {
+ if b.typ == typ && b.hash == h && b.size == size && eqslice(b.stk(), stk) {
+ unlock(&profInsertLock)
+ return b
+ }
+ }
+
+ // Create new bucket.
+ b := newBucket(typ, len(stk))
+ copy(b.stk(), stk)
+ b.hash = h
+ b.size = size
+
+ var allnext *atomic.UnsafePointer
+ if typ == memProfile {
+ allnext = &mbuckets
+ } else if typ == mutexProfile {
+ allnext = &xbuckets
+ } else {
+ allnext = &bbuckets
+ }
+
+ b.next = (*bucket)(bh[i].Load())
+ b.allnext = (*bucket)(allnext.Load())
+
+ bh[i].StoreNoWB(unsafe.Pointer(b))
+ allnext.StoreNoWB(unsafe.Pointer(b))
+
+ unlock(&profInsertLock)
+ return b
+}
+
+func eqslice(x, y []uintptr) bool {
+ if len(x) != len(y) {
+ return false
+ }
+ for i, xi := range x {
+ if xi != y[i] {
+ return false
+ }
+ }
+ return true
+}
+
+// mProf_NextCycle publishes the next heap profile cycle and creates a
+// fresh heap profile cycle. This operation is fast and can be done
+// during STW. The caller must call mProf_Flush before calling
+// mProf_NextCycle again.
+//
+// This is called by mark termination during STW so allocations and
+// frees after the world is started again count towards a new heap
+// profiling cycle.
+func mProf_NextCycle() {
+ mProfCycle.increment()
+}
+
+// mProf_Flush flushes the events from the current heap profiling
+// cycle into the active profile. After this it is safe to start a new
+// heap profiling cycle with mProf_NextCycle.
+//
+// This is called by GC after mark termination starts the world. In
+// contrast with mProf_NextCycle, this is somewhat expensive, but safe
+// to do concurrently.
+func mProf_Flush() {
+ cycle, alreadyFlushed := mProfCycle.setFlushed()
+ if alreadyFlushed {
+ return
+ }
+
+ index := cycle % uint32(len(memRecord{}.future))
+ lock(&profMemActiveLock)
+ lock(&profMemFutureLock[index])
+ mProf_FlushLocked(index)
+ unlock(&profMemFutureLock[index])
+ unlock(&profMemActiveLock)
+}
+
+// mProf_FlushLocked flushes the events from the heap profiling cycle at index
+// into the active profile. The caller must hold the lock for the active profile
+// (profMemActiveLock) and for the profiling cycle at index
+// (profMemFutureLock[index]).
+func mProf_FlushLocked(index uint32) {
+ assertLockHeld(&profMemActiveLock)
+ assertLockHeld(&profMemFutureLock[index])
+ head := (*bucket)(mbuckets.Load())
+ for b := head; b != nil; b = b.allnext {
+ mp := b.mp()
+
+ // Flush cycle C into the published profile and clear
+ // it for reuse.
+ mpc := &mp.future[index]
+ mp.active.add(mpc)
+ *mpc = memRecordCycle{}
+ }
+}
+
+// mProf_PostSweep records that all sweep frees for this GC cycle have
+// completed. This has the effect of publishing the heap profile
+// snapshot as of the last mark termination without advancing the heap
+// profile cycle.
+func mProf_PostSweep() {
+ // Flush cycle C+1 to the active profile so everything as of
+ // the last mark termination becomes visible. *Don't* advance
+ // the cycle, since we're still accumulating allocs in cycle
+ // C+2, which have to become C+1 in the next mark termination
+ // and so on.
+ cycle := mProfCycle.read() + 1
+
+ index := cycle % uint32(len(memRecord{}.future))
+ lock(&profMemActiveLock)
+ lock(&profMemFutureLock[index])
+ mProf_FlushLocked(index)
+ unlock(&profMemFutureLock[index])
+ unlock(&profMemActiveLock)
+}
+
+// Called by malloc to record a profiled block.
+func mProf_Malloc(p unsafe.Pointer, size uintptr) {
+ var stk [maxStack]uintptr
+ nstk := callers(4, stk[:])
+
+ index := (mProfCycle.read() + 2) % uint32(len(memRecord{}.future))
+
+ b := stkbucket(memProfile, size, stk[:nstk], true)
+ mp := b.mp()
+ mpc := &mp.future[index]
+
+ lock(&profMemFutureLock[index])
+ mpc.allocs++
+ mpc.alloc_bytes += size
+ unlock(&profMemFutureLock[index])
+
+ // Setprofilebucket locks a bunch of other mutexes, so we call it outside of
+ // the profiler locks. This reduces potential contention and chances of
+ // deadlocks. Since the object must be alive during the call to
+ // mProf_Malloc, it's fine to do this non-atomically.
+ systemstack(func() {
+ setprofilebucket(p, b)
+ })
+}
+
+// Called when freeing a profiled block.
+func mProf_Free(b *bucket, size uintptr) {
+ index := (mProfCycle.read() + 1) % uint32(len(memRecord{}.future))
+
+ mp := b.mp()
+ mpc := &mp.future[index]
+
+ lock(&profMemFutureLock[index])
+ mpc.frees++
+ mpc.free_bytes += size
+ unlock(&profMemFutureLock[index])
+}
+
+var blockprofilerate uint64 // in CPU ticks
+
+// SetBlockProfileRate controls the fraction of goroutine blocking events
+// that are reported in the blocking profile. The profiler aims to sample
+// an average of one blocking event per rate nanoseconds spent blocked.
+//
+// To include every blocking event in the profile, pass rate = 1.
+// To turn off profiling entirely, pass rate <= 0.
+func SetBlockProfileRate(rate int) {
+ var r int64
+ if rate <= 0 {
+ r = 0 // disable profiling
+ } else if rate == 1 {
+ r = 1 // profile everything
+ } else {
+ // convert ns to cycles, use float64 to prevent overflow during multiplication
+ r = int64(float64(rate) * float64(tickspersecond()) / (1000 * 1000 * 1000))
+ if r == 0 {
+ r = 1
+ }
+ }
+
+ atomic.Store64(&blockprofilerate, uint64(r))
+}
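To make the conversion above concrete, here is the same arithmetic with a made-up tick rate; the 2.5 GHz constant stands in for whatever tickspersecond reports on the host, and the helper name is illustrative:

const assumedTicksPerSecond = 2_500_000_000

// blockRateToTicks converts a rate in nanoseconds of blocked time per sample
// into CPU ticks, following the same branches as SetBlockProfileRate above.
func blockRateToTicks(rateNS int) int64 {
	switch {
	case rateNS <= 0:
		return 0 // disable profiling
	case rateNS == 1:
		return 1 // profile everything
	}
	r := int64(float64(rateNS) * float64(assumedTicksPerSecond) / 1e9)
	if r == 0 {
		r = 1
	}
	return r
}

// blockRateToTicks(1_000_000) == 2_500_000: sample roughly one event per
// millisecond spent blocked.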
+
+func blockevent(cycles int64, skip int) {
+ if cycles <= 0 {
+ cycles = 1
+ }
+
+ rate := int64(atomic.Load64(&blockprofilerate))
+ if blocksampled(cycles, rate) {
+ saveblockevent(cycles, rate, skip+1, blockProfile)
+ }
+}
+
+// blocksampled returns true for all events where cycles >= rate. Shorter
+// events have a cycles/rate random chance of returning true.
+func blocksampled(cycles, rate int64) bool {
+ if rate <= 0 || (rate > cycles && int64(fastrand())%rate > cycles) {
+ return false
+ }
+ return true
+}
+
+func saveblockevent(cycles, rate int64, skip int, which bucketType) {
+ gp := getg()
+ var nstk int
+ var stk [maxStack]uintptr
+ if gp.m.curg == nil || gp.m.curg == gp {
+ nstk = callers(skip, stk[:])
+ } else {
+ nstk = gcallers(gp.m.curg, skip, stk[:])
+ }
+ b := stkbucket(which, 0, stk[:nstk], true)
+ bp := b.bp()
+
+ lock(&profBlockLock)
+ // We want to up-scale the count and cycles according to the
+ // probability that the event was sampled. For block profile events,
+ // the sample probability is 1 if cycles >= rate, and cycles / rate
+ // otherwise. For mutex profile events, the sample probability is 1 / rate.
+ // We scale the events by 1 / (probability the event was sampled).
+ if which == blockProfile && cycles < rate {
+ // Remove sampling bias, see discussion on http://golang.org/cl/299991.
+ bp.count += float64(rate) / float64(cycles)
+ bp.cycles += rate
+ } else if which == mutexProfile {
+ bp.count += float64(rate)
+ bp.cycles += rate * cycles
+ } else {
+ bp.count++
+ bp.cycles += cycles
+ }
+ unlock(&profBlockLock)
+}
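A quick worked example of the bias correction for the block-profile case above: an event of 20 cycles under a 100-cycle rate is sampled with probability 20/100, so when it is recorded it carries weight 100/20 = 5 and 100 cycles, keeping the profile's expected totals unbiased. Sketched as a helper (the name is illustrative; it covers only the blockProfile branch):

// scaleBlockSample returns the count and cycle increments applied for a
// sampled block-profile event, mirroring the blockProfile branch above.
func scaleBlockSample(cycles, rate int64) (count float64, cyc int64) {
	if cycles < rate {
		return float64(rate) / float64(cycles), rate
	}
	return 1, cycles
}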
+
+var mutexprofilerate uint64 // fraction sampled
+
+// SetMutexProfileFraction controls the fraction of mutex contention events
+// that are reported in the mutex profile. On average 1/rate events are
+// reported. The previous rate is returned.
+//
+// To turn off profiling entirely, pass rate 0.
+// To just read the current rate, pass rate < 0.
+// (For rate > 1, the details of sampling may change.)
+func SetMutexProfileFraction(rate int) int {
+ if rate < 0 {
+ return int(mutexprofilerate)
+ }
+ old := mutexprofilerate
+ atomic.Store64(&mutexprofilerate, uint64(rate))
+ return int(old)
+}
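Typical use from application or test code pairs a call that enables sampling with one that restores the previous setting; a minimal example against the public API:

package main

import "runtime"

func main() {
	// Sample roughly one out of every five mutex contention events for the
	// duration of this function, then restore the previous fraction.
	prev := runtime.SetMutexProfileFraction(5)
	defer runtime.SetMutexProfileFraction(prev)

	// ... run the workload whose mutex contention should be profiled ...
}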
+
+//go:linkname mutexevent sync.event
+func mutexevent(cycles int64, skip int) {
+ if cycles < 0 {
+ cycles = 0
+ }
+ rate := int64(atomic.Load64(&mutexprofilerate))
+ // TODO(pjw): measure impact of always calling fastrand vs using something
+ // like malloc.go:nextSample()
+ if rate > 0 && int64(fastrand())%rate == 0 {
+ saveblockevent(cycles, rate, skip+1, mutexProfile)
+ }
+}
+
+// Go interface to profile data.
+
+// A StackRecord describes a single execution stack.
+type StackRecord struct {
+ Stack0 [32]uintptr // stack trace for this record; ends at first 0 entry
+}
+
+// Stack returns the stack trace associated with the record,
+// a prefix of r.Stack0.
+func (r *StackRecord) Stack() []uintptr {
+ for i, v := range r.Stack0 {
+ if v == 0 {
+ return r.Stack0[0:i]
+ }
+ }
+ return r.Stack0[0:]
+}
+
+// MemProfileRate controls the fraction of memory allocations
+// that are recorded and reported in the memory profile.
+// The profiler aims to sample an average of
+// one allocation per MemProfileRate bytes allocated.
+//
+// To include every allocated block in the profile, set MemProfileRate to 1.
+// To turn off profiling entirely, set MemProfileRate to 0.
+//
+// The tools that process the memory profiles assume that the
+// profile rate is constant across the lifetime of the program
+// and equal to the current value. Programs that change the
+// memory profiling rate should do so just once, as early as
+// possible in the execution of the program (for example,
+// at the beginning of main).
+var MemProfileRate int = 512 * 1024
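Per the note above, programs that change the rate should do so once, early. A minimal example that records every allocation (handy in tests; the default of one sample per 512 KiB is usually what production code wants):

package main

import "runtime"

func init() {
	// Record every allocation in the heap profile. Doing this in an init
	// function keeps the rate constant for the program's whole lifetime,
	// as the documentation above asks.
	runtime.MemProfileRate = 1
}

func main() {
	// ... workload ...
}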
+
+// disableMemoryProfiling is set by the linker if runtime.MemProfile
+// is not used and the link type guarantees nobody else could use it
+// elsewhere.
+var disableMemoryProfiling bool
+
+// A MemProfileRecord describes the live objects allocated
+// by a particular call sequence (stack trace).
+type MemProfileRecord struct {
+ AllocBytes, FreeBytes int64 // number of bytes allocated, freed
+ AllocObjects, FreeObjects int64 // number of objects allocated, freed
+ Stack0 [32]uintptr // stack trace for this record; ends at first 0 entry
+}
+
+// InUseBytes returns the number of bytes in use (AllocBytes - FreeBytes).
+func (r *MemProfileRecord) InUseBytes() int64 { return r.AllocBytes - r.FreeBytes }
+
+// InUseObjects returns the number of objects in use (AllocObjects - FreeObjects).
+func (r *MemProfileRecord) InUseObjects() int64 {
+ return r.AllocObjects - r.FreeObjects
+}
+
+// Stack returns the stack trace associated with the record,
+// a prefix of r.Stack0.
+func (r *MemProfileRecord) Stack() []uintptr {
+ for i, v := range r.Stack0 {
+ if v == 0 {
+ return r.Stack0[0:i]
+ }
+ }
+ return r.Stack0[0:]
+}
+
+// MemProfile returns a profile of memory allocated and freed per allocation
+// site.
+//
+// MemProfile returns n, the number of records in the current memory profile.
+// If len(p) >= n, MemProfile copies the profile into p and returns n, true.
+// If len(p) < n, MemProfile does not change p and returns n, false.
+//
+// If inuseZero is true, the profile includes allocation records
+// where r.AllocBytes > 0 but r.AllocBytes == r.FreeBytes.
+// These are sites where memory was allocated, but it has all
+// been released back to the runtime.
+//
+// The returned profile may be up to two garbage collection cycles old.
+// This is to avoid skewing the profile toward allocations; because
+// allocations happen in real time but frees are delayed until the garbage
+// collector performs sweeping, the profile only accounts for allocations
+// that have had a chance to be freed by the garbage collector.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.memprofile flag instead
+// of calling MemProfile directly.
+func MemProfile(p []MemProfileRecord, inuseZero bool) (n int, ok bool) {
+ cycle := mProfCycle.read()
+ // If we're between mProf_NextCycle and mProf_Flush, take care
+ // of flushing to the active profile so we only have to look
+ // at the active profile below.
+ index := cycle % uint32(len(memRecord{}.future))
+ lock(&profMemActiveLock)
+ lock(&profMemFutureLock[index])
+ mProf_FlushLocked(index)
+ unlock(&profMemFutureLock[index])
+ clear := true
+ head := (*bucket)(mbuckets.Load())
+ for b := head; b != nil; b = b.allnext {
+ mp := b.mp()
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ n++
+ }
+ if mp.active.allocs != 0 || mp.active.frees != 0 {
+ clear = false
+ }
+ }
+ if clear {
+ // Absolutely no data, suggesting that a garbage collection
+ // has not yet happened. In order to allow profiling when
+ // garbage collection is disabled from the beginning of execution,
+ // accumulate all of the cycles, and recount buckets.
+ n = 0
+ for b := head; b != nil; b = b.allnext {
+ mp := b.mp()
+ for c := range mp.future {
+ lock(&profMemFutureLock[c])
+ mp.active.add(&mp.future[c])
+ mp.future[c] = memRecordCycle{}
+ unlock(&profMemFutureLock[c])
+ }
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ n++
+ }
+ }
+ }
+ if n <= len(p) {
+ ok = true
+ idx := 0
+ for b := head; b != nil; b = b.allnext {
+ mp := b.mp()
+ if inuseZero || mp.active.alloc_bytes != mp.active.free_bytes {
+ record(&p[idx], b)
+ idx++
+ }
+ }
+ }
+ unlock(&profMemActiveLock)
+ return
+}
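The n/ok contract means callers normally retry with a larger slice until the snapshot fits; the extra headroom absorbs records created between the two calls. A minimal example using the public API:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var p []runtime.MemProfileRecord
	n, ok := runtime.MemProfile(nil, true)
	for !ok {
		// Allocate with some slack in case new records appear
		// between the size query and the copy.
		p = make([]runtime.MemProfileRecord, n+50)
		n, ok = runtime.MemProfile(p, true)
	}
	p = p[:n]
	fmt.Println("memory profile records:", len(p))
}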
+
+// Write b's data to r.
+func record(r *MemProfileRecord, b *bucket) {
+ mp := b.mp()
+ r.AllocBytes = int64(mp.active.alloc_bytes)
+ r.FreeBytes = int64(mp.active.free_bytes)
+ r.AllocObjects = int64(mp.active.allocs)
+ r.FreeObjects = int64(mp.active.frees)
+ if raceenabled {
+ racewriterangepc(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0), getcallerpc(), abi.FuncPCABIInternal(MemProfile))
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ if asanenabled {
+ asanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ copy(r.Stack0[:], b.stk())
+ for i := int(b.nstk); i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+}
+
+func iterate_memprof(fn func(*bucket, uintptr, *uintptr, uintptr, uintptr, uintptr)) {
+ lock(&profMemActiveLock)
+ head := (*bucket)(mbuckets.Load())
+ for b := head; b != nil; b = b.allnext {
+ mp := b.mp()
+ fn(b, b.nstk, &b.stk()[0], b.size, mp.active.allocs, mp.active.frees)
+ }
+ unlock(&profMemActiveLock)
+}
+
+// BlockProfileRecord describes blocking events originated
+// at a particular call sequence (stack trace).
+type BlockProfileRecord struct {
+ Count int64
+ Cycles int64
+ StackRecord
+}
+
+// BlockProfile returns n, the number of records in the current blocking profile.
+// If len(p) >= n, BlockProfile copies the profile into p and returns n, true.
+// If len(p) < n, BlockProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package or
+// the testing package's -test.blockprofile flag instead
+// of calling BlockProfile directly.
+func BlockProfile(p []BlockProfileRecord) (n int, ok bool) {
+ lock(&profBlockLock)
+ head := (*bucket)(bbuckets.Load())
+ for b := head; b != nil; b = b.allnext {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ for b := head; b != nil; b = b.allnext {
+ bp := b.bp()
+ r := &p[0]
+ r.Count = int64(bp.count)
+ // Prevent callers from having to worry about division by zero errors.
+ // See discussion on http://golang.org/cl/299991.
+ if r.Count == 0 {
+ r.Count = 1
+ }
+ r.Cycles = bp.cycles
+ if raceenabled {
+ racewriterangepc(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0), getcallerpc(), abi.FuncPCABIInternal(BlockProfile))
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ if asanenabled {
+ asanwrite(unsafe.Pointer(&r.Stack0[0]), unsafe.Sizeof(r.Stack0))
+ }
+ i := copy(r.Stack0[:], b.stk())
+ for ; i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+ p = p[1:]
+ }
+ }
+ unlock(&profBlockLock)
+ return
+}
+
+// MutexProfile returns n, the number of records in the current mutex profile.
+// If len(p) >= n, MutexProfile copies the profile into p and returns n, true.
+// Otherwise, MutexProfile does not change p, and returns n, false.
+//
+// Most clients should use the runtime/pprof package
+// instead of calling MutexProfile directly.
+func MutexProfile(p []BlockProfileRecord) (n int, ok bool) {
+ lock(&profBlockLock)
+ head := (*bucket)(xbuckets.Load())
+ for b := head; b != nil; b = b.allnext {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ for b := head; b != nil; b = b.allnext {
+ bp := b.bp()
+ r := &p[0]
+ r.Count = int64(bp.count)
+ r.Cycles = bp.cycles
+ i := copy(r.Stack0[:], b.stk())
+ for ; i < len(r.Stack0); i++ {
+ r.Stack0[i] = 0
+ }
+ p = p[1:]
+ }
+ }
+ unlock(&profBlockLock)
+ return
+}
+
+// ThreadCreateProfile returns n, the number of records in the thread creation profile.
+// If len(p) >= n, ThreadCreateProfile copies the profile into p and returns n, true.
+// If len(p) < n, ThreadCreateProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package instead
+// of calling ThreadCreateProfile directly.
+func ThreadCreateProfile(p []StackRecord) (n int, ok bool) {
+ first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
+ for mp := first; mp != nil; mp = mp.alllink {
+ n++
+ }
+ if n <= len(p) {
+ ok = true
+ i := 0
+ for mp := first; mp != nil; mp = mp.alllink {
+ p[i].Stack0 = mp.createstack
+ i++
+ }
+ }
+ return
+}
+
+//go:linkname runtime_goroutineProfileWithLabels runtime/pprof.runtime_goroutineProfileWithLabels
+func runtime_goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ return goroutineProfileWithLabels(p, labels)
+}
+
+// labels may be nil. If labels is non-nil, it must have the same length as p.
+func goroutineProfileWithLabels(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ if labels != nil && len(labels) != len(p) {
+ labels = nil
+ }
+
+ return goroutineProfileWithLabelsConcurrent(p, labels)
+}
+
+var goroutineProfile = struct {
+ sema uint32
+ active bool
+ offset atomic.Int64
+ records []StackRecord
+ labels []unsafe.Pointer
+}{
+ sema: 1,
+}
+
+// goroutineProfileState indicates the status of a goroutine's stack for the
+// current in-progress goroutine profile. Goroutines' stacks are initially
+// "Absent" from the profile, and end up "Satisfied" by the time the profile is
+// complete. While a goroutine's stack is being captured, its
+// goroutineProfileState will be "InProgress" and it will not be able to run
+// until the capture completes and the state moves to "Satisfied".
+//
+// Some goroutines (the finalizer goroutine, which at various times can be
+// either a "system" or a "user" goroutine, and the goroutine that is
+// coordinating the profile, and any goroutines created during the profile) move
+// directly to the "Satisfied" state.
+type goroutineProfileState uint32
+
+const (
+ goroutineProfileAbsent goroutineProfileState = iota
+ goroutineProfileInProgress
+ goroutineProfileSatisfied
+)
+
+type goroutineProfileStateHolder atomic.Uint32
+
+func (p *goroutineProfileStateHolder) Load() goroutineProfileState {
+ return goroutineProfileState((*atomic.Uint32)(p).Load())
+}
+
+func (p *goroutineProfileStateHolder) Store(value goroutineProfileState) {
+ (*atomic.Uint32)(p).Store(uint32(value))
+}
+
+func (p *goroutineProfileStateHolder) CompareAndSwap(old, new goroutineProfileState) bool {
+ return (*atomic.Uint32)(p).CompareAndSwap(uint32(old), uint32(new))
+}
+
+func goroutineProfileWithLabelsConcurrent(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ semacquire(&goroutineProfile.sema)
+
+ ourg := getg()
+
+ stopTheWorld(stwGoroutineProfile)
+ // Using gcount while the world is stopped should give us a consistent view
+ // of the number of live goroutines, minus the number of goroutines that are
+ // alive and permanently marked as "system". But to make this count agree
+ // with what we'd get from isSystemGoroutine, we need special handling for
+ // goroutines that can vary between user and system to ensure that the count
+ // doesn't change during the collection. So, check the finalizer goroutine
+ // in particular.
+ n = int(gcount())
+ if fingStatus.Load()&fingRunningFinalizer != 0 {
+ n++
+ }
+
+ if n > len(p) {
+ // There's not enough space in p to store the whole profile, so (per the
+ // contract of runtime.GoroutineProfile) we're not allowed to write to p
+ // at all and must return n, false.
+ startTheWorld()
+ semrelease(&goroutineProfile.sema)
+ return n, false
+ }
+
+ // Save current goroutine.
+ sp := getcallersp()
+ pc := getcallerpc()
+ systemstack(func() {
+ saveg(pc, sp, ourg, &p[0])
+ })
+ ourg.goroutineProfiled.Store(goroutineProfileSatisfied)
+ goroutineProfile.offset.Store(1)
+
+ // Prepare for all other goroutines to enter the profile. Aside from ourg,
+ // every goroutine struct in the allgs list has its goroutineProfiled field
+ // cleared. Any goroutine created from this point on (while
+ // goroutineProfile.active is set) will start with its goroutineProfiled
+ // field set to goroutineProfileSatisfied.
+ goroutineProfile.active = true
+ goroutineProfile.records = p
+ goroutineProfile.labels = labels
+ // The finalizer goroutine needs special handling because it can vary over
+ // time between being a user goroutine (eligible for this profile) and a
+ // system goroutine (to be excluded). Pick one before restarting the world.
+ if fing != nil {
+ fing.goroutineProfiled.Store(goroutineProfileSatisfied)
+ if readgstatus(fing) != _Gdead && !isSystemGoroutine(fing, false) {
+ doRecordGoroutineProfile(fing)
+ }
+ }
+ startTheWorld()
+
+ // Visit each goroutine that existed as of the startTheWorld call above.
+ //
+ // New goroutines may not be in this list, but we didn't want to know about
+ // them anyway. If they do appear in this list (via reusing a dead goroutine
+ // struct, or racing to launch between the world restarting and us getting
+ // the list), they will already have their goroutineProfiled field set to
+ // goroutineProfileSatisfied before their state transitions out of _Gdead.
+ //
+ // Any goroutine that the scheduler tries to execute concurrently with this
+ // call will start by adding itself to the profile (before the act of
+ // executing can cause any changes in its stack).
+ forEachGRace(func(gp1 *g) {
+ tryRecordGoroutineProfile(gp1, Gosched)
+ })
+
+ stopTheWorld(stwGoroutineProfileCleanup)
+ endOffset := goroutineProfile.offset.Swap(0)
+ goroutineProfile.active = false
+ goroutineProfile.records = nil
+ goroutineProfile.labels = nil
+ startTheWorld()
+
+ // Restore the invariant that every goroutine struct in allgs has its
+ // goroutineProfiled field cleared.
+ forEachGRace(func(gp1 *g) {
+ gp1.goroutineProfiled.Store(goroutineProfileAbsent)
+ })
+
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&labelSync))
+ }
+
+ if n != int(endOffset) {
+ // It's a big surprise that the number of goroutines changed while we
+ // were collecting the profile. But probably better to return a
+ // truncated profile than to crash the whole process.
+ //
+ // For instance, needm moves a goroutine out of the _Gdead state and so
+ // might be able to change the goroutine count without interacting with
+ // the scheduler. For code like that, the race windows are small and the
+ // combination of features is uncommon, so it's hard to be (and remain)
+ // sure we've caught them all.
+ }
+
+ semrelease(&goroutineProfile.sema)
+ return n, true
+}
+
+// tryRecordGoroutineProfileWB asserts that write barriers are allowed and calls
+// tryRecordGoroutineProfile.
+//
+//go:yeswritebarrierrec
+func tryRecordGoroutineProfileWB(gp1 *g) {
+ if getg().m.p.ptr() == nil {
+ throw("no P available, write barriers are forbidden")
+ }
+ tryRecordGoroutineProfile(gp1, osyield)
+}
+
+// tryRecordGoroutineProfile ensures that gp1 has the appropriate representation
+// in the current goroutine profile: either that it should not be profiled, or
+// that a snapshot of its call stack and labels are now in the profile.
+func tryRecordGoroutineProfile(gp1 *g, yield func()) {
+ if readgstatus(gp1) == _Gdead {
+ // Dead goroutines should not appear in the profile. Goroutines that
+ // start while profile collection is active will get goroutineProfiled
+ // set to goroutineProfileSatisfied before transitioning out of _Gdead,
+ // so here we check _Gdead first.
+ return
+ }
+ if isSystemGoroutine(gp1, true) {
+ // System goroutines should not appear in the profile. (The finalizer
+ // goroutine is marked as "already profiled".)
+ return
+ }
+
+ for {
+ prev := gp1.goroutineProfiled.Load()
+ if prev == goroutineProfileSatisfied {
+ // This goroutine is already in the profile (or is new since the
+ // start of collection, so shouldn't appear in the profile).
+ break
+ }
+ if prev == goroutineProfileInProgress {
+ // Something else is adding gp1 to the goroutine profile right now.
+ // Give that a moment to finish.
+ yield()
+ continue
+ }
+
+ // While we have gp1.goroutineProfiled set to
+ // goroutineProfileInProgress, gp1 may appear _Grunnable but will not
+ // actually be able to run. Disable preemption for ourselves, to make
+ // sure we finish profiling gp1 right away instead of leaving it stuck
+ // in this limbo.
+ mp := acquirem()
+ if gp1.goroutineProfiled.CompareAndSwap(goroutineProfileAbsent, goroutineProfileInProgress) {
+ doRecordGoroutineProfile(gp1)
+ gp1.goroutineProfiled.Store(goroutineProfileSatisfied)
+ }
+ releasem(mp)
+ }
+}
+
+// doRecordGoroutineProfile writes gp1's call stack and labels to an in-progress
+// goroutine profile. Preemption is disabled.
+//
+// This may be called via tryRecordGoroutineProfile in two ways: by the
+// goroutine that is coordinating the goroutine profile (running on its own
+// stack), or from the scheduler in preparation to execute gp1 (running on the
+// system stack).
+func doRecordGoroutineProfile(gp1 *g) {
+ if readgstatus(gp1) == _Grunning {
+ print("doRecordGoroutineProfile gp1=", gp1.goid, "\n")
+ throw("cannot read stack of running goroutine")
+ }
+
+ offset := int(goroutineProfile.offset.Add(1)) - 1
+
+ if offset >= len(goroutineProfile.records) {
+ // Should be impossible, but better to return a truncated profile than
+ // to crash the entire process at this point. Instead, deal with it in
+ // goroutineProfileWithLabelsConcurrent where we have more context.
+ return
+ }
+
+ // saveg calls gentraceback, which may call cgo traceback functions. When
+ // called from the scheduler, this is on the system stack already so
+ // traceback.go:cgoContextPCs will avoid calling back into the scheduler.
+ //
+ // When called from the goroutine coordinating the profile, we still have
+ // set gp1.goroutineProfiled to goroutineProfileInProgress and so are still
+ // preventing it from being truly _Grunnable. So we'll use the system stack
+ // to avoid schedule delays.
+ systemstack(func() { saveg(^uintptr(0), ^uintptr(0), gp1, &goroutineProfile.records[offset]) })
+
+ if goroutineProfile.labels != nil {
+ goroutineProfile.labels[offset] = gp1.labels
+ }
+}
+
+func goroutineProfileWithLabelsSync(p []StackRecord, labels []unsafe.Pointer) (n int, ok bool) {
+ gp := getg()
+
+ isOK := func(gp1 *g) bool {
+ // Checking isSystemGoroutine here makes GoroutineProfile
+ // consistent with both NumGoroutine and Stack.
+ return gp1 != gp && readgstatus(gp1) != _Gdead && !isSystemGoroutine(gp1, false)
+ }
+
+ stopTheWorld(stwGoroutineProfile)
+
+ // World is stopped, no locking required.
+ n = 1
+ forEachGRace(func(gp1 *g) {
+ if isOK(gp1) {
+ n++
+ }
+ })
+
+ if n <= len(p) {
+ ok = true
+ r, lbl := p, labels
+
+ // Save current goroutine.
+ sp := getcallersp()
+ pc := getcallerpc()
+ systemstack(func() {
+ saveg(pc, sp, gp, &r[0])
+ })
+ r = r[1:]
+
+ // If we have a place to put our goroutine labelmap, insert it there.
+ if labels != nil {
+ lbl[0] = gp.labels
+ lbl = lbl[1:]
+ }
+
+ // Save other goroutines.
+ forEachGRace(func(gp1 *g) {
+ if !isOK(gp1) {
+ return
+ }
+
+ if len(r) == 0 {
+ // Should be impossible, but better to return a
+ // truncated profile than to crash the entire process.
+ return
+ }
+ // saveg calls gentraceback, which may call cgo traceback functions.
+ // The world is stopped, so it cannot use cgocall (which will be
+ // blocked at exitsyscall). Do it on the system stack so it won't
+ // call into the scheduler (see traceback.go:cgoContextPCs).
+ systemstack(func() { saveg(^uintptr(0), ^uintptr(0), gp1, &r[0]) })
+ if labels != nil {
+ lbl[0] = gp1.labels
+ lbl = lbl[1:]
+ }
+ r = r[1:]
+ })
+ }
+
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&labelSync))
+ }
+
+ startTheWorld()
+ return n, ok
+}
+
+// GoroutineProfile returns n, the number of records in the active goroutine stack profile.
+// If len(p) >= n, GoroutineProfile copies the profile into p and returns n, true.
+// If len(p) < n, GoroutineProfile does not change p and returns n, false.
+//
+// Most clients should use the runtime/pprof package instead
+// of calling GoroutineProfile directly.
+func GoroutineProfile(p []StackRecord) (n int, ok bool) {
+
+ return goroutineProfileWithLabels(p, nil)
+}
+
+func saveg(pc, sp uintptr, gp *g, r *StackRecord) {
+ var u unwinder
+ u.initAt(pc, sp, 0, gp, unwindSilentErrors)
+ n := tracebackPCs(&u, 0, r.Stack0[:])
+ if n < len(r.Stack0) {
+ r.Stack0[n] = 0
+ }
+}
+
+// Stack formats a stack trace of the calling goroutine into buf
+// and returns the number of bytes written to buf.
+// If all is true, Stack formats stack traces of all other goroutines
+// into buf after the trace for the current goroutine.
+func Stack(buf []byte, all bool) int {
+ if all {
+ stopTheWorld(stwAllGoroutinesStack)
+ }
+
+ n := 0
+ if len(buf) > 0 {
+ gp := getg()
+ sp := getcallersp()
+ pc := getcallerpc()
+ systemstack(func() {
+ g0 := getg()
+ // Force traceback=1 to override GOTRACEBACK setting,
+ // so that Stack's results are consistent.
+ // GOTRACEBACK is only about crash dumps.
+ g0.m.traceback = 1
+ g0.writebuf = buf[0:0:len(buf)]
+ goroutineheader(gp)
+ traceback(pc, sp, 0, gp)
+ if all {
+ tracebackothers(gp)
+ }
+ g0.m.traceback = 0
+ n = len(g0.writebuf)
+ g0.writebuf = nil
+ })
+ }
+
+ if all {
+ startTheWorld()
+ }
+ return n
+}
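Because Stack truncates when the buffer is too small, callers that want the full dump typically grow the buffer until the result fits. A minimal example using the public API:

package main

import (
	"os"
	"runtime"
)

func main() {
	// Capture stacks of all goroutines, doubling the buffer until the
	// whole trace fits (Stack truncates rather than reporting the size).
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			buf = buf[:n]
			break
		}
		buf = make([]byte, 2*len(buf))
	}
	os.Stdout.Write(buf)
}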
+
+// Tracing of alloc/free/gc.
+
+var tracelock mutex
+
+func tracealloc(p unsafe.Pointer, size uintptr, typ *_type) {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ if typ == nil {
+ print("tracealloc(", p, ", ", hex(size), ")\n")
+ } else {
+ print("tracealloc(", p, ", ", hex(size), ", ", toRType(typ).string(), ")\n")
+ }
+ if gp.m.curg == nil || gp == gp.m.curg {
+ goroutineheader(gp)
+ pc := getcallerpc()
+ sp := getcallersp()
+ systemstack(func() {
+ traceback(pc, sp, 0, gp)
+ })
+ } else {
+ goroutineheader(gp.m.curg)
+ traceback(^uintptr(0), ^uintptr(0), 0, gp.m.curg)
+ }
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
+
+func tracefree(p unsafe.Pointer, size uintptr) {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ print("tracefree(", p, ", ", hex(size), ")\n")
+ goroutineheader(gp)
+ pc := getcallerpc()
+ sp := getcallersp()
+ systemstack(func() {
+ traceback(pc, sp, 0, gp)
+ })
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
+
+func tracegc() {
+ lock(&tracelock)
+ gp := getg()
+ gp.m.traceback = 2
+ print("tracegc()\n")
+ // running on m->g0 stack; show all non-g0 goroutines
+ tracebackothers(gp)
+ print("end tracegc\n")
+ print("\n")
+ gp.m.traceback = 0
+ unlock(&tracelock)
+}
diff --git a/src/runtime/mranges.go b/src/runtime/mranges.go
new file mode 100644
index 0000000..4388d26
--- /dev/null
+++ b/src/runtime/mranges.go
@@ -0,0 +1,460 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Address range data structure.
+//
+// This file contains an implementation of a data structure which
+// manages ordered address ranges.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// addrRange represents a region of address space.
+//
+// An addrRange must never span a gap in the address space.
+type addrRange struct {
+ // base and limit together represent the region of address space
+ // [base, limit). That is, base is inclusive, limit is exclusive.
+ // These are address over an offset view of the address space on
+ // platforms with a segmented address space, that is, on platforms
+ // where arenaBaseOffset != 0.
+ base, limit offAddr
+}
+
+// makeAddrRange creates a new address range from two virtual addresses.
+//
+// Throws if the base and limit are not in the same memory segment.
+func makeAddrRange(base, limit uintptr) addrRange {
+ r := addrRange{offAddr{base}, offAddr{limit}}
+ if (base-arenaBaseOffset >= base) != (limit-arenaBaseOffset >= limit) {
+ throw("addr range base and limit are not in the same memory segment")
+ }
+ return r
+}
+
+// size returns the size of the range represented in bytes.
+func (a addrRange) size() uintptr {
+ if !a.base.lessThan(a.limit) {
+ return 0
+ }
+ // Subtraction is safe because limit and base must be in the same
+ // segment of the address space.
+ return a.limit.diff(a.base)
+}
+
+// contains returns whether or not the range contains a given address.
+func (a addrRange) contains(addr uintptr) bool {
+ return a.base.lessEqual(offAddr{addr}) && (offAddr{addr}).lessThan(a.limit)
+}
+
+// subtract removes from a any overlap with b and returns the resulting
+// range. subtract assumes that a and b
+// either don't overlap at all, only overlap on one side, or are equal.
+// If b is strictly contained in a, thus forcing a split, it will throw.
+func (a addrRange) subtract(b addrRange) addrRange {
+ if b.base.lessEqual(a.base) && a.limit.lessEqual(b.limit) {
+ return addrRange{}
+ } else if a.base.lessThan(b.base) && b.limit.lessThan(a.limit) {
+ throw("bad prune")
+ } else if b.limit.lessThan(a.limit) && a.base.lessThan(b.limit) {
+ a.base = b.limit
+ } else if a.base.lessThan(b.base) && b.base.lessThan(a.limit) {
+ a.limit = b.base
+ }
+ return a
+}
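The case analysis in subtract is easier to see on plain [base, limit) intervals. A standalone sketch (toy types, raw address comparisons, no offset-address handling) that mirrors the four branches above:

// interval is a toy stand-in for addrRange using raw addresses.
type interval struct{ base, limit uintptr }

// subtractSketch removes from a any overlap with b, panicking where the real
// code throws (b strictly inside a would split a in two).
func subtractSketch(a, b interval) interval {
	switch {
	case b.base <= a.base && a.limit <= b.limit:
		return interval{} // b covers all of a
	case a.base < b.base && b.limit < a.limit:
		panic("bad prune") // b is strictly inside a
	case b.limit < a.limit && a.base < b.limit:
		a.base = b.limit // b overlaps a's low end
	case a.base < b.base && b.base < a.limit:
		a.limit = b.base // b overlaps a's high end
	}
	return a
}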
+
+// takeFromFront takes len bytes from the front of the address range, aligning
+// the base to align first. On success, returns the aligned start of the region
+// taken and true.
+func (a *addrRange) takeFromFront(len uintptr, align uint8) (uintptr, bool) {
+ base := alignUp(a.base.addr(), uintptr(align)) + len
+ if base > a.limit.addr() {
+ return 0, false
+ }
+ a.base = offAddr{base}
+ return base - len, true
+}
+
+// takeFromBack takes len bytes from the end of the address range, aligning
+// the limit to align after subtracting len. On success, returns the aligned
+// start of the region taken and true.
+func (a *addrRange) takeFromBack(len uintptr, align uint8) (uintptr, bool) {
+ limit := alignDown(a.limit.addr()-len, uintptr(align))
+ if a.base.addr() > limit {
+ return 0, false
+ }
+ a.limit = offAddr{limit}
+ return limit, true
+}
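alignUp and alignDown here are the runtime's usual power-of-two rounding helpers, defined elsewhere; a sketch of the arithmetic they perform, assuming align is a power of two (the Sketch suffixes mark these as illustrative, not the runtime's own definitions):

// Power-of-two rounding, as a sketch of what the calls above do.
func alignUpSketch(x, align uintptr) uintptr   { return (x + align - 1) &^ (align - 1) }
func alignDownSketch(x, align uintptr) uintptr { return x &^ (align - 1) }

// For example, with align = 8: alignUpSketch(13, 8) == 16 and
// alignDownSketch(13, 8) == 8, which is how the front and back of the
// range get trimmed to aligned boundaries.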
+
+// removeGreaterEqual removes all addresses in a greater than or equal
+// to addr and returns the new range.
+func (a addrRange) removeGreaterEqual(addr uintptr) addrRange {
+ if (offAddr{addr}).lessEqual(a.base) {
+ return addrRange{}
+ }
+ if a.limit.lessEqual(offAddr{addr}) {
+ return a
+ }
+ return makeAddrRange(a.base.addr(), addr)
+}
+
+var (
+ // minOffAddr is the minimum address in the offset space, and
+ // it corresponds to the virtual address arenaBaseOffset.
+ minOffAddr = offAddr{arenaBaseOffset}
+
+ // maxOffAddr is the maximum address in the offset address
+ // space. It corresponds to the highest virtual address representable
+ // by the page alloc chunk and heap arena maps.
+ maxOffAddr = offAddr{(((1 << heapAddrBits) - 1) + arenaBaseOffset) & uintptrMask}
+)
+
+// offAddr represents an address in a contiguous view
+// of the address space on systems where the address space is
+// segmented. On other systems, it's just a normal address.
+type offAddr struct {
+ // a is just the virtual address, but should never be used
+ // directly. Call addr() to get this value instead.
+ a uintptr
+}
+
+// add adds a uintptr offset to the offAddr.
+func (l offAddr) add(bytes uintptr) offAddr {
+ return offAddr{a: l.a + bytes}
+}
+
+// sub subtracts a uintptr offset from the offAddr.
+func (l offAddr) sub(bytes uintptr) offAddr {
+ return offAddr{a: l.a - bytes}
+}
+
+// diff returns the number of bytes between the
+// two offAddrs.
+func (l1 offAddr) diff(l2 offAddr) uintptr {
+ return l1.a - l2.a
+}
+
+// lessThan returns true if l1 is less than l2 in the offset
+// address space.
+func (l1 offAddr) lessThan(l2 offAddr) bool {
+ return (l1.a - arenaBaseOffset) < (l2.a - arenaBaseOffset)
+}
+
+// lessEqual returns true if l1 is less than or equal to l2 in
+// the offset address space.
+func (l1 offAddr) lessEqual(l2 offAddr) bool {
+ return (l1.a - arenaBaseOffset) <= (l2.a - arenaBaseOffset)
+}
+
+// equal returns true if the two offAddr values are equal.
+func (l1 offAddr) equal(l2 offAddr) bool {
+ // No need to compare in the offset space, it
+ // means the same thing.
+ return l1 == l2
+}
+
+// addr returns the virtual address for this offset address.
+func (l offAddr) addr() uintptr {
+ return l.a
+}
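
Subtracting arenaBaseOffset before comparing is what keeps this ordering sensible when the heap does not sit at the bottom of the address space: an address below the offset wraps around and sorts after everything at or above it. The following sketch uses a made-up base offset and assumes a 64-bit uintptr to show how that differs from a raw pointer comparison.

package main

import "fmt"

// baseOffset is a hypothetical value; the real arenaBaseOffset is
// platform-dependent.
const baseOffset uintptr = 0xffff_8000_0000_0000

// lessThan compares in the offset space, like offAddr.lessThan.
func lessThan(a, b uintptr) bool { return a-baseOffset < b-baseOffset }

func main() {
	lo := uintptr(0x0000_0000_1000_0000) // below the offset: wraps to a huge offset
	hi := uintptr(0xffff_8000_1000_0000) // at or above the offset: small offset

	fmt.Println(lo < hi)          // true  (raw comparison)
	fmt.Println(lessThan(lo, hi)) // false (offset-space comparison)
}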
+
+// atomicOffAddr is like offAddr, but operations on it are atomic.
+// It also provides operations for storing marked addresses so that
+// they are not overwritten until they've been seen.
+type atomicOffAddr struct {
+ // a contains the offset address, unlike offAddr.
+ a atomic.Int64
+}
+
+// Clear attempts to store minOffAddr in atomicOffAddr. It may fail
+// if a marked value is placed in the box in the meantime.
+func (b *atomicOffAddr) Clear() {
+ for {
+ old := b.a.Load()
+ if old < 0 {
+ return
+ }
+ if b.a.CompareAndSwap(old, int64(minOffAddr.addr()-arenaBaseOffset)) {
+ return
+ }
+ }
+}
+
+// StoreMin stores addr if it's less than the current value in the
+// offset address space and the current value is not marked.
+func (b *atomicOffAddr) StoreMin(addr uintptr) {
+ new := int64(addr - arenaBaseOffset)
+ for {
+ old := b.a.Load()
+ if old < new {
+ return
+ }
+ if b.a.CompareAndSwap(old, new) {
+ return
+ }
+ }
+}
+
+// StoreUnmark attempts to unmark the value in atomicOffAddr and
+// replace it with newAddr. markedAddr must be a marked address
+// returned by Load. This function will not store newAddr if the
+// box no longer contains markedAddr.
+func (b *atomicOffAddr) StoreUnmark(markedAddr, newAddr uintptr) {
+ b.a.CompareAndSwap(-int64(markedAddr-arenaBaseOffset), int64(newAddr-arenaBaseOffset))
+}
+
+// StoreMarked stores addr, first converting it to the offset address
+// space and then negating it.
+func (b *atomicOffAddr) StoreMarked(addr uintptr) {
+ b.a.Store(-int64(addr - arenaBaseOffset))
+}
+
+// Load returns the address in the box as a virtual address. It also
+// reports whether the value was marked.
+func (b *atomicOffAddr) Load() (uintptr, bool) {
+ v := b.a.Load()
+ wasMarked := false
+ if v < 0 {
+ wasMarked = true
+ v = -v
+ }
+ return uintptr(v) + arenaBaseOffset, wasMarked
+}
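
The marking protocol above encodes "marked" as a negated offset, so a single sign check or compare-and-swap distinguishes a marked value (negative) from an ordinary one. Here is a stripped-down sketch of the same idea over int64, taking the base offset as zero for readability; the names are illustrative only.

package main

import (
	"fmt"
	"sync/atomic"
)

// box mimics atomicOffAddr with a zero base offset: negative values are marked.
type box struct{ a atomic.Int64 }

// StoreMarked stores addr negated, i.e. marked.
func (b *box) StoreMarked(addr int64) { b.a.Store(-addr) }

// StoreUnmark replaces a previously observed marked address with an
// unmarked one; it is a no-op if the box changed since it was loaded.
func (b *box) StoreUnmark(markedAddr, newAddr int64) {
	b.a.CompareAndSwap(-markedAddr, newAddr)
}

// Load returns the stored address and whether it was marked.
func (b *box) Load() (int64, bool) {
	v := b.a.Load()
	if v < 0 {
		return -v, true
	}
	return v, false
}

func main() {
	var b box
	b.StoreMarked(0x2000)
	v, marked := b.Load()
	fmt.Printf("%#x marked=%v\n", v, marked) // 0x2000 marked=true

	b.StoreUnmark(0x2000, 0x3000)
	v, marked = b.Load()
	fmt.Printf("%#x marked=%v\n", v, marked) // 0x3000 marked=false
}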
+
+// addrRanges is a data structure holding a collection of ranges of
+// address space.
+//
+// The ranges are coalesced eagerly to reduce the
+// number of ranges it holds.
+//
+// The slice backing store for the ranges field is persistentalloc'd
+// and thus there is no way to free it.
+//
+// addrRanges is not thread-safe.
+type addrRanges struct {
+ // ranges is a slice of ranges sorted by base.
+ ranges []addrRange
+
+ // totalBytes is the total amount of address space in bytes counted by
+ // this addrRanges.
+ totalBytes uintptr
+
+ // sysStat is the stat to track allocations by this type
+ sysStat *sysMemStat
+}
+
+func (a *addrRanges) init(sysStat *sysMemStat) {
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&a.ranges))
+ ranges.len = 0
+ ranges.cap = 16
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), goarch.PtrSize, sysStat))
+ a.sysStat = sysStat
+ a.totalBytes = 0
+}
+
+// findSucc returns the first index in a such that addr is
+// less than the base of the addrRange at that index.
+func (a *addrRanges) findSucc(addr uintptr) int {
+ base := offAddr{addr}
+
+ // Narrow down the search space via a binary search
+ // for large addrRanges until we have at most iterMax
+ // candidates left.
+ const iterMax = 8
+ bot, top := 0, len(a.ranges)
+ for top-bot > iterMax {
+ i := ((top - bot) / 2) + bot
+ if a.ranges[i].contains(base.addr()) {
+ // a.ranges[i] contains base, so
+ // its successor is the next index.
+ return i + 1
+ }
+ if base.lessThan(a.ranges[i].base) {
+ // In this case i might actually be
+ // the successor, but we can't be sure
+ // until we check the ones before it.
+ top = i
+ } else {
+ // In this case we know base is
+			// greater than or equal to a.ranges[i].limit,
+ // so i is definitely not the successor.
+ // We already checked i, so pick the next
+ // one.
+ bot = i + 1
+ }
+ }
+ // There are top-bot candidates left, so
+	// iterate over them and find the first one whose
+	// base is strictly greater than addr.
+ for i := bot; i < top; i++ {
+ if base.lessThan(a.ranges[i].base) {
+ return i
+ }
+ }
+ return top
+}
+
+// findAddrGreaterEqual returns the smallest address represented by a
+// that is >= addr. Thus, if the address is represented by a,
+// then it returns addr. The second return value indicates whether
+// such an address exists for addr in a. That is, if addr is larger than
+// any address known to a, the second return value will be false.
+func (a *addrRanges) findAddrGreaterEqual(addr uintptr) (uintptr, bool) {
+ i := a.findSucc(addr)
+ if i == 0 {
+ return a.ranges[0].base.addr(), true
+ }
+ if a.ranges[i-1].contains(addr) {
+ return addr, true
+ }
+ if i < len(a.ranges) {
+ return a.ranges[i].base.addr(), true
+ }
+ return 0, false
+}
+
+// contains returns true if a covers the address addr.
+func (a *addrRanges) contains(addr uintptr) bool {
+ i := a.findSucc(addr)
+ if i == 0 {
+ return false
+ }
+ return a.ranges[i-1].contains(addr)
+}
+
+// add inserts a new address range to a.
+//
+// r must not overlap with any address range in a and r.size() must be > 0.
+func (a *addrRanges) add(r addrRange) {
+ // The copies in this function are potentially expensive, but this data
+ // structure is meant to represent the Go heap. At worst, copying this
+ // would take ~160µs assuming a conservative copying rate of 25 GiB/s (the
+ // copy will almost never trigger a page fault) for a 1 TiB heap with 4 MiB
+ // arenas which is completely discontiguous. ~160µs is still a lot, but in
+ // practice most platforms have 64 MiB arenas (which cuts this by a factor
+ // of 16) and Go heaps are usually mostly contiguous, so the chance that
+ // an addrRanges even grows to that size is extremely low.
+
+ // An empty range has no effect on the set of addresses represented
+ // by a, but passing a zero-sized range is almost always a bug.
+ if r.size() == 0 {
+ print("runtime: range = {", hex(r.base.addr()), ", ", hex(r.limit.addr()), "}\n")
+ throw("attempted to add zero-sized address range")
+ }
+ // Because we assume r is not currently represented in a,
+ // findSucc gives us our insertion index.
+ i := a.findSucc(r.base.addr())
+ coalescesDown := i > 0 && a.ranges[i-1].limit.equal(r.base)
+ coalescesUp := i < len(a.ranges) && r.limit.equal(a.ranges[i].base)
+ if coalescesUp && coalescesDown {
+ // We have neighbors and they both border us.
+ // Merge a.ranges[i-1], r, and a.ranges[i] together into a.ranges[i-1].
+ a.ranges[i-1].limit = a.ranges[i].limit
+
+ // Delete a.ranges[i].
+ copy(a.ranges[i:], a.ranges[i+1:])
+ a.ranges = a.ranges[:len(a.ranges)-1]
+ } else if coalescesDown {
+ // We have a neighbor at a lower address only and it borders us.
+ // Merge the new space into a.ranges[i-1].
+ a.ranges[i-1].limit = r.limit
+ } else if coalescesUp {
+ // We have a neighbor at a higher address only and it borders us.
+ // Merge the new space into a.ranges[i].
+ a.ranges[i].base = r.base
+ } else {
+ // We may or may not have neighbors which don't border us.
+ // Add the new range.
+ if len(a.ranges)+1 > cap(a.ranges) {
+ // Grow the array. Note that this leaks the old array, but since
+ // we're doubling we have at most 2x waste. For a 1 TiB heap and
+ // 4 MiB arenas which are all discontiguous (both very conservative
+ // assumptions), this would waste at most 4 MiB of memory.
+ oldRanges := a.ranges
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&a.ranges))
+ ranges.len = len(oldRanges) + 1
+ ranges.cap = cap(oldRanges) * 2
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), goarch.PtrSize, a.sysStat))
+
+ // Copy in the old array, but make space for the new range.
+ copy(a.ranges[:i], oldRanges[:i])
+ copy(a.ranges[i+1:], oldRanges[i:])
+ } else {
+ a.ranges = a.ranges[:len(a.ranges)+1]
+ copy(a.ranges[i+1:], a.ranges[i:])
+ }
+ a.ranges[i] = r
+ }
+ a.totalBytes += r.size()
+}
+
+// removeLast removes and returns the highest-addressed contiguous range
+// of a, or the last nBytes of that range, whichever is smaller. If a is
+// empty, it returns an empty range.
+func (a *addrRanges) removeLast(nBytes uintptr) addrRange {
+ if len(a.ranges) == 0 {
+ return addrRange{}
+ }
+ r := a.ranges[len(a.ranges)-1]
+ size := r.size()
+ if size > nBytes {
+ newEnd := r.limit.sub(nBytes)
+ a.ranges[len(a.ranges)-1].limit = newEnd
+ a.totalBytes -= nBytes
+ return addrRange{newEnd, r.limit}
+ }
+ a.ranges = a.ranges[:len(a.ranges)-1]
+ a.totalBytes -= size
+ return r
+}
+
+// removeGreaterEqual removes the ranges of a which are above addr, and additionally
+// splits any range containing addr.
+func (a *addrRanges) removeGreaterEqual(addr uintptr) {
+ pivot := a.findSucc(addr)
+ if pivot == 0 {
+ // addr is before all ranges in a.
+ a.totalBytes = 0
+ a.ranges = a.ranges[:0]
+ return
+ }
+ removed := uintptr(0)
+ for _, r := range a.ranges[pivot:] {
+ removed += r.size()
+ }
+ if r := a.ranges[pivot-1]; r.contains(addr) {
+ removed += r.size()
+ r = r.removeGreaterEqual(addr)
+ if r.size() == 0 {
+ pivot--
+ } else {
+ removed -= r.size()
+ a.ranges[pivot-1] = r
+ }
+ }
+ a.ranges = a.ranges[:pivot]
+ a.totalBytes -= removed
+}
+
+// cloneInto makes a deep clone of a's state into b, re-using
+// b's ranges if able.
+func (a *addrRanges) cloneInto(b *addrRanges) {
+ if len(a.ranges) > cap(b.ranges) {
+ // Grow the array.
+ ranges := (*notInHeapSlice)(unsafe.Pointer(&b.ranges))
+ ranges.len = 0
+ ranges.cap = cap(a.ranges)
+ ranges.array = (*notInHeap)(persistentalloc(unsafe.Sizeof(addrRange{})*uintptr(ranges.cap), goarch.PtrSize, b.sysStat))
+ }
+ b.ranges = b.ranges[:len(a.ranges)]
+ b.totalBytes = a.totalBytes
+ copy(b.ranges, a.ranges)
+}
diff --git a/src/runtime/mranges_test.go b/src/runtime/mranges_test.go
new file mode 100644
index 0000000..ed439c5
--- /dev/null
+++ b/src/runtime/mranges_test.go
@@ -0,0 +1,275 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "testing"
+)
+
+func validateAddrRanges(t *testing.T, a *AddrRanges, want ...AddrRange) {
+ ranges := a.Ranges()
+ if len(ranges) != len(want) {
+ t.Errorf("want %v, got %v", want, ranges)
+ t.Fatal("different lengths")
+ }
+ gotTotalBytes := uintptr(0)
+ wantTotalBytes := uintptr(0)
+ for i := range ranges {
+ gotTotalBytes += ranges[i].Size()
+ wantTotalBytes += want[i].Size()
+ if ranges[i].Base() >= ranges[i].Limit() {
+ t.Error("empty range found")
+ }
+ // Ensure this is equivalent to what we want.
+ if !ranges[i].Equals(want[i]) {
+ t.Errorf("range %d: got [0x%x, 0x%x), want [0x%x, 0x%x)", i,
+ ranges[i].Base(), ranges[i].Limit(),
+ want[i].Base(), want[i].Limit(),
+ )
+ }
+ if i != 0 {
+ // Ensure the ranges are sorted.
+ if ranges[i-1].Base() >= ranges[i].Base() {
+ t.Errorf("ranges %d and %d are out of sorted order", i-1, i)
+ }
+ // Check for a failure to coalesce.
+ if ranges[i-1].Limit() == ranges[i].Base() {
+ t.Errorf("ranges %d and %d should have coalesced", i-1, i)
+ }
+ // Check if any ranges overlap. Because the ranges are sorted
+ // by base, it's sufficient to just check neighbors.
+ if ranges[i-1].Limit() > ranges[i].Base() {
+ t.Errorf("ranges %d and %d overlap", i-1, i)
+ }
+ }
+ }
+ if wantTotalBytes != gotTotalBytes {
+ t.Errorf("expected %d total bytes, got %d", wantTotalBytes, gotTotalBytes)
+ }
+ if b := a.TotalBytes(); b != gotTotalBytes {
+ t.Errorf("inconsistent total bytes: want %d, got %d", gotTotalBytes, b)
+ }
+ if t.Failed() {
+ t.Errorf("addrRanges: %v", ranges)
+ t.Fatal("detected bad addrRanges")
+ }
+}
+
+func TestAddrRangesAdd(t *testing.T) {
+ a := NewAddrRanges()
+
+ // First range.
+ a.Add(MakeAddrRange(512, 1024))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 1024),
+ )
+
+ // Coalesce up.
+ a.Add(MakeAddrRange(1024, 2048))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ )
+
+ // Add new independent range.
+ a.Add(MakeAddrRange(4096, 8192))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ MakeAddrRange(4096, 8192),
+ )
+
+ // Coalesce down.
+ a.Add(MakeAddrRange(3776, 4096))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 2048),
+ MakeAddrRange(3776, 8192),
+ )
+
+ // Coalesce up and down.
+ a.Add(MakeAddrRange(2048, 3776))
+ validateAddrRanges(t, &a,
+ MakeAddrRange(512, 8192),
+ )
+
+ // Push a bunch of independent ranges to the end to try and force growth.
+ expectedRanges := []AddrRange{MakeAddrRange(512, 8192)}
+ for i := uintptr(0); i < 64; i++ {
+ dRange := MakeAddrRange(8192+(i+1)*2048, 8192+(i+1)*2048+10)
+ a.Add(dRange)
+ expectedRanges = append(expectedRanges, dRange)
+ validateAddrRanges(t, &a, expectedRanges...)
+ }
+
+ // Push a bunch of independent ranges to the beginning to try and force growth.
+ var bottomRanges []AddrRange
+ for i := uintptr(0); i < 63; i++ {
+ dRange := MakeAddrRange(8+i*8, 8+i*8+4)
+ a.Add(dRange)
+ bottomRanges = append(bottomRanges, dRange)
+ validateAddrRanges(t, &a, append(bottomRanges, expectedRanges...)...)
+ }
+}
+
+func TestAddrRangesFindSucc(t *testing.T) {
+ var large []AddrRange
+ for i := 0; i < 100; i++ {
+ large = append(large, MakeAddrRange(5+uintptr(i)*5, 5+uintptr(i)*5+3))
+ }
+
+ type testt struct {
+ name string
+ base uintptr
+ expect int
+ ranges []AddrRange
+ }
+ tests := []testt{
+ {
+ name: "Empty",
+ base: 12,
+ expect: 0,
+ ranges: []AddrRange{},
+ },
+ {
+ name: "OneBefore",
+ base: 12,
+ expect: 0,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneWithin",
+ base: 14,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneAfterLimit",
+ base: 16,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "OneAfter",
+ base: 17,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(14, 16),
+ },
+ },
+ {
+ name: "ThreeBefore",
+ base: 3,
+ expect: 0,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeAfter",
+ base: 24,
+ expect: 3,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeBetween",
+ base: 11,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "ThreeWithin",
+ base: 9,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(6, 10),
+ MakeAddrRange(12, 16),
+ MakeAddrRange(19, 22),
+ },
+ },
+ {
+ name: "Zero",
+ base: 0,
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(0, 10),
+ },
+ },
+ {
+ name: "Max",
+ base: ^uintptr(0),
+ expect: 1,
+ ranges: []AddrRange{
+ MakeAddrRange(^uintptr(0)-5, ^uintptr(0)),
+ },
+ },
+ {
+ name: "LargeBefore",
+ base: 2,
+ expect: 0,
+ ranges: large,
+ },
+ {
+ name: "LargeAfter",
+ base: 5 + uintptr(len(large))*5 + 30,
+ expect: len(large),
+ ranges: large,
+ },
+ {
+ name: "LargeBetweenLow",
+ base: 14,
+ expect: 2,
+ ranges: large,
+ },
+ {
+ name: "LargeBetweenHigh",
+ base: 249,
+ expect: 49,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinLow",
+ base: 25,
+ expect: 5,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinHigh",
+ base: 396,
+ expect: 79,
+ ranges: large,
+ },
+ {
+ name: "LargeWithinMiddle",
+ base: 250,
+ expect: 50,
+ ranges: large,
+ },
+ }
+
+ for _, test := range tests {
+ t.Run(test.name, func(t *testing.T) {
+ a := MakeAddrRanges(test.ranges...)
+ i := a.FindSucc(test.base)
+ if i != test.expect {
+ t.Fatalf("expected %d, got %d", test.expect, i)
+ }
+ })
+ }
+}
diff --git a/src/runtime/msan.go b/src/runtime/msan.go
new file mode 100644
index 0000000..5e2aae1
--- /dev/null
+++ b/src/runtime/msan.go
@@ -0,0 +1,62 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build msan
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Public memory sanitizer API.
+
+func MSanRead(addr unsafe.Pointer, len int) {
+ msanread(addr, uintptr(len))
+}
+
+func MSanWrite(addr unsafe.Pointer, len int) {
+ msanwrite(addr, uintptr(len))
+}
+
+// Private interface for the runtime.
+const msanenabled = true
+
+// If we are running on the system stack, the C program may have
+// marked part of that stack as uninitialized. We don't instrument
+// the runtime, but operations like a slice copy can call msanread
+// anyhow for values on the stack. Just ignore msanread when running
+// on the system stack. The other msan functions are fine.
+//
+//go:nosplit
+func msanread(addr unsafe.Pointer, sz uintptr) {
+ gp := getg()
+ if gp == nil || gp.m == nil || gp == gp.m.g0 || gp == gp.m.gsignal {
+ return
+ }
+ domsanread(addr, sz)
+}
+
+//go:noescape
+func domsanread(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanwrite(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanmalloc(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanfree(addr unsafe.Pointer, sz uintptr)
+
+//go:noescape
+func msanmove(dst, src unsafe.Pointer, sz uintptr)
+
+// These are called from msan_GOARCH.s
+//
+//go:cgo_import_static __msan_read_go
+//go:cgo_import_static __msan_write_go
+//go:cgo_import_static __msan_malloc_go
+//go:cgo_import_static __msan_free_go
+//go:cgo_import_static __msan_memmove
diff --git a/src/runtime/msan/msan.go b/src/runtime/msan/msan.go
new file mode 100644
index 0000000..4e41f85
--- /dev/null
+++ b/src/runtime/msan/msan.go
@@ -0,0 +1,32 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build msan && ((linux && (amd64 || arm64)) || (freebsd && amd64))
+
+package msan
+
+/*
+#cgo CFLAGS: -fsanitize=memory
+#cgo LDFLAGS: -fsanitize=memory
+
+#include <stdint.h>
+#include <sanitizer/msan_interface.h>
+
+void __msan_read_go(void *addr, uintptr_t sz) {
+ __msan_check_mem_is_initialized(addr, sz);
+}
+
+void __msan_write_go(void *addr, uintptr_t sz) {
+ __msan_unpoison(addr, sz);
+}
+
+void __msan_malloc_go(void *addr, uintptr_t sz) {
+ __msan_unpoison(addr, sz);
+}
+
+void __msan_free_go(void *addr, uintptr_t sz) {
+ __msan_poison(addr, sz);
+}
+*/
+import "C"
diff --git a/src/runtime/msan0.go b/src/runtime/msan0.go
new file mode 100644
index 0000000..2f5fd2d
--- /dev/null
+++ b/src/runtime/msan0.go
@@ -0,0 +1,23 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !msan
+
+// Dummy MSan support API, used when not built with -msan.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const msanenabled = false
+
+// Because msanenabled is false, none of these functions should be called.
+
+func msanread(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanwrite(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanmalloc(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanfree(addr unsafe.Pointer, sz uintptr) { throw("msan") }
+func msanmove(dst, src unsafe.Pointer, sz uintptr) { throw("msan") }
diff --git a/src/runtime/msan_amd64.s b/src/runtime/msan_amd64.s
new file mode 100644
index 0000000..89ed304
--- /dev/null
+++ b/src/runtime/msan_amd64.s
@@ -0,0 +1,89 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build msan
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// This is like race_amd64.s, but for the msan calls.
+// See race_amd64.s for detailed comments.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// func runtime·domsanread(addr unsafe.Pointer, sz uintptr)
+// Called from msanread.
+TEXT runtime·domsanread(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_read_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_read_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanwrite(addr unsafe.Pointer, sz uintptr)
+// Called from instrumented code.
+TEXT runtime·msanwrite(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_write_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_write_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanmalloc(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmalloc(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_malloc_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_malloc_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanfree(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanfree(SB), NOSPLIT, $0-16
+ MOVQ addr+0(FP), RARG0
+ MOVQ size+8(FP), RARG1
+ // void __msan_free_go(void *addr, uintptr_t sz);
+ MOVQ $__msan_free_go(SB), AX
+ JMP msancall<>(SB)
+
+// func runtime·msanmove(dst, src unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmove(SB), NOSPLIT, $0-24
+ MOVQ dst+0(FP), RARG0
+ MOVQ src+8(FP), RARG1
+ MOVQ size+16(FP), RARG2
+ // void __msan_memmove(void *dst, void *src, uintptr_t sz);
+ MOVQ $__msan_memmove(SB), AX
+ JMP msancall<>(SB)
+
+// Switches SP to g0 stack and calls (AX). Arguments already set.
+TEXT msancall<>(SB), NOSPLIT, $0-0
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ CMPQ R14, $0
+ JE call // no g; still on a system stack
+
+ MOVQ g_m(R14), R13
+ // Switch to g0 stack.
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ RET
diff --git a/src/runtime/msan_arm64.s b/src/runtime/msan_arm64.s
new file mode 100644
index 0000000..b9eff34
--- /dev/null
+++ b/src/runtime/msan_arm64.s
@@ -0,0 +1,73 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build msan
+
+#include "go_asm.h"
+#include "textflag.h"
+
+#define RARG0 R0
+#define RARG1 R1
+#define RARG2 R2
+#define FARG R3
+
+// func runtime·domsanread(addr unsafe.Pointer, sz uintptr)
+// Called from msanread.
+TEXT runtime·domsanread(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_read_go(void *addr, uintptr_t sz);
+ MOVD $__msan_read_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanwrite(addr unsafe.Pointer, sz uintptr)
+// Called from instrumented code.
+TEXT runtime·msanwrite(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_write_go(void *addr, uintptr_t sz);
+ MOVD $__msan_write_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanmalloc(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmalloc(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_malloc_go(void *addr, uintptr_t sz);
+ MOVD $__msan_malloc_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanfree(addr unsafe.Pointer, sz uintptr)
+TEXT runtime·msanfree(SB), NOSPLIT, $0-16
+ MOVD addr+0(FP), RARG0
+ MOVD size+8(FP), RARG1
+ // void __msan_free_go(void *addr, uintptr_t sz);
+ MOVD $__msan_free_go(SB), FARG
+ JMP msancall<>(SB)
+
+// func runtime·msanmove(dst, src unsafe.Pointer, sz uintptr)
+TEXT runtime·msanmove(SB), NOSPLIT, $0-24
+ MOVD dst+0(FP), RARG0
+ MOVD src+8(FP), RARG1
+ MOVD size+16(FP), RARG2
+ // void __msan_memmove(void *dst, void *src, uintptr_t sz);
+ MOVD $__msan_memmove(SB), FARG
+ JMP msancall<>(SB)
+
+// Switches SP to g0 stack and calls (FARG). Arguments already set.
+TEXT msancall<>(SB), NOSPLIT, $0-0
+ MOVD RSP, R19 // callee-saved
+ CBZ g, g0stack // no g, still on a system stack
+ MOVD g_m(g), R10
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ g0stack
+
+ MOVD (g_sched+gobuf_sp)(R11), R4
+ MOVD R4, RSP
+
+g0stack:
+ BL (FARG)
+ MOVD R19, RSP
+ RET
diff --git a/src/runtime/msize.go b/src/runtime/msize.go
new file mode 100644
index 0000000..c56aa5a
--- /dev/null
+++ b/src/runtime/msize.go
@@ -0,0 +1,25 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Malloc small size classes.
+//
+// See malloc.go for overview.
+// See also mksizeclasses.go for how we decide what size classes to use.
+
+package runtime
+
+// roundupsize returns the size of the memory block that mallocgc will allocate if you ask for size bytes.
+func roundupsize(size uintptr) uintptr {
+ if size < _MaxSmallSize {
+ if size <= smallSizeMax-8 {
+ return uintptr(class_to_size[size_to_class8[divRoundUp(size, smallSizeDiv)]])
+ } else {
+ return uintptr(class_to_size[size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]])
+ }
+ }
+ if size+_PageSize < size {
+ return size
+ }
+ return alignUp(size, _PageSize)
+}
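
The effect of roundupsize is to snap a request up to the next malloc size class for small sizes and to a page multiple for large ones; the real boundaries come from the tables generated by mksizeclasses.go. The following sketch of that shape uses a handful of made-up classes and is not the actual class list.

package main

import "fmt"

// classes is a small, made-up sample; sizeclasses.go defines the real table.
var classes = []uintptr{8, 16, 32, 48, 64, 80, 96, 112, 128}

const pageSize = 8192

// roundup mimics the shape of roundupsize: small requests round up to a
// class, large requests round up to a multiple of the page size.
func roundup(size uintptr) uintptr {
	if size <= classes[len(classes)-1] {
		for _, c := range classes {
			if size <= c {
				return c
			}
		}
	}
	return (size + pageSize - 1) &^ (pageSize - 1)
}

func main() {
	for _, s := range []uintptr{1, 9, 33, 100, 40000} {
		fmt.Printf("request %5d -> allocate %d\n", s, roundup(s))
	}
}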
diff --git a/src/runtime/mspanset.go b/src/runtime/mspanset.go
new file mode 100644
index 0000000..5520d6c
--- /dev/null
+++ b/src/runtime/mspanset.go
@@ -0,0 +1,404 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/cpu"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// A spanSet is a set of *mspans.
+//
+// spanSet is safe for concurrent push and pop operations.
+type spanSet struct {
+ // A spanSet is a two-level data structure consisting of a
+ // growable spine that points to fixed-sized blocks. The spine
+ // can be accessed without locks, but adding a block or
+ // growing it requires taking the spine lock.
+ //
+ // Because each mspan covers at least 8K of heap and takes at
+ // most 8 bytes in the spanSet, the growth of the spine is
+ // quite limited.
+ //
+ // The spine and all blocks are allocated off-heap, which
+ // allows this to be used in the memory manager and avoids the
+ // need for write barriers on all of these. spanSetBlocks are
+ // managed in a pool, though never freed back to the operating
+ // system. We never release spine memory because there could be
+ // concurrent lock-free access and we're likely to reuse it
+ // anyway. (In principle, we could do this during STW.)
+
+ spineLock mutex
+ spine atomicSpanSetSpinePointer // *[N]atomic.Pointer[spanSetBlock]
+ spineLen atomic.Uintptr // Spine array length
+ spineCap uintptr // Spine array cap, accessed under spineLock
+
+ // index is the head and tail of the spanSet in a single field.
+ // The head and the tail both represent an index into the logical
+ // concatenation of all blocks, with the head always behind or
+ // equal to the tail (indicating an empty set). This field is
+ // always accessed atomically.
+ //
+ // The head and the tail are only 32 bits wide, which means we
+ // can only support up to 2^32 pushes before a reset. If every
+ // span in the heap were stored in this set, and each span were
+ // the minimum size (1 runtime page, 8 KiB), then roughly the
+ // smallest heap which would be unrepresentable is 32 TiB in size.
+ index atomicHeadTailIndex
+}
+
+const (
+ spanSetBlockEntries = 512 // 4KB on 64-bit
+ spanSetInitSpineCap = 256 // Enough for 1GB heap on 64-bit
+)
+
+type spanSetBlock struct {
+ // Free spanSetBlocks are managed via a lock-free stack.
+ lfnode
+
+ // popped is the number of pop operations that have occurred on
+ // this block. This number is used to help determine when a block
+ // may be safely recycled.
+ popped atomic.Uint32
+
+ // spans is the set of spans in this block.
+ spans [spanSetBlockEntries]atomicMSpanPointer
+}
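
Given spanSetBlockEntries, a logical index into the set splits into a spine slot and a slot within that block with one division and one modulus; push and pop below use exactly this split to locate an entry. A tiny sketch of the mapping, with a hypothetical helper but the same constant:

package main

import "fmt"

const blockEntries = 512 // mirrors spanSetBlockEntries

// locate maps a logical index in the concatenation of all blocks to a
// (spine slot, slot within block) pair.
func locate(cursor uint32) (top, bottom uint32) {
	return cursor / blockEntries, cursor % blockEntries
}

func main() {
	for _, c := range []uint32{0, 511, 512, 1037} {
		top, bottom := locate(c)
		fmt.Printf("cursor=%4d -> spine slot %d, block slot %d\n", c, top, bottom)
	}
}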
+
+// push adds span s to buffer b. push is safe to call concurrently
+// with other push and pop operations.
+func (b *spanSet) push(s *mspan) {
+ // Obtain our slot.
+ cursor := uintptr(b.index.incTail().tail() - 1)
+ top, bottom := cursor/spanSetBlockEntries, cursor%spanSetBlockEntries
+
+ // Do we need to add a block?
+ spineLen := b.spineLen.Load()
+ var block *spanSetBlock
+retry:
+ if top < spineLen {
+ block = b.spine.Load().lookup(top).Load()
+ } else {
+ // Add a new block to the spine, potentially growing
+ // the spine.
+ lock(&b.spineLock)
+ // spineLen cannot change until we release the lock,
+ // but may have changed while we were waiting.
+ spineLen = b.spineLen.Load()
+ if top < spineLen {
+ unlock(&b.spineLock)
+ goto retry
+ }
+
+ spine := b.spine.Load()
+ if spineLen == b.spineCap {
+ // Grow the spine.
+ newCap := b.spineCap * 2
+ if newCap == 0 {
+ newCap = spanSetInitSpineCap
+ }
+ newSpine := persistentalloc(newCap*goarch.PtrSize, cpu.CacheLineSize, &memstats.gcMiscSys)
+ if b.spineCap != 0 {
+ // Blocks are allocated off-heap, so
+ // no write barriers.
+ memmove(newSpine, spine.p, b.spineCap*goarch.PtrSize)
+ }
+ spine = spanSetSpinePointer{newSpine}
+
+ // Spine is allocated off-heap, so no write barrier.
+ b.spine.StoreNoWB(spine)
+ b.spineCap = newCap
+ // We can't immediately free the old spine
+ // since a concurrent push with a lower index
+ // could still be reading from it. We let it
+ // leak because even a 1TB heap would waste
+ // less than 2MB of memory on old spines. If
+ // this is a problem, we could free old spines
+ // during STW.
+ }
+
+ // Allocate a new block from the pool.
+ block = spanSetBlockPool.alloc()
+
+ // Add it to the spine.
+ // Blocks are allocated off-heap, so no write barrier.
+ spine.lookup(top).StoreNoWB(block)
+ b.spineLen.Store(spineLen + 1)
+ unlock(&b.spineLock)
+ }
+
+ // We have a block. Insert the span atomically, since there may be
+ // concurrent readers via the block API.
+ block.spans[bottom].StoreNoWB(s)
+}
+
+// pop removes and returns a span from buffer b, or nil if b is empty.
+// pop is safe to call concurrently with other pop and push operations.
+func (b *spanSet) pop() *mspan {
+ var head, tail uint32
+claimLoop:
+ for {
+ headtail := b.index.load()
+ head, tail = headtail.split()
+ if head >= tail {
+ // The buf is empty, as far as we can tell.
+ return nil
+ }
+ // Check if the head position we want to claim is actually
+ // backed by a block.
+ spineLen := b.spineLen.Load()
+ if spineLen <= uintptr(head)/spanSetBlockEntries {
+ // We're racing with a spine growth and the allocation of
+ // a new block (and maybe a new spine!), and trying to grab
+ // the span at the index which is currently being pushed.
+ // Instead of spinning, let's just notify the caller that
+ // there's nothing currently here. Spinning on this is
+ // almost definitely not worth it.
+ return nil
+ }
+ // Try to claim the current head by CASing in an updated head.
+ // This may fail transiently due to a push which modifies the
+ // tail, so keep trying while the head isn't changing.
+ want := head
+ for want == head {
+ if b.index.cas(headtail, makeHeadTailIndex(want+1, tail)) {
+ break claimLoop
+ }
+ headtail = b.index.load()
+ head, tail = headtail.split()
+ }
+ // We failed to claim the spot we were after and the head changed,
+ // meaning a popper got ahead of us. Try again from the top because
+ // the buf may not be empty.
+ }
+ top, bottom := head/spanSetBlockEntries, head%spanSetBlockEntries
+
+ // We may be reading a stale spine pointer, but because the length
+ // grows monotonically and we've already verified it, we'll definitely
+ // be reading from a valid block.
+ blockp := b.spine.Load().lookup(uintptr(top))
+
+ // Given that the spine length is correct, we know we will never
+ // see a nil block here, since the length is always updated after
+ // the block is set.
+ block := blockp.Load()
+ s := block.spans[bottom].Load()
+ for s == nil {
+ // We raced with the span actually being set, but given that we
+ // know a block for this span exists, the race window here is
+ // extremely small. Try again.
+ s = block.spans[bottom].Load()
+ }
+ // Clear the pointer. This isn't strictly necessary, but defensively
+ // avoids accidentally re-using blocks which could lead to memory
+ // corruption. This way, we'll get a nil pointer access instead.
+ block.spans[bottom].StoreNoWB(nil)
+
+ // Increase the popped count. If we are the last possible popper
+ // in the block (note that bottom need not equal spanSetBlockEntries-1
+ // due to races) then it's our responsibility to free the block.
+ //
+ // If we increment popped to spanSetBlockEntries, we can be sure that
+ // we're the last popper for this block, and it's thus safe to free it.
+ // Every other popper must have crossed this barrier (and thus finished
+ // popping its corresponding mspan) by the time we get here. Because
+ // we're the last popper, we also don't have to worry about concurrent
+ // pushers (there can't be any). Note that we may not be the popper
+ // which claimed the last slot in the block, we're just the last one
+ // to finish popping.
+ if block.popped.Add(1) == spanSetBlockEntries {
+ // Clear the block's pointer.
+ blockp.StoreNoWB(nil)
+
+ // Return the block to the block pool.
+ spanSetBlockPool.free(block)
+ }
+ return s
+}
+
+// reset resets a spanSet which is empty. It will also clean up
+// any leftover blocks.
+//
+// Throws if the buf is not empty.
+//
+// reset may not be called concurrently with any other operations
+// on the span set.
+func (b *spanSet) reset() {
+ head, tail := b.index.load().split()
+ if head < tail {
+ print("head = ", head, ", tail = ", tail, "\n")
+ throw("attempt to clear non-empty span set")
+ }
+ top := head / spanSetBlockEntries
+ if uintptr(top) < b.spineLen.Load() {
+ // If the head catches up to the tail and the set is empty,
+ // we may not clean up the block containing the head and tail
+ // since it may be pushed into again. In order to avoid leaking
+ // memory since we're going to reset the head and tail, clean
+ // up such a block now, if it exists.
+ blockp := b.spine.Load().lookup(uintptr(top))
+ block := blockp.Load()
+ if block != nil {
+ // Check the popped value.
+ if block.popped.Load() == 0 {
+				// popped should never be zero here: a non-nil block
+				// pointer means at least one value was pushed into this
+				// block, and since the set is empty every pushed value
+				// must also have been popped.
+ throw("span set block with unpopped elements found in reset")
+ }
+ if block.popped.Load() == spanSetBlockEntries {
+ // popped should also never be equal to spanSetBlockEntries
+ // because the last popper should have made the block pointer
+ // in this slot nil.
+ throw("fully empty unfreed span set block found in reset")
+ }
+
+ // Clear the pointer to the block.
+ blockp.StoreNoWB(nil)
+
+ // Return the block to the block pool.
+ spanSetBlockPool.free(block)
+ }
+ }
+ b.index.reset()
+ b.spineLen.Store(0)
+}
+
+// atomicSpanSetSpinePointer is an atomically-accessed spanSetSpinePointer.
+//
+// It has the same semantics as atomic.UnsafePointer.
+type atomicSpanSetSpinePointer struct {
+ a atomic.UnsafePointer
+}
+
+// Loads the spanSetSpinePointer and returns it.
+//
+// It has the same semantics as atomic.UnsafePointer.
+func (s *atomicSpanSetSpinePointer) Load() spanSetSpinePointer {
+ return spanSetSpinePointer{s.a.Load()}
+}
+
+// Stores the spanSetSpinePointer.
+//
+// It has the same semantics as atomic.UnsafePointer.
+func (s *atomicSpanSetSpinePointer) StoreNoWB(p spanSetSpinePointer) {
+ s.a.StoreNoWB(p.p)
+}
+
+// spanSetSpinePointer represents a pointer to a contiguous block of atomic.Pointer[spanSetBlock].
+type spanSetSpinePointer struct {
+ p unsafe.Pointer
+}
+
+// lookup returns &s[idx].
+func (s spanSetSpinePointer) lookup(idx uintptr) *atomic.Pointer[spanSetBlock] {
+ return (*atomic.Pointer[spanSetBlock])(add(unsafe.Pointer(s.p), goarch.PtrSize*idx))
+}
+
+// spanSetBlockPool is a global pool of spanSetBlocks.
+var spanSetBlockPool spanSetBlockAlloc
+
+// spanSetBlockAlloc represents a concurrent pool of spanSetBlocks.
+type spanSetBlockAlloc struct {
+ stack lfstack
+}
+
+// alloc tries to grab a spanSetBlock out of the pool, and if it fails
+// persistentallocs a new one and returns it.
+func (p *spanSetBlockAlloc) alloc() *spanSetBlock {
+ if s := (*spanSetBlock)(p.stack.pop()); s != nil {
+ return s
+ }
+ return (*spanSetBlock)(persistentalloc(unsafe.Sizeof(spanSetBlock{}), cpu.CacheLineSize, &memstats.gcMiscSys))
+}
+
+// free returns a spanSetBlock back to the pool.
+func (p *spanSetBlockAlloc) free(block *spanSetBlock) {
+ block.popped.Store(0)
+ p.stack.push(&block.lfnode)
+}
+
+// headTailIndex represents a combined 32-bit head and 32-bit tail
+// of a queue into a single 64-bit value.
+type headTailIndex uint64
+
+// makeHeadTailIndex creates a headTailIndex value from a separate
+// head and tail.
+func makeHeadTailIndex(head, tail uint32) headTailIndex {
+ return headTailIndex(uint64(head)<<32 | uint64(tail))
+}
+
+// head returns the head of a headTailIndex value.
+func (h headTailIndex) head() uint32 {
+ return uint32(h >> 32)
+}
+
+// tail returns the tail of a headTailIndex value.
+func (h headTailIndex) tail() uint32 {
+ return uint32(h)
+}
+
+// split splits the headTailIndex value into its parts.
+func (h headTailIndex) split() (head uint32, tail uint32) {
+ return h.head(), h.tail()
+}
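
Because the head and tail share a single 64-bit word, one atomic add can move either end: adding 1 bumps the tail and adding 1<<32 bumps the head, which is how incTail, incHead, and decHead below work. A small sketch of the packing and its round-trip, using hypothetical names:

package main

import "fmt"

type headTail uint64

// makeHT packs a 32-bit head into the high half and a 32-bit tail into the low half.
func makeHT(head, tail uint32) headTail {
	return headTail(uint64(head)<<32 | uint64(tail))
}

func (h headTail) head() uint32 { return uint32(h >> 32) }
func (h headTail) tail() uint32 { return uint32(h) }

func main() {
	h := makeHT(3, 10)
	fmt.Println(h.head(), h.tail()) // 3 10

	h += 1       // a push bumps the tail (low 32 bits)
	h += 1 << 32 // a pop bumps the head (high 32 bits)
	fmt.Println(h.head(), h.tail()) // 4 11
}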
+
+// atomicHeadTailIndex is an atomically-accessed headTailIndex.
+type atomicHeadTailIndex struct {
+ u atomic.Uint64
+}
+
+// load atomically reads a headTailIndex value.
+func (h *atomicHeadTailIndex) load() headTailIndex {
+ return headTailIndex(h.u.Load())
+}
+
+// cas atomically compares-and-swaps a headTailIndex value.
+func (h *atomicHeadTailIndex) cas(old, new headTailIndex) bool {
+ return h.u.CompareAndSwap(uint64(old), uint64(new))
+}
+
+// incHead atomically increments the head of a headTailIndex.
+func (h *atomicHeadTailIndex) incHead() headTailIndex {
+ return headTailIndex(h.u.Add(1 << 32))
+}
+
+// decHead atomically decrements the head of a headTailIndex.
+func (h *atomicHeadTailIndex) decHead() headTailIndex {
+ return headTailIndex(h.u.Add(-(1 << 32)))
+}
+
+// incTail atomically increments the tail of a headTailIndex.
+func (h *atomicHeadTailIndex) incTail() headTailIndex {
+ ht := headTailIndex(h.u.Add(1))
+ // Check for overflow.
+ if ht.tail() == 0 {
+ print("runtime: head = ", ht.head(), ", tail = ", ht.tail(), "\n")
+ throw("headTailIndex overflow")
+ }
+ return ht
+}
+
+// reset clears the headTailIndex to (0, 0).
+func (h *atomicHeadTailIndex) reset() {
+ h.u.Store(0)
+}
+
+// atomicMSpanPointer is an atomic.Pointer[mspan]. Can't use generics because it's NotInHeap.
+type atomicMSpanPointer struct {
+ p atomic.UnsafePointer
+}
+
+// Load returns the *mspan.
+func (p *atomicMSpanPointer) Load() *mspan {
+ return (*mspan)(p.p.Load())
+}
+
+// Store stores an *mspan.
+func (p *atomicMSpanPointer) StoreNoWB(s *mspan) {
+ p.p.StoreNoWB(unsafe.Pointer(s))
+}
diff --git a/src/runtime/mstats.go b/src/runtime/mstats.go
new file mode 100644
index 0000000..308bed6
--- /dev/null
+++ b/src/runtime/mstats.go
@@ -0,0 +1,983 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Memory statistics
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+type mstats struct {
+ // Statistics about malloc heap.
+ heapStats consistentHeapStats
+
+ // Statistics about stacks.
+ stacks_sys sysMemStat // only counts newosproc0 stack in mstats; differs from MemStats.StackSys
+
+ // Statistics about allocation of low-level fixed-size structures.
+ mspan_sys sysMemStat
+ mcache_sys sysMemStat
+ buckhash_sys sysMemStat // profiling bucket hash table
+
+ // Statistics about GC overhead.
+ gcMiscSys sysMemStat // updated atomically or during STW
+
+ // Miscellaneous statistics.
+ other_sys sysMemStat // updated atomically or during STW
+
+ // Statistics about the garbage collector.
+
+ // Protected by mheap or stopping the world during GC.
+ last_gc_unix uint64 // last gc (in unix time)
+ pause_total_ns uint64
+ pause_ns [256]uint64 // circular buffer of recent gc pause lengths
+ pause_end [256]uint64 // circular buffer of recent gc end times (nanoseconds since 1970)
+ numgc uint32
+ numforcedgc uint32 // number of user-forced GCs
+ gc_cpu_fraction float64 // fraction of CPU time used by GC
+
+ last_gc_nanotime uint64 // last gc (monotonic time)
+ lastHeapInUse uint64 // heapInUse at mark termination of the previous GC
+
+ enablegc bool
+
+ // gcPauseDist represents the distribution of all GC-related
+ // application pauses in the runtime.
+ //
+ // Each individual pause is counted separately, unlike pause_ns.
+ gcPauseDist timeHistogram
+}
+
+var memstats mstats
+
+// A MemStats records statistics about the memory allocator.
+type MemStats struct {
+ // General statistics.
+
+ // Alloc is bytes of allocated heap objects.
+ //
+ // This is the same as HeapAlloc (see below).
+ Alloc uint64
+
+ // TotalAlloc is cumulative bytes allocated for heap objects.
+ //
+ // TotalAlloc increases as heap objects are allocated, but
+ // unlike Alloc and HeapAlloc, it does not decrease when
+ // objects are freed.
+ TotalAlloc uint64
+
+ // Sys is the total bytes of memory obtained from the OS.
+ //
+ // Sys is the sum of the XSys fields below. Sys measures the
+ // virtual address space reserved by the Go runtime for the
+ // heap, stacks, and other internal data structures. It's
+ // likely that not all of the virtual address space is backed
+ // by physical memory at any given moment, though in general
+ // it all was at some point.
+ Sys uint64
+
+ // Lookups is the number of pointer lookups performed by the
+ // runtime.
+ //
+ // This is primarily useful for debugging runtime internals.
+ Lookups uint64
+
+ // Mallocs is the cumulative count of heap objects allocated.
+ // The number of live objects is Mallocs - Frees.
+ Mallocs uint64
+
+ // Frees is the cumulative count of heap objects freed.
+ Frees uint64
+
+ // Heap memory statistics.
+ //
+ // Interpreting the heap statistics requires some knowledge of
+ // how Go organizes memory. Go divides the virtual address
+ // space of the heap into "spans", which are contiguous
+ // regions of memory 8K or larger. A span may be in one of
+ // three states:
+ //
+ // An "idle" span contains no objects or other data. The
+ // physical memory backing an idle span can be released back
+ // to the OS (but the virtual address space never is), or it
+ // can be converted into an "in use" or "stack" span.
+ //
+ // An "in use" span contains at least one heap object and may
+ // have free space available to allocate more heap objects.
+ //
+ // A "stack" span is used for goroutine stacks. Stack spans
+ // are not considered part of the heap. A span can change
+ // between heap and stack memory; it is never used for both
+ // simultaneously.
+
+ // HeapAlloc is bytes of allocated heap objects.
+ //
+ // "Allocated" heap objects include all reachable objects, as
+ // well as unreachable objects that the garbage collector has
+ // not yet freed. Specifically, HeapAlloc increases as heap
+ // objects are allocated and decreases as the heap is swept
+ // and unreachable objects are freed. Sweeping occurs
+ // incrementally between GC cycles, so these two processes
+ // occur simultaneously, and as a result HeapAlloc tends to
+ // change smoothly (in contrast with the sawtooth that is
+ // typical of stop-the-world garbage collectors).
+ HeapAlloc uint64
+
+ // HeapSys is bytes of heap memory obtained from the OS.
+ //
+ // HeapSys measures the amount of virtual address space
+ // reserved for the heap. This includes virtual address space
+ // that has been reserved but not yet used, which consumes no
+ // physical memory, but tends to be small, as well as virtual
+ // address space for which the physical memory has been
+ // returned to the OS after it became unused (see HeapReleased
+ // for a measure of the latter).
+ //
+ // HeapSys estimates the largest size the heap has had.
+ HeapSys uint64
+
+ // HeapIdle is bytes in idle (unused) spans.
+ //
+ // Idle spans have no objects in them. These spans could be
+ // (and may already have been) returned to the OS, or they can
+ // be reused for heap allocations, or they can be reused as
+ // stack memory.
+ //
+ // HeapIdle minus HeapReleased estimates the amount of memory
+ // that could be returned to the OS, but is being retained by
+ // the runtime so it can grow the heap without requesting more
+ // memory from the OS. If this difference is significantly
+ // larger than the heap size, it indicates there was a recent
+ // transient spike in live heap size.
+ HeapIdle uint64
+
+ // HeapInuse is bytes in in-use spans.
+ //
+ // In-use spans have at least one object in them. These spans
+ // can only be used for other objects of roughly the same
+ // size.
+ //
+ // HeapInuse minus HeapAlloc estimates the amount of memory
+ // that has been dedicated to particular size classes, but is
+ // not currently being used. This is an upper bound on
+ // fragmentation, but in general this memory can be reused
+ // efficiently.
+ HeapInuse uint64
+
+ // HeapReleased is bytes of physical memory returned to the OS.
+ //
+ // This counts heap memory from idle spans that was returned
+ // to the OS and has not yet been reacquired for the heap.
+ HeapReleased uint64
+
+ // HeapObjects is the number of allocated heap objects.
+ //
+ // Like HeapAlloc, this increases as objects are allocated and
+ // decreases as the heap is swept and unreachable objects are
+ // freed.
+ HeapObjects uint64
+
+ // Stack memory statistics.
+ //
+ // Stacks are not considered part of the heap, but the runtime
+ // can reuse a span of heap memory for stack memory, and
+ // vice-versa.
+
+ // StackInuse is bytes in stack spans.
+ //
+ // In-use stack spans have at least one stack in them. These
+ // spans can only be used for other stacks of the same size.
+ //
+ // There is no StackIdle because unused stack spans are
+ // returned to the heap (and hence counted toward HeapIdle).
+ StackInuse uint64
+
+ // StackSys is bytes of stack memory obtained from the OS.
+ //
+ // StackSys is StackInuse, plus any memory obtained directly
+ // from the OS for OS thread stacks.
+ //
+ // In non-cgo programs this metric is currently equal to StackInuse
+ // (but this should not be relied upon, and the value may change in
+ // the future).
+ //
+ // In cgo programs this metric includes OS thread stacks allocated
+ // directly from the OS. Currently, this only accounts for one stack in
+ // c-shared and c-archive build modes and other sources of stacks from
+ // the OS (notably, any allocated by C code) are not currently measured.
+ // Note this too may change in the future.
+ StackSys uint64
+
+ // Off-heap memory statistics.
+ //
+ // The following statistics measure runtime-internal
+ // structures that are not allocated from heap memory (usually
+ // because they are part of implementing the heap). Unlike
+ // heap or stack memory, any memory allocated to these
+ // structures is dedicated to these structures.
+ //
+ // These are primarily useful for debugging runtime memory
+ // overheads.
+
+ // MSpanInuse is bytes of allocated mspan structures.
+ MSpanInuse uint64
+
+ // MSpanSys is bytes of memory obtained from the OS for mspan
+ // structures.
+ MSpanSys uint64
+
+ // MCacheInuse is bytes of allocated mcache structures.
+ MCacheInuse uint64
+
+ // MCacheSys is bytes of memory obtained from the OS for
+ // mcache structures.
+ MCacheSys uint64
+
+ // BuckHashSys is bytes of memory in profiling bucket hash tables.
+ BuckHashSys uint64
+
+ // GCSys is bytes of memory in garbage collection metadata.
+ GCSys uint64
+
+ // OtherSys is bytes of memory in miscellaneous off-heap
+ // runtime allocations.
+ OtherSys uint64
+
+ // Garbage collector statistics.
+
+ // NextGC is the target heap size of the next GC cycle.
+ //
+ // The garbage collector's goal is to keep HeapAlloc ≤ NextGC.
+ // At the end of each GC cycle, the target for the next cycle
+ // is computed based on the amount of reachable data and the
+ // value of GOGC.
+ NextGC uint64
+
+ // LastGC is the time the last garbage collection finished, as
+ // nanoseconds since 1970 (the UNIX epoch).
+ LastGC uint64
+
+ // PauseTotalNs is the cumulative nanoseconds in GC
+ // stop-the-world pauses since the program started.
+ //
+ // During a stop-the-world pause, all goroutines are paused
+ // and only the garbage collector can run.
+ PauseTotalNs uint64
+
+ // PauseNs is a circular buffer of recent GC stop-the-world
+ // pause times in nanoseconds.
+ //
+ // The most recent pause is at PauseNs[(NumGC+255)%256]. In
+ // general, PauseNs[N%256] records the time paused in the most
+ // recent N%256th GC cycle. There may be multiple pauses per
+ // GC cycle; this is the sum of all pauses during a cycle.
+ PauseNs [256]uint64
+
+ // PauseEnd is a circular buffer of recent GC pause end times,
+ // as nanoseconds since 1970 (the UNIX epoch).
+ //
+ // This buffer is filled the same way as PauseNs. There may be
+ // multiple pauses per GC cycle; this records the end of the
+ // last pause in a cycle.
+ PauseEnd [256]uint64
+
+ // NumGC is the number of completed GC cycles.
+ NumGC uint32
+
+ // NumForcedGC is the number of GC cycles that were forced by
+ // the application calling the GC function.
+ NumForcedGC uint32
+
+ // GCCPUFraction is the fraction of this program's available
+ // CPU time used by the GC since the program started.
+ //
+ // GCCPUFraction is expressed as a number between 0 and 1,
+ // where 0 means GC has consumed none of this program's CPU. A
+ // program's available CPU time is defined as the integral of
+ // GOMAXPROCS since the program started. That is, if
+ // GOMAXPROCS is 2 and a program has been running for 10
+ // seconds, its "available CPU" is 20 seconds. GCCPUFraction
+ // does not include CPU time used for write barrier activity.
+ //
+ // This is the same as the fraction of CPU reported by
+ // GODEBUG=gctrace=1.
+ GCCPUFraction float64
+
+ // EnableGC indicates that GC is enabled. It is always true,
+ // even if GOGC=off.
+ EnableGC bool
+
+ // DebugGC is currently unused.
+ DebugGC bool
+
+ // BySize reports per-size class allocation statistics.
+ //
+ // BySize[N] gives statistics for allocations of size S where
+ // BySize[N-1].Size < S ≤ BySize[N].Size.
+ //
+ // This does not report allocations larger than BySize[60].Size.
+ BySize [61]struct {
+ // Size is the maximum byte size of an object in this
+ // size class.
+ Size uint32
+
+ // Mallocs is the cumulative count of heap objects
+ // allocated in this size class. The cumulative bytes
+ // of allocation is Size*Mallocs. The number of live
+ // objects in this size class is Mallocs - Frees.
+ Mallocs uint64
+
+ // Frees is the cumulative count of heap objects freed
+ // in this size class.
+ Frees uint64
+ }
+}
+
+func init() {
+ if offset := unsafe.Offsetof(memstats.heapStats); offset%8 != 0 {
+ println(offset)
+ throw("memstats.heapStats not aligned to 8 bytes")
+ }
+ // Ensure the size of heapStatsDelta causes adjacent fields/slots (e.g.
+ // [3]heapStatsDelta) to be 8-byte aligned.
+ if size := unsafe.Sizeof(heapStatsDelta{}); size%8 != 0 {
+ println(size)
+ throw("heapStatsDelta not a multiple of 8 bytes in size")
+ }
+}
+
+// ReadMemStats populates m with memory allocator statistics.
+//
+// The returned memory allocator statistics are up to date as of the
+// call to ReadMemStats. This is in contrast with a heap profile,
+// which is a snapshot as of the most recently completed garbage
+// collection cycle.
+func ReadMemStats(m *MemStats) {
+ _ = m.Alloc // nil check test before we switch stacks, see issue 61158
+ stopTheWorld(stwReadMemStats)
+
+ systemstack(func() {
+ readmemstats_m(m)
+ })
+
+ startTheWorld()
+}
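
A typical use of ReadMemStats, checking two relationships described in this file (Sys is documented as the sum of the XSys fields, and HeapIdle works out to HeapSys minus HeapInuse, as the derivation in readmemstats_m below notes), sketched as a small program:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Allocate something so the numbers are not all trivial.
	buf := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		buf = append(buf, make([]byte, 4096))
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	fmt.Printf("HeapAlloc = %d bytes, HeapObjects = %d\n", m.HeapAlloc, m.HeapObjects)
	fmt.Printf("HeapIdle = %d, HeapSys-HeapInuse = %d\n", m.HeapIdle, m.HeapSys-m.HeapInuse)

	sum := m.HeapSys + m.StackSys + m.MSpanSys + m.MCacheSys +
		m.BuckHashSys + m.GCSys + m.OtherSys
	fmt.Printf("Sys = %d, sum of XSys fields = %d\n", m.Sys, sum)

	runtime.KeepAlive(buf)
}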
+
+// doubleCheckReadMemStats controls a double-check mode for ReadMemStats that
+// ensures consistency between the values that ReadMemStats is using and the
+// runtime-internal stats.
+var doubleCheckReadMemStats = false
+
+// readmemstats_m populates stats for internal runtime values.
+//
+// The world must be stopped.
+func readmemstats_m(stats *MemStats) {
+ assertWorldStopped()
+
+ // Flush mcaches to mcentral before doing anything else.
+ //
+ // Flushing to the mcentral may in general cause stats to
+ // change as mcentral data structures are manipulated.
+ systemstack(flushallmcaches)
+
+ // Calculate memory allocator stats.
+ // During program execution we only count number of frees and amount of freed memory.
+ // Current number of alive objects in the heap and amount of alive heap memory
+ // are calculated by scanning all spans.
+ // Total number of mallocs is calculated as number of frees plus number of alive objects.
+ // Similarly, total amount of allocated memory is calculated as amount of freed memory
+ // plus amount of alive heap memory.
+
+ // Collect consistent stats, which are the source-of-truth in some cases.
+ var consStats heapStatsDelta
+ memstats.heapStats.unsafeRead(&consStats)
+
+ // Collect large allocation stats.
+ totalAlloc := consStats.largeAlloc
+ nMalloc := consStats.largeAllocCount
+ totalFree := consStats.largeFree
+ nFree := consStats.largeFreeCount
+
+ // Collect per-sizeclass stats.
+ var bySize [_NumSizeClasses]struct {
+ Size uint32
+ Mallocs uint64
+ Frees uint64
+ }
+ for i := range bySize {
+ bySize[i].Size = uint32(class_to_size[i])
+
+ // Malloc stats.
+ a := consStats.smallAllocCount[i]
+ totalAlloc += a * uint64(class_to_size[i])
+ nMalloc += a
+ bySize[i].Mallocs = a
+
+ // Free stats.
+ f := consStats.smallFreeCount[i]
+ totalFree += f * uint64(class_to_size[i])
+ nFree += f
+ bySize[i].Frees = f
+ }
+
+ // Account for tiny allocations.
+ // For historical reasons, MemStats includes tiny allocations
+ // in both the total free and total alloc count. This double-counts
+ // memory in some sense because their tiny allocation block is also
+ // counted. Tracking the lifetime of individual tiny allocations is
+ // currently not done because it would be too expensive.
+ nFree += consStats.tinyAllocCount
+ nMalloc += consStats.tinyAllocCount
+
+ // Calculate derived stats.
+
+ stackInUse := uint64(consStats.inStacks)
+ gcWorkBufInUse := uint64(consStats.inWorkBufs)
+ gcProgPtrScalarBitsInUse := uint64(consStats.inPtrScalarBits)
+
+ totalMapped := gcController.heapInUse.load() + gcController.heapFree.load() + gcController.heapReleased.load() +
+ memstats.stacks_sys.load() + memstats.mspan_sys.load() + memstats.mcache_sys.load() +
+ memstats.buckhash_sys.load() + memstats.gcMiscSys.load() + memstats.other_sys.load() +
+ stackInUse + gcWorkBufInUse + gcProgPtrScalarBitsInUse
+
+ heapGoal := gcController.heapGoal()
+
+ if doubleCheckReadMemStats {
+ // Only check this if we're debugging. It would be bad to crash an application
+ // just because the debugging stats are wrong. We mostly rely on tests to catch
+ // these issues, and we enable the double check mode for tests.
+ //
+ // The world is stopped, so the consistent stats (after aggregation)
+ // should be identical to some combination of memstats. In particular:
+ //
+ // * memstats.heapInUse == inHeap
+ // * memstats.heapReleased == released
+ // * memstats.heapInUse + memstats.heapFree == committed - inStacks - inWorkBufs - inPtrScalarBits
+ // * memstats.totalAlloc == totalAlloc
+ // * memstats.totalFree == totalFree
+ //
+ // Check if that's actually true.
+ //
+ // Prevent sysmon and the tracer from skewing the stats since they can
+ // act without synchronizing with a STW. See #64401.
+ lock(&sched.sysmonlock)
+ lock(&trace.lock)
+ if gcController.heapInUse.load() != uint64(consStats.inHeap) {
+ print("runtime: heapInUse=", gcController.heapInUse.load(), "\n")
+ print("runtime: consistent value=", consStats.inHeap, "\n")
+ throw("heapInUse and consistent stats are not equal")
+ }
+ if gcController.heapReleased.load() != uint64(consStats.released) {
+ print("runtime: heapReleased=", gcController.heapReleased.load(), "\n")
+ print("runtime: consistent value=", consStats.released, "\n")
+ throw("heapReleased and consistent stats are not equal")
+ }
+ heapRetained := gcController.heapInUse.load() + gcController.heapFree.load()
+ consRetained := uint64(consStats.committed - consStats.inStacks - consStats.inWorkBufs - consStats.inPtrScalarBits)
+ if heapRetained != consRetained {
+ print("runtime: global value=", heapRetained, "\n")
+ print("runtime: consistent value=", consRetained, "\n")
+ throw("measures of the retained heap are not equal")
+ }
+ if gcController.totalAlloc.Load() != totalAlloc {
+ print("runtime: totalAlloc=", gcController.totalAlloc.Load(), "\n")
+ print("runtime: consistent value=", totalAlloc, "\n")
+ throw("totalAlloc and consistent stats are not equal")
+ }
+ if gcController.totalFree.Load() != totalFree {
+ print("runtime: totalFree=", gcController.totalFree.Load(), "\n")
+ print("runtime: consistent value=", totalFree, "\n")
+ throw("totalFree and consistent stats are not equal")
+ }
+ // Also check that mappedReady lines up with totalMapped - released.
+ // This isn't really the same type of "make sure consistent stats line up" situation,
+ // but this is an opportune time to check.
+ if gcController.mappedReady.Load() != totalMapped-uint64(consStats.released) {
+ print("runtime: mappedReady=", gcController.mappedReady.Load(), "\n")
+ print("runtime: totalMapped=", totalMapped, "\n")
+ print("runtime: released=", uint64(consStats.released), "\n")
+ print("runtime: totalMapped-released=", totalMapped-uint64(consStats.released), "\n")
+ throw("mappedReady and other memstats are not equal")
+ }
+ unlock(&trace.lock)
+ unlock(&sched.sysmonlock)
+ }
+
+ // We've calculated all the values we need. Now, populate stats.
+
+ stats.Alloc = totalAlloc - totalFree
+ stats.TotalAlloc = totalAlloc
+ stats.Sys = totalMapped
+ stats.Mallocs = nMalloc
+ stats.Frees = nFree
+ stats.HeapAlloc = totalAlloc - totalFree
+ stats.HeapSys = gcController.heapInUse.load() + gcController.heapFree.load() + gcController.heapReleased.load()
+ // By definition, HeapIdle is memory that was mapped
+ // for the heap but is not currently used to hold heap
+ // objects. It also specifically is memory that can be
+ // used for other purposes, like stacks, but this memory
+ // is subtracted out of HeapSys before it makes that
+ // transition. Put another way:
+ //
+ // HeapSys = bytes allocated from the OS for the heap - bytes ultimately used for non-heap purposes
+ // HeapIdle = bytes allocated from the OS for the heap - bytes ultimately used for any purpose
+ //
+ // or
+ //
+ // HeapSys = sys - stacks_inuse - gcWorkBufInUse - gcProgPtrScalarBitsInUse
+ // HeapIdle = sys - stacks_inuse - gcWorkBufInUse - gcProgPtrScalarBitsInUse - heapInUse
+ //
+ // => HeapIdle = HeapSys - heapInUse = heapFree + heapReleased
+ stats.HeapIdle = gcController.heapFree.load() + gcController.heapReleased.load()
+ stats.HeapInuse = gcController.heapInUse.load()
+ stats.HeapReleased = gcController.heapReleased.load()
+ stats.HeapObjects = nMalloc - nFree
+ stats.StackInuse = stackInUse
+ // memstats.stacks_sys is only memory mapped directly for OS stacks.
+ // Add in heap-allocated stack memory for user consumption.
+ stats.StackSys = stackInUse + memstats.stacks_sys.load()
+ stats.MSpanInuse = uint64(mheap_.spanalloc.inuse)
+ stats.MSpanSys = memstats.mspan_sys.load()
+ stats.MCacheInuse = uint64(mheap_.cachealloc.inuse)
+ stats.MCacheSys = memstats.mcache_sys.load()
+ stats.BuckHashSys = memstats.buckhash_sys.load()
+ // MemStats defines GCSys as an aggregate of all memory related
+ // to the memory management system, but we track this memory
+ // at a more granular level in the runtime.
+ stats.GCSys = memstats.gcMiscSys.load() + gcWorkBufInUse + gcProgPtrScalarBitsInUse
+ stats.OtherSys = memstats.other_sys.load()
+ stats.NextGC = heapGoal
+ stats.LastGC = memstats.last_gc_unix
+ stats.PauseTotalNs = memstats.pause_total_ns
+ stats.PauseNs = memstats.pause_ns
+ stats.PauseEnd = memstats.pause_end
+ stats.NumGC = memstats.numgc
+ stats.NumForcedGC = memstats.numforcedgc
+ stats.GCCPUFraction = memstats.gc_cpu_fraction
+ stats.EnableGC = true
+
+ // stats.BySize and bySize might not match in length.
+ // That's OK, stats.BySize cannot change due to backwards
+ // compatibility issues. copy will copy the minimum amount
+ // of values between the two of them.
+ copy(stats.BySize[:], bySize[:])
+}
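+
+// The fields populated above surface to users through runtime.ReadMemStats.
+// A minimal consumer-side sketch (ordinary package main code, not part of
+// this file), with the identities derived above noted alongside:
+//
+//	var m runtime.MemStats
+//	runtime.ReadMemStats(&m)
+//	fmt.Println("HeapAlloc:", m.HeapAlloc)     // totalAlloc - totalFree
+//	fmt.Println("HeapObjects:", m.HeapObjects) // nMalloc - nFree
+//	fmt.Println("HeapIdle:", m.HeapIdle)       // == m.HeapSys - m.HeapInuse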
+
+//go:linkname readGCStats runtime/debug.readGCStats
+func readGCStats(pauses *[]uint64) {
+ systemstack(func() {
+ readGCStats_m(pauses)
+ })
+}
+
+// readGCStats_m must be called on the system stack because it acquires the heap
+// lock. See mheap for details.
+//
+//go:systemstack
+func readGCStats_m(pauses *[]uint64) {
+ p := *pauses
+ // Calling code in runtime/debug should make the slice large enough.
+ if cap(p) < len(memstats.pause_ns)+3 {
+ throw("short slice passed to readGCStats")
+ }
+
+ // Pass back: pauses, pause ends, last gc (absolute time), number of gc, total pause ns.
+ lock(&mheap_.lock)
+
+ n := memstats.numgc
+ if n > uint32(len(memstats.pause_ns)) {
+ n = uint32(len(memstats.pause_ns))
+ }
+
+ // The pause buffer is circular. The most recent pause is at
+ // pause_ns[(numgc-1)%len(pause_ns)]; earlier pauses are found by
+ // walking backward from there. We deliver the times most recent
+ // first (in p[0]).
+ p = p[:cap(p)]
+ for i := uint32(0); i < n; i++ {
+ j := (memstats.numgc - 1 - i) % uint32(len(memstats.pause_ns))
+ p[i] = memstats.pause_ns[j]
+ p[n+i] = memstats.pause_end[j]
+ }
+
+ p[n+n] = memstats.last_gc_unix
+ p[n+n+1] = uint64(memstats.numgc)
+ p[n+n+2] = memstats.pause_total_ns
+ unlock(&mheap_.lock)
+ *pauses = p[:n+n+3]
+}
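+
+// The packed slice layout above (pauses, then pause ends, then last GC time,
+// GC count, and total pause time) is unpacked by runtime/debug.ReadGCStats.
+// A minimal caller-side sketch (ordinary user code, not part of this file):
+//
+//	var s debug.GCStats
+//	debug.ReadGCStats(&s)
+//	fmt.Println("collections:", s.NumGC)
+//	fmt.Println("total pause:", s.PauseTotal)
+//	if len(s.Pause) > 0 {
+//		fmt.Println("most recent pause:", s.Pause[0]) // most recent first, as delivered here
+//	}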
+
+// flushmcache flushes the mcache of allp[i].
+//
+// The world must be stopped.
+//
+//go:nowritebarrier
+func flushmcache(i int) {
+ assertWorldStopped()
+
+ p := allp[i]
+ c := p.mcache
+ if c == nil {
+ return
+ }
+ c.releaseAll()
+ stackcache_clear(c)
+}
+
+// flushallmcaches flushes the mcaches of all Ps.
+//
+// The world must be stopped.
+//
+//go:nowritebarrier
+func flushallmcaches() {
+ assertWorldStopped()
+
+ for i := 0; i < int(gomaxprocs); i++ {
+ flushmcache(i)
+ }
+}
+
+// sysMemStat represents a global system statistic that is managed atomically.
+//
+// This type must structurally be a uint64 so that mstats aligns with MemStats.
+type sysMemStat uint64
+
+// load atomically reads the value of the stat.
+//
+// Must be nosplit as it is called in runtime initialization, e.g. newosproc0.
+//
+//go:nosplit
+func (s *sysMemStat) load() uint64 {
+ return atomic.Load64((*uint64)(s))
+}
+
+// add atomically adds n to the sysMemStat.
+//
+// Must be nosplit as it is called in runtime initialization, e.g. newosproc0.
+//
+//go:nosplit
+func (s *sysMemStat) add(n int64) {
+ val := atomic.Xadd64((*uint64)(s), n)
+ if (n > 0 && int64(val) < n) || (n < 0 && int64(val)+n < n) {
+ print("runtime: val=", val, " n=", n, "\n")
+ throw("sysMemStat overflow")
+ }
+}
+
+// heapStatsDelta contains deltas of various runtime memory statistics
+// that need to be updated together in order for them to be kept
+// consistent with one another.
+type heapStatsDelta struct {
+ // Memory stats.
+ committed int64 // byte delta of memory committed
+ released int64 // byte delta of memory released back to the OS
+ inHeap int64 // byte delta of memory placed in the heap
+ inStacks int64 // byte delta of memory reserved for stacks
+ inWorkBufs int64 // byte delta of memory reserved for work bufs
+ inPtrScalarBits int64 // byte delta of memory reserved for unrolled GC prog bits
+
+ // Allocator stats.
+ //
+ // These are all uint64 because they're cumulative, and could quickly wrap
+ // around otherwise.
+ tinyAllocCount uint64 // number of tiny allocations
+ largeAlloc uint64 // bytes allocated for large objects
+ largeAllocCount uint64 // number of large object allocations
+ smallAllocCount [_NumSizeClasses]uint64 // number of allocs for small objects
+ largeFree uint64 // bytes freed for large objects (>maxSmallSize)
+ largeFreeCount uint64 // number of frees for large objects (>maxSmallSize)
+ smallFreeCount [_NumSizeClasses]uint64 // number of frees for small objects (<=maxSmallSize)
+
+ // NOTE: This struct must be a multiple of 8 bytes in size because it
+ // is stored in an array. If it's not, atomic accesses to the above
+ // fields may be unaligned and fail on 32-bit platforms.
+}
+
+// merge adds in the deltas from b into a.
+func (a *heapStatsDelta) merge(b *heapStatsDelta) {
+ a.committed += b.committed
+ a.released += b.released
+ a.inHeap += b.inHeap
+ a.inStacks += b.inStacks
+ a.inWorkBufs += b.inWorkBufs
+ a.inPtrScalarBits += b.inPtrScalarBits
+
+ a.tinyAllocCount += b.tinyAllocCount
+ a.largeAlloc += b.largeAlloc
+ a.largeAllocCount += b.largeAllocCount
+ for i := range b.smallAllocCount {
+ a.smallAllocCount[i] += b.smallAllocCount[i]
+ }
+ a.largeFree += b.largeFree
+ a.largeFreeCount += b.largeFreeCount
+ for i := range b.smallFreeCount {
+ a.smallFreeCount[i] += b.smallFreeCount[i]
+ }
+}
+
+// consistentHeapStats represents a set of various memory statistics
+// whose updates must be viewed completely to get a consistent
+// state of the world.
+//
+// To write updates to memory stats use the acquire and release
+// methods. To obtain a consistent global snapshot of these statistics,
+// use read.
+type consistentHeapStats struct {
+ // stats is a ring buffer of heapStatsDelta values.
+ // Writers always atomically update the delta at index gen.
+ //
+ // Readers operate by rotating gen (0 -> 1 -> 2 -> 0 -> ...)
+ // and synchronizing with writers by observing each P's
+ // statsSeq field. If the reader observes a P not writing,
+ // it can be sure that it will pick up the new gen value the
+ // next time it writes.
+ //
+ // The reader then takes responsibility by clearing space
+ // in the ring buffer for the next reader to rotate gen to
+ // that space (i.e. it merges in values from index (gen-2) mod 3
+ // to index (gen-1) mod 3, then clears the former).
+ //
+ // Note that this means only one reader can be reading at a time.
+ // There is no way for readers to synchronize.
+ //
+ // This process is why we need a ring buffer of size 3 instead
+ // of 2: one is for the writers, one contains the most recent
+ // data, and the last one is clear so writers can begin writing
+ // to it the moment gen is updated.
+ stats [3]heapStatsDelta
+
+ // gen represents the current index into which writers
+ // are writing, and can take on the value of 0, 1, or 2.
+ gen atomic.Uint32
+
+ // noPLock is intended to provide mutual exclusion for updating
+ // stats when no P is available. It does not block other writers
+ // with a P, only other writers without a P and the reader. Because
+ // stats are usually updated when a P is available, contention on
+ // this lock should be minimal.
+ noPLock mutex
+}
+
+// acquire returns a heapStatsDelta to be updated. In effect,
+// it acquires the shard for writing. release must be called
+// as soon as the relevant deltas are updated.
+//
+// The returned heapStatsDelta must be updated atomically.
+//
+// The caller's P must not change between acquire and
+// release. This also means that the caller should not
+// acquire a P or release its P in between. A P also must
+// not acquire a given consistentHeapStats if it hasn't
+// yet released it.
+//
+// nosplit because a stack growth in this function could
+// lead to a stack allocation that could reenter the
+// function.
+//
+//go:nosplit
+func (m *consistentHeapStats) acquire() *heapStatsDelta {
+ if pp := getg().m.p.ptr(); pp != nil {
+ seq := pp.statsSeq.Add(1)
+ if seq%2 == 0 {
+ // Should have been incremented to odd.
+ print("runtime: seq=", seq, "\n")
+ throw("bad sequence number")
+ }
+ } else {
+ lock(&m.noPLock)
+ }
+ gen := m.gen.Load() % 3
+ return &m.stats[gen]
+}
+
+// release indicates that the writer is done modifying
+// the delta. The value returned by the corresponding
+// acquire must no longer be accessed or modified after
+// release is called.
+//
+// The caller's P must not change between acquire and
+// release. This also means that the caller should not
+// acquire a P or release its P in between.
+//
+// nosplit because a stack growth in this function could
+// lead to a stack allocation that causes another acquire
+// before this operation has completed.
+//
+//go:nosplit
+func (m *consistentHeapStats) release() {
+ if pp := getg().m.p.ptr(); pp != nil {
+ seq := pp.statsSeq.Add(1)
+ if seq%2 != 0 {
+ // Should have been incremented to even.
+ print("runtime: seq=", seq, "\n")
+ throw("bad sequence number")
+ }
+ } else {
+ unlock(&m.noPLock)
+ }
+}
+
+// unsafeRead aggregates the delta for this shard into out.
+//
+// Unsafe because it does so without any synchronization. The
+// world must be stopped.
+func (m *consistentHeapStats) unsafeRead(out *heapStatsDelta) {
+ assertWorldStopped()
+
+ for i := range m.stats {
+ out.merge(&m.stats[i])
+ }
+}
+
+// unsafeClear clears the shard.
+//
+// Unsafe because the world must be stopped and values should
+// be donated elsewhere before clearing.
+func (m *consistentHeapStats) unsafeClear() {
+ assertWorldStopped()
+
+ for i := range m.stats {
+ m.stats[i] = heapStatsDelta{}
+ }
+}
+
+// read takes a globally consistent snapshot of m
+// and puts the aggregated value in out. Even though out is a
+// heapStatsDelta, the resulting values should be complete and
+// valid statistic values.
+//
+// Not safe to call concurrently. The world must be stopped
+// or metricsSema must be held.
+func (m *consistentHeapStats) read(out *heapStatsDelta) {
+ // Getting preempted after this point is not safe because
+ // we read allp. We need to make sure a STW can't happen
+ // so it doesn't change out from under us.
+ mp := acquirem()
+
+ // Get the current generation. We can be confident that this
+ // will not change since read is serialized and is the only
+ // one that modifies currGen.
+ currGen := m.gen.Load()
+ prevGen := currGen - 1
+ if currGen == 0 {
+ prevGen = 2
+ }
+
+ // Prevent writers without a P from writing while we update gen.
+ lock(&m.noPLock)
+
+ // Rotate gen, effectively taking a snapshot of the state of
+ // these statistics at the point of the exchange by moving
+ // writers to the next set of deltas.
+ //
+ // This exchange is safe to do because we won't race
+ // with anyone else trying to update this value.
+ m.gen.Swap((currGen + 1) % 3)
+
+ // Allow P-less writers to continue. They'll be writing to the
+ // next generation now.
+ unlock(&m.noPLock)
+
+ for _, p := range allp {
+ // Spin until there are no more writers.
+ for p.statsSeq.Load()%2 != 0 {
+ }
+ }
+
+ // At this point we've observed that each sequence
+ // number is even, so any future writers will observe
+ // the new gen value. That means it's safe to read from
+ // the other deltas in the stats buffer.
+
+ // Perform our responsibilities and free up
+ // stats[prevGen] for the next time we want to take
+ // a snapshot.
+ m.stats[currGen].merge(&m.stats[prevGen])
+ m.stats[prevGen] = heapStatsDelta{}
+
+ // Finally, copy out the complete delta.
+ *out = m.stats[currGen]
+
+ releasem(mp)
+}
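+
+// A condensed, self-contained sketch of the same generation-rotation protocol
+// using sync/atomic, reduced to one int64 counter per generation and a
+// per-writer sequence word (shard and ring are hypothetical names; the
+// runtime's shared delta layout and gopark-free details are simplified away,
+// and the read side must still be serialized externally, as documented above):
+//
+//	type shard struct {
+//		seq  atomic.Uint32   // odd while a write is in progress
+//		gens [3]atomic.Int64 // one delta per generation
+//	}
+//
+//	type ring struct {
+//		gen    atomic.Uint32 // current write generation: 0, 1, or 2
+//		shards []*shard
+//	}
+//
+//	func (r *ring) add(s *shard, v int64) {
+//		s.seq.Add(1) // enter write: sequence becomes odd
+//		s.gens[r.gen.Load()%3].Add(v)
+//		s.seq.Add(1) // exit write: sequence becomes even
+//	}
+//
+//	func (r *ring) read() int64 {
+//		cur := r.gen.Load()
+//		prev := (cur + 2) % 3
+//		r.gen.Store((cur + 1) % 3) // rotate: new writers use the next slot
+//		for _, s := range r.shards {
+//			for s.seq.Load()%2 != 0 { // wait out in-flight writers
+//			}
+//		}
+//		var total int64
+//		for _, s := range r.shards {
+//			// Fold the previous generation forward and clear it,
+//			// mirroring the merge/clear of stats[prevGen] above.
+//			s.gens[cur].Add(s.gens[prev].Swap(0))
+//			total += s.gens[cur].Load()
+//		}
+//		return total
+//	}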
+
+type cpuStats struct {
+ // All fields are CPU time in nanoseconds computed by comparing
+ // calls of nanotime. This means they're all overestimates, because
+ // they don't accurately compute on-CPU time (so some of the time
+ // could be spent scheduled away by the OS).
+
+ gcAssistTime int64 // GC assists
+ gcDedicatedTime int64 // GC dedicated mark workers + pauses
+ gcIdleTime int64 // GC idle mark workers
+ gcPauseTime int64 // GC pauses (all GOMAXPROCS, even if just 1 is running)
+ gcTotalTime int64
+
+ scavengeAssistTime int64 // scavenge assists
+ scavengeBgTime int64 // background scavenger
+ scavengeTotalTime int64
+
+ idleTime int64 // Time Ps spent in _Pidle.
+ userTime int64 // Time Ps spent in _Prunning or _Psyscall that's not any of the above.
+
+ totalTime int64 // GOMAXPROCS * (monotonic wall clock time elapsed)
+}
+
+// accumulate takes a cpuStats and adds in the current state of all GC CPU
+// counters.
+//
+// gcMarkPhase indicates that we're in the mark phase and that certain counter
+// values should be used.
+func (s *cpuStats) accumulate(now int64, gcMarkPhase bool) {
+ // N.B. Mark termination and sweep termination pauses are
+ // accumulated in work.cpuStats at the end of their respective pauses.
+ var (
+ markAssistCpu int64
+ markDedicatedCpu int64
+ markFractionalCpu int64
+ markIdleCpu int64
+ )
+ if gcMarkPhase {
+ // N.B. These stats may have stale values if the GC is not
+ // currently in the mark phase.
+ markAssistCpu = gcController.assistTime.Load()
+ markDedicatedCpu = gcController.dedicatedMarkTime.Load()
+ markFractionalCpu = gcController.fractionalMarkTime.Load()
+ markIdleCpu = gcController.idleMarkTime.Load()
+ }
+
+ // The rest of the stats below are either derived from the above or
+ // are reset on each mark termination.
+
+ scavAssistCpu := scavenge.assistTime.Load()
+ scavBgCpu := scavenge.backgroundTime.Load()
+
+ // Update cumulative GC CPU stats.
+ s.gcAssistTime += markAssistCpu
+ s.gcDedicatedTime += markDedicatedCpu + markFractionalCpu
+ s.gcIdleTime += markIdleCpu
+ s.gcTotalTime += markAssistCpu + markDedicatedCpu + markFractionalCpu + markIdleCpu
+
+ // Update cumulative scavenge CPU stats.
+ s.scavengeAssistTime += scavAssistCpu
+ s.scavengeBgTime += scavBgCpu
+ s.scavengeTotalTime += scavAssistCpu + scavBgCpu
+
+ // Update total CPU.
+ s.totalTime = sched.totaltime + (now-sched.procresizetime)*int64(gomaxprocs)
+ s.idleTime += sched.idleTime.Load()
+
+ // Compute userTime. We compute this indirectly as everything that's not the above.
+ //
+ // Since time spent in _Pgcstop is covered by gcPauseTime, and time spent in _Pidle
+ // is covered by idleTime, what we're left with is time spent in _Prunning and _Psyscall,
+ // the latter of which is fine because the P will either go idle or get used for something
+ // else via sysmon. Meanwhile if we subtract GC time from whatever's left, we get non-GC
+ // _Prunning time. Note that this still leaves time spent in sweeping and in the scheduler,
+ // but that's fine. The overwhelming majority of this time will be actual user time.
+ s.userTime = s.totalTime - (s.gcTotalTime + s.scavengeTotalTime + s.idleTime)
+}
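+
+// A small worked example of the userTime derivation above (illustrative
+// numbers only): with GOMAXPROCS=4 over a 1-second window, totalTime is
+// 4e9 ns; if gcTotalTime is 3e8, scavengeTotalTime is 2e7, and idleTime is
+// 1.5e9, then userTime = 4e9 - (3e8 + 2e7 + 1.5e9) = 2.18e9 ns of non-GC,
+// non-idle running time.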
diff --git a/src/runtime/mwbbuf.go b/src/runtime/mwbbuf.go
new file mode 100644
index 0000000..7419bd2
--- /dev/null
+++ b/src/runtime/mwbbuf.go
@@ -0,0 +1,270 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This implements the write barrier buffer. The write barrier itself
+// is gcWriteBarrier and is implemented in assembly.
+//
+// See mbarrier.go for algorithmic details on the write barrier. This
+// file deals only with the buffer.
+//
+// The write barrier has a fast path and a slow path. The fast path
+// simply enqueues to a per-P write barrier buffer. It's written in
+// assembly and doesn't clobber any general purpose registers, so it
+// doesn't have the usual overheads of a Go call.
+//
+// When the buffer fills up, the write barrier invokes the slow path
+// (wbBufFlush) to flush the buffer to the GC work queues. In this
+// path, since the compiler didn't spill registers, we spill *all*
+// registers and disallow any GC safe points that could observe the
+// stack frame (since we don't know the types of the spilled
+// registers).
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// testSmallBuf forces a small write barrier buffer to stress write
+// barrier flushing.
+const testSmallBuf = false
+
+// wbBuf is a per-P buffer of pointers queued by the write barrier.
+// This buffer is flushed to the GC workbufs when it fills up and on
+// various GC transitions.
+//
+// This is closely related to a "sequential store buffer" (SSB),
+// except that SSBs are usually used for maintaining remembered sets,
+// while this is used for marking.
+type wbBuf struct {
+ // next points to the next slot in buf. It must not be a
+ // pointer type because it can point past the end of buf and
+ // must be updated without write barriers.
+ //
+ // This is a pointer rather than an index to optimize the
+ // write barrier assembly.
+ next uintptr
+
+ // end points to just past the end of buf. It must not be a
+ // pointer type because it points past the end of buf and must
+ // be updated without write barriers.
+ end uintptr
+
+ // buf stores a series of pointers to execute write barriers on.
+ buf [wbBufEntries]uintptr
+}
+
+const (
+ // wbBufEntries is the maximum number of pointers that can be
+ // stored in the write barrier buffer.
+ //
+ // This trades latency for throughput amortization. Higher
+ // values amortize flushing overhead more, but increase the
+ // latency of flushing. Higher values also increase the cache
+ // footprint of the buffer.
+ //
+ // TODO: What is the latency cost of this? Tune this value.
+ wbBufEntries = 512
+
+ // Maximum number of entries that we may need to request from
+ // the buffer in a single call.
+ wbMaxEntriesPerCall = 8
+)
+
+// reset empties b by resetting its next and end pointers.
+func (b *wbBuf) reset() {
+ start := uintptr(unsafe.Pointer(&b.buf[0]))
+ b.next = start
+ if testSmallBuf {
+ // For testing, make the buffer smaller but more than
+ // 1 write barrier's worth, so it tests both the
+ // immediate flush and delayed flush cases.
+ b.end = uintptr(unsafe.Pointer(&b.buf[wbMaxEntriesPerCall+1]))
+ } else {
+ b.end = start + uintptr(len(b.buf))*unsafe.Sizeof(b.buf[0])
+ }
+
+ if (b.end-b.next)%unsafe.Sizeof(b.buf[0]) != 0 {
+ throw("bad write barrier buffer bounds")
+ }
+}
+
+// discard resets b's next pointer, but not its end pointer.
+//
+// This must be nosplit because it's called by wbBufFlush.
+//
+//go:nosplit
+func (b *wbBuf) discard() {
+ b.next = uintptr(unsafe.Pointer(&b.buf[0]))
+}
+
+// empty reports whether b contains no pointers.
+func (b *wbBuf) empty() bool {
+ return b.next == uintptr(unsafe.Pointer(&b.buf[0]))
+}
+
+// getX returns space in the write barrier buffer to store X pointers.
+// getX will flush the buffer if necessary. Callers should use this as:
+//
+// buf := &getg().m.p.ptr().wbBuf
+// p := buf.get2()
+// p[0], p[1] = old, new
+// ... actual memory write ...
+//
+// The caller must ensure there are no preemption points during the
+// above sequence. There must be no preemption points while buf is in
+// use because it is a per-P resource. There must be no preemption
+// points between the buffer put and the write to memory because this
+// could allow a GC phase change, which could result in missed write
+// barriers.
+//
+// getX must be nowritebarrierrec because write barriers here would
+// corrupt the write barrier buffer. It (and everything it calls, if
+// it called anything) has to be nosplit to avoid scheduling on to a
+// different P and a different buffer.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func (b *wbBuf) get1() *[1]uintptr {
+ if b.next+goarch.PtrSize > b.end {
+ wbBufFlush()
+ }
+ p := (*[1]uintptr)(unsafe.Pointer(b.next))
+ b.next += goarch.PtrSize
+ return p
+}
+
+//go:nowritebarrierrec
+//go:nosplit
+func (b *wbBuf) get2() *[2]uintptr {
+ if b.next+2*goarch.PtrSize > b.end {
+ wbBufFlush()
+ }
+ p := (*[2]uintptr)(unsafe.Pointer(b.next))
+ b.next += 2 * goarch.PtrSize
+ return p
+}
+
+// wbBufFlush flushes the current P's write barrier buffer to the GC
+// workbufs.
+//
+// This must not have write barriers because it is part of the write
+// barrier implementation.
+//
+// This and everything it calls must be nosplit because 1) the stack
+// contains untyped slots from gcWriteBarrier and 2) there must not be
+// a GC safe point between the write barrier test in the caller and
+// flushing the buffer.
+//
+// TODO: A "go:nosplitrec" annotation would be perfect for this.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wbBufFlush() {
+ // Note: Every possible return from this function must reset
+ // the buffer's next pointer to prevent buffer overflow.
+
+ if getg().m.dying > 0 {
+ // We're going down. Not much point in write barriers
+ // and this way we can allow write barriers in the
+ // panic path.
+ getg().m.p.ptr().wbBuf.discard()
+ return
+ }
+
+ // Switch to the system stack so we don't have to worry about
+ // safe points.
+ systemstack(func() {
+ wbBufFlush1(getg().m.p.ptr())
+ })
+}
+
+// wbBufFlush1 flushes p's write barrier buffer to the GC work queue.
+//
+// This must not have write barriers because it is part of the write
+// barrier implementation, so this may lead to infinite loops or
+// buffer corruption.
+//
+// This must be non-preemptible because it uses the P's workbuf.
+//
+//go:nowritebarrierrec
+//go:systemstack
+func wbBufFlush1(pp *p) {
+ // Get the buffered pointers.
+ start := uintptr(unsafe.Pointer(&pp.wbBuf.buf[0]))
+ n := (pp.wbBuf.next - start) / unsafe.Sizeof(pp.wbBuf.buf[0])
+ ptrs := pp.wbBuf.buf[:n]
+
+ // Poison the buffer to make extra sure nothing is enqueued
+ // while we're processing the buffer.
+ pp.wbBuf.next = 0
+
+ if useCheckmark {
+ // Slow path for checkmark mode.
+ for _, ptr := range ptrs {
+ shade(ptr)
+ }
+ pp.wbBuf.reset()
+ return
+ }
+
+ // Mark all of the pointers in the buffer and record only the
+ // pointers we greyed. We use the buffer itself to temporarily
+ // record greyed pointers.
+ //
+ // TODO: Should scanobject/scanblock just stuff pointers into
+ // the wbBuf? Then this would become the sole greying path.
+ //
+ // TODO: We could avoid shading any of the "new" pointers in
+ // the buffer if the stack has been shaded, or even avoid
+ // putting them in the buffer at all (which would double its
+ // capacity). This is slightly complicated with the buffer; we
+ // could track whether any un-shaded goroutine has used the
+ // buffer, or just track globally whether there are any
+ // un-shaded stacks and flush after each stack scan.
+ gcw := &pp.gcw
+ pos := 0
+ for _, ptr := range ptrs {
+ if ptr < minLegalPointer {
+ // nil pointers are very common, especially
+ // for the "old" values. Filter out these and
+ // other "obvious" non-heap pointers ASAP.
+ //
+ // TODO: Should we filter out nils in the fast
+ // path to reduce the rate of flushes?
+ continue
+ }
+ obj, span, objIndex := findObject(ptr, 0, 0)
+ if obj == 0 {
+ continue
+ }
+ // TODO: Consider making two passes where the first
+ // just prefetches the mark bits.
+ mbits := span.markBitsForIndex(objIndex)
+ if mbits.isMarked() {
+ continue
+ }
+ mbits.setMarked()
+
+ // Mark span.
+ arena, pageIdx, pageMask := pageIndexOf(span.base())
+ if arena.pageMarks[pageIdx]&pageMask == 0 {
+ atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
+ }
+
+ if span.spanclass.noscan() {
+ gcw.bytesMarked += uint64(span.elemsize)
+ continue
+ }
+ ptrs[pos] = obj
+ pos++
+ }
+
+ // Enqueue the greyed objects.
+ gcw.putBatch(ptrs[:pos])
+
+ pp.wbBuf.reset()
+}
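+
+// The mark loop above reuses ptrs as its own output by compacting the kept
+// entries in place. The same filter-in-place pattern in ordinary Go code,
+// as a minimal sketch (filterInPlace and keep are hypothetical names):
+//
+//	func filterInPlace(ptrs []uintptr, keep func(uintptr) bool) []uintptr {
+//		pos := 0
+//		for _, p := range ptrs {
+//			if !keep(p) {
+//				continue
+//			}
+//			ptrs[pos] = p // safe: pos never passes the read index
+//			pos++
+//		}
+//		return ptrs[:pos]
+//	}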
diff --git a/src/runtime/nbpipe_pipe.go b/src/runtime/nbpipe_pipe.go
new file mode 100644
index 0000000..408e1ec
--- /dev/null
+++ b/src/runtime/nbpipe_pipe.go
@@ -0,0 +1,19 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || darwin
+
+package runtime
+
+func nonblockingPipe() (r, w int32, errno int32) {
+ r, w, errno = pipe()
+ if errno != 0 {
+ return -1, -1, errno
+ }
+ closeonexec(r)
+ setNonblock(r)
+ closeonexec(w)
+ setNonblock(w)
+ return r, w, errno
+}
diff --git a/src/runtime/nbpipe_pipe2.go b/src/runtime/nbpipe_pipe2.go
new file mode 100644
index 0000000..22d60b4
--- /dev/null
+++ b/src/runtime/nbpipe_pipe2.go
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || linux || netbsd || openbsd || solaris
+
+package runtime
+
+func nonblockingPipe() (r, w int32, errno int32) {
+ return pipe2(_O_NONBLOCK | _O_CLOEXEC)
+}
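+
+// Outside the runtime, roughly the same effect is available on Linux through
+// the syscall package (a sketch with error handling trimmed; userPipe is a
+// hypothetical helper name):
+//
+//	func userPipe() (r, w int, err error) {
+//		var p [2]int
+//		if err := syscall.Pipe2(p[:], syscall.O_NONBLOCK|syscall.O_CLOEXEC); err != nil {
+//			return -1, -1, err
+//		}
+//		return p[0], p[1], nil
+//	}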
diff --git a/src/runtime/nbpipe_pipe_test.go b/src/runtime/nbpipe_pipe_test.go
new file mode 100644
index 0000000..c8cb3cf
--- /dev/null
+++ b/src/runtime/nbpipe_pipe_test.go
@@ -0,0 +1,38 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix || darwin
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+)
+
+func TestSetNonblock(t *testing.T) {
+ t.Parallel()
+
+ r, w, errno := runtime.Pipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer func() {
+ runtime.Close(r)
+ runtime.Close(w)
+ }()
+
+ checkIsPipe(t, r, w)
+
+ runtime.SetNonblock(r)
+ runtime.SetNonblock(w)
+ checkNonblocking(t, r, "reader")
+ checkNonblocking(t, w, "writer")
+
+ runtime.Closeonexec(r)
+ runtime.Closeonexec(w)
+ checkCloseonexec(t, r, "reader")
+ checkCloseonexec(t, w, "writer")
+}
diff --git a/src/runtime/nbpipe_test.go b/src/runtime/nbpipe_test.go
new file mode 100644
index 0000000..337b8e5
--- /dev/null
+++ b/src/runtime/nbpipe_test.go
@@ -0,0 +1,74 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestNonblockingPipe(t *testing.T) {
+ // NonblockingPipe is the exported test hook for nonblockingPipe.
+ r, w, errno := runtime.NonblockingPipe()
+ if errno != 0 {
+ t.Fatal(syscall.Errno(errno))
+ }
+ defer runtime.Close(w)
+
+ checkIsPipe(t, r, w)
+ checkNonblocking(t, r, "reader")
+ checkCloseonexec(t, r, "reader")
+ checkNonblocking(t, w, "writer")
+ checkCloseonexec(t, w, "writer")
+
+ // Test that fcntl returns an error as expected.
+ if runtime.Close(r) != 0 {
+ t.Fatalf("Close(%d) failed", r)
+ }
+ val, errno := runtime.Fcntl(r, syscall.F_GETFD, 0)
+ if val != -1 {
+ t.Errorf("Fcntl succeeded unexpectedly")
+ } else if syscall.Errno(errno) != syscall.EBADF {
+ t.Errorf("Fcntl failed with error %v, expected %v", syscall.Errno(errno), syscall.EBADF)
+ }
+}
+
+func checkIsPipe(t *testing.T, r, w int32) {
+ bw := byte(42)
+ if n := runtime.Write(uintptr(w), unsafe.Pointer(&bw), 1); n != 1 {
+ t.Fatalf("Write(w, &b, 1) == %d, expected 1", n)
+ }
+ var br byte
+ if n := runtime.Read(r, unsafe.Pointer(&br), 1); n != 1 {
+ t.Fatalf("Read(r, &b, 1) == %d, expected 1", n)
+ }
+ if br != bw {
+ t.Errorf("pipe read %d, expected %d", br, bw)
+ }
+}
+
+func checkNonblocking(t *testing.T, fd int32, name string) {
+ t.Helper()
+ flags, errno := runtime.Fcntl(fd, syscall.F_GETFL, 0)
+ if flags == -1 {
+ t.Errorf("fcntl(%s, F_GETFL) failed: %v", name, syscall.Errno(errno))
+ } else if flags&syscall.O_NONBLOCK == 0 {
+ t.Errorf("O_NONBLOCK not set in %s flags %#x", name, flags)
+ }
+}
+
+func checkCloseonexec(t *testing.T, fd int32, name string) {
+ t.Helper()
+ flags, errno := runtime.Fcntl(fd, syscall.F_GETFD, 0)
+ if flags == -1 {
+ t.Errorf("fcntl(%s, F_GETFD) failed: %v", name, syscall.Errno(errno))
+ } else if flags&syscall.FD_CLOEXEC == 0 {
+ t.Errorf("FD_CLOEXEC not set in %s flags %#x", name, flags)
+ }
+}
diff --git a/src/runtime/net_plan9.go b/src/runtime/net_plan9.go
new file mode 100644
index 0000000..b1ac7c7
--- /dev/null
+++ b/src/runtime/net_plan9.go
@@ -0,0 +1,29 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ _ "unsafe"
+)
+
+//go:linkname runtime_ignoreHangup internal/poll.runtime_ignoreHangup
+func runtime_ignoreHangup() {
+ getg().m.ignoreHangup = true
+}
+
+//go:linkname runtime_unignoreHangup internal/poll.runtime_unignoreHangup
+func runtime_unignoreHangup(sig string) {
+ getg().m.ignoreHangup = false
+}
+
+func ignoredNote(note *byte) bool {
+ if note == nil {
+ return false
+ }
+ if gostringnocopy(note) != "hangup" {
+ return false
+ }
+ return getg().m.ignoreHangup
+}
diff --git a/src/runtime/netpoll.go b/src/runtime/netpoll.go
new file mode 100644
index 0000000..9b54e8e
--- /dev/null
+++ b/src/runtime/netpoll.go
@@ -0,0 +1,694 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix || (js && wasm) || wasip1 || windows
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Integrated network poller (platform-independent part).
+// A particular implementation (epoll/kqueue/port/AIX/Windows)
+// must define the following functions:
+//
+// func netpollinit()
+// Initialize the poller. Only called once.
+//
+// func netpollopen(fd uintptr, pd *pollDesc) int32
+// Arm edge-triggered notifications for fd. The pd argument is to pass
+// back to netpollready when fd is ready. Return an errno value.
+//
+// func netpollclose(fd uintptr) int32
+// Disable notifications for fd. Return an errno value.
+//
+// func netpoll(delta int64) gList
+// Poll the network. If delta < 0, block indefinitely. If delta == 0,
+// poll without blocking. If delta > 0, block for up to delta nanoseconds.
+// Return a list of goroutines built by calling netpollready.
+//
+// func netpollBreak()
+// Wake up the network poller, assumed to be blocked in netpoll.
+//
+// func netpollIsPollDescriptor(fd uintptr) bool
+// Reports whether fd is a file descriptor used by the poller.
+
+// Error codes returned by runtime_pollReset and runtime_pollWait.
+// These must match the values in internal/poll/fd_poll_runtime.go.
+const (
+ pollNoError = 0 // no error
+ pollErrClosing = 1 // descriptor is closed
+ pollErrTimeout = 2 // I/O timeout
+ pollErrNotPollable = 3 // general error polling descriptor
+)
+
+// pollDesc contains 2 binary semaphores, rg and wg, to park reader and writer
+// goroutines respectively. The semaphore can be in the following states:
+//
+// pdReady - io readiness notification is pending;
+// a goroutine consumes the notification by changing the state to pdNil.
+// pdWait - a goroutine prepares to park on the semaphore, but not yet parked;
+// the goroutine commits to park by changing the state to G pointer,
+// or, alternatively, concurrent io notification changes the state to pdReady,
+// or, alternatively, concurrent timeout/close changes the state to pdNil.
+// G pointer - the goroutine is blocked on the semaphore;
+// io notification or timeout/close changes the state to pdReady or pdNil respectively
+// and unparks the goroutine.
+// pdNil - none of the above.
+const (
+ pdNil uintptr = 0
+ pdReady uintptr = 1
+ pdWait uintptr = 2
+)
+
+const pollBlockSize = 4 * 1024
+
+// Network poller descriptor.
+//
+// No heap pointers.
+type pollDesc struct {
+ _ sys.NotInHeap
+ link *pollDesc // in pollcache, protected by pollcache.lock
+ fd uintptr // constant for pollDesc usage lifetime
+ fdseq atomic.Uintptr // protects against stale pollDesc
+
+ // atomicInfo holds bits from closing, rd, and wd,
+ // which are only ever written while holding the lock,
+ // summarized for use by netpollcheckerr,
+ // which cannot acquire the lock.
+ // After writing these fields under lock in a way that
+ // might change the summary, code must call publishInfo
+ // before releasing the lock.
+ // Code that changes fields and then calls netpollunblock
+ // (while still holding the lock) must call publishInfo
+ // before calling netpollunblock, because publishInfo is what
+ // stops netpollblock from blocking anew
+ // (by changing the result of netpollcheckerr).
+ // atomicInfo also holds the eventErr bit,
+ // recording whether a poll event on the fd got an error;
+ // atomicInfo is the only source of truth for that bit.
+ atomicInfo atomic.Uint32 // atomic pollInfo
+
+ // rg, wg are accessed atomically and hold g pointers.
+ // (Using atomic.Uintptr here is similar to using guintptr elsewhere.)
+ rg atomic.Uintptr // pdReady, pdWait, G waiting for read or pdNil
+ wg atomic.Uintptr // pdReady, pdWait, G waiting for write or pdNil
+
+ lock mutex // protects the following fields
+ closing bool
+ user uint32 // user settable cookie
+ rseq uintptr // protects from stale read timers
+ rt timer // read deadline timer (set if rt.f != nil)
+ rd int64 // read deadline (a nanotime in the future, -1 when expired)
+ wseq uintptr // protects from stale write timers
+ wt timer // write deadline timer
+ wd int64 // write deadline (a nanotime in the future, -1 when expired)
+ self *pollDesc // storage for indirect interface. See (*pollDesc).makeArg.
+}
+
+// pollInfo is the bits needed by netpollcheckerr, stored atomically,
+// mostly duplicating state that is manipulated under lock in pollDesc.
+// The one exception is the pollEventErr bit, which is maintained only
+// in the pollInfo.
+type pollInfo uint32
+
+const (
+ pollClosing = 1 << iota
+ pollEventErr
+ pollExpiredReadDeadline
+ pollExpiredWriteDeadline
+ pollFDSeq // 20 bit field, low 20 bits of fdseq field
+)
+
+const (
+ pollFDSeqBits = 20 // number of bits in pollFDSeq
+ pollFDSeqMask = 1<<pollFDSeqBits - 1 // mask for pollFDSeq
+)
+
+func (i pollInfo) closing() bool { return i&pollClosing != 0 }
+func (i pollInfo) eventErr() bool { return i&pollEventErr != 0 }
+func (i pollInfo) expiredReadDeadline() bool { return i&pollExpiredReadDeadline != 0 }
+func (i pollInfo) expiredWriteDeadline() bool { return i&pollExpiredWriteDeadline != 0 }
+
+// info returns the pollInfo corresponding to pd.
+func (pd *pollDesc) info() pollInfo {
+ return pollInfo(pd.atomicInfo.Load())
+}
+
+// publishInfo updates pd.atomicInfo (returned by pd.info)
+// using the other values in pd.
+// It must be called while holding pd.lock,
+// and it must be called after changing anything
+// that might affect the info bits.
+// In practice this means after changing closing
+// or changing rd or wd from < 0 to >= 0.
+func (pd *pollDesc) publishInfo() {
+ var info uint32
+ if pd.closing {
+ info |= pollClosing
+ }
+ if pd.rd < 0 {
+ info |= pollExpiredReadDeadline
+ }
+ if pd.wd < 0 {
+ info |= pollExpiredWriteDeadline
+ }
+ info |= uint32(pd.fdseq.Load()&pollFDSeqMask) << pollFDSeq
+
+ // Set all of x except the pollEventErr bit.
+ x := pd.atomicInfo.Load()
+ for !pd.atomicInfo.CompareAndSwap(x, (x&pollEventErr)|info) {
+ x = pd.atomicInfo.Load()
+ }
+}
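+
+// The loop above is the usual "replace every bit except one" CAS pattern;
+// isolated as a sketch over a plain sync/atomic.Uint32 (storeKeepingErrBit
+// is a hypothetical name):
+//
+//	func storeKeepingErrBit(word *atomic.Uint32, info uint32) {
+//		for {
+//			x := word.Load()
+//			if word.CompareAndSwap(x, (x&pollEventErr)|info) {
+//				return
+//			}
+//		}
+//	}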
+
+// setEventErr sets the result of pd.info().eventErr() to b.
+// We only change the error bit if seq == 0 or if seq matches pollFDSeq
+// (issue #59545).
+func (pd *pollDesc) setEventErr(b bool, seq uintptr) {
+ mSeq := uint32(seq & pollFDSeqMask)
+ x := pd.atomicInfo.Load()
+ xSeq := (x >> pollFDSeq) & pollFDSeqMask
+ if seq != 0 && xSeq != mSeq {
+ return
+ }
+ for (x&pollEventErr != 0) != b && !pd.atomicInfo.CompareAndSwap(x, x^pollEventErr) {
+ x = pd.atomicInfo.Load()
+ xSeq := (x >> pollFDSeq) & pollFDSeqMask
+ if seq != 0 && xSeq != mSeq {
+ return
+ }
+ }
+}
+
+type pollCache struct {
+ lock mutex
+ first *pollDesc
+ // PollDesc objects must be type-stable,
+ // because we can get ready notification from epoll/kqueue
+ // after the descriptor is closed/reused.
+ // Stale notifications are detected using seq variable,
+ // seq is incremented when deadlines are changed or descriptor is reused.
+}
+
+var (
+ netpollInitLock mutex
+ netpollInited atomic.Uint32
+
+ pollcache pollCache
+ netpollWaiters atomic.Uint32
+)
+
+//go:linkname poll_runtime_pollServerInit internal/poll.runtime_pollServerInit
+func poll_runtime_pollServerInit() {
+ netpollGenericInit()
+}
+
+func netpollGenericInit() {
+ if netpollInited.Load() == 0 {
+ lockInit(&netpollInitLock, lockRankNetpollInit)
+ lock(&netpollInitLock)
+ if netpollInited.Load() == 0 {
+ netpollinit()
+ netpollInited.Store(1)
+ }
+ unlock(&netpollInitLock)
+ }
+}
+
+func netpollinited() bool {
+ return netpollInited.Load() != 0
+}
+
+//go:linkname poll_runtime_isPollServerDescriptor internal/poll.runtime_isPollServerDescriptor
+
+// poll_runtime_isPollServerDescriptor reports whether fd is a
+// descriptor being used by netpoll.
+func poll_runtime_isPollServerDescriptor(fd uintptr) bool {
+ return netpollIsPollDescriptor(fd)
+}
+
+//go:linkname poll_runtime_pollOpen internal/poll.runtime_pollOpen
+func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
+ pd := pollcache.alloc()
+ lock(&pd.lock)
+ wg := pd.wg.Load()
+ if wg != pdNil && wg != pdReady {
+ throw("runtime: blocked write on free polldesc")
+ }
+ rg := pd.rg.Load()
+ if rg != pdNil && rg != pdReady {
+ throw("runtime: blocked read on free polldesc")
+ }
+ pd.fd = fd
+ if pd.fdseq.Load() == 0 {
+ // The value 0 is special in setEventErr, so don't use it.
+ pd.fdseq.Store(1)
+ }
+ pd.closing = false
+ pd.setEventErr(false, 0)
+ pd.rseq++
+ pd.rg.Store(pdNil)
+ pd.rd = 0
+ pd.wseq++
+ pd.wg.Store(pdNil)
+ pd.wd = 0
+ pd.self = pd
+ pd.publishInfo()
+ unlock(&pd.lock)
+
+ errno := netpollopen(fd, pd)
+ if errno != 0 {
+ pollcache.free(pd)
+ return nil, int(errno)
+ }
+ return pd, 0
+}
+
+//go:linkname poll_runtime_pollClose internal/poll.runtime_pollClose
+func poll_runtime_pollClose(pd *pollDesc) {
+ if !pd.closing {
+ throw("runtime: close polldesc w/o unblock")
+ }
+ wg := pd.wg.Load()
+ if wg != pdNil && wg != pdReady {
+ throw("runtime: blocked write on closing polldesc")
+ }
+ rg := pd.rg.Load()
+ if rg != pdNil && rg != pdReady {
+ throw("runtime: blocked read on closing polldesc")
+ }
+ netpollclose(pd.fd)
+ pollcache.free(pd)
+}
+
+func (c *pollCache) free(pd *pollDesc) {
+ // pd can't be shared here, but lock anyhow because
+ // that's what publishInfo documents.
+ lock(&pd.lock)
+
+ // Increment the fdseq field, so that any currently
+ // running netpoll calls will not mark pd as ready.
+ fdseq := pd.fdseq.Load()
+ fdseq = (fdseq + 1) & (1<<taggedPointerBits - 1)
+ pd.fdseq.Store(fdseq)
+
+ pd.publishInfo()
+
+ unlock(&pd.lock)
+
+ lock(&c.lock)
+ pd.link = c.first
+ c.first = pd
+ unlock(&c.lock)
+}
+
+// poll_runtime_pollReset, which is internal/poll.runtime_pollReset,
+// prepares a descriptor for polling in mode, which is 'r' or 'w'.
+// This returns an error code; the codes are defined above.
+//
+//go:linkname poll_runtime_pollReset internal/poll.runtime_pollReset
+func poll_runtime_pollReset(pd *pollDesc, mode int) int {
+ errcode := netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ if mode == 'r' {
+ pd.rg.Store(pdNil)
+ } else if mode == 'w' {
+ pd.wg.Store(pdNil)
+ }
+ return pollNoError
+}
+
+// poll_runtime_pollWait, which is internal/poll.runtime_pollWait,
+// waits for a descriptor to be ready for reading or writing,
+// according to mode, which is 'r' or 'w'.
+// This returns an error code; the codes are defined above.
+//
+//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
+func poll_runtime_pollWait(pd *pollDesc, mode int) int {
+ errcode := netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ // For now, only Solaris, illumos, AIX, and wasip1 use level-triggered IO.
+ if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" || GOOS == "wasip1" {
+ netpollarm(pd, mode)
+ }
+ for !netpollblock(pd, int32(mode), false) {
+ errcode = netpollcheckerr(pd, int32(mode))
+ if errcode != pollNoError {
+ return errcode
+ }
+ // Can happen if timeout has fired and unblocked us,
+ // but before we had a chance to run, timeout has been reset.
+ // Pretend it has not happened and retry.
+ }
+ return pollNoError
+}
+
+//go:linkname poll_runtime_pollWaitCanceled internal/poll.runtime_pollWaitCanceled
+func poll_runtime_pollWaitCanceled(pd *pollDesc, mode int) {
+ // This function is used only on windows after a failed attempt to cancel
+ // a pending async IO operation. Wait for ioready, ignore closing or timeouts.
+ for !netpollblock(pd, int32(mode), true) {
+ }
+}
+
+//go:linkname poll_runtime_pollSetDeadline internal/poll.runtime_pollSetDeadline
+func poll_runtime_pollSetDeadline(pd *pollDesc, d int64, mode int) {
+ lock(&pd.lock)
+ if pd.closing {
+ unlock(&pd.lock)
+ return
+ }
+ rd0, wd0 := pd.rd, pd.wd
+ combo0 := rd0 > 0 && rd0 == wd0
+ if d > 0 {
+ d += nanotime()
+ if d <= 0 {
+ // If the user has a deadline in the future, but the delay calculation
+ // overflows, then set the deadline to the maximum possible value.
+ d = 1<<63 - 1
+ }
+ }
+ if mode == 'r' || mode == 'r'+'w' {
+ pd.rd = d
+ }
+ if mode == 'w' || mode == 'r'+'w' {
+ pd.wd = d
+ }
+ pd.publishInfo()
+ combo := pd.rd > 0 && pd.rd == pd.wd
+ rtf := netpollReadDeadline
+ if combo {
+ rtf = netpollDeadline
+ }
+ if pd.rt.f == nil {
+ if pd.rd > 0 {
+ pd.rt.f = rtf
+ // Copy current seq into the timer arg.
+ // Timer func will check the seq against current descriptor seq,
+ // if they differ the descriptor was reused or timers were reset.
+ pd.rt.arg = pd.makeArg()
+ pd.rt.seq = pd.rseq
+ resettimer(&pd.rt, pd.rd)
+ }
+ } else if pd.rd != rd0 || combo != combo0 {
+ pd.rseq++ // invalidate current timers
+ if pd.rd > 0 {
+ modtimer(&pd.rt, pd.rd, 0, rtf, pd.makeArg(), pd.rseq)
+ } else {
+ deltimer(&pd.rt)
+ pd.rt.f = nil
+ }
+ }
+ if pd.wt.f == nil {
+ if pd.wd > 0 && !combo {
+ pd.wt.f = netpollWriteDeadline
+ pd.wt.arg = pd.makeArg()
+ pd.wt.seq = pd.wseq
+ resettimer(&pd.wt, pd.wd)
+ }
+ } else if pd.wd != wd0 || combo != combo0 {
+ pd.wseq++ // invalidate current timers
+ if pd.wd > 0 && !combo {
+ modtimer(&pd.wt, pd.wd, 0, netpollWriteDeadline, pd.makeArg(), pd.wseq)
+ } else {
+ deltimer(&pd.wt)
+ pd.wt.f = nil
+ }
+ }
+ // If we set the new deadline in the past, unblock currently pending IO if any.
+ // Note that pd.publishInfo has already been called, above, immediately after modifying rd and wd.
+ var rg, wg *g
+ if pd.rd < 0 {
+ rg = netpollunblock(pd, 'r', false)
+ }
+ if pd.wd < 0 {
+ wg = netpollunblock(pd, 'w', false)
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 3)
+ }
+ if wg != nil {
+ netpollgoready(wg, 3)
+ }
+}
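+
+// User code reaches this function through the net package; a minimal
+// caller-side sketch (ordinary code, error handling trimmed):
+//
+//	conn, _ := net.Dial("tcp", "example.com:80")
+//	// Travels via internal/poll.(*FD).SetReadDeadline down to
+//	// poll_runtime_pollSetDeadline with mode 'r'.
+//	conn.SetReadDeadline(time.Now().Add(5 * time.Second))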
+
+//go:linkname poll_runtime_pollUnblock internal/poll.runtime_pollUnblock
+func poll_runtime_pollUnblock(pd *pollDesc) {
+ lock(&pd.lock)
+ if pd.closing {
+ throw("runtime: unblock on closing polldesc")
+ }
+ pd.closing = true
+ pd.rseq++
+ pd.wseq++
+ var rg, wg *g
+ pd.publishInfo()
+ rg = netpollunblock(pd, 'r', false)
+ wg = netpollunblock(pd, 'w', false)
+ if pd.rt.f != nil {
+ deltimer(&pd.rt)
+ pd.rt.f = nil
+ }
+ if pd.wt.f != nil {
+ deltimer(&pd.wt)
+ pd.wt.f = nil
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 3)
+ }
+ if wg != nil {
+ netpollgoready(wg, 3)
+ }
+}
+
+// netpollready is called by the platform-specific netpoll function.
+// It declares that the fd associated with pd is ready for I/O.
+// The toRun argument is used to build a list of goroutines to return
+// from netpoll. The mode argument is 'r', 'w', or 'r'+'w' to indicate
+// whether the fd is ready for reading or writing or both.
+//
+// This may run while the world is stopped, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func netpollready(toRun *gList, pd *pollDesc, mode int32) {
+ var rg, wg *g
+ if mode == 'r' || mode == 'r'+'w' {
+ rg = netpollunblock(pd, 'r', true)
+ }
+ if mode == 'w' || mode == 'r'+'w' {
+ wg = netpollunblock(pd, 'w', true)
+ }
+ if rg != nil {
+ toRun.push(rg)
+ }
+ if wg != nil {
+ toRun.push(wg)
+ }
+}
+
+func netpollcheckerr(pd *pollDesc, mode int32) int {
+ info := pd.info()
+ if info.closing() {
+ return pollErrClosing
+ }
+ if (mode == 'r' && info.expiredReadDeadline()) || (mode == 'w' && info.expiredWriteDeadline()) {
+ return pollErrTimeout
+ }
+ // Report an event scanning error only on a read event.
+ // An error on a write event will be captured in a subsequent
+ // write call that is able to report a more specific error.
+ if mode == 'r' && info.eventErr() {
+ return pollErrNotPollable
+ }
+ return pollNoError
+}
+
+func netpollblockcommit(gp *g, gpp unsafe.Pointer) bool {
+ r := atomic.Casuintptr((*uintptr)(gpp), pdWait, uintptr(unsafe.Pointer(gp)))
+ if r {
+ // Bump the count of goroutines waiting for the poller.
+ // The scheduler uses this to decide whether to block
+ // waiting for the poller if there is nothing else to do.
+ netpollWaiters.Add(1)
+ }
+ return r
+}
+
+func netpollgoready(gp *g, traceskip int) {
+ netpollWaiters.Add(-1)
+ goready(gp, traceskip+1)
+}
+
+// netpollblock returns true if IO is ready, or false if it timed out or was closed.
+// waitio - wait only for completed IO, ignore errors.
+// Concurrent calls to netpollblock in the same mode are forbidden, as pollDesc
+// can hold only a single waiting goroutine for each mode.
+func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
+ gpp := &pd.rg
+ if mode == 'w' {
+ gpp = &pd.wg
+ }
+
+ // set the gpp semaphore to pdWait
+ for {
+ // Consume notification if already ready.
+ if gpp.CompareAndSwap(pdReady, pdNil) {
+ return true
+ }
+ if gpp.CompareAndSwap(pdNil, pdWait) {
+ break
+ }
+
+ // Double check that this isn't corrupt; otherwise we'd loop
+ // forever.
+ if v := gpp.Load(); v != pdReady && v != pdNil {
+ throw("runtime: double wait")
+ }
+ }
+
+ // need to recheck error states after setting gpp to pdWait
+ // this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
+ // do the opposite: store to closing/rd/wd, publishInfo, load of rg/wg
+ if waitio || netpollcheckerr(pd, mode) == pollNoError {
+ gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceBlockNet, 5)
+ }
+ // be careful to not lose concurrent pdReady notification
+ old := gpp.Swap(pdNil)
+ if old > pdWait {
+ throw("runtime: corrupted polldesc")
+ }
+ return old == pdReady
+}
+
+func netpollunblock(pd *pollDesc, mode int32, ioready bool) *g {
+ gpp := &pd.rg
+ if mode == 'w' {
+ gpp = &pd.wg
+ }
+
+ for {
+ old := gpp.Load()
+ if old == pdReady {
+ return nil
+ }
+ if old == pdNil && !ioready {
+ // Only set pdReady for ioready. runtime_pollWait
+ // will check for timeout/cancel before waiting.
+ return nil
+ }
+ var new uintptr
+ if ioready {
+ new = pdReady
+ }
+ if gpp.CompareAndSwap(old, new) {
+ if old == pdWait {
+ old = pdNil
+ }
+ return (*g)(unsafe.Pointer(old))
+ }
+ }
+}
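+
+// A condensed sketch of the rg/wg protocol implemented by netpollblock and
+// netpollunblock above, written against a plain sync/atomic.Uintptr
+// (wait and notify are hypothetical names; the gopark/goready integration
+// and the stored g pointer are elided):
+//
+//	// waiter: consume a pending notification, or claim the slot and wait.
+//	func wait(sema *atomic.Uintptr) bool {
+//		for {
+//			if sema.CompareAndSwap(pdReady, pdNil) {
+//				return true // notification already pending
+//			}
+//			if sema.CompareAndSwap(pdNil, pdWait) {
+//				break // slot claimed; the runtime parks the goroutine here
+//			}
+//		}
+//		return sema.Swap(pdNil) == pdReady // don't lose a concurrent pdReady
+//	}
+//
+//	// notifier: publish readiness, handing back the old slot value.
+//	func notify(sema *atomic.Uintptr) uintptr {
+//		for {
+//			old := sema.Load()
+//			if old == pdReady {
+//				return 0
+//			}
+//			if sema.CompareAndSwap(old, pdReady) {
+//				return old // pdWait here; a parked g pointer in the real code
+//			}
+//		}
+//	}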
+
+func netpolldeadlineimpl(pd *pollDesc, seq uintptr, read, write bool) {
+ lock(&pd.lock)
+ // Seq arg is seq when the timer was set.
+ // If it's stale, ignore the timer event.
+ currentSeq := pd.rseq
+ if !read {
+ currentSeq = pd.wseq
+ }
+ if seq != currentSeq {
+ // The descriptor was reused or timers were reset.
+ unlock(&pd.lock)
+ return
+ }
+ var rg *g
+ if read {
+ if pd.rd <= 0 || pd.rt.f == nil {
+ throw("runtime: inconsistent read deadline")
+ }
+ pd.rd = -1
+ pd.publishInfo()
+ rg = netpollunblock(pd, 'r', false)
+ }
+ var wg *g
+ if write {
+ if pd.wd <= 0 || pd.wt.f == nil && !read {
+ throw("runtime: inconsistent write deadline")
+ }
+ pd.wd = -1
+ pd.publishInfo()
+ wg = netpollunblock(pd, 'w', false)
+ }
+ unlock(&pd.lock)
+ if rg != nil {
+ netpollgoready(rg, 0)
+ }
+ if wg != nil {
+ netpollgoready(wg, 0)
+ }
+}
+
+func netpollDeadline(arg any, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, true, true)
+}
+
+func netpollReadDeadline(arg any, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, true, false)
+}
+
+func netpollWriteDeadline(arg any, seq uintptr) {
+ netpolldeadlineimpl(arg.(*pollDesc), seq, false, true)
+}
+
+func (c *pollCache) alloc() *pollDesc {
+ lock(&c.lock)
+ if c.first == nil {
+ const pdSize = unsafe.Sizeof(pollDesc{})
+ n := pollBlockSize / pdSize
+ if n == 0 {
+ n = 1
+ }
+ // Must be in non-GC memory because can be referenced
+ // only from epoll/kqueue internals.
+ mem := persistentalloc(n*pdSize, 0, &memstats.other_sys)
+ for i := uintptr(0); i < n; i++ {
+ pd := (*pollDesc)(add(mem, i*pdSize))
+ pd.link = c.first
+ c.first = pd
+ }
+ }
+ pd := c.first
+ c.first = pd.link
+ lockInit(&pd.lock, lockRankPollDesc)
+ unlock(&c.lock)
+ return pd
+}
+
+// makeArg converts pd to an interface{}.
+// makeArg does not do any allocation. Normally, such
+// a conversion requires an allocation because pointers to
+// types which embed runtime/internal/sys.NotInHeap (which pollDesc is)
+// must be stored in interfaces indirectly. See issue 42076.
+func (pd *pollDesc) makeArg() (i any) {
+ x := (*eface)(unsafe.Pointer(&i))
+ x._type = pdType
+ x.data = unsafe.Pointer(&pd.self)
+ return
+}
+
+var (
+ pdEface any = (*pollDesc)(nil)
+ pdType *_type = efaceOf(&pdEface)._type
+)
diff --git a/src/runtime/netpoll_aix.go b/src/runtime/netpoll_aix.go
new file mode 100644
index 0000000..fad976b
--- /dev/null
+++ b/src/runtime/netpoll_aix.go
@@ -0,0 +1,229 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This is based on the former libgo/runtime/netpoll_select.c implementation
+// except that it uses poll instead of select and is written in Go.
+// It's also based on the Solaris implementation for the arming mechanisms.
+
+//go:cgo_import_dynamic libc_poll poll "libc.a/shr_64.o"
+//go:linkname libc_poll libc_poll
+
+var libc_poll libFunc
+
+//go:nosplit
+func poll(pfds *pollfd, npfds uintptr, timeout uintptr) (int32, int32) {
+ r, err := syscall3(&libc_poll, uintptr(unsafe.Pointer(pfds)), npfds, timeout)
+ return int32(r), int32(err)
+}
+
+// pollfd represents the poll structure for AIX operating system.
+type pollfd struct {
+ fd int32
+ events int16
+ revents int16
+}
+
+const _POLLIN = 0x0001
+const _POLLOUT = 0x0002
+const _POLLHUP = 0x2000
+const _POLLERR = 0x4000
+
+var (
+ pfds []pollfd
+ pds []*pollDesc
+ mtxpoll mutex
+ mtxset mutex
+ rdwake int32
+ wrwake int32
+ pendingUpdates int32
+
+ netpollWakeSig atomic.Uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ // Create the pipe we use to wakeup poll.
+ r, w, errno := nonblockingPipe()
+ if errno != 0 {
+ throw("netpollinit: failed to create pipe")
+ }
+ rdwake = r
+ wrwake = w
+
+ // Pre-allocate array of pollfd structures for poll.
+ pfds = make([]pollfd, 1, 128)
+
+ // Poll the read side of the pipe.
+ pfds[0].fd = rdwake
+ pfds[0].events = _POLLIN
+
+ pds = make([]*pollDesc, 1, 128)
+ pds[0] = nil
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(rdwake) || fd == uintptr(wrwake)
+}
+
+// netpollwakeup writes on wrwake to wakeup poll before any changes.
+func netpollwakeup() {
+ if pendingUpdates == 0 {
+ pendingUpdates = 1
+ b := [1]byte{0}
+ write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
+ }
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ // We don't worry about pd.fdseq here,
+ // as mtxset protects us from stale pollDescs.
+
+ pd.user = uint32(len(pfds))
+ pfds = append(pfds, pollfd{fd: int32(fd)})
+ pds = append(pds, pd)
+ unlock(&mtxset)
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ for i := 0; i < len(pfds); i++ {
+ if pfds[i].fd == int32(fd) {
+ pfds[i] = pfds[len(pfds)-1]
+ pfds = pfds[:len(pfds)-1]
+
+ pds[i] = pds[len(pds)-1]
+ pds[i].user = uint32(i)
+ pds = pds[:len(pds)-1]
+ break
+ }
+ }
+ unlock(&mtxset)
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ lock(&mtxpoll)
+ netpollwakeup()
+
+ lock(&mtxset)
+ unlock(&mtxpoll)
+
+ switch mode {
+ case 'r':
+ pfds[pd.user].events |= _POLLIN
+ case 'w':
+ pfds[pd.user].events |= _POLLOUT
+ }
+ unlock(&mtxset)
+}
+
+// netpollBreak interrupts a poll.
+func netpollBreak() {
+ // Failing to cas indicates there is an in-flight wakeup, so we're done here.
+ if !netpollWakeSig.CompareAndSwap(0, 1) {
+ return
+ }
+
+ b := [1]byte{0}
+ write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+//
+//go:nowritebarrierrec
+func netpoll(delay int64) gList {
+ var timeout uintptr
+ if delay < 0 {
+ timeout = ^uintptr(0)
+ } else if delay == 0 {
+ // TODO: call poll with timeout == 0
+ return gList{}
+ } else if delay < 1e6 {
+ timeout = 1
+ } else if delay < 1e15 {
+ timeout = uintptr(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ timeout = 1e9
+ }
+retry:
+ lock(&mtxpoll)
+ lock(&mtxset)
+ pendingUpdates = 0
+ unlock(&mtxpoll)
+
+ n, e := poll(&pfds[0], uintptr(len(pfds)), timeout)
+ if n < 0 {
+ if e != _EINTR {
+ println("errno=", e, " len(pfds)=", len(pfds))
+ throw("poll failed")
+ }
+ unlock(&mtxset)
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if timeout > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ // Check if some descriptors need to be changed
+ if n != 0 && pfds[0].revents&(_POLLIN|_POLLHUP|_POLLERR) != 0 {
+ if delay != 0 {
+ // A netpollwakeup could be picked up by a
+ // non-blocking poll. Only clear the wakeup
+ // if blocking.
+ var b [1]byte
+ for read(rdwake, unsafe.Pointer(&b[0]), 1) == 1 {
+ }
+ netpollWakeSig.Store(0)
+ }
+ // Still look at the other fds even if the mode may have
+ // changed, as netpollBreak might have been called.
+ n--
+ }
+ var toRun gList
+ for i := 1; i < len(pfds) && n > 0; i++ {
+ pfd := &pfds[i]
+
+ var mode int32
+ if pfd.revents&(_POLLIN|_POLLHUP|_POLLERR) != 0 {
+ mode += 'r'
+ pfd.events &= ^_POLLIN
+ }
+ if pfd.revents&(_POLLOUT|_POLLHUP|_POLLERR) != 0 {
+ mode += 'w'
+ pfd.events &= ^_POLLOUT
+ }
+ if mode != 0 {
+ pds[i].setEventErr(pfd.revents == _POLLERR, 0)
+ netpollready(&toRun, pds[i], mode)
+ n--
+ }
+ }
+ unlock(&mtxset)
+ return toRun
+}
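+
+// The update protocol above (netpollwakeup plus the mtxpoll/mtxset pair) can
+// be illustrated outside the runtime with a plain pipe: a poller goroutine
+// blocks while holding the "set" lock, and an updater writes one byte to the
+// pipe to force the poller to wake up and release the lock before the shared
+// fd set is mutated. This is only a user-space sketch of the idea, not how
+// the runtime itself is built (os.Pipe and sync.Mutex stand in for
+// rdwake/wrwake and mtxset):
+//
+//	package main
+//
+//	import (
+//		"fmt"
+//		"os"
+//		"sync"
+//	)
+//
+//	func main() {
+//		r, w, err := os.Pipe()
+//		if err != nil {
+//			panic(err)
+//		}
+//		var mtxset sync.Mutex
+//		done := make(chan struct{})
+//		go func() { // poller: "polls" (blocks on read) while holding mtxset
+//			mtxset.Lock()
+//			buf := make([]byte, 1)
+//			r.Read(buf) // unblocked by the updater's wakeup byte
+//			mtxset.Unlock()
+//			close(done)
+//		}()
+//		w.Write([]byte{0}) // updater: wake the poller so it drops mtxset
+//		mtxset.Lock()      // now pfds/pds could be mutated safely
+//		fmt.Println("set updated")
+//		mtxset.Unlock()
+//		<-done
+//	}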
diff --git a/src/runtime/netpoll_epoll.go b/src/runtime/netpoll_epoll.go
new file mode 100644
index 0000000..e29b64d
--- /dev/null
+++ b/src/runtime/netpoll_epoll.go
@@ -0,0 +1,172 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "runtime/internal/syscall"
+ "unsafe"
+)
+
+var (
+ epfd int32 = -1 // epoll descriptor
+
+ netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
+
+ netpollWakeSig atomic.Uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ var errno uintptr
+ epfd, errno = syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
+ if errno != 0 {
+ println("runtime: epollcreate failed with", errno)
+ throw("runtime: netpollinit failed")
+ }
+ r, w, errpipe := nonblockingPipe()
+ if errpipe != 0 {
+ println("runtime: pipe failed with", -errpipe)
+ throw("runtime: pipe failed")
+ }
+ ev := syscall.EpollEvent{
+ Events: syscall.EPOLLIN,
+ }
+ *(**uintptr)(unsafe.Pointer(&ev.Data)) = &netpollBreakRd
+ errno = syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, r, &ev)
+ if errno != 0 {
+ println("runtime: epollctl failed with", errno)
+ throw("runtime: epollctl failed")
+ }
+ netpollBreakRd = uintptr(r)
+ netpollBreakWr = uintptr(w)
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(epfd) || fd == netpollBreakRd || fd == netpollBreakWr
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) uintptr {
+ var ev syscall.EpollEvent
+ ev.Events = syscall.EPOLLIN | syscall.EPOLLOUT | syscall.EPOLLRDHUP | syscall.EPOLLET
+ tp := taggedPointerPack(unsafe.Pointer(pd), pd.fdseq.Load())
+ *(*taggedPointer)(unsafe.Pointer(&ev.Data)) = tp
+ return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, int32(fd), &ev)
+}
+
+func netpollclose(fd uintptr) uintptr {
+ var ev syscall.EpollEvent
+ return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_DEL, int32(fd), &ev)
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+// netpollBreak interrupts an epollwait.
+func netpollBreak() {
+ // Failing to cas indicates there is an in-flight wakeup, so we're done here.
+ if !netpollWakeSig.CompareAndSwap(0, 1) {
+ return
+ }
+
+ for {
+ var b byte
+ n := write(netpollBreakWr, unsafe.Pointer(&b), 1)
+ if n == 1 {
+ break
+ }
+ if n == -_EINTR {
+ continue
+ }
+ if n == -_EAGAIN {
+ return
+ }
+ println("runtime: netpollBreak write failed with", -n)
+ throw("runtime: netpollBreak write failed")
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if epfd == -1 {
+ return gList{}
+ }
+ var waitms int32
+ if delay < 0 {
+ waitms = -1
+ } else if delay == 0 {
+ waitms = 0
+ } else if delay < 1e6 {
+ waitms = 1
+ } else if delay < 1e15 {
+ waitms = int32(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ waitms = 1e9
+ }
+ var events [128]syscall.EpollEvent
+retry:
+ n, errno := syscall.EpollWait(epfd, events[:], int32(len(events)), waitms)
+ if errno != 0 {
+ if errno != _EINTR {
+ println("runtime: epollwait on fd", epfd, "failed with", errno)
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if waitms > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ var toRun gList
+ for i := int32(0); i < n; i++ {
+ ev := events[i]
+ if ev.Events == 0 {
+ continue
+ }
+
+ if *(**uintptr)(unsafe.Pointer(&ev.Data)) == &netpollBreakRd {
+ if ev.Events != syscall.EPOLLIN {
+ println("runtime: netpoll: break fd ready for", ev.Events)
+ throw("runtime: netpoll: break fd ready for something unexpected")
+ }
+ if delay != 0 {
+ // netpollBreak could be picked up by a
+ // nonblocking poll. Only read the byte
+ // if blocking.
+ var tmp [16]byte
+ read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
+ netpollWakeSig.Store(0)
+ }
+ continue
+ }
+
+ var mode int32
+ if ev.Events&(syscall.EPOLLIN|syscall.EPOLLRDHUP|syscall.EPOLLHUP|syscall.EPOLLERR) != 0 {
+ mode += 'r'
+ }
+ if ev.Events&(syscall.EPOLLOUT|syscall.EPOLLHUP|syscall.EPOLLERR) != 0 {
+ mode += 'w'
+ }
+ if mode != 0 {
+ tp := *(*taggedPointer)(unsafe.Pointer(&ev.Data))
+ pd := (*pollDesc)(tp.pointer())
+ tag := tp.tag()
+ if pd.fdseq.Load() == tag {
+ pd.setEventErr(ev.Events == syscall.EPOLLERR, tag)
+ netpollready(&toRun, pd, mode)
+ }
+ }
+ }
+ return toRun
+}
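+
+// The 64-bit ev.Data word carries both the *pollDesc and its fdseq tag via
+// taggedPointerPack, so a stale event (fd closed and slot reused) can be
+// detected by comparing tags. Below is only an illustrative, standalone
+// sketch of that packing idea, assuming (as on linux/amd64) that user
+// pointers fit in 48 bits; pack, unpack and the 48-bit split are made up for
+// the example and are not the runtime's actual taggedPointer code:
+//
+//	package main
+//
+//	import (
+//		"fmt"
+//		"unsafe"
+//	)
+//
+//	const addrBits = 48 // low bits: pointer; high bits: tag
+//
+//	func pack(p unsafe.Pointer, tag uint16) uint64 {
+//		return uint64(uintptr(p)) | uint64(tag)<<addrBits
+//	}
+//
+//	func unpack(v uint64) (unsafe.Pointer, uint16) {
+//		return unsafe.Pointer(uintptr(v & (1<<addrBits - 1))), uint16(v >> addrBits)
+//	}
+//
+//	func main() {
+//		x := 42
+//		v := pack(unsafe.Pointer(&x), 7)
+//		p, tag := unpack(v)
+//		fmt.Println(*(*int)(p), tag) // 42 7
+//	}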
diff --git a/src/runtime/netpoll_fake.go b/src/runtime/netpoll_fake.go
new file mode 100644
index 0000000..5319561
--- /dev/null
+++ b/src/runtime/netpoll_fake.go
@@ -0,0 +1,35 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Fake network poller for js/wasm.
+// Should never be used, because js/wasm network connections do not honor "SetNonblock".
+
+//go:build js && wasm
+
+package runtime
+
+func netpollinit() {
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return false
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+}
+
+func netpollBreak() {
+}
+
+func netpoll(delay int64) gList {
+ return gList{}
+}
diff --git a/src/runtime/netpoll_kqueue.go b/src/runtime/netpoll_kqueue.go
new file mode 100644
index 0000000..3af45e6
--- /dev/null
+++ b/src/runtime/netpoll_kqueue.go
@@ -0,0 +1,215 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || dragonfly || freebsd || netbsd || openbsd
+
+package runtime
+
+// Integrated network poller (kqueue-based implementation).
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+var (
+ kq int32 = -1
+
+ netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
+
+ netpollWakeSig atomic.Uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ kq = kqueue()
+ if kq < 0 {
+ println("runtime: kqueue failed with", -kq)
+ throw("runtime: netpollinit failed")
+ }
+ closeonexec(kq)
+ r, w, errno := nonblockingPipe()
+ if errno != 0 {
+ println("runtime: pipe failed with", -errno)
+ throw("runtime: pipe failed")
+ }
+ ev := keventt{
+ filter: _EVFILT_READ,
+ flags: _EV_ADD,
+ }
+ *(*uintptr)(unsafe.Pointer(&ev.ident)) = uintptr(r)
+ n := kevent(kq, &ev, 1, nil, 0, nil)
+ if n < 0 {
+ println("runtime: kevent failed with", -n)
+ throw("runtime: kevent failed")
+ }
+ netpollBreakRd = uintptr(r)
+ netpollBreakWr = uintptr(w)
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(kq) || fd == netpollBreakRd || fd == netpollBreakWr
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ // Arm both EVFILT_READ and EVFILT_WRITE in edge-triggered mode (EV_CLEAR)
+ // for the whole fd lifetime. The notifications are automatically unregistered
+ // when fd is closed.
+ var ev [2]keventt
+ *(*uintptr)(unsafe.Pointer(&ev[0].ident)) = fd
+ ev[0].filter = _EVFILT_READ
+ ev[0].flags = _EV_ADD | _EV_CLEAR
+ ev[0].fflags = 0
+ ev[0].data = 0
+
+ if goarch.PtrSize == 4 {
+ // We only have a pointer-sized field to store into,
+ // so on a 32-bit system we get no sequence protection.
+ // TODO(iant): If we notice any problems we could at least
+ // steal the low-order 2 bits for a tiny sequence number.
+ ev[0].udata = (*byte)(unsafe.Pointer(pd))
+ } else {
+ tp := taggedPointerPack(unsafe.Pointer(pd), pd.fdseq.Load())
+ ev[0].udata = (*byte)(unsafe.Pointer(uintptr(tp)))
+ }
+ ev[1] = ev[0]
+ ev[1].filter = _EVFILT_WRITE
+ n := kevent(kq, &ev[0], 2, nil, 0, nil)
+ if n < 0 {
+ return -n
+ }
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ // Don't need to unregister because calling close()
+ // on fd will remove any kevents that reference the descriptor.
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+// netpollBreak interrupts a kevent.
+func netpollBreak() {
+ // Failing to cas indicates there is an in-flight wakeup, so we're done here.
+ if !netpollWakeSig.CompareAndSwap(0, 1) {
+ return
+ }
+
+ for {
+ var b byte
+ n := write(netpollBreakWr, unsafe.Pointer(&b), 1)
+ if n == 1 || n == -_EAGAIN {
+ break
+ }
+ if n == -_EINTR {
+ continue
+ }
+ println("runtime: netpollBreak write failed with", -n)
+ throw("runtime: netpollBreak write failed")
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if kq == -1 {
+ return gList{}
+ }
+ var tp *timespec
+ var ts timespec
+ if delay < 0 {
+ tp = nil
+ } else if delay == 0 {
+ tp = &ts
+ } else {
+ ts.setNsec(delay)
+ if ts.tv_sec > 1e6 {
+ // Darwin returns EINVAL if the sleep time is too long.
+ ts.tv_sec = 1e6
+ }
+ tp = &ts
+ }
+ var events [64]keventt
+retry:
+ n := kevent(kq, nil, 0, &events[0], int32(len(events)), tp)
+ if n < 0 {
+ if n != -_EINTR {
+ println("runtime: kevent on fd", kq, "failed with", -n)
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if delay > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+ var toRun gList
+ for i := 0; i < int(n); i++ {
+ ev := &events[i]
+
+ if uintptr(ev.ident) == netpollBreakRd {
+ if ev.filter != _EVFILT_READ {
+ println("runtime: netpoll: break fd ready for", ev.filter)
+ throw("runtime: netpoll: break fd ready for something unexpected")
+ }
+ if delay != 0 {
+ // netpollBreak could be picked up by a
+ // nonblocking poll. Only read the byte
+ // if blocking.
+ var tmp [16]byte
+ read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
+ netpollWakeSig.Store(0)
+ }
+ continue
+ }
+
+ var mode int32
+ switch ev.filter {
+ case _EVFILT_READ:
+ mode += 'r'
+
+ // On some systems when the read end of a pipe
+ // is closed the write end will not get a
+ // _EVFILT_WRITE event, but will get a
+ // _EVFILT_READ event with EV_EOF set.
+ // Note that setting 'w' here just means that we
+ // will wake up a goroutine waiting to write;
+ // that goroutine will try the write again,
+ // and the appropriate thing will happen based
+ // on what that write returns (success, EPIPE, EAGAIN).
+ if ev.flags&_EV_EOF != 0 {
+ mode += 'w'
+ }
+ case _EVFILT_WRITE:
+ mode += 'w'
+ }
+ if mode != 0 {
+ var pd *pollDesc
+ var tag uintptr
+ if goarch.PtrSize == 4 {
+ // No sequence protection on 32-bit systems.
+ // See netpollopen for details.
+ pd = (*pollDesc)(unsafe.Pointer(ev.udata))
+ tag = 0
+ } else {
+ tp := taggedPointer(uintptr(unsafe.Pointer(ev.udata)))
+ pd = (*pollDesc)(tp.pointer())
+ tag = tp.tag()
+ if pd.fdseq.Load() != tag {
+ continue
+ }
+ }
+ pd.setEventErr(ev.flags == _EV_ERROR, tag)
+ netpollready(&toRun, pd, mode)
+ }
+ }
+ return toRun
+}
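+
+// The one-time, edge-triggered registration done in netpollopen can be
+// mimicked in user space with the syscall package on a darwin or BSD build,
+// which exposes Kqueue, SetKevent and Kevent. This is only an illustrative
+// sketch of the kqueue pattern (register once with EV_ADD|EV_CLEAR, then
+// wait), not runtime code:
+//
+//	package main
+//
+//	import (
+//		"fmt"
+//		"os"
+//		"syscall"
+//	)
+//
+//	func main() {
+//		r, w, err := os.Pipe()
+//		if err != nil {
+//			panic(err)
+//		}
+//		kq, err := syscall.Kqueue()
+//		if err != nil {
+//			panic(err)
+//		}
+//		// Register the read end once, edge-triggered, for its whole lifetime.
+//		var change [1]syscall.Kevent_t
+//		syscall.SetKevent(&change[0], int(r.Fd()), syscall.EVFILT_READ, syscall.EV_ADD|syscall.EV_CLEAR)
+//		if _, err := syscall.Kevent(kq, change[:], nil, nil); err != nil {
+//			panic(err)
+//		}
+//		w.Write([]byte{0}) // make the read end ready
+//		var events [1]syscall.Kevent_t
+//		n, _ := syscall.Kevent(kq, nil, events[:], nil)
+//		fmt.Println("ready events:", n) // 1
+//	}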
diff --git a/src/runtime/netpoll_os_test.go b/src/runtime/netpoll_os_test.go
new file mode 100644
index 0000000..b96b9f3
--- /dev/null
+++ b/src/runtime/netpoll_os_test.go
@@ -0,0 +1,28 @@
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+)
+
+var wg sync.WaitGroup
+
+func init() {
+ runtime.NetpollGenericInit()
+}
+
+func BenchmarkNetpollBreak(b *testing.B) {
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 10; j++ {
+ wg.Add(1)
+ go func() {
+ runtime.NetpollBreak()
+ wg.Done()
+ }()
+ }
+ }
+ wg.Wait()
+ b.StopTimer()
+}
diff --git a/src/runtime/netpoll_solaris.go b/src/runtime/netpoll_solaris.go
new file mode 100644
index 0000000..13c7ffc
--- /dev/null
+++ b/src/runtime/netpoll_solaris.go
@@ -0,0 +1,332 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Solaris runtime-integrated network poller.
+//
+// Solaris uses event ports for scalable network I/O. Event
+// ports are level-triggered, unlike epoll and kqueue which
+// can be configured in both level-triggered and edge-triggered
+// mode. Level triggering means we have to keep track of a few things
+// ourselves. After we receive an event for a file descriptor,
+// it's our responsibility to ask again to be notified for future
+// events for that descriptor. When doing this we must keep track of
+// what kind of events the goroutines are currently interested in,
+// for example a fd may be open both for reading and writing.
+//
+// A description of the high level operation of this code
+// follows. Networking code will get a file descriptor by some means
+// and will register it with the netpolling mechanism by a code path
+// that eventually calls runtime·netpollopen. runtime·netpollopen
+// calls port_associate with an empty event set. That means that we
+// will not receive any events at this point. The association needs
+// to be done at this early point because we need to process the I/O
+// readiness notification at some point in the future. If I/O becomes
+// ready when nobody is listening, when we finally care about it,
+// nobody will tell us anymore.
+//
+// Besides calling runtime·netpollopen, the networking code paths
+// will call runtime·netpollarm each time goroutines are interested
+// in doing network I/O. Because now we know what kind of I/O we
+// are interested in (reading/writing), we can call port_associate
+// passing the correct type of event set (POLLIN/POLLOUT). As we made
+// sure to have already associated the file descriptor with the port,
+// when we now call port_associate, we will unblock the main poller
+// loop (in runtime·netpoll) right away if the socket is actually
+// ready for I/O.
+//
+// The main poller loop runs in its own thread waiting for events
+// using port_getn. When an event happens, it will tell the scheduler
+// about it using runtime·netpollready. Besides doing this, it must
+// also re-associate the events that were not part of this current
+// notification with the file descriptor. Failing to do this would
+// mean each notification will prevent concurrent code using the
+// same file descriptor in parallel.
+//
+// The logic dealing with re-associations is encapsulated in
+// runtime·netpollupdate. This function takes care to associate the
+// descriptor only with the subset of events that were previously
+// part of the association, except the one that just happened. We
+// can't re-associate with that right away, because event ports
+// are level triggered so it would cause a busy loop. Instead, that
+// association is effected only by the runtime·netpollarm code path,
+// when Go code actually asks for I/O.
+//
+// The open and arming mechanisms are serialized using the lock
+// inside PollDesc. This is required because the netpoll loop runs
+// asynchronously with respect to other Go code and by the time we get
+// to call port_associate to update the association in the loop, the
+// file descriptor might have been closed and reopened already. The
+// lock allows runtime·netpollupdate to be called synchronously from
+// the loop thread while preventing other threads from operating on the
+// same PollDesc, so once we unblock in the main loop, until we loop
+// again we know for sure we are always talking about the same file
+// descriptor and can safely access the data we want (the event set).
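+//
+// The re-association bookkeeping described above can be restated, without
+// event ports, as a small pure function: keep the currently associated event
+// mask per descriptor and, after an event is delivered, re-register only the
+// interests that were not part of the notification (the delivered one is
+// re-added later by netpollarm). This is only an illustrative sketch;
+// fdState, update and the associate callback are made-up names, not
+// runtime API:
+//
+//	type fdState struct {
+//		fd     int32
+//		events uint32 // events currently associated with the port
+//	}
+//
+//	// update re-associates fd with (old &^ clear) | set, mirroring
+//	// netpollupdate below: no call is made if the mask is unchanged.
+//	func update(s *fdState, set, clear uint32, associate func(fd int32, events uint32)) {
+//		events := (s.events &^ clear) | set
+//		if events == s.events {
+//			return
+//		}
+//		if events != 0 {
+//			associate(s.fd, events)
+//		}
+//		s.events = events
+//	}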
+
+//go:cgo_import_dynamic libc_port_create port_create "libc.so"
+//go:cgo_import_dynamic libc_port_associate port_associate "libc.so"
+//go:cgo_import_dynamic libc_port_dissociate port_dissociate "libc.so"
+//go:cgo_import_dynamic libc_port_getn port_getn "libc.so"
+//go:cgo_import_dynamic libc_port_alert port_alert "libc.so"
+
+//go:linkname libc_port_create libc_port_create
+//go:linkname libc_port_associate libc_port_associate
+//go:linkname libc_port_dissociate libc_port_dissociate
+//go:linkname libc_port_getn libc_port_getn
+//go:linkname libc_port_alert libc_port_alert
+
+var (
+ libc_port_create,
+ libc_port_associate,
+ libc_port_dissociate,
+ libc_port_getn,
+ libc_port_alert libcFunc
+ netpollWakeSig atomic.Uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func errno() int32 {
+ return *getg().m.perrno
+}
+
+func port_create() int32 {
+ return int32(sysvicall0(&libc_port_create))
+}
+
+func port_associate(port, source int32, object uintptr, events uint32, user uintptr) int32 {
+ return int32(sysvicall5(&libc_port_associate, uintptr(port), uintptr(source), object, uintptr(events), user))
+}
+
+func port_dissociate(port, source int32, object uintptr) int32 {
+ return int32(sysvicall3(&libc_port_dissociate, uintptr(port), uintptr(source), object))
+}
+
+func port_getn(port int32, evs *portevent, max uint32, nget *uint32, timeout *timespec) int32 {
+ return int32(sysvicall5(&libc_port_getn, uintptr(port), uintptr(unsafe.Pointer(evs)), uintptr(max), uintptr(unsafe.Pointer(nget)), uintptr(unsafe.Pointer(timeout))))
+}
+
+func port_alert(port int32, flags, events uint32, user uintptr) int32 {
+ return int32(sysvicall4(&libc_port_alert, uintptr(port), uintptr(flags), uintptr(events), user))
+}
+
+var portfd int32 = -1
+
+func netpollinit() {
+ portfd = port_create()
+ if portfd >= 0 {
+ closeonexec(portfd)
+ return
+ }
+
+ print("runtime: port_create failed (errno=", errno(), ")\n")
+ throw("runtime: netpollinit failed")
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == uintptr(portfd)
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ lock(&pd.lock)
+	// We don't register for any specific type of events yet; that's
+ // netpollarm's job. We merely ensure we call port_associate before
+ // asynchronous connect/accept completes, so when we actually want
+ // to do any I/O, the call to port_associate (from netpollarm,
+ // with the interested event set) will unblock port_getn right away
+ // because of the I/O readiness notification.
+ pd.user = 0
+ tp := taggedPointerPack(unsafe.Pointer(pd), pd.fdseq.Load())
+ // Note that this won't work on a 32-bit system,
+	// as taggedPointer is always 64 bits but uintptr will be 32 bits.
+ // Fortunately we only support Solaris on amd64.
+ if goarch.PtrSize != 8 {
+ throw("runtime: netpollopen: unsupported pointer size")
+ }
+ r := port_associate(portfd, _PORT_SOURCE_FD, fd, 0, uintptr(tp))
+ unlock(&pd.lock)
+ return r
+}
+
+func netpollclose(fd uintptr) int32 {
+ return port_dissociate(portfd, _PORT_SOURCE_FD, fd)
+}
+
+// Updates the association with a new set of interested events. After
+// this call, port_getn will return one and only one event for that
+// particular descriptor, so this function needs to be called again.
+func netpollupdate(pd *pollDesc, set, clear uint32) {
+ if pd.info().closing() {
+ return
+ }
+
+ old := pd.user
+ events := (old & ^clear) | set
+ if old == events {
+ return
+ }
+
+ tp := taggedPointerPack(unsafe.Pointer(pd), pd.fdseq.Load())
+ if events != 0 && port_associate(portfd, _PORT_SOURCE_FD, pd.fd, events, uintptr(tp)) != 0 {
+ print("runtime: port_associate failed (errno=", errno(), ")\n")
+ throw("runtime: netpollupdate failed")
+ }
+ pd.user = events
+}
+
+// netpollarm subscribes the fd to the port so that port_getn will return one event.
+func netpollarm(pd *pollDesc, mode int) {
+ lock(&pd.lock)
+ switch mode {
+ case 'r':
+ netpollupdate(pd, _POLLIN, 0)
+ case 'w':
+ netpollupdate(pd, _POLLOUT, 0)
+ default:
+ throw("runtime: bad mode")
+ }
+ unlock(&pd.lock)
+}
+
+// netpollBreak interrupts a port_getn wait.
+func netpollBreak() {
+ // Failing to cas indicates there is an in-flight wakeup, so we're done here.
+ if !netpollWakeSig.CompareAndSwap(0, 1) {
+ return
+ }
+
+ // Use port_alert to put portfd into alert mode.
+ // This will wake up all threads sleeping in port_getn on portfd,
+ // and cause their calls to port_getn to return immediately.
+ // Further, until portfd is taken out of alert mode,
+ // all calls to port_getn will return immediately.
+ if port_alert(portfd, _PORT_ALERT_UPDATE, _POLLHUP, uintptr(unsafe.Pointer(&portfd))) < 0 {
+ if e := errno(); e != _EBUSY {
+ println("runtime: port_alert failed with", e)
+ throw("runtime: netpoll: port_alert failed")
+ }
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ if portfd == -1 {
+ return gList{}
+ }
+
+ var wait *timespec
+ var ts timespec
+ if delay < 0 {
+ wait = nil
+ } else if delay == 0 {
+ wait = &ts
+ } else {
+ ts.setNsec(delay)
+ if ts.tv_sec > 1e6 {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e6 s == ~11.5 days.
+ ts.tv_sec = 1e6
+ }
+ wait = &ts
+ }
+
+ var events [128]portevent
+retry:
+ var n uint32 = 1
+ r := port_getn(portfd, &events[0], uint32(len(events)), &n, wait)
+ e := errno()
+ if r < 0 && e == _ETIME && n > 0 {
+ // As per port_getn(3C), an ETIME failure does not preclude the
+ // delivery of some number of events. Treat a timeout failure
+ // with delivered events as a success.
+ r = 0
+ }
+ if r < 0 {
+ if e != _EINTR && e != _ETIME {
+ print("runtime: port_getn on fd ", portfd, " failed (errno=", e, ")\n")
+ throw("runtime: netpoll failed")
+ }
+ // If a timed sleep was interrupted and there are no events,
+ // just return to recalculate how long we should sleep now.
+ if delay > 0 {
+ return gList{}
+ }
+ goto retry
+ }
+
+ var toRun gList
+ for i := 0; i < int(n); i++ {
+ ev := &events[i]
+
+ if ev.portev_source == _PORT_SOURCE_ALERT {
+ if ev.portev_events != _POLLHUP || unsafe.Pointer(ev.portev_user) != unsafe.Pointer(&portfd) {
+ throw("runtime: netpoll: bad port_alert wakeup")
+ }
+ if delay != 0 {
+ // Now that a blocking call to netpoll
+ // has seen the alert, take portfd
+ // back out of alert mode.
+ // See the comment in netpollBreak.
+ if port_alert(portfd, 0, 0, 0) < 0 {
+ e := errno()
+ println("runtime: port_alert failed with", e)
+ throw("runtime: netpoll: port_alert failed")
+ }
+ netpollWakeSig.Store(0)
+ }
+ continue
+ }
+
+ if ev.portev_events == 0 {
+ continue
+ }
+
+ tp := taggedPointer(uintptr(unsafe.Pointer(ev.portev_user)))
+ pd := (*pollDesc)(tp.pointer())
+ if pd.fdseq.Load() != tp.tag() {
+ continue
+ }
+
+ var mode, clear int32
+ if (ev.portev_events & (_POLLIN | _POLLHUP | _POLLERR)) != 0 {
+ mode += 'r'
+ clear |= _POLLIN
+ }
+ if (ev.portev_events & (_POLLOUT | _POLLHUP | _POLLERR)) != 0 {
+ mode += 'w'
+ clear |= _POLLOUT
+ }
+		// To effect edge-triggered events, we need to be sure to
+		// re-associate the descriptor with whatever events were not
+		// part of this notification. For example, if we are registered
+		// for POLLIN|POLLOUT and we get POLLIN, besides waking
+		// the goroutine interested in POLLIN we must not forget
+		// about the one interested in POLLOUT.
+ if clear != 0 {
+ lock(&pd.lock)
+ netpollupdate(pd, 0, uint32(clear))
+ unlock(&pd.lock)
+ }
+
+ if mode != 0 {
+ // TODO(mikio): Consider implementing event
+ // scanning error reporting once we are sure
+ // about the event port on SmartOS.
+ //
+ // See golang.org/x/issue/30840.
+ netpollready(&toRun, pd, mode)
+ }
+ }
+
+ return toRun
+}
diff --git a/src/runtime/netpoll_stub.go b/src/runtime/netpoll_stub.go
new file mode 100644
index 0000000..14cf0c3
--- /dev/null
+++ b/src/runtime/netpoll_stub.go
@@ -0,0 +1,61 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build plan9
+
+package runtime
+
+import "runtime/internal/atomic"
+
+var netpollInited atomic.Uint32
+var netpollWaiters atomic.Uint32
+
+var netpollStubLock mutex
+var netpollNote note
+
+// netpollBroken, protected by netpollBrokenLock, avoids a double notewakeup.
+var netpollBrokenLock mutex
+var netpollBroken bool
+
+func netpollGenericInit() {
+ netpollInited.Store(1)
+}
+
+func netpollBreak() {
+ lock(&netpollBrokenLock)
+ broken := netpollBroken
+ netpollBroken = true
+ if !broken {
+ notewakeup(&netpollNote)
+ }
+ unlock(&netpollBrokenLock)
+}
+
+// Polls for ready network connections.
+// Returns list of goroutines that become runnable.
+func netpoll(delay int64) gList {
+ // Implementation for platforms that do not support
+	// an integrated network poller.
+ if delay != 0 {
+ // This lock ensures that only one goroutine tries to use
+ // the note. It should normally be completely uncontended.
+ lock(&netpollStubLock)
+
+ lock(&netpollBrokenLock)
+ noteclear(&netpollNote)
+ netpollBroken = false
+ unlock(&netpollBrokenLock)
+
+ notetsleep(&netpollNote, delay)
+ unlock(&netpollStubLock)
+ // Guard against starvation in case the lock is contended
+ // (eg when running TestNetpollBreak).
+ osyield()
+ }
+ return gList{}
+}
+
+func netpollinited() bool {
+ return netpollInited.Load() != 0
+}
diff --git a/src/runtime/netpoll_wasip1.go b/src/runtime/netpoll_wasip1.go
new file mode 100644
index 0000000..677287b
--- /dev/null
+++ b/src/runtime/netpoll_wasip1.go
@@ -0,0 +1,254 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build wasip1
+
+package runtime
+
+import "unsafe"
+
+// WASI network poller.
+//
+// WASI preview 1 includes a poll_oneoff host function that behaves similarly
+// to poll(2) on Linux. Like poll(2), poll_oneoff is level triggered. It
+// accepts one or more subscriptions to FD read or write events.
+//
+// Major differences to poll(2):
+// - the events are not written to the input entries (like pollfd.revents), and
+// instead are appended to a separate events buffer. poll_oneoff writes zero
+// or more events to the buffer (at most one per input subscription) and
+// returns the number of events written. Although the index of the
+// subscriptions might not match the index of the associated event in the
+// events buffer, both the subscription and event structs contain a userdata
+// field and when a subscription yields an event the userdata fields will
+// match.
+// - there's no explicit timeout parameter, although a time limit can be added
+// by using "clock" subscriptions.
+// - each FD subscription can either be for a read or a write, but not both.
+// This is in contrast to poll(2) which accepts a mask with POLLIN and
+// POLLOUT bits, allowing for a subscription to either, neither, or both
+// reads and writes.
+//
+// Since poll_oneoff is similar to poll(2), the implementation here was derived
+// from netpoll_aix.go.
+
+const _EINTR = 27
+
+var (
+ evts []event
+ subs []subscription
+ pds []*pollDesc
+ mtx mutex
+)
+
+func netpollinit() {
+ // Unlike poll(2), WASI's poll_oneoff doesn't accept a timeout directly. To
+ // prevent it from blocking indefinitely, a clock subscription with a
+ // timeout field needs to be submitted. Reserve a slot here for the clock
+ // subscription, and set fields that won't change between poll_oneoff calls.
+
+ subs = make([]subscription, 1, 128)
+ evts = make([]event, 0, 128)
+ pds = make([]*pollDesc, 0, 128)
+
+ timeout := &subs[0]
+ eventtype := timeout.u.eventtype()
+ *eventtype = eventtypeClock
+ clock := timeout.u.subscriptionClock()
+ clock.id = clockMonotonic
+ clock.precision = 1e3
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return false
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ lock(&mtx)
+
+ // We don't worry about pd.fdseq here,
+ // as mtx protects us from stale pollDescs.
+
+ pds = append(pds, pd)
+
+ // The 32-bit pd.user field holds the index of the read subscription in the
+	// upper 16 bits and the index of the write subscription in the lower 16 bits.
+ // A disarmed=^uint16(0) sentinel is used to represent no subscription.
+ // There is thus a maximum of 65535 total subscriptions.
+ pd.user = uint32(disarmed)<<16 | uint32(disarmed)
+
+ unlock(&mtx)
+ return 0
+}
+
+const disarmed = 0xFFFF
+
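+// The pd.user encoding above can be restated as two tiny helpers. This is
+// only an illustrative sketch; joinIdx and splitIdx are made-up names, not
+// runtime API:
+//
+//	// joinIdx stores the read-subscription index in the upper 16 bits of a
+//	// uint32 and the write-subscription index in the lower 16 bits;
+//	// disarmed (0xFFFF) in either half means "no subscription".
+//	func joinIdx(ridx, widx int) uint32 {
+//		return uint32(ridx)<<16 | uint32(widx)&0xFFFF
+//	}
+//
+//	// splitIdx recovers both indices.
+//	func splitIdx(user uint32) (ridx, widx int) {
+//		return int(user >> 16), int(user & 0xFFFF)
+//	}
+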
+func netpollarm(pd *pollDesc, mode int) {
+ lock(&mtx)
+
+ var s subscription
+
+ s.userdata = userdata(uintptr(unsafe.Pointer(pd)))
+
+ fdReadwrite := s.u.subscriptionFdReadwrite()
+ fdReadwrite.fd = int32(pd.fd)
+
+ ridx := int(pd.user >> 16)
+ widx := int(pd.user & 0xFFFF)
+
+ if (mode == 'r' && ridx != disarmed) || (mode == 'w' && widx != disarmed) {
+ unlock(&mtx)
+ return
+ }
+
+ eventtype := s.u.eventtype()
+ switch mode {
+ case 'r':
+ *eventtype = eventtypeFdRead
+ ridx = len(subs)
+ case 'w':
+ *eventtype = eventtypeFdWrite
+ widx = len(subs)
+ }
+
+ if len(subs) == disarmed {
+ throw("overflow")
+ }
+
+ pd.user = uint32(ridx)<<16 | uint32(widx)
+
+ subs = append(subs, s)
+ evts = append(evts, event{})
+
+ unlock(&mtx)
+}
+
+func netpolldisarm(pd *pollDesc, mode int32) {
+ switch mode {
+ case 'r':
+ removesub(int(pd.user >> 16))
+ case 'w':
+ removesub(int(pd.user & 0xFFFF))
+ case 'r' + 'w':
+ removesub(int(pd.user >> 16))
+ removesub(int(pd.user & 0xFFFF))
+ }
+}
+
+func removesub(i int) {
+ if i == disarmed {
+ return
+ }
+ j := len(subs) - 1
+
+ pdi := (*pollDesc)(unsafe.Pointer(uintptr(subs[i].userdata)))
+ pdj := (*pollDesc)(unsafe.Pointer(uintptr(subs[j].userdata)))
+
+ swapsub(pdi, i, disarmed)
+ swapsub(pdj, j, i)
+
+ subs = subs[:j]
+}
+
+func swapsub(pd *pollDesc, from, to int) {
+ if from == to {
+ return
+ }
+ ridx := int(pd.user >> 16)
+ widx := int(pd.user & 0xFFFF)
+ if ridx == from {
+ ridx = to
+ } else if widx == from {
+ widx = to
+ }
+ pd.user = uint32(ridx)<<16 | uint32(widx)
+ if to != disarmed {
+ subs[to], subs[from] = subs[from], subs[to]
+ }
+}
+
+func netpollclose(fd uintptr) int32 {
+ lock(&mtx)
+ for i := 0; i < len(pds); i++ {
+ if pds[i].fd == fd {
+ netpolldisarm(pds[i], 'r'+'w')
+ pds[i] = pds[len(pds)-1]
+ pds = pds[:len(pds)-1]
+ break
+ }
+ }
+ unlock(&mtx)
+ return 0
+}
+
+func netpollBreak() {}
+
+func netpoll(delay int64) gList {
+ lock(&mtx)
+
+ // If delay >= 0, we include a subscription of type Clock that we use as
+ // a timeout. If delay < 0, we omit the subscription and allow poll_oneoff
+ // to block indefinitely.
+ pollsubs := subs
+ if delay >= 0 {
+ timeout := &subs[0]
+ clock := timeout.u.subscriptionClock()
+ clock.timeout = uint64(delay)
+ } else {
+ pollsubs = subs[1:]
+ }
+
+ if len(pollsubs) == 0 {
+ unlock(&mtx)
+ return gList{}
+ }
+
+ evts = evts[:len(pollsubs)]
+ for i := range evts {
+ evts[i] = event{}
+ }
+
+retry:
+ var nevents size
+ errno := poll_oneoff(unsafe.Pointer(&pollsubs[0]), unsafe.Pointer(&evts[0]), uint32(len(pollsubs)), unsafe.Pointer(&nevents))
+ if errno != 0 {
+ if errno != _EINTR {
+ println("errno=", errno, " len(pollsubs)=", len(pollsubs))
+ throw("poll_oneoff failed")
+ }
+ // If a timed sleep was interrupted, just return to
+ // recalculate how long we should sleep now.
+ if delay > 0 {
+ unlock(&mtx)
+ return gList{}
+ }
+ goto retry
+ }
+
+ var toRun gList
+ for i := 0; i < int(nevents); i++ {
+ e := &evts[i]
+ if e.typ == eventtypeClock {
+ continue
+ }
+
+ hangup := e.fdReadwrite.flags&fdReadwriteHangup != 0
+ var mode int32
+ if e.typ == eventtypeFdRead || e.error != 0 || hangup {
+ mode += 'r'
+ }
+ if e.typ == eventtypeFdWrite || e.error != 0 || hangup {
+ mode += 'w'
+ }
+ if mode != 0 {
+ pd := (*pollDesc)(unsafe.Pointer(uintptr(e.userdata)))
+ netpolldisarm(pd, mode)
+ pd.setEventErr(e.error != 0, 0)
+ netpollready(&toRun, pd, mode)
+ }
+ }
+
+ unlock(&mtx)
+ return toRun
+}
diff --git a/src/runtime/netpoll_windows.go b/src/runtime/netpoll_windows.go
new file mode 100644
index 0000000..bb77d8d
--- /dev/null
+++ b/src/runtime/netpoll_windows.go
@@ -0,0 +1,160 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const _DWORD_MAX = 0xffffffff
+
+const _INVALID_HANDLE_VALUE = ^uintptr(0)
+
+// net_op must be the same as the beginning of internal/poll.operation.
+// Keep these in sync.
+type net_op struct {
+ // used by windows
+ o overlapped
+ // used by netpoll
+ pd *pollDesc
+ mode int32
+ errno int32
+ qty uint32
+}
+
+type overlappedEntry struct {
+ key *pollDesc
+ op *net_op // In reality it's *overlapped, but we cast it to *net_op anyway.
+ internal uintptr
+ qty uint32
+}
+
+var (
+ iocphandle uintptr = _INVALID_HANDLE_VALUE // completion port io handle
+
+ netpollWakeSig atomic.Uint32 // used to avoid duplicate calls of netpollBreak
+)
+
+func netpollinit() {
+ iocphandle = stdcall4(_CreateIoCompletionPort, _INVALID_HANDLE_VALUE, 0, 0, _DWORD_MAX)
+ if iocphandle == 0 {
+ println("runtime: CreateIoCompletionPort failed (errno=", getlasterror(), ")")
+ throw("runtime: netpollinit failed")
+ }
+}
+
+func netpollIsPollDescriptor(fd uintptr) bool {
+ return fd == iocphandle
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 {
+ // TODO(iant): Consider using taggedPointer on 64-bit systems.
+ if stdcall4(_CreateIoCompletionPort, fd, iocphandle, uintptr(unsafe.Pointer(pd)), 0) == 0 {
+ return int32(getlasterror())
+ }
+ return 0
+}
+
+func netpollclose(fd uintptr) int32 {
+ // nothing to do
+ return 0
+}
+
+func netpollarm(pd *pollDesc, mode int) {
+ throw("runtime: unused")
+}
+
+func netpollBreak() {
+ // Failing to cas indicates there is an in-flight wakeup, so we're done here.
+ if !netpollWakeSig.CompareAndSwap(0, 1) {
+ return
+ }
+
+ if stdcall4(_PostQueuedCompletionStatus, iocphandle, 0, 0, 0) == 0 {
+ println("runtime: netpoll: PostQueuedCompletionStatus failed (errno=", getlasterror(), ")")
+ throw("runtime: netpoll: PostQueuedCompletionStatus failed")
+ }
+}
+
+// netpoll checks for ready network connections.
+// Returns list of goroutines that become runnable.
+// delay < 0: blocks indefinitely
+// delay == 0: does not block, just polls
+// delay > 0: block for up to that many nanoseconds
+func netpoll(delay int64) gList {
+ var entries [64]overlappedEntry
+ var wait, qty, flags, n, i uint32
+ var errno int32
+ var op *net_op
+ var toRun gList
+
+ mp := getg().m
+
+ if iocphandle == _INVALID_HANDLE_VALUE {
+ return gList{}
+ }
+ if delay < 0 {
+ wait = _INFINITE
+ } else if delay == 0 {
+ wait = 0
+ } else if delay < 1e6 {
+ wait = 1
+ } else if delay < 1e15 {
+ wait = uint32(delay / 1e6)
+ } else {
+ // An arbitrary cap on how long to wait for a timer.
+ // 1e9 ms == ~11.5 days.
+ wait = 1e9
+ }
+
+ n = uint32(len(entries) / int(gomaxprocs))
+ if n < 8 {
+ n = 8
+ }
+ if delay != 0 {
+ mp.blocked = true
+ }
+ if stdcall6(_GetQueuedCompletionStatusEx, iocphandle, uintptr(unsafe.Pointer(&entries[0])), uintptr(n), uintptr(unsafe.Pointer(&n)), uintptr(wait), 0) == 0 {
+ mp.blocked = false
+ errno = int32(getlasterror())
+ if errno == _WAIT_TIMEOUT {
+ return gList{}
+ }
+ println("runtime: GetQueuedCompletionStatusEx failed (errno=", errno, ")")
+ throw("runtime: netpoll failed")
+ }
+ mp.blocked = false
+ for i = 0; i < n; i++ {
+ op = entries[i].op
+ if op != nil && op.pd == entries[i].key {
+ errno = 0
+ qty = 0
+ if stdcall5(_WSAGetOverlappedResult, op.pd.fd, uintptr(unsafe.Pointer(op)), uintptr(unsafe.Pointer(&qty)), 0, uintptr(unsafe.Pointer(&flags))) == 0 {
+ errno = int32(getlasterror())
+ }
+ handlecompletion(&toRun, op, errno, qty)
+ } else {
+ netpollWakeSig.Store(0)
+ if delay == 0 {
+ // Forward the notification to the
+ // blocked poller.
+ netpollBreak()
+ }
+ }
+ }
+ return toRun
+}
+
+func handlecompletion(toRun *gList, op *net_op, errno int32, qty uint32) {
+ mode := op.mode
+ if mode != 'r' && mode != 'w' {
+ println("runtime: GetQueuedCompletionStatusEx returned invalid mode=", mode)
+ throw("runtime: netpoll failed")
+ }
+ op.errno = errno
+ op.qty = qty
+ netpollready(toRun, op.pd, mode)
+}
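+
+// The delay-to-wait conversion used above (and in the epoll and AIX pollers)
+// can be written as one small pure function. This is only an illustrative
+// restatement of the clamping rules with an assumed name; the real code
+// inlines it and, on Windows, uses _INFINITE rather than -1:
+//
+//	// waitMillis converts a netpoll delay in nanoseconds into a poll wait in
+//	// milliseconds: negative blocks, zero polls, small delays round up to
+//	// 1ms, and very large delays are capped at 1e9 ms (~11.5 days).
+//	func waitMillis(delay int64) int64 {
+//		switch {
+//		case delay < 0:
+//			return -1
+//		case delay == 0:
+//			return 0
+//		case delay < 1e6:
+//			return 1
+//		case delay < 1e15:
+//			return delay / 1e6
+//		default:
+//			return 1e9
+//		}
+//	}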
diff --git a/src/runtime/nonwindows_stub.go b/src/runtime/nonwindows_stub.go
new file mode 100644
index 0000000..033f026
--- /dev/null
+++ b/src/runtime/nonwindows_stub.go
@@ -0,0 +1,21 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows
+
+package runtime
+
+// osRelaxMinNS is the number of nanoseconds of idleness to tolerate
+// without performing an osRelax. Since osRelax may reduce the
+// precision of timers, this should be enough larger than the relaxed
+// timer precision to keep the timer error acceptable.
+const osRelaxMinNS = 0
+
+// osRelax is called by the scheduler when transitioning to and from
+// all Ps being idle.
+func osRelax(relax bool) {}
+
+// enableWER is called by setTraceback("wer").
+// Windows Error Reporting (WER) is only supported on Windows.
+func enableWER() {}
diff --git a/src/runtime/norace_linux_test.go b/src/runtime/norace_linux_test.go
new file mode 100644
index 0000000..3521b24
--- /dev/null
+++ b/src/runtime/norace_linux_test.go
@@ -0,0 +1,43 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The file contains tests that cannot run under race detector for some reason.
+//
+//go:build !race
+
+package runtime_test
+
+import (
+ "internal/abi"
+ "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var newOSProcDone bool
+
+//go:nosplit
+func newOSProcCreated() {
+ newOSProcDone = true
+}
+
+// Can't be run with -race because it inserts calls into newOSProcCreated()
+// that require a valid G/M.
+func TestNewOSProc0(t *testing.T) {
+ runtime.NewOSProc0(0x800000, unsafe.Pointer(abi.FuncPCABIInternal(newOSProcCreated)))
+ check := time.NewTicker(100 * time.Millisecond)
+ defer check.Stop()
+ end := time.After(5 * time.Second)
+ for {
+ select {
+ case <-check.C:
+ if newOSProcDone {
+ return
+ }
+ case <-end:
+ t.Fatalf("couldn't create new OS process")
+ }
+ }
+}
diff --git a/src/runtime/norace_test.go b/src/runtime/norace_test.go
new file mode 100644
index 0000000..3b5eca5
--- /dev/null
+++ b/src/runtime/norace_test.go
@@ -0,0 +1,47 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The file contains tests that cannot run under race detector for some reason.
+//
+//go:build !race
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+// Syscall tests split stack between Entersyscall and Exitsyscall under race detector.
+func BenchmarkSyscall(b *testing.B) {
+ benchmarkSyscall(b, 0, 1)
+}
+
+func BenchmarkSyscallWork(b *testing.B) {
+ benchmarkSyscall(b, 100, 1)
+}
+
+func BenchmarkSyscallExcess(b *testing.B) {
+ benchmarkSyscall(b, 0, 4)
+}
+
+func BenchmarkSyscallExcessWork(b *testing.B) {
+ benchmarkSyscall(b, 100, 4)
+}
+
+func benchmarkSyscall(b *testing.B, work, excess int) {
+ b.SetParallelism(excess)
+ b.RunParallel(func(pb *testing.PB) {
+ foo := 42
+ for pb.Next() {
+ runtime.Entersyscall()
+ for i := 0; i < work; i++ {
+ foo *= 2
+ foo /= 2
+ }
+ runtime.Exitsyscall()
+ }
+ _ = foo
+ })
+}
diff --git a/src/runtime/numcpu_freebsd_test.go b/src/runtime/numcpu_freebsd_test.go
new file mode 100644
index 0000000..e78890a
--- /dev/null
+++ b/src/runtime/numcpu_freebsd_test.go
@@ -0,0 +1,15 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import "testing"
+
+func TestFreeBSDNumCPU(t *testing.T) {
+ got := runTestProg(t, "testprog", "FreeBSDNumCPU")
+ want := "OK\n"
+ if got != want {
+ t.Fatalf("expected %q, but got:\n%s", want, got)
+ }
+}
diff --git a/src/runtime/os2_aix.go b/src/runtime/os2_aix.go
new file mode 100644
index 0000000..8af88d1
--- /dev/null
+++ b/src/runtime/os2_aix.go
@@ -0,0 +1,763 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file contains the main runtime AIX syscalls.
+// Pollset syscalls are in netpoll_aix.go.
+// The implementation is based on Solaris and Windows.
+// Each syscall is made by calling its libc symbol using the asmcgocall and
+// asmsyscall6 assembly functions.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+// Symbols imported for __start function.
+
+//go:cgo_import_dynamic libc___n_pthreads __n_pthreads "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libc___mod_init __mod_init "libc.a/shr_64.o"
+//go:linkname libc___n_pthreads libc___n_pthreads
+//go:linkname libc___mod_init libc___mod_init
+
+var (
+ libc___n_pthreads,
+ libc___mod_init libFunc
+)
+
+// Syscalls
+
+//go:cgo_import_dynamic libc__Errno _Errno "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_close close "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_exit _exit "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getpid getpid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getsystemcfg getsystemcfg "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_kill kill "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_madvise madvise "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_malloc malloc "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_mmap mmap "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_mprotect mprotect "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_munmap munmap "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_open open "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_pipe pipe "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_raise raise "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_read read "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_init sem_init "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_post sem_post "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_timedwait sem_timedwait "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sem_wait sem_wait "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_sysconf sysconf "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_usleep usleep "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_write write "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getuid getuid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_geteuid geteuid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getgid getgid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_getegid getegid "libc.a/shr_64.o"
+
+//go:cgo_import_dynamic libpthread___pth_init __pth_init "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_destroy pthread_attr_destroy "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_init pthread_attr_init "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_getstacksize pthread_attr_getstacksize "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setstacksize pthread_attr_setstacksize "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setdetachstate pthread_attr_setdetachstate "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_attr_setstackaddr pthread_attr_setstackaddr "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_create pthread_create "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_sigthreadmask sigthreadmask "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_self pthread_self "libpthread.a/shr_xpg5_64.o"
+//go:cgo_import_dynamic libpthread_kill pthread_kill "libpthread.a/shr_xpg5_64.o"
+
+//go:linkname libc__Errno libc__Errno
+//go:linkname libc_clock_gettime libc_clock_gettime
+//go:linkname libc_close libc_close
+//go:linkname libc_exit libc_exit
+//go:linkname libc_getpid libc_getpid
+//go:linkname libc_getsystemcfg libc_getsystemcfg
+//go:linkname libc_kill libc_kill
+//go:linkname libc_madvise libc_madvise
+//go:linkname libc_malloc libc_malloc
+//go:linkname libc_mmap libc_mmap
+//go:linkname libc_mprotect libc_mprotect
+//go:linkname libc_munmap libc_munmap
+//go:linkname libc_open libc_open
+//go:linkname libc_pipe libc_pipe
+//go:linkname libc_raise libc_raise
+//go:linkname libc_read libc_read
+//go:linkname libc_sched_yield libc_sched_yield
+//go:linkname libc_sem_init libc_sem_init
+//go:linkname libc_sem_post libc_sem_post
+//go:linkname libc_sem_timedwait libc_sem_timedwait
+//go:linkname libc_sem_wait libc_sem_wait
+//go:linkname libc_setitimer libc_setitimer
+//go:linkname libc_sigaction libc_sigaction
+//go:linkname libc_sigaltstack libc_sigaltstack
+//go:linkname libc_sysconf libc_sysconf
+//go:linkname libc_usleep libc_usleep
+//go:linkname libc_write libc_write
+//go:linkname libc_getuid libc_getuid
+//go:linkname libc_geteuid libc_geteuid
+//go:linkname libc_getgid libc_getgid
+//go:linkname libc_getegid libc_getegid
+
+//go:linkname libpthread___pth_init libpthread___pth_init
+//go:linkname libpthread_attr_destroy libpthread_attr_destroy
+//go:linkname libpthread_attr_init libpthread_attr_init
+//go:linkname libpthread_attr_getstacksize libpthread_attr_getstacksize
+//go:linkname libpthread_attr_setstacksize libpthread_attr_setstacksize
+//go:linkname libpthread_attr_setdetachstate libpthread_attr_setdetachstate
+//go:linkname libpthread_attr_setstackaddr libpthread_attr_setstackaddr
+//go:linkname libpthread_create libpthread_create
+//go:linkname libpthread_sigthreadmask libpthread_sigthreadmask
+//go:linkname libpthread_self libpthread_self
+//go:linkname libpthread_kill libpthread_kill
+
+var (
+ //libc
+ libc__Errno,
+ libc_clock_gettime,
+ libc_close,
+ libc_exit,
+ libc_getpid,
+ libc_getsystemcfg,
+ libc_kill,
+ libc_madvise,
+ libc_malloc,
+ libc_mmap,
+ libc_mprotect,
+ libc_munmap,
+ libc_open,
+ libc_pipe,
+ libc_raise,
+ libc_read,
+ libc_sched_yield,
+ libc_sem_init,
+ libc_sem_post,
+ libc_sem_timedwait,
+ libc_sem_wait,
+ libc_setitimer,
+ libc_sigaction,
+ libc_sigaltstack,
+ libc_sysconf,
+ libc_usleep,
+ libc_write,
+ libc_getuid,
+ libc_geteuid,
+ libc_getgid,
+ libc_getegid,
+ //libpthread
+ libpthread___pth_init,
+ libpthread_attr_destroy,
+ libpthread_attr_init,
+ libpthread_attr_getstacksize,
+ libpthread_attr_setstacksize,
+ libpthread_attr_setdetachstate,
+ libpthread_attr_setstackaddr,
+ libpthread_create,
+ libpthread_sigthreadmask,
+ libpthread_self,
+ libpthread_kill libFunc
+)
+
+type libFunc uintptr
+
+// asmsyscall6 calls the libc symbol using a C convention.
+// It's defined in sys_aix_ppc64.go.
+var asmsyscall6 libFunc
+
+// syscallX functions must always be called with g != nil and m != nil,
+// as they rely on g.m.libcall to pass arguments to asmcgocall.
+// The few call sites that may not have a g or an m must instead use the
+// equivalent functions in sys_aix_ppc64.s.
+
+//go:nowritebarrier
+//go:nosplit
+func syscall0(fn *libFunc) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&fn)), // it's unused but must be non-nil, otherwise crashes
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+func syscall1(fn *libFunc, a0 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall2(fn *libFunc, a0, a1 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall3(fn *libFunc, a0, a1, a2 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall4(fn *libFunc, a0, a1, a2, a3 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall5(fn *libFunc, a0, a1, a2, a3, a4 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 5,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+//go:nowritebarrier
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall6(fn *libFunc, a0, a1, a2, a3, a4, a5 uintptr) (r, err uintptr) {
+ gp := getg()
+ mp := gp.m
+ resetLibcall := true
+ if mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ resetLibcall = false // See comment in sys_darwin.go:libcCall
+ }
+
+ c := libcall{
+ fn: uintptr(unsafe.Pointer(fn)),
+ n: 6,
+ args: uintptr(unsafe.Pointer(&a0)),
+ }
+
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+
+ return c.r1, c.err
+}
+
+func exit1(code int32)
+
+//go:nosplit
+func exit(code int32) {
+ gp := getg()
+
+	// Check the validity of g because we can be called without a g
+	// during newosproc0.
+ if gp != nil {
+ syscall1(&libc_exit, uintptr(code))
+ return
+ }
+ exit1(code)
+}
+
+func write2(fd, p uintptr, n int32) int32
+
+//go:nosplit
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ gp := getg()
+
+	// Check the validity of g because we can be called without a g
+	// during newosproc0.
+ if gp != nil {
+ r, errno := syscall3(&libc_write, uintptr(fd), uintptr(p), uintptr(n))
+ if int32(r) < 0 {
+ return -int32(errno)
+ }
+ return int32(r)
+ }
+ // Note that in this case we can't return a valid errno value.
+ return write2(fd, uintptr(p), n)
+
+}
+
+//go:nosplit
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ r, errno := syscall3(&libc_read, uintptr(fd), uintptr(p), uintptr(n))
+ if int32(r) < 0 {
+ return -int32(errno)
+ }
+ return int32(r)
+}
+
+//go:nosplit
+func open(name *byte, mode, perm int32) int32 {
+ r, _ := syscall3(&libc_open, uintptr(unsafe.Pointer(name)), uintptr(mode), uintptr(perm))
+ return int32(r)
+}
+
+//go:nosplit
+func closefd(fd int32) int32 {
+ r, _ := syscall1(&libc_close, uintptr(fd))
+ return int32(r)
+}
+
+//go:nosplit
+func pipe() (r, w int32, errno int32) {
+ var p [2]int32
+ _, err := syscall1(&libc_pipe, uintptr(noescape(unsafe.Pointer(&p[0]))))
+ return p[0], p[1], int32(err)
+}
+
+// mmap calls the mmap system call.
+// We only pass the lower 32 bits of the file offset to the
+// assembly routine; the higher bits (if required) should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
+//
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ r, err0 := syscall6(&libc_mmap, uintptr(addr), uintptr(n), uintptr(prot), uintptr(flags), uintptr(fd), uintptr(off))
+ if r == ^uintptr(0) {
+ return nil, int(err0)
+ }
+ return unsafe.Pointer(r), int(err0)
+}
+
+//go:nosplit
+func mprotect(addr unsafe.Pointer, n uintptr, prot int32) (unsafe.Pointer, int) {
+ r, err0 := syscall3(&libc_mprotect, uintptr(addr), uintptr(n), uintptr(prot))
+ if r == ^uintptr(0) {
+ return nil, int(err0)
+ }
+ return unsafe.Pointer(r), int(err0)
+}
+
+//go:nosplit
+func munmap(addr unsafe.Pointer, n uintptr) {
+ r, err := syscall2(&libc_munmap, uintptr(addr), uintptr(n))
+ if int32(r) == -1 {
+ println("syscall munmap failed: ", hex(err))
+ throw("syscall munmap")
+ }
+}
+
+//go:nosplit
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ r, err := syscall3(&libc_madvise, uintptr(addr), uintptr(n), uintptr(flags))
+ if int32(r) == -1 {
+ println("syscall madvise failed: ", hex(err))
+ throw("syscall madvise")
+ }
+}
+
+func sigaction1(sig, new, old uintptr)
+
+//go:nosplit
+func sigaction(sig uintptr, new, old *sigactiont) {
+ gp := getg()
+
+	// Check the validity of g because we can be called without a g
+	// during runtime.libpreinit.
+ if gp != nil {
+ r, err := syscall3(&libc_sigaction, sig, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("Sigaction failed for sig: ", sig, " with error:", hex(err))
+ throw("syscall sigaction")
+ }
+ return
+ }
+
+ sigaction1(sig, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+}
+
+//go:nosplit
+func sigaltstack(new, old *stackt) {
+ r, err := syscall2(&libc_sigaltstack, uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("syscall sigaltstack failed: ", hex(err))
+ throw("syscall sigaltstack")
+ }
+}
+
+//go:nosplit
+//go:linkname internal_cpu_getsystemcfg internal/cpu.getsystemcfg
+func internal_cpu_getsystemcfg(label uint) uint {
+ r, _ := syscall1(&libc_getsystemcfg, uintptr(label))
+ return uint(r)
+}
+
+func usleep1(us uint32)
+
+//go:nosplit
+func usleep_no_g(us uint32) {
+ usleep1(us)
+}
+
+//go:nosplit
+func usleep(us uint32) {
+ r, err := syscall1(&libc_usleep, uintptr(us))
+ if int32(r) == -1 {
+ println("syscall usleep failed: ", hex(err))
+ throw("syscall usleep")
+ }
+}
+
+//go:nosplit
+func clock_gettime(clockid int32, tp *timespec) int32 {
+ r, _ := syscall2(&libc_clock_gettime, uintptr(clockid), uintptr(unsafe.Pointer(tp)))
+ return int32(r)
+}
+
+//go:nosplit
+func setitimer(mode int32, new, old *itimerval) {
+ r, err := syscall3(&libc_setitimer, uintptr(mode), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) == -1 {
+ println("syscall setitimer failed: ", hex(err))
+ throw("syscall setitimer")
+ }
+}
+
+//go:nosplit
+func malloc(size uintptr) unsafe.Pointer {
+ r, _ := syscall1(&libc_malloc, size)
+ return unsafe.Pointer(r)
+}
+
+//go:nosplit
+func sem_init(sem *semt, pshared int32, value uint32) int32 {
+ r, _ := syscall3(&libc_sem_init, uintptr(unsafe.Pointer(sem)), uintptr(pshared), uintptr(value))
+ return int32(r)
+}
+
+//go:nosplit
+func sem_wait(sem *semt) (int32, int32) {
+ r, err := syscall1(&libc_sem_wait, uintptr(unsafe.Pointer(sem)))
+ return int32(r), int32(err)
+}
+
+//go:nosplit
+func sem_post(sem *semt) int32 {
+ r, _ := syscall1(&libc_sem_post, uintptr(unsafe.Pointer(sem)))
+ return int32(r)
+}
+
+//go:nosplit
+func sem_timedwait(sem *semt, timeout *timespec) (int32, int32) {
+ r, err := syscall2(&libc_sem_timedwait, uintptr(unsafe.Pointer(sem)), uintptr(unsafe.Pointer(timeout)))
+ return int32(r), int32(err)
+}
+
+//go:nosplit
+func raise(sig uint32) {
+ r, err := syscall1(&libc_raise, uintptr(sig))
+ if int32(r) == -1 {
+ println("syscall raise failed: ", hex(err))
+ throw("syscall raise")
+ }
+}
+
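+// raiseproc sends sig to the whole process by looking up our own pid
+// with getpid and then calling kill.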
+//go:nosplit
+func raiseproc(sig uint32) {
+ pid, err := syscall0(&libc_getpid)
+ if int32(pid) == -1 {
+ println("syscall getpid failed: ", hex(err))
+ throw("syscall raiseproc")
+ }
+
+ syscall2(&libc_kill, pid, uintptr(sig))
+}
+
+func osyield1()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield1()
+}
+
+//go:nosplit
+func osyield() {
+ r, err := syscall0(&libc_sched_yield)
+ if int32(r) == -1 {
+ println("syscall osyield failed: ", hex(err))
+ throw("syscall osyield")
+ }
+}
+
+//go:nosplit
+func sysconf(name int32) uintptr {
+ r, _ := syscall1(&libc_sysconf, uintptr(name))
+ if int32(r) == -1 {
+ throw("syscall sysconf")
+ }
+ return r
+
+}
+
+// The pthread functions return their error code in the main return value.
+// Therefore, the err returned by syscall means nothing and must not be used.
+
+//go:nosplit
+func pthread_attr_destroy(attr *pthread_attr) int32 {
+ r, _ := syscall1(&libpthread_attr_destroy, uintptr(unsafe.Pointer(attr)))
+ return int32(r)
+}
+
+func pthread_attr_init1(attr uintptr) int32
+
+//go:nosplit
+func pthread_attr_init(attr *pthread_attr) int32 {
+ gp := getg()
+
+ // Check the validity of g, because this can be called without
+ // a g during newosproc0.
+ if gp != nil {
+ r, _ := syscall1(&libpthread_attr_init, uintptr(unsafe.Pointer(attr)))
+ return int32(r)
+ }
+
+ return pthread_attr_init1(uintptr(unsafe.Pointer(attr)))
+}
+
+func pthread_attr_setdetachstate1(attr uintptr, state int32) int32
+
+//go:nosplit
+func pthread_attr_setdetachstate(attr *pthread_attr, state int32) int32 {
+ gp := getg()
+
+ // Check the validity of g, because this can be called without
+ // a g during newosproc0.
+ if gp != nil {
+ r, _ := syscall2(&libpthread_attr_setdetachstate, uintptr(unsafe.Pointer(attr)), uintptr(state))
+ return int32(r)
+ }
+
+ return pthread_attr_setdetachstate1(uintptr(unsafe.Pointer(attr)), state)
+}
+
+//go:nosplit
+func pthread_attr_setstackaddr(attr *pthread_attr, stk unsafe.Pointer) int32 {
+ r, _ := syscall2(&libpthread_attr_setstackaddr, uintptr(unsafe.Pointer(attr)), uintptr(stk))
+ return int32(r)
+}
+
+//go:nosplit
+func pthread_attr_getstacksize(attr *pthread_attr, size *uint64) int32 {
+ r, _ := syscall2(&libpthread_attr_getstacksize, uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(size)))
+ return int32(r)
+}
+
+func pthread_attr_setstacksize1(attr uintptr, size uint64) int32
+
+//go:nosplit
+func pthread_attr_setstacksize(attr *pthread_attr, size uint64) int32 {
+ gp := getg()
+
+ // Check the validity of g, because this can be called without
+ // a g during newosproc0.
+ if gp != nil {
+ r, _ := syscall2(&libpthread_attr_setstacksize, uintptr(unsafe.Pointer(attr)), uintptr(size))
+ return int32(r)
+ }
+
+ return pthread_attr_setstacksize1(uintptr(unsafe.Pointer(attr)), size)
+}
+
+func pthread_create1(tid, attr, fn, arg uintptr) int32
+
+//go:nosplit
+func pthread_create(tid *pthread, attr *pthread_attr, fn *funcDescriptor, arg unsafe.Pointer) int32 {
+ gp := getg()
+
+ // Check the validity of g, because this can be called without
+ // a g during newosproc0.
+ if gp != nil {
+ r, _ := syscall4(&libpthread_create, uintptr(unsafe.Pointer(tid)), uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(fn)), uintptr(arg))
+ return int32(r)
+ }
+
+ return pthread_create1(uintptr(unsafe.Pointer(tid)), uintptr(unsafe.Pointer(attr)), uintptr(unsafe.Pointer(fn)), uintptr(arg))
+}
+
+// In a multi-threaded program, sigprocmask must not be called;
+// it is replaced by sigthreadmask.
+func sigprocmask1(how, new, old uintptr)
+
+//go:nosplit
+func sigprocmask(how int32, new, old *sigset) {
+ gp := getg()
+
+ // Check the validity of m because this might be called during a cgo
+ // callback early enough that m isn't available yet.
+ if gp != nil && gp.m != nil {
+ r, err := syscall3(&libpthread_sigthreadmask, uintptr(how), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+ if int32(r) != 0 {
+ println("syscall sigthreadmask failed: ", hex(err))
+ throw("syscall sigthreadmask")
+ }
+ return
+ }
+ sigprocmask1(uintptr(how), uintptr(unsafe.Pointer(new)), uintptr(unsafe.Pointer(old)))
+
+}
+
+//go:nosplit
+func pthread_self() pthread {
+ r, _ := syscall0(&libpthread_self)
+ return pthread(r)
+}
+
+//go:nosplit
+func signalM(mp *m, sig int) {
+ syscall2(&libpthread_kill, uintptr(pthread(mp.procid)), uintptr(sig))
+}
diff --git a/src/runtime/os2_freebsd.go b/src/runtime/os2_freebsd.go
new file mode 100644
index 0000000..29f0b76
--- /dev/null
+++ b/src/runtime/os2_freebsd.go
@@ -0,0 +1,14 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 4
+ _NSIG = 33
+ _SI_USER = 0x10001
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
diff --git a/src/runtime/os2_openbsd.go b/src/runtime/os2_openbsd.go
new file mode 100644
index 0000000..8656a91
--- /dev/null
+++ b/src/runtime/os2_openbsd.go
@@ -0,0 +1,14 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 33
+ _SI_USER = 0
+)
diff --git a/src/runtime/os2_plan9.go b/src/runtime/os2_plan9.go
new file mode 100644
index 0000000..58fb2be
--- /dev/null
+++ b/src/runtime/os2_plan9.go
@@ -0,0 +1,74 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Plan 9-specific system calls
+
+package runtime
+
+// open
+const (
+ _OREAD = 0
+ _OWRITE = 1
+ _ORDWR = 2
+ _OEXEC = 3
+ _OTRUNC = 16
+ _OCEXEC = 32
+ _ORCLOSE = 64
+ _OEXCL = 0x1000
+)
+
+// rfork
+const (
+ _RFNAMEG = 1 << 0
+ _RFENVG = 1 << 1
+ _RFFDG = 1 << 2
+ _RFNOTEG = 1 << 3
+ _RFPROC = 1 << 4
+ _RFMEM = 1 << 5
+ _RFNOWAIT = 1 << 6
+ _RFCNAMEG = 1 << 10
+ _RFCENVG = 1 << 11
+ _RFCFDG = 1 << 12
+ _RFREND = 1 << 13
+ _RFNOMNT = 1 << 14
+)
+
+// notify
+const (
+ _NCONT = 0
+ _NDFLT = 1
+)
+
+type uinptr _Plink
+
+type tos struct {
+ prof struct { // Per process profiling
+ pp *_Plink // known to be 0(ptr)
+ next *_Plink // known to be 4(ptr)
+ last *_Plink
+ first *_Plink
+ pid uint32
+ what uint32
+ }
+ cyclefreq uint64 // cycle clock frequency if there is one, 0 otherwise
+ kcycles int64 // cycles spent in kernel
+ pcycles int64 // cycles spent in process (kernel + user)
+ pid uint32 // might as well put the pid here
+ clock uint32
+ // top of stack is here
+}
+
+const (
+ _NSIG = 14 // number of signals in sigtable array
+ _ERRMAX = 128 // max length of note string
+
+ // Notes in runtime·sigtab that are handled by runtime·sigpanic.
+ _SIGRFAULT = 2
+ _SIGWFAULT = 3
+ _SIGINTDIV = 4
+ _SIGFLOAT = 5
+ _SIGTRAP = 6
+ _SIGPROF = 0 // dummy value defined for badsignal
+ _SIGQUIT = 0 // dummy value defined for sighandler
+)
diff --git a/src/runtime/os2_solaris.go b/src/runtime/os2_solaris.go
new file mode 100644
index 0000000..108bea6
--- /dev/null
+++ b/src/runtime/os2_solaris.go
@@ -0,0 +1,13 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 73 /* number of signals in sigtable array */
+ _SI_USER = 0
+)
diff --git a/src/runtime/os3_plan9.go b/src/runtime/os3_plan9.go
new file mode 100644
index 0000000..8c9cbe2
--- /dev/null
+++ b/src/runtime/os3_plan9.go
@@ -0,0 +1,166 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func sighandler(_ureg *ureg, note *byte, gp *g) int {
+ gsignal := getg()
+ mp := gsignal.m
+
+ var t sigTabT
+ var docrash bool
+ var sig int
+ var flags int
+ var level int32
+
+ c := &sigctxt{_ureg}
+ notestr := gostringnocopy(note)
+
+ // The kernel will never pass us a nil note or ureg so we probably
+ // made a mistake somewhere in sigtramp.
+ if _ureg == nil || note == nil {
+ print("sighandler: ureg ", _ureg, " note ", note, "\n")
+ goto Throw
+ }
+ // Check that the note is no more than ERRMAX bytes (including
+ // the trailing NUL). We should never receive a longer note.
+ if len(notestr) > _ERRMAX-1 {
+ print("sighandler: note is longer than ERRMAX\n")
+ goto Throw
+ }
+ if isAbortPC(c.pc()) {
+ // Never turn abort into a panic.
+ goto Throw
+ }
+ // See if the note matches one of the patterns in sigtab.
+ // Notes that do not match any pattern can be handled at a higher
+ // level by the program but will otherwise be ignored.
+ flags = _SigNotify
+ for sig, t = range sigtable {
+ if hasPrefix(notestr, t.name) {
+ flags = t.flags
+ break
+ }
+ }
+ if flags&_SigPanic != 0 && gp.throwsplit {
+ // We can't safely sigpanic because it may grow the
+ // stack. Abort in the signal handler instead.
+ flags = (flags &^ _SigPanic) | _SigThrow
+ }
+ if flags&_SigGoExit != 0 {
+ exits((*byte)(add(unsafe.Pointer(note), 9))) // Strip "go: exit " prefix.
+ }
+ if flags&_SigPanic != 0 {
+ // Copy the error string from sigtramp's stack into m->notesig so
+ // we can reliably access it from the panic routines.
+ memmove(unsafe.Pointer(mp.notesig), unsafe.Pointer(note), uintptr(len(notestr)+1))
+ gp.sig = uint32(sig)
+ gp.sigpc = c.pc()
+
+ pc := c.pc()
+ sp := c.sp()
+
+ // If we don't recognize the PC as code
+ // but we do recognize the top pointer on the stack as code,
+ // then assume this was a call to non-code and treat like
+ // pc == 0, to make unwinding show the context.
+ if pc != 0 && !findfunc(pc).valid() && findfunc(*(*uintptr)(unsafe.Pointer(sp))).valid() {
+ pc = 0
+ }
+
+ // If LR exists, sigpanictramp must save it to the stack
+ // before entry to sigpanic so that panics in leaf
+ // functions are correctly handled. This will smash
+ // the stack frame but we're not going back there
+ // anyway.
+ if usesLR {
+ c.savelr(c.lr())
+ }
+
+ // If PC == 0, the fault was probably a call to a nil func.
+ // In that case, don't fake pc as the return address: keeping the
+ // existing return address makes the trace look like a call to
+ // sigpanic from the caller. (Faking a zero return address would
+ // end the trace at sigpanic, and we wouldn't see who faulted.)
+ if pc != 0 {
+ if usesLR {
+ c.setlr(pc)
+ } else {
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = pc
+ c.setsp(sp)
+ }
+ }
+ if usesLR {
+ c.setpc(abi.FuncPCABI0(sigpanictramp))
+ } else {
+ c.setpc(abi.FuncPCABI0(sigpanic0))
+ }
+ return _NCONT
+ }
+ if flags&_SigNotify != 0 {
+ if ignoredNote(note) {
+ return _NCONT
+ }
+ if sendNote(note) {
+ return _NCONT
+ }
+ }
+ if flags&_SigKill != 0 {
+ goto Exit
+ }
+ if flags&_SigThrow == 0 {
+ return _NCONT
+ }
+Throw:
+ mp.throwing = throwTypeRuntime
+ mp.caughtsig.set(gp)
+ startpanic_m()
+ print(notestr, "\n")
+ print("PC=", hex(c.pc()), "\n")
+ print("\n")
+ level, _, docrash = gotraceback()
+ if level > 0 {
+ goroutineheader(gp)
+ tracebacktrap(c.pc(), c.sp(), c.lr(), gp)
+ tracebackothers(gp)
+ print("\n")
+ dumpregs(_ureg)
+ }
+ if docrash {
+ crash()
+ }
+Exit:
+ goexitsall(note)
+ exits(note)
+ return _NDFLT // not reached
+}
+
+func sigenable(sig uint32) {
+}
+
+func sigdisable(sig uint32) {
+}
+
+func sigignore(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+}
+
+func setThreadCPUProfiler(hz int32) {
+ // TODO: Enable profiling interrupts.
+ getg().m.profilehz = hz
+}
+
+// gsignalStack is unused on Plan 9.
+type gsignalStack struct{}
diff --git a/src/runtime/os3_solaris.go b/src/runtime/os3_solaris.go
new file mode 100644
index 0000000..046d173
--- /dev/null
+++ b/src/runtime/os3_solaris.go
@@ -0,0 +1,637 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+//go:cgo_export_dynamic runtime.end _end
+//go:cgo_export_dynamic runtime.etext _etext
+//go:cgo_export_dynamic runtime.edata _edata
+
+//go:cgo_import_dynamic libc____errno ___errno "libc.so"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.so"
+//go:cgo_import_dynamic libc_exit _exit "libc.so"
+//go:cgo_import_dynamic libc_getcontext getcontext "libc.so"
+//go:cgo_import_dynamic libc_kill kill "libc.so"
+//go:cgo_import_dynamic libc_madvise madvise "libc.so"
+//go:cgo_import_dynamic libc_malloc malloc "libc.so"
+//go:cgo_import_dynamic libc_mmap mmap "libc.so"
+//go:cgo_import_dynamic libc_munmap munmap "libc.so"
+//go:cgo_import_dynamic libc_open open "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_destroy pthread_attr_destroy "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_getstack pthread_attr_getstack "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "libc.so"
+//go:cgo_import_dynamic libc_pthread_attr_setstack pthread_attr_setstack "libc.so"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "libc.so"
+//go:cgo_import_dynamic libc_pthread_self pthread_self "libc.so"
+//go:cgo_import_dynamic libc_pthread_kill pthread_kill "libc.so"
+//go:cgo_import_dynamic libc_raise raise "libc.so"
+//go:cgo_import_dynamic libc_read read "libc.so"
+//go:cgo_import_dynamic libc_select select "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+//go:cgo_import_dynamic libc_sem_init sem_init "libc.so"
+//go:cgo_import_dynamic libc_sem_post sem_post "libc.so"
+//go:cgo_import_dynamic libc_sem_reltimedwait_np sem_reltimedwait_np "libc.so"
+//go:cgo_import_dynamic libc_sem_wait sem_wait "libc.so"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.so"
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.so"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.so"
+//go:cgo_import_dynamic libc_sigprocmask sigprocmask "libc.so"
+//go:cgo_import_dynamic libc_sysconf sysconf "libc.so"
+//go:cgo_import_dynamic libc_usleep usleep "libc.so"
+//go:cgo_import_dynamic libc_write write "libc.so"
+//go:cgo_import_dynamic libc_pipe2 pipe2 "libc.so"
+
+//go:linkname libc____errno libc____errno
+//go:linkname libc_clock_gettime libc_clock_gettime
+//go:linkname libc_exit libc_exit
+//go:linkname libc_getcontext libc_getcontext
+//go:linkname libc_kill libc_kill
+//go:linkname libc_madvise libc_madvise
+//go:linkname libc_malloc libc_malloc
+//go:linkname libc_mmap libc_mmap
+//go:linkname libc_munmap libc_munmap
+//go:linkname libc_open libc_open
+//go:linkname libc_pthread_attr_destroy libc_pthread_attr_destroy
+//go:linkname libc_pthread_attr_getstack libc_pthread_attr_getstack
+//go:linkname libc_pthread_attr_init libc_pthread_attr_init
+//go:linkname libc_pthread_attr_setdetachstate libc_pthread_attr_setdetachstate
+//go:linkname libc_pthread_attr_setstack libc_pthread_attr_setstack
+//go:linkname libc_pthread_create libc_pthread_create
+//go:linkname libc_pthread_self libc_pthread_self
+//go:linkname libc_pthread_kill libc_pthread_kill
+//go:linkname libc_raise libc_raise
+//go:linkname libc_read libc_read
+//go:linkname libc_select libc_select
+//go:linkname libc_sched_yield libc_sched_yield
+//go:linkname libc_sem_init libc_sem_init
+//go:linkname libc_sem_post libc_sem_post
+//go:linkname libc_sem_reltimedwait_np libc_sem_reltimedwait_np
+//go:linkname libc_sem_wait libc_sem_wait
+//go:linkname libc_setitimer libc_setitimer
+//go:linkname libc_sigaction libc_sigaction
+//go:linkname libc_sigaltstack libc_sigaltstack
+//go:linkname libc_sigprocmask libc_sigprocmask
+//go:linkname libc_sysconf libc_sysconf
+//go:linkname libc_usleep libc_usleep
+//go:linkname libc_write libc_write
+//go:linkname libc_pipe2 libc_pipe2
+
+var (
+ libc____errno,
+ libc_clock_gettime,
+ libc_exit,
+ libc_getcontext,
+ libc_kill,
+ libc_madvise,
+ libc_malloc,
+ libc_mmap,
+ libc_munmap,
+ libc_open,
+ libc_pthread_attr_destroy,
+ libc_pthread_attr_getstack,
+ libc_pthread_attr_init,
+ libc_pthread_attr_setdetachstate,
+ libc_pthread_attr_setstack,
+ libc_pthread_create,
+ libc_pthread_self,
+ libc_pthread_kill,
+ libc_raise,
+ libc_read,
+ libc_sched_yield,
+ libc_select,
+ libc_sem_init,
+ libc_sem_post,
+ libc_sem_reltimedwait_np,
+ libc_sem_wait,
+ libc_setitimer,
+ libc_sigaction,
+ libc_sigaltstack,
+ libc_sigprocmask,
+ libc_sysconf,
+ libc_usleep,
+ libc_write,
+ libc_pipe2 libcFunc
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+func getPageSize() uintptr {
+ n := int32(sysconf(__SC_PAGESIZE))
+ if n <= 0 {
+ return 0
+ }
+ return uintptr(n)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+func tstart_sysvicall(newm *m) uint32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ var (
+ attr pthreadattr
+ oset sigset
+ tid pthread
+ ret int32
+ size uint64
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ throw("pthread_attr_init")
+ }
+ // Allocate a new 2MB stack.
+ if pthread_attr_setstack(&attr, 0, 0x200000) != 0 {
+ throw("pthread_attr_setstack")
+ }
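+ // Passing a zero stack address appears to ask libc to allocate the 2MB stack itself.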
+ // Read back the allocated stack.
+ if pthread_attr_getstack(&attr, unsafe.Pointer(&mp.g0.stack.hi), &size) != 0 {
+ throw("pthread_attr_getstack")
+ }
+ mp.g0.stack.lo = mp.g0.stack.hi - uintptr(size)
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ throw("pthread_attr_setdetachstate")
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret = retryOnEAGAIN(func() int32 {
+ return pthread_create(&tid, &attr, abi.FuncPCABI0(tstart_sysvicall), unsafe.Pointer(mp))
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+}
+
+func exitThread(wait *atomic.Uint32) {
+ // We should never reach exitThread on Solaris because we let
+ // libc clean up threads.
+ throw("exitThread")
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
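+// miniterrno is implemented in assembly. It calls the ___errno libc
+// function passed to it by minit and records the returned errno pointer
+// in m.perrno.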
+func miniterrno()
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ asmcgocall(unsafe.Pointer(abi.FuncPCABI0(miniterrno)), unsafe.Pointer(&libc____errno))
+
+ minitSignals()
+
+ getg().m.procid = uint64(pthread_self())
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = abi.FuncPCABI0(sigtramp)
+ }
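+ // _funcptr overlays the sa_handler/sa_sigaction union in sigactiont,
+ // so store the handler through a uintptr pointer.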
+ *((*uintptr)(unsafe.Pointer(&sa._funcptr))) = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return *((*uintptr)(unsafe.Pointer(&sa._funcptr)))
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
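+// sigaddset sets the bit for signal i in mask. Signal numbers start
+// at 1, so signal i maps to bit i-1 of the bit vector.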
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__sigbits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__sigbits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+
+ var sem *semt
+
+ // Call libc's malloc rather than the Go runtime's allocator. This will
+ // allocate space on the C heap. We can't call mallocgc
+ // here because it could cause a deadlock.
+ mp.libcall.fn = uintptr(unsafe.Pointer(&libc_malloc))
+ mp.libcall.n = 1
+ mp.scratch = mscratch{}
+ mp.scratch.v[0] = unsafe.Sizeof(*sem)
+ mp.libcall.args = uintptr(unsafe.Pointer(&mp.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&mp.libcall))
+ sem = (*semt)(unsafe.Pointer(mp.libcall.r1))
+ if sem_init(sem, 0, 0) != 0 {
+ throw("sem_init")
+ }
+ mp.waitsema = uintptr(unsafe.Pointer(sem))
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ mp := getg().m
+ if ns >= 0 {
+ mp.ts.tv_sec = ns / 1000000000
+ mp.ts.tv_nsec = ns % 1000000000
+
+ mp.libcall.fn = uintptr(unsafe.Pointer(&libc_sem_reltimedwait_np))
+ mp.libcall.n = 2
+ mp.scratch = mscratch{}
+ mp.scratch.v[0] = mp.waitsema
+ mp.scratch.v[1] = uintptr(unsafe.Pointer(&mp.ts))
+ mp.libcall.args = uintptr(unsafe.Pointer(&mp.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&mp.libcall))
+ if *mp.perrno != 0 {
+ if *mp.perrno == _ETIMEDOUT || *mp.perrno == _EAGAIN || *mp.perrno == _EINTR {
+ return -1
+ }
+ throw("sem_reltimedwait_np")
+ }
+ return 0
+ }
+ for {
+ mp.libcall.fn = uintptr(unsafe.Pointer(&libc_sem_wait))
+ mp.libcall.n = 1
+ mp.scratch = mscratch{}
+ mp.scratch.v[0] = mp.waitsema
+ mp.libcall.args = uintptr(unsafe.Pointer(&mp.scratch))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&mp.libcall))
+ if mp.libcall.r1 == 0 {
+ break
+ }
+ if *mp.perrno == _EINTR {
+ continue
+ }
+ throw("sem_wait")
+ }
+ return 0
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if sem_post((*semt)(unsafe.Pointer(mp.waitsema))) != 0 {
+ throw("sem_post")
+ }
+}
+
+//go:nosplit
+func closefd(fd int32) int32 {
+ return int32(sysvicall1(&libc_close, uintptr(fd)))
+}
+
+//go:nosplit
+func exit(r int32) {
+ sysvicall1(&libc_exit, uintptr(r))
+}
+
+//go:nosplit
+func getcontext(context *ucontext) /* int32 */ {
+ sysvicall1(&libc_getcontext, uintptr(unsafe.Pointer(context)))
+}
+
+//go:nosplit
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ sysvicall3(&libc_madvise, uintptr(addr), uintptr(n), uintptr(flags))
+}
+
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ p, err := doMmap(uintptr(addr), n, uintptr(prot), uintptr(flags), uintptr(fd), uintptr(off))
+ if p == ^uintptr(0) {
+ return nil, int(err)
+ }
+ return unsafe.Pointer(p), 0
+}
+
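+// doMmap builds the libcall by hand: go:cgo_unsafe_args keeps the six
+// arguments addressable as one contiguous block, so &addr serves as the
+// argument vector handed to libc mmap through asmsysvicall6x.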
+//go:nosplit
+//go:cgo_unsafe_args
+func doMmap(addr, n, prot, flags, fd, off uintptr) (uintptr, uintptr) {
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(&libc_mmap))
+ libcall.n = 6
+ libcall.args = uintptr(noescape(unsafe.Pointer(&addr)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func munmap(addr unsafe.Pointer, n uintptr) {
+ sysvicall2(&libc_munmap, uintptr(addr), uintptr(n))
+}
+
+const (
+ _CLOCK_REALTIME = 3
+ _CLOCK_MONOTONIC = 4
+)
+
+//go:nosplit
+func nanotime1() int64 {
+ var ts mts
+ sysvicall2(&libc_clock_gettime, _CLOCK_MONOTONIC, uintptr(unsafe.Pointer(&ts)))
+ return ts.tv_sec*1e9 + ts.tv_nsec
+}
+
+//go:nosplit
+func open(path *byte, mode, perm int32) int32 {
+ return int32(sysvicall3(&libc_open, uintptr(unsafe.Pointer(path)), uintptr(mode), uintptr(perm)))
+}
+
+func pthread_attr_destroy(attr *pthreadattr) int32 {
+ return int32(sysvicall1(&libc_pthread_attr_destroy, uintptr(unsafe.Pointer(attr))))
+}
+
+func pthread_attr_getstack(attr *pthreadattr, addr unsafe.Pointer, size *uint64) int32 {
+ return int32(sysvicall3(&libc_pthread_attr_getstack, uintptr(unsafe.Pointer(attr)), uintptr(addr), uintptr(unsafe.Pointer(size))))
+}
+
+func pthread_attr_init(attr *pthreadattr) int32 {
+ return int32(sysvicall1(&libc_pthread_attr_init, uintptr(unsafe.Pointer(attr))))
+}
+
+func pthread_attr_setdetachstate(attr *pthreadattr, state int32) int32 {
+ return int32(sysvicall2(&libc_pthread_attr_setdetachstate, uintptr(unsafe.Pointer(attr)), uintptr(state)))
+}
+
+func pthread_attr_setstack(attr *pthreadattr, addr uintptr, size uint64) int32 {
+ return int32(sysvicall3(&libc_pthread_attr_setstack, uintptr(unsafe.Pointer(attr)), uintptr(addr), uintptr(size)))
+}
+
+func pthread_create(thread *pthread, attr *pthreadattr, fn uintptr, arg unsafe.Pointer) int32 {
+ return int32(sysvicall4(&libc_pthread_create, uintptr(unsafe.Pointer(thread)), uintptr(unsafe.Pointer(attr)), uintptr(fn), uintptr(arg)))
+}
+
+func pthread_self() pthread {
+ return pthread(sysvicall0(&libc_pthread_self))
+}
+
+func signalM(mp *m, sig int) {
+ sysvicall2(&libc_pthread_kill, uintptr(pthread(mp.procid)), uintptr(sig))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func raise(sig uint32) /* int32 */ {
+ sysvicall1(&libc_raise, uintptr(sig))
+}
+
+func raiseproc(sig uint32) /* int32 */ {
+ pid := sysvicall0(&libc_getpid)
+ sysvicall2(&libc_kill, pid, uintptr(sig))
+}
+
+//go:nosplit
+func read(fd int32, buf unsafe.Pointer, nbyte int32) int32 {
+ r1, err := sysvicall3Err(&libc_read, uintptr(fd), uintptr(buf), uintptr(nbyte))
+ if c := int32(r1); c >= 0 {
+ return c
+ }
+ return -int32(err)
+}
+
+//go:nosplit
+func sem_init(sem *semt, pshared int32, value uint32) int32 {
+ return int32(sysvicall3(&libc_sem_init, uintptr(unsafe.Pointer(sem)), uintptr(pshared), uintptr(value)))
+}
+
+//go:nosplit
+func sem_post(sem *semt) int32 {
+ return int32(sysvicall1(&libc_sem_post, uintptr(unsafe.Pointer(sem))))
+}
+
+//go:nosplit
+func sem_reltimedwait_np(sem *semt, timeout *timespec) int32 {
+ return int32(sysvicall2(&libc_sem_reltimedwait_np, uintptr(unsafe.Pointer(sem)), uintptr(unsafe.Pointer(timeout))))
+}
+
+//go:nosplit
+func sem_wait(sem *semt) int32 {
+ return int32(sysvicall1(&libc_sem_wait, uintptr(unsafe.Pointer(sem))))
+}
+
+func setitimer(which int32, value *itimerval, ovalue *itimerval) /* int32 */ {
+ sysvicall3(&libc_setitimer, uintptr(which), uintptr(unsafe.Pointer(value)), uintptr(unsafe.Pointer(ovalue)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, act *sigactiont, oact *sigactiont) /* int32 */ {
+ sysvicall3(&libc_sigaction, uintptr(sig), uintptr(unsafe.Pointer(act)), uintptr(unsafe.Pointer(oact)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaltstack(ss *stackt, oss *stackt) /* int32 */ {
+ sysvicall2(&libc_sigaltstack, uintptr(unsafe.Pointer(ss)), uintptr(unsafe.Pointer(oss)))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, set *sigset, oset *sigset) /* int32 */ {
+ sysvicall3(&libc_sigprocmask, uintptr(how), uintptr(unsafe.Pointer(set)), uintptr(unsafe.Pointer(oset)))
+}
+
+func sysconf(name int32) int64 {
+ return int64(sysvicall1(&libc_sysconf, uintptr(name)))
+}
+
+func usleep1(usec uint32)
+
+//go:nosplit
+func usleep_no_g(µs uint32) {
+ usleep1(µs)
+}
+
+//go:nosplit
+func usleep(µs uint32) {
+ usleep1(µs)
+}
+
+func walltime() (sec int64, nsec int32) {
+ var ts mts
+ sysvicall2(&libc_clock_gettime, _CLOCK_REALTIME, uintptr(unsafe.Pointer(&ts)))
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, nbyte int32) int32 {
+ r1, err := sysvicall3Err(&libc_write, fd, uintptr(buf), uintptr(nbyte))
+ if c := int32(r1); c >= 0 {
+ return c
+ }
+ return -int32(err)
+}
+
+//go:nosplit
+func pipe2(flags int32) (r, w int32, errno int32) {
+ var p [2]int32
+ _, e := sysvicall2Err(&libc_pipe2, uintptr(noescape(unsafe.Pointer(&p))), uintptr(flags))
+ return p[0], p[1], int32(e)
+}
+
+//go:nosplit
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32) {
+ r1, err := sysvicall3Err(&libc_fcntl, uintptr(fd), uintptr(cmd), uintptr(arg))
+ return int32(r1), int32(err)
+}
+
+func osyield1()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield1()
+}
+
+//go:nosplit
+func osyield() {
+ sysvicall0(&libc_sched_yield)
+}
+
+//go:linkname executablePath os.executablePath
+var executablePath string
+
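+// sysargs locates the auxiliary vector, which follows the argv and
+// environment arrays on the initial stack, and passes it to sysauxv.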
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxvp := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*goarch.PtrSize))
+ pairs := sysauxv(auxvp[:])
+ auxv = auxvp[: pairs*2 : pairs*2]
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+ _AT_SUN_EXECNAME = 2014 // exec() path name
+)
+
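+// sysauxv scans the tag/value pairs of the auxiliary vector until
+// _AT_NULL, recording the page size and executable path, and returns
+// the number of pairs seen.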
+func sysauxv(auxv []uintptr) (pairs int) {
+ var i int
+ for i = 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ case _AT_SUN_EXECNAME:
+ executablePath = gostringnocopy((*byte)(unsafe.Pointer(val)))
+ }
+ }
+ return i / 2
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
diff --git a/src/runtime/os_aix.go b/src/runtime/os_aix.go
new file mode 100644
index 0000000..0583e9a
--- /dev/null
+++ b/src/runtime/os_aix.go
@@ -0,0 +1,415 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ threadStackSize = 0x100000 // size of a thread stack allocated by OS
+)
+
+// funcDescriptor is a structure representing a function descriptor.
+// A variable of this type is always created in assembly.
+type funcDescriptor struct {
+ fn uintptr
+ toc uintptr
+ envPointer uintptr // unused by Go
+}
+
+type mOS struct {
+ waitsema uintptr // semaphore for parking on locks
+ perrno uintptr // pointer to tls errno
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+
+ var sem *semt
+
+ // Call libc's malloc rather than the Go runtime's allocator. This will
+ // allocate space on the C heap. We can't call mallocgc
+ // here because it could cause a deadlock.
+ sem = (*semt)(malloc(unsafe.Sizeof(*sem)))
+ if sem_init(sem, 0, 0) != 0 {
+ throw("sem_init")
+ }
+ mp.waitsema = uintptr(unsafe.Pointer(sem))
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ mp := getg().m
+ if ns >= 0 {
+ var ts timespec
+
+ if clock_gettime(_CLOCK_REALTIME, &ts) != 0 {
+ throw("clock_gettime")
+ }
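+ // Turn the relative timeout ns into an absolute deadline for
+ // sem_timedwait, keeping tv_nsec normalized to [0, 1e9).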
+ ts.tv_sec += ns / 1e9
+ ts.tv_nsec += ns % 1e9
+ if ts.tv_nsec >= 1e9 {
+ ts.tv_sec++
+ ts.tv_nsec -= 1e9
+ }
+
+ if r, err := sem_timedwait((*semt)(unsafe.Pointer(mp.waitsema)), &ts); r != 0 {
+ if err == _ETIMEDOUT || err == _EAGAIN || err == _EINTR {
+ return -1
+ }
+ println("sem_timedwait err ", err, " ts.tv_sec ", ts.tv_sec, " ts.tv_nsec ", ts.tv_nsec, " ns ", ns, " id ", mp.id)
+ throw("sem_timedwait")
+ }
+ return 0
+ }
+ for {
+ r1, err := sem_wait((*semt)(unsafe.Pointer(mp.waitsema)))
+ if r1 == 0 {
+ break
+ }
+ if err == _EINTR {
+ continue
+ }
+ throw("sem_wait")
+ }
+ return 0
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if sem_post((*semt)(unsafe.Pointer(mp.waitsema))) != 0 {
+ throw("sem_post")
+ }
+}
+
+func osinit() {
+ ncpu = int32(sysconf(__SC_NPROCESSORS_ONLN))
+ physPageSize = sysconf(__SC_PAGE_SIZE)
+}
+
+// newosproc0 is a version of newosproc that can be called before the runtime
+// is initialized.
+//
+// This function is not safe to use after initialization as it does not pass an M as fnarg.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn *funcDescriptor) {
+ var (
+ attr pthread_attr
+ oset sigset
+ tid pthread
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ if pthread_attr_setstacksize(&attr, threadStackSize) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ var ret int32
+ for tries := 0; tries < 20; tries++ {
+ // pthread_create can fail with EAGAIN for no apparent reason,
+ // but retrying usually succeeds.
+ ret = pthread_create(&tid, &attr, fn, nil)
+ if ret != _EAGAIN {
+ break
+ }
+ usleep(uint32(tries+1) * 1000) // Milliseconds.
+ }
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+}
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Ms related functions
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // AIX wants >= 8K
+ mp.gsignal.m = mp
+}
+
+// The errno address must be retrieved by calling the _Errno libc function,
+// which returns a pointer to errno.
+func miniterrno() {
+ mp := getg().m
+ r, _ := syscall0(&libc__Errno)
+ mp.perrno = r
+
+}
+
+func minit() {
+ miniterrno()
+ minitSignals()
+ getg().m.procid = uint64(pthread_self())
+}
+
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+// tstart is a function descriptor to _tstart defined in assembly.
+var tstart funcDescriptor
+
+func newosproc(mp *m) {
+ var (
+ attr pthread_attr
+ oset sigset
+ tid pthread
+ )
+
+ if pthread_attr_init(&attr) != 0 {
+ throw("pthread_attr_init")
+ }
+
+ if pthread_attr_setstacksize(&attr, threadStackSize) != 0 {
+ throw("pthread_attr_getstacksize")
+ }
+
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ throw("pthread_attr_setdetachstate")
+ }
+
+ // Disable signals during create, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := retryOnEAGAIN(func() int32 {
+ return pthread_create(&tid, &attr, &tstart, unsafe.Pointer(mp))
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+
+}
+
+func exitThread(wait *atomic.Uint32) {
+ // We should never reach exitThread on AIX because we let
+ // libc clean up threads.
+ throw("exitThread")
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+/* SIGNAL */
+
+const (
+ _NSIG = 256
+)
+
+// sigtramp is a function descriptor to _sigtramp defined in assembly
+var sigtramp funcDescriptor
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = uintptr(unsafe.Pointer(&sigtramp))
+ }
+ sa.sa_handler = fn
+ sigaction(uintptr(i), &sa, nil)
+
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(uintptr(i), nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(uintptr(i), &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(uintptr(i), nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGPIPE:
+ // For SIGPIPE, c.sigcode() isn't set to _SI_USER as on Linux.
+ // Therefore, raisebadsignal won't raise SIGPIPE again if
+ // it was delivered on a non-Go thread.
+ c.set_sigcode(_SI_USER)
+ }
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] |= 1 << ((uint32(i) - 1) & 63)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] &^= 1 << ((uint32(i) - 1) & 63)
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+const (
+ _CLOCK_REALTIME = 9
+ _CLOCK_MONOTONIC = 10
+)
+
+//go:nosplit
+func nanotime1() int64 {
+ tp := &timespec{}
+ if clock_gettime(_CLOCK_REALTIME, tp) != 0 {
+ throw("syscall clock_gettime failed")
+ }
+ return tp.tv_sec*1000000000 + tp.tv_nsec
+}
+
+func walltime() (sec int64, nsec int32) {
+ ts := &timespec{}
+ if clock_gettime(_CLOCK_REALTIME, ts) != 0 {
+ throw("syscall clock_gettime failed")
+ }
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+func fcntl(fd, cmd, arg int32) (int32, int32) {
+ r, errno := syscall3(&libc_fcntl, uintptr(fd), uintptr(cmd), uintptr(arg))
+ return int32(r), int32(errno)
+}
+
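+// setNonblock adds O_NONBLOCK to fd's flags using F_GETFL/F_SETFL.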
+//go:nosplit
+func setNonblock(fd int32) {
+ flags, _ := fcntl(fd, _F_GETFL, 0)
+ if flags != -1 {
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+ }
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
+
+//go:nosplit
+func getuid() int32 {
+ r, errno := syscall0(&libc_getuid)
+ if errno != 0 {
+ print("getuid failed ", errno)
+ throw("getuid")
+ }
+ return int32(r)
+}
+
+//go:nosplit
+func geteuid() int32 {
+ r, errno := syscall0(&libc_geteuid)
+ if errno != 0 {
+ print("geteuid failed ", errno)
+ throw("geteuid")
+ }
+ return int32(r)
+}
+
+//go:nosplit
+func getgid() int32 {
+ r, errno := syscall0(&libc_getgid)
+ if errno != 0 {
+ print("getgid failed ", errno)
+ throw("getgid")
+ }
+ return int32(r)
+}
+
+//go:nosplit
+func getegid() int32 {
+ r, errno := syscall0(&libc_getegid)
+ if errno != 0 {
+ print("getegid failed ", errno)
+ throw("getegid")
+ }
+ return int32(r)
+}
diff --git a/src/runtime/os_android.go b/src/runtime/os_android.go
new file mode 100644
index 0000000..52c8c86
--- /dev/null
+++ b/src/runtime/os_android.go
@@ -0,0 +1,15 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:cgo_export_static and go:cgo_export_dynamic
+
+// Export the main function.
+//
+// Used by the app package to start all-Go Android apps that are
+// loaded via JNI. See golang.org/x/mobile/app.
+
+//go:cgo_export_static main.main
+//go:cgo_export_dynamic main.main
diff --git a/src/runtime/os_darwin.go b/src/runtime/os_darwin.go
new file mode 100644
index 0000000..105de47
--- /dev/null
+++ b/src/runtime/os_darwin.go
@@ -0,0 +1,476 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+type mOS struct {
+ initialized bool
+ mutex pthreadmutex
+ cond pthreadcond
+ count int
+}
+
+func unimplemented(name string) {
+ println(name, "not implemented")
+ *(*int)(unsafe.Pointer(uintptr(1231))) = 1231
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.initialized {
+ return
+ }
+ mp.initialized = true
+ if err := pthread_mutex_init(&mp.mutex, nil); err != 0 {
+ throw("pthread_mutex_init")
+ }
+ if err := pthread_cond_init(&mp.cond, nil); err != 0 {
+ throw("pthread_cond_init")
+ }
+}
+
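+// semasleep waits on the per-m semaphore emulated with a mutex, condition
+// variable, and count. It returns 0 once a wakeup is consumed and -1 if
+// the timeout ns (when non-negative) expires first.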
+//go:nosplit
+func semasleep(ns int64) int32 {
+ var start int64
+ if ns >= 0 {
+ start = nanotime()
+ }
+ mp := getg().m
+ pthread_mutex_lock(&mp.mutex)
+ for {
+ if mp.count > 0 {
+ mp.count--
+ pthread_mutex_unlock(&mp.mutex)
+ return 0
+ }
+ if ns >= 0 {
+ spent := nanotime() - start
+ if spent >= ns {
+ pthread_mutex_unlock(&mp.mutex)
+ return -1
+ }
+ var t timespec
+ t.setNsec(ns - spent)
+ err := pthread_cond_timedwait_relative_np(&mp.cond, &mp.mutex, &t)
+ if err == _ETIMEDOUT {
+ pthread_mutex_unlock(&mp.mutex)
+ return -1
+ }
+ } else {
+ pthread_cond_wait(&mp.cond, &mp.mutex)
+ }
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ pthread_mutex_lock(&mp.mutex)
+ mp.count++
+ if mp.count > 0 {
+ pthread_cond_signal(&mp.cond)
+ }
+ pthread_mutex_unlock(&mp.mutex)
+}
+
+// The read and write file descriptors used by the sigNote functions.
+var sigNoteRead, sigNoteWrite int32
+
+// sigNoteSetup initializes a single, there-can-only-be-one, async-signal-safe note.
+//
+// The current implementation of notes on Darwin is not async-signal-safe,
+// because the functions pthread_mutex_lock, pthread_cond_signal, and
+// pthread_mutex_unlock, called by semawakeup, are not async-signal-safe.
+// There is only one case where we need to wake up a note from a signal
+// handler: the sigsend function. The signal handler code does not require
+// all the features of notes: it does not need to do a timed wait.
+// This is a separate implementation of notes, based on a pipe, that does
+// not support timed waits but is async-signal-safe.
+func sigNoteSetup(*note) {
+ if sigNoteRead != 0 || sigNoteWrite != 0 {
+ // Generalizing this would require avoiding the pipe-fork-closeonexec race, which entangles syscall.
+ throw("duplicate sigNoteSetup")
+ }
+ var errno int32
+ sigNoteRead, sigNoteWrite, errno = pipe()
+ if errno != 0 {
+ throw("pipe failed")
+ }
+ closeonexec(sigNoteRead)
+ closeonexec(sigNoteWrite)
+
+ // Make the write end of the pipe non-blocking, so that if the pipe
+ // buffer is somehow full we will not block in the signal handler.
+ // Leave the read end of the pipe blocking so that we will block
+ // in sigNoteSleep.
+ setNonblock(sigNoteWrite)
+}
+
+// sigNoteWakeup wakes up a thread sleeping on a note created by sigNoteSetup.
+func sigNoteWakeup(*note) {
+ var b byte
+ write(uintptr(sigNoteWrite), unsafe.Pointer(&b), 1)
+}
+
+// sigNoteSleep waits for a note created by sigNoteSetup to be woken.
+func sigNoteSleep(*note) {
+ for {
+ var b byte
+ entersyscallblock()
+ n := read(sigNoteRead, unsafe.Pointer(&b), 1)
+ exitsyscall()
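+ // read reports failure as a negated errno value; retry only when
+ // the read was interrupted by a signal.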
+ if n != -_EINTR {
+ return
+ }
+ }
+}
+
+// BSD interface for threading.
+func osinit() {
+ // pthread_create delayed until end of goenvs so that we
+ // can look at the environment first.
+
+ ncpu = getncpu()
+ physPageSize = getPageSize()
+
+ osinit_hack()
+}
+
+func sysctlbynameInt32(name []byte) (int32, int32) {
+ out := int32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctlbyname(&name[0], (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ return ret, out
+}
+
+//go:linkname internal_cpu_getsysctlbyname internal/cpu.getsysctlbyname
+func internal_cpu_getsysctlbyname(name []byte) (int32, int32) {
+ return sysctlbynameInt32(name)
+}
+
+const (
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+)
+
+func getncpu() int32 {
+ // Use sysctl to fetch hw.ncpu.
+ mib := [2]uint32{_CTL_HW, _HW_NCPU}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 && int32(out) > 0 {
+ return int32(out)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ // Use sysctl to fetch hw.pagesize.
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 && int32(out) > 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Initialize an attribute object.
+ var attr pthreadattr
+ var err int32
+ err = pthread_attr_init(&attr)
+ if err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Find out OS stack size for our own stack guard.
+ var stacksize uintptr
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+ mp.g0.stack.hi = stacksize // for mstart
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err = retryOnEAGAIN(func() int32 {
+ return pthread_create(&attr, abi.FuncPCABI0(mstart_stub), unsafe.Pointer(mp))
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+}
+
+// glue code to call mstart from pthread_create.
+func mstart_stub()
+
+// newosproc0 is a version of newosproc that can be called before the runtime
+// is initialized.
+//
+// This function is not safe to use after initialization as it does not pass an M as fnarg.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn uintptr) {
+ // Initialize an attribute object.
+ var attr pthreadattr
+ var err int32
+ err = pthread_attr_init(&attr)
+ if err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // The caller passes in a suggested stack size,
+ // from when we allocated the stack and thread ourselves,
+ // without libpthread. Now that we're using libpthread,
+ // we use the OS default stack size instead of the suggestion.
+ // Find out that stack size for our own stack guard.
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+ g0.stack.hi = stacksize // for mstart
+ memstats.stacks_sys.add(int64(stacksize))
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err = pthread_create(&attr, fn, nil)
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+}
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // OS X wants >= 8K
+ mp.gsignal.m = mp
+ if GOOS == "darwin" && GOARCH == "arm64" {
+ // mlock the signal stack to work around a kernel bug where it may
+ // SIGILL when the signal stack is not faulted in while a signal
+ // arrives. See issue 42774.
+ mlock(unsafe.Pointer(mp.gsignal.stack.hi-physPageSize), physPageSize)
+ }
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ // iOS does not support alternate signal stack.
+ // The signal handler handles it directly.
+ if !(GOOS == "ios" && GOARCH == "arm64") {
+ minitSignalStack()
+ }
+ minitSignalMask()
+ getg().m.procid = uint64(pthread_self())
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ // iOS does not support alternate signal stack.
+ // See minit.
+ if !(GOOS == "ios" && GOARCH == "arm64") {
+ unminitSignals()
+ }
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+//go:nosplit
+func osyield_no_g() {
+ usleep_no_g(1)
+}
+
+//go:nosplit
+func osyield() {
+ usleep(1)
+}
+
+const (
+ _NSIG = 32
+ _SI_USER = 0 /* empirically true, but not what headers say */
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _SS_DISABLE = 4
+)
+
+//extern SigTabTT runtime·sigtab[];
+
+type sigset uint32
+
+var sigset_all = ^sigset(0)
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa usigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = ^uint32(0)
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ if iscgo {
+ fn = abi.FuncPCABI0(cgoSigtramp)
+ } else {
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ }
+ *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u)) = fn
+ sigaction(i, &sa, nil)
+}
+
+// sigtramp is the callback from libc when a signal is received.
+// It is called with the C calling convention.
+func sigtramp()
+func cgoSigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var osa usigactiont
+ sigaction(i, nil, &osa)
+ handler := *(*uintptr)(unsafe.Pointer(&osa.__sigaction_u))
+ if osa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ var sa usigactiont
+ *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u)) = handler
+ sa.sa_mask = osa.sa_mask
+ sa.sa_flags = osa.sa_flags | _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa usigactiont
+ sigaction(i, nil, &sa)
+ return *(*uintptr)(unsafe.Pointer(&sa.__sigaction_u))
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ *mask |= 1 << (uint32(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ *mask &^= 1 << (uint32(i) - 1)
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+//go:linkname executablePath os.executablePath
+var executablePath string
+
+func sysargs(argc int32, argv **byte) {
+ // Skip over argv and envv; the first string after them is the executable path.
+ n := argc + 1
+ for argv_index(argv, n) != nil {
+ n++
+ }
+ executablePath = gostringnocopy(argv_index(argv, n+1))
+
+ // strip "executable_path=" prefix if available, it's added after OS X 10.11.
+ const prefix = "executable_path="
+ if len(executablePath) > len(prefix) && executablePath[:len(prefix)] == prefix {
+ executablePath = executablePath[len(prefix):]
+ }
+}
+
+func signalM(mp *m, sig int) {
+ pthread_kill(pthread(mp.procid), uint32(sig))
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
diff --git a/src/runtime/os_darwin_arm64.go b/src/runtime/os_darwin_arm64.go
new file mode 100644
index 0000000..b808150
--- /dev/null
+++ b/src/runtime/os_darwin_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks, but it is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_dragonfly.go b/src/runtime/os_dragonfly.go
new file mode 100644
index 0000000..8268c7f
--- /dev/null
+++ b/src/runtime/os_dragonfly.go
@@ -0,0 +1,344 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+const (
+ _NSIG = 33
+ _SI_USER = 0
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type mOS struct{}
+
+//go:noescape
+func lwp_create(param *lwpparams) int32
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func raiseproc(sig uint32)
+
+func lwp_gettid() int32
+func lwp_kill(pid, tid int32, sig int)
+
+//go:noescape
+func sys_umtx_sleep(addr *uint32, val, timeout int32) int32
+
+//go:noescape
+func sys_umtx_wakeup(addr *uint32, val int32) int32
+
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func pipe2(flags int32) (r, w int32, errno int32)
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32)
+
+func issetugid() int32
+
+// From DragonFly's <sys/sysctl.h>
+const (
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+func getncpu() int32 {
+ mib := [2]uint32{_CTL_HW, _HW_NCPU}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return int32(out)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ systemstack(func() {
+ futexsleep1(addr, val, ns)
+ })
+}
+
+func futexsleep1(addr *uint32, val uint32, ns int64) {
+ var timeout int32
+ if ns >= 0 {
+ // The timeout is specified in microseconds. Make sure the division
+ // does not round down to zero, which would put us to sleep
+ // indefinitely.
+ timeout = timediv(ns, 1000, nil)
+ if timeout == 0 {
+ timeout = 1
+ }
+ }
+
+ // sys_umtx_sleep will return EWOULDBLOCK (EAGAIN) when the timeout
+ // expires or EBUSY if the mutex value does not match.
+ ret := sys_umtx_sleep(addr, int32(val), timeout)
+ if ret >= 0 || ret == -_EINTR || ret == -_EAGAIN || ret == -_EBUSY {
+ return
+ }
+
+ print("umtx_sleep addr=", addr, " val=", val, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1005))) = 0x1005
+}
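+
+// Editor's note: an illustrative sketch, not part of the upstream file, of the
+// nanosecond-to-microsecond conversion above. sys_umtx_sleep takes its timeout
+// in microseconds, and a timeout of 0 means "sleep indefinitely", so a positive
+// ns must never be rounded down to zero:
+//
+//	func nsToUmtxTimeout(ns int64) int32 {
+//		us := ns / 1000 // microseconds, as sys_umtx_sleep expects
+//		if us == 0 {
+//			us = 1 // round sub-microsecond timeouts up to 1us
+//		}
+//		return int32(us)
+//	}
+//
+// The real code uses timediv, which also clamps results that would overflow
+// an int32.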
+
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := sys_umtx_wakeup(addr, int32(cnt))
+ if ret >= 0 {
+ return
+ }
+
+ systemstack(func() {
+ print("umtx_wake_addr=", addr, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1006))) = 0x1006
+ })
+}
+
+func lwp_start(uintptr)
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " lwp_start=", abi.FuncPCABI0(lwp_start), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+
+ params := lwpparams{
+ start_func: abi.FuncPCABI0(lwp_start),
+ arg: unsafe.Pointer(mp),
+ stack: uintptr(stk),
+ tid1: nil, // minit will record tid
+ tid2: nil,
+ }
+
+ // TODO: Check for error.
+ retryOnEAGAIN(func() int32 {
+ lwp_create(&params)
+ return 0
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ getg().m.procid = uint64(lwp_gettid())
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_flags int32
+ sa_mask sigset
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
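+
+// Editor's note: an illustrative example, not part of the upstream file, of the
+// word/bit mapping used by sigaddset and sigdelset above. Signal numbers are
+// 1-based, so signal i lives in 32-bit word (i-1)/32 at bit (i-1)&31:
+//
+//	i = 1  -> __bits[0], bit 0
+//	i = 32 -> __bits[0], bit 31
+//	i = 33 -> __bits[1], bit 0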
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ auxvp := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*goarch.PtrSize))
+ pairs := sysauxv(auxvp[:])
+ auxv = auxvp[: pairs*2 : pairs*2]
+}
+
+const (
+ _AT_NULL = 0
+ _AT_PAGESZ = 6
+)
+
+func sysauxv(auxv []uintptr) (pairs int) {
+ var i int
+ for i = 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ }
+ }
+ return i / 2
+}
+
+// raise sends a signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ lwp_kill(-1, lwp_gettid(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ lwp_kill(-1, int32(mp.procid), sig)
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
diff --git a/src/runtime/os_freebsd.go b/src/runtime/os_freebsd.go
new file mode 100644
index 0000000..3af234e
--- /dev/null
+++ b/src/runtime/os_freebsd.go
@@ -0,0 +1,482 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+type mOS struct{}
+
+//go:noescape
+func thr_new(param *thrparam, size int32) int32
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func raiseproc(sig uint32)
+
+func thr_self() thread
+func thr_kill(tid thread, sig int)
+
+//go:noescape
+func sys_umtx_op(addr *uint32, mode int32, val uint32, uaddr1 uintptr, ut *umtx_time) int32
+
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func pipe2(flags int32) (r, w int32, errno int32)
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32)
+
+func issetugid() int32
+
+// From FreeBSD's <sys/sysctl.h>
+const (
+ _CTL_HW = 6
+ _HW_PAGESIZE = 7
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+// Undocumented numbers from FreeBSD's lib/libc/gen/sysctlnametomib.c.
+const (
+ _CTL_QUERY = 0
+ _CTL_QUERY_MIB = 3
+)
+
+// sysctlnametomib fills mib with the dynamically assigned sysctl entries for name.
+// It returns the number of mib slots filled, or 0 on error.
+func sysctlnametomib(name []byte, mib *[_CTL_MAXNAME]uint32) uint32 {
+ oid := [2]uint32{_CTL_QUERY, _CTL_QUERY_MIB}
+ miblen := uintptr(_CTL_MAXNAME)
+ if sysctl(&oid[0], 2, (*byte)(unsafe.Pointer(mib)), &miblen, (*byte)(unsafe.Pointer(&name[0])), (uintptr)(len(name))) < 0 {
+ return 0
+ }
+ miblen /= unsafe.Sizeof(uint32(0))
+ if miblen <= 0 {
+ return 0
+ }
+ return uint32(miblen)
+}
+
+const (
+ _CPU_CURRENT_PID = -1 // Current process ID.
+)
+
+//go:noescape
+func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+
+//go:systemstack
+func getncpu() int32 {
+ // Use a large buffer for the CPU mask. We're on the system
+ // stack, so this is fine, and we can't allocate memory for a
+ // dynamically-sized buffer at this point.
+ const maxCPUs = 64 * 1024
+ var mask [maxCPUs / 8]byte
+ var mib [_CTL_MAXNAME]uint32
+
+ // According to FreeBSD's /usr/src/sys/kern/kern_cpuset.c,
+ // cpuset_getaffinity returns ERANGE when the provided buffer size
+ // exceeds the limit in the kernel, so query kern.smp.maxcpus to
+ // calculate the maximum buffer size.
+ // See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200802
+
+ // The kern.smp.maxcpus variable was introduced on Dec 23 2003 (revision 123766),
+ // with dynamically assigned sysctl entries.
+ miblen := sysctlnametomib([]byte("kern.smp.maxcpus"), &mib)
+ if miblen == 0 {
+ return 1
+ }
+
+ // Query kern.smp.maxcpus.
+ dstsize := uintptr(4)
+ maxcpus := uint32(0)
+ if sysctl(&mib[0], miblen, (*byte)(unsafe.Pointer(&maxcpus)), &dstsize, nil, 0) != 0 {
+ return 1
+ }
+
+ maskSize := int(maxcpus+7) / 8
+ if maskSize < goarch.PtrSize {
+ maskSize = goarch.PtrSize
+ }
+ if maskSize > len(mask) {
+ maskSize = len(mask)
+ }
+
+ if cpuset_getaffinity(_CPU_LEVEL_WHICH, _CPU_WHICH_PID, _CPU_CURRENT_PID,
+ maskSize, (*byte)(unsafe.Pointer(&mask[0]))) != 0 {
+ return 1
+ }
+ n := int32(0)
+ for _, v := range mask[:maskSize] {
+ for v != 0 {
+ n += int32(v & 1)
+ v >>= 1
+ }
+ }
+ if n == 0 {
+ return 1
+ }
+ return n
+}
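+
+// Editor's note: an illustrative sketch, not part of the upstream file, of the
+// bit-counting loop at the end of getncpu. Each set bit in the affinity mask
+// is one usable CPU, so the count is a plain population count over the bytes:
+//
+//	func popcount(mask []byte) int32 {
+//		n := int32(0)
+//		for _, v := range mask {
+//			for v != 0 {
+//				n += int32(v & 1) // add the low bit
+//				v >>= 1           // move on to the next bit
+//			}
+//		}
+//		return n
+//	}
+//
+// For example, a mask of []byte{0x0f, 0x01} yields 5 CPUs.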
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+// FreeBSD's umtx_op syscall is effectively the same as Linux's futex, and
+// thus the code is largely similar. See Linux implementation
+// and lock_futex.go for comments.
+
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ systemstack(func() {
+ futexsleep1(addr, val, ns)
+ })
+}
+
+func futexsleep1(addr *uint32, val uint32, ns int64) {
+ var utp *umtx_time
+ if ns >= 0 {
+ var ut umtx_time
+ ut._clockid = _CLOCK_MONOTONIC
+ ut._timeout.setNsec(ns)
+ utp = &ut
+ }
+ ret := sys_umtx_op(addr, _UMTX_OP_WAIT_UINT_PRIVATE, val, unsafe.Sizeof(*utp), utp)
+ if ret >= 0 || ret == -_EINTR || ret == -_ETIMEDOUT {
+ return
+ }
+ print("umtx_wait addr=", addr, " val=", val, " ret=", ret, "\n")
+ *(*int32)(unsafe.Pointer(uintptr(0x1005))) = 0x1005
+}
+
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := sys_umtx_op(addr, _UMTX_OP_WAKE_PRIVATE, cnt, 0, nil)
+ if ret >= 0 {
+ return
+ }
+
+ systemstack(func() {
+ print("umtx_wake_addr=", addr, " ret=", ret, "\n")
+ })
+}
+
+func thr_start()
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " thr_start=", abi.FuncPCABI0(thr_start), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ param := thrparam{
+ start_func: abi.FuncPCABI0(thr_start),
+ arg: unsafe.Pointer(mp),
+ stack_base: mp.g0.stack.lo,
+ stack_size: uintptr(stk) - mp.g0.stack.lo,
+ child_tid: nil, // minit will record tid
+ parent_tid: nil,
+ tls_base: unsafe.Pointer(&mp.tls[0]),
+ tls_size: unsafe.Sizeof(mp.tls),
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := retryOnEAGAIN(func() int32 {
+ errno := thr_new(&param, int32(unsafe.Sizeof(param)))
+ // thr_new returns negative errno
+ return -errno
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ throw("newosproc")
+ }
+}
+
+// Version of newosproc that doesn't require a valid G.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn unsafe.Pointer) {
+ stack := sysAlloc(stacksize, &memstats.stacks_sys)
+ if stack == nil {
+ writeErrStr(failallocatestack)
+ exit(1)
+ }
+ // This code "knows" it's being called once from the library
+ // initialization code, and so it's using the static m0 for the
+ // tls and procid (thread) pointers. thr_new() requires the tls
+ // pointers, though the tid pointers can be nil.
+ // However, newosproc0 is currently unreachable because builds
+ // utilizing c-shared/c-archive force external linking.
+ param := thrparam{
+ start_func: uintptr(fn),
+ arg: nil,
+ stack_base: uintptr(stack), //+stacksize?
+ stack_size: stacksize,
+ child_tid: nil, // minit will record tid
+ parent_tid: nil,
+ tls_base: unsafe.Pointer(&m0.tls[0]),
+ tls_size: unsafe.Sizeof(m0.tls),
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := thr_new(&param, int32(unsafe.Sizeof(param)))
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret < 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+}
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ getg().m.procid = uint64(thr_self())
+
+ // On FreeBSD before about April 2017 there was a bug such
+ // that calling execve from a thread other than the main
+ // thread did not reset the signal stack. That would confuse
+ // minitSignals, which calls minitSignalStack, which checks
+ // whether there is currently a signal stack and uses it if
+ // present. To avoid this confusion, explicitly disable the
+ // signal stack on the main thread when not running in a
+ // library. This can be removed when we are confident that all
+ // FreeBSD users are running a patched kernel. See issue #15658.
+ if gp := getg(); !isarchive && !islibrary && gp.m == &m0 && gp == gp.m.g0 {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ }
+
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_handler uintptr
+ sa_flags int32
+ sa_mask sigset
+}
+
+// See os_freebsd2.go, os_freebsd_amd64.go for setsig function
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxvp := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*goarch.PtrSize))
+ pairs := sysauxv(auxvp[:])
+ auxv = auxvp[: pairs*2 : pairs*2]
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+ _AT_TIMEKEEP = 22 // Pointer to timehands.
+ _AT_HWCAP = 25 // CPU feature flags
+ _AT_HWCAP2 = 26 // CPU feature flags 2
+)
+
+func sysauxv(auxv []uintptr) (pairs int) {
+ var i int
+ for i = 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ // _AT_NCPUS from auxv shouldn't be used due to golang.org/issue/15206
+ case _AT_PAGESZ:
+ physPageSize = val
+ case _AT_TIMEKEEP:
+ timekeepSharedPage = (*vdsoTimekeep)(unsafe.Pointer(val))
+ }
+
+ archauxv(tag, val)
+ }
+ return i / 2
+}
+
+// sysSigaction calls the sigaction system call.
+//
+//go:nosplit
+func sysSigaction(sig uint32, new, old *sigactiont) {
+ // Use system stack to avoid split stack overflow on amd64
+ if asmSigaction(uintptr(sig), new, old) != 0 {
+ systemstack(func() {
+ throw("sigaction failed")
+ })
+ }
+}
+
+// asmSigaction is implemented in assembly.
+//
+//go:noescape
+func asmSigaction(sig uintptr, new, old *sigactiont) int32
+
+// raise sends a signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ thr_kill(thr_self(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ thr_kill(thread(mp.procid), sig)
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
diff --git a/src/runtime/os_freebsd2.go b/src/runtime/os_freebsd2.go
new file mode 100644
index 0000000..3eaedf0
--- /dev/null
+++ b/src/runtime/os_freebsd2.go
@@ -0,0 +1,22 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd && !amd64
+
+package runtime
+
+import "internal/abi"
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
diff --git a/src/runtime/os_freebsd_amd64.go b/src/runtime/os_freebsd_amd64.go
new file mode 100644
index 0000000..b179383
--- /dev/null
+++ b/src/runtime/os_freebsd_amd64.go
@@ -0,0 +1,26 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/abi"
+
+func cgoSigtramp()
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ if iscgo {
+ fn = abi.FuncPCABI0(cgoSigtramp)
+ } else {
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
diff --git a/src/runtime/os_freebsd_arm.go b/src/runtime/os_freebsd_arm.go
new file mode 100644
index 0000000..3feaa5e
--- /dev/null
+++ b/src/runtime/os_freebsd_arm.go
@@ -0,0 +1,48 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+const (
+ _HWCAP_VFP = 1 << 6
+ _HWCAP_VFPv3 = 1 << 13
+)
+
+func checkgoarm() {
+ if goarm > 5 && cpu.HWCap&_HWCAP_VFP == 0 {
+ print("runtime: this CPU has no floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5.\n")
+ exit(1)
+ }
+ if goarm > 6 && cpu.HWCap&_HWCAP_VFPv3 == 0 {
+ print("runtime: this CPU has no VFPv3 floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5 or GOARM=6.\n")
+ exit(1)
+ }
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks, but it is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_freebsd_arm64.go b/src/runtime/os_freebsd_arm64.go
new file mode 100644
index 0000000..b5b25f0
--- /dev/null
+++ b/src/runtime/os_freebsd_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks, but it is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_freebsd_noauxv.go b/src/runtime/os_freebsd_noauxv.go
new file mode 100644
index 0000000..1d9452b
--- /dev/null
+++ b/src/runtime/os_freebsd_noauxv.go
@@ -0,0 +1,10 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd && !arm
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_freebsd_riscv64.go b/src/runtime/os_freebsd_riscv64.go
new file mode 100644
index 0000000..0f2ed50
--- /dev/null
+++ b/src/runtime/os_freebsd_riscv64.go
@@ -0,0 +1,7 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func osArchInit() {}
diff --git a/src/runtime/os_illumos.go b/src/runtime/os_illumos.go
new file mode 100644
index 0000000..c3c3e4e
--- /dev/null
+++ b/src/runtime/os_illumos.go
@@ -0,0 +1,132 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+//go:cgo_import_dynamic libc_getrctl getrctl "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_local_action rctlblk_get_local_action "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_local_flags rctlblk_get_local_flags "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_get_value rctlblk_get_value "libc.so"
+//go:cgo_import_dynamic libc_rctlblk_size rctlblk_size "libc.so"
+
+//go:linkname libc_getrctl libc_getrctl
+//go:linkname libc_rctlblk_get_local_action libc_rctlblk_get_local_action
+//go:linkname libc_rctlblk_get_local_flags libc_rctlblk_get_local_flags
+//go:linkname libc_rctlblk_get_value libc_rctlblk_get_value
+//go:linkname libc_rctlblk_size libc_rctlblk_size
+
+var (
+ libc_getrctl,
+ libc_rctlblk_get_local_action,
+ libc_rctlblk_get_local_flags,
+ libc_rctlblk_get_value,
+ libc_rctlblk_size libcFunc
+)
+
+// Return the minimum value seen for the zone CPU cap, or 0 if no cap is
+// detected.
+func getcpucap() uint64 {
+ // The resource control block is an opaque object whose size is only
+ // known to libc. In practice, given the contents, it is unlikely to
+ // grow beyond 8KB so we'll use a static buffer of that size here.
+ const rblkmaxsize = 8 * 1024
+ if rctlblk_size() > rblkmaxsize {
+ return 0
+ }
+
+ // The "zone.cpu-cap" resource control, as described in
+ // resource_controls(5), "sets a limit on the amount of CPU time that
+ // can be used by a zone. The unit used is the percentage of a single
+ // CPU that can be used by all user threads in a zone, expressed as an
+ // integer." A C string of the name must be passed to getrctl(2).
+ name := []byte("zone.cpu-cap\x00")
+
+ // To iterate over the list of values for a particular resource
+ // control, we need two blocks: one for the previously read value and
+ // one for the next value.
+ var rblk0 [rblkmaxsize]byte
+ var rblk1 [rblkmaxsize]byte
+ rblk := &rblk0[0]
+ rblkprev := &rblk1[0]
+
+ var flag uint32 = _RCTL_FIRST
+ var capval uint64 = 0
+
+ for {
+ if getrctl(unsafe.Pointer(&name[0]), unsafe.Pointer(rblkprev), unsafe.Pointer(rblk), flag) != 0 {
+ // The end of the sequence is reported as an ENOENT
+ // failure, but determining the CPU cap is not critical
+ // here. We'll treat any failure as if it were the end
+ // of sequence.
+ break
+ }
+
+ lflags := rctlblk_get_local_flags(unsafe.Pointer(rblk))
+ action := rctlblk_get_local_action(unsafe.Pointer(rblk))
+ if (lflags&_RCTL_LOCAL_MAXIMAL) == 0 && action == _RCTL_LOCAL_DENY {
+ // This is a finite (not maximal) value representing a
+ // cap (deny) action.
+ v := rctlblk_get_value(unsafe.Pointer(rblk))
+ if capval == 0 || capval > v {
+ capval = v
+ }
+ }
+
+ // Swap the blocks around so that we can fetch the next value
+ t := rblk
+ rblk = rblkprev
+ rblkprev = t
+ flag = _RCTL_NEXT
+ }
+
+ return capval
+}
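+
+// Editor's note: an illustrative sketch, not part of the upstream file, of the
+// iteration pattern in getcpucap. getrctl walks a resource control's values one
+// at a time: the first call uses _RCTL_FIRST, and every later call passes the
+// block returned by the previous call together with _RCTL_NEXT, which is why
+// two buffers are kept and swapped:
+//
+//	prev, cur := unsafe.Pointer(&rblk1[0]), unsafe.Pointer(&rblk0[0])
+//	flag := uint32(_RCTL_FIRST)
+//	for getrctl(unsafe.Pointer(&name[0]), prev, cur, flag) == 0 {
+//		// ... inspect the block behind cur ...
+//		prev, cur = cur, prev
+//		flag = _RCTL_NEXT
+//	}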
+
+func getncpu() int32 {
+ n := int32(sysconf(__SC_NPROCESSORS_ONLN))
+ if n < 1 {
+ return 1
+ }
+
+ if cents := int32(getcpucap()); cents > 0 {
+ // Convert from a percentage of CPUs to a number of CPUs,
+ // rounding up to make use of a fractional CPU
+ // e.g., 336% becomes 4 CPUs
+ ncap := (cents + 99) / 100
+ if ncap < n {
+ return ncap
+ }
+ }
+
+ return n
+}
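+
+// Editor's note: a worked example, not part of the upstream file, of the cap
+// conversion in getncpu. The zone CPU cap is expressed in percent of a single
+// CPU, so (cents + 99) / 100 is a ceiling division that rounds any fractional
+// CPU up to a whole one:
+//
+//	cents = 100 -> (100+99)/100 = 1 CPU
+//	cents = 150 -> (150+99)/100 = 2 CPUs
+//	cents = 336 -> (336+99)/100 = 4 CPUs
+//
+// The capped value is used only when it is smaller than the hardware count n.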
+
+//go:nosplit
+func getrctl(controlname, oldbuf, newbuf unsafe.Pointer, flags uint32) uintptr {
+ return sysvicall4(&libc_getrctl, uintptr(controlname), uintptr(oldbuf), uintptr(newbuf), uintptr(flags))
+}
+
+//go:nosplit
+func rctlblk_get_local_action(buf unsafe.Pointer) uintptr {
+ return sysvicall2(&libc_rctlblk_get_local_action, uintptr(buf), uintptr(0))
+}
+
+//go:nosplit
+func rctlblk_get_local_flags(buf unsafe.Pointer) uintptr {
+ return sysvicall1(&libc_rctlblk_get_local_flags, uintptr(buf))
+}
+
+//go:nosplit
+func rctlblk_get_value(buf unsafe.Pointer) uint64 {
+ return uint64(sysvicall1(&libc_rctlblk_get_value, uintptr(buf)))
+}
+
+//go:nosplit
+func rctlblk_size() uintptr {
+ return sysvicall0(&libc_rctlblk_size)
+}
diff --git a/src/runtime/os_js.go b/src/runtime/os_js.go
new file mode 100644
index 0000000..65fb499
--- /dev/null
+++ b/src/runtime/os_js.go
@@ -0,0 +1,37 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build js && wasm
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+func exit(code int32)
+
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ if fd > 2 {
+ throw("runtime.write to fd > 2 is unsupported")
+ }
+ wasmWrite(fd, p, n)
+ return n
+}
+
+//go:wasmimport gojs runtime.wasmWrite
+//go:noescape
+func wasmWrite(fd uintptr, p unsafe.Pointer, n int32)
+
+func usleep(usec uint32) {
+ // TODO(neelance): implement usleep
+}
+
+//go:wasmimport gojs runtime.getRandomData
+//go:noescape
+func getRandomData(r []byte)
+
+func goenvs() {
+ goenvs_unix()
+}
diff --git a/src/runtime/os_linux.go b/src/runtime/os_linux.go
new file mode 100644
index 0000000..0b05610
--- /dev/null
+++ b/src/runtime/os_linux.go
@@ -0,0 +1,918 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/syscall"
+ "unsafe"
+)
+
+// sigPerThreadSyscall is the same signal (SIGSETXID) used by glibc for
+// per-thread syscalls on Linux. We use it for the same purpose in non-cgo
+// binaries.
+const sigPerThreadSyscall = _SIGRTMIN + 1
+
+type mOS struct {
+ // profileTimer holds the ID of the POSIX interval timer for profiling CPU
+ // usage on this thread.
+ //
+ // It is valid when the profileTimerValid field is true. A thread
+ // creates and manages its own timer, and these fields are read and written
+ // only by this thread. But because some of the reads on profileTimerValid
+ // are in signal handling code, this field should be an atomic type.
+ profileTimer int32
+ profileTimerValid atomic.Bool
+
+ // needPerThreadSyscall indicates that a per-thread syscall is required
+ // for doAllThreadsSyscall.
+ needPerThreadSyscall atomic.Uint8
+}
+
+//go:noescape
+func futex(addr unsafe.Pointer, op int32, val uint32, ts, addr2 unsafe.Pointer, val3 uint32) int32
+
+// Linux futex.
+//
+// futexsleep(uint32 *addr, uint32 val)
+// futexwakeup(uint32 *addr)
+//
+// Futexsleep atomically checks if *addr == val and if so, sleeps on addr.
+// Futexwakeup wakes up threads sleeping on addr.
+// Futexsleep is allowed to wake up spuriously.
+
+const (
+ _FUTEX_PRIVATE_FLAG = 128
+ _FUTEX_WAIT_PRIVATE = 0 | _FUTEX_PRIVATE_FLAG
+ _FUTEX_WAKE_PRIVATE = 1 | _FUTEX_PRIVATE_FLAG
+)
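+
+// Editor's note: an illustrative sketch, not part of the upstream file, of how
+// these two primitives are typically combined (see lock_futex.go for the real
+// lock). The waiter re-checks the word before and after sleeping, because
+// futexsleep only blocks while *addr still equals val and spurious wakeups are
+// allowed; the names key, locked and unlocked below are placeholders:
+//
+//	// waiter
+//	for atomic.Load(&key) == locked {
+//		futexsleep(&key, locked, -1) // sleep only while key == locked
+//	}
+//
+//	// waker
+//	atomic.Store(&key, unlocked)
+//	futexwakeup(&key, 1) // wake at most one waiter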
+
+// Atomically,
+//
+// if(*addr == val) sleep
+//
+// Might be woken up spuriously; that's allowed.
+// Don't sleep longer than ns; ns < 0 means forever.
+//
+//go:nosplit
+func futexsleep(addr *uint32, val uint32, ns int64) {
+ // Some Linux kernels have a bug where futex of
+ // FUTEX_WAIT returns an internal error code
+ // as an errno. Libpthread ignores the return value
+ // here, and so can we: as it says a few lines up,
+ // spurious wakeups are allowed.
+ if ns < 0 {
+ futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, nil, nil, 0)
+ return
+ }
+
+ var ts timespec
+ ts.setNsec(ns)
+ futex(unsafe.Pointer(addr), _FUTEX_WAIT_PRIVATE, val, unsafe.Pointer(&ts), nil, 0)
+}
+
+// If any procs are sleeping on addr, wake up at most cnt.
+//
+//go:nosplit
+func futexwakeup(addr *uint32, cnt uint32) {
+ ret := futex(unsafe.Pointer(addr), _FUTEX_WAKE_PRIVATE, cnt, nil, nil, 0)
+ if ret >= 0 {
+ return
+ }
+
+ // I don't know that futex wakeup can return
+ // EAGAIN or EINTR, but if it does, it would be
+ // safe to loop and call futex again.
+ systemstack(func() {
+ print("futexwakeup addr=", addr, " returned ", ret, "\n")
+ })
+
+ *(*int32)(unsafe.Pointer(uintptr(0x1006))) = 0x1006
+}
+
+func getproccount() int32 {
+ // This buffer is huge (8 kB) but we are on the system stack
+ // and there should be plenty of space (64 kB).
+ // Also this is a leaf, so we're not holding up the memory for long.
+ // See golang.org/issue/11823.
+ // The suggested behavior here is to keep trying with ever-larger
+ // buffers, but we don't have a dynamic memory allocator at the
+ // moment, so that's a bit tricky and seems like overkill.
+ const maxCPUs = 64 * 1024
+ var buf [maxCPUs / 8]byte
+ r := sched_getaffinity(0, unsafe.Sizeof(buf), &buf[0])
+ if r < 0 {
+ return 1
+ }
+ n := int32(0)
+ for _, v := range buf[:r] {
+ for v != 0 {
+ n += int32(v & 1)
+ v >>= 1
+ }
+ }
+ if n == 0 {
+ n = 1
+ }
+ return n
+}
+
+// Clone, the Linux rfork.
+const (
+ _CLONE_VM = 0x100
+ _CLONE_FS = 0x200
+ _CLONE_FILES = 0x400
+ _CLONE_SIGHAND = 0x800
+ _CLONE_PTRACE = 0x2000
+ _CLONE_VFORK = 0x4000
+ _CLONE_PARENT = 0x8000
+ _CLONE_THREAD = 0x10000
+ _CLONE_NEWNS = 0x20000
+ _CLONE_SYSVSEM = 0x40000
+ _CLONE_SETTLS = 0x80000
+ _CLONE_PARENT_SETTID = 0x100000
+ _CLONE_CHILD_CLEARTID = 0x200000
+ _CLONE_UNTRACED = 0x800000
+ _CLONE_CHILD_SETTID = 0x1000000
+ _CLONE_STOPPED = 0x2000000
+ _CLONE_NEWUTS = 0x4000000
+ _CLONE_NEWIPC = 0x8000000
+
+ // As of QEMU 2.8.0 (5ea2fc84d), user emulation requires all six of these
+ // flags to be set when creating a thread; attempts to share the other
+ // five but leave SYSVSEM unshared will fail with -EINVAL.
+ //
+ // In non-QEMU environments CLONE_SYSVSEM is inconsequential as we do not
+ // use System V semaphores.
+
+ cloneFlags = _CLONE_VM | /* share memory */
+ _CLONE_FS | /* share cwd, etc */
+ _CLONE_FILES | /* share fd table */
+ _CLONE_SIGHAND | /* share sig handler table */
+ _CLONE_SYSVSEM | /* share SysV semaphore undo lists (see issue #20763) */
+ _CLONE_THREAD /* revisit - okay for now */
+)
+
+//go:noescape
+func clone(flags int32, stk, mp, gp, fn unsafe.Pointer) int32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ /*
+ * note: strace gets confused if we use CLONE_PTRACE here.
+ */
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " clone=", abi.FuncPCABI0(clone), " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Disable signals during clone, so that the new thread starts
+ // with signals disabled. It will enable them in minit.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := retryOnEAGAIN(func() int32 {
+ r := clone(cloneFlags, stk, unsafe.Pointer(mp), unsafe.Pointer(mp.g0), unsafe.Pointer(abi.FuncPCABI0(mstart)))
+ // clone returns positive TID, negative errno.
+ // We don't care about the TID.
+ if r >= 0 {
+ return 0
+ }
+ return -r
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -u)")
+ }
+ throw("newosproc")
+ }
+}
+
+// Version of newosproc that doesn't require a valid G.
+//
+//go:nosplit
+func newosproc0(stacksize uintptr, fn unsafe.Pointer) {
+ stack := sysAlloc(stacksize, &memstats.stacks_sys)
+ if stack == nil {
+ writeErrStr(failallocatestack)
+ exit(1)
+ }
+ ret := clone(cloneFlags, unsafe.Pointer(uintptr(stack)+stacksize), nil, nil, fn)
+ if ret < 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+}
+
+const (
+ _AT_NULL = 0 // End of vector
+ _AT_PAGESZ = 6 // System physical page size
+ _AT_HWCAP = 16 // hardware capability bit vector
+ _AT_SECURE = 23 // secure mode boolean
+ _AT_RANDOM = 25 // introduced in 2.6.29
+ _AT_HWCAP2 = 26 // hardware capability bit vector 2
+)
+
+var procAuxv = []byte("/proc/self/auxv\x00")
+
+var addrspace_vec [1]byte
+
+func mincore(addr unsafe.Pointer, n uintptr, dst *byte) int32
+
+var auxvreadbuf [128]uintptr
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxvp := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*goarch.PtrSize))
+
+ if pairs := sysauxv(auxvp[:]); pairs != 0 {
+ auxv = auxvp[: pairs*2 : pairs*2]
+ return
+ }
+ // In some situations we don't get a loader-provided
+ // auxv, such as when loaded as a library on Android.
+ // Fall back to /proc/self/auxv.
+ fd := open(&procAuxv[0], 0 /* O_RDONLY */, 0)
+ if fd < 0 {
+ // On Android, /proc/self/auxv might be unreadable (issue 9229), so we fall back
+ // to using mincore to detect the physical page size.
+ // mincore should return EINVAL when the address is not a multiple of the system page size.
+ const size = 256 << 10 // size of memory region to allocate
+ p, err := mmap(nil, size, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ return
+ }
+ var n uintptr
+ for n = 4 << 10; n < size; n <<= 1 {
+ err := mincore(unsafe.Pointer(uintptr(p)+n), 1, &addrspace_vec[0])
+ if err == 0 {
+ physPageSize = n
+ break
+ }
+ }
+ if physPageSize == 0 {
+ physPageSize = size
+ }
+ munmap(p, size)
+ return
+ }
+
+ n = read(fd, noescape(unsafe.Pointer(&auxvreadbuf[0])), int32(unsafe.Sizeof(auxvreadbuf)))
+ closefd(fd)
+ if n < 0 {
+ return
+ }
+ // Make sure buf is terminated, even if we didn't read
+ // the whole file.
+ auxvreadbuf[len(auxvreadbuf)-2] = _AT_NULL
+ pairs := sysauxv(auxvreadbuf[:])
+ auxv = auxvreadbuf[: pairs*2 : pairs*2]
+}
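+
+// Editor's note, not part of the upstream file: sysargs above tries three
+// sources for the auxiliary vector and page size, in order:
+//
+//	1. the loader-provided auxv that follows envp on the stack;
+//	2. /proc/self/auxv, for cases such as Android libraries where the
+//	   loader copy is not visible;
+//	3. if even that is unreadable, an mmap plus mincore probe that only
+//	   recovers the physical page size: mincore rejects addresses that are
+//	   not page-aligned, so the first power-of-two offset it accepts inside
+//	   a page-aligned mapping is the page size.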
+
+// startupRandomData holds random bytes initialized at startup. These come from
+// the ELF AT_RANDOM auxiliary vector.
+var startupRandomData []byte
+
+// secureMode holds the value of AT_SECURE passed in the auxiliary vector.
+var secureMode bool
+
+func sysauxv(auxv []uintptr) (pairs int) {
+ var i int
+ for ; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_RANDOM:
+ // The kernel provides a pointer to 16 bytes
+ // worth of random data.
+ startupRandomData = (*[16]byte)(unsafe.Pointer(val))[:]
+
+ case _AT_PAGESZ:
+ physPageSize = val
+
+ case _AT_SECURE:
+ secureMode = val == 1
+ }
+
+ archauxv(tag, val)
+ vdsoauxv(tag, val)
+ }
+ return i / 2
+}
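+
+// Editor's note: an illustrative picture, not part of the upstream file, of the
+// auxiliary vector consumed by sysauxv. auxv is a flat array of (tag, value)
+// uintptr pairs terminated by an _AT_NULL tag:
+//
+//	[ _AT_PAGESZ, 4096, _AT_RANDOM, ptr, _AT_SECURE, 0, ..., _AT_NULL, 0 ]
+//
+// The loop therefore steps by two and returns i/2, the number of pairs seen
+// before the terminator; archauxv and vdsoauxv get to see every pair as well.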
+
+var sysTHPSizePath = []byte("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size\x00")
+
+func getHugePageSize() uintptr {
+ var numbuf [20]byte
+ fd := open(&sysTHPSizePath[0], 0 /* O_RDONLY */, 0)
+ if fd < 0 {
+ return 0
+ }
+ ptr := noescape(unsafe.Pointer(&numbuf[0]))
+ n := read(fd, ptr, int32(len(numbuf)))
+ closefd(fd)
+ if n <= 0 {
+ return 0
+ }
+ n-- // remove trailing newline
+ v, ok := atoi(slicebytetostringtmp((*byte)(ptr), int(n)))
+ if !ok || v < 0 {
+ v = 0
+ }
+ if v&(v-1) != 0 {
+ // v is not a power of 2
+ return 0
+ }
+ return uintptr(v)
+}
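+
+// Editor's note: a worked example, not part of the upstream file, of the
+// v&(v-1) test in getHugePageSize. Subtracting one flips the lowest set bit
+// and everything below it, so the AND is zero exactly when v has at most one
+// set bit:
+//
+//	v = 2097152 (2 MiB): 0x200000 & 0x1fffff == 0 -> accepted
+//	v = 3145728 (3 MiB): 0x300000 & 0x2fffff != 0 -> rejected, returns 0
+//
+// v = 0 also passes the test, but returning a zero huge page size is harmless.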
+
+func osinit() {
+ ncpu = getproccount()
+ physHugePageSize = getHugePageSize()
+ if iscgo {
+ // #42494 glibc and musl reserve some signals for
+ // internal use and require they not be blocked by
+ // the rest of a normal C runtime. When the Go runtime
+ // temporarily blocks and then unblocks signals, the blocked
+ // interval of time is generally very short. As such, these
+ // expectations of *libc code are mostly met by
+ // the combined go+cgo system of threads. However,
+ // when go causes a thread to exit, via a return from
+ // mstart(), the combined runtime can deadlock if
+ // these signals are blocked. Thus, don't block these
+ // signals when exiting threads.
+ // - glibc: SIGCANCEL (32), SIGSETXID (33)
+ // - musl: SIGTIMER (32), SIGCANCEL (33), SIGSYNCCALL (34)
+ sigdelset(&sigsetAllExiting, 32)
+ sigdelset(&sigsetAllExiting, 33)
+ sigdelset(&sigsetAllExiting, 34)
+ }
+ osArchInit()
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+func getRandomData(r []byte) {
+ if startupRandomData != nil {
+ n := copy(r, startupRandomData)
+ extendRandom(r, n)
+ return
+ }
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to do synchronous initialization of Go code built with
+// -buildmode=c-archive or -buildmode=c-shared.
+// None of the Go runtime is initialized.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func libpreinit() {
+ initsig(true)
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024) // Linux wants >= 2K
+ mp.gsignal.m = mp
+}
+
+func gettid() uint32
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ minitSignals()
+
+ // Cgo-created threads and the bootstrap m are missing a
+ // procid. We need this for asynchronous preemption and it's
+ // useful in debuggers.
+ getg().m.procid = uint64(gettid())
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+// Called from exitm, but not from drop, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+//#ifdef GOARCH_386
+//#define sa_handler k_sa_handler
+//#endif
+
+func sigreturn__sigaction()
+func sigtramp() // Called via C ABI
+func cgoSigtramp()
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func timer_create(clockid int32, sevp *sigevent, timerid *int32) int32
+
+//go:noescape
+func timer_settime(timerid int32, flags int32, new, old *itimerspec) int32
+
+//go:noescape
+func timer_delete(timerid int32) int32
+
+//go:noescape
+func rtsigprocmask(how int32, new, old *sigset, size int32)
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, new, old *sigset) {
+ rtsigprocmask(how, new, old, int32(unsafe.Sizeof(*new)))
+}
+
+func raise(sig uint32)
+func raiseproc(sig uint32)
+
+//go:noescape
+func sched_getaffinity(pid, len uintptr, buf *byte) int32
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+func pipe2(flags int32) (r, w int32, errno int32)
+
+//go:nosplit
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32) {
+ r, _, err := syscall.Syscall6(syscall.SYS_FCNTL, uintptr(fd), uintptr(cmd), uintptr(arg), 0, 0, 0)
+ return int32(r), int32(err)
+}
+
+const (
+ _si_max_size = 128
+ _sigev_max_size = 64
+)
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTORER | _SA_RESTART
+ sigfillset(&sa.sa_mask)
+ // Although the Linux manpage says the "sa_restorer element is obsolete and
+ // should not be used", the x86_64 kernel requires it. Only use it on
+ // x86.
+ if GOARCH == "386" || GOARCH == "amd64" {
+ sa.sa_restorer = abi.FuncPCABI0(sigreturn__sigaction)
+ }
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ if iscgo {
+ fn = abi.FuncPCABI0(cgoSigtramp)
+ } else {
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ }
+ sa.sa_handler = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ if sa.sa_flags&_SA_ONSTACK != 0 {
+ return
+ }
+ sa.sa_flags |= _SA_ONSTACK
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_handler
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ *(*uintptr)(unsafe.Pointer(&s.ss_sp)) = sp
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+// sysSigaction calls the rt_sigaction system call.
+//
+//go:nosplit
+func sysSigaction(sig uint32, new, old *sigactiont) {
+ if rt_sigaction(uintptr(sig), new, old, unsafe.Sizeof(sigactiont{}.sa_mask)) != 0 {
+ // Workaround for bugs in QEMU user mode emulation.
+ //
+ // QEMU turns calls to the sigaction system call into
+ // calls to the C library sigaction call; the C
+ // library call rejects attempts to call sigaction for
+ // SIGCANCEL (32) or SIGSETXID (33).
+ //
+ // QEMU rejects calling sigaction on SIGRTMAX (64).
+ //
+ // Just ignore the error in these cases. There isn't
+ // anything we can do about it anyhow.
+ if sig != 32 && sig != 33 && sig != 64 {
+ // Use system stack to avoid split stack overflow on ppc64/ppc64le.
+ systemstack(func() {
+ throw("sigaction failed")
+ })
+ }
+ }
+}
+
+// rt_sigaction is implemented in assembly.
+//
+//go:noescape
+func rt_sigaction(sig uintptr, new, old *sigactiont, size uintptr) int32
+
+func getpid() int
+func tgkill(tgid, tid, sig int)
+
+// signalM sends a signal to mp.
+func signalM(mp *m, sig int) {
+ tgkill(getpid(), int(mp.procid), sig)
+}
+
+// validSIGPROF compares this signal delivery's code against the signal sources
+// that the profiler uses, returning whether the delivery should be processed.
+// To be processed, a signal delivery from a known profiling mechanism should
+// correspond to the best profiling mechanism available to this thread. Signals
+// from other sources are always considered valid.
+//
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ code := int32(c.sigcode())
+ setitimer := code == _SI_KERNEL
+ timer_create := code == _SI_TIMER
+
+ if !(setitimer || timer_create) {
+ // The signal doesn't correspond to a profiling mechanism that the
+ // runtime enables itself. There's no reason to process it, but there's
+ // no reason to ignore it either.
+ return true
+ }
+
+ if mp == nil {
+ // Since we don't have an M, we can't check if there's an active
+ // per-thread timer for this thread. We don't know how long this thread
+ // has been around, and if it happened to interact with the Go scheduler
+ // at a time when profiling was active (causing it to have a per-thread
+ // timer). But it may have never interacted with the Go scheduler, or
+ // never while profiling was active. To avoid double-counting, process
+ // only signals from setitimer.
+ //
+ // When a custom cgo traceback function has been registered (on
+ // platforms that support runtime.SetCgoTraceback), SIGPROF signals
+ // delivered to a thread that cannot find a matching M do this check in
+ // the assembly implementations of runtime.cgoSigtramp.
+ return setitimer
+ }
+
+ // Having an M means the thread interacts with the Go scheduler, and we can
+ // check whether there's an active per-thread timer for this thread.
+ if mp.profileTimerValid.Load() {
+ // If this M has its own per-thread CPU profiling interval timer, we
+ // should track the SIGPROF signals that come from that timer (for
+ // accurate reporting of its CPU usage; see issue 35057) and ignore any
+ // that it gets from the process-wide setitimer (to not over-count its
+ // CPU consumption).
+ return timer_create
+ }
+
+ // No active per-thread timer means the only valid profiler is setitimer.
+ return setitimer
+}
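+
+// Editor's note: a summary, not part of the upstream file, of the decision
+// validSIGPROF makes for SIGPROF deliveries from the runtime's own timers:
+//
+//	si_code source   M known?  per-thread timer?  processed?
+//	setitimer        no        -                  yes
+//	setitimer        yes       no                 yes
+//	setitimer        yes       yes                no (avoid double counting)
+//	timer_create     no        -                  no
+//	timer_create     yes       no                 no
+//	timer_create     yes       yes                yes
+//
+// Deliveries whose si_code matches neither mechanism are always processed.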
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ mp := getg().m
+ mp.profilehz = hz
+
+ // destroy any active timer
+ if mp.profileTimerValid.Load() {
+ timerid := mp.profileTimer
+ mp.profileTimerValid.Store(false)
+ mp.profileTimer = 0
+
+ ret := timer_delete(timerid)
+ if ret != 0 {
+ print("runtime: failed to disable profiling timer; timer_delete(", timerid, ") errno=", -ret, "\n")
+ throw("timer_delete")
+ }
+ }
+
+ if hz == 0 {
+ // If the goal was to disable profiling for this thread, then the job's done.
+ return
+ }
+
+ // The period of the timer should be 1/Hz. For every "1/Hz" of additional
+ // work, the user should expect one additional sample in the profile.
+ //
+ // But to scale down to very small amounts of application work, to observe
+ // even CPU usage of "one tenth" of the requested period, set the initial
+ // timing delay in a different way: So that "one tenth" of a period of CPU
+ // spend shows up as a 10% chance of one sample (for an expected value of
+ // 0.1 samples), and so that "two and six tenths" periods of CPU spend show
+ // up as a 60% chance of 3 samples and a 40% chance of 2 samples (for an
+ // expected value of 2.6). Set the initial delay to a value in the uniform
+ // random distribution between 0 and the desired period. And because "0"
+ // means "disable timer", add 1 so the half-open interval [0,period) turns
+ // into (0,period].
+ //
+ // Otherwise, this would show up as a bias away from short-lived threads and
+ // from threads that are only occasionally active: for example, when the
+ // garbage collector runs on a mostly-idle system, the additional threads it
+ // activates may do a couple milliseconds of GC-related work and nothing
+ // else in the few seconds that the profiler observes.
+ spec := new(itimerspec)
+ spec.it_value.setNsec(1 + int64(fastrandn(uint32(1e9/hz))))
+ spec.it_interval.setNsec(1e9 / int64(hz))
+
+ var timerid int32
+ var sevp sigevent
+ sevp.notify = _SIGEV_THREAD_ID
+ sevp.signo = _SIGPROF
+ sevp.sigev_notify_thread_id = int32(mp.procid)
+ ret := timer_create(_CLOCK_THREAD_CPUTIME_ID, &sevp, &timerid)
+ if ret != 0 {
+ // If we cannot create a timer for this M, leave profileTimerValid false
+ // to fall back to the process-wide setitimer profiler.
+ return
+ }
+
+ ret = timer_settime(timerid, 0, spec, nil)
+ if ret != 0 {
+ print("runtime: failed to configure profiling timer; timer_settime(", timerid,
+ ", 0, {interval: {",
+ spec.it_interval.tv_sec, "s + ", spec.it_interval.tv_nsec, "ns} value: {",
+ spec.it_value.tv_sec, "s + ", spec.it_value.tv_nsec, "ns}}, nil) errno=", -ret, "\n")
+ throw("timer_settime")
+ }
+
+ mp.profileTimer = timerid
+ mp.profileTimerValid.Store(true)
+}
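+
+// Editor's note: an illustrative sketch, not part of the upstream file, of the
+// timer values chosen above for a given hz. The interval is one full period,
+// while the first expiry is drawn uniformly from (0, period] so that partial
+// periods of CPU time still get a proportional chance of being sampled:
+//
+//	period := int64(1e9) / int64(hz)               // nanoseconds per sample
+//	first := 1 + int64(fastrandn(uint32(period)))  // uniform in (0, period]
+//
+// For hz = 100 the period is 10ms, so a thread that runs for only 1ms has a
+// 10% chance of contributing one sample, preserving the expected value.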
+
+// perThreadSyscallArgs contains the system call number, arguments, and
+// expected return values for a system call to be executed on all threads.
+type perThreadSyscallArgs struct {
+ trap uintptr
+ a1 uintptr
+ a2 uintptr
+ a3 uintptr
+ a4 uintptr
+ a5 uintptr
+ a6 uintptr
+ r1 uintptr
+ r2 uintptr
+}
+
+// perThreadSyscall is the system call to execute for the ongoing
+// doAllThreadsSyscall.
+//
+// perThreadSyscall may only be written while mp.needPerThreadSyscall == 0 on
+// all Ms.
+var perThreadSyscall perThreadSyscallArgs
+
+// syscall_runtime_doAllThreadsSyscall executes a specified system call on
+// all Ms.
+//
+// The system call is expected to succeed and return the same value on every
+// thread. If any threads do not match, the runtime throws.
+//
+//go:linkname syscall_runtime_doAllThreadsSyscall syscall.runtime_doAllThreadsSyscall
+//go:uintptrescapes
+func syscall_runtime_doAllThreadsSyscall(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ if iscgo {
+ // In cgo, we are not aware of threads created in C, so this approach will not work.
+ panic("doAllThreadsSyscall not supported with cgo enabled")
+ }
+
+ // STW to guarantee that user goroutines see an atomic change to thread
+ // state. Without STW, goroutines could migrate Ms while the change is in
+ // progress and, e.g., see state old -> new -> old -> new.
+ //
+ // N.B. Internally, this function does not depend on STW to
+ // successfully change every thread. It is only needed for user
+ // expectations, per above.
+ stopTheWorld(stwAllThreadsSyscall)
+
+ // This function depends on several properties:
+ //
+ // 1. All OS threads that already exist are associated with an M in
+ // allm. i.e., we won't miss any pre-existing threads.
+ // 2. All Ms listed in allm will eventually have an OS thread exist.
+ // i.e., they will set procid and be able to receive signals.
+ // 3. OS threads created after we read allm will clone from a thread
+ // that has executed the system call. i.e., they inherit the
+ // modified state.
+ //
+ // We achieve these through different mechanisms:
+ //
+ // 1. Addition of new Ms to allm in allocm happens before clone of its
+ // OS thread later in newm.
+ // 2. newm does acquirem to avoid being preempted, ensuring that new Ms
+ // created in allocm will eventually reach OS thread clone later in
+ // newm.
+ // 3. We take allocmLock for write here to prevent allocation of new Ms
+ // while this function runs. Per (1), this prevents clone of OS
+ // threads that are not yet in allm.
+ allocmLock.lock()
+
+ // Disable preemption, preventing us from changing Ms, as we handle
+ // this M specially.
+ //
+ // N.B. STW and lock() above do this as well, this is added for extra
+ // clarity.
+ acquirem()
+
+ // N.B. allocmLock also prevents concurrent execution of this function,
+ // serializing use of perThreadSyscall, mp.needPerThreadSyscall, and
+ // ensuring all threads execute system calls from multiple calls in the
+ // same order.
+
+ r1, r2, errno := syscall.Syscall6(trap, a1, a2, a3, a4, a5, a6)
+ if GOARCH == "ppc64" || GOARCH == "ppc64le" {
+ // TODO(https://go.dev/issue/51192): ppc64 doesn't use r2.
+ r2 = 0
+ }
+ if errno != 0 {
+ releasem(getg().m)
+ allocmLock.unlock()
+ startTheWorld()
+ return r1, r2, errno
+ }
+
+ perThreadSyscall = perThreadSyscallArgs{
+ trap: trap,
+ a1: a1,
+ a2: a2,
+ a3: a3,
+ a4: a4,
+ a5: a5,
+ a6: a6,
+ r1: r1,
+ r2: r2,
+ }
+
+ // Wait for all threads to start.
+ //
+ // As described above, some Ms have been added to allm prior to
+ // allocmLock, but not yet completed OS clone and set procid.
+ //
+ // At minimum we must wait for a thread to set procid before we can
+ // send it a signal.
+ //
+ // We take this one step further and wait for all threads to start
+ // before sending any signals. This prevents system calls from getting
+ // applied twice: once in the parent and once in the child, like so:
+ //
+ // A B C
+ // add C to allm
+ // doAllThreadsSyscall
+ // allocmLock.lock()
+ // signal B
+ // <receive signal>
+ // execute syscall
+ // <signal return>
+ // clone C
+ // <thread start>
+ // set procid
+ // signal C
+ // <receive signal>
+ // execute syscall
+ // <signal return>
+ //
+ // In this case, thread C inherited the syscall-modified state from
+ // thread B and did not need to execute the syscall, but did anyway
+ // because doAllThreadsSyscall could not be sure whether it was
+ // required.
+ //
+ // Some system calls may not be idempotent, so we ensure each thread
+ // executes the system call exactly once.
+ for mp := allm; mp != nil; mp = mp.alllink {
+ for atomic.Load64(&mp.procid) == 0 {
+ // Thread is starting.
+ osyield()
+ }
+ }
+
+ // Signal every other thread, where they will execute perThreadSyscall
+ // from the signal handler.
+ gp := getg()
+ tid := gp.m.procid
+ for mp := allm; mp != nil; mp = mp.alllink {
+ if atomic.Load64(&mp.procid) == tid {
+ // Our thread already performed the syscall.
+ continue
+ }
+ mp.needPerThreadSyscall.Store(1)
+ signalM(mp, sigPerThreadSyscall)
+ }
+
+ // Wait for all threads to complete.
+ for mp := allm; mp != nil; mp = mp.alllink {
+ if mp.procid == tid {
+ continue
+ }
+ for mp.needPerThreadSyscall.Load() != 0 {
+ osyield()
+ }
+ }
+
+ perThreadSyscall = perThreadSyscallArgs{}
+
+ releasem(getg().m)
+ allocmLock.unlock()
+ startTheWorld()
+
+ return r1, r2, errno
+}
+
+// runPerThreadSyscall runs perThreadSyscall for this M if required.
+//
+// This function throws if the system call returns with anything other than the
+// expected values.
+//
+//go:nosplit
+func runPerThreadSyscall() {
+ gp := getg()
+ if gp.m.needPerThreadSyscall.Load() == 0 {
+ return
+ }
+
+ args := perThreadSyscall
+ r1, r2, errno := syscall.Syscall6(args.trap, args.a1, args.a2, args.a3, args.a4, args.a5, args.a6)
+ if GOARCH == "ppc64" || GOARCH == "ppc64le" {
+ // TODO(https://go.dev/issue/51192): ppc64 doesn't use r2.
+ r2 = 0
+ }
+ if errno != 0 || r1 != args.r1 || r2 != args.r2 {
+ print("trap:", args.trap, ", a123456=[", args.a1, ",", args.a2, ",", args.a3, ",", args.a4, ",", args.a5, ",", args.a6, "]\n")
+ print("results: got {r1=", r1, ",r2=", r2, ",errno=", errno, "}, want {r1=", args.r1, ",r2=", args.r2, ",errno=0}\n")
+ fatal("AllThreadsSyscall6 results differ between threads; runtime corrupted")
+ }
+
+ gp.m.needPerThreadSyscall.Store(0)
+}
+
+const (
+ _SI_USER = 0
+ _SI_TKILL = -6
+)
+
+// sigFromUser reports whether the signal was sent because of a call
+// to kill or tgkill.
+//
+//go:nosplit
+func (c *sigctxt) sigFromUser() bool {
+ code := int32(c.sigcode())
+ return code == _SI_USER || code == _SI_TKILL
+}
diff --git a/src/runtime/os_linux_arm.go b/src/runtime/os_linux_arm.go
new file mode 100644
index 0000000..bd3ab44
--- /dev/null
+++ b/src/runtime/os_linux_arm.go
@@ -0,0 +1,51 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+const (
+ _HWCAP_VFP = 1 << 6 // introduced in at least 2.6.11
+ _HWCAP_VFPv3 = 1 << 13 // introduced in 2.6.30
+)
+
+func vdsoCall()
+
+func checkgoarm() {
+ // On Android, /proc/self/auxv might be unreadable and hwcap won't
+ // reflect the CPU capabilities. Assume that every Android arm device
+ // has the necessary floating point hardware available.
+ if GOOS == "android" {
+ return
+ }
+ if goarm > 5 && cpu.HWCap&_HWCAP_VFP == 0 {
+ print("runtime: this CPU has no floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5.\n")
+ exit(1)
+ }
+ if goarm > 6 && cpu.HWCap&_HWCAP_VFPv3 == 0 {
+ print("runtime: this CPU has no VFPv3 floating point hardware, so it cannot run\n")
+ print("this GOARM=", goarm, " binary. Recompile using GOARM=5 or GOARM=6.\n")
+ exit(1)
+ }
+}
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
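
checkgoarm relies on the AT_HWCAP bits that archauxv records from the kernel-supplied auxiliary vector. A user program can inspect the same bits by parsing /proc/self/auxv directly; the sketch below assumes AT_HWCAP = 16 from the Linux auxv ABI, assumes a little-endian layout, and uses a hypothetical readHWCap helper.

package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"unsafe"
)

const (
	atHWCAP  = 16     // auxv tag for hardware capabilities (Linux ABI)
	hwcapVFP = 1 << 6 // same bit checked by checkgoarm for GOARM > 5
)

// readHWCap walks /proc/self/auxv, a sequence of (tag, value) pairs of
// native word size, and returns the AT_HWCAP value.
func readHWCap() (uint, error) {
	buf, err := os.ReadFile("/proc/self/auxv")
	if err != nil {
		return 0, err
	}
	wordSize := int(unsafe.Sizeof(uintptr(0)))
	for i := 0; i+2*wordSize <= len(buf); i += 2 * wordSize {
		var tag, val uint64
		if wordSize == 4 {
			tag = uint64(binary.LittleEndian.Uint32(buf[i:]))
			val = uint64(binary.LittleEndian.Uint32(buf[i+wordSize:]))
		} else {
			tag = binary.LittleEndian.Uint64(buf[i:])
			val = binary.LittleEndian.Uint64(buf[i+wordSize:])
		}
		if tag == atHWCAP {
			return uint(val), nil
		}
	}
	return 0, fmt.Errorf("AT_HWCAP not found")
}

func main() {
	hwcap, err := readHWCap()
	if err != nil {
		fmt.Println("auxv not readable:", err) // e.g. on Android, as noted above
		return
	}
	fmt.Printf("HWCAP=%#x VFP=%v\n", hwcap, hwcap&hwcapVFP != 0)
}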
diff --git a/src/runtime/os_linux_arm64.go b/src/runtime/os_linux_arm64.go
new file mode 100644
index 0000000..2daa56f
--- /dev/null
+++ b/src/runtime/os_linux_arm64.go
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build arm64
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_linux_be64.go b/src/runtime/os_linux_be64.go
new file mode 100644
index 0000000..d8d4ac2
--- /dev/null
+++ b/src/runtime/os_linux_be64.go
@@ -0,0 +1,42 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The standard Linux sigset type on big-endian 64-bit machines.
+
+//go:build linux && (ppc64 || s390x)
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 65
+ _SIG_BLOCK = 0
+ _SIG_UNBLOCK = 1
+ _SIG_SETMASK = 2
+)
+
+type sigset uint64
+
+var sigset_all = sigset(^uint64(0))
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ if i > 64 {
+ throw("unexpected signal greater than 64")
+ }
+ *mask |= 1 << (uint(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ if i > 64 {
+ throw("unexpected signal greater than 64")
+ }
+ *mask &^= 1 << (uint(i) - 1)
+}
+
+//go:nosplit
+func sigfillset(mask *uint64) {
+ *mask = ^uint64(0)
+}
diff --git a/src/runtime/os_linux_generic.go b/src/runtime/os_linux_generic.go
new file mode 100644
index 0000000..15fafc1
--- /dev/null
+++ b/src/runtime/os_linux_generic.go
@@ -0,0 +1,37 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !mips && !mipsle && !mips64 && !mips64le && !s390x && !ppc64 && linux
+
+package runtime
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 65
+ _SIG_BLOCK = 0
+ _SIG_UNBLOCK = 1
+ _SIG_SETMASK = 2
+)
+
+ // It's hard to tease out exactly how big a sigset is, but
+ // rt_sigprocmask crashes if we get the size wrong, so the fact that
+ // existing binaries run correctly means this size is right.
+type sigset [2]uint32
+
+var sigset_all = sigset{^uint32(0), ^uint32(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func sigfillset(mask *uint64) {
+ *mask = ^uint64(0)
+}
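
The sigaddset/sigdelset variants in these files (single uint64, [2]uint32, [2]uint64, [4]uint32) all perform the same arithmetic: signal i occupies bit (i-1) mod wordsize of word (i-1) / wordsize. A minimal standalone sketch of that mapping, using an illustrative mask32 type:

package main

import "fmt"

// mask32 mirrors the generic Linux sigset: an array of 32-bit words where
// signal i occupies bit (i-1)&31 of word (i-1)/32.
type mask32 [2]uint32

func (m *mask32) add(i int)      { m[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31) }
func (m *mask32) del(i int)      { m[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31) }
func (m *mask32) has(i int) bool { return m[(i-1)/32]&(1<<((uint32(i)-1)&31)) != 0 }

func main() {
	var m mask32
	m.add(2)  // SIGINT
	m.add(34) // a real-time signal; lands in the second word
	fmt.Printf("words=%#x %#x has(2)=%v has(34)=%v\n", m[0], m[1], m.has(2), m.has(34))
	m.del(2)
	fmt.Println("after del(2):", m.has(2))
}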
diff --git a/src/runtime/os_linux_loong64.go b/src/runtime/os_linux_loong64.go
new file mode 100644
index 0000000..61213da
--- /dev/null
+++ b/src/runtime/os_linux_loong64.go
@@ -0,0 +1,11 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && loong64
+
+package runtime
+
+func archauxv(tag, val uintptr) {}
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_mips64x.go b/src/runtime/os_linux_mips64x.go
new file mode 100644
index 0000000..11d35bc
--- /dev/null
+++ b/src/runtime/os_linux_mips64x.go
@@ -0,0 +1,52 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 129
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type sigset [2]uint64
+
+var sigset_all = sigset{^uint64(0), ^uint64(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] |= 1 << ((uint32(i) - 1) & 63)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/64] &^= 1 << ((uint32(i) - 1) & 63)
+}
+
+//go:nosplit
+func sigfillset(mask *[2]uint64) {
+ (*mask)[0], (*mask)[1] = ^uint64(0), ^uint64(0)
+}
diff --git a/src/runtime/os_linux_mipsx.go b/src/runtime/os_linux_mipsx.go
new file mode 100644
index 0000000..cdf83ff
--- /dev/null
+++ b/src/runtime/os_linux_mipsx.go
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
+
+func osArchInit() {}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed fastrand().
+ // nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
+
+const (
+ _SS_DISABLE = 2
+ _NSIG = 128 + 1
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+)
+
+type sigset [4]uint32
+
+var sigset_all = sigset{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ (*mask)[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func sigfillset(mask *[4]uint32) {
+ (*mask)[0], (*mask)[1], (*mask)[2], (*mask)[3] = ^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)
+}
diff --git a/src/runtime/os_linux_noauxv.go b/src/runtime/os_linux_noauxv.go
new file mode 100644
index 0000000..ff37727
--- /dev/null
+++ b/src/runtime/os_linux_noauxv.go
@@ -0,0 +1,10 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && !arm && !arm64 && !loong64 && !mips && !mipsle && !mips64 && !mips64le && !s390x && !ppc64 && !ppc64le
+
+package runtime
+
+func archauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_linux_novdso.go b/src/runtime/os_linux_novdso.go
new file mode 100644
index 0000000..d7e1ea0
--- /dev/null
+++ b/src/runtime/os_linux_novdso.go
@@ -0,0 +1,10 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && !386 && !amd64 && !arm && !arm64 && !loong64 && !mips64 && !mips64le && !ppc64 && !ppc64le && !riscv64 && !s390x
+
+package runtime
+
+func vdsoauxv(tag, val uintptr) {
+}
diff --git a/src/runtime/os_linux_ppc64x.go b/src/runtime/os_linux_ppc64x.go
new file mode 100644
index 0000000..25d7ccc
--- /dev/null
+++ b/src/runtime/os_linux_ppc64x.go
@@ -0,0 +1,23 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ // ppc64x doesn't have a 'cpuid' instruction
+ // equivalent and relies on HWCAP/HWCAP2 bits for
+ // hardware capabilities.
+ cpu.HWCap = uint(val)
+ case _AT_HWCAP2:
+ cpu.HWCap2 = uint(val)
+ }
+}
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_riscv64.go b/src/runtime/os_linux_riscv64.go
new file mode 100644
index 0000000..9be88a5
--- /dev/null
+++ b/src/runtime/os_linux_riscv64.go
@@ -0,0 +1,7 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_s390x.go b/src/runtime/os_linux_s390x.go
new file mode 100644
index 0000000..b9651f1
--- /dev/null
+++ b/src/runtime/os_linux_s390x.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/cpu"
+
+func archauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_HWCAP:
+ cpu.HWCap = uint(val)
+ }
+}
+
+func osArchInit() {}
diff --git a/src/runtime/os_linux_x86.go b/src/runtime/os_linux_x86.go
new file mode 100644
index 0000000..c88f61f
--- /dev/null
+++ b/src/runtime/os_linux_x86.go
@@ -0,0 +1,9 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (386 || amd64)
+
+package runtime
+
+func osArchInit() {}
diff --git a/src/runtime/os_netbsd.go b/src/runtime/os_netbsd.go
new file mode 100644
index 0000000..b50ed4b
--- /dev/null
+++ b/src/runtime/os_netbsd.go
@@ -0,0 +1,450 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _SS_DISABLE = 4
+ _SIG_BLOCK = 1
+ _SIG_UNBLOCK = 2
+ _SIG_SETMASK = 3
+ _NSIG = 33
+ _SI_USER = 0
+
+ // From NetBSD's <sys/ucontext.h>
+ _UC_SIGMASK = 0x01
+ _UC_CPU = 0x04
+
+ // From <sys/lwp.h>
+ _LWP_DETACHED = 0x00000040
+)
+
+type mOS struct {
+ waitsemacount uint32
+}
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+//go:noescape
+func sigprocmask(how int32, new, old *sigset)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+func lwp_tramp()
+
+func raiseproc(sig uint32)
+
+func lwp_kill(tid int32, sig int)
+
+//go:noescape
+func getcontext(ctxt unsafe.Pointer)
+
+//go:noescape
+func lwp_create(ctxt unsafe.Pointer, flags uintptr, lwpid unsafe.Pointer) int32
+
+//go:noescape
+func lwp_park(clockid, flags int32, ts *timespec, unpark int32, hint, unparkhint unsafe.Pointer) int32
+
+//go:noescape
+func lwp_unpark(lwp int32, hint unsafe.Pointer) int32
+
+func lwp_self() int32
+
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func pipe2(flags int32) (r, w int32, errno int32)
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32)
+
+func issetugid() int32
+
+const (
+ _ESRCH = 3
+ _ETIMEDOUT = 60
+
+ // From NetBSD's <sys/time.h>
+ _CLOCK_REALTIME = 0
+ _CLOCK_VIRTUAL = 1
+ _CLOCK_PROF = 2
+ _CLOCK_MONOTONIC = 3
+
+ _TIMER_RELTIME = 0
+ _TIMER_ABSTIME = 1
+)
+
+var sigset_all = sigset{[4]uint32{^uint32(0), ^uint32(0), ^uint32(0), ^uint32(0)}}
+
+// From NetBSD's <sys/sysctl.h>
+const (
+ _CTL_KERN = 1
+ _KERN_OSREV = 3
+
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+ _HW_NCPUONLINE = 16
+)
+
+func sysctlInt(mib []uint32) (int32, bool) {
+ var out int32
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], uint32(len(mib)), (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret < 0 {
+ return 0, false
+ }
+ return out, true
+}
+
+func getncpu() int32 {
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPUONLINE}); ok {
+ return int32(n)
+ }
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPU}); ok {
+ return int32(n)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ mib := [2]uint32{_CTL_HW, _HW_PAGESIZE}
+ out := uint32(0)
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], 2, (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret >= 0 {
+ return uintptr(out)
+ }
+ return 0
+}
+
+func getOSRev() int {
+ if osrev, ok := sysctlInt([]uint32{_CTL_KERN, _KERN_OSREV}); ok {
+ return int(osrev)
+ }
+ return 0
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ gp := getg()
+ var deadline int64
+ if ns >= 0 {
+ deadline = nanotime() + ns
+ }
+
+ for {
+ v := atomic.Load(&gp.m.waitsemacount)
+ if v > 0 {
+ if atomic.Cas(&gp.m.waitsemacount, v, v-1) {
+ return 0 // semaphore acquired
+ }
+ continue
+ }
+
+ // Sleep until unparked by semawakeup or timeout.
+ var tsp *timespec
+ var ts timespec
+ if ns >= 0 {
+ wait := deadline - nanotime()
+ if wait <= 0 {
+ return -1
+ }
+ ts.setNsec(wait)
+ tsp = &ts
+ }
+ ret := lwp_park(_CLOCK_MONOTONIC, _TIMER_RELTIME, tsp, 0, unsafe.Pointer(&gp.m.waitsemacount), nil)
+ if ret == _ETIMEDOUT {
+ return -1
+ }
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ atomic.Xadd(&mp.waitsemacount, 1)
+ // From NetBSD's _lwp_unpark(2) manual:
+ // "If the target LWP is not currently waiting, it will return
+ // immediately upon the next call to _lwp_park()."
+ ret := lwp_unpark(int32(mp.procid), unsafe.Pointer(&mp.waitsemacount))
+ if ret != 0 && ret != _ESRCH {
+ // semawakeup can be called on signal stack.
+ systemstack(func() {
+ print("thrwakeup addr=", &mp.waitsemacount, " sem=", mp.waitsemacount, " ret=", ret, "\n")
+ })
+ }
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ var uc ucontextt
+ getcontext(unsafe.Pointer(&uc))
+
+ // _UC_SIGMASK does not seem to work here.
+ // It would be nice if _UC_SIGMASK and _UC_STACK
+ // worked so that we could do all the work setting
+ // the sigmask and the stack here, instead of setting
+ // the mask here and the stack in netbsdMstart.
+ // For now do the blocking manually.
+ uc.uc_flags = _UC_SIGMASK | _UC_CPU
+ uc.uc_link = nil
+ uc.uc_sigmask = sigset_all
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+
+ lwp_mcontext_init(&uc.uc_mcontext, stk, mp, mp.g0, abi.FuncPCABI0(netbsdMstart))
+
+ ret := retryOnEAGAIN(func() int32 {
+ errno := lwp_create(unsafe.Pointer(&uc), _LWP_DETACHED, unsafe.Pointer(&mp.procid))
+ // lwp_create returns negative errno
+ return -errno
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount()-1, " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -p)")
+ }
+ throw("runtime.newosproc")
+ }
+}
+
+// mstart is the entry-point for new Ms.
+// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls netbsdMstart0.
+func netbsdMstart()
+
+// netbsdMstart0 is the function call that starts executing a newly
+// created thread. On NetBSD, a new thread inherits the signal stack
+// of the creating thread. That confuses minit, so we remove that
+// signal stack here before calling the regular mstart. It's a bit
+// baroque to remove a signal stack here only to add one in minit, but
+// it's a simple change that keeps NetBSD working like other OS's.
+// At this point all signals are blocked, so there is no race.
+//
+//go:nosplit
+func netbsdMstart0() {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ mstart0()
+}
+
+func osinit() {
+ ncpu = getncpu()
+ if physPageSize == 0 {
+ physPageSize = getPageSize()
+ }
+ needSysmonWorkaround = getOSRev() < 902000000 // NetBSD 9.2
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ gp := getg()
+ gp.m.procid = uint64(lwp_self())
+
+ // On NetBSD a thread created by pthread_create inherits the
+ // signal stack of the creating thread. We always create a
+ // new signal stack here, to avoid having two Go threads using
+ // the same signal stack. This breaks the case of a thread
+ // created in C that calls sigaltstack and then calls a Go
+ // function, because we will lose track of the C code's
+ // sigaltstack, but it's the best we can do.
+ signalstack(&gp.m.gsignal.stack)
+ gp.m.newSigstack = true
+
+ minitSignalMask()
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+ // Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_mask sigset
+ sa_flags int32
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = sigset_all
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] |= 1 << ((uint32(i) - 1) & 31)
+}
+
+func sigdelset(mask *sigset, i int) {
+ mask.__bits[(i-1)/32] &^= 1 << ((uint32(i) - 1) & 31)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+func sysargs(argc int32, argv **byte) {
+ n := argc + 1
+
+ // skip over argv, envp to get to auxv
+ for argv_index(argv, n) != nil {
+ n++
+ }
+
+ // skip NULL separator
+ n++
+
+ // now argv+n is auxv
+ auxvp := (*[1 << 28]uintptr)(add(unsafe.Pointer(argv), uintptr(n)*goarch.PtrSize))
+ pairs := sysauxv(auxvp[:])
+ auxv = auxvp[: pairs*2 : pairs*2]
+}
+
+const (
+ _AT_NULL = 0 // Terminates the vector
+ _AT_PAGESZ = 6 // Page size in bytes
+)
+
+func sysauxv(auxv []uintptr) (pairs int) {
+ var i int
+ for i = 0; auxv[i] != _AT_NULL; i += 2 {
+ tag, val := auxv[i], auxv[i+1]
+ switch tag {
+ case _AT_PAGESZ:
+ physPageSize = val
+ }
+ }
+ return i / 2
+}
+
+// raise sends signal to the calling thread.
+//
+// It must be nosplit because it is used by the signal handler before
+// it definitely has a Go stack.
+//
+//go:nosplit
+func raise(sig uint32) {
+ lwp_kill(lwp_self(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ lwp_kill(int32(mp.procid), sig)
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
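
semasleep and semawakeup build a counting semaphore from an atomic counter plus _lwp_park/_lwp_unpark: a waker increments the count and unparks, while a sleeper tries to CAS the count down and parks only when it reads zero. A rough user-space analogue follows, with a buffered channel and timer standing in for park/unpark; the sema type and its methods are illustrative, not the runtime's.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// sema mimics the waitsemacount pattern: count is adjusted with atomics,
// and wake is a stand-in for lwp_unpark waking a parked sleeper.
type sema struct {
	count atomic.Uint32
	wake  chan struct{}
}

func newSema() *sema { return &sema{wake: make(chan struct{}, 1)} }

// acquire mirrors semasleep: try to CAS the counter down; if it is zero,
// park (here: block on a channel with a timeout) and retry.
func (s *sema) acquire(timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if v := s.count.Load(); v > 0 {
			if s.count.CompareAndSwap(v, v-1) {
				return true
			}
			continue
		}
		wait := time.Until(deadline)
		if wait <= 0 {
			return false // like returning -1 on _ETIMEDOUT
		}
		select {
		case <-s.wake:
		case <-time.After(wait):
		}
	}
}

// release mirrors semawakeup: bump the counter, then wake a sleeper.
func (s *sema) release() {
	s.count.Add(1)
	select {
	case s.wake <- struct{}{}: // wake one parked sleeper, if any
	default: // nobody parked; the counter alone is enough
	}
}

func main() {
	s := newSema()
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Println("acquired:", s.acquire(time.Second))
	}()
	time.Sleep(10 * time.Millisecond)
	s.release()
	wg.Wait()
}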
diff --git a/src/runtime/os_netbsd_386.go b/src/runtime/os_netbsd_386.go
new file mode 100644
index 0000000..ac89b98
--- /dev/null
+++ b/src/runtime/os_netbsd_386.go
@@ -0,0 +1,19 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_EIP] = uint32(abi.FuncPCABI0(lwp_tramp))
+ mc.__gregs[_REG_UESP] = uint32(uintptr(stk))
+ mc.__gregs[_REG_EBX] = uint32(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_EDX] = uint32(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_ESI] = uint32(fn)
+}
diff --git a/src/runtime/os_netbsd_amd64.go b/src/runtime/os_netbsd_amd64.go
new file mode 100644
index 0000000..74eea0c
--- /dev/null
+++ b/src/runtime/os_netbsd_amd64.go
@@ -0,0 +1,19 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_RIP] = uint64(abi.FuncPCABI0(lwp_tramp))
+ mc.__gregs[_REG_RSP] = uint64(uintptr(stk))
+ mc.__gregs[_REG_R8] = uint64(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_R9] = uint64(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_R12] = uint64(fn)
+}
diff --git a/src/runtime/os_netbsd_arm.go b/src/runtime/os_netbsd_arm.go
new file mode 100644
index 0000000..5fb4e08
--- /dev/null
+++ b/src/runtime/os_netbsd_arm.go
@@ -0,0 +1,37 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_R15] = uint32(abi.FuncPCABI0(lwp_tramp))
+ mc.__gregs[_REG_R13] = uint32(uintptr(stk))
+ mc.__gregs[_REG_R0] = uint32(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_R1] = uint32(uintptr(unsafe.Pointer(gp)))
+ mc.__gregs[_REG_R2] = uint32(fn)
+}
+
+func checkgoarm() {
+ // TODO(minux): FP checks like in os_linux_arm.go.
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_netbsd_arm64.go b/src/runtime/os_netbsd_arm64.go
new file mode 100644
index 0000000..2dda9c9
--- /dev/null
+++ b/src/runtime/os_netbsd_arm64.go
@@ -0,0 +1,26 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+func lwp_mcontext_init(mc *mcontextt, stk unsafe.Pointer, mp *m, gp *g, fn uintptr) {
+ // Machine dependent mcontext initialisation for LWP.
+ mc.__gregs[_REG_ELR] = uint64(abi.FuncPCABI0(lwp_tramp))
+ mc.__gregs[_REG_X31] = uint64(uintptr(stk))
+ mc.__gregs[_REG_X0] = uint64(uintptr(unsafe.Pointer(mp)))
+ mc.__gregs[_REG_X1] = uint64(uintptr(unsafe.Pointer(mp.g0)))
+ mc.__gregs[_REG_X2] = uint64(fn)
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_nonopenbsd.go b/src/runtime/os_nonopenbsd.go
new file mode 100644
index 0000000..a577596
--- /dev/null
+++ b/src/runtime/os_nonopenbsd.go
@@ -0,0 +1,17 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !openbsd
+
+package runtime
+
+// osStackAlloc performs OS-specific initialization before s is used
+// as stack memory.
+func osStackAlloc(s *mspan) {
+}
+
+// osStackFree undoes the effect of osStackAlloc before s is returned
+// to the heap.
+func osStackFree(s *mspan) {
+}
diff --git a/src/runtime/os_only_solaris.go b/src/runtime/os_only_solaris.go
new file mode 100644
index 0000000..0c72500
--- /dev/null
+++ b/src/runtime/os_only_solaris.go
@@ -0,0 +1,18 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Solaris code that doesn't also apply to illumos.
+
+//go:build !illumos
+
+package runtime
+
+func getncpu() int32 {
+ n := int32(sysconf(__SC_NPROCESSORS_ONLN))
+ if n < 1 {
+ return 1
+ }
+
+ return n
+}
diff --git a/src/runtime/os_openbsd.go b/src/runtime/os_openbsd.go
new file mode 100644
index 0000000..500286a
--- /dev/null
+++ b/src/runtime/os_openbsd.go
@@ -0,0 +1,314 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+type mOS struct {
+ waitsemacount uint32
+}
+
+const (
+ _ESRCH = 3
+ _EWOULDBLOCK = _EAGAIN
+ _ENOTSUP = 91
+
+ // From OpenBSD's sys/time.h
+ _CLOCK_REALTIME = 0
+ _CLOCK_VIRTUAL = 1
+ _CLOCK_PROF = 2
+ _CLOCK_MONOTONIC = 3
+)
+
+type sigset uint32
+
+var sigset_all = ^sigset(0)
+
+// From OpenBSD's <sys/sysctl.h>
+const (
+ _CTL_KERN = 1
+ _KERN_OSREV = 3
+
+ _CTL_HW = 6
+ _HW_NCPU = 3
+ _HW_PAGESIZE = 7
+ _HW_NCPUONLINE = 25
+)
+
+func sysctlInt(mib []uint32) (int32, bool) {
+ var out int32
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], uint32(len(mib)), (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret < 0 {
+ return 0, false
+ }
+ return out, true
+}
+
+func sysctlUint64(mib []uint32) (uint64, bool) {
+ var out uint64
+ nout := unsafe.Sizeof(out)
+ ret := sysctl(&mib[0], uint32(len(mib)), (*byte)(unsafe.Pointer(&out)), &nout, nil, 0)
+ if ret < 0 {
+ return 0, false
+ }
+ return out, true
+}
+
+//go:linkname internal_cpu_sysctlUint64 internal/cpu.sysctlUint64
+func internal_cpu_sysctlUint64(mib []uint32) (uint64, bool) {
+ return sysctlUint64(mib)
+}
+
+func getncpu() int32 {
+ // Try hw.ncpuonline first because hw.ncpu would report a number twice as
+ // high as the actual CPUs running on OpenBSD 6.4 with hyperthreading
+ // disabled (hw.smt=0). See https://golang.org/issue/30127
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPUONLINE}); ok {
+ return int32(n)
+ }
+ if n, ok := sysctlInt([]uint32{_CTL_HW, _HW_NCPU}); ok {
+ return int32(n)
+ }
+ return 1
+}
+
+func getPageSize() uintptr {
+ if ps, ok := sysctlInt([]uint32{_CTL_HW, _HW_PAGESIZE}); ok {
+ return uintptr(ps)
+ }
+ return 0
+}
+
+func getOSRev() int {
+ if osrev, ok := sysctlInt([]uint32{_CTL_KERN, _KERN_OSREV}); ok {
+ return int(osrev)
+ }
+ return 0
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ gp := getg()
+
+ // Compute sleep deadline.
+ var tsp *timespec
+ if ns >= 0 {
+ var ts timespec
+ ts.setNsec(ns + nanotime())
+ tsp = &ts
+ }
+
+ for {
+ v := atomic.Load(&gp.m.waitsemacount)
+ if v > 0 {
+ if atomic.Cas(&gp.m.waitsemacount, v, v-1) {
+ return 0 // semaphore acquired
+ }
+ continue
+ }
+
+ // Sleep until woken by semawakeup or timeout; or abort if waitsemacount != 0.
+ //
+ // From OpenBSD's __thrsleep(2) manual:
+ // "The abort argument, if not NULL, points to an int that will
+ // be examined [...] immediately before blocking. If that int
+ // is non-zero then __thrsleep() will immediately return EINTR
+ // without blocking."
+ ret := thrsleep(uintptr(unsafe.Pointer(&gp.m.waitsemacount)), _CLOCK_MONOTONIC, tsp, 0, &gp.m.waitsemacount)
+ if ret == _EWOULDBLOCK {
+ return -1
+ }
+ }
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ atomic.Xadd(&mp.waitsemacount, 1)
+ ret := thrwakeup(uintptr(unsafe.Pointer(&mp.waitsemacount)), 1)
+ if ret != 0 && ret != _ESRCH {
+ // semawakeup can be called on signal stack.
+ systemstack(func() {
+ print("thrwakeup addr=", &mp.waitsemacount, " sem=", mp.waitsemacount, " ret=", ret, "\n")
+ })
+ }
+}
+
+func osinit() {
+ ncpu = getncpu()
+ physPageSize = getPageSize()
+ haveMapStack = getOSRev() >= 201805 // OpenBSD 6.3
+}
+
+var urandom_dev = []byte("/dev/urandom\x00")
+
+//go:nosplit
+func getRandomData(r []byte) {
+ fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)
+ n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))
+ closefd(fd)
+ extendRandom(r, int(n))
+}
+
+func goenvs() {
+ goenvs_unix()
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ gsignalSize := int32(32 * 1024)
+ if GOARCH == "mips64" {
+ gsignalSize = int32(64 * 1024)
+ }
+ mp.gsignal = malg(gsignalSize)
+ mp.gsignal.m = mp
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, can not allocate memory.
+func minit() {
+ getg().m.procid = uint64(getthrid())
+ minitSignals()
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ unminitSignals()
+}
+
+ // Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+func sigtramp()
+
+type sigactiont struct {
+ sa_sigaction uintptr
+ sa_mask uint32
+ sa_flags int32
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsig(i uint32, fn uintptr) {
+ var sa sigactiont
+ sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTART
+ sa.sa_mask = uint32(sigset_all)
+ if fn == abi.FuncPCABIInternal(sighandler) { // abi.FuncPCABIInternal(sighandler) matches the callers in signal_unix.go
+ fn = abi.FuncPCABI0(sigtramp)
+ }
+ sa.sa_sigaction = fn
+ sigaction(i, &sa, nil)
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func setsigstack(i uint32) {
+ throw("setsigstack")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func getsig(i uint32) uintptr {
+ var sa sigactiont
+ sigaction(i, nil, &sa)
+ return sa.sa_sigaction
+}
+
+// setSignalstackSP sets the ss_sp field of a stackt.
+//
+//go:nosplit
+func setSignalstackSP(s *stackt, sp uintptr) {
+ s.ss_sp = sp
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaddset(mask *sigset, i int) {
+ *mask |= 1 << (uint32(i) - 1)
+}
+
+func sigdelset(mask *sigset, i int) {
+ *mask &^= 1 << (uint32(i) - 1)
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+}
+
+func setProcessCPUProfiler(hz int32) {
+ setProcessCPUProfilerTimer(hz)
+}
+
+func setThreadCPUProfiler(hz int32) {
+ setThreadCPUProfilerHz(hz)
+}
+
+//go:nosplit
+func validSIGPROF(mp *m, c *sigctxt) bool {
+ return true
+}
+
+var haveMapStack = false
+
+func osStackAlloc(s *mspan) {
+ // OpenBSD 6.4+ requires that stacks be mapped with MAP_STACK.
+ // It will check this on entry to system calls, traps, and
+ // when switching to the alternate system stack.
+ //
+ // This function is called before s is used for any data, so
+ // it's safe to simply re-map it.
+ osStackRemap(s, _MAP_STACK)
+}
+
+func osStackFree(s *mspan) {
+ // Undo MAP_STACK.
+ osStackRemap(s, 0)
+}
+
+func osStackRemap(s *mspan, flags int32) {
+ if !haveMapStack {
+ // OpenBSD prior to 6.3 did not have MAP_STACK and so
+ // the following mmap will fail. But it also didn't
+ // require MAP_STACK (obviously), so there's no need
+ // to do the mmap.
+ return
+ }
+ a, err := mmap(unsafe.Pointer(s.base()), s.npages*pageSize, _PROT_READ|_PROT_WRITE, _MAP_PRIVATE|_MAP_ANON|_MAP_FIXED|flags, -1, 0)
+ if err != 0 || uintptr(a) != s.base() {
+ print("runtime: remapping stack memory ", hex(s.base()), " ", s.npages*pageSize, " a=", a, " err=", err, "\n")
+ throw("remapping stack memory failed")
+ }
+}
+
+//go:nosplit
+func raise(sig uint32) {
+ thrkill(getthrid(), int(sig))
+}
+
+func signalM(mp *m, sig int) {
+ thrkill(int32(mp.procid), sig)
+}
+
+// sigPerThreadSyscall is only used on linux, so we assign a bogus signal
+// number.
+const sigPerThreadSyscall = 1 << 31
+
+//go:nosplit
+func runPerThreadSyscall() {
+ throw("runPerThreadSyscall only valid on linux")
+}
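
getncpu above prefers hw.ncpuonline and falls back to hw.ncpu because the latter also counts hyperthreads that are disabled with hw.smt=0. Ordinary programs can query the same nodes by name; this sketch assumes the golang.org/x/sys/unix package and its SysctlUint32 helper, and the onlineCPUs name is illustrative.

//go:build openbsd

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// onlineCPUs mirrors the runtime's getncpu preference order:
// hw.ncpuonline first, then hw.ncpu, then a conservative 1.
func onlineCPUs() int {
	if n, err := unix.SysctlUint32("hw.ncpuonline"); err == nil && n > 0 {
		return int(n)
	}
	if n, err := unix.SysctlUint32("hw.ncpu"); err == nil && n > 0 {
		return int(n)
	}
	return 1
}

func main() {
	fmt.Println("online CPUs:", onlineCPUs())
}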
diff --git a/src/runtime/os_openbsd_arm.go b/src/runtime/os_openbsd_arm.go
new file mode 100644
index 0000000..0a24096
--- /dev/null
+++ b/src/runtime/os_openbsd_arm.go
@@ -0,0 +1,23 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func checkgoarm() {
+ // TODO(minux): FP checks like in os_linux_arm.go.
+
+ // osinit not called yet, so ncpu not set: must use getncpu directly.
+ if getncpu() > 1 && goarm < 7 {
+ print("runtime: this system has multiple CPUs and must use\n")
+ print("atomic synchronization instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_arm64.go b/src/runtime/os_openbsd_arm64.go
new file mode 100644
index 0000000..d71de7d
--- /dev/null
+++ b/src/runtime/os_openbsd_arm64.go
@@ -0,0 +1,12 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_libc.go b/src/runtime/os_openbsd_libc.go
new file mode 100644
index 0000000..201f162
--- /dev/null
+++ b/src/runtime/os_openbsd_libc.go
@@ -0,0 +1,60 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && !mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+// mstart_stub provides glue code to call mstart from pthread_create.
+func mstart_stub()
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func newosproc(mp *m) {
+ if false {
+ print("newosproc m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Initialize an attribute object.
+ var attr pthreadattr
+ if err := pthread_attr_init(&attr); err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Find out OS stack size for our own stack guard.
+ var stacksize uintptr
+ if pthread_attr_getstacksize(&attr, &stacksize) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+ mp.g0.stack.hi = stacksize // for mstart
+
+ // Tell the pthread library we won't join with this thread.
+ if pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED) != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ // Finally, create the thread. It starts at mstart_stub, which does some low-level
+ // setup and then calls mstart.
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ err := retryOnEAGAIN(func() int32 {
+ return pthread_create(&attr, abi.FuncPCABI0(mstart_stub), unsafe.Pointer(mp))
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+ if err != 0 {
+ writeErrStr(failthreadcreate)
+ exit(1)
+ }
+
+ pthread_attr_destroy(&attr)
+}
diff --git a/src/runtime/os_openbsd_mips64.go b/src/runtime/os_openbsd_mips64.go
new file mode 100644
index 0000000..ae220cd
--- /dev/null
+++ b/src/runtime/os_openbsd_mips64.go
@@ -0,0 +1,12 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_openbsd_syscall.go b/src/runtime/os_openbsd_syscall.go
new file mode 100644
index 0000000..d784f76
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall.go
@@ -0,0 +1,51 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+//go:noescape
+func tfork(param *tforkt, psize uintptr, mm *m, gg *g, fn uintptr) int32
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ stk := unsafe.Pointer(mp.g0.stack.hi)
+ if false {
+ print("newosproc stk=", stk, " m=", mp, " g=", mp.g0, " id=", mp.id, " ostk=", &mp, "\n")
+ }
+
+ // Stack pointer must point inside stack area (as marked with MAP_STACK),
+ // rather than at the top of it.
+ param := tforkt{
+ tf_tcb: unsafe.Pointer(&mp.tls[0]),
+ tf_tid: nil, // minit will record tid
+ tf_stack: uintptr(stk) - goarch.PtrSize,
+ }
+
+ var oset sigset
+ sigprocmask(_SIG_SETMASK, &sigset_all, &oset)
+ ret := retryOnEAGAIN(func() int32 {
+ errno := tfork(&param, unsafe.Sizeof(param), mp, mp.g0, abi.FuncPCABI0(mstart))
+ // tfork returns negative errno
+ return -errno
+ })
+ sigprocmask(_SIG_SETMASK, &oset, nil)
+
+ if ret != 0 {
+ print("runtime: failed to create new OS thread (have ", mcount()-1, " already; errno=", ret, ")\n")
+ if ret == _EAGAIN {
+ println("runtime: may need to increase max user processes (ulimit -p)")
+ }
+ throw("runtime.newosproc")
+ }
+}
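
Both newosproc variants above wrap thread creation in retryOnEAGAIN, which keeps retrying while pthread_create or tfork reports a transient EAGAIN. A user-space sketch of the same retry shape follows; retryOnTemporary and its parameters are hypothetical, not the runtime helper.

package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// retryOnTemporary retries fn while it reports EAGAIN, sleeping briefly
// between attempts, and gives up after maxTries. This mirrors the shape of
// the retry wrapper used around thread creation above.
func retryOnTemporary(maxTries int, fn func() error) error {
	var err error
	for try := 0; try < maxTries; try++ {
		err = fn()
		if !errors.Is(err, syscall.EAGAIN) {
			return err // success or a non-retryable error
		}
		time.Sleep(time.Duration(try+1) * time.Millisecond)
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnTemporary(5, func() error {
		attempts++
		if attempts < 3 {
			return syscall.EAGAIN // simulate transient resource exhaustion
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}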
diff --git a/src/runtime/os_openbsd_syscall1.go b/src/runtime/os_openbsd_syscall1.go
new file mode 100644
index 0000000..d32894b
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall1.go
@@ -0,0 +1,20 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && mips64
+
+package runtime
+
+//go:noescape
+func thrsleep(ident uintptr, clock_id int32, tsp *timespec, lock uintptr, abort *uint32) int32
+
+//go:noescape
+func thrwakeup(ident uintptr, n int32) int32
+
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
diff --git a/src/runtime/os_openbsd_syscall2.go b/src/runtime/os_openbsd_syscall2.go
new file mode 100644
index 0000000..0b796ad
--- /dev/null
+++ b/src/runtime/os_openbsd_syscall2.go
@@ -0,0 +1,102 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && mips64
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+//go:noescape
+func sigaction(sig uint32, new, old *sigactiont)
+
+func kqueue() int32
+
+//go:noescape
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32
+
+func raiseproc(sig uint32)
+
+func getthrid() int32
+func thrkill(tid int32, sig int)
+
+// read calls the read system call.
+// It returns a non-negative number of bytes written or a negative errno value.
+func read(fd int32, p unsafe.Pointer, n int32) int32
+
+func closefd(fd int32) int32
+
+func exit(code int32)
+func usleep(usec uint32)
+
+//go:nosplit
+func usleep_no_g(usec uint32) {
+ usleep(usec)
+}
+
+// write1 calls the write system call.
+// It returns a non-negative number of bytes written or a negative errno value.
+//
+//go:noescape
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+// return value is only set on linux to be used in osinit().
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+
+// exitThread terminates the current thread, writing *wait = freeMStack when
+// the stack is safe to reclaim.
+//
+//go:noescape
+func exitThread(wait *atomic.Uint32)
+
+//go:noescape
+func obsdsigprocmask(how int32, new sigset) sigset
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigprocmask(how int32, new, old *sigset) {
+ n := sigset(0)
+ if new != nil {
+ n = *new
+ }
+ r := obsdsigprocmask(how, n)
+ if old != nil {
+ *old = r
+ }
+}
+
+func pipe2(flags int32) (r, w int32, errno int32)
+
+//go:noescape
+func setitimer(mode int32, new, old *itimerval)
+
+//go:noescape
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+
+// mmap calls the mmap system call. It is implemented in assembly.
+// We only pass the lower 32 bits of file offset to the
+// assembly routine; the higher bits (if required), should be provided
+// by the assembly routine as 0.
+// The err result is an OS error code such as ENOMEM.
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+
+// munmap calls the munmap system call. It is implemented in assembly.
+func munmap(addr unsafe.Pointer, n uintptr)
+
+func nanotime1() int64
+
+//go:noescape
+func sigaltstack(new, old *stackt)
+
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32)
+
+func walltime() (sec int64, nsec int32)
+
+func issetugid() int32
diff --git a/src/runtime/os_plan9.go b/src/runtime/os_plan9.go
new file mode 100644
index 0000000..f4ff4d5
--- /dev/null
+++ b/src/runtime/os_plan9.go
@@ -0,0 +1,547 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+type mOS struct {
+ waitsemacount uint32
+ notesig *int8
+ errstr *byte
+ ignoreHangup bool
+}
+
+func closefd(fd int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+//go:noescape
+func pread(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+
+//go:noescape
+func pwrite(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+
+func seek(fd int32, offset int64, whence int32) int64
+
+//go:noescape
+func exits(msg *byte)
+
+//go:noescape
+func brk_(addr unsafe.Pointer) int32
+
+func sleep(ms int32) int32
+
+func rfork(flags int32) int32
+
+//go:noescape
+func plan9_semacquire(addr *uint32, block int32) int32
+
+//go:noescape
+func plan9_tsemacquire(addr *uint32, ms int32) int32
+
+//go:noescape
+func plan9_semrelease(addr *uint32, count int32) int32
+
+//go:noescape
+func notify(fn unsafe.Pointer) int32
+
+func noted(mode int32) int32
+
+//go:noescape
+func nsec(*int64) int64
+
+//go:noescape
+func sigtramp(ureg, note unsafe.Pointer)
+
+func setfpmasks()
+
+//go:noescape
+func tstart_plan9(newm *m)
+
+func errstr() string
+
+type _Plink uintptr
+
+func sigpanic() {
+ gp := getg()
+ if !canpanic() {
+ throw("unexpected signal during runtime execution")
+ }
+
+ note := gostringnocopy((*byte)(unsafe.Pointer(gp.m.notesig)))
+ switch gp.sig {
+ case _SIGRFAULT, _SIGWFAULT:
+ i := indexNoFloat(note, "addr=")
+ if i >= 0 {
+ i += 5
+ } else if i = indexNoFloat(note, "va="); i >= 0 {
+ i += 3
+ } else {
+ panicmem()
+ }
+ addr := note[i:]
+ gp.sigcode1 = uintptr(atolwhex(addr))
+ if gp.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ if gp.paniconfault {
+ panicmemAddr(gp.sigcode1)
+ }
+ if inUserArenaChunk(gp.sigcode1) {
+ // We could check that the arena chunk is explicitly set to fault,
+ // but the fact that we faulted on accessing it is enough to prove
+ // that it is.
+ print("accessed data from freed user arena ", hex(gp.sigcode1), "\n")
+ } else {
+ print("unexpected fault address ", hex(gp.sigcode1), "\n")
+ }
+ throw("fault")
+ case _SIGTRAP:
+ if gp.paniconfault {
+ panicmem()
+ }
+ throw(note)
+ case _SIGINTDIV:
+ panicdivide()
+ case _SIGFLOAT:
+ panicfloat()
+ default:
+ panic(errorString(note))
+ }
+}
+
+// indexNoFloat is bytealg.IndexString but safe to use in a note
+// handler.
+func indexNoFloat(s, t string) int {
+ if len(t) == 0 {
+ return 0
+ }
+ for i := 0; i < len(s); i++ {
+ if s[i] == t[0] && hasPrefix(s[i:], t) {
+ return i
+ }
+ }
+ return -1
+}
+
+func atolwhex(p string) int64 {
+ for hasPrefix(p, " ") || hasPrefix(p, "\t") {
+ p = p[1:]
+ }
+ neg := false
+ if hasPrefix(p, "-") || hasPrefix(p, "+") {
+ neg = p[0] == '-'
+ p = p[1:]
+ for hasPrefix(p, " ") || hasPrefix(p, "\t") {
+ p = p[1:]
+ }
+ }
+ var n int64
+ switch {
+ case hasPrefix(p, "0x"), hasPrefix(p, "0X"):
+ p = p[2:]
+ for ; len(p) > 0; p = p[1:] {
+ if '0' <= p[0] && p[0] <= '9' {
+ n = n*16 + int64(p[0]-'0')
+ } else if 'a' <= p[0] && p[0] <= 'f' {
+ n = n*16 + int64(p[0]-'a'+10)
+ } else if 'A' <= p[0] && p[0] <= 'F' {
+ n = n*16 + int64(p[0]-'A'+10)
+ } else {
+ break
+ }
+ }
+ case hasPrefix(p, "0"):
+ for ; len(p) > 0 && '0' <= p[0] && p[0] <= '7'; p = p[1:] {
+ n = n*8 + int64(p[0]-'0')
+ }
+ default:
+ for ; len(p) > 0 && '0' <= p[0] && p[0] <= '9'; p = p[1:] {
+ n = n*10 + int64(p[0]-'0')
+ }
+ }
+ if neg {
+ n = -n
+ }
+ return n
+}
+
+type sigset struct{}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ // Initialize stack and goroutine for note handling.
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+ mp.notesig = (*int8)(mallocgc(_ERRMAX, nil, true))
+ // Initialize stack for handling strings from the
+ // errstr system call, as used in package syscall.
+ mp.errstr = (*byte)(mallocgc(_ERRMAX, nil, true))
+}
+
+func sigsave(p *sigset) {
+}
+
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ if atomic.Load(&exiting) != 0 {
+ exits(&emptystatus[0])
+ }
+ // Mask all SSE floating-point exceptions
+ // when running on the 64-bit kernel.
+ setfpmasks()
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+}
+
+ // Called from exitm, but not from dropm, to undo the effect of thread-owned
+// resources in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+var sysstat = []byte("/dev/sysstat\x00")
+
+func getproccount() int32 {
+ var buf [2048]byte
+ fd := open(&sysstat[0], _OREAD, 0)
+ if fd < 0 {
+ return 1
+ }
+ ncpu := int32(0)
+ for {
+ n := read(fd, unsafe.Pointer(&buf), int32(len(buf)))
+ if n <= 0 {
+ break
+ }
+ for i := int32(0); i < n; i++ {
+ if buf[i] == '\n' {
+ ncpu++
+ }
+ }
+ }
+ closefd(fd)
+ if ncpu == 0 {
+ ncpu = 1
+ }
+ return ncpu
+}
+
+var devswap = []byte("/dev/swap\x00")
+var pagesize = []byte(" pagesize\n")
+
+func getPageSize() uintptr {
+ var buf [2048]byte
+ var pos int
+ fd := open(&devswap[0], _OREAD, 0)
+ if fd < 0 {
+ // There's not much we can do if /dev/swap doesn't
+ // exist. However, nothing in the memory manager uses
+ // this on Plan 9, so it also doesn't really matter.
+ return minPhysPageSize
+ }
+ for pos < len(buf) {
+ n := read(fd, unsafe.Pointer(&buf[pos]), int32(len(buf)-pos))
+ if n <= 0 {
+ break
+ }
+ pos += int(n)
+ }
+ closefd(fd)
+ text := buf[:pos]
+ // Find "<n> pagesize" line.
+ bol := 0
+ for i, c := range text {
+ if c == '\n' {
+ bol = i + 1
+ }
+ if bytesHasPrefix(text[i:], pagesize) {
+ // Parse number at the beginning of this line.
+ return uintptr(_atoi(text[bol:]))
+ }
+ }
+ // Again, the page size doesn't really matter, so use a fallback.
+ return minPhysPageSize
+}
+
+func bytesHasPrefix(s, prefix []byte) bool {
+ if len(s) < len(prefix) {
+ return false
+ }
+ for i, p := range prefix {
+ if s[i] != p {
+ return false
+ }
+ }
+ return true
+}
+
+var pid = []byte("#c/pid\x00")
+
+func getpid() uint64 {
+ var b [20]byte
+ fd := open(&pid[0], 0, 0)
+ if fd >= 0 {
+ read(fd, unsafe.Pointer(&b), int32(len(b)))
+ closefd(fd)
+ }
+ c := b[:]
+ for c[0] == ' ' || c[0] == '\t' {
+ c = c[1:]
+ }
+ return uint64(_atoi(c))
+}
+
+func osinit() {
+ physPageSize = getPageSize()
+ initBloc()
+ ncpu = getproccount()
+ getg().m.procid = getpid()
+}
+
+//go:nosplit
+func crash() {
+ notify(nil)
+ *(*int)(nil) = 0
+}
+
+//go:nosplit
+func getRandomData(r []byte) {
+ // inspired by wyrand; see hash32.go for details
+ t := nanotime()
+ v := getg().m.procid ^ uint64(t)
+
+ for len(r) > 0 {
+ v ^= 0xa0761d6478bd642f
+ v *= 0xe7037ed1a0b428db
+ size := 8
+ if len(r) < 8 {
+ size = len(r)
+ }
+ for i := 0; i < size; i++ {
+ r[i] = byte(v >> (8 * i))
+ }
+ r = r[size:]
+ v = v>>32 | v<<32
+ }
+}
+
+func initsig(preinit bool) {
+ if !preinit {
+ notify(unsafe.Pointer(abi.FuncPCABI0(sigtramp)))
+ }
+}
+
+//go:nosplit
+func osyield() {
+ sleep(0)
+}
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+//go:nosplit
+func usleep(µs uint32) {
+ ms := int32(µs / 1000)
+ if ms == 0 {
+ ms = 1
+ }
+ sleep(ms)
+}
+
+//go:nosplit
+func usleep_no_g(usec uint32) {
+ usleep(usec)
+}
+
+//go:nosplit
+func nanotime1() int64 {
+ var scratch int64
+ ns := nsec(&scratch)
+ // TODO(aram): remove hack after I fix _nsec in the pc64 kernel.
+ if ns == 0 {
+ return scratch
+ }
+ return ns
+}
+
+var goexits = []byte("go: exit ")
+var emptystatus = []byte("\x00")
+var exiting uint32
+
+func goexitsall(status *byte) {
+ var buf [_ERRMAX]byte
+ if !atomic.Cas(&exiting, 0, 1) {
+ return
+ }
+ getg().m.locks++
+ n := copy(buf[:], goexits)
+ n = copy(buf[n:], gostringnocopy(status))
+ pid := getpid()
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ if mp.procid != 0 && mp.procid != pid {
+ postnote(mp.procid, buf[:])
+ }
+ }
+ getg().m.locks--
+}
+
+var procdir = []byte("/proc/")
+var notefile = []byte("/note\x00")
+
+func postnote(pid uint64, msg []byte) int {
+ var buf [128]byte
+ var tmp [32]byte
+ n := copy(buf[:], procdir)
+ n += copy(buf[n:], itoa(tmp[:], pid))
+ copy(buf[n:], notefile)
+ fd := open(&buf[0], _OWRITE, 0)
+ if fd < 0 {
+ return -1
+ }
+ len := findnull(&msg[0])
+ if write1(uintptr(fd), unsafe.Pointer(&msg[0]), int32(len)) != int32(len) {
+ closefd(fd)
+ return -1
+ }
+ closefd(fd)
+ return 0
+}
+
+//go:nosplit
+func exit(e int32) {
+ var status []byte
+ if e == 0 {
+ status = emptystatus
+ } else {
+ // build error string
+ var tmp [32]byte
+ sl := itoa(tmp[:len(tmp)-1], uint64(e))
+ // Don't append, rely on the existing data being zero.
+ status = sl[:len(sl)+1]
+ }
+ goexitsall(&status[0])
+ exits(&status[0])
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ if false {
+ print("newosproc mp=", mp, " ostk=", &mp, "\n")
+ }
+ pid := rfork(_RFPROC | _RFMEM | _RFNOWAIT)
+ if pid < 0 {
+ throw("newosproc: rfork failed")
+ }
+ if pid == 0 {
+ tstart_plan9(mp)
+ }
+}
+
+func exitThread(wait *atomic.Uint32) {
+ // We should never reach exitThread on Plan 9 because we let
+ // the OS clean up threads.
+ throw("exitThread")
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+}
+
+//go:nosplit
+func semasleep(ns int64) int {
+ gp := getg()
+ if ns >= 0 {
+ ms := timediv(ns, 1000000, nil)
+ if ms == 0 {
+ ms = 1
+ }
+ ret := plan9_tsemacquire(&gp.m.waitsemacount, ms)
+ if ret == 1 {
+ return 0 // success
+ }
+ return -1 // timeout or interrupted
+ }
+ for plan9_semacquire(&gp.m.waitsemacount, 1) < 0 {
+ // interrupted; try again (cf. lock_sema.go)
+ }
+ return 0 // success
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ plan9_semrelease(&mp.waitsemacount, 1)
+}
+
+//go:nosplit
+func read(fd int32, buf unsafe.Pointer, n int32) int32 {
+ return pread(fd, buf, n, -1)
+}
+
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, n int32) int32 {
+ return pwrite(int32(fd), buf, n, -1)
+}
+
+var _badsignal = []byte("runtime: signal received on thread not created by Go.\n")
+
+// This runs on a foreign stack, without an m or a g. No stack split.
+//
+//go:nosplit
+func badsignal2() {
+ pwrite(2, unsafe.Pointer(&_badsignal[0]), int32(len(_badsignal)), -1)
+ exits(&_badsignal[0])
+}
+
+func raisebadsignal(sig uint32) {
+ badsignal2()
+}
+
+func _atoi(b []byte) int {
+ n := 0
+ for len(b) > 0 && '0' <= b[0] && b[0] <= '9' {
+ n = n*10 + int(b[0]) - '0'
+ b = b[1:]
+ }
+ return n
+}
+
+func signame(sig uint32) string {
+ if sig >= uint32(len(sigtable)) {
+ return ""
+ }
+ return sigtable[sig].name
+}
+
+const preemptMSupported = false
+
+func preemptM(mp *m) {
+ // Not currently supported.
+ //
+ // TODO: Use a note like we use signals on POSIX OSes
+}
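
sigpanic above extracts the faulting address from the note text after an "addr=" or "va=" marker and parses it with the hand-rolled atolwhex, because the note handler avoids the regular string routines (see indexNoFloat). Without that constraint, the same extraction can use the standard library; faultAddr and the sample note strings below are illustrative only.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// faultAddr pulls the address out of a Plan 9 fault note, looking for the
// same "addr=" and "va=" markers that sigpanic searches for.
func faultAddr(note string) (uint64, bool) {
	var rest string
	if i := strings.Index(note, "addr="); i >= 0 {
		rest = note[i+len("addr="):]
	} else if i := strings.Index(note, "va="); i >= 0 {
		rest = note[i+len("va="):]
	} else {
		return 0, false
	}
	// Take the leading token and let ParseUint handle the 0x/0 prefixes,
	// roughly what atolwhex does by hand.
	if j := strings.IndexAny(rest, " \t"); j >= 0 {
		rest = rest[:j]
	}
	addr, err := strconv.ParseUint(strings.TrimSpace(rest), 0, 64)
	if err != nil {
		return 0, false
	}
	return addr, true
}

func main() {
	for _, note := range []string{
		"sys: trap: fault read addr=0xdeadbeef pc=0x204a1c", // made-up note text
		"sys: trap: fault write va=0x1000 ...",
	} {
		if a, ok := faultAddr(note); ok {
			fmt.Printf("%q -> %#x\n", note, a)
		}
	}
}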
diff --git a/src/runtime/os_plan9_arm.go b/src/runtime/os_plan9_arm.go
new file mode 100644
index 0000000..f165a34
--- /dev/null
+++ b/src/runtime/os_plan9_arm.go
@@ -0,0 +1,16 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func checkgoarm() {
+ return // TODO(minux)
+}
+
+//go:nosplit
+func cputicks() int64 {
+ // Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+ // runtime·nanotime() is a poor approximation of CPU ticks that is good enough for the profiler.
+ return nanotime()
+}
diff --git a/src/runtime/os_solaris.go b/src/runtime/os_solaris.go
new file mode 100644
index 0000000..bc00698
--- /dev/null
+++ b/src/runtime/os_solaris.go
@@ -0,0 +1,273 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type mts struct {
+ tv_sec int64
+ tv_nsec int64
+}
+
+type mscratch struct {
+ v [6]uintptr
+}
+
+type mOS struct {
+ waitsema uintptr // semaphore for parking on locks
+ perrno *int32 // pointer to tls errno
+ // these are here because they are too large to be on the stack
+ // of low-level NOSPLIT functions.
+ //LibCall libcall;
+ ts mts
+ scratch mscratch
+}
+
+type libcFunc uintptr
+
+//go:linkname asmsysvicall6x runtime.asmsysvicall6
+var asmsysvicall6x libcFunc // name to take addr of asmsysvicall6
+
+func asmsysvicall6() // declared for vet; do NOT call
+
+//go:nosplit
+func sysvicall0(fn *libcFunc) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil // See comment in sys_darwin.go:libcCall
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 0
+ libcall.args = uintptr(unsafe.Pointer(fn)) // it's unused but must be non-nil, otherwise crashes
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+func sysvicall1(fn *libcFunc, a1 uintptr) uintptr {
+ r1, _ := sysvicall1Err(fn, a1)
+ return r1
+}
+
+// sysvicall1Err returns both the system call result and the errno value.
+// This is used by sysvicall1 and pipe.
+//
+//go:nosplit
+func sysvicall1Err(fn *libcFunc, a1 uintptr) (r1, err uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 1
+ // TODO(rsc): Why is noescape necessary here and below?
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func sysvicall2(fn *libcFunc, a1, a2 uintptr) uintptr {
+ r1, _ := sysvicall2Err(fn, a1, a2)
+ return r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+
+// sysvicall2Err returns both the system call result and the errno value.
+// This is used by sysvicall2 and pipe2.
+func sysvicall2Err(fn *libcFunc, a1, a2 uintptr) (uintptr, uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 2
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
+//go:nosplit
+func sysvicall3(fn *libcFunc, a1, a2, a3 uintptr) uintptr {
+ r1, _ := sysvicall3Err(fn, a1, a2, a3)
+ return r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+
+// sysvicall3Err returns both the system call result and the errno value.
+// This is used by sysvicall3 and write1.
+func sysvicall3Err(fn *libcFunc, a1, a2, a3 uintptr) (r1, err uintptr) {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 3
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1, libcall.err
+}
+
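+// sysvicall4 calls fn with four uintptr arguments and returns the result.
+// Unlike the *Err variants above, it does not return the errno value.
+//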
+//go:nosplit
+//go:cgo_unsafe_args
+func sysvicall4(fn *libcFunc, a1, a2, a3, a4 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 4
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysvicall5(fn *libcFunc, a1, a2, a3, a4, a5 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 5
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysvicall6(fn *libcFunc, a1, a2, a3, a4, a5, a6 uintptr) uintptr {
+ // Leave caller's PC/SP around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ } else {
+ mp = nil
+ }
+
+ var libcall libcall
+ libcall.fn = uintptr(unsafe.Pointer(fn))
+ libcall.n = 6
+ libcall.args = uintptr(noescape(unsafe.Pointer(&a1)))
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&libcall))
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return libcall.r1
+}
+
+func issetugid() int32 {
+ return int32(sysvicall0(&libc_issetugid))
+}
diff --git a/src/runtime/os_unix.go b/src/runtime/os_unix.go
new file mode 100644
index 0000000..fdbeba7
--- /dev/null
+++ b/src/runtime/os_unix.go
@@ -0,0 +1,19 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+const (
+ // These values are the same on all known Unix systems.
+ // If we find a discrepancy some day, we can split them out.
+ _F_SETFD = 2
+ _FD_CLOEXEC = 1
+)
+
+//go:nosplit
+func closeonexec(fd int32) {
+ fcntl(fd, _F_SETFD, _FD_CLOEXEC)
+}
diff --git a/src/runtime/os_unix_nonlinux.go b/src/runtime/os_unix_nonlinux.go
new file mode 100644
index 0000000..b98753b
--- /dev/null
+++ b/src/runtime/os_unix_nonlinux.go
@@ -0,0 +1,15 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix && !linux
+
+package runtime
+
+// sigFromUser reports whether the signal was sent because of a call
+// to kill.
+//
+//go:nosplit
+func (c *sigctxt) sigFromUser() bool {
+ return c.sigcode() == _SI_USER
+}
diff --git a/src/runtime/os_wasip1.go b/src/runtime/os_wasip1.go
new file mode 100644
index 0000000..8811bb6
--- /dev/null
+++ b/src/runtime/os_wasip1.go
@@ -0,0 +1,259 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build wasip1
+
+package runtime
+
+import "unsafe"
+
+// GOARCH=wasm currently has 64-bit pointers, but the WebAssembly host expects
+// pointers to be 32 bits wide, so we use this type alias to represent pointers
+// in structs and arrays passed as arguments to WASI functions.
+//
+// Note that the use of an integer type prevents the compiler from tracking
+// pointers passed to WASI functions, so we must use KeepAlive to explicitly
+// retain the objects that could otherwise be reclaimed by the GC.
+type uintptr32 = uint32
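+
+// As a rough illustration of that pattern (an illustrative sketch with
+// placeholder variables, not code used in this file), a caller passes a Go
+// pointer p through uintptr32 and keeps its referent alive across the host
+// call:
+//
+//	iov := iovec{buf: uintptr32(uintptr(p)), bufLen: size(n)}
+//	fd_write(int32(fd), unsafe.Pointer(&iov), 1, unsafe.Pointer(&nwritten))
+//	KeepAlive(p) // ensure the GC does not reclaim *p before fd_write returns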
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-size-u32
+type size = uint32
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-errno-variant
+type errno = uint32
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-filesize-u64
+type filesize = uint64
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-timestamp-u64
+type timestamp = uint64
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-clockid-variant
+type clockid = uint32
+
+const (
+ clockRealtime clockid = 0
+ clockMonotonic clockid = 1
+)
+
+// https://github.com/WebAssembly/WASI/blob/a2b96e81c0586125cc4dc79a5be0b78d9a059925/legacy/preview1/docs.md#-iovec-record
+type iovec struct {
+ buf uintptr32
+ bufLen size
+}
+
+//go:wasmimport wasi_snapshot_preview1 proc_exit
+func exit(code int32)
+
+//go:wasmimport wasi_snapshot_preview1 args_get
+//go:noescape
+func args_get(argv, argvBuf unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 args_sizes_get
+//go:noescape
+func args_sizes_get(argc, argvBufLen unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 clock_time_get
+//go:noescape
+func clock_time_get(clock_id clockid, precision timestamp, time unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 environ_get
+//go:noescape
+func environ_get(environ, environBuf unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 environ_sizes_get
+//go:noescape
+func environ_sizes_get(environCount, environBufLen unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 fd_write
+//go:noescape
+func fd_write(fd int32, iovs unsafe.Pointer, iovsLen size, nwritten unsafe.Pointer) errno
+
+//go:wasmimport wasi_snapshot_preview1 random_get
+//go:noescape
+func random_get(buf unsafe.Pointer, bufLen size) errno
+
+type eventtype = uint8
+
+const (
+ eventtypeClock eventtype = iota
+ eventtypeFdRead
+ eventtypeFdWrite
+)
+
+type eventrwflags = uint16
+
+const (
+ fdReadwriteHangup eventrwflags = 1 << iota
+)
+
+type userdata = uint64
+
+// The go:wasmimport directive currently does not accept values of type uint16
+// in arguments or results of the function signature. Most WASI imports return
+// an errno value, which we have to define as uint32 because of that limitation.
+// However, the WASI errno type is intended to be a 16-bit integer, and in the
+// event struct the error field should be of type errno. If we used the errno
+// type for the error field it would result in mismatched field alignment and
+// struct size because errno is declared as a 32-bit type, so we declare the
+// error field as a plain uint16.
+type event struct {
+ userdata userdata
+ error uint16
+ typ eventtype
+ fdReadwrite eventFdReadwrite
+}
+
+type eventFdReadwrite struct {
+ nbytes filesize
+ flags eventrwflags
+}
+
+type subclockflags = uint16
+
+const (
+ subscriptionClockAbstime subclockflags = 1 << iota
+)
+
+type subscriptionClock struct {
+ id clockid
+ timeout timestamp
+ precision timestamp
+ flags subclockflags
+}
+
+type subscriptionFdReadwrite struct {
+ fd int32
+}
+
+type subscription struct {
+ userdata userdata
+ u subscriptionUnion
+}
+
+type subscriptionUnion [5]uint64
+
+func (u *subscriptionUnion) eventtype() *eventtype {
+ return (*eventtype)(unsafe.Pointer(&u[0]))
+}
+
+func (u *subscriptionUnion) subscriptionClock() *subscriptionClock {
+ return (*subscriptionClock)(unsafe.Pointer(&u[1]))
+}
+
+func (u *subscriptionUnion) subscriptionFdReadwrite() *subscriptionFdReadwrite {
+ return (*subscriptionFdReadwrite)(unsafe.Pointer(&u[1]))
+}
+
+//go:wasmimport wasi_snapshot_preview1 poll_oneoff
+//go:noescape
+func poll_oneoff(in, out unsafe.Pointer, nsubscriptions size, nevents unsafe.Pointer) errno
+
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ iov := iovec{
+ buf: uintptr32(uintptr(p)),
+ bufLen: size(n),
+ }
+ var nwritten size
+ if fd_write(int32(fd), unsafe.Pointer(&iov), 1, unsafe.Pointer(&nwritten)) != 0 {
+ throw("fd_write failed")
+ }
+ return int32(nwritten)
+}
+
+func usleep(usec uint32) {
+ var in subscription
+ var out event
+ var nevents size
+
+ eventtype := in.u.eventtype()
+ *eventtype = eventtypeClock
+
+ subscription := in.u.subscriptionClock()
+ subscription.id = clockMonotonic
+ subscription.timeout = timestamp(usec) * 1e3
+ subscription.precision = 1e3
+
+ if poll_oneoff(unsafe.Pointer(&in), unsafe.Pointer(&out), 1, unsafe.Pointer(&nevents)) != 0 {
+ throw("wasi_snapshot_preview1.poll_oneoff")
+ }
+}
+
+func getRandomData(r []byte) {
+ if random_get(unsafe.Pointer(&r[0]), size(len(r))) != 0 {
+ throw("random_get failed")
+ }
+}
+
+func goenvs() {
+ // arguments
+ var argc size
+ var argvBufLen size
+ if args_sizes_get(unsafe.Pointer(&argc), unsafe.Pointer(&argvBufLen)) != 0 {
+ throw("args_sizes_get failed")
+ }
+
+ argslice = make([]string, argc)
+ if argc > 0 {
+ argv := make([]uintptr32, argc)
+ argvBuf := make([]byte, argvBufLen)
+ if args_get(unsafe.Pointer(&argv[0]), unsafe.Pointer(&argvBuf[0])) != 0 {
+ throw("args_get failed")
+ }
+
+ for i := range argslice {
+ start := argv[i] - uintptr32(uintptr(unsafe.Pointer(&argvBuf[0])))
+ end := start
+ for argvBuf[end] != 0 {
+ end++
+ }
+ argslice[i] = string(argvBuf[start:end])
+ }
+ }
+
+ // environment
+ var environCount size
+ var environBufLen size
+ if environ_sizes_get(unsafe.Pointer(&environCount), unsafe.Pointer(&environBufLen)) != 0 {
+ throw("environ_sizes_get failed")
+ }
+
+ envs = make([]string, environCount)
+ if environCount > 0 {
+ environ := make([]uintptr32, environCount)
+ environBuf := make([]byte, environBufLen)
+ if environ_get(unsafe.Pointer(&environ[0]), unsafe.Pointer(&environBuf[0])) != 0 {
+ throw("environ_get failed")
+ }
+
+ for i := range envs {
+ start := environ[i] - uintptr32(uintptr(unsafe.Pointer(&environBuf[0])))
+ end := start
+ for environBuf[end] != 0 {
+ end++
+ }
+ envs[i] = string(environBuf[start:end])
+ }
+ }
+}
+
+func walltime() (sec int64, nsec int32) {
+ return walltime1()
+}
+
+func walltime1() (sec int64, nsec int32) {
+ var time timestamp
+ if clock_time_get(clockRealtime, 0, unsafe.Pointer(&time)) != 0 {
+ throw("clock_time_get failed")
+ }
+ return int64(time / 1000000000), int32(time % 1000000000)
+}
+
+func nanotime1() int64 {
+ var time timestamp
+ if clock_time_get(clockMonotonic, 0, unsafe.Pointer(&time)) != 0 {
+ throw("clock_time_get failed")
+ }
+ return int64(time)
+}
diff --git a/src/runtime/os_wasm.go b/src/runtime/os_wasm.go
new file mode 100644
index 0000000..bf78dfb
--- /dev/null
+++ b/src/runtime/os_wasm.go
@@ -0,0 +1,153 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+func osinit() {
+ // https://webassembly.github.io/spec/core/exec/runtime.html#memory-instances
+ physPageSize = 64 * 1024
+ initBloc()
+ ncpu = 1
+ getg().m.procid = 2
+}
+
+const _SIGSEGV = 0xb
+
+func sigpanic() {
+ gp := getg()
+ if !canpanic() {
+ throw("unexpected signal during runtime execution")
+ }
+
+ // js only invokes the exception handler for memory faults.
+ gp.sig = _SIGSEGV
+ panicmem()
+}
+
+// func exitThread(wait *uint32)
+// FIXME: wasm doesn't have atomic yet
+func exitThread(wait *atomic.Uint32)
+
+type mOS struct{}
+
+func osyield()
+
+//go:nosplit
+func osyield_no_g() {
+ osyield()
+}
+
+type sigset struct{}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+ mp.gsignal = malg(32 * 1024)
+ mp.gsignal.m = mp
+}
+
+//go:nosplit
+func usleep_no_g(usec uint32) {
+ usleep(usec)
+}
+
+//go:nosplit
+func sigsave(p *sigset) {
+}
+
+//go:nosplit
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+//go:nosplit
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+}
+
+// Called from dropm to undo the effect of an minit.
+func unminit() {
+}
+
+// Called from exitm, but not from dropm, to free thread-owned resources
+// acquired in minit, semacreate, or elsewhere. Do not take locks after calling this.
+func mdestroy(mp *m) {
+}
+
+// wasm has no signals
+const _NSIG = 0
+
+func signame(sig uint32) string {
+ return ""
+}
+
+func crash() {
+ *(*int32)(nil) = 0
+}
+
+func initsig(preinit bool) {
+}
+
+// May run with m.p==nil, so write barriers are not allowed.
+//
+//go:nowritebarrier
+func newosproc(mp *m) {
+ throw("newosproc: not implemented")
+}
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ throw("too many writes on closed pipe")
+}
+
+//go:linkname syscall_now syscall.now
+func syscall_now() (sec int64, nsec int32) {
+ sec, nsec, _ = time_now()
+ return
+}
+
+//go:nosplit
+func cputicks() int64 {
+	// Currently cputicks() is used in the blocking profiler and to seed runtime·fastrand().
+	// runtime·nanotime() is a poor approximation of CPU ticks, but it is good enough for the profiler.
+ // TODO: need more entropy to better seed fastrand.
+ return nanotime()
+}
+
+// gsignalStack is unused on js.
+type gsignalStack struct{}
+
+const preemptMSupported = false
+
+func preemptM(mp *m) {
+ // No threads, so nothing to do.
+}
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
+
+func setProcessCPUProfiler(hz int32) {}
+func setThreadCPUProfiler(hz int32) {}
+func sigdisable(uint32) {}
+func sigenable(uint32) {}
+func sigignore(uint32) {}
+
+// Stubs so tests can link correctly. These should never be called.
+func open(name *byte, mode, perm int32) int32 { panic("not implemented") }
+func closefd(fd int32) int32 { panic("not implemented") }
+func read(fd int32, p unsafe.Pointer, n int32) int32 { panic("not implemented") }
diff --git a/src/runtime/os_windows.go b/src/runtime/os_windows.go
new file mode 100644
index 0000000..735a905
--- /dev/null
+++ b/src/runtime/os_windows.go
@@ -0,0 +1,1438 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// TODO(brainman): should not need those
+const (
+ _NSIG = 65
+)
+
+//go:cgo_import_dynamic runtime._AddVectoredExceptionHandler AddVectoredExceptionHandler%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CloseHandle CloseHandle%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateEventA CreateEventA%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateFileA CreateFileA%7 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateIoCompletionPort CreateIoCompletionPort%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateThread CreateThread%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateWaitableTimerA CreateWaitableTimerA%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._CreateWaitableTimerExW CreateWaitableTimerExW%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._DuplicateHandle DuplicateHandle%7 "kernel32.dll"
+//go:cgo_import_dynamic runtime._ExitProcess ExitProcess%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._FreeEnvironmentStringsW FreeEnvironmentStringsW%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetConsoleMode GetConsoleMode%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetCurrentThreadId GetCurrentThreadId%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetEnvironmentStringsW GetEnvironmentStringsW%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetErrorMode GetErrorMode%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetProcAddress GetProcAddress%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetProcessAffinityMask GetProcessAffinityMask%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetQueuedCompletionStatusEx GetQueuedCompletionStatusEx%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetStdHandle GetStdHandle%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetSystemDirectoryA GetSystemDirectoryA%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetSystemInfo GetSystemInfo%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._GetThreadContext GetThreadContext%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetThreadContext SetThreadContext%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._LoadLibraryExW LoadLibraryExW%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._LoadLibraryW LoadLibraryW%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._PostQueuedCompletionStatus PostQueuedCompletionStatus%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._RaiseFailFastException RaiseFailFastException%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._ResumeThread ResumeThread%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetConsoleCtrlHandler SetConsoleCtrlHandler%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetErrorMode SetErrorMode%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetEvent SetEvent%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetProcessPriorityBoost SetProcessPriorityBoost%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetThreadPriority SetThreadPriority%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetUnhandledExceptionFilter SetUnhandledExceptionFilter%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SetWaitableTimer SetWaitableTimer%6 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SuspendThread SuspendThread%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._SwitchToThread SwitchToThread%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._TlsAlloc TlsAlloc%0 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualAlloc VirtualAlloc%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualFree VirtualFree%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._VirtualQuery VirtualQuery%3 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WaitForSingleObject WaitForSingleObject%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WaitForMultipleObjects WaitForMultipleObjects%4 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WerGetFlags WerGetFlags%2 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WerSetFlags WerSetFlags%1 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WriteConsoleW WriteConsoleW%5 "kernel32.dll"
+//go:cgo_import_dynamic runtime._WriteFile WriteFile%5 "kernel32.dll"
+
+type stdFunction unsafe.Pointer
+
+var (
+	// The following syscalls are available on every Windows PC.
+ // All these variables are set by the Windows executable
+ // loader before the Go program starts.
+ _AddVectoredExceptionHandler,
+ _CloseHandle,
+ _CreateEventA,
+ _CreateFileA,
+ _CreateIoCompletionPort,
+ _CreateThread,
+ _CreateWaitableTimerA,
+ _CreateWaitableTimerExW,
+ _DuplicateHandle,
+ _ExitProcess,
+ _FreeEnvironmentStringsW,
+ _GetConsoleMode,
+ _GetCurrentThreadId,
+ _GetEnvironmentStringsW,
+ _GetErrorMode,
+ _GetProcAddress,
+ _GetProcessAffinityMask,
+ _GetQueuedCompletionStatusEx,
+ _GetStdHandle,
+ _GetSystemDirectoryA,
+ _GetSystemInfo,
+ _GetSystemTimeAsFileTime,
+ _GetThreadContext,
+ _SetThreadContext,
+ _LoadLibraryExW,
+ _LoadLibraryW,
+ _PostQueuedCompletionStatus,
+ _QueryPerformanceCounter,
+ _QueryPerformanceFrequency,
+ _RaiseFailFastException,
+ _ResumeThread,
+ _SetConsoleCtrlHandler,
+ _SetErrorMode,
+ _SetEvent,
+ _SetProcessPriorityBoost,
+ _SetThreadPriority,
+ _SetUnhandledExceptionFilter,
+ _SetWaitableTimer,
+ _SuspendThread,
+ _SwitchToThread,
+ _TlsAlloc,
+ _VirtualAlloc,
+ _VirtualFree,
+ _VirtualQuery,
+ _WaitForSingleObject,
+ _WaitForMultipleObjects,
+ _WerGetFlags,
+ _WerSetFlags,
+ _WriteConsoleW,
+ _WriteFile,
+ _ stdFunction
+
+	// The following syscalls are only available on some Windows PCs.
+	// We load them, if available, before using them.
+ _AddVectoredContinueHandler,
+ _ stdFunction
+
+ // Use ProcessPrng to generate cryptographically random data.
+ _ProcessPrng stdFunction
+
+	// Load ntdll.dll manually during startup, otherwise MinGW
+	// links the wrong printf function to the cgo executable (see issue
+	// 12030 for details).
+ _NtWaitForSingleObject stdFunction
+ _RtlGetCurrentPeb stdFunction
+ _RtlGetNtVersionNumbers stdFunction
+
+	// These come from DLLs other than kernel32.dll, so we prefer to load them with LoadLibraryEx.
+ _timeBeginPeriod,
+ _timeEndPeriod,
+ _WSAGetOverlappedResult,
+ _ stdFunction
+)
+
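+// System DLL names, stored as NUL-terminated UTF-16 strings for use with
+// windowsLoadSystemLib.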
+var (
+ bcryptprimitivesdll = [...]uint16{'b', 'c', 'r', 'y', 'p', 't', 'p', 'r', 'i', 'm', 'i', 't', 'i', 'v', 'e', 's', '.', 'd', 'l', 'l', 0}
+ kernel32dll = [...]uint16{'k', 'e', 'r', 'n', 'e', 'l', '3', '2', '.', 'd', 'l', 'l', 0}
+ ntdlldll = [...]uint16{'n', 't', 'd', 'l', 'l', '.', 'd', 'l', 'l', 0}
+ powrprofdll = [...]uint16{'p', 'o', 'w', 'r', 'p', 'r', 'o', 'f', '.', 'd', 'l', 'l', 0}
+ winmmdll = [...]uint16{'w', 'i', 'n', 'm', 'm', '.', 'd', 'l', 'l', 0}
+ ws2_32dll = [...]uint16{'w', 's', '2', '_', '3', '2', '.', 'd', 'l', 'l', 0}
+)
+
+// Function to be called by Windows CreateThread
+// to start a new OS thread.
+func tstart_stdcall(newm *m)
+
+// Init-time helper
+func wintls()
+
+type mOS struct {
+ threadLock mutex // protects "thread" and prevents closing
+ thread uintptr // thread handle
+
+ waitsema uintptr // semaphore for parking on locks
+ resumesema uintptr // semaphore to indicate suspend/resume
+
+ highResTimer uintptr // high resolution timer handle used in usleep
+
+ // preemptExtLock synchronizes preemptM with entry/exit from
+ // external C code.
+ //
+ // This protects against races between preemptM calling
+ // SuspendThread and external code on this thread calling
+ // ExitProcess. If these happen concurrently, it's possible to
+ // exit the suspending thread and suspend the exiting thread,
+ // leading to deadlock.
+ //
+ // 0 indicates this M is not being preempted or in external
+ // code. Entering external code CASes this from 0 to 1. If
+ // this fails, a preemption is in progress, so the thread must
+ // wait for the preemption. preemptM also CASes this from 0 to
+ // 1. If this fails, the preemption fails (as it would if the
+ // PC weren't in Go code). The value is reset to 0 when
+ // returning from external code or after a preemption is
+ // complete.
+ //
+ // TODO(austin): We may not need this if preemption were more
+ // tightly synchronized on the G/P status and preemption
+ // blocked transition into _Gsyscall/_Psyscall.
+ preemptExtLock uint32
+}
+
+// Stubs so tests can link correctly. These should never be called.
+func open(name *byte, mode, perm int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+func closefd(fd int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ throw("unimplemented")
+ return -1
+}
+
+type sigset struct{}
+
+// Call a Windows function with stdcall conventions,
+// switching to the OS stack during the call.
+func asmstdcall(fn unsafe.Pointer)
+
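+// asmstdcallAddr holds the ABI0 entry PC of asmstdcall; it is set in osinit
+// and passed to asmcgocall by stdcall.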
+var asmstdcallAddr unsafe.Pointer
+
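+// windowsFindfunc looks up name (which must be NUL-terminated) in the already
+// loaded library lib using GetProcAddress.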
+func windowsFindfunc(lib uintptr, name []byte) stdFunction {
+ if name[len(name)-1] != 0 {
+ throw("usage")
+ }
+ f := stdcall2(_GetProcAddress, lib, uintptr(unsafe.Pointer(&name[0])))
+ return stdFunction(unsafe.Pointer(f))
+}
+
+const _MAX_PATH = 260 // https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
+var sysDirectory [_MAX_PATH + 1]byte
+var sysDirectoryLen uintptr
+
+func initSysDirectory() {
+ l := stdcall2(_GetSystemDirectoryA, uintptr(unsafe.Pointer(&sysDirectory[0])), uintptr(len(sysDirectory)-1))
+ if l == 0 || l > uintptr(len(sysDirectory)-1) {
+ throw("Unable to determine system directory")
+ }
+ sysDirectory[l] = '\\'
+ sysDirectoryLen = l + 1
+}
+
+func windowsLoadSystemLib(name []uint16) uintptr {
+ return stdcall3(_LoadLibraryExW, uintptr(unsafe.Pointer(&name[0])), 0, _LOAD_LIBRARY_SEARCH_SYSTEM32)
+}
+
+const haveCputicksAsm = GOARCH == "386" || GOARCH == "amd64"
+
+func loadOptionalSyscalls() {
+ k32 := windowsLoadSystemLib(kernel32dll[:])
+ if k32 == 0 {
+ throw("kernel32.dll not found")
+ }
+ _AddVectoredContinueHandler = windowsFindfunc(k32, []byte("AddVectoredContinueHandler\000"))
+
+ bcryptPrimitives := windowsLoadSystemLib(bcryptprimitivesdll[:])
+ if bcryptPrimitives == 0 {
+ throw("bcryptprimitives.dll not found")
+ }
+ _ProcessPrng = windowsFindfunc(bcryptPrimitives, []byte("ProcessPrng\000"))
+
+ n32 := windowsLoadSystemLib(ntdlldll[:])
+ if n32 == 0 {
+ throw("ntdll.dll not found")
+ }
+ _NtWaitForSingleObject = windowsFindfunc(n32, []byte("NtWaitForSingleObject\000"))
+ _RtlGetCurrentPeb = windowsFindfunc(n32, []byte("RtlGetCurrentPeb\000"))
+ _RtlGetNtVersionNumbers = windowsFindfunc(n32, []byte("RtlGetNtVersionNumbers\000"))
+
+ if !haveCputicksAsm {
+ _QueryPerformanceCounter = windowsFindfunc(k32, []byte("QueryPerformanceCounter\000"))
+ if _QueryPerformanceCounter == nil {
+ throw("could not find QPC syscalls")
+ }
+ }
+
+ m32 := windowsLoadSystemLib(winmmdll[:])
+ if m32 == 0 {
+ throw("winmm.dll not found")
+ }
+ _timeBeginPeriod = windowsFindfunc(m32, []byte("timeBeginPeriod\000"))
+ _timeEndPeriod = windowsFindfunc(m32, []byte("timeEndPeriod\000"))
+ if _timeBeginPeriod == nil || _timeEndPeriod == nil {
+ throw("timeBegin/EndPeriod not found")
+ }
+
+ ws232 := windowsLoadSystemLib(ws2_32dll[:])
+ if ws232 == 0 {
+ throw("ws2_32.dll not found")
+ }
+ _WSAGetOverlappedResult = windowsFindfunc(ws232, []byte("WSAGetOverlappedResult\000"))
+ if _WSAGetOverlappedResult == nil {
+ throw("WSAGetOverlappedResult not found")
+ }
+
+ if windowsFindfunc(n32, []byte("wine_get_version\000")) != nil {
+ // running on Wine
+ initWine(k32)
+ }
+}
+
+func monitorSuspendResume() {
+ const (
+ _DEVICE_NOTIFY_CALLBACK = 2
+ )
+ type _DEVICE_NOTIFY_SUBSCRIBE_PARAMETERS struct {
+ callback uintptr
+ context uintptr
+ }
+
+ powrprof := windowsLoadSystemLib(powrprofdll[:])
+ if powrprof == 0 {
+ return // Running on Windows 7, where we don't need it anyway.
+ }
+ powerRegisterSuspendResumeNotification := windowsFindfunc(powrprof, []byte("PowerRegisterSuspendResumeNotification\000"))
+ if powerRegisterSuspendResumeNotification == nil {
+ return // Running on Windows 7, where we don't need it anyway.
+ }
+ var fn any = func(context uintptr, changeType uint32, setting uintptr) uintptr {
+ for mp := (*m)(atomic.Loadp(unsafe.Pointer(&allm))); mp != nil; mp = mp.alllink {
+ if mp.resumesema != 0 {
+ stdcall1(_SetEvent, mp.resumesema)
+ }
+ }
+ return 0
+ }
+ params := _DEVICE_NOTIFY_SUBSCRIBE_PARAMETERS{
+ callback: compileCallback(*efaceOf(&fn), true),
+ }
+ handle := uintptr(0)
+ stdcall3(powerRegisterSuspendResumeNotification, _DEVICE_NOTIFY_CALLBACK,
+ uintptr(unsafe.Pointer(&params)), uintptr(unsafe.Pointer(&handle)))
+}
+
+//go:nosplit
+func getLoadLibrary() uintptr {
+ return uintptr(unsafe.Pointer(_LoadLibraryW))
+}
+
+//go:nosplit
+func getLoadLibraryEx() uintptr {
+ return uintptr(unsafe.Pointer(_LoadLibraryExW))
+}
+
+//go:nosplit
+func getGetProcAddress() uintptr {
+ return uintptr(unsafe.Pointer(_GetProcAddress))
+}
+
+func getproccount() int32 {
+ var mask, sysmask uintptr
+ ret := stdcall3(_GetProcessAffinityMask, currentProcess, uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret != 0 {
+ n := 0
+ maskbits := int(unsafe.Sizeof(mask) * 8)
+ for i := 0; i < maskbits; i++ {
+ if mask&(1<<uint(i)) != 0 {
+ n++
+ }
+ }
+ if n != 0 {
+ return int32(n)
+ }
+ }
+ // use GetSystemInfo if GetProcessAffinityMask fails
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return int32(info.dwnumberofprocessors)
+}
+
+func getPageSize() uintptr {
+ var info systeminfo
+ stdcall1(_GetSystemInfo, uintptr(unsafe.Pointer(&info)))
+ return uintptr(info.dwpagesize)
+}
+
+const (
+ currentProcess = ^uintptr(0) // -1 = current process
+ currentThread = ^uintptr(1) // -2 = current thread
+)
+
+// in sys_windows_386.s and sys_windows_amd64.s:
+func getlasterror() uint32
+
+var timeBeginPeriodRetValue uint32
+
+// osRelaxMinNS indicates that sysmon shouldn't osRelax if the next
+// timer is less than 60 ms from now. Since osRelaxing may reduce
+// timer resolution to 15.6 ms, this keeps timer error under roughly 1
+// part in 4.
+const osRelaxMinNS = 60 * 1e6
+
+// osRelax is called by the scheduler when transitioning to and from
+// all Ps being idle.
+//
+// Some versions of Windows have a high resolution timer. For those
+// versions osRelax is a no-op.
+// For Windows versions without a high resolution timer, osRelax
+// adjusts the system-wide timer resolution. Go needs a
+// high resolution timer while running and there's little extra cost
+// if we're already using the CPU, but if all Ps are idle there's no
+// need to consume extra power to drive the high-res timer.
+func osRelax(relax bool) uint32 {
+ if haveHighResTimer {
+ // If the high resolution timer is available, the runtime uses the timer
+ // to sleep for short durations. This means there's no need to adjust
+ // the global clock frequency.
+ return 0
+ }
+
+ if relax {
+ return uint32(stdcall1(_timeEndPeriod, 1))
+ } else {
+ return uint32(stdcall1(_timeBeginPeriod, 1))
+ }
+}
+
+// haveHighResTimer indicates that the CreateWaitableTimerEx
+// CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag is available.
+var haveHighResTimer = false
+
+// createHighResTimer calls CreateWaitableTimerEx with the
+// CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag to create a high
+// resolution timer. createHighResTimer returns the new timer
+// handle, or 0 if CreateWaitableTimerEx failed.
+func createHighResTimer() uintptr {
+ const (
+ // As per @jstarks, see
+ // https://github.com/golang/go/issues/8687#issuecomment-656259353
+ _CREATE_WAITABLE_TIMER_HIGH_RESOLUTION = 0x00000002
+
+ _SYNCHRONIZE = 0x00100000
+ _TIMER_QUERY_STATE = 0x0001
+ _TIMER_MODIFY_STATE = 0x0002
+ )
+ return stdcall4(_CreateWaitableTimerExW, 0, 0,
+ _CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
+ _SYNCHRONIZE|_TIMER_QUERY_STATE|_TIMER_MODIFY_STATE)
+}
+
+func initHighResTimer() {
+ h := createHighResTimer()
+ if h != 0 {
+ haveHighResTimer = true
+ stdcall1(_CloseHandle, h)
+ }
+}
+
+//go:linkname canUseLongPaths os.canUseLongPaths
+var canUseLongPaths bool
+
+// We want this to be large enough to hold the contents of sysDirectory, *plus*
+// a slash and another component that itself is greater than MAX_PATH.
+var longFileName [(_MAX_PATH+1)*2 + 1]byte
+
+// initLongPathSupport initializes the canUseLongPaths variable, which is
+// linked into os.canUseLongPaths for determining whether or not long paths
+// need to be fixed up. In the best case, this function is running on newer
+// Windows 10 builds, which have a bit field member of the PEB called
+// "IsLongPathAwareProcess." When this is set, we don't need to go through the
+// error-prone fixup function in order to access long paths. So this init
+// function first checks the Windows build number, sets the flag, and then
+// tests to see if it's actually working. If everything checks out, then
+// canUseLongPaths is set to true, and later when called, os.fixLongPath
+// returns early without doing work.
+func initLongPathSupport() {
+ const (
+ IsLongPathAwareProcess = 0x80
+ PebBitFieldOffset = 3
+ OPEN_EXISTING = 3
+ ERROR_PATH_NOT_FOUND = 3
+ )
+
+ // Check that we're ≥ 10.0.15063.
+ var maj, min, build uint32
+ stdcall3(_RtlGetNtVersionNumbers, uintptr(unsafe.Pointer(&maj)), uintptr(unsafe.Pointer(&min)), uintptr(unsafe.Pointer(&build)))
+ if maj < 10 || (maj == 10 && min == 0 && build&0xffff < 15063) {
+ return
+ }
+
+ // Set the IsLongPathAwareProcess flag of the PEB's bit field.
+ bitField := (*byte)(unsafe.Pointer(stdcall0(_RtlGetCurrentPeb) + PebBitFieldOffset))
+ originalBitField := *bitField
+ *bitField |= IsLongPathAwareProcess
+
+ // Check that this actually has an effect, by constructing a large file
+ // path and seeing whether we get ERROR_PATH_NOT_FOUND, rather than
+ // some other error, which would indicate the path is too long, and
+ // hence long path support is not successful. This whole section is NOT
+ // strictly necessary, but is a nice validity check for the near to
+ // medium term, when this functionality is still relatively new in
+ // Windows.
+ getRandomData(longFileName[len(longFileName)-33 : len(longFileName)-1])
+ start := copy(longFileName[:], sysDirectory[:sysDirectoryLen])
+ const dig = "0123456789abcdef"
+ for i := 0; i < 32; i++ {
+ longFileName[start+i*2] = dig[longFileName[len(longFileName)-33+i]>>4]
+ longFileName[start+i*2+1] = dig[longFileName[len(longFileName)-33+i]&0xf]
+ }
+ start += 64
+ for i := start; i < len(longFileName)-1; i++ {
+ longFileName[i] = 'A'
+ }
+ stdcall7(_CreateFileA, uintptr(unsafe.Pointer(&longFileName[0])), 0, 0, 0, OPEN_EXISTING, 0, 0)
+ // The ERROR_PATH_NOT_FOUND error value is distinct from
+ // ERROR_FILE_NOT_FOUND or ERROR_INVALID_NAME, the latter of which we
+ // expect here due to the final component being too long.
+ if getlasterror() == ERROR_PATH_NOT_FOUND {
+ *bitField = originalBitField
+ println("runtime: warning: IsLongPathAwareProcess failed to enable long paths; proceeding in fixup mode")
+ return
+ }
+
+ canUseLongPaths = true
+}
+
+func osinit() {
+ asmstdcallAddr = unsafe.Pointer(abi.FuncPCABI0(asmstdcall))
+
+ loadOptionalSyscalls()
+
+ preventErrorDialogs()
+
+ initExceptionHandler()
+
+ initHighResTimer()
+ timeBeginPeriodRetValue = osRelax(false)
+
+ initSysDirectory()
+ initLongPathSupport()
+
+ ncpu = getproccount()
+
+ physPageSize = getPageSize()
+
+ // Windows dynamic priority boosting assumes that a process has different types
+ // of dedicated threads -- GUI, IO, computational, etc. Go processes use
+ // equivalent threads that all do a mix of GUI, IO, computations, etc.
+	// In such a context dynamic priority boosting does nothing but harm, so we turn it off.
+ stdcall2(_SetProcessPriorityBoost, currentProcess, 1)
+}
+
+// useQPCTime controls whether time.now and nanotime use QueryPerformanceCounter.
+// This is only set to 1 when running under Wine.
+var useQPCTime uint8
+
+var qpcStartCounter int64
+var qpcMultiplier int64
+
+//go:nosplit
+func nanotimeQPC() int64 {
+ var counter int64 = 0
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&counter)))
+
+ // returns number of nanoseconds
+ return (counter - qpcStartCounter) * qpcMultiplier
+}
+
+//go:nosplit
+func nowQPC() (sec int64, nsec int32, mono int64) {
+ var ft int64
+ stdcall1(_GetSystemTimeAsFileTime, uintptr(unsafe.Pointer(&ft)))
+
+ t := (ft - 116444736000000000) * 100
+
+ sec = t / 1000000000
+ nsec = int32(t - sec*1000000000)
+
+ mono = nanotimeQPC()
+ return
+}
+
+func initWine(k32 uintptr) {
+ _GetSystemTimeAsFileTime = windowsFindfunc(k32, []byte("GetSystemTimeAsFileTime\000"))
+ if _GetSystemTimeAsFileTime == nil {
+ throw("could not find GetSystemTimeAsFileTime() syscall")
+ }
+
+ _QueryPerformanceCounter = windowsFindfunc(k32, []byte("QueryPerformanceCounter\000"))
+ _QueryPerformanceFrequency = windowsFindfunc(k32, []byte("QueryPerformanceFrequency\000"))
+ if _QueryPerformanceCounter == nil || _QueryPerformanceFrequency == nil {
+ throw("could not find QPC syscalls")
+ }
+
+	// We cannot simply fall back to the GetSystemTimeAsFileTime() syscall, since its time is not monotonic;
+	// instead we use the QueryPerformanceCounter family of syscalls to implement a monotonic timer.
+	// https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
+
+ var tmp int64
+ stdcall1(_QueryPerformanceFrequency, uintptr(unsafe.Pointer(&tmp)))
+ if tmp == 0 {
+ throw("QueryPerformanceFrequency syscall returned zero, running on unsupported hardware")
+ }
+
+	// This should not overflow: it is the number of performance-counter ticks per second,
+	// and it is at most 10 per microsecond on Wine (even fewer on real hardware),
+	// so it will be at most 10 million here; panic if it overflows anyway.
+ if tmp > (1<<31 - 1) {
+ throw("QueryPerformanceFrequency overflow 32 bit divider, check nosplit discussion to proceed")
+ }
+ qpcFrequency := int32(tmp)
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&qpcStartCounter)))
+
+	// Since these time calls are only supposed to run on Wine, this does not lose precision:
+	// Wine's timer is emulated at roughly 10 MHz, so the multiplier is a nice round 100;
+	// on a general purpose system (like a 3.3 MHz timer on an i7) it would not be very precise.
+	// We have to do it this way (or similarly), since multiplying the QPC counter by 100 million
+	// overflows int64 and the resulting time would always be invalid.
+ qpcMultiplier = int64(timediv(1000000000, qpcFrequency, nil))
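+	// For example, with Wine's 10 MHz counter this gives
+	// qpcMultiplier = 1e9/1e7 = 100, i.e. 100ns of wall time per tick.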
+
+ useQPCTime = 1
+}
+
+//go:nosplit
+func getRandomData(r []byte) {
+ n := 0
+ if stdcall2(_ProcessPrng, uintptr(unsafe.Pointer(&r[0])), uintptr(len(r)))&0xff != 0 {
+ n = len(r)
+ }
+ extendRandom(r, n)
+}
+
+func goenvs() {
+ // strings is a pointer to environment variable pairs in the form:
+ // "envA=valA\x00envB=valB\x00\x00" (in UTF-16)
+ // Two consecutive zero bytes end the list.
+ strings := unsafe.Pointer(stdcall0(_GetEnvironmentStringsW))
+ p := (*[1 << 24]uint16)(strings)[:]
+
+ n := 0
+ for from, i := 0, 0; true; i++ {
+ if p[i] == 0 {
+ // empty string marks the end
+ if i == from {
+ break
+ }
+ from = i + 1
+ n++
+ }
+ }
+ envs = make([]string, n)
+
+ for i := range envs {
+ envs[i] = gostringw(&p[0])
+ for p[0] != 0 {
+ p = p[1:]
+ }
+		p = p[1:] // skip the terminating NUL
+ }
+
+ stdcall1(_FreeEnvironmentStringsW, uintptr(strings))
+
+ // We call these all the way here, late in init, so that malloc works
+ // for the callback functions these generate.
+ var fn any = ctrlHandler
+ ctrlHandlerPC := compileCallback(*efaceOf(&fn), true)
+ stdcall2(_SetConsoleCtrlHandler, ctrlHandlerPC, 1)
+
+ monitorSuspendResume()
+}
+
+// exiting is set to non-zero when the process is exiting.
+var exiting uint32
+
+//go:nosplit
+func exit(code int32) {
+ // Disallow thread suspension for preemption. Otherwise,
+ // ExitProcess and SuspendThread can race: SuspendThread
+ // queues a suspension request for this thread, ExitProcess
+ // kills the suspending thread, and then this thread suspends.
+ lock(&suspendLock)
+ atomic.Store(&exiting, 1)
+ stdcall1(_ExitProcess, uintptr(code))
+}
+
+// write1 must be nosplit because it's used as a last resort in
+// functions like badmorestackg0. In such cases, we'll always take the
+// ASCII path.
+//
+//go:nosplit
+func write1(fd uintptr, buf unsafe.Pointer, n int32) int32 {
+ const (
+ _STD_OUTPUT_HANDLE = ^uintptr(10) // -11
+ _STD_ERROR_HANDLE = ^uintptr(11) // -12
+ )
+ var handle uintptr
+ switch fd {
+ case 1:
+ handle = stdcall1(_GetStdHandle, _STD_OUTPUT_HANDLE)
+ case 2:
+ handle = stdcall1(_GetStdHandle, _STD_ERROR_HANDLE)
+ default:
+		// assume fd is a real Windows handle.
+ handle = fd
+ }
+ isASCII := true
+ b := (*[1 << 30]byte)(buf)[:n]
+ for _, x := range b {
+ if x >= 0x80 {
+ isASCII = false
+ break
+ }
+ }
+
+ if !isASCII {
+ var m uint32
+ isConsole := stdcall2(_GetConsoleMode, handle, uintptr(unsafe.Pointer(&m))) != 0
+		// If this is console output, various non-Unicode code pages can be in use.
+		// Use the dedicated WriteConsole call to ensure Unicode is printed correctly.
+ if isConsole {
+ return int32(writeConsole(handle, buf, n))
+ }
+ }
+ var written uint32
+ stdcall5(_WriteFile, handle, uintptr(buf), uintptr(n), uintptr(unsafe.Pointer(&written)), 0)
+ return int32(written)
+}
+
+var (
+ utf16ConsoleBack [1000]uint16
+ utf16ConsoleBackLock mutex
+)
+
+// writeConsole writes bufLen bytes from buf to the console.
+// It returns the number of bytes written.
+func writeConsole(handle uintptr, buf unsafe.Pointer, bufLen int32) int {
+ const surr2 = (surrogateMin + surrogateMax + 1) / 2
+
+ // Do not use defer for unlock. May cause issues when printing a panic.
+ lock(&utf16ConsoleBackLock)
+
+ b := (*[1 << 30]byte)(buf)[:bufLen]
+ s := *(*string)(unsafe.Pointer(&b))
+
+ utf16tmp := utf16ConsoleBack[:]
+
+ total := len(s)
+ w := 0
+ for _, r := range s {
+ if w >= len(utf16tmp)-2 {
+ writeConsoleUTF16(handle, utf16tmp[:w])
+ w = 0
+ }
+ if r < 0x10000 {
+ utf16tmp[w] = uint16(r)
+ w++
+ } else {
+ r -= 0x10000
+ utf16tmp[w] = surrogateMin + uint16(r>>10)&0x3ff
+ utf16tmp[w+1] = surr2 + uint16(r)&0x3ff
+ w += 2
+ }
+ }
+ writeConsoleUTF16(handle, utf16tmp[:w])
+ unlock(&utf16ConsoleBackLock)
+ return total
+}
+
+// writeConsoleUTF16 is the dedicated Windows call that correctly prints
+// to the console regardless of the current code page. Input is UTF-16 code units.
+// The handle must be a console handle.
+func writeConsoleUTF16(handle uintptr, b []uint16) {
+ l := uint32(len(b))
+ if l == 0 {
+ return
+ }
+ var written uint32
+ stdcall5(_WriteConsoleW,
+ handle,
+ uintptr(unsafe.Pointer(&b[0])),
+ uintptr(l),
+ uintptr(unsafe.Pointer(&written)),
+ 0,
+ )
+ return
+}
+
+//go:nosplit
+func semasleep(ns int64) int32 {
+ const (
+ _WAIT_ABANDONED = 0x00000080
+ _WAIT_OBJECT_0 = 0x00000000
+ _WAIT_TIMEOUT = 0x00000102
+ _WAIT_FAILED = 0xFFFFFFFF
+ )
+
+ var result uintptr
+ if ns < 0 {
+ result = stdcall2(_WaitForSingleObject, getg().m.waitsema, uintptr(_INFINITE))
+ } else {
+ start := nanotime()
+ elapsed := int64(0)
+ for {
+ ms := int64(timediv(ns-elapsed, 1000000, nil))
+ if ms == 0 {
+ ms = 1
+ }
+ result = stdcall4(_WaitForMultipleObjects, 2,
+ uintptr(unsafe.Pointer(&[2]uintptr{getg().m.waitsema, getg().m.resumesema})),
+ 0, uintptr(ms))
+ if result != _WAIT_OBJECT_0+1 {
+ // Not a suspend/resume event
+ break
+ }
+ elapsed = nanotime() - start
+ if elapsed >= ns {
+ return -1
+ }
+ }
+ }
+ switch result {
+ case _WAIT_OBJECT_0: // Signaled
+ return 0
+
+ case _WAIT_TIMEOUT:
+ return -1
+
+ case _WAIT_ABANDONED:
+ systemstack(func() {
+ throw("runtime.semasleep wait_abandoned")
+ })
+
+ case _WAIT_FAILED:
+ systemstack(func() {
+ print("runtime: waitforsingleobject wait_failed; errno=", getlasterror(), "\n")
+ throw("runtime.semasleep wait_failed")
+ })
+
+ default:
+ systemstack(func() {
+ print("runtime: waitforsingleobject unexpected; result=", result, "\n")
+ throw("runtime.semasleep unexpected")
+ })
+ }
+
+ return -1 // unreachable
+}
+
+//go:nosplit
+func semawakeup(mp *m) {
+ if stdcall1(_SetEvent, mp.waitsema) == 0 {
+ systemstack(func() {
+ print("runtime: setevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semawakeup")
+ })
+ }
+}
+
+//go:nosplit
+func semacreate(mp *m) {
+ if mp.waitsema != 0 {
+ return
+ }
+ mp.waitsema = stdcall4(_CreateEventA, 0, 0, 0, 0)
+ if mp.waitsema == 0 {
+ systemstack(func() {
+ print("runtime: createevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semacreate")
+ })
+ }
+ mp.resumesema = stdcall4(_CreateEventA, 0, 0, 0, 0)
+ if mp.resumesema == 0 {
+ systemstack(func() {
+ print("runtime: createevent failed; errno=", getlasterror(), "\n")
+ throw("runtime.semacreate")
+ })
+ stdcall1(_CloseHandle, mp.waitsema)
+ mp.waitsema = 0
+ }
+}
+
+// May run with m.p==nil, so write barriers are not allowed. This
+// function is called by newosproc0, so it is also required to
+// operate without stack guards.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func newosproc(mp *m) {
+ // We pass 0 for the stack size to use the default for this binary.
+ thandle := stdcall6(_CreateThread, 0, 0,
+ abi.FuncPCABI0(tstart_stdcall), uintptr(unsafe.Pointer(mp)),
+ 0, 0)
+
+ if thandle == 0 {
+ if atomic.Load(&exiting) != 0 {
+ // CreateThread may fail if called
+ // concurrently with ExitProcess. If this
+ // happens, just freeze this thread and let
+ // the process exit. See issue #18253.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+ print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", getlasterror(), ")\n")
+ throw("runtime.newosproc")
+ }
+
+ // Close thandle to avoid leaking the thread object if it exits.
+ stdcall1(_CloseHandle, thandle)
+}
+
+// Used by the C library build mode. On Linux this function would allocate a
+// stack, but that's not necessary for Windows. No stack guards are present
+// and the GC has not been initialized, so write barriers will fail.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func newosproc0(mp *m, stk unsafe.Pointer) {
+ // TODO: this is completely broken. The args passed to newosproc0 (in asm_amd64.s)
+ // are stacksize and function, not *m and stack.
+ // Check os_linux.go for an implementation that might actually work.
+ throw("bad newosproc0")
+}
+
+func exitThread(wait *atomic.Uint32) {
+ // We should never reach exitThread on Windows because we let
+ // the OS clean up threads.
+ throw("exitThread")
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the parent thread (main thread in case of bootstrap), can allocate memory.
+func mpreinit(mp *m) {
+}
+
+//go:nosplit
+func sigsave(p *sigset) {
+}
+
+//go:nosplit
+func msigrestore(sigmask sigset) {
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+}
+
+//go:nosplit
+func sigblock(exiting bool) {
+}
+
+// Called to initialize a new m (including the bootstrap m).
+// Called on the new thread, cannot allocate memory.
+func minit() {
+ var thandle uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, currentThread, currentProcess, uintptr(unsafe.Pointer(&thandle)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime.minit: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("runtime.minit: duplicatehandle failed")
+ }
+
+ mp := getg().m
+ lock(&mp.threadLock)
+ mp.thread = thandle
+ mp.procid = uint64(stdcall0(_GetCurrentThreadId))
+
+ // Configure usleep timer, if possible.
+ if mp.highResTimer == 0 && haveHighResTimer {
+ mp.highResTimer = createHighResTimer()
+ if mp.highResTimer == 0 {
+ print("runtime: CreateWaitableTimerEx failed; errno=", getlasterror(), "\n")
+ throw("CreateWaitableTimerEx when creating timer failed")
+ }
+ }
+ unlock(&mp.threadLock)
+
+ // Query the true stack base from the OS. Currently we're
+ // running on a small assumed stack.
+ var mbi memoryBasicInformation
+ res := stdcall3(_VirtualQuery, uintptr(unsafe.Pointer(&mbi)), uintptr(unsafe.Pointer(&mbi)), unsafe.Sizeof(mbi))
+ if res == 0 {
+ print("runtime: VirtualQuery failed; errno=", getlasterror(), "\n")
+ throw("VirtualQuery for stack base failed")
+ }
+ // The system leaves an 8K PAGE_GUARD region at the bottom of
+ // the stack (in theory VirtualQuery isn't supposed to include
+ // that, but it does). Add an additional 8K of slop for
+ // calling C functions that don't have stack checks and for
+ // lastcontinuehandler. We shouldn't be anywhere near this
+ // bound anyway.
+ base := mbi.allocationBase + 16<<10
+ // Sanity check the stack bounds.
+ g0 := getg()
+ if base > g0.stack.hi || g0.stack.hi-base > 64<<20 {
+ print("runtime: g0 stack [", hex(base), ",", hex(g0.stack.hi), ")\n")
+ throw("bad g0 stack")
+ }
+ g0.stack.lo = base
+ g0.stackguard0 = g0.stack.lo + stackGuard
+ g0.stackguard1 = g0.stackguard0
+ // Sanity check the SP.
+ stackcheck()
+}
+
+// Called from dropm to undo the effect of an minit.
+//
+//go:nosplit
+func unminit() {
+ mp := getg().m
+ lock(&mp.threadLock)
+ if mp.thread != 0 {
+ stdcall1(_CloseHandle, mp.thread)
+ mp.thread = 0
+ }
+ unlock(&mp.threadLock)
+}
+
+// Called from exitm, but not from dropm, to free thread-owned resources
+// acquired in minit, semacreate, or elsewhere. Do not take locks after calling this.
+//
+//go:nosplit
+func mdestroy(mp *m) {
+ if mp.highResTimer != 0 {
+ stdcall1(_CloseHandle, mp.highResTimer)
+ mp.highResTimer = 0
+ }
+ if mp.waitsema != 0 {
+ stdcall1(_CloseHandle, mp.waitsema)
+ mp.waitsema = 0
+ }
+ if mp.resumesema != 0 {
+ stdcall1(_CloseHandle, mp.resumesema)
+ mp.resumesema = 0
+ }
+}
+
+// stdcall calls the Windows function fn on the OS stack.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrier
+//go:nosplit
+func stdcall(fn stdFunction) uintptr {
+ gp := getg()
+ mp := gp.m
+ mp.libcall.fn = uintptr(unsafe.Pointer(fn))
+ resetLibcall := false
+ if mp.profilehz != 0 && mp.libcallsp == 0 {
+ // leave pc/sp for cpu profiler
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+ // sp must be the last, because once async cpu profiler finds
+ // all three values to be non-zero, it will use them
+ mp.libcallsp = getcallersp()
+ resetLibcall = true // See comment in sys_darwin.go:libcCall
+ }
+ asmcgocall(asmstdcallAddr, unsafe.Pointer(&mp.libcall))
+ if resetLibcall {
+ mp.libcallsp = 0
+ }
+ return mp.libcall.r1
+}
+
+//go:nosplit
+func stdcall0(fn stdFunction) uintptr {
+ mp := getg().m
+ mp.libcall.n = 0
+	mp.libcall.args = uintptr(noescape(unsafe.Pointer(&fn))) // it's unused but must be non-nil; otherwise the call crashes
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall1(fn stdFunction, a0 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 1
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall2(fn stdFunction, a0, a1 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 2
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall3(fn stdFunction, a0, a1, a2 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 3
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall4(fn stdFunction, a0, a1, a2, a3 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 4
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall5(fn stdFunction, a0, a1, a2, a3, a4 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 5
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall6(fn stdFunction, a0, a1, a2, a3, a4, a5 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 6
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func stdcall7(fn stdFunction, a0, a1, a2, a3, a4, a5, a6 uintptr) uintptr {
+ mp := getg().m
+ mp.libcall.n = 7
+ mp.libcall.args = uintptr(noescape(unsafe.Pointer(&a0)))
+ return stdcall(fn)
+}
+
+// These must run on the system stack only.
+func usleep2(dt int32)
+func switchtothread()
+
+//go:nosplit
+func osyield_no_g() {
+ switchtothread()
+}
+
+//go:nosplit
+func osyield() {
+ systemstack(switchtothread)
+}
+
+//go:nosplit
+func usleep_no_g(us uint32) {
+ dt := -10 * int32(us) // relative sleep (negative), 100ns units
+ usleep2(dt)
+}
+
+//go:nosplit
+func usleep(us uint32) {
+ systemstack(func() {
+ dt := -10 * int64(us) // relative sleep (negative), 100ns units
+ // If the high-res timer is available and its handle has been allocated for this m, use it.
+ // Otherwise fall back to the low-res one, which doesn't need a handle.
+ if haveHighResTimer && getg().m.highResTimer != 0 {
+ h := getg().m.highResTimer
+ stdcall6(_SetWaitableTimer, h, uintptr(unsafe.Pointer(&dt)), 0, 0, 0, 0)
+ stdcall3(_NtWaitForSingleObject, h, 0, 0)
+ } else {
+ usleep2(int32(dt))
+ }
+ })
+}
+
+func ctrlHandler(_type uint32) uintptr {
+ var s uint32
+
+ switch _type {
+ case _CTRL_C_EVENT, _CTRL_BREAK_EVENT:
+ s = _SIGINT
+ case _CTRL_CLOSE_EVENT, _CTRL_LOGOFF_EVENT, _CTRL_SHUTDOWN_EVENT:
+ s = _SIGTERM
+ default:
+ return 0
+ }
+
+ if sigsend(s) {
+ if s == _SIGTERM {
+ // Windows terminates the process after this handler returns.
+ // Block indefinitely to give signal handlers a chance to clean up,
+ // but make sure to be properly parked first, so the rest of the
+ // program can continue executing.
+ block()
+ }
+ return 1
+ }
+ return 0
+}
+
+// called from zcallback_windows_*.s to sys_windows_*.s
+func callbackasm1()
+
+var profiletimer uintptr
+
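+// profilem records a CPU profile sample for mp using the register state read
+// from the given thread handle; the thread is expected to have been suspended
+// by the caller (see profileLoop).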
+func profilem(mp *m, thread uintptr) {
+ // Align Context to 16 bytes.
+ var c *context
+ var cbuf [unsafe.Sizeof(*c) + 15]byte
+ c = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&cbuf[15]))) &^ 15))
+
+ c.contextflags = _CONTEXT_CONTROL
+ stdcall2(_GetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+
+ gp := gFromSP(mp, c.sp())
+
+ sigprof(c.ip(), c.sp(), c.lr(), gp, mp)
+}
+
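+// gFromSP returns the goroutine of mp (g0, gsignal, or curg) whose stack
+// contains sp, or nil if sp is not on any of their stacks.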
+func gFromSP(mp *m, sp uintptr) *g {
+ if gp := mp.g0; gp != nil && gp.stack.lo < sp && sp < gp.stack.hi {
+ return gp
+ }
+ if gp := mp.gsignal; gp != nil && gp.stack.lo < sp && sp < gp.stack.hi {
+ return gp
+ }
+ if gp := mp.curg; gp != nil && gp.stack.lo < sp && sp < gp.stack.hi {
+ return gp
+ }
+ return nil
+}
+
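+// profileLoop runs on its own thread (created by setProcessCPUProfiler) and,
+// each time profiletimer fires, suspends and samples every other profiled M.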
+func profileLoop() {
+ stdcall2(_SetThreadPriority, currentThread, _THREAD_PRIORITY_HIGHEST)
+
+ for {
+ stdcall2(_WaitForSingleObject, profiletimer, _INFINITE)
+ first := (*m)(atomic.Loadp(unsafe.Pointer(&allm)))
+ for mp := first; mp != nil; mp = mp.alllink {
+ if mp == getg().m {
+ // Don't profile ourselves.
+ continue
+ }
+
+ lock(&mp.threadLock)
+			// Do not profile threads blocked on notes;
+			// this includes idle worker threads,
+			// the idle timer thread, the idle heap scavenger, etc.
+ if mp.thread == 0 || mp.profilehz == 0 || mp.blocked {
+ unlock(&mp.threadLock)
+ continue
+ }
+ // Acquire our own handle to the thread.
+ var thread uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, mp.thread, currentProcess, uintptr(unsafe.Pointer(&thread)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("duplicatehandle failed")
+ }
+ unlock(&mp.threadLock)
+
+ // mp may exit between the DuplicateHandle
+ // above and the SuspendThread. The handle
+ // will remain valid, but SuspendThread may
+ // fail.
+ if int32(stdcall1(_SuspendThread, thread)) == -1 {
+ // The thread no longer exists.
+ stdcall1(_CloseHandle, thread)
+ continue
+ }
+ if mp.profilehz != 0 && !mp.blocked {
+ // Pass the thread handle in case mp
+ // was in the process of shutting down.
+ profilem(mp, thread)
+ }
+ stdcall1(_ResumeThread, thread)
+ stdcall1(_CloseHandle, thread)
+ }
+ }
+}
+
+func setProcessCPUProfiler(hz int32) {
+ if profiletimer == 0 {
+ timer := stdcall3(_CreateWaitableTimerA, 0, 0, 0)
+ atomic.Storeuintptr(&profiletimer, timer)
+ newm(profileLoop, nil, -1)
+ }
+}
+
+func setThreadCPUProfiler(hz int32) {
+ ms := int32(0)
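+ // due defaults to the most negative int64: a relative due time so far in the
+ // future that the timer effectively never fires unless hz > 0 re-arms it below.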
+ due := ^int64(^uint64(1 << 63))
+ if hz > 0 {
+ ms = 1000 / hz
+ if ms == 0 {
+ ms = 1
+ }
+ due = int64(ms) * -10000
+ }
+ stdcall6(_SetWaitableTimer, profiletimer, uintptr(unsafe.Pointer(&due)), uintptr(ms), 0, 0, 0)
+ atomic.Store((*uint32)(unsafe.Pointer(&getg().m.profilehz)), uint32(hz))
+}
+
+const preemptMSupported = true
+
+// suspendLock protects simultaneous SuspendThread operations from
+// suspending each other.
+var suspendLock mutex
+
+func preemptM(mp *m) {
+ if mp == getg().m {
+ throw("self-preempt")
+ }
+
+ // Synchronize with external code that may try to ExitProcess.
+ if !atomic.Cas(&mp.preemptExtLock, 0, 1) {
+ // External code is running. Fail the preemption
+ // attempt.
+ mp.preemptGen.Add(1)
+ return
+ }
+
+ // Acquire our own handle to mp's thread.
+ lock(&mp.threadLock)
+ if mp.thread == 0 {
+ // The M hasn't been minit'd yet (or was just unminit'd).
+ unlock(&mp.threadLock)
+ atomic.Store(&mp.preemptExtLock, 0)
+ mp.preemptGen.Add(1)
+ return
+ }
+ var thread uintptr
+ if stdcall7(_DuplicateHandle, currentProcess, mp.thread, currentProcess, uintptr(unsafe.Pointer(&thread)), 0, 0, _DUPLICATE_SAME_ACCESS) == 0 {
+ print("runtime.preemptM: duplicatehandle failed; errno=", getlasterror(), "\n")
+ throw("runtime.preemptM: duplicatehandle failed")
+ }
+ unlock(&mp.threadLock)
+
+ // Prepare thread context buffer. This must be aligned to 16 bytes.
+ var c *context
+ var cbuf [unsafe.Sizeof(*c) + 15]byte
+ c = (*context)(unsafe.Pointer((uintptr(unsafe.Pointer(&cbuf[15]))) &^ 15))
+ c.contextflags = _CONTEXT_CONTROL
+
+ // Serialize thread suspension. SuspendThread is asynchronous,
+ // so it's otherwise possible for two threads to suspend each
+ // other and deadlock. We must hold this lock until after
+ // GetThreadContext, since that blocks until the thread is
+ // actually suspended.
+ lock(&suspendLock)
+
+ // Suspend the thread.
+ if int32(stdcall1(_SuspendThread, thread)) == -1 {
+ unlock(&suspendLock)
+ stdcall1(_CloseHandle, thread)
+ atomic.Store(&mp.preemptExtLock, 0)
+ // The thread no longer exists. This shouldn't be
+ // possible, but just acknowledge the request.
+ mp.preemptGen.Add(1)
+ return
+ }
+
+ // We have to be very careful between this point and once
+ // we've shown mp is at an async safe-point. This is like a
+ // signal handler in the sense that mp could have been doing
+ // anything when we stopped it, including holding arbitrary
+ // locks.
+
+ // We have to get the thread context before inspecting the M
+ // because SuspendThread only requests a suspend.
+ // GetThreadContext actually blocks until it's suspended.
+ stdcall2(_GetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+
+ unlock(&suspendLock)
+
+ // Does it want a preemption and is it safe to preempt?
+ gp := gFromSP(mp, c.sp())
+ if gp != nil && wantAsyncPreempt(gp) {
+ if ok, newpc := isAsyncSafePoint(gp, c.ip(), c.sp(), c.lr()); ok {
+ // Inject call to asyncPreempt
+ targetPC := abi.FuncPCABI0(asyncPreempt)
+ switch GOARCH {
+ default:
+ throw("unsupported architecture")
+ case "386", "amd64":
+ // Make it look like the thread called targetPC.
+ sp := c.sp()
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = newpc
+ c.set_sp(sp)
+ c.set_ip(targetPC)
+
+ case "arm":
+ // Push LR. The injected call is responsible
+ // for restoring LR. gentraceback is aware of
+ // this extra slot. See sigctxt.pushCall in
+ // signal_arm.go, which is similar except we
+ // subtract 1 from IP here.
+ sp := c.sp()
+ sp -= goarch.PtrSize
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(sp)) = uint32(c.lr())
+ c.set_lr(newpc - 1)
+ c.set_ip(targetPC)
+
+ case "arm64":
+ // Push LR. The injected call is responsible
+ // for restoring LR. gentraceback is aware of
+ // this extra slot. See sigctxt.pushCall in
+ // signal_arm64.go.
+ sp := c.sp() - 16 // SP needs 16-byte alignment
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(sp)) = uint64(c.lr())
+ c.set_lr(newpc)
+ c.set_ip(targetPC)
+ }
+ stdcall2(_SetThreadContext, thread, uintptr(unsafe.Pointer(c)))
+ }
+ }
+
+ atomic.Store(&mp.preemptExtLock, 0)
+
+ // Acknowledge the preemption.
+ mp.preemptGen.Add(1)
+
+ stdcall1(_ResumeThread, thread)
+ stdcall1(_CloseHandle, thread)
+}
+
+// osPreemptExtEnter is called before entering external code that may
+// call ExitProcess.
+//
+// This must be nosplit because it may be called from a syscall with
+// untyped stack slots, so the stack must not be grown or scanned.
+//
+//go:nosplit
+func osPreemptExtEnter(mp *m) {
+ for !atomic.Cas(&mp.preemptExtLock, 0, 1) {
+ // An asynchronous preemption is in progress. It's not
+ // safe to enter external code because it may call
+ // ExitProcess and deadlock with SuspendThread.
+ // Ideally we would do the preemption ourselves, but
+ // can't since there may be untyped syscall arguments
+ // on the stack. Instead, just wait and encourage the
+ // SuspendThread APC to run. The preemption should be
+ // done shortly.
+ osyield()
+ }
+ // Asynchronous preemption is now blocked.
+}
+
+// osPreemptExtExit is called after returning from external code that
+// may call ExitProcess.
+//
+// See osPreemptExtEnter for why this is nosplit.
+//
+//go:nosplit
+func osPreemptExtExit(mp *m) {
+ atomic.Store(&mp.preemptExtLock, 0)
+}
diff --git a/src/runtime/os_windows_arm.go b/src/runtime/os_windows_arm.go
new file mode 100644
index 0000000..10aff75
--- /dev/null
+++ b/src/runtime/os_windows_arm.go
@@ -0,0 +1,22 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:nosplit
+func cputicks() int64 {
+ var counter int64
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&counter)))
+ return counter
+}
+
+func checkgoarm() {
+ if goarm < 7 {
+ print("Need atomic synchronization instructions, coprocessor ",
+ "access instructions. Recompile using GOARM=7.\n")
+ exit(1)
+ }
+}
diff --git a/src/runtime/os_windows_arm64.go b/src/runtime/os_windows_arm64.go
new file mode 100644
index 0000000..7e41344
--- /dev/null
+++ b/src/runtime/os_windows_arm64.go
@@ -0,0 +1,14 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:nosplit
+func cputicks() int64 {
+ var counter int64
+ stdcall1(_QueryPerformanceCounter, uintptr(unsafe.Pointer(&counter)))
+ return counter
+}
diff --git a/src/runtime/pagetrace_off.go b/src/runtime/pagetrace_off.go
new file mode 100644
index 0000000..10b44d4
--- /dev/null
+++ b/src/runtime/pagetrace_off.go
@@ -0,0 +1,28 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !goexperiment.pagetrace
+
+package runtime
+
+//go:systemstack
+func pageTraceAlloc(pp *p, now int64, base, npages uintptr) {
+}
+
+//go:systemstack
+func pageTraceFree(pp *p, now int64, base, npages uintptr) {
+}
+
+//go:systemstack
+func pageTraceScav(pp *p, now int64, base, npages uintptr) {
+}
+
+type pageTraceBuf struct {
+}
+
+func initPageTrace(env string) {
+}
+
+func finishPageTrace() {
+}
diff --git a/src/runtime/pagetrace_on.go b/src/runtime/pagetrace_on.go
new file mode 100644
index 0000000..0e621cb
--- /dev/null
+++ b/src/runtime/pagetrace_on.go
@@ -0,0 +1,358 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build goexperiment.pagetrace
+
+// Page tracer.
+//
+// This file contains an implementation of page trace instrumentation for tracking
+// the way the Go runtime manages pages of memory. The trace may be enabled at program
+// startup with the GODEBUG option pagetrace.
+//
+// Each page trace event is either 8 or 16 bytes wide. The first
+// 8 bytes follow this format for non-sync events:
+//
+// [16 timestamp delta][35 base address][10 npages][1 isLarge][2 pageTraceEventType]
+//
+// If the "large" bit is set then the event is 16 bytes wide with the second 8 byte word
+// containing the full npages value (the npages bitfield is 0).
+//
+// The base address's bottom pageShift bits are always zero, which is why we can pack
+// other data in there. We ignore the top 16 bits, assuming a 48-bit address space for
+// the heap.
+//
+// The timestamp delta is computed from the difference between the current nanotime
+// timestamp and the last sync event's timestamp. The bottom pageTraceTimeLostBits of
+// this delta are dropped and only the next pageTraceTimeDeltaBits are kept.
+//
+// A sync event is emitted at the beginning of each trace buffer and whenever the
+// timestamp delta would not fit in an event.
+//
+// Sync events have the following structure:
+//
+// [61 timestamp or P ID][1 isPID][2 pageTraceSyncEvent]
+//
+// In essence, the "large" bit repurposed to indicate whether it's a timestamp or a P ID
+// (these are typically uint32). Note that we only have 61 bits for the 64-bit timestamp,
+// but like for the delta we drop the bottom pageTraceTimeLostBits here as well.
+
+package runtime
+
+import (
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// pageTraceAlloc records a page trace allocation event.
+// pp may be nil. Call only if debug.pagetracefd != 0.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func pageTraceAlloc(pp *p, now int64, base, npages uintptr) {
+ if pageTrace.enabled {
+ if now == 0 {
+ now = nanotime()
+ }
+ pageTraceEmit(pp, now, base, npages, pageTraceAllocEvent)
+ }
+}
+
+// pageTraceFree records a page trace free event.
+// pp may be nil. Call only if debug.pagetracefd != 0.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func pageTraceFree(pp *p, now int64, base, npages uintptr) {
+ if pageTrace.enabled {
+ if now == 0 {
+ now = nanotime()
+ }
+ pageTraceEmit(pp, now, base, npages, pageTraceFreeEvent)
+ }
+}
+
+// pageTraceScav records a page trace scavenge event.
+// pp may be nil. Call only if debug.pagetracefd != 0.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func pageTraceScav(pp *p, now int64, base, npages uintptr) {
+ if pageTrace.enabled {
+ if now == 0 {
+ now = nanotime()
+ }
+ pageTraceEmit(pp, now, base, npages, pageTraceScavEvent)
+ }
+}
+
+// pageTraceEventType is a page trace event type.
+type pageTraceEventType uint8
+
+const (
+ pageTraceSyncEvent pageTraceEventType = iota // Timestamp emission.
+ pageTraceAllocEvent // Allocation of pages.
+ pageTraceFreeEvent // Freeing pages.
+ pageTraceScavEvent // Scavenging pages.
+)
+
+// pageTraceEmit emits a page trace event.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func pageTraceEmit(pp *p, now int64, base, npages uintptr, typ pageTraceEventType) {
+ // Get a buffer.
+ var tbp *pageTraceBuf
+ pid := int32(-1)
+ if pp == nil {
+ // We have no P, so take the global buffer.
+ lock(&pageTrace.lock)
+ tbp = &pageTrace.buf
+ } else {
+ tbp = &pp.pageTraceBuf
+ pid = pp.id
+ }
+
+ // Initialize the buffer if necessary.
+ tb := *tbp
+ if tb.buf == nil {
+ tb.buf = (*pageTraceEvents)(sysAlloc(pageTraceBufSize, &memstats.other_sys))
+ tb = tb.writePid(pid)
+ }
+
+ // Handle timestamp and emit a sync event if necessary.
+ if now < tb.timeBase {
+ now = tb.timeBase
+ }
+ if now-tb.timeBase >= pageTraceTimeMaxDelta {
+ tb.timeBase = now
+ tb = tb.writeSync(pid)
+ }
+
+ // Emit the event.
+ tb = tb.writeEvent(pid, now, base, npages, typ)
+
+ // Write back the buffer.
+ *tbp = tb
+ if pp == nil {
+ unlock(&pageTrace.lock)
+ }
+}
+
+const (
+ pageTraceBufSize = 32 << 10
+
+ // These constants describe the per-event timestamp delta encoding.
+ pageTraceTimeLostBits = 7 // How many bits of precision we lose in the delta.
+ pageTraceTimeDeltaBits = 16 // Size of the delta in bits.
+ pageTraceTimeMaxDelta = 1 << (pageTraceTimeLostBits + pageTraceTimeDeltaBits)
+)
+
+// pageTraceEvents is the low-level buffer containing the trace data.
+type pageTraceEvents struct {
+ _ sys.NotInHeap
+ events [pageTraceBufSize / 8]uint64
+}
+
+// pageTraceBuf is a wrapper around pageTraceEvents that knows how to write events
+// to the buffer. It tracks state necessary to do so.
+type pageTraceBuf struct {
+ buf *pageTraceEvents
+ len int // How many 8-byte event words have been written so far.
+ timeBase int64 // The current timestamp base from which deltas are produced.
+ finished bool // Whether this trace buf should no longer flush anything out.
+}
+
+// writePid writes a P ID event indicating which P we're running on.
+//
+// Assumes there's always space in the buffer since this is only called at the
+// beginning of a new buffer.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func (tb pageTraceBuf) writePid(pid int32) pageTraceBuf {
+ e := uint64(int64(pid))<<3 | 0b100 | uint64(pageTraceSyncEvent)
+ tb.buf.events[tb.len] = e
+ tb.len++
+ return tb
+}
+
+// writeSync writes a sync event, which is just a timestamp. Handles flushing.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func (tb pageTraceBuf) writeSync(pid int32) pageTraceBuf {
+ if tb.len+1 > len(tb.buf.events) {
+ // N.B. flush will writeSync again.
+ return tb.flush(pid, tb.timeBase)
+ }
+ e := ((uint64(tb.timeBase) >> pageTraceTimeLostBits) << 3) | uint64(pageTraceSyncEvent)
+ tb.buf.events[tb.len] = e
+ tb.len++
+ return tb
+}
+
+// writeEvent handles writing all non-sync and non-pid events. Handles flushing if necessary.
+//
+// pid indicates the P we're currently running on. Necessary in case we need to flush.
+// now is the current nanotime timestamp.
+// base is the base address of whatever group of pages this event is happening to.
+// npages is the length of the group of pages this event is happening to.
+// typ is the event that's happening to these pages.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func (tb pageTraceBuf) writeEvent(pid int32, now int64, base, npages uintptr, typ pageTraceEventType) pageTraceBuf {
+ large := 0
+ np := npages
+ if npages >= 1024 {
+ large = 1
+ np = 0
+ }
+ if tb.len+1+large > len(tb.buf.events) {
+ tb = tb.flush(pid, now)
+ }
+ if base%pageSize != 0 {
+ throw("base address not page aligned")
+ }
+ e := uint64(base)
+ // The pageShift low-order bits are zero.
+ e |= uint64(typ) // 2 bits
+ e |= uint64(large) << 2 // 1 bit
+ e |= uint64(np) << 3 // 10 bits
+ // Write the timestamp delta in the upper pageTraceTimeDeltaBits.
+ e |= uint64((now-tb.timeBase)>>pageTraceTimeLostBits) << (64 - pageTraceTimeDeltaBits)
+ tb.buf.events[tb.len] = e
+ if large != 0 {
+ // npages doesn't fit in 10 bits, so write an additional word with that data.
+ tb.buf.events[tb.len+1] = uint64(npages)
+ }
+ tb.len += 1 + large
+ return tb
+}
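+
+// For illustration only (not used by the runtime), an event word written by
+// writeEvent above can be unpacked as follows, assuming the 8 KiB runtime page
+// size (pageShift == 13) described in the header comment:
+//
+//	typ := pageTraceEventType(e & 0b11)
+//	large := (e >> 2) & 1
+//	npages := (e >> 3) & (1<<10 - 1)       // 0 if large == 1; real value is in the next word
+//	base := e & (1<<48 - 1) &^ (1<<13 - 1) // 35 address bits; low 13 bits are zero
+//	delta := (e >> 48) << pageTraceTimeLostBits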
+
+// flush writes out the contents of the buffer to pageTrace.fd and resets the buffer.
+// It then writes out a P ID event and the first sync event for the new buffer.
+//
+// Must run on the system stack as a crude way to prevent preemption.
+//
+//go:systemstack
+func (tb pageTraceBuf) flush(pid int32, now int64) pageTraceBuf {
+ if !tb.finished {
+ lock(&pageTrace.fdLock)
+ writeFull(uintptr(pageTrace.fd), (*byte)(unsafe.Pointer(&tb.buf.events[0])), tb.len*8)
+ unlock(&pageTrace.fdLock)
+ }
+ tb.len = 0
+ tb.timeBase = now
+ return tb.writePid(pid).writeSync(pid)
+}
+
+var pageTrace struct {
+ // enabled indicates whether tracing is enabled. If true, fd >= 0.
+ //
+ // Safe to read without synchronization because it's only set once
+ // at program initialization.
+ enabled bool
+
+ // buf is the page trace buffer used if there is no P.
+ //
+ // lock protects buf.
+ lock mutex
+ buf pageTraceBuf
+
+ // fdLock protects writing to fd.
+ //
+ // fd is the file to write the page trace to.
+ fdLock mutex
+ fd int32
+}
+
+// initPageTrace initializes the page tracing infrastructure from GODEBUG.
+//
+// env must be the value of the GODEBUG environment variable.
+func initPageTrace(env string) {
+ var value string
+ for env != "" {
+ elt, rest := env, ""
+ for i := 0; i < len(env); i++ {
+ if env[i] == ',' {
+ elt, rest = env[:i], env[i+1:]
+ break
+ }
+ }
+ env = rest
+ if hasPrefix(elt, "pagetrace=") {
+ value = elt[len("pagetrace="):]
+ break
+ }
+ }
+ pageTrace.fd = -1
+ if canCreateFile && value != "" {
+ var tmp [4096]byte
+ if len(value) != 0 && len(value) < 4096 {
+ copy(tmp[:], value)
+ pageTrace.fd = create(&tmp[0], 0o664)
+ }
+ }
+ pageTrace.enabled = pageTrace.fd >= 0
+}
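+
+// For example (illustrative; the binary must be built with
+// GOEXPERIMENT=pagetrace for this file to be compiled in):
+//
+//	GODEBUG=pagetrace=/tmp/page.trace ./myprog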
+
+// finishPageTrace flushes all P's trace buffers and disables page tracing.
+func finishPageTrace() {
+ if !pageTrace.enabled {
+ return
+ }
+ // Grab worldsema as we're about to execute a ragged barrier.
+ semacquire(&worldsema)
+ systemstack(func() {
+ // Disable tracing. This isn't strictly necessary and it's best-effort.
+ pageTrace.enabled = false
+
+ // Execute a ragged barrier, flushing each trace buffer.
+ forEachP(func(pp *p) {
+ if pp.pageTraceBuf.buf != nil {
+ pp.pageTraceBuf = pp.pageTraceBuf.flush(pp.id, nanotime())
+ }
+ pp.pageTraceBuf.finished = true
+ })
+
+ // Write the global have-no-P buffer.
+ lock(&pageTrace.lock)
+ if pageTrace.buf.buf != nil {
+ pageTrace.buf = pageTrace.buf.flush(-1, nanotime())
+ }
+ pageTrace.buf.finished = true
+ unlock(&pageTrace.lock)
+
+ // Safely close the file as nothing else should be allowed to write to the fd.
+ lock(&pageTrace.fdLock)
+ closefd(pageTrace.fd)
+ pageTrace.fd = -1
+ unlock(&pageTrace.fdLock)
+ })
+ semrelease(&worldsema)
+}
+
+// writeFull ensures that a complete write of bn bytes from b is made to fd.
+func writeFull(fd uintptr, b *byte, bn int) {
+ for bn > 0 {
+ n := write(fd, unsafe.Pointer(b), int32(bn))
+ if n == -_EINTR || n == -_EAGAIN {
+ continue
+ }
+ if n < 0 {
+ print("errno=", -n, "\n")
+ throw("writeBytes: bad write")
+ }
+ bn -= int(n)
+ b = addb(b, uintptr(n))
+ }
+}
diff --git a/src/runtime/panic.go b/src/runtime/panic.go
new file mode 100644
index 0000000..39c27a4
--- /dev/null
+++ b/src/runtime/panic.go
@@ -0,0 +1,1419 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// throwType indicates the current type of ongoing throw, which affects the
+// amount of detail printed to stderr. Higher values include more detail.
+type throwType uint32
+
+const (
+ // throwTypeNone means that we are not throwing.
+ throwTypeNone throwType = iota
+
+ // throwTypeUser is a throw due to a problem with the application.
+ //
+ // These throws do not include runtime frames, system goroutines, or
+ // frame metadata.
+ throwTypeUser
+
+ // throwTypeRuntime is a throw due to a problem with Go itself.
+ //
+ // These throws include as much information as possible to aid in
+ // debugging the runtime, including runtime frames, system goroutines,
+ // and frame metadata.
+ throwTypeRuntime
+)
+
+// We have two different ways of doing defers. The older way involves creating a
+// defer record at the time that a defer statement is executing and adding it to a
+// defer chain. This chain is inspected by the deferreturn call at all function
+// exits in order to run the appropriate defer calls. A cheaper way (which we call
+// open-coded defers) is used for functions in which no defer statements occur in
+// loops. In that case, we simply store the defer function/arg information into
+// specific stack slots at the point of each defer statement, as well as setting a
+// bit in a bitmask. At each function exit, we add inline code to directly make
+// the appropriate defer calls based on the bitmask and fn/arg information stored
+// on the stack. During panic/Goexit processing, the appropriate defer calls are
+// made using extra funcdata info that indicates the exact stack slots that
+// contain the bitmask and defer fn/args.
+
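+// For illustration (not part of this file), open-coded defers apply to a
+// function like
+//
+//	func f(mu *sync.Mutex) {
+//		mu.Lock()
+//		defer mu.Unlock() // fixed set of defers, none inside a loop
+//		work()
+//	}
+//
+// whereas a defer inside a loop forces the defer-record (chain) path:
+//
+//	func g(files []*os.File) {
+//		for _, f := range files {
+//			defer f.Close() // number of defers unknown at compile time
+//		}
+//	}
+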
+// Check to make sure we can really generate a panic. If the panic
+// was generated from the runtime, or from inside malloc, then convert
+// to a throw of msg.
+// pc should be the program counter of the compiler-generated code that
+// triggered this panic.
+func panicCheck1(pc uintptr, msg string) {
+ if goarch.IsWasm == 0 && hasPrefix(funcname(findfunc(pc)), "runtime.") {
+ // Note: wasm can't tail call, so we can't get the original caller's pc.
+ throw(msg)
+ }
+ // TODO: is this redundant? How could we be in malloc
+ // but not in the runtime? runtime/internal/*, maybe?
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.mallocing != 0 {
+ throw(msg)
+ }
+}
+
+// Same as above, but calling from the runtime is allowed.
+//
+// Using this function is necessary for any panic that may be
+// generated by runtime.sigpanic, since those are always called by the
+// runtime.
+func panicCheck2(err string) {
+ // panic allocates, so to avoid recursive malloc, turn panics
+ // during malloc into throws.
+ gp := getg()
+ if gp != nil && gp.m != nil && gp.m.mallocing != 0 {
+ throw(err)
+ }
+}
+
+// Many of the following panic entry-points turn into throws when they
+// happen in various runtime contexts. These should never happen in
+// the runtime, and if they do, they indicate a serious issue and
+// should not be caught by user code.
+//
+// The panic{Index,Slice,divide,shift} functions are called by
+// code generated by the compiler for out-of-bounds index expressions,
+// out-of-bounds slice expressions, division by zero, and negative shift amounts.
+// The panicdivide (again), panicoverflow, panicfloat, and panicmem
+// functions are called by the signal handler when a signal occurs
+// indicating the respective problem.
+//
+// Since panic{Index,Slice,shift} are never called directly, and
+// since the runtime package should never have an out of bounds slice
+// or array reference or negative shift, if we see those functions called from the
+// runtime package we turn the panic into a throw. That will dump the
+// entire runtime stack for easier debugging.
+//
+// The entry points called by the signal handler will be called from
+// runtime.sigpanic, so we can't disallow calls from the runtime to
+// these (they always look like they're called from the runtime).
+// Hence, for these, we just check for clearly bad runtime conditions.
+//
+// The panic{Index,Slice} functions are implemented in assembly and tail call
+// to the goPanic{Index,Slice} functions below. This is done so we can use
+// a space-minimal register calling convention.
+
+// failures in the comparisons for s[x], 0 <= x < y (y == len(s))
+//
+//go:yeswritebarrierrec
+func goPanicIndex(x int, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsIndex})
+}
+
+//go:yeswritebarrierrec
+func goPanicIndexU(x uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsIndex})
+}
+
+// failures in the comparisons for s[:x], 0 <= x <= y (y == len(s) or cap(s))
+//
+//go:yeswritebarrierrec
+func goPanicSliceAlen(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceAlen})
+}
+
+//go:yeswritebarrierrec
+func goPanicSliceAlenU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceAlen})
+}
+
+//go:yeswritebarrierrec
+func goPanicSliceAcap(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceAcap})
+}
+
+//go:yeswritebarrierrec
+func goPanicSliceAcapU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceAcap})
+}
+
+// failures in the comparisons for s[x:y], 0 <= x <= y
+//
+//go:yeswritebarrierrec
+func goPanicSliceB(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSliceB})
+}
+
+//go:yeswritebarrierrec
+func goPanicSliceBU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSliceB})
+}
+
+// failures in the comparisons for s[::x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicSlice3Alen(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3Alen})
+}
+func goPanicSlice3AlenU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3Alen})
+}
+func goPanicSlice3Acap(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3Acap})
+}
+func goPanicSlice3AcapU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3Acap})
+}
+
+// failures in the comparisons for s[:x:y], 0 <= x <= y
+func goPanicSlice3B(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3B})
+}
+func goPanicSlice3BU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3B})
+}
+
+// failures in the comparisons for s[x:y:], 0 <= x <= y
+func goPanicSlice3C(x int, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsSlice3C})
+}
+func goPanicSlice3CU(x uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(x), signed: false, y: y, code: boundsSlice3C})
+}
+
+// failures in the conversion ([x]T)(s) or (*[x]T)(s), 0 <= x <= y, y == len(s)
+func goPanicSliceConvert(x int, y int) {
+ panicCheck1(getcallerpc(), "slice length too short to convert to array or pointer to array")
+ panic(boundsError{x: int64(x), signed: true, y: y, code: boundsConvert})
+}
+
+// Implemented in assembly, as they take arguments in registers.
+// Declared here to mark them as ABIInternal.
+func panicIndex(x int, y int)
+func panicIndexU(x uint, y int)
+func panicSliceAlen(x int, y int)
+func panicSliceAlenU(x uint, y int)
+func panicSliceAcap(x int, y int)
+func panicSliceAcapU(x uint, y int)
+func panicSliceB(x int, y int)
+func panicSliceBU(x uint, y int)
+func panicSlice3Alen(x int, y int)
+func panicSlice3AlenU(x uint, y int)
+func panicSlice3Acap(x int, y int)
+func panicSlice3AcapU(x uint, y int)
+func panicSlice3B(x int, y int)
+func panicSlice3BU(x uint, y int)
+func panicSlice3C(x int, y int)
+func panicSlice3CU(x uint, y int)
+func panicSliceConvert(x int, y int)
+
+var shiftError = error(errorString("negative shift amount"))
+
+//go:yeswritebarrierrec
+func panicshift() {
+ panicCheck1(getcallerpc(), "negative shift amount")
+ panic(shiftError)
+}
+
+var divideError = error(errorString("integer divide by zero"))
+
+//go:yeswritebarrierrec
+func panicdivide() {
+ panicCheck2("integer divide by zero")
+ panic(divideError)
+}
+
+var overflowError = error(errorString("integer overflow"))
+
+func panicoverflow() {
+ panicCheck2("integer overflow")
+ panic(overflowError)
+}
+
+var floatError = error(errorString("floating point error"))
+
+func panicfloat() {
+ panicCheck2("floating point error")
+ panic(floatError)
+}
+
+var memoryError = error(errorString("invalid memory address or nil pointer dereference"))
+
+func panicmem() {
+ panicCheck2("invalid memory address or nil pointer dereference")
+ panic(memoryError)
+}
+
+func panicmemAddr(addr uintptr) {
+ panicCheck2("invalid memory address or nil pointer dereference")
+ panic(errorAddressString{msg: "invalid memory address or nil pointer dereference", addr: addr})
+}
+
+// Create a new deferred function fn, which takes no arguments and returns no results.
+// The compiler turns a defer statement into a call to this.
+func deferproc(fn func()) {
+ gp := getg()
+ if gp.m.curg != gp {
+ // go code on the system stack can't defer
+ throw("defer on system stack")
+ }
+
+ d := newdefer()
+ if d._panic != nil {
+ throw("deferproc: d.panic != nil after newdefer")
+ }
+ d.link = gp._defer
+ gp._defer = d
+ d.fn = fn
+ d.pc = getcallerpc()
+ // We must not be preempted between calling getcallersp and
+ // storing it to d.sp because getcallersp's result is a
+ // uintptr stack pointer.
+ d.sp = getcallersp()
+
+ // deferproc returns 0 normally.
+ // a deferred func that stops a panic
+ // makes the deferproc return 1.
+ // the code the compiler generates always
+ // checks the return value and jumps to the
+ // end of the function if deferproc returns != 0.
+ return0()
+ // No code can go here - the C return register has
+ // been set and must not be clobbered.
+}
+
+// deferprocStack queues a new deferred function with a defer record on the stack.
+// The defer record must have its fn field initialized.
+// All other fields can contain junk.
+// Nosplit because of the uninitialized pointer fields on the stack.
+//
+//go:nosplit
+func deferprocStack(d *_defer) {
+ gp := getg()
+ if gp.m.curg != gp {
+ // go code on the system stack can't defer
+ throw("defer on system stack")
+ }
+ // fn is already set.
+ // The other fields are junk on entry to deferprocStack and
+ // are initialized here.
+ d.started = false
+ d.heap = false
+ d.openDefer = false
+ d.sp = getcallersp()
+ d.pc = getcallerpc()
+ d.framepc = 0
+ d.varp = 0
+ // The lines below implement:
+ // d.panic = nil
+ // d.fd = nil
+ // d.link = gp._defer
+ // gp._defer = d
+ // But without write barriers. The first three are writes to
+ // the stack so they don't need a write barrier, and furthermore
+ // are to uninitialized memory, so they must not use a write barrier.
+ // The fourth write does not require a write barrier because we
+ // explicitly mark all the defer structures, so we don't need to
+ // keep track of pointers to them with a write barrier.
+ *(*uintptr)(unsafe.Pointer(&d._panic)) = 0
+ *(*uintptr)(unsafe.Pointer(&d.fd)) = 0
+ *(*uintptr)(unsafe.Pointer(&d.link)) = uintptr(unsafe.Pointer(gp._defer))
+ *(*uintptr)(unsafe.Pointer(&gp._defer)) = uintptr(unsafe.Pointer(d))
+
+ return0()
+ // No code can go here - the C return register has
+ // been set and must not be clobbered.
+}
+
+// Each P holds a pool for defers.
+
+// Allocate a _defer, usually using the per-P pool.
+// Each defer must be released with freedefer. The defer is not
+// added to any defer chain yet.
+func newdefer() *_defer {
+ var d *_defer
+ mp := acquirem()
+ pp := mp.p.ptr()
+ if len(pp.deferpool) == 0 && sched.deferpool != nil {
+ lock(&sched.deferlock)
+ for len(pp.deferpool) < cap(pp.deferpool)/2 && sched.deferpool != nil {
+ d := sched.deferpool
+ sched.deferpool = d.link
+ d.link = nil
+ pp.deferpool = append(pp.deferpool, d)
+ }
+ unlock(&sched.deferlock)
+ }
+ if n := len(pp.deferpool); n > 0 {
+ d = pp.deferpool[n-1]
+ pp.deferpool[n-1] = nil
+ pp.deferpool = pp.deferpool[:n-1]
+ }
+ releasem(mp)
+ mp, pp = nil, nil
+
+ if d == nil {
+ // Allocate new defer.
+ d = new(_defer)
+ }
+ d.heap = true
+ return d
+}
+
+// Free the given defer.
+// The defer cannot be used after this call.
+//
+// This is nosplit because the incoming defer is in a perilous state.
+// It's not on any defer list, so stack copying won't adjust stack
+// pointers in it (namely, d.link). Hence, if we were to copy the
+// stack, d could then contain a stale pointer.
+//
+//go:nosplit
+func freedefer(d *_defer) {
+ d.link = nil
+ // After this point we can copy the stack.
+
+ if d._panic != nil {
+ freedeferpanic()
+ }
+ if d.fn != nil {
+ freedeferfn()
+ }
+ if !d.heap {
+ return
+ }
+
+ mp := acquirem()
+ pp := mp.p.ptr()
+ if len(pp.deferpool) == cap(pp.deferpool) {
+ // Transfer half of local cache to the central cache.
+ var first, last *_defer
+ for len(pp.deferpool) > cap(pp.deferpool)/2 {
+ n := len(pp.deferpool)
+ d := pp.deferpool[n-1]
+ pp.deferpool[n-1] = nil
+ pp.deferpool = pp.deferpool[:n-1]
+ if first == nil {
+ first = d
+ } else {
+ last.link = d
+ }
+ last = d
+ }
+ lock(&sched.deferlock)
+ last.link = sched.deferpool
+ sched.deferpool = first
+ unlock(&sched.deferlock)
+ }
+
+ *d = _defer{}
+
+ pp.deferpool = append(pp.deferpool, d)
+
+ releasem(mp)
+ mp, pp = nil, nil
+}
+
+// Separate function so that it can split stack.
+// Windows otherwise runs out of stack space.
+func freedeferpanic() {
+ // _panic must be cleared before d is unlinked from gp.
+ throw("freedefer with d._panic != nil")
+}
+
+func freedeferfn() {
+ // fn must be cleared before d is unlinked from gp.
+ throw("freedefer with d.fn != nil")
+}
+
+// deferreturn runs deferred functions for the caller's frame.
+// The compiler inserts a call to this at the end of any
+// function which calls defer.
+func deferreturn() {
+ gp := getg()
+ for {
+ d := gp._defer
+ if d == nil {
+ return
+ }
+ sp := getcallersp()
+ if d.sp != sp {
+ return
+ }
+ if d.openDefer {
+ done := runOpenDeferFrame(d)
+ if !done {
+ throw("unfinished open-coded defers in deferreturn")
+ }
+ gp._defer = d.link
+ freedefer(d)
+ // If this frame uses open defers, then this
+ // must be the only defer record for the
+ // frame, so we can just return.
+ return
+ }
+
+ fn := d.fn
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ fn()
+ }
+}
+
+// Goexit terminates the goroutine that calls it. No other goroutine is affected.
+// Goexit runs all deferred calls before terminating the goroutine. Because Goexit
+// is not a panic, any recover calls in those deferred functions will return nil.
+//
+// Calling Goexit from the main goroutine terminates that goroutine
+// without func main returning. Since func main has not returned,
+// the program continues execution of other goroutines.
+// If all other goroutines exit, the program crashes.
+func Goexit() {
+ // Run all deferred functions for the current goroutine.
+ // This code is similar to gopanic, see that implementation
+ // for detailed comments.
+ gp := getg()
+
+ // Create a panic object for Goexit, so we can recognize when it might be
+ // bypassed by a recover().
+ var p _panic
+ p.goexit = true
+ p.link = gp._panic
+ gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+ for {
+ d := gp._defer
+ if d == nil {
+ break
+ }
+ if d.started {
+ if d._panic != nil {
+ d._panic.aborted = true
+ d._panic = nil
+ }
+ if !d.openDefer {
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ continue
+ }
+ }
+ d.started = true
+ d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+ if d.openDefer {
+ done := runOpenDeferFrame(d)
+ if !done {
+ // We should always run all defers in the frame,
+ // since there is no panic associated with this
+ // defer that can be recovered.
+ throw("unfinished open-coded defers in Goexit")
+ }
+ if p.aborted {
+ // Since our current defer caused a panic and may
+ // have been already freed, just restart scanning
+ // for open-coded defers from this frame again.
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+ } else {
+ addOneOpenDeferFrame(gp, 0, nil)
+ }
+ } else {
+ // Save the pc/sp in deferCallSave(), so we can "recover" back to this
+ // loop if necessary.
+ deferCallSave(&p, d.fn)
+ }
+ if p.aborted {
+ // We had a recursive panic in the defer d we started, and
+ // then did a recover in a defer that was further down the
+ // defer chain than d. In the case of an outstanding Goexit,
+ // we force the recover to return back to this loop. d will
+ // have already been freed if completed, so just continue
+ // immediately to the next defer on the chain.
+ p.aborted = false
+ continue
+ }
+ if gp._defer != d {
+ throw("bad defer entry in Goexit")
+ }
+ d._panic = nil
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ // Note: we ignore recovers here because Goexit isn't a panic
+ }
+ goexit1()
+}
+
+// Call all Error and String methods before freezing the world.
+// Used when crashing as a result of a panic.
+func preprintpanics(p *_panic) {
+ defer func() {
+ text := "panic while printing panic value"
+ switch r := recover().(type) {
+ case nil:
+ // nothing to do
+ case string:
+ throw(text + ": " + r)
+ default:
+ throw(text + ": type " + toRType(efaceOf(&r)._type).string())
+ }
+ }()
+ for p != nil {
+ switch v := p.arg.(type) {
+ case error:
+ p.arg = v.Error()
+ case stringer:
+ p.arg = v.String()
+ }
+ p = p.link
+ }
+}
+
+// Print all currently active panics. Used when crashing.
+// Should only be called after preprintpanics.
+func printpanics(p *_panic) {
+ if p.link != nil {
+ printpanics(p.link)
+ if !p.link.goexit {
+ print("\t")
+ }
+ }
+ if p.goexit {
+ return
+ }
+ print("panic: ")
+ printany(p.arg)
+ if p.recovered {
+ print(" [recovered]")
+ }
+ print("\n")
+}
+
+// addOneOpenDeferFrame scans the stack (in gentraceback order, from inner frames to
+// outer frames) for the first frame (if any) with open-coded defers. If it finds
+// one, it adds a single entry to the defer chain for that frame. The entry added
+// represents all the defers in the associated open defer frame, and is sorted in
+// order with respect to any non-open-coded defers.
+//
+// addOneOpenDeferFrame stops (possibly without adding a new entry) if it encounters
+// an in-progress open defer entry. An in-progress open defer entry means there has
+// been a new panic because of a defer in the associated frame. addOneOpenDeferFrame
+// does not add an open defer entry past a started entry, because that started entry
+// still needs to be finished, and addOneOpenDeferFrame will be called when that started
+// entry is completed. The defer removal loop in gopanic() similarly stops at an
+// in-progress defer entry. Together, addOneOpenDeferFrame and the defer removal loop
+// ensure the invariant that there is no open defer entry further up the stack than
+// an in-progress defer, and also that the defer removal loop is guaranteed to remove
+// all not-in-progress open defer entries from the defer chain.
+//
+// If sp is non-nil, addOneOpenDeferFrame starts the stack scan from the frame
+// specified by sp. If sp is nil, it uses the sp from the current defer record (which
+// has just been finished). Hence, it continues the stack scan from the frame of the
+// defer that just finished. It skips any frame that already has a (not-in-progress)
+// open-coded _defer record in the defer chain.
+//
+// Note: All entries of the defer chain (including this new open-coded entry) have
+// their pointers (including sp) adjusted properly if the stack moves while
+// running deferred functions. Also, it is safe to pass in the sp arg (which is
+// the direct result of calling getcallersp()), because all pointer variables
+// (including arguments) are adjusted as needed during stack copies.
+func addOneOpenDeferFrame(gp *g, pc uintptr, sp unsafe.Pointer) {
+ var prevDefer *_defer
+ if sp == nil {
+ prevDefer = gp._defer
+ pc = prevDefer.framepc
+ sp = unsafe.Pointer(prevDefer.sp)
+ }
+ systemstack(func() {
+ var u unwinder
+ frames:
+ for u.initAt(pc, uintptr(sp), 0, gp, 0); u.valid(); u.next() {
+ frame := &u.frame
+ if prevDefer != nil && prevDefer.sp == frame.sp {
+ // Skip the frame for the previous defer that
+ // we just finished (and was used to set
+ // where we restarted the stack scan)
+ continue
+ }
+ f := frame.fn
+ fd := funcdata(f, abi.FUNCDATA_OpenCodedDeferInfo)
+ if fd == nil {
+ continue
+ }
+ // Insert the open defer record in the
+ // chain, in order sorted by sp.
+ d := gp._defer
+ var prev *_defer
+ for d != nil {
+ dsp := d.sp
+ if frame.sp < dsp {
+ break
+ }
+ if frame.sp == dsp {
+ if !d.openDefer {
+ throw("duplicated defer entry")
+ }
+ // Don't add any record past an
+ // in-progress defer entry. We don't
+ // need it, and more importantly, we
+ // want to keep the invariant that
+ // there is no open defer entry
+ // past an in-progress entry (see
+ // header comment).
+ if d.started {
+ break frames
+ }
+ continue frames
+ }
+ prev = d
+ d = d.link
+ }
+ if frame.fn.deferreturn == 0 {
+ throw("missing deferreturn")
+ }
+
+ d1 := newdefer()
+ d1.openDefer = true
+ d1._panic = nil
+ // These are the pc/sp to set after we've
+ // run a defer in this frame that did a
+ // recover. We return to a special
+ // deferreturn that runs any remaining
+ // defers and then returns from the
+ // function.
+ d1.pc = frame.fn.entry() + uintptr(frame.fn.deferreturn)
+ d1.varp = frame.varp
+ d1.fd = fd
+ // Save the SP/PC associated with current frame,
+ // so we can continue stack trace later if needed.
+ d1.framepc = frame.pc
+ d1.sp = frame.sp
+ d1.link = d
+ if prev == nil {
+ gp._defer = d1
+ } else {
+ prev.link = d1
+ }
+ // Stop stack scanning after adding one open defer record
+ break
+ }
+ })
+}
+
+// readvarintUnsafe reads the uint32 in varint format starting at fd, and returns the
+// uint32 and a pointer to the byte following the varint.
+//
+// There is a similar function runtime.readvarint, which takes a slice of bytes,
+// rather than an unsafe pointer. These functions are duplicated, because one of
+// the two use cases for the functions would get slower if the functions were
+// combined.
+func readvarintUnsafe(fd unsafe.Pointer) (uint32, unsafe.Pointer) {
+ var r uint32
+ var shift int
+ for {
+ b := *(*uint8)((unsafe.Pointer(fd)))
+ fd = add(fd, unsafe.Sizeof(b))
+ if b < 128 {
+ return r + uint32(b)<<shift, fd
+ }
+ r += ((uint32(b) &^ 128) << shift)
+ shift += 7
+ if shift > 28 {
+ panic("Bad varint")
+ }
+ }
+}
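+
+// For example, the byte sequence 0x85 0x02 decodes above as (0x85&^0x80)<<0 +
+// 0x02<<7 = 5 + 256 = 261, with the returned pointer advanced past both bytes.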
+
+// runOpenDeferFrame runs the active open-coded defers in the frame specified by
+// d. It normally processes all active defers in the frame, but stops immediately
+// if a defer does a successful recover. It returns true if there are no
+// remaining defers to run in the frame.
+func runOpenDeferFrame(d *_defer) bool {
+ done := true
+ fd := d.fd
+
+ deferBitsOffset, fd := readvarintUnsafe(fd)
+ nDefers, fd := readvarintUnsafe(fd)
+ deferBits := *(*uint8)(unsafe.Pointer(d.varp - uintptr(deferBitsOffset)))
+
+ for i := int(nDefers) - 1; i >= 0; i-- {
+ // read the funcdata info for this defer
+ var closureOffset uint32
+ closureOffset, fd = readvarintUnsafe(fd)
+ if deferBits&(1<<i) == 0 {
+ continue
+ }
+ closure := *(*func())(unsafe.Pointer(d.varp - uintptr(closureOffset)))
+ d.fn = closure
+ deferBits = deferBits &^ (1 << i)
+ *(*uint8)(unsafe.Pointer(d.varp - uintptr(deferBitsOffset))) = deferBits
+ p := d._panic
+ // Call the defer. Note that this can change d.varp if
+ // the stack moves.
+ deferCallSave(p, d.fn)
+ if p != nil && p.aborted {
+ break
+ }
+ d.fn = nil
+ if d._panic != nil && d._panic.recovered {
+ done = deferBits == 0
+ break
+ }
+ }
+
+ return done
+}
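+
+// As consumed above, the open-coded defer funcdata is a sequence of varints:
+// the frame offset of the deferBits byte, the number of defers, and then one
+// closure frame offset per defer slot, read here from the highest bit down.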
+
+// deferCallSave calls fn() after saving the caller's pc and sp in the
+// panic record. This allows the runtime to return to the Goexit defer
+// processing loop, in the unusual case where the Goexit may be
+// bypassed by a successful recover.
+//
+// This is marked as a wrapper by the compiler so it doesn't appear in
+// tracebacks.
+func deferCallSave(p *_panic, fn func()) {
+ if p != nil {
+ p.argp = unsafe.Pointer(getargp())
+ p.pc = getcallerpc()
+ p.sp = unsafe.Pointer(getcallersp())
+ }
+ fn()
+ if p != nil {
+ p.pc = 0
+ p.sp = unsafe.Pointer(nil)
+ }
+}
+
+// A PanicNilError happens when code calls panic(nil).
+//
+// Before Go 1.21, programs that called panic(nil) observed recover returning nil.
+// Starting in Go 1.21, programs that call panic(nil) observe recover returning a *PanicNilError.
+// Programs can change back to the old behavior by setting GODEBUG=panicnil=1.
+type PanicNilError struct {
+ // This field makes PanicNilError structurally different from
+ // any other struct in this package, and the _ makes it different
+ // from any struct in other packages too.
+ // This avoids any accidental conversions being possible
+ // between this struct and some other struct sharing the same fields,
+ // like happened in go.dev/issue/56603.
+ _ [0]*PanicNilError
+}
+
+func (*PanicNilError) Error() string { return "panic called with nil argument" }
+func (*PanicNilError) RuntimeError() {}
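+
+// For example (illustrative), under the Go 1.21 default:
+//
+//	defer func() { fmt.Printf("%T\n", recover()) }() // prints *runtime.PanicNilError
+//	panic(nil)
+//
+// while running with GODEBUG=panicnil=1 restores the old behavior and recover
+// returns nil.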
+
+var panicnil = &godebugInc{name: "panicnil"}
+
+// The implementation of the predeclared function panic.
+func gopanic(e any) {
+ if e == nil {
+ if debug.panicnil.Load() != 1 {
+ e = new(PanicNilError)
+ } else {
+ panicnil.IncNonDefault()
+ }
+ }
+
+ gp := getg()
+ if gp.m.curg != gp {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic on system stack")
+ }
+
+ if gp.m.mallocing != 0 {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic during malloc")
+ }
+ if gp.m.preemptoff != "" {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ print("preempt off reason: ")
+ print(gp.m.preemptoff)
+ print("\n")
+ throw("panic during preemptoff")
+ }
+ if gp.m.locks != 0 {
+ print("panic: ")
+ printany(e)
+ print("\n")
+ throw("panic holding locks")
+ }
+
+ var p _panic
+ p.arg = e
+ p.link = gp._panic
+ gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ runningPanicDefers.Add(1)
+
+ // By calculating getcallerpc/getcallersp here, we avoid scanning the
+ // gopanic frame (stack scanning is slow...)
+ addOneOpenDeferFrame(gp, getcallerpc(), unsafe.Pointer(getcallersp()))
+
+ for {
+ d := gp._defer
+ if d == nil {
+ break
+ }
+
+ // If the defer was started by an earlier panic or Goexit (and, since we're back here, that triggered a new
+ // panic), take the defer off the list. An earlier panic will not continue running, but we will make sure
+ // below that an earlier Goexit does continue running.
+ if d.started {
+ if d._panic != nil {
+ d._panic.aborted = true
+ }
+ d._panic = nil
+ if !d.openDefer {
+ // For open-coded defers, we need to process the
+ // defer again, in case there are any other defers
+ // to call in the frame (not including the defer
+ // call that caused the panic).
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ continue
+ }
+ }
+
+ // Mark defer as started, but keep on list, so that traceback
+ // can find and update the defer's argument frame if stack growth
+ // or a garbage collection happens before executing d.fn.
+ d.started = true
+
+ // Record the panic that is running the defer.
+ // If there is a new panic during the deferred call, that panic
+ // will find d in the list and will mark d._panic (this panic) aborted.
+ d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))
+
+ done := true
+ if d.openDefer {
+ done = runOpenDeferFrame(d)
+ if done && !d._panic.recovered {
+ addOneOpenDeferFrame(gp, 0, nil)
+ }
+ } else {
+ p.argp = unsafe.Pointer(getargp())
+ d.fn()
+ }
+ p.argp = nil
+
+ // Deferred function did not panic. Remove d.
+ if gp._defer != d {
+ throw("bad defer entry in panic")
+ }
+ d._panic = nil
+
+ // trigger shrinkage to test stack copy. See stack_test.go:TestStackPanic
+ //GC()
+
+ pc := d.pc
+ sp := unsafe.Pointer(d.sp) // must be pointer so it gets adjusted during stack copy
+ if done {
+ d.fn = nil
+ gp._defer = d.link
+ freedefer(d)
+ }
+ if p.recovered {
+ gp._panic = p.link
+ if gp._panic != nil && gp._panic.goexit && gp._panic.aborted {
+ // A normal recover would bypass/abort the Goexit. Instead,
+ // we return to the processing loop of the Goexit.
+ gp.sigcode0 = uintptr(gp._panic.sp)
+ gp.sigcode1 = uintptr(gp._panic.pc)
+ mcall(recovery)
+ throw("bypassed recovery failed") // mcall should not return
+ }
+ runningPanicDefers.Add(-1)
+
+ // After a recover, remove any remaining non-started,
+ // open-coded defer entries, since the corresponding defers
+ // will be executed normally (inline). Any such entry will
+ // become stale once we run the corresponding defers inline
+ // and exit the associated stack frame. We only remove up to
+ // the first started (in-progress) open defer entry, not
+ // including the current frame, since any higher entries will
+ // be from a higher panic in progress, and will still be
+ // needed.
+ d := gp._defer
+ var prev *_defer
+ if !done {
+ // Skip our current frame, if not done. It is
+ // needed to complete any remaining defers in
+ // deferreturn()
+ prev = d
+ d = d.link
+ }
+ for d != nil {
+ if d.started {
+ // This defer is started but we
+ // are in the middle of a
+ // defer-panic-recover inside of
+ // it, so don't remove it or any
+ // further defer entries
+ break
+ }
+ if d.openDefer {
+ if prev == nil {
+ gp._defer = d.link
+ } else {
+ prev.link = d.link
+ }
+ newd := d.link
+ freedefer(d)
+ d = newd
+ } else {
+ prev = d
+ d = d.link
+ }
+ }
+
+ gp._panic = p.link
+ // Aborted panics are marked but remain on the g.panic list.
+ // Remove them from the list.
+ for gp._panic != nil && gp._panic.aborted {
+ gp._panic = gp._panic.link
+ }
+ if gp._panic == nil { // must be done with signal
+ gp.sig = 0
+ }
+ // Pass information about recovering frame to recovery.
+ gp.sigcode0 = uintptr(sp)
+ gp.sigcode1 = pc
+ mcall(recovery)
+ throw("recovery failed") // mcall should not return
+ }
+ }
+
+ // ran out of deferred calls - old-school panic now
+ // Because it is unsafe to call arbitrary user code after freezing
+ // the world, we call preprintpanics to invoke all necessary Error
+ // and String methods to prepare the panic strings before startpanic.
+ preprintpanics(gp._panic)
+
+ fatalpanic(gp._panic) // should not return
+ *(*int)(nil) = 0 // not reached
+}
+
+// getargp returns the location where the caller
+// writes outgoing function call arguments.
+//
+//go:nosplit
+//go:noinline
+func getargp() uintptr {
+ return getcallersp() + sys.MinFrameSize
+}
+
+// The implementation of the predeclared function recover.
+// Cannot split the stack because it needs to reliably
+// find the stack segment of its caller.
+//
+// TODO(rsc): Once we commit to CopyStackAlways,
+// this doesn't need to be nosplit.
+//
+//go:nosplit
+func gorecover(argp uintptr) any {
+ // Must be in a function running as part of a deferred call during the panic.
+ // Must be called from the topmost function of the call
+ // (the function used in the defer statement).
+ // p.argp is the argument pointer of that topmost deferred function call.
+ // Compare against argp reported by caller.
+ // If they match, the caller is the one who can recover.
+ gp := getg()
+ p := gp._panic
+ if p != nil && !p.goexit && !p.recovered && argp == uintptr(p.argp) {
+ p.recovered = true
+ return p.arg
+ }
+ return nil
+}
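+
+// For illustration (not part of this file): because of the argp comparison
+// above, only a recover called directly by the deferred function stops a panic.
+//
+//	defer func() { recover() }() // recovers: argp matches p.argp
+//	defer func() { helper() }()  // a recover() inside helper returns nil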
+
+//go:linkname sync_throw sync.throw
+func sync_throw(s string) {
+ throw(s)
+}
+
+//go:linkname sync_fatal sync.fatal
+func sync_fatal(s string) {
+ fatal(s)
+}
+
+// throw triggers a fatal error that dumps a stack trace and exits.
+//
+// throw should be used for runtime-internal fatal errors where Go itself,
+// rather than user code, may be at fault for the failure.
+//
+//go:nosplit
+func throw(s string) {
+ // Everything throw does should be recursively nosplit so it
+ // can be called even when it's unsafe to grow the stack.
+ systemstack(func() {
+ print("fatal error: ", s, "\n")
+ })
+
+ fatalthrow(throwTypeRuntime)
+}
+
+// fatal triggers a fatal error that dumps a stack trace and exits.
+//
+// fatal is equivalent to throw, but is used when user code is expected to be
+// at fault for the failure, such as racing map writes.
+//
+// fatal does not include runtime frames, system goroutines, or frame metadata
+// (fp, sp, pc) in the stack trace unless GOTRACEBACK=system or higher.
+//
+//go:nosplit
+func fatal(s string) {
+ // Everything fatal does should be recursively nosplit so it
+ // can be called even when it's unsafe to grow the stack.
+ systemstack(func() {
+ print("fatal error: ", s, "\n")
+ })
+
+ fatalthrow(throwTypeUser)
+}
+
+// runningPanicDefers is non-zero while running deferred functions for panic.
+// This is used to try hard to get a panic stack trace out when exiting.
+var runningPanicDefers atomic.Uint32
+
+// panicking is non-zero when crashing the program for an unrecovered panic.
+var panicking atomic.Uint32
+
+// paniclk is held while printing the panic information and stack trace,
+// so that two concurrent panics don't overlap their output.
+var paniclk mutex
+
+// Unwind the stack after a deferred function calls recover
+// after a panic. Then arrange to continue running as though
+// the caller of the deferred function returned normally.
+func recovery(gp *g) {
+ // Info about defer passed in G struct.
+ sp := gp.sigcode0
+ pc := gp.sigcode1
+
+ // d's arguments need to be in the stack.
+ if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
+ print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
+ throw("bad recovery")
+ }
+
+ // Make the deferproc for this d return again,
+ // this time returning 1. The calling function will
+ // jump to the standard return epilogue.
+ gp.sched.sp = sp
+ gp.sched.pc = pc
+ gp.sched.lr = 0
+ // Restore the bp on platforms that support frame pointers.
+ // N.B. It's fine to not set anything for platforms that don't
+ // support frame pointers, since nothing consumes them.
+ switch {
+ case goarch.IsAmd64 != 0:
+ // On x86, the architectural bp is stored 2 words below the
+ // stack pointer.
+ gp.sched.bp = *(*uintptr)(unsafe.Pointer(sp - 2*goarch.PtrSize))
+ case goarch.IsArm64 != 0:
+ // On arm64, the architectural bp points one word higher
+ // than the sp.
+ gp.sched.bp = sp - goarch.PtrSize
+ }
+ gp.sched.ret = 1
+ gogo(&gp.sched)
+}
+
+// fatalthrow implements an unrecoverable runtime throw. It freezes the
+// system, prints stack traces starting from its caller, and terminates the
+// process.
+//
+//go:nosplit
+func fatalthrow(t throwType) {
+ pc := getcallerpc()
+ sp := getcallersp()
+ gp := getg()
+
+ if gp.m.throwing == throwTypeNone {
+ gp.m.throwing = t
+ }
+
+ // Switch to the system stack to avoid any stack growth, which may make
+ // things worse if the runtime is in a bad state.
+ systemstack(func() {
+ if isSecureMode() {
+ exit(2)
+ }
+
+ startpanic_m()
+
+ if dopanic_m(gp, pc, sp) {
+ // crash uses a decent amount of nosplit stack and we're already
+ // low on stack in throw, so crash on the system stack (unlike
+ // fatalpanic).
+ crash()
+ }
+
+ exit(2)
+ })
+
+ *(*int)(nil) = 0 // not reached
+}
+
+// fatalpanic implements an unrecoverable panic. It is like fatalthrow, except
+// that if msgs != nil, fatalpanic also prints panic messages and decrements
+// runningPanicDefers once main is blocked from exiting.
+//
+//go:nosplit
+func fatalpanic(msgs *_panic) {
+ pc := getcallerpc()
+ sp := getcallersp()
+ gp := getg()
+ var docrash bool
+ // Switch to the system stack to avoid any stack growth, which
+ // may make things worse if the runtime is in a bad state.
+ systemstack(func() {
+ if startpanic_m() && msgs != nil {
+ // There were panic messages and startpanic_m
+ // says it's okay to try to print them.
+
+ // startpanic_m set panicking, which will
+ // block main from exiting, so now OK to
+ // decrement runningPanicDefers.
+ runningPanicDefers.Add(-1)
+
+ printpanics(msgs)
+ }
+
+ docrash = dopanic_m(gp, pc, sp)
+ })
+
+ if docrash {
+ // By crashing outside the above systemstack call, debuggers
+ // will not be confused when generating a backtrace.
+ // Function crash is marked nosplit to avoid stack growth.
+ crash()
+ }
+
+ systemstack(func() {
+ exit(2)
+ })
+
+ *(*int)(nil) = 0 // not reached
+}
+
+// startpanic_m prepares for an unrecoverable panic.
+//
+// It returns true if panic messages should be printed, or false if
+// the runtime is in bad shape and should just print stacks.
+//
+// It must not have write barriers even though the write barrier
+// explicitly ignores writes once dying > 0. Write barriers still
+// assume that g.m.p != nil, and this function may not have P
+// in some contexts (e.g. a panic in a signal handler for a signal
+// sent to an M with no P).
+//
+//go:nowritebarrierrec
+func startpanic_m() bool {
+ gp := getg()
+ if mheap_.cachealloc.size == 0 { // very early
+ print("runtime: panic before malloc heap initialized\n")
+ }
+ // Disallow malloc during an unrecoverable panic. A panic
+ // could happen in a signal handler, or in a throw, or inside
+ // malloc itself. We want to catch if an allocation ever does
+ // happen (even if we're not in one of these situations).
+ gp.m.mallocing++
+
+ // If we're dying because of a bad lock count, set it to a
+ // good lock count so we don't recursively panic below.
+ if gp.m.locks < 0 {
+ gp.m.locks = 1
+ }
+
+ switch gp.m.dying {
+ case 0:
+ // Setting dying >0 has the side-effect of disabling this G's writebuf.
+ gp.m.dying = 1
+ panicking.Add(1)
+ lock(&paniclk)
+ if debug.schedtrace > 0 || debug.scheddetail > 0 {
+ schedtrace(true)
+ }
+ freezetheworld()
+ return true
+ case 1:
+ // Something failed while panicking.
+ // Just print a stack trace and exit.
+ gp.m.dying = 2
+ print("panic during panic\n")
+ return false
+ case 2:
+ // This is a genuine bug in the runtime: we couldn't even
+ // print the stack trace successfully.
+ gp.m.dying = 3
+ print("stack trace unavailable\n")
+ exit(4)
+ fallthrough
+ default:
+ // Can't even print! Just exit.
+ exit(5)
+ return false // Need to return something.
+ }
+}
+
+var didothers bool
+var deadlock mutex
+
+// gp is the crashing g running on this M, but may be a user G, while getg() is
+// always g0.
+func dopanic_m(gp *g, pc, sp uintptr) bool {
+ if gp.sig != 0 {
+ signame := signame(gp.sig)
+ if signame != "" {
+ print("[signal ", signame)
+ } else {
+ print("[signal ", hex(gp.sig))
+ }
+ print(" code=", hex(gp.sigcode0), " addr=", hex(gp.sigcode1), " pc=", hex(gp.sigpc), "]\n")
+ }
+
+ level, all, docrash := gotraceback()
+ if level > 0 {
+ if gp != gp.m.curg {
+ all = true
+ }
+ if gp != gp.m.g0 {
+ print("\n")
+ goroutineheader(gp)
+ traceback(pc, sp, 0, gp)
+ } else if level >= 2 || gp.m.throwing >= throwTypeRuntime {
+ print("\nruntime stack:\n")
+ traceback(pc, sp, 0, gp)
+ }
+ if !didothers && all {
+ didothers = true
+ tracebackothers(gp)
+ }
+ }
+ unlock(&paniclk)
+
+ if panicking.Add(-1) != 0 {
+ // Some other m is panicking too.
+ // Let it print what it needs to print.
+ // Wait forever without chewing up cpu.
+ // It will exit when it's done.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+
+ printDebugLog()
+
+ return docrash
+}
+
+// canpanic returns false if a signal should throw instead of
+// panicking.
+//
+//go:nosplit
+func canpanic() bool {
+ gp := getg()
+ mp := acquirem()
+
+ // Is it okay for gp to panic instead of crashing the program?
+ // Yes, as long as it is running Go code, not runtime code,
+ // and not stuck in a system call.
+ if gp != mp.curg {
+ releasem(mp)
+ return false
+ }
+ // N.B. mp.locks != 1 instead of 0 to account for acquirem.
+ if mp.locks != 1 || mp.mallocing != 0 || mp.throwing != throwTypeNone || mp.preemptoff != "" || mp.dying != 0 {
+ releasem(mp)
+ return false
+ }
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning || gp.syscallsp != 0 {
+ releasem(mp)
+ return false
+ }
+ if GOOS == "windows" && mp.libcallsp != 0 {
+ releasem(mp)
+ return false
+ }
+ releasem(mp)
+ return true
+}
+
+// shouldPushSigpanic reports whether pc should be used as sigpanic's
+// return PC (pushing a frame for the call). Otherwise, it should be
+// left alone so that LR is used as sigpanic's return PC, effectively
+// replacing the top-most frame with sigpanic. This is used by
+// preparePanic.
+func shouldPushSigpanic(gp *g, pc, lr uintptr) bool {
+ if pc == 0 {
+ // Probably a call to a nil func. The old LR is more
+ // useful in the stack trace. Not pushing the frame
+ // will make the trace look like a call to sigpanic
+ // instead. (Otherwise the trace will end at sigpanic
+ // and we won't get to see who faulted.)
+ return false
+ }
+ // If we don't recognize the PC as code, but we do recognize
+ // the link register as code, then this assumes the panic was
+ // caused by a call to non-code. In this case, we want to
+ // ignore this call to make unwinding show the context.
+ //
+ // If we're running C code, we're not going to recognize pc as a
+ // Go function, so just assume it's good. Otherwise, traceback
+ // may try to read a stale LR that looks like a Go code
+ // pointer and wander into the woods.
+ if gp.m.incgo || findfunc(pc).valid() {
+ // This wasn't a bad call, so use PC as sigpanic's
+ // return PC.
+ return true
+ }
+ if findfunc(lr).valid() {
+ // This was a bad call, but the LR is good, so use the
+ // LR as sigpanic's return PC.
+ return false
+ }
+ // Neither the PC nor the LR is good. Hopefully pushing a frame
+ // will work.
+ return true
+}
+
+// isAbortPC reports whether pc is the program counter at which
+// runtime.abort raises a signal.
+//
+// It is nosplit because it's part of the isgoexception
+// implementation.
+//
+//go:nosplit
+func isAbortPC(pc uintptr) bool {
+ f := findfunc(pc)
+ if !f.valid() {
+ return false
+ }
+ return f.funcID == abi.FuncID_abort
+}
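
Editor's note: the traceback choices made in dopanic_m above come from gotraceback(), i.e. the GOTRACEBACK environment variable. The program below is an illustration only (not part of this patch) for observing that behavior from user code:

	// Run with GOTRACEBACK=none, single (the default), all, system or crash and
	// compare the output of the unrecovered panic. With GOTRACEBACK=crash,
	// dopanic_m reports docrash=true and the runtime aborts (core dump where the
	// OS allows it) instead of calling exit(2).
	package main

	func main() {
		blocker := make(chan struct{})
		go func() { <-blocker }() // an extra goroutine, printed only at the higher levels
		panic("boom")             // unrecovered: fatalpanic -> dopanic_m
	}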
diff --git a/src/runtime/panic32.go b/src/runtime/panic32.go
new file mode 100644
index 0000000..fa3f2bf
--- /dev/null
+++ b/src/runtime/panic32.go
@@ -0,0 +1,105 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build 386 || arm || mips || mipsle
+
+package runtime
+
+// Additional index/slice error paths for 32-bit platforms.
+// Used when the high word of a 64-bit index is not zero.
+
+// failures in the comparisons for s[x], 0 <= x < y (y == len(s))
+func goPanicExtendIndex(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsIndex})
+}
+func goPanicExtendIndexU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "index out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsIndex})
+}
+
+// failures in the comparisons for s[:x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicExtendSliceAlen(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceAlen})
+}
+func goPanicExtendSliceAlenU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceAlen})
+}
+func goPanicExtendSliceAcap(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceAcap})
+}
+func goPanicExtendSliceAcapU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceAcap})
+}
+
+// failures in the comparisons for s[x:y], 0 <= x <= y
+func goPanicExtendSliceB(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSliceB})
+}
+func goPanicExtendSliceBU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSliceB})
+}
+
+// failures in the comparisons for s[::x], 0 <= x <= y (y == len(s) or cap(s))
+func goPanicExtendSlice3Alen(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3Alen})
+}
+func goPanicExtendSlice3AlenU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3Alen})
+}
+func goPanicExtendSlice3Acap(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3Acap})
+}
+func goPanicExtendSlice3AcapU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3Acap})
+}
+
+// failures in the comparisons for s[:x:y], 0 <= x <= y
+func goPanicExtendSlice3B(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3B})
+}
+func goPanicExtendSlice3BU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3B})
+}
+
+// failures in the comparisons for s[x:y:], 0 <= x <= y
+func goPanicExtendSlice3C(hi int, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: true, y: y, code: boundsSlice3C})
+}
+func goPanicExtendSlice3CU(hi uint, lo uint, y int) {
+ panicCheck1(getcallerpc(), "slice bounds out of range")
+ panic(boundsError{x: int64(hi)<<32 + int64(lo), signed: false, y: y, code: boundsSlice3C})
+}
+
+// Implemented in assembly, as they take arguments in registers.
+// Declared here to mark them as ABIInternal.
+func panicExtendIndex(hi int, lo uint, y int)
+func panicExtendIndexU(hi uint, lo uint, y int)
+func panicExtendSliceAlen(hi int, lo uint, y int)
+func panicExtendSliceAlenU(hi uint, lo uint, y int)
+func panicExtendSliceAcap(hi int, lo uint, y int)
+func panicExtendSliceAcapU(hi uint, lo uint, y int)
+func panicExtendSliceB(hi int, lo uint, y int)
+func panicExtendSliceBU(hi uint, lo uint, y int)
+func panicExtendSlice3Alen(hi int, lo uint, y int)
+func panicExtendSlice3AlenU(hi uint, lo uint, y int)
+func panicExtendSlice3Acap(hi int, lo uint, y int)
+func panicExtendSlice3AcapU(hi uint, lo uint, y int)
+func panicExtendSlice3B(hi int, lo uint, y int)
+func panicExtendSlice3BU(hi uint, lo uint, y int)
+func panicExtendSlice3C(hi int, lo uint, y int)
+func panicExtendSlice3CU(hi uint, lo uint, y int)
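
Editor's note: a small illustration of when the goPanicExtend* paths above are reached. On a 32-bit GOARCH (386, arm, mips, mipsle) an index or slice bound whose 64-bit value has a non-zero high word takes these routines; on 64-bit targets the ordinary bounds-check panics are used instead. This program is a hypothetical example, not part of this change:

	package main

	func main() {
		s := make([]byte, 4)
		var i int64 = 1 << 33 // high 32 bits are non-zero
		_ = s[i]              // on GOARCH=386 this reaches goPanicExtendIndex:
		                      // "panic: runtime error: index out of range [8589934592] with length 4"
	}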
diff --git a/src/runtime/panic_test.go b/src/runtime/panic_test.go
new file mode 100644
index 0000000..b8a300f
--- /dev/null
+++ b/src/runtime/panic_test.go
@@ -0,0 +1,48 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "strings"
+ "testing"
+)
+
+// Test that panics print out the underlying value
+// when the underlying kind is directly printable.
+// Issue: https://golang.org/issues/37531
+func TestPanicWithDirectlyPrintableCustomTypes(t *testing.T) {
+ tests := []struct {
+ name string
+ wantPanicPrefix string
+ }{
+ {"panicCustomBool", `panic: main.MyBool(true)`},
+ {"panicCustomComplex128", `panic: main.MyComplex128(+3.210000e+001+1.000000e+001i)`},
+ {"panicCustomComplex64", `panic: main.MyComplex64(+1.100000e-001+3.000000e+000i)`},
+ {"panicCustomFloat32", `panic: main.MyFloat32(-9.370000e+001)`},
+ {"panicCustomFloat64", `panic: main.MyFloat64(-9.370000e+001)`},
+ {"panicCustomInt", `panic: main.MyInt(93)`},
+ {"panicCustomInt8", `panic: main.MyInt8(93)`},
+ {"panicCustomInt16", `panic: main.MyInt16(93)`},
+ {"panicCustomInt32", `panic: main.MyInt32(93)`},
+ {"panicCustomInt64", `panic: main.MyInt64(93)`},
+ {"panicCustomString", `panic: main.MyString("Panic")`},
+ {"panicCustomUint", `panic: main.MyUint(93)`},
+ {"panicCustomUint8", `panic: main.MyUint8(93)`},
+ {"panicCustomUint16", `panic: main.MyUint16(93)`},
+ {"panicCustomUint32", `panic: main.MyUint32(93)`},
+ {"panicCustomUint64", `panic: main.MyUint64(93)`},
+ {"panicCustomUintptr", `panic: main.MyUintptr(93)`},
+ }
+
+ for _, tt := range tests {
+ t := t
+ t.Run(tt.name, func(t *testing.T) {
+ output := runTestProg(t, "testprog", tt.name)
+ if !strings.HasPrefix(output, tt.wantPanicPrefix) {
+ t.Fatalf("%q\nis not a prefix of\n%s", tt.wantPanicPrefix, output)
+ }
+ })
+ }
+}
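
Editor's note: the testprog cases referenced above live outside this hunk; the sketch below shows the shape of one such case and the output the test matches against. The type and function names here are assumptions for illustration only:

	package main

	type MyInt int

	func panicCustomInt() {
		// Prints "panic: main.MyInt(93)" followed by the goroutine trace,
		// which is the prefix TestPanicWithDirectlyPrintableCustomTypes expects.
		panic(MyInt(93))
	}

	func main() { panicCustomInt() }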
diff --git a/src/runtime/panicnil_test.go b/src/runtime/panicnil_test.go
new file mode 100644
index 0000000..7ed98e9
--- /dev/null
+++ b/src/runtime/panicnil_test.go
@@ -0,0 +1,54 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "runtime/metrics"
+ "testing"
+)
+
+func TestPanicNil(t *testing.T) {
+ t.Run("default", func(t *testing.T) {
+ checkPanicNil(t, new(runtime.PanicNilError))
+ })
+ t.Run("GODEBUG=panicnil=0", func(t *testing.T) {
+ t.Setenv("GODEBUG", "panicnil=0")
+ checkPanicNil(t, new(runtime.PanicNilError))
+ })
+ t.Run("GODEBUG=panicnil=1", func(t *testing.T) {
+ t.Setenv("GODEBUG", "panicnil=1")
+ checkPanicNil(t, nil)
+ })
+}
+
+func checkPanicNil(t *testing.T, want any) {
+ name := "/godebug/non-default-behavior/panicnil:events"
+ s := []metrics.Sample{{Name: name}}
+ metrics.Read(s)
+ v1 := s[0].Value.Uint64()
+
+ defer func() {
+ e := recover()
+ if reflect.TypeOf(e) != reflect.TypeOf(want) {
+ println(e, want)
+ t.Errorf("recover() = %v, want %v", e, want)
+ panic(e)
+ }
+ metrics.Read(s)
+ v2 := s[0].Value.Uint64()
+ if want == nil {
+ if v2 != v1+1 {
+ t.Errorf("recover() with panicnil=1 did not increment metric %s", name)
+ }
+ } else {
+ if v2 != v1 {
+ t.Errorf("recover() with panicnil=0 incremented metric %s: %d -> %d", name, v1, v2)
+ }
+ }
+ }()
+ panic(nil)
+}
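
Editor's note: a user-level view of the behavior checked above. Since Go 1.21, panic(nil) is converted into a *runtime.PanicNilError unless the program runs with GODEBUG=panicnil=1; this standalone sketch makes the difference visible:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		defer func() {
			switch e := recover().(type) {
			case *runtime.PanicNilError:
				fmt.Println("default behavior:", e)
			default:
				fmt.Println("GODEBUG=panicnil=1 behavior:", e) // e is nil here
			}
		}()
		panic(nil)
	}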
diff --git a/src/runtime/pinner.go b/src/runtime/pinner.go
new file mode 100644
index 0000000..75de8be
--- /dev/null
+++ b/src/runtime/pinner.go
@@ -0,0 +1,377 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// A Pinner is a set of pinned Go objects. An object can be pinned with
+// the Pin method and all pinned objects of a Pinner can be unpinned with the
+// Unpin method.
+type Pinner struct {
+ *pinner
+}
+
+// Pin pins a Go object, preventing it from being moved or freed by the garbage
+// collector until the Unpin method has been called.
+//
+// A pointer to a pinned object can be directly stored in C memory or can be
+// contained in Go memory passed to C functions. If the pinned object itself
+// contains pointers to Go objects, these objects must be pinned separately if
+// they are going to be accessed from C code.
+//
+// The argument must be a pointer of any type or an unsafe.Pointer.
+func (p *Pinner) Pin(pointer any) {
+ if p.pinner == nil {
+ // Check the pinner cache first.
+ mp := acquirem()
+ if pp := mp.p.ptr(); pp != nil {
+ p.pinner = pp.pinnerCache
+ pp.pinnerCache = nil
+ }
+ releasem(mp)
+
+ if p.pinner == nil {
+ // Didn't get anything from the pinner cache.
+ p.pinner = new(pinner)
+ p.refs = p.refStore[:0]
+
+ // We set this finalizer once and never clear it. Thus, if the
+ // pinner gets cached, we'll reuse it, along with its finalizer.
+ // This lets us avoid the relatively expensive SetFinalizer call
+ // when reusing from the cache. The finalizer however has to be
+ // resilient to an empty pinner being finalized, which is done
+ // by checking p.refs' length.
+ SetFinalizer(p.pinner, func(i *pinner) {
+ if len(i.refs) != 0 {
+ i.unpin() // only required to make the test idempotent
+ pinnerLeakPanic()
+ }
+ })
+ }
+ }
+ ptr := pinnerGetPtr(&pointer)
+ setPinned(ptr, true)
+ p.refs = append(p.refs, ptr)
+}
+
+// Unpin unpins all pinned objects of the Pinner.
+func (p *Pinner) Unpin() {
+ p.pinner.unpin()
+
+ mp := acquirem()
+ if pp := mp.p.ptr(); pp != nil && pp.pinnerCache == nil {
+ // Put the pinner back in the cache, but only if the
+ // cache is empty. If application code is reusing Pinners
+ // on its own, we want to leave the backing store in place
+ // so reuse is more efficient.
+ pp.pinnerCache = p.pinner
+ p.pinner = nil
+ }
+ releasem(mp)
+}
+
+const (
+ pinnerSize = 64
+ pinnerRefStoreSize = (pinnerSize - unsafe.Sizeof([]unsafe.Pointer{})) / unsafe.Sizeof(unsafe.Pointer(nil))
+)
+
+type pinner struct {
+ refs []unsafe.Pointer
+ refStore [pinnerRefStoreSize]unsafe.Pointer
+}
+
+func (p *pinner) unpin() {
+ if p == nil || p.refs == nil {
+ return
+ }
+ for i := range p.refs {
+ setPinned(p.refs[i], false)
+ }
+ // The following two lines make all pointers to references
+ // in p.refs unreachable, either by deleting them or dropping
+ // p.refs' backing store (if it was not backed by refStore).
+ p.refStore = [pinnerRefStoreSize]unsafe.Pointer{}
+ p.refs = p.refStore[:0]
+}
+
+func pinnerGetPtr(i *any) unsafe.Pointer {
+ e := efaceOf(i)
+ etyp := e._type
+ if etyp == nil {
+ panic(errorString("runtime.Pinner: argument is nil"))
+ }
+ if kind := etyp.Kind_ & kindMask; kind != kindPtr && kind != kindUnsafePointer {
+ panic(errorString("runtime.Pinner: argument is not a pointer: " + toRType(etyp).string()))
+ }
+ if inUserArenaChunk(uintptr(e.data)) {
+ // Arena-allocated objects are not eligible for pinning.
+ panic(errorString("runtime.Pinner: object was allocated into an arena"))
+ }
+ return e.data
+}
+
+// isPinned checks if a Go pointer is pinned.
+// nosplit, because it's called from nosplit code in cgocheck.
+//
+//go:nosplit
+func isPinned(ptr unsafe.Pointer) bool {
+ span := spanOfHeap(uintptr(ptr))
+ if span == nil {
+ // This code is only called for Go pointers, so this must be a
+ // linker-allocated global object.
+ return true
+ }
+ pinnerBits := span.getPinnerBits()
+ // These pinnerBits might get unlinked by a concurrently running sweep, but
+ // that's OK because gcBits don't get cleared until the following GC cycle
+ // (nextMarkBitArenaEpoch).
+ if pinnerBits == nil {
+ return false
+ }
+ objIndex := span.objIndex(uintptr(ptr))
+ pinState := pinnerBits.ofObject(objIndex)
+ KeepAlive(ptr) // make sure ptr is alive until we are done so the span can't be freed
+ return pinState.isPinned()
+}
+
+// setPinned marks or unmarks a Go pointer as pinned.
+func setPinned(ptr unsafe.Pointer, pin bool) {
+ span := spanOfHeap(uintptr(ptr))
+ if span == nil {
+ if isGoPointerWithoutSpan(ptr) {
+ // This is a linker-allocated or zero-size object; nothing to do.
+ return
+ }
+ panic(errorString("runtime.Pinner.Pin: argument is not a Go pointer"))
+ }
+
+ // Ensure that the span is swept, because sweeping accesses the specials
+ // list without locks.
+ mp := acquirem()
+ span.ensureSwept()
+ KeepAlive(ptr) // make sure ptr is still alive after span is swept
+
+ objIndex := span.objIndex(uintptr(ptr))
+
+ lock(&span.speciallock) // guard against concurrent calls of setPinned on same span
+
+ pinnerBits := span.getPinnerBits()
+ if pinnerBits == nil {
+ pinnerBits = span.newPinnerBits()
+ span.setPinnerBits(pinnerBits)
+ }
+ pinState := pinnerBits.ofObject(objIndex)
+ if pin {
+ if pinState.isPinned() {
+ // multiple pins on same object, set multipin bit
+ pinState.setMultiPinned(true)
+ // and increase the pin counter
+ // TODO(mknyszek): investigate if systemstack is necessary here
+ systemstack(func() {
+ offset := objIndex * span.elemsize
+ span.incPinCounter(offset)
+ })
+ } else {
+ // set pin bit
+ pinState.setPinned(true)
+ }
+ } else {
+ // unpin
+ if pinState.isPinned() {
+ if pinState.isMultiPinned() {
+ var exists bool
+ // TODO(mknyszek): investigate if systemstack is necessary here
+ systemstack(func() {
+ offset := objIndex * span.elemsize
+ exists = span.decPinCounter(offset)
+ })
+ if !exists {
+ // counter is 0, clear multipin bit
+ pinState.setMultiPinned(false)
+ }
+ } else {
+ // no multipins recorded. unpin object.
+ pinState.setPinned(false)
+ }
+ } else {
+ // unpinning unpinned object, bail out
+ throw("runtime.Pinner: object already unpinned")
+ }
+ }
+ unlock(&span.speciallock)
+ releasem(mp)
+ return
+}
+
+type pinState struct {
+ bytep *uint8
+ byteVal uint8
+ mask uint8
+}
+
+// nosplit, because it's called by isPinned, which is nosplit
+//
+//go:nosplit
+func (v *pinState) isPinned() bool {
+ return (v.byteVal & v.mask) != 0
+}
+
+func (v *pinState) isMultiPinned() bool {
+ return (v.byteVal & (v.mask << 1)) != 0
+}
+
+func (v *pinState) setPinned(val bool) {
+ v.set(val, false)
+}
+
+func (v *pinState) setMultiPinned(val bool) {
+ v.set(val, true)
+}
+
+// set sets the pin bit of the pinState to val. If multipin is true, it
+// sets/unsets the multipin bit instead.
+func (v *pinState) set(val bool, multipin bool) {
+ mask := v.mask
+ if multipin {
+ mask <<= 1
+ }
+ if val {
+ atomic.Or8(v.bytep, mask)
+ } else {
+ atomic.And8(v.bytep, ^mask)
+ }
+}
+
+// pinnerBits is the same type as gcBits but has different methods.
+type pinnerBits gcBits
+
+// ofObject returns the pinState of the n'th object.
+// nosplit, because it's called by isPinned, which is nosplit
+//
+//go:nosplit
+func (p *pinnerBits) ofObject(n uintptr) pinState {
+ bytep, mask := (*gcBits)(p).bitp(n * 2)
+ byteVal := atomic.Load8(bytep)
+ return pinState{bytep, byteVal, mask}
+}
+
+func (s *mspan) pinnerBitSize() uintptr {
+ return divRoundUp(s.nelems*2, 8)
+}
+
+// newPinnerBits returns a pointer to 8-byte-aligned bytes to be used for this
+// span's pinner bits. newPinnerBits is used to mark objects that are pinned.
+// They are copied when the span is swept.
+func (s *mspan) newPinnerBits() *pinnerBits {
+ return (*pinnerBits)(newMarkBits(s.nelems * 2))
+}
+
+// nosplit, because it's called by isPinned, which is nosplit
+//
+//go:nosplit
+func (s *mspan) getPinnerBits() *pinnerBits {
+ return (*pinnerBits)(atomic.Loadp(unsafe.Pointer(&s.pinnerBits)))
+}
+
+func (s *mspan) setPinnerBits(p *pinnerBits) {
+ atomicstorep(unsafe.Pointer(&s.pinnerBits), unsafe.Pointer(p))
+}
+
+// refreshPinnerBits replaces pinnerBits with a fresh copy in the arenas for the
+// next GC cycle. If it does not contain any pinned objects, pinnerBits of the
+// span is set to nil.
+func (s *mspan) refreshPinnerBits() {
+ p := s.getPinnerBits()
+ if p == nil {
+ return
+ }
+
+ hasPins := false
+ bytes := alignUp(s.pinnerBitSize(), 8)
+
+ // Iterate over each 8-byte chunk and check for pins. Note that
+ // newPinnerBits guarantees that pinnerBits will be 8-byte aligned, so we
+ // don't have to worry about edge cases; irrelevant bits will simply be
+ // zero.
+ for _, x := range unsafe.Slice((*uint64)(unsafe.Pointer(&p.x)), bytes/8) {
+ if x != 0 {
+ hasPins = true
+ break
+ }
+ }
+
+ if hasPins {
+ newPinnerBits := s.newPinnerBits()
+ memmove(unsafe.Pointer(&newPinnerBits.x), unsafe.Pointer(&p.x), bytes)
+ s.setPinnerBits(newPinnerBits)
+ } else {
+ s.setPinnerBits(nil)
+ }
+}
+
+// incPinCounter is only called for multiple pins of the same object and records
+// the _additional_ pins.
+func (span *mspan) incPinCounter(offset uintptr) {
+ var rec *specialPinCounter
+ ref, exists := span.specialFindSplicePoint(offset, _KindSpecialPinCounter)
+ if !exists {
+ lock(&mheap_.speciallock)
+ rec = (*specialPinCounter)(mheap_.specialPinCounterAlloc.alloc())
+ unlock(&mheap_.speciallock)
+ // splice in record, fill in offset.
+ rec.special.offset = uint16(offset)
+ rec.special.kind = _KindSpecialPinCounter
+ rec.special.next = *ref
+ *ref = (*special)(unsafe.Pointer(rec))
+ spanHasSpecials(span)
+ } else {
+ rec = (*specialPinCounter)(unsafe.Pointer(*ref))
+ }
+ rec.counter++
+}
+
+// decPinCounter decreases the counter. If the counter reaches 0, the counter
+// special is deleted and false is returned. Otherwise true is returned.
+func (span *mspan) decPinCounter(offset uintptr) bool {
+ ref, exists := span.specialFindSplicePoint(offset, _KindSpecialPinCounter)
+ if !exists {
+ throw("runtime.Pinner: decreased non-existing pin counter")
+ }
+ counter := (*specialPinCounter)(unsafe.Pointer(*ref))
+ counter.counter--
+ if counter.counter == 0 {
+ *ref = counter.special.next
+ if span.specials == nil {
+ spanHasNoSpecials(span)
+ }
+ lock(&mheap_.speciallock)
+ mheap_.specialPinCounterAlloc.free(unsafe.Pointer(counter))
+ unlock(&mheap_.speciallock)
+ return false
+ }
+ return true
+}
+
+// only for tests
+func pinnerGetPinCounter(addr unsafe.Pointer) *uintptr {
+ _, span, objIndex := findObject(uintptr(addr), 0, 0)
+ offset := objIndex * span.elemsize
+ t, exists := span.specialFindSplicePoint(offset, _KindSpecialPinCounter)
+ if !exists {
+ return nil
+ }
+ counter := (*specialPinCounter)(unsafe.Pointer(*t))
+ return &counter.counter
+}
+
+// To be able to test that the GC panics when a pinned pointer is leaking, this
+// panic function is a variable that can be overridden by a test.
+var pinnerLeakPanic = func() {
+ panic(errorString("runtime.Pinner: found leaking pinned pointer; forgot to call Unpin()?"))
+}
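
Editor's note: a typical use of the Pinner API defined in this file is to pin an object for the duration of a C call that retains its address. The payload type and the commented C call are placeholders, so this is a sketch rather than working cgo code:

	package main

	import (
		"runtime"
		"unsafe"
	)

	type payload struct{ x, y int64 }

	func main() {
		var pinner runtime.Pinner
		data := &payload{x: 1, y: 2}
		pinner.Pin(data) // the GC may no longer move or free *data
		p := unsafe.Pointer(data)
		// p could now be handed to a C function that stores it (hypothetically
		// C.process(p)); here we just keep it in a local to show the flow.
		_ = p
		pinner.Unpin() // releases every pin taken through this Pinner
	}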
diff --git a/src/runtime/pinner_test.go b/src/runtime/pinner_test.go
new file mode 100644
index 0000000..88ead7c
--- /dev/null
+++ b/src/runtime/pinner_test.go
@@ -0,0 +1,524 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type obj struct {
+ x int64
+ y int64
+ z int64
+}
+
+type objWith[T any] struct {
+ x int64
+ y int64
+ z int64
+ o T
+}
+
+var (
+ globalUintptr uintptr
+ globalPtrToObj = &obj{}
+ globalPtrToObjWithPtr = &objWith[*uintptr]{}
+ globalPtrToRuntimeObj = func() *obj { return &obj{} }()
+ globalPtrToRuntimeObjWithPtr = func() *objWith[*uintptr] { return &objWith[*uintptr]{} }()
+)
+
+func assertDidPanic(t *testing.T) {
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+}
+
+func assertCgoCheckPanics(t *testing.T, p any) {
+ defer func() {
+ if recover() == nil {
+ t.Fatal("cgoCheckPointer() did not panic; make sure the tests run with cgocheck=1")
+ }
+ }()
+ runtime.CgoCheckPointer(p, true)
+}
+
+func TestPinnerSimple(t *testing.T) {
+ var pinner runtime.Pinner
+ p := new(obj)
+ addr := unsafe.Pointer(p)
+ if runtime.IsPinned(addr) {
+ t.Fatal("already marked as pinned")
+ }
+ pinner.Pin(p)
+ if !runtime.IsPinned(addr) {
+ t.Fatal("not marked as pinned")
+ }
+ if runtime.GetPinCounter(addr) != nil {
+ t.Fatal("pin counter should not exist")
+ }
+ pinner.Unpin()
+ if runtime.IsPinned(addr) {
+ t.Fatal("still marked as pinned")
+ }
+}
+
+func TestPinnerPinKeepsAliveAndReleases(t *testing.T) {
+ var pinner runtime.Pinner
+ p := new(obj)
+ done := make(chan struct{})
+ runtime.SetFinalizer(p, func(any) {
+ done <- struct{}{}
+ })
+ pinner.Pin(p)
+ p = nil
+ runtime.GC()
+ runtime.GC()
+ select {
+ case <-done:
+ t.Fatal("Pin() didn't keep object alive")
+ case <-time.After(time.Millisecond * 10):
+ break
+ }
+ pinner.Unpin()
+ runtime.GC()
+ runtime.GC()
+ select {
+ case <-done:
+ break
+ case <-time.After(time.Second):
+ t.Fatal("Unpin() didn't release object")
+ }
+}
+
+func TestPinnerMultiplePinsSame(t *testing.T) {
+ const N = 100
+ var pinner runtime.Pinner
+ p := new(obj)
+ addr := unsafe.Pointer(p)
+ if runtime.IsPinned(addr) {
+ t.Fatal("already marked as pinned")
+ }
+ for i := 0; i < N; i++ {
+ pinner.Pin(p)
+ }
+ if !runtime.IsPinned(addr) {
+ t.Fatal("not marked as pinned")
+ }
+ if cnt := runtime.GetPinCounter(addr); cnt == nil || *cnt != N-1 {
+ t.Fatalf("pin counter incorrect: %d", *cnt)
+ }
+ pinner.Unpin()
+ if runtime.IsPinned(addr) {
+ t.Fatal("still marked as pinned")
+ }
+ if runtime.GetPinCounter(addr) != nil {
+ t.Fatal("pin counter was not deleted")
+ }
+}
+
+func TestPinnerTwoPinner(t *testing.T) {
+ var pinner1, pinner2 runtime.Pinner
+ p := new(obj)
+ addr := unsafe.Pointer(p)
+ if runtime.IsPinned(addr) {
+ t.Fatal("already marked as pinned")
+ }
+ pinner1.Pin(p)
+ if !runtime.IsPinned(addr) {
+ t.Fatal("not marked as pinned")
+ }
+ if runtime.GetPinCounter(addr) != nil {
+ t.Fatal("pin counter should not exist")
+ }
+ pinner2.Pin(p)
+ if !runtime.IsPinned(addr) {
+ t.Fatal("not marked as pinned")
+ }
+ if cnt := runtime.GetPinCounter(addr); cnt == nil || *cnt != 1 {
+ t.Fatalf("pin counter incorrect: %d", *cnt)
+ }
+ pinner1.Unpin()
+ if !runtime.IsPinned(addr) {
+ t.Fatal("not marked as pinned")
+ }
+ if runtime.GetPinCounter(addr) != nil {
+ t.Fatal("pin counter should not exist")
+ }
+ pinner2.Unpin()
+ if runtime.IsPinned(addr) {
+ t.Fatal("still marked as pinned")
+ }
+ if runtime.GetPinCounter(addr) != nil {
+ t.Fatal("pin counter was not deleted")
+ }
+}
+
+func TestPinnerPinZerosizeObj(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := new(struct{})
+ pinner.Pin(p)
+ if !runtime.IsPinned(unsafe.Pointer(p)) {
+ t.Fatal("not marked as pinned")
+ }
+}
+
+func TestPinnerPinGlobalPtr(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ pinner.Pin(globalPtrToObj)
+ pinner.Pin(globalPtrToObjWithPtr)
+ pinner.Pin(globalPtrToRuntimeObj)
+ pinner.Pin(globalPtrToRuntimeObjWithPtr)
+}
+
+func TestPinnerPinTinyObj(t *testing.T) {
+ var pinner runtime.Pinner
+ const N = 64
+ var addr [N]unsafe.Pointer
+ for i := 0; i < N; i++ {
+ p := new(bool)
+ addr[i] = unsafe.Pointer(p)
+ pinner.Pin(p)
+ pinner.Pin(p)
+ if !runtime.IsPinned(addr[i]) {
+ t.Fatalf("not marked as pinned: %d", i)
+ }
+ if cnt := runtime.GetPinCounter(addr[i]); cnt == nil || *cnt == 0 {
+ t.Fatalf("pin counter incorrect: %d, %d", *cnt, i)
+ }
+ }
+ pinner.Unpin()
+ for i := 0; i < N; i++ {
+ if runtime.IsPinned(addr[i]) {
+ t.Fatal("still marked as pinned")
+ }
+ if runtime.GetPinCounter(addr[i]) != nil {
+ t.Fatal("pin counter should not exist")
+ }
+ }
+}
+
+func TestPinnerInterface(t *testing.T) {
+ var pinner runtime.Pinner
+ o := new(obj)
+ ifc := any(o)
+ pinner.Pin(&ifc)
+ if !runtime.IsPinned(unsafe.Pointer(&ifc)) {
+ t.Fatal("not marked as pinned")
+ }
+ if runtime.IsPinned(unsafe.Pointer(o)) {
+ t.Fatal("marked as pinned")
+ }
+ pinner.Unpin()
+ pinner.Pin(ifc)
+ if !runtime.IsPinned(unsafe.Pointer(o)) {
+ t.Fatal("not marked as pinned")
+ }
+ if runtime.IsPinned(unsafe.Pointer(&ifc)) {
+ t.Fatal("marked as pinned")
+ }
+ pinner.Unpin()
+}
+
+func TestPinnerPinNonPtrPanics(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ var i int
+ defer assertDidPanic(t)
+ pinner.Pin(i)
+}
+
+func TestPinnerReuse(t *testing.T) {
+ var pinner runtime.Pinner
+ p := new(obj)
+ p2 := &p
+ assertCgoCheckPanics(t, p2)
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, true)
+ pinner.Unpin()
+ assertCgoCheckPanics(t, p2)
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, true)
+ pinner.Unpin()
+}
+
+func TestPinnerEmptyUnpin(t *testing.T) {
+ var pinner runtime.Pinner
+ pinner.Unpin()
+ pinner.Unpin()
+}
+
+func TestPinnerLeakPanics(t *testing.T) {
+ old := runtime.GetPinnerLeakPanic()
+ func() {
+ defer assertDidPanic(t)
+ old()
+ }()
+ done := make(chan struct{})
+ runtime.SetPinnerLeakPanic(func() {
+ done <- struct{}{}
+ })
+ func() {
+ var pinner runtime.Pinner
+ p := new(obj)
+ pinner.Pin(p)
+ }()
+ runtime.GC()
+ runtime.GC()
+ select {
+ case <-done:
+ break
+ case <-time.After(time.Second):
+ t.Fatal("leak didn't make the GC panic")
+ }
+ runtime.SetPinnerLeakPanic(old)
+}
+
+func TestPinnerCgoCheckPtr2Ptr(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := new(obj)
+ p2 := &objWith[*obj]{o: p}
+ assertCgoCheckPanics(t, p2)
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, true)
+}
+
+func TestPinnerCgoCheckPtr2UnsafePtr(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := unsafe.Pointer(new(obj))
+ p2 := &objWith[unsafe.Pointer]{o: p}
+ assertCgoCheckPanics(t, p2)
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, true)
+}
+
+func TestPinnerCgoCheckPtr2UnknownPtr(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := unsafe.Pointer(new(obj))
+ p2 := &p
+ func() {
+ defer assertDidPanic(t)
+ runtime.CgoCheckPointer(p2, nil)
+ }()
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, nil)
+}
+
+func TestPinnerCgoCheckInterface(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ var ifc any
+ var o obj
+ ifc = &o
+ p := &ifc
+ assertCgoCheckPanics(t, p)
+ pinner.Pin(&o)
+ runtime.CgoCheckPointer(p, true)
+}
+
+func TestPinnerCgoCheckSlice(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ sl := []int{1, 2, 3}
+ assertCgoCheckPanics(t, &sl)
+ pinner.Pin(&sl[0])
+ runtime.CgoCheckPointer(&sl, true)
+}
+
+func TestPinnerCgoCheckString(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ b := []byte("foobar")
+ str := unsafe.String(&b[0], 6)
+ assertCgoCheckPanics(t, &str)
+ pinner.Pin(&b[0])
+ runtime.CgoCheckPointer(&str, true)
+}
+
+func TestPinnerCgoCheckPinned2UnpinnedPanics(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := new(obj)
+ p2 := &objWith[*obj]{o: p}
+ assertCgoCheckPanics(t, p2)
+ pinner.Pin(p2)
+ assertCgoCheckPanics(t, p2)
+}
+
+func TestPinnerCgoCheckPtr2Pinned2Unpinned(t *testing.T) {
+ var pinner runtime.Pinner
+ defer pinner.Unpin()
+ p := new(obj)
+ p2 := &objWith[*obj]{o: p}
+ p3 := &objWith[*objWith[*obj]]{o: p2}
+ assertCgoCheckPanics(t, p2)
+ assertCgoCheckPanics(t, p3)
+ pinner.Pin(p2)
+ assertCgoCheckPanics(t, p2)
+ assertCgoCheckPanics(t, p3)
+ pinner.Pin(p)
+ runtime.CgoCheckPointer(p2, true)
+ runtime.CgoCheckPointer(p3, true)
+}
+
+func BenchmarkPinnerPinUnpinBatch(b *testing.B) {
+ const Batch = 1000
+ var data [Batch]*obj
+ for i := 0; i < Batch; i++ {
+ data[i] = new(obj)
+ }
+ b.ResetTimer()
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ for i := 0; i < Batch; i++ {
+ pinner.Pin(data[i])
+ }
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpinBatchDouble(b *testing.B) {
+ const Batch = 1000
+ var data [Batch]*obj
+ for i := 0; i < Batch; i++ {
+ data[i] = new(obj)
+ }
+ b.ResetTimer()
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ for i := 0; i < Batch; i++ {
+ pinner.Pin(data[i])
+ pinner.Pin(data[i])
+ }
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpinBatchTiny(b *testing.B) {
+ const Batch = 1000
+ var data [Batch]*bool
+ for i := 0; i < Batch; i++ {
+ data[i] = new(bool)
+ }
+ b.ResetTimer()
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ for i := 0; i < Batch; i++ {
+ pinner.Pin(data[i])
+ }
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpin(b *testing.B) {
+ p := new(obj)
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpinTiny(b *testing.B) {
+ p := new(bool)
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpinDouble(b *testing.B) {
+ p := new(obj)
+ for n := 0; n < b.N; n++ {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+}
+
+func BenchmarkPinnerPinUnpinParallel(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ p := new(obj)
+ for pb.Next() {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+ })
+}
+
+func BenchmarkPinnerPinUnpinParallelTiny(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ p := new(bool)
+ for pb.Next() {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+ })
+}
+
+func BenchmarkPinnerPinUnpinParallelDouble(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ p := new(obj)
+ for pb.Next() {
+ var pinner runtime.Pinner
+ pinner.Pin(p)
+ pinner.Pin(p)
+ pinner.Unpin()
+ }
+ })
+}
+
+func BenchmarkPinnerIsPinnedOnPinned(b *testing.B) {
+ var pinner runtime.Pinner
+ ptr := new(obj)
+ pinner.Pin(ptr)
+ b.ResetTimer()
+ for n := 0; n < b.N; n++ {
+ runtime.IsPinned(unsafe.Pointer(ptr))
+ }
+ pinner.Unpin()
+}
+
+func BenchmarkPinnerIsPinnedOnUnpinned(b *testing.B) {
+ ptr := new(obj)
+ b.ResetTimer()
+ for n := 0; n < b.N; n++ {
+ runtime.IsPinned(unsafe.Pointer(ptr))
+ }
+}
+
+func BenchmarkPinnerIsPinnedOnPinnedParallel(b *testing.B) {
+ var pinner runtime.Pinner
+ ptr := new(obj)
+ pinner.Pin(ptr)
+ b.ResetTimer()
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ runtime.IsPinned(unsafe.Pointer(ptr))
+ }
+ })
+ pinner.Unpin()
+}
+
+func BenchmarkPinnerIsPinnedOnUnpinnedParallel(b *testing.B) {
+ ptr := new(obj)
+ b.ResetTimer()
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ runtime.IsPinned(unsafe.Pointer(ptr))
+ }
+ })
+}
diff --git a/src/runtime/plugin.go b/src/runtime/plugin.go
new file mode 100644
index 0000000..40dfefd
--- /dev/null
+++ b/src/runtime/plugin.go
@@ -0,0 +1,137 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+//go:linkname plugin_lastmoduleinit plugin.lastmoduleinit
+func plugin_lastmoduleinit() (path string, syms map[string]any, initTasks []*initTask, errstr string) {
+ var md *moduledata
+ for pmd := firstmoduledata.next; pmd != nil; pmd = pmd.next {
+ if pmd.bad {
+ md = nil // we only want the last module
+ continue
+ }
+ md = pmd
+ }
+ if md == nil {
+ throw("runtime: no plugin module data")
+ }
+ if md.pluginpath == "" {
+ throw("runtime: plugin has empty pluginpath")
+ }
+ if md.typemap != nil {
+ return "", nil, nil, "plugin already loaded"
+ }
+
+ for _, pmd := range activeModules() {
+ if pmd.pluginpath == md.pluginpath {
+ md.bad = true
+ return "", nil, nil, "plugin already loaded"
+ }
+
+ if inRange(pmd.text, pmd.etext, md.text, md.etext) ||
+ inRange(pmd.bss, pmd.ebss, md.bss, md.ebss) ||
+ inRange(pmd.data, pmd.edata, md.data, md.edata) ||
+ inRange(pmd.types, pmd.etypes, md.types, md.etypes) {
+ println("plugin: new module data overlaps with previous moduledata")
+ println("\tpmd.text-etext=", hex(pmd.text), "-", hex(pmd.etext))
+ println("\tpmd.bss-ebss=", hex(pmd.bss), "-", hex(pmd.ebss))
+ println("\tpmd.data-edata=", hex(pmd.data), "-", hex(pmd.edata))
+ println("\tpmd.types-etypes=", hex(pmd.types), "-", hex(pmd.etypes))
+ println("\tmd.text-etext=", hex(md.text), "-", hex(md.etext))
+ println("\tmd.bss-ebss=", hex(md.bss), "-", hex(md.ebss))
+ println("\tmd.data-edata=", hex(md.data), "-", hex(md.edata))
+ println("\tmd.types-etypes=", hex(md.types), "-", hex(md.etypes))
+ throw("plugin: new module data overlaps with previous moduledata")
+ }
+ }
+ for _, pkghash := range md.pkghashes {
+ if pkghash.linktimehash != *pkghash.runtimehash {
+ md.bad = true
+ return "", nil, nil, "plugin was built with a different version of package " + pkghash.modulename
+ }
+ }
+
+ // Initialize the freshly loaded module.
+ modulesinit()
+ typelinksinit()
+
+ pluginftabverify(md)
+ moduledataverify1(md)
+
+ lock(&itabLock)
+ for _, i := range md.itablinks {
+ itabAdd(i)
+ }
+ unlock(&itabLock)
+
+ // Build a map of symbol names to symbols. Here in the runtime
+ // we fill out the first word of the interface, the type. We
+ // pass these zero value interfaces to the plugin package,
+ // where the symbol value is filled in (usually via cgo).
+ //
+ // Because functions are handled specially in the plugin package,
+ // function symbol names are prefixed here with '.' to avoid
+ // a dependency on the reflect package.
+ syms = make(map[string]any, len(md.ptab))
+ for _, ptab := range md.ptab {
+ symName := resolveNameOff(unsafe.Pointer(md.types), ptab.name)
+ t := toRType((*_type)(unsafe.Pointer(md.types))).typeOff(ptab.typ) // TODO can this stack of conversions be simpler?
+ var val any
+ valp := (*[2]unsafe.Pointer)(unsafe.Pointer(&val))
+ (*valp)[0] = unsafe.Pointer(t)
+
+ name := symName.Name()
+ if t.Kind_&kindMask == kindFunc {
+ name = "." + name
+ }
+ syms[name] = val
+ }
+ return md.pluginpath, syms, md.inittasks, ""
+}
+
+func pluginftabverify(md *moduledata) {
+ badtable := false
+ for i := 0; i < len(md.ftab); i++ {
+ entry := md.textAddr(md.ftab[i].entryoff)
+ if md.minpc <= entry && entry <= md.maxpc {
+ continue
+ }
+
+ f := funcInfo{(*_func)(unsafe.Pointer(&md.pclntable[md.ftab[i].funcoff])), md}
+ name := funcname(f)
+
+ // A common bug is that f.entry has a relocation to a duplicate
+ // function symbol, meaning that if we search for its PC we get
+ // a valid entry with a name that is useful for debugging.
+ name2 := "none"
+ entry2 := uintptr(0)
+ f2 := findfunc(entry)
+ if f2.valid() {
+ name2 = funcname(f2)
+ entry2 = f2.entry()
+ }
+ badtable = true
+ println("ftab entry", hex(entry), "/", hex(entry2), ": ",
+ name, "/", name2, "outside pc range:[", hex(md.minpc), ",", hex(md.maxpc), "], modulename=", md.modulename, ", pluginpath=", md.pluginpath)
+ }
+ if badtable {
+ throw("runtime: plugin has bad symbol table")
+ }
+}
+
+// inRange reports whether v0 or v1 is in the range [r0, r1].
+func inRange(r0, r1, v0, v1 uintptr) bool {
+ return (v0 >= r0 && v0 <= r1) || (v1 >= r0 && v1 <= r1)
+}
+
+// A ptabEntry is generated by the compiler for each exported function
+// and global variable in the main package of a plugin. It is used to
+// initialize the plugin module's symbol map.
+type ptabEntry struct {
+ name nameOff
+ typ typeOff
+}
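
Editor's note: the symbol map returned by plugin_lastmoduleinit above is what the plugin package hands back to callers of plugin.Open and Lookup. A short usage sketch follows; the shared object path and symbol name are made up for illustration:

	package main

	import "plugin"

	func main() {
		p, err := plugin.Open("greeter.so")
		if err != nil {
			panic(err)
		}
		sym, err := p.Lookup("Greet") // exported function in the plugin's main package
		if err != nil {
			panic(err)
		}
		if greet, ok := sym.(func()); ok {
			greet()
		}
	}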
diff --git a/src/runtime/pprof/elf.go b/src/runtime/pprof/elf.go
new file mode 100644
index 0000000..a8b5ea6
--- /dev/null
+++ b/src/runtime/pprof/elf.go
@@ -0,0 +1,109 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "encoding/binary"
+ "errors"
+ "fmt"
+ "os"
+)
+
+var (
+ errBadELF = errors.New("malformed ELF binary")
+ errNoBuildID = errors.New("no NT_GNU_BUILD_ID found in ELF binary")
+)
+
+// elfBuildID returns the GNU build ID of the named ELF binary,
+// without introducing a dependency on debug/elf and its dependencies.
+func elfBuildID(file string) (string, error) {
+ buf := make([]byte, 256)
+ f, err := os.Open(file)
+ if err != nil {
+ return "", err
+ }
+ defer f.Close()
+
+ if _, err := f.ReadAt(buf[:64], 0); err != nil {
+ return "", err
+ }
+
+ // ELF file begins with \x7F E L F.
+ if buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F' {
+ return "", errBadELF
+ }
+
+ var byteOrder binary.ByteOrder
+ switch buf[5] {
+ default:
+ return "", errBadELF
+ case 1: // little-endian
+ byteOrder = binary.LittleEndian
+ case 2: // big-endian
+ byteOrder = binary.BigEndian
+ }
+
+ var shnum int
+ var shoff, shentsize int64
+ switch buf[4] {
+ default:
+ return "", errBadELF
+ case 1: // 32-bit file header
+ shoff = int64(byteOrder.Uint32(buf[32:]))
+ shentsize = int64(byteOrder.Uint16(buf[46:]))
+ if shentsize != 40 {
+ return "", errBadELF
+ }
+ shnum = int(byteOrder.Uint16(buf[48:]))
+ case 2: // 64-bit file header
+ shoff = int64(byteOrder.Uint64(buf[40:]))
+ shentsize = int64(byteOrder.Uint16(buf[58:]))
+ if shentsize != 64 {
+ return "", errBadELF
+ }
+ shnum = int(byteOrder.Uint16(buf[60:]))
+ }
+
+ for i := 0; i < shnum; i++ {
+ if _, err := f.ReadAt(buf[:shentsize], shoff+int64(i)*shentsize); err != nil {
+ return "", err
+ }
+ if typ := byteOrder.Uint32(buf[4:]); typ != 7 { // SHT_NOTE
+ continue
+ }
+ var off, size int64
+ if shentsize == 40 {
+ // 32-bit section header
+ off = int64(byteOrder.Uint32(buf[16:]))
+ size = int64(byteOrder.Uint32(buf[20:]))
+ } else {
+ // 64-bit section header
+ off = int64(byteOrder.Uint64(buf[24:]))
+ size = int64(byteOrder.Uint64(buf[32:]))
+ }
+ size += off
+ for off < size {
+ if _, err := f.ReadAt(buf[:16], off); err != nil { // room for header + name GNU\x00
+ return "", err
+ }
+ nameSize := int(byteOrder.Uint32(buf[0:]))
+ descSize := int(byteOrder.Uint32(buf[4:]))
+ noteType := int(byteOrder.Uint32(buf[8:]))
+ descOff := off + int64(12+(nameSize+3)&^3)
+ off = descOff + int64((descSize+3)&^3)
+ if nameSize != 4 || noteType != 3 || buf[12] != 'G' || buf[13] != 'N' || buf[14] != 'U' || buf[15] != '\x00' { // want name GNU\x00 type 3 (NT_GNU_BUILD_ID)
+ continue
+ }
+ if descSize > len(buf) {
+ return "", errBadELF
+ }
+ if _, err := f.ReadAt(buf[:descSize], descOff); err != nil {
+ return "", err
+ }
+ return fmt.Sprintf("%x", buf[:descSize]), nil
+ }
+ }
+ return "", errNoBuildID
+}
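
Editor's note: elfBuildID deliberately avoids importing debug/elf. For reference, the same GNU build ID can be read with the standard library, which is a convenient cross-check when debugging this parser (sketch only; the binary path comes from the command line):

	package main

	import (
		"debug/elf"
		"fmt"
		"os"
	)

	func main() {
		f, err := elf.Open(os.Args[1])
		if err != nil {
			panic(err)
		}
		defer f.Close()
		sec := f.Section(".note.gnu.build-id")
		if sec == nil {
			fmt.Println("no GNU build ID note")
			return
		}
		data, err := sec.Data()
		if err != nil {
			panic(err)
		}
		// Note layout: namesz, descsz, type (4 bytes each), "GNU\x00", then the ID bytes.
		bo := f.ByteOrder
		namesz := bo.Uint32(data[0:])
		descsz := bo.Uint32(data[4:])
		descOff := 12 + (namesz+3)&^3
		fmt.Printf("%x\n", data[descOff:descOff+descsz])
	}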
diff --git a/src/runtime/pprof/label.go b/src/runtime/pprof/label.go
new file mode 100644
index 0000000..d39e0ad
--- /dev/null
+++ b/src/runtime/pprof/label.go
@@ -0,0 +1,108 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "fmt"
+ "sort"
+ "strings"
+)
+
+type label struct {
+ key string
+ value string
+}
+
+// LabelSet is a set of labels.
+type LabelSet struct {
+ list []label
+}
+
+// labelContextKey is the type of contextKeys used for profiler labels.
+type labelContextKey struct{}
+
+func labelValue(ctx context.Context) labelMap {
+ labels, _ := ctx.Value(labelContextKey{}).(*labelMap)
+ if labels == nil {
+ return labelMap(nil)
+ }
+ return *labels
+}
+
+// labelMap is the representation of the label set held in the context type.
+// This is an initial implementation, but it will be replaced with something
+// that admits incremental immutable modification more efficiently.
+type labelMap map[string]string
+
+// String satisfies Stringer and returns key, value pairs in a consistent
+// order.
+func (l *labelMap) String() string {
+ if l == nil {
+ return ""
+ }
+ keyVals := make([]string, 0, len(*l))
+
+ for k, v := range *l {
+ keyVals = append(keyVals, fmt.Sprintf("%q:%q", k, v))
+ }
+
+ sort.Strings(keyVals)
+
+ return "{" + strings.Join(keyVals, ", ") + "}"
+}
+
+// WithLabels returns a new context.Context with the given labels added.
+// A label overwrites a prior label with the same key.
+func WithLabels(ctx context.Context, labels LabelSet) context.Context {
+ parentLabels := labelValue(ctx)
+ childLabels := make(labelMap, len(parentLabels))
+ // TODO(matloob): replace the map implementation with something
+ // more efficient so creating a child context WithLabels doesn't need
+ // to clone the map.
+ for k, v := range parentLabels {
+ childLabels[k] = v
+ }
+ for _, label := range labels.list {
+ childLabels[label.key] = label.value
+ }
+ return context.WithValue(ctx, labelContextKey{}, &childLabels)
+}
+
+// Labels takes an even number of strings representing key-value pairs
+// and makes a LabelSet containing them.
+// A label overwrites a prior label with the same key.
+// Currently, only the CPU and goroutine profiles make use of any label
+// information.
+// See https://golang.org/issue/23458 for details.
+func Labels(args ...string) LabelSet {
+ if len(args)%2 != 0 {
+ panic("uneven number of arguments to pprof.Labels")
+ }
+ list := make([]label, 0, len(args)/2)
+ for i := 0; i+1 < len(args); i += 2 {
+ list = append(list, label{key: args[i], value: args[i+1]})
+ }
+ return LabelSet{list: list}
+}
+
+// Label returns the value of the label with the given key on ctx, and a boolean indicating
+// whether that label exists.
+func Label(ctx context.Context, key string) (string, bool) {
+ ctxLabels := labelValue(ctx)
+ v, ok := ctxLabels[key]
+ return v, ok
+}
+
+// ForLabels invokes f with each label set on the context.
+// The function f should return true to continue iteration or false to stop iteration early.
+func ForLabels(ctx context.Context, f func(key, value string) bool) {
+ ctxLabels := labelValue(ctx)
+ for k, v := range ctxLabels {
+ if !f(k, v) {
+ break
+ }
+ }
+}
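
Editor's note: a short usage sketch for the label API above. Labels set on a context only affect profiles once the work runs under pprof.Do or after pprof.SetGoroutineLabels (both defined elsewhere in this package); the snippet just shows how the context-side functions fit together:

	package main

	import (
		"context"
		"fmt"
		"runtime/pprof"
	)

	func main() {
		ctx := pprof.WithLabels(context.Background(),
			pprof.Labels("worker", "indexer", "shard", "7"))
		if v, ok := pprof.Label(ctx, "worker"); ok {
			fmt.Println("worker =", v)
		}
		pprof.ForLabels(ctx, func(k, v string) bool {
			fmt.Printf("%s=%s\n", k, v)
			return true // keep iterating
		})
	}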
diff --git a/src/runtime/pprof/label_test.go b/src/runtime/pprof/label_test.go
new file mode 100644
index 0000000..fcb00bd
--- /dev/null
+++ b/src/runtime/pprof/label_test.go
@@ -0,0 +1,114 @@
+package pprof
+
+import (
+ "context"
+ "reflect"
+ "sort"
+ "testing"
+)
+
+func labelsSorted(ctx context.Context) []label {
+ ls := []label{}
+ ForLabels(ctx, func(key, value string) bool {
+ ls = append(ls, label{key, value})
+ return true
+ })
+ sort.Sort(labelSorter(ls))
+ return ls
+}
+
+type labelSorter []label
+
+func (s labelSorter) Len() int { return len(s) }
+func (s labelSorter) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
+func (s labelSorter) Less(i, j int) bool { return s[i].key < s[j].key }
+
+func TestContextLabels(t *testing.T) {
+ // Background context starts with no labels.
+ ctx := context.Background()
+ labels := labelsSorted(ctx)
+ if len(labels) != 0 {
+ t.Errorf("labels on background context: want [], got %v ", labels)
+ }
+
+ // Add a single label.
+ ctx = WithLabels(ctx, Labels("key", "value"))
+ // Retrieve it with Label.
+ v, ok := Label(ctx, "key")
+ if !ok || v != "value" {
+ t.Errorf(`Label(ctx, "key"): got %v, %v; want "value", ok`, v, ok)
+ }
+ gotLabels := labelsSorted(ctx)
+ wantLabels := []label{{"key", "value"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Add a label with a different key.
+ ctx = WithLabels(ctx, Labels("key2", "value2"))
+ v, ok = Label(ctx, "key2")
+ if !ok || v != "value2" {
+ t.Errorf(`Label(ctx, "key2"): got %v, %v; want "value2", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value"}, {"key2", "value2"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Add label with first key to test label replacement.
+ ctx = WithLabels(ctx, Labels("key", "value3"))
+ v, ok = Label(ctx, "key")
+ if !ok || v != "value3" {
+ t.Errorf(`Label(ctx, "key3"): got %v, %v; want "value3", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value3"}, {"key2", "value2"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ // Labels called with two labels with the same key should pick the second.
+ ctx = WithLabels(ctx, Labels("key4", "value4a", "key4", "value4b"))
+ v, ok = Label(ctx, "key4")
+ if !ok || v != "value4b" {
+ t.Errorf(`Label(ctx, "key4"): got %v, %v; want "value4b", ok`, v, ok)
+ }
+ gotLabels = labelsSorted(ctx)
+ wantLabels = []label{{"key", "value3"}, {"key2", "value2"}, {"key4", "value4b"}}
+ if !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("(sorted) labels on context: got %v, want %v", gotLabels, wantLabels)
+ }
+}
+
+func TestLabelMapStringer(t *testing.T) {
+ for _, tbl := range []struct {
+ m labelMap
+ expected string
+ }{
+ {
+ m: labelMap{
+ // empty map
+ },
+ expected: "{}",
+ }, {
+ m: labelMap{
+ "foo": "bar",
+ },
+ expected: `{"foo":"bar"}`,
+ }, {
+ m: labelMap{
+ "foo": "bar",
+ "key1": "value1",
+ "key2": "value2",
+ "key3": "value3",
+ "key4WithNewline": "\nvalue4",
+ },
+ expected: `{"foo":"bar", "key1":"value1", "key2":"value2", "key3":"value3", "key4WithNewline":"\nvalue4"}`,
+ },
+ } {
+ if got := tbl.m.String(); tbl.expected != got {
+ t.Errorf("%#v.String() = %q; want %q", tbl.m, got, tbl.expected)
+ }
+ }
+}
diff --git a/src/runtime/pprof/map.go b/src/runtime/pprof/map.go
new file mode 100644
index 0000000..7c75872
--- /dev/null
+++ b/src/runtime/pprof/map.go
@@ -0,0 +1,90 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import "unsafe"
+
+// A profMap is a map from (stack, tag) to profMapEntry.
+// It grows without bound, but that's assumed to be OK.
+type profMap struct {
+ hash map[uintptr]*profMapEntry
+ all *profMapEntry
+ last *profMapEntry
+ free []profMapEntry
+ freeStk []uintptr
+}
+
+// A profMapEntry is a single entry in the profMap.
+type profMapEntry struct {
+ nextHash *profMapEntry // next in hash list
+ nextAll *profMapEntry // next in list of all entries
+ stk []uintptr
+ tag unsafe.Pointer
+ count int64
+}
+
+func (m *profMap) lookup(stk []uint64, tag unsafe.Pointer) *profMapEntry {
+ // Compute hash of (stk, tag).
+ h := uintptr(0)
+ for _, x := range stk {
+ h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
+ h += uintptr(x) * 41
+ }
+ h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
+ h += uintptr(tag) * 41
+
+ // Find entry if present.
+ var last *profMapEntry
+Search:
+ for e := m.hash[h]; e != nil; last, e = e, e.nextHash {
+ if len(e.stk) != len(stk) || e.tag != tag {
+ continue
+ }
+ for j := range stk {
+ if e.stk[j] != uintptr(stk[j]) {
+ continue Search
+ }
+ }
+ // Move to front.
+ if last != nil {
+ last.nextHash = e.nextHash
+ e.nextHash = m.hash[h]
+ m.hash[h] = e
+ }
+ return e
+ }
+
+ // Add new entry.
+ if len(m.free) < 1 {
+ m.free = make([]profMapEntry, 128)
+ }
+ e := &m.free[0]
+ m.free = m.free[1:]
+ e.nextHash = m.hash[h]
+ e.tag = tag
+
+ if len(m.freeStk) < len(stk) {
+ m.freeStk = make([]uintptr, 1024)
+ }
+ // Limit cap to prevent append from clobbering freeStk.
+ e.stk = m.freeStk[:len(stk):len(stk)]
+ m.freeStk = m.freeStk[len(stk):]
+
+ for j := range stk {
+ e.stk[j] = uintptr(stk[j])
+ }
+ if m.hash == nil {
+ m.hash = make(map[uintptr]*profMapEntry)
+ }
+ m.hash[h] = e
+ if m.all == nil {
+ m.all = e
+ m.last = e
+ } else {
+ m.last.nextAll = e
+ m.last = e
+ }
+ return e
+}
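
Editor's note: the hash computed at the top of lookup is a rotate-and-mix over the stack and tag; the pair of shifts is a left-rotate of the accumulator by 8 bits. Pulled out as a standalone sketch for readability (the values in main are arbitrary):

	package main

	import (
		"fmt"
		"unsafe"
	)

	// stackTagHash mirrors the hashing in profMap.lookup: rotate the accumulator
	// left by 8 bits, then add the next value times 41.
	func stackTagHash(stk []uint64, tag uintptr) uintptr {
		h := uintptr(0)
		for _, x := range stk {
			h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
			h += uintptr(x) * 41
		}
		h = h<<8 | (h >> (8 * (unsafe.Sizeof(h) - 1)))
		h += tag * 41
		return h
	}

	func main() {
		fmt.Printf("%#x\n", stackTagHash([]uint64{0x401000, 0x402000}, 0))
	}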
diff --git a/src/runtime/pprof/mprof_test.go b/src/runtime/pprof/mprof_test.go
new file mode 100644
index 0000000..391588d
--- /dev/null
+++ b/src/runtime/pprof/mprof_test.go
@@ -0,0 +1,176 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !js
+
+package pprof
+
+import (
+ "bytes"
+ "fmt"
+ "internal/profile"
+ "reflect"
+ "regexp"
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+var memSink any
+
+func allocateTransient1M() {
+ for i := 0; i < 1024; i++ {
+ memSink = &struct{ x [1024]byte }{}
+ }
+}
+
+//go:noinline
+func allocateTransient2M() {
+ memSink = make([]byte, 2<<20)
+}
+
+func allocateTransient2MInline() {
+ memSink = make([]byte, 2<<20)
+}
+
+type Obj32 struct {
+ link *Obj32
+ pad [32 - unsafe.Sizeof(uintptr(0))]byte
+}
+
+var persistentMemSink *Obj32
+
+func allocatePersistent1K() {
+ for i := 0; i < 32; i++ {
+ // Can't use slice because that will introduce implicit allocations.
+ obj := &Obj32{link: persistentMemSink}
+ persistentMemSink = obj
+ }
+}
+
+// Allocate transient memory using reflect.Call.
+
+func allocateReflectTransient() {
+ memSink = make([]byte, 2<<20)
+}
+
+func allocateReflect() {
+ rv := reflect.ValueOf(allocateReflectTransient)
+ rv.Call(nil)
+}
+
+var memoryProfilerRun = 0
+
+func TestMemoryProfiler(t *testing.T) {
+ // Disable sampling, otherwise it's difficult to assert anything.
+ oldRate := runtime.MemProfileRate
+ runtime.MemProfileRate = 1
+ defer func() {
+ runtime.MemProfileRate = oldRate
+ }()
+
+ // Allocate a meg to ensure that mcache.nextSample is updated to 1.
+ for i := 0; i < 1024; i++ {
+ memSink = make([]byte, 1024)
+ }
+
+ // Do the interesting allocations.
+ allocateTransient1M()
+ allocateTransient2M()
+ allocateTransient2MInline()
+ allocatePersistent1K()
+ allocateReflect()
+ memSink = nil
+
+ runtime.GC() // materialize stats
+
+ memoryProfilerRun++
+
+ tests := []struct {
+ stk []string
+ legacy string
+ }{{
+ stk: []string{"runtime/pprof.allocatePersistent1K", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`%v: %v \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocatePersistent1K\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test\.go:47
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test\.go:82
+`, 32*memoryProfilerRun, 1024*memoryProfilerRun, 32*memoryProfilerRun, 1024*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient1M", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient1M\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:24
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:79
+`, (1<<10)*memoryProfilerRun, (1<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient2M", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient2M\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:30
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:80
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateTransient2MInline", "runtime/pprof.TestMemoryProfiler"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+ 0x[0-9,a-f]+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateTransient2MInline\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:34
+# 0x[0-9,a-f]+ runtime/pprof\.TestMemoryProfiler\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:81
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }, {
+ stk: []string{"runtime/pprof.allocateReflectTransient"},
+ legacy: fmt.Sprintf(`0: 0 \[%v: %v\] @( 0x[0-9,a-f]+)+
+# 0x[0-9,a-f]+ runtime/pprof\.allocateReflectTransient\+0x[0-9,a-f]+ .*runtime/pprof/mprof_test.go:55
+`, memoryProfilerRun, (2<<20)*memoryProfilerRun),
+ }}
+
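+	// Each legacy-format record checked by the regexps above reads
+	// "inuse_objects: inuse_bytes [alloc_objects: alloc_bytes] @ stack"
+	// (see writeHeapInternal); the transient allocations therefore expect
+	// zero in-use values while allocatePersistent1K keeps non-zero ones.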
+ t.Run("debug=1", func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 1); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+
+ for _, test := range tests {
+ if !regexp.MustCompile(test.legacy).Match(buf.Bytes()) {
+ t.Fatalf("The entry did not match:\n%v\n\nProfile:\n%v\n", test.legacy, buf.String())
+ }
+ }
+ })
+
+ t.Run("proto", func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 0); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+ p, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("failed to parse heap profile: %v", err)
+ }
+ t.Logf("Profile = %v", p)
+
+ stks := stacks(p)
+ for _, test := range tests {
+ if !containsStack(stks, test.stk) {
+ t.Fatalf("No matching stack entry for %q\n\nProfile:\n%v\n", test.stk, p)
+ }
+ }
+
+ if !containsInlinedCall(TestMemoryProfiler, 4<<10) {
+ t.Logf("Can't determine whether allocateTransient2MInline was inlined into TestMemoryProfiler.")
+ return
+ }
+
+	// Check that the inlined function location is encoded correctly.
+ for _, loc := range p.Location {
+ inlinedCaller, inlinedCallee := false, false
+ for _, line := range loc.Line {
+ if line.Function.Name == "runtime/pprof.allocateTransient2MInline" {
+ inlinedCallee = true
+ }
+ if inlinedCallee && line.Function.Name == "runtime/pprof.TestMemoryProfiler" {
+ inlinedCaller = true
+ }
+ }
+ if inlinedCallee != inlinedCaller {
+ t.Errorf("want allocateTransient2MInline after TestMemoryProfiler in one location, got separate location entries:\n%v", loc)
+ }
+ }
+ })
+}
diff --git a/src/runtime/pprof/pe.go b/src/runtime/pprof/pe.go
new file mode 100644
index 0000000..4105458
--- /dev/null
+++ b/src/runtime/pprof/pe.go
@@ -0,0 +1,19 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import "os"
+
+// peBuildID returns a best effort unique ID for the named executable.
+//
+// It would be wasteful to calculate the hash of the whole file;
+// instead, use the binary name and the last modified time for the build ID.
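+//
+// For example (values illustrative only), an executable named "app.exe"
+// last modified at 2024-01-02 15:04:05 UTC would yield an ID like
+// "app.exe2024-01-02 15:04:05 +0000 UTC".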
+func peBuildID(file string) string {
+ s, err := os.Stat(file)
+ if err != nil {
+ return file
+ }
+ return file + s.ModTime().String()
+}
diff --git a/src/runtime/pprof/pprof.go b/src/runtime/pprof/pprof.go
new file mode 100644
index 0000000..17a490e
--- /dev/null
+++ b/src/runtime/pprof/pprof.go
@@ -0,0 +1,910 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package pprof writes runtime profiling data in the format expected
+// by the pprof visualization tool.
+//
+// # Profiling a Go program
+//
+// The first step to profiling a Go program is to enable profiling.
+// Support for profiling benchmarks built with the standard testing
+// package is built into go test. For example, the following command
+// runs benchmarks in the current directory and writes the CPU and
+// memory profiles to cpu.prof and mem.prof:
+//
+// go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
+//
+// To add equivalent profiling support to a standalone program, add
+// code like the following to your main function:
+//
+// var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to `file`")
+// var memprofile = flag.String("memprofile", "", "write memory profile to `file`")
+//
+// func main() {
+// flag.Parse()
+// if *cpuprofile != "" {
+// f, err := os.Create(*cpuprofile)
+// if err != nil {
+// log.Fatal("could not create CPU profile: ", err)
+// }
+// defer f.Close() // error handling omitted for example
+// if err := pprof.StartCPUProfile(f); err != nil {
+// log.Fatal("could not start CPU profile: ", err)
+// }
+// defer pprof.StopCPUProfile()
+// }
+//
+// // ... rest of the program ...
+//
+// if *memprofile != "" {
+// f, err := os.Create(*memprofile)
+// if err != nil {
+// log.Fatal("could not create memory profile: ", err)
+// }
+// defer f.Close() // error handling omitted for example
+// runtime.GC() // get up-to-date statistics
+// if err := pprof.WriteHeapProfile(f); err != nil {
+// log.Fatal("could not write memory profile: ", err)
+// }
+// }
+// }
+//
+// There is also a standard HTTP interface to profiling data. Adding
+// the following line will install handlers under the /debug/pprof/
+// URL to download live profiles:
+//
+// import _ "net/http/pprof"
+//
+// See the net/http/pprof package for more details.
+//
+// Profiles can then be visualized with the pprof tool:
+//
+// go tool pprof cpu.prof
+//
+// There are many commands available from the pprof command line.
+// Commonly used commands include "top", which prints a summary of the
+// top program hot-spots, and "web", which opens an interactive graph
+// of hot-spots and their call graphs. Use "help" for information on
+// all pprof commands.
+//
+// For more information about pprof, see
+// https://github.com/google/pprof/blob/master/doc/README.md.
+package pprof
+
+import (
+ "bufio"
+ "fmt"
+ "internal/abi"
+ "io"
+ "runtime"
+ "sort"
+ "strings"
+ "sync"
+ "text/tabwriter"
+ "time"
+ "unsafe"
+)
+
+// BUG(rsc): Profiles are only as good as the kernel support used to generate them.
+// See https://golang.org/issue/13841 for details about known problems.
+
+// A Profile is a collection of stack traces showing the call sequences
+// that led to instances of a particular event, such as allocation.
+// Packages can create and maintain their own profiles; the most common
+// use is for tracking resources that must be explicitly closed, such as files
+// or network connections.
+//
+// A Profile's methods can be called from multiple goroutines simultaneously.
+//
+// Each Profile has a unique name. A few profiles are predefined:
+//
+// goroutine - stack traces of all current goroutines
+// heap - a sampling of memory allocations of live objects
+// allocs - a sampling of all past memory allocations
+// threadcreate - stack traces that led to the creation of new OS threads
+// block - stack traces that led to blocking on synchronization primitives
+// mutex - stack traces of holders of contended mutexes
+//
+// These predefined profiles maintain themselves and panic on an explicit
+// Add or Remove method call.
+//
+// The heap profile reports statistics as of the most recently completed
+// garbage collection; it elides more recent allocation to avoid skewing
+// the profile away from live data and toward garbage.
+// If there has been no garbage collection at all, the heap profile reports
+// all known allocations. This exception helps mainly in programs running
+// without garbage collection enabled, usually for debugging purposes.
+//
+// The heap profile tracks both the allocation sites for all live objects in
+// the application memory and for all objects allocated since the program start.
+// Pprof's -inuse_space, -inuse_objects, -alloc_space, and -alloc_objects
+// flags select which to display, defaulting to -inuse_space (live objects,
+// scaled by size).
+//
+// The allocs profile is the same as the heap profile but changes the default
+// pprof display to -alloc_space, the total number of bytes allocated since
+// the program began (including garbage-collected bytes).
+//
+// The CPU profile is not available as a Profile. It has a special API,
+// the StartCPUProfile and StopCPUProfile functions, because it streams
+// output to a writer during profiling.
+type Profile struct {
+ name string
+ mu sync.Mutex
+ m map[any][]uintptr
+ count func() int
+ write func(io.Writer, int) error
+}
+
+// profiles records all registered profiles.
+var profiles struct {
+ mu sync.Mutex
+ m map[string]*Profile
+}
+
+var goroutineProfile = &Profile{
+ name: "goroutine",
+ count: countGoroutine,
+ write: writeGoroutine,
+}
+
+var threadcreateProfile = &Profile{
+ name: "threadcreate",
+ count: countThreadCreate,
+ write: writeThreadCreate,
+}
+
+var heapProfile = &Profile{
+ name: "heap",
+ count: countHeap,
+ write: writeHeap,
+}
+
+var allocsProfile = &Profile{
+ name: "allocs",
+ count: countHeap, // identical to heap profile
+ write: writeAlloc,
+}
+
+var blockProfile = &Profile{
+ name: "block",
+ count: countBlock,
+ write: writeBlock,
+}
+
+var mutexProfile = &Profile{
+ name: "mutex",
+ count: countMutex,
+ write: writeMutex,
+}
+
+func lockProfiles() {
+ profiles.mu.Lock()
+ if profiles.m == nil {
+ // Initial built-in profiles.
+ profiles.m = map[string]*Profile{
+ "goroutine": goroutineProfile,
+ "threadcreate": threadcreateProfile,
+ "heap": heapProfile,
+ "allocs": allocsProfile,
+ "block": blockProfile,
+ "mutex": mutexProfile,
+ }
+ }
+}
+
+func unlockProfiles() {
+ profiles.mu.Unlock()
+}
+
+// NewProfile creates a new profile with the given name.
+// If a profile with that name already exists, NewProfile panics.
+// The convention is to use an 'import/path.' prefix to create
+// separate name spaces for each package.
+// For compatibility with various tools that read pprof data,
+// profile names should not contain spaces.
+func NewProfile(name string) *Profile {
+ lockProfiles()
+ defer unlockProfiles()
+ if name == "" {
+ panic("pprof: NewProfile with empty name")
+ }
+ if profiles.m[name] != nil {
+ panic("pprof: NewProfile name already in use: " + name)
+ }
+ p := &Profile{
+ name: name,
+ m: map[any][]uintptr{},
+ }
+ profiles.m[name] = p
+ return p
+}
+
+// Lookup returns the profile with the given name, or nil if no such profile exists.
+func Lookup(name string) *Profile {
+ lockProfiles()
+ defer unlockProfiles()
+ return profiles.m[name]
+}
+
+// Profiles returns a slice of all the known profiles, sorted by name.
+func Profiles() []*Profile {
+ lockProfiles()
+ defer unlockProfiles()
+
+ all := make([]*Profile, 0, len(profiles.m))
+ for _, p := range profiles.m {
+ all = append(all, p)
+ }
+
+ sort.Slice(all, func(i, j int) bool { return all[i].name < all[j].name })
+ return all
+}
+
+// Name returns this profile's name, which can be passed to Lookup to reobtain the profile.
+func (p *Profile) Name() string {
+ return p.name
+}
+
+// Count returns the number of execution stacks currently in the profile.
+func (p *Profile) Count() int {
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ if p.count != nil {
+ return p.count()
+ }
+ return len(p.m)
+}
+
+// Add adds the current execution stack to the profile, associated with value.
+// Add stores value in an internal map, so value must be suitable for use as
+// a map key and will not be garbage collected until the corresponding
+// call to Remove. Add panics if the profile already contains a stack for value.
+//
+// The skip parameter has the same meaning as runtime.Caller's skip
+// and controls where the stack trace begins. Passing skip=0 begins the
+// trace in the function calling Add. For example, given this
+// execution stack:
+//
+// Add
+// called from rpc.NewClient
+// called from mypkg.Run
+// called from main.main
+//
+// Passing skip=0 begins the stack trace at the call to Add inside rpc.NewClient.
+// Passing skip=1 begins the stack trace at the call to NewClient inside mypkg.Run.
+func (p *Profile) Add(value any, skip int) {
+ if p.name == "" {
+ panic("pprof: use of uninitialized Profile")
+ }
+ if p.write != nil {
+ panic("pprof: Add called on built-in Profile " + p.name)
+ }
+
+ stk := make([]uintptr, 32)
+ n := runtime.Callers(skip+1, stk[:])
+ stk = stk[:n]
+ if len(stk) == 0 {
+ // The value for skip is too large, and there's no stack trace to record.
+ stk = []uintptr{abi.FuncPCABIInternal(lostProfileEvent)}
+ }
+
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ if p.m[value] != nil {
+ panic("pprof: Profile.Add of duplicate value")
+ }
+ p.m[value] = stk
+}
+
+// Remove removes the execution stack associated with value from the profile.
+// It is a no-op if the value is not in the profile.
+func (p *Profile) Remove(value any) {
+ p.mu.Lock()
+ defer p.mu.Unlock()
+ delete(p.m, value)
+}
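+
+// exampleCustomProfileSketch is an illustrative sketch only; nothing in this
+// package calls it. It shows the intended pairing of Add and Remove for a
+// hypothetical resource type tracked through a user-defined profile.
+func exampleCustomProfileSketch() {
+	type conn struct{ fd int } // hypothetical resource being tracked
+	connProfile := NewProfile("example.com/mypkg.conns")
+	c := &conn{fd: 3}
+	connProfile.Add(c, 1) // skip=1: begin the recorded stack at this function's caller
+	// ... use the connection ...
+	connProfile.Remove(c) // drop the entry once the resource is released
+}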
+
+// WriteTo writes a pprof-formatted snapshot of the profile to w.
+// If a write to w returns an error, WriteTo returns that error.
+// Otherwise, WriteTo returns nil.
+//
+// The debug parameter enables additional output.
+// Passing debug=0 writes the gzip-compressed protocol buffer described
+// in https://github.com/google/pprof/tree/master/proto#overview.
+// Passing debug=1 writes the legacy text format with comments
+// translating addresses to function names and line numbers, so that a
+// programmer can read the profile without tools.
+//
+// The predefined profiles may assign meaning to other debug values;
+// for example, when printing the "goroutine" profile, debug=2 means to
+// print the goroutine stacks in the same form that a Go program uses
+// when dying due to an unrecovered panic.
+func (p *Profile) WriteTo(w io.Writer, debug int) error {
+ if p.name == "" {
+ panic("pprof: use of zero Profile")
+ }
+ if p.write != nil {
+ return p.write(w, debug)
+ }
+
+ // Obtain consistent snapshot under lock; then process without lock.
+ p.mu.Lock()
+ all := make([][]uintptr, 0, len(p.m))
+ for _, stk := range p.m {
+ all = append(all, stk)
+ }
+ p.mu.Unlock()
+
+ // Map order is non-deterministic; make output deterministic.
+ sort.Slice(all, func(i, j int) bool {
+ t, u := all[i], all[j]
+ for k := 0; k < len(t) && k < len(u); k++ {
+ if t[k] != u[k] {
+ return t[k] < u[k]
+ }
+ }
+ return len(t) < len(u)
+ })
+
+ return printCountProfile(w, debug, p.name, stackProfile(all))
+}
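+
+// writeBothFormatsSketch is an illustrative sketch only; nothing in this
+// package calls it. It demonstrates WriteTo's debug parameter: debug=0 writes
+// the gzip-compressed proto form for go tool pprof, while debug=1 writes the
+// legacy text form that can be read without tools.
+func writeBothFormatsSketch(protoDst, textDst io.Writer) error {
+	p := Lookup("goroutine")
+	if err := p.WriteTo(protoDst, 0); err != nil {
+		return err
+	}
+	return p.WriteTo(textDst, 1)
+}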
+
+type stackProfile [][]uintptr
+
+func (x stackProfile) Len() int { return len(x) }
+func (x stackProfile) Stack(i int) []uintptr { return x[i] }
+func (x stackProfile) Label(i int) *labelMap { return nil }
+
+// A countProfile is a set of stack traces to be printed as counts
+// grouped by stack trace. There are multiple implementations:
+// all that matters is that we can find out how many traces there are
+// and obtain each trace in turn.
+type countProfile interface {
+ Len() int
+ Stack(i int) []uintptr
+ Label(i int) *labelMap
+}
+
+// printCountCycleProfile outputs block profile records (for block or mutex profiles)
+// in the pprof proto format. Cycle counts are translated to time durations
+// because the proto format expects count and time (nanoseconds) rather than
+// count and cycles for block and contention profiles.
+func printCountCycleProfile(w io.Writer, countName, cycleName string, records []runtime.BlockProfileRecord) error {
+ // Output profile in protobuf form.
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, countName, "count")
+ b.pb.int64Opt(tagProfile_Period, 1)
+ b.pbValueType(tagProfile_SampleType, countName, "count")
+ b.pbValueType(tagProfile_SampleType, cycleName, "nanoseconds")
+
+ cpuGHz := float64(runtime_cyclesPerSecond()) / 1e9
+
+ values := []int64{0, 0}
+ var locs []uint64
+ for _, r := range records {
+ values[0] = r.Count
+ values[1] = int64(float64(r.Cycles) / cpuGHz)
+ // For count profiles, all stack addresses are
+ // return PCs, which is what appendLocsForStack expects.
+ locs = b.appendLocsForStack(locs[:0], r.Stack())
+ b.pbSample(values, locs, nil)
+ }
+ b.build()
+ return nil
+}
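+
+// cyclesToNanosSketch is an illustrative helper only; the package does not
+// call it. It restates the conversion above assuming a hypothetical 3 GHz
+// cycle clock (cpuGHz = 3e9/1e9 = 3.0): 6e9 cycles map to 2e9 ns, i.e. two
+// seconds of delay.
+func cyclesToNanosSketch(cycles int64) int64 {
+	const assumedCPUGHz = 3.0 // hypothetical rate; the code above derives it from runtime_cyclesPerSecond
+	return int64(float64(cycles) / assumedCPUGHz)
+}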
+
+// printCountProfile prints a countProfile at the specified debug level.
+// The profile will be in compressed proto format unless debug is nonzero.
+func printCountProfile(w io.Writer, debug int, name string, p countProfile) error {
+ // Build count of each stack.
+ var buf strings.Builder
+ key := func(stk []uintptr, lbls *labelMap) string {
+ buf.Reset()
+ fmt.Fprintf(&buf, "@")
+ for _, pc := range stk {
+ fmt.Fprintf(&buf, " %#x", pc)
+ }
+ if lbls != nil {
+ buf.WriteString("\n# labels: ")
+ buf.WriteString(lbls.String())
+ }
+ return buf.String()
+ }
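+	// A key for a two-frame stack with labels looks roughly like
+	// "@ 0x4a2b3c 0x4a2d10\n# labels: ..." (addresses illustrative), so
+	// identical stacks with identical labels collapse into a single entry.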
+ count := map[string]int{}
+ index := map[string]int{}
+ var keys []string
+ n := p.Len()
+ for i := 0; i < n; i++ {
+ k := key(p.Stack(i), p.Label(i))
+ if count[k] == 0 {
+ index[k] = i
+ keys = append(keys, k)
+ }
+ count[k]++
+ }
+
+ sort.Sort(&keysByCount{keys, count})
+
+ if debug > 0 {
+ // Print debug profile in legacy format
+ tw := tabwriter.NewWriter(w, 1, 8, 1, '\t', 0)
+ fmt.Fprintf(tw, "%s profile: total %d\n", name, p.Len())
+ for _, k := range keys {
+ fmt.Fprintf(tw, "%d %s\n", count[k], k)
+ printStackRecord(tw, p.Stack(index[k]), false)
+ }
+ return tw.Flush()
+ }
+
+ // Output profile in protobuf form.
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, name, "count")
+ b.pb.int64Opt(tagProfile_Period, 1)
+ b.pbValueType(tagProfile_SampleType, name, "count")
+
+ values := []int64{0}
+ var locs []uint64
+ for _, k := range keys {
+ values[0] = int64(count[k])
+ // For count profiles, all stack addresses are
+ // return PCs, which is what appendLocsForStack expects.
+ locs = b.appendLocsForStack(locs[:0], p.Stack(index[k]))
+ idx := index[k]
+ var labels func()
+ if p.Label(idx) != nil {
+ labels = func() {
+ for k, v := range *p.Label(idx) {
+ b.pbLabel(tagSample_Label, k, v, 0)
+ }
+ }
+ }
+ b.pbSample(values, locs, labels)
+ }
+ b.build()
+ return nil
+}
+
+// keysByCount sorts keys with higher counts first, breaking ties by key string order.
+type keysByCount struct {
+ keys []string
+ count map[string]int
+}
+
+func (x *keysByCount) Len() int { return len(x.keys) }
+func (x *keysByCount) Swap(i, j int) { x.keys[i], x.keys[j] = x.keys[j], x.keys[i] }
+func (x *keysByCount) Less(i, j int) bool {
+ ki, kj := x.keys[i], x.keys[j]
+ ci, cj := x.count[ki], x.count[kj]
+ if ci != cj {
+ return ci > cj
+ }
+ return ki < kj
+}
+
+// printStackRecord prints the function + source line information
+// for a single stack trace.
+func printStackRecord(w io.Writer, stk []uintptr, allFrames bool) {
+ show := allFrames
+ frames := runtime.CallersFrames(stk)
+ for {
+ frame, more := frames.Next()
+ name := frame.Function
+ if name == "" {
+ show = true
+ fmt.Fprintf(w, "#\t%#x\n", frame.PC)
+ } else if name != "runtime.goexit" && (show || !strings.HasPrefix(name, "runtime.")) {
+ // Hide runtime.goexit and any runtime functions at the beginning.
+ // This is useful mainly for allocation traces.
+ show = true
+ fmt.Fprintf(w, "#\t%#x\t%s+%#x\t%s:%d\n", frame.PC, name, frame.PC-frame.Entry, frame.File, frame.Line)
+ }
+ if !more {
+ break
+ }
+ }
+ if !show {
+ // We didn't print anything; do it again,
+ // and this time include runtime functions.
+ printStackRecord(w, stk, true)
+ return
+ }
+ fmt.Fprintf(w, "\n")
+}
+
+// Interface to system profiles.
+
+// WriteHeapProfile is shorthand for Lookup("heap").WriteTo(w, 0).
+// It is preserved for backwards compatibility.
+func WriteHeapProfile(w io.Writer) error {
+ return writeHeap(w, 0)
+}
+
+// countHeap returns the number of records in the heap profile.
+func countHeap() int {
+ n, _ := runtime.MemProfile(nil, true)
+ return n
+}
+
+// writeHeap writes the current runtime heap profile to w.
+func writeHeap(w io.Writer, debug int) error {
+ return writeHeapInternal(w, debug, "")
+}
+
+// writeAlloc writes the current runtime heap profile to w
+// with the total allocation space as the default sample type.
+func writeAlloc(w io.Writer, debug int) error {
+ return writeHeapInternal(w, debug, "alloc_space")
+}
+
+func writeHeapInternal(w io.Writer, debug int, defaultSampleType string) error {
+ var memStats *runtime.MemStats
+ if debug != 0 {
+ // Read mem stats first, so that our other allocations
+ // do not appear in the statistics.
+ memStats = new(runtime.MemStats)
+ runtime.ReadMemStats(memStats)
+ }
+
+ // Find out how many records there are (MemProfile(nil, true)),
+ // allocate that many records, and get the data.
+ // There's a race—more records might be added between
+ // the two calls—so allocate a few extra records for safety
+ // and also try again if we're very unlucky.
+ // The loop should only execute one iteration in the common case.
+ var p []runtime.MemProfileRecord
+ n, ok := runtime.MemProfile(nil, true)
+ for {
+ // Allocate room for a slightly bigger profile,
+ // in case a few more entries have been added
+ // since the call to MemProfile.
+ p = make([]runtime.MemProfileRecord, n+50)
+ n, ok = runtime.MemProfile(p, true)
+ if ok {
+ p = p[0:n]
+ break
+ }
+ // Profile grew; try again.
+ }
+
+ if debug == 0 {
+ return writeHeapProto(w, p, int64(runtime.MemProfileRate), defaultSampleType)
+ }
+
+ sort.Slice(p, func(i, j int) bool { return p[i].InUseBytes() > p[j].InUseBytes() })
+
+ b := bufio.NewWriter(w)
+ tw := tabwriter.NewWriter(b, 1, 8, 1, '\t', 0)
+ w = tw
+
+ var total runtime.MemProfileRecord
+ for i := range p {
+ r := &p[i]
+ total.AllocBytes += r.AllocBytes
+ total.AllocObjects += r.AllocObjects
+ total.FreeBytes += r.FreeBytes
+ total.FreeObjects += r.FreeObjects
+ }
+
+ // Technically the rate is MemProfileRate not 2*MemProfileRate,
+ // but early versions of the C++ heap profiler reported 2*MemProfileRate,
+ // so that's what pprof has come to expect.
+ rate := 2 * runtime.MemProfileRate
+
+ // pprof reads a profile with alloc == inuse as being a "2-column" profile
+ // (objects and bytes, not distinguishing alloc from inuse),
+ // but then such a profile can't be merged using pprof *.prof with
+ // other 4-column profiles where alloc != inuse.
+ // The easiest way to avoid this bug is to adjust allocBytes so it's never == inuseBytes.
+ // pprof doesn't use these header values anymore except for checking equality.
+ inUseBytes := total.InUseBytes()
+ allocBytes := total.AllocBytes
+ if inUseBytes == allocBytes {
+ allocBytes++
+ }
+
+ fmt.Fprintf(w, "heap profile: %d: %d [%d: %d] @ heap/%d\n",
+ total.InUseObjects(), inUseBytes,
+ total.AllocObjects, allocBytes,
+ rate)
+
+ for i := range p {
+ r := &p[i]
+ fmt.Fprintf(w, "%d: %d [%d: %d] @",
+ r.InUseObjects(), r.InUseBytes(),
+ r.AllocObjects, r.AllocBytes)
+ for _, pc := range r.Stack() {
+ fmt.Fprintf(w, " %#x", pc)
+ }
+ fmt.Fprintf(w, "\n")
+ printStackRecord(w, r.Stack(), false)
+ }
+
+ // Print memstats information too.
+	// pprof will ignore it, but it is useful for people reading the output.
+ s := memStats
+ fmt.Fprintf(w, "\n# runtime.MemStats\n")
+ fmt.Fprintf(w, "# Alloc = %d\n", s.Alloc)
+ fmt.Fprintf(w, "# TotalAlloc = %d\n", s.TotalAlloc)
+ fmt.Fprintf(w, "# Sys = %d\n", s.Sys)
+ fmt.Fprintf(w, "# Lookups = %d\n", s.Lookups)
+ fmt.Fprintf(w, "# Mallocs = %d\n", s.Mallocs)
+ fmt.Fprintf(w, "# Frees = %d\n", s.Frees)
+
+ fmt.Fprintf(w, "# HeapAlloc = %d\n", s.HeapAlloc)
+ fmt.Fprintf(w, "# HeapSys = %d\n", s.HeapSys)
+ fmt.Fprintf(w, "# HeapIdle = %d\n", s.HeapIdle)
+ fmt.Fprintf(w, "# HeapInuse = %d\n", s.HeapInuse)
+ fmt.Fprintf(w, "# HeapReleased = %d\n", s.HeapReleased)
+ fmt.Fprintf(w, "# HeapObjects = %d\n", s.HeapObjects)
+
+ fmt.Fprintf(w, "# Stack = %d / %d\n", s.StackInuse, s.StackSys)
+ fmt.Fprintf(w, "# MSpan = %d / %d\n", s.MSpanInuse, s.MSpanSys)
+ fmt.Fprintf(w, "# MCache = %d / %d\n", s.MCacheInuse, s.MCacheSys)
+ fmt.Fprintf(w, "# BuckHashSys = %d\n", s.BuckHashSys)
+ fmt.Fprintf(w, "# GCSys = %d\n", s.GCSys)
+ fmt.Fprintf(w, "# OtherSys = %d\n", s.OtherSys)
+
+ fmt.Fprintf(w, "# NextGC = %d\n", s.NextGC)
+ fmt.Fprintf(w, "# LastGC = %d\n", s.LastGC)
+ fmt.Fprintf(w, "# PauseNs = %d\n", s.PauseNs)
+ fmt.Fprintf(w, "# PauseEnd = %d\n", s.PauseEnd)
+ fmt.Fprintf(w, "# NumGC = %d\n", s.NumGC)
+ fmt.Fprintf(w, "# NumForcedGC = %d\n", s.NumForcedGC)
+ fmt.Fprintf(w, "# GCCPUFraction = %v\n", s.GCCPUFraction)
+ fmt.Fprintf(w, "# DebugGC = %v\n", s.DebugGC)
+
+ // Also flush out MaxRSS on supported platforms.
+ addMaxRSS(w)
+
+ tw.Flush()
+ return b.Flush()
+}
+
+// countThreadCreate returns the size of the current ThreadCreateProfile.
+func countThreadCreate() int {
+ n, _ := runtime.ThreadCreateProfile(nil)
+ return n
+}
+
+// writeThreadCreate writes the current runtime ThreadCreateProfile to w.
+func writeThreadCreate(w io.Writer, debug int) error {
+ // Until https://golang.org/issues/6104 is addressed, wrap
+ // ThreadCreateProfile because there's no point in tracking labels when we
+ // don't get any stack-traces.
+ return writeRuntimeProfile(w, debug, "threadcreate", func(p []runtime.StackRecord, _ []unsafe.Pointer) (n int, ok bool) {
+ return runtime.ThreadCreateProfile(p)
+ })
+}
+
+// countGoroutine returns the number of goroutines.
+func countGoroutine() int {
+ return runtime.NumGoroutine()
+}
+
+// runtime_goroutineProfileWithLabels is defined in runtime/mprof.go
+func runtime_goroutineProfileWithLabels(p []runtime.StackRecord, labels []unsafe.Pointer) (n int, ok bool)
+
+// writeGoroutine writes the current runtime GoroutineProfile to w.
+func writeGoroutine(w io.Writer, debug int) error {
+ if debug >= 2 {
+ return writeGoroutineStacks(w)
+ }
+ return writeRuntimeProfile(w, debug, "goroutine", runtime_goroutineProfileWithLabels)
+}
+
+func writeGoroutineStacks(w io.Writer) error {
+ // We don't know how big the buffer needs to be to collect
+ // all the goroutines. Start with 1 MB and try a few times, doubling each time.
+ // Give up and use a truncated trace if 64 MB is not enough.
+ buf := make([]byte, 1<<20)
+ for i := 0; ; i++ {
+ n := runtime.Stack(buf, true)
+ if n < len(buf) {
+ buf = buf[:n]
+ break
+ }
+ if len(buf) >= 64<<20 {
+ // Filled 64 MB - stop there.
+ break
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+ _, err := w.Write(buf)
+ return err
+}
+
+func writeRuntimeProfile(w io.Writer, debug int, name string, fetch func([]runtime.StackRecord, []unsafe.Pointer) (int, bool)) error {
+ // Find out how many records there are (fetch(nil)),
+ // allocate that many records, and get the data.
+ // There's a race—more records might be added between
+ // the two calls—so allocate a few extra records for safety
+ // and also try again if we're very unlucky.
+ // The loop should only execute one iteration in the common case.
+ var p []runtime.StackRecord
+ var labels []unsafe.Pointer
+ n, ok := fetch(nil, nil)
+ for {
+ // Allocate room for a slightly bigger profile,
+ // in case a few more entries have been added
+		// since the first call to fetch.
+ p = make([]runtime.StackRecord, n+10)
+ labels = make([]unsafe.Pointer, n+10)
+ n, ok = fetch(p, labels)
+ if ok {
+ p = p[0:n]
+ break
+ }
+ // Profile grew; try again.
+ }
+
+ return printCountProfile(w, debug, name, &runtimeProfile{p, labels})
+}
+
+type runtimeProfile struct {
+ stk []runtime.StackRecord
+ labels []unsafe.Pointer
+}
+
+func (p *runtimeProfile) Len() int { return len(p.stk) }
+func (p *runtimeProfile) Stack(i int) []uintptr { return p.stk[i].Stack() }
+func (p *runtimeProfile) Label(i int) *labelMap { return (*labelMap)(p.labels[i]) }
+
+var cpu struct {
+ sync.Mutex
+ profiling bool
+ done chan bool
+}
+
+// StartCPUProfile enables CPU profiling for the current process.
+// While profiling, the profile will be buffered and written to w.
+// StartCPUProfile returns an error if profiling is already enabled.
+//
+// On Unix-like systems, StartCPUProfile does not work by default for
+// Go code built with -buildmode=c-archive or -buildmode=c-shared.
+// StartCPUProfile relies on the SIGPROF signal, but that signal will
+// be delivered to the main program's SIGPROF signal handler (if any)
+// not to the one used by Go. To make it work, call os/signal.Notify
+// for syscall.SIGPROF, but note that doing so may break any profiling
+// being done by the main program.
+func StartCPUProfile(w io.Writer) error {
+ // The runtime routines allow a variable profiling rate,
+ // but in practice operating systems cannot trigger signals
+ // at more than about 500 Hz, and our processing of the
+ // signal is not cheap (mostly getting the stack trace).
+ // 100 Hz is a reasonable choice: it is frequent enough to
+ // produce useful data, rare enough not to bog down the
+ // system, and a nice round number to make it easy to
+ // convert sample counts to seconds. Instead of requiring
+ // each client to specify the frequency, we hard code it.
+ const hz = 100
+
+ cpu.Lock()
+ defer cpu.Unlock()
+ if cpu.done == nil {
+ cpu.done = make(chan bool)
+ }
+ // Double-check.
+ if cpu.profiling {
+ return fmt.Errorf("cpu profiling already in use")
+ }
+ cpu.profiling = true
+ runtime.SetCPUProfileRate(hz)
+ go profileWriter(w)
+ return nil
+}
+
+// readProfile, provided by the runtime, returns the next chunk of
+// binary CPU profiling stack trace data, blocking until data is available.
+// If profiling is turned off and all the profile data accumulated while it was
+// on has been returned, readProfile returns eof=true.
+// The caller must save the returned data and tags before calling readProfile again.
+func readProfile() (data []uint64, tags []unsafe.Pointer, eof bool)
+
+func profileWriter(w io.Writer) {
+ b := newProfileBuilder(w)
+ var err error
+ for {
+ time.Sleep(100 * time.Millisecond)
+ data, tags, eof := readProfile()
+ if e := b.addCPUData(data, tags); e != nil && err == nil {
+ err = e
+ }
+ if eof {
+ break
+ }
+ }
+ if err != nil {
+ // The runtime should never produce an invalid or truncated profile.
+ // It drops records that can't fit into its log buffers.
+ panic("runtime/pprof: converting profile: " + err.Error())
+ }
+ b.build()
+ cpu.done <- true
+}
+
+// StopCPUProfile stops the current CPU profile, if any.
+// StopCPUProfile only returns after all the writes for the
+// profile have completed.
+func StopCPUProfile() {
+ cpu.Lock()
+ defer cpu.Unlock()
+
+ if !cpu.profiling {
+ return
+ }
+ cpu.profiling = false
+ runtime.SetCPUProfileRate(0)
+ <-cpu.done
+}
+
+// countBlock returns the number of records in the blocking profile.
+func countBlock() int {
+ n, _ := runtime.BlockProfile(nil)
+ return n
+}
+
+// countMutex returns the number of records in the mutex profile.
+func countMutex() int {
+ n, _ := runtime.MutexProfile(nil)
+ return n
+}
+
+// writeBlock writes the current blocking profile to w.
+func writeBlock(w io.Writer, debug int) error {
+ return writeProfileInternal(w, debug, "contention", runtime.BlockProfile)
+}
+
+// writeMutex writes the current mutex profile to w.
+func writeMutex(w io.Writer, debug int) error {
+ return writeProfileInternal(w, debug, "mutex", runtime.MutexProfile)
+}
+
+// writeProfileInternal writes the current blocking or mutex profile depending on the passed parameters.
+func writeProfileInternal(w io.Writer, debug int, name string, runtimeProfile func([]runtime.BlockProfileRecord) (int, bool)) error {
+ var p []runtime.BlockProfileRecord
+ n, ok := runtimeProfile(nil)
+ for {
+ p = make([]runtime.BlockProfileRecord, n+50)
+ n, ok = runtimeProfile(p)
+ if ok {
+ p = p[:n]
+ break
+ }
+ }
+
+ sort.Slice(p, func(i, j int) bool { return p[i].Cycles > p[j].Cycles })
+
+ if debug <= 0 {
+ return printCountCycleProfile(w, "contentions", "delay", p)
+ }
+
+ b := bufio.NewWriter(w)
+	tw := tabwriter.NewWriter(b, 1, 8, 1, '\t', 0)
+ w = tw
+
+ fmt.Fprintf(w, "--- %v:\n", name)
+ fmt.Fprintf(w, "cycles/second=%v\n", runtime_cyclesPerSecond())
+ if name == "mutex" {
+ fmt.Fprintf(w, "sampling period=%d\n", runtime.SetMutexProfileFraction(-1))
+ }
+ for i := range p {
+ r := &p[i]
+ fmt.Fprintf(w, "%v %v @", r.Cycles, r.Count)
+ for _, pc := range r.Stack() {
+ fmt.Fprintf(w, " %#x", pc)
+ }
+ fmt.Fprint(w, "\n")
+ if debug > 0 {
+ printStackRecord(w, r.Stack(), true)
+ }
+ }
+
+ if tw != nil {
+ tw.Flush()
+ }
+ return b.Flush()
+}
+
+func runtime_cyclesPerSecond() int64
diff --git a/src/runtime/pprof/pprof_norusage.go b/src/runtime/pprof/pprof_norusage.go
new file mode 100644
index 0000000..8de3808
--- /dev/null
+++ b/src/runtime/pprof/pprof_norusage.go
@@ -0,0 +1,15 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !aix && !darwin && !dragonfly && !freebsd && !linux && !netbsd && !openbsd && !solaris && !windows
+
+package pprof
+
+import (
+ "io"
+)
+
+// addMaxRSS is a no-op on platforms that don't support rusage.
+func addMaxRSS(w io.Writer) {
+}
diff --git a/src/runtime/pprof/pprof_rusage.go b/src/runtime/pprof/pprof_rusage.go
new file mode 100644
index 0000000..aa429fb
--- /dev/null
+++ b/src/runtime/pprof/pprof_rusage.go
@@ -0,0 +1,35 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package pprof
+
+import (
+ "fmt"
+ "io"
+ "runtime"
+ "syscall"
+)
+
+// addMaxRSS writes the process's MaxRSS, converted to bytes, to w on supported platforms.
+func addMaxRSS(w io.Writer) {
+ var rssToBytes uintptr
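+	// getrusage reports ru_maxrss in platform-specific units: kilobytes on
+	// Linux and the BSDs, bytes on Darwin/iOS, and pages on illumos/Solaris,
+	// so normalize to bytes before printing.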
+ switch runtime.GOOS {
+ case "aix", "android", "dragonfly", "freebsd", "linux", "netbsd", "openbsd":
+ rssToBytes = 1024
+ case "darwin", "ios":
+ rssToBytes = 1
+ case "illumos", "solaris":
+ rssToBytes = uintptr(syscall.Getpagesize())
+ default:
+ panic("unsupported OS")
+ }
+
+ var rusage syscall.Rusage
+ err := syscall.Getrusage(syscall.RUSAGE_SELF, &rusage)
+ if err == nil {
+ fmt.Fprintf(w, "# MaxRSS = %d\n", uintptr(rusage.Maxrss)*rssToBytes)
+ }
+}
diff --git a/src/runtime/pprof/pprof_test.go b/src/runtime/pprof/pprof_test.go
new file mode 100644
index 0000000..56ba6d9
--- /dev/null
+++ b/src/runtime/pprof/pprof_test.go
@@ -0,0 +1,2337 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !js
+
+package pprof
+
+import (
+ "bytes"
+ "context"
+ "fmt"
+ "internal/abi"
+ "internal/profile"
+ "internal/syscall/unix"
+ "internal/testenv"
+ "io"
+ "math"
+ "math/big"
+ "os"
+ "os/exec"
+ "regexp"
+ "runtime"
+ "runtime/debug"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+ _ "unsafe"
+)
+
+func cpuHogger(f func(x int) int, y *int, dur time.Duration) {
+ // We only need to get one 100 Hz clock tick, so we've got
+ // a large safety buffer.
+	// But do at least 500 iterations (which should take about 100ms);
+ // otherwise TestCPUProfileMultithreaded can fail if only one
+ // thread is scheduled during the testing period.
+ t0 := time.Now()
+ accum := *y
+ for i := 0; i < 500 || time.Since(t0) < dur; i++ {
+ accum = f(accum)
+ }
+ *y = accum
+}
+
+var (
+ salt1 = 0
+ salt2 = 0
+)
+
+// The actual CPU hogging function.
+// It must not call other functions nor access heap/globals in the loop;
+// otherwise, under the race detector, the samples end up in the race runtime.
+func cpuHog1(x int) int {
+ return cpuHog0(x, 1e5)
+}
+
+func cpuHog0(x, n int) int {
+ foo := x
+ for i := 0; i < n; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 1
+ }
+ }
+ return foo
+}
+
+func cpuHog2(x int) int {
+ foo := x
+ for i := 0; i < 1e5; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 2
+ }
+ }
+ return foo
+}
+
+// avoidFunctions returns a list of functions that we never want to appear in CPU
+// profiles. For gccgo, that list includes the sigprof handler itself.
+func avoidFunctions() []string {
+ if runtime.Compiler == "gccgo" {
+ return []string{"runtime.sigprof"}
+ }
+ return nil
+}
+
+func TestCPUProfile(t *testing.T) {
+ matches := matchAndAvoidStacks(stackContains, []string{"runtime/pprof.cpuHog1"}, avoidFunctions())
+ testCPUProfile(t, matches, func(dur time.Duration) {
+ cpuHogger(cpuHog1, &salt1, dur)
+ })
+}
+
+func TestCPUProfileMultithreaded(t *testing.T) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ matches := matchAndAvoidStacks(stackContains, []string{"runtime/pprof.cpuHog1", "runtime/pprof.cpuHog2"}, avoidFunctions())
+ testCPUProfile(t, matches, func(dur time.Duration) {
+ c := make(chan int)
+ go func() {
+ cpuHogger(cpuHog1, &salt1, dur)
+ c <- 1
+ }()
+ cpuHogger(cpuHog2, &salt2, dur)
+ <-c
+ })
+}
+
+func TestCPUProfileMultithreadMagnitude(t *testing.T) {
+ if runtime.GOOS != "linux" {
+ t.Skip("issue 35057 is only confirmed on Linux")
+ }
+
+ // Linux [5.9,5.16) has a kernel bug that can break CPU timers on newly
+ // created threads, breaking our CPU accounting.
+ major, minor := unix.KernelVersion()
+ t.Logf("Running on Linux %d.%d", major, minor)
+ defer func() {
+ if t.Failed() {
+ t.Logf("Failure of this test may indicate that your system suffers from a known Linux kernel bug fixed on newer kernels. See https://golang.org/issue/49065.")
+ }
+ }()
+
+ // Disable on affected builders to avoid flakiness, but otherwise keep
+ // it enabled to potentially warn users that they are on a broken
+ // kernel.
+ if testenv.Builder() != "" && (runtime.GOARCH == "386" || runtime.GOARCH == "amd64") {
+ have59 := major > 5 || (major == 5 && minor >= 9)
+ have516 := major > 5 || (major == 5 && minor >= 16)
+ if have59 && !have516 {
+ testenv.SkipFlaky(t, 49065)
+ }
+ }
+
+ // Run a workload in a single goroutine, then run copies of the same
+ // workload in several goroutines. For both the serial and parallel cases,
+ // the CPU time the process measures with its own profiler should match the
+ // total CPU usage that the OS reports.
+ //
+ // We could also check that increases in parallelism (GOMAXPROCS) lead to a
+ // linear increase in the CPU usage reported by both the OS and the
+ // profiler, but without a guarantee of exclusive access to CPU resources
+ // that is likely to be a flaky test.
+
+ // Require the smaller value to be within 10%, or 40% in short mode.
+ maxDiff := 0.10
+ if testing.Short() {
+ maxDiff = 0.40
+ }
+
+ compare := func(a, b time.Duration, maxDiff float64) error {
+ if a <= 0 || b <= 0 {
+ return fmt.Errorf("Expected both time reports to be positive")
+ }
+
+ if a < b {
+ a, b = b, a
+ }
+
+ diff := float64(a-b) / float64(a)
+ if diff > maxDiff {
+ return fmt.Errorf("CPU usage reports are too different (limit -%.1f%%, got -%.1f%%)", maxDiff*100, diff*100)
+ }
+
+ return nil
+ }
+
+ for _, tc := range []struct {
+ name string
+ workers int
+ }{
+ {
+ name: "serial",
+ workers: 1,
+ },
+ {
+ name: "parallel",
+ workers: runtime.GOMAXPROCS(0),
+ },
+ } {
+ // check that the OS's perspective matches what the Go runtime measures.
+ t.Run(tc.name, func(t *testing.T) {
+ t.Logf("Running with %d workers", tc.workers)
+
+ var userTime, systemTime time.Duration
+ matches := matchAndAvoidStacks(stackContains, []string{"runtime/pprof.cpuHog1"}, avoidFunctions())
+ acceptProfile := func(t *testing.T, p *profile.Profile) bool {
+ if !matches(t, p) {
+ return false
+ }
+
+ ok := true
+ for i, unit := range []string{"count", "nanoseconds"} {
+ if have, want := p.SampleType[i].Unit, unit; have != want {
+ t.Logf("pN SampleType[%d]; %q != %q", i, have, want)
+ ok = false
+ }
+ }
+
+ // cpuHog1 called below is the primary source of CPU
+ // load, but there may be some background work by the
+ // runtime. Since the OS rusage measurement will
+ // include all work done by the process, also compare
+ // against all samples in our profile.
+ var value time.Duration
+ for _, sample := range p.Sample {
+ value += time.Duration(sample.Value[1]) * time.Nanosecond
+ }
+
+ totalTime := userTime + systemTime
+ t.Logf("compare %s user + %s system = %s vs %s", userTime, systemTime, totalTime, value)
+ if err := compare(totalTime, value, maxDiff); err != nil {
+ t.Logf("compare got %v want nil", err)
+ ok = false
+ }
+
+ return ok
+ }
+
+ testCPUProfile(t, acceptProfile, func(dur time.Duration) {
+ userTime, systemTime = diffCPUTime(t, func() {
+ var wg sync.WaitGroup
+ var once sync.Once
+ for i := 0; i < tc.workers; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ var salt = 0
+ cpuHogger(cpuHog1, &salt, dur)
+ once.Do(func() { salt1 = salt })
+ }()
+ }
+ wg.Wait()
+ })
+ })
+ })
+ }
+}
+
+// containsInlinedCall reports whether the function body for the function f is
+// known to contain an inlined function call within the first maxBytes bytes.
+func containsInlinedCall(f any, maxBytes int) bool {
+ _, found := findInlinedCall(f, maxBytes)
+ return found
+}
+
+// findInlinedCall returns the PC of an inlined function call within
+// the function body for the function f if any.
+func findInlinedCall(f any, maxBytes int) (pc uint64, found bool) {
+ fFunc := runtime.FuncForPC(uintptr(abi.FuncPCABIInternal(f)))
+ if fFunc == nil || fFunc.Entry() == 0 {
+ panic("failed to locate function entry")
+ }
+
+ for offset := 0; offset < maxBytes; offset++ {
+ innerPC := fFunc.Entry() + uintptr(offset)
+ inner := runtime.FuncForPC(innerPC)
+ if inner == nil {
+ // No function known for this PC value.
+ // It might simply be misaligned, so keep searching.
+ continue
+ }
+ if inner.Entry() != fFunc.Entry() {
+ // Scanned past f and didn't find any inlined functions.
+ break
+ }
+ if inner.Name() != fFunc.Name() {
+ // This PC has f as its entry-point, but is not f. Therefore, it must be a
+ // function inlined into f.
+ return uint64(innerPC), true
+ }
+ }
+
+ return 0, false
+}
+
+func TestCPUProfileInlining(t *testing.T) {
+ if !containsInlinedCall(inlinedCaller, 4<<10) {
+ t.Skip("Can't determine whether inlinedCallee was inlined into inlinedCaller.")
+ }
+
+ matches := matchAndAvoidStacks(stackContains, []string{"runtime/pprof.inlinedCallee", "runtime/pprof.inlinedCaller"}, avoidFunctions())
+ p := testCPUProfile(t, matches, func(dur time.Duration) {
+ cpuHogger(inlinedCaller, &salt1, dur)
+ })
+
+	// Check that inlined function locations are encoded correctly. The inlinedCallee and inlinedCaller should be in one Location.
+ for _, loc := range p.Location {
+ hasInlinedCallerAfterInlinedCallee, hasInlinedCallee := false, false
+ for _, line := range loc.Line {
+ if line.Function.Name == "runtime/pprof.inlinedCallee" {
+ hasInlinedCallee = true
+ }
+ if hasInlinedCallee && line.Function.Name == "runtime/pprof.inlinedCaller" {
+ hasInlinedCallerAfterInlinedCallee = true
+ }
+ }
+ if hasInlinedCallee != hasInlinedCallerAfterInlinedCallee {
+ t.Fatalf("want inlinedCallee followed by inlinedCaller, got separate Location entries:\n%v", p)
+ }
+ }
+}
+
+func inlinedCaller(x int) int {
+ x = inlinedCallee(x, 1e5)
+ return x
+}
+
+func inlinedCallee(x, n int) int {
+ return cpuHog0(x, n)
+}
+
+//go:noinline
+func dumpCallers(pcs []uintptr) {
+ if pcs == nil {
+ return
+ }
+
+ skip := 2 // Callers and dumpCallers
+ runtime.Callers(skip, pcs)
+}
+
+//go:noinline
+func inlinedCallerDump(pcs []uintptr) {
+ inlinedCalleeDump(pcs)
+}
+
+func inlinedCalleeDump(pcs []uintptr) {
+ dumpCallers(pcs)
+}
+
+type inlineWrapperInterface interface {
+ dump(stack []uintptr)
+}
+
+type inlineWrapper struct {
+}
+
+func (h inlineWrapper) dump(pcs []uintptr) {
+ dumpCallers(pcs)
+}
+
+func inlinedWrapperCallerDump(pcs []uintptr) {
+ var h inlineWrapperInterface
+ h = &inlineWrapper{}
+ h.dump(pcs)
+}
+
+func TestCPUProfileRecursion(t *testing.T) {
+ matches := matchAndAvoidStacks(stackContains, []string{"runtime/pprof.inlinedCallee", "runtime/pprof.recursionCallee", "runtime/pprof.recursionCaller"}, avoidFunctions())
+ p := testCPUProfile(t, matches, func(dur time.Duration) {
+ cpuHogger(recursionCaller, &salt1, dur)
+ })
+
+ // check the Location encoding was not confused by recursive calls.
+ for i, loc := range p.Location {
+ recursionFunc := 0
+ for _, line := range loc.Line {
+ if name := line.Function.Name; name == "runtime/pprof.recursionCaller" || name == "runtime/pprof.recursionCallee" {
+ recursionFunc++
+ }
+ }
+ if recursionFunc > 1 {
+ t.Fatalf("want at most one recursionCaller or recursionCallee in one Location, got a violating Location (index: %d):\n%v", i, p)
+ }
+ }
+}
+
+func recursionCaller(x int) int {
+ y := recursionCallee(3, x)
+ return y
+}
+
+func recursionCallee(n, x int) int {
+ if n == 0 {
+ return 1
+ }
+ y := inlinedCallee(x, 1e4)
+ return y * recursionCallee(n-1, x)
+}
+
+func recursionChainTop(x int, pcs []uintptr) {
+ if x < 0 {
+ return
+ }
+ recursionChainMiddle(x, pcs)
+}
+
+func recursionChainMiddle(x int, pcs []uintptr) {
+ recursionChainBottom(x, pcs)
+}
+
+func recursionChainBottom(x int, pcs []uintptr) {
+	// This will be called each time; we only care about the last call. We
+ // can't make this conditional or this function won't be inlined.
+ dumpCallers(pcs)
+
+ recursionChainTop(x-1, pcs)
+}
+
+func parseProfile(t *testing.T, valBytes []byte, f func(uintptr, []*profile.Location, map[string][]string)) *profile.Profile {
+ p, err := profile.Parse(bytes.NewReader(valBytes))
+ if err != nil {
+ t.Fatal(err)
+ }
+ for _, sample := range p.Sample {
+ count := uintptr(sample.Value[0])
+ f(count, sample.Location, sample.Label)
+ }
+ return p
+}
+
+func cpuProfilingBroken() bool {
+ switch runtime.GOOS {
+ case "plan9":
+ // Profiling unimplemented.
+ return true
+ case "aix":
+ // See https://golang.org/issue/45170.
+ return true
+ case "ios", "dragonfly", "netbsd", "illumos", "solaris":
+ // See https://golang.org/issue/13841.
+ return true
+ case "openbsd":
+ if runtime.GOARCH == "arm" || runtime.GOARCH == "arm64" {
+ // See https://golang.org/issue/13841.
+ return true
+ }
+ }
+
+ return false
+}
+
+// testCPUProfile runs f under the CPU profiler, checking for some conditions specified by need,
+// as interpreted by matches, and returns the parsed profile.
+func testCPUProfile(t *testing.T, matches profileMatchFunc, f func(dur time.Duration)) *profile.Profile {
+ switch runtime.GOOS {
+ case "darwin":
+ out, err := exec.Command("uname", "-a").CombinedOutput()
+ if err != nil {
+ t.Fatal(err)
+ }
+ vers := string(out)
+ t.Logf("uname -a: %v", vers)
+ case "plan9":
+ t.Skip("skipping on plan9")
+ case "wasip1":
+ t.Skip("skipping on wasip1")
+ }
+
+ broken := cpuProfilingBroken()
+
+ deadline, ok := t.Deadline()
+ if broken || !ok {
+ if broken && testing.Short() {
+ // If it's expected to be broken, no point waiting around.
+ deadline = time.Now().Add(1 * time.Second)
+ } else {
+ deadline = time.Now().Add(10 * time.Second)
+ }
+ }
+
+ // If we're running a long test, start with a long duration
+ // for tests that try to make sure something *doesn't* happen.
+ duration := 5 * time.Second
+ if testing.Short() {
+ duration = 100 * time.Millisecond
+ }
+
+ // Profiling tests are inherently flaky, especially on a
+ // loaded system, such as when this test is running with
+ // several others under go test std. If a test fails in a way
+ // that could mean it just didn't run long enough, try with a
+ // longer duration.
+ for {
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ f(duration)
+ StopCPUProfile()
+
+ if p, ok := profileOk(t, matches, prof, duration); ok {
+ return p
+ }
+
+ duration *= 2
+ if time.Until(deadline) < duration {
+ break
+ }
+ t.Logf("retrying with %s duration", duration)
+ }
+
+ if broken {
+ t.Skipf("ignoring failure on %s/%s; see golang.org/issue/13841", runtime.GOOS, runtime.GOARCH)
+ }
+
+	// Ignore the failure if the tests are running in a QEMU-based emulator;
+	// QEMU is not perfect at emulating everything.
+	// The IN_QEMU environment variable is set by some of the Go builders.
+ // IN_QEMU=1 indicates that the tests are running in QEMU. See issue 9605.
+ if os.Getenv("IN_QEMU") == "1" {
+ t.Skip("ignore the failure in QEMU; see golang.org/issue/9605")
+ }
+ t.FailNow()
+ return nil
+}
+
+var diffCPUTimeImpl func(f func()) (user, system time.Duration)
+
+func diffCPUTime(t *testing.T, f func()) (user, system time.Duration) {
+ if fn := diffCPUTimeImpl; fn != nil {
+ return fn(f)
+ }
+ t.Fatalf("cannot measure CPU time on GOOS=%s GOARCH=%s", runtime.GOOS, runtime.GOARCH)
+ return 0, 0
+}
+
+func contains(slice []string, s string) bool {
+ for i := range slice {
+ if slice[i] == s {
+ return true
+ }
+ }
+ return false
+}
+
+// stackContains matches if a function named spec appears anywhere in the stack trace.
+func stackContains(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ for _, loc := range stk {
+ for _, line := range loc.Line {
+ if strings.Contains(line.Function.Name, spec) {
+ return true
+ }
+ }
+ }
+ return false
+}
+
+type sampleMatchFunc func(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool
+
+func profileOk(t *testing.T, matches profileMatchFunc, prof bytes.Buffer, duration time.Duration) (_ *profile.Profile, ok bool) {
+ ok = true
+
+ var samples uintptr
+ var buf strings.Builder
+ p := parseProfile(t, prof.Bytes(), func(count uintptr, stk []*profile.Location, labels map[string][]string) {
+ fmt.Fprintf(&buf, "%d:", count)
+ fprintStack(&buf, stk)
+ fmt.Fprintf(&buf, " labels: %v\n", labels)
+ samples += count
+ fmt.Fprintf(&buf, "\n")
+ })
+ t.Logf("total %d CPU profile samples collected:\n%s", samples, buf.String())
+
+ if samples < 10 && runtime.GOOS == "windows" {
+ // On some windows machines we end up with
+ // not enough samples due to coarse timer
+ // resolution. Let it go.
+ t.Log("too few samples on Windows (golang.org/issue/10842)")
+ return p, false
+ }
+
+ // Check that we got a reasonable number of samples.
+ // We used to always require at least ideal/4 samples,
+ // but that is too hard to guarantee on a loaded system.
+ // Now we accept 10 or more samples, which we take to be
+ // enough to show that at least some profiling is occurring.
+ if ideal := uintptr(duration * 100 / time.Second); samples == 0 || (samples < ideal/4 && samples < 10) {
+ t.Logf("too few samples; got %d, want at least %d, ideally %d", samples, ideal/4, ideal)
+ ok = false
+ }
+
+ if matches != nil && !matches(t, p) {
+ ok = false
+ }
+
+ return p, ok
+}
+
+type profileMatchFunc func(*testing.T, *profile.Profile) bool
+
+func matchAndAvoidStacks(matches sampleMatchFunc, need []string, avoid []string) profileMatchFunc {
+ return func(t *testing.T, p *profile.Profile) (ok bool) {
+ ok = true
+
+ // Check that profile is well formed, contains 'need', and does not contain
+ // anything from 'avoid'.
+ have := make([]uintptr, len(need))
+ avoidSamples := make([]uintptr, len(avoid))
+
+ for _, sample := range p.Sample {
+ count := uintptr(sample.Value[0])
+ for i, spec := range need {
+ if matches(spec, count, sample.Location, sample.Label) {
+ have[i] += count
+ }
+ }
+ for i, name := range avoid {
+ for _, loc := range sample.Location {
+ for _, line := range loc.Line {
+ if strings.Contains(line.Function.Name, name) {
+ avoidSamples[i] += count
+ }
+ }
+ }
+ }
+ }
+
+ for i, name := range avoid {
+ bad := avoidSamples[i]
+ if bad != 0 {
+ t.Logf("found %d samples in avoid-function %s\n", bad, name)
+ ok = false
+ }
+ }
+
+ if len(need) == 0 {
+ return
+ }
+
+ var total uintptr
+ for i, name := range need {
+ total += have[i]
+ t.Logf("found %d samples in expected function %s\n", have[i], name)
+ }
+ if total == 0 {
+ t.Logf("no samples in expected functions")
+ ok = false
+ }
+
+ // We'd like to check a reasonable minimum, like
+ // total / len(have) / smallconstant, but this test is
+ // pretty flaky (see bug 7095). So we'll just test to
+ // make sure we got at least one sample.
+ min := uintptr(1)
+ for i, name := range need {
+ if have[i] < min {
+ t.Logf("%s has %d samples out of %d, want at least %d, ideally %d", name, have[i], total, min, total/uintptr(len(have)))
+ ok = false
+ }
+ }
+ return
+ }
+}
+
+// Fork can hang if preempted with signals frequently enough (see issue 5517).
+// Ensure that we do not do this.
+func TestCPUProfileWithFork(t *testing.T) {
+ testenv.MustHaveExec(t)
+
+ heap := 1 << 30
+ if runtime.GOOS == "android" {
+ // Use smaller size for Android to avoid crash.
+ heap = 100 << 20
+ }
+ if runtime.GOOS == "windows" && runtime.GOARCH == "arm" {
+ // Use smaller heap for Windows/ARM to avoid crash.
+ heap = 100 << 20
+ }
+ if testing.Short() {
+ heap = 100 << 20
+ }
+ // This makes fork slower.
+ garbage := make([]byte, heap)
+ // Need to touch the slice, otherwise it won't be paged in.
+ done := make(chan bool)
+ go func() {
+ for i := range garbage {
+ garbage[i] = 42
+ }
+ done <- true
+ }()
+ <-done
+
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ for i := 0; i < 10; i++ {
+ exec.Command(os.Args[0], "-h").CombinedOutput()
+ }
+}
+
+// Test that profiler does not observe runtime.gogo as "user" goroutine execution.
+// If it did, it would see inconsistent state and would either record an incorrect stack
+// or crash because the stack was malformed.
+func TestGoroutineSwitch(t *testing.T) {
+ if runtime.Compiler == "gccgo" {
+ t.Skip("not applicable for gccgo")
+ }
+	// How much to try. These defaults take about 1 second
+ // on a 2012 MacBook Pro. The ones in short mode take
+ // about 0.1 seconds.
+ tries := 10
+ count := 1000000
+ if testing.Short() {
+ tries = 1
+ }
+ for try := 0; try < tries; try++ {
+ var prof bytes.Buffer
+ if err := StartCPUProfile(&prof); err != nil {
+ t.Fatal(err)
+ }
+ for i := 0; i < count; i++ {
+ runtime.Gosched()
+ }
+ StopCPUProfile()
+
+ // Read profile to look for entries for gogo with an attempt at a traceback.
+ // "runtime.gogo" is OK, because that's the part of the context switch
+ // before the actual switch begins. But we should not see "gogo",
+ // aka "gogo<>(SB)", which does the actual switch and is marked SPWRITE.
+ parseProfile(t, prof.Bytes(), func(count uintptr, stk []*profile.Location, _ map[string][]string) {
+ // An entry with two frames with 'System' in its top frame
+ // exists to record a PC without a traceback. Those are okay.
+ if len(stk) == 2 {
+ name := stk[1].Line[0].Function.Name
+ if name == "runtime._System" || name == "runtime._ExternalCode" || name == "runtime._GC" {
+ return
+ }
+ }
+
+ // An entry with just one frame is OK too:
+ // it knew to stop at gogo.
+ if len(stk) == 1 {
+ return
+ }
+
+ // Otherwise, should not see gogo.
+ // The place we'd see it would be the inner most frame.
+ name := stk[0].Line[0].Function.Name
+ if name == "gogo" {
+ var buf strings.Builder
+ fprintStack(&buf, stk)
+ t.Fatalf("found profile entry for gogo:\n%s", buf.String())
+ }
+ })
+ }
+}
+
+func fprintStack(w io.Writer, stk []*profile.Location) {
+ if len(stk) == 0 {
+ fmt.Fprintf(w, " (stack empty)")
+ }
+ for _, loc := range stk {
+ fmt.Fprintf(w, " %#x", loc.Address)
+ fmt.Fprintf(w, " (")
+ for i, line := range loc.Line {
+ if i > 0 {
+ fmt.Fprintf(w, " ")
+ }
+ fmt.Fprintf(w, "%s:%d", line.Function.Name, line.Line)
+ }
+ fmt.Fprintf(w, ")")
+ }
+}
+
+// Test that profiling of division operations is okay, especially on ARM. See issue 6681.
+func TestMathBigDivide(t *testing.T) {
+ testCPUProfile(t, nil, func(duration time.Duration) {
+ t := time.After(duration)
+ pi := new(big.Int)
+ for {
+ for i := 0; i < 100; i++ {
+ n := big.NewInt(2646693125139304345)
+ d := big.NewInt(842468587426513207)
+ pi.Div(n, d)
+ }
+ select {
+ case <-t:
+ return
+ default:
+ }
+ }
+ })
+}
+
+// stackContainsAll matches if all functions in spec (comma-separated) appear somewhere in the stack trace.
+func stackContainsAll(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ for _, f := range strings.Split(spec, ",") {
+ if !stackContains(f, count, stk, labels) {
+ return false
+ }
+ }
+ return true
+}
+
+func TestMorestack(t *testing.T) {
+ matches := matchAndAvoidStacks(stackContainsAll, []string{"runtime.newstack,runtime/pprof.growstack"}, avoidFunctions())
+ testCPUProfile(t, matches, func(duration time.Duration) {
+ t := time.After(duration)
+ c := make(chan bool)
+ for {
+ go func() {
+ growstack1()
+ c <- true
+ }()
+ select {
+ case <-t:
+ return
+ case <-c:
+ }
+ }
+ })
+}
+
+//go:noinline
+func growstack1() {
+ growstack(10)
+}
+
+//go:noinline
+func growstack(n int) {
+ var buf [8 << 18]byte
+ use(buf)
+ if n > 0 {
+ growstack(n - 1)
+ }
+}
+
+//go:noinline
+func use(x [8 << 18]byte) {}
+
+func TestBlockProfile(t *testing.T) {
+ type TestCase struct {
+ name string
+ f func(*testing.T)
+ stk []string
+ re string
+ }
+ tests := [...]TestCase{
+ {
+ name: "chan recv",
+ f: blockChanRecv,
+ stk: []string{
+ "runtime.chanrecv1",
+ "runtime/pprof.blockChanRecv",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chanrecv1\+0x[0-9a-f]+ .*runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanRecv\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "chan send",
+ f: blockChanSend,
+ stk: []string{
+ "runtime.chansend1",
+ "runtime/pprof.blockChanSend",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chansend1\+0x[0-9a-f]+ .*runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanSend\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "chan close",
+ f: blockChanClose,
+ stk: []string{
+ "runtime.chanrecv1",
+ "runtime/pprof.blockChanClose",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.chanrecv1\+0x[0-9a-f]+ .*runtime/chan.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockChanClose\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "select recv async",
+ f: blockSelectRecvAsync,
+ stk: []string{
+ "runtime.selectgo",
+ "runtime/pprof.blockSelectRecvAsync",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.selectgo\+0x[0-9a-f]+ .*runtime/select.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockSelectRecvAsync\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "select send sync",
+ f: blockSelectSendSync,
+ stk: []string{
+ "runtime.selectgo",
+ "runtime/pprof.blockSelectSendSync",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ runtime\.selectgo\+0x[0-9a-f]+ .*runtime/select.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockSelectSendSync\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "mutex",
+ f: blockMutex,
+ stk: []string{
+ "sync.(*Mutex).Lock",
+ "runtime/pprof.blockMutex",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ sync\.\(\*Mutex\)\.Lock\+0x[0-9a-f]+ .*sync/mutex\.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockMutex\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ {
+ name: "cond",
+ f: blockCond,
+ stk: []string{
+ "sync.(*Cond).Wait",
+ "runtime/pprof.blockCond",
+ "runtime/pprof.TestBlockProfile",
+ },
+ re: `
+[0-9]+ [0-9]+ @( 0x[[:xdigit:]]+)+
+# 0x[0-9a-f]+ sync\.\(\*Cond\)\.Wait\+0x[0-9a-f]+ .*sync/cond\.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.blockCond\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+# 0x[0-9a-f]+ runtime/pprof\.TestBlockProfile\+0x[0-9a-f]+ .*runtime/pprof/pprof_test.go:[0-9]+
+`},
+ }
+
+ // Generate block profile
+ runtime.SetBlockProfileRate(1)
+ defer runtime.SetBlockProfileRate(0)
+ for _, test := range tests {
+ test.f(t)
+ }
+
+ t.Run("debug=1", func(t *testing.T) {
+ var w strings.Builder
+ Lookup("block").WriteTo(&w, 1)
+ prof := w.String()
+
+ if !strings.HasPrefix(prof, "--- contention:\ncycles/second=") {
+ t.Fatalf("Bad profile header:\n%v", prof)
+ }
+
+ if strings.HasSuffix(prof, "#\t0x0\n\n") {
+ t.Errorf("Useless 0 suffix:\n%v", prof)
+ }
+
+ for _, test := range tests {
+ if !regexp.MustCompile(strings.ReplaceAll(test.re, "\t", "\t+")).MatchString(prof) {
+ t.Errorf("Bad %v entry, expect:\n%v\ngot:\n%v", test.name, test.re, prof)
+ }
+ }
+ })
+
+ t.Run("proto", func(t *testing.T) {
+ // proto format
+ var w bytes.Buffer
+ Lookup("block").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+ if err := p.CheckValid(); err != nil {
+ t.Fatalf("invalid profile: %v", err)
+ }
+
+ stks := stacks(p)
+ for _, test := range tests {
+ if !containsStack(stks, test.stk) {
+ t.Errorf("No matching stack entry for %v, want %+v", test.name, test.stk)
+ }
+ }
+ })
+
+}
+
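+// stacks returns the stack of each sample in p as a list of function names,
+// including inlined frames.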
+func stacks(p *profile.Profile) (res [][]string) {
+ for _, s := range p.Sample {
+ var stk []string
+ for _, l := range s.Location {
+ for _, line := range l.Line {
+ stk = append(stk, line.Function.Name)
+ }
+ }
+ res = append(res, stk)
+ }
+ return res
+}
+
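+// containsStack reports whether want is a leaf-side prefix of any of the
+// stacks in got.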
+func containsStack(got [][]string, want []string) bool {
+ for _, stk := range got {
+ if len(stk) < len(want) {
+ continue
+ }
+ for i, f := range want {
+ if f != stk[i] {
+ break
+ }
+ if i == len(want)-1 {
+ return true
+ }
+ }
+ }
+ return false
+}
+
+// awaitBlockedGoroutine spins on runtime.Gosched until a runtime stack dump
+// shows a goroutine in the given state with a stack frame in
+// runtime/pprof.<fName>.
+func awaitBlockedGoroutine(t *testing.T, state, fName string) {
+ re := fmt.Sprintf(`(?m)^goroutine \d+ \[%s\]:\n(?:.+\n\t.+\n)*runtime/pprof\.%s`, regexp.QuoteMeta(state), fName)
+ r := regexp.MustCompile(re)
+
+ if deadline, ok := t.Deadline(); ok {
+ if d := time.Until(deadline); d > 1*time.Second {
+ timer := time.AfterFunc(d-1*time.Second, func() {
+ debug.SetTraceback("all")
+ panic(fmt.Sprintf("timed out waiting for %#q", re))
+ })
+ defer timer.Stop()
+ }
+ }
+
+ buf := make([]byte, 64<<10)
+ for {
+ runtime.Gosched()
+ n := runtime.Stack(buf, true)
+ if n == len(buf) {
+ // Buffer wasn't large enough for a full goroutine dump.
+ // Resize it and try again.
+ buf = make([]byte, 2*len(buf))
+ continue
+ }
+ if r.Match(buf[:n]) {
+ return
+ }
+ }
+}
+
+func blockChanRecv(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ awaitBlockedGoroutine(t, "chan receive", "blockChanRecv")
+ c <- true
+ }()
+ <-c
+}
+
+func blockChanSend(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ awaitBlockedGoroutine(t, "chan send", "blockChanSend")
+ <-c
+ }()
+ c <- true
+}
+
+func blockChanClose(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ awaitBlockedGoroutine(t, "chan receive", "blockChanClose")
+ close(c)
+ }()
+ <-c
+}
+
+func blockSelectRecvAsync(t *testing.T) {
+ const numTries = 3
+ c := make(chan bool, 1)
+ c2 := make(chan bool, 1)
+ go func() {
+ for i := 0; i < numTries; i++ {
+ awaitBlockedGoroutine(t, "select", "blockSelectRecvAsync")
+ c <- true
+ }
+ }()
+ for i := 0; i < numTries; i++ {
+ select {
+ case <-c:
+ case <-c2:
+ }
+ }
+}
+
+func blockSelectSendSync(t *testing.T) {
+ c := make(chan bool)
+ c2 := make(chan bool)
+ go func() {
+ awaitBlockedGoroutine(t, "select", "blockSelectSendSync")
+ <-c
+ }()
+ select {
+ case c <- true:
+ case c2 <- true:
+ }
+}
+
+func blockMutex(t *testing.T) {
+ var mu sync.Mutex
+ mu.Lock()
+ go func() {
+ awaitBlockedGoroutine(t, "sync.Mutex.Lock", "blockMutex")
+ mu.Unlock()
+ }()
+ // Note: Unlock releases mu before recording the mutex event,
+ // so it's theoretically possible for this to proceed and
+ // capture the profile before the event is recorded. As long
+ // as this is blocked before the unlock happens, it's okay.
+ mu.Lock()
+}
+
+func blockCond(t *testing.T) {
+ var mu sync.Mutex
+ c := sync.NewCond(&mu)
+ mu.Lock()
+ go func() {
+ awaitBlockedGoroutine(t, "sync.Cond.Wait", "blockCond")
+ mu.Lock()
+ c.Signal()
+ mu.Unlock()
+ }()
+ c.Wait()
+ mu.Unlock()
+}
+
+// See http://golang.org/cl/299991.
+func TestBlockProfileBias(t *testing.T) {
+ rate := int(1000) // arbitrary value
+ runtime.SetBlockProfileRate(rate)
+ defer runtime.SetBlockProfileRate(0)
+
+ // simulate blocking events
+ blockFrequentShort(rate)
+ blockInfrequentLong(rate)
+
+ var w bytes.Buffer
+ Lookup("block").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+
+ il := float64(-1) // blockInfrequentLong duration
+ fs := float64(-1) // blockFrequentShort duration
+ for _, s := range p.Sample {
+ for _, l := range s.Location {
+ for _, line := range l.Line {
+ if len(s.Value) < 2 {
+ t.Fatal("block profile has less than 2 sample types")
+ }
+
+ if line.Function.Name == "runtime/pprof.blockInfrequentLong" {
+ il = float64(s.Value[1])
+ } else if line.Function.Name == "runtime/pprof.blockFrequentShort" {
+ fs = float64(s.Value[1])
+ }
+ }
+ }
+ }
+ if il == -1 || fs == -1 {
+ t.Fatal("block profile is missing expected functions")
+ }
+
+ // stddev of the bias from 100 runs on a local machine, multiplied by 10
+ const threshold = 0.2
+ if bias := (il - fs) / il; math.Abs(bias) > threshold {
+ t.Fatalf("bias: abs(%f) > %f", bias, threshold)
+ } else {
+ t.Logf("bias: abs(%f) < %f", bias, threshold)
+ }
+}
+
+// blockFrequentShort produces 100000 block events with an average duration of
+// rate / 10.
+func blockFrequentShort(rate int) {
+ for i := 0; i < 100000; i++ {
+ blockevent(int64(rate/10), 1)
+ }
+}
+
+// blockInfrequentLong produces 10000 block events with an average duration of
+// rate.
+func blockInfrequentLong(rate int) {
+ for i := 0; i < 10000; i++ {
+ blockevent(int64(rate), 1)
+ }
+}
+
+// Used by TestBlockProfileBias.
+//
+//go:linkname blockevent runtime.blockevent
+func blockevent(cycles int64, skip int)
+
+func TestMutexProfile(t *testing.T) {
+ // Generate mutex profile
+
+ old := runtime.SetMutexProfileFraction(1)
+ defer runtime.SetMutexProfileFraction(old)
+ if old != 0 {
+ t.Fatalf("need MutexProfileRate 0, got %d", old)
+ }
+
+ blockMutex(t)
+
+ t.Run("debug=1", func(t *testing.T) {
+ var w strings.Builder
+ Lookup("mutex").WriteTo(&w, 1)
+ prof := w.String()
+ t.Logf("received profile: %v", prof)
+
+ if !strings.HasPrefix(prof, "--- mutex:\ncycles/second=") {
+ t.Errorf("Bad profile header:\n%v", prof)
+ }
+ prof = strings.Trim(prof, "\n")
+ lines := strings.Split(prof, "\n")
+ if len(lines) != 6 {
+ t.Errorf("expected 6 lines, got %d %q\n%s", len(lines), prof, prof)
+ }
+ if len(lines) < 6 {
+ return
+ }
+ // checking that the line is like "35258904 1 @ 0x48288d 0x47cd28 0x458931"
+ r2 := `^\d+ \d+ @(?: 0x[[:xdigit:]]+)+`
+ //r2 := "^[0-9]+ 1 @ 0x[0-9a-f x]+$"
+ if ok, err := regexp.MatchString(r2, lines[3]); err != nil || !ok {
+ t.Errorf("%q didn't match %q", lines[3], r2)
+ }
+ r3 := "^#.*runtime/pprof.blockMutex.*$"
+ if ok, err := regexp.MatchString(r3, lines[5]); err != nil || !ok {
+ t.Errorf("%q didn't match %q", lines[5], r3)
+ }
+ t.Log(prof)
+ })
+ t.Run("proto", func(t *testing.T) {
+ // proto format
+ var w bytes.Buffer
+ Lookup("mutex").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+ if err := p.CheckValid(); err != nil {
+ t.Fatalf("invalid profile: %v", err)
+ }
+
+ stks := stacks(p)
+ for _, want := range [][]string{
+ {"sync.(*Mutex).Unlock", "runtime/pprof.blockMutex.func1"},
+ } {
+ if !containsStack(stks, want) {
+ t.Errorf("No matching stack entry for %+v", want)
+ }
+ }
+ })
+}
+
+func TestMutexProfileRateAdjust(t *testing.T) {
+ old := runtime.SetMutexProfileFraction(1)
+ defer runtime.SetMutexProfileFraction(old)
+ if old != 0 {
+ t.Fatalf("need MutexProfileRate 0, got %d", old)
+ }
+
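+ // readProfile parses the current mutex profile and sums the contentions
+ // and delay attributed to blockMutex.func1.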
+ readProfile := func() (contentions int64, delay int64) {
+ var w bytes.Buffer
+ Lookup("mutex").WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Fatalf("failed to parse profile: %v", err)
+ }
+ t.Logf("parsed proto: %s", p)
+ if err := p.CheckValid(); err != nil {
+ t.Fatalf("invalid profile: %v", err)
+ }
+
+ for _, s := range p.Sample {
+ for _, l := range s.Location {
+ for _, line := range l.Line {
+ if line.Function.Name == "runtime/pprof.blockMutex.func1" {
+ contentions += s.Value[0]
+ delay += s.Value[1]
+ }
+ }
+ }
+ }
+ return
+ }
+
+ blockMutex(t)
+ contentions, delay := readProfile()
+ if contentions == 0 || delay == 0 {
+ t.Fatal("did not see expected function in profile")
+ }
+ runtime.SetMutexProfileFraction(0)
+ newContentions, newDelay := readProfile()
+ if newContentions != contentions || newDelay != delay {
+ t.Fatalf("sample value changed: got [%d, %d], want [%d, %d]", newContentions, newDelay, contentions, delay)
+ }
+}
+
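+// func1 through func4 are distinct functions that block receiving from c, so
+// goroutine profiles get distinguishable stacks.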
+func func1(c chan int) { <-c }
+func func2(c chan int) { <-c }
+func func3(c chan int) { <-c }
+func func4(c chan int) { <-c }
+
+func TestGoroutineCounts(t *testing.T) {
+ // Setting GOMAXPROCS to 1 ensures we can force all goroutines to the
+ // desired blocking point.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ c := make(chan int)
+ for i := 0; i < 100; i++ {
+ switch {
+ case i%10 == 0:
+ go func1(c)
+ case i%2 == 0:
+ go func2(c)
+ default:
+ go func3(c)
+ }
+ // Let goroutines block on channel
+ for j := 0; j < 5; j++ {
+ runtime.Gosched()
+ }
+ }
+ ctx := context.Background()
+
+ // ... and again, with labels this time (just with fewer iterations to keep
+ // sorting deterministic).
+ Do(ctx, Labels("label", "value"), func(context.Context) {
+ for i := 0; i < 89; i++ {
+ switch {
+ case i%10 == 0:
+ go func1(c)
+ case i%2 == 0:
+ go func2(c)
+ default:
+ go func3(c)
+ }
+ // Let goroutines block on channel
+ for j := 0; j < 5; j++ {
+ runtime.Gosched()
+ }
+ }
+ })
+
+ var w bytes.Buffer
+ goroutineProf := Lookup("goroutine")
+
+ // Check debug profile
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+
+ labels := labelMap{"label": "value"}
+ labelStr := "\n# labels: " + labels.String()
+ if !containsInOrder(prof, "\n50 @ ", "\n44 @", labelStr,
+ "\n40 @", "\n36 @", labelStr, "\n10 @", "\n9 @", labelStr, "\n1 @") {
+ t.Errorf("expected sorted goroutine counts with Labels:\n%s", prof)
+ }
+
+ // Check proto profile
+ w.Reset()
+ goroutineProf.WriteTo(&w, 0)
+ p, err := profile.Parse(&w)
+ if err != nil {
+ t.Errorf("error parsing protobuf profile: %v", err)
+ }
+ if err := p.CheckValid(); err != nil {
+ t.Errorf("protobuf profile is invalid: %v", err)
+ }
+ expectedLabels := map[int64]map[string]string{
+ 50: {},
+ 44: {"label": "value"},
+ 40: {},
+ 36: {"label": "value"},
+ 10: {},
+ 9: {"label": "value"},
+ 1: {},
+ }
+ if !containsCountsLabels(p, expectedLabels) {
+ t.Errorf("expected count profile to contain goroutines with counts and labels %v, got %v",
+ expectedLabels, p)
+ }
+
+ close(c)
+
+ time.Sleep(10 * time.Millisecond) // let goroutines exit
+}
+
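+// containsInOrder reports whether all of the given substrings appear in s in
+// order, with each match beginning after the end of the previous one.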
+func containsInOrder(s string, all ...string) bool {
+ for _, t := range all {
+ var ok bool
+ if _, s, ok = strings.Cut(s, t); !ok {
+ return false
+ }
+ }
+ return true
+}
+
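+// containsCountsLabels reports whether prof's samples match the expected
+// goroutine counts, and the labels associated with each count, in countLabels.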
+func containsCountsLabels(prof *profile.Profile, countLabels map[int64]map[string]string) bool {
+ m := make(map[int64]int)
+ type nkey struct {
+ count int64
+ key, val string
+ }
+ n := make(map[nkey]int)
+ for c, kv := range countLabels {
+ m[c]++
+ for k, v := range kv {
+ n[nkey{
+ count: c,
+ key: k,
+ val: v,
+ }]++
+
+ }
+ }
+ for _, s := range prof.Sample {
+ // The count is the single value in the sample
+ if len(s.Value) != 1 {
+ return false
+ }
+ m[s.Value[0]]--
+ for k, vs := range s.Label {
+ for _, v := range vs {
+ n[nkey{
+ count: s.Value[0],
+ key: k,
+ val: v,
+ }]--
+ }
+ }
+ }
+ for _, n := range m {
+ if n > 0 {
+ return false
+ }
+ }
+ for _, ncnt := range n {
+ if ncnt != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+func TestGoroutineProfileConcurrency(t *testing.T) {
+ testenv.MustHaveParallelism(t)
+
+ goroutineProf := Lookup("goroutine")
+
+ profilerCalls := func(s string) int {
+ return strings.Count(s, "\truntime/pprof.runtime_goroutineProfileWithLabels+")
+ }
+
+ includesFinalizer := func(s string) bool {
+ return strings.Contains(s, "runtime.runfinq")
+ }
+
+ // Concurrent calls to the goroutine profiler should not trigger data races
+ // or corruption.
+ t.Run("overlapping profile requests", func(t *testing.T) {
+ ctx := context.Background()
+ ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
+ defer cancel()
+
+ var wg sync.WaitGroup
+ for i := 0; i < 2; i++ {
+ wg.Add(1)
+ Do(ctx, Labels("i", fmt.Sprint(i)), func(context.Context) {
+ go func() {
+ defer wg.Done()
+ for ctx.Err() == nil {
+ var w strings.Builder
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+ count := profilerCalls(prof)
+ if count >= 2 {
+ t.Logf("prof %d\n%s", count, prof)
+ cancel()
+ }
+ }
+ }()
+ })
+ }
+ wg.Wait()
+ })
+
+ // The finalizer goroutine should not show up in most profiles, since it's
+ // marked as a system goroutine when idle.
+ t.Run("finalizer not present", func(t *testing.T) {
+ var w strings.Builder
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+ if includesFinalizer(prof) {
+ t.Errorf("profile includes finalizer (but finalizer should be marked as system):\n%s", prof)
+ }
+ })
+
+ // The finalizer goroutine should show up when it's running user code.
+ t.Run("finalizer present", func(t *testing.T) {
+ obj := new(byte)
+ ch1, ch2 := make(chan int), make(chan int)
+ defer close(ch2)
+ runtime.SetFinalizer(obj, func(_ interface{}) {
+ close(ch1)
+ <-ch2
+ })
+ obj = nil
+ for i := 10; i >= 0; i-- {
+ select {
+ case <-ch1:
+ default:
+ if i == 0 {
+ t.Fatalf("finalizer did not run")
+ }
+ runtime.GC()
+ }
+ }
+ var w strings.Builder
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+ if !includesFinalizer(prof) {
+ t.Errorf("profile does not include finalizer (and it should be marked as user):\n%s", prof)
+ }
+ })
+
+ // Check that new goroutines only show up in order.
+ testLaunches := func(t *testing.T) {
+ var done sync.WaitGroup
+ defer done.Wait()
+
+ ctx := context.Background()
+ ctx, cancel := context.WithCancel(ctx)
+ defer cancel()
+
+ ch := make(chan int)
+ defer close(ch)
+
+ var ready sync.WaitGroup
+
+ // These goroutines all survive until the end of the subtest, so we can
+ // check that a (numbered) goroutine appearing in the profile implies
+ // that all older goroutines also appear in the profile.
+ ready.Add(1)
+ done.Add(1)
+ go func() {
+ defer done.Done()
+ for i := 0; ctx.Err() == nil; i++ {
+ // Use SetGoroutineLabels rather than Do so we can always expect an
+ // extra goroutine (this one) with the most recent label.
+ SetGoroutineLabels(WithLabels(ctx, Labels(t.Name()+"-loop-i", fmt.Sprint(i))))
+ done.Add(1)
+ go func() {
+ <-ch
+ done.Done()
+ }()
+ for j := 0; j < i; j++ {
+ // Spin for longer and longer as the test goes on. This
+ // goroutine will do O(N^2) work with the number of
+ // goroutines it launches. This should be slow relative to
+ // the work involved in collecting a goroutine profile,
+ // which is O(N) with the high-water mark of the number of
+ // goroutines in this process (in the allgs slice).
+ runtime.Gosched()
+ }
+ if i == 0 {
+ ready.Done()
+ }
+ }
+ }()
+
+ // Short-lived goroutines exercise different code paths (goroutines with
+ // status _Gdead, for instance). This churn doesn't have behavior that
+ // we can test directly, but does help to shake out data races.
+ ready.Add(1)
+ var churn func(i int)
+ churn = func(i int) {
+ SetGoroutineLabels(WithLabels(ctx, Labels(t.Name()+"-churn-i", fmt.Sprint(i))))
+ if i == 0 {
+ ready.Done()
+ } else if i%16 == 0 {
+ // Yield on occasion so this sequence of goroutine launches
+ // doesn't monopolize a P. See issue #52934.
+ runtime.Gosched()
+ }
+ if ctx.Err() == nil {
+ go churn(i + 1)
+ }
+ }
+ go func() {
+ churn(0)
+ }()
+
+ ready.Wait()
+
+ var w [3]bytes.Buffer
+ for i := range w {
+ goroutineProf.WriteTo(&w[i], 0)
+ }
+ for i := range w {
+ p, err := profile.Parse(bytes.NewReader(w[i].Bytes()))
+ if err != nil {
+ t.Errorf("error parsing protobuf profile: %v", err)
+ }
+
+ // High-numbered loop-i goroutines imply that every lower-numbered
+ // loop-i goroutine should be present in the profile too.
+ counts := make(map[string]int)
+ for _, s := range p.Sample {
+ label := s.Label[t.Name()+"-loop-i"]
+ if len(label) > 0 {
+ counts[label[0]]++
+ }
+ }
+ for j, max := 0, len(counts)-1; j <= max; j++ {
+ n := counts[fmt.Sprint(j)]
+ if n == 1 || (n == 2 && j == max) {
+ continue
+ }
+ t.Errorf("profile #%d's goroutines with label loop-i:%d; %d != 1 (or 2 for the last entry, %d)",
+ i+1, j, n, max)
+ t.Logf("counts %v", counts)
+ break
+ }
+ }
+ }
+
+ runs := 100
+ if testing.Short() {
+ runs = 5
+ }
+ for i := 0; i < runs; i++ {
+ // Run multiple times to shake out data races
+ t.Run("goroutine launches", testLaunches)
+ }
+}
+
+func BenchmarkGoroutine(b *testing.B) {
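+ // withIdle wraps fn so it runs with n additional goroutines parked on a
+ // channel receive.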
+ withIdle := func(n int, fn func(b *testing.B)) func(b *testing.B) {
+ return func(b *testing.B) {
+ c := make(chan int)
+ var ready, done sync.WaitGroup
+ defer func() {
+ close(c)
+ done.Wait()
+ }()
+
+ for i := 0; i < n; i++ {
+ ready.Add(1)
+ done.Add(1)
+ go func() {
+ ready.Done()
+ <-c
+ done.Done()
+ }()
+ }
+ // Let goroutines block on channel
+ ready.Wait()
+ for i := 0; i < 5; i++ {
+ runtime.Gosched()
+ }
+
+ fn(b)
+ }
+ }
+
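+ // withChurn wraps fn so it runs while goroutines are continuously being
+ // created (and exiting), reporting the number launched per op.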
+ withChurn := func(fn func(b *testing.B)) func(b *testing.B) {
+ return func(b *testing.B) {
+ ctx := context.Background()
+ ctx, cancel := context.WithCancel(ctx)
+ defer cancel()
+
+ var ready sync.WaitGroup
+ ready.Add(1)
+ var count int64
+ var churn func(i int)
+ churn = func(i int) {
+ SetGoroutineLabels(WithLabels(ctx, Labels("churn-i", fmt.Sprint(i))))
+ atomic.AddInt64(&count, 1)
+ if i == 0 {
+ ready.Done()
+ }
+ if ctx.Err() == nil {
+ go churn(i + 1)
+ }
+ }
+ go func() {
+ churn(0)
+ }()
+ ready.Wait()
+
+ fn(b)
+ b.ReportMetric(float64(atomic.LoadInt64(&count))/float64(b.N), "concurrent_launches/op")
+ }
+ }
+
+ benchWriteTo := func(b *testing.B) {
+ goroutineProf := Lookup("goroutine")
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ goroutineProf.WriteTo(io.Discard, 0)
+ }
+ b.StopTimer()
+ }
+
+ benchGoroutineProfile := func(b *testing.B) {
+ p := make([]runtime.StackRecord, 10000)
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ runtime.GoroutineProfile(p)
+ }
+ b.StopTimer()
+ }
+
+ // Note that some costs of collecting a goroutine profile depend on the
+ // length of the runtime.allgs slice, which never shrinks. Stay within the
+ // race detector's 8k-goroutine limit.
+ for _, n := range []int{50, 500, 5000} {
+ b.Run(fmt.Sprintf("Profile.WriteTo idle %d", n), withIdle(n, benchWriteTo))
+ b.Run(fmt.Sprintf("Profile.WriteTo churn %d", n), withIdle(n, withChurn(benchWriteTo)))
+ b.Run(fmt.Sprintf("runtime.GoroutineProfile churn %d", n), withIdle(n, withChurn(benchGoroutineProfile)))
+ }
+}
+
+var emptyCallStackTestRun int64
+
+// Issue 18836.
+func TestEmptyCallStack(t *testing.T) {
+ name := fmt.Sprintf("test18836_%d", emptyCallStackTestRun)
+ emptyCallStackTestRun++
+
+ t.Parallel()
+ var buf strings.Builder
+ p := NewProfile(name)
+
+ p.Add("foo", 47674)
+ p.WriteTo(&buf, 1)
+ p.Remove("foo")
+ got := buf.String()
+ prefix := name + " profile: total 1\n"
+ if !strings.HasPrefix(got, prefix) {
+ t.Fatalf("got:\n\t%q\nwant prefix:\n\t%q\n", got, prefix)
+ }
+ lostevent := "lostProfileEvent"
+ if !strings.Contains(got, lostevent) {
+ t.Fatalf("got:\n\t%q\ndoes not contain:\n\t%q\n", got, lostevent)
+ }
+}
+
+// stackContainsLabeled takes a spec like funcname;key=value and matches if the stack has that key
+// and value and has funcname somewhere in the stack.
+func stackContainsLabeled(spec string, count uintptr, stk []*profile.Location, labels map[string][]string) bool {
+ base, kv, ok := strings.Cut(spec, ";")
+ if !ok {
+ panic("no semicolon in key/value spec")
+ }
+ k, v, ok := strings.Cut(kv, "=")
+ if !ok {
+ panic("missing = in key/value spec")
+ }
+ if !contains(labels[k], v) {
+ return false
+ }
+ return stackContains(base, count, stk, labels)
+}
+
+func TestCPUProfileLabel(t *testing.T) {
+ matches := matchAndAvoidStacks(stackContainsLabeled, []string{"runtime/pprof.cpuHogger;key=value"}, avoidFunctions())
+ testCPUProfile(t, matches, func(dur time.Duration) {
+ Do(context.Background(), Labels("key", "value"), func(context.Context) {
+ cpuHogger(cpuHog1, &salt1, dur)
+ })
+ })
+}
+
+func TestLabelRace(t *testing.T) {
+ testenv.MustHaveParallelism(t)
+ // Test the race detector annotations for synchronization
+ // between setting labels and consuming them from the
+ // profile.
+ matches := matchAndAvoidStacks(stackContainsLabeled, []string{"runtime/pprof.cpuHogger;key=value"}, nil)
+ testCPUProfile(t, matches, func(dur time.Duration) {
+ start := time.Now()
+ var wg sync.WaitGroup
+ for time.Since(start) < dur {
+ var salts [10]int
+ for i := 0; i < 10; i++ {
+ wg.Add(1)
+ go func(j int) {
+ Do(context.Background(), Labels("key", "value"), func(context.Context) {
+ cpuHogger(cpuHog1, &salts[j], time.Millisecond)
+ })
+ wg.Done()
+ }(i)
+ }
+ wg.Wait()
+ }
+ })
+}
+
+func TestGoroutineProfileLabelRace(t *testing.T) {
+ testenv.MustHaveParallelism(t)
+ // Test the race detector annotations for synchronization
+ // between setting labels and consuming them from the
+ // goroutine profile. See issue #50292.
+
+ t.Run("reset", func(t *testing.T) {
+ ctx := context.Background()
+ ctx, cancel := context.WithCancel(ctx)
+ defer cancel()
+
+ go func() {
+ goroutineProf := Lookup("goroutine")
+ for ctx.Err() == nil {
+ var w strings.Builder
+ goroutineProf.WriteTo(&w, 1)
+ prof := w.String()
+ if strings.Contains(prof, "loop-i") {
+ cancel()
+ }
+ }
+ }()
+
+ for i := 0; ctx.Err() == nil; i++ {
+ Do(ctx, Labels("loop-i", fmt.Sprint(i)), func(ctx context.Context) {
+ })
+ }
+ })
+
+ t.Run("churn", func(t *testing.T) {
+ ctx := context.Background()
+ ctx, cancel := context.WithCancel(ctx)
+ defer cancel()
+
+ var ready sync.WaitGroup
+ ready.Add(1)
+ var churn func(i int)
+ churn = func(i int) {
+ SetGoroutineLabels(WithLabels(ctx, Labels("churn-i", fmt.Sprint(i))))
+ if i == 0 {
+ ready.Done()
+ }
+ if ctx.Err() == nil {
+ go churn(i + 1)
+ }
+ }
+ go func() {
+ churn(0)
+ }()
+ ready.Wait()
+
+ goroutineProf := Lookup("goroutine")
+ for i := 0; i < 10; i++ {
+ goroutineProf.WriteTo(io.Discard, 1)
+ }
+ })
+}
+
+// TestLabelSystemstack makes sure CPU profiler samples of goroutines running
+// on the system stack include the correct pprof labels. See issue #48577.
+func TestLabelSystemstack(t *testing.T) {
+ // Grab and re-set the initial value before continuing to ensure
+ // GOGC doesn't actually change following the test.
+ gogc := debug.SetGCPercent(100)
+ debug.SetGCPercent(gogc)
+
+ matches := matchAndAvoidStacks(stackContainsLabeled, []string{"runtime.systemstack;key=value"}, avoidFunctions())
+ p := testCPUProfile(t, matches, func(dur time.Duration) {
+ Do(context.Background(), Labels("key", "value"), func(ctx context.Context) {
+ parallelLabelHog(ctx, dur, gogc)
+ })
+ })
+
+ // Two conditions to check:
+ // * labelHog should always be labeled.
+ // * The label should _only_ appear on labelHog and the Do call above.
+ for _, s := range p.Sample {
+ isLabeled := s.Label != nil && contains(s.Label["key"], "value")
+ var (
+ mayBeLabeled bool
+ mustBeLabeled string
+ mustNotBeLabeled string
+ )
+ for _, loc := range s.Location {
+ for _, l := range loc.Line {
+ switch l.Function.Name {
+ case "runtime/pprof.labelHog", "runtime/pprof.parallelLabelHog", "runtime/pprof.parallelLabelHog.func1":
+ mustBeLabeled = l.Function.Name
+ case "runtime/pprof.Do":
+ // Do sets the labels, so samples may
+ // or may not be labeled depending on
+ // which part of the function they are
+ // at.
+ mayBeLabeled = true
+ case "runtime.bgsweep", "runtime.bgscavenge", "runtime.forcegchelper", "runtime.gcBgMarkWorker", "runtime.runfinq", "runtime.sysmon":
+ // Runtime system goroutines or threads
+ // (such as those identified by
+ // runtime.isSystemGoroutine). These
+ // should never be labeled.
+ mustNotBeLabeled = l.Function.Name
+ case "gogo", "gosave_systemstack_switch", "racecall":
+ // These are context switch/race
+ // critical that we can't do a full
+ // traceback from. Typically this would
+ // be covered by the runtime check
+ // below, but these symbols don't have
+ // the package name.
+ mayBeLabeled = true
+ }
+
+ if strings.HasPrefix(l.Function.Name, "runtime.") {
+ // There are many places in the runtime
+ // where we can't do a full traceback.
+ // Ideally we'd list them all, but
+ // barring that allow anything in the
+ // runtime, unless explicitly excluded
+ // above.
+ mayBeLabeled = true
+ }
+ }
+ }
+ errorStack := func(f string, args ...any) {
+ var buf strings.Builder
+ fprintStack(&buf, s.Location)
+ t.Errorf("%s: %s", fmt.Sprintf(f, args...), buf.String())
+ }
+ if mustBeLabeled != "" && mustNotBeLabeled != "" {
+ errorStack("sample contains both %s, which must be labeled, and %s, which must not be labeled", mustBeLabeled, mustNotBeLabeled)
+ continue
+ }
+ if mustBeLabeled != "" || mustNotBeLabeled != "" {
+ // We found a definitive frame, so mayBeLabeled hints are not relevant.
+ mayBeLabeled = false
+ }
+ if mayBeLabeled {
+ // This sample may or may not be labeled, so there's nothing we can check.
+ continue
+ }
+ if mustBeLabeled != "" && !isLabeled {
+ errorStack("sample must be labeled because of %s, but is not", mustBeLabeled)
+ }
+ if mustNotBeLabeled != "" && isLabeled {
+ errorStack("sample must not be labeled because of %s, but is", mustNotBeLabeled)
+ }
+ }
+}
+
+// labelHog is designed to burn CPU time in a way that causes a high number of
+// CPU samples to land on the system stack.
+func labelHog(stop chan struct{}, gogc int) {
+ // Regression test for issue 50032. We must give GC an opportunity to
+ // be initially triggered by a labelled goroutine.
+ runtime.GC()
+
+ for i := 0; ; i++ {
+ select {
+ case <-stop:
+ return
+ default:
+ debug.SetGCPercent(gogc)
+ }
+ }
+}
+
+// parallelLabelHog runs GOMAXPROCS goroutines running labelHog.
+func parallelLabelHog(ctx context.Context, dur time.Duration, gogc int) {
+ var wg sync.WaitGroup
+ stop := make(chan struct{})
+ for i := 0; i < runtime.GOMAXPROCS(0); i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ labelHog(stop, gogc)
+ }()
+ }
+
+ time.Sleep(dur)
+ close(stop)
+ wg.Wait()
+}
+
+// Check that there is no deadlock when the program receives SIGPROF while in
+// a 64-bit atomic's critical section. Used to happen on mips{,le}. See #20146.
+func TestAtomicLoadStore64(t *testing.T) {
+ f, err := os.CreateTemp("", "profatomic")
+ if err != nil {
+ t.Fatalf("TempFile: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ if err := StartCPUProfile(f); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ var flag uint64
+ done := make(chan bool, 1)
+
+ go func() {
+ for atomic.LoadUint64(&flag) == 0 {
+ runtime.Gosched()
+ }
+ done <- true
+ }()
+ time.Sleep(50 * time.Millisecond)
+ atomic.StoreUint64(&flag, 1)
+ <-done
+}
+
+func TestTracebackAll(t *testing.T) {
+ // With gccgo, if a profiling signal arrives at the wrong time
+ // during traceback, it may crash or hang. See issue #29448.
+ f, err := os.CreateTemp("", "proftraceback")
+ if err != nil {
+ t.Fatalf("TempFile: %v", err)
+ }
+ defer os.Remove(f.Name())
+ defer f.Close()
+
+ if err := StartCPUProfile(f); err != nil {
+ t.Fatal(err)
+ }
+ defer StopCPUProfile()
+
+ ch := make(chan int)
+ defer close(ch)
+
+ count := 10
+ for i := 0; i < count; i++ {
+ go func() {
+ <-ch // block
+ }()
+ }
+
+ N := 10000
+ if testing.Short() {
+ N = 500
+ }
+ buf := make([]byte, 10*1024)
+ for i := 0; i < N; i++ {
+ runtime.Stack(buf, true)
+ }
+}
+
+// TestTryAdd tests the cases that are hard to test with real program execution.
+//
+// For example, the current Go compilers may not always inline functions
+// involved in recursion, but that may not be true in future compilers. This
+// tests such cases by using fake call sequences and forcing the profile build
+// to use translateCPUProfile, defined in proto_test.go.
+func TestTryAdd(t *testing.T) {
+ if _, found := findInlinedCall(inlinedCallerDump, 4<<10); !found {
+ t.Skip("Can't determine whether anything was inlined into inlinedCallerDump.")
+ }
+
+ // inlinedCallerDump
+ // inlinedCalleeDump
+ pcs := make([]uintptr, 2)
+ inlinedCallerDump(pcs)
+ inlinedCallerStack := make([]uint64, 2)
+ for i := range pcs {
+ inlinedCallerStack[i] = uint64(pcs[i])
+ }
+ wrapperPCs := make([]uintptr, 1)
+ inlinedWrapperCallerDump(wrapperPCs)
+
+ if _, found := findInlinedCall(recursionChainBottom, 4<<10); !found {
+ t.Skip("Can't determine whether anything was inlined into recursionChainBottom.")
+ }
+
+ // recursionChainTop
+ // recursionChainMiddle
+ // recursionChainBottom
+ // recursionChainTop
+ // recursionChainMiddle
+ // recursionChainBottom
+ pcs = make([]uintptr, 6)
+ recursionChainTop(1, pcs)
+ recursionStack := make([]uint64, len(pcs))
+ for i := range pcs {
+ recursionStack[i] = uint64(pcs[i])
+ }
+
+ period := int64(2000 * 1000) // 1/500*1e9 nanosec.
+
+ testCases := []struct {
+ name string
+ input []uint64 // following the input format assumed by profileBuilder.addCPUData.
+ count int // number of records in input.
+ wantLocs [][]string // ordered location entries with function names.
+ wantSamples []*profile.Sample // ordered samples, we care only about Value and the profile location IDs.
+ }{{
+ // Sanity test for a normal, complete stack trace.
+ name: "full_stack_trace",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 50, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ count: 2,
+ wantLocs: [][]string{
+ {"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"},
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "bug35538",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ // Fake frame: tryAdd will have inlinedCallerDump
+ // (stack[1]) on the deck when it encounters the next
+ // inline function. It should accept this.
+ 7, 0, 10, inlinedCallerStack[0], inlinedCallerStack[1], inlinedCallerStack[0], inlinedCallerStack[1],
+ 5, 0, 20, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ count: 3,
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{10, 10 * period}, Location: []*profile.Location{{ID: 1}, {ID: 1}}},
+ {Value: []int64{20, 20 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "bug38096",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ // count (data[2]) == 0 && len(stk) == 1 is an overflow
+ // entry. The "stk" entry is actually the count.
+ 4, 0, 0, 4242,
+ },
+ count: 2,
+ wantLocs: [][]string{{"runtime/pprof.lostProfileEvent"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{4242, 4242 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // If a function is directly called recursively then it must
+ // not be inlined in the caller.
+ //
+ // N.B. We're generating an impossible profile here, with a
+ // recursive inlineCalleeDump call. This is simulating a non-Go
+ // function that looks like an inlined Go function other than
+ // its recursive property. See pcDeck.tryAdd.
+ name: "directly_recursive_func_is_not_inlined",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 30, inlinedCallerStack[0], inlinedCallerStack[0],
+ 4, 0, 40, inlinedCallerStack[0],
+ },
+ count: 3,
+ // inlinedCallerDump shows up here because
+ // runtime_expandFinalInlineFrame adds it to the stack frame.
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump"}, {"runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{30, 30 * period}, Location: []*profile.Location{{ID: 1}, {ID: 1}, {ID: 2}}},
+ {Value: []int64{40, 40 * period}, Location: []*profile.Location{{ID: 1}, {ID: 2}}},
+ },
+ }, {
+ name: "recursion_chain_inline",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 9, 0, 10, recursionStack[0], recursionStack[1], recursionStack[2], recursionStack[3], recursionStack[4], recursionStack[5],
+ },
+ count: 2,
+ wantLocs: [][]string{
+ {"runtime/pprof.recursionChainBottom"},
+ {
+ "runtime/pprof.recursionChainMiddle",
+ "runtime/pprof.recursionChainTop",
+ "runtime/pprof.recursionChainBottom",
+ },
+ {
+ "runtime/pprof.recursionChainMiddle",
+ "runtime/pprof.recursionChainTop",
+ "runtime/pprof.TestTryAdd", // inlined into the test.
+ },
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{10, 10 * period}, Location: []*profile.Location{{ID: 1}, {ID: 2}, {ID: 3}}},
+ },
+ }, {
+ name: "truncated_stack_trace_later",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 5, 0, 50, inlinedCallerStack[0], inlinedCallerStack[1],
+ 4, 0, 60, inlinedCallerStack[0],
+ },
+ count: 3,
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{60, 60 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ name: "truncated_stack_trace_first",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ 5, 0, 80, inlinedCallerStack[0], inlinedCallerStack[1],
+ },
+ count: 3,
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{80, 80 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // We can recover the inlined caller from a truncated stack.
+ name: "truncated_stack_trace_only",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ },
+ count: 2,
+ wantLocs: [][]string{{"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }, {
+ // The same location is used for duplicated stacks.
+ name: "truncated_stack_trace_twice",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 70, inlinedCallerStack[0],
+ // Fake frame: add a fake call to
+ // inlinedCallerDump to prevent this sample
+ // from getting merged into above.
+ 5, 0, 80, inlinedCallerStack[1], inlinedCallerStack[0],
+ },
+ count: 3,
+ wantLocs: [][]string{
+ {"runtime/pprof.inlinedCalleeDump", "runtime/pprof.inlinedCallerDump"},
+ {"runtime/pprof.inlinedCallerDump"},
+ },
+ wantSamples: []*profile.Sample{
+ {Value: []int64{70, 70 * period}, Location: []*profile.Location{{ID: 1}}},
+ {Value: []int64{80, 80 * period}, Location: []*profile.Location{{ID: 2}, {ID: 1}}},
+ },
+ }, {
+ name: "expand_wrapper_function",
+ input: []uint64{
+ 3, 0, 500, // hz = 500. Must match the period.
+ 4, 0, 50, uint64(wrapperPCs[0]),
+ },
+ count: 2,
+ wantLocs: [][]string{{"runtime/pprof.inlineWrapper.dump"}},
+ wantSamples: []*profile.Sample{
+ {Value: []int64{50, 50 * period}, Location: []*profile.Location{{ID: 1}}},
+ },
+ }}
+
+ for _, tc := range testCases {
+ t.Run(tc.name, func(t *testing.T) {
+ p, err := translateCPUProfile(tc.input, tc.count)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+ t.Logf("Profile: %v\n", p)
+
+ // One location entry with all inlined functions.
+ var gotLoc [][]string
+ for _, loc := range p.Location {
+ var names []string
+ for _, line := range loc.Line {
+ names = append(names, line.Function.Name)
+ }
+ gotLoc = append(gotLoc, names)
+ }
+ if got, want := fmtJSON(gotLoc), fmtJSON(tc.wantLocs); got != want {
+ t.Errorf("Got Location = %+v\n\twant %+v", got, want)
+ }
+ // All samples should point to one location.
+ var gotSamples []*profile.Sample
+ for _, sample := range p.Sample {
+ var locs []*profile.Location
+ for _, loc := range sample.Location {
+ locs = append(locs, &profile.Location{ID: loc.ID})
+ }
+ gotSamples = append(gotSamples, &profile.Sample{Value: sample.Value, Location: locs})
+ }
+ if got, want := fmtJSON(gotSamples), fmtJSON(tc.wantSamples); got != want {
+ t.Errorf("Got Samples = %+v\n\twant %+v", got, want)
+ }
+ })
+ }
+}
+
+func TestTimeVDSO(t *testing.T) {
+ // Test that time functions have the right stack trace. In particular,
+ // they shouldn't be recursive.
+
+ if runtime.GOOS == "android" {
+ // Flaky on Android, issue 48655. VDSO may not be enabled.
+ testenv.SkipFlaky(t, 48655)
+ }
+
+ matches := matchAndAvoidStacks(stackContains, []string{"time.now"}, avoidFunctions())
+ p := testCPUProfile(t, matches, func(dur time.Duration) {
+ t0 := time.Now()
+ for {
+ t := time.Now()
+ if t.Sub(t0) >= dur {
+ return
+ }
+ }
+ })
+
+ // Check for recursive time.now sample.
+ for _, sample := range p.Sample {
+ var seenNow bool
+ for _, loc := range sample.Location {
+ for _, line := range loc.Line {
+ if line.Function.Name == "time.now" {
+ if seenNow {
+ t.Fatalf("unexpected recursive time.now")
+ }
+ seenNow = true
+ }
+ }
+ }
+ }
+}
diff --git a/src/runtime/pprof/pprof_windows.go b/src/runtime/pprof/pprof_windows.go
new file mode 100644
index 0000000..23ef2f8
--- /dev/null
+++ b/src/runtime/pprof/pprof_windows.go
@@ -0,0 +1,22 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "fmt"
+ "internal/syscall/windows"
+ "io"
+ "syscall"
+ "unsafe"
+)
+
+func addMaxRSS(w io.Writer) {
+ var m windows.PROCESS_MEMORY_COUNTERS
+ p, _ := syscall.GetCurrentProcess()
+ err := windows.GetProcessMemoryInfo(p, &m, uint32(unsafe.Sizeof(m)))
+ if err == nil {
+ fmt.Fprintf(w, "# MaxRSS = %d\n", m.PeakWorkingSetSize)
+ }
+}
diff --git a/src/runtime/pprof/proto.go b/src/runtime/pprof/proto.go
new file mode 100644
index 0000000..db9384e
--- /dev/null
+++ b/src/runtime/pprof/proto.go
@@ -0,0 +1,762 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "compress/gzip"
+ "fmt"
+ "internal/abi"
+ "io"
+ "runtime"
+ "strconv"
+ "strings"
+ "time"
+ "unsafe"
+)
+
+// lostProfileEvent is the function to which lost profiling
+// events are attributed.
+// (The name shows up in the pprof graphs.)
+func lostProfileEvent() { lostProfileEvent() }
+
+// A profileBuilder writes a profile incrementally from a
+// stream of profile samples delivered by the runtime.
+type profileBuilder struct {
+ start time.Time
+ end time.Time
+ havePeriod bool
+ period int64
+ m profMap
+
+ // encoding state
+ w io.Writer
+ zw *gzip.Writer
+ pb protobuf
+ strings []string
+ stringMap map[string]int
+ locs map[uintptr]locInfo // list of locInfo starting with the given PC.
+ funcs map[string]int // Package path-qualified function name to Function.ID
+ mem []memMap
+ deck pcDeck
+}
+
+type memMap struct {
+ // initialized as reading mapping
+ start uintptr // Address at which the binary (or DLL) is loaded into memory.
+ end uintptr // The limit of the address range occupied by this mapping.
+ offset uint64 // Offset in the binary that corresponds to the first mapped address.
+ file string // The object this entry is loaded from.
+ buildID string // A string that uniquely identifies a particular program version with high probability.
+
+ funcs symbolizeFlag
+ fake bool // map entry was faked; /proc/self/maps wasn't available
+}
+
+// symbolizeFlag keeps track of symbolization result.
+//
+// 0 : no symbol lookup was performed
+// 1<<0 (lookupTried) : symbol lookup was performed
+// 1<<1 (lookupFailed): symbol lookup was performed but failed
+type symbolizeFlag uint8
+
+const (
+ lookupTried symbolizeFlag = 1 << iota
+ lookupFailed symbolizeFlag = 1 << iota
+)
+
+const (
+ // message Profile
+ tagProfile_SampleType = 1 // repeated ValueType
+ tagProfile_Sample = 2 // repeated Sample
+ tagProfile_Mapping = 3 // repeated Mapping
+ tagProfile_Location = 4 // repeated Location
+ tagProfile_Function = 5 // repeated Function
+ tagProfile_StringTable = 6 // repeated string
+ tagProfile_DropFrames = 7 // int64 (string table index)
+ tagProfile_KeepFrames = 8 // int64 (string table index)
+ tagProfile_TimeNanos = 9 // int64
+ tagProfile_DurationNanos = 10 // int64
+ tagProfile_PeriodType = 11 // ValueType (really optional string???)
+ tagProfile_Period = 12 // int64
+ tagProfile_Comment = 13 // repeated int64
+ tagProfile_DefaultSampleType = 14 // int64
+
+ // message ValueType
+ tagValueType_Type = 1 // int64 (string table index)
+ tagValueType_Unit = 2 // int64 (string table index)
+
+ // message Sample
+ tagSample_Location = 1 // repeated uint64
+ tagSample_Value = 2 // repeated int64
+ tagSample_Label = 3 // repeated Label
+
+ // message Label
+ tagLabel_Key = 1 // int64 (string table index)
+ tagLabel_Str = 2 // int64 (string table index)
+ tagLabel_Num = 3 // int64
+
+ // message Mapping
+ tagMapping_ID = 1 // uint64
+ tagMapping_Start = 2 // uint64
+ tagMapping_Limit = 3 // uint64
+ tagMapping_Offset = 4 // uint64
+ tagMapping_Filename = 5 // int64 (string table index)
+ tagMapping_BuildID = 6 // int64 (string table index)
+ tagMapping_HasFunctions = 7 // bool
+ tagMapping_HasFilenames = 8 // bool
+ tagMapping_HasLineNumbers = 9 // bool
+ tagMapping_HasInlineFrames = 10 // bool
+
+ // message Location
+ tagLocation_ID = 1 // uint64
+ tagLocation_MappingID = 2 // uint64
+ tagLocation_Address = 3 // uint64
+ tagLocation_Line = 4 // repeated Line
+
+ // message Line
+ tagLine_FunctionID = 1 // uint64
+ tagLine_Line = 2 // int64
+
+ // message Function
+ tagFunction_ID = 1 // uint64
+ tagFunction_Name = 2 // int64 (string table index)
+ tagFunction_SystemName = 3 // int64 (string table index)
+ tagFunction_Filename = 4 // int64 (string table index)
+ tagFunction_StartLine = 5 // int64
+)
+
+// stringIndex adds s to the string table if not already present
+// and returns the index of s in the string table.
+func (b *profileBuilder) stringIndex(s string) int64 {
+ id, ok := b.stringMap[s]
+ if !ok {
+ id = len(b.strings)
+ b.strings = append(b.strings, s)
+ b.stringMap[s] = id
+ }
+ return int64(id)
+}
+
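+// flush writes the pending protobuf bytes to the gzip writer once they exceed
+// dataFlush, but only between messages (nest == 0).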
+func (b *profileBuilder) flush() {
+ const dataFlush = 4096
+ if b.pb.nest == 0 && len(b.pb.data) > dataFlush {
+ b.zw.Write(b.pb.data)
+ b.pb.data = b.pb.data[:0]
+ }
+}
+
+// pbValueType encodes a ValueType message to b.pb.
+func (b *profileBuilder) pbValueType(tag int, typ, unit string) {
+ start := b.pb.startMessage()
+ b.pb.int64(tagValueType_Type, b.stringIndex(typ))
+ b.pb.int64(tagValueType_Unit, b.stringIndex(unit))
+ b.pb.endMessage(tag, start)
+}
+
+// pbSample encodes a Sample message to b.pb.
+func (b *profileBuilder) pbSample(values []int64, locs []uint64, labels func()) {
+ start := b.pb.startMessage()
+ b.pb.int64s(tagSample_Value, values)
+ b.pb.uint64s(tagSample_Location, locs)
+ if labels != nil {
+ labels()
+ }
+ b.pb.endMessage(tagProfile_Sample, start)
+ b.flush()
+}
+
+// pbLabel encodes a Label message to b.pb.
+func (b *profileBuilder) pbLabel(tag int, key, str string, num int64) {
+ start := b.pb.startMessage()
+ b.pb.int64Opt(tagLabel_Key, b.stringIndex(key))
+ b.pb.int64Opt(tagLabel_Str, b.stringIndex(str))
+ b.pb.int64Opt(tagLabel_Num, num)
+ b.pb.endMessage(tag, start)
+}
+
+// pbLine encodes a Line message to b.pb.
+func (b *profileBuilder) pbLine(tag int, funcID uint64, line int64) {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagLine_FunctionID, funcID)
+ b.pb.int64Opt(tagLine_Line, line)
+ b.pb.endMessage(tag, start)
+}
+
+// pbMapping encodes a Mapping message to b.pb.
+func (b *profileBuilder) pbMapping(tag int, id, base, limit, offset uint64, file, buildID string, hasFuncs bool) {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagMapping_ID, id)
+ b.pb.uint64Opt(tagMapping_Start, base)
+ b.pb.uint64Opt(tagMapping_Limit, limit)
+ b.pb.uint64Opt(tagMapping_Offset, offset)
+ b.pb.int64Opt(tagMapping_Filename, b.stringIndex(file))
+ b.pb.int64Opt(tagMapping_BuildID, b.stringIndex(buildID))
+ // TODO: we set HasFunctions if all symbols from samples were symbolized (hasFuncs).
+ // Decide what to do about HasInlineFrames and HasLineNumbers.
+ // Also, another approach to handle the mapping entry with
+ // incomplete symbolization results is to duplicate the mapping
+ // entry (but with different Has* fields values) and use
+ // different entries for symbolized locations and unsymbolized locations.
+ if hasFuncs {
+ b.pb.bool(tagMapping_HasFunctions, true)
+ }
+ b.pb.endMessage(tag, start)
+}
+
+func allFrames(addr uintptr) ([]runtime.Frame, symbolizeFlag) {
+ // Expand this one address using CallersFrames so we can cache
+ // each expansion. In general, CallersFrames takes a whole
+ // stack, but in this case we know there will be no skips in
+ // the stack and we have return PCs anyway.
+ frames := runtime.CallersFrames([]uintptr{addr})
+ frame, more := frames.Next()
+ if frame.Function == "runtime.goexit" {
+ // Short-circuit if we see runtime.goexit so the loop
+ // below doesn't allocate a useless empty location.
+ return nil, 0
+ }
+
+ symbolizeResult := lookupTried
+ if frame.PC == 0 || frame.Function == "" || frame.File == "" || frame.Line == 0 {
+ symbolizeResult |= lookupFailed
+ }
+
+ if frame.PC == 0 {
+ // If we failed to resolve the frame, at least make up
+ // a reasonable call PC. This mostly happens in tests.
+ frame.PC = addr - 1
+ }
+ ret := []runtime.Frame{frame}
+ for frame.Function != "runtime.goexit" && more {
+ frame, more = frames.Next()
+ ret = append(ret, frame)
+ }
+ return ret, symbolizeResult
+}
+
+type locInfo struct {
+ // location id assigned by the profileBuilder
+ id uint64
+
+ // sequence of PCs, including the fake PCs returned by the traceback
+ // to represent inlined functions
+ // https://github.com/golang/go/blob/d6f2f833c93a41ec1c68e49804b8387a06b131c5/src/runtime/traceback.go#L347-L368
+ pcs []uintptr
+
+ // firstPCFrames and firstPCSymbolizeResult hold the results of the
+ // allFrames call for the first (leaf-most) PC this locInfo represents
+ firstPCFrames []runtime.Frame
+ firstPCSymbolizeResult symbolizeFlag
+}
+
+// newProfileBuilder returns a new profileBuilder.
+// CPU profiling data obtained from the runtime can be added
+// by calling b.addCPUData, and then the eventual profile
+// can be obtained by calling b.finish.
+func newProfileBuilder(w io.Writer) *profileBuilder {
+ zw, _ := gzip.NewWriterLevel(w, gzip.BestSpeed)
+ b := &profileBuilder{
+ w: w,
+ zw: zw,
+ start: time.Now(),
+ strings: []string{""},
+ stringMap: map[string]int{"": 0},
+ locs: map[uintptr]locInfo{},
+ funcs: map[string]int{},
+ }
+ b.readMapping()
+ return b
+}
+
+// addCPUData adds the CPU profiling data to the profile.
+//
+// The data must be a whole number of records, as delivered by the runtime.
+// len(tags) must be equal to the number of records in data.
+func (b *profileBuilder) addCPUData(data []uint64, tags []unsafe.Pointer) error {
+ if !b.havePeriod {
+ // first record is period
+ if len(data) < 3 {
+ return fmt.Errorf("truncated profile")
+ }
+ if data[0] != 3 || data[2] == 0 {
+ return fmt.Errorf("malformed profile")
+ }
+ // data[2] is sampling rate in Hz. Convert to sampling
+ // period in nanoseconds.
+ b.period = 1e9 / int64(data[2])
+ b.havePeriod = true
+ data = data[3:]
+ // Consume tag slot. Note that there isn't a meaningful tag
+ // value for this record.
+ tags = tags[1:]
+ }
+
+ // Parse CPU samples from the profile.
+ // Each sample is 3+n uint64s:
+ // data[0] = 3+n
+ // data[1] = time stamp (ignored)
+ // data[2] = count
+ // data[3:3+n] = stack
+ // If the count is 0 and the stack has length 1,
+ // that's an overflow record inserted by the runtime
+ // to indicate that stack[0] samples were lost.
+ // Otherwise the count is usually 1,
+ // but in a few special cases like lost non-Go samples
+ // there can be larger counts.
+ // Because many samples with the same stack arrive,
+ // we want to deduplicate immediately, which we do
+ // using the b.m profMap.
+ for len(data) > 0 {
+ if len(data) < 3 || data[0] > uint64(len(data)) {
+ return fmt.Errorf("truncated profile")
+ }
+ if data[0] < 3 || tags != nil && len(tags) < 1 {
+ return fmt.Errorf("malformed profile")
+ }
+ if len(tags) < 1 {
+ return fmt.Errorf("mismatched profile records and tags")
+ }
+ count := data[2]
+ stk := data[3:data[0]]
+ data = data[data[0]:]
+ tag := tags[0]
+ tags = tags[1:]
+
+ if count == 0 && len(stk) == 1 {
+ // overflow record
+ count = uint64(stk[0])
+ stk = []uint64{
+ // gentraceback guarantees that PCs in the
+ // stack can be unconditionally decremented and
+ // still be valid, so we must do the same.
+ uint64(abi.FuncPCABIInternal(lostProfileEvent) + 1),
+ }
+ }
+ b.m.lookup(stk, tag).count += int64(count)
+ }
+
+ if len(tags) != 0 {
+ return fmt.Errorf("mismatched profile records and tags")
+ }
+ return nil
+}
+
+// build completes and returns the constructed profile.
+func (b *profileBuilder) build() {
+ b.end = time.Now()
+
+ b.pb.int64Opt(tagProfile_TimeNanos, b.start.UnixNano())
+ if b.havePeriod { // must be CPU profile
+ b.pbValueType(tagProfile_SampleType, "samples", "count")
+ b.pbValueType(tagProfile_SampleType, "cpu", "nanoseconds")
+ b.pb.int64Opt(tagProfile_DurationNanos, b.end.Sub(b.start).Nanoseconds())
+ b.pbValueType(tagProfile_PeriodType, "cpu", "nanoseconds")
+ b.pb.int64Opt(tagProfile_Period, b.period)
+ }
+
+ values := []int64{0, 0}
+ var locs []uint64
+
+ for e := b.m.all; e != nil; e = e.nextAll {
+ values[0] = e.count
+ values[1] = e.count * b.period
+
+ var labels func()
+ if e.tag != nil {
+ labels = func() {
+ for k, v := range *(*labelMap)(e.tag) {
+ b.pbLabel(tagSample_Label, k, v, 0)
+ }
+ }
+ }
+
+ locs = b.appendLocsForStack(locs[:0], e.stk)
+
+ b.pbSample(values, locs, labels)
+ }
+
+ for i, m := range b.mem {
+ hasFunctions := m.funcs == lookupTried // lookupTried but not lookupFailed
+ b.pbMapping(tagProfile_Mapping, uint64(i+1), uint64(m.start), uint64(m.end), m.offset, m.file, m.buildID, hasFunctions)
+ }
+
+ // TODO: Anything for tagProfile_DropFrames?
+ // TODO: Anything for tagProfile_KeepFrames?
+
+ b.pb.strings(tagProfile_StringTable, b.strings)
+ b.zw.Write(b.pb.data)
+ b.zw.Close()
+}
+
+// appendLocsForStack appends the location IDs for the given stack trace to the given
+// location ID slice, locs. The addresses in the stack are return PCs or 1 + the PC of
+// an inline marker, as returned by the runtime traceback function.
+//
+// It may return an empty slice even if stk is non-empty, for example if stk consists
+// solely of runtime.goexit. We still count these empty stacks in profiles in order to
+// get the right cumulative sample count.
+//
+// It may emit to b.pb, so there must be no message encoding in progress.
+func (b *profileBuilder) appendLocsForStack(locs []uint64, stk []uintptr) (newLocs []uint64) {
+ b.deck.reset()
+
+ // The last frame might be truncated. Recover lost inline frames.
+ stk = runtime_expandFinalInlineFrame(stk)
+
+ for len(stk) > 0 {
+ addr := stk[0]
+ if l, ok := b.locs[addr]; ok {
+ // When generating code for an inlined function, the compiler adds
+ // NOP instructions to the outermost function as a placeholder for
+ // each layer of inlining. When the runtime generates tracebacks for
+ // stacks that include inlined functions, it uses the addresses of
+ // those NOPs as "fake" PCs on the stack as if they were regular
+ // function call sites. But if a profiling signal arrives while the
+ // CPU is executing one of those NOPs, its PC will show up as a leaf
+ // in the profile with its own Location entry. So, always check
+ // whether addr is a "fake" PC in the context of the current call
+ // stack by trying to add it to the inlining deck before assuming
+ // that the deck is complete.
+ if len(b.deck.pcs) > 0 {
+ if added := b.deck.tryAdd(addr, l.firstPCFrames, l.firstPCSymbolizeResult); added {
+ stk = stk[1:]
+ continue
+ }
+ }
+
+ // first record the location if there is any pending accumulated info.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+
+ // then, record the cached location.
+ locs = append(locs, l.id)
+
+ // Skip the matching pcs.
+ //
+ // Even if stk was truncated due to the stack depth
+ // limit, expandFinalInlineFrame above has already
+ // fixed the truncation, ensuring it is long enough.
+ stk = stk[len(l.pcs):]
+ continue
+ }
+
+ frames, symbolizeResult := allFrames(addr)
+ if len(frames) == 0 { // runtime.goexit.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+ stk = stk[1:]
+ continue
+ }
+
+ if added := b.deck.tryAdd(addr, frames, symbolizeResult); added {
+ stk = stk[1:]
+ continue
+ }
+ // add failed because this addr is not inlined with the
+ // existing PCs in the deck. Flush the deck and retry handling
+ // this pc.
+ if id := b.emitLocation(); id > 0 {
+ locs = append(locs, id)
+ }
+
+ // check cache again - previous emitLocation added a new entry
+ if l, ok := b.locs[addr]; ok {
+ locs = append(locs, l.id)
+ stk = stk[len(l.pcs):] // skip the matching pcs.
+ } else {
+ b.deck.tryAdd(addr, frames, symbolizeResult) // must succeed.
+ stk = stk[1:]
+ }
+ }
+ if id := b.emitLocation(); id > 0 { // emit remaining location.
+ locs = append(locs, id)
+ }
+ return locs
+}
+
+// Here's an example of how Go 1.17 writes out inlined functions, compiled for
+// linux/amd64. The disassembly of main.main shows two levels of inlining: main
+// calls b, b calls a, a does some work.
+//
+// inline.go:9 0x4553ec 90 NOPL // func main() { b(v) }
+// inline.go:6 0x4553ed 90 NOPL // func b(v *int) { a(v) }
+// inline.go:5 0x4553ee 48c7002a000000 MOVQ $0x2a, 0(AX) // func a(v *int) { *v = 42 }
+//
+// If a profiling signal arrives while executing the MOVQ at 0x4553ee (for line
+// 5), the runtime will report the stack as the MOVQ frame being called by the
+// NOPL at 0x4553ed (for line 6) being called by the NOPL at 0x4553ec (for line
+// 9).
+//
+// The role of pcDeck is to collapse those three frames back into a single
+// location at 0x4553ee, with file/line/function symbolization info representing
+// the three layers of calls. It does that via sequential calls to pcDeck.tryAdd
+// starting with the leaf-most address. The fourth call to pcDeck.tryAdd will be
+// for the caller of main.main. Because main.main was not inlined in its caller,
+// the deck will reject the addition, and the fourth PC on the stack will get
+// its own location.
+
+// pcDeck is a helper to detect a sequence of inlined functions from
+// a stack trace returned by the runtime.
+//
+// The stack traces returned by runtime's traceback functions are fully
+// expanded (at least for Go functions) and include the fake pcs representing
+// inlined functions. The profile proto expects the inlined functions to be
+// encoded in one Location message.
+// https://github.com/google/pprof/blob/5e965273ee43930341d897407202dd5e10e952cb/proto/profile.proto#L177-L184
+//
+// Runtime does not directly expose whether a frame is for an inlined function
+// and looking up debug info is not ideal, so we use a heuristic to filter
+// the fake pcs and restore the inlined and entry functions. Inlined functions
+// have the following properties:
+//
+// Frame's Func is nil (note: also true for non-Go functions), and
+// Frame's Entry matches its entry function frame's Entry (note: could also be true for recursive calls and non-Go functions), and
+// Frame's Name does not match its entry function frame's name (note: inlined functions cannot be directly recursive).
+//
+// As we read and process the pcs in a stack trace one by one (from leaf to root),
+// we use pcDeck to temporarily hold the observed pcs and their expanded frames
+// until we observe the entry function frame.
+type pcDeck struct {
+ pcs []uintptr
+ frames []runtime.Frame
+ symbolizeResult symbolizeFlag
+
+ // firstPCFrames indicates the number of frames associated with the first
+ // (leaf-most) PC in the deck
+ firstPCFrames int
+ // firstPCSymbolizeResult holds the results of the allFrames call for the
+ // first (leaf-most) PC in the deck
+ firstPCSymbolizeResult symbolizeFlag
+}
+
+func (d *pcDeck) reset() {
+ d.pcs = d.pcs[:0]
+ d.frames = d.frames[:0]
+ d.symbolizeResult = 0
+ d.firstPCFrames = 0
+ d.firstPCSymbolizeResult = 0
+}
+
+// tryAdd tries to add the pc and Frames expanded from it (most likely one,
+// since the stack trace is already fully expanded) and the symbolizeResult
+// to the deck. If it fails the caller needs to flush the deck and retry.
+func (d *pcDeck) tryAdd(pc uintptr, frames []runtime.Frame, symbolizeResult symbolizeFlag) (success bool) {
+ if existing := len(d.frames); existing > 0 {
+ // 'd.frames' are all expanded from one 'pc' and represent all
+ // inlined functions so we check only the last one.
+ newFrame := frames[0]
+ last := d.frames[existing-1]
+ if last.Func != nil { // the last frame can't be inlined. Flush.
+ return false
+ }
+ if last.Entry == 0 || newFrame.Entry == 0 { // Possibly not a Go function. Don't try to merge.
+ return false
+ }
+
+ if last.Entry != newFrame.Entry { // newFrame is for a different function.
+ return false
+ }
+ if last.Function == newFrame.Function { // maybe recursion.
+ return false
+ }
+ }
+ d.pcs = append(d.pcs, pc)
+ d.frames = append(d.frames, frames...)
+ d.symbolizeResult |= symbolizeResult
+ if len(d.pcs) == 1 {
+ d.firstPCFrames = len(d.frames)
+ d.firstPCSymbolizeResult = symbolizeResult
+ }
+ return true
+}
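+
+// For the inline.go example above, three successful tryAdd calls leave the
+// deck holding roughly
+//
+//	pcs:    [0x4553ee, 0x4553ed, 0x4553ec]
+//	frames: [main.a, main.b, main.main]
+//
+// A fourth tryAdd, for main.main's caller, returns false because the deck's
+// last frame (main.main) has a non-nil Func and so cannot be an inlined
+// caller; appendLocsForStack then flushes the deck via emitLocation as a
+// single Location covering all three pcs.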
+
+// emitLocation emits the new location and function information recorded in the deck
+// and returns the location ID encoded in the profile protobuf.
+// It emits to b.pb, so there must be no message encoding in progress.
+// It resets the deck.
+func (b *profileBuilder) emitLocation() uint64 {
+ if len(b.deck.pcs) == 0 {
+ return 0
+ }
+ defer b.deck.reset()
+
+ addr := b.deck.pcs[0]
+ firstFrame := b.deck.frames[0]
+
+ // We can't write out functions while in the middle of the
+ // Location message, so record new functions we encounter and
+ // write them out after the Location.
+ type newFunc struct {
+ id uint64
+ name, file string
+ startLine int64
+ }
+ newFuncs := make([]newFunc, 0, 8)
+
+ id := uint64(len(b.locs)) + 1
+ b.locs[addr] = locInfo{
+ id: id,
+ pcs: append([]uintptr{}, b.deck.pcs...),
+ firstPCSymbolizeResult: b.deck.firstPCSymbolizeResult,
+ firstPCFrames: append([]runtime.Frame{}, b.deck.frames[:b.deck.firstPCFrames]...),
+ }
+
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagLocation_ID, id)
+ b.pb.uint64Opt(tagLocation_Address, uint64(firstFrame.PC))
+ for _, frame := range b.deck.frames {
+ // Write out each line in frame expansion.
+ funcName := runtime_FrameSymbolName(&frame)
+ funcID := uint64(b.funcs[funcName])
+ if funcID == 0 {
+ funcID = uint64(len(b.funcs)) + 1
+ b.funcs[funcName] = int(funcID)
+ newFuncs = append(newFuncs, newFunc{
+ id: funcID,
+ name: funcName,
+ file: frame.File,
+ startLine: int64(runtime_FrameStartLine(&frame)),
+ })
+ }
+ b.pbLine(tagLocation_Line, funcID, int64(frame.Line))
+ }
+ for i := range b.mem {
+ if b.mem[i].start <= addr && addr < b.mem[i].end || b.mem[i].fake {
+ b.pb.uint64Opt(tagLocation_MappingID, uint64(i+1))
+
+ m := b.mem[i]
+ m.funcs |= b.deck.symbolizeResult
+ b.mem[i] = m
+ break
+ }
+ }
+ b.pb.endMessage(tagProfile_Location, start)
+
+ // Write out functions we found during frame expansion.
+ for _, fn := range newFuncs {
+ start := b.pb.startMessage()
+ b.pb.uint64Opt(tagFunction_ID, fn.id)
+ b.pb.int64Opt(tagFunction_Name, b.stringIndex(fn.name))
+ b.pb.int64Opt(tagFunction_SystemName, b.stringIndex(fn.name))
+ b.pb.int64Opt(tagFunction_Filename, b.stringIndex(fn.file))
+ b.pb.int64Opt(tagFunction_StartLine, fn.startLine)
+ b.pb.endMessage(tagProfile_Function, start)
+ }
+
+ b.flush()
+ return id
+}
+
+var space = []byte(" ")
+var newline = []byte("\n")
+
+func parseProcSelfMaps(data []byte, addMapping func(lo, hi, offset uint64, file, buildID string)) {
+ // $ cat /proc/self/maps
+ // 00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat
+ // 0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat
+ // 0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat
+ // 014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+ // 7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+ // 7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+ // 7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+ // 7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+ // 7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+ // 7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+ // 7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+ // 7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+ // 7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ // ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
+
+ var line []byte
+ // next removes and returns the next field in the line.
+ // It also removes from line any spaces following the field.
+ next := func() []byte {
+ var f []byte
+ f, line, _ = bytes.Cut(line, space)
+ line = bytes.TrimLeft(line, " ")
+ return f
+ }
+
+ for len(data) > 0 {
+ line, data, _ = bytes.Cut(data, newline)
+ addr := next()
+ loStr, hiStr, ok := strings.Cut(string(addr), "-")
+ if !ok {
+ continue
+ }
+ lo, err := strconv.ParseUint(loStr, 16, 64)
+ if err != nil {
+ continue
+ }
+ hi, err := strconv.ParseUint(hiStr, 16, 64)
+ if err != nil {
+ continue
+ }
+ perm := next()
+ if len(perm) < 4 || perm[2] != 'x' {
+ // Only interested in executable mappings.
+ continue
+ }
+ offset, err := strconv.ParseUint(string(next()), 16, 64)
+ if err != nil {
+ continue
+ }
+ next() // dev
+ inode := next() // inode
+ if line == nil {
+ continue
+ }
+ file := string(line)
+
+ // Trim deleted file marker.
+ deletedStr := " (deleted)"
+ deletedLen := len(deletedStr)
+ if len(file) >= deletedLen && file[len(file)-deletedLen:] == deletedStr {
+ file = file[:len(file)-deletedLen]
+ }
+
+ if len(inode) == 1 && inode[0] == '0' && file == "" {
+ // Huge-page text mappings list the initial fragment of
+ // mapped but unpopulated memory as being inode 0.
+ // Don't report that part.
+ // But [vdso] and [vsyscall] are inode 0, so let non-empty file names through.
+ continue
+ }
+
+ // TODO: pprof's remapMappingIDs makes one adjustment:
+ // 1. If there is an /anon_hugepage mapping first and it is
+ // consecutive to a next mapping, drop the /anon_hugepage.
+ // There's no indication why this is needed.
+ // Let's try not doing this and see what breaks.
+ // If we do need it, it would go here, before we
+ // enter the mappings into b.mem in the first place.
+
+ buildID, _ := elfBuildID(file)
+ addMapping(lo, hi, offset, file, buildID)
+ }
+}
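+
+// For the first /bin/cat line in the example above, the callback is invoked
+// roughly as
+//
+//	buildID, _ := elfBuildID("/bin/cat")
+//	addMapping(0x00400000, 0x0040b000, 0, "/bin/cat", buildID)
+//
+// while the r--p, rw-p, and anonymous mappings are skipped because they are
+// not executable.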
+
+func (b *profileBuilder) addMapping(lo, hi, offset uint64, file, buildID string) {
+ b.addMappingEntry(lo, hi, offset, file, buildID, false)
+}
+
+func (b *profileBuilder) addMappingEntry(lo, hi, offset uint64, file, buildID string, fake bool) {
+ b.mem = append(b.mem, memMap{
+ start: uintptr(lo),
+ end: uintptr(hi),
+ offset: offset,
+ file: file,
+ buildID: buildID,
+ fake: fake,
+ })
+}
diff --git a/src/runtime/pprof/proto_other.go b/src/runtime/pprof/proto_other.go
new file mode 100644
index 0000000..4a7fe79
--- /dev/null
+++ b/src/runtime/pprof/proto_other.go
@@ -0,0 +1,30 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows
+
+package pprof
+
+import (
+ "errors"
+ "os"
+)
+
+// readMapping reads /proc/self/maps and writes mappings to b.pb.
+// It saves the address ranges of the mappings in b.mem for use
+// when emitting locations.
+func (b *profileBuilder) readMapping() {
+ data, _ := os.ReadFile("/proc/self/maps")
+ parseProcSelfMaps(data, b.addMapping)
+ if len(b.mem) == 0 { // pprof expects a map entry, so fake one.
+ b.addMappingEntry(0, 0, 0, "", "", true)
+ // TODO(hyangah): make addMapping return *memMap or
+ // take a memMap struct, and get rid of addMappingEntry
+ // that takes a bunch of positional arguments.
+ }
+}
+
+func readMainModuleMapping() (start, end uint64, err error) {
+ return 0, 0, errors.New("not implemented")
+}
diff --git a/src/runtime/pprof/proto_test.go b/src/runtime/pprof/proto_test.go
new file mode 100644
index 0000000..8ec9c91
--- /dev/null
+++ b/src/runtime/pprof/proto_test.go
@@ -0,0 +1,470 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "encoding/json"
+ "fmt"
+ "internal/abi"
+ "internal/profile"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "reflect"
+ "runtime"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+// translateCPUProfile parses binary CPU profiling stack trace data
+// generated by runtime.CPUProfile() into a profile struct.
+// This is only used for testing. Real conversions stream the
+// data into the profileBuilder as it becomes available.
+//
+// count is the number of records in data.
+func translateCPUProfile(data []uint64, count int) (*profile.Profile, error) {
+ var buf bytes.Buffer
+ b := newProfileBuilder(&buf)
+ tags := make([]unsafe.Pointer, count)
+ if err := b.addCPUData(data, tags); err != nil {
+ return nil, err
+ }
+ b.build()
+ return profile.Parse(&buf)
+}
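+
+// The tests below construct data by hand: the stream starts with a
+// [3, 0, hz] record giving the sampling rate, and each subsequent record is
+// [length in words, 0 (unused here), sample count, stack PCs...]. A single
+// two-frame sample at 500 Hz would look roughly like
+//
+//	data := []uint64{
+//		3, 0, 500, // 500 Hz
+//		5, 0, 1, uint64(pc1), uint64(pc2), // pc1, pc2 from testPCs
+//	}
+//	p, err := translateCPUProfile(data, 2) // 2 records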
+
+// fmtJSON returns a pretty-printed JSON form for x.
+// It works reasonably well for printing protocol-buffer
+// data structures like profile.Profile.
+func fmtJSON(x any) string {
+ js, _ := json.MarshalIndent(x, "", "\t")
+ return string(js)
+}
+
+func TestConvertCPUProfileEmpty(t *testing.T) {
+ // Build an empty profile from mock CPU profile data and round-trip it.
+ var buf bytes.Buffer
+
+ b := []uint64{3, 0, 500} // empty profile at 500 Hz (2ms sample period)
+ p, err := translateCPUProfile(b, 1)
+ if err != nil {
+ t.Fatalf("translateCPUProfile: %v", err)
+ }
+ if err := p.Write(&buf); err != nil {
+ t.Fatalf("writing profile: %v", err)
+ }
+
+ p, err = profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("profile.Parse: %v", err)
+ }
+
+ // Expected PeriodType and SampleType.
+ periodType := &profile.ValueType{Type: "cpu", Unit: "nanoseconds"}
+ sampleType := []*profile.ValueType{
+ {Type: "samples", Unit: "count"},
+ {Type: "cpu", Unit: "nanoseconds"},
+ }
+
+ checkProfile(t, p, 2000*1000, periodType, sampleType, nil, "")
+}
+
+func f1() { f1() }
+func f2() { f2() }
+
+// testPCs returns two PCs and two corresponding memory mappings
+// to use in test profiles.
+func testPCs(t *testing.T) (addr1, addr2 uint64, map1, map2 *profile.Mapping) {
+ switch runtime.GOOS {
+ case "linux", "android", "netbsd":
+ // Figure out two addresses from /proc/self/maps.
+ mmap, err := os.ReadFile("/proc/self/maps")
+ if err != nil {
+ t.Fatal(err)
+ }
+ mprof := &profile.Profile{}
+ if err = mprof.ParseMemoryMap(bytes.NewReader(mmap)); err != nil {
+ t.Fatalf("parsing /proc/self/maps: %v", err)
+ }
+ if len(mprof.Mapping) < 2 {
+ // It is possible for a binary to only have 1 executable
+ // region of memory.
+ t.Skipf("need 2 or more mappings, got %v", len(mprof.Mapping))
+ }
+ addr1 = mprof.Mapping[0].Start
+ map1 = mprof.Mapping[0]
+ map1.BuildID, _ = elfBuildID(map1.File)
+ addr2 = mprof.Mapping[1].Start
+ map2 = mprof.Mapping[1]
+ map2.BuildID, _ = elfBuildID(map2.File)
+ case "windows":
+ addr1 = uint64(abi.FuncPCABIInternal(f1))
+ addr2 = uint64(abi.FuncPCABIInternal(f2))
+
+ exe, err := os.Executable()
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ start, end, err := readMainModuleMapping()
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ map1 = &profile.Mapping{
+ ID: 1,
+ Start: start,
+ Limit: end,
+ File: exe,
+ BuildID: peBuildID(exe),
+ HasFunctions: true,
+ }
+ map2 = &profile.Mapping{
+ ID: 1,
+ Start: start,
+ Limit: end,
+ File: exe,
+ BuildID: peBuildID(exe),
+ HasFunctions: true,
+ }
+ case "js", "wasip1":
+ addr1 = uint64(abi.FuncPCABIInternal(f1))
+ addr2 = uint64(abi.FuncPCABIInternal(f2))
+ default:
+ addr1 = uint64(abi.FuncPCABIInternal(f1))
+ addr2 = uint64(abi.FuncPCABIInternal(f2))
+ // Fake mapping - HasFunctions will be true because two PCs from Go
+ // will be fully symbolized.
+ fake := &profile.Mapping{ID: 1, HasFunctions: true}
+ map1, map2 = fake, fake
+ }
+ return
+}
+
+func TestConvertCPUProfile(t *testing.T) {
+ addr1, addr2, map1, map2 := testPCs(t)
+
+ b := []uint64{
+ 3, 0, 500, // hz = 500
+ 5, 0, 10, uint64(addr1 + 1), uint64(addr1 + 2), // 10 samples in addr1
+ 5, 0, 40, uint64(addr2 + 1), uint64(addr2 + 2), // 40 samples in addr2
+ 5, 0, 10, uint64(addr1 + 1), uint64(addr1 + 2), // 10 samples in addr1
+ }
+ p, err := translateCPUProfile(b, 4)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+ period := int64(2000 * 1000)
+ periodType := &profile.ValueType{Type: "cpu", Unit: "nanoseconds"}
+ sampleType := []*profile.ValueType{
+ {Type: "samples", Unit: "count"},
+ {Type: "cpu", Unit: "nanoseconds"},
+ }
+ samples := []*profile.Sample{
+ {Value: []int64{20, 20 * 2000 * 1000}, Location: []*profile.Location{
+ {ID: 1, Mapping: map1, Address: addr1},
+ {ID: 2, Mapping: map1, Address: addr1 + 1},
+ }},
+ {Value: []int64{40, 40 * 2000 * 1000}, Location: []*profile.Location{
+ {ID: 3, Mapping: map2, Address: addr2},
+ {ID: 4, Mapping: map2, Address: addr2 + 1},
+ }},
+ }
+ checkProfile(t, p, period, periodType, sampleType, samples, "")
+}
+
+func checkProfile(t *testing.T, p *profile.Profile, period int64, periodType *profile.ValueType, sampleType []*profile.ValueType, samples []*profile.Sample, defaultSampleType string) {
+ t.Helper()
+
+ if p.Period != period {
+ t.Errorf("p.Period = %d, want %d", p.Period, period)
+ }
+ if !reflect.DeepEqual(p.PeriodType, periodType) {
+ t.Errorf("p.PeriodType = %v\nwant = %v", fmtJSON(p.PeriodType), fmtJSON(periodType))
+ }
+ if !reflect.DeepEqual(p.SampleType, sampleType) {
+ t.Errorf("p.SampleType = %v\nwant = %v", fmtJSON(p.SampleType), fmtJSON(sampleType))
+ }
+ if defaultSampleType != p.DefaultSampleType {
+ t.Errorf("p.DefaultSampleType = %v\nwant = %v", p.DefaultSampleType, defaultSampleType)
+ }
+ // Clear line info since it is not in the expected samples.
+ // If we used f1 and f2 above, then the samples will have line info.
+ for _, s := range p.Sample {
+ for _, l := range s.Location {
+ l.Line = nil
+ }
+ }
+ if fmtJSON(p.Sample) != fmtJSON(samples) { // ignore unexported fields
+ if len(p.Sample) == len(samples) {
+ for i := range p.Sample {
+ if !reflect.DeepEqual(p.Sample[i], samples[i]) {
+ t.Errorf("sample %d = %v\nwant = %v\n", i, fmtJSON(p.Sample[i]), fmtJSON(samples[i]))
+ }
+ }
+ if t.Failed() {
+ t.FailNow()
+ }
+ }
+ t.Fatalf("p.Sample = %v\nwant = %v", fmtJSON(p.Sample), fmtJSON(samples))
+ }
+}
+
+var profSelfMapsTests = `
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+
+00400000-07000000 r-xp 00000000 00:00 0
+07000000-07093000 r-xp 06c00000 00:2e 536754 /path/to/gobench_server_main
+07093000-0722d000 rw-p 06c92000 00:2e 536754 /path/to/gobench_server_main
+0722d000-07b21000 rw-p 00000000 00:00 0
+c000000000-c000036000 rw-p 00000000 00:00 0
+->
+07000000 07093000 06c00000 /path/to/gobench_server_main
+`
+
+var profSelfMapsTestsWithDeleted = `
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat (deleted)
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat (deleted)
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat (deleted)
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+
+00400000-0040b000 r-xp 00000000 fc:01 787766 /bin/cat with space
+0060a000-0060b000 r--p 0000a000 fc:01 787766 /bin/cat with space
+0060b000-0060c000 rw-p 0000b000 fc:01 787766 /bin/cat with space
+014ab000-014cc000 rw-p 00000000 00:00 0 [heap]
+7f7d76af8000-7f7d7797c000 r--p 00000000 fc:01 1318064 /usr/lib/locale/locale-archive
+7f7d7797c000-7f7d77b36000 r-xp 00000000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77b36000-7f7d77d36000 ---p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d36000-7f7d77d3a000 r--p 001ba000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3a000-7f7d77d3c000 rw-p 001be000 fc:01 1180226 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d3c000-7f7d77d41000 rw-p 00000000 00:00 0
+7f7d77d41000-7f7d77d64000 r-xp 00000000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f3f000-7f7d77f42000 rw-p 00000000 00:00 0
+7f7d77f61000-7f7d77f63000 rw-p 00000000 00:00 0
+7f7d77f63000-7f7d77f64000 r--p 00022000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f64000-7f7d77f65000 rw-p 00023000 fc:01 1180217 /lib/x86_64-linux-gnu/ld-2.19.so
+7f7d77f65000-7f7d77f66000 rw-p 00000000 00:00 0
+7ffc342a2000-7ffc342c3000 rw-p 00000000 00:00 0 [stack]
+7ffc34343000-7ffc34345000 r-xp 00000000 00:00 0 [vdso]
+ffffffffff600000-ffffffffff601000 r-xp 00000090 00:00 0 [vsyscall]
+->
+00400000 0040b000 00000000 /bin/cat with space
+7f7d7797c000 7f7d77b36000 00000000 /lib/x86_64-linux-gnu/libc-2.19.so
+7f7d77d41000 7f7d77d64000 00000000 /lib/x86_64-linux-gnu/ld-2.19.so
+7ffc34343000 7ffc34345000 00000000 [vdso]
+ffffffffff600000 ffffffffff601000 00000090 [vsyscall]
+`
+
+func TestProcSelfMaps(t *testing.T) {
+
+ f := func(t *testing.T, input string) {
+ for tx, tt := range strings.Split(input, "\n\n") {
+ in, out, ok := strings.Cut(tt, "->\n")
+ if !ok {
+ t.Fatal("malformed test case")
+ }
+ if len(out) > 0 && out[len(out)-1] != '\n' {
+ out += "\n"
+ }
+ var buf strings.Builder
+ parseProcSelfMaps([]byte(in), func(lo, hi, offset uint64, file, buildID string) {
+ fmt.Fprintf(&buf, "%08x %08x %08x %s\n", lo, hi, offset, file)
+ })
+ if buf.String() != out {
+ t.Errorf("#%d: have:\n%s\nwant:\n%s\n%q\n%q", tx, buf.String(), out, buf.String(), out)
+ }
+ }
+ }
+
+ t.Run("Normal", func(t *testing.T) {
+ f(t, profSelfMapsTests)
+ })
+
+ t.Run("WithDeletedFile", func(t *testing.T) {
+ f(t, profSelfMapsTestsWithDeleted)
+ })
+}
+
+// TestMapping checks that the mapping section of CPU profiles
+// has the HasFunctions field set correctly. If all PCs included
+// in the samples are successfully symbolized, the corresponding
+// mapping entry (in this test case, only one entry) should have
+// its HasFunctions field set true.
+// The test generates a CPU profile that includes PCs from C side
+// that the runtime can't symbolize. See ./testdata/mappingtest.
+func TestMapping(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+ testenv.MustHaveCGO(t)
+
+ prog := "./testdata/mappingtest/main.go"
+
+ // GoOnly includes only Go symbols that runtime will symbolize.
+ // Go+C includes C symbols that runtime will not symbolize.
+ for _, traceback := range []string{"GoOnly", "Go+C"} {
+ t.Run("traceback"+traceback, func(t *testing.T) {
+ cmd := exec.Command(testenv.GoToolPath(t), "run", prog)
+ if traceback != "GoOnly" {
+ cmd.Env = append(os.Environ(), "SETCGOTRACEBACK=1")
+ }
+ cmd.Stderr = new(bytes.Buffer)
+
+ out, err := cmd.Output()
+ if err != nil {
+ t.Fatalf("failed to run the test program %q: %v\n%v", prog, err, cmd.Stderr)
+ }
+
+ prof, err := profile.Parse(bytes.NewReader(out))
+ if err != nil {
+ t.Fatalf("failed to parse the generated profile data: %v", err)
+ }
+ t.Logf("Profile: %s", prof)
+
+ hit := make(map[*profile.Mapping]bool)
+ miss := make(map[*profile.Mapping]bool)
+ for _, loc := range prof.Location {
+ if symbolized(loc) {
+ hit[loc.Mapping] = true
+ } else {
+ miss[loc.Mapping] = true
+ }
+ }
+ if len(miss) == 0 {
+ t.Log("no location with missing symbol info was sampled")
+ }
+
+ for _, m := range prof.Mapping {
+ if miss[m] && m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=true, but contains locations with failed symbolization", m)
+ continue
+ }
+ if !miss[m] && hit[m] && !m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=false, but all referenced locations from this lapping were symbolized successfully", m)
+ continue
+ }
+ }
+
+ if traceback == "Go+C" {
+ // The test code was arranged to have PCs from C and
+ // they are not symbolized.
+ // Check no Location containing those unsymbolized PCs contains multiple lines.
+ for i, loc := range prof.Location {
+ if !symbolized(loc) && len(loc.Line) > 1 {
+ t.Errorf("Location[%d] contains unsymbolized PCs and multiple lines: %v", i, loc)
+ }
+ }
+ }
+ })
+ }
+}
+
+func symbolized(loc *profile.Location) bool {
+ if len(loc.Line) == 0 {
+ return false
+ }
+ l := loc.Line[0]
+ f := l.Function
+ if l.Line == 0 || f == nil || f.Name == "" || f.Filename == "" {
+ return false
+ }
+ return true
+}
+
+// TestFakeMapping checks that at least one mapping exists
+// (possibly a fake mapping), and that the mappings' HasFunctions
+// bits are set correctly.
+func TestFakeMapping(t *testing.T) {
+ var buf bytes.Buffer
+ if err := Lookup("heap").WriteTo(&buf, 0); err != nil {
+ t.Fatalf("failed to write heap profile: %v", err)
+ }
+ prof, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("failed to parse the generated profile data: %v", err)
+ }
+ t.Logf("Profile: %s", prof)
+ if len(prof.Mapping) == 0 {
+ t.Fatal("want profile with at least one mapping entry, got 0 mapping")
+ }
+
+ hit := make(map[*profile.Mapping]bool)
+ miss := make(map[*profile.Mapping]bool)
+ for _, loc := range prof.Location {
+ if symbolized(loc) {
+ hit[loc.Mapping] = true
+ } else {
+ miss[loc.Mapping] = true
+ }
+ }
+ for _, m := range prof.Mapping {
+ if miss[m] && m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=true, but contains locations with failed symbolization", m)
+ continue
+ }
+ if !miss[m] && hit[m] && !m.HasFunctions {
+ t.Errorf("mapping %+v has HasFunctions=false, but all referenced locations from this lapping were symbolized successfully", m)
+ continue
+ }
+ }
+}
+
+// Make sure the profiler can handle an empty stack trace.
+// See issue 37967.
+func TestEmptyStack(t *testing.T) {
+ b := []uint64{
+ 3, 0, 500, // hz = 500
+ 3, 0, 10, // 10 samples with an empty stack trace
+ }
+ _, err := translateCPUProfile(b, 2)
+ if err != nil {
+ t.Fatalf("translating profile: %v", err)
+ }
+}
diff --git a/src/runtime/pprof/proto_windows.go b/src/runtime/pprof/proto_windows.go
new file mode 100644
index 0000000..d5ae4a5
--- /dev/null
+++ b/src/runtime/pprof/proto_windows.go
@@ -0,0 +1,73 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "errors"
+ "internal/syscall/windows"
+ "syscall"
+)
+
+// readMapping adds memory mapping information to the profile.
+func (b *profileBuilder) readMapping() {
+ snap, err := createModuleSnapshot()
+ if err != nil {
+ // pprof expects a map entry, so fake one, when we haven't added anything yet.
+ b.addMappingEntry(0, 0, 0, "", "", true)
+ return
+ }
+ defer func() { _ = syscall.CloseHandle(snap) }()
+
+ var module windows.ModuleEntry32
+ module.Size = uint32(windows.SizeofModuleEntry32)
+ err = windows.Module32First(snap, &module)
+ if err != nil {
+ // pprof expects a map entry, so fake one, when we haven't added anything yet.
+ b.addMappingEntry(0, 0, 0, "", "", true)
+ return
+ }
+ for err == nil {
+ exe := syscall.UTF16ToString(module.ExePath[:])
+ b.addMappingEntry(
+ uint64(module.ModBaseAddr),
+ uint64(module.ModBaseAddr)+uint64(module.ModBaseSize),
+ 0,
+ exe,
+ peBuildID(exe),
+ false,
+ )
+ err = windows.Module32Next(snap, &module)
+ }
+}
+
+func readMainModuleMapping() (start, end uint64, err error) {
+ snap, err := createModuleSnapshot()
+ if err != nil {
+ return 0, 0, err
+ }
+ defer func() { _ = syscall.CloseHandle(snap) }()
+
+ var module windows.ModuleEntry32
+ module.Size = uint32(windows.SizeofModuleEntry32)
+ err = windows.Module32First(snap, &module)
+ if err != nil {
+ return 0, 0, err
+ }
+
+ return uint64(module.ModBaseAddr), uint64(module.ModBaseAddr) + uint64(module.ModBaseSize), nil
+}
+
+func createModuleSnapshot() (syscall.Handle, error) {
+ for {
+ snap, err := syscall.CreateToolhelp32Snapshot(windows.TH32CS_SNAPMODULE|windows.TH32CS_SNAPMODULE32, uint32(syscall.Getpid()))
+ var errno syscall.Errno
+ if err != nil && errors.As(err, &errno) && errno == windows.ERROR_BAD_LENGTH {
+ // When CreateToolhelp32Snapshot(SNAPMODULE|SNAPMODULE32, ...) fails
+ // with ERROR_BAD_LENGTH, it should be retried until it succeeds.
+ continue
+ }
+ return snap, err
+ }
+}
diff --git a/src/runtime/pprof/protobuf.go b/src/runtime/pprof/protobuf.go
new file mode 100644
index 0000000..f7ec1ac
--- /dev/null
+++ b/src/runtime/pprof/protobuf.go
@@ -0,0 +1,141 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+// A protobuf is a simple protocol buffer encoder.
+type protobuf struct {
+ data []byte
+ tmp [16]byte
+ nest int
+}
+
+func (b *protobuf) varint(x uint64) {
+ for x >= 128 {
+ b.data = append(b.data, byte(x)|0x80)
+ x >>= 7
+ }
+ b.data = append(b.data, byte(x))
+}
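+
+// For example, varint(300) appends two bytes: 300 is 0b1_0010_1100, so the
+// low seven bits 0x2C are written first with the continuation bit set (0xAC),
+// followed by the remaining bits (0x02):
+//
+//	var b protobuf
+//	b.varint(300)
+//	// b.data is now []byte{0xAC, 0x02}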
+
+func (b *protobuf) length(tag int, len int) {
+ b.varint(uint64(tag)<<3 | 2)
+ b.varint(uint64(len))
+}
+
+func (b *protobuf) uint64(tag int, x uint64) {
+ // append varint to b.data
+ b.varint(uint64(tag)<<3 | 0)
+ b.varint(x)
+}
+
+func (b *protobuf) uint64s(tag int, x []uint64) {
+ if len(x) > 2 {
+ // Use packed encoding
+ n1 := len(b.data)
+ for _, u := range x {
+ b.varint(u)
+ }
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ return
+ }
+ for _, u := range x {
+ b.uint64(tag, u)
+ }
+}
+
+func (b *protobuf) uint64Opt(tag int, x uint64) {
+ if x == 0 {
+ return
+ }
+ b.uint64(tag, x)
+}
+
+func (b *protobuf) int64(tag int, x int64) {
+ u := uint64(x)
+ b.uint64(tag, u)
+}
+
+func (b *protobuf) int64Opt(tag int, x int64) {
+ if x == 0 {
+ return
+ }
+ b.int64(tag, x)
+}
+
+func (b *protobuf) int64s(tag int, x []int64) {
+ if len(x) > 2 {
+ // Use packed encoding
+ n1 := len(b.data)
+ for _, u := range x {
+ b.varint(uint64(u))
+ }
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ return
+ }
+ for _, u := range x {
+ b.int64(tag, u)
+ }
+}
+
+func (b *protobuf) string(tag int, x string) {
+ b.length(tag, len(x))
+ b.data = append(b.data, x...)
+}
+
+func (b *protobuf) strings(tag int, x []string) {
+ for _, s := range x {
+ b.string(tag, s)
+ }
+}
+
+func (b *protobuf) stringOpt(tag int, x string) {
+ if x == "" {
+ return
+ }
+ b.string(tag, x)
+}
+
+func (b *protobuf) bool(tag int, x bool) {
+ if x {
+ b.uint64(tag, 1)
+ } else {
+ b.uint64(tag, 0)
+ }
+}
+
+func (b *protobuf) boolOpt(tag int, x bool) {
+ if !x {
+ return
+ }
+ b.bool(tag, x)
+}
+
+type msgOffset int
+
+func (b *protobuf) startMessage() msgOffset {
+ b.nest++
+ return msgOffset(len(b.data))
+}
+
+func (b *protobuf) endMessage(tag int, start msgOffset) {
+ n1 := int(start)
+ n2 := len(b.data)
+ b.length(tag, n2-n1)
+ n3 := len(b.data)
+ copy(b.tmp[:], b.data[n2:n3])
+ copy(b.data[n1+(n3-n2):], b.data[n1:n2])
+ copy(b.data[n1:], b.tmp[:n3-n2])
+ b.nest--
+}
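+
+// Nested messages and packed fields are encoded without knowing their length
+// up front: the payload bytes are appended first, then the tag and length are
+// appended after it and rotated to the front through the b.tmp scratch buffer
+// (the tag and length varints fit comfortably in its 16 bytes). Roughly:
+//
+//	start := b.startMessage() // remembers len(b.data)
+//	b.uint64(1, 42)           // message body is appended in place
+//	b.endMessage(2, start)    // b.data[start:] becomes tag | length | body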
diff --git a/src/runtime/pprof/protomem.go b/src/runtime/pprof/protomem.go
new file mode 100644
index 0000000..fa75a28
--- /dev/null
+++ b/src/runtime/pprof/protomem.go
@@ -0,0 +1,93 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "io"
+ "math"
+ "runtime"
+ "strings"
+)
+
+// writeHeapProto writes the current heap profile in protobuf format to w.
+func writeHeapProto(w io.Writer, p []runtime.MemProfileRecord, rate int64, defaultSampleType string) error {
+ b := newProfileBuilder(w)
+ b.pbValueType(tagProfile_PeriodType, "space", "bytes")
+ b.pb.int64Opt(tagProfile_Period, rate)
+ b.pbValueType(tagProfile_SampleType, "alloc_objects", "count")
+ b.pbValueType(tagProfile_SampleType, "alloc_space", "bytes")
+ b.pbValueType(tagProfile_SampleType, "inuse_objects", "count")
+ b.pbValueType(tagProfile_SampleType, "inuse_space", "bytes")
+ if defaultSampleType != "" {
+ b.pb.int64Opt(tagProfile_DefaultSampleType, b.stringIndex(defaultSampleType))
+ }
+
+ values := []int64{0, 0, 0, 0}
+ var locs []uint64
+ for _, r := range p {
+ hideRuntime := true
+ for tries := 0; tries < 2; tries++ {
+ stk := r.Stack()
+ // For heap profiles, all stack
+ // addresses are return PCs, which is
+ // what appendLocsForStack expects.
+ if hideRuntime {
+ for i, addr := range stk {
+ if f := runtime.FuncForPC(addr); f != nil && strings.HasPrefix(f.Name(), "runtime.") {
+ continue
+ }
+ // Found non-runtime. Show any runtime uses above it.
+ stk = stk[i:]
+ break
+ }
+ }
+ locs = b.appendLocsForStack(locs[:0], stk)
+ if len(locs) > 0 {
+ break
+ }
+ hideRuntime = false // try again, and show all frames next time.
+ }
+
+ values[0], values[1] = scaleHeapSample(r.AllocObjects, r.AllocBytes, rate)
+ values[2], values[3] = scaleHeapSample(r.InUseObjects(), r.InUseBytes(), rate)
+ var blockSize int64
+ if r.AllocObjects > 0 {
+ blockSize = r.AllocBytes / r.AllocObjects
+ }
+ b.pbSample(values, locs, func() {
+ if blockSize != 0 {
+ b.pbLabel(tagSample_Label, "bytes", "", blockSize)
+ }
+ })
+ }
+ b.build()
+ return nil
+}
+
+// scaleHeapSample adjusts the data from a heap Sample to
+// account for its probability of appearing in the collected
+// data. Heap profiles are a sampling of the memory allocation
+// requests in a program. We estimate the unsampled value by dividing
+// each collected sample by its probability of appearing in the
+// profile. Heap profiles rely on a Poisson process to determine
+// which samples to collect, based on the desired average collection
+// rate R. The probability that a sample of size S appears in the
+// profile is 1-exp(-S/R).
+func scaleHeapSample(count, size, rate int64) (int64, int64) {
+ if count == 0 || size == 0 {
+ return 0, 0
+ }
+
+ if rate <= 1 {
+ // if rate==1 all samples were collected so no adjustment is needed.
+ // if rate<1 treat as unknown and skip scaling.
+ return count, size
+ }
+
+ avgSize := float64(size) / float64(count)
+ scale := 1 / (1 - math.Exp(-avgSize/float64(rate)))
+
+ return int64(float64(count) * scale), int64(float64(size) * scale)
+}
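+
+// For example, with a 512 KiB sampling rate (as in TestConvertMemProfile),
+// a record holding a single sampled 1 KiB allocation has avgSize = 1024 and
+// scale = 1/(1-exp(-1024/524288)) ≈ 512.5, so it is reported as roughly 512
+// allocations totalling roughly 512 KiB:
+//
+//	count, size := scaleHeapSample(1, 1024, 512*1024)
+//	// count == 512, size == 524800 (approximately)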
diff --git a/src/runtime/pprof/protomem_test.go b/src/runtime/pprof/protomem_test.go
new file mode 100644
index 0000000..505c323
--- /dev/null
+++ b/src/runtime/pprof/protomem_test.go
@@ -0,0 +1,146 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "bytes"
+ "fmt"
+ "internal/profile"
+ "runtime"
+ "slices"
+ "strings"
+ "testing"
+)
+
+func TestConvertMemProfile(t *testing.T) {
+ addr1, addr2, map1, map2 := testPCs(t)
+
+ // MemProfileRecord stacks are return PCs, so add one to the
+ // addresses recorded in the "profile". The proto profile
+ // locations are call PCs, so conversion will subtract one
+ // from these and get back to addr1 and addr2.
+ a1, a2 := uintptr(addr1)+1, uintptr(addr2)+1
+ rate := int64(512 * 1024)
+ rec := []runtime.MemProfileRecord{
+ {AllocBytes: 4096, FreeBytes: 1024, AllocObjects: 4, FreeObjects: 1, Stack0: [32]uintptr{a1, a2}},
+ {AllocBytes: 512 * 1024, FreeBytes: 0, AllocObjects: 1, FreeObjects: 0, Stack0: [32]uintptr{a2 + 1, a2 + 2}},
+ {AllocBytes: 512 * 1024, FreeBytes: 512 * 1024, AllocObjects: 1, FreeObjects: 1, Stack0: [32]uintptr{a1 + 1, a1 + 2, a2 + 3}},
+ }
+
+ periodType := &profile.ValueType{Type: "space", Unit: "bytes"}
+ sampleType := []*profile.ValueType{
+ {Type: "alloc_objects", Unit: "count"},
+ {Type: "alloc_space", Unit: "bytes"},
+ {Type: "inuse_objects", Unit: "count"},
+ {Type: "inuse_space", Unit: "bytes"},
+ }
+ samples := []*profile.Sample{
+ {
+ Value: []int64{2050, 2099200, 1537, 1574400},
+ Location: []*profile.Location{
+ {ID: 1, Mapping: map1, Address: addr1},
+ {ID: 2, Mapping: map2, Address: addr2},
+ },
+ NumLabel: map[string][]int64{"bytes": {1024}},
+ },
+ {
+ Value: []int64{1, 829411, 1, 829411},
+ Location: []*profile.Location{
+ {ID: 3, Mapping: map2, Address: addr2 + 1},
+ {ID: 4, Mapping: map2, Address: addr2 + 2},
+ },
+ NumLabel: map[string][]int64{"bytes": {512 * 1024}},
+ },
+ {
+ Value: []int64{1, 829411, 0, 0},
+ Location: []*profile.Location{
+ {ID: 5, Mapping: map1, Address: addr1 + 1},
+ {ID: 6, Mapping: map1, Address: addr1 + 2},
+ {ID: 7, Mapping: map2, Address: addr2 + 3},
+ },
+ NumLabel: map[string][]int64{"bytes": {512 * 1024}},
+ },
+ }
+ for _, tc := range []struct {
+ name string
+ defaultSampleType string
+ }{
+ {"heap", ""},
+ {"allocs", "alloc_space"},
+ } {
+ t.Run(tc.name, func(t *testing.T) {
+ var buf bytes.Buffer
+ if err := writeHeapProto(&buf, rec, rate, tc.defaultSampleType); err != nil {
+ t.Fatalf("writing profile: %v", err)
+ }
+
+ p, err := profile.Parse(&buf)
+ if err != nil {
+ t.Fatalf("profile.Parse: %v", err)
+ }
+
+ checkProfile(t, p, rate, periodType, sampleType, samples, tc.defaultSampleType)
+ })
+ }
+}
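+
+// The expected sample values above follow from scaleHeapSample: for the first
+// record, avgSize = 4096/4 = 1024 and scale = 1/(1-exp(-1024/(512*1024))) ≈ 512.5,
+// so the 4 sampled allocations totalling 4096 bytes scale to 2050 objects and
+// 2099200 bytes, and the 3 still-in-use objects (3072 bytes) scale to 1537
+// and 1574400.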
+
+func genericAllocFunc[T interface{ uint32 | uint64 }](n int) []T {
+ return make([]T, n)
+}
+
+func profileToString(p *profile.Profile) []string {
+ var res []string
+ for _, s := range p.Sample {
+ var funcs []string
+ for i := len(s.Location) - 1; i >= 0; i-- {
+ loc := s.Location[i]
+ for j := len(loc.Line) - 1; j >= 0; j-- {
+ line := loc.Line[j]
+ funcs = append(funcs, line.Function.Name)
+ }
+ }
+ res = append(res, fmt.Sprintf("%s %v", strings.Join(funcs, ";"), s.Value))
+ }
+ return res
+}
+
+// This is a regression test for https://go.dev/issue/64528.
+func TestGenericsHashKeyInPprofBuilder(t *testing.T) {
+ previousRate := runtime.MemProfileRate
+ runtime.MemProfileRate = 1
+ defer func() {
+ runtime.MemProfileRate = previousRate
+ }()
+ for _, sz := range []int{128, 256} {
+ genericAllocFunc[uint32](sz / 4)
+ }
+ for _, sz := range []int{32, 64} {
+ genericAllocFunc[uint64](sz / 8)
+ }
+
+ runtime.GC()
+ buf := bytes.NewBuffer(nil)
+ if err := WriteHeapProfile(buf); err != nil {
+ t.Fatalf("writing profile: %v", err)
+ }
+ p, err := profile.Parse(buf)
+ if err != nil {
+ t.Fatalf("profile.Parse: %v", err)
+ }
+
+ actual := profileToString(p)
+ expected := []string{
+ "testing.tRunner;runtime/pprof.TestGenericsHashKeyInPprofBuilder;runtime/pprof.genericAllocFunc[go.shape.uint32] [1 128 0 0]",
+ "testing.tRunner;runtime/pprof.TestGenericsHashKeyInPprofBuilder;runtime/pprof.genericAllocFunc[go.shape.uint32] [1 256 0 0]",
+ "testing.tRunner;runtime/pprof.TestGenericsHashKeyInPprofBuilder;runtime/pprof.genericAllocFunc[go.shape.uint64] [1 32 0 0]",
+ "testing.tRunner;runtime/pprof.TestGenericsHashKeyInPprofBuilder;runtime/pprof.genericAllocFunc[go.shape.uint64] [1 64 0 0]",
+ }
+
+ for _, l := range expected {
+ if !slices.Contains(actual, l) {
+ t.Errorf("profile = %v\nwant = %v", strings.Join(actual, "\n"), l)
+ }
+ }
+}
diff --git a/src/runtime/pprof/runtime.go b/src/runtime/pprof/runtime.go
new file mode 100644
index 0000000..71f89ca
--- /dev/null
+++ b/src/runtime/pprof/runtime.go
@@ -0,0 +1,52 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "runtime"
+ "unsafe"
+)
+
+// runtime_FrameStartLine is defined in runtime/symtab.go.
+//
+//go:noescape
+func runtime_FrameStartLine(f *runtime.Frame) int
+
+// runtime_FrameSymbolName is defined in runtime/symtab.go.
+//
+//go:noescape
+func runtime_FrameSymbolName(f *runtime.Frame) string
+
+// runtime_expandFinalInlineFrame is defined in runtime/symtab.go.
+func runtime_expandFinalInlineFrame(stk []uintptr) []uintptr
+
+// runtime_setProfLabel is defined in runtime/proflabel.go.
+func runtime_setProfLabel(labels unsafe.Pointer)
+
+// runtime_getProfLabel is defined in runtime/proflabel.go.
+func runtime_getProfLabel() unsafe.Pointer
+
+// SetGoroutineLabels sets the current goroutine's labels to match ctx.
+// A new goroutine inherits the labels of the goroutine that created it.
+// This is a lower-level API than Do, which should be used instead when possible.
+func SetGoroutineLabels(ctx context.Context) {
+ ctxLabels, _ := ctx.Value(labelContextKey{}).(*labelMap)
+ runtime_setProfLabel(unsafe.Pointer(ctxLabels))
+}
+
+// Do calls f with a copy of the parent context with the
+// given labels added to the parent's label map.
+// Goroutines spawned while executing f will inherit the augmented label-set.
+// Each key/value pair in labels is inserted into the label map in the
+// order provided, overriding any previous value for the same key.
+// The augmented label map will be set for the duration of the call to f
+// and restored once f returns.
+func Do(ctx context.Context, labels LabelSet, f func(context.Context)) {
+ defer SetGoroutineLabels(ctx)
+ ctx = WithLabels(ctx, labels)
+ SetGoroutineLabels(ctx)
+ f(ctx)
+}
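+
+// A minimal sketch of how a caller in another package might use Do; the
+// "worker" label and the handle function are illustrative only:
+//
+//	pprof.Do(ctx, pprof.Labels("worker", "purge"), func(ctx context.Context) {
+//		// Samples taken here, and in goroutines started here,
+//		// carry the worker=purge label.
+//		handle(ctx)
+//	})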
diff --git a/src/runtime/pprof/runtime_test.go b/src/runtime/pprof/runtime_test.go
new file mode 100644
index 0000000..0dd5324
--- /dev/null
+++ b/src/runtime/pprof/runtime_test.go
@@ -0,0 +1,96 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package pprof
+
+import (
+ "context"
+ "fmt"
+ "reflect"
+ "testing"
+)
+
+func TestSetGoroutineLabels(t *testing.T) {
+ sync := make(chan struct{})
+
+ wantLabels := map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty before test, got %v", gotLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected child goroutine's profile labels to be empty before test, got %v", gotLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ wantLabels = map[string]string{"key": "value"}
+ ctx := WithLabels(context.Background(), Labels("key", "value"))
+ SetGoroutineLabels(ctx)
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("parent goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("child goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ wantLabels = map[string]string{}
+ ctx = context.Background()
+ SetGoroutineLabels(ctx)
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty, got %v", gotLabels)
+ }
+ go func() {
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected child goroutine's profile labels to be empty, got %v", gotLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+}
+
+func TestDo(t *testing.T) {
+ wantLabels := map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("Expected parent goroutine's profile labels to be empty before Do, got %v", gotLabels)
+ }
+
+ Do(context.Background(), Labels("key1", "value1", "key2", "value2"), func(ctx context.Context) {
+ wantLabels := map[string]string{"key1": "value1", "key2": "value2"}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("parent goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+
+ sync := make(chan struct{})
+ go func() {
+ wantLabels := map[string]string{"key1": "value1", "key2": "value2"}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ t.Errorf("child goroutine's profile labels: got %v, want %v", gotLabels, wantLabels)
+ }
+ sync <- struct{}{}
+ }()
+ <-sync
+
+ })
+
+ wantLabels = map[string]string{}
+ if gotLabels := getProfLabel(); !reflect.DeepEqual(gotLabels, wantLabels) {
+ fmt.Printf("%#v", gotLabels)
+ fmt.Printf("%#v", wantLabels)
+ t.Errorf("Expected parent goroutine's profile labels to be empty after Do, got %v", gotLabels)
+ }
+}
+
+func getProfLabel() map[string]string {
+ l := (*labelMap)(runtime_getProfLabel())
+ if l == nil {
+ return map[string]string{}
+ }
+ return *l
+}
diff --git a/src/runtime/pprof/rusage_test.go b/src/runtime/pprof/rusage_test.go
new file mode 100644
index 0000000..8039510
--- /dev/null
+++ b/src/runtime/pprof/rusage_test.go
@@ -0,0 +1,41 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package pprof
+
+import (
+ "syscall"
+ "time"
+)
+
+func init() {
+ diffCPUTimeImpl = diffCPUTimeRUsage
+}
+
+func diffCPUTimeRUsage(f func()) (user, system time.Duration) {
+ ok := true
+ var before, after syscall.Rusage
+
+ err := syscall.Getrusage(syscall.RUSAGE_SELF, &before)
+ if err != nil {
+ ok = false
+ }
+
+ f()
+
+ err = syscall.Getrusage(syscall.RUSAGE_SELF, &after)
+ if err != nil {
+ ok = false
+ }
+
+ if !ok {
+ return 0, 0
+ }
+
+ user = time.Duration(after.Utime.Nano() - before.Utime.Nano())
+ system = time.Duration(after.Stime.Nano() - before.Stime.Nano())
+ return user, system
+}
diff --git a/src/runtime/pprof/testdata/README b/src/runtime/pprof/testdata/README
new file mode 100644
index 0000000..876538e
--- /dev/null
+++ b/src/runtime/pprof/testdata/README
@@ -0,0 +1,9 @@
+These binaries were generated by:
+
+$ cat empty.s
+.global _start
+_start:
+$ as --32 -o empty.o empty.s && ld --build-id -m elf_i386 -o test32 empty.o
+$ as --64 -o empty.o empty.s && ld --build-id -o test64 empty.o
+$ powerpc-linux-gnu-as -o empty.o empty.s && powerpc-linux-gnu-ld --build-id -o test32be empty.o
+$ powerpc64-linux-gnu-as -o empty.o empty.s && powerpc64-linux-gnu-ld --build-id -o test64be empty.o
diff --git a/src/runtime/pprof/testdata/mappingtest/main.go b/src/runtime/pprof/testdata/mappingtest/main.go
new file mode 100644
index 0000000..484b7f9
--- /dev/null
+++ b/src/runtime/pprof/testdata/mappingtest/main.go
@@ -0,0 +1,108 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This program outputs a CPU profile that includes
+// both Go and Cgo stacks. This is used by the mapping info
+// tests in runtime/pprof.
+//
+// If SETCGOTRACEBACK=1 is set, the CPU profile will include
+// PCs from the C side, but they will not be symbolized.
+package main
+
+/*
+#include <stdint.h>
+#include <stdlib.h>
+
+int cpuHogCSalt1 = 0;
+int cpuHogCSalt2 = 0;
+
+void CPUHogCFunction0(int foo) {
+ int i;
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ cpuHogCSalt2 = foo;
+ }
+}
+
+void CPUHogCFunction() {
+ CPUHogCFunction0(cpuHogCSalt1);
+}
+
+struct CgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t *buf;
+ uintptr_t max;
+};
+
+void CollectCgoTraceback(void* parg) {
+ struct CgoTracebackArg* arg = (struct CgoTracebackArg*)(parg);
+ arg->buf[0] = (uintptr_t)(CPUHogCFunction0);
+ arg->buf[1] = (uintptr_t)(CPUHogCFunction);
+ arg->buf[2] = 0;
+};
+*/
+import "C"
+
+import (
+ "log"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ if v := os.Getenv("SETCGOTRACEBACK"); v == "1" {
+ // Collect some PCs from C-side, but don't symbolize.
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.CollectCgoTraceback), nil, nil)
+ }
+}
+
+func main() {
+ go cpuHogGoFunction()
+ go cpuHogCFunction()
+ runtime.Gosched()
+
+ if err := pprof.StartCPUProfile(os.Stdout); err != nil {
+ log.Fatal("can't start CPU profile: ", err)
+ }
+ time.Sleep(200 * time.Millisecond)
+ pprof.StopCPUProfile()
+
+ if err := os.Stdout.Close(); err != nil {
+ log.Fatal("can't write CPU profile: ", err)
+ }
+}
+
+var salt1 int
+var salt2 int
+
+func cpuHogGoFunction() {
+ for {
+ foo := salt1
+ for i := 0; i < 1e5; i++ {
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 1
+ }
+ salt2 = foo
+ }
+ runtime.Gosched()
+ }
+}
+
+func cpuHogCFunction() {
+ // Generates CPU profile samples including a Cgo call path.
+ for {
+ C.CPUHogCFunction()
+ runtime.Gosched()
+ }
+}
diff --git a/src/runtime/pprof/testdata/test32 b/src/runtime/pprof/testdata/test32
new file mode 100644
index 0000000..ce59472
--- /dev/null
+++ b/src/runtime/pprof/testdata/test32
Binary files differ
diff --git a/src/runtime/pprof/testdata/test32be b/src/runtime/pprof/testdata/test32be
new file mode 100644
index 0000000..f13a732
--- /dev/null
+++ b/src/runtime/pprof/testdata/test32be
Binary files differ
diff --git a/src/runtime/pprof/testdata/test64 b/src/runtime/pprof/testdata/test64
new file mode 100644
index 0000000..3fb42fb
--- /dev/null
+++ b/src/runtime/pprof/testdata/test64
Binary files differ
diff --git a/src/runtime/pprof/testdata/test64be b/src/runtime/pprof/testdata/test64be
new file mode 100644
index 0000000..09b4b01
--- /dev/null
+++ b/src/runtime/pprof/testdata/test64be
Binary files differ
diff --git a/src/runtime/preempt.go b/src/runtime/preempt.go
new file mode 100644
index 0000000..76d8ba4
--- /dev/null
+++ b/src/runtime/preempt.go
@@ -0,0 +1,447 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Goroutine preemption
+//
+// A goroutine can be preempted at any safe-point. Currently, there
+// are a few categories of safe-points:
+//
+// 1. A blocked safe-point occurs for the duration that a goroutine is
+// descheduled, blocked on synchronization, or in a system call.
+//
+// 2. Synchronous safe-points occur when a running goroutine checks
+// for a preemption request.
+//
+// 3. Asynchronous safe-points occur at any instruction in user code
+// where the goroutine can be safely paused and a conservative
+// stack and register scan can find stack roots. The runtime can
+// stop a goroutine at an async safe-point using a signal.
+//
+// At both blocked and synchronous safe-points, a goroutine's CPU
+// state is minimal and the garbage collector has complete information
+// about its entire stack. This makes it possible to deschedule a
+// goroutine with minimal space, and to precisely scan a goroutine's
+// stack.
+//
+// Synchronous safe-points are implemented by overloading the stack
+// bound check in function prologues. To preempt a goroutine at the
+// next synchronous safe-point, the runtime poisons the goroutine's
+// stack bound to a value that will cause the next stack bound check
+// to fail and enter the stack growth implementation, which will
+// detect that it was actually a preemption and redirect to preemption
+// handling.
+//
+// Preemption at asynchronous safe-points is implemented by suspending
+// the thread using an OS mechanism (e.g., signals) and inspecting its
+// state to determine if the goroutine was at an asynchronous
+// safe-point. Since the thread suspension itself is generally
+// asynchronous, it also checks if the running goroutine wants to be
+// preempted, since this could have changed. If all conditions are
+// satisfied, it adjusts the signal context to make it look like the
+// signaled thread just called asyncPreempt and resumes the thread.
+// asyncPreempt spills all registers and enters the scheduler.
+//
+// (An alternative would be to preempt in the signal handler itself.
+// This would let the OS save and restore the register state and the
+// runtime would only need to know how to extract potentially
+// pointer-containing registers from the signal context. However, this
+// would consume an M for every preempted G, and the scheduler itself
+// is not designed to run from a signal handler, as it tends to
+// allocate memory and start threads in the preemption path.)
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+)
+
+type suspendGState struct {
+ g *g
+
+ // dead indicates the goroutine was not suspended because it
+ // is dead. This goroutine could be reused after the dead
+ // state was observed, so the caller must not assume that it
+ // remains dead.
+ dead bool
+
+ // stopped indicates that this suspendG transitioned the G to
+ // _Gwaiting via g.preemptStop and thus is responsible for
+ // readying it when done.
+ stopped bool
+}
+
+// suspendG suspends goroutine gp at a safe-point and returns the
+// state of the suspended goroutine. The caller gets read access to
+// the goroutine until it calls resumeG.
+//
+// It is safe for multiple callers to attempt to suspend the same
+// goroutine at the same time. The goroutine may execute between
+// subsequent successful suspend operations. The current
+// implementation grants exclusive access to the goroutine, and hence
+// multiple callers will serialize. However, the intent is to grant
+// shared read access, so please don't depend on exclusive access.
+//
+// This must be called from the system stack and the user goroutine on
+// the current M (if any) must be in a preemptible state. This
+// prevents deadlocks where two goroutines attempt to suspend each
+// other and both are in non-preemptible states. There are other ways
+// to resolve this deadlock, but this seems simplest.
+//
+// TODO(austin): What if we instead required this to be called from a
+// user goroutine? Then we could deschedule the goroutine while
+// waiting instead of blocking the thread. If two goroutines tried to
+// suspend each other, one of them would win and the other wouldn't
+// complete the suspend until it was resumed. We would have to be
+// careful that they couldn't actually queue up suspend for each other
+// and then both be suspended. This would also avoid the need for a
+// kernel context switch in the synchronous case because we could just
+// directly schedule the waiter. The context switch is unavoidable in
+// the signal case.
+//
+//go:systemstack
+func suspendG(gp *g) suspendGState {
+ if mp := getg().m; mp.curg != nil && readgstatus(mp.curg) == _Grunning {
+ // Since we're on the system stack of this M, the user
+ // G is stuck at an unsafe point. If another goroutine
+ // were to try to preempt m.curg, it could deadlock.
+ throw("suspendG from non-preemptible goroutine")
+ }
+
+ // See https://golang.org/cl/21503 for justification of the yield delay.
+ const yieldDelay = 10 * 1000
+ var nextYield int64
+
+ // Drive the goroutine to a preemption point.
+ stopped := false
+ var asyncM *m
+ var asyncGen uint32
+ var nextPreemptM int64
+ for i := 0; ; i++ {
+ switch s := readgstatus(gp); s {
+ default:
+ if s&_Gscan != 0 {
+ // Someone else is suspending it. Wait
+ // for them to finish.
+ //
+ // TODO: It would be nicer if we could
+ // coalesce suspends.
+ break
+ }
+
+ dumpgstatus(gp)
+ throw("invalid g status")
+
+ case _Gdead:
+ // Nothing to suspend.
+ //
+ // preemptStop may need to be cleared, but
+ // doing that here could race with goroutine
+ // reuse. Instead, goexit0 clears it.
+ return suspendGState{dead: true}
+
+ case _Gcopystack:
+ // The stack is being copied. We need to wait
+ // until this is done.
+
+ case _Gpreempted:
+ // We (or someone else) suspended the G. Claim
+ // ownership of it by transitioning it to
+ // _Gwaiting.
+ if !casGFromPreempted(gp, _Gpreempted, _Gwaiting) {
+ break
+ }
+
+ // We stopped the G, so we have to ready it later.
+ stopped = true
+
+ s = _Gwaiting
+ fallthrough
+
+ case _Grunnable, _Gsyscall, _Gwaiting:
+ // Claim goroutine by setting scan bit.
+ // This may race with execution or readying of gp.
+ // The scan bit keeps it from transitioning state.
+ if !castogscanstatus(gp, s, s|_Gscan) {
+ break
+ }
+
+ // Clear the preemption request. It's safe to
+ // reset the stack guard because we hold the
+ // _Gscan bit and thus own the stack.
+ gp.preemptStop = false
+ gp.preempt = false
+ gp.stackguard0 = gp.stack.lo + stackGuard
+
+ // The goroutine was already at a safe-point
+ // and we've now locked that in.
+ //
+ // TODO: It would be much better if we didn't
+ // leave it in _Gscan, but instead gently
+ // prevented its scheduling until resumption.
+ // Maybe we only use this to bump a suspended
+ // count and the scheduler skips suspended
+ // goroutines? That wouldn't be enough for
+ // {_Gsyscall,_Gwaiting} -> _Grunning. Maybe
+ // for all those transitions we need to check
+ // suspended and deschedule?
+ return suspendGState{g: gp, stopped: stopped}
+
+ case _Grunning:
+ // Optimization: if there is already a pending preemption request
+ // (from the previous loop iteration), don't bother with the atomics.
+ if gp.preemptStop && gp.preempt && gp.stackguard0 == stackPreempt && asyncM == gp.m && asyncM.preemptGen.Load() == asyncGen {
+ break
+ }
+
+ // Temporarily block state transitions.
+ if !castogscanstatus(gp, _Grunning, _Gscanrunning) {
+ break
+ }
+
+ // Request synchronous preemption.
+ gp.preemptStop = true
+ gp.preempt = true
+ gp.stackguard0 = stackPreempt
+
+ // Prepare for asynchronous preemption.
+ asyncM2 := gp.m
+ asyncGen2 := asyncM2.preemptGen.Load()
+ needAsync := asyncM != asyncM2 || asyncGen != asyncGen2
+ asyncM = asyncM2
+ asyncGen = asyncGen2
+
+ casfrom_Gscanstatus(gp, _Gscanrunning, _Grunning)
+
+ // Send asynchronous preemption. We do this
+ // after CASing the G back to _Grunning
+ // because preemptM may be synchronous and we
+ // don't want to catch the G just spinning on
+ // its status.
+ if preemptMSupported && debug.asyncpreemptoff == 0 && needAsync {
+ // Rate limit preemptM calls. This is
+ // particularly important on Windows
+ // where preemptM is actually
+ // synchronous and the spin loop here
+ // can lead to live-lock.
+ now := nanotime()
+ if now >= nextPreemptM {
+ nextPreemptM = now + yieldDelay/2
+ preemptM(asyncM)
+ }
+ }
+ }
+
+ // TODO: Don't busy wait. This loop should really only
+ // be a simple read/decide/CAS loop that only fails if
+ // there's an active race. Once the CAS succeeds, we
+ // should queue up the preemption (which will require
+ // it to be reliable in the _Grunning case, not
+ // best-effort) and then sleep until we're notified
+ // that the goroutine is suspended.
+ if i == 0 {
+ nextYield = nanotime() + yieldDelay
+ }
+ if nanotime() < nextYield {
+ procyield(10)
+ } else {
+ osyield()
+ nextYield = nanotime() + yieldDelay/2
+ }
+ }
+}
+
+// resumeG undoes the effects of suspendG, allowing the suspended
+// goroutine to continue from its current safe-point.
+func resumeG(state suspendGState) {
+ if state.dead {
+ // We didn't actually stop anything.
+ return
+ }
+
+ gp := state.g
+ switch s := readgstatus(gp); s {
+ default:
+ dumpgstatus(gp)
+ throw("unexpected g status")
+
+ case _Grunnable | _Gscan,
+ _Gwaiting | _Gscan,
+ _Gsyscall | _Gscan:
+ casfrom_Gscanstatus(gp, s, s&^_Gscan)
+ }
+
+ if state.stopped {
+ // We stopped it, so we need to re-schedule it.
+ ready(gp, 0, true)
+ }
+}
+
+// canPreemptM reports whether mp is in a state that is safe to preempt.
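+//
+// Concretely (mirroring the check in the function body), mp must hold no
+// runtime locks, must not be in the middle of a malloc, must not have
+// preemption explicitly disabled, and its P must be running (_Prunning).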
+//
+// It is nosplit because it has nosplit callers.
+//
+//go:nosplit
+func canPreemptM(mp *m) bool {
+ return mp.locks == 0 && mp.mallocing == 0 && mp.preemptoff == "" && mp.p.ptr().status == _Prunning
+}
+
+//go:generate go run mkpreempt.go
+
+// asyncPreempt saves all user registers and calls asyncPreempt2.
+//
+// When stack scanning encounters an asyncPreempt frame, it scans that
+// frame and its parent frame conservatively.
+//
+// asyncPreempt is implemented in assembly.
+func asyncPreempt()
+
+//go:nosplit
+func asyncPreempt2() {
+ gp := getg()
+ gp.asyncSafePoint = true
+ if gp.preemptStop {
+ mcall(preemptPark)
+ } else {
+ mcall(gopreempt_m)
+ }
+ gp.asyncSafePoint = false
+}
+
+// asyncPreemptStack is the bytes of stack space required to inject an
+// asyncPreempt call.
+var asyncPreemptStack = ^uintptr(0)
+
+func init() {
+ f := findfunc(abi.FuncPCABI0(asyncPreempt))
+ total := funcMaxSPDelta(f)
+ f = findfunc(abi.FuncPCABIInternal(asyncPreempt2))
+ total += funcMaxSPDelta(f)
+ // Add some overhead for return PCs, etc.
+ asyncPreemptStack = uintptr(total) + 8*goarch.PtrSize
+ if asyncPreemptStack > stackNosplit {
+ // We need more than the nosplit limit. This isn't
+ // unsafe, but it may limit asynchronous preemption.
+ //
+ // This may be a problem if we start using more
+ // registers. In that case, we should store registers
+ // in a context object. If we pre-allocate one per P,
+ // asyncPreempt can spill just a few registers to the
+ // stack, then grab its context object and spill into
+ // it. When it enters the runtime, it would allocate a
+ // new context for the P.
+ print("runtime: asyncPreemptStack=", asyncPreemptStack, "\n")
+ throw("async stack too large")
+ }
+}
+
+// wantAsyncPreempt returns whether an asynchronous preemption is
+// queued for gp.
+func wantAsyncPreempt(gp *g) bool {
+ // Check both the G and the P.
+ return (gp.preempt || gp.m.p != 0 && gp.m.p.ptr().preempt) && readgstatus(gp)&^_Gscan == _Grunning
+}
+
+// isAsyncSafePoint reports whether gp at instruction PC is an
+// asynchronous safe point. This indicates that:
+//
+// 1. It's safe to suspend gp and conservatively scan its stack and
+// registers. There are no potentially hidden pointer values and it's
+// not in the middle of an atomic sequence like a write barrier.
+//
+// 2. gp has enough stack space to inject the asyncPreempt call.
+//
+// 3. It's generally safe to interact with the runtime, even if we're
+// in a signal handler stopped here. For example, there are no runtime
+// locks held, so acquiring a runtime lock won't self-deadlock.
+//
+// In some cases the PC is safe for asynchronous preemption but it
+// also needs to adjust the resumption PC. The new PC is returned in
+// the second result.
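+//
+// An illustrative sketch of how the result might be consumed (the real
+// consumer is the signal-based preemption path, which differs in detail):
+//
+//	if ok, resumePC := isAsyncSafePoint(gp, pc, sp, lr); ok {
+//		// Arrange for gp to resume at resumePC and inject a call
+//		// to asyncPreempt there.
+//	}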
+func isAsyncSafePoint(gp *g, pc, sp, lr uintptr) (bool, uintptr) {
+ mp := gp.m
+
+ // Only user Gs can have safe-points. We check this first
+ // because it's extremely common that we'll catch mp in the
+ // scheduler processing this G's preemption.
+ if mp.curg != gp {
+ return false, 0
+ }
+
+ // Check M state.
+ if mp.p == 0 || !canPreemptM(mp) {
+ return false, 0
+ }
+
+ // Check stack space.
+ if sp < gp.stack.lo || sp-gp.stack.lo < asyncPreemptStack {
+ return false, 0
+ }
+
+ // Check if PC is an unsafe-point.
+ f := findfunc(pc)
+ if !f.valid() {
+ // Not Go code.
+ return false, 0
+ }
+ if (GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "mips64" || GOARCH == "mips64le") && lr == pc+8 && funcspdelta(f, pc, nil) == 0 {
+ // We probably stopped at a half-executed CALL instruction,
+ // where the LR is updated but the PC has not. If we preempt
+ // here we'll see a seemingly self-recursive call, which is in
+ // fact not.
+ // This is normally ok, as we use the return address saved on
+ // stack for unwinding, not the LR value. But if this is a
+ // call to morestack, we haven't created the frame, and we'll
+ // use the LR for unwinding, which will be bad.
+ return false, 0
+ }
+ up, startpc := pcdatavalue2(f, abi.PCDATA_UnsafePoint, pc)
+ if up == abi.UnsafePointUnsafe {
+ // Unsafe-point marked by compiler. This includes
+ // atomic sequences (e.g., write barrier) and nosplit
+ // functions (except at calls).
+ return false, 0
+ }
+ if fd := funcdata(f, abi.FUNCDATA_LocalsPointerMaps); fd == nil || f.flag&abi.FuncFlagAsm != 0 {
+ // This is assembly code. Don't assume it's well-formed.
+ // TODO: Empirically we still need the fd == nil check. Why?
+ //
+ // TODO: Are there cases that are safe but don't have a
+ // locals pointer map, like empty frame functions?
+ // It might be possible to preempt any assembly functions
+ // except the ones that have funcFlag_SPWRITE set in f.flag.
+ return false, 0
+ }
+ // Check the innermost function name.
+ u, uf := newInlineUnwinder(f, pc, nil)
+ name := u.srcFunc(uf).name()
+ if hasPrefix(name, "runtime.") ||
+ hasPrefix(name, "runtime/internal/") ||
+ hasPrefix(name, "reflect.") {
+ // For now we never async preempt the runtime or
+ // anything closely tied to the runtime. Known issues
+ // include: various points in the scheduler ("don't
+ // preempt between here and here"), much of the defer
+ // implementation (untyped info on stack), bulk write
+ // barriers (write barrier check),
+ // reflect.{makeFuncStub,methodValueCall}.
+ //
+ // TODO(austin): We should improve this, or opt things
+ // in incrementally.
+ return false, 0
+ }
+ switch up {
+ case abi.UnsafePointRestart1, abi.UnsafePointRestart2:
+ // Restartable instruction sequence. Back off PC to
+ // the start PC.
+ if startpc == 0 || startpc > pc || pc-startpc > 20 {
+ throw("bad restart PC")
+ }
+ return true, startpc
+ case abi.UnsafePointRestartAtEntry:
+ // Restart from the function entry at resumption.
+ return true, f.entry()
+ }
+ return true, pc
+}
diff --git a/src/runtime/preempt_386.s b/src/runtime/preempt_386.s
new file mode 100644
index 0000000..d57bc3d
--- /dev/null
+++ b/src/runtime/preempt_386.s
@@ -0,0 +1,47 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ PUSHFL
+ ADJSP $156
+ NOP SP
+ MOVL AX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL BX, 12(SP)
+ MOVL BP, 16(SP)
+ MOVL SI, 20(SP)
+ MOVL DI, 24(SP)
+ #ifndef GO386_softfloat
+ MOVUPS X0, 28(SP)
+ MOVUPS X1, 44(SP)
+ MOVUPS X2, 60(SP)
+ MOVUPS X3, 76(SP)
+ MOVUPS X4, 92(SP)
+ MOVUPS X5, 108(SP)
+ MOVUPS X6, 124(SP)
+ MOVUPS X7, 140(SP)
+ #endif
+ CALL ·asyncPreempt2(SB)
+ #ifndef GO386_softfloat
+ MOVUPS 140(SP), X7
+ MOVUPS 124(SP), X6
+ MOVUPS 108(SP), X5
+ MOVUPS 92(SP), X4
+ MOVUPS 76(SP), X3
+ MOVUPS 60(SP), X2
+ MOVUPS 44(SP), X1
+ MOVUPS 28(SP), X0
+ #endif
+ MOVL 24(SP), DI
+ MOVL 20(SP), SI
+ MOVL 16(SP), BP
+ MOVL 12(SP), BX
+ MOVL 8(SP), DX
+ MOVL 4(SP), CX
+ MOVL 0(SP), AX
+ ADJSP $-156
+ POPFL
+ RET
diff --git a/src/runtime/preempt_amd64.s b/src/runtime/preempt_amd64.s
new file mode 100644
index 0000000..94a84fb
--- /dev/null
+++ b/src/runtime/preempt_amd64.s
@@ -0,0 +1,87 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "asm_amd64.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ PUSHQ BP
+ MOVQ SP, BP
+ // Save flags before clobbering them
+ PUSHFQ
+ // obj doesn't understand ADD/SUB on SP, but does understand ADJSP
+ ADJSP $368
+ // But vet doesn't know ADJSP, so suppress vet stack checking
+ NOP SP
+ MOVQ AX, 0(SP)
+ MOVQ CX, 8(SP)
+ MOVQ DX, 16(SP)
+ MOVQ BX, 24(SP)
+ MOVQ SI, 32(SP)
+ MOVQ DI, 40(SP)
+ MOVQ R8, 48(SP)
+ MOVQ R9, 56(SP)
+ MOVQ R10, 64(SP)
+ MOVQ R11, 72(SP)
+ MOVQ R12, 80(SP)
+ MOVQ R13, 88(SP)
+ MOVQ R14, 96(SP)
+ MOVQ R15, 104(SP)
+ #ifdef GOOS_darwin
+ #ifndef hasAVX
+ CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $0
+ JE 2(PC)
+ #endif
+ VZEROUPPER
+ #endif
+ MOVUPS X0, 112(SP)
+ MOVUPS X1, 128(SP)
+ MOVUPS X2, 144(SP)
+ MOVUPS X3, 160(SP)
+ MOVUPS X4, 176(SP)
+ MOVUPS X5, 192(SP)
+ MOVUPS X6, 208(SP)
+ MOVUPS X7, 224(SP)
+ MOVUPS X8, 240(SP)
+ MOVUPS X9, 256(SP)
+ MOVUPS X10, 272(SP)
+ MOVUPS X11, 288(SP)
+ MOVUPS X12, 304(SP)
+ MOVUPS X13, 320(SP)
+ MOVUPS X14, 336(SP)
+ MOVUPS X15, 352(SP)
+ CALL ·asyncPreempt2(SB)
+ MOVUPS 352(SP), X15
+ MOVUPS 336(SP), X14
+ MOVUPS 320(SP), X13
+ MOVUPS 304(SP), X12
+ MOVUPS 288(SP), X11
+ MOVUPS 272(SP), X10
+ MOVUPS 256(SP), X9
+ MOVUPS 240(SP), X8
+ MOVUPS 224(SP), X7
+ MOVUPS 208(SP), X6
+ MOVUPS 192(SP), X5
+ MOVUPS 176(SP), X4
+ MOVUPS 160(SP), X3
+ MOVUPS 144(SP), X2
+ MOVUPS 128(SP), X1
+ MOVUPS 112(SP), X0
+ MOVQ 104(SP), R15
+ MOVQ 96(SP), R14
+ MOVQ 88(SP), R13
+ MOVQ 80(SP), R12
+ MOVQ 72(SP), R11
+ MOVQ 64(SP), R10
+ MOVQ 56(SP), R9
+ MOVQ 48(SP), R8
+ MOVQ 40(SP), DI
+ MOVQ 32(SP), SI
+ MOVQ 24(SP), BX
+ MOVQ 16(SP), DX
+ MOVQ 8(SP), CX
+ MOVQ 0(SP), AX
+ ADJSP $-368
+ POPFQ
+ POPQ BP
+ RET
diff --git a/src/runtime/preempt_arm.s b/src/runtime/preempt_arm.s
new file mode 100644
index 0000000..8f243c0
--- /dev/null
+++ b/src/runtime/preempt_arm.s
@@ -0,0 +1,83 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW.W R14, -188(R13)
+ MOVW R0, 4(R13)
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ MOVW R3, 16(R13)
+ MOVW R4, 20(R13)
+ MOVW R5, 24(R13)
+ MOVW R6, 28(R13)
+ MOVW R7, 32(R13)
+ MOVW R8, 36(R13)
+ MOVW R9, 40(R13)
+ MOVW R11, 44(R13)
+ MOVW R12, 48(R13)
+ MOVW CPSR, R0
+ MOVW R0, 52(R13)
+ MOVB ·goarm(SB), R0
+ CMP $6, R0
+ BLT nofp
+ MOVW FPCR, R0
+ MOVW R0, 56(R13)
+ MOVD F0, 60(R13)
+ MOVD F1, 68(R13)
+ MOVD F2, 76(R13)
+ MOVD F3, 84(R13)
+ MOVD F4, 92(R13)
+ MOVD F5, 100(R13)
+ MOVD F6, 108(R13)
+ MOVD F7, 116(R13)
+ MOVD F8, 124(R13)
+ MOVD F9, 132(R13)
+ MOVD F10, 140(R13)
+ MOVD F11, 148(R13)
+ MOVD F12, 156(R13)
+ MOVD F13, 164(R13)
+ MOVD F14, 172(R13)
+ MOVD F15, 180(R13)
+nofp:
+ CALL ·asyncPreempt2(SB)
+ MOVB ·goarm(SB), R0
+ CMP $6, R0
+ BLT nofp2
+ MOVD 180(R13), F15
+ MOVD 172(R13), F14
+ MOVD 164(R13), F13
+ MOVD 156(R13), F12
+ MOVD 148(R13), F11
+ MOVD 140(R13), F10
+ MOVD 132(R13), F9
+ MOVD 124(R13), F8
+ MOVD 116(R13), F7
+ MOVD 108(R13), F6
+ MOVD 100(R13), F5
+ MOVD 92(R13), F4
+ MOVD 84(R13), F3
+ MOVD 76(R13), F2
+ MOVD 68(R13), F1
+ MOVD 60(R13), F0
+ MOVW 56(R13), R0
+ MOVW R0, FPCR
+nofp2:
+ MOVW 52(R13), R0
+ MOVW R0, CPSR
+ MOVW 48(R13), R12
+ MOVW 44(R13), R11
+ MOVW 40(R13), R9
+ MOVW 36(R13), R8
+ MOVW 32(R13), R7
+ MOVW 28(R13), R6
+ MOVW 24(R13), R5
+ MOVW 20(R13), R4
+ MOVW 16(R13), R3
+ MOVW 12(R13), R2
+ MOVW 8(R13), R1
+ MOVW 4(R13), R0
+ MOVW 188(R13), R14
+ MOVW.P 192(R13), R15
+ UNDEF
diff --git a/src/runtime/preempt_arm64.s b/src/runtime/preempt_arm64.s
new file mode 100644
index 0000000..c27d475
--- /dev/null
+++ b/src/runtime/preempt_arm64.s
@@ -0,0 +1,85 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R30, -496(RSP)
+ SUB $496, RSP
+ MOVD R29, -8(RSP)
+ SUB $8, RSP, R29
+ #ifdef GOOS_ios
+ MOVD R30, (RSP)
+ #endif
+ STP (R0, R1), 8(RSP)
+ STP (R2, R3), 24(RSP)
+ STP (R4, R5), 40(RSP)
+ STP (R6, R7), 56(RSP)
+ STP (R8, R9), 72(RSP)
+ STP (R10, R11), 88(RSP)
+ STP (R12, R13), 104(RSP)
+ STP (R14, R15), 120(RSP)
+ STP (R16, R17), 136(RSP)
+ STP (R19, R20), 152(RSP)
+ STP (R21, R22), 168(RSP)
+ STP (R23, R24), 184(RSP)
+ STP (R25, R26), 200(RSP)
+ MOVD NZCV, R0
+ MOVD R0, 216(RSP)
+ MOVD FPSR, R0
+ MOVD R0, 224(RSP)
+ FSTPD (F0, F1), 232(RSP)
+ FSTPD (F2, F3), 248(RSP)
+ FSTPD (F4, F5), 264(RSP)
+ FSTPD (F6, F7), 280(RSP)
+ FSTPD (F8, F9), 296(RSP)
+ FSTPD (F10, F11), 312(RSP)
+ FSTPD (F12, F13), 328(RSP)
+ FSTPD (F14, F15), 344(RSP)
+ FSTPD (F16, F17), 360(RSP)
+ FSTPD (F18, F19), 376(RSP)
+ FSTPD (F20, F21), 392(RSP)
+ FSTPD (F22, F23), 408(RSP)
+ FSTPD (F24, F25), 424(RSP)
+ FSTPD (F26, F27), 440(RSP)
+ FSTPD (F28, F29), 456(RSP)
+ FSTPD (F30, F31), 472(RSP)
+ CALL ·asyncPreempt2(SB)
+ FLDPD 472(RSP), (F30, F31)
+ FLDPD 456(RSP), (F28, F29)
+ FLDPD 440(RSP), (F26, F27)
+ FLDPD 424(RSP), (F24, F25)
+ FLDPD 408(RSP), (F22, F23)
+ FLDPD 392(RSP), (F20, F21)
+ FLDPD 376(RSP), (F18, F19)
+ FLDPD 360(RSP), (F16, F17)
+ FLDPD 344(RSP), (F14, F15)
+ FLDPD 328(RSP), (F12, F13)
+ FLDPD 312(RSP), (F10, F11)
+ FLDPD 296(RSP), (F8, F9)
+ FLDPD 280(RSP), (F6, F7)
+ FLDPD 264(RSP), (F4, F5)
+ FLDPD 248(RSP), (F2, F3)
+ FLDPD 232(RSP), (F0, F1)
+ MOVD 224(RSP), R0
+ MOVD R0, FPSR
+ MOVD 216(RSP), R0
+ MOVD R0, NZCV
+ LDP 200(RSP), (R25, R26)
+ LDP 184(RSP), (R23, R24)
+ LDP 168(RSP), (R21, R22)
+ LDP 152(RSP), (R19, R20)
+ LDP 136(RSP), (R16, R17)
+ LDP 120(RSP), (R14, R15)
+ LDP 104(RSP), (R12, R13)
+ LDP 88(RSP), (R10, R11)
+ LDP 72(RSP), (R8, R9)
+ LDP 56(RSP), (R6, R7)
+ LDP 40(RSP), (R4, R5)
+ LDP 24(RSP), (R2, R3)
+ LDP 8(RSP), (R0, R1)
+ MOVD 496(RSP), R30
+ MOVD -8(RSP), R29
+ MOVD (RSP), R27
+ ADD $512, RSP
+ JMP (R27)
diff --git a/src/runtime/preempt_loong64.s b/src/runtime/preempt_loong64.s
new file mode 100644
index 0000000..bb9c948
--- /dev/null
+++ b/src/runtime/preempt_loong64.s
@@ -0,0 +1,133 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R1, -480(R3)
+ SUBV $480, R3
+ MOVV R4, 8(R3)
+ MOVV R5, 16(R3)
+ MOVV R6, 24(R3)
+ MOVV R7, 32(R3)
+ MOVV R8, 40(R3)
+ MOVV R9, 48(R3)
+ MOVV R10, 56(R3)
+ MOVV R11, 64(R3)
+ MOVV R12, 72(R3)
+ MOVV R13, 80(R3)
+ MOVV R14, 88(R3)
+ MOVV R15, 96(R3)
+ MOVV R16, 104(R3)
+ MOVV R17, 112(R3)
+ MOVV R18, 120(R3)
+ MOVV R19, 128(R3)
+ MOVV R20, 136(R3)
+ MOVV R21, 144(R3)
+ MOVV R23, 152(R3)
+ MOVV R24, 160(R3)
+ MOVV R25, 168(R3)
+ MOVV R26, 176(R3)
+ MOVV R27, 184(R3)
+ MOVV R28, 192(R3)
+ MOVV R29, 200(R3)
+ MOVV R31, 208(R3)
+ MOVD F0, 216(R3)
+ MOVD F1, 224(R3)
+ MOVD F2, 232(R3)
+ MOVD F3, 240(R3)
+ MOVD F4, 248(R3)
+ MOVD F5, 256(R3)
+ MOVD F6, 264(R3)
+ MOVD F7, 272(R3)
+ MOVD F8, 280(R3)
+ MOVD F9, 288(R3)
+ MOVD F10, 296(R3)
+ MOVD F11, 304(R3)
+ MOVD F12, 312(R3)
+ MOVD F13, 320(R3)
+ MOVD F14, 328(R3)
+ MOVD F15, 336(R3)
+ MOVD F16, 344(R3)
+ MOVD F17, 352(R3)
+ MOVD F18, 360(R3)
+ MOVD F19, 368(R3)
+ MOVD F20, 376(R3)
+ MOVD F21, 384(R3)
+ MOVD F22, 392(R3)
+ MOVD F23, 400(R3)
+ MOVD F24, 408(R3)
+ MOVD F25, 416(R3)
+ MOVD F26, 424(R3)
+ MOVD F27, 432(R3)
+ MOVD F28, 440(R3)
+ MOVD F29, 448(R3)
+ MOVD F30, 456(R3)
+ MOVD F31, 464(R3)
+ MOVV FCC0, R4
+ MOVV R4, 472(R3)
+ CALL ·asyncPreempt2(SB)
+ MOVV 472(R3), R4
+ MOVV R4, FCC0
+ MOVD 464(R3), F31
+ MOVD 456(R3), F30
+ MOVD 448(R3), F29
+ MOVD 440(R3), F28
+ MOVD 432(R3), F27
+ MOVD 424(R3), F26
+ MOVD 416(R3), F25
+ MOVD 408(R3), F24
+ MOVD 400(R3), F23
+ MOVD 392(R3), F22
+ MOVD 384(R3), F21
+ MOVD 376(R3), F20
+ MOVD 368(R3), F19
+ MOVD 360(R3), F18
+ MOVD 352(R3), F17
+ MOVD 344(R3), F16
+ MOVD 336(R3), F15
+ MOVD 328(R3), F14
+ MOVD 320(R3), F13
+ MOVD 312(R3), F12
+ MOVD 304(R3), F11
+ MOVD 296(R3), F10
+ MOVD 288(R3), F9
+ MOVD 280(R3), F8
+ MOVD 272(R3), F7
+ MOVD 264(R3), F6
+ MOVD 256(R3), F5
+ MOVD 248(R3), F4
+ MOVD 240(R3), F3
+ MOVD 232(R3), F2
+ MOVD 224(R3), F1
+ MOVD 216(R3), F0
+ MOVV 208(R3), R31
+ MOVV 200(R3), R29
+ MOVV 192(R3), R28
+ MOVV 184(R3), R27
+ MOVV 176(R3), R26
+ MOVV 168(R3), R25
+ MOVV 160(R3), R24
+ MOVV 152(R3), R23
+ MOVV 144(R3), R21
+ MOVV 136(R3), R20
+ MOVV 128(R3), R19
+ MOVV 120(R3), R18
+ MOVV 112(R3), R17
+ MOVV 104(R3), R16
+ MOVV 96(R3), R15
+ MOVV 88(R3), R14
+ MOVV 80(R3), R13
+ MOVV 72(R3), R12
+ MOVV 64(R3), R11
+ MOVV 56(R3), R10
+ MOVV 48(R3), R9
+ MOVV 40(R3), R8
+ MOVV 32(R3), R7
+ MOVV 24(R3), R6
+ MOVV 16(R3), R5
+ MOVV 8(R3), R4
+ MOVV 480(R3), R1
+ MOVV (R3), R30
+ ADDV $488, R3
+ JMP (R30)
diff --git a/src/runtime/preempt_mips64x.s b/src/runtime/preempt_mips64x.s
new file mode 100644
index 0000000..996b592
--- /dev/null
+++ b/src/runtime/preempt_mips64x.s
@@ -0,0 +1,145 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+//go:build mips64 || mips64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV R31, -488(R29)
+ SUBV $488, R29
+ MOVV R1, 8(R29)
+ MOVV R2, 16(R29)
+ MOVV R3, 24(R29)
+ MOVV R4, 32(R29)
+ MOVV R5, 40(R29)
+ MOVV R6, 48(R29)
+ MOVV R7, 56(R29)
+ MOVV R8, 64(R29)
+ MOVV R9, 72(R29)
+ MOVV R10, 80(R29)
+ MOVV R11, 88(R29)
+ MOVV R12, 96(R29)
+ MOVV R13, 104(R29)
+ MOVV R14, 112(R29)
+ MOVV R15, 120(R29)
+ MOVV R16, 128(R29)
+ MOVV R17, 136(R29)
+ MOVV R18, 144(R29)
+ MOVV R19, 152(R29)
+ MOVV R20, 160(R29)
+ MOVV R21, 168(R29)
+ MOVV R22, 176(R29)
+ MOVV R24, 184(R29)
+ MOVV R25, 192(R29)
+ MOVV RSB, 200(R29)
+ MOVV HI, R1
+ MOVV R1, 208(R29)
+ MOVV LO, R1
+ MOVV R1, 216(R29)
+ #ifndef GOMIPS64_softfloat
+ MOVV FCR31, R1
+ MOVV R1, 224(R29)
+ MOVD F0, 232(R29)
+ MOVD F1, 240(R29)
+ MOVD F2, 248(R29)
+ MOVD F3, 256(R29)
+ MOVD F4, 264(R29)
+ MOVD F5, 272(R29)
+ MOVD F6, 280(R29)
+ MOVD F7, 288(R29)
+ MOVD F8, 296(R29)
+ MOVD F9, 304(R29)
+ MOVD F10, 312(R29)
+ MOVD F11, 320(R29)
+ MOVD F12, 328(R29)
+ MOVD F13, 336(R29)
+ MOVD F14, 344(R29)
+ MOVD F15, 352(R29)
+ MOVD F16, 360(R29)
+ MOVD F17, 368(R29)
+ MOVD F18, 376(R29)
+ MOVD F19, 384(R29)
+ MOVD F20, 392(R29)
+ MOVD F21, 400(R29)
+ MOVD F22, 408(R29)
+ MOVD F23, 416(R29)
+ MOVD F24, 424(R29)
+ MOVD F25, 432(R29)
+ MOVD F26, 440(R29)
+ MOVD F27, 448(R29)
+ MOVD F28, 456(R29)
+ MOVD F29, 464(R29)
+ MOVD F30, 472(R29)
+ MOVD F31, 480(R29)
+ #endif
+ CALL ·asyncPreempt2(SB)
+ #ifndef GOMIPS64_softfloat
+ MOVD 480(R29), F31
+ MOVD 472(R29), F30
+ MOVD 464(R29), F29
+ MOVD 456(R29), F28
+ MOVD 448(R29), F27
+ MOVD 440(R29), F26
+ MOVD 432(R29), F25
+ MOVD 424(R29), F24
+ MOVD 416(R29), F23
+ MOVD 408(R29), F22
+ MOVD 400(R29), F21
+ MOVD 392(R29), F20
+ MOVD 384(R29), F19
+ MOVD 376(R29), F18
+ MOVD 368(R29), F17
+ MOVD 360(R29), F16
+ MOVD 352(R29), F15
+ MOVD 344(R29), F14
+ MOVD 336(R29), F13
+ MOVD 328(R29), F12
+ MOVD 320(R29), F11
+ MOVD 312(R29), F10
+ MOVD 304(R29), F9
+ MOVD 296(R29), F8
+ MOVD 288(R29), F7
+ MOVD 280(R29), F6
+ MOVD 272(R29), F5
+ MOVD 264(R29), F4
+ MOVD 256(R29), F3
+ MOVD 248(R29), F2
+ MOVD 240(R29), F1
+ MOVD 232(R29), F0
+ MOVV 224(R29), R1
+ MOVV R1, FCR31
+ #endif
+ MOVV 216(R29), R1
+ MOVV R1, LO
+ MOVV 208(R29), R1
+ MOVV R1, HI
+ MOVV 200(R29), RSB
+ MOVV 192(R29), R25
+ MOVV 184(R29), R24
+ MOVV 176(R29), R22
+ MOVV 168(R29), R21
+ MOVV 160(R29), R20
+ MOVV 152(R29), R19
+ MOVV 144(R29), R18
+ MOVV 136(R29), R17
+ MOVV 128(R29), R16
+ MOVV 120(R29), R15
+ MOVV 112(R29), R14
+ MOVV 104(R29), R13
+ MOVV 96(R29), R12
+ MOVV 88(R29), R11
+ MOVV 80(R29), R10
+ MOVV 72(R29), R9
+ MOVV 64(R29), R8
+ MOVV 56(R29), R7
+ MOVV 48(R29), R6
+ MOVV 40(R29), R5
+ MOVV 32(R29), R4
+ MOVV 24(R29), R3
+ MOVV 16(R29), R2
+ MOVV 8(R29), R1
+ MOVV 488(R29), R31
+ MOVV (R29), R23
+ ADDV $496, R29
+ JMP (R23)
diff --git a/src/runtime/preempt_mipsx.s b/src/runtime/preempt_mipsx.s
new file mode 100644
index 0000000..7b169ac
--- /dev/null
+++ b/src/runtime/preempt_mipsx.s
@@ -0,0 +1,145 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+//go:build mips || mipsle
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW R31, -244(R29)
+ SUB $244, R29
+ MOVW R1, 4(R29)
+ MOVW R2, 8(R29)
+ MOVW R3, 12(R29)
+ MOVW R4, 16(R29)
+ MOVW R5, 20(R29)
+ MOVW R6, 24(R29)
+ MOVW R7, 28(R29)
+ MOVW R8, 32(R29)
+ MOVW R9, 36(R29)
+ MOVW R10, 40(R29)
+ MOVW R11, 44(R29)
+ MOVW R12, 48(R29)
+ MOVW R13, 52(R29)
+ MOVW R14, 56(R29)
+ MOVW R15, 60(R29)
+ MOVW R16, 64(R29)
+ MOVW R17, 68(R29)
+ MOVW R18, 72(R29)
+ MOVW R19, 76(R29)
+ MOVW R20, 80(R29)
+ MOVW R21, 84(R29)
+ MOVW R22, 88(R29)
+ MOVW R24, 92(R29)
+ MOVW R25, 96(R29)
+ MOVW R28, 100(R29)
+ MOVW HI, R1
+ MOVW R1, 104(R29)
+ MOVW LO, R1
+ MOVW R1, 108(R29)
+ #ifndef GOMIPS_softfloat
+ MOVW FCR31, R1
+ MOVW R1, 112(R29)
+ MOVF F0, 116(R29)
+ MOVF F1, 120(R29)
+ MOVF F2, 124(R29)
+ MOVF F3, 128(R29)
+ MOVF F4, 132(R29)
+ MOVF F5, 136(R29)
+ MOVF F6, 140(R29)
+ MOVF F7, 144(R29)
+ MOVF F8, 148(R29)
+ MOVF F9, 152(R29)
+ MOVF F10, 156(R29)
+ MOVF F11, 160(R29)
+ MOVF F12, 164(R29)
+ MOVF F13, 168(R29)
+ MOVF F14, 172(R29)
+ MOVF F15, 176(R29)
+ MOVF F16, 180(R29)
+ MOVF F17, 184(R29)
+ MOVF F18, 188(R29)
+ MOVF F19, 192(R29)
+ MOVF F20, 196(R29)
+ MOVF F21, 200(R29)
+ MOVF F22, 204(R29)
+ MOVF F23, 208(R29)
+ MOVF F24, 212(R29)
+ MOVF F25, 216(R29)
+ MOVF F26, 220(R29)
+ MOVF F27, 224(R29)
+ MOVF F28, 228(R29)
+ MOVF F29, 232(R29)
+ MOVF F30, 236(R29)
+ MOVF F31, 240(R29)
+ #endif
+ CALL ·asyncPreempt2(SB)
+ #ifndef GOMIPS_softfloat
+ MOVF 240(R29), F31
+ MOVF 236(R29), F30
+ MOVF 232(R29), F29
+ MOVF 228(R29), F28
+ MOVF 224(R29), F27
+ MOVF 220(R29), F26
+ MOVF 216(R29), F25
+ MOVF 212(R29), F24
+ MOVF 208(R29), F23
+ MOVF 204(R29), F22
+ MOVF 200(R29), F21
+ MOVF 196(R29), F20
+ MOVF 192(R29), F19
+ MOVF 188(R29), F18
+ MOVF 184(R29), F17
+ MOVF 180(R29), F16
+ MOVF 176(R29), F15
+ MOVF 172(R29), F14
+ MOVF 168(R29), F13
+ MOVF 164(R29), F12
+ MOVF 160(R29), F11
+ MOVF 156(R29), F10
+ MOVF 152(R29), F9
+ MOVF 148(R29), F8
+ MOVF 144(R29), F7
+ MOVF 140(R29), F6
+ MOVF 136(R29), F5
+ MOVF 132(R29), F4
+ MOVF 128(R29), F3
+ MOVF 124(R29), F2
+ MOVF 120(R29), F1
+ MOVF 116(R29), F0
+ MOVW 112(R29), R1
+ MOVW R1, FCR31
+ #endif
+ MOVW 108(R29), R1
+ MOVW R1, LO
+ MOVW 104(R29), R1
+ MOVW R1, HI
+ MOVW 100(R29), R28
+ MOVW 96(R29), R25
+ MOVW 92(R29), R24
+ MOVW 88(R29), R22
+ MOVW 84(R29), R21
+ MOVW 80(R29), R20
+ MOVW 76(R29), R19
+ MOVW 72(R29), R18
+ MOVW 68(R29), R17
+ MOVW 64(R29), R16
+ MOVW 60(R29), R15
+ MOVW 56(R29), R14
+ MOVW 52(R29), R13
+ MOVW 48(R29), R12
+ MOVW 44(R29), R11
+ MOVW 40(R29), R10
+ MOVW 36(R29), R9
+ MOVW 32(R29), R8
+ MOVW 28(R29), R7
+ MOVW 24(R29), R6
+ MOVW 20(R29), R5
+ MOVW 16(R29), R4
+ MOVW 12(R29), R3
+ MOVW 8(R29), R2
+ MOVW 4(R29), R1
+ MOVW 244(R29), R31
+ MOVW (R29), R23
+ ADD $248, R29
+ JMP (R23)
diff --git a/src/runtime/preempt_nonwindows.go b/src/runtime/preempt_nonwindows.go
new file mode 100644
index 0000000..d6a2408
--- /dev/null
+++ b/src/runtime/preempt_nonwindows.go
@@ -0,0 +1,13 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows
+
+package runtime
+
+//go:nosplit
+func osPreemptExtEnter(mp *m) {}
+
+//go:nosplit
+func osPreemptExtExit(mp *m) {}
diff --git a/src/runtime/preempt_ppc64x.s b/src/runtime/preempt_ppc64x.s
new file mode 100644
index 0000000..2c4d02e
--- /dev/null
+++ b/src/runtime/preempt_ppc64x.s
@@ -0,0 +1,147 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+//go:build ppc64 || ppc64le
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD R31, -488(R1)
+ MOVD LR, R31
+ MOVDU R31, -520(R1)
+ MOVD R3, 40(R1)
+ MOVD R4, 48(R1)
+ MOVD R5, 56(R1)
+ MOVD R6, 64(R1)
+ MOVD R7, 72(R1)
+ MOVD R8, 80(R1)
+ MOVD R9, 88(R1)
+ MOVD R10, 96(R1)
+ MOVD R11, 104(R1)
+ MOVD R14, 112(R1)
+ MOVD R15, 120(R1)
+ MOVD R16, 128(R1)
+ MOVD R17, 136(R1)
+ MOVD R18, 144(R1)
+ MOVD R19, 152(R1)
+ MOVD R20, 160(R1)
+ MOVD R21, 168(R1)
+ MOVD R22, 176(R1)
+ MOVD R23, 184(R1)
+ MOVD R24, 192(R1)
+ MOVD R25, 200(R1)
+ MOVD R26, 208(R1)
+ MOVD R27, 216(R1)
+ MOVD R28, 224(R1)
+ MOVD R29, 232(R1)
+ MOVW CR, R31
+ MOVW R31, 240(R1)
+ MOVD XER, R31
+ MOVD R31, 248(R1)
+ FMOVD F0, 256(R1)
+ FMOVD F1, 264(R1)
+ FMOVD F2, 272(R1)
+ FMOVD F3, 280(R1)
+ FMOVD F4, 288(R1)
+ FMOVD F5, 296(R1)
+ FMOVD F6, 304(R1)
+ FMOVD F7, 312(R1)
+ FMOVD F8, 320(R1)
+ FMOVD F9, 328(R1)
+ FMOVD F10, 336(R1)
+ FMOVD F11, 344(R1)
+ FMOVD F12, 352(R1)
+ FMOVD F13, 360(R1)
+ FMOVD F14, 368(R1)
+ FMOVD F15, 376(R1)
+ FMOVD F16, 384(R1)
+ FMOVD F17, 392(R1)
+ FMOVD F18, 400(R1)
+ FMOVD F19, 408(R1)
+ FMOVD F20, 416(R1)
+ FMOVD F21, 424(R1)
+ FMOVD F22, 432(R1)
+ FMOVD F23, 440(R1)
+ FMOVD F24, 448(R1)
+ FMOVD F25, 456(R1)
+ FMOVD F26, 464(R1)
+ FMOVD F27, 472(R1)
+ FMOVD F28, 480(R1)
+ FMOVD F29, 488(R1)
+ FMOVD F30, 496(R1)
+ FMOVD F31, 504(R1)
+ MOVFL FPSCR, F0
+ FMOVD F0, 512(R1)
+ CALL ·asyncPreempt2(SB)
+ FMOVD 512(R1), F0
+ MOVFL F0, FPSCR
+ FMOVD 504(R1), F31
+ FMOVD 496(R1), F30
+ FMOVD 488(R1), F29
+ FMOVD 480(R1), F28
+ FMOVD 472(R1), F27
+ FMOVD 464(R1), F26
+ FMOVD 456(R1), F25
+ FMOVD 448(R1), F24
+ FMOVD 440(R1), F23
+ FMOVD 432(R1), F22
+ FMOVD 424(R1), F21
+ FMOVD 416(R1), F20
+ FMOVD 408(R1), F19
+ FMOVD 400(R1), F18
+ FMOVD 392(R1), F17
+ FMOVD 384(R1), F16
+ FMOVD 376(R1), F15
+ FMOVD 368(R1), F14
+ FMOVD 360(R1), F13
+ FMOVD 352(R1), F12
+ FMOVD 344(R1), F11
+ FMOVD 336(R1), F10
+ FMOVD 328(R1), F9
+ FMOVD 320(R1), F8
+ FMOVD 312(R1), F7
+ FMOVD 304(R1), F6
+ FMOVD 296(R1), F5
+ FMOVD 288(R1), F4
+ FMOVD 280(R1), F3
+ FMOVD 272(R1), F2
+ FMOVD 264(R1), F1
+ FMOVD 256(R1), F0
+ MOVD 248(R1), R31
+ MOVD R31, XER
+ MOVW 240(R1), R31
+ MOVFL R31, $0xff
+ MOVD 232(R1), R29
+ MOVD 224(R1), R28
+ MOVD 216(R1), R27
+ MOVD 208(R1), R26
+ MOVD 200(R1), R25
+ MOVD 192(R1), R24
+ MOVD 184(R1), R23
+ MOVD 176(R1), R22
+ MOVD 168(R1), R21
+ MOVD 160(R1), R20
+ MOVD 152(R1), R19
+ MOVD 144(R1), R18
+ MOVD 136(R1), R17
+ MOVD 128(R1), R16
+ MOVD 120(R1), R15
+ MOVD 112(R1), R14
+ MOVD 104(R1), R11
+ MOVD 96(R1), R10
+ MOVD 88(R1), R9
+ MOVD 80(R1), R8
+ MOVD 72(R1), R7
+ MOVD 64(R1), R6
+ MOVD 56(R1), R5
+ MOVD 48(R1), R4
+ MOVD 40(R1), R3
+ MOVD 520(R1), R31
+ MOVD R31, LR
+ MOVD 528(R1), R2
+ MOVD 536(R1), R12
+ MOVD (R1), R31
+ MOVD R31, CTR
+ MOVD 32(R1), R31
+ ADD $552, R1
+ JMP (CTR)
diff --git a/src/runtime/preempt_riscv64.s b/src/runtime/preempt_riscv64.s
new file mode 100644
index 0000000..56df6c3
--- /dev/null
+++ b/src/runtime/preempt_riscv64.s
@@ -0,0 +1,127 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ MOV X1, -464(X2)
+ ADD $-464, X2
+ MOV X5, 8(X2)
+ MOV X6, 16(X2)
+ MOV X7, 24(X2)
+ MOV X8, 32(X2)
+ MOV X9, 40(X2)
+ MOV X10, 48(X2)
+ MOV X11, 56(X2)
+ MOV X12, 64(X2)
+ MOV X13, 72(X2)
+ MOV X14, 80(X2)
+ MOV X15, 88(X2)
+ MOV X16, 96(X2)
+ MOV X17, 104(X2)
+ MOV X18, 112(X2)
+ MOV X19, 120(X2)
+ MOV X20, 128(X2)
+ MOV X21, 136(X2)
+ MOV X22, 144(X2)
+ MOV X23, 152(X2)
+ MOV X24, 160(X2)
+ MOV X25, 168(X2)
+ MOV X26, 176(X2)
+ MOV X28, 184(X2)
+ MOV X29, 192(X2)
+ MOV X30, 200(X2)
+ MOVD F0, 208(X2)
+ MOVD F1, 216(X2)
+ MOVD F2, 224(X2)
+ MOVD F3, 232(X2)
+ MOVD F4, 240(X2)
+ MOVD F5, 248(X2)
+ MOVD F6, 256(X2)
+ MOVD F7, 264(X2)
+ MOVD F8, 272(X2)
+ MOVD F9, 280(X2)
+ MOVD F10, 288(X2)
+ MOVD F11, 296(X2)
+ MOVD F12, 304(X2)
+ MOVD F13, 312(X2)
+ MOVD F14, 320(X2)
+ MOVD F15, 328(X2)
+ MOVD F16, 336(X2)
+ MOVD F17, 344(X2)
+ MOVD F18, 352(X2)
+ MOVD F19, 360(X2)
+ MOVD F20, 368(X2)
+ MOVD F21, 376(X2)
+ MOVD F22, 384(X2)
+ MOVD F23, 392(X2)
+ MOVD F24, 400(X2)
+ MOVD F25, 408(X2)
+ MOVD F26, 416(X2)
+ MOVD F27, 424(X2)
+ MOVD F28, 432(X2)
+ MOVD F29, 440(X2)
+ MOVD F30, 448(X2)
+ MOVD F31, 456(X2)
+ CALL ·asyncPreempt2(SB)
+ MOVD 456(X2), F31
+ MOVD 448(X2), F30
+ MOVD 440(X2), F29
+ MOVD 432(X2), F28
+ MOVD 424(X2), F27
+ MOVD 416(X2), F26
+ MOVD 408(X2), F25
+ MOVD 400(X2), F24
+ MOVD 392(X2), F23
+ MOVD 384(X2), F22
+ MOVD 376(X2), F21
+ MOVD 368(X2), F20
+ MOVD 360(X2), F19
+ MOVD 352(X2), F18
+ MOVD 344(X2), F17
+ MOVD 336(X2), F16
+ MOVD 328(X2), F15
+ MOVD 320(X2), F14
+ MOVD 312(X2), F13
+ MOVD 304(X2), F12
+ MOVD 296(X2), F11
+ MOVD 288(X2), F10
+ MOVD 280(X2), F9
+ MOVD 272(X2), F8
+ MOVD 264(X2), F7
+ MOVD 256(X2), F6
+ MOVD 248(X2), F5
+ MOVD 240(X2), F4
+ MOVD 232(X2), F3
+ MOVD 224(X2), F2
+ MOVD 216(X2), F1
+ MOVD 208(X2), F0
+ MOV 200(X2), X30
+ MOV 192(X2), X29
+ MOV 184(X2), X28
+ MOV 176(X2), X26
+ MOV 168(X2), X25
+ MOV 160(X2), X24
+ MOV 152(X2), X23
+ MOV 144(X2), X22
+ MOV 136(X2), X21
+ MOV 128(X2), X20
+ MOV 120(X2), X19
+ MOV 112(X2), X18
+ MOV 104(X2), X17
+ MOV 96(X2), X16
+ MOV 88(X2), X15
+ MOV 80(X2), X14
+ MOV 72(X2), X13
+ MOV 64(X2), X12
+ MOV 56(X2), X11
+ MOV 48(X2), X10
+ MOV 40(X2), X9
+ MOV 32(X2), X8
+ MOV 24(X2), X7
+ MOV 16(X2), X6
+ MOV 8(X2), X5
+ MOV 464(X2), X1
+ MOV (X2), X31
+ ADD $472, X2
+ JMP (X31)
diff --git a/src/runtime/preempt_s390x.s b/src/runtime/preempt_s390x.s
new file mode 100644
index 0000000..ca9e47c
--- /dev/null
+++ b/src/runtime/preempt_s390x.s
@@ -0,0 +1,51 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ IPM R10
+ MOVD R14, -248(R15)
+ ADD $-248, R15
+ MOVW R10, 8(R15)
+ STMG R0, R12, 16(R15)
+ FMOVD F0, 120(R15)
+ FMOVD F1, 128(R15)
+ FMOVD F2, 136(R15)
+ FMOVD F3, 144(R15)
+ FMOVD F4, 152(R15)
+ FMOVD F5, 160(R15)
+ FMOVD F6, 168(R15)
+ FMOVD F7, 176(R15)
+ FMOVD F8, 184(R15)
+ FMOVD F9, 192(R15)
+ FMOVD F10, 200(R15)
+ FMOVD F11, 208(R15)
+ FMOVD F12, 216(R15)
+ FMOVD F13, 224(R15)
+ FMOVD F14, 232(R15)
+ FMOVD F15, 240(R15)
+ CALL ·asyncPreempt2(SB)
+ FMOVD 240(R15), F15
+ FMOVD 232(R15), F14
+ FMOVD 224(R15), F13
+ FMOVD 216(R15), F12
+ FMOVD 208(R15), F11
+ FMOVD 200(R15), F10
+ FMOVD 192(R15), F9
+ FMOVD 184(R15), F8
+ FMOVD 176(R15), F7
+ FMOVD 168(R15), F6
+ FMOVD 160(R15), F5
+ FMOVD 152(R15), F4
+ FMOVD 144(R15), F3
+ FMOVD 136(R15), F2
+ FMOVD 128(R15), F1
+ FMOVD 120(R15), F0
+ LMG 16(R15), R0, R12
+ MOVD 248(R15), R14
+ ADD $256, R15
+ MOVWZ -248(R15), R10
+ TMLH R10, $(3<<12)
+ MOVD -256(R15), R10
+ JMP (R10)
diff --git a/src/runtime/preempt_wasm.s b/src/runtime/preempt_wasm.s
new file mode 100644
index 0000000..0cf57d3
--- /dev/null
+++ b/src/runtime/preempt_wasm.s
@@ -0,0 +1,8 @@
+// Code generated by mkpreempt.go; DO NOT EDIT.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+ // No async preemption on wasm
+ UNDEF
diff --git a/src/runtime/print.go b/src/runtime/print.go
new file mode 100644
index 0000000..0b05aed
--- /dev/null
+++ b/src/runtime/print.go
@@ -0,0 +1,301 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+// The compiler knows that a print of a value of this type
+// should use printhex instead of printuint (decimal).
+type hex uint64
+
+func bytes(s string) (ret []byte) {
+ rp := (*slice)(unsafe.Pointer(&ret))
+ sp := stringStructOf(&s)
+ rp.array = sp.str
+ rp.len = sp.len
+ rp.cap = sp.len
+ return
+}
+
+var (
+ // printBacklog is a circular buffer of messages written with the builtin
+ // print* functions, for use in postmortem analysis of core dumps.
+ printBacklog [512]byte
+ printBacklogIndex int
+)
+
+// recordForPanic maintains a circular buffer of messages written by the
+// runtime leading up to a process crash, allowing the messages to be
+// extracted from a core dump.
+//
+// The text written during a process crash (following "panic" or "fatal
+// error") is not saved, since the goroutine stacks will generally be readable
+// from the runtime data structures in the core file.
+func recordForPanic(b []byte) {
+ printlock()
+
+ if panicking.Load() == 0 {
+ // Not actively crashing: maintain circular buffer of print output.
+ for i := 0; i < len(b); {
+ n := copy(printBacklog[printBacklogIndex:], b[i:])
+ i += n
+ printBacklogIndex += n
+ printBacklogIndex %= len(printBacklog)
+ }
+ }
+
+ printunlock()
+}
+
+var debuglock mutex
+
+// The compiler emits calls to printlock and printunlock around
+// the multiple calls that implement a single Go print or println
+// statement. Some of the print helpers (printslice, for example)
+// call print recursively. There is also the problem of a crash
+// happening during the print routines and needing to acquire
+// the print lock to print information about the crash.
+// For both these reasons, let a thread acquire the printlock 'recursively'.
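+//
+// For example (an illustrative sequence, not code taken from the runtime),
+// the nested calls below acquire and release debuglock only once, at the
+// outermost level; the inner pair only adjusts m.printlock:
+//
+//	printlock()
+//	print("outer ")
+//	printlock()
+//	print("inner")
+//	printunlock()
+//	printunlock()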
+
+func printlock() {
+ mp := getg().m
+ mp.locks++ // do not reschedule between printlock++ and lock(&debuglock).
+ mp.printlock++
+ if mp.printlock == 1 {
+ lock(&debuglock)
+ }
+ mp.locks-- // now we know debuglock is held and holding up mp.locks for us.
+}
+
+func printunlock() {
+ mp := getg().m
+ mp.printlock--
+ if mp.printlock == 0 {
+ unlock(&debuglock)
+ }
+}
+
+// write to goroutine-local buffer if diverting output,
+// or else standard error.
+func gwrite(b []byte) {
+ if len(b) == 0 {
+ return
+ }
+ recordForPanic(b)
+ gp := getg()
+ // Don't use the writebuf if gp.m is dying. We want anything
+ // written through gwrite to appear in the terminal rather
+ // than be written to some buffer, if we're in a panicking state.
+ // Note that we can't just clear writebuf in the gp.m.dying case
+ // because a panic isn't allowed to have any write barriers.
+ if gp == nil || gp.writebuf == nil || gp.m.dying > 0 {
+ writeErr(b)
+ return
+ }
+
+ n := copy(gp.writebuf[len(gp.writebuf):cap(gp.writebuf)], b)
+ gp.writebuf = gp.writebuf[:len(gp.writebuf)+n]
+}
+
+func printsp() {
+ printstring(" ")
+}
+
+func printnl() {
+ printstring("\n")
+}
+
+func printbool(v bool) {
+ if v {
+ printstring("true")
+ } else {
+ printstring("false")
+ }
+}
+
+func printfloat(v float64) {
+ switch {
+ case v != v:
+ printstring("NaN")
+ return
+ case v+v == v && v > 0:
+ printstring("+Inf")
+ return
+ case v+v == v && v < 0:
+ printstring("-Inf")
+ return
+ }
+
+ const n = 7 // digits printed
+ var buf [n + 7]byte
+ buf[0] = '+'
+ e := 0 // exp
+ if v == 0 {
+ if 1/v < 0 {
+ buf[0] = '-'
+ }
+ } else {
+ if v < 0 {
+ v = -v
+ buf[0] = '-'
+ }
+
+ // normalize
+ for v >= 10 {
+ e++
+ v /= 10
+ }
+ for v < 1 {
+ e--
+ v *= 10
+ }
+
+ // round
+ h := 5.0
+ for i := 0; i < n; i++ {
+ h /= 10
+ }
+ v += h
+ if v >= 10 {
+ e++
+ v /= 10
+ }
+ }
+
+ // format +d.dddd+edd
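+ // (For example, printfloat(1234.5) emits "+1.234500e+003"; the exponent
+ // always has three digits.)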
+ for i := 0; i < n; i++ {
+ s := int(v)
+ buf[i+2] = byte(s + '0')
+ v -= float64(s)
+ v *= 10
+ }
+ buf[1] = buf[2]
+ buf[2] = '.'
+
+ buf[n+2] = 'e'
+ buf[n+3] = '+'
+ if e < 0 {
+ e = -e
+ buf[n+3] = '-'
+ }
+
+ buf[n+4] = byte(e/100) + '0'
+ buf[n+5] = byte(e/10)%10 + '0'
+ buf[n+6] = byte(e%10) + '0'
+ gwrite(buf[:])
+}
+
+func printcomplex(c complex128) {
+ print("(", real(c), imag(c), "i)")
+}
+
+func printuint(v uint64) {
+ var buf [100]byte
+ i := len(buf)
+ for i--; i > 0; i-- {
+ buf[i] = byte(v%10 + '0')
+ if v < 10 {
+ break
+ }
+ v /= 10
+ }
+ gwrite(buf[i:])
+}
+
+func printint(v int64) {
+ if v < 0 {
+ printstring("-")
+ v = -v
+ }
+ printuint(uint64(v))
+}
+
+var minhexdigits = 0 // protected by printlock
+
+func printhex(v uint64) {
+ const dig = "0123456789abcdef"
+ var buf [100]byte
+ i := len(buf)
+ for i--; i > 0; i-- {
+ buf[i] = dig[v%16]
+ if v < 16 && len(buf)-i >= minhexdigits {
+ break
+ }
+ v /= 16
+ }
+ i--
+ buf[i] = 'x'
+ i--
+ buf[i] = '0'
+ gwrite(buf[i:])
+}
+
+func printpointer(p unsafe.Pointer) {
+ printhex(uint64(uintptr(p)))
+}
+func printuintptr(p uintptr) {
+ printhex(uint64(p))
+}
+
+func printstring(s string) {
+ gwrite(bytes(s))
+}
+
+func printslice(s []byte) {
+ sp := (*slice)(unsafe.Pointer(&s))
+ print("[", len(s), "/", cap(s), "]")
+ printpointer(sp.array)
+}
+
+func printeface(e eface) {
+ print("(", e._type, ",", e.data, ")")
+}
+
+func printiface(i iface) {
+ print("(", i.tab, ",", i.data, ")")
+}
+
+// hexdumpWords prints a word-oriented hex dump of [p, end).
+//
+// If mark != nil, it will be called with each printed word's address
+// and should return a character mark to appear just before that
+// word's value. It can return 0 to indicate no mark.
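+//
+// Illustrative usage (a sketch; frameStart, frameEnd, and retAddr are
+// hypothetical values, not runtime variables):
+//
+//	hexdumpWords(frameStart, frameEnd, func(addr uintptr) byte {
+//		if addr == retAddr {
+//			return '>' // mark the word holding the return address
+//		}
+//		return 0 // no mark
+//	})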
+func hexdumpWords(p, end uintptr, mark func(uintptr) byte) {
+ printlock()
+ var markbuf [1]byte
+ markbuf[0] = ' '
+ minhexdigits = int(unsafe.Sizeof(uintptr(0)) * 2)
+ for i := uintptr(0); p+i < end; i += goarch.PtrSize {
+ if i%16 == 0 {
+ if i != 0 {
+ println()
+ }
+ print(hex(p+i), ": ")
+ }
+
+ if mark != nil {
+ markbuf[0] = mark(p + i)
+ if markbuf[0] == 0 {
+ markbuf[0] = ' '
+ }
+ }
+ gwrite(markbuf[:])
+ val := *(*uintptr)(unsafe.Pointer(p + i))
+ print(hex(val))
+ print(" ")
+
+ // Can we symbolize val?
+ fn := findfunc(val)
+ if fn.valid() {
+ print("<", funcname(fn), "+", hex(val-fn.entry()), "> ")
+ }
+ }
+ minhexdigits = 0
+ println()
+ printunlock()
+}
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
new file mode 100644
index 0000000..b408337
--- /dev/null
+++ b/src/runtime/proc.go
@@ -0,0 +1,6757 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/cpu"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// set using cmd/go/internal/modload.ModInfoProg
+var modinfo string
+
+// Goroutine scheduler
+// The scheduler's job is to distribute ready-to-run goroutines over worker threads.
+//
+// The main concepts are:
+// G - goroutine.
+// M - worker thread, or machine.
+// P - processor, a resource that is required to execute Go code.
+// M must have an associated P to execute Go code, however it can be
+// blocked or in a syscall w/o an associated P.
+//
+// Design doc at https://golang.org/s/go11sched.
+
+// Worker thread parking/unparking.
+// We need to balance between keeping enough running worker threads to utilize
+// available hardware parallelism and parking excessive running worker threads
+// to conserve CPU resources and power. This is not simple for two reasons:
+// (1) scheduler state is intentionally distributed (in particular, per-P work
+// queues), so it is not possible to compute global predicates on fast paths;
+// (2) for optimal thread management we would need to know the future (don't park
+// a worker thread when a new goroutine will be readied in the near future).
+//
+// Three rejected approaches that would work badly:
+// 1. Centralize all scheduler state (would inhibit scalability).
+// 2. Direct goroutine handoff. That is, when we ready a new goroutine and there
+// is a spare P, unpark a thread and hand it the P and the goroutine.
+// This would lead to thread state thrashing, as the thread that readied the
+// goroutine can be out of work the very next moment, and we will need to park it.
+// Also, it would destroy locality of computation as we want to preserve
+// dependent goroutines on the same thread; and introduce additional latency.
+// 3. Unpark an additional thread whenever we ready a goroutine and there is an
+// idle P, but don't do handoff. This would lead to excessive thread parking/
+// unparking as the additional threads will instantly park without discovering
+// any work to do.
+//
+// The current approach:
+//
+// This approach applies to three primary sources of potential work: readying a
+// goroutine, new/modified-earlier timers, and idle-priority GC. See below for
+// additional details.
+//
+// We unpark an additional thread when we submit work if (this is wakep()):
+// 1. There is an idle P, and
+// 2. There are no "spinning" worker threads.
+//
+// A worker thread is considered spinning if it is out of local work and did
+// not find work in the global run queue or netpoller; the spinning state is
+// denoted in m.spinning and in sched.nmspinning. Threads unparked this way are
+// also considered spinning; we don't do goroutine handoff so such threads are
+// out of work initially. Spinning threads spin on looking for work in per-P
+// run queues and timer heaps or from the GC before parking. If a spinning
+// thread finds work it takes itself out of the spinning state and proceeds to
+// execution. If it does not find work it takes itself out of the spinning
+// state and then parks.
+//
+// If there is at least one spinning thread (sched.nmspinning>0), we don't
+// unpark new threads when submitting work. To compensate for that, if the last
+// spinning thread finds work and stops spinning, it must unpark a new spinning
+// thread. This approach smooths out unjustified spikes of thread unparking,
+// but at the same time guarantees eventual maximal CPU parallelism
+// utilization.
+//
+// The main implementation complication is that we need to be very careful
+// during spinning->non-spinning thread transition. This transition can race
+// with submission of new work, and either one part or another needs to unpark
+// another worker thread. If they both fail to do that, we can end up with
+// semi-persistent CPU underutilization.
+//
+// The general pattern for submission is:
+// 1. Submit work to the local run queue, timer heap, or GC state.
+// 2. #StoreLoad-style memory barrier.
+// 3. Check sched.nmspinning.
+//
+// The general pattern for spinning->non-spinning transition is:
+// 1. Decrement nmspinning.
+// 2. #StoreLoad-style memory barrier.
+// 3. Check all per-P work queues and GC for new work.
+//
+// Note that all this complexity does not apply to the global run queue, as we
+// are not sloppy about thread unparking when submitting to the global queue. Also see
+// comments for nmspinning manipulation.
+//
+// How these different sources of work behave varies, though it doesn't affect
+// the synchronization approach:
+// * Ready goroutine: this is an obvious source of work; the goroutine is
+// immediately ready and must run on some thread eventually.
+// * New/modified-earlier timer: The current timer implementation (see time.go)
+// uses netpoll in a thread with no work available to wait for the soonest
+// timer. If there is no thread waiting, we want a new spinning thread to go
+// wait.
+// * Idle-priority GC: The GC wakes a stopped idle thread to contribute to
+// background GC work (note: currently disabled per golang.org/issue/19112).
+// Also see golang.org/issue/44313, as this should be extended to all GC
+// workers.
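+//
+// Expressed as a sketch (illustrative only; the real submission paths differ
+// in detail and in which atomic operations provide the barrier):
+//
+//	runqput(pp, gp, false)            // 1. submit work to the local run queue
+//	                                  // 2. StoreLoad-style barrier (from the atomics involved)
+//	if sched.nmspinning.Load() == 0 { // 3. check sched.nmspinning
+//		wakep()                   //    no spinners: consider unparking a worker
+//	}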
+
+var (
+ m0 m
+ g0 g
+ mcache0 *mcache
+ raceprocctx0 uintptr
+ raceFiniLock mutex
+)
+
+// This slice records the initializing tasks that need to be
+// done to start up the runtime. It is built by the linker.
+var runtime_inittasks []*initTask
+
+// main_init_done is a signal used by cgocallbackg that initialization
+// has been completed. It is made before _cgo_notify_runtime_init_done,
+// so all cgo calls can rely on it existing. When main_init is complete,
+// it is closed, meaning cgocallbackg can reliably receive from it.
+var main_init_done chan bool
+
+//go:linkname main_main main.main
+func main_main()
+
+// mainStarted indicates that the main M has started.
+var mainStarted bool
+
+// runtimeInitTime is the nanotime() at which the runtime started.
+var runtimeInitTime int64
+
+// Value to use for signal mask for newly created M's.
+var initSigmask sigset
+
+// The main goroutine.
+func main() {
+ mp := getg().m
+
+ // Racectx of m0->g0 is used only as the parent of the main goroutine.
+ // It must not be used for anything else.
+ mp.g0.racectx = 0
+
+ // Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
+ // Using decimal instead of binary GB and MB because
+ // they look nicer in the stack overflow failure message.
+ if goarch.PtrSize == 8 {
+ maxstacksize = 1000000000
+ } else {
+ maxstacksize = 250000000
+ }
+
+ // An upper limit for max stack size. Used to avoid random crashes
+ // after calling SetMaxStack and trying to allocate a stack that is too big,
+ // since stackalloc works with 32-bit sizes.
+ maxstackceiling = 2 * maxstacksize
+
+ // Allow newproc to start new Ms.
+ mainStarted = true
+
+ if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
+ systemstack(func() {
+ newm(sysmon, nil, -1)
+ })
+ }
+
+ // Lock the main goroutine onto this, the main OS thread,
+ // during initialization. Most programs won't care, but a few
+ // do require certain calls to be made by the main thread.
+ // Those can arrange for main.main to run in the main thread
+ // by calling runtime.LockOSThread during initialization
+ // to preserve the lock.
+ lockOSThread()
+
+ if mp != &m0 {
+ throw("runtime.main not on m0")
+ }
+
+ // Record when the world started.
+ // Must be before doInit for tracing init.
+ runtimeInitTime = nanotime()
+ if runtimeInitTime == 0 {
+ throw("nanotime returning zero")
+ }
+
+ if debug.inittrace != 0 {
+ inittrace.id = getg().goid
+ inittrace.active = true
+ }
+
+ doInit(runtime_inittasks) // Must be before defer.
+
+ // Defer unlock so that runtime.Goexit during init does the unlock too.
+ needUnlock := true
+ defer func() {
+ if needUnlock {
+ unlockOSThread()
+ }
+ }()
+
+ gcenable()
+
+ main_init_done = make(chan bool)
+ if iscgo {
+ if _cgo_pthread_key_created == nil {
+ throw("_cgo_pthread_key_created missing")
+ }
+
+ if _cgo_thread_start == nil {
+ throw("_cgo_thread_start missing")
+ }
+ if GOOS != "windows" {
+ if _cgo_setenv == nil {
+ throw("_cgo_setenv missing")
+ }
+ if _cgo_unsetenv == nil {
+ throw("_cgo_unsetenv missing")
+ }
+ }
+ if _cgo_notify_runtime_init_done == nil {
+ throw("_cgo_notify_runtime_init_done missing")
+ }
+
+ // Set the x_crosscall2_ptr C function pointer variable to point to crosscall2.
+ if set_crosscall2 == nil {
+ throw("set_crosscall2 missing")
+ }
+ set_crosscall2()
+
+ // Start the template thread in case we enter Go from
+ // a C-created thread and need to create a new thread.
+ startTemplateThread()
+ cgocall(_cgo_notify_runtime_init_done, nil)
+ }
+
+ // Run the initializing tasks. Depending on build mode this
+ // list can arrive a few different ways, but it will always
+ // contain the init tasks computed by the linker for all the
+ // packages in the program (excluding those added at runtime
+ // by package plugin).
+ for _, m := range activeModules() {
+ doInit(m.inittasks)
+ }
+
+ // Disable init tracing after main init done to avoid overhead
+ // of collecting statistics in malloc and newproc
+ inittrace.active = false
+
+ close(main_init_done)
+
+ needUnlock = false
+ unlockOSThread()
+
+ if isarchive || islibrary {
+ // A program compiled with -buildmode=c-archive or c-shared
+ // has a main, but it is not executed.
+ return
+ }
+ fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
+ fn()
+ if raceenabled {
+ runExitHooks(0) // run hooks now, since racefini does not return
+ racefini()
+ }
+
+ // Make racy client program work: if panicking on
+ // another goroutine at the same time as main returns,
+ // let the other goroutine finish printing the panic trace.
+ // Once it does, it will exit. See issues 3934 and 20018.
+ if runningPanicDefers.Load() != 0 {
+ // Running deferred functions should not take long.
+ for c := 0; c < 1000; c++ {
+ if runningPanicDefers.Load() == 0 {
+ break
+ }
+ Gosched()
+ }
+ }
+ if panicking.Load() != 0 {
+ gopark(nil, nil, waitReasonPanicWait, traceBlockForever, 1)
+ }
+ runExitHooks(0)
+
+ exit(0)
+ for {
+ var x *int32
+ *x = 0
+ }
+}
+
+// os_beforeExit is called from os.Exit(0).
+//
+//go:linkname os_beforeExit os.runtime_beforeExit
+func os_beforeExit(exitCode int) {
+ runExitHooks(exitCode)
+ if exitCode == 0 && raceenabled {
+ racefini()
+ }
+}
+
+// start forcegc helper goroutine
+func init() {
+ go forcegchelper()
+}
+
+func forcegchelper() {
+ forcegc.g = getg()
+ lockInit(&forcegc.lock, lockRankForcegc)
+ for {
+ lock(&forcegc.lock)
+ if forcegc.idle.Load() {
+ throw("forcegc: phase error")
+ }
+ forcegc.idle.Store(true)
+ goparkunlock(&forcegc.lock, waitReasonForceGCIdle, traceBlockSystemGoroutine, 1)
+ // this goroutine is explicitly resumed by sysmon
+ if debug.gctrace > 0 {
+ println("GC forced")
+ }
+ // Time-triggered, fully concurrent.
+ gcStart(gcTrigger{kind: gcTriggerTime, now: nanotime()})
+ }
+}
+
+// Gosched yields the processor, allowing other goroutines to run. It does not
+// suspend the current goroutine, so execution resumes automatically.
+//
+//go:nosplit
+func Gosched() {
+ checkTimeouts()
+ mcall(gosched_m)
+}
+
+// goschedguarded yields the processor like gosched, but also checks
+// for forbidden states and opts out of the yield in those cases.
+//
+//go:nosplit
+func goschedguarded() {
+ mcall(goschedguarded_m)
+}
+
+// goschedIfBusy yields the processor like gosched, but only does so if
+// there are no idle Ps or if we're on the only P and there's nothing in
+// the run queue. In both cases, there is freely available idle time.
+//
+//go:nosplit
+func goschedIfBusy() {
+ gp := getg()
+ // Call gosched if gp.preempt is set; we may be in a tight loop that
+ // doesn't otherwise yield.
+ if !gp.preempt && sched.npidle.Load() > 0 {
+ return
+ }
+ mcall(gosched_m)
+}
+
+// Puts the current goroutine into a waiting state and calls unlockf on the
+// system stack.
+//
+// If unlockf returns false, the goroutine is resumed.
+//
+// unlockf must not access this G's stack, as it may be moved between
+// the call to gopark and the call to unlockf.
+//
+// Note that because unlockf is called after putting the G into a waiting
+// state, the G may have already been readied by the time unlockf is called
+// unless there is external synchronization preventing the G from being
+// readied. If unlockf returns false, it must guarantee that the G cannot be
+// externally readied.
+//
+// Reason explains why the goroutine has been parked. It is displayed in stack
+// traces and heap dumps. Reasons should be unique and descriptive. Do not
+// re-use reasons; add new ones.
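+//
+// A minimal usage sketch (illustrative; l and gp here are hypothetical, and
+// goparkunlock below wraps exactly this parking pattern):
+//
+//	lock(&l)
+//	// ... decide that this goroutine must block ...
+//	gopark(parkunlock_c, unsafe.Pointer(&l), waitReasonZero, traceBlockForever, 1)
+//
+//	// Elsewhere, once the condition is satisfied:
+//	goready(gp, 1)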
+func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceReason traceBlockReason, traceskip int) {
+ if reason != waitReasonSleep {
+ checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
+ }
+ mp := acquirem()
+ gp := mp.curg
+ status := readgstatus(gp)
+ if status != _Grunning && status != _Gscanrunning {
+ throw("gopark: bad g status")
+ }
+ mp.waitlock = lock
+ mp.waitunlockf = unlockf
+ gp.waitreason = reason
+ mp.waitTraceBlockReason = traceReason
+ mp.waitTraceSkip = traceskip
+ releasem(mp)
+ // can't do anything that might move the G between Ms here.
+ mcall(park_m)
+}
+
+// Puts the current goroutine into a waiting state and unlocks the lock.
+// The goroutine can be made runnable again by calling goready(gp).
+func goparkunlock(lock *mutex, reason waitReason, traceReason traceBlockReason, traceskip int) {
+ gopark(parkunlock_c, unsafe.Pointer(lock), reason, traceReason, traceskip)
+}
+
+func goready(gp *g, traceskip int) {
+ systemstack(func() {
+ ready(gp, traceskip, true)
+ })
+}
+
+//go:nosplit
+func acquireSudog() *sudog {
+ // Delicate dance: the semaphore implementation calls
+ // acquireSudog, acquireSudog calls new(sudog),
+ // new calls malloc, malloc can call the garbage collector,
+ // and the garbage collector calls the semaphore implementation
+ // in stopTheWorld.
+ // Break the cycle by doing acquirem/releasem around new(sudog).
+ // The acquirem/releasem increments m.locks during new(sudog),
+ // which keeps the garbage collector from being invoked.
+ mp := acquirem()
+ pp := mp.p.ptr()
+ if len(pp.sudogcache) == 0 {
+ lock(&sched.sudoglock)
+ // First, try to grab a batch from central cache.
+ for len(pp.sudogcache) < cap(pp.sudogcache)/2 && sched.sudogcache != nil {
+ s := sched.sudogcache
+ sched.sudogcache = s.next
+ s.next = nil
+ pp.sudogcache = append(pp.sudogcache, s)
+ }
+ unlock(&sched.sudoglock)
+ // If the central cache is empty, allocate a new one.
+ if len(pp.sudogcache) == 0 {
+ pp.sudogcache = append(pp.sudogcache, new(sudog))
+ }
+ }
+ n := len(pp.sudogcache)
+ s := pp.sudogcache[n-1]
+ pp.sudogcache[n-1] = nil
+ pp.sudogcache = pp.sudogcache[:n-1]
+ if s.elem != nil {
+ throw("acquireSudog: found s.elem != nil in cache")
+ }
+ releasem(mp)
+ return s
+}
+
+//go:nosplit
+func releaseSudog(s *sudog) {
+ if s.elem != nil {
+ throw("runtime: sudog with non-nil elem")
+ }
+ if s.isSelect {
+ throw("runtime: sudog with non-false isSelect")
+ }
+ if s.next != nil {
+ throw("runtime: sudog with non-nil next")
+ }
+ if s.prev != nil {
+ throw("runtime: sudog with non-nil prev")
+ }
+ if s.waitlink != nil {
+ throw("runtime: sudog with non-nil waitlink")
+ }
+ if s.c != nil {
+ throw("runtime: sudog with non-nil c")
+ }
+ gp := getg()
+ if gp.param != nil {
+ throw("runtime: releaseSudog with non-nil gp.param")
+ }
+ mp := acquirem() // avoid rescheduling to another P
+ pp := mp.p.ptr()
+ if len(pp.sudogcache) == cap(pp.sudogcache) {
+ // Transfer half of local cache to the central cache.
+ var first, last *sudog
+ for len(pp.sudogcache) > cap(pp.sudogcache)/2 {
+ n := len(pp.sudogcache)
+ p := pp.sudogcache[n-1]
+ pp.sudogcache[n-1] = nil
+ pp.sudogcache = pp.sudogcache[:n-1]
+ if first == nil {
+ first = p
+ } else {
+ last.next = p
+ }
+ last = p
+ }
+ lock(&sched.sudoglock)
+ last.next = sched.sudogcache
+ sched.sudogcache = first
+ unlock(&sched.sudoglock)
+ }
+ pp.sudogcache = append(pp.sudogcache, s)
+ releasem(mp)
+}
+
+// called from assembly.
+func badmcall(fn func(*g)) {
+ throw("runtime: mcall called on m->g0 stack")
+}
+
+func badmcall2(fn func(*g)) {
+ throw("runtime: mcall function returned")
+}
+
+func badreflectcall() {
+ panic(plainError("arg size to reflect.call more than 1GB"))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func badmorestackg0() {
+ writeErrStr("fatal: morestack on g0\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func badmorestackgsignal() {
+ writeErrStr("fatal: morestack on gsignal\n")
+}
+
+//go:nosplit
+func badctxt() {
+ throw("ctxt != 0")
+}
+
+func lockedOSThread() bool {
+ gp := getg()
+ return gp.lockedm != 0 && gp.m.lockedg != 0
+}
+
+var (
+ // allgs contains all Gs ever created (including dead Gs), and thus
+ // never shrinks.
+ //
+ // Access via the slice is protected by allglock or stop-the-world.
+ // Readers that cannot take the lock may (carefully!) use the atomic
+ // variables below.
+ allglock mutex
+ allgs []*g
+
+ // allglen and allgptr are atomic variables that contain len(allgs) and
+ // &allgs[0] respectively. Proper ordering depends on totally-ordered
+ // loads and stores. Writes are protected by allglock.
+ //
+ // allgptr is updated before allglen. Readers should read allglen
+ // before allgptr to ensure that allglen never exceeds the length of
+ // the array that allgptr points to. New Gs appended during the race
+ // can be missed. For a consistent view of all Gs, allglock must be held.
+ //
+ // Copies of allgptr should always be stored as a concrete type or
+ // unsafe.Pointer, not uintptr, so that the GC can still reach the
+ // array even if the copy points to a stale one.
+ allglen uintptr
+ allgptr **g
+)
+
+func allgadd(gp *g) {
+ if readgstatus(gp) == _Gidle {
+ throw("allgadd: bad status Gidle")
+ }
+
+ lock(&allglock)
+ allgs = append(allgs, gp)
+ if &allgs[0] != allgptr {
+ atomicstorep(unsafe.Pointer(&allgptr), unsafe.Pointer(&allgs[0]))
+ }
+ atomic.Storeuintptr(&allglen, uintptr(len(allgs)))
+ unlock(&allglock)
+}
+
+// allGsSnapshot returns a snapshot of the slice of all Gs.
+//
+// The world must be stopped or allglock must be held.
+func allGsSnapshot() []*g {
+ assertWorldStoppedOrLockHeld(&allglock)
+
+ // Because the world is stopped or allglock is held, allgadd
+ // cannot happen concurrently with this. allgs grows
+ // monotonically and existing entries never change, so we can
+ // simply return a copy of the slice header. For added safety,
+ // we trim everything past len because that can still change.
+ return allgs[:len(allgs):len(allgs)]
+}
+
+// atomicAllG returns &allgs[0] and len(allgs) for use with atomicAllGIndex.
+func atomicAllG() (**g, uintptr) {
+ length := atomic.Loaduintptr(&allglen)
+ ptr := (**g)(atomic.Loadp(unsafe.Pointer(&allgptr)))
+ return ptr, length
+}
+
+// atomicAllGIndex returns ptr[i] with the allgptr returned from atomicAllG.
+func atomicAllGIndex(ptr **g, i uintptr) *g {
+ return *(**g)(add(unsafe.Pointer(ptr), i*goarch.PtrSize))
+}
+
+// forEachG calls fn on every G from allgs.
+//
+// forEachG takes a lock to exclude concurrent addition of new Gs.
+func forEachG(fn func(gp *g)) {
+ lock(&allglock)
+ for _, gp := range allgs {
+ fn(gp)
+ }
+ unlock(&allglock)
+}
+
+// forEachGRace calls fn on every G from allgs.
+//
+// forEachGRace avoids locking, but does not exclude addition of new Gs during
+// execution, which may be missed.
+func forEachGRace(fn func(gp *g)) {
+ ptr, length := atomicAllG()
+ for i := uintptr(0); i < length; i++ {
+ gp := atomicAllGIndex(ptr, i)
+ fn(gp)
+ }
+ return
+}
+
+const (
+ // Number of goroutine IDs to grab from sched.goidgen into the local per-P cache at once.
+ // 16 seems to provide enough amortization, but other than that it's a mostly arbitrary number.
+ _GoidCacheBatch = 16
+)
+
+// cpuinit sets up CPU feature flags and calls internal/cpu.Initialize. env should be the complete
+// value of the GODEBUG environment variable.
+func cpuinit(env string) {
+ switch GOOS {
+ case "aix", "darwin", "ios", "dragonfly", "freebsd", "netbsd", "openbsd", "illumos", "solaris", "linux":
+ cpu.DebugOptions = true
+ }
+ cpu.Initialize(env)
+
+ // The CPU feature support variables below are used by code generated by the
+ // compiler to guard execution of instructions that cannot be assumed to be
+ // always supported.
+ switch GOARCH {
+ case "386", "amd64":
+ x86HasPOPCNT = cpu.X86.HasPOPCNT
+ x86HasSSE41 = cpu.X86.HasSSE41
+ x86HasFMA = cpu.X86.HasFMA
+
+ case "arm":
+ armHasVFPv4 = cpu.ARM.HasVFPv4
+
+ case "arm64":
+ arm64HasATOMICS = cpu.ARM64.HasATOMICS
+ }
+}
+
+// getGodebugEarly extracts the GODEBUG environment variable on Unix-like
+// operating systems and returns its value. This function exists so that GODEBUG
+// can be read early, before much of the runtime is initialized.
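+//
+// For example, with GODEBUG=gctrace=1,clobberfree=1 in the process
+// environment, getGodebugEarly is expected to return "gctrace=1,clobberfree=1",
+// i.e. the value with the "GODEBUG=" prefix stripped.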
+func getGodebugEarly() string {
+ const prefix = "GODEBUG="
+ var env string
+ switch GOOS {
+ case "aix", "darwin", "ios", "dragonfly", "freebsd", "netbsd", "openbsd", "illumos", "solaris", "linux":
+ // Similar to goenv_unix but extracts the environment value for
+ // GODEBUG directly.
+ // TODO(moehrmann): remove when general goenvs() can be called before cpuinit()
+ n := int32(0)
+ for argv_index(argv, argc+1+n) != nil {
+ n++
+ }
+
+ for i := int32(0); i < n; i++ {
+ p := argv_index(argv, argc+1+i)
+ s := unsafe.String(p, findnull(p))
+
+ if hasPrefix(s, prefix) {
+ env = gostring(p)[len(prefix):]
+ break
+ }
+ }
+ }
+ return env
+}
+
+// The bootstrap sequence is:
+//
+// call osinit
+// call schedinit
+// make & queue new G
+// call runtime·mstart
+//
+// The new G calls runtime·main.
+func schedinit() {
+ lockInit(&sched.lock, lockRankSched)
+ lockInit(&sched.sysmonlock, lockRankSysmon)
+ lockInit(&sched.deferlock, lockRankDefer)
+ lockInit(&sched.sudoglock, lockRankSudog)
+ lockInit(&deadlock, lockRankDeadlock)
+ lockInit(&paniclk, lockRankPanic)
+ lockInit(&allglock, lockRankAllg)
+ lockInit(&allpLock, lockRankAllp)
+ lockInit(&reflectOffs.lock, lockRankReflectOffs)
+ lockInit(&finlock, lockRankFin)
+ lockInit(&cpuprof.lock, lockRankCpuprof)
+ allocmLock.init(lockRankAllocmR, lockRankAllocmRInternal, lockRankAllocmW)
+ execLock.init(lockRankExecR, lockRankExecRInternal, lockRankExecW)
+ traceLockInit()
+ // Enforce that this lock is always a leaf lock.
+ // All of this lock's critical sections should be
+ // extremely short.
+ lockInit(&memstats.heapStats.noPLock, lockRankLeafRank)
+
+ // raceinit must be the first call to race detector.
+ // In particular, it must be done before mallocinit below calls racemapshadow.
+ gp := getg()
+ if raceenabled {
+ gp.racectx, raceprocctx0 = raceinit()
+ }
+
+ sched.maxmcount = 10000
+
+ // The world starts stopped.
+ worldStopped()
+
+ moduledataverify()
+ stackinit()
+ mallocinit()
+ godebug := getGodebugEarly()
+ initPageTrace(godebug) // must run after mallocinit but before anything allocates
+ cpuinit(godebug) // must run before alginit
+ alginit() // maps, hash, fastrand must not be used before this call
+ fastrandinit() // must run before mcommoninit
+ mcommoninit(gp.m, -1)
+ modulesinit() // provides activeModules
+ typelinksinit() // uses maps, activeModules
+ itabsinit() // uses activeModules
+ stkobjinit() // must run before GC starts
+
+ sigsave(&gp.m.sigmask)
+ initSigmask = gp.m.sigmask
+
+ goargs()
+ goenvs()
+ secure()
+ parsedebugvars()
+ gcinit()
+
+ // If disableMemoryProfiling is set, update MemProfileRate to 0 to turn off memprofile.
+ // Note: parsedebugvars may update MemProfileRate, but when disableMemoryProfiling is
+ // set to true by the linker, nothing is consuming the profile, so it is
+ // safe to set MemProfileRate to 0.
+ if disableMemoryProfiling {
+ MemProfileRate = 0
+ }
+
+ lock(&sched.lock)
+ sched.lastpoll.Store(nanotime())
+ procs := ncpu
+ if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
+ procs = n
+ }
+ if procresize(procs) != nil {
+ throw("unknown runnable goroutine during bootstrap")
+ }
+ unlock(&sched.lock)
+
+ // World is effectively started now, as P's can run.
+ worldStarted()
+
+ if buildVersion == "" {
+ // Condition should never trigger. This code just serves
+ // to ensure runtime·buildVersion is kept in the resulting binary.
+ buildVersion = "unknown"
+ }
+ if len(modinfo) == 1 {
+ // Condition should never trigger. This code just serves
+ // to ensure runtime·modinfo is kept in the resulting binary.
+ modinfo = ""
+ }
+}
+
+func dumpgstatus(gp *g) {
+ thisg := getg()
+ print("runtime: gp: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
+ print("runtime: getg: g=", thisg, ", goid=", thisg.goid, ", g->atomicstatus=", readgstatus(thisg), "\n")
+}
+
+// sched.lock must be held.
+func checkmcount() {
+ assertLockHeld(&sched.lock)
+
+ // Exclude extra M's, which are used for cgocallback from threads
+ // created in C.
+ //
+ // The purpose of the SetMaxThreads limit is to avoid accidental fork
+ // bomb from something like millions of goroutines blocking on system
+ // calls, causing the runtime to create millions of threads. By
+ // definition, this isn't a problem for threads created in C, so we
+ // exclude them from the limit. See https://go.dev/issue/60004.
+ count := mcount() - int32(extraMInUse.Load()) - int32(extraMLength.Load())
+ if count > sched.maxmcount {
+ print("runtime: program exceeds ", sched.maxmcount, "-thread limit\n")
+ throw("thread exhaustion")
+ }
+}
+
+// mReserveID returns the next ID to use for a new m. This new m is immediately
+// considered 'running' by checkdead.
+//
+// sched.lock must be held.
+func mReserveID() int64 {
+ assertLockHeld(&sched.lock)
+
+ if sched.mnext+1 < sched.mnext {
+ throw("runtime: thread ID overflow")
+ }
+ id := sched.mnext
+ sched.mnext++
+ checkmcount()
+ return id
+}
+
+// A pre-allocated ID may be passed as 'id', or omitted by passing -1.
+func mcommoninit(mp *m, id int64) {
+ gp := getg()
+
+ // The g0 stack won't make sense for the user (and is not necessarily unwindable).
+ if gp != gp.m.g0 {
+ callers(1, mp.createstack[:])
+ }
+
+ lock(&sched.lock)
+
+ if id >= 0 {
+ mp.id = id
+ } else {
+ mp.id = mReserveID()
+ }
+
+ lo := uint32(int64Hash(uint64(mp.id), fastrandseed))
+ hi := uint32(int64Hash(uint64(cputicks()), ^fastrandseed))
+ if lo|hi == 0 {
+ hi = 1
+ }
+ // Same behavior as for 1.17.
+ // TODO: Simplify this.
+ if goarch.BigEndian {
+ mp.fastrand = uint64(lo)<<32 | uint64(hi)
+ } else {
+ mp.fastrand = uint64(hi)<<32 | uint64(lo)
+ }
+
+ mpreinit(mp)
+ if mp.gsignal != nil {
+ mp.gsignal.stackguard1 = mp.gsignal.stack.lo + stackGuard
+ }
+
+ // Add to allm so garbage collector doesn't free g->m
+ // when it is just in a register or thread-local storage.
+ mp.alllink = allm
+
+ // NumCgoCall() iterates over allm w/o schedlock,
+ // so we need to publish it safely.
+ atomicstorep(unsafe.Pointer(&allm), unsafe.Pointer(mp))
+ unlock(&sched.lock)
+
+ // Allocate memory to hold a cgo traceback if the cgo call crashes.
+ if iscgo || GOOS == "solaris" || GOOS == "illumos" || GOOS == "windows" {
+ mp.cgoCallers = new(cgoCallers)
+ }
+}
+
+func (mp *m) becomeSpinning() {
+ mp.spinning = true
+ sched.nmspinning.Add(1)
+ sched.needspinning.Store(0)
+}
+
+func (mp *m) hasCgoOnStack() bool {
+ return mp.ncgo > 0 || mp.isextra
+}
+
+var fastrandseed uintptr
+
+func fastrandinit() {
+ s := (*[unsafe.Sizeof(fastrandseed)]byte)(unsafe.Pointer(&fastrandseed))[:]
+ getRandomData(s)
+}
+
+// Mark gp ready to run.
+func ready(gp *g, traceskip int, next bool) {
+ if traceEnabled() {
+ traceGoUnpark(gp, traceskip)
+ }
+
+ status := readgstatus(gp)
+
+ // Mark runnable.
+ mp := acquirem() // disable preemption because it can be holding p in a local var
+ if status&^_Gscan != _Gwaiting {
+ dumpgstatus(gp)
+ throw("bad g->status in ready")
+ }
+
+ // status is Gwaiting or Gscanwaiting, make Grunnable and put on runq
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ runqput(mp.p.ptr(), gp, next)
+ wakep()
+ releasem(mp)
+}
+
+// freezeStopWait is a large value that freezetheworld sets
+// sched.stopwait to in order to request that all Gs permanently stop.
+const freezeStopWait = 0x7fffffff
+
+// freezing is set to non-zero if the runtime is trying to freeze the
+// world.
+var freezing atomic.Bool
+
+// Similar to stopTheWorld but best-effort and can be called several times.
+// There is no reverse operation; it is used during crashing.
+// This function must not lock any mutexes.
+func freezetheworld() {
+ freezing.Store(true)
+ if debug.dontfreezetheworld > 0 {
+ // Don't preempt Ps to stop goroutines. That will perturb
+ // scheduler state, making debugging more difficult. Instead,
+ // allow goroutines to continue execution.
+ //
+ // fatalpanic will tracebackothers to trace all goroutines. It
+ // is unsafe to trace a running goroutine, so tracebackothers
+ // will skip running goroutines. That is OK and expected, we
+ // expect users of dontfreezetheworld to use core files anyway.
+ //
+ // However, allowing the scheduler to continue running freely
+ // introduces a race: a goroutine may be stopped when
+ // tracebackothers checks its status, and then start running
+ // later when we are in the middle of traceback, potentially
+ // causing a crash.
+ //
+ // To mitigate this, when an M naturally enters the scheduler,
+ // schedule checks if freezing is set and if so stops
+ // execution. This guarantees that while Gs can transition from
+ // running to stopped, they can never transition from stopped
+ // to running.
+ //
+ // The sleep here allows racing Ms that missed freezing and are
+ // about to run a G to complete the transition to running
+ // before we start traceback.
+ usleep(1000)
+ return
+ }
+
+ // stopwait and preemption requests can be lost
+ // due to races with concurrently executing threads,
+ // so try several times
+ for i := 0; i < 5; i++ {
+ // this should tell the scheduler to not start any new goroutines
+ sched.stopwait = freezeStopWait
+ sched.gcwaiting.Store(true)
+ // this should stop running goroutines
+ if !preemptall() {
+ break // no running goroutines
+ }
+ usleep(1000)
+ }
+ // to be sure
+ usleep(1000)
+ preemptall()
+ usleep(1000)
+}
+
+// All reads and writes of g's status go through readgstatus, casgstatus,
+// castogscanstatus, and casfrom_Gscanstatus.
+//
+//go:nosplit
+func readgstatus(gp *g) uint32 {
+ return gp.atomicstatus.Load()
+}
+
+// The Gscan statuses act like locks, and this releases them.
+// If it proves to be a performance hit, we should be able to make these
+// simple atomic stores, but for now we are going to throw if
+// we see an inconsistent state.
+func casfrom_Gscanstatus(gp *g, oldval, newval uint32) {
+ success := false
+
+ // Check that transition is valid.
+ switch oldval {
+ default:
+ print("runtime: casfrom_Gscanstatus bad oldval gp=", gp, ", oldval=", hex(oldval), ", newval=", hex(newval), "\n")
+ dumpgstatus(gp)
+ throw("casfrom_Gscanstatus:top gp->status is not in scan state")
+ case _Gscanrunnable,
+ _Gscanwaiting,
+ _Gscanrunning,
+ _Gscansyscall,
+ _Gscanpreempted:
+ if newval == oldval&^_Gscan {
+ success = gp.atomicstatus.CompareAndSwap(oldval, newval)
+ }
+ }
+ if !success {
+ print("runtime: casfrom_Gscanstatus failed gp=", gp, ", oldval=", hex(oldval), ", newval=", hex(newval), "\n")
+ dumpgstatus(gp)
+ throw("casfrom_Gscanstatus: gp->status is not in scan state")
+ }
+ releaseLockRank(lockRankGscan)
+}
+
+// This will return false if the gp is not in the expected status and the cas fails.
+// This acts like a lock acquire while casfrom_Gscanstatus acts like a lock release.
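+//
+// For illustration only (a sketch, not an actual call site), the
+// acquire/release pairing looks like:
+//
+//	if castogscanstatus(gp, _Grunning, _Gscanrunning) {
+//		// ... inspect gp while it is pinned in a _Gscan state ...
+//		casfrom_Gscanstatus(gp, _Gscanrunning, _Grunning)
+//	}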
+func castogscanstatus(gp *g, oldval, newval uint32) bool {
+ switch oldval {
+ case _Grunnable,
+ _Grunning,
+ _Gwaiting,
+ _Gsyscall:
+ if newval == oldval|_Gscan {
+ r := gp.atomicstatus.CompareAndSwap(oldval, newval)
+ if r {
+ acquireLockRank(lockRankGscan)
+ }
+ return r
+
+ }
+ }
+ print("runtime: castogscanstatus oldval=", hex(oldval), " newval=", hex(newval), "\n")
+ throw("castogscanstatus")
+ panic("not reached")
+}
+
+// casgstatusAlwaysTrack is a debug flag that causes casgstatus to always track
+// various latencies on every transition instead of sampling them.
+var casgstatusAlwaysTrack = false
+
+// If asked to move to or from a Gscanstatus this will throw. Use the castogscanstatus
+// and casfrom_Gscanstatus instead.
+// casgstatus will loop if the g->atomicstatus is in a Gscan status until the routine that
+// put it in the Gscan state is finished.
+//
+//go:nosplit
+func casgstatus(gp *g, oldval, newval uint32) {
+ if (oldval&_Gscan != 0) || (newval&_Gscan != 0) || oldval == newval {
+ systemstack(func() {
+ print("runtime: casgstatus: oldval=", hex(oldval), " newval=", hex(newval), "\n")
+ throw("casgstatus: bad incoming values")
+ })
+ }
+
+ acquireLockRank(lockRankGscan)
+ releaseLockRank(lockRankGscan)
+
+ // See https://golang.org/cl/21503 for justification of the yield delay.
+ const yieldDelay = 5 * 1000
+ var nextYield int64
+
+ // loop if gp->atomicstatus is in a scan state giving
+ // GC time to finish and change the state to oldval.
+ for i := 0; !gp.atomicstatus.CompareAndSwap(oldval, newval); i++ {
+ if oldval == _Gwaiting && gp.atomicstatus.Load() == _Grunnable {
+ throw("casgstatus: waiting for Gwaiting but is Grunnable")
+ }
+ if i == 0 {
+ nextYield = nanotime() + yieldDelay
+ }
+ if nanotime() < nextYield {
+ for x := 0; x < 10 && gp.atomicstatus.Load() != oldval; x++ {
+ procyield(1)
+ }
+ } else {
+ osyield()
+ nextYield = nanotime() + yieldDelay/2
+ }
+ }
+
+ if oldval == _Grunning {
+ // Track every gTrackingPeriod'th time a goroutine transitions out of running.
+ if casgstatusAlwaysTrack || gp.trackingSeq%gTrackingPeriod == 0 {
+ gp.tracking = true
+ }
+ gp.trackingSeq++
+ }
+ if !gp.tracking {
+ return
+ }
+
+ // Handle various kinds of tracking.
+ //
+ // Currently:
+ // - Time spent in runnable.
+ // - Time spent blocked on a sync.Mutex or sync.RWMutex.
+ switch oldval {
+ case _Grunnable:
+ // We transitioned out of runnable, so measure how much
+ // time we spent in this state and add it to
+ // runnableTime.
+ now := nanotime()
+ gp.runnableTime += now - gp.trackingStamp
+ gp.trackingStamp = 0
+ case _Gwaiting:
+ if !gp.waitreason.isMutexWait() {
+ // Not blocking on a lock.
+ break
+ }
+ // Blocking on a lock, measure it. Note that because we're
+ // sampling, we have to multiply by our sampling period to get
+ // a more representative estimate of the absolute value.
+ // gTrackingPeriod also represents an accurate sampling period
+ // because we can only enter this state from _Grunning.
+ now := nanotime()
+ sched.totalMutexWaitTime.Add((now - gp.trackingStamp) * gTrackingPeriod)
+ gp.trackingStamp = 0
+ }
+ switch newval {
+ case _Gwaiting:
+ if !gp.waitreason.isMutexWait() {
+ // Not blocking on a lock.
+ break
+ }
+ // Blocking on a lock. Write down the timestamp.
+ now := nanotime()
+ gp.trackingStamp = now
+ case _Grunnable:
+ // We just transitioned into runnable, so record what
+ // time that happened.
+ now := nanotime()
+ gp.trackingStamp = now
+ case _Grunning:
+ // We're transitioning into running, so turn off
+ // tracking and record how much time we spent in
+ // runnable.
+ gp.tracking = false
+ sched.timeToRun.record(gp.runnableTime)
+ gp.runnableTime = 0
+ }
+}
+
+// casGToWaiting transitions gp from old to _Gwaiting, and sets the wait reason.
+//
+// Use this over casgstatus when possible to ensure that a waitreason is set.
+func casGToWaiting(gp *g, old uint32, reason waitReason) {
+ // Set the wait reason before calling casgstatus, because casgstatus will use it.
+ gp.waitreason = reason
+ casgstatus(gp, old, _Gwaiting)
+}
+
+// casgstatus(gp, oldstatus, Gcopystack), assuming oldstatus is Gwaiting or Grunnable.
+// Returns old status. Cannot call casgstatus directly, because we are racing with an
+// async wakeup that might come in from netpoll. If we see Gwaiting from the readgstatus,
+// it might have become Grunnable by the time we get to the cas. If we called casgstatus,
+// it would loop waiting for the status to go back to Gwaiting, which it never will.
+//
+//go:nosplit
+func casgcopystack(gp *g) uint32 {
+ for {
+ oldstatus := readgstatus(gp) &^ _Gscan
+ if oldstatus != _Gwaiting && oldstatus != _Grunnable {
+ throw("copystack: bad status, not Gwaiting or Grunnable")
+ }
+ if gp.atomicstatus.CompareAndSwap(oldstatus, _Gcopystack) {
+ return oldstatus
+ }
+ }
+}
+
+// casGToPreemptScan transitions gp from _Grunning to _Gscan|_Gpreempted.
+//
+// TODO(austin): This is the only status operation that both changes
+// the status and locks the _Gscan bit. Rethink this.
+func casGToPreemptScan(gp *g, old, new uint32) {
+ if old != _Grunning || new != _Gscan|_Gpreempted {
+ throw("bad g transition")
+ }
+ acquireLockRank(lockRankGscan)
+ for !gp.atomicstatus.CompareAndSwap(_Grunning, _Gscan|_Gpreempted) {
+ }
+}
+
+// casGFromPreempted attempts to transition gp from _Gpreempted to
+// _Gwaiting. If successful, the caller is responsible for
+// re-scheduling gp.
+func casGFromPreempted(gp *g, old, new uint32) bool {
+ if old != _Gpreempted || new != _Gwaiting {
+ throw("bad g transition")
+ }
+ gp.waitreason = waitReasonPreempted
+ return gp.atomicstatus.CompareAndSwap(_Gpreempted, _Gwaiting)
+}
+
+// stwReason is an enumeration of reasons the world is stopping.
+type stwReason uint8
+
+// Reasons to stop-the-world.
+//
+// Avoid reusing reasons and add new ones instead.
+const (
+ stwUnknown stwReason = iota // "unknown"
+ stwGCMarkTerm // "GC mark termination"
+ stwGCSweepTerm // "GC sweep termination"
+ stwWriteHeapDump // "write heap dump"
+ stwGoroutineProfile // "goroutine profile"
+ stwGoroutineProfileCleanup // "goroutine profile cleanup"
+ stwAllGoroutinesStack // "all goroutines stack trace"
+ stwReadMemStats // "read mem stats"
+ stwAllThreadsSyscall // "AllThreadsSyscall"
+ stwGOMAXPROCS // "GOMAXPROCS"
+ stwStartTrace // "start trace"
+ stwStopTrace // "stop trace"
+ stwForTestCountPagesInUse // "CountPagesInUse (test)"
+ stwForTestReadMetricsSlow // "ReadMetricsSlow (test)"
+ stwForTestReadMemStatsSlow // "ReadMemStatsSlow (test)"
+ stwForTestPageCachePagesLeaked // "PageCachePagesLeaked (test)"
+ stwForTestResetDebugLog // "ResetDebugLog (test)"
+)
+
+func (r stwReason) String() string {
+ return stwReasonStrings[r]
+}
+
+// If you add to this list, also add it to src/internal/trace/parser.go.
+// If you change the values of any of the stw* constants, bump the trace
+// version number and make a copy of this.
+var stwReasonStrings = [...]string{
+ stwUnknown: "unknown",
+ stwGCMarkTerm: "GC mark termination",
+ stwGCSweepTerm: "GC sweep termination",
+ stwWriteHeapDump: "write heap dump",
+ stwGoroutineProfile: "goroutine profile",
+ stwGoroutineProfileCleanup: "goroutine profile cleanup",
+ stwAllGoroutinesStack: "all goroutines stack trace",
+ stwReadMemStats: "read mem stats",
+ stwAllThreadsSyscall: "AllThreadsSyscall",
+ stwGOMAXPROCS: "GOMAXPROCS",
+ stwStartTrace: "start trace",
+ stwStopTrace: "stop trace",
+ stwForTestCountPagesInUse: "CountPagesInUse (test)",
+ stwForTestReadMetricsSlow: "ReadMetricsSlow (test)",
+ stwForTestReadMemStatsSlow: "ReadMemStatsSlow (test)",
+ stwForTestPageCachePagesLeaked: "PageCachePagesLeaked (test)",
+ stwForTestResetDebugLog: "ResetDebugLog (test)",
+}
+
+// stopTheWorld stops all P's from executing goroutines, interrupting
+// all goroutines at GC safe points, and records reason as the reason
+// for the stop. On return, only the current goroutine's P is running.
+// stopTheWorld must not be called from a system stack and the caller
+// must not hold worldsema. The caller must call startTheWorld when
+// other P's should resume execution.
+//
+// stopTheWorld is safe for multiple goroutines to call at the
+// same time. Each will execute its own stop, and the stops will
+// be serialized.
+//
+// This is also used by routines that do stack dumps. If the system is
+// in panic or being exited, this may not reliably stop all
+// goroutines.
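+//
+// A typical caller pairs stopTheWorld with startTheWorld, e.g. (a sketch of
+// the pattern used by ReadMemStats):
+//
+//	stopTheWorld(stwReadMemStats)
+//	// ... inspect state that requires the world to be stopped ...
+//	startTheWorld()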
+func stopTheWorld(reason stwReason) {
+ semacquire(&worldsema)
+ gp := getg()
+ gp.m.preemptoff = reason.String()
+ systemstack(func() {
+ // Mark the goroutine which called stopTheWorld preemptible so its
+ // stack may be scanned.
+ // This lets a mark worker scan us while we try to stop the world
+ // since otherwise we could get in a mutual preemption deadlock.
+ // We must not modify anything on the G stack because a stack shrink
+ // may occur. A stack shrink is otherwise OK though because in order
+ // to return from this function (and to leave the system stack) we
+ // must have preempted all goroutines, including any attempting
+ // to scan our stack, in which case, any stack shrinking will
+ // have already completed by the time we exit.
+ // Don't provide a wait reason because we're still executing.
+ casGToWaiting(gp, _Grunning, waitReasonStoppingTheWorld)
+ stopTheWorldWithSema(reason)
+ casgstatus(gp, _Gwaiting, _Grunning)
+ })
+}
+
+// startTheWorld undoes the effects of stopTheWorld.
+func startTheWorld() {
+ systemstack(func() { startTheWorldWithSema() })
+
+ // worldsema must be held over startTheWorldWithSema to ensure
+ // gomaxprocs cannot change while worldsema is held.
+ //
+ // Release worldsema with direct handoff to the next waiter, but
+ // acquirem so that semrelease1 doesn't try to yield our time.
+ //
+ // Otherwise if e.g. ReadMemStats is being called in a loop,
+ // it might stomp on other attempts to stop the world, such as
+ // for starting or ending GC. The operation this blocks is
+ // so heavy-weight that we should just try to be as fair as
+ // possible here.
+ //
+ // We don't want to just allow ourselves to get preempted between now
+ // and releasing the semaphore because then we keep everyone
+ // (including, for example, GCs) waiting longer.
+ mp := acquirem()
+ mp.preemptoff = ""
+ semrelease1(&worldsema, true, 0)
+ releasem(mp)
+}
+
+// stopTheWorldGC has the same effect as stopTheWorld, but blocks
+// until the GC is not running. It also blocks a GC from starting
+// until startTheWorldGC is called.
+func stopTheWorldGC(reason stwReason) {
+ semacquire(&gcsema)
+ stopTheWorld(reason)
+}
+
+// startTheWorldGC undoes the effects of stopTheWorldGC.
+func startTheWorldGC() {
+ startTheWorld()
+ semrelease(&gcsema)
+}
+
+// Holding worldsema grants an M the right to try to stop the world.
+var worldsema uint32 = 1
+
+// Holding gcsema grants the M the right to block a GC, and blocks
+// until the current GC is done. In particular, it prevents gomaxprocs
+// from changing concurrently.
+//
+// TODO(mknyszek): Once gomaxprocs and the execution tracer can handle
+// being changed/enabled during a GC, remove this.
+var gcsema uint32 = 1
+
+// stopTheWorldWithSema is the core implementation of stopTheWorld.
+// The caller is responsible for acquiring worldsema and disabling
+// preemption first and then should call stopTheWorldWithSema on the system
+// stack:
+//
+// semacquire(&worldsema, 0)
+// m.preemptoff = "reason"
+// systemstack(stopTheWorldWithSema)
+//
+// When finished, the caller must either call startTheWorld or undo
+// these three operations separately:
+//
+// m.preemptoff = ""
+// systemstack(startTheWorldWithSema)
+// semrelease(&worldsema)
+//
+// It is allowed to acquire worldsema once and then execute multiple
+// startTheWorldWithSema/stopTheWorldWithSema pairs.
+// Other P's are able to execute between successive calls to
+// startTheWorldWithSema and stopTheWorldWithSema.
+// Holding worldsema causes any other goroutines invoking
+// stopTheWorld to block.
+func stopTheWorldWithSema(reason stwReason) {
+ if traceEnabled() {
+ traceSTWStart(reason)
+ }
+ gp := getg()
+
+ // If we hold a lock, then we won't be able to stop another M
+ // that is blocked trying to acquire the lock.
+ if gp.m.locks > 0 {
+ throw("stopTheWorld: holding locks")
+ }
+
+ lock(&sched.lock)
+ sched.stopwait = gomaxprocs
+ sched.gcwaiting.Store(true)
+ preemptall()
+ // stop current P
+ gp.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
+ sched.stopwait--
+ // try to retake all P's in Psyscall status
+ for _, pp := range allp {
+ s := pp.status
+ if s == _Psyscall && atomic.Cas(&pp.status, s, _Pgcstop) {
+ if traceEnabled() {
+ traceGoSysBlock(pp)
+ traceProcStop(pp)
+ }
+ pp.syscalltick++
+ sched.stopwait--
+ }
+ }
+ // stop idle P's
+ now := nanotime()
+ for {
+ pp, _ := pidleget(now)
+ if pp == nil {
+ break
+ }
+ pp.status = _Pgcstop
+ sched.stopwait--
+ }
+ wait := sched.stopwait > 0
+ unlock(&sched.lock)
+
+ // wait for remaining P's to stop voluntarily
+ if wait {
+ for {
+ // wait for 100us, then try to re-preempt in case of any races
+ if notetsleep(&sched.stopnote, 100*1000) {
+ noteclear(&sched.stopnote)
+ break
+ }
+ preemptall()
+ }
+ }
+
+ // sanity checks
+ bad := ""
+ if sched.stopwait != 0 {
+ bad = "stopTheWorld: not stopped (stopwait != 0)"
+ } else {
+ for _, pp := range allp {
+ if pp.status != _Pgcstop {
+ bad = "stopTheWorld: not stopped (status != _Pgcstop)"
+ }
+ }
+ }
+ if freezing.Load() {
+ // Some other thread is panicking. This can cause the
+ // sanity checks above to fail if the panic happens in
+ // the signal handler on a stopped thread. Either way,
+ // we should halt this thread.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+ if bad != "" {
+ throw(bad)
+ }
+
+ worldStopped()
+}
+
+func startTheWorldWithSema() int64 {
+ assertWorldStopped()
+
+ mp := acquirem() // disable preemption because it can be holding p in a local var
+ if netpollinited() {
+ list := netpoll(0) // non-blocking
+ injectglist(&list)
+ }
+ lock(&sched.lock)
+
+ procs := gomaxprocs
+ if newprocs != 0 {
+ procs = newprocs
+ newprocs = 0
+ }
+ p1 := procresize(procs)
+ sched.gcwaiting.Store(false)
+ if sched.sysmonwait.Load() {
+ sched.sysmonwait.Store(false)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+
+ worldStarted()
+
+ for p1 != nil {
+ p := p1
+ p1 = p1.link.ptr()
+ if p.m != 0 {
+ mp := p.m.ptr()
+ p.m = 0
+ if mp.nextp != 0 {
+ throw("startTheWorld: inconsistent mp->nextp")
+ }
+ mp.nextp.set(p)
+ notewakeup(&mp.park)
+ } else {
+ // Start M to run P. Do not start another M below.
+ newm(nil, p, -1)
+ }
+ }
+
+ // Capture start-the-world time before doing clean-up tasks.
+ startTime := nanotime()
+ if traceEnabled() {
+ traceSTWDone()
+ }
+
+ // Wake up an additional proc in case we have excessive runnable goroutines
+ // in local queues or in the global queue. If we don't, the proc will park itself.
+ // If we have lots of excessive work, resetspinning will unpark additional procs as necessary.
+ wakep()
+
+ releasem(mp)
+
+ return startTime
+}
+
+// usesLibcall indicates whether this runtime performs system calls
+// via libcall.
+func usesLibcall() bool {
+ switch GOOS {
+ case "aix", "darwin", "illumos", "ios", "solaris", "windows":
+ return true
+ case "openbsd":
+ return GOARCH == "386" || GOARCH == "amd64" || GOARCH == "arm" || GOARCH == "arm64"
+ }
+ return false
+}
+
+// mStackIsSystemAllocated indicates whether this runtime starts on a
+// system-allocated stack.
+func mStackIsSystemAllocated() bool {
+ switch GOOS {
+ case "aix", "darwin", "plan9", "illumos", "ios", "solaris", "windows":
+ return true
+ case "openbsd":
+ switch GOARCH {
+ case "386", "amd64", "arm", "arm64":
+ return true
+ }
+ }
+ return false
+}
+
+// mstart is the entry-point for new Ms.
+// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls mstart0.
+func mstart()
+
+// mstart0 is the Go entry-point for new Ms.
+// This must not split the stack because we may not even have stack
+// bounds set up yet.
+//
+// May run during STW (because it doesn't have a P yet), so write
+// barriers are not allowed.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func mstart0() {
+ gp := getg()
+
+ osStack := gp.stack.lo == 0
+ if osStack {
+ // Initialize stack bounds from system stack.
+ // Cgo may have left stack size in stack.hi.
+ // minit may update the stack bounds.
+ //
+ // Note: these bounds may not be very accurate.
+ // We set hi to &size, but there are things above
+ // it. The 1024 is supposed to compensate for this,
+ // but is somewhat arbitrary.
+ size := gp.stack.hi
+ if size == 0 {
+ size = 16384 * sys.StackGuardMultiplier
+ }
+ gp.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
+ gp.stack.lo = gp.stack.hi - size + 1024
+ }
+ // Initialize stack guard so that we can start calling regular
+ // Go code.
+ gp.stackguard0 = gp.stack.lo + stackGuard
+ // This is the g0, so we can also call go:systemstack
+ // functions, which check stackguard1.
+ gp.stackguard1 = gp.stackguard0
+ mstart1()
+
+ // Exit this thread.
+ if mStackIsSystemAllocated() {
+ // Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
+ // the stack, but put it in gp.stack before mstart,
+ // so the logic above hasn't set osStack yet.
+ osStack = true
+ }
+ mexit(osStack)
+}
+
+// The go:noinline is to guarantee that the getcallerpc/getcallersp below are safe,
+// so that we can set up g0.sched to return to the call of mstart1 above.
+//
+//go:noinline
+func mstart1() {
+ gp := getg()
+
+ if gp != gp.m.g0 {
+ throw("bad runtime·mstart")
+ }
+
+ // Set up m.g0.sched as a label returning to just
+ // after the mstart1 call in mstart0 above, for use by goexit0 and mcall.
+ // We're never coming back to mstart1 after we call schedule,
+ // so other calls can reuse the current frame.
+ // And goexit0 does a gogo that needs to return from mstart1
+ // and let mstart0 exit the thread.
+ gp.sched.g = guintptr(unsafe.Pointer(gp))
+ gp.sched.pc = getcallerpc()
+ gp.sched.sp = getcallersp()
+
+ asminit()
+ minit()
+
+ // Install signal handlers; after minit so that minit can
+ // prepare the thread to be able to handle the signals.
+ if gp.m == &m0 {
+ mstartm0()
+ }
+
+ if fn := gp.m.mstartfn; fn != nil {
+ fn()
+ }
+
+ if gp.m != &m0 {
+ acquirep(gp.m.nextp.ptr())
+ gp.m.nextp = 0
+ }
+ schedule()
+}
+
+// mstartm0 implements part of mstart1 that only runs on the m0.
+//
+// Write barriers are allowed here because we know the GC can't be
+// running yet, so they'll be no-ops.
+//
+//go:yeswritebarrierrec
+func mstartm0() {
+ // Create an extra M for callbacks on threads not created by Go.
+ // An extra M is also needed on Windows for callbacks created by
+ // syscall.NewCallback. See issue #6751 for details.
+ if (iscgo || GOOS == "windows") && !cgoHasExtraM {
+ cgoHasExtraM = true
+ newextram()
+ }
+ initsig(false)
+}
+
+// mPark causes a thread to park itself, returning once woken.
+//
+//go:nosplit
+func mPark() {
+ gp := getg()
+ notesleep(&gp.m.park)
+ noteclear(&gp.m.park)
+}
+
+// mexit tears down and exits the current thread.
+//
+// Don't call this directly to exit the thread, since it must run at
+// the top of the thread stack. Instead, use gogo(&gp.m.g0.sched) to
+// unwind the stack to the point that exits the thread.
+//
+// It is entered with m.p != nil, so write barriers are allowed. It
+// will release the P before exiting.
+//
+//go:yeswritebarrierrec
+func mexit(osStack bool) {
+ mp := getg().m
+
+ if mp == &m0 {
+ // This is the main thread. Just wedge it.
+ //
+ // On Linux, exiting the main thread puts the process
+ // into a non-waitable zombie state. On Plan 9,
+ // exiting the main thread unblocks wait even though
+ // other threads are still running. On Solaris we can
+ // neither exitThread nor return from mstart. Other
+ // bad things probably happen on other platforms.
+ //
+ // We could try to clean up this M more before wedging
+ // it, but that complicates signal handling.
+ handoffp(releasep())
+ lock(&sched.lock)
+ sched.nmfreed++
+ checkdead()
+ unlock(&sched.lock)
+ mPark()
+ throw("locked m0 woke up")
+ }
+
+ sigblock(true)
+ unminit()
+
+ // Free the gsignal stack.
+ if mp.gsignal != nil {
+ stackfree(mp.gsignal.stack)
+ // On some platforms, when calling into VDSO (e.g. nanotime)
+ // we store our g on the gsignal stack, if there is one.
+ // Now that the stack is freed, unlink it from the m, so we
+ // won't write to it when calling VDSO code.
+ mp.gsignal = nil
+ }
+
+ // Remove m from allm.
+ lock(&sched.lock)
+ for pprev := &allm; *pprev != nil; pprev = &(*pprev).alllink {
+ if *pprev == mp {
+ *pprev = mp.alllink
+ goto found
+ }
+ }
+ throw("m not found in allm")
+found:
+ // Delay reaping m until it's done with the stack.
+ //
+ // Put mp on the free list, though it will not be reaped while freeWait
+ // is freeMWait. mp is no longer reachable via allm, so even if it is
+ // on an OS stack, we must keep a reference to mp alive so that the GC
+ // doesn't free mp while we are still using it.
+ //
+ // Note that the free list must not be linked through alllink because
+ // some functions walk allm without locking, so may be using alllink.
+ mp.freeWait.Store(freeMWait)
+ mp.freelink = sched.freem
+ sched.freem = mp
+ unlock(&sched.lock)
+
+ atomic.Xadd64(&ncgocall, int64(mp.ncgocall))
+
+ // Release the P.
+ handoffp(releasep())
+ // After this point we must not have write barriers.
+
+ // Invoke the deadlock detector. This must happen after
+ // handoffp because it may have started a new M to take our
+ // P's work.
+ lock(&sched.lock)
+ sched.nmfreed++
+ checkdead()
+ unlock(&sched.lock)
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ // Make sure pendingPreemptSignals is correct when an M exits.
+ // For #41702.
+ if mp.signalPending.Load() != 0 {
+ pendingPreemptSignals.Add(-1)
+ }
+ }
+
+ // Destroy all allocated resources. After this is called, we may no
+ // longer take any locks.
+ mdestroy(mp)
+
+ if osStack {
+ // No more uses of mp, so it is safe to drop the reference.
+ mp.freeWait.Store(freeMRef)
+
+ // Return from mstart and let the system thread
+ // library free the g0 stack and terminate the thread.
+ return
+ }
+
+ // mstart is the thread's entry point, so there's nothing to
+ // return to. Exit the thread directly. exitThread will clear
+ // m.freeWait when it's done with the stack and the m can be
+ // reaped.
+ exitThread(&mp.freeWait)
+}
+
+// forEachP calls fn(p) for every P p when p reaches a GC safe point.
+// If a P is currently executing code, this will bring the P to a GC
+// safe point and execute fn on that P. If the P is not executing code
+// (it is idle or in a syscall), this will call fn(p) directly while
+// preventing the P from exiting its state. This does not ensure that
+// fn will run on every CPU executing Go code, but it acts as a global
+// memory barrier. GC uses this as a "ragged barrier."
+//
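+// For illustration (a sketch, not an actual runtime call site), a caller
+// already on the system stack might flush per-P state like this:
+//
+//	forEachP(func(pp *p) {
+//		// flush pp-local caches into global state
+//	})
+//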
+// The caller must hold worldsema.
+//
+//go:systemstack
+func forEachP(fn func(*p)) {
+ mp := acquirem()
+ pp := getg().m.p.ptr()
+
+ lock(&sched.lock)
+ if sched.safePointWait != 0 {
+ throw("forEachP: sched.safePointWait != 0")
+ }
+ sched.safePointWait = gomaxprocs - 1
+ sched.safePointFn = fn
+
+ // Ask all Ps to run the safe point function.
+ for _, p2 := range allp {
+ if p2 != pp {
+ atomic.Store(&p2.runSafePointFn, 1)
+ }
+ }
+ preemptall()
+
+ // Any P entering _Pidle or _Psyscall from now on will observe
+ // p.runSafePointFn == 1 and will call runSafePointFn when
+ // changing its status to _Pidle/_Psyscall.
+
+ // Run safe point function for all idle Ps. sched.pidle will
+ // not change because we hold sched.lock.
+ for p := sched.pidle.ptr(); p != nil; p = p.link.ptr() {
+ if atomic.Cas(&p.runSafePointFn, 1, 0) {
+ fn(p)
+ sched.safePointWait--
+ }
+ }
+
+ wait := sched.safePointWait > 0
+ unlock(&sched.lock)
+
+ // Run fn for the current P.
+ fn(pp)
+
+ // Force Ps currently in _Psyscall into _Pidle and hand them
+ // off to induce safe point function execution.
+ for _, p2 := range allp {
+ s := p2.status
+ if s == _Psyscall && p2.runSafePointFn == 1 && atomic.Cas(&p2.status, s, _Pidle) {
+ if traceEnabled() {
+ traceGoSysBlock(p2)
+ traceProcStop(p2)
+ }
+ p2.syscalltick++
+ handoffp(p2)
+ }
+ }
+
+ // Wait for remaining Ps to run fn.
+ if wait {
+ for {
+ // Wait for 100us, then try to re-preempt in
+ // case of any races.
+ //
+ // Requires system stack.
+ if notetsleep(&sched.safePointNote, 100*1000) {
+ noteclear(&sched.safePointNote)
+ break
+ }
+ preemptall()
+ }
+ }
+ if sched.safePointWait != 0 {
+ throw("forEachP: not done")
+ }
+ for _, p2 := range allp {
+ if p2.runSafePointFn != 0 {
+ throw("forEachP: P did not run fn")
+ }
+ }
+
+ lock(&sched.lock)
+ sched.safePointFn = nil
+ unlock(&sched.lock)
+ releasem(mp)
+}
+
+// runSafePointFn runs the safe point function, if any, for this P.
+// This should be called like
+//
+// if getg().m.p.runSafePointFn != 0 {
+// runSafePointFn()
+// }
+//
+// runSafePointFn must be checked on any transition in to _Pidle or
+// _Psyscall to avoid a race where forEachP sees that the P is running
+// just before the P goes into _Pidle/_Psyscall and neither forEachP
+// nor the P run the safe-point function.
+func runSafePointFn() {
+ p := getg().m.p.ptr()
+ // Resolve the race between forEachP running the safe-point
+ // function on this P's behalf and this P running the
+ // safe-point function directly.
+ if !atomic.Cas(&p.runSafePointFn, 1, 0) {
+ return
+ }
+ sched.safePointFn(p)
+ lock(&sched.lock)
+ sched.safePointWait--
+ if sched.safePointWait == 0 {
+ notewakeup(&sched.safePointNote)
+ }
+ unlock(&sched.lock)
+}
+
+// When running with cgo, we call _cgo_thread_start
+// to start threads for us so that we can play nicely with
+// foreign code.
+var cgoThreadStart unsafe.Pointer
+
+type cgothreadstart struct {
+ g guintptr
+ tls *uint64
+ fn unsafe.Pointer
+}
+
+// Allocate a new m unassociated with any thread.
+// Can use p for allocation context if needed.
+// fn is recorded as the new m's m.mstartfn.
+// id is optional pre-allocated m ID. Omit by passing -1.
+//
+// This function is allowed to have write barriers even if the caller
+// isn't because it borrows pp.
+//
+//go:yeswritebarrierrec
+func allocm(pp *p, fn func(), id int64) *m {
+ allocmLock.rlock()
+
+ // The caller owns pp, but we may borrow (i.e., acquirep) it. We must
+ // disable preemption to ensure it is not stolen, which would make the
+ // caller lose ownership.
+ acquirem()
+
+ gp := getg()
+ if gp.m.p == 0 {
+ acquirep(pp) // temporarily borrow p for mallocs in this function
+ }
+
+ // Release the free M list. We need to do this somewhere and
+ // this may free up a stack we can use.
+ if sched.freem != nil {
+ lock(&sched.lock)
+ var newList *m
+ for freem := sched.freem; freem != nil; {
+ wait := freem.freeWait.Load()
+ if wait == freeMWait {
+ next := freem.freelink
+ freem.freelink = newList
+ newList = freem
+ freem = next
+ continue
+ }
+ // Free the stack if needed. For freeMRef, there is
+ // nothing to do except drop freem from the sched.freem
+ // list.
+ if wait == freeMStack {
+ // stackfree must be on the system stack, but allocm is
+ // reachable off the system stack transitively from
+ // startm.
+ systemstack(func() {
+ stackfree(freem.g0.stack)
+ })
+ }
+ freem = freem.freelink
+ }
+ sched.freem = newList
+ unlock(&sched.lock)
+ }
+
+ mp := new(m)
+ mp.mstartfn = fn
+ mcommoninit(mp, id)
+
+ // In case of cgo, Solaris, illumos, or Darwin, pthread_create will make us a stack.
+ // Windows and Plan 9 will lay out the sched stack on the OS stack.
+ if iscgo || mStackIsSystemAllocated() {
+ mp.g0 = malg(-1)
+ } else {
+ mp.g0 = malg(16384 * sys.StackGuardMultiplier)
+ }
+ mp.g0.m = mp
+
+ if pp == gp.m.p.ptr() {
+ releasep()
+ }
+
+ releasem(gp.m)
+ allocmLock.runlock()
+ return mp
+}
+
+// needm is called when a cgo callback happens on a
+// thread without an m (a thread not created by Go).
+// In this case, needm is expected to find an m to use
+// and return with m, g initialized correctly.
+// Since m and g are not set now (likely nil, but see below)
+// needm is limited in what routines it can call. In particular
+// it can only call nosplit functions (textflag 7) and cannot
+// do any scheduling that requires an m.
+//
+// In order to avoid needing heavy lifting here, we adopt
+// the following strategy: there is a stack of available m's
+// that can be stolen. Using compare-and-swap
+// to pop from the stack has ABA races, so we simulate
+// a lock by doing an exchange (via Casuintptr) to steal the stack
+// head and replace the top pointer with MLOCKED (1).
+// This serves as a simple spin lock that we can use even
+// without an m. The thread that locks the stack in this way
+// unlocks the stack by storing a valid stack head pointer.
+//
+// In order to make sure that there is always an m structure
+// available to be stolen, we maintain the invariant that there
+// is always one more than needed. At the beginning of the
+// program (if cgo is in use) the list is seeded with a single m.
+// If needm finds that it has taken the last m off the list, its job
+// is - once it has installed its own m so that it can do things like
+// allocate memory - to create a spare m and put it on the list.
+//
+// Each of these extra m's also has a g0 and a curg that are
+// pressed into service as the scheduling stack and current
+// goroutine for the duration of the cgo callback.
+//
+// It calls dropm to put the m back on the list,
+// 1. when the callback is done with the m on non-pthread platforms,
+// 2. or when the C thread is exiting on pthread platforms.
+//
+// The signal argument indicates whether we're called from a signal
+// handler.
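+//
+// In outline (a sketch of the flow, not the exact call sites), a cgo
+// callback arriving on a thread not created by Go does roughly:
+//
+//	needm(false)  // borrow an extra M and install its g0/curg
+//	// ... run the Go callback ...
+//	dropm()       // non-pthread platforms; pthread platforms defer this
+//	              // to the thread-exit destructor (see cgoBindM and dropm)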
+//
+//go:nosplit
+func needm(signal bool) {
+ if (iscgo || GOOS == "windows") && !cgoHasExtraM {
+ // Can happen if C/C++ code calls Go from a global ctor.
+ // Can also happen on Windows if a global ctor uses a
+ // callback created by syscall.NewCallback. See issue #6751
+ // for details.
+ //
+ // Cannot throw, because the scheduler is not initialized yet.
+ writeErrStr("fatal error: cgo callback before cgo call\n")
+ exit(1)
+ }
+
+ // Save and block signals before getting an M.
+ // The signal handler may call needm itself,
+ // and we must avoid a deadlock. Also, once g is installed,
+ // any incoming signals will try to execute,
+ // but we won't have the sigaltstack settings and other data
+ // set up appropriately until the end of minit, which will
+ // unblock the signals. This is the same dance as when
+ // starting a new m to run Go code via newosproc.
+ var sigmask sigset
+ sigsave(&sigmask)
+ sigblock(false)
+
+ // getExtraM is safe here because of the invariant above,
+ // that the extra list always contains or will soon contain
+ // at least one m.
+ mp, last := getExtraM()
+
+ // Set needextram when we've just emptied the list,
+ // so that the eventual call into cgocallbackg will
+ // allocate a new m for the extra list. We delay the
+ // allocation until then so that it can be done
+ // after exitsyscall makes sure it is okay to be
+ // running at all (that is, there's no garbage collection
+ // running right now).
+ mp.needextram = last
+
+ // Store the original signal mask for use by minit.
+ mp.sigmask = sigmask
+
+ // Install TLS on some platforms (previously setg
+ // would do this if necessary).
+ osSetupTLS(mp)
+
+ // Install g (= m->g0) and set the stack bounds
+ // to match the current stack.
+ setg(mp.g0)
+ sp := getcallersp()
+ callbackUpdateSystemStack(mp, sp, signal)
+
+ // Mark that we are already in Go now.
+ // Otherwise, if a signal arrives before cgocallbackg1, we may call needm again
+ // while the extra M list is empty, which would cause a deadlock.
+ mp.isExtraInC = false
+
+ // Initialize this thread to use the m.
+ asminit()
+ minit()
+
+ // mp.curg is now a real goroutine.
+ casgstatus(mp.curg, _Gdead, _Gsyscall)
+ sched.ngsys.Add(-1)
+}
+
+// Acquire an extra m and bind it to the C thread when a pthread key has been created.
+//
+//go:nosplit
+func needAndBindM() {
+ needm(false)
+
+ if _cgo_pthread_key_created != nil && *(*uintptr)(_cgo_pthread_key_created) != 0 {
+ cgoBindM()
+ }
+}
+
+// newextram allocates m's and puts them on the extra list.
+// It is called with a working local m, so that it can do things
+// like lock sched.lock and allocate.
+func newextram() {
+ c := extraMWaiters.Swap(0)
+ if c > 0 {
+ for i := uint32(0); i < c; i++ {
+ oneNewExtraM()
+ }
+ } else if extraMLength.Load() == 0 {
+ // Make sure there is at least one extra M.
+ oneNewExtraM()
+ }
+}
+
+// oneNewExtraM allocates an m and puts it on the extra list.
+func oneNewExtraM() {
+ // Create extra goroutine locked to extra m.
+ // The goroutine is the context in which the cgo callback will run.
+ // The sched.pc will never be returned to, but setting it to
+ // goexit makes clear to the traceback routines where
+ // the goroutine stack ends.
+ mp := allocm(nil, nil, -1)
+ gp := malg(4096)
+ gp.sched.pc = abi.FuncPCABI0(goexit) + sys.PCQuantum
+ gp.sched.sp = gp.stack.hi
+ gp.sched.sp -= 4 * goarch.PtrSize // extra space in case of reads slightly beyond frame
+ gp.sched.lr = 0
+ gp.sched.g = guintptr(unsafe.Pointer(gp))
+ gp.syscallpc = gp.sched.pc
+ gp.syscallsp = gp.sched.sp
+ gp.stktopsp = gp.sched.sp
+ // malg returns status as _Gidle. Change to _Gdead before
+ // adding to allg where GC can see it. We use _Gdead to hide
+ // this from tracebacks and stack scans since it isn't a
+ // "real" goroutine until needm grabs it.
+ casgstatus(gp, _Gidle, _Gdead)
+ gp.m = mp
+ mp.curg = gp
+ mp.isextra = true
+ // Mark that we are in C by default.
+ mp.isExtraInC = true
+ mp.lockedInt++
+ mp.lockedg.set(gp)
+ gp.lockedm.set(mp)
+ gp.goid = sched.goidgen.Add(1)
+ if raceenabled {
+ gp.racectx = racegostart(abi.FuncPCABIInternal(newextram) + sys.PCQuantum)
+ }
+ if traceEnabled() {
+ traceOneNewExtraM(gp)
+ }
+ // put on allg for garbage collector
+ allgadd(gp)
+
+ // gp is now on the allg list, but we don't want it to be
+ // counted by gcount. It would be more "proper" to increment
+ // sched.ngfree, but that requires locking. Incrementing ngsys
+ // has the same effect.
+ sched.ngsys.Add(1)
+
+ // Add m to the extra list.
+ addExtraM(mp)
+}
+
+// dropm puts the current m back onto the extra list.
+//
+// 1. On systems without pthreads, like Windows,
+// dropm is called when a cgo callback has called needm but is now
+// done with the callback and returning into the non-Go thread.
+//
+// The main expense here is the call to signalstack to release the
+// m's signal stack, and then the call to needm on the next callback
+// from this thread. It is tempting to try to save the m for next time,
+// which would eliminate both these costs, but there might not be
+// a next time: the current thread (which Go does not control) might exit.
+// If we saved the m for that thread, there would be an m leak each time
+// such a thread exited. Instead, we acquire and release an m on each
+// call. These should typically not be scheduling operations, just a few
+// atomics, so the cost should be small.
+//
+// 2. On systems with pthreads,
+// dropm is called while a non-Go thread is exiting.
+// We allocate a per-thread variable using pthread_key_create to register
+// a thread-exit-time destructor, and we store the g into a thread-specific
+// value associated with the pthread key when first returning to C.
+// That way the destructor invokes dropm while the non-Go thread is exiting.
+// This is much faster since it avoids expensive signal-related syscalls.
+//
+// This always runs without a P, so //go:nowritebarrierrec is required.
+//
+// This may run with a different stack than was recorded in g0 (there is no
+// call to callbackUpdateSystemStack prior to dropm), so this must be
+// //go:nosplit to avoid the stack bounds check.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func dropm() {
+ // Clear m and g, and return m to the extra list.
+ // After the call to setg we can only call nosplit functions
+ // with no pointer manipulation.
+ mp := getg().m
+
+ // Return mp.curg to dead state.
+ casgstatus(mp.curg, _Gsyscall, _Gdead)
+ mp.curg.preemptStop = false
+ sched.ngsys.Add(1)
+
+ // Block signals before unminit.
+ // Unminit unregisters the signal handling stack (but needs g on some systems).
+ // Setg(nil) clears g, which is the signal handler's cue not to run Go handlers.
+ // It's important not to try to handle a signal between those two steps.
+ sigmask := mp.sigmask
+ sigblock(false)
+ unminit()
+
+ setg(nil)
+
+ // Clear g0 stack bounds to ensure that needm always refreshes the
+ // bounds when reusing this M.
+ g0 := mp.g0
+ g0.stack.hi = 0
+ g0.stack.lo = 0
+ g0.stackguard0 = 0
+ g0.stackguard1 = 0
+
+ putExtraM(mp)
+
+ msigrestore(sigmask)
+}
+
+// cgoBindM stores the g0 of the current m into a thread-specific value.
+//
+// We allocate a pthread per-thread variable using pthread_key_create,
+// to register a thread-exit-time destructor.
+// Here we set the thread-specific value of the pthread key to enable the
+// destructor, so that the pthread key destructor calls dropm while the C
+// thread is exiting.
+//
+// The saved g is used by the destructor because on some platforms the g
+// stored in the TLS by Go may be cleared before the destructor is invoked,
+// so we restore g from the stored value before calling dropm.
+//
+// We store g0 instead of m, to make the assembly code simpler,
+// since we need to restore g0 in runtime.cgocallback.
+//
+// On systems without pthreads, like Windows, bindm shouldn't be used.
+//
+// NOTE: this always runs without a P, so //go:nowritebarrierrec is required.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func cgoBindM() {
+ if GOOS == "windows" || GOOS == "plan9" {
+ fatal("bindm in unexpected GOOS")
+ }
+ g := getg()
+ if g.m.g0 != g {
+ fatal("the current g is not g0")
+ }
+ if _cgo_bindm != nil {
+ asmcgocall(_cgo_bindm, unsafe.Pointer(g))
+ }
+}
+
+// A helper function for EnsureDropM.
+func getm() uintptr {
+ return uintptr(unsafe.Pointer(getg().m))
+}
+
+var (
+ // Linked list of extra M's, chained through mp.schedlink, whose head
+ // pointer doubles as the lock. Must be accessed only via lockextra/unlockextra.
+ //
+ // Can't be atomic.Pointer[m] because we use an invalid pointer as a
+ // "locked" sentinel value. M's on this list remain visible to the GC
+ // because their mp.curg is on allgs.
+ extraM atomic.Uintptr
+ // Number of M's in the extraM list.
+ extraMLength atomic.Uint32
+ // Number of waiters in lockextra.
+ extraMWaiters atomic.Uint32
+
+ // Number of extra M's in use by threads.
+ extraMInUse atomic.Uint32
+)
+
+// lockextra locks the extra list and returns the list head.
+// The caller must unlock the list by storing a new list head
+// to extraM. If nilokay is true, then lockextra will
+// return a nil list head if that's what it finds. If nilokay is false,
+// lockextra will keep waiting until the list head is no longer nil.
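+//
+// Typical use (a sketch; see getExtraM and addExtraM for the real callers,
+// and note that newHead and lengthDelta are illustrative names only):
+//
+//	head := lockextra(true)
+//	// ... push to or pop from the list rooted at head ...
+//	unlockextra(newHead, lengthDelta)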
+//
+//go:nosplit
+func lockextra(nilokay bool) *m {
+ const locked = 1
+
+ incr := false
+ for {
+ old := extraM.Load()
+ if old == locked {
+ osyield_no_g()
+ continue
+ }
+ if old == 0 && !nilokay {
+ if !incr {
+ // Add 1 to the number of threads
+ // waiting for an M.
+ // This is cleared by newextram.
+ extraMWaiters.Add(1)
+ incr = true
+ }
+ usleep_no_g(1)
+ continue
+ }
+ if extraM.CompareAndSwap(old, locked) {
+ return (*m)(unsafe.Pointer(old))
+ }
+ osyield_no_g()
+ continue
+ }
+}
+
+//go:nosplit
+func unlockextra(mp *m, delta int32) {
+ extraMLength.Add(delta)
+ extraM.Store(uintptr(unsafe.Pointer(mp)))
+}
+
+// Return an M from the extra M list. Returns last == true if the list becomes
+// empty because of this call.
+//
+// Spins waiting for an extra M, so caller must ensure that the list always
+// contains or will soon contain at least one M.
+//
+//go:nosplit
+func getExtraM() (mp *m, last bool) {
+ mp = lockextra(false)
+ extraMInUse.Add(1)
+ unlockextra(mp.schedlink.ptr(), -1)
+ return mp, mp.schedlink.ptr() == nil
+}
+
+// Returns an extra M to the list. mp must be from getExtraM. Newly
+// allocated M's should use addExtraM.
+//
+//go:nosplit
+func putExtraM(mp *m) {
+ extraMInUse.Add(-1)
+ addExtraM(mp)
+}
+
+// Adds a newly allocated M to the extra M list.
+//
+//go:nosplit
+func addExtraM(mp *m) {
+ mnext := lockextra(true)
+ mp.schedlink.set(mnext)
+ unlockextra(mp, 1)
+}
+
+var (
+ // allocmLock is locked for read when creating new Ms in allocm and their
+ // addition to allm. Thus acquiring this lock for write blocks the
+ // creation of new Ms.
+ allocmLock rwmutex
+
+ // execLock serializes exec and clone to avoid bugs or unspecified
+ // behaviour around exec'ing while creating/destroying threads. See
+ // issue #19546.
+ execLock rwmutex
+)
+
+// These errors are reported (via writeErrStr) by some OS-specific
+// versions of newosproc and newosproc0.
+const (
+ failthreadcreate = "runtime: failed to create new OS thread\n"
+ failallocatestack = "runtime: failed to allocate stack for the new OS thread\n"
+)
+
+// newmHandoff contains a list of m structures that need new OS threads.
+// This is used by newm in situations where newm itself can't safely
+// start an OS thread.
+var newmHandoff struct {
+ lock mutex
+
+ // newm points to a list of M structures that need new OS
+ // threads. The list is linked through m.schedlink.
+ newm muintptr
+
+ // waiting indicates that wake needs to be notified when an m
+ // is put on the list.
+ waiting bool
+ wake note
+
+ // haveTemplateThread indicates that the templateThread has
+ // been started. This is not protected by lock. Use cas to set
+ // to 1.
+ haveTemplateThread uint32
+}
+
+// Create a new m. It will start off with a call to fn, or else the scheduler.
+// fn needs to be static and not a heap allocated closure.
+// May run with m.p==nil, so write barriers are not allowed.
+//
+// id is optional pre-allocated m ID. Omit by passing -1.
+//
+//go:nowritebarrierrec
+func newm(fn func(), pp *p, id int64) {
+ // allocm adds a new M to allm, but they do not start until created by
+ // the OS in newm1 or the template thread.
+ //
+ // doAllThreadsSyscall requires that every M in allm will eventually
+ // start and be signal-able, even with a STW.
+ //
+ // Disable preemption here until we start the thread to ensure that
+ // newm is not preempted between allocm and starting the new thread,
+ // ensuring that anything added to allm is guaranteed to eventually
+ // start.
+ acquirem()
+
+ mp := allocm(pp, fn, id)
+ mp.nextp.set(pp)
+ mp.sigmask = initSigmask
+ if gp := getg(); gp != nil && gp.m != nil && (gp.m.lockedExt != 0 || gp.m.incgo) && GOOS != "plan9" {
+ // We're on a locked M or a thread that may have been
+ // started by C. The kernel state of this thread may
+ // be strange (the user may have locked it for that
+ // purpose). We don't want to clone that into another
+ // thread. Instead, ask a known-good thread to create
+ // the thread for us.
+ //
+ // This is disabled on Plan 9. See golang.org/issue/22227.
+ //
+ // TODO: This may be unnecessary on Windows, which
+ // doesn't model thread creation off fork.
+ lock(&newmHandoff.lock)
+ if newmHandoff.haveTemplateThread == 0 {
+ throw("on a locked thread with no template thread")
+ }
+ mp.schedlink = newmHandoff.newm
+ newmHandoff.newm.set(mp)
+ if newmHandoff.waiting {
+ newmHandoff.waiting = false
+ notewakeup(&newmHandoff.wake)
+ }
+ unlock(&newmHandoff.lock)
+ // The M has not started yet, but the template thread does not
+ // participate in STW, so it will always process queued Ms and
+ // it is safe to releasem.
+ releasem(getg().m)
+ return
+ }
+ newm1(mp)
+ releasem(getg().m)
+}
+
+func newm1(mp *m) {
+ if iscgo {
+ var ts cgothreadstart
+ if _cgo_thread_start == nil {
+ throw("_cgo_thread_start missing")
+ }
+ ts.g.set(mp.g0)
+ ts.tls = (*uint64)(unsafe.Pointer(&mp.tls[0]))
+ ts.fn = unsafe.Pointer(abi.FuncPCABI0(mstart))
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&ts), unsafe.Sizeof(ts))
+ }
+ if asanenabled {
+ asanwrite(unsafe.Pointer(&ts), unsafe.Sizeof(ts))
+ }
+ execLock.rlock() // Prevent process clone.
+ asmcgocall(_cgo_thread_start, unsafe.Pointer(&ts))
+ execLock.runlock()
+ return
+ }
+ execLock.rlock() // Prevent process clone.
+ newosproc(mp)
+ execLock.runlock()
+}
+
+// startTemplateThread starts the template thread if it is not already
+// running.
+//
+// The calling thread must itself be in a known-good state.
+func startTemplateThread() {
+ if GOARCH == "wasm" { // no threads on wasm yet
+ return
+ }
+
+ // Disable preemption to guarantee that the template thread will be
+ // created before a park once haveTemplateThread is set.
+ mp := acquirem()
+ if !atomic.Cas(&newmHandoff.haveTemplateThread, 0, 1) {
+ releasem(mp)
+ return
+ }
+ newm(templateThread, nil, -1)
+ releasem(mp)
+}
+
+// templateThread is a thread in a known-good state that exists solely
+// to start new threads in known-good states when the calling thread
+// may not be in a good state.
+//
+// Many programs never need this, so templateThread is started lazily
+// when we first enter a state that might lead to running on a thread
+// in an unknown state.
+//
+// templateThread runs on an M without a P, so it must not have write
+// barriers.
+//
+//go:nowritebarrierrec
+func templateThread() {
+ lock(&sched.lock)
+ sched.nmsys++
+ checkdead()
+ unlock(&sched.lock)
+
+ for {
+ lock(&newmHandoff.lock)
+ for newmHandoff.newm != 0 {
+ newm := newmHandoff.newm.ptr()
+ newmHandoff.newm = 0
+ unlock(&newmHandoff.lock)
+ for newm != nil {
+ next := newm.schedlink.ptr()
+ newm.schedlink = 0
+ newm1(newm)
+ newm = next
+ }
+ lock(&newmHandoff.lock)
+ }
+ newmHandoff.waiting = true
+ noteclear(&newmHandoff.wake)
+ unlock(&newmHandoff.lock)
+ notesleep(&newmHandoff.wake)
+ }
+}
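+
+// For example (illustrative, simplified): locking a goroutine to its thread
+// is one path that starts the template thread, because any thread creation
+// requested from that locked thread must be handed off to a known-good
+// thread:
+//
+//	runtime.LockOSThread()
+//	defer runtime.UnlockOSThread()
+//	go work() // if this needs a new M, newm defers to templateThread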
+
+// Stops execution of the current m until new work is available.
+// Returns with acquired P.
+func stopm() {
+ gp := getg()
+
+ if gp.m.locks != 0 {
+ throw("stopm holding locks")
+ }
+ if gp.m.p != 0 {
+ throw("stopm holding p")
+ }
+ if gp.m.spinning {
+ throw("stopm spinning")
+ }
+
+ lock(&sched.lock)
+ mput(gp.m)
+ unlock(&sched.lock)
+ mPark()
+ acquirep(gp.m.nextp.ptr())
+ gp.m.nextp = 0
+}
+
+func mspinning() {
+ // startm's caller incremented nmspinning. Set the new M's spinning.
+ getg().m.spinning = true
+}
+
+// Schedules some M to run the p (creates an M if necessary).
+// If pp == nil, tries to get an idle P; if there are no idle P's, does nothing.
+// May run with m.p==nil, so write barriers are not allowed.
+// If spinning is set, the caller has incremented nmspinning and must provide a
+// P. startm will set m.spinning in the newly started M.
+//
+// Callers passing a non-nil P must call from a non-preemptible context. See
+// comment on acquirem below.
+//
+// Argument lockheld indicates whether the caller already acquired the
+// scheduler lock. Callers holding the lock when making the call must pass
+// true. The lock might be temporarily dropped, but will be reacquired before
+// returning.
+//
+// Must not have write barriers because this may be called without a P.
+//
+//go:nowritebarrierrec
+func startm(pp *p, spinning, lockheld bool) {
+ // Disable preemption.
+ //
+ // Every owned P must have an owner that will eventually stop it in the
+ // event of a GC stop request. startm takes transient ownership of a P
+ // (either from argument or pidleget below) and transfers ownership to
+ // a started M, which will be responsible for performing the stop.
+ //
+ // Preemption must be disabled during this transient ownership,
+ // otherwise the P this is running on may enter GC stop while still
+ // holding the transient P, leaving that P in limbo and deadlocking the
+ // STW.
+ //
+ // Callers passing a non-nil P must already be in non-preemptible
+ // context, otherwise such preemption could occur on function entry to
+ // startm. Callers passing a nil P may be preemptible, so we must
+ // disable preemption before acquiring a P from pidleget below.
+ mp := acquirem()
+ if !lockheld {
+ lock(&sched.lock)
+ }
+ if pp == nil {
+ if spinning {
+ // TODO(prattmic): All remaining calls to this function
+ // with pp == nil could be cleaned up to find a P
+ // before calling startm.
+ throw("startm: P required for spinning=true")
+ }
+ pp, _ = pidleget(0)
+ if pp == nil {
+ if !lockheld {
+ unlock(&sched.lock)
+ }
+ releasem(mp)
+ return
+ }
+ }
+ nmp := mget()
+ if nmp == nil {
+ // No M is available, we must drop sched.lock and call newm.
+ // However, we already own a P to assign to the M.
+ //
+ // Once sched.lock is released, another G (e.g., in a syscall),
+ // could find no idle P while checkdead finds a runnable G but
+ // no running M's because this new M hasn't started yet, thus
+ // throwing in an apparent deadlock.
+ // This apparent deadlock is possible when startm is called
+ // from sysmon, which doesn't count as a running M.
+ //
+ // Avoid this situation by pre-allocating the ID for the new M,
+ // thus marking it as 'running' before we drop sched.lock. This
+ // new M will eventually run the scheduler to execute any
+ // queued G's.
+ id := mReserveID()
+ unlock(&sched.lock)
+
+ var fn func()
+ if spinning {
+ // The caller incremented nmspinning, so set m.spinning in the new M.
+ fn = mspinning
+ }
+ newm(fn, pp, id)
+
+ if lockheld {
+ lock(&sched.lock)
+ }
+ // Ownership transfer of pp committed by start in newm.
+ // Preemption is now safe.
+ releasem(mp)
+ return
+ }
+ if !lockheld {
+ unlock(&sched.lock)
+ }
+ if nmp.spinning {
+ throw("startm: m is spinning")
+ }
+ if nmp.nextp != 0 {
+ throw("startm: m has p")
+ }
+ if spinning && !runqempty(pp) {
+ throw("startm: p has runnable gs")
+ }
+ // The caller incremented nmspinning, so set m.spinning in the new M.
+ nmp.spinning = spinning
+ nmp.nextp.set(pp)
+ notewakeup(&nmp.park)
+ // Ownership transfer of pp committed by wakeup. Preemption is now
+ // safe.
+ releasem(mp)
+}
+
+// Hands off P from syscall or locked M.
+// Always runs without a P, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func handoffp(pp *p) {
+ // handoffp must start an M in any situation where
+ // findrunnable would return a G to run on pp.
+
+ // if it has local work, start it straight away
+ if !runqempty(pp) || sched.runqsize != 0 {
+ startm(pp, false, false)
+ return
+ }
+ // if there's trace work to do, start it straight away
+ if (traceEnabled() || traceShuttingDown()) && traceReaderAvailable() != nil {
+ startm(pp, false, false)
+ return
+ }
+ // if it has GC work, start it straight away
+ if gcBlackenEnabled != 0 && gcMarkWorkAvailable(pp) {
+ startm(pp, false, false)
+ return
+ }
+ // no local work, check that there are no spinning/idle M's,
+ // otherwise our help is not required
+ if sched.nmspinning.Load()+sched.npidle.Load() == 0 && sched.nmspinning.CompareAndSwap(0, 1) { // TODO: fast atomic
+ sched.needspinning.Store(0)
+ startm(pp, true, false)
+ return
+ }
+ lock(&sched.lock)
+ if sched.gcwaiting.Load() {
+ pp.status = _Pgcstop
+ sched.stopwait--
+ if sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ unlock(&sched.lock)
+ return
+ }
+ if pp.runSafePointFn != 0 && atomic.Cas(&pp.runSafePointFn, 1, 0) {
+ sched.safePointFn(pp)
+ sched.safePointWait--
+ if sched.safePointWait == 0 {
+ notewakeup(&sched.safePointNote)
+ }
+ }
+ if sched.runqsize != 0 {
+ unlock(&sched.lock)
+ startm(pp, false, false)
+ return
+ }
+ // If this is the last running P and nobody is polling the network,
+ // we need to wake up another M to poll the network.
+ if sched.npidle.Load() == gomaxprocs-1 && sched.lastpoll.Load() != 0 {
+ unlock(&sched.lock)
+ startm(pp, false, false)
+ return
+ }
+
+ // The scheduler lock cannot be held when calling wakeNetPoller below
+ // because wakeNetPoller may call wakep which may call startm.
+ when := nobarrierWakeTime(pp)
+ pidleput(pp, 0)
+ unlock(&sched.lock)
+
+ if when != 0 {
+ wakeNetPoller(when)
+ }
+}
+
+// Tries to add one more P to execute G's.
+// Called when a G is made runnable (newproc, ready).
+// Must be called with a P.
+func wakep() {
+ // Be conservative about spinning threads, only start one if none exist
+ // already.
+ if sched.nmspinning.Load() != 0 || !sched.nmspinning.CompareAndSwap(0, 1) {
+ return
+ }
+
+ // Disable preemption until ownership of pp transfers to the next M in
+ // startm. Otherwise preemption here would leave pp stuck waiting to
+ // enter _Pgcstop.
+ //
+ // See preemption comment on acquirem in startm for more details.
+ mp := acquirem()
+
+ var pp *p
+ lock(&sched.lock)
+ pp, _ = pidlegetSpinning(0)
+ if pp == nil {
+ if sched.nmspinning.Add(-1) < 0 {
+ throw("wakep: negative nmspinning")
+ }
+ unlock(&sched.lock)
+ releasem(mp)
+ return
+ }
+ // Since we always have a P, the race in the "No M is available"
+ // comment in startm doesn't apply during the small window between the
+ // unlock here and lock in startm. A checkdead in between will always
+ // see at least one running M (ours).
+ unlock(&sched.lock)
+
+ startm(pp, true, false)
+
+ releasem(mp)
+}
+
+// Stops execution of the current m that is locked to a g until the g is runnable again.
+// Returns with acquired P.
+func stoplockedm() {
+ gp := getg()
+
+ if gp.m.lockedg == 0 || gp.m.lockedg.ptr().lockedm.ptr() != gp.m {
+ throw("stoplockedm: inconsistent locking")
+ }
+ if gp.m.p != 0 {
+ // Schedule another M to run this p.
+ pp := releasep()
+ handoffp(pp)
+ }
+ incidlelocked(1)
+ // Wait until another thread schedules lockedg again.
+ mPark()
+ status := readgstatus(gp.m.lockedg.ptr())
+ if status&^_Gscan != _Grunnable {
+ print("runtime:stoplockedm: lockedg (atomicstatus=", status, ") is not Grunnable or Gscanrunnable\n")
+ dumpgstatus(gp.m.lockedg.ptr())
+ throw("stoplockedm: not runnable")
+ }
+ acquirep(gp.m.nextp.ptr())
+ gp.m.nextp = 0
+}
+
+// Schedules the locked m to run the locked gp.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func startlockedm(gp *g) {
+ mp := gp.lockedm.ptr()
+ if mp == getg().m {
+ throw("startlockedm: locked to me")
+ }
+ if mp.nextp != 0 {
+ throw("startlockedm: m has p")
+ }
+ // directly handoff current P to the locked m
+ incidlelocked(-1)
+ pp := releasep()
+ mp.nextp.set(pp)
+ notewakeup(&mp.park)
+ stopm()
+}
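+
+// Together, stoplockedm and startlockedm implement the handoff for
+// goroutines locked to an OS thread (runtime.LockOSThread): when such a
+// goroutine becomes runnable while some other M holds a P, that M calls
+// startlockedm to pass its P to the locked M parked in stoplockedm, and
+// then parks itself in stopm until it is given new work.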
+
+// Stops the current m for stopTheWorld.
+// Returns when the world is restarted.
+func gcstopm() {
+ gp := getg()
+
+ if !sched.gcwaiting.Load() {
+ throw("gcstopm: not waiting for gc")
+ }
+ if gp.m.spinning {
+ gp.m.spinning = false
+ // OK to just drop nmspinning here,
+ // startTheWorld will unpark threads as necessary.
+ if sched.nmspinning.Add(-1) < 0 {
+ throw("gcstopm: negative nmspinning")
+ }
+ }
+ pp := releasep()
+ lock(&sched.lock)
+ pp.status = _Pgcstop
+ sched.stopwait--
+ if sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ unlock(&sched.lock)
+ stopm()
+}
+
+// Schedules gp to run on the current M.
+// If inheritTime is true, gp inherits the remaining time in the
+// current time slice. Otherwise, it starts a new time slice.
+// Never returns.
+//
+// Write barriers are allowed because this is called immediately after
+// acquiring a P in several places.
+//
+//go:yeswritebarrierrec
+func execute(gp *g, inheritTime bool) {
+ mp := getg().m
+
+ if goroutineProfile.active {
+ // Make sure that gp has had its stack written out to the goroutine
+ // profile, exactly as it was when the goroutine profiler first stopped
+ // the world.
+ tryRecordGoroutineProfile(gp, osyield)
+ }
+
+ // Assign gp.m before entering _Grunning so running Gs have an
+ // M.
+ mp.curg = gp
+ gp.m = mp
+ casgstatus(gp, _Grunnable, _Grunning)
+ gp.waitsince = 0
+ gp.preempt = false
+ gp.stackguard0 = gp.stack.lo + stackGuard
+ if !inheritTime {
+ mp.p.ptr().schedtick++
+ }
+
+ // Check whether the profiler needs to be turned on or off.
+ hz := sched.profilehz
+ if mp.profilehz != hz {
+ setThreadCPUProfiler(hz)
+ }
+
+ if traceEnabled() {
+ // GoSysExit has to happen when we have a P, but before GoStart.
+ // So we emit it here.
+ if gp.syscallsp != 0 {
+ traceGoSysExit()
+ }
+ traceGoStart()
+ }
+
+ gogo(&gp.sched)
+}
+
+// Finds a runnable goroutine to execute.
+// Tries to steal from other P's, get g from local or global queue, poll network.
+// tryWakeP indicates that the returned goroutine is not normal (GC worker, trace
+// reader) so the caller should try to wake a P.
+func findRunnable() (gp *g, inheritTime, tryWakeP bool) {
+ mp := getg().m
+
+ // The conditions here and in handoffp must agree: if
+ // findrunnable would return a G to run, handoffp must start
+ // an M.
+
+top:
+ pp := mp.p.ptr()
+ if sched.gcwaiting.Load() {
+ gcstopm()
+ goto top
+ }
+ if pp.runSafePointFn != 0 {
+ runSafePointFn()
+ }
+
+ // now and pollUntil are saved for work stealing later,
+ // which may steal timers. It's important that between now
+ // and then, nothing blocks, so these numbers remain mostly
+ // relevant.
+ now, pollUntil, _ := checkTimers(pp, 0)
+
+ // Try to schedule the trace reader.
+ if traceEnabled() || traceShuttingDown() {
+ gp := traceReader()
+ if gp != nil {
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ traceGoUnpark(gp, 0)
+ return gp, false, true
+ }
+ }
+
+ // Try to schedule a GC worker.
+ if gcBlackenEnabled != 0 {
+ gp, tnow := gcController.findRunnableGCWorker(pp, now)
+ if gp != nil {
+ return gp, false, true
+ }
+ now = tnow
+ }
+
+ // Check the global runnable queue once in a while to ensure fairness.
+ // Otherwise two goroutines can completely occupy the local runqueue
+ // by constantly respawning each other.
+ if pp.schedtick%61 == 0 && sched.runqsize > 0 {
+ lock(&sched.lock)
+ gp := globrunqget(pp, 1)
+ unlock(&sched.lock)
+ if gp != nil {
+ return gp, false, false
+ }
+ }
+
+ // Wake up the finalizer G.
+ if fingStatus.Load()&(fingWait|fingWake) == fingWait|fingWake {
+ if gp := wakefing(); gp != nil {
+ ready(gp, 0, true)
+ }
+ }
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+
+ // local runq
+ if gp, inheritTime := runqget(pp); gp != nil {
+ return gp, inheritTime, false
+ }
+
+ // global runq
+ if sched.runqsize != 0 {
+ lock(&sched.lock)
+ gp := globrunqget(pp, 0)
+ unlock(&sched.lock)
+ if gp != nil {
+ return gp, false, false
+ }
+ }
+
+ // Poll network.
+ // This netpoll is only an optimization before we resort to stealing.
+ // We can safely skip it if there are no waiters or a thread is blocked
+ // in netpoll already. If there is any kind of logical race with that
+ // blocked thread (e.g. it has already returned from netpoll, but has
+ // not set lastpoll yet), this thread will do blocking netpoll below
+ // anyway.
+ if netpollinited() && netpollWaiters.Load() > 0 && sched.lastpoll.Load() != 0 {
+ if list := netpoll(0); !list.empty() { // non-blocking
+ gp := list.pop()
+ injectglist(&list)
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false, false
+ }
+ }
+
+ // Spinning Ms: steal work from other Ps.
+ //
+ // Limit the number of spinning Ms to half the number of busy Ps.
+ // This is necessary to prevent excessive CPU consumption when
+ // GOMAXPROCS>>1 but the program parallelism is low.
+ if mp.spinning || 2*sched.nmspinning.Load() < gomaxprocs-sched.npidle.Load() {
+ if !mp.spinning {
+ mp.becomeSpinning()
+ }
+
+ gp, inheritTime, tnow, w, newWork := stealWork(now)
+ if gp != nil {
+ // Successfully stole.
+ return gp, inheritTime, false
+ }
+ if newWork {
+ // There may be new timer or GC work; restart to
+ // discover.
+ goto top
+ }
+
+ now = tnow
+ if w != 0 && (pollUntil == 0 || w < pollUntil) {
+ // Earlier timer to wait for.
+ pollUntil = w
+ }
+ }
+
+ // We have nothing to do.
+ //
+ // If we're in the GC mark phase, can safely scan and blacken objects,
+ // and have work to do, run idle-time marking rather than give up the P.
+ if gcBlackenEnabled != 0 && gcMarkWorkAvailable(pp) && gcController.addIdleMarkWorker() {
+ node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node != nil {
+ pp.gcMarkWorkerMode = gcMarkWorkerIdleMode
+ gp := node.gp.ptr()
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false, false
+ }
+ gcController.removeIdleMarkWorker()
+ }
+
+ // wasm only:
+ // If a callback returned and no other goroutine is awake,
+ // then wake the event handler goroutine, which pauses execution
+ // until a callback is triggered.
+ gp, otherReady := beforeIdle(now, pollUntil)
+ if gp != nil {
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false, false
+ }
+ if otherReady {
+ goto top
+ }
+
+ // Before we drop our P, make a snapshot of the allp slice,
+ // which can change underfoot once we no longer block
+ // safe-points. We don't need to snapshot the contents because
+ // everything up to cap(allp) is immutable.
+ allpSnapshot := allp
+ // Also snapshot masks. Value changes are OK, but we can't allow
+ // len to change out from under us.
+ idlepMaskSnapshot := idlepMask
+ timerpMaskSnapshot := timerpMask
+
+ // return P and block
+ lock(&sched.lock)
+ if sched.gcwaiting.Load() || pp.runSafePointFn != 0 {
+ unlock(&sched.lock)
+ goto top
+ }
+ if sched.runqsize != 0 {
+ gp := globrunqget(pp, 0)
+ unlock(&sched.lock)
+ return gp, false, false
+ }
+ if !mp.spinning && sched.needspinning.Load() == 1 {
+ // See "Delicate dance" comment below.
+ mp.becomeSpinning()
+ unlock(&sched.lock)
+ goto top
+ }
+ if releasep() != pp {
+ throw("findrunnable: wrong p")
+ }
+ now = pidleput(pp, now)
+ unlock(&sched.lock)
+
+ // Delicate dance: thread transitions from spinning to non-spinning
+ // state, potentially concurrently with submission of new work. We must
+ // drop nmspinning first and then check all sources again (with
+ // #StoreLoad memory barrier in between). If we do it the other way
+ // around, another thread can submit work after we've checked all
+ // sources but before we drop nmspinning; as a result nobody will
+ // unpark a thread to run the work.
+ //
+ // This applies to the following sources of work:
+ //
+ // * Goroutines added to a per-P run queue.
+ // * New/modified-earlier timers on a per-P timer heap.
+ // * Idle-priority GC work (barring golang.org/issue/19112).
+ //
+ // If we discover new work below, we need to restore m.spinning as a
+ // signal for resetspinning to unpark a new worker thread (because
+ // there can be more than one starving goroutine).
+ //
+ // However, if after discovering new work we also observe no idle Ps
+ // (either here or in resetspinning), we have a problem. We may be
+ // racing with a non-spinning M in the block above, having found no
+ // work and preparing to release its P and park. Allowing that P to go
+ // idle will result in loss of work conservation (idle P while there is
+ // runnable work). This could result in complete deadlock in the
+ // unlikely event that we discover new work (from netpoll) right as we
+ // are racing with _all_ other Ps going idle.
+ //
+ // We use sched.needspinning to synchronize with non-spinning Ms going
+ // idle. If needspinning is set when they are about to drop their P,
+ // they abort the drop and instead become a new spinning M on our
+ // behalf. If we are not racing and the system is truly fully loaded
+ // then no spinning threads are required, and the next thread to
+ // naturally become spinning will clear the flag.
+ //
+ // Also see "Worker thread parking/unparking" comment at the top of the
+ // file.
+ wasSpinning := mp.spinning
+ if mp.spinning {
+ mp.spinning = false
+ if sched.nmspinning.Add(-1) < 0 {
+ throw("findrunnable: negative nmspinning")
+ }
+
+ // Note that for correctness, only the last M transitioning from
+ // spinning to non-spinning must perform these rechecks to
+ // ensure no missed work. However, the runtime has some cases
+ // of transient increments of nmspinning that are decremented
+ // without going through this path, so we must be conservative
+ // and perform the check on all spinning Ms.
+ //
+ // See https://go.dev/issue/43997.
+
+ // Check all runqueues once again.
+ pp := checkRunqsNoP(allpSnapshot, idlepMaskSnapshot)
+ if pp != nil {
+ acquirep(pp)
+ mp.becomeSpinning()
+ goto top
+ }
+
+ // Check for idle-priority GC work again.
+ pp, gp := checkIdleGCNoP()
+ if pp != nil {
+ acquirep(pp)
+ mp.becomeSpinning()
+
+ // Run the idle worker.
+ pp.gcMarkWorkerMode = gcMarkWorkerIdleMode
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false, false
+ }
+
+ // Finally, check for timer creation or expiry concurrently with
+ // transitioning from spinning to non-spinning.
+ //
+ // Note that we cannot use checkTimers here because it calls
+ // adjusttimers which may need to allocate memory, and that isn't
+ // allowed when we don't have an active P.
+ pollUntil = checkTimersNoP(allpSnapshot, timerpMaskSnapshot, pollUntil)
+ }
+
+ // Poll network until next timer.
+ if netpollinited() && (netpollWaiters.Load() > 0 || pollUntil != 0) && sched.lastpoll.Swap(0) != 0 {
+ sched.pollUntil.Store(pollUntil)
+ if mp.p != 0 {
+ throw("findrunnable: netpoll with p")
+ }
+ if mp.spinning {
+ throw("findrunnable: netpoll with spinning")
+ }
+ delay := int64(-1)
+ if pollUntil != 0 {
+ if now == 0 {
+ now = nanotime()
+ }
+ delay = pollUntil - now
+ if delay < 0 {
+ delay = 0
+ }
+ }
+ if faketime != 0 {
+ // When using fake time, just poll.
+ delay = 0
+ }
+ list := netpoll(delay) // block until new work is available
+ // Refresh now again, after potentially blocking.
+ now = nanotime()
+ sched.pollUntil.Store(0)
+ sched.lastpoll.Store(now)
+ if faketime != 0 && list.empty() {
+ // Using fake time and nothing is ready; stop M.
+ // When all M's stop, checkdead will call timejump.
+ stopm()
+ goto top
+ }
+ lock(&sched.lock)
+ pp, _ := pidleget(now)
+ unlock(&sched.lock)
+ if pp == nil {
+ injectglist(&list)
+ } else {
+ acquirep(pp)
+ if !list.empty() {
+ gp := list.pop()
+ injectglist(&list)
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ if traceEnabled() {
+ traceGoUnpark(gp, 0)
+ }
+ return gp, false, false
+ }
+ if wasSpinning {
+ mp.becomeSpinning()
+ }
+ goto top
+ }
+ } else if pollUntil != 0 && netpollinited() {
+ pollerPollUntil := sched.pollUntil.Load()
+ if pollerPollUntil == 0 || pollerPollUntil > pollUntil {
+ netpollBreak()
+ }
+ }
+ stopm()
+ goto top
+}
+
+// pollWork reports whether there is non-background work this P could
+// be doing. This is a fairly lightweight check to be used for
+// background work loops, like idle GC. It checks a subset of the
+// conditions checked by the actual scheduler.
+func pollWork() bool {
+ if sched.runqsize != 0 {
+ return true
+ }
+ p := getg().m.p.ptr()
+ if !runqempty(p) {
+ return true
+ }
+ if netpollinited() && netpollWaiters.Load() > 0 && sched.lastpoll.Load() != 0 {
+ if list := netpoll(0); !list.empty() {
+ injectglist(&list)
+ return true
+ }
+ }
+ return false
+}
+
+// stealWork attempts to steal a runnable goroutine or timer from any P.
+//
+// If newWork is true, new work may have been readied.
+//
+// If now is not 0 it is the current time. stealWork returns the passed time or
+// the current time if now was passed as 0.
+func stealWork(now int64) (gp *g, inheritTime bool, rnow, pollUntil int64, newWork bool) {
+ pp := getg().m.p.ptr()
+
+ ranTimer := false
+
+ const stealTries = 4
+ for i := 0; i < stealTries; i++ {
+ stealTimersOrRunNextG := i == stealTries-1
+
+ for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
+ if sched.gcwaiting.Load() {
+ // GC work may be available.
+ return nil, false, now, pollUntil, true
+ }
+ p2 := allp[enum.position()]
+ if pp == p2 {
+ continue
+ }
+
+ // Steal timers from p2. This call to checkTimers is the only place
+ // where we might hold a lock on a different P's timers. We do this
+ // once on the last pass before checking runnext because stealing
+ // from the other P's runnext should be the last resort, so if there
+ // are timers to steal do that first.
+ //
+ // We only check timers on one of the stealing iterations because
+ // the time stored in now doesn't change in this loop and checking
+ // the timers for each P more than once with the same value of now
+ // is probably a waste of time.
+ //
+ // timerpMask tells us whether the P may have timers at all. If it
+ // can't, no need to check at all.
+ if stealTimersOrRunNextG && timerpMask.read(enum.position()) {
+ tnow, w, ran := checkTimers(p2, now)
+ now = tnow
+ if w != 0 && (pollUntil == 0 || w < pollUntil) {
+ pollUntil = w
+ }
+ if ran {
+ // Running the timers may have
+ // made an arbitrary number of G's
+ // ready and added them to this P's
+ // local run queue. That invalidates
+ // the assumption of runqsteal
+ // that it always has room to add
+ // stolen G's. So check now if there
+ // is a local G to run.
+ if gp, inheritTime := runqget(pp); gp != nil {
+ return gp, inheritTime, now, pollUntil, ranTimer
+ }
+ ranTimer = true
+ }
+ }
+
+ // Don't bother to attempt to steal if p2 is idle.
+ if !idlepMask.read(enum.position()) {
+ if gp := runqsteal(pp, p2, stealTimersOrRunNextG); gp != nil {
+ return gp, false, now, pollUntil, ranTimer
+ }
+ }
+ }
+ }
+
+ // No goroutines found to steal. Regardless, running a timer may have
+ // made some goroutine ready that we missed. Indicate the next timer to
+ // wait for.
+ return nil, false, now, pollUntil, ranTimer
+}
+
+// Check all Ps for a runnable G to steal.
+//
+// On entry we have no P. If a G is available to steal and a P is available,
+// the P is returned; the caller should acquire it and then attempt to steal
+// the work.
+func checkRunqsNoP(allpSnapshot []*p, idlepMaskSnapshot pMask) *p {
+ for id, p2 := range allpSnapshot {
+ if !idlepMaskSnapshot.read(uint32(id)) && !runqempty(p2) {
+ lock(&sched.lock)
+ pp, _ := pidlegetSpinning(0)
+ if pp == nil {
+ // Can't get a P, don't bother checking remaining Ps.
+ unlock(&sched.lock)
+ return nil
+ }
+ unlock(&sched.lock)
+ return pp
+ }
+ }
+
+ // No work available.
+ return nil
+}
+
+// Check all Ps for a timer expiring sooner than pollUntil.
+//
+// Returns updated pollUntil value.
+func checkTimersNoP(allpSnapshot []*p, timerpMaskSnapshot pMask, pollUntil int64) int64 {
+ for id, p2 := range allpSnapshot {
+ if timerpMaskSnapshot.read(uint32(id)) {
+ w := nobarrierWakeTime(p2)
+ if w != 0 && (pollUntil == 0 || w < pollUntil) {
+ pollUntil = w
+ }
+ }
+ }
+
+ return pollUntil
+}
+
+// Check for idle-priority GC, without a P on entry.
+//
+// If some GC work, a P, and a worker G are all available, the P and G will be
+// returned. The returned P has not been wired yet.
+func checkIdleGCNoP() (*p, *g) {
+ // N.B. Since we have no P, gcBlackenEnabled may change at any time; we
+ // must check again after acquiring a P. As an optimization, we also check
+ // if an idle mark worker is needed at all. This is OK here, because if we
+ // observe that one isn't needed, at least one is currently running. Even if
+ // it stops running, its own journey into the scheduler should schedule it
+ // again, if need be (at which point, this check will pass, if relevant).
+ if atomic.Load(&gcBlackenEnabled) == 0 || !gcController.needIdleMarkWorker() {
+ return nil, nil
+ }
+ if !gcMarkWorkAvailable(nil) {
+ return nil, nil
+ }
+
+ // Work is available; we can start an idle GC worker only if there is
+ // an available P and available worker G.
+ //
+ // We can attempt to acquire these in either order, though both have
+ // synchronization concerns (see below). Workers are almost always
+ // available (see comment in findRunnableGCWorker for the one case
+ // there may be none). Since we're slightly less likely to find a P,
+ // check for that first.
+ //
+ // Synchronization: note that we must hold sched.lock until we are
+ // committed to keeping it. Otherwise we cannot put the unnecessary P
+ // back in sched.pidle without performing the full set of idle
+ // transition checks.
+ //
+ // If we were to check gcBgMarkWorkerPool first, we must somehow handle
+ // the assumption in gcControllerState.findRunnableGCWorker that an
+ // empty gcBgMarkWorkerPool is only possible if gcMarkDone is running.
+ lock(&sched.lock)
+ pp, now := pidlegetSpinning(0)
+ if pp == nil {
+ unlock(&sched.lock)
+ return nil, nil
+ }
+
+ // Now that we own a P, gcBlackenEnabled can't change (as it requires STW).
+ if gcBlackenEnabled == 0 || !gcController.addIdleMarkWorker() {
+ pidleput(pp, now)
+ unlock(&sched.lock)
+ return nil, nil
+ }
+
+ node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
+ if node == nil {
+ pidleput(pp, now)
+ unlock(&sched.lock)
+ gcController.removeIdleMarkWorker()
+ return nil, nil
+ }
+
+ unlock(&sched.lock)
+
+ return pp, node.gp.ptr()
+}
+
+// wakeNetPoller wakes up the thread sleeping in the network poller if it isn't
+// going to wake up before the when argument; or it wakes an idle P to service
+// timers and the network poller if there isn't one already.
+func wakeNetPoller(when int64) {
+ if sched.lastpoll.Load() == 0 {
+ // In findrunnable we ensure that when polling the pollUntil
+ // field is either zero or the time to which the current
+ // poll is expected to run. This can have a spurious wakeup
+ // but should never miss a wakeup.
+ pollerPollUntil := sched.pollUntil.Load()
+ if pollerPollUntil == 0 || pollerPollUntil > when {
+ netpollBreak()
+ }
+ } else {
+ // There are no threads in the network poller, try to get
+ // one there so it can handle new timers.
+ if GOOS != "plan9" { // Temporary workaround - see issue #42303.
+ wakep()
+ }
+ }
+}
+
+func resetspinning() {
+ gp := getg()
+ if !gp.m.spinning {
+ throw("resetspinning: not a spinning m")
+ }
+ gp.m.spinning = false
+ nmspinning := sched.nmspinning.Add(-1)
+ if nmspinning < 0 {
+ throw("findrunnable: negative nmspinning")
+ }
+ // M wakeup policy is deliberately somewhat conservative, so check if we
+ // need to wakeup another P here. See "Worker thread parking/unparking"
+ // comment at the top of the file for details.
+ wakep()
+}
+
+// injectglist adds each runnable G on the list to some run queue,
+// and clears glist. If there is no current P, they are added to the
+// global queue, and up to npidle M's are started to run them.
+// Otherwise, for each idle P, this adds a G to the global queue
+// and starts an M. Any remaining G's are added to the current P's
+// local run queue.
+// This may temporarily acquire sched.lock.
+// Can run concurrently with GC.
+func injectglist(glist *gList) {
+ if glist.empty() {
+ return
+ }
+ if traceEnabled() {
+ for gp := glist.head.ptr(); gp != nil; gp = gp.schedlink.ptr() {
+ traceGoUnpark(gp, 0)
+ }
+ }
+
+ // Mark all the goroutines as runnable before we put them
+ // on the run queues.
+ head := glist.head.ptr()
+ var tail *g
+ qsize := 0
+ for gp := head; gp != nil; gp = gp.schedlink.ptr() {
+ tail = gp
+ qsize++
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ }
+
+ // Turn the gList into a gQueue.
+ var q gQueue
+ q.head.set(head)
+ q.tail.set(tail)
+ *glist = gList{}
+
+ startIdle := func(n int) {
+ for i := 0; i < n; i++ {
+ mp := acquirem() // See comment in startm.
+ lock(&sched.lock)
+
+ pp, _ := pidlegetSpinning(0)
+ if pp == nil {
+ unlock(&sched.lock)
+ releasem(mp)
+ break
+ }
+
+ startm(pp, false, true)
+ unlock(&sched.lock)
+ releasem(mp)
+ }
+ }
+
+ pp := getg().m.p.ptr()
+ if pp == nil {
+ lock(&sched.lock)
+ globrunqputbatch(&q, int32(qsize))
+ unlock(&sched.lock)
+ startIdle(qsize)
+ return
+ }
+
+ npidle := int(sched.npidle.Load())
+ var globq gQueue
+ var n int
+ for n = 0; n < npidle && !q.empty(); n++ {
+ g := q.pop()
+ globq.pushBack(g)
+ }
+ if n > 0 {
+ lock(&sched.lock)
+ globrunqputbatch(&globq, int32(n))
+ unlock(&sched.lock)
+ startIdle(n)
+ qsize -= n
+ }
+
+ if !q.empty() {
+ runqputbatch(pp, &q, qsize)
+ }
+}
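+
+// For example, when findRunnable above wakes from a blocking netpoll without
+// managing to acquire a P, the entire list of goroutines readied by the
+// poller is handed to injectglist, which marks them runnable, puts them on
+// the global run queue, and starts idle Ms to run them.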
+
+// One round of scheduler: find a runnable goroutine and execute it.
+// Never returns.
+func schedule() {
+ mp := getg().m
+
+ if mp.locks != 0 {
+ throw("schedule: holding locks")
+ }
+
+ if mp.lockedg != 0 {
+ stoplockedm()
+ execute(mp.lockedg.ptr(), false) // Never returns.
+ }
+
+ // We should not schedule away from a g that is executing a cgo call,
+ // since the cgo call is using the m's g0 stack.
+ if mp.incgo {
+ throw("schedule: in cgo")
+ }
+
+top:
+ pp := mp.p.ptr()
+ pp.preempt = false
+
+ // Safety check: if we are spinning, the run queue should be empty.
+ // Check this before calling checkTimers, as that might call
+ // goready to put a ready goroutine on the local run queue.
+ if mp.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
+ throw("schedule: spinning with local work")
+ }
+
+ gp, inheritTime, tryWakeP := findRunnable() // blocks until work is available
+
+ if debug.dontfreezetheworld > 0 && freezing.Load() {
+ // See comment in freezetheworld. We don't want to perturb
+ // scheduler state, so we didn't gcstopm in findRunnable, but
+ // also don't want to allow new goroutines to run.
+ //
+ // Deadlock here rather than in the findRunnable loop so if
+ // findRunnable is stuck in a loop we don't perturb that
+ // either.
+ lock(&deadlock)
+ lock(&deadlock)
+ }
+
+ // This thread is going to run a goroutine and is not spinning anymore,
+ // so if it was marked as spinning we need to reset it now and potentially
+ // start a new spinning M.
+ if mp.spinning {
+ resetspinning()
+ }
+
+ if sched.disable.user && !schedEnabled(gp) {
+ // Scheduling of this goroutine is disabled. Put it on
+ // the list of pending runnable goroutines for when we
+ // re-enable user scheduling and look again.
+ lock(&sched.lock)
+ if schedEnabled(gp) {
+ // Something re-enabled scheduling while we
+ // were acquiring the lock.
+ unlock(&sched.lock)
+ } else {
+ sched.disable.runnable.pushBack(gp)
+ sched.disable.n++
+ unlock(&sched.lock)
+ goto top
+ }
+ }
+
+ // If about to schedule a special goroutine (a GC worker or trace reader),
+ // wake a P if there is one.
+ if tryWakeP {
+ wakep()
+ }
+ if gp.lockedm != 0 {
+ // Hands off own p to the locked m,
+ // then blocks waiting for a new p.
+ startlockedm(gp)
+ goto top
+ }
+
+ execute(gp, inheritTime)
+}
+
+// dropg removes the association between m and the current goroutine m->curg (gp for short).
+// Typically a caller sets gp's status away from Grunning and then
+// immediately calls dropg to finish the job. The caller is also responsible
+// for arranging that gp will be restarted using ready at an
+// appropriate time. After calling dropg and arranging for gp to be
+// readied later, the caller can do other work but eventually should
+// call schedule to restart the scheduling of goroutines on this m.
+func dropg() {
+ gp := getg()
+
+ setMNoWB(&gp.m.curg.m, nil)
+ setGNoWB(&gp.m.curg, nil)
+}
+
+// checkTimers runs any timers for the P that are ready.
+// If now is not 0 it is the current time.
+// It returns the passed time or the current time if now was passed as 0,
+// and the time when the next timer should run or 0 if there is no next timer,
+// and reports whether it ran any timers.
+// If the time when the next timer should run is not 0,
+// it is always larger than the returned time.
+// We pass now in and out to avoid extra calls of nanotime.
+//
+//go:yeswritebarrierrec
+func checkTimers(pp *p, now int64) (rnow, pollUntil int64, ran bool) {
+ // If it's not yet time for the first timer, or the first adjusted
+ // timer, then there is nothing to do.
+ next := pp.timer0When.Load()
+ nextAdj := pp.timerModifiedEarliest.Load()
+ if next == 0 || (nextAdj != 0 && nextAdj < next) {
+ next = nextAdj
+ }
+
+ if next == 0 {
+ // No timers to run or adjust.
+ return now, 0, false
+ }
+
+ if now == 0 {
+ now = nanotime()
+ }
+ if now < next {
+ // Next timer is not ready to run, but keep going
+ // if we would clear deleted timers.
+ // This corresponds to the condition below where
+ // we decide whether to call clearDeletedTimers.
+ if pp != getg().m.p.ptr() || int(pp.deletedTimers.Load()) <= int(pp.numTimers.Load()/4) {
+ return now, next, false
+ }
+ }
+
+ lock(&pp.timersLock)
+
+ if len(pp.timers) > 0 {
+ adjusttimers(pp, now)
+ for len(pp.timers) > 0 {
+ // Note that runtimer may temporarily unlock
+ // pp.timersLock.
+ if tw := runtimer(pp, now); tw != 0 {
+ if tw > 0 {
+ pollUntil = tw
+ }
+ break
+ }
+ ran = true
+ }
+ }
+
+ // If this is the local P, and there are a lot of deleted timers,
+ // clear them out. We only do this for the local P to reduce
+ // lock contention on timersLock.
+ if pp == getg().m.p.ptr() && int(pp.deletedTimers.Load()) > len(pp.timers)/4 {
+ clearDeletedTimers(pp)
+ }
+
+ unlock(&pp.timersLock)
+
+ return now, pollUntil, ran
+}
+
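+// parkunlock_c is used as a gopark unlock function: it releases the runtime
+// mutex passed as lock and reports true so that the park always proceeds.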
+func parkunlock_c(gp *g, lock unsafe.Pointer) bool {
+ unlock((*mutex)(lock))
+ return true
+}
+
+// park continuation on g0.
+func park_m(gp *g) {
+ mp := getg().m
+
+ if traceEnabled() {
+ traceGoPark(mp.waitTraceBlockReason, mp.waitTraceSkip)
+ }
+
+ // N.B. Not using casGToWaiting here because the waitreason is
+ // set by park_m's caller.
+ casgstatus(gp, _Grunning, _Gwaiting)
+ dropg()
+
+ if fn := mp.waitunlockf; fn != nil {
+ ok := fn(gp, mp.waitlock)
+ mp.waitunlockf = nil
+ mp.waitlock = nil
+ if !ok {
+ if traceEnabled() {
+ traceGoUnpark(gp, 2)
+ }
+ casgstatus(gp, _Gwaiting, _Grunnable)
+ execute(gp, true) // Schedule it back, never returns.
+ }
+ }
+ schedule()
+}
+
+func goschedImpl(gp *g) {
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning {
+ dumpgstatus(gp)
+ throw("bad g status")
+ }
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ globrunqput(gp)
+ unlock(&sched.lock)
+
+ schedule()
+}
+
+// Gosched continuation on g0.
+func gosched_m(gp *g) {
+ if traceEnabled() {
+ traceGoSched()
+ }
+ goschedImpl(gp)
+}
+
+// goschedguarded is a forbidden-states-avoided version of gosched_m.
+func goschedguarded_m(gp *g) {
+
+ if !canPreemptM(gp.m) {
+ gogo(&gp.sched) // never return
+ }
+
+ if traceEnabled() {
+ traceGoSched()
+ }
+ goschedImpl(gp)
+}
+
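+// gopreempt_m is the Gosched continuation used for cooperative preemption:
+// identical to gosched_m except that it emits a GoPreempt trace event
+// instead of GoSched.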
+func gopreempt_m(gp *g) {
+ if traceEnabled() {
+ traceGoPreempt()
+ }
+ goschedImpl(gp)
+}
+
+// preemptPark parks gp and puts it in _Gpreempted.
+//
+//go:systemstack
+func preemptPark(gp *g) {
+ if traceEnabled() {
+ traceGoPark(traceBlockPreempted, 0)
+ }
+ status := readgstatus(gp)
+ if status&^_Gscan != _Grunning {
+ dumpgstatus(gp)
+ throw("bad g status")
+ }
+
+ if gp.asyncSafePoint {
+ // Double-check that async preemption does not
+ // happen in SPWRITE assembly functions.
+ // isAsyncSafePoint must exclude this case.
+ f := findfunc(gp.sched.pc)
+ if !f.valid() {
+ throw("preempt at unknown pc")
+ }
+ if f.flag&abi.FuncFlagSPWrite != 0 {
+ println("runtime: unexpected SPWRITE function", funcname(f), "in async preempt")
+ throw("preempt SPWRITE")
+ }
+ }
+
+ // Transition from _Grunning to _Gscan|_Gpreempted. We can't
+ // be in _Grunning when we dropg because then we'd be running
+ // without an M, but the moment we're in _Gpreempted,
+ // something could claim this G before we've fully cleaned it
+ // up. Hence, we set the scan bit to lock down further
+ // transitions until we can dropg.
+ casGToPreemptScan(gp, _Grunning, _Gscan|_Gpreempted)
+ dropg()
+ casfrom_Gscanstatus(gp, _Gscan|_Gpreempted, _Gpreempted)
+ schedule()
+}
+
+// goyield is like Gosched, but it:
+// - emits a GoPreempt trace event instead of a GoSched trace event
+// - puts the current G on the runq of the current P instead of the globrunq
+func goyield() {
+ checkTimeouts()
+ mcall(goyield_m)
+}
+
+func goyield_m(gp *g) {
+ if traceEnabled() {
+ traceGoPreempt()
+ }
+ pp := gp.m.p.ptr()
+ casgstatus(gp, _Grunning, _Grunnable)
+ dropg()
+ runqput(pp, gp, false)
+ schedule()
+}
+
+// Finishes execution of the current goroutine.
+func goexit1() {
+ if raceenabled {
+ racegoend()
+ }
+ if traceEnabled() {
+ traceGoEnd()
+ }
+ mcall(goexit0)
+}
+
+// goexit continuation on g0.
+func goexit0(gp *g) {
+ mp := getg().m
+ pp := mp.p.ptr()
+
+ casgstatus(gp, _Grunning, _Gdead)
+ gcController.addScannableStack(pp, -int64(gp.stack.hi-gp.stack.lo))
+ if isSystemGoroutine(gp, false) {
+ sched.ngsys.Add(-1)
+ }
+ gp.m = nil
+ locked := gp.lockedm != 0
+ gp.lockedm = 0
+ mp.lockedg = 0
+ gp.preemptStop = false
+ gp.paniconfault = false
+ gp._defer = nil // should be true already but just in case.
+ gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
+ gp.writebuf = nil
+ gp.waitreason = waitReasonZero
+ gp.param = nil
+ gp.labels = nil
+ gp.timer = nil
+
+ if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
+ // Flush assist credit to the global pool. This gives
+ // better information to pacing if the application is
+ // rapidly creating and exiting goroutines.
+ assistWorkPerByte := gcController.assistWorkPerByte.Load()
+ scanCredit := int64(assistWorkPerByte * float64(gp.gcAssistBytes))
+ gcController.bgScanCredit.Add(scanCredit)
+ gp.gcAssistBytes = 0
+ }
+
+ dropg()
+
+ if GOARCH == "wasm" { // no threads yet on wasm
+ gfput(pp, gp)
+ schedule() // never returns
+ }
+
+ if mp.lockedInt != 0 {
+ print("invalid m->lockedInt = ", mp.lockedInt, "\n")
+ throw("internal lockOSThread error")
+ }
+ gfput(pp, gp)
+ if locked {
+ // The goroutine may have locked this thread because
+ // it put it in an unusual kernel state. Kill it
+ // rather than returning it to the thread pool.
+
+ // Return to mstart, which will release the P and exit
+ // the thread.
+ if GOOS != "plan9" { // See golang.org/issue/22227.
+ gogo(&mp.g0.sched)
+ } else {
+ // Clear lockedExt on plan9 since we may end up re-using
+ // this thread.
+ mp.lockedExt = 0
+ }
+ }
+ schedule()
+}
+
+// save updates getg().sched to refer to pc and sp so that a following
+// gogo will restore pc and sp.
+//
+// save must not have write barriers because invoking a write barrier
+// can clobber getg().sched.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func save(pc, sp uintptr) {
+ gp := getg()
+
+ if gp == gp.m.g0 || gp == gp.m.gsignal {
+ // m.g0.sched is special and must describe the context
+ // for exiting the thread. mstart1 writes to it directly.
+ // m.gsignal.sched should not be used at all.
+ // This check makes sure save calls do not accidentally
+ // run in contexts where they'd write to system g's.
+ throw("save on system g not allowed")
+ }
+
+ gp.sched.pc = pc
+ gp.sched.sp = sp
+ gp.sched.lr = 0
+ gp.sched.ret = 0
+ // We need to ensure ctxt is zero, but can't have a write
+ // barrier here. However, it should always already be zero.
+ // Assert that.
+ if gp.sched.ctxt != nil {
+ badctxt()
+ }
+}
+
+// The goroutine g is about to enter a system call.
+// Record that it's not using the cpu anymore.
+// This is called only from the go syscall library and cgocall,
+// not from the low-level system calls used by the runtime.
+//
+// Entersyscall cannot split the stack: the save must
+// make g->sched refer to the caller's stack segment, because
+// entersyscall is going to return immediately after.
+//
+// Nothing entersyscall calls can split the stack either.
+// We cannot safely move the stack during an active call to syscall,
+// because we do not know which of the uintptr arguments are
+// really pointers (back into the stack).
+// In practice, this means that we make the fast path run through
+// entersyscall doing no-split things, and the slow path has to use systemstack
+// to run bigger things on the system stack.
+//
+// reentersyscall is the entry point used by cgo callbacks, where explicitly
+// saved SP and PC are restored. This is needed when exitsyscall will be called
+// from a function further up in the call stack than the parent, as g->syscallsp
+// must always point to a valid stack frame. entersyscall below is the normal
+// entry point for syscalls, which obtains the SP and PC from the caller.
+//
+// Syscall tracing:
+// At the start of a syscall we emit traceGoSysCall to capture the stack trace.
+// If the syscall does not block, that is it, we do not emit any other events.
+// If the syscall blocks (that is, P is retaken), retaker emits traceGoSysBlock;
+// when syscall returns we emit traceGoSysExit and when the goroutine starts running
+// (potentially instantly, if exitsyscallfast returns true) we emit traceGoStart.
+// To ensure that traceGoSysExit is emitted strictly after traceGoSysBlock,
+// we remember the current value of syscalltick in m (gp.m.syscalltick = gp.m.p.ptr().syscalltick),
+// whoever emits traceGoSysBlock increments p.syscalltick afterwards;
+// and we wait for the increment before emitting traceGoSysExit.
+// Note that the increment is done even if tracing is not enabled,
+// because tracing can be enabled in the middle of a syscall. We don't want the wait to hang.
+//
+//go:nosplit
+func reentersyscall(pc, sp uintptr) {
+ gp := getg()
+
+ // Disable preemption: during this function g is in Gsyscall status
+ // but can have an inconsistent g->sched; do not let the GC observe it.
+ gp.m.locks++
+
+ // Entersyscall must not call any function that might split/grow the stack.
+ // (See details in comment above.)
+ // Catch calls that might, by replacing the stack guard with something that
+ // will trip any stack check and leaving a flag to tell newstack to die.
+ gp.stackguard0 = stackPreempt
+ gp.throwsplit = true
+
+ // Leave SP around for GC and traceback.
+ save(pc, sp)
+ gp.syscallsp = sp
+ gp.syscallpc = pc
+ casgstatus(gp, _Grunning, _Gsyscall)
+ if staticLockRanking {
+ // When doing static lock ranking casgstatus can call
+ // systemstack which clobbers g.sched.
+ save(pc, sp)
+ }
+ if gp.syscallsp < gp.stack.lo || gp.stack.hi < gp.syscallsp {
+ systemstack(func() {
+ print("entersyscall inconsistent ", hex(gp.syscallsp), " [", hex(gp.stack.lo), ",", hex(gp.stack.hi), "]\n")
+ throw("entersyscall")
+ })
+ }
+
+ if traceEnabled() {
+ systemstack(traceGoSysCall)
+ // systemstack itself clobbers g.sched.{pc,sp} and we might
+ // need them later when the G is genuinely blocked in a
+ // syscall
+ save(pc, sp)
+ }
+
+ if sched.sysmonwait.Load() {
+ systemstack(entersyscall_sysmon)
+ save(pc, sp)
+ }
+
+ if gp.m.p.ptr().runSafePointFn != 0 {
+ // runSafePointFn may stack split if run on this stack
+ systemstack(runSafePointFn)
+ save(pc, sp)
+ }
+
+ gp.m.syscalltick = gp.m.p.ptr().syscalltick
+ pp := gp.m.p.ptr()
+ pp.m = 0
+ gp.m.oldp.set(pp)
+ gp.m.p = 0
+ atomic.Store(&pp.status, _Psyscall)
+ if sched.gcwaiting.Load() {
+ systemstack(entersyscall_gcwait)
+ save(pc, sp)
+ }
+
+ gp.m.locks--
+}
+
+// Standard syscall entry used by the go syscall library and normal cgo calls.
+//
+// This is exported via linkname to assembly in the syscall package and x/sys.
+//
+//go:nosplit
+//go:linkname entersyscall
+func entersyscall() {
+ reentersyscall(getcallerpc(), getcallersp())
+}
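+
+// For orientation, the wrappers in the syscall package bracket the raw
+// system call roughly like this (illustrative, simplified sketch; the real
+// code lives in package syscall and is linknamed to entersyscall above and
+// exitsyscall below):
+//
+//	func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno) {
+//		runtime_entersyscall()
+//		r1, r2, err = RawSyscall(trap, a1, a2, a3)
+//		runtime_exitsyscall()
+//		return
+//	}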
+
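+// entersyscall_sysmon wakes sysmon if it is sleeping so that it can watch
+// this syscall and retake the P if the syscall blocks for too long.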
+func entersyscall_sysmon() {
+ lock(&sched.lock)
+ if sched.sysmonwait.Load() {
+ sched.sysmonwait.Store(false)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+}
+
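+// entersyscall_gcwait hands the P this goroutine just left over to a pending
+// stop-the-world: if the P is still in _Psyscall it is moved to _Pgcstop,
+// and the stopper is woken once the last P stops.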
+func entersyscall_gcwait() {
+ gp := getg()
+ pp := gp.m.oldp.ptr()
+
+ lock(&sched.lock)
+ if sched.stopwait > 0 && atomic.Cas(&pp.status, _Psyscall, _Pgcstop) {
+ if traceEnabled() {
+ traceGoSysBlock(pp)
+ traceProcStop(pp)
+ }
+ pp.syscalltick++
+ if sched.stopwait--; sched.stopwait == 0 {
+ notewakeup(&sched.stopnote)
+ }
+ }
+ unlock(&sched.lock)
+}
+
+// The same as entersyscall(), but with a hint that the syscall is blocking.
+//
+//go:nosplit
+func entersyscallblock() {
+ gp := getg()
+
+ gp.m.locks++ // see comment in entersyscall
+ gp.throwsplit = true
+ gp.stackguard0 = stackPreempt // see comment in entersyscall
+ gp.m.syscalltick = gp.m.p.ptr().syscalltick
+ gp.m.p.ptr().syscalltick++
+
+ // Leave SP around for GC and traceback.
+ pc := getcallerpc()
+ sp := getcallersp()
+ save(pc, sp)
+ gp.syscallsp = gp.sched.sp
+ gp.syscallpc = gp.sched.pc
+ if gp.syscallsp < gp.stack.lo || gp.stack.hi < gp.syscallsp {
+ sp1 := sp
+ sp2 := gp.sched.sp
+ sp3 := gp.syscallsp
+ systemstack(func() {
+ print("entersyscallblock inconsistent ", hex(sp1), " ", hex(sp2), " ", hex(sp3), " [", hex(gp.stack.lo), ",", hex(gp.stack.hi), "]\n")
+ throw("entersyscallblock")
+ })
+ }
+ casgstatus(gp, _Grunning, _Gsyscall)
+ if gp.syscallsp < gp.stack.lo || gp.stack.hi < gp.syscallsp {
+ systemstack(func() {
+ print("entersyscallblock inconsistent ", hex(sp), " ", hex(gp.sched.sp), " ", hex(gp.syscallsp), " [", hex(gp.stack.lo), ",", hex(gp.stack.hi), "]\n")
+ throw("entersyscallblock")
+ })
+ }
+
+ systemstack(entersyscallblock_handoff)
+
+ // Resave for traceback during blocked call.
+ save(getcallerpc(), getcallersp())
+
+ gp.m.locks--
+}
+
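+// entersyscallblock_handoff releases this M's P and hands it off so other
+// goroutines can run while the M blocks in the syscall, emitting the
+// corresponding trace events if tracing is enabled.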
+func entersyscallblock_handoff() {
+ if traceEnabled() {
+ traceGoSysCall()
+ traceGoSysBlock(getg().m.p.ptr())
+ }
+ handoffp(releasep())
+}
+
+// The goroutine g exited its system call.
+// Arrange for it to run on a cpu again.
+// This is called only from the go syscall library, not
+// from the low-level system calls used by the runtime.
+//
+// Write barriers are not allowed because our P may have been stolen.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:nowritebarrierrec
+//go:linkname exitsyscall
+func exitsyscall() {
+ gp := getg()
+
+ gp.m.locks++ // see comment in entersyscall
+ if getcallersp() > gp.syscallsp {
+ throw("exitsyscall: syscall frame is no longer valid")
+ }
+
+ gp.waitsince = 0
+ oldp := gp.m.oldp.ptr()
+ gp.m.oldp = 0
+ if exitsyscallfast(oldp) {
+ // When exitsyscallfast returns success, we have a P, so we can now
+ // use write barriers.
+ if goroutineProfile.active {
+ // Make sure that gp has had its stack written out to the goroutine
+ // profile, exactly as it was when the goroutine profiler first
+ // stopped the world.
+ systemstack(func() {
+ tryRecordGoroutineProfileWB(gp)
+ })
+ }
+ if traceEnabled() {
+ if oldp != gp.m.p.ptr() || gp.m.syscalltick != gp.m.p.ptr().syscalltick {
+ systemstack(traceGoStart)
+ }
+ }
+ // There's a cpu for us, so we can run.
+ gp.m.p.ptr().syscalltick++
+ // We need to cas the status and scan before resuming...
+ casgstatus(gp, _Gsyscall, _Grunning)
+
+ // Garbage collector isn't running (since we are),
+ // so okay to clear syscallsp.
+ gp.syscallsp = 0
+ gp.m.locks--
+ if gp.preempt {
+ // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ } else {
+ // otherwise restore the real stackGuard, we've spoiled it in entersyscall/entersyscallblock
+ gp.stackguard0 = gp.stack.lo + stackGuard
+ }
+ gp.throwsplit = false
+
+ if sched.disable.user && !schedEnabled(gp) {
+ // Scheduling of this goroutine is disabled.
+ Gosched()
+ }
+
+ return
+ }
+
+ if traceEnabled() {
+ // Wait till traceGoSysBlock event is emitted.
+ // This ensures consistency of the trace (the goroutine is started after it is blocked).
+ for oldp != nil && oldp.syscalltick == gp.m.syscalltick {
+ osyield()
+ }
+ // We can't trace syscall exit right now because we don't have a P.
+ // Tracing code can invoke write barriers that cannot run without a P.
+ // So instead we remember the syscall exit time and emit the event
+ // in execute when we have a P.
+ gp.trace.sysExitTime = traceClockNow()
+ }
+
+ gp.m.locks--
+
+ // Call the scheduler.
+ mcall(exitsyscall0)
+
+ // Scheduler returned, so we're allowed to run now.
+ // Delete the syscallsp information that we left for
+ // the garbage collector during the system call.
+ // Must wait until now because until gosched returns
+ // we don't know for sure that the garbage collector
+ // is not running.
+ gp.syscallsp = 0
+ gp.m.p.ptr().syscalltick++
+ gp.throwsplit = false
+}
+
+//go:nosplit
+func exitsyscallfast(oldp *p) bool {
+ gp := getg()
+
+ // Freezetheworld sets stopwait but does not retake P's.
+ if sched.stopwait == freezeStopWait {
+ return false
+ }
+
+ // Try to re-acquire the last P.
+ if oldp != nil && oldp.status == _Psyscall && atomic.Cas(&oldp.status, _Psyscall, _Pidle) {
+ // There's a cpu for us, so we can run.
+ wirep(oldp)
+ exitsyscallfast_reacquired()
+ return true
+ }
+
+ // Try to get any other idle P.
+ if sched.pidle != 0 {
+ var ok bool
+ systemstack(func() {
+ ok = exitsyscallfast_pidle()
+ if ok && traceEnabled() {
+ if oldp != nil {
+ // Wait till traceGoSysBlock event is emitted.
+ // This ensures consistency of the trace (the goroutine is started after it is blocked).
+ for oldp.syscalltick == gp.m.syscalltick {
+ osyield()
+ }
+ }
+ traceGoSysExit()
+ }
+ })
+ if ok {
+ return true
+ }
+ }
+ return false
+}
+
+// exitsyscallfast_reacquired is the exitsyscall path on which this G
+// has successfully reacquired the P it was running on before the
+// syscall.
+//
+//go:nosplit
+func exitsyscallfast_reacquired() {
+ gp := getg()
+ if gp.m.syscalltick != gp.m.p.ptr().syscalltick {
+ if traceEnabled() {
+ // The p was retaken and then entered a syscall again (since gp.m.syscalltick has changed).
+ // traceGoSysBlock for this syscall was already emitted,
+ // but here we effectively retake the p from the new syscall running on the same p.
+ systemstack(func() {
+ // Denote blocking of the new syscall.
+ traceGoSysBlock(gp.m.p.ptr())
+ // Denote completion of the current syscall.
+ traceGoSysExit()
+ })
+ }
+ gp.m.p.ptr().syscalltick++
+ }
+}
+
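+// exitsyscallfast_pidle tries to take any idle P so that the goroutine
+// leaving the syscall can keep running on this M, waking sysmon if it was
+// waiting. It reports whether a P was acquired.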
+func exitsyscallfast_pidle() bool {
+ lock(&sched.lock)
+ pp, _ := pidleget(0)
+ if pp != nil && sched.sysmonwait.Load() {
+ sched.sysmonwait.Store(false)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+ if pp != nil {
+ acquirep(pp)
+ return true
+ }
+ return false
+}
+
+// exitsyscall slow path on g0.
+// Failed to acquire P, enqueue gp as runnable.
+//
+// Called via mcall, so gp is the calling g from this M.
+//
+//go:nowritebarrierrec
+func exitsyscall0(gp *g) {
+ casgstatus(gp, _Gsyscall, _Grunnable)
+ dropg()
+ lock(&sched.lock)
+ var pp *p
+ if schedEnabled(gp) {
+ pp, _ = pidleget(0)
+ }
+ var locked bool
+ if pp == nil {
+ globrunqput(gp)
+
+ // Below, we stoplockedm if gp is locked. globrunqput releases
+ // ownership of gp, so we must check if gp is locked prior to
+ // committing the release by unlocking sched.lock, otherwise we
+ // could race with another M transitioning gp from unlocked to
+ // locked.
+ locked = gp.lockedm != 0
+ } else if sched.sysmonwait.Load() {
+ sched.sysmonwait.Store(false)
+ notewakeup(&sched.sysmonnote)
+ }
+ unlock(&sched.lock)
+ if pp != nil {
+ acquirep(pp)
+ execute(gp, false) // Never returns.
+ }
+ if locked {
+ // Wait until another thread schedules gp and so m again.
+ //
+ // N.B. lockedm must be this M, as this g was running on this M
+ // before entersyscall.
+ stoplockedm()
+ execute(gp, false) // Never returns.
+ }
+ stopm()
+ schedule() // Never returns.
+}
+
+// Called from syscall package before fork.
+//
+//go:linkname syscall_runtime_BeforeFork syscall.runtime_BeforeFork
+//go:nosplit
+func syscall_runtime_BeforeFork() {
+ gp := getg().m.curg
+
+ // Block signals during a fork, so that the child does not run
+ // a signal handler before exec if a signal is sent to the process
+ // group. See issue #18600.
+ gp.m.locks++
+ sigsave(&gp.m.sigmask)
+ sigblock(false)
+
+ // This function is called before fork in the syscall package.
+ // Code between fork and exec must not allocate memory nor even try to grow the stack.
+ // Here we spoil g.stackguard0 to reliably detect any attempts to grow the stack.
+ // runtime_AfterFork will undo this in the parent process, but not in the child.
+ gp.stackguard0 = stackFork
+}
+
+// Called from syscall package after fork in parent.
+//
+//go:linkname syscall_runtime_AfterFork syscall.runtime_AfterFork
+//go:nosplit
+func syscall_runtime_AfterFork() {
+ gp := getg().m.curg
+
+ // See the comments in beforefork.
+ gp.stackguard0 = gp.stack.lo + stackGuard
+
+ msigrestore(gp.m.sigmask)
+
+ gp.m.locks--
+}
+
+// inForkedChild is true while manipulating signals in the child process.
+// This is used to avoid calling libc functions in case we are using vfork.
+var inForkedChild bool
+
+// Called from syscall package after fork in child.
+// It resets non-sigignored signals to the default handler, and
+// restores the signal mask in preparation for the exec.
+//
+// Because this might be called during a vfork, and therefore may be
+// temporarily sharing address space with the parent process, this must
+// not change any global variables or call into C code that may do so.
+//
+//go:linkname syscall_runtime_AfterForkInChild syscall.runtime_AfterForkInChild
+//go:nosplit
+//go:nowritebarrierrec
+func syscall_runtime_AfterForkInChild() {
+ // It's OK to change the global variable inForkedChild here
+ // because we are going to change it back. There is no race here,
+ // because if we are sharing address space with the parent process,
+ // then the parent process can not be running concurrently.
+ inForkedChild = true
+
+ clearSignalHandlers()
+
+ // When we are the child we are the only thread running,
+ // so we know that nothing else has changed gp.m.sigmask.
+ msigrestore(getg().m.sigmask)
+
+ inForkedChild = false
+}
+
+// pendingPreemptSignals is the number of preemption signals
+// that have been sent but not received. This is only used on Darwin.
+// For #41702.
+var pendingPreemptSignals atomic.Int32
+
+// Called from syscall package before Exec.
+//
+//go:linkname syscall_runtime_BeforeExec syscall.runtime_BeforeExec
+func syscall_runtime_BeforeExec() {
+ // Prevent thread creation during exec.
+ execLock.lock()
+
+ // On Darwin, wait for all pending preemption signals to
+ // be received. See issue #41702.
+ if GOOS == "darwin" || GOOS == "ios" {
+ for pendingPreemptSignals.Load() > 0 {
+ osyield()
+ }
+ }
+}
+
+// Called from syscall package after Exec.
+//
+//go:linkname syscall_runtime_AfterExec syscall.runtime_AfterExec
+func syscall_runtime_AfterExec() {
+ execLock.unlock()
+}
+
+// Allocate a new g, with a stack big enough for stacksize bytes.
+func malg(stacksize int32) *g {
+ newg := new(g)
+ if stacksize >= 0 {
+ stacksize = round2(stackSystem + stacksize)
+ systemstack(func() {
+ newg.stack = stackalloc(uint32(stacksize))
+ })
+ newg.stackguard0 = newg.stack.lo + stackGuard
+ newg.stackguard1 = ^uintptr(0)
+ // Clear the bottom word of the stack. We record g
+ // there on gsignal stack during VDSO on ARM and ARM64.
+ *(*uintptr)(unsafe.Pointer(newg.stack.lo)) = 0
+ }
+ return newg
+}
+
+// Create a new g running fn.
+// Put it on the queue of g's waiting to run.
+// The compiler turns a go statement into a call to this.
+func newproc(fn *funcval) {
+ gp := getg()
+ pc := getcallerpc()
+ systemstack(func() {
+ newg := newproc1(fn, gp, pc)
+
+ pp := getg().m.p.ptr()
+ runqput(pp, newg, true)
+
+ if mainStarted {
+ wakep()
+ }
+ })
+}
+
+// Create a new g in state _Grunnable, starting at fn. callerpc is the
+// address of the go statement that created this. The caller is responsible
+// for adding the new g to the scheduler.
+func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
+ if fn == nil {
+ fatal("go of nil func value")
+ }
+
+ mp := acquirem() // disable preemption because we hold M and P in local vars.
+ pp := mp.p.ptr()
+ newg := gfget(pp)
+ if newg == nil {
+ newg = malg(stackMin)
+ casgstatus(newg, _Gidle, _Gdead)
+ allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
+ }
+ if newg.stack.hi == 0 {
+ throw("newproc1: newg missing stack")
+ }
+
+ if readgstatus(newg) != _Gdead {
+ throw("newproc1: new g is not Gdead")
+ }
+
+ totalSize := uintptr(4*goarch.PtrSize + sys.MinFrameSize) // extra space in case of reads slightly beyond frame
+ totalSize = alignUp(totalSize, sys.StackAlign)
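+	// For example, on amd64 (PtrSize = 8, MinFrameSize = 0, StackAlign = 8),
+	// totalSize works out to 32 bytes.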
+ sp := newg.stack.hi - totalSize
+ spArg := sp
+ if usesLR {
+ // caller's LR
+ *(*uintptr)(unsafe.Pointer(sp)) = 0
+ prepGoExitFrame(sp)
+ spArg += sys.MinFrameSize
+ }
+
+ memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
+ newg.sched.sp = sp
+ newg.stktopsp = sp
+ newg.sched.pc = abi.FuncPCABI0(goexit) + sys.PCQuantum // +PCQuantum so that previous instruction is in same function
+ newg.sched.g = guintptr(unsafe.Pointer(newg))
+ gostartcallfn(&newg.sched, fn)
+ newg.parentGoid = callergp.goid
+ newg.gopc = callerpc
+ newg.ancestors = saveAncestors(callergp)
+ newg.startpc = fn.fn
+ if isSystemGoroutine(newg, false) {
+ sched.ngsys.Add(1)
+ } else {
+ // Only user goroutines inherit pprof labels.
+ if mp.curg != nil {
+ newg.labels = mp.curg.labels
+ }
+ if goroutineProfile.active {
+ // A concurrent goroutine profile is running. It should include
+ // exactly the set of goroutines that were alive when the goroutine
+ // profiler first stopped the world. That does not include newg, so
+ // mark it as not needing a profile before transitioning it from
+ // _Gdead.
+ newg.goroutineProfiled.Store(goroutineProfileSatisfied)
+ }
+ }
+ // Track initial transition?
+ newg.trackingSeq = uint8(fastrand())
+ if newg.trackingSeq%gTrackingPeriod == 0 {
+ newg.tracking = true
+ }
+ casgstatus(newg, _Gdead, _Grunnable)
+ gcController.addScannableStack(pp, int64(newg.stack.hi-newg.stack.lo))
+
+ if pp.goidcache == pp.goidcacheend {
+ // Sched.goidgen is the last allocated id,
+ // this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
+ // At startup sched.goidgen=0, so main goroutine receives goid=1.
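+		// For illustration, if _GoidCacheBatch were 16, the first refill would
+		// set goidcache to 16, subtract 15 to leave 1, and set goidcacheend to
+		// 17, so this P would hand out goids 1 through 16.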
+ pp.goidcache = sched.goidgen.Add(_GoidCacheBatch)
+ pp.goidcache -= _GoidCacheBatch - 1
+ pp.goidcacheend = pp.goidcache + _GoidCacheBatch
+ }
+ newg.goid = pp.goidcache
+ pp.goidcache++
+ if raceenabled {
+ newg.racectx = racegostart(callerpc)
+ newg.raceignore = 0
+ if newg.labels != nil {
+ // See note in proflabel.go on labelSync's role in synchronizing
+ // with the reads in the signal handler.
+ racereleasemergeg(newg, unsafe.Pointer(&labelSync))
+ }
+ }
+ if traceEnabled() {
+ traceGoCreate(newg, newg.startpc)
+ }
+ releasem(mp)
+
+ return newg
+}
+
+// saveAncestors copies previous ancestors of the given caller g and
+// includes info for the current caller into a new set of tracebacks for
+// a g being created.
+func saveAncestors(callergp *g) *[]ancestorInfo {
+ // Copy all prior info, except for the root goroutine (goid 0).
+ if debug.tracebackancestors <= 0 || callergp.goid == 0 {
+ return nil
+ }
+ var callerAncestors []ancestorInfo
+ if callergp.ancestors != nil {
+ callerAncestors = *callergp.ancestors
+ }
+ n := int32(len(callerAncestors)) + 1
+ if n > debug.tracebackancestors {
+ n = debug.tracebackancestors
+ }
+ ancestors := make([]ancestorInfo, n)
+ copy(ancestors[1:], callerAncestors)
+
+ var pcs [tracebackInnerFrames]uintptr
+ npcs := gcallers(callergp, 0, pcs[:])
+ ipcs := make([]uintptr, npcs)
+ copy(ipcs, pcs[:])
+ ancestors[0] = ancestorInfo{
+ pcs: ipcs,
+ goid: callergp.goid,
+ gopc: callergp.gopc,
+ }
+
+ ancestorsp := new([]ancestorInfo)
+ *ancestorsp = ancestors
+ return ancestorsp
+}
+
+// Put on gfree list.
+// If local list is too long, transfer a batch to the global list.
+func gfput(pp *p, gp *g) {
+ if readgstatus(gp) != _Gdead {
+ throw("gfput: bad status (not Gdead)")
+ }
+
+ stksize := gp.stack.hi - gp.stack.lo
+
+ if stksize != uintptr(startingStackSize) {
+ // non-standard stack size - free it.
+ stackfree(gp.stack)
+ gp.stack.lo = 0
+ gp.stack.hi = 0
+ gp.stackguard0 = 0
+ }
+
+ pp.gFree.push(gp)
+ pp.gFree.n++
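+	// Once the local free list reaches 64 Gs, move Gs to the global free
+	// list until fewer than 32 remain locally.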
+ if pp.gFree.n >= 64 {
+ var (
+ inc int32
+ stackQ gQueue
+ noStackQ gQueue
+ )
+ for pp.gFree.n >= 32 {
+ gp := pp.gFree.pop()
+ pp.gFree.n--
+ if gp.stack.lo == 0 {
+ noStackQ.push(gp)
+ } else {
+ stackQ.push(gp)
+ }
+ inc++
+ }
+ lock(&sched.gFree.lock)
+ sched.gFree.noStack.pushAll(noStackQ)
+ sched.gFree.stack.pushAll(stackQ)
+ sched.gFree.n += inc
+ unlock(&sched.gFree.lock)
+ }
+}
+
+// Get from gfree list.
+// If local list is empty, grab a batch from global list.
+func gfget(pp *p) *g {
+retry:
+ if pp.gFree.empty() && (!sched.gFree.stack.empty() || !sched.gFree.noStack.empty()) {
+ lock(&sched.gFree.lock)
+ // Move a batch of free Gs to the P.
+ for pp.gFree.n < 32 {
+ // Prefer Gs with stacks.
+ gp := sched.gFree.stack.pop()
+ if gp == nil {
+ gp = sched.gFree.noStack.pop()
+ if gp == nil {
+ break
+ }
+ }
+ sched.gFree.n--
+ pp.gFree.push(gp)
+ pp.gFree.n++
+ }
+ unlock(&sched.gFree.lock)
+ goto retry
+ }
+ gp := pp.gFree.pop()
+ if gp == nil {
+ return nil
+ }
+ pp.gFree.n--
+ if gp.stack.lo != 0 && gp.stack.hi-gp.stack.lo != uintptr(startingStackSize) {
+ // Deallocate old stack. We kept it in gfput because it was the
+ // right size when the goroutine was put on the free list, but
+ // the right size has changed since then.
+ systemstack(func() {
+ stackfree(gp.stack)
+ gp.stack.lo = 0
+ gp.stack.hi = 0
+ gp.stackguard0 = 0
+ })
+ }
+ if gp.stack.lo == 0 {
+ // Stack was deallocated in gfput or just above. Allocate a new one.
+ systemstack(func() {
+ gp.stack = stackalloc(startingStackSize)
+ })
+ gp.stackguard0 = gp.stack.lo + stackGuard
+ } else {
+ if raceenabled {
+ racemalloc(unsafe.Pointer(gp.stack.lo), gp.stack.hi-gp.stack.lo)
+ }
+ if msanenabled {
+ msanmalloc(unsafe.Pointer(gp.stack.lo), gp.stack.hi-gp.stack.lo)
+ }
+ if asanenabled {
+ asanunpoison(unsafe.Pointer(gp.stack.lo), gp.stack.hi-gp.stack.lo)
+ }
+ }
+ return gp
+}
+
+// Purge all cached G's from gfree list to the global list.
+func gfpurge(pp *p) {
+ var (
+ inc int32
+ stackQ gQueue
+ noStackQ gQueue
+ )
+ for !pp.gFree.empty() {
+ gp := pp.gFree.pop()
+ pp.gFree.n--
+ if gp.stack.lo == 0 {
+ noStackQ.push(gp)
+ } else {
+ stackQ.push(gp)
+ }
+ inc++
+ }
+ lock(&sched.gFree.lock)
+ sched.gFree.noStack.pushAll(noStackQ)
+ sched.gFree.stack.pushAll(stackQ)
+ sched.gFree.n += inc
+ unlock(&sched.gFree.lock)
+}
+
+// Breakpoint executes a breakpoint trap.
+func Breakpoint() {
+ breakpoint()
+}
+
+// dolockOSThread is called by LockOSThread and lockOSThread below
+// after they modify m.locked. Do not allow preemption during this call,
+// or else the m might be different in this function than in the caller.
+//
+//go:nosplit
+func dolockOSThread() {
+ if GOARCH == "wasm" {
+ return // no threads on wasm yet
+ }
+ gp := getg()
+ gp.m.lockedg.set(gp)
+ gp.lockedm.set(gp.m)
+}
+
+// LockOSThread wires the calling goroutine to its current operating system thread.
+// The calling goroutine will always execute in that thread,
+// and no other goroutine will execute in it,
+// until the calling goroutine has made as many calls to
+// UnlockOSThread as to LockOSThread.
+// If the calling goroutine exits without unlocking the thread,
+// the thread will be terminated.
+//
+// All init functions are run on the startup thread. Calling LockOSThread
+// from an init function will cause the main function to be invoked on
+// that thread.
+//
+// A goroutine should call LockOSThread before calling OS services or
+// non-Go library functions that depend on per-thread state.
+//
+//go:nosplit
+func LockOSThread() {
+ if atomic.Load(&newmHandoff.haveTemplateThread) == 0 && GOOS != "plan9" {
+ // If we need to start a new thread from the locked
+ // thread, we need the template thread. Start it now
+ // while we're in a known-good state.
+ startTemplateThread()
+ }
+ gp := getg()
+ gp.m.lockedExt++
+ if gp.m.lockedExt == 0 {
+ gp.m.lockedExt--
+ panic("LockOSThread nesting overflow")
+ }
+ dolockOSThread()
+}
+
+//go:nosplit
+func lockOSThread() {
+ getg().m.lockedInt++
+ dolockOSThread()
+}
+
+// dounlockOSThread is called by UnlockOSThread and unlockOSThread below
+// after they update m->locked. Do not allow preemption during this call,
+// or else the m might be different in this function than in the caller.
+//
+//go:nosplit
+func dounlockOSThread() {
+ if GOARCH == "wasm" {
+ return // no threads on wasm yet
+ }
+ gp := getg()
+ if gp.m.lockedInt != 0 || gp.m.lockedExt != 0 {
+ return
+ }
+ gp.m.lockedg = 0
+ gp.lockedm = 0
+}
+
+// UnlockOSThread undoes an earlier call to LockOSThread.
+// If this drops the number of active LockOSThread calls on the
+// calling goroutine to zero, it unwires the calling goroutine from
+// its fixed operating system thread.
+// If there are no active LockOSThread calls, this is a no-op.
+//
+// Before calling UnlockOSThread, the caller must ensure that the OS
+// thread is suitable for running other goroutines. If the caller made
+// any permanent changes to the state of the thread that would affect
+// other goroutines, it should not call this function and thus leave
+// the goroutine locked to the OS thread until the goroutine (and
+// hence the thread) exits.
+//
+//go:nosplit
+func UnlockOSThread() {
+ gp := getg()
+ if gp.m.lockedExt == 0 {
+ return
+ }
+ gp.m.lockedExt--
+ dounlockOSThread()
+}
+
+//go:nosplit
+func unlockOSThread() {
+ gp := getg()
+ if gp.m.lockedInt == 0 {
+ systemstack(badunlockosthread)
+ }
+ gp.m.lockedInt--
+ dounlockOSThread()
+}
+
+func badunlockosthread() {
+ throw("runtime: internal error: misuse of lockOSThread/unlockOSThread")
+}
+
+func gcount() int32 {
+ n := int32(atomic.Loaduintptr(&allglen)) - sched.gFree.n - sched.ngsys.Load()
+ for _, pp := range allp {
+ n -= pp.gFree.n
+ }
+
+ // All these variables can be changed concurrently, so the result can be inconsistent.
+ // But at least the current goroutine is running.
+ if n < 1 {
+ n = 1
+ }
+ return n
+}
+
+func mcount() int32 {
+ return int32(sched.mnext - sched.nmfreed)
+}
+
+var prof struct {
+ signalLock atomic.Uint32
+
+ // Must hold signalLock to write. Reads may be lock-free, but
+ // signalLock should be taken to synchronize with changes.
+ hz atomic.Int32
+}
+
+func _System() { _System() }
+func _ExternalCode() { _ExternalCode() }
+func _LostExternalCode() { _LostExternalCode() }
+func _GC() { _GC() }
+func _LostSIGPROFDuringAtomic64() { _LostSIGPROFDuringAtomic64() }
+func _VDSO() { _VDSO() }
+
+// Called if we receive a SIGPROF signal.
+// Called by the signal handler, may run during STW.
+//
+//go:nowritebarrierrec
+func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
+ if prof.hz.Load() == 0 {
+ return
+ }
+
+ // If mp.profilehz is 0, then profiling is not enabled for this thread.
+ // We must check this to avoid a deadlock between setcpuprofilerate
+ // and the call to cpuprof.add, below.
+ if mp != nil && mp.profilehz == 0 {
+ return
+ }
+
+ // On mips{,le}/arm, 64bit atomics are emulated with spinlocks, in
+ // runtime/internal/atomic. If SIGPROF arrives while the program is inside
+ // the critical section, it creates a deadlock (when writing the sample).
+ // As a workaround, create a counter of SIGPROFs while in critical section
+ // to store the count, and pass it to sigprof.add() later when SIGPROF is
+ // received from somewhere else (with _LostSIGPROFDuringAtomic64 as pc).
+ if GOARCH == "mips" || GOARCH == "mipsle" || GOARCH == "arm" {
+ if f := findfunc(pc); f.valid() {
+ if hasPrefix(funcname(f), "runtime/internal/atomic") {
+ cpuprof.lostAtomic++
+ return
+ }
+ }
+ if GOARCH == "arm" && goarm < 7 && GOOS == "linux" && pc&0xffff0000 == 0xffff0000 {
+ // runtime/internal/atomic functions call into kernel
+ // helpers on arm < 7. See
+ // runtime/internal/atomic/sys_linux_arm.s.
+ cpuprof.lostAtomic++
+ return
+ }
+ }
+
+ // Profiling runs concurrently with GC, so it must not allocate.
+ // Set a trap in case the code does allocate.
+ // Note that on windows, one thread takes profiles of all the
+ // other threads, so mp is usually not getg().m.
+ // In fact mp may not even be stopped.
+ // See golang.org/issue/17165.
+ getg().m.mallocing++
+
+ var u unwinder
+ var stk [maxCPUProfStack]uintptr
+ n := 0
+ if mp.ncgo > 0 && mp.curg != nil && mp.curg.syscallpc != 0 && mp.curg.syscallsp != 0 {
+ cgoOff := 0
+ // Check cgoCallersUse to make sure that we are not
+ // interrupting other code that is fiddling with
+ // cgoCallers. We are running in a signal handler
+ // with all signals blocked, so we don't have to worry
+ // about any other code interrupting us.
+ if mp.cgoCallersUse.Load() == 0 && mp.cgoCallers != nil && mp.cgoCallers[0] != 0 {
+ for cgoOff < len(mp.cgoCallers) && mp.cgoCallers[cgoOff] != 0 {
+ cgoOff++
+ }
+ n += copy(stk[:], mp.cgoCallers[:cgoOff])
+ mp.cgoCallers[0] = 0
+ }
+
+ // Collect Go stack that leads to the cgo call.
+ u.initAt(mp.curg.syscallpc, mp.curg.syscallsp, 0, mp.curg, unwindSilentErrors)
+ } else if usesLibcall() && mp.libcallg != 0 && mp.libcallpc != 0 && mp.libcallsp != 0 {
+ // Libcall, i.e. runtime syscall on windows.
+ // Collect Go stack that leads to the call.
+ u.initAt(mp.libcallpc, mp.libcallsp, 0, mp.libcallg.ptr(), unwindSilentErrors)
+ } else if mp != nil && mp.vdsoSP != 0 {
+ // VDSO call, e.g. nanotime1 on Linux.
+ // Collect Go stack that leads to the call.
+ u.initAt(mp.vdsoPC, mp.vdsoSP, 0, gp, unwindSilentErrors|unwindJumpStack)
+ } else {
+ u.initAt(pc, sp, lr, gp, unwindSilentErrors|unwindTrap|unwindJumpStack)
+ }
+ n += tracebackPCs(&u, 0, stk[n:])
+
+ if n <= 0 {
+ // Normal traceback is impossible or has failed.
+ // Account it against abstract "System" or "GC".
+ n = 2
+ if inVDSOPage(pc) {
+ pc = abi.FuncPCABIInternal(_VDSO) + sys.PCQuantum
+ } else if pc > firstmoduledata.etext {
+ // "ExternalCode" is better than "etext".
+ pc = abi.FuncPCABIInternal(_ExternalCode) + sys.PCQuantum
+ }
+ stk[0] = pc
+ if mp.preemptoff != "" {
+ stk[1] = abi.FuncPCABIInternal(_GC) + sys.PCQuantum
+ } else {
+ stk[1] = abi.FuncPCABIInternal(_System) + sys.PCQuantum
+ }
+ }
+
+ if prof.hz.Load() != 0 {
+ // Note: it can happen on Windows that we interrupted a system thread
+		// with no g, so gp could be nil. The other nil checks are done out of
+ // caution, but not expected to be nil in practice.
+ var tagPtr *unsafe.Pointer
+ if gp != nil && gp.m != nil && gp.m.curg != nil {
+ tagPtr = &gp.m.curg.labels
+ }
+ cpuprof.add(tagPtr, stk[:n])
+
+ gprof := gp
+ var pp *p
+ if gp != nil && gp.m != nil {
+ if gp.m.curg != nil {
+ gprof = gp.m.curg
+ }
+ pp = gp.m.p.ptr()
+ }
+ traceCPUSample(gprof, pp, stk[:n])
+ }
+ getg().m.mallocing--
+}
+
+// setcpuprofilerate sets the CPU profiling rate to hz times per second.
+// If hz <= 0, setcpuprofilerate turns off CPU profiling.
+func setcpuprofilerate(hz int32) {
+ // Force sane arguments.
+ if hz < 0 {
+ hz = 0
+ }
+
+ // Disable preemption, otherwise we can be rescheduled to another thread
+ // that has profiling enabled.
+ gp := getg()
+ gp.m.locks++
+
+ // Stop profiler on this thread so that it is safe to lock prof.
+ // if a profiling signal came in while we had prof locked,
+ // it would deadlock.
+ setThreadCPUProfiler(0)
+
+ for !prof.signalLock.CompareAndSwap(0, 1) {
+ osyield()
+ }
+ if prof.hz.Load() != hz {
+ setProcessCPUProfiler(hz)
+ prof.hz.Store(hz)
+ }
+ prof.signalLock.Store(0)
+
+ lock(&sched.lock)
+ sched.profilehz = hz
+ unlock(&sched.lock)
+
+ if hz != 0 {
+ setThreadCPUProfiler(hz)
+ }
+
+ gp.m.locks--
+}
+
+// init initializes pp, which may be a freshly allocated p or a
+// previously destroyed p, and transitions it to status _Pgcstop.
+func (pp *p) init(id int32) {
+ pp.id = id
+ pp.status = _Pgcstop
+ pp.sudogcache = pp.sudogbuf[:0]
+ pp.deferpool = pp.deferpoolbuf[:0]
+ pp.wbBuf.reset()
+ if pp.mcache == nil {
+ if id == 0 {
+ if mcache0 == nil {
+ throw("missing mcache?")
+ }
+ // Use the bootstrap mcache0. Only one P will get
+ // mcache0: the one with ID 0.
+ pp.mcache = mcache0
+ } else {
+ pp.mcache = allocmcache()
+ }
+ }
+ if raceenabled && pp.raceprocctx == 0 {
+ if id == 0 {
+ pp.raceprocctx = raceprocctx0
+ raceprocctx0 = 0 // bootstrap
+ } else {
+ pp.raceprocctx = raceproccreate()
+ }
+ }
+ lockInit(&pp.timersLock, lockRankTimers)
+
+ // This P may get timers when it starts running. Set the mask here
+ // since the P may not go through pidleget (notably P 0 on startup).
+ timerpMask.set(id)
+ // Similarly, we may not go through pidleget before this P starts
+ // running if it is P 0 on startup.
+ idlepMask.clear(id)
+}
+
+// destroy releases all of the resources associated with pp and
+// transitions it to status _Pdead.
+//
+// sched.lock must be held and the world must be stopped.
+func (pp *p) destroy() {
+ assertLockHeld(&sched.lock)
+ assertWorldStopped()
+
+ // Move all runnable goroutines to the global queue
+ for pp.runqhead != pp.runqtail {
+ // Pop from tail of local queue
+ pp.runqtail--
+ gp := pp.runq[pp.runqtail%uint32(len(pp.runq))].ptr()
+ // Push onto head of global queue
+ globrunqputhead(gp)
+ }
+ if pp.runnext != 0 {
+ globrunqputhead(pp.runnext.ptr())
+ pp.runnext = 0
+ }
+ if len(pp.timers) > 0 {
+ plocal := getg().m.p.ptr()
+ // The world is stopped, but we acquire timersLock to
+ // protect against sysmon calling timeSleepUntil.
+ // This is the only case where we hold the timersLock of
+ // more than one P, so there are no deadlock concerns.
+ lock(&plocal.timersLock)
+ lock(&pp.timersLock)
+ moveTimers(plocal, pp.timers)
+ pp.timers = nil
+ pp.numTimers.Store(0)
+ pp.deletedTimers.Store(0)
+ pp.timer0When.Store(0)
+ unlock(&pp.timersLock)
+ unlock(&plocal.timersLock)
+ }
+ // Flush p's write barrier buffer.
+ if gcphase != _GCoff {
+ wbBufFlush1(pp)
+ pp.gcw.dispose()
+ }
+ for i := range pp.sudogbuf {
+ pp.sudogbuf[i] = nil
+ }
+ pp.sudogcache = pp.sudogbuf[:0]
+ pp.pinnerCache = nil
+ for j := range pp.deferpoolbuf {
+ pp.deferpoolbuf[j] = nil
+ }
+ pp.deferpool = pp.deferpoolbuf[:0]
+ systemstack(func() {
+ for i := 0; i < pp.mspancache.len; i++ {
+ // Safe to call since the world is stopped.
+ mheap_.spanalloc.free(unsafe.Pointer(pp.mspancache.buf[i]))
+ }
+ pp.mspancache.len = 0
+ lock(&mheap_.lock)
+ pp.pcache.flush(&mheap_.pages)
+ unlock(&mheap_.lock)
+ })
+ freemcache(pp.mcache)
+ pp.mcache = nil
+ gfpurge(pp)
+ traceProcFree(pp)
+ if raceenabled {
+ if pp.timerRaceCtx != 0 {
+ // The race detector code uses a callback to fetch
+ // the proc context, so arrange for that callback
+ // to see the right thing.
+ // This hack only works because we are the only
+ // thread running.
+ mp := getg().m
+ phold := mp.p.ptr()
+ mp.p.set(pp)
+
+ racectxend(pp.timerRaceCtx)
+ pp.timerRaceCtx = 0
+
+ mp.p.set(phold)
+ }
+ raceprocdestroy(pp.raceprocctx)
+ pp.raceprocctx = 0
+ }
+ pp.gcAssistTime = 0
+ pp.status = _Pdead
+}
+
+// Change number of processors.
+//
+// sched.lock must be held, and the world must be stopped.
+//
+// gcworkbufs must not be modified by either the GC or the write barrier
+// code while this runs, so the GC must not be running if the number of Ps
+// actually changes.
+//
+// Returns list of Ps with local work, they need to be scheduled by the caller.
+func procresize(nprocs int32) *p {
+ assertLockHeld(&sched.lock)
+ assertWorldStopped()
+
+ old := gomaxprocs
+ if old < 0 || nprocs <= 0 {
+ throw("procresize: invalid arg")
+ }
+ if traceEnabled() {
+ traceGomaxprocs(nprocs)
+ }
+
+ // update statistics
+ now := nanotime()
+ if sched.procresizetime != 0 {
+ sched.totaltime += int64(old) * (now - sched.procresizetime)
+ }
+ sched.procresizetime = now
+
+ maskWords := (nprocs + 31) / 32
+
+ // Grow allp if necessary.
+ if nprocs > int32(len(allp)) {
+ // Synchronize with retake, which could be running
+ // concurrently since it doesn't run on a P.
+ lock(&allpLock)
+ if nprocs <= int32(cap(allp)) {
+ allp = allp[:nprocs]
+ } else {
+ nallp := make([]*p, nprocs)
+ // Copy everything up to allp's cap so we
+ // never lose old allocated Ps.
+ copy(nallp, allp[:cap(allp)])
+ allp = nallp
+ }
+
+ if maskWords <= int32(cap(idlepMask)) {
+ idlepMask = idlepMask[:maskWords]
+ timerpMask = timerpMask[:maskWords]
+ } else {
+ nidlepMask := make([]uint32, maskWords)
+ // No need to copy beyond len, old Ps are irrelevant.
+ copy(nidlepMask, idlepMask)
+ idlepMask = nidlepMask
+
+ ntimerpMask := make([]uint32, maskWords)
+ copy(ntimerpMask, timerpMask)
+ timerpMask = ntimerpMask
+ }
+ unlock(&allpLock)
+ }
+
+ // initialize new P's
+ for i := old; i < nprocs; i++ {
+ pp := allp[i]
+ if pp == nil {
+ pp = new(p)
+ }
+ pp.init(i)
+ atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
+ }
+
+ gp := getg()
+ if gp.m.p != 0 && gp.m.p.ptr().id < nprocs {
+ // continue to use the current P
+ gp.m.p.ptr().status = _Prunning
+ gp.m.p.ptr().mcache.prepareForSweep()
+ } else {
+ // release the current P and acquire allp[0].
+ //
+ // We must do this before destroying our current P
+ // because p.destroy itself has write barriers, so we
+ // need to do that from a valid P.
+ if gp.m.p != 0 {
+ if traceEnabled() {
+ // Pretend that we were descheduled
+ // and then scheduled again to keep
+ // the trace sane.
+ traceGoSched()
+ traceProcStop(gp.m.p.ptr())
+ }
+ gp.m.p.ptr().m = 0
+ }
+ gp.m.p = 0
+ pp := allp[0]
+ pp.m = 0
+ pp.status = _Pidle
+ acquirep(pp)
+ if traceEnabled() {
+ traceGoStart()
+ }
+ }
+
+ // g.m.p is now set, so we no longer need mcache0 for bootstrapping.
+ mcache0 = nil
+
+ // release resources from unused P's
+ for i := nprocs; i < old; i++ {
+ pp := allp[i]
+ pp.destroy()
+ // can't free P itself because it can be referenced by an M in syscall
+ }
+
+ // Trim allp.
+ if int32(len(allp)) != nprocs {
+ lock(&allpLock)
+ allp = allp[:nprocs]
+ idlepMask = idlepMask[:maskWords]
+ timerpMask = timerpMask[:maskWords]
+ unlock(&allpLock)
+ }
+
+ var runnablePs *p
+ for i := nprocs - 1; i >= 0; i-- {
+ pp := allp[i]
+ if gp.m.p.ptr() == pp {
+ continue
+ }
+ pp.status = _Pidle
+ if runqempty(pp) {
+ pidleput(pp, now)
+ } else {
+ pp.m.set(mget())
+ pp.link.set(runnablePs)
+ runnablePs = pp
+ }
+ }
+ stealOrder.reset(uint32(nprocs))
+ var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
+ atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
+ if old != nprocs {
+ // Notify the limiter that the amount of procs has changed.
+ gcCPULimiter.resetCapacity(now, nprocs)
+ }
+ return runnablePs
+}
+
+// Associate p and the current m.
+//
+// This function is allowed to have write barriers even if the caller
+// isn't because it immediately acquires pp.
+//
+//go:yeswritebarrierrec
+func acquirep(pp *p) {
+ // Do the part that isn't allowed to have write barriers.
+ wirep(pp)
+
+ // Have p; write barriers now allowed.
+
+ // Perform deferred mcache flush before this P can allocate
+ // from a potentially stale mcache.
+ pp.mcache.prepareForSweep()
+
+ if traceEnabled() {
+ traceProcStart()
+ }
+}
+
+// wirep is the first step of acquirep, which actually associates the
+// current M to pp. This is broken out so we can disallow write
+// barriers for this part, since we don't yet have a P.
+//
+//go:nowritebarrierrec
+//go:nosplit
+func wirep(pp *p) {
+ gp := getg()
+
+ if gp.m.p != 0 {
+ throw("wirep: already in go")
+ }
+ if pp.m != 0 || pp.status != _Pidle {
+ id := int64(0)
+ if pp.m != 0 {
+ id = pp.m.ptr().id
+ }
+ print("wirep: p->m=", pp.m, "(", id, ") p->status=", pp.status, "\n")
+ throw("wirep: invalid p state")
+ }
+ gp.m.p.set(pp)
+ pp.m.set(gp.m)
+ pp.status = _Prunning
+}
+
+// Disassociate p and the current m.
+func releasep() *p {
+ gp := getg()
+
+ if gp.m.p == 0 {
+ throw("releasep: invalid arg")
+ }
+ pp := gp.m.p.ptr()
+ if pp.m.ptr() != gp.m || pp.status != _Prunning {
+ print("releasep: m=", gp.m, " m->p=", gp.m.p.ptr(), " p->m=", hex(pp.m), " p->status=", pp.status, "\n")
+ throw("releasep: invalid p state")
+ }
+ if traceEnabled() {
+ traceProcStop(gp.m.p.ptr())
+ }
+ gp.m.p = 0
+ pp.m = 0
+ pp.status = _Pidle
+ return pp
+}
+
+func incidlelocked(v int32) {
+ lock(&sched.lock)
+ sched.nmidlelocked += v
+ if v > 0 {
+ checkdead()
+ }
+ unlock(&sched.lock)
+}
+
+// Check for deadlock situation.
+// The check is based on the number of running M's; if it is 0, the process is deadlocked.
+// sched.lock must be held.
+func checkdead() {
+ assertLockHeld(&sched.lock)
+
+ // For -buildmode=c-shared or -buildmode=c-archive it's OK if
+ // there are no running goroutines. The calling program is
+ // assumed to be running.
+ if islibrary || isarchive {
+ return
+ }
+
+ // If we are dying because of a signal caught on an already idle thread,
+ // freezetheworld will cause all running threads to block.
+ // And runtime will essentially enter into deadlock state,
+ // except that there is a thread that will call exit soon.
+ if panicking.Load() > 0 {
+ return
+ }
+
+ // If we are not running under cgo, but we have an extra M then account
+ // for it. (It is possible to have an extra M on Windows without cgo to
+ // accommodate callbacks created by syscall.NewCallback. See issue #6751
+ // for details.)
+ var run0 int32
+ if !iscgo && cgoHasExtraM && extraMLength.Load() > 0 {
+ run0 = 1
+ }
+
+ run := mcount() - sched.nmidle - sched.nmidlelocked - sched.nmsys
+ if run > run0 {
+ return
+ }
+ if run < 0 {
+ print("runtime: checkdead: nmidle=", sched.nmidle, " nmidlelocked=", sched.nmidlelocked, " mcount=", mcount(), " nmsys=", sched.nmsys, "\n")
+ unlock(&sched.lock)
+ throw("checkdead: inconsistent counts")
+ }
+
+ grunning := 0
+ forEachG(func(gp *g) {
+ if isSystemGoroutine(gp, false) {
+ return
+ }
+ s := readgstatus(gp)
+ switch s &^ _Gscan {
+ case _Gwaiting,
+ _Gpreempted:
+ grunning++
+ case _Grunnable,
+ _Grunning,
+ _Gsyscall:
+ print("runtime: checkdead: find g ", gp.goid, " in status ", s, "\n")
+ unlock(&sched.lock)
+ throw("checkdead: runnable g")
+ }
+ })
+ if grunning == 0 { // possible if main goroutine calls runtime·Goexit()
+ unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
+ fatal("no goroutines (main called runtime.Goexit) - deadlock!")
+ }
+
+ // Maybe jump time forward for playground.
+ if faketime != 0 {
+ if when := timeSleepUntil(); when < maxWhen {
+ faketime = when
+
+ // Start an M to steal the timer.
+ pp, _ := pidleget(faketime)
+ if pp == nil {
+ // There should always be a free P since
+ // nothing is running.
+ unlock(&sched.lock)
+ throw("checkdead: no p for timer")
+ }
+ mp := mget()
+ if mp == nil {
+ // There should always be a free M since
+ // nothing is running.
+ unlock(&sched.lock)
+ throw("checkdead: no m for timer")
+ }
+ // M must be spinning to steal. We set this to be
+ // explicit, but since this is the only M it would
+ // become spinning on its own anyways.
+ sched.nmspinning.Add(1)
+ mp.spinning = true
+ mp.nextp.set(pp)
+ notewakeup(&mp.park)
+ return
+ }
+ }
+
+ // There are no goroutines running, so we can look at the P's.
+ for _, pp := range allp {
+ if len(pp.timers) > 0 {
+ return
+ }
+ }
+
+ unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
+ fatal("all goroutines are asleep - deadlock!")
+}
+
+// forcegcperiod is the maximum time in nanoseconds between garbage
+// collections. If we go this long without a garbage collection, one
+// is forced to run.
+//
+// This is a variable for testing purposes. It normally doesn't change.
+var forcegcperiod int64 = 2 * 60 * 1e9
+
+// needSysmonWorkaround is true if the workaround for
+// golang.org/issue/42515 is needed on NetBSD.
+var needSysmonWorkaround bool = false
+
+// Always runs without a P, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func sysmon() {
+ lock(&sched.lock)
+ sched.nmsys++
+ checkdead()
+ unlock(&sched.lock)
+
+ lasttrace := int64(0)
+	idle := 0 // how many cycles in succession we have not woken anybody up
+ delay := uint32(0)
+
+ for {
+ if idle == 0 { // start with 20us sleep...
+ delay = 20
+ } else if idle > 50 { // start doubling the sleep after 1ms...
+ delay *= 2
+ }
+ if delay > 10*1000 { // up to 10ms
+ delay = 10 * 1000
+ }
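+		// For example, once idle exceeds 50, the delay doubles each iteration
+		// (20us, 40us, 80us, ...) and saturates at the 10ms cap after roughly
+		// nine doublings.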
+ usleep(delay)
+
+ // sysmon should not enter deep sleep if schedtrace is enabled so that
+ // it can print that information at the right time.
+ //
+ // It should also not enter deep sleep if there are any active P's so
+ // that it can retake P's from syscalls, preempt long running G's, and
+ // poll the network if all P's are busy for long stretches.
+ //
+ // It should wakeup from deep sleep if any P's become active either due
+ // to exiting a syscall or waking up due to a timer expiring so that it
+ // can resume performing those duties. If it wakes from a syscall it
+ // resets idle and delay as a bet that since it had retaken a P from a
+ // syscall before, it may need to do it again shortly after the
+ // application starts work again. It does not reset idle when waking
+ // from a timer to avoid adding system load to applications that spend
+ // most of their time sleeping.
+ now := nanotime()
+ if debug.schedtrace <= 0 && (sched.gcwaiting.Load() || sched.npidle.Load() == gomaxprocs) {
+ lock(&sched.lock)
+ if sched.gcwaiting.Load() || sched.npidle.Load() == gomaxprocs {
+ syscallWake := false
+ next := timeSleepUntil()
+ if next > now {
+ sched.sysmonwait.Store(true)
+ unlock(&sched.lock)
+ // Make wake-up period small enough
+ // for the sampling to be correct.
+ sleep := forcegcperiod / 2
+ if next-now < sleep {
+ sleep = next - now
+ }
+ shouldRelax := sleep >= osRelaxMinNS
+ if shouldRelax {
+ osRelax(true)
+ }
+ syscallWake = notetsleep(&sched.sysmonnote, sleep)
+ if shouldRelax {
+ osRelax(false)
+ }
+ lock(&sched.lock)
+ sched.sysmonwait.Store(false)
+ noteclear(&sched.sysmonnote)
+ }
+ if syscallWake {
+ idle = 0
+ delay = 20
+ }
+ }
+ unlock(&sched.lock)
+ }
+
+ lock(&sched.sysmonlock)
+ // Update now in case we blocked on sysmonnote or spent a long time
+ // blocked on schedlock or sysmonlock above.
+ now = nanotime()
+
+ // trigger libc interceptors if needed
+ if *cgo_yield != nil {
+ asmcgocall(*cgo_yield, nil)
+ }
+ // poll network if not polled for more than 10ms
+ lastpoll := sched.lastpoll.Load()
+ if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
+ sched.lastpoll.CompareAndSwap(lastpoll, now)
+ list := netpoll(0) // non-blocking - returns list of goroutines
+ if !list.empty() {
+ // Need to decrement number of idle locked M's
+ // (pretending that one more is running) before injectglist.
+ // Otherwise it can lead to the following situation:
+ // injectglist grabs all P's but before it starts M's to run the P's,
+ // another M returns from syscall, finishes running its G,
+ // observes that there is no work to do and no other running M's
+ // and reports deadlock.
+ incidlelocked(-1)
+ injectglist(&list)
+ incidlelocked(1)
+ }
+ }
+ if GOOS == "netbsd" && needSysmonWorkaround {
+ // netpoll is responsible for waiting for timer
+ // expiration, so we typically don't have to worry
+ // about starting an M to service timers. (Note that
+ // sleep for timeSleepUntil above simply ensures sysmon
+ // starts running again when that timer expiration may
+ // cause Go code to run again).
+ //
+ // However, netbsd has a kernel bug that sometimes
+ // misses netpollBreak wake-ups, which can lead to
+ // unbounded delays servicing timers. If we detect this
+ // overrun, then startm to get something to handle the
+ // timer.
+ //
+ // See issue 42515 and
+ // https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
+ if next := timeSleepUntil(); next < now {
+ startm(nil, false, false)
+ }
+ }
+ if scavenger.sysmonWake.Load() != 0 {
+ // Kick the scavenger awake if someone requested it.
+ scavenger.wake()
+ }
+ // retake P's blocked in syscalls
+ // and preempt long running G's
+ if retake(now) != 0 {
+ idle = 0
+ } else {
+ idle++
+ }
+ // check if we need to force a GC
+ if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && forcegc.idle.Load() {
+ lock(&forcegc.lock)
+ forcegc.idle.Store(false)
+ var list gList
+ list.push(forcegc.g)
+ injectglist(&list)
+ unlock(&forcegc.lock)
+ }
+ if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
+ lasttrace = now
+ schedtrace(debug.scheddetail > 0)
+ }
+ unlock(&sched.sysmonlock)
+ }
+}
+
+type sysmontick struct {
+ schedtick uint32
+ schedwhen int64
+ syscalltick uint32
+ syscallwhen int64
+}
+
+// forcePreemptNS is the time slice given to a G before it is
+// preempted.
+const forcePreemptNS = 10 * 1000 * 1000 // 10ms
+
+func retake(now int64) uint32 {
+ n := 0
+ // Prevent allp slice changes. This lock will be completely
+ // uncontended unless we're already stopping the world.
+ lock(&allpLock)
+ // We can't use a range loop over allp because we may
+ // temporarily drop the allpLock. Hence, we need to re-fetch
+ // allp each time around the loop.
+ for i := 0; i < len(allp); i++ {
+ pp := allp[i]
+ if pp == nil {
+ // This can happen if procresize has grown
+ // allp but not yet created new Ps.
+ continue
+ }
+ pd := &pp.sysmontick
+ s := pp.status
+ sysretake := false
+ if s == _Prunning || s == _Psyscall {
+ // Preempt G if it's running for too long.
+ t := int64(pp.schedtick)
+ if int64(pd.schedtick) != t {
+ pd.schedtick = uint32(t)
+ pd.schedwhen = now
+ } else if pd.schedwhen+forcePreemptNS <= now {
+ preemptone(pp)
+ // In case of syscall, preemptone() doesn't
+ // work, because there is no M wired to P.
+ sysretake = true
+ }
+ }
+ if s == _Psyscall {
+ // Retake P from syscall if it's there for more than 1 sysmon tick (at least 20us).
+ t := int64(pp.syscalltick)
+ if !sysretake && int64(pd.syscalltick) != t {
+ pd.syscalltick = uint32(t)
+ pd.syscallwhen = now
+ continue
+ }
+ // On the one hand we don't want to retake Ps if there is no other work to do,
+ // but on the other hand we want to retake them eventually
+ // because they can prevent the sysmon thread from deep sleep.
+ if runqempty(pp) && sched.nmspinning.Load()+sched.npidle.Load() > 0 && pd.syscallwhen+10*1000*1000 > now {
+ continue
+ }
+ // Drop allpLock so we can take sched.lock.
+ unlock(&allpLock)
+ // Need to decrement number of idle locked M's
+ // (pretending that one more is running) before the CAS.
+ // Otherwise the M from which we retake can exit the syscall,
+ // increment nmidle and report deadlock.
+ incidlelocked(-1)
+ if atomic.Cas(&pp.status, s, _Pidle) {
+ if traceEnabled() {
+ traceGoSysBlock(pp)
+ traceProcStop(pp)
+ }
+ n++
+ pp.syscalltick++
+ handoffp(pp)
+ }
+ incidlelocked(1)
+ lock(&allpLock)
+ }
+ }
+ unlock(&allpLock)
+ return uint32(n)
+}
+
+// Tell all goroutines that they have been preempted and they should stop.
+// This function is purely best-effort. It can fail to inform a goroutine if a
+// processor just started running it.
+// No locks need to be held.
+// Returns true if preemption request was issued to at least one goroutine.
+func preemptall() bool {
+ res := false
+ for _, pp := range allp {
+ if pp.status != _Prunning {
+ continue
+ }
+ if preemptone(pp) {
+ res = true
+ }
+ }
+ return res
+}
+
+// Tell the goroutine running on processor P to stop.
+// This function is purely best-effort. It can incorrectly fail to inform the
+// goroutine. It can inform the wrong goroutine. Even if it informs the
+// correct goroutine, that goroutine might ignore the request if it is
+// simultaneously executing newstack.
+// No lock needs to be held.
+// Returns true if preemption request was issued.
+// The actual preemption will happen at some point in the future
+// and will be indicated by the gp->status no longer being
+// Grunning.
+func preemptone(pp *p) bool {
+ mp := pp.m.ptr()
+ if mp == nil || mp == getg().m {
+ return false
+ }
+ gp := mp.curg
+ if gp == nil || gp == mp.g0 {
+ return false
+ }
+
+ gp.preempt = true
+
+ // Every call in a goroutine checks for stack overflow by
+ // comparing the current stack pointer to gp->stackguard0.
+ // Setting gp->stackguard0 to StackPreempt folds
+ // preemption into the normal stack overflow check.
+ gp.stackguard0 = stackPreempt
+
+ // Request an async preemption of this P.
+ if preemptMSupported && debug.asyncpreemptoff == 0 {
+ pp.preempt = true
+ preemptM(mp)
+ }
+
+ return true
+}
+
+var starttime int64
+
+func schedtrace(detailed bool) {
+ now := nanotime()
+ if starttime == 0 {
+ starttime = now
+ }
+
+ lock(&sched.lock)
+ print("SCHED ", (now-starttime)/1e6, "ms: gomaxprocs=", gomaxprocs, " idleprocs=", sched.npidle.Load(), " threads=", mcount(), " spinningthreads=", sched.nmspinning.Load(), " needspinning=", sched.needspinning.Load(), " idlethreads=", sched.nmidle, " runqueue=", sched.runqsize)
+ if detailed {
+ print(" gcwaiting=", sched.gcwaiting.Load(), " nmidlelocked=", sched.nmidlelocked, " stopwait=", sched.stopwait, " sysmonwait=", sched.sysmonwait.Load(), "\n")
+ }
+ // We must be careful while reading data from P's, M's and G's.
+ // Even if we hold schedlock, most data can be changed concurrently.
+ // E.g. (p->m ? p->m->id : -1) can crash if p->m changes from non-nil to nil.
+ for i, pp := range allp {
+ mp := pp.m.ptr()
+ h := atomic.Load(&pp.runqhead)
+ t := atomic.Load(&pp.runqtail)
+ if detailed {
+ print(" P", i, ": status=", pp.status, " schedtick=", pp.schedtick, " syscalltick=", pp.syscalltick, " m=")
+ if mp != nil {
+ print(mp.id)
+ } else {
+ print("nil")
+ }
+ print(" runqsize=", t-h, " gfreecnt=", pp.gFree.n, " timerslen=", len(pp.timers), "\n")
+ } else {
+			// In non-detailed mode, format the lengths of the per-P run queues as:
+ // [len1 len2 len3 len4]
+ print(" ")
+ if i == 0 {
+ print("[")
+ }
+ print(t - h)
+ if i == len(allp)-1 {
+ print("]\n")
+ }
+ }
+ }
+
+ if !detailed {
+ unlock(&sched.lock)
+ return
+ }
+
+ for mp := allm; mp != nil; mp = mp.alllink {
+ pp := mp.p.ptr()
+ print(" M", mp.id, ": p=")
+ if pp != nil {
+ print(pp.id)
+ } else {
+ print("nil")
+ }
+ print(" curg=")
+ if mp.curg != nil {
+ print(mp.curg.goid)
+ } else {
+ print("nil")
+ }
+ print(" mallocing=", mp.mallocing, " throwing=", mp.throwing, " preemptoff=", mp.preemptoff, " locks=", mp.locks, " dying=", mp.dying, " spinning=", mp.spinning, " blocked=", mp.blocked, " lockedg=")
+ if lockedg := mp.lockedg.ptr(); lockedg != nil {
+ print(lockedg.goid)
+ } else {
+ print("nil")
+ }
+ print("\n")
+ }
+
+ forEachG(func(gp *g) {
+ print(" G", gp.goid, ": status=", readgstatus(gp), "(", gp.waitreason.String(), ") m=")
+ if gp.m != nil {
+ print(gp.m.id)
+ } else {
+ print("nil")
+ }
+ print(" lockedm=")
+ if lockedm := gp.lockedm.ptr(); lockedm != nil {
+ print(lockedm.id)
+ } else {
+ print("nil")
+ }
+ print("\n")
+ })
+ unlock(&sched.lock)
+}
+
+// schedEnableUser enables or disables the scheduling of user
+// goroutines.
+//
+// This does not stop already running user goroutines, so the caller
+// should first stop the world when disabling user goroutines.
+func schedEnableUser(enable bool) {
+ lock(&sched.lock)
+ if sched.disable.user == !enable {
+ unlock(&sched.lock)
+ return
+ }
+ sched.disable.user = !enable
+ if enable {
+ n := sched.disable.n
+ sched.disable.n = 0
+ globrunqputbatch(&sched.disable.runnable, n)
+ unlock(&sched.lock)
+ for ; n != 0 && sched.npidle.Load() != 0; n-- {
+ startm(nil, false, false)
+ }
+ } else {
+ unlock(&sched.lock)
+ }
+}
+
+// schedEnabled reports whether gp should be scheduled. It returns
+// false if scheduling of gp is disabled.
+//
+// sched.lock must be held.
+func schedEnabled(gp *g) bool {
+ assertLockHeld(&sched.lock)
+
+ if sched.disable.user {
+ return isSystemGoroutine(gp, true)
+ }
+ return true
+}
+
+// Put mp on midle list.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func mput(mp *m) {
+ assertLockHeld(&sched.lock)
+
+ mp.schedlink = sched.midle
+ sched.midle.set(mp)
+ sched.nmidle++
+ checkdead()
+}
+
+// Try to get an m from midle list.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func mget() *m {
+ assertLockHeld(&sched.lock)
+
+ mp := sched.midle.ptr()
+ if mp != nil {
+ sched.midle = mp.schedlink
+ sched.nmidle--
+ }
+ return mp
+}
+
+// Put gp on the global runnable queue.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func globrunqput(gp *g) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.pushBack(gp)
+ sched.runqsize++
+}
+
+// Put gp at the head of the global runnable queue.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func globrunqputhead(gp *g) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.push(gp)
+ sched.runqsize++
+}
+
+// Put a batch of runnable goroutines on the global runnable queue.
+// This clears *batch.
+// sched.lock must be held.
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func globrunqputbatch(batch *gQueue, n int32) {
+ assertLockHeld(&sched.lock)
+
+ sched.runq.pushBackAll(*batch)
+ sched.runqsize += n
+ *batch = gQueue{}
+}
+
+// Try to get a batch of G's from the global runnable queue.
+// sched.lock must be held.
+func globrunqget(pp *p, max int32) *g {
+ assertLockHeld(&sched.lock)
+
+ if sched.runqsize == 0 {
+ return nil
+ }
+
+ n := sched.runqsize/gomaxprocs + 1
+ if n > sched.runqsize {
+ n = sched.runqsize
+ }
+ if max > 0 && n > max {
+ n = max
+ }
+ if n > int32(len(pp.runq))/2 {
+ n = int32(len(pp.runq)) / 2
+ }
+
+ sched.runqsize -= n
+
+ gp := sched.runq.pop()
+ n--
+ for ; n > 0; n-- {
+ gp1 := sched.runq.pop()
+ runqput(pp, gp1, false)
+ }
+ return gp
+}
+
+// pMask is an atomic bitstring with one bit per P.
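+// For example, the bit for P id 37 lives in word 37/32 = 1 at bit position
+// 37%32 = 5, matching the arithmetic in read, set, and clear below.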
+type pMask []uint32
+
+// read returns true if P id's bit is set.
+func (p pMask) read(id uint32) bool {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ return (atomic.Load(&p[word]) & mask) != 0
+}
+
+// set sets P id's bit.
+func (p pMask) set(id int32) {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ atomic.Or(&p[word], mask)
+}
+
+// clear clears P id's bit.
+func (p pMask) clear(id int32) {
+ word := id / 32
+ mask := uint32(1) << (id % 32)
+ atomic.And(&p[word], ^mask)
+}
+
+// updateTimerPMask clears pp's timer mask if it has no timers on its heap.
+//
+// Ideally, the timer mask would be kept immediately consistent on any timer
+// operations. Unfortunately, updating a shared global data structure in the
+// timer hot path adds too much overhead in applications frequently switching
+// between no timers and some timers.
+//
+// As a compromise, the timer mask is updated only on pidleget / pidleput. A
+// running P (returned by pidleget) may add a timer at any time, so its mask
+// must be set. An idle P (passed to pidleput) cannot add new timers while
+// idle, so if it has no timers at that time, its mask may be cleared.
+//
+// Thus, we get the following effects on timer-stealing in findrunnable:
+//
+// - Idle Ps with no timers when they go idle are never checked in findrunnable
+// (for work- or timer-stealing; this is the ideal case).
+// - Running Ps must always be checked.
+// - Idle Ps whose timers are stolen must continue to be checked until they run
+// again, even after timer expiration.
+//
+// When the P starts running again, the mask should be set, as a timer may be
+// added at any time.
+//
+// TODO(prattmic): Additional targeted updates may improve the above cases.
+// e.g., updating the mask when stealing a timer.
+func updateTimerPMask(pp *p) {
+ if pp.numTimers.Load() > 0 {
+ return
+ }
+
+ // Looks like there are no timers, however another P may transiently
+ // decrement numTimers when handling a timerModified timer in
+ // checkTimers. We must take timersLock to serialize with these changes.
+ lock(&pp.timersLock)
+ if pp.numTimers.Load() == 0 {
+ timerpMask.clear(pp.id)
+ }
+ unlock(&pp.timersLock)
+}
+
+// pidleput puts p on the _Pidle list. now must be the result of a relatively
+// recent call to nanotime, or zero. It returns now, or the current time if now
+// was zero.
+//
+// This releases ownership of p. Once sched.lock is released it is no longer
+// safe to use p.
+//
+// sched.lock must be held.
+//
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func pidleput(pp *p, now int64) int64 {
+ assertLockHeld(&sched.lock)
+
+ if !runqempty(pp) {
+ throw("pidleput: P has non-empty run queue")
+ }
+ if now == 0 {
+ now = nanotime()
+ }
+ updateTimerPMask(pp) // clear if there are no timers.
+ idlepMask.set(pp.id)
+ pp.link = sched.pidle
+ sched.pidle.set(pp)
+ sched.npidle.Add(1)
+ if !pp.limiterEvent.start(limiterEventIdle, now) {
+ throw("must be able to track idle limiter event")
+ }
+ return now
+}
+
+// pidleget tries to get a p from the _Pidle list, acquiring ownership.
+//
+// sched.lock must be held.
+//
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func pidleget(now int64) (*p, int64) {
+ assertLockHeld(&sched.lock)
+
+ pp := sched.pidle.ptr()
+ if pp != nil {
+ // Timer may get added at any time now.
+ if now == 0 {
+ now = nanotime()
+ }
+ timerpMask.set(pp.id)
+ idlepMask.clear(pp.id)
+ sched.pidle = pp.link
+ sched.npidle.Add(-1)
+ pp.limiterEvent.stop(limiterEventIdle, now)
+ }
+ return pp, now
+}
+
+// pidlegetSpinning tries to get a p from the _Pidle list, acquiring ownership.
+// This is called by spinning Ms (or callers that need a spinning M) that have
+// found work. If no P is available, this must be synchronized with non-spinning
+// Ms that may be preparing to drop their P without discovering this work.
+//
+// sched.lock must be held.
+//
+// May run during STW, so write barriers are not allowed.
+//
+//go:nowritebarrierrec
+func pidlegetSpinning(now int64) (*p, int64) {
+ assertLockHeld(&sched.lock)
+
+ pp, now := pidleget(now)
+ if pp == nil {
+ // See "Delicate dance" comment in findrunnable. We found work
+ // that we cannot take, we must synchronize with non-spinning
+ // Ms that may be preparing to drop their P.
+ sched.needspinning.Store(1)
+ return nil, now
+ }
+
+ return pp, now
+}
+
+// runqempty reports whether pp has no Gs on its local run queue.
+// It never returns true spuriously.
+func runqempty(pp *p) bool {
+	// Defend against a race where 1) pp has G1 in runnext but runqhead == runqtail,
+	// 2) runqput on pp kicks G1 to the runq, 3) runqget on pp empties runnext.
+ // Simply observing that runqhead == runqtail and then observing that runqnext == nil
+ // does not mean the queue is empty.
+ for {
+ head := atomic.Load(&pp.runqhead)
+ tail := atomic.Load(&pp.runqtail)
+ runnext := atomic.Loaduintptr((*uintptr)(unsafe.Pointer(&pp.runnext)))
+ if tail == atomic.Load(&pp.runqtail) {
+ return head == tail && runnext == 0
+ }
+ }
+}
+
+// To shake out latent assumptions about scheduling order,
+// we introduce some randomness into scheduling decisions
+// when running with the race detector.
+// The need for this was made obvious by changing the
+// (deterministic) scheduling order in Go 1.5 and breaking
+// many poorly-written tests.
+// With the randomness here, as long as the tests pass
+// consistently with -race, they shouldn't have latent scheduling
+// assumptions.
+const randomizeScheduler = raceenabled
+
+// runqput tries to put g on the local runnable queue.
+// If next is false, runqput adds g to the tail of the runnable queue.
+// If next is true, runqput puts g in the pp.runnext slot.
+// If the run queue is full, runqput puts g on the global queue.
+// Executed only by the owner P.
+func runqput(pp *p, gp *g, next bool) {
+ if randomizeScheduler && next && fastrandn(2) == 0 {
+ next = false
+ }
+
+ if next {
+ retryNext:
+ oldnext := pp.runnext
+ if !pp.runnext.cas(oldnext, guintptr(unsafe.Pointer(gp))) {
+ goto retryNext
+ }
+ if oldnext == 0 {
+ return
+ }
+ // Kick the old runnext out to the regular run queue.
+ gp = oldnext.ptr()
+ }
+
+retry:
+ h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with consumers
+ t := pp.runqtail
+ if t-h < uint32(len(pp.runq)) {
+ pp.runq[t%uint32(len(pp.runq))].set(gp)
+ atomic.StoreRel(&pp.runqtail, t+1) // store-release, makes the item available for consumption
+ return
+ }
+ if runqputslow(pp, gp, h, t) {
+ return
+ }
+	// The queue is no longer full, so the put above must succeed on retry.
+ goto retry
+}
+
+// Put g and a batch of work from local runnable queue on global queue.
+// Executed only by the owner P.
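+// For illustration, with the 256-entry per-P runq this moves a batch of 128
+// G's plus gp itself (129 in total) to the global queue.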
+func runqputslow(pp *p, gp *g, h, t uint32) bool {
+ var batch [len(pp.runq)/2 + 1]*g
+
+ // First, grab a batch from local queue.
+ n := t - h
+ n = n / 2
+ if n != uint32(len(pp.runq)/2) {
+ throw("runqputslow: queue is not full")
+ }
+ for i := uint32(0); i < n; i++ {
+ batch[i] = pp.runq[(h+i)%uint32(len(pp.runq))].ptr()
+ }
+ if !atomic.CasRel(&pp.runqhead, h, h+n) { // cas-release, commits consume
+ return false
+ }
+ batch[n] = gp
+
+ if randomizeScheduler {
+ for i := uint32(1); i <= n; i++ {
+ j := fastrandn(i + 1)
+ batch[i], batch[j] = batch[j], batch[i]
+ }
+ }
+
+ // Link the goroutines.
+ for i := uint32(0); i < n; i++ {
+ batch[i].schedlink.set(batch[i+1])
+ }
+ var q gQueue
+ q.head.set(batch[0])
+ q.tail.set(batch[n])
+
+ // Now put the batch on global queue.
+ lock(&sched.lock)
+ globrunqputbatch(&q, int32(n+1))
+ unlock(&sched.lock)
+ return true
+}
+
+// runqputbatch tries to put all the G's on q on the local runnable queue.
+// If the queue is full, they are put on the global queue; in that case
+// this will temporarily acquire the scheduler lock.
+// Executed only by the owner P.
+func runqputbatch(pp *p, q *gQueue, qsize int) {
+ h := atomic.LoadAcq(&pp.runqhead)
+ t := pp.runqtail
+ n := uint32(0)
+ for !q.empty() && t-h < uint32(len(pp.runq)) {
+ gp := q.pop()
+ pp.runq[t%uint32(len(pp.runq))].set(gp)
+ t++
+ n++
+ }
+ qsize -= int(n)
+
+ if randomizeScheduler {
+ off := func(o uint32) uint32 {
+ return (pp.runqtail + o) % uint32(len(pp.runq))
+ }
+ for i := uint32(1); i < n; i++ {
+ j := fastrandn(i + 1)
+ pp.runq[off(i)], pp.runq[off(j)] = pp.runq[off(j)], pp.runq[off(i)]
+ }
+ }
+
+ atomic.StoreRel(&pp.runqtail, t)
+ if !q.empty() {
+ lock(&sched.lock)
+ globrunqputbatch(q, int32(qsize))
+ unlock(&sched.lock)
+ }
+}
+
+// Get g from local runnable queue.
+// If inheritTime is true, gp should inherit the remaining time in the
+// current time slice. Otherwise, it should start a new time slice.
+// Executed only by the owner P.
+func runqget(pp *p) (gp *g, inheritTime bool) {
+ // If there's a runnext, it's the next G to run.
+ next := pp.runnext
+ // If the runnext is non-0 and the CAS fails, it could only have been stolen by another P,
+ // because other Ps can race to set runnext to 0, but only the current P can set it to non-0.
+ // Hence, there's no need to retry this CAS if it fails.
+ if next != 0 && pp.runnext.cas(next, 0) {
+ return next.ptr(), true
+ }
+
+ for {
+ h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with other consumers
+ t := pp.runqtail
+ if t == h {
+ return nil, false
+ }
+ gp := pp.runq[h%uint32(len(pp.runq))].ptr()
+ if atomic.CasRel(&pp.runqhead, h, h+1) { // cas-release, commits consume
+ return gp, false
+ }
+ }
+}
+
+// runqdrain drains the local runnable queue of pp and returns all goroutines in it.
+// Executed only by the owner P.
+func runqdrain(pp *p) (drainQ gQueue, n uint32) {
+ oldNext := pp.runnext
+ if oldNext != 0 && pp.runnext.cas(oldNext, 0) {
+ drainQ.pushBack(oldNext.ptr())
+ n++
+ }
+
+retry:
+ h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with other consumers
+ t := pp.runqtail
+ qn := t - h
+ if qn == 0 {
+ return
+ }
+ if qn > uint32(len(pp.runq)) { // read inconsistent h and t
+ goto retry
+ }
+
+ if !atomic.CasRel(&pp.runqhead, h, h+qn) { // cas-release, commits consume
+ goto retry
+ }
+
+	// We advance the head pointer before copying the grabbed G's into drainQ
+	// (rather than the other way around) so that we do not corrupt the G's
+	// while runqdrain() and runqsteal() run in parallel.
+	// Only after the new head has been published do we have full ownership of
+	// the grabbed G's, so only then is it safe to update gp.schedlink; other
+	// P's can no longer reach these G's in the local runnable queue to steal them.
+	// See https://groups.google.com/g/golang-dev/c/0pTKxEKhHSc/m/6Q85QjdVBQAJ for more details.
+ for i := uint32(0); i < qn; i++ {
+ gp := pp.runq[(h+i)%uint32(len(pp.runq))].ptr()
+ drainQ.pushBack(gp)
+ n++
+ }
+ return
+}
+
+// Grabs a batch of goroutines from pp's runnable queue into batch.
+// Batch is a ring buffer starting at batchHead.
+// Returns number of grabbed goroutines.
+// Can be executed by any P.
+func runqgrab(pp *p, batch *[256]guintptr, batchHead uint32, stealRunNextG bool) uint32 {
+ for {
+ h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with other consumers
+ t := atomic.LoadAcq(&pp.runqtail) // load-acquire, synchronize with the producer
+ n := t - h
+ n = n - n/2
+ if n == 0 {
+ if stealRunNextG {
+ // Try to steal from pp.runnext.
+ if next := pp.runnext; next != 0 {
+ if pp.status == _Prunning {
+ // Sleep to ensure that pp isn't about to run the g
+ // we are about to steal.
+ // The important use case here is when the g running
+ // on pp ready()s another g and then almost
+ // immediately blocks. Instead of stealing runnext
+ // in this window, back off to give pp a chance to
+ // schedule runnext. This will avoid thrashing gs
+ // between different Ps.
+ // A sync chan send/recv takes ~50ns as of time of
+ // writing, so 3us gives ~50x overshoot.
+ if GOOS != "windows" && GOOS != "openbsd" && GOOS != "netbsd" {
+ usleep(3)
+ } else {
+ // On some platforms system timer granularity is
+ // 1-15ms, which is way too much for this
+ // optimization. So just yield.
+ osyield()
+ }
+ }
+ if !pp.runnext.cas(next, 0) {
+ continue
+ }
+ batch[batchHead%uint32(len(batch))] = next
+ return 1
+ }
+ }
+ return 0
+ }
+ if n > uint32(len(pp.runq)/2) { // read inconsistent h and t
+ continue
+ }
+ for i := uint32(0); i < n; i++ {
+ g := pp.runq[(h+i)%uint32(len(pp.runq))]
+ batch[(batchHead+i)%uint32(len(batch))] = g
+ }
+ if atomic.CasRel(&pp.runqhead, h, h+n) { // cas-release, commits consume
+ return n
+ }
+ }
+}
+
+// Steal half of the elements from the local runnable queue of p2
+// and put them onto the local runnable queue of pp.
+// Returns one of the stolen elements (or nil if the steal failed).
+func runqsteal(pp, p2 *p, stealRunNextG bool) *g {
+ t := pp.runqtail
+ n := runqgrab(p2, &pp.runq, t, stealRunNextG)
+ if n == 0 {
+ return nil
+ }
+ n--
+ gp := pp.runq[(t+n)%uint32(len(pp.runq))].ptr()
+ if n == 0 {
+ return gp
+ }
+ h := atomic.LoadAcq(&pp.runqhead) // load-acquire, synchronize with consumers
+ if t-h+n >= uint32(len(pp.runq)) {
+ throw("runqsteal: runq overflow")
+ }
+ atomic.StoreRel(&pp.runqtail, t+n) // store-release, makes the item available for consumption
+ return gp
+}
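
The runq helpers above all share one protocol: the owner P is the sole producer (it publishes at runqtail with a store-release), while any P may consume from runqhead with a load-acquire followed by a CAS that commits the consume. Below is a minimal standalone sketch of that consume loop using sync/atomic (which gives stronger ordering than the runtime's acquire/release primitives, but shows the same shape); the ring type and its methods are illustrative only, not runtime APIs.

package main

import (
	"fmt"
	"sync/atomic"
)

// ring is a toy analogue of a P's local run queue: a fixed-size ring with a
// single producer (the owner) and any number of consumers (stealers).
type ring struct {
	head atomic.Uint32 // advanced by any consumer via CAS
	tail atomic.Uint32 // advanced only by the owner
	buf  [8]int
}

// put publishes v at the tail; only the owner may call it.
func (r *ring) put(v int) bool {
	h := r.head.Load()
	t := r.tail.Load()
	if t-h >= uint32(len(r.buf)) {
		return false // full
	}
	r.buf[t%uint32(len(r.buf))] = v
	r.tail.Store(t + 1) // publish the slot to consumers
	return true
}

// get consumes one element from the head; any goroutine may call it.
func (r *ring) get() (int, bool) {
	for {
		h := r.head.Load()
		t := r.tail.Load()
		if t == h {
			return 0, false // empty
		}
		v := r.buf[h%uint32(len(r.buf))]
		if r.head.CompareAndSwap(h, h+1) { // commit the consume
			return v, true
		}
		// Lost a race with another consumer; reload and retry.
	}
}

func main() {
	var r ring
	for i := 1; i <= 3; i++ {
		r.put(i)
	}
	for v, ok := r.get(); ok; v, ok = r.get() {
		fmt.Println(v)
	}
}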
+
+// A gQueue is a deque of Gs linked through g.schedlink. A G can only
+// be on one gQueue or gList at a time.
+type gQueue struct {
+ head guintptr
+ tail guintptr
+}
+
+// empty reports whether q is empty.
+func (q *gQueue) empty() bool {
+ return q.head == 0
+}
+
+// push adds gp to the head of q.
+func (q *gQueue) push(gp *g) {
+ gp.schedlink = q.head
+ q.head.set(gp)
+ if q.tail == 0 {
+ q.tail.set(gp)
+ }
+}
+
+// pushBack adds gp to the tail of q.
+func (q *gQueue) pushBack(gp *g) {
+ gp.schedlink = 0
+ if q.tail != 0 {
+ q.tail.ptr().schedlink.set(gp)
+ } else {
+ q.head.set(gp)
+ }
+ q.tail.set(gp)
+}
+
+// pushBackAll adds all Gs in q2 to the tail of q. After this q2 must
+// not be used.
+func (q *gQueue) pushBackAll(q2 gQueue) {
+ if q2.tail == 0 {
+ return
+ }
+ q2.tail.ptr().schedlink = 0
+ if q.tail != 0 {
+ q.tail.ptr().schedlink = q2.head
+ } else {
+ q.head = q2.head
+ }
+ q.tail = q2.tail
+}
+
+// pop removes and returns the head of queue q. It returns nil if
+// q is empty.
+func (q *gQueue) pop() *g {
+ gp := q.head.ptr()
+ if gp != nil {
+ q.head = gp.schedlink
+ if q.head == 0 {
+ q.tail = 0
+ }
+ }
+ return gp
+}
+
+// popList takes all Gs in q and returns them as a gList.
+func (q *gQueue) popList() gList {
+ stack := gList{q.head}
+ *q = gQueue{}
+ return stack
+}
+
+// A gList is a list of Gs linked through g.schedlink. A G can only be
+// on one gQueue or gList at a time.
+type gList struct {
+ head guintptr
+}
+
+// empty reports whether l is empty.
+func (l *gList) empty() bool {
+ return l.head == 0
+}
+
+// push adds gp to the head of l.
+func (l *gList) push(gp *g) {
+ gp.schedlink = l.head
+ l.head.set(gp)
+}
+
+// pushAll prepends all Gs in q to l.
+func (l *gList) pushAll(q gQueue) {
+ if !q.empty() {
+ q.tail.ptr().schedlink = l.head
+ l.head = q.head
+ }
+}
+
+// pop removes and returns the head of l. If l is empty, it returns nil.
+func (l *gList) pop() *g {
+ gp := l.head.ptr()
+ if gp != nil {
+ l.head = gp.schedlink
+ }
+ return gp
+}
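
gQueue and gList are intrusive lists: the link lives inside the g itself (g.schedlink), which is why a G can be on at most one gQueue or gList at a time and why push/pop never allocate. A small standalone analogue, using a made-up node type in place of g (illustrative only):

package main

import "fmt"

// node plays the role of a g: the link field lives inside the element itself,
// so queue operations never allocate (an "intrusive" list, as with g.schedlink).
type node struct {
	id   int
	next *node
}

// fifo mirrors gQueue's pushBack/pop pair: insert at the tail, remove from the head.
type fifo struct {
	head, tail *node
}

func (q *fifo) pushBack(n *node) {
	n.next = nil
	if q.tail != nil {
		q.tail.next = n
	} else {
		q.head = n
	}
	q.tail = n
}

func (q *fifo) pop() *node {
	n := q.head
	if n != nil {
		q.head = n.next
		if q.head == nil {
			q.tail = nil
		}
	}
	return n
}

func main() {
	var q fifo
	for i := 1; i <= 3; i++ {
		q.pushBack(&node{id: i})
	}
	for n := q.pop(); n != nil; n = q.pop() {
		fmt.Println(n.id) // 1 2 3
	}
}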
+
+//go:linkname setMaxThreads runtime/debug.setMaxThreads
+func setMaxThreads(in int) (out int) {
+ lock(&sched.lock)
+ out = int(sched.maxmcount)
+ if in > 0x7fffffff { // MaxInt32
+ sched.maxmcount = 0x7fffffff
+ } else {
+ sched.maxmcount = int32(in)
+ }
+ checkmcount()
+ unlock(&sched.lock)
+ return
+}
+
+//go:nosplit
+func procPin() int {
+ gp := getg()
+ mp := gp.m
+
+ mp.locks++
+ return int(mp.p.ptr().id)
+}
+
+//go:nosplit
+func procUnpin() {
+ gp := getg()
+ gp.m.locks--
+}
+
+//go:linkname sync_runtime_procPin sync.runtime_procPin
+//go:nosplit
+func sync_runtime_procPin() int {
+ return procPin()
+}
+
+//go:linkname sync_runtime_procUnpin sync.runtime_procUnpin
+//go:nosplit
+func sync_runtime_procUnpin() {
+ procUnpin()
+}
+
+//go:linkname sync_atomic_runtime_procPin sync/atomic.runtime_procPin
+//go:nosplit
+func sync_atomic_runtime_procPin() int {
+ return procPin()
+}
+
+//go:linkname sync_atomic_runtime_procUnpin sync/atomic.runtime_procUnpin
+//go:nosplit
+func sync_atomic_runtime_procUnpin() {
+ procUnpin()
+}
+
+// Active spinning for sync.Mutex.
+//
+//go:linkname sync_runtime_canSpin sync.runtime_canSpin
+//go:nosplit
+func sync_runtime_canSpin(i int) bool {
+ // sync.Mutex is cooperative, so we are conservative with spinning.
+ // Spin only a few times, and only if we are running on a multicore machine,
+ // GOMAXPROCS>1, there is at least one other running P, and the local runq is empty.
+ // As opposed to runtime mutexes we don't do passive spinning here,
+ // because there can be work on the global runq or on other Ps.
+ if i >= active_spin || ncpu <= 1 || gomaxprocs <= sched.npidle.Load()+sched.nmspinning.Load()+1 {
+ return false
+ }
+ if p := getg().m.p.ptr(); !runqempty(p) {
+ return false
+ }
+ return true
+}
+
+//go:linkname sync_runtime_doSpin sync.runtime_doSpin
+//go:nosplit
+func sync_runtime_doSpin() {
+ procyield(active_spin_cnt)
+}
+
+var stealOrder randomOrder
+
+// randomOrder/randomEnum are helper types for randomized work stealing.
+// They allow enumerating all Ps in different pseudo-random orders without repetitions.
+// The algorithm is based on the fact that if we have X such that X and GOMAXPROCS
+// are coprime, then the sequence (i + X) % GOMAXPROCS gives the required enumeration.
+type randomOrder struct {
+ count uint32
+ coprimes []uint32
+}
+
+type randomEnum struct {
+ i uint32
+ count uint32
+ pos uint32
+ inc uint32
+}
+
+func (ord *randomOrder) reset(count uint32) {
+ ord.count = count
+ ord.coprimes = ord.coprimes[:0]
+ for i := uint32(1); i <= count; i++ {
+ if gcd(i, count) == 1 {
+ ord.coprimes = append(ord.coprimes, i)
+ }
+ }
+}
+
+func (ord *randomOrder) start(i uint32) randomEnum {
+ return randomEnum{
+ count: ord.count,
+ pos: i % ord.count,
+ inc: ord.coprimes[i/ord.count%uint32(len(ord.coprimes))],
+ }
+}
+
+func (enum *randomEnum) done() bool {
+ return enum.i == enum.count
+}
+
+func (enum *randomEnum) next() {
+ enum.i++
+ enum.pos = (enum.pos + enum.inc) % enum.count
+}
+
+func (enum *randomEnum) position() uint32 {
+ return enum.pos
+}
+
+func gcd(a, b uint32) uint32 {
+ for b != 0 {
+ a, b = b, a%b
+ }
+ return a
+}
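
The coprime property that randomOrder relies on is easy to check directly: stepping by inc modulo count visits every index in [0, count) exactly once iff gcd(inc, count) == 1. A quick standalone check (illustrative only, not runtime code; gcd is copied from above):

package main

import "fmt"

func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// visitsAll reports whether stepping by inc (mod count) from pos
// enumerates every value in [0, count) exactly once.
func visitsAll(count, pos, inc uint32) bool {
	seen := make([]bool, count)
	for i := uint32(0); i < count; i++ {
		if seen[pos] {
			return false
		}
		seen[pos] = true
		pos = (pos + inc) % count
	}
	return true
}

func main() {
	const count = 12
	for inc := uint32(1); inc <= count; inc++ {
		fmt.Printf("inc=%2d coprime=%5v visitsAll=%v\n",
			inc, gcd(inc, count) == 1, visitsAll(count, 0, inc))
	}
}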
+
+// An initTask represents the set of initializations that need to be done for a package.
+// Keep in sync with ../../test/noinit.go:initTask
+type initTask struct {
+ state uint32 // 0 = uninitialized, 1 = in progress, 2 = done
+ nfns uint32
+ // followed by nfns pcs, uintptr sized, one per init function to run
+}
+
+// inittrace stores statistics for init functions which are
+// updated by malloc and newproc when active is true.
+var inittrace tracestat
+
+type tracestat struct {
+ active bool // init tracing activation status
+ id uint64 // init goroutine id
+ allocs uint64 // heap allocations
+ bytes uint64 // heap allocated bytes
+}
+
+func doInit(ts []*initTask) {
+ for _, t := range ts {
+ doInit1(t)
+ }
+}
+
+func doInit1(t *initTask) {
+ switch t.state {
+ case 2: // fully initialized
+ return
+ case 1: // initialization in progress
+ throw("recursive call during initialization - linker skew")
+ default: // not initialized yet
+ t.state = 1 // initialization in progress
+
+ var (
+ start int64
+ before tracestat
+ )
+
+ if inittrace.active {
+ start = nanotime()
+ // Load stats non-atomically since inittrace is updated only by this init goroutine.
+ before = inittrace
+ }
+
+ if t.nfns == 0 {
+ // We should have pruned all of these in the linker.
+ throw("inittask with no functions")
+ }
+
+ firstFunc := add(unsafe.Pointer(t), 8)
+ for i := uint32(0); i < t.nfns; i++ {
+ p := add(firstFunc, uintptr(i)*goarch.PtrSize)
+ f := *(*func())(unsafe.Pointer(&p))
+ f()
+ }
+
+ if inittrace.active {
+ end := nanotime()
+ // Load stats non-atomically since inittrace is updated only by this init goroutine.
+ after := inittrace
+
+ f := *(*func())(unsafe.Pointer(&firstFunc))
+ pkg := funcpkgpath(findfunc(abi.FuncPCABIInternal(f)))
+
+ var sbuf [24]byte
+ print("init ", pkg, " @")
+ print(string(fmtNSAsMS(sbuf[:], uint64(start-runtimeInitTime))), " ms, ")
+ print(string(fmtNSAsMS(sbuf[:], uint64(end-start))), " ms clock, ")
+ print(string(itoa(sbuf[:], after.bytes-before.bytes)), " bytes, ")
+ print(string(itoa(sbuf[:], after.allocs-before.allocs)), " allocs")
+ print("\n")
+ }
+
+ t.state = 2 // initialization done
+ }
+}
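
doInit1 relies on the linker laying out each initTask as an 8-byte header (state, nfns) followed by nfns pointer-sized entries. The sketch below mirrors that walk on a hand-built mock struct; to stay within portable Go it stores ordinary func() values in the trailing slots rather than raw PCs, so the dereference reads the slot's contents directly instead of using the runtime's funcval trick. The mockInitTask type is invented for illustration.

package main

import (
	"fmt"
	"unsafe"
)

// mockInitTask mirrors the initTask layout for illustration only:
// an 8-byte header (state, nfns) followed by nfns pointer-sized entries.
type mockInitTask struct {
	state uint32
	nfns  uint32
	fns   [3]func() // stands in for "nfns pcs, one per init function"
}

func main() {
	t := &mockInitTask{nfns: 3}
	for i := range t.fns {
		i := i
		t.fns[i] = func() { fmt.Println("init fn", i) }
	}

	// Walk the entries with the same arithmetic doInit1 uses:
	// skip the 8-byte header, then step by one pointer per function.
	first := unsafe.Add(unsafe.Pointer(t), 8)
	for i := uint32(0); i < t.nfns; i++ {
		p := unsafe.Add(first, uintptr(i)*unsafe.Sizeof(uintptr(0)))
		f := *(*func())(p)
		f()
	}
	t.state = 2 // initialization done
}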
diff --git a/src/runtime/proc_runtime_test.go b/src/runtime/proc_runtime_test.go
new file mode 100644
index 0000000..90aed83
--- /dev/null
+++ b/src/runtime/proc_runtime_test.go
@@ -0,0 +1,50 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Proc unit tests. In runtime package so can use runtime guts.
+
+package runtime
+
+func RunStealOrderTest() {
+ var ord randomOrder
+ for procs := 1; procs <= 64; procs++ {
+ ord.reset(uint32(procs))
+ if procs >= 3 && len(ord.coprimes) < 2 {
+ panic("too few coprimes")
+ }
+ for co := 0; co < len(ord.coprimes); co++ {
+ enum := ord.start(uint32(co))
+ checked := make([]bool, procs)
+ for p := 0; p < procs; p++ {
+ x := enum.position()
+ if checked[x] {
+ println("procs:", procs, "inc:", enum.inc)
+ panic("duplicate during enumeration")
+ }
+ checked[x] = true
+ enum.next()
+ }
+ if !enum.done() {
+ panic("not done")
+ }
+ }
+ }
+ // Make sure that different arguments to ord.start don't generate the
+ // same pos+inc twice.
+ for procs := 2; procs <= 64; procs++ {
+ ord.reset(uint32(procs))
+ checked := make([]bool, procs*procs)
+ // We want at least procs*len(ord.coprimes) different pos+inc values
+ // before we start repeating.
+ for i := 0; i < procs*len(ord.coprimes); i++ {
+ enum := ord.start(uint32(i))
+ j := enum.pos*uint32(procs) + enum.inc
+ if checked[j] {
+ println("procs:", procs, "pos:", enum.pos, "inc:", enum.inc)
+ panic("duplicate pos+inc during enumeration")
+ }
+ checked[j] = true
+ }
+ }
+}
diff --git a/src/runtime/proc_test.go b/src/runtime/proc_test.go
new file mode 100644
index 0000000..67eadea
--- /dev/null
+++ b/src/runtime/proc_test.go
@@ -0,0 +1,1160 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/race"
+ "internal/testenv"
+ "math"
+ "net"
+ "runtime"
+ "runtime/debug"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "time"
+)
+
+var stop = make(chan bool, 1)
+
+func perpetuumMobile() {
+ select {
+ case <-stop:
+ default:
+ go perpetuumMobile()
+ }
+}
+
+func TestStopTheWorldDeadlock(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+ if testing.Short() {
+ t.Skip("skipping during short test")
+ }
+ maxprocs := runtime.GOMAXPROCS(3)
+ compl := make(chan bool, 2)
+ go func() {
+ for i := 0; i != 1000; i += 1 {
+ runtime.GC()
+ }
+ compl <- true
+ }()
+ go func() {
+ for i := 0; i != 1000; i += 1 {
+ runtime.GOMAXPROCS(3)
+ }
+ compl <- true
+ }()
+ go perpetuumMobile()
+ <-compl
+ <-compl
+ stop <- true
+ runtime.GOMAXPROCS(maxprocs)
+}
+
+func TestYieldProgress(t *testing.T) {
+ testYieldProgress(false)
+}
+
+func TestYieldLockedProgress(t *testing.T) {
+ testYieldProgress(true)
+}
+
+func testYieldProgress(locked bool) {
+ c := make(chan bool)
+ cack := make(chan bool)
+ go func() {
+ if locked {
+ runtime.LockOSThread()
+ }
+ for {
+ select {
+ case <-c:
+ cack <- true
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+ time.Sleep(10 * time.Millisecond)
+ c <- true
+ <-cack
+}
+
+func TestYieldLocked(t *testing.T) {
+ const N = 10
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ for i := 0; i < N; i++ {
+ runtime.Gosched()
+ time.Sleep(time.Millisecond)
+ }
+ c <- true
+ // runtime.UnlockOSThread() is deliberately omitted
+ }()
+ <-c
+}
+
+func TestGoroutineParallelism(t *testing.T) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long, too easy to deadlock, etc.
+ t.Skip("skipping on uniprocessor")
+ }
+ P := 4
+ N := 10
+ if testing.Short() {
+ P = 3
+ N = 3
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ // SetGCPercent waits until the mark phase is over, but the runtime
+ // also preempts at the start of the sweep phase, so make sure that's
+ // done too. See #45867.
+ runtime.GC()
+ for try := 0; try < N; try++ {
+ done := make(chan bool)
+ x := uint32(0)
+ for p := 0; p < P; p++ {
+ // Test that all P goroutines are scheduled at the same time
+ go func(p int) {
+ for i := 0; i < 3; i++ {
+ expected := uint32(P*i + p)
+ for atomic.LoadUint32(&x) != expected {
+ }
+ atomic.StoreUint32(&x, expected+1)
+ }
+ done <- true
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+ }
+}
+
+// Test that all runnable goroutines are scheduled at the same time.
+func TestGoroutineParallelism2(t *testing.T) {
+ //testGoroutineParallelism2(t, false, false)
+ testGoroutineParallelism2(t, true, false)
+ testGoroutineParallelism2(t, false, true)
+ testGoroutineParallelism2(t, true, true)
+}
+
+func testGoroutineParallelism2(t *testing.T, load, netpoll bool) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long, too easy to deadlock, etc.
+ t.Skip("skipping on uniprocessor")
+ }
+ P := 4
+ N := 10
+ if testing.Short() {
+ N = 3
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ // SetGCPercent waits until the mark phase is over, but the runtime
+ // also preempts at the start of the sweep phase, so make sure that's
+ // done too. See #45867.
+ runtime.GC()
+ for try := 0; try < N; try++ {
+ if load {
+ // Create P goroutines and wait until they all run.
+ // When we run the actual test below, worker threads
+ // running the goroutines will start parking.
+ done := make(chan bool)
+ x := uint32(0)
+ for p := 0; p < P; p++ {
+ go func() {
+ if atomic.AddUint32(&x, 1) == uint32(P) {
+ done <- true
+ return
+ }
+ for atomic.LoadUint32(&x) != uint32(P) {
+ }
+ }()
+ }
+ <-done
+ }
+ if netpoll {
+ // Enable the netpoller, which affects scheduler behavior.
+ laddr := "localhost:0"
+ if runtime.GOOS == "android" {
+ // On some Android devices, there are no records for localhost,
+ // see https://golang.org/issues/14486.
+ // Don't use 127.0.0.1 for every case, it won't work on IPv6-only systems.
+ laddr = "127.0.0.1:0"
+ }
+ ln, err := net.Listen("tcp", laddr)
+ if err == nil {
+ defer ln.Close() // yup, defer in a loop
+ }
+ }
+ done := make(chan bool)
+ x := uint32(0)
+ // Spawn P goroutines in a nested fashion just to differ from TestGoroutineParallelism.
+ for p := 0; p < P/2; p++ {
+ go func(p int) {
+ for p2 := 0; p2 < 2; p2++ {
+ go func(p2 int) {
+ for i := 0; i < 3; i++ {
+ expected := uint32(P*i + p*2 + p2)
+ for atomic.LoadUint32(&x) != expected {
+ }
+ atomic.StoreUint32(&x, expected+1)
+ }
+ done <- true
+ }(p2)
+ }
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+ }
+}
+
+func TestBlockLocked(t *testing.T) {
+ const N = 10
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ for i := 0; i < N; i++ {
+ c <- true
+ }
+ runtime.UnlockOSThread()
+ }()
+ for i := 0; i < N; i++ {
+ <-c
+ }
+}
+
+func TestTimerFairness(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ done := make(chan bool)
+ c := make(chan bool)
+ for i := 0; i < 2; i++ {
+ go func() {
+ for {
+ select {
+ case c <- true:
+ case <-done:
+ return
+ }
+ }
+ }()
+ }
+
+ timer := time.After(20 * time.Millisecond)
+ for {
+ select {
+ case <-c:
+ case <-timer:
+ close(done)
+ return
+ }
+ }
+}
+
+func TestTimerFairness2(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ done := make(chan bool)
+ c := make(chan bool)
+ for i := 0; i < 2; i++ {
+ go func() {
+ timer := time.After(20 * time.Millisecond)
+ var buf [1]byte
+ for {
+ syscall.Read(0, buf[0:0])
+ select {
+ case c <- true:
+ case <-c:
+ case <-timer:
+ done <- true
+ return
+ }
+ }
+ }()
+ }
+ <-done
+ <-done
+}
+
+// The function is used to test preemption at split stack checks.
+// Declaring a var avoids inlining at the call site.
+var preempt = func() int {
+ var a [128]int
+ sum := 0
+ for _, v := range a {
+ sum += v
+ }
+ return sum
+}
+
+func TestPreemption(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ // Test that goroutines are preempted at function calls.
+ N := 5
+ if testing.Short() {
+ N = 2
+ }
+ c := make(chan bool)
+ var x uint32
+ for g := 0; g < 2; g++ {
+ go func(g int) {
+ for i := 0; i < N; i++ {
+ for atomic.LoadUint32(&x) != uint32(g) {
+ preempt()
+ }
+ atomic.StoreUint32(&x, uint32(1-g))
+ }
+ c <- true
+ }(g)
+ }
+ <-c
+ <-c
+}
+
+func TestPreemptionGC(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ // Test that pending GC preempts running goroutines.
+ P := 5
+ N := 10
+ if testing.Short() {
+ P = 3
+ N = 2
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(P + 1))
+ var stop uint32
+ for i := 0; i < P; i++ {
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ preempt()
+ }
+ }()
+ }
+ for i := 0; i < N; i++ {
+ runtime.Gosched()
+ runtime.GC()
+ }
+ atomic.StoreUint32(&stop, 1)
+}
+
+func TestAsyncPreempt(t *testing.T) {
+ if !runtime.PreemptMSupported {
+ t.Skip("asynchronous preemption not supported on this platform")
+ }
+ output := runTestProg(t, "testprog", "AsyncPreempt")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestGCFairness(t *testing.T) {
+ output := runTestProg(t, "testprog", "GCFairness")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestGCFairness2(t *testing.T) {
+ output := runTestProg(t, "testprog", "GCFairness2")
+ want := "OK\n"
+ if output != want {
+ t.Fatalf("want %s, got %s\n", want, output)
+ }
+}
+
+func TestNumGoroutine(t *testing.T) {
+ output := runTestProg(t, "testprog", "NumGoroutine")
+ want := "1\n"
+ if output != want {
+ t.Fatalf("want %q, got %q", want, output)
+ }
+
+ buf := make([]byte, 1<<20)
+
+ // Try up to 10 times for a match before giving up.
+ // This is a fundamentally racy check but it's important
+ // to notice if NumGoroutine and Stack are _always_ out of sync.
+ for i := 0; ; i++ {
+ // Give goroutines about to exit a chance to exit.
+ // The NumGoroutine and Stack below need to see
+ // the same state of the world, so anything we can do
+ // to keep it quiet is good.
+ runtime.Gosched()
+
+ n := runtime.NumGoroutine()
+ buf = buf[:runtime.Stack(buf, true)]
+
+ // To avoid double-counting "goroutine" in "goroutine $m [running]:"
+ // and "created by $func in goroutine $n", remove the latter
+ output := strings.ReplaceAll(string(buf), "in goroutine", "")
+ nstk := strings.Count(output, "goroutine ")
+ if n == nstk {
+ break
+ }
+ if i >= 10 {
+ t.Fatalf("NumGoroutine=%d, but found %d goroutines in stack dump: %s", n, nstk, buf)
+ }
+ }
+}
+
+func TestPingPongHog(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+ if race.Enabled {
+ // The race detector randomizes the scheduler,
+ // which causes this test to fail (#38266).
+ t.Skip("skipping in -race mode")
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+ done := make(chan bool)
+ hogChan, lightChan := make(chan bool), make(chan bool)
+ hogCount, lightCount := 0, 0
+
+ run := func(limit int, counter *int, wake chan bool) {
+ for {
+ select {
+ case <-done:
+ return
+
+ case <-wake:
+ for i := 0; i < limit; i++ {
+ *counter++
+ }
+ wake <- true
+ }
+ }
+ }
+
+ // Start two co-scheduled hog goroutines.
+ for i := 0; i < 2; i++ {
+ go run(1e6, &hogCount, hogChan)
+ }
+
+ // Start two co-scheduled light goroutines.
+ for i := 0; i < 2; i++ {
+ go run(1e3, &lightCount, lightChan)
+ }
+
+ // Start goroutine pairs and wait for a few preemption rounds.
+ hogChan <- true
+ lightChan <- true
+ time.Sleep(100 * time.Millisecond)
+ close(done)
+ <-hogChan
+ <-lightChan
+
+ // Check that hogCount and lightCount are within a factor of
+ // 20, which indicates that both pairs of goroutines handed off
+ // the P within a time-slice to their buddy. We can use a
+ // fairly large factor here to make this robust: if the
+ // scheduler isn't working right, the gap should be ~1000X
+ // (was 5, increased to 20, see issue 52207).
+ const factor = 20
+ if hogCount/factor > lightCount || lightCount/factor > hogCount {
+ t.Fatalf("want hogCount/lightCount in [%v, %v]; got %d/%d = %g", 1.0/factor, factor, hogCount, lightCount, float64(hogCount)/float64(lightCount))
+ }
+}
+
+func BenchmarkPingPongHog(b *testing.B) {
+ if b.N == 0 {
+ return
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ // Create a CPU hog
+ stop, done := make(chan bool), make(chan bool)
+ go func() {
+ for {
+ select {
+ case <-stop:
+ done <- true
+ return
+ default:
+ }
+ }
+ }()
+
+ // Ping-pong b.N times
+ ping, pong := make(chan bool), make(chan bool)
+ go func() {
+ for j := 0; j < b.N; j++ {
+ pong <- <-ping
+ }
+ close(stop)
+ done <- true
+ }()
+ go func() {
+ for i := 0; i < b.N; i++ {
+ ping <- <-pong
+ }
+ done <- true
+ }()
+ b.ResetTimer()
+ ping <- true // Start ping-pong
+ <-stop
+ b.StopTimer()
+ <-ping // Let last ponger exit
+ <-done // Make sure goroutines exit
+ <-done
+ <-done
+}
+
+var padData [128]uint64
+
+func stackGrowthRecursive(i int) {
+ var pad [128]uint64
+ pad = padData
+ for j := range pad {
+ if pad[j] != 0 {
+ return
+ }
+ }
+ if i != 0 {
+ stackGrowthRecursive(i - 1)
+ }
+}
+
+func TestPreemptSplitBig(t *testing.T) {
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+ stop := make(chan int)
+ go big(stop)
+ for i := 0; i < 3; i++ {
+ time.Sleep(10 * time.Microsecond) // let big start running
+ runtime.GC()
+ }
+ close(stop)
+}
+
+func big(stop chan int) int {
+ n := 0
+ for {
+ // delay so that gc is sure to have asked for a preemption
+ for i := 0; i < 1e9; i++ {
+ n++
+ }
+
+ // call bigframe, which used to miss the preemption in its prologue.
+ bigframe(stop)
+
+ // check if we've been asked to stop.
+ select {
+ case <-stop:
+ return n
+ default:
+ }
+ }
+}
+
+func bigframe(stop chan int) int {
+ // not splitting the stack will overflow.
+ // small will notice that it needs a stack split and will
+ // catch the overflow.
+ var x [8192]byte
+ return small(stop, &x)
+}
+
+func small(stop chan int, x *[8192]byte) int {
+ for i := range x {
+ x[i] = byte(i)
+ }
+ sum := 0
+ for i := range x {
+ sum += int(x[i])
+ }
+
+ // keep small from being a leaf function, which might
+ // make it not do any stack check at all.
+ nonleaf(stop)
+
+ return sum
+}
+
+func nonleaf(stop chan int) bool {
+ // do something that won't be inlined:
+ select {
+ case <-stop:
+ return true
+ default:
+ return false
+ }
+}
+
+func TestSchedLocalQueue(t *testing.T) {
+ runtime.RunSchedLocalQueueTest()
+}
+
+func TestSchedLocalQueueSteal(t *testing.T) {
+ runtime.RunSchedLocalQueueStealTest()
+}
+
+func TestSchedLocalQueueEmpty(t *testing.T) {
+ if runtime.NumCPU() == 1 {
+ // Takes too long and does not trigger the race.
+ t.Skip("skipping on uniprocessor")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted during spin wait.
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ // SetGCPercent waits until the mark phase is over, but the runtime
+ // also preempts at the start of the sweep phase, so make sure that's
+ // done too. See #45867.
+ runtime.GC()
+
+ iters := int(1e5)
+ if testing.Short() {
+ iters = 1e2
+ }
+ runtime.RunSchedLocalQueueEmptyTest(iters)
+}
+
+func benchmarkStackGrowth(b *testing.B, rec int) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ stackGrowthRecursive(rec)
+ }
+ })
+}
+
+func BenchmarkStackGrowth(b *testing.B) {
+ benchmarkStackGrowth(b, 10)
+}
+
+func BenchmarkStackGrowthDeep(b *testing.B) {
+ benchmarkStackGrowth(b, 1024)
+}
+
+func BenchmarkCreateGoroutines(b *testing.B) {
+ benchmarkCreateGoroutines(b, 1)
+}
+
+func BenchmarkCreateGoroutinesParallel(b *testing.B) {
+ benchmarkCreateGoroutines(b, runtime.GOMAXPROCS(-1))
+}
+
+func benchmarkCreateGoroutines(b *testing.B, procs int) {
+ c := make(chan bool)
+ var f func(n int)
+ f = func(n int) {
+ if n == 0 {
+ c <- true
+ return
+ }
+ go f(n - 1)
+ }
+ for i := 0; i < procs; i++ {
+ go f(b.N / procs)
+ }
+ for i := 0; i < procs; i++ {
+ <-c
+ }
+}
+
+func BenchmarkCreateGoroutinesCapture(b *testing.B) {
+ b.ReportAllocs()
+ for i := 0; i < b.N; i++ {
+ const N = 4
+ var wg sync.WaitGroup
+ wg.Add(N)
+ for i := 0; i < N; i++ {
+ i := i
+ go func() {
+ if i >= N {
+ b.Logf("bad") // just to capture b
+ }
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+ }
+}
+
+// warmupScheduler ensures the scheduler has at least targetThreadCount threads
+// in its thread pool.
+func warmupScheduler(targetThreadCount int) {
+ var wg sync.WaitGroup
+ var count int32
+ for i := 0; i < targetThreadCount; i++ {
+ wg.Add(1)
+ go func() {
+ atomic.AddInt32(&count, 1)
+ for atomic.LoadInt32(&count) < int32(targetThreadCount) {
+ // spin until all threads started
+ }
+
+ // spin a bit more to ensure they are all running on separate CPUs.
+ doWork(time.Millisecond)
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func doWork(dur time.Duration) {
+ start := time.Now()
+ for time.Since(start) < dur {
+ }
+}
+
+// BenchmarkCreateGoroutinesSingle creates many goroutines, all from a single
+// producer (the main benchmark goroutine).
+//
+// Compared to BenchmarkCreateGoroutines, this causes different behavior in the
+// scheduler because Ms are much more likely to need to steal work from the
+// main P rather than having work in the local run queue.
+func BenchmarkCreateGoroutinesSingle(b *testing.B) {
+ // Since we are interested in stealing behavior, warm the scheduler to
+ // get all the Ps running first.
+ warmupScheduler(runtime.GOMAXPROCS(0))
+ b.ResetTimer()
+
+ var wg sync.WaitGroup
+ wg.Add(b.N)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func BenchmarkClosureCall(b *testing.B) {
+ sum := 0
+ off1 := 1
+ for i := 0; i < b.N; i++ {
+ off2 := 2
+ func() {
+ sum += i + off1 + off2
+ }()
+ }
+ _ = sum
+}
+
+func benchmarkWakeupParallel(b *testing.B, spin func(time.Duration)) {
+ if runtime.GOMAXPROCS(0) == 1 {
+ b.Skip("skipping: GOMAXPROCS=1")
+ }
+
+ wakeDelay := 5 * time.Microsecond
+ for _, delay := range []time.Duration{
+ 0,
+ 1 * time.Microsecond,
+ 2 * time.Microsecond,
+ 5 * time.Microsecond,
+ 10 * time.Microsecond,
+ 20 * time.Microsecond,
+ 50 * time.Microsecond,
+ 100 * time.Microsecond,
+ } {
+ b.Run(delay.String(), func(b *testing.B) {
+ if b.N == 0 {
+ return
+ }
+ // Start two goroutines, which alternate between being
+ // sender and receiver in the following protocol:
+ //
+ // - The receiver spins for `delay` and then does a
+ // blocking receive on a channel.
+ //
+ // - The sender spins for `delay+wakeDelay` and then
+ // sends to the same channel. (The addition of
+ // `wakeDelay` improves the probability that the
+ // receiver will be blocking when the send occurs when
+ // the goroutines execute in parallel.)
+ //
+ // In each iteration of the benchmark, each goroutine
+ // acts once as sender and once as receiver, so each
+ // goroutine spins for delay twice.
+ //
+ // BenchmarkWakeupParallel is used to estimate how
+ // efficiently the scheduler parallelizes goroutines in
+ // the presence of blocking:
+ //
+ // - If both goroutines are executed on the same core,
+ // an increase in delay by N will increase the time per
+ // iteration by 4*N, because all 4 delays are
+ // serialized.
+ //
+ // - Otherwise, an increase in delay by N will increase
+ // the time per iteration by 2*N, and the time per
+ // iteration is 2 * (runtime overhead + chan
+ // send/receive pair + delay + wakeDelay). This allows
+ // the runtime overhead, including the time it takes
+ // for the unblocked goroutine to be scheduled, to be
+ // estimated.
+ ping, pong := make(chan struct{}), make(chan struct{})
+ start := make(chan struct{})
+ done := make(chan struct{})
+ go func() {
+ <-start
+ for i := 0; i < b.N; i++ {
+ // sender
+ spin(delay + wakeDelay)
+ ping <- struct{}{}
+ // receiver
+ spin(delay)
+ <-pong
+ }
+ done <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < b.N; i++ {
+ // receiver
+ spin(delay)
+ <-ping
+ // sender
+ spin(delay + wakeDelay)
+ pong <- struct{}{}
+ }
+ done <- struct{}{}
+ }()
+ b.ResetTimer()
+ start <- struct{}{}
+ <-done
+ <-done
+ })
+ }
+}
+
+func BenchmarkWakeupParallelSpinning(b *testing.B) {
+ benchmarkWakeupParallel(b, func(d time.Duration) {
+ end := time.Now().Add(d)
+ for time.Now().Before(end) {
+ // do nothing
+ }
+ })
+}
+
+// sysNanosleep is defined by OS-specific files (such as runtime_linux_test.go)
+// to sleep for the given duration. If nil, dependent tests are skipped.
+// The implementation should invoke a blocking system call and not
+// call time.Sleep, which would deschedule the goroutine.
+var sysNanosleep func(d time.Duration)
+
+func BenchmarkWakeupParallelSyscall(b *testing.B) {
+ if sysNanosleep == nil {
+ b.Skipf("skipping on %v; sysNanosleep not defined", runtime.GOOS)
+ }
+ benchmarkWakeupParallel(b, func(d time.Duration) {
+ sysNanosleep(d)
+ })
+}
+
+type Matrix [][]float64
+
+func BenchmarkMatmult(b *testing.B) {
+ b.StopTimer()
+ // matmult is O(N**3) but testing expects O(b.N),
+ // so we need to take cube root of b.N
+ n := int(math.Cbrt(float64(b.N))) + 1
+ A := makeMatrix(n)
+ B := makeMatrix(n)
+ C := makeMatrix(n)
+ b.StartTimer()
+ matmult(nil, A, B, C, 0, n, 0, n, 0, n, 8)
+}
+
+func makeMatrix(n int) Matrix {
+ m := make(Matrix, n)
+ for i := 0; i < n; i++ {
+ m[i] = make([]float64, n)
+ for j := 0; j < n; j++ {
+ m[i][j] = float64(i*n + j)
+ }
+ }
+ return m
+}
+
+func matmult(done chan<- struct{}, A, B, C Matrix, i0, i1, j0, j1, k0, k1, threshold int) {
+ di := i1 - i0
+ dj := j1 - j0
+ dk := k1 - k0
+ if di >= dj && di >= dk && di >= threshold {
+ // divide in two by y axis
+ mi := i0 + di/2
+ done1 := make(chan struct{}, 1)
+ go matmult(done1, A, B, C, i0, mi, j0, j1, k0, k1, threshold)
+ matmult(nil, A, B, C, mi, i1, j0, j1, k0, k1, threshold)
+ <-done1
+ } else if dj >= dk && dj >= threshold {
+ // divide in two by x axis
+ mj := j0 + dj/2
+ done1 := make(chan struct{}, 1)
+ go matmult(done1, A, B, C, i0, i1, j0, mj, k0, k1, threshold)
+ matmult(nil, A, B, C, i0, i1, mj, j1, k0, k1, threshold)
+ <-done1
+ } else if dk >= threshold {
+ // divide in two by "k" axis
+ // deliberately not parallel because of data races
+ mk := k0 + dk/2
+ matmult(nil, A, B, C, i0, i1, j0, j1, k0, mk, threshold)
+ matmult(nil, A, B, C, i0, i1, j0, j1, mk, k1, threshold)
+ } else {
+ // the matrices are small enough, compute directly
+ for i := i0; i < i1; i++ {
+ for j := j0; j < j1; j++ {
+ for k := k0; k < k1; k++ {
+ C[i][j] += A[i][k] * B[k][j]
+ }
+ }
+ }
+ }
+ if done != nil {
+ done <- struct{}{}
+ }
+}
+
+func TestStealOrder(t *testing.T) {
+ runtime.RunStealOrderTest()
+}
+
+func TestLockOSThreadNesting(t *testing.T) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no threads on wasm yet")
+ }
+
+ go func() {
+ e, i := runtime.LockOSCounts()
+ if e != 0 || i != 0 {
+ t.Errorf("want locked counts 0, 0; got %d, %d", e, i)
+ return
+ }
+ runtime.LockOSThread()
+ runtime.LockOSThread()
+ runtime.UnlockOSThread()
+ e, i = runtime.LockOSCounts()
+ if e != 1 || i != 0 {
+ t.Errorf("want locked counts 1, 0; got %d, %d", e, i)
+ return
+ }
+ runtime.UnlockOSThread()
+ e, i = runtime.LockOSCounts()
+ if e != 0 || i != 0 {
+ t.Errorf("want locked counts 0, 0; got %d, %d", e, i)
+ return
+ }
+ }()
+}
+
+func TestLockOSThreadExit(t *testing.T) {
+ testLockOSThreadExit(t, "testprog")
+}
+
+func testLockOSThreadExit(t *testing.T, prog string) {
+ output := runTestProg(t, prog, "LockOSThreadMain", "GOMAXPROCS=1")
+ want := "OK\n"
+ if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+
+ output = runTestProg(t, prog, "LockOSThreadAlt")
+ if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+}
+
+func TestLockOSThreadAvoidsStatePropagation(t *testing.T) {
+ want := "OK\n"
+ skip := "unshare not permitted\n"
+ output := runTestProg(t, "testprog", "LockOSThreadAvoidsStatePropagation", "GOMAXPROCS=1")
+ if output == skip {
+ t.Skip("unshare syscall not permitted on this system")
+ } else if output != want {
+ t.Errorf("want %q, got %q", want, output)
+ }
+}
+
+func TestLockOSThreadTemplateThreadRace(t *testing.T) {
+ testenv.MustHaveGoRun(t)
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ iterations := 100
+ if testing.Short() {
+ // Reduce run time to ~100ms, with much lower probability of
+ // catching issues.
+ iterations = 5
+ }
+ for i := 0; i < iterations; i++ {
+ want := "OK\n"
+ output := runBuiltTestProg(t, exe, "LockOSThreadTemplateThreadRace")
+ if output != want {
+ t.Fatalf("run %d: want %q, got %q", i, want, output)
+ }
+ }
+}
+
+// fakeSyscall emulates a system call.
+//
+//go:nosplit
+func fakeSyscall(duration time.Duration) {
+ runtime.Entersyscall()
+ for start := runtime.Nanotime(); runtime.Nanotime()-start < int64(duration); {
+ }
+ runtime.Exitsyscall()
+}
+
+// Check that a goroutine will be preempted if it is calling short system calls.
+func testPreemptionAfterSyscall(t *testing.T, syscallDuration time.Duration) {
+ if runtime.GOARCH == "wasm" {
+ t.Skip("no preemption on wasm yet")
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(2))
+
+ iterations := 10
+ if testing.Short() {
+ iterations = 1
+ }
+ const (
+ maxDuration = 5 * time.Second
+ nroutines = 8
+ )
+
+ for i := 0; i < iterations; i++ {
+ c := make(chan bool, nroutines)
+ stop := uint32(0)
+
+ start := time.Now()
+ for g := 0; g < nroutines; g++ {
+ go func(stop *uint32) {
+ c <- true
+ for atomic.LoadUint32(stop) == 0 {
+ fakeSyscall(syscallDuration)
+ }
+ c <- true
+ }(&stop)
+ }
+ // wait until all goroutines have started.
+ for g := 0; g < nroutines; g++ {
+ <-c
+ }
+ atomic.StoreUint32(&stop, 1)
+ // wait until all goroutines have finished.
+ for g := 0; g < nroutines; g++ {
+ <-c
+ }
+ duration := time.Since(start)
+
+ if duration > maxDuration {
+ t.Errorf("timeout exceeded: %v (%v)", duration, maxDuration)
+ }
+ }
+}
+
+func TestPreemptionAfterSyscall(t *testing.T) {
+ if runtime.GOOS == "plan9" {
+ testenv.SkipFlaky(t, 41015)
+ }
+
+ for _, i := range []time.Duration{10, 100, 1000} {
+ d := i * time.Microsecond
+ t.Run(fmt.Sprint(d), func(t *testing.T) {
+ testPreemptionAfterSyscall(t, d)
+ })
+ }
+}
+
+func TestGetgThreadSwitch(t *testing.T) {
+ runtime.RunGetgThreadSwitchTest()
+}
+
+// TestNetpollBreak tests that netpollBreak can break a netpoll.
+// This test is not particularly safe since the call to netpoll
+// will pick up any stray files that are ready, but it should work
+// OK as long as it is not run in parallel.
+func TestNetpollBreak(t *testing.T) {
+ if runtime.GOMAXPROCS(0) == 1 {
+ t.Skip("skipping: GOMAXPROCS=1")
+ }
+
+ // Make sure that netpoll is initialized.
+ runtime.NetpollGenericInit()
+
+ start := time.Now()
+ c := make(chan bool, 2)
+ go func() {
+ c <- true
+ runtime.Netpoll(10 * time.Second.Nanoseconds())
+ c <- true
+ }()
+ <-c
+ // Loop because the break might get eaten by the scheduler.
+ // Break twice to break both the netpoll we started and the
+ // scheduler netpoll.
+loop:
+ for {
+ runtime.Usleep(100)
+ runtime.NetpollBreak()
+ runtime.NetpollBreak()
+ select {
+ case <-c:
+ break loop
+ default:
+ }
+ }
+ if dur := time.Since(start); dur > 5*time.Second {
+ t.Errorf("netpollBreak did not interrupt netpoll: slept for: %v", dur)
+ }
+}
+
+// TestBigGOMAXPROCS tests that setting GOMAXPROCS to a large value
+// doesn't cause a crash at startup. See issue 38474.
+func TestBigGOMAXPROCS(t *testing.T) {
+ t.Parallel()
+ output := runTestProg(t, "testprog", "NonexistentTest", "GOMAXPROCS=1024")
+ // Ignore error conditions on small machines.
+ for _, errstr := range []string{
+ "failed to create new OS thread",
+ "cannot allocate memory",
+ } {
+ if strings.Contains(output, errstr) {
+ t.Skipf("failed to create 1024 threads")
+ }
+ }
+ if !strings.Contains(output, "unknown function: NonexistentTest") {
+ t.Errorf("output:\n%s\nwanted:\nunknown function: NonexistentTest", output)
+ }
+}
diff --git a/src/runtime/profbuf.go b/src/runtime/profbuf.go
new file mode 100644
index 0000000..083b55a
--- /dev/null
+++ b/src/runtime/profbuf.go
@@ -0,0 +1,561 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// A profBuf is a lock-free buffer for profiling events,
+// safe for concurrent use by one reader and one writer.
+// The writer may be a signal handler running without a user g.
+// The reader is assumed to be a user g.
+//
+// Each logged event corresponds to a fixed size header, a list of
+// uintptrs (typically a stack), and exactly one unsafe.Pointer tag.
+// The header and uintptrs are stored in the circular buffer data and the
+// tag is stored in a circular buffer tags, running in parallel.
+// In the circular buffer data, each event takes 2+hdrsize+len(stk)
+// words: the value 2+hdrsize+len(stk), then the time of the event, then
+// hdrsize words giving the fixed-size header, and then len(stk) words
+// for the stack.
+//
+// The current effective offsets into the tags and data circular buffers
+// for reading and writing are stored in the high 30 and low 32 bits of r and w.
+// The bottom bits of the high 32 are additional flag bits in w, unused in r.
+// "Effective" offsets means the total number of reads or writes, mod 2^length.
+// The offset in the buffer is the effective offset mod the length of the buffer.
+// To make wraparound mod 2^length match wraparound mod length of the buffer,
+// the length of the buffer must be a power of two.
+//
+// If the reader catches up to the writer, a flag passed to read controls
+// whether the read blocks until more data is available. A read returns a
+// pointer to the buffer data itself; the caller is assumed to be done with
+// that data at the next read. The read offset rNext tracks the next offset to
+// be returned by read. By definition, r ≤ rNext ≤ w (before wraparound),
+// and rNext is only used by the reader, so it can be accessed without atomics.
+//
+// If the writer gets ahead of the reader, so that the buffer fills,
+// future writes are discarded and replaced in the output stream by an
+// overflow entry, which has size 2+hdrsize+1, time set to the time of
+// the first discarded write, a header of all zeroed words, and a "stack"
+// containing one word, the number of discarded writes.
+//
+// Between the time the buffer fills and the buffer becomes empty enough
+// to hold more data, the overflow entry is stored as a pending overflow
+// entry in the fields overflow and overflowTime. The pending overflow
+// entry can be turned into a real record by either the writer or the
+// reader. If the writer is called to write a new record and finds that
+// the output buffer has room for both the pending overflow entry and the
+// new record, the writer emits the pending overflow entry and the new
+// record into the buffer. If the reader is called to read data and finds
+// that the output buffer is empty but that there is a pending overflow
+// entry, the reader will return a synthesized record for the pending
+// overflow entry.
+//
+// Only the writer can create or add to a pending overflow entry, but
+// either the reader or the writer can clear the pending overflow entry.
+// A pending overflow entry is indicated by the low 32 bits of 'overflow'
+// holding the number of discarded writes, and overflowTime holding the
+// time of the first discarded write. The high 32 bits of 'overflow'
+// increment each time the low 32 bits transition from zero to non-zero
+// or vice versa. This sequence number avoids ABA problems in the use of
+// compare-and-swap to coordinate between reader and writer.
+// The overflowTime is only written when the low 32 bits of overflow are
+// zero, that is, only when there is no pending overflow entry, in
+// preparation for creating a new one. The reader can therefore fetch and
+// clear the entry atomically using
+//
+// for {
+// overflow = load(&b.overflow)
+// if uint32(overflow) == 0 {
+// // no pending entry
+// break
+// }
+// time = load(&b.overflowTime)
+// if cas(&b.overflow, overflow, ((overflow>>32)+1)<<32) {
+// // pending entry cleared
+// break
+// }
+// }
+// if uint32(overflow) > 0 {
+// emit entry for uint32(overflow), time
+// }
+type profBuf struct {
+ // accessed atomically
+ r, w profAtomic
+ overflow atomic.Uint64
+ overflowTime atomic.Uint64
+ eof atomic.Uint32
+
+ // immutable (excluding slice content)
+ hdrsize uintptr
+ data []uint64
+ tags []unsafe.Pointer
+
+ // owned by reader
+ rNext profIndex
+ overflowBuf []uint64 // for use by reader to return overflow record
+ wait note
+}
+
+// A profAtomic is the atomically-accessed word holding a profIndex.
+type profAtomic uint64
+
+// A profIndex is the packed tag and data counts and flags bits, described above.
+type profIndex uint64
+
+const (
+ profReaderSleeping profIndex = 1 << 32 // reader is sleeping and must be woken up
+ profWriteExtra profIndex = 1 << 33 // overflow or eof waiting
+)
+
+func (x *profAtomic) load() profIndex {
+ return profIndex(atomic.Load64((*uint64)(x)))
+}
+
+func (x *profAtomic) store(new profIndex) {
+ atomic.Store64((*uint64)(x), uint64(new))
+}
+
+func (x *profAtomic) cas(old, new profIndex) bool {
+ return atomic.Cas64((*uint64)(x), uint64(old), uint64(new))
+}
+
+func (x profIndex) dataCount() uint32 {
+ return uint32(x)
+}
+
+func (x profIndex) tagCount() uint32 {
+ return uint32(x >> 34)
+}
+
+// countSub subtracts two counts obtained from profIndex.dataCount or profIndex.tagCount,
+// assuming that they are no more than 2^29 apart (guaranteed since they are never more than
+// len(data) or len(tags) apart, respectively).
+// tagCount wraps at 2^30, while dataCount wraps at 2^32.
+// This function works for both.
+func countSub(x, y uint32) int {
+ // x-y is 32-bit signed or 30-bit signed; sign-extend to 32 bits and convert to int.
+ return int(int32(x-y) << 2 >> 2)
+}
+
+// addCountsAndClearFlags returns the packed form of "x + (data, tag) - all flags".
+func (x profIndex) addCountsAndClearFlags(data, tag int) profIndex {
+ return profIndex((uint64(x)>>34+uint64(uint32(tag)<<2>>2))<<34 | uint64(uint32(x)+uint32(data)))
+}
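
A worked example of the packed layout may help: dataCount lives in the low 32 bits, the two flag bits sit at bits 32-33, and tagCount occupies the bits from 34 up; addCountsAndClearFlags bumps both counts and drops the flags, and countSub stays correct even after the 32-bit data counter wraps. The standalone snippet below reuses the definitions above verbatim (illustrative only):

package main

import "fmt"

type profIndex uint64

const (
	profReaderSleeping profIndex = 1 << 32
	profWriteExtra     profIndex = 1 << 33
)

func (x profIndex) dataCount() uint32 { return uint32(x) }
func (x profIndex) tagCount() uint32  { return uint32(x >> 34) }

// countSub, as above: valid when the two counts are less than 2^29 apart.
func countSub(x, y uint32) int {
	return int(int32(x-y) << 2 >> 2)
}

func (x profIndex) addCountsAndClearFlags(data, tag int) profIndex {
	return profIndex((uint64(x)>>34+uint64(uint32(tag)<<2>>2))<<34 | uint64(uint32(x)+uint32(data)))
}

func main() {
	var w profIndex
	w = w.addCountsAndClearFlags(10, 1)      // one record: 10 data words, 1 tag
	w |= profWriteExtra                      // writer sets a flag bit
	fmt.Println(w.dataCount(), w.tagCount()) // 10 1

	// Committing another record clears the flag bits again.
	w = w.addCountsAndClearFlags(8, 1)
	fmt.Println(w.dataCount(), w.tagCount(), w&(profReaderSleeping|profWriteExtra)) // 18 2 0

	// countSub stays correct across uint32 wraparound of the data counter.
	r := uint32(0xffffff00)
	w2 := r + 300                // wraps past zero
	fmt.Println(countSub(w2, r)) // 300
}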
+
+// hasOverflow reports whether b has any overflow records pending.
+func (b *profBuf) hasOverflow() bool {
+ return uint32(b.overflow.Load()) > 0
+}
+
+// takeOverflow consumes the pending overflow records, returning the overflow count
+// and the time of the first overflow.
+// When called by the reader, it is racing against incrementOverflow.
+func (b *profBuf) takeOverflow() (count uint32, time uint64) {
+ overflow := b.overflow.Load()
+ time = b.overflowTime.Load()
+ for {
+ count = uint32(overflow)
+ if count == 0 {
+ time = 0
+ break
+ }
+ // Increment generation, clear overflow count in low bits.
+ if b.overflow.CompareAndSwap(overflow, ((overflow>>32)+1)<<32) {
+ break
+ }
+ overflow = b.overflow.Load()
+ time = b.overflowTime.Load()
+ }
+ return uint32(overflow), time
+}
+
+// incrementOverflow records a single overflow at time now.
+// It is racing against a possible takeOverflow in the reader.
+func (b *profBuf) incrementOverflow(now int64) {
+ for {
+ overflow := b.overflow.Load()
+
+ // Once we see b.overflow reach 0, it's stable: no one else is changing it underfoot.
+ // We need to set overflowTime if we're incrementing b.overflow from 0.
+ if uint32(overflow) == 0 {
+ // Store overflowTime first so it's always available when overflow != 0.
+ b.overflowTime.Store(uint64(now))
+ b.overflow.Store((((overflow >> 32) + 1) << 32) + 1)
+ break
+ }
+ // Otherwise we're racing to increment against reader
+ // who wants to set b.overflow to 0.
+ // Out of paranoia, leave 2³²-1 a sticky overflow value,
+ // to avoid wrapping around. Extremely unlikely.
+ if int32(overflow) == -1 {
+ break
+ }
+ if b.overflow.CompareAndSwap(overflow, overflow+1) {
+ break
+ }
+ }
+}
+
+// newProfBuf returns a new profiling buffer with room for
+// a header of hdrsize words and a buffer of at least bufwords words.
+func newProfBuf(hdrsize, bufwords, tags int) *profBuf {
+ if min := 2 + hdrsize + 1; bufwords < min {
+ bufwords = min
+ }
+
+ // Buffer sizes must be power of two, so that we don't have to
+ // worry about uint32 wraparound changing the effective position
+ // within the buffers. We store 30 bits of count; limiting to 28
+ // gives us some room for intermediate calculations.
+ if bufwords >= 1<<28 || tags >= 1<<28 {
+ throw("newProfBuf: buffer too large")
+ }
+ var i int
+ for i = 1; i < bufwords; i <<= 1 {
+ }
+ bufwords = i
+ for i = 1; i < tags; i <<= 1 {
+ }
+ tags = i
+
+ b := new(profBuf)
+ b.hdrsize = uintptr(hdrsize)
+ b.data = make([]uint64, bufwords)
+ b.tags = make([]unsafe.Pointer, tags)
+ b.overflowBuf = make([]uint64, 2+b.hdrsize+1)
+ return b
+}
+
+// canWriteRecord reports whether the buffer has room
+// for a single contiguous record with a stack of length nstk.
+func (b *profBuf) canWriteRecord(nstk int) bool {
+ br := b.r.load()
+ bw := b.w.load()
+
+ // room for tag?
+ if countSub(br.tagCount(), bw.tagCount())+len(b.tags) < 1 {
+ return false
+ }
+
+ // room for data?
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+ want := 2 + int(b.hdrsize) + nstk
+ i := int(bw.dataCount() % uint32(len(b.data)))
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ }
+ return nd >= want
+}
+
+// canWriteTwoRecords reports whether the buffer has room
+// for two records with stack lengths nstk1, nstk2, in that order.
+// Each record must be contiguous on its own, but the two
+// records need not be contiguous (one can be at the end of the buffer
+// and the other can wrap around and start at the beginning of the buffer).
+func (b *profBuf) canWriteTwoRecords(nstk1, nstk2 int) bool {
+ br := b.r.load()
+ bw := b.w.load()
+
+ // room for tag?
+ if countSub(br.tagCount(), bw.tagCount())+len(b.tags) < 2 {
+ return false
+ }
+
+ // room for data?
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+
+ // first record
+ want := 2 + int(b.hdrsize) + nstk1
+ i := int(bw.dataCount() % uint32(len(b.data)))
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ i = 0
+ }
+ i += want
+ nd -= want
+
+ // second record
+ want = 2 + int(b.hdrsize) + nstk2
+ if i+want > len(b.data) {
+ // Can't fit in trailing fragment of slice.
+ // Skip over that and start over at beginning of slice.
+ nd -= len(b.data) - i
+ i = 0
+ }
+ return nd >= want
+}
+
+// write writes an entry to the profiling buffer b.
+// The entry begins with a fixed hdr, which must have
+// length b.hdrsize, followed by a variable-sized stack
+// and a single tag pointer *tagPtr (or nil if tagPtr is nil).
+// No write barriers allowed because this might be called from a signal handler.
+func (b *profBuf) write(tagPtr *unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ if b == nil {
+ return
+ }
+ if len(hdr) > int(b.hdrsize) {
+ throw("misuse of profBuf.write")
+ }
+
+ if hasOverflow := b.hasOverflow(); hasOverflow && b.canWriteTwoRecords(1, len(stk)) {
+ // Room for both an overflow record and the one being written.
+ // Write the overflow record if the reader hasn't gotten to it yet.
+ // Only racing against reader, not other writers.
+ count, time := b.takeOverflow()
+ if count > 0 {
+ var stk [1]uintptr
+ stk[0] = uintptr(count)
+ b.write(nil, int64(time), nil, stk[:])
+ }
+ } else if hasOverflow || !b.canWriteRecord(len(stk)) {
+ // Pending overflow without room to write overflow and new records
+ // or no overflow but also no room for new record.
+ b.incrementOverflow(now)
+ b.wakeupExtra()
+ return
+ }
+
+ // There's room: write the record.
+ br := b.r.load()
+ bw := b.w.load()
+
+ // Profiling tag
+ //
+ // The tag is a pointer, but we can't run a write barrier here.
+ // We have interrupted the OS-level execution of gp, but the
+ // runtime still sees gp as executing. In effect, we are running
+ // in place of the real gp. Since gp is the only goroutine that
+ // can overwrite gp.labels, the value of gp.labels is stable during
+ // this signal handler: it will still be reachable from gp when
+ // we finish executing. If a GC is in progress right now, it must
+ // keep gp.labels alive, because gp.labels is reachable from gp.
+ // If gp were to overwrite gp.labels, the deletion barrier would
+ // still shade that pointer, which would preserve it for the
+ // in-progress GC, so all is well. Any future GC will see the
+ // value we copied when scanning b.tags (heap-allocated).
+ // We arrange that the store here is always overwriting a nil,
+ // so there is no need for a deletion barrier on b.tags[wt].
+ wt := int(bw.tagCount() % uint32(len(b.tags)))
+ if tagPtr != nil {
+ *(*uintptr)(unsafe.Pointer(&b.tags[wt])) = uintptr(unsafe.Pointer(*tagPtr))
+ }
+
+ // Main record.
+ // It has to fit in a contiguous section of the slice, so if it doesn't fit at the end,
+ // leave a rewind marker (0) and start over at the beginning of the slice.
+ wd := int(bw.dataCount() % uint32(len(b.data)))
+ nd := countSub(br.dataCount(), bw.dataCount()) + len(b.data)
+ skip := 0
+ if wd+2+int(b.hdrsize)+len(stk) > len(b.data) {
+ b.data[wd] = 0
+ skip = len(b.data) - wd
+ nd -= skip
+ wd = 0
+ }
+ data := b.data[wd:]
+ data[0] = uint64(2 + b.hdrsize + uintptr(len(stk))) // length
+ data[1] = uint64(now) // time stamp
+ // header, zero-padded
+ i := uintptr(copy(data[2:2+b.hdrsize], hdr))
+ for ; i < b.hdrsize; i++ {
+ data[2+i] = 0
+ }
+ for i, pc := range stk {
+ data[2+b.hdrsize+uintptr(i)] = uint64(pc)
+ }
+
+ for {
+ // Commit write.
+ // Racing with reader setting flag bits in b.w, to avoid lost wakeups.
+ old := b.w.load()
+ new := old.addCountsAndClearFlags(skip+2+len(stk)+int(b.hdrsize), 1)
+ if !b.w.cas(old, new) {
+ continue
+ }
+ // If there was a reader, wake it up.
+ if old&profReaderSleeping != 0 {
+ notewakeup(&b.wait)
+ }
+ break
+ }
+}
+
+// close signals that there will be no more writes on the buffer.
+// Once all the data has been read from the buffer, reads will return eof=true.
+func (b *profBuf) close() {
+ if b.eof.Load() > 0 {
+ throw("runtime: profBuf already closed")
+ }
+ b.eof.Store(1)
+ b.wakeupExtra()
+}
+
+// wakeupExtra must be called after setting one of the "extra"
+// atomic fields b.overflow or b.eof.
+// It records the change in b.w and wakes up the reader if needed.
+func (b *profBuf) wakeupExtra() {
+ for {
+ old := b.w.load()
+ new := old | profWriteExtra
+ if !b.w.cas(old, new) {
+ continue
+ }
+ if old&profReaderSleeping != 0 {
+ notewakeup(&b.wait)
+ }
+ break
+ }
+}
+
+// profBufReadMode specifies whether to block when no data is available to read.
+type profBufReadMode int
+
+const (
+ profBufBlocking profBufReadMode = iota
+ profBufNonBlocking
+)
+
+var overflowTag [1]unsafe.Pointer // always nil
+
+func (b *profBuf) read(mode profBufReadMode) (data []uint64, tags []unsafe.Pointer, eof bool) {
+ if b == nil {
+ return nil, nil, true
+ }
+
+ br := b.rNext
+
+ // Commit previous read, returning that part of the ring to the writer.
+ // First clear tags that have now been read, both to avoid holding
+ // up the memory they point at for longer than necessary
+ // and so that b.write can assume it is always overwriting
+ // nil tag entries (see comment in b.write).
+ rPrev := b.r.load()
+ if rPrev != br {
+ ntag := countSub(br.tagCount(), rPrev.tagCount())
+ ti := int(rPrev.tagCount() % uint32(len(b.tags)))
+ for i := 0; i < ntag; i++ {
+ b.tags[ti] = nil
+ if ti++; ti == len(b.tags) {
+ ti = 0
+ }
+ }
+ b.r.store(br)
+ }
+
+Read:
+ bw := b.w.load()
+ numData := countSub(bw.dataCount(), br.dataCount())
+ if numData == 0 {
+ if b.hasOverflow() {
+ // No data to read, but there is overflow to report.
+ // Racing with writer flushing b.overflow into a real record.
+ count, time := b.takeOverflow()
+ if count == 0 {
+ // Lost the race, go around again.
+ goto Read
+ }
+ // Won the race, report overflow.
+ dst := b.overflowBuf
+ dst[0] = uint64(2 + b.hdrsize + 1)
+ dst[1] = uint64(time)
+ for i := uintptr(0); i < b.hdrsize; i++ {
+ dst[2+i] = 0
+ }
+ dst[2+b.hdrsize] = uint64(count)
+ return dst[:2+b.hdrsize+1], overflowTag[:1], false
+ }
+ if b.eof.Load() > 0 {
+ // No data, no overflow, EOF set: done.
+ return nil, nil, true
+ }
+ if bw&profWriteExtra != 0 {
+ // Writer claims to have published extra information (overflow or eof).
+ // Attempt to clear notification and then check again.
+ // If we fail to clear the notification it means b.w changed,
+ // so we still need to check again.
+ b.w.cas(bw, bw&^profWriteExtra)
+ goto Read
+ }
+
+ // Nothing to read right now.
+ // Return or sleep according to mode.
+ if mode == profBufNonBlocking {
+ // Necessary on Darwin, notetsleepg below does not work in signal handler, root cause of #61768.
+ return nil, nil, false
+ }
+ if !b.w.cas(bw, bw|profReaderSleeping) {
+ goto Read
+ }
+ // Committed to sleeping.
+ notetsleepg(&b.wait, -1)
+ noteclear(&b.wait)
+ goto Read
+ }
+ data = b.data[br.dataCount()%uint32(len(b.data)):]
+ if len(data) > numData {
+ data = data[:numData]
+ } else {
+ numData -= len(data) // available in case of wraparound
+ }
+ skip := 0
+ if data[0] == 0 {
+ // Wraparound record. Go back to the beginning of the ring.
+ skip = len(data)
+ data = b.data
+ if len(data) > numData {
+ data = data[:numData]
+ }
+ }
+
+ ntag := countSub(bw.tagCount(), br.tagCount())
+ if ntag == 0 {
+ throw("runtime: malformed profBuf buffer - tag and data out of sync")
+ }
+ tags = b.tags[br.tagCount()%uint32(len(b.tags)):]
+ if len(tags) > ntag {
+ tags = tags[:ntag]
+ }
+
+ // Count out whole data records until either data or tags is done.
+ // They are always in sync in the buffer, but due to an end-of-slice
+ // wraparound we might need to stop early and return the rest
+ // in the next call.
+ di := 0
+ ti := 0
+ for di < len(data) && data[di] != 0 && ti < len(tags) {
+ if uintptr(di)+uintptr(data[di]) > uintptr(len(data)) {
+ throw("runtime: malformed profBuf buffer - invalid size")
+ }
+ di += int(data[di])
+ ti++
+ }
+
+ // Remember how much we returned, to commit read on next call.
+ b.rNext = br.addCountsAndClearFlags(skip+di, ti)
+
+ if raceenabled {
+ // Match racereleasemerge in runtime_setProfLabel,
+ // so that the setting of the labels in runtime_setProfLabel
+ // is treated as happening before any use of the labels
+ // by our caller. The synchronization on labelSync itself is a fiction
+ // for the race detector. The actual synchronization is handled
+ // by the fact that the signal handler only reads from the current
+ // goroutine and uses atomics to write the updated queue indices,
+ // and then the read-out from the signal handler buffer uses
+ // atomics to read those queue indices.
+ raceacquire(unsafe.Pointer(&labelSync))
+ }
+
+ return data[:di], tags[:ti], false
+}
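
The data slice that read returns is a sequence of back-to-back length-prefixed records laid out as described at the top of the file: a length word (counting itself), a timestamp, hdrsize header words, then the stack. A minimal decoding loop over such a slice, using hand-built sample records rather than output from a real profBuf (illustrative only):

package main

import "fmt"

// decode walks the length-prefixed records in a data slice as returned by
// profBuf.read: each record is [len, time, hdr[hdrsize]..., stack...],
// where len counts words including itself. hdrsize must match the writer's.
func decode(data []uint64, hdrsize int) {
	for len(data) > 0 {
		n := int(data[0])
		if n < 2+hdrsize || n > len(data) {
			panic("malformed record")
		}
		rec := data[:n]
		fmt.Printf("time=%d hdr=%v stack=%v\n", rec[1], rec[2:2+hdrsize], rec[2+hdrsize:])
		data = data[n:]
	}
}

func main() {
	// Two hand-built records with hdrsize=2, matching the writer's layout
	// of 2+hdrsize+len(stk) words per record.
	data := []uint64{
		6, 100, 7, 8, 0x1234, 0x5678, // len=6: time=100, hdr={7,8}, stack of 2 PCs
		5, 200, 9, 0, 0xabcd, // len=5: time=200, hdr={9,0}, stack of 1 PC
	}
	decode(data, 2)
}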
diff --git a/src/runtime/profbuf_test.go b/src/runtime/profbuf_test.go
new file mode 100644
index 0000000..d9c5264
--- /dev/null
+++ b/src/runtime/profbuf_test.go
@@ -0,0 +1,182 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ . "runtime"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+func TestProfBuf(t *testing.T) {
+ const hdrSize = 2
+
+ write := func(t *testing.T, b *ProfBuf, tag unsafe.Pointer, now int64, hdr []uint64, stk []uintptr) {
+ b.Write(&tag, now, hdr, stk)
+ }
+ read := func(t *testing.T, b *ProfBuf, data []uint64, tags []unsafe.Pointer) {
+ rdata, rtags, eof := b.Read(ProfBufNonBlocking)
+ if !reflect.DeepEqual(rdata, data) || !reflect.DeepEqual(rtags, tags) {
+ t.Fatalf("unexpected profile read:\nhave data %#x\nwant data %#x\nhave tags %#x\nwant tags %#x", rdata, data, rtags, tags)
+ }
+ if eof {
+ t.Fatalf("unexpected eof")
+ }
+ }
+ readBlock := func(t *testing.T, b *ProfBuf, data []uint64, tags []unsafe.Pointer) func() {
+ c := make(chan int)
+ go func() {
+ eof := data == nil
+ rdata, rtags, reof := b.Read(ProfBufBlocking)
+ if !reflect.DeepEqual(rdata, data) || !reflect.DeepEqual(rtags, tags) || reof != eof {
+ // Errorf, not Fatalf, because called in goroutine.
+ t.Errorf("unexpected profile read:\nhave data %#x\nwant data %#x\nhave tags %#x\nwant tags %#x\nhave eof=%v, want %v", rdata, data, rtags, tags, reof, eof)
+ }
+ c <- 1
+ }()
+ time.Sleep(10 * time.Millisecond) // let goroutine run and block
+ return func() {
+ select {
+ case <-c:
+ case <-time.After(1 * time.Second):
+ t.Fatalf("timeout waiting for blocked read")
+ }
+ }
+ }
+ readEOF := func(t *testing.T, b *ProfBuf) {
+ rdata, rtags, eof := b.Read(ProfBufBlocking)
+ if rdata != nil || rtags != nil || !eof {
+ t.Errorf("unexpected profile read: %#x, %#x, eof=%v; want nil, nil, eof=true", rdata, rtags, eof)
+ }
+ rdata, rtags, eof = b.Read(ProfBufNonBlocking)
+ if rdata != nil || rtags != nil || !eof {
+ t.Errorf("unexpected profile read (non-blocking): %#x, %#x, eof=%v; want nil, nil, eof=true", rdata, rtags, eof)
+ }
+ }
+
+ myTags := make([]byte, 100)
+ t.Logf("myTags is %p", &myTags[0])
+
+ t.Run("BasicWriteRead", func(t *testing.T) {
+ b := NewProfBuf(2, 11, 1)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadMany", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204, 5, 500, 502, 504, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2]), unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadManyShortData", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadManyShortTags", func(t *testing.T) {
+ b := NewProfBuf(2, 50, 50)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[0]), unsafe.Pointer(&myTags[2])})
+ })
+
+ t.Run("ReadAfterOverflow1", func(t *testing.T) {
+ // overflow record synthesized by write
+ b := NewProfBuf(2, 16, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9}) // uses 10
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 10 but still in use until next read
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5}) // uses 6
+ read(t, b, []uint64{6, 1, 2, 3, 4, 5}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 6 but still in use until next read
+ // now 10 available
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204, 205, 206, 207, 208, 209}) // no room
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), int64(100+i), []uint64{101, 102}, []uintptr{201, 202, 203, 204}) // no room for overflow+this record
+ }
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506}) // room for overflow+this record
+ read(t, b, []uint64{5, 99, 0, 0, 300, 5, 500, 502, 504, 506}, []unsafe.Pointer{nil, unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadAfterOverflow2", func(t *testing.T) {
+ // overflow record synthesized by read
+ b := NewProfBuf(2, 16, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213})
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), 100, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ }
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])}) // reads 10 but still in use until next read
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{}) // still overflow
+ read(t, b, []uint64{5, 99, 0, 0, 301}, []unsafe.Pointer{nil}) // overflow synthesized by read
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 505}, []uintptr{506}) // written
+ read(t, b, []uint64{5, 500, 502, 505, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("ReadAtEndAfterOverflow", func(t *testing.T) {
+ b := NewProfBuf(2, 12, 5)
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ for i := 0; i < 299; i++ {
+ write(t, b, unsafe.Pointer(&myTags[3]), 100, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ }
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, []uint64{5, 99, 0, 0, 300}, []unsafe.Pointer{nil})
+ write(t, b, unsafe.Pointer(&myTags[1]), 500, []uint64{502, 504}, []uintptr{506})
+ read(t, b, []uint64{5, 500, 502, 504, 506}, []unsafe.Pointer{unsafe.Pointer(&myTags[1])})
+ })
+
+ t.Run("BlockingWriteRead", func(t *testing.T) {
+ b := NewProfBuf(2, 11, 1)
+ wait := readBlock(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ wait()
+ wait = readBlock(t, b, []uint64{8, 99, 101, 102, 201, 202, 203, 204}, []unsafe.Pointer{unsafe.Pointer(&myTags[2])})
+ time.Sleep(10 * time.Millisecond)
+ write(t, b, unsafe.Pointer(&myTags[2]), 99, []uint64{101, 102}, []uintptr{201, 202, 203, 204})
+ wait()
+ wait = readBlock(t, b, nil, nil)
+ b.Close()
+ wait()
+ wait = readBlock(t, b, nil, nil)
+ wait()
+ readEOF(t, b)
+ })
+
+ t.Run("DataWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 16, 1024)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+
+ t.Run("TagWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 1024, 2)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+
+ t.Run("BothWraparound", func(t *testing.T) {
+ b := NewProfBuf(2, 16, 2)
+ for i := 0; i < 10; i++ {
+ write(t, b, unsafe.Pointer(&myTags[0]), 1, []uint64{2, 3}, []uintptr{4, 5, 6, 7, 8, 9})
+ read(t, b, []uint64{10, 1, 2, 3, 4, 5, 6, 7, 8, 9}, []unsafe.Pointer{unsafe.Pointer(&myTags[0])})
+ read(t, b, nil, nil) // release data returned by previous read
+ }
+ })
+}
diff --git a/src/runtime/proflabel.go b/src/runtime/proflabel.go
new file mode 100644
index 0000000..b2a1617
--- /dev/null
+++ b/src/runtime/proflabel.go
@@ -0,0 +1,40 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var labelSync uintptr
+
+//go:linkname runtime_setProfLabel runtime/pprof.runtime_setProfLabel
+func runtime_setProfLabel(labels unsafe.Pointer) {
+ // Introduce race edge for read-back via profile.
+ // This would more properly use &getg().labels as the sync address,
+ // but we do the read in a signal handler and can't call the race runtime then.
+ //
+ // This uses racereleasemerge rather than just racerelease so
+ // the acquire in profBuf.read synchronizes with *all* prior
+ // setProfLabel operations, not just the most recent one. This
+ // is important because profBuf.read will observe different
+ // labels set by different setProfLabel operations on
+ // different goroutines, so it needs to synchronize with all
+ // of them (this wouldn't be an issue if we could synchronize
+ // on &getg().labels since we would synchronize with each
+ // most-recent labels write separately.)
+ //
+ // racereleasemerge is like a full read-modify-write on
+ // labelSync, rather than just a store-release, so it carries
+ // a dependency on the previous racereleasemerge, which
+ // ultimately carries forward to the acquire in profBuf.read.
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&labelSync))
+ }
+ getg().labels = labels
+}
+
+//go:linkname runtime_getProfLabel runtime/pprof.runtime_getProfLabel
+func runtime_getProfLabel() unsafe.Pointer {
+ return getg().labels
+}
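runtime_setProfLabel and runtime_getProfLabel are reached only through the go:linkname directives above; user code goes through runtime/pprof. A minimal sketch of the public path that ends up in this file, using only the documented runtime/pprof API (the label values are illustrative):

package main

import (
	"context"
	"runtime/pprof"
)

func work(ctx context.Context) {
	// CPU samples taken while this function runs carry the labels set in
	// main; the profile reader receives them through the tag slice that
	// profBuf.read returns alongside the data records.
}

func main() {
	// pprof.Do installs the label set for the duration of the call,
	// which ultimately stores it via runtime_setProfLabel above.
	pprof.Do(context.Background(), pprof.Labels("worker", "indexer"), work)
}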
diff --git a/src/runtime/race.go b/src/runtime/race.go
new file mode 100644
index 0000000..e2767f0
--- /dev/null
+++ b/src/runtime/race.go
@@ -0,0 +1,654 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+// Public race detection API, present iff built with -race.
+
+func RaceRead(addr unsafe.Pointer)
+func RaceWrite(addr unsafe.Pointer)
+func RaceReadRange(addr unsafe.Pointer, len int)
+func RaceWriteRange(addr unsafe.Pointer, len int)
+
+func RaceErrors() int {
+ var n uint64
+ racecall(&__tsan_report_count, uintptr(unsafe.Pointer(&n)), 0, 0, 0)
+ return int(n)
+}
+
+// RaceAcquire/RaceRelease/RaceReleaseMerge establish happens-before relations
+// between goroutines. These inform the race detector about actual synchronization
+// that it can't see for some reason (e.g. synchronization within RaceDisable/RaceEnable
+// sections of code).
+// RaceAcquire establishes a happens-before relation with the preceding
+// RaceReleaseMerge on addr up to and including the last RaceRelease on addr.
+// In terms of the C memory model (C11 §5.1.2.4, §7.17.3),
+// RaceAcquire is equivalent to atomic_load(memory_order_acquire).
+//
+//go:nosplit
+func RaceAcquire(addr unsafe.Pointer) {
+ raceacquire(addr)
+}
+
+// RaceRelease performs a release operation on addr that
+// can synchronize with a later RaceAcquire on addr.
+//
+// In terms of the C memory model, RaceRelease is equivalent to
+// atomic_store(memory_order_release).
+//
+//go:nosplit
+func RaceRelease(addr unsafe.Pointer) {
+ racerelease(addr)
+}
+
+// RaceReleaseMerge is like RaceRelease, but also establishes a happens-before
+// relation with the preceding RaceRelease or RaceReleaseMerge on addr.
+//
+// In terms of the C memory model, RaceReleaseMerge is equivalent to
+// atomic_exchange(memory_order_release).
+//
+//go:nosplit
+func RaceReleaseMerge(addr unsafe.Pointer) {
+ racereleasemerge(addr)
+}
+
+// RaceDisable disables handling of race synchronization events in the current goroutine.
+// Handling is re-enabled with RaceEnable. RaceDisable/RaceEnable can be nested.
+// Non-synchronization events (memory accesses, function entry/exit) still affect
+// the race detector.
+//
+//go:nosplit
+func RaceDisable() {
+ gp := getg()
+ if gp.raceignore == 0 {
+ racecall(&__tsan_go_ignore_sync_begin, gp.racectx, 0, 0, 0)
+ }
+ gp.raceignore++
+}
+
+// RaceEnable re-enables handling of race events in the current goroutine.
+//
+//go:nosplit
+func RaceEnable() {
+ gp := getg()
+ gp.raceignore--
+ if gp.raceignore == 0 {
+ racecall(&__tsan_go_ignore_sync_end, gp.racectx, 0, 0, 0)
+ }
+}
+
+// Private interface for the runtime.
+
+const raceenabled = true
+
+// For all functions accepting callerpc and pc,
+// callerpc is a return PC of the function that calls this function,
+// and pc is the start PC of the function that calls this function.
+func raceReadObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
+ kind := t.Kind_ & kindMask
+ if kind == kindArray || kind == kindStruct {
+ // for composite objects we have to read every address
+ // because a write might happen to any subobject.
+ racereadrangepc(addr, t.Size_, callerpc, pc)
+ } else {
+ // for non-composite objects we can read just the start
+ // address, as any write must write the first byte.
+ racereadpc(addr, callerpc, pc)
+ }
+}
+
+func raceWriteObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
+ kind := t.Kind_ & kindMask
+ if kind == kindArray || kind == kindStruct {
+ // for composite objects we have to write every address
+ // because a write might happen to any subobject.
+ racewriterangepc(addr, t.Size_, callerpc, pc)
+ } else {
+ // for non-composite objects we can write just the start
+ // address, as any write must write the first byte.
+ racewritepc(addr, callerpc, pc)
+ }
+}
+
+//go:noescape
+func racereadpc(addr unsafe.Pointer, callpc, pc uintptr)
+
+//go:noescape
+func racewritepc(addr unsafe.Pointer, callpc, pc uintptr)
+
+type symbolizeCodeContext struct {
+ pc uintptr
+ fn *byte
+ file *byte
+ line uintptr
+ off uintptr
+ res uintptr
+}
+
+var qq = [...]byte{'?', '?', 0}
+var dash = [...]byte{'-', 0}
+
+const (
+ raceGetProcCmd = iota
+ raceSymbolizeCodeCmd
+ raceSymbolizeDataCmd
+)
+
+// Callback from C into Go, runs on g0.
+func racecallback(cmd uintptr, ctx unsafe.Pointer) {
+ switch cmd {
+ case raceGetProcCmd:
+ throw("should have been handled by racecallbackthunk")
+ case raceSymbolizeCodeCmd:
+ raceSymbolizeCode((*symbolizeCodeContext)(ctx))
+ case raceSymbolizeDataCmd:
+ raceSymbolizeData((*symbolizeDataContext)(ctx))
+ default:
+ throw("unknown command")
+ }
+}
+
+// raceSymbolizeCode reads ctx.pc and populates the rest of *ctx with
+// information about the code at that pc.
+//
+// The race detector has already subtracted 1 from pcs, so they point to the last
+// byte of call instructions (including calls to runtime.racewrite and friends).
+//
+// If the incoming pc is part of an inlined function, *ctx is populated
+// with information about the inlined function, and on return ctx.pc is set
+// to a pc in the logically containing function. (The race detector should call this
+// function again with that pc.)
+//
+// If the incoming pc is not part of an inlined function, the return pc is unchanged.
+func raceSymbolizeCode(ctx *symbolizeCodeContext) {
+ pc := ctx.pc
+ fi := findfunc(pc)
+ if fi.valid() {
+ u, uf := newInlineUnwinder(fi, pc, nil)
+ for ; uf.valid(); uf = u.next(uf) {
+ sf := u.srcFunc(uf)
+ if sf.funcID == abi.FuncIDWrapper && u.isInlined(uf) {
+ // Ignore wrappers, unless we're at the outermost frame of u.
+ // A non-inlined wrapper frame always means we have a physical
+ // frame consisting entirely of wrappers, in which case we'll
+ // take an outermost wrapper over nothing.
+ continue
+ }
+
+ name := sf.name()
+ file, line := u.fileLine(uf)
+ if line == 0 {
+ // Failure to symbolize
+ continue
+ }
+ ctx.fn = &bytes(name)[0] // assume NUL-terminated
+ ctx.line = uintptr(line)
+ ctx.file = &bytes(file)[0] // assume NUL-terminated
+ ctx.off = pc - fi.entry()
+ ctx.res = 1
+ if u.isInlined(uf) {
+ // Set ctx.pc to the "caller" so the race detector calls this again
+ // to further unwind.
+ uf = u.next(uf)
+ ctx.pc = uf.pc
+ }
+ return
+ }
+ }
+ ctx.fn = &qq[0]
+ ctx.file = &dash[0]
+ ctx.line = 0
+ ctx.off = ctx.pc
+ ctx.res = 1
+}
+
+type symbolizeDataContext struct {
+ addr uintptr
+ heap uintptr
+ start uintptr
+ size uintptr
+ name *byte
+ file *byte
+ line uintptr
+ res uintptr
+}
+
+func raceSymbolizeData(ctx *symbolizeDataContext) {
+ if base, span, _ := findObject(ctx.addr, 0, 0); base != 0 {
+ ctx.heap = 1
+ ctx.start = base
+ ctx.size = span.elemsize
+ ctx.res = 1
+ }
+}
+
+// Race runtime functions called via runtime·racecall.
+//
+//go:linkname __tsan_init __tsan_init
+var __tsan_init byte
+
+//go:linkname __tsan_fini __tsan_fini
+var __tsan_fini byte
+
+//go:linkname __tsan_proc_create __tsan_proc_create
+var __tsan_proc_create byte
+
+//go:linkname __tsan_proc_destroy __tsan_proc_destroy
+var __tsan_proc_destroy byte
+
+//go:linkname __tsan_map_shadow __tsan_map_shadow
+var __tsan_map_shadow byte
+
+//go:linkname __tsan_finalizer_goroutine __tsan_finalizer_goroutine
+var __tsan_finalizer_goroutine byte
+
+//go:linkname __tsan_go_start __tsan_go_start
+var __tsan_go_start byte
+
+//go:linkname __tsan_go_end __tsan_go_end
+var __tsan_go_end byte
+
+//go:linkname __tsan_malloc __tsan_malloc
+var __tsan_malloc byte
+
+//go:linkname __tsan_free __tsan_free
+var __tsan_free byte
+
+//go:linkname __tsan_acquire __tsan_acquire
+var __tsan_acquire byte
+
+//go:linkname __tsan_release __tsan_release
+var __tsan_release byte
+
+//go:linkname __tsan_release_acquire __tsan_release_acquire
+var __tsan_release_acquire byte
+
+//go:linkname __tsan_release_merge __tsan_release_merge
+var __tsan_release_merge byte
+
+//go:linkname __tsan_go_ignore_sync_begin __tsan_go_ignore_sync_begin
+var __tsan_go_ignore_sync_begin byte
+
+//go:linkname __tsan_go_ignore_sync_end __tsan_go_ignore_sync_end
+var __tsan_go_ignore_sync_end byte
+
+//go:linkname __tsan_report_count __tsan_report_count
+var __tsan_report_count byte
+
+// Mimic what cmd/cgo would do.
+//
+//go:cgo_import_static __tsan_init
+//go:cgo_import_static __tsan_fini
+//go:cgo_import_static __tsan_proc_create
+//go:cgo_import_static __tsan_proc_destroy
+//go:cgo_import_static __tsan_map_shadow
+//go:cgo_import_static __tsan_finalizer_goroutine
+//go:cgo_import_static __tsan_go_start
+//go:cgo_import_static __tsan_go_end
+//go:cgo_import_static __tsan_malloc
+//go:cgo_import_static __tsan_free
+//go:cgo_import_static __tsan_acquire
+//go:cgo_import_static __tsan_release
+//go:cgo_import_static __tsan_release_acquire
+//go:cgo_import_static __tsan_release_merge
+//go:cgo_import_static __tsan_go_ignore_sync_begin
+//go:cgo_import_static __tsan_go_ignore_sync_end
+//go:cgo_import_static __tsan_report_count
+
+// These are called from race_amd64.s.
+//
+//go:cgo_import_static __tsan_read
+//go:cgo_import_static __tsan_read_pc
+//go:cgo_import_static __tsan_read_range
+//go:cgo_import_static __tsan_write
+//go:cgo_import_static __tsan_write_pc
+//go:cgo_import_static __tsan_write_range
+//go:cgo_import_static __tsan_func_enter
+//go:cgo_import_static __tsan_func_exit
+
+//go:cgo_import_static __tsan_go_atomic32_load
+//go:cgo_import_static __tsan_go_atomic64_load
+//go:cgo_import_static __tsan_go_atomic32_store
+//go:cgo_import_static __tsan_go_atomic64_store
+//go:cgo_import_static __tsan_go_atomic32_exchange
+//go:cgo_import_static __tsan_go_atomic64_exchange
+//go:cgo_import_static __tsan_go_atomic32_fetch_add
+//go:cgo_import_static __tsan_go_atomic64_fetch_add
+//go:cgo_import_static __tsan_go_atomic32_compare_exchange
+//go:cgo_import_static __tsan_go_atomic64_compare_exchange
+
+// start/end of global data (data+bss).
+var racedatastart uintptr
+var racedataend uintptr
+
+// start/end of heap for race_amd64.s
+var racearenastart uintptr
+var racearenaend uintptr
+
+func racefuncenter(callpc uintptr)
+func racefuncenterfp(fp uintptr)
+func racefuncexit()
+func raceread(addr uintptr)
+func racewrite(addr uintptr)
+func racereadrange(addr, size uintptr)
+func racewriterange(addr, size uintptr)
+func racereadrangepc1(addr, size, pc uintptr)
+func racewriterangepc1(addr, size, pc uintptr)
+func racecallbackthunk(uintptr)
+
+// racecall allows calling an arbitrary function fn from C race runtime
+// with up to 4 uintptr arguments.
+func racecall(fn *byte, arg0, arg1, arg2, arg3 uintptr)
+
+// isvalidaddr reports whether the address has shadow (i.e. heap or data/bss).
+//
+//go:nosplit
+func isvalidaddr(addr unsafe.Pointer) bool {
+ return racearenastart <= uintptr(addr) && uintptr(addr) < racearenaend ||
+ racedatastart <= uintptr(addr) && uintptr(addr) < racedataend
+}
+
+//go:nosplit
+func raceinit() (gctx, pctx uintptr) {
+ lockInit(&raceFiniLock, lockRankRaceFini)
+
+ // On most machines, cgo is required to initialize libc, which is used by the race runtime.
+ if !iscgo && GOOS != "darwin" {
+ throw("raceinit: race build must use cgo")
+ }
+
+ racecall(&__tsan_init, uintptr(unsafe.Pointer(&gctx)), uintptr(unsafe.Pointer(&pctx)), abi.FuncPCABI0(racecallbackthunk), 0)
+
+ // Round data segment to page boundaries, because it's used in mmap().
+ start := ^uintptr(0)
+ end := uintptr(0)
+ if start > firstmoduledata.noptrdata {
+ start = firstmoduledata.noptrdata
+ }
+ if start > firstmoduledata.data {
+ start = firstmoduledata.data
+ }
+ if start > firstmoduledata.noptrbss {
+ start = firstmoduledata.noptrbss
+ }
+ if start > firstmoduledata.bss {
+ start = firstmoduledata.bss
+ }
+ if end < firstmoduledata.enoptrdata {
+ end = firstmoduledata.enoptrdata
+ }
+ if end < firstmoduledata.edata {
+ end = firstmoduledata.edata
+ }
+ if end < firstmoduledata.enoptrbss {
+ end = firstmoduledata.enoptrbss
+ }
+ if end < firstmoduledata.ebss {
+ end = firstmoduledata.ebss
+ }
+ size := alignUp(end-start, _PageSize)
+ racecall(&__tsan_map_shadow, start, size, 0, 0)
+ racedatastart = start
+ racedataend = start + size
+
+ return
+}
+
+//go:nosplit
+func racefini() {
+ // racefini() can only be called once to avoid races.
+ // This eventually (via __tsan_fini) calls C.exit which has
+ // undefined behavior if called more than once. If the lock is
+// already held, it's assumed that the first caller exits the program
+ // so other calls can hang forever without an issue.
+ lock(&raceFiniLock)
+
+ // __tsan_fini will run C atexit functions and C++ destructors,
+ // which can theoretically call back into Go.
+ // Tell the scheduler we're entering external code.
+ entersyscall()
+
+ // We're entering external code that may call ExitProcess on
+ // Windows.
+ osPreemptExtEnter(getg().m)
+
+ racecall(&__tsan_fini, 0, 0, 0, 0)
+}
+
+//go:nosplit
+func raceproccreate() uintptr {
+ var ctx uintptr
+ racecall(&__tsan_proc_create, uintptr(unsafe.Pointer(&ctx)), 0, 0, 0)
+ return ctx
+}
+
+//go:nosplit
+func raceprocdestroy(ctx uintptr) {
+ racecall(&__tsan_proc_destroy, ctx, 0, 0, 0)
+}
+
+//go:nosplit
+func racemapshadow(addr unsafe.Pointer, size uintptr) {
+ if racearenastart == 0 {
+ racearenastart = uintptr(addr)
+ }
+ if racearenaend < uintptr(addr)+size {
+ racearenaend = uintptr(addr) + size
+ }
+ racecall(&__tsan_map_shadow, uintptr(addr), size, 0, 0)
+}
+
+//go:nosplit
+func racemalloc(p unsafe.Pointer, sz uintptr) {
+ racecall(&__tsan_malloc, 0, 0, uintptr(p), sz)
+}
+
+//go:nosplit
+func racefree(p unsafe.Pointer, sz uintptr) {
+ racecall(&__tsan_free, uintptr(p), sz, 0, 0)
+}
+
+//go:nosplit
+func racegostart(pc uintptr) uintptr {
+ gp := getg()
+ var spawng *g
+ if gp.m.curg != nil {
+ spawng = gp.m.curg
+ } else {
+ spawng = gp
+ }
+
+ var racectx uintptr
+ racecall(&__tsan_go_start, spawng.racectx, uintptr(unsafe.Pointer(&racectx)), pc, 0)
+ return racectx
+}
+
+//go:nosplit
+func racegoend() {
+ racecall(&__tsan_go_end, getg().racectx, 0, 0, 0)
+}
+
+//go:nosplit
+func racectxend(racectx uintptr) {
+ racecall(&__tsan_go_end, racectx, 0, 0, 0)
+}
+
+//go:nosplit
+func racewriterangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
+ gp := getg()
+ if gp != gp.m.curg {
+ // The call is coming from manual instrumentation of Go code running on g0/gsignal.
+ // Not interesting.
+ return
+ }
+ if callpc != 0 {
+ racefuncenter(callpc)
+ }
+ racewriterangepc1(uintptr(addr), sz, pc)
+ if callpc != 0 {
+ racefuncexit()
+ }
+}
+
+//go:nosplit
+func racereadrangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
+ gp := getg()
+ if gp != gp.m.curg {
+ // The call is coming from manual instrumentation of Go code running on g0/gsignal.
+ // Not interesting.
+ return
+ }
+ if callpc != 0 {
+ racefuncenter(callpc)
+ }
+ racereadrangepc1(uintptr(addr), sz, pc)
+ if callpc != 0 {
+ racefuncexit()
+ }
+}
+
+//go:nosplit
+func raceacquire(addr unsafe.Pointer) {
+ raceacquireg(getg(), addr)
+}
+
+//go:nosplit
+func raceacquireg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_acquire, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func raceacquirectx(racectx uintptr, addr unsafe.Pointer) {
+ if !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_acquire, racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racerelease(addr unsafe.Pointer) {
+ racereleaseg(getg(), addr)
+}
+
+//go:nosplit
+func racereleaseg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racereleaseacquire(addr unsafe.Pointer) {
+ racereleaseacquireg(getg(), addr)
+}
+
+//go:nosplit
+func racereleaseacquireg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release_acquire, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racereleasemerge(addr unsafe.Pointer) {
+ racereleasemergeg(getg(), addr)
+}
+
+//go:nosplit
+func racereleasemergeg(gp *g, addr unsafe.Pointer) {
+ if getg().raceignore != 0 || !isvalidaddr(addr) {
+ return
+ }
+ racecall(&__tsan_release_merge, gp.racectx, uintptr(addr), 0, 0)
+}
+
+//go:nosplit
+func racefingo() {
+ racecall(&__tsan_finalizer_goroutine, getg().racectx, 0, 0, 0)
+}
+
+// The declarations below generate ABI wrappers for functions
+// implemented in assembly in this package but declared in another
+// package.
+
+//go:linkname abigen_sync_atomic_LoadInt32 sync/atomic.LoadInt32
+func abigen_sync_atomic_LoadInt32(addr *int32) (val int32)
+
+//go:linkname abigen_sync_atomic_LoadInt64 sync/atomic.LoadInt64
+func abigen_sync_atomic_LoadInt64(addr *int64) (val int64)
+
+//go:linkname abigen_sync_atomic_LoadUint32 sync/atomic.LoadUint32
+func abigen_sync_atomic_LoadUint32(addr *uint32) (val uint32)
+
+//go:linkname abigen_sync_atomic_LoadUint64 sync/atomic.LoadUint64
+func abigen_sync_atomic_LoadUint64(addr *uint64) (val uint64)
+
+//go:linkname abigen_sync_atomic_LoadUintptr sync/atomic.LoadUintptr
+func abigen_sync_atomic_LoadUintptr(addr *uintptr) (val uintptr)
+
+//go:linkname abigen_sync_atomic_LoadPointer sync/atomic.LoadPointer
+func abigen_sync_atomic_LoadPointer(addr *unsafe.Pointer) (val unsafe.Pointer)
+
+//go:linkname abigen_sync_atomic_StoreInt32 sync/atomic.StoreInt32
+func abigen_sync_atomic_StoreInt32(addr *int32, val int32)
+
+//go:linkname abigen_sync_atomic_StoreInt64 sync/atomic.StoreInt64
+func abigen_sync_atomic_StoreInt64(addr *int64, val int64)
+
+//go:linkname abigen_sync_atomic_StoreUint32 sync/atomic.StoreUint32
+func abigen_sync_atomic_StoreUint32(addr *uint32, val uint32)
+
+//go:linkname abigen_sync_atomic_StoreUint64 sync/atomic.StoreUint64
+func abigen_sync_atomic_StoreUint64(addr *uint64, val uint64)
+
+//go:linkname abigen_sync_atomic_SwapInt32 sync/atomic.SwapInt32
+func abigen_sync_atomic_SwapInt32(addr *int32, new int32) (old int32)
+
+//go:linkname abigen_sync_atomic_SwapInt64 sync/atomic.SwapInt64
+func abigen_sync_atomic_SwapInt64(addr *int64, new int64) (old int64)
+
+//go:linkname abigen_sync_atomic_SwapUint32 sync/atomic.SwapUint32
+func abigen_sync_atomic_SwapUint32(addr *uint32, new uint32) (old uint32)
+
+//go:linkname abigen_sync_atomic_SwapUint64 sync/atomic.SwapUint64
+func abigen_sync_atomic_SwapUint64(addr *uint64, new uint64) (old uint64)
+
+//go:linkname abigen_sync_atomic_AddInt32 sync/atomic.AddInt32
+func abigen_sync_atomic_AddInt32(addr *int32, delta int32) (new int32)
+
+//go:linkname abigen_sync_atomic_AddUint32 sync/atomic.AddUint32
+func abigen_sync_atomic_AddUint32(addr *uint32, delta uint32) (new uint32)
+
+//go:linkname abigen_sync_atomic_AddInt64 sync/atomic.AddInt64
+func abigen_sync_atomic_AddInt64(addr *int64, delta int64) (new int64)
+
+//go:linkname abigen_sync_atomic_AddUint64 sync/atomic.AddUint64
+func abigen_sync_atomic_AddUint64(addr *uint64, delta uint64) (new uint64)
+
+//go:linkname abigen_sync_atomic_AddUintptr sync/atomic.AddUintptr
+func abigen_sync_atomic_AddUintptr(addr *uintptr, delta uintptr) (new uintptr)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapInt32 sync/atomic.CompareAndSwapInt32
+func abigen_sync_atomic_CompareAndSwapInt32(addr *int32, old, new int32) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapInt64 sync/atomic.CompareAndSwapInt64
+func abigen_sync_atomic_CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapUint32 sync/atomic.CompareAndSwapUint32
+func abigen_sync_atomic_CompareAndSwapUint32(addr *uint32, old, new uint32) (swapped bool)
+
+//go:linkname abigen_sync_atomic_CompareAndSwapUint64 sync/atomic.CompareAndSwapUint64
+func abigen_sync_atomic_CompareAndSwapUint64(addr *uint64, old, new uint64) (swapped bool)
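The exported RaceAcquire/RaceRelease/RaceReleaseMerge entry points above let a program describe synchronization the detector cannot see on its own. A minimal sketch of such an annotation in user code built with -race — the package name and the external hand-off are assumptions for illustration, not part of this patch:

//go:build race

package extsync // hypothetical package name, for illustration only

import (
	"runtime"
	"unsafe"
)

// publish and consume exchange data through a mechanism the race detector
// cannot observe (for example, memory shared with C code). The annotations
// declare the happens-before edge on the slot address explicitly.
func publish(slot unsafe.Pointer) {
	// Everything written before this release happens before a later
	// consume that acquires on the same slot.
	runtime.RaceRelease(slot)
	// ... hand the data to the other side via the external mechanism ...
}

func consume(slot unsafe.Pointer) {
	// ... receive the data from the external mechanism ...
	runtime.RaceAcquire(slot)
}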
diff --git a/src/runtime/race/README b/src/runtime/race/README
new file mode 100644
index 0000000..acd8b84
--- /dev/null
+++ b/src/runtime/race/README
@@ -0,0 +1,17 @@
+The runtime/race package contains the data race detector runtime library.
+It is based on the ThreadSanitizer race detector, which is currently part of
+the LLVM project (https://github.com/llvm/llvm-project/tree/main/compiler-rt).
+
+To update the .syso files use golang.org/x/build/cmd/racebuild.
+
+race_darwin_amd64.syso built with LLVM 127e59048cd3d8dbb80c14b3036918c114089529 and Go 59ab6f351a370a27458755dc69f4a837e55a05a6.
+race_freebsd_amd64.syso built with LLVM 127e59048cd3d8dbb80c14b3036918c114089529 and Go 59ab6f351a370a27458755dc69f4a837e55a05a6.
+race_linux_ppc64le.syso built with LLVM 41cb504b7c4b18ac15830107431a0c1eec73a6b2 and Go 851ecea4cc99ab276109493477b2c7e30c253ea8.
+race_netbsd_amd64.syso built with LLVM 41cb504b7c4b18ac15830107431a0c1eec73a6b2 and Go 851ecea4cc99ab276109493477b2c7e30c253ea8.
+race_windows_amd64.syso built with LLVM b6374437af39af66896da74a1dc1b8a0ece26bee and Go 3e97294663d978bf8abb7acec7cc615ef2f1ea75.
+race_linux_arm64.syso built with LLVM 41cb504b7c4b18ac15830107431a0c1eec73a6b2 and Go 851ecea4cc99ab276109493477b2c7e30c253ea8.
+race_darwin_arm64.syso built with LLVM 41cb504b7c4b18ac15830107431a0c1eec73a6b2 and Go 851ecea4cc99ab276109493477b2c7e30c253ea8.
+race_openbsd_amd64.syso built with LLVM fcf6ae2f070eba73074b6ec8d8281e54d29dbeeb and Go 8f2db14cd35bbd674cb2988a508306de6655e425.
+race_linux_s390x.syso built with LLVM 41cb504b7c4b18ac15830107431a0c1eec73a6b2 and Go 851ecea4cc99ab276109493477b2c7e30c253ea8.
+internal/amd64v3/race_linux.syso built with LLVM 74c2d4f6024c8f160871a2baa928d0b42415f183 and Go c0f27eb3d580c8b9efd73802678eba4c6c9461be.
+internal/amd64v1/race_linux.syso built with LLVM 74c2d4f6024c8f160871a2baa928d0b42415f183 and Go c0f27eb3d580c8b9efd73802678eba4c6c9461be.
diff --git a/src/runtime/race/doc.go b/src/runtime/race/doc.go
new file mode 100644
index 0000000..60a20df
--- /dev/null
+++ b/src/runtime/race/doc.go
@@ -0,0 +1,11 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package race implements data race detection logic.
+// No public interface is provided.
+// For details about the race detector see
+// https://golang.org/doc/articles/race_detector.html
+package race
+
+//go:generate ./mkcgo.sh
diff --git a/src/runtime/race/internal/amd64v1/doc.go b/src/runtime/race/internal/amd64v1/doc.go
new file mode 100644
index 0000000..ccb088c
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/doc.go
@@ -0,0 +1,10 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This package holds the race detector .syso for
+// amd64 architectures with GOAMD64<v3.
+
+//go:build amd64 && ((linux && !amd64.v3) || darwin || freebsd || netbsd || openbsd || windows)
+
+package amd64v1
diff --git a/src/runtime/race/internal/amd64v1/race_darwin.syso b/src/runtime/race/internal/amd64v1/race_darwin.syso
new file mode 100644
index 0000000..e5d848c
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_darwin.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v1/race_freebsd.syso b/src/runtime/race/internal/amd64v1/race_freebsd.syso
new file mode 100644
index 0000000..b3a4383
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_freebsd.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v1/race_linux.syso b/src/runtime/race/internal/amd64v1/race_linux.syso
new file mode 100644
index 0000000..68f1508
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_linux.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v1/race_netbsd.syso b/src/runtime/race/internal/amd64v1/race_netbsd.syso
new file mode 100644
index 0000000..e6cc4bf
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_netbsd.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v1/race_openbsd.syso b/src/runtime/race/internal/amd64v1/race_openbsd.syso
new file mode 100644
index 0000000..9fefd87
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_openbsd.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v1/race_windows.syso b/src/runtime/race/internal/amd64v1/race_windows.syso
new file mode 100644
index 0000000..777bd83
--- /dev/null
+++ b/src/runtime/race/internal/amd64v1/race_windows.syso
Binary files differ
diff --git a/src/runtime/race/internal/amd64v3/doc.go b/src/runtime/race/internal/amd64v3/doc.go
new file mode 100644
index 0000000..215998a
--- /dev/null
+++ b/src/runtime/race/internal/amd64v3/doc.go
@@ -0,0 +1,10 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This package holds the race detector .syso for
+// amd64 architectures with GOAMD64>=v3.
+
+//go:build amd64 && linux && amd64.v3
+
+package amd64v3
diff --git a/src/runtime/race/internal/amd64v3/race_linux.syso b/src/runtime/race/internal/amd64v3/race_linux.syso
new file mode 100644
index 0000000..33c3e76
--- /dev/null
+++ b/src/runtime/race/internal/amd64v3/race_linux.syso
Binary files differ
diff --git a/src/runtime/race/mkcgo.sh b/src/runtime/race/mkcgo.sh
new file mode 100755
index 0000000..6ebe5a4
--- /dev/null
+++ b/src/runtime/race/mkcgo.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+hdr='
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code generated by mkcgo.sh. DO NOT EDIT.
+
+//go:build race
+
+'
+
+convert() {
+ (echo "$hdr"; go tool cgo -dynpackage race -dynimport $1) | gofmt
+}
+
+convert race_darwin_arm64.syso >race_darwin_arm64.go
+convert internal/amd64v1/race_darwin.syso >race_darwin_amd64.go
+
diff --git a/src/runtime/race/output_test.go b/src/runtime/race/output_test.go
new file mode 100644
index 0000000..4c2c339
--- /dev/null
+++ b/src/runtime/race/output_test.go
@@ -0,0 +1,480 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package race_test
+
+import (
+ "fmt"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func TestOutput(t *testing.T) {
+ pkgdir := t.TempDir()
+ out, err := exec.Command(testenv.GoToolPath(t), "install", "-race", "-pkgdir="+pkgdir, "testing").CombinedOutput()
+ if err != nil {
+ t.Fatalf("go install -race: %v\n%s", err, out)
+ }
+
+ for _, test := range tests {
+ if test.goos != "" && test.goos != runtime.GOOS {
+ t.Logf("test %v runs only on %v, skipping", test.name, test.goos)
+ continue
+ }
+ dir := t.TempDir()
+ source := "main.go"
+ if test.run == "test" {
+ source = "main_test.go"
+ }
+ src := filepath.Join(dir, source)
+ f, err := os.Create(src)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ _, err = f.WriteString(test.source)
+ if err != nil {
+ f.Close()
+ t.Fatalf("failed to write: %v", err)
+ }
+ if err := f.Close(); err != nil {
+ t.Fatalf("failed to close file: %v", err)
+ }
+
+ cmd := exec.Command(testenv.GoToolPath(t), test.run, "-race", "-pkgdir="+pkgdir, src)
+ // GODEBUG spoils program output, GOMAXPROCS makes it flaky.
+ for _, env := range os.Environ() {
+ if strings.HasPrefix(env, "GODEBUG=") ||
+ strings.HasPrefix(env, "GOMAXPROCS=") ||
+ strings.HasPrefix(env, "GORACE=") {
+ continue
+ }
+ cmd.Env = append(cmd.Env, env)
+ }
+ cmd.Env = append(cmd.Env,
+ "GOMAXPROCS=1", // see comment in race_test.go
+ "GORACE="+test.gorace,
+ )
+ got, _ := cmd.CombinedOutput()
+ matched := false
+ for _, re := range test.re {
+ if regexp.MustCompile(re).MatchString(string(got)) {
+ matched = true
+ break
+ }
+ }
+ if !matched {
+ exp := fmt.Sprintf("expect:\n%v\n", test.re[0])
+ if len(test.re) > 1 {
+ exp = fmt.Sprintf("expected one of %d patterns:\n",
+ len(test.re))
+ for k, re := range test.re {
+ exp += fmt.Sprintf("pattern %d:\n%v\n", k, re)
+ }
+ }
+ t.Fatalf("failed test case %v, %sgot:\n%s",
+ test.name, exp, got)
+ }
+ }
+}
+
+var tests = []struct {
+ name string
+ run string
+ goos string
+ gorace string
+ source string
+ re []string
+}{
+ {"simple", "run", "", "atexit_sleep_ms=0", `
+package main
+import "time"
+var xptr *int
+var donechan chan bool
+func main() {
+ done := make(chan bool)
+ x := 0
+ startRacer(&x, done)
+ store(&x, 43)
+ <-done
+}
+func store(x *int, v int) {
+ *x = v
+}
+func startRacer(x *int, done chan bool) {
+ xptr = x
+ donechan = done
+ go racer()
+}
+func racer() {
+ time.Sleep(10*time.Millisecond)
+ store(xptr, 42)
+ donechan <- true
+}
+`, []string{`==================
+WARNING: DATA RACE
+Write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.store\(\)
+ .+/main\.go:14 \+0x[0-9,a-f]+
+ main\.racer\(\)
+ .+/main\.go:23 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.store\(\)
+ .+/main\.go:14 \+0x[0-9,a-f]+
+ main\.main\(\)
+ .+/main\.go:10 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.startRacer\(\)
+ .+/main\.go:19 \+0x[0-9,a-f]+
+ main\.main\(\)
+ .+/main\.go:9 \+0x[0-9,a-f]+
+==================
+Found 1 data race\(s\)
+exit status 66
+`}},
+
+ {"exitcode", "run", "", "atexit_sleep_ms=0 exitcode=13", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0; _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, []string{`exit status 13`}},
+
+ {"strip_path_prefix", "run", "", "atexit_sleep_ms=0 strip_path_prefix=/main.", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0; _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, []string{`
+ go:7 \+0x[0-9,a-f]+
+`}},
+
+ {"halt_on_error", "run", "", "atexit_sleep_ms=0 halt_on_error=1", `
+package main
+func main() {
+ done := make(chan bool)
+ x := 0; _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+`, []string{`
+==================
+exit status 66
+`}},
+
+ {"test_fails_on_race", "test", "", "atexit_sleep_ms=0", `
+package main_test
+import "testing"
+func TestFail(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+ t.Log(t.Failed())
+}
+`, []string{`
+==================
+--- FAIL: TestFail \([0-9.]+s\)
+.*main_test.go:14: true
+.*testing.go:.*: race detected during execution of test
+FAIL`}},
+
+ {"slicebytetostring_pc", "run", "", "atexit_sleep_ms=0", `
+package main
+func main() {
+ done := make(chan string)
+ data := make([]byte, 10)
+ go func() {
+ done <- string(data)
+ }()
+ data[0] = 1
+ <-done
+}
+`, []string{`
+ runtime\.slicebytetostring\(\)
+ .*/runtime/string\.go:.*
+ main\.main\.func1\(\)
+ .*/main.go:7`}},
+
+ // Test for https://golang.org/issue/33309
+ {"midstack_inlining_traceback", "run", "linux", "atexit_sleep_ms=0", `
+package main
+
+var x int
+var c chan int
+func main() {
+ c = make(chan int)
+ go f()
+ x = 1
+ <-c
+}
+
+func f() {
+ g(c)
+}
+
+func g(c chan int) {
+ h(c)
+}
+
+func h(c chan int) {
+ c <- x
+}
+`, []string{`==================
+WARNING: DATA RACE
+Read at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.h\(\)
+ .+/main\.go:22 \+0x[0-9,a-f]+
+ main\.g\(\)
+ .+/main\.go:18 \+0x[0-9,a-f]+
+ main\.f\(\)
+ .+/main\.go:14 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.main\(\)
+ .+/main\.go:9 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.main\(\)
+ .+/main\.go:8 \+0x[0-9,a-f]+
+==================
+Found 1 data race\(s\)
+exit status 66
+`}},
+
+ // Test for https://golang.org/issue/17190
+ {"external_cgo_thread", "run", "linux", "atexit_sleep_ms=0", `
+package main
+
+/*
+#include <pthread.h>
+typedef struct cb {
+ int foo;
+} cb;
+extern void goCallback();
+static inline void *threadFunc(void *p) {
+ goCallback();
+ return 0;
+}
+static inline void startThread(cb* c) {
+ pthread_t th;
+ pthread_create(&th, 0, threadFunc, 0);
+}
+*/
+import "C"
+
+var done chan bool
+var racy int
+
+//export goCallback
+func goCallback() {
+ racy++
+ done <- true
+}
+
+func main() {
+ done = make(chan bool)
+ var c C.cb
+ C.startThread(&c)
+ racy++
+ <- done
+}
+`, []string{`==================
+WARNING: DATA RACE
+Read at 0x[0-9,a-f]+ by main goroutine:
+ main\.main\(\)
+ .*/main\.go:34 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.goCallback\(\)
+ .*/main\.go:27 \+0x[0-9,a-f]+
+ _cgoexp_[0-9a-z]+_goCallback\(\)
+ .*_cgo_gotypes\.go:[0-9]+ \+0x[0-9,a-f]+
+ _cgoexp_[0-9a-z]+_goCallback\(\)
+ <autogenerated>:1 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ runtime\.newextram\(\)
+ .*/runtime/proc.go:[0-9]+ \+0x[0-9,a-f]+
+==================`,
+ `==================
+WARNING: DATA RACE
+Read at 0x[0-9,a-f]+ by .*:
+ main\..*
+ .*/main\.go:[0-9]+ \+0x[0-9,a-f]+(?s).*
+
+Previous write at 0x[0-9,a-f]+ by .*:
+ main\..*
+ .*/main\.go:[0-9]+ \+0x[0-9,a-f]+(?s).*
+
+Goroutine [0-9] \(running\) created at:
+ runtime\.newextram\(\)
+ .*/runtime/proc.go:[0-9]+ \+0x[0-9,a-f]+
+==================`}},
+ {"second_test_passes", "test", "", "atexit_sleep_ms=0", `
+package main_test
+import "testing"
+func TestFail(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ _ = x
+ go func() {
+ x = 42
+ done <- true
+ }()
+ x = 43
+ <-done
+}
+
+func TestPass(t *testing.T) {
+}
+`, []string{`
+==================
+--- FAIL: TestFail \([0-9.]+s\)
+.*testing.go:.*: race detected during execution of test
+FAIL`}},
+ {"mutex", "run", "", "atexit_sleep_ms=0", `
+package main
+import (
+ "sync"
+ "fmt"
+)
+func main() {
+ c := make(chan bool, 1)
+ threads := 1
+ iterations := 20000
+ data := 0
+ var wg sync.WaitGroup
+ for i := 0; i < threads; i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for i := 0; i < iterations; i++ {
+ c <- true
+ data += 1
+ <- c
+ }
+ }()
+ }
+ for i := 0; i < iterations; i++ {
+ c <- true
+ data += 1
+ <- c
+ }
+ wg.Wait()
+ if (data == iterations*(threads+1)) { fmt.Println("pass") }
+}`, []string{`pass`}},
+ // Test for https://github.com/golang/go/issues/37355
+ {"chanmm", "run", "", "atexit_sleep_ms=0", `
+package main
+import (
+ "sync"
+ "time"
+)
+func main() {
+ c := make(chan bool, 1)
+ var data uint64
+ var wg sync.WaitGroup
+ wg.Add(2)
+ c <- true
+ go func() {
+ defer wg.Done()
+ c <- true
+ }()
+ go func() {
+ defer wg.Done()
+ time.Sleep(time.Second)
+ <-c
+ data = 2
+ }()
+ data = 1
+ <-c
+ wg.Wait()
+ _ = data
+}
+`, []string{`==================
+WARNING: DATA RACE
+Write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.main\.func2\(\)
+ .*/main\.go:21 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.main\(\)
+ .*/main\.go:23 \+0x[0-9,a-f]+
+
+Goroutine [0-9] \(running\) created at:
+ main\.main\(\)
+ .*/main.go:[0-9]+ \+0x[0-9,a-f]+
+==================`}},
+ // Test symbolizing wrappers. Both (*T).f and main.func1 are wrappers.
+ // go.dev/issue/60245
+ {"wrappersym", "run", "", "atexit_sleep_ms=0", `
+package main
+import "sync"
+var wg sync.WaitGroup
+var x int
+func main() {
+ f := (*T).f
+ wg.Add(2)
+ go f(new(T))
+ f(new(T))
+ wg.Wait()
+}
+type T struct{}
+func (t T) f() {
+ x = 42
+ wg.Done()
+}
+`, []string{`==================
+WARNING: DATA RACE
+Write at 0x[0-9,a-f]+ by goroutine [0-9]:
+ main\.T\.f\(\)
+ .*/main.go:15 \+0x[0-9,a-f]+
+ main\.\(\*T\)\.f\(\)
+ <autogenerated>:1 \+0x[0-9,a-f]+
+ main\.main\.func1\(\)
+ .*/main.go:9 \+0x[0-9,a-f]+
+
+Previous write at 0x[0-9,a-f]+ by main goroutine:
+ main\.T\.f\(\)
+ .*/main.go:15 \+0x[0-9,a-f]+
+ main\.\(\*T\)\.f\(\)
+ <autogenerated>:1 \+0x[0-9,a-f]+
+ main\.main\(\)
+ .*/main.go:10 \+0x[0-9,a-f]+
+
+`}},
+}
diff --git a/src/runtime/race/race.go b/src/runtime/race/race.go
new file mode 100644
index 0000000..9c508eb
--- /dev/null
+++ b/src/runtime/race/race.go
@@ -0,0 +1,20 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race && ((linux && (amd64 || arm64 || ppc64le || s390x)) || ((freebsd || netbsd || openbsd || windows) && amd64))
+
+package race
+
+// This file merely ensures that we link in runtime/cgo in the race build;
+// this in turn ensures that the runtime uses pthread_create to create threads.
+// The prebuilt race runtime lives in race_GOOS_GOARCH.syso.
+// Calls to the runtime are done directly from src/runtime/race.go.
+
+// On darwin we always use system DLLs to create threads,
+// so we use race_darwin_$GOARCH.go to provide the syso-derived
+// symbol information without needing to invoke cgo.
+// This allows -race to be used on Mac systems without a C toolchain.
+
+// void __race_unused_func(void);
+import "C"
diff --git a/src/runtime/race/race_darwin_amd64.go b/src/runtime/race/race_darwin_amd64.go
new file mode 100644
index 0000000..fbb838a
--- /dev/null
+++ b/src/runtime/race/race_darwin_amd64.go
@@ -0,0 +1,101 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code generated by mkcgo.sh. DO NOT EDIT.
+
+//go:build race
+
+package race
+
+//go:cgo_import_dynamic _Block_object_assign _Block_object_assign ""
+//go:cgo_import_dynamic _Block_object_dispose _Block_object_dispose ""
+//go:cgo_import_dynamic _NSConcreteStackBlock _NSConcreteStackBlock ""
+//go:cgo_import_dynamic _NSGetArgv _NSGetArgv ""
+//go:cgo_import_dynamic _NSGetEnviron _NSGetEnviron ""
+//go:cgo_import_dynamic _NSGetExecutablePath _NSGetExecutablePath ""
+//go:cgo_import_dynamic __bzero __bzero ""
+//go:cgo_import_dynamic __error __error ""
+//go:cgo_import_dynamic __fork __fork ""
+//go:cgo_import_dynamic __mmap __mmap ""
+//go:cgo_import_dynamic __munmap __munmap ""
+//go:cgo_import_dynamic __stack_chk_fail __stack_chk_fail ""
+//go:cgo_import_dynamic __stack_chk_guard __stack_chk_guard ""
+//go:cgo_import_dynamic _dyld_get_image_header _dyld_get_image_header ""
+//go:cgo_import_dynamic _dyld_get_image_name _dyld_get_image_name ""
+//go:cgo_import_dynamic _dyld_get_image_vmaddr_slide _dyld_get_image_vmaddr_slide ""
+//go:cgo_import_dynamic _dyld_get_shared_cache_range _dyld_get_shared_cache_range ""
+//go:cgo_import_dynamic _dyld_get_shared_cache_uuid _dyld_get_shared_cache_uuid ""
+//go:cgo_import_dynamic _dyld_image_count _dyld_image_count ""
+//go:cgo_import_dynamic _exit _exit ""
+//go:cgo_import_dynamic abort abort ""
+//go:cgo_import_dynamic arc4random_buf arc4random_buf ""
+//go:cgo_import_dynamic close close ""
+//go:cgo_import_dynamic dlsym dlsym ""
+//go:cgo_import_dynamic dup dup ""
+//go:cgo_import_dynamic dup2 dup2 ""
+//go:cgo_import_dynamic dyld_shared_cache_iterate_text dyld_shared_cache_iterate_text ""
+//go:cgo_import_dynamic execve execve ""
+//go:cgo_import_dynamic exit exit ""
+//go:cgo_import_dynamic fstat$INODE64 fstat$INODE64 ""
+//go:cgo_import_dynamic ftruncate ftruncate ""
+//go:cgo_import_dynamic getpid getpid ""
+//go:cgo_import_dynamic getrlimit getrlimit ""
+//go:cgo_import_dynamic gettimeofday gettimeofday ""
+//go:cgo_import_dynamic getuid getuid ""
+//go:cgo_import_dynamic grantpt grantpt ""
+//go:cgo_import_dynamic ioctl ioctl ""
+//go:cgo_import_dynamic isatty isatty ""
+//go:cgo_import_dynamic lstat$INODE64 lstat$INODE64 ""
+//go:cgo_import_dynamic mach_absolute_time mach_absolute_time ""
+//go:cgo_import_dynamic mach_task_self_ mach_task_self_ ""
+//go:cgo_import_dynamic mach_timebase_info mach_timebase_info ""
+//go:cgo_import_dynamic mach_vm_region_recurse mach_vm_region_recurse ""
+//go:cgo_import_dynamic madvise madvise ""
+//go:cgo_import_dynamic malloc_num_zones malloc_num_zones ""
+//go:cgo_import_dynamic malloc_zones malloc_zones ""
+//go:cgo_import_dynamic memcpy memcpy ""
+//go:cgo_import_dynamic memset_pattern16 memset_pattern16 ""
+//go:cgo_import_dynamic mkdir mkdir ""
+//go:cgo_import_dynamic mprotect mprotect ""
+//go:cgo_import_dynamic open open ""
+//go:cgo_import_dynamic pipe pipe ""
+//go:cgo_import_dynamic posix_openpt posix_openpt ""
+//go:cgo_import_dynamic posix_spawn posix_spawn ""
+//go:cgo_import_dynamic posix_spawn_file_actions_addclose posix_spawn_file_actions_addclose ""
+//go:cgo_import_dynamic posix_spawn_file_actions_adddup2 posix_spawn_file_actions_adddup2 ""
+//go:cgo_import_dynamic posix_spawn_file_actions_destroy posix_spawn_file_actions_destroy ""
+//go:cgo_import_dynamic posix_spawn_file_actions_init posix_spawn_file_actions_init ""
+//go:cgo_import_dynamic posix_spawnattr_destroy posix_spawnattr_destroy ""
+//go:cgo_import_dynamic posix_spawnattr_init posix_spawnattr_init ""
+//go:cgo_import_dynamic posix_spawnattr_setflags posix_spawnattr_setflags ""
+//go:cgo_import_dynamic pthread_attr_getstack pthread_attr_getstack ""
+//go:cgo_import_dynamic pthread_create pthread_create ""
+//go:cgo_import_dynamic pthread_get_stackaddr_np pthread_get_stackaddr_np ""
+//go:cgo_import_dynamic pthread_get_stacksize_np pthread_get_stacksize_np ""
+//go:cgo_import_dynamic pthread_getspecific pthread_getspecific ""
+//go:cgo_import_dynamic pthread_join pthread_join ""
+//go:cgo_import_dynamic pthread_self pthread_self ""
+//go:cgo_import_dynamic pthread_sigmask pthread_sigmask ""
+//go:cgo_import_dynamic pthread_threadid_np pthread_threadid_np ""
+//go:cgo_import_dynamic read read ""
+//go:cgo_import_dynamic readlink readlink ""
+//go:cgo_import_dynamic realpath$DARWIN_EXTSN realpath$DARWIN_EXTSN ""
+//go:cgo_import_dynamic rename rename ""
+//go:cgo_import_dynamic sched_yield sched_yield ""
+//go:cgo_import_dynamic setrlimit setrlimit ""
+//go:cgo_import_dynamic sigaction sigaction ""
+//go:cgo_import_dynamic stat$INODE64 stat$INODE64 ""
+//go:cgo_import_dynamic sysconf sysconf ""
+//go:cgo_import_dynamic sysctl sysctl ""
+//go:cgo_import_dynamic sysctlbyname sysctlbyname ""
+//go:cgo_import_dynamic task_info task_info ""
+//go:cgo_import_dynamic tcgetattr tcgetattr ""
+//go:cgo_import_dynamic tcsetattr tcsetattr ""
+//go:cgo_import_dynamic unlink unlink ""
+//go:cgo_import_dynamic unlockpt unlockpt ""
+//go:cgo_import_dynamic usleep usleep ""
+//go:cgo_import_dynamic vm_region_64 vm_region_64 ""
+//go:cgo_import_dynamic vm_region_recurse_64 vm_region_recurse_64 ""
+//go:cgo_import_dynamic waitpid waitpid ""
+//go:cgo_import_dynamic write write ""
diff --git a/src/runtime/race/race_darwin_arm64.go b/src/runtime/race/race_darwin_arm64.go
new file mode 100644
index 0000000..fe8584c
--- /dev/null
+++ b/src/runtime/race/race_darwin_arm64.go
@@ -0,0 +1,95 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code generated by mkcgo.sh. DO NOT EDIT.
+
+//go:build race
+
+package race
+
+//go:cgo_import_dynamic _NSGetArgv _NSGetArgv ""
+//go:cgo_import_dynamic _NSGetEnviron _NSGetEnviron ""
+//go:cgo_import_dynamic _NSGetExecutablePath _NSGetExecutablePath ""
+//go:cgo_import_dynamic __error __error ""
+//go:cgo_import_dynamic __fork __fork ""
+//go:cgo_import_dynamic __mmap __mmap ""
+//go:cgo_import_dynamic __munmap __munmap ""
+//go:cgo_import_dynamic __stack_chk_fail __stack_chk_fail ""
+//go:cgo_import_dynamic __stack_chk_guard __stack_chk_guard ""
+//go:cgo_import_dynamic _dyld_get_image_header _dyld_get_image_header ""
+//go:cgo_import_dynamic _dyld_get_image_name _dyld_get_image_name ""
+//go:cgo_import_dynamic _dyld_get_image_vmaddr_slide _dyld_get_image_vmaddr_slide ""
+//go:cgo_import_dynamic _dyld_image_count _dyld_image_count ""
+//go:cgo_import_dynamic _exit _exit ""
+//go:cgo_import_dynamic abort abort ""
+//go:cgo_import_dynamic arc4random_buf arc4random_buf ""
+//go:cgo_import_dynamic bzero bzero ""
+//go:cgo_import_dynamic close close ""
+//go:cgo_import_dynamic dlsym dlsym ""
+//go:cgo_import_dynamic dup dup ""
+//go:cgo_import_dynamic dup2 dup2 ""
+//go:cgo_import_dynamic execve execve ""
+//go:cgo_import_dynamic exit exit ""
+//go:cgo_import_dynamic fstat fstat ""
+//go:cgo_import_dynamic ftruncate ftruncate ""
+//go:cgo_import_dynamic getpid getpid ""
+//go:cgo_import_dynamic getrlimit getrlimit ""
+//go:cgo_import_dynamic gettimeofday gettimeofday ""
+//go:cgo_import_dynamic getuid getuid ""
+//go:cgo_import_dynamic grantpt grantpt ""
+//go:cgo_import_dynamic ioctl ioctl ""
+//go:cgo_import_dynamic isatty isatty ""
+//go:cgo_import_dynamic lstat lstat ""
+//go:cgo_import_dynamic mach_absolute_time mach_absolute_time ""
+//go:cgo_import_dynamic mach_task_self_ mach_task_self_ ""
+//go:cgo_import_dynamic mach_timebase_info mach_timebase_info ""
+//go:cgo_import_dynamic mach_vm_region_recurse mach_vm_region_recurse ""
+//go:cgo_import_dynamic madvise madvise ""
+//go:cgo_import_dynamic malloc_num_zones malloc_num_zones ""
+//go:cgo_import_dynamic malloc_zones malloc_zones ""
+//go:cgo_import_dynamic memcpy memcpy ""
+//go:cgo_import_dynamic memset_pattern16 memset_pattern16 ""
+//go:cgo_import_dynamic mkdir mkdir ""
+//go:cgo_import_dynamic mprotect mprotect ""
+//go:cgo_import_dynamic open open ""
+//go:cgo_import_dynamic pipe pipe ""
+//go:cgo_import_dynamic posix_openpt posix_openpt ""
+//go:cgo_import_dynamic posix_spawn posix_spawn ""
+//go:cgo_import_dynamic posix_spawn_file_actions_addclose posix_spawn_file_actions_addclose ""
+//go:cgo_import_dynamic posix_spawn_file_actions_adddup2 posix_spawn_file_actions_adddup2 ""
+//go:cgo_import_dynamic posix_spawn_file_actions_destroy posix_spawn_file_actions_destroy ""
+//go:cgo_import_dynamic posix_spawn_file_actions_init posix_spawn_file_actions_init ""
+//go:cgo_import_dynamic posix_spawnattr_destroy posix_spawnattr_destroy ""
+//go:cgo_import_dynamic posix_spawnattr_init posix_spawnattr_init ""
+//go:cgo_import_dynamic posix_spawnattr_setflags posix_spawnattr_setflags ""
+//go:cgo_import_dynamic pthread_attr_getstack pthread_attr_getstack ""
+//go:cgo_import_dynamic pthread_create pthread_create ""
+//go:cgo_import_dynamic pthread_get_stackaddr_np pthread_get_stackaddr_np ""
+//go:cgo_import_dynamic pthread_get_stacksize_np pthread_get_stacksize_np ""
+//go:cgo_import_dynamic pthread_getspecific pthread_getspecific ""
+//go:cgo_import_dynamic pthread_join pthread_join ""
+//go:cgo_import_dynamic pthread_self pthread_self ""
+//go:cgo_import_dynamic pthread_sigmask pthread_sigmask ""
+//go:cgo_import_dynamic pthread_threadid_np pthread_threadid_np ""
+//go:cgo_import_dynamic read read ""
+//go:cgo_import_dynamic readlink readlink ""
+//go:cgo_import_dynamic realpath$DARWIN_EXTSN realpath$DARWIN_EXTSN ""
+//go:cgo_import_dynamic rename rename ""
+//go:cgo_import_dynamic sched_yield sched_yield ""
+//go:cgo_import_dynamic setrlimit setrlimit ""
+//go:cgo_import_dynamic sigaction sigaction ""
+//go:cgo_import_dynamic stat stat ""
+//go:cgo_import_dynamic sysconf sysconf ""
+//go:cgo_import_dynamic sysctl sysctl ""
+//go:cgo_import_dynamic sysctlbyname sysctlbyname ""
+//go:cgo_import_dynamic task_info task_info ""
+//go:cgo_import_dynamic tcgetattr tcgetattr ""
+//go:cgo_import_dynamic tcsetattr tcsetattr ""
+//go:cgo_import_dynamic unlink unlink ""
+//go:cgo_import_dynamic unlockpt unlockpt ""
+//go:cgo_import_dynamic usleep usleep ""
+//go:cgo_import_dynamic vm_region_64 vm_region_64 ""
+//go:cgo_import_dynamic vm_region_recurse_64 vm_region_recurse_64 ""
+//go:cgo_import_dynamic waitpid waitpid ""
+//go:cgo_import_dynamic write write ""
diff --git a/src/runtime/race/race_darwin_arm64.syso b/src/runtime/race/race_darwin_arm64.syso
new file mode 100644
index 0000000..4a23df2
--- /dev/null
+++ b/src/runtime/race/race_darwin_arm64.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_arm64.syso b/src/runtime/race/race_linux_arm64.syso
new file mode 100644
index 0000000..c8b3f48
--- /dev/null
+++ b/src/runtime/race/race_linux_arm64.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_ppc64le.syso b/src/runtime/race/race_linux_ppc64le.syso
new file mode 100644
index 0000000..1939f29
--- /dev/null
+++ b/src/runtime/race/race_linux_ppc64le.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_s390x.syso b/src/runtime/race/race_linux_s390x.syso
new file mode 100644
index 0000000..ed4a300
--- /dev/null
+++ b/src/runtime/race/race_linux_s390x.syso
Binary files differ
diff --git a/src/runtime/race/race_linux_test.go b/src/runtime/race/race_linux_test.go
new file mode 100644
index 0000000..947ed7c
--- /dev/null
+++ b/src/runtime/race/race_linux_test.go
@@ -0,0 +1,65 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && race
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestAtomicMmap(t *testing.T) {
+ // Test that atomic operations work on "external" memory. Previously they crashed (#16206).
+ // Also do a sanity correctness check: under the race detector, atomic operations
+ // are implemented inside the race runtime.
+ mem, err := syscall.Mmap(-1, 0, 1<<20, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("mmap failed: %v", err)
+ }
+ defer syscall.Munmap(mem)
+ a := (*uint64)(unsafe.Pointer(&mem[0]))
+ if *a != 0 {
+ t.Fatalf("bad atomic value: %v, want 0", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 1 {
+ t.Fatalf("bad atomic value: %v, want 1", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 2 {
+ t.Fatalf("bad atomic value: %v, want 2", *a)
+ }
+}
+
+func TestAtomicPageBoundary(t *testing.T) {
+ // Test that atomic access near (but not across) a page boundary
+ // doesn't fault. See issue 60825.
+
+ // Mmap two pages of memory, and make the second page inaccessible,
+ // so we have an address at the end of a page.
+ pagesize := syscall.Getpagesize()
+ b, err := syscall.Mmap(0, 0, 2*pagesize, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("mmap failed %s", err)
+ }
+ defer syscall.Munmap(b)
+ err = syscall.Mprotect(b[pagesize:], syscall.PROT_NONE)
+ if err != nil {
+ t.Fatalf("mprotect high failed %s\n", err)
+ }
+
+ // This should not fault.
+ a := (*uint32)(unsafe.Pointer(&b[pagesize-4]))
+ atomic.StoreUint32(a, 1)
+ if x := atomic.LoadUint32(a); x != 1 {
+ t.Fatalf("bad atomic value: %v, want 1", x)
+ }
+ if x := atomic.AddUint32(a, 1); x != 2 {
+ t.Fatalf("bad atomic value: %v, want 2", x)
+ }
+}
diff --git a/src/runtime/race/race_test.go b/src/runtime/race/race_test.go
new file mode 100644
index 0000000..4fe6168
--- /dev/null
+++ b/src/runtime/race/race_test.go
@@ -0,0 +1,250 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+// This program is used to verify the race detector
+// by running the tests and parsing their output.
+// It does not check stack correctness, completeness or anything else:
+// it merely verifies that if a test is expected to be racy
+// then the race is detected.
+package race_test
+
+import (
+ "bufio"
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "io"
+ "log"
+ "math/rand"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+var (
+ passedTests = 0
+ totalTests = 0
+ falsePos = 0
+ falseNeg = 0
+ failingPos = 0
+ failingNeg = 0
+ failed = false
+)
+
+const (
+ visibleLen = 40
+ testPrefix = "=== RUN Test"
+)
+
+func TestRace(t *testing.T) {
+ testOutput, err := runTests(t)
+ if err != nil {
+ t.Fatalf("Failed to run tests: %v\n%v", err, string(testOutput))
+ }
+ reader := bufio.NewReader(bytes.NewReader(testOutput))
+
+ funcName := ""
+ var tsanLog []string
+ for {
+ s, err := nextLine(reader)
+ if err != nil {
+ fmt.Printf("%s\n", processLog(funcName, tsanLog))
+ break
+ }
+ if strings.HasPrefix(s, testPrefix) {
+ fmt.Printf("%s\n", processLog(funcName, tsanLog))
+ tsanLog = make([]string, 0, 100)
+ funcName = s[len(testPrefix):]
+ } else {
+ tsanLog = append(tsanLog, s)
+ }
+ }
+
+ if totalTests == 0 {
+ t.Fatalf("failed to parse test output:\n%s", testOutput)
+ }
+ fmt.Printf("\nPassed %d of %d tests (%.02f%%, %d+, %d-)\n",
+ passedTests, totalTests, 100*float64(passedTests)/float64(totalTests), falsePos, falseNeg)
+ fmt.Printf("%d expected failures (%d have not failed)\n", failingPos+failingNeg, failingNeg)
+ if failed {
+ t.Fail()
+ }
+}
+
+// nextLine is a wrapper around bufio.Reader.ReadString.
+// It reads a line up to the next '\n' character. The returned
+// error is non-nil if there are no lines left, and nil
+// otherwise.
+func nextLine(r *bufio.Reader) (string, error) {
+ s, err := r.ReadString('\n')
+ if err != nil {
+ if err != io.EOF {
+ log.Fatalf("nextLine: expected EOF, received %v", err)
+ }
+ return s, err
+ }
+ return s[:len(s)-1], nil
+}
+
+// processLog verifies whether the given ThreadSanitizer log
+// contains a race report, checks this information against
+// the name of the test case, and returns the result of this
+// comparison.
+func processLog(testName string, tsanLog []string) string {
+ if !strings.HasPrefix(testName, "Race") && !strings.HasPrefix(testName, "NoRace") {
+ return ""
+ }
+ gotRace := false
+ for _, s := range tsanLog {
+ if strings.Contains(s, "DATA RACE") {
+ gotRace = true
+ break
+ }
+ }
+
+ failing := strings.Contains(testName, "Failing")
+ expRace := !strings.HasPrefix(testName, "No")
+ for len(testName) < visibleLen {
+ testName += " "
+ }
+ if expRace == gotRace {
+ passedTests++
+ totalTests++
+ if failing {
+ failed = true
+ failingNeg++
+ }
+ return fmt.Sprintf("%s .", testName)
+ }
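+ // A trailing "+" in the output marks a false positive
+ // (a race was reported for a NoRace test).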
+ pos := ""
+ if expRace {
+ falseNeg++
+ } else {
+ falsePos++
+ pos = "+"
+ }
+ if failing {
+ failingPos++
+ } else {
+ failed = true
+ }
+ totalTests++
+ return fmt.Sprintf("%s %s%s", testName, "FAILED", pos)
+}
+
+// runTests ensures that the package and its dependencies are
+// built with instrumentation enabled and returns the output of 'go test',
+// which includes possible data race reports from ThreadSanitizer.
+func runTests(t *testing.T) ([]byte, error) {
+ tests, err := filepath.Glob("./testdata/*_test.go")
+ if err != nil {
+ return nil, err
+ }
+ args := []string{"test", "-race", "-v"}
+ args = append(args, tests...)
+ cmd := exec.Command(testenv.GoToolPath(t), args...)
+ // The GORACE flags set below turn off heuristics that suppress seemingly identical reports.
+ // This is required because the tests contain a lot of data races on the same addresses
+ // (the tests are simple and the memory is constantly reused).
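+ // Copy the environment, dropping any inherited GOMAXPROCS, GODEBUG, and
+ // GORACE settings so that the values set below take effect.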
+ for _, env := range os.Environ() {
+ if strings.HasPrefix(env, "GOMAXPROCS=") ||
+ strings.HasPrefix(env, "GODEBUG=") ||
+ strings.HasPrefix(env, "GORACE=") {
+ continue
+ }
+ cmd.Env = append(cmd.Env, env)
+ }
+ // We set GOMAXPROCS=1 to prevent test flakiness.
+ // There are two sources of flakiness:
+ // 1. Some tests rely on particular execution order.
+ // If the order is different, race does not happen at all.
+ // 2. Ironically, the ThreadSanitizer runtime contains a logical race condition
+ // that can lead to false negatives if racy accesses happen literally at the same time.
+ // Tests used to work reliably in the good old days of GOMAXPROCS=1.
+ // So let's set it for now. A more reliable solution is to explicitly annotate tests
+ // with required execution order by means of a special "invisible" synchronization primitive
+ // (that's what is done for C++ ThreadSanitizer tests). This is issue #14119.
+ cmd.Env = append(cmd.Env,
+ "GOMAXPROCS=1",
+ "GORACE=suppress_equal_stacks=0 suppress_equal_addresses=0",
+ )
+ // There are races: we expect tests to fail and the exit code to be non-zero.
+ out, _ := cmd.CombinedOutput()
+ if bytes.Contains(out, []byte("fatal error:")) {
+ // But don't expect runtime to crash.
+ return out, fmt.Errorf("runtime fatal error")
+ }
+ return out, nil
+}
+
+func TestIssue8102(t *testing.T) {
+ // If this compiles with -race, the test passes.
+ type S struct {
+ x any
+ i int
+ }
+ c := make(chan int)
+ a := [2]*int{}
+ for ; ; c <- *a[S{}.i] {
+ if t != nil {
+ break
+ }
+ }
+}
+
+func TestIssue9137(t *testing.T) {
+ a := []string{"a"}
+ i := 0
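+ // A single multi-assignment that moves the last element into a[i],
+ // clears the last slot, and reslices a; the checks below verify
+ // that it is not miscompiled under the race detector (issue 9137).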
+ a[i], a[len(a)-1], a = a[len(a)-1], "", a[:len(a)-1]
+ if len(a) != 0 || a[:1][0] != "" {
+ t.Errorf("mangled a: %q %q", a, a[:1])
+ }
+}
+
+func BenchmarkSyncLeak(b *testing.B) {
+ const (
+ G = 1000
+ S = 1000
+ H = 10
+ )
+ var wg sync.WaitGroup
+ wg.Add(G)
+ for g := 0; g < G; g++ {
+ go func() {
+ defer wg.Done()
+ hold := make([][]uint32, H)
+ for i := 0; i < b.N; i++ {
+ a := make([]uint32, S)
+ atomic.AddUint32(&a[rand.Intn(len(a))], 1)
+ hold[rand.Intn(len(hold))] = a
+ }
+ _ = hold
+ }()
+ }
+ wg.Wait()
+}
+
+func BenchmarkStackLeak(b *testing.B) {
+ done := make(chan bool, 1)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ growStack(rand.Intn(100))
+ done <- true
+ }()
+ <-done
+ }
+}
+
+func growStack(i int) {
+ if i == 0 {
+ return
+ }
+ growStack(i - 1)
+}
diff --git a/src/runtime/race/race_unix_test.go b/src/runtime/race/race_unix_test.go
new file mode 100644
index 0000000..3cf53b0
--- /dev/null
+++ b/src/runtime/race/race_unix_test.go
@@ -0,0 +1,29 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race && (darwin || freebsd || linux)
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+// Test that the race detector does not crash when accessing non-Go-allocated memory (issue 9136).
+func TestNonGoMemory(t *testing.T) {
+ data, err := syscall.Mmap(-1, 0, 4096, syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
+ if err != nil {
+ t.Fatalf("failed to mmap memory: %v", err)
+ }
+ defer syscall.Munmap(data)
+ p := (*uint32)(unsafe.Pointer(&data[0]))
+ atomic.AddUint32(p, 1)
+ (*p)++
+ if *p != 2 {
+ t.Fatalf("data[0] = %v, expect 2", *p)
+ }
+}
diff --git a/src/runtime/race/race_v1_amd64.go b/src/runtime/race/race_v1_amd64.go
new file mode 100644
index 0000000..7c40db1
--- /dev/null
+++ b/src/runtime/race/race_v1_amd64.go
@@ -0,0 +1,9 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && !amd64.v3) || darwin || freebsd || netbsd || openbsd || windows
+
+package race
+
+import _ "runtime/race/internal/amd64v1"
diff --git a/src/runtime/race/race_v3_amd64.go b/src/runtime/race/race_v3_amd64.go
new file mode 100644
index 0000000..80728d8
--- /dev/null
+++ b/src/runtime/race/race_v3_amd64.go
@@ -0,0 +1,9 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && amd64.v3
+
+package race
+
+import _ "runtime/race/internal/amd64v3"
diff --git a/src/runtime/race/race_windows_test.go b/src/runtime/race/race_windows_test.go
new file mode 100644
index 0000000..143b483
--- /dev/null
+++ b/src/runtime/race/race_windows_test.go
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build windows && race
+
+package race_test
+
+import (
+ "sync/atomic"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+func TestAtomicMmap(t *testing.T) {
+ // Test that atomic operations work on "external" memory. Previously they crashed (#16206).
+ // Also do a sanity correctness check: under the race detector, atomic operations
+ // are implemented inside the race runtime.
+ kernel32 := syscall.NewLazyDLL("kernel32.dll")
+ VirtualAlloc := kernel32.NewProc("VirtualAlloc")
+ VirtualFree := kernel32.NewProc("VirtualFree")
+ const (
+ MEM_COMMIT = 0x00001000
+ MEM_RESERVE = 0x00002000
+ MEM_RELEASE = 0x8000
+ PAGE_READWRITE = 0x04
+ )
+ mem, _, err := syscall.Syscall6(VirtualAlloc.Addr(), 4, 0, 1<<20, MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE, 0, 0)
+ if err != 0 {
+ t.Fatalf("VirtualAlloc failed: %v", err)
+ }
+ defer syscall.Syscall(VirtualFree.Addr(), 3, mem, 1<<20, MEM_RELEASE)
+ a := (*uint64)(unsafe.Pointer(mem))
+ if *a != 0 {
+ t.Fatalf("bad atomic value: %v, want 0", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 1 {
+ t.Fatalf("bad atomic value: %v, want 1", *a)
+ }
+ atomic.AddUint64(a, 1)
+ if *a != 2 {
+ t.Fatalf("bad atomic value: %v, want 2", *a)
+ }
+}
diff --git a/src/runtime/race/sched_test.go b/src/runtime/race/sched_test.go
new file mode 100644
index 0000000..a66860c
--- /dev/null
+++ b/src/runtime/race/sched_test.go
@@ -0,0 +1,48 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package race_test
+
+import (
+ "fmt"
+ "reflect"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+func TestRandomScheduling(t *testing.T) {
+ // Scheduler is most consistent with GOMAXPROCS=1.
+ // Use that to make the test most likely to fail.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+ const N = 10
+ out := make([][]int, N)
+ for i := 0; i < N; i++ {
+ c := make(chan int, N)
+ for j := 0; j < N; j++ {
+ go func(j int) {
+ c <- j
+ }(j)
+ }
+ row := make([]int, N)
+ for j := 0; j < N; j++ {
+ row[j] = <-c
+ }
+ out[i] = row
+ }
+
+ for i := 0; i < N; i++ {
+ if !reflect.DeepEqual(out[0], out[i]) {
+ return // found a different order
+ }
+ }
+
+ var buf strings.Builder
+ for i := 0; i < N; i++ {
+ fmt.Fprintf(&buf, "%v\n", out[i])
+ }
+ t.Fatalf("consistent goroutine execution order:\n%v", buf.String())
+}
diff --git a/src/runtime/race/syso_test.go b/src/runtime/race/syso_test.go
new file mode 100644
index 0000000..2f1a91c
--- /dev/null
+++ b/src/runtime/race/syso_test.go
@@ -0,0 +1,33 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package race
+
+import (
+ "bytes"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "testing"
+)
+
+func TestIssue37485(t *testing.T) {
+ files, err := filepath.Glob("./*.syso")
+ if err != nil {
+ t.Fatalf("can't find syso files: %s", err)
+ }
+ for _, f := range files {
+ cmd := exec.Command(filepath.Join(runtime.GOROOT(), "bin", "go"), "tool", "nm", f)
+ res, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Errorf("nm of %s failed: %s", f, err)
+ continue
+ }
+ if bytes.Contains(res, []byte("getauxval")) {
+ t.Errorf("%s contains getauxval", f)
+ }
+ }
+}
diff --git a/src/runtime/race/testdata/atomic_test.go b/src/runtime/race/testdata/atomic_test.go
new file mode 100644
index 0000000..4ce7260
--- /dev/null
+++ b/src/runtime/race/testdata/atomic_test.go
@@ -0,0 +1,325 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "unsafe"
+)
+
+func TestNoRaceAtomicAddInt64(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int64
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt64(&s, 1) == 2 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt64(&s, 1) == 2 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceAtomicAddInt64(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int64
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt64(&s, 1) == 1 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt64(&s, 1) == 1 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicAddInt32(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int32
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if atomic.AddInt32(&s, 1) == 2 {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if atomic.AddInt32(&s, 1) == 2 {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicLoadAddInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.AddInt32(&s, 1)
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicLoadStoreInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.StoreInt32(&s, 1)
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicStoreCASInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ atomic.StoreInt32(&s, 1)
+ }()
+ for !atomic.CompareAndSwapInt32(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASLoadInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for atomic.LoadInt32(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASInt32(t *testing.T) {
+ var x int64
+ _ = x
+ var s int32
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for !atomic.CompareAndSwapInt32(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASInt32_2(t *testing.T) {
+ var x1, x2 int8
+ _ = x1 + x2
+ var s int32
+ ch := make(chan bool, 2)
+ go func() {
+ x1 = 1
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ x2 = 1
+ }
+ ch <- true
+ }()
+ go func() {
+ x2 = 1
+ if !atomic.CompareAndSwapInt32(&s, 0, 1) {
+ x1 = 1
+ }
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceAtomicLoadInt64(t *testing.T) {
+ var x int32
+ _ = x
+ var s int64
+ go func() {
+ x = 2
+ atomic.AddInt64(&s, 1)
+ }()
+ for atomic.LoadInt64(&s) != 1 {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicCASCASUInt64(t *testing.T) {
+ var x int64
+ _ = x
+ var s uint64
+ go func() {
+ x = 2
+ if !atomic.CompareAndSwapUint64(&s, 0, 1) {
+ panic("")
+ }
+ }()
+ for !atomic.CompareAndSwapUint64(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicLoadStorePointer(t *testing.T) {
+ var x int64
+ _ = x
+ var s unsafe.Pointer
+ var y int = 2
+ var p unsafe.Pointer = unsafe.Pointer(&y)
+ go func() {
+ x = 2
+ atomic.StorePointer(&s, p)
+ }()
+ for atomic.LoadPointer(&s) != p {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestNoRaceAtomicStoreCASUint64(t *testing.T) {
+ var x int64
+ _ = x
+ var s uint64
+ go func() {
+ x = 2
+ atomic.StoreUint64(&s, 1)
+ }()
+ for !atomic.CompareAndSwapUint64(&s, 1, 0) {
+ runtime.Gosched()
+ }
+ x = 1
+}
+
+func TestRaceAtomicStoreLoad(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.StoreUint64(&a, 1)
+ c <- true
+ }()
+ _ = a
+ <-c
+}
+
+func TestRaceAtomicLoadStore(t *testing.T) {
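+ // The case-less switch below still evaluates m["a"]["b"], so this read of the
+ // inner map races with the write to m["a"]["x"] above.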
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ _ = atomic.LoadUint64(&a)
+ c <- true
+ }()
+ a = 1
+ <-c
+}
+
+func TestRaceAtomicAddLoad(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.AddUint64(&a, 1)
+ c <- true
+ }()
+ _ = a
+ <-c
+}
+
+func TestRaceAtomicAddStore(t *testing.T) {
+ c := make(chan bool)
+ var a uint64
+ go func() {
+ atomic.AddUint64(&a, 1)
+ c <- true
+ }()
+ a = 42
+ <-c
+}
+
+// A nil pointer in an atomic operation should not deadlock
+// the rest of the program. Used to hang indefinitely.
+func TestNoRaceAtomicCrash(t *testing.T) {
+ var mutex sync.Mutex
+ var nilptr *int32
+ panics := 0
+ defer func() {
+ if x := recover(); x != nil {
+ mutex.Lock()
+ panics++
+ mutex.Unlock()
+ } else {
+ panic("no panic")
+ }
+ }()
+ atomic.AddInt32(nilptr, 1)
+}
+
+func TestNoRaceDeferAtomicStore(t *testing.T) {
+ // Test that when an atomic function is deferred directly, the
+ // GC scans it correctly. See issue 42599.
+ type foo struct {
+ bar int64
+ }
+
+ var doFork func(f *foo, depth int)
+ doFork = func(f *foo, depth int) {
+ atomic.StoreInt64(&f.bar, 1)
+ defer atomic.StoreInt64(&f.bar, 0)
+ if depth > 0 {
+ for i := 0; i < 2; i++ {
+ f2 := &foo{}
+ go doFork(f2, depth-1)
+ }
+ }
+ runtime.GC()
+ }
+
+ f := &foo{}
+ doFork(f, 11)
+}
diff --git a/src/runtime/race/testdata/cgo_test.go b/src/runtime/race/testdata/cgo_test.go
new file mode 100644
index 0000000..211ef7d
--- /dev/null
+++ b/src/runtime/race/testdata/cgo_test.go
@@ -0,0 +1,21 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "testing"
+)
+
+func TestNoRaceCgoSync(t *testing.T) {
+ cmd := exec.Command(testenv.GoToolPath(t), "run", "-race", "cgo_test_main.go")
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ t.Fatalf("program exited with error: %v\n", err)
+ }
+}
diff --git a/src/runtime/race/testdata/cgo_test_main.go b/src/runtime/race/testdata/cgo_test_main.go
new file mode 100644
index 0000000..620cea1
--- /dev/null
+++ b/src/runtime/race/testdata/cgo_test_main.go
@@ -0,0 +1,30 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+int sync;
+
+void Notify(void)
+{
+ __sync_fetch_and_add(&sync, 1);
+}
+
+void Wait(void)
+{
+ while(__sync_fetch_and_add(&sync, 0) == 0) {}
+}
+*/
+import "C"
+
+func main() {
+ data := 0
+ go func() {
+ data = 1
+ C.Notify()
+ }()
+ C.Wait()
+ _ = data
+}
diff --git a/src/runtime/race/testdata/chan_test.go b/src/runtime/race/testdata/chan_test.go
new file mode 100644
index 0000000..e39ad4f
--- /dev/null
+++ b/src/runtime/race/testdata/chan_test.go
@@ -0,0 +1,787 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "testing"
+ "time"
+)
+
+func TestNoRaceChanSync(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ c <- 0
+ }()
+ <-c
+ v = 2
+}
+
+func TestNoRaceChanSyncRev(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ c <- 0
+ v = 2
+ }()
+ v = 1
+ <-c
+}
+
+func TestNoRaceChanAsync(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ c <- 0
+ }()
+ <-c
+ v = 2
+}
+
+func TestRaceChanAsyncRev(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ v = 1
+ }()
+ v = 2
+ <-c
+}
+
+func TestNoRaceChanAsyncCloseRecv(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ v = 2
+ }()
+ <-c
+ }()
+}
+
+func TestNoRaceChanAsyncCloseRecv2(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ _, _ = <-c
+ v = 2
+}
+
+func TestNoRaceChanAsyncCloseRecv3(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ for range c {
+ }
+ v = 2
+}
+
+func TestNoRaceChanSyncCloseRecv(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ v = 2
+ }()
+ <-c
+ }()
+}
+
+func TestNoRaceChanSyncCloseRecv2(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ _, _ = <-c
+ v = 2
+}
+
+func TestNoRaceChanSyncCloseRecv3(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ for range c {
+ }
+ v = 2
+}
+
+func TestRaceChanSyncCloseSend(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ }()
+ c <- 0
+ }()
+ v = 2
+}
+
+func TestRaceChanAsyncCloseSend(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ close(c)
+ }()
+ func() {
+ defer func() {
+ recover()
+ }()
+ for {
+ c <- 0
+ }
+ }()
+ v = 2
+}
+
+func TestRaceChanCloseClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int)
+ go func() {
+ defer func() {
+ if recover() != nil {
+ v2 = 2
+ }
+ compl <- true
+ }()
+ v1 = 1
+ close(c)
+ }()
+ go func() {
+ defer func() {
+ if recover() != nil {
+ v1 = 2
+ }
+ compl <- true
+ }()
+ v2 = 1
+ close(c)
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceChanSendLen(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ go func() {
+ v = 1
+ c <- 1
+ }()
+ for len(c) == 0 {
+ runtime.Gosched()
+ }
+ v = 2
+}
+
+func TestRaceChanRecvLen(t *testing.T) {
+ v := 0
+ _ = v
+ c := make(chan int, 10)
+ c <- 1
+ go func() {
+ v = 1
+ <-c
+ }()
+ for len(c) != 0 {
+ runtime.Gosched()
+ }
+ v = 2
+}
+
+func TestRaceChanSendSend(t *testing.T) {
+ compl := make(chan bool, 2)
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 1)
+ go func() {
+ v1 = 1
+ select {
+ case c <- 1:
+ default:
+ v2 = 2
+ }
+ compl <- true
+ }()
+ go func() {
+ v2 = 1
+ select {
+ case c <- 1:
+ default:
+ v1 = 2
+ }
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestNoRaceChanPtr(t *testing.T) {
+ type msg struct {
+ x int
+ }
+ c := make(chan *msg)
+ go func() {
+ c <- &msg{1}
+ }()
+ m := <-c
+ m.x = 2
+}
+
+func TestRaceChanWrongSend(t *testing.T) {
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 2)
+ go func() {
+ v1 = 1
+ c <- 1
+ }()
+ go func() {
+ v2 = 2
+ c <- 2
+ }()
+ time.Sleep(1e7)
+ if <-c == 1 {
+ v2 = 3
+ } else {
+ v1 = 3
+ }
+}
+
+func TestRaceChanWrongClose(t *testing.T) {
+ v1 := 0
+ v2 := 0
+ _ = v1 + v2
+ c := make(chan int, 1)
+ done := make(chan bool)
+ go func() {
+ defer func() {
+ recover()
+ }()
+ v1 = 1
+ c <- 1
+ done <- true
+ }()
+ go func() {
+ time.Sleep(1e7)
+ v2 = 2
+ close(c)
+ done <- true
+ }()
+ time.Sleep(2e7)
+ if _, who := <-c; who {
+ v2 = 2
+ } else {
+ v1 = 2
+ }
+ <-done
+ <-done
+}
+
+func TestRaceChanSendClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ c := make(chan int, 1)
+ go func() {
+ defer func() {
+ recover()
+ compl <- true
+ }()
+ c <- 1
+ }()
+ go func() {
+ time.Sleep(10 * time.Millisecond)
+ close(c)
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceChanSendSelectClose(t *testing.T) {
+ compl := make(chan bool, 2)
+ c := make(chan int, 1)
+ c1 := make(chan int)
+ go func() {
+ defer func() {
+ recover()
+ compl <- true
+ }()
+ time.Sleep(10 * time.Millisecond)
+ select {
+ case c <- 1:
+ case <-c1:
+ }
+ }()
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ <-compl
+ <-compl
+}
+
+func TestRaceSelectReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int, 10)
+ c2 := make(chan int, 10)
+ c3 := make(chan int)
+ c2 <- 1
+ go func() {
+ select {
+ case c1 <- x: // read of x races with...
+ case c3 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c2: // ... write to x here
+ case c3 <- 1:
+ }
+ <-done
+}
+
+func TestRaceSelectReadWriteSync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int)
+ c2 := make(chan int)
+ c3 := make(chan int)
+ // make c1 and c2 ready for communication
+ go func() {
+ <-c1
+ }()
+ go func() {
+ c2 <- 1
+ }()
+ go func() {
+ select {
+ case c1 <- x: // read of x races with...
+ case c3 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c2: // ... write to x here
+ case c3 <- 1:
+ }
+ <-done
+}
+
+func TestNoRaceSelectReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ c1 := make(chan int)
+ c2 := make(chan int)
+ go func() {
+ select {
+ case c1 <- x: // read of x does not race with...
+ case c2 <- 1:
+ }
+ done <- true
+ }()
+ select {
+ case x = <-c1: // ... write to x here
+ case c2 <- 1:
+ }
+ <-done
+}
+
+func TestRaceChanReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int, 10)
+ c2 := make(chan int, 10)
+ c2 <- 10
+ x := 0
+ go func() {
+ c1 <- x // read of x races with...
+ done <- true
+ }()
+ x = <-c2 // ... write to x here
+ <-done
+}
+
+func TestRaceChanReadWriteSync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int)
+ c2 := make(chan int)
+ // make c1 and c2 ready for communication
+ go func() {
+ <-c1
+ }()
+ go func() {
+ c2 <- 10
+ }()
+ x := 0
+ go func() {
+ c1 <- x // read of x races with...
+ done <- true
+ }()
+ x = <-c2 // ... write to x here
+ <-done
+}
+
+func TestNoRaceChanReadWriteAsync(t *testing.T) {
+ done := make(chan bool)
+ c1 := make(chan int, 10)
+ x := 0
+ go func() {
+ c1 <- x // read of x does not race with...
+ done <- true
+ }()
+ x = <-c1 // ... write to x here
+ <-done
+}
+
+func TestNoRaceProducerConsumerUnbuffered(t *testing.T) {
+ type Task struct {
+ f func()
+ done chan bool
+ }
+
+ queue := make(chan Task)
+
+ go func() {
+ t := <-queue
+ t.f()
+ t.done <- true
+ }()
+
+ doit := func(f func()) {
+ done := make(chan bool, 1)
+ queue <- Task{f, done}
+ <-done
+ }
+
+ x := 0
+ doit(func() {
+ x = 1
+ })
+ _ = x
+}
+
+func TestRaceChanItselfSend(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ compl <- true
+ }()
+ c = make(chan int, 20)
+ <-compl
+}
+
+func TestRaceChanItselfRecv(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ c <- 1
+ go func() {
+ <-c
+ compl <- true
+ }()
+ time.Sleep(1e7)
+ c = make(chan int, 20)
+ <-compl
+}
+
+func TestRaceChanItselfNil(t *testing.T) {
+ c := make(chan int, 10)
+ go func() {
+ c <- 0
+ }()
+ time.Sleep(1e7)
+ c = nil
+ _ = c
+}
+
+func TestRaceChanItselfClose(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestRaceChanItselfLen(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ _ = len(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestRaceChanItselfCap(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int)
+ go func() {
+ _ = cap(c)
+ compl <- true
+ }()
+ c = make(chan int)
+ <-compl
+}
+
+func TestNoRaceChanCloseLen(t *testing.T) {
+ c := make(chan int, 10)
+ r := make(chan int, 10)
+ go func() {
+ r <- len(c)
+ }()
+ go func() {
+ close(c)
+ r <- 0
+ }()
+ <-r
+ <-r
+}
+
+func TestNoRaceChanCloseCap(t *testing.T) {
+ c := make(chan int, 10)
+ r := make(chan int, 10)
+ go func() {
+ r <- cap(c)
+ }()
+ go func() {
+ close(c)
+ r <- 0
+ }()
+ <-r
+ <-r
+}
+
+func TestRaceChanCloseSend(t *testing.T) {
+ compl := make(chan bool, 1)
+ c := make(chan int, 10)
+ go func() {
+ close(c)
+ compl <- true
+ }()
+ c <- 0
+ <-compl
+}
+
+func TestNoRaceChanMutex(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan struct{}, 1)
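+ // A buffered channel of capacity 1 is used as a mutex:
+ // sending acquires the lock, receiving releases it.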
+ data := 0
+ _ = data
+ go func() {
+ mtx <- struct{}{}
+ data = 42
+ <-mtx
+ done <- struct{}{}
+ }()
+ mtx <- struct{}{}
+ data = 43
+ <-mtx
+ <-done
+}
+
+func TestNoRaceSelectMutex(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan struct{}, 1)
+ aux := make(chan bool)
+ data := 0
+ _ = data
+ go func() {
+ select {
+ case mtx <- struct{}{}:
+ case <-aux:
+ }
+ data = 42
+ select {
+ case <-mtx:
+ case <-aux:
+ }
+ done <- struct{}{}
+ }()
+ select {
+ case mtx <- struct{}{}:
+ case <-aux:
+ }
+ data = 43
+ select {
+ case <-mtx:
+ case <-aux:
+ }
+ <-done
+}
+
+func TestRaceChanSem(t *testing.T) {
+ done := make(chan struct{})
+ mtx := make(chan bool, 2)
+ data := 0
+ _ = data
+ go func() {
+ mtx <- true
+ data = 42
+ <-mtx
+ done <- struct{}{}
+ }()
+ mtx <- true
+ data = 43
+ <-mtx
+ <-done
+}
+
+func TestNoRaceChanWaitGroup(t *testing.T) {
+ const N = 10
+ chanWg := make(chan bool, N/2)
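+ // chanWg is used as a counting semaphore: each send acquires one of N/2 slots,
+ // each goroutine's receive releases one, and filling the channel to capacity
+ // below can only complete once every goroutine has finished.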
+ data := make([]int, N)
+ for i := 0; i < N; i++ {
+ chanWg <- true
+ go func(i int) {
+ data[i] = 42
+ <-chanWg
+ }(i)
+ }
+ for i := 0; i < cap(chanWg); i++ {
+ chanWg <- true
+ }
+ for i := 0; i < N; i++ {
+ _ = data[i]
+ }
+}
+
+// Test that sender synchronizes with receiver even if the sender was blocked.
+func TestNoRaceBlockedSendSync(t *testing.T) {
+ c := make(chan *int, 1)
+ c <- nil
+ go func() {
+ i := 42
+ c <- &i
+ }()
+ // Give the sender time to actually block.
+ // This sleep is completely optional: a race report must not be printed
+ // regardless of whether the sender actually blocks or not.
+ // It cannot lead to flakiness.
+ time.Sleep(10 * time.Millisecond)
+ <-c
+ p := <-c
+ if *p != 42 {
+ t.Fatal()
+ }
+}
+
+// The same as TestNoRaceBlockedSendSync above, but the sender is unblocked by a receive in a select.
+func TestNoRaceBlockedSelectSendSync(t *testing.T) {
+ c := make(chan *int, 1)
+ c <- nil
+ go func() {
+ i := 42
+ c <- &i
+ }()
+ time.Sleep(10 * time.Millisecond)
+ <-c
+ select {
+ case p := <-c:
+ if *p != 42 {
+ t.Fatal()
+ }
+ case <-make(chan int):
+ }
+}
+
+// Test that close synchronizes with a read from the empty closed channel.
+// See https://golang.org/issue/36714.
+func TestNoRaceCloseHappensBeforeRead(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ var loc int
+ var write = make(chan struct{})
+ var read = make(chan struct{})
+
+ go func() {
+ select {
+ case <-write:
+ _ = loc
+ default:
+ }
+ close(read)
+ }()
+
+ go func() {
+ loc = 1
+ close(write)
+ }()
+
+ <-read
+ }
+}
+
+// Test that we call the proper race detector function when c.elemsize==0.
+// See https://github.com/golang/go/issues/42598
+func TestNoRaceElemetSize0(t *testing.T) {
+ var x, y int
+ var c = make(chan struct{}, 2)
+ c <- struct{}{}
+ c <- struct{}{}
+ go func() {
+ x += 1
+ <-c
+ }()
+ go func() {
+ y += 1
+ <-c
+ }()
+ time.Sleep(10 * time.Millisecond)
+ c <- struct{}{}
+ c <- struct{}{}
+ x += 1
+ y += 1
+}
diff --git a/src/runtime/race/testdata/comp_test.go b/src/runtime/race/testdata/comp_test.go
new file mode 100644
index 0000000..27b2d00
--- /dev/null
+++ b/src/runtime/race/testdata/comp_test.go
@@ -0,0 +1,186 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "testing"
+)
+
+type P struct {
+ x, y int
+}
+
+type S struct {
+ s1, s2 P
+}
+
+func TestNoRaceComp(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.x = 1
+ c <- true
+ }()
+ s.s2.y = 2
+ <-c
+}
+
+func TestNoRaceComp2(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ s.s1.y = 2
+ <-c
+}
+
+func TestRaceComp(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.y = 1
+ c <- true
+ }()
+ s.s2.y = 2
+ <-c
+}
+
+func TestRaceComp2(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ s = S{}
+ <-c
+}
+
+func TestRaceComp3(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S
+ go func() {
+ s.s2.y = 1
+ c <- true
+ }()
+ s = S{}
+ <-c
+}
+
+func TestRaceCompArray(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]S, 10)
+ x := 4
+ go func() {
+ s[x].s2.y = 1
+ c <- true
+ }()
+ x = 5
+ <-c
+}
+
+type P2 P
+type S2 S
+
+func TestRaceConv1(t *testing.T) {
+ c := make(chan bool, 1)
+ var p P2
+ go func() {
+ p.x = 1
+ c <- true
+ }()
+ _ = P(p).x
+ <-c
+}
+
+func TestRaceConv2(t *testing.T) {
+ c := make(chan bool, 1)
+ var p P2
+ go func() {
+ p.x = 1
+ c <- true
+ }()
+ ptr := &p
+ _ = P(*ptr).x
+ <-c
+}
+
+func TestRaceConv3(t *testing.T) {
+ c := make(chan bool, 1)
+ var s S2
+ go func() {
+ s.s1.x = 1
+ c <- true
+ }()
+ _ = P2(S(s).s1).x
+ <-c
+}
+
+type X struct {
+ V [4]P
+}
+
+type X2 X
+
+func TestRaceConv4(t *testing.T) {
+ c := make(chan bool, 1)
+ var x X2
+ go func() {
+ x.V[1].x = 1
+ c <- true
+ }()
+ _ = P2(X(x).V[1]).x
+ <-c
+}
+
+type Ptr struct {
+ s1, s2 *P
+}
+
+func TestNoRaceCompPtr(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s1.x = 1
+ c <- true
+ }()
+ p.s1.y = 2
+ <-c
+}
+
+func TestNoRaceCompPtr2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s1.x = 1
+ c <- true
+ }()
+ _ = p
+ <-c
+}
+
+func TestRaceCompPtr(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s2.x = 1
+ c <- true
+ }()
+ p.s2.x = 2
+ <-c
+}
+
+func TestRaceCompPtr2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := Ptr{&P{}, &P{}}
+ go func() {
+ p.s2.x = 1
+ c <- true
+ }()
+ p.s2 = &P{}
+ <-c
+}
diff --git a/src/runtime/race/testdata/finalizer_test.go b/src/runtime/race/testdata/finalizer_test.go
new file mode 100644
index 0000000..3ac33d2
--- /dev/null
+++ b/src/runtime/race/testdata/finalizer_test.go
@@ -0,0 +1,68 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceFin(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ *x = "foo"
+ })
+ *x = "bar"
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+}
+
+var finVar struct {
+ sync.Mutex
+ cnt int
+}
+
+func TestNoRaceFinGlobal(t *testing.T) {
+ c := make(chan bool)
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ finVar.Lock()
+ finVar.cnt++
+ finVar.Unlock()
+ })
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+ finVar.Lock()
+ finVar.cnt++
+ finVar.Unlock()
+}
+
+func TestRaceFin(t *testing.T) {
+ c := make(chan bool)
+ y := 0
+ _ = y
+ go func() {
+ x := new(string)
+ runtime.SetFinalizer(x, func(x *string) {
+ y = 42
+ })
+ c <- true
+ }()
+ <-c
+ runtime.GC()
+ time.Sleep(100 * time.Millisecond)
+ y = 66
+}
diff --git a/src/runtime/race/testdata/io_test.go b/src/runtime/race/testdata/io_test.go
new file mode 100644
index 0000000..3303cb0
--- /dev/null
+++ b/src/runtime/race/testdata/io_test.go
@@ -0,0 +1,75 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "fmt"
+ "net"
+ "net/http"
+ "os"
+ "path/filepath"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceIOFile(t *testing.T) {
+ x := 0
+ path := t.TempDir()
+ fname := filepath.Join(path, "data")
+ go func() {
+ x = 42
+ f, _ := os.Create(fname)
+ f.Write([]byte("done"))
+ f.Close()
+ }()
+ for {
+ f, err := os.Open(fname)
+ if err != nil {
+ time.Sleep(1e6)
+ continue
+ }
+ buf := make([]byte, 100)
+ count, err := f.Read(buf)
+ if count == 0 {
+ time.Sleep(1e6)
+ continue
+ }
+ break
+ }
+ _ = x
+}
+
+var (
+ regHandler sync.Once
+ handlerData int
+)
+
+func TestNoRaceIOHttp(t *testing.T) {
+ regHandler.Do(func() {
+ http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
+ handlerData++
+ fmt.Fprintf(w, "test")
+ handlerData++
+ })
+ })
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("net.Listen: %v", err)
+ }
+ defer ln.Close()
+ go http.Serve(ln, nil)
+ handlerData++
+ _, err = http.Get("http://" + ln.Addr().String())
+ if err != nil {
+ t.Fatalf("http.Get: %v", err)
+ }
+ handlerData++
+ _, err = http.Get("http://" + ln.Addr().String())
+ if err != nil {
+ t.Fatalf("http.Get: %v", err)
+ }
+ handlerData++
+}
diff --git a/src/runtime/race/testdata/issue12225_test.go b/src/runtime/race/testdata/issue12225_test.go
new file mode 100644
index 0000000..0494493
--- /dev/null
+++ b/src/runtime/race/testdata/issue12225_test.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import "unsafe"
+
+// golang.org/issue/12225
+// The test is that this compiles at all.
+
+//go:noinline
+func convert(s string) []byte {
+ return []byte(s)
+}
+
+func issue12225() {
+ println(*(*int)(unsafe.Pointer(&convert("")[0])))
+ println(*(*int)(unsafe.Pointer(&[]byte("")[0])))
+}
diff --git a/src/runtime/race/testdata/issue12664_test.go b/src/runtime/race/testdata/issue12664_test.go
new file mode 100644
index 0000000..714e83d
--- /dev/null
+++ b/src/runtime/race/testdata/issue12664_test.go
@@ -0,0 +1,76 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "fmt"
+ "testing"
+)
+
+var issue12664 = "hi"
+
+func TestRaceIssue12664(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664 = "bye"
+ close(c)
+ }()
+ fmt.Println(issue12664)
+ <-c
+}
+
+type MyI interface {
+ foo()
+}
+
+type MyT int
+
+func (MyT) foo() {
+}
+
+var issue12664_2 MyT = 0
+
+func TestRaceIssue12664_2(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_2 = 1
+ close(c)
+ }()
+ func(x MyI) {
+ // Never true, but prevents inlining.
+ if x.(MyT) == -1 {
+ close(c)
+ }
+ }(issue12664_2)
+ <-c
+}
+
+var issue12664_3 MyT = 0
+
+func TestRaceIssue12664_3(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_3 = 1
+ close(c)
+ }()
+ var r MyT
+ var i any = r
+ issue12664_3 = i.(MyT)
+ <-c
+}
+
+var issue12664_4 MyT = 0
+
+func TestRaceIssue12664_4(t *testing.T) {
+ c := make(chan struct{})
+ go func() {
+ issue12664_4 = 1
+ close(c)
+ }()
+ var r MyT
+ var i MyI = r
+ issue12664_4 = i.(MyT)
+ <-c
+}
diff --git a/src/runtime/race/testdata/issue13264_test.go b/src/runtime/race/testdata/issue13264_test.go
new file mode 100644
index 0000000..d42290d
--- /dev/null
+++ b/src/runtime/race/testdata/issue13264_test.go
@@ -0,0 +1,13 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+// golang.org/issue/13264
+// The test is that this compiles at all.
+
+func issue13264() {
+ for ; ; []map[int]int{}[0][0] = 0 {
+ }
+}
diff --git a/src/runtime/race/testdata/map_test.go b/src/runtime/race/testdata/map_test.go
new file mode 100644
index 0000000..88e735e
--- /dev/null
+++ b/src/runtime/race/testdata/map_test.go
@@ -0,0 +1,335 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "testing"
+)
+
+func TestRaceMapRW(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[1]
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRW2(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[1]
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRWArray(t *testing.T) {
+ // Check instrumentation of unaddressable arrays (issue 4578).
+ m := make(map[int][2]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[1][1]
+ ch <- true
+ }()
+ m[2] = [2]int{1, 2}
+ <-ch
+}
+
+func TestNoRaceMapRR(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[1]
+ ch <- true
+ }()
+ _ = m[1]
+ <-ch
+}
+
+func TestRaceMapRange(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestRaceMapRange2(t *testing.T) {
+ m := make(map[int]int)
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ m[1] = 1
+ <-ch
+}
+
+func TestNoRaceMapRangeRange(t *testing.T) {
+ m := make(map[int]int)
+ // Now the map is not empty, so range triggers an event.
+ // This should work without this write (as in other tests),
+ // so it is suspicious if this test passes and others don't.
+ m[0] = 0
+ ch := make(chan bool, 1)
+ go func() {
+ for range m {
+ }
+ ch <- true
+ }()
+ for range m {
+ }
+ <-ch
+}
+
+func TestRaceMapLen(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ _ = len(m)
+ ch <- true
+ }()
+ m[""] = true
+ <-ch
+}
+
+func TestRaceMapDelete(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, "")
+ ch <- true
+ }()
+ m[""] = true
+ <-ch
+}
+
+func TestRaceMapLenDelete(t *testing.T) {
+ m := make(map[string]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, "a")
+ ch <- true
+ }()
+ _ = len(m)
+ <-ch
+}
+
+func TestRaceMapVariable(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ _ = m
+ go func() {
+ m = make(map[int]int)
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+func TestRaceMapVariable2(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ go func() {
+ m[1] = 1
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+func TestRaceMapVariable3(t *testing.T) {
+ ch := make(chan bool, 1)
+ m := make(map[int]int)
+ go func() {
+ _ = m[1]
+ ch <- true
+ }()
+ m = make(map[int]int)
+ <-ch
+}
+
+type Big struct {
+ x [17]int32
+}
+
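+// The PartKey and PartVal tests below race a write to one element of a Big key
+// or value against a map operation that reads or copies the whole composite value.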
+func TestRaceMapLookupPartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ _ = m[*k]
+ <-ch
+}
+
+func TestRaceMapLookupPartKey2(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ _, _ = m[*k]
+ <-ch
+}
+func TestRaceMapDeletePartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ delete(m, *k)
+ <-ch
+}
+
+func TestRaceMapInsertPartKey(t *testing.T) {
+ k := &Big{}
+ m := make(map[Big]bool)
+ ch := make(chan bool, 1)
+ go func() {
+ k.x[8] = 1
+ ch <- true
+ }()
+ m[*k] = true
+ <-ch
+}
+
+func TestRaceMapInsertPartVal(t *testing.T) {
+ v := &Big{}
+ m := make(map[int]Big)
+ ch := make(chan bool, 1)
+ go func() {
+ v.x[8] = 1
+ ch <- true
+ }()
+ m[1] = *v
+ <-ch
+}
+
+// Test for issue 7561.
+func TestRaceMapAssignMultipleReturn(t *testing.T) {
+ connect := func() (int, error) { return 42, nil }
+ conns := make(map[int][]int)
+ conns[1] = []int{0}
+ ch := make(chan bool, 1)
+ var err error
+ _ = err
+ go func() {
+ conns[1][0], err = connect()
+ ch <- true
+ }()
+ x := conns[1][0]
+ _ = x
+ <-ch
+}
+
+// BigKey and BigVal must be larger than 256 bytes,
+// so that the compiler sets KindGCProg for them.
+type BigKey [1000]*int
+
+type BigVal struct {
+ x int
+ y [1000]*int
+}
+
+func TestRaceMapBigKeyAccess1(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ _ = m[k]
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyAccess2(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = m[k]
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyInsert(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ m[k] = 1
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigKeyDelete(t *testing.T) {
+ m := make(map[BigKey]int)
+ var k BigKey
+ ch := make(chan bool, 1)
+ go func() {
+ delete(m, k)
+ ch <- true
+ }()
+ k[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValInsert(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ m[1] = v
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValAccess1(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ v = m[1]
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
+
+func TestRaceMapBigValAccess2(t *testing.T) {
+ m := make(map[int]BigVal)
+ var v BigVal
+ ch := make(chan bool, 1)
+ go func() {
+ v, _ = m[1]
+ ch <- true
+ }()
+ v.y[30] = new(int)
+ <-ch
+}
diff --git a/src/runtime/race/testdata/mop_test.go b/src/runtime/race/testdata/mop_test.go
new file mode 100644
index 0000000..6b1069f
--- /dev/null
+++ b/src/runtime/race/testdata/mop_test.go
@@ -0,0 +1,2132 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "bytes"
+ "errors"
+ "fmt"
+ "hash/crc32"
+ "io"
+ "os"
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+type Point struct {
+ x, y int
+}
+
+type NamedPoint struct {
+ name string
+ p Point
+}
+
+type DummyWriter struct {
+ state int
+}
+type Writer interface {
+ Write(p []byte) (n int)
+}
+
+func (d DummyWriter) Write(p []byte) (n int) {
+ return 0
+}
+
+var GlobalX, GlobalY int = 0, 0
+var GlobalCh chan int = make(chan int, 2)
+
+func GlobalFunc1() {
+ GlobalY = GlobalX
+ GlobalCh <- 1
+}
+
+func GlobalFunc2() {
+ GlobalX = 1
+ GlobalCh <- 1
+}
+
+func TestRaceIntRWGlobalFuncs(t *testing.T) {
+ go GlobalFunc1()
+ go GlobalFunc2()
+ <-GlobalCh
+ <-GlobalCh
+}
+
+func TestRaceIntRWClosures(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceIntRWClosures(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 1)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ <-ch
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ <-ch
+
+}
+
+func TestRaceInt32RWClosures(t *testing.T) {
+ var x, y int32
+ _ = y
+ ch := make(chan bool, 2)
+
+ go func() {
+ y = x
+ ch <- true
+ }()
+ go func() {
+ x = 1
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceCase(t *testing.T) {
+ var y int
+ for x := -1; x <= 1; x++ {
+ switch {
+ case x < 0:
+ y = -1
+ case x == 0:
+ y = 0
+ case x > 0:
+ y = 1
+ }
+ }
+ y++
+}
+
+func TestRaceCaseCondition(t *testing.T) {
+ var x int = 0
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 2
+ ch <- 1
+ }()
+ go func() {
+ switch x < 2 {
+ case true:
+ x = 1
+ //case false:
+ // x = 5
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseCondition2(t *testing.T) {
+ // The switch body is rearranged by the compiler, so the test
+ // passes even if we don't instrument '<'.
+ var x int = 0
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 2
+ ch <- 1
+ }()
+ go func() {
+ switch x < 2 {
+ case true:
+ x = 1
+ case false:
+ x = 5
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseBody(t *testing.T) {
+ var x, y int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ default:
+ x = 1
+ case x == 100:
+ x = -x
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceCaseFallthrough(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+ z = 1
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ case z == 1:
+ case z == 2:
+ x = 2
+ }
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseFallthrough(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+ z = 1
+
+ go func() {
+ y = x
+ ch <- 1
+ }()
+ go func() {
+ switch {
+ case z == 1:
+ fallthrough
+ case z == 2:
+ x = 2
+ }
+ ch <- 1
+ }()
+
+ <-ch
+ <-ch
+}
+
+func TestRaceCaseIssue6418(t *testing.T) {
+ m := map[string]map[string]string{
+ "a": {
+ "b": "c",
+ },
+ }
+ ch := make(chan int)
+ go func() {
+ m["a"]["x"] = "y"
+ ch <- 1
+ }()
+ switch m["a"]["b"] {
+ }
+ <-ch
+}
+
+func TestRaceCaseType(t *testing.T) {
+ var x, y int
+ var i any = x
+ c := make(chan int, 1)
+ go func() {
+ switch i.(type) {
+ case nil:
+ case int:
+ }
+ c <- 1
+ }()
+ i = y
+ <-c
+}
+
+func TestRaceCaseTypeBody(t *testing.T) {
+ var x, y int
+ var i any = &x
+ c := make(chan int, 1)
+ go func() {
+ switch i := i.(type) {
+ case nil:
+ case *int:
+ *i = y
+ }
+ c <- 1
+ }()
+ x = y
+ <-c
+}
+
+func TestRaceCaseTypeIssue5890(t *testing.T) {
+ // spurious extra instrumentation of the initial interface
+ // value.
+ var x, y int
+ m := make(map[int]map[int]any)
+ m[0] = make(map[int]any)
+ c := make(chan int, 1)
+ go func() {
+ switch i := m[0][1].(type) {
+ case nil:
+ case *int:
+ *i = x
+ }
+ c <- 1
+ }()
+ m[0][1] = y
+ <-c
+}
+
+func TestNoRaceRange(t *testing.T) {
+ ch := make(chan int, 3)
+ a := [...]int{1, 2, 3}
+ for _, v := range a {
+ ch <- v
+ }
+ close(ch)
+}
+
+func TestNoRaceRangeIssue5446(t *testing.T) {
+ ch := make(chan int, 3)
+ a := []int{1, 2, 3}
+ b := []int{4}
+ // used to insert a spurious instrumentation of a[i]
+ // and crash.
+ i := 1
+ for i, a[i] = range b {
+ ch <- i
+ }
+ close(ch)
+}
+
+func TestRaceRange(t *testing.T) {
+ const N = 2
+ var a [N]int
+ var x, y int
+ _ = x + y
+ done := make(chan bool, N)
+ var i, v int // declare here (not in for stmt) so that i and v are shared w/ or w/o loop variable sharing change
+ for i, v = range a {
+ go func(i int) {
+ // we don't want a write-vs-write race
+ // so there is no array b here
+ if i == 0 {
+ x = v
+ } else {
+ y = v
+ }
+ done <- true
+ }(i)
+ // Ensure the goroutine runs before we continue the loop.
+ runtime.Gosched()
+ }
+ for i := 0; i < N; i++ {
+ <-done
+ }
+}
+
+func TestRaceForInit(t *testing.T) {
+ c := make(chan int)
+ x := 0
+ go func() {
+ c <- x
+ }()
+ for x = 42; false; {
+ }
+ <-c
+}
+
+func TestNoRaceForInit(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ x := 0
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ x++
+ }
+ }()
+ i := 0
+ for x = 42; i < 10; i++ {
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestRaceForTest(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ stop := false
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ stop = true
+ }
+ }()
+ for !stop {
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestRaceForIncr(t *testing.T) {
+ done := make(chan bool)
+ c := make(chan bool)
+ x := 0
+ go func() {
+ for {
+ _, ok := <-c
+ if !ok {
+ done <- true
+ return
+ }
+ x++
+ }
+ }()
+ for i := 0; i < 10; x++ {
+ i++
+ c <- true
+ }
+ close(c)
+ <-done
+}
+
+func TestNoRaceForIncr(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ go func() {
+ x++
+ done <- true
+ }()
+ for i := 0; i < 0; x++ {
+ }
+ <-done
+}
+
+func TestRacePlus(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x + z
+ ch <- 1
+ }()
+ go func() {
+ y = x + z + z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRacePlus2(t *testing.T) {
+ var x, y, z int
+ _ = y
+ ch := make(chan int, 2)
+
+ go func() {
+ x = 1
+ ch <- 1
+ }()
+ go func() {
+ y = +x + z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRacePlus(t *testing.T) {
+ var x, y, z, f int
+ _ = x + y + f
+ ch := make(chan int, 2)
+
+ go func() {
+ y = x + z
+ ch <- 1
+ }()
+ go func() {
+ f = z + x
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceComplement(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = ^y
+ ch <- 1
+ }()
+ go func() {
+ y = ^z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceDiv(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y / (z + 1)
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceDivConst(t *testing.T) {
+ var x, y, z uint32
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y / 3 // involves only a HMUL node
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMod(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y % (z + 1)
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceModConst(t *testing.T) {
+ var x, y, z int
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y % 3
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceRotate(t *testing.T) {
+ var x, y, z uint32
+ _ = x
+ ch := make(chan int, 2)
+
+ go func() {
+ x = y<<12 | y>>20
+ ch <- 1
+ }()
+ go func() {
+ y = z
+ ch <- 1
+ }()
+ <-ch
+ <-ch
+}
+
+// May crash if the instrumentation is reckless.
+func TestNoRaceEnoughRegisters(t *testing.T) {
+ // from erf.go
+ const (
+ sa1 = 1
+ sa2 = 2
+ sa3 = 3
+ sa4 = 4
+ sa5 = 5
+ sa6 = 6
+ sa7 = 7
+ sa8 = 8
+ )
+ var s, S float64
+ s = 3.1415
+ S = 1 + s*(sa1+s*(sa2+s*(sa3+s*(sa4+s*(sa5+s*(sa6+s*(sa7+s*sa8)))))))
+ s = S
+}
+
+// emptyFunc should not be inlined.
+func emptyFunc(x int) {
+ if false {
+ fmt.Println(x)
+ }
+}
+
+func TestRaceFuncArgument(t *testing.T) {
+ var x int
+ ch := make(chan bool, 1)
+ go func() {
+ emptyFunc(x)
+ ch <- true
+ }()
+ x = 1
+ <-ch
+}
+
+func TestRaceFuncArgument2(t *testing.T) {
+ var x int
+ ch := make(chan bool, 2)
+ go func() {
+ x = 42
+ ch <- true
+ }()
+ go func(y int) {
+ ch <- true
+ }(x)
+ <-ch
+ <-ch
+}
+
+func TestRaceSprint(t *testing.T) {
+ var x int
+ ch := make(chan bool, 1)
+ go func() {
+ fmt.Sprint(x)
+ ch <- true
+ }()
+ x = 1
+ <-ch
+}
+
+func TestRaceArrayCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ var a [5]int
+ go func() {
+ a[3] = 1
+ ch <- true
+ }()
+ a = [5]int{1, 2, 3, 4, 5}
+ <-ch
+}
+
+// Blows up a naive compiler.
+func TestRaceNestedArrayCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ type (
+ Point32 [2][2][2][2][2]Point
+ Point1024 [2][2][2][2][2]Point32
+ Point32k [2][2][2][2][2]Point1024
+ Point1M [2][2][2][2][2]Point32k
+ )
+ var a, b Point1M
+ go func() {
+ a[0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1][0][1].y = 1
+ ch <- true
+ }()
+ a = b
+ <-ch
+}
+
+func TestRaceStructRW(t *testing.T) {
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p = Point{1, 1}
+ ch <- true
+ }()
+ q := p
+ <-ch
+ p = q
+}
+
+func TestRaceStructFieldRW1(t *testing.T) {
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ _ = p.x
+ <-ch
+}
+
+func TestNoRaceStructFieldRW1(t *testing.T) {
+	// Same struct, different variables, no pointers.
+	// The layout is known at compile time, so there is
+	// no read of p itself, only writes to its x and y fields.
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ p.y = 1
+ <-ch
+ _ = p
+}
+
+func TestNoRaceStructFieldRW2(t *testing.T) {
+ // Same as NoRaceStructFieldRW1
+ // but p is a pointer, so there is a read on p
+ p := Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ p.y = 1
+ <-ch
+ _ = p
+}
+
+func TestRaceStructFieldRW2(t *testing.T) {
+ p := &Point{0, 0}
+ ch := make(chan bool, 1)
+ go func() {
+ p.x = 1
+ ch <- true
+ }()
+ _ = p.x
+ <-ch
+}
+
+func TestRaceStructFieldRW3(t *testing.T) {
+ p := NamedPoint{name: "a", p: Point{0, 0}}
+ ch := make(chan bool, 1)
+ go func() {
+ p.p.x = 1
+ ch <- true
+ }()
+ _ = p.p.x
+ <-ch
+}
+
+func TestRaceEfaceWW(t *testing.T) {
+ var a, b any
+ ch := make(chan bool, 1)
+ go func() {
+ a = 1
+ ch <- true
+ }()
+ a = 2
+ <-ch
+ _, _ = a, b
+}
+
+func TestRaceIfaceWW(t *testing.T) {
+ var a, b Writer
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ a = DummyWriter{2}
+ <-ch
+ b = a
+ a = b
+}
+
+func TestRaceIfaceCmp(t *testing.T) {
+ var a, b Writer
+ a = DummyWriter{1}
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ _ = a == b
+ <-ch
+}
+
+func TestRaceIfaceCmpNil(t *testing.T) {
+ var a Writer
+ a = DummyWriter{1}
+ ch := make(chan bool, 1)
+ go func() {
+ a = DummyWriter{1}
+ ch <- true
+ }()
+ _ = a == nil
+ <-ch
+}
+
+func TestRaceEfaceConv(t *testing.T) {
+ c := make(chan bool)
+ v := 0
+ go func() {
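+		// Converting v to any for the inner goroutine's argument reads v,
+		// racing with the write to v in the test body below.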
+ go func(x any) {
+ }(v)
+ c <- true
+ }()
+ v = 42
+ <-c
+}
+
+type OsFile struct{}
+
+func (*OsFile) Read() {
+}
+
+type IoReader interface {
+ Read()
+}
+
+func TestRaceIfaceConv(t *testing.T) {
+ c := make(chan bool)
+ f := &OsFile{}
+ go func() {
+ go func(x IoReader) {
+ }(f)
+ c <- true
+ }()
+ f = &OsFile{}
+ <-c
+}
+
+func TestRaceError(t *testing.T) {
+ ch := make(chan bool, 1)
+ var err error
+ go func() {
+ err = nil
+ ch <- true
+ }()
+ _ = err
+ <-ch
+}
+
+func TestRaceIntptrRW(t *testing.T) {
+ var x, y int
+ var p *int = &x
+ ch := make(chan bool, 1)
+ go func() {
+ *p = 5
+ ch <- true
+ }()
+ y = *p
+ x = y
+ <-ch
+}
+
+func TestRaceStringRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ s := ""
+ go func() {
+ s = "abacaba"
+ ch <- true
+ }()
+ _ = s
+ <-ch
+}
+
+func TestRaceStringPtrRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ var x string
+ p := &x
+ go func() {
+ *p = "a"
+ ch <- true
+ }()
+ _ = *p
+ <-ch
+}
+
+func TestRaceFloat64WW(t *testing.T) {
+ var x, y float64
+ ch := make(chan bool, 1)
+ go func() {
+ x = 1.0
+ ch <- true
+ }()
+ x = 2.0
+ <-ch
+
+ y = x
+ x = y
+}
+
+func TestRaceComplex128WW(t *testing.T) {
+ var x, y complex128
+ ch := make(chan bool, 1)
+ go func() {
+ x = 2 + 2i
+ ch <- true
+ }()
+ x = 4 + 4i
+ <-ch
+
+ y = x
+ x = y
+}
+
+func TestRaceUnsafePtrRW(t *testing.T) {
+ var x, y, z int
+ x, y, z = 1, 2, 3
+ var p unsafe.Pointer = unsafe.Pointer(&x)
+ ch := make(chan bool, 1)
+ go func() {
+ p = (unsafe.Pointer)(&z)
+ ch <- true
+ }()
+ y = *(*int)(p)
+ x = y
+ <-ch
+}
+
+func TestRaceFuncVariableRW(t *testing.T) {
+ var f func(x int) int
+ f = func(x int) int {
+ return x * x
+ }
+ ch := make(chan bool, 1)
+ go func() {
+ f = func(x int) int {
+ return x
+ }
+ ch <- true
+ }()
+ y := f(1)
+ <-ch
+ x := y
+ y = x
+}
+
+func TestRaceFuncVariableWW(t *testing.T) {
+ var f func(x int) int
+ _ = f
+ ch := make(chan bool, 1)
+ go func() {
+ f = func(x int) int {
+ return x
+ }
+ ch <- true
+ }()
+ f = func(x int) int {
+ return x * x
+ }
+ <-ch
+}
+
+// This one does not really belong in mop_test.
+func TestRacePanic(t *testing.T) {
+ var x int
+ _ = x
+ var zero int = 0
+ ch := make(chan bool, 2)
+ go func() {
+ defer func() {
+ err := recover()
+ if err == nil {
+ panic("should be panicking")
+ }
+ x = 1
+ ch <- true
+ }()
+ var y int = 1 / zero
+ zero = y
+ }()
+ go func() {
+ defer func() {
+ err := recover()
+ if err == nil {
+ panic("should be panicking")
+ }
+ x = 2
+ ch <- true
+ }()
+ var y int = 1 / zero
+ zero = y
+ }()
+
+ <-ch
+ <-ch
+ if zero != 0 {
+ panic("zero has changed")
+ }
+}
+
+func TestNoRaceBlank(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ _, _ = a[0], a[1]
+ ch <- true
+ }()
+ _, _ = a[2], a[3]
+ <-ch
+ a[1] = a[0]
+}
+
+func TestRaceAppendRW(t *testing.T) {
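+	// append copies a's elements into a new backing array, so its read
+	// of a[0] races with the write below.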
+ a := make([]int, 10)
+ ch := make(chan bool)
+ go func() {
+ _ = append(a, 1)
+ ch <- true
+ }()
+ a[0] = 1
+ <-ch
+}
+
+func TestRaceAppendLenRW(t *testing.T) {
+ a := make([]int, 0)
+ ch := make(chan bool)
+ go func() {
+ a = append(a, 1)
+ ch <- true
+ }()
+ _ = len(a)
+ <-ch
+}
+
+func TestRaceAppendCapRW(t *testing.T) {
+ a := make([]int, 0)
+ ch := make(chan string)
+ go func() {
+ a = append(a, 1)
+ ch <- ""
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestNoRaceFuncArgsRW(t *testing.T) {
+ ch := make(chan byte, 1)
+ var x byte
+ go func(y byte) {
+ _ = y
+ ch <- 0
+ }(x)
+ x = 1
+ <-ch
+}
+
+func TestRaceFuncArgsRW(t *testing.T) {
+ ch := make(chan byte, 1)
+ var x byte
+ go func(y *byte) {
+ _ = *y
+ ch <- 0
+ }(&x)
+ x = 1
+ <-ch
+}
+
+// From the mailing list, slightly modified:
+// unprotected concurrent access to the seen map.
+func TestRaceCrawl(t *testing.T) {
+ url := "dummyurl"
+ depth := 3
+ seen := make(map[string]bool)
+ ch := make(chan int, 100)
+ var wg sync.WaitGroup
+ var crawl func(string, int)
+ crawl = func(u string, d int) {
+ nurl := 0
+ defer func() {
+ ch <- nurl
+ }()
+ seen[u] = true
+ if d <= 0 {
+ wg.Done()
+ return
+ }
+ urls := [...]string{"a", "b", "c"}
+ for _, uu := range urls {
+ if _, ok := seen[uu]; !ok {
+ wg.Add(1)
+ go crawl(uu, d-1)
+ nurl++
+ }
+ }
+ wg.Done()
+ }
+ wg.Add(1)
+ go crawl(url, depth)
+ wg.Wait()
+}
+
+func TestRaceIndirection(t *testing.T) {
+ ch := make(chan struct{}, 1)
+ var y int
+ var x *int = &y
+ go func() {
+ *x = 1
+ ch <- struct{}{}
+ }()
+ *x = 2
+ <-ch
+ _ = *x
+}
+
+func TestRaceRune(t *testing.T) {
+ c := make(chan bool)
+ var x rune
+ go func() {
+ x = 1
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceEmptyInterface1(t *testing.T) {
+ c := make(chan bool)
+ var x any
+ go func() {
+ x = nil
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceEmptyInterface2(t *testing.T) {
+ c := make(chan bool)
+ var x any
+ go func() {
+ x = &Point{}
+ c <- true
+ }()
+ _ = x
+ <-c
+}
+
+func TestRaceTLS(t *testing.T) {
+ comm := make(chan *int)
+ done := make(chan bool, 2)
+ go func() {
+ var x int
+ comm <- &x
+ x = 1
+ x = *(<-comm)
+ done <- true
+ }()
+ go func() {
+ p := <-comm
+ *p = 2
+ comm <- p
+ done <- true
+ }()
+ <-done
+ <-done
+}
+
+func TestNoRaceHeapReallocation(t *testing.T) {
+ // It is possible that a future implementation
+ // of memory allocation will ruin this test.
+ // Increasing n might help in this case, so
+ // this test is a bit more generic than most of the
+ // others.
+ const n = 2
+ done := make(chan bool, n)
+ empty := func(p *int) {}
+ for i := 0; i < n; i++ {
+ ms := i
+ go func() {
+ <-time.After(time.Duration(ms) * time.Millisecond)
+ runtime.GC()
+ var x int
+ empty(&x) // x goes to the heap
+ done <- true
+ }()
+ }
+ for i := 0; i < n; i++ {
+ <-done
+ }
+}
+
+func TestRaceAnd(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if x == 1 && y == 1 {
+ }
+ <-c
+}
+
+func TestRaceAnd2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 0 && x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceAnd(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 1 && x == 1 {
+ }
+ <-c
+}
+
+func TestRaceOr(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if x == 1 || y == 1 {
+ }
+ <-c
+}
+
+func TestRaceOr2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 1 || x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceOr(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ x = 1
+ c <- true
+ }()
+ if y == 0 || x == 1 {
+ }
+ <-c
+}
+
+func TestNoRaceShortCalc(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ y = 1
+ c <- true
+ }()
+ if x == 0 || y == 0 {
+ }
+ <-c
+}
+
+func TestNoRaceShortCalc2(t *testing.T) {
+ c := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ y = 1
+ c <- true
+ }()
+ if x == 1 && y == 0 {
+ }
+ <-c
+}
+
+func TestRaceFuncItself(t *testing.T) {
+ c := make(chan bool)
+ f := func() {}
+ go func() {
+ f()
+ c <- true
+ }()
+ f = func() {}
+ <-c
+}
+
+func TestNoRaceFuncUnlock(t *testing.T) {
+ ch := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ _ = x
+ go func() {
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ ch <- true
+ }()
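+	// The assignment to x happens after the call returns with mu still held,
+	// so both writes to x are protected by mu.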
+ x = func(mu *sync.Mutex) int {
+ mu.Lock()
+ return 43
+ }(&mu)
+ mu.Unlock()
+ <-ch
+}
+
+func TestRaceStructInit(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := X{x: y}
+ _ = x
+ <-c
+}
+
+func TestRaceArrayInit(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := []int{0, y, 42}
+ _ = x
+ <-c
+}
+
+func TestRaceMapInit(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := map[int]int{0: 42, y: 42}
+ _ = x
+ <-c
+}
+
+func TestRaceMapInit2(t *testing.T) {
+ c := make(chan bool, 1)
+ y := 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ x := map[int]int{0: 42, 42: y}
+ _ = x
+ <-c
+}
+
+type Inter interface {
+ Foo(x int)
+}
+type InterImpl struct {
+ x, y int
+}
+
+//go:noinline
+func (p InterImpl) Foo(x int) {
+}
+
+type InterImpl2 InterImpl
+
+func (p *InterImpl2) Foo(x int) {
+ if p == nil {
+ InterImpl{}.Foo(x)
+ }
+ InterImpl(*p).Foo(x)
+}
+
+func TestRaceInterCall(t *testing.T) {
+ c := make(chan bool, 1)
+ p := InterImpl{}
+ var x Inter = p
+ go func() {
+ p2 := InterImpl{}
+ x = p2
+ c <- true
+ }()
+ x.Foo(0)
+ <-c
+}
+
+func TestRaceInterCall2(t *testing.T) {
+ c := make(chan bool, 1)
+ p := InterImpl{}
+ var x Inter = p
+ z := 0
+ go func() {
+ z = 42
+ c <- true
+ }()
+ x.Foo(z)
+ <-c
+}
+
+func TestRaceFuncCall(t *testing.T) {
+ c := make(chan bool, 1)
+ f := func(x, y int) {}
+ x, y := 0, 0
+ go func() {
+ y = 42
+ c <- true
+ }()
+ f(x, y)
+ <-c
+}
+
+func TestRaceMethodCall(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl{}
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ i.Foo(x)
+ <-c
+}
+
+func TestRaceMethodCall2(t *testing.T) {
+ c := make(chan bool, 1)
+ i := &InterImpl{}
+ go func() {
+ i = &InterImpl{}
+ c <- true
+ }()
+ i.Foo(0)
+ <-c
+}
+
+// Method value with concrete value receiver.
+func TestRaceMethodValue(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl{}
+ go func() {
+ i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo
+ <-c
+}
+
+// Method value with interface receiver.
+func TestRaceMethodValue2(t *testing.T) {
+ c := make(chan bool, 1)
+ var i Inter = InterImpl{}
+ go func() {
+ i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo
+ <-c
+}
+
+// Method value with implicit dereference.
+func TestRaceMethodValue3(t *testing.T) {
+ c := make(chan bool, 1)
+ i := &InterImpl{}
+ go func() {
+ *i = InterImpl{}
+ c <- true
+ }()
+ _ = i.Foo // dereferences i.
+ <-c
+}
+
+// Method value implicitly taking receiver address.
+func TestNoRaceMethodValue(t *testing.T) {
+ c := make(chan bool, 1)
+ i := InterImpl2{}
+ go func() {
+ i = InterImpl2{}
+ c <- true
+ }()
+ _ = i.Foo // takes the address of i only.
+ <-c
+}
+
+func TestRacePanicArg(t *testing.T) {
+ c := make(chan bool, 1)
+ err := errors.New("err")
+ go func() {
+ err = errors.New("err2")
+ c <- true
+ }()
+ defer func() {
+ recover()
+ <-c
+ }()
+ panic(err)
+}
+
+func TestRaceDeferArg(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ func() {
+ defer func(x int) {
+ }(x)
+ }()
+ <-c
+}
+
+type DeferT int
+
+func (d DeferT) Foo() {
+}
+
+func TestRaceDeferArg2(t *testing.T) {
+ c := make(chan bool, 1)
+ var x DeferT
+ go func() {
+ var y DeferT
+ x = y
+ c <- true
+ }()
+ func() {
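+		// The defer statement evaluates the receiver x immediately;
+		// that read races with the goroutine's write.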
+ defer x.Foo()
+ }()
+ <-c
+}
+
+func TestNoRaceAddrExpr(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ x = 42
+ c <- true
+ }()
+ _ = &x
+ <-c
+}
+
+type AddrT struct {
+ _ [256]byte
+ x int
+}
+
+type AddrT2 struct {
+ _ [512]byte
+ p *AddrT
+}
+
+func TestRaceAddrExpr(t *testing.T) {
+ c := make(chan bool, 1)
+ a := AddrT2{p: &AddrT{x: 42}}
+ go func() {
+ a.p = &AddrT{x: 43}
+ c <- true
+ }()
+ _ = &a.p.x
+ <-c
+}
+
+func TestRaceTypeAssert(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ var i any = x
+ go func() {
+ y := 0
+ i = y
+ c <- true
+ }()
+ _ = i.(int)
+ <-c
+}
+
+func TestRaceBlockAs(t *testing.T) {
+ c := make(chan bool, 1)
+ var x, y int
+ go func() {
+ x = 42
+ c <- true
+ }()
+ x, y = y, x
+ <-c
+}
+
+func TestRaceBlockCall1(t *testing.T) {
+ done := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ f := func() (int, int) {
+ return 42, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall2(t *testing.T) {
+ done := make(chan bool)
+ x, y := 0, 0
+ go func() {
+ f := func() (int, int) {
+ return 42, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall3(t *testing.T) {
+ done := make(chan bool)
+ var x *int
+ y := 0
+ go func() {
+ f := func() (*int, int) {
+ i := 42
+ return &i, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if *x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall4(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ var y *int
+ go func() {
+ f := func() (int, *int) {
+ i := 43
+ return 42, &i
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if x != 42 || *y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall5(t *testing.T) {
+ done := make(chan bool)
+ var x *int
+ y := 0
+ go func() {
+ f := func() (*int, int) {
+ i := 42
+ return &i, 43
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = y
+ <-done
+ if *x != 42 || y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceBlockCall6(t *testing.T) {
+ done := make(chan bool)
+ x := 0
+ var y *int
+ go func() {
+ f := func() (int, *int) {
+ i := 43
+ return 42, &i
+ }
+ x, y = f()
+ done <- true
+ }()
+ _ = x
+ <-done
+ if x != 42 || *y != 43 {
+ panic("corrupted data")
+ }
+}
+func TestRaceSliceSlice(t *testing.T) {
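+	// The goroutine's reassignment writes x's slice header, which the
+	// slicing expression below reads without synchronization.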
+ c := make(chan bool, 1)
+ x := make([]int, 10)
+ go func() {
+ x = make([]int, 20)
+ c <- true
+ }()
+ _ = x[2:3]
+ <-c
+}
+
+func TestRaceSliceSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ x := make([]int, 10)
+ i := 2
+ go func() {
+ i = 3
+ c <- true
+ }()
+ _ = x[i:4]
+ <-c
+}
+
+func TestRaceSliceString(t *testing.T) {
+ c := make(chan bool, 1)
+ x := "hello"
+ go func() {
+ x = "world"
+ c <- true
+ }()
+ _ = x[2:3]
+ <-c
+}
+
+func TestRaceSliceStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ x := make([]X, 10)
+ go func() {
+ y := make([]X, 10)
+ copy(y, x)
+ c <- true
+ }()
+ x[1].y = 42
+ <-c
+}
+
+func TestRaceAppendSliceStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ c := make(chan bool, 1)
+ x := make([]X, 10)
+ go func() {
+ y := make([]X, 0, 10)
+ y = append(y, x...)
+ c <- true
+ }()
+ x[1].y = 42
+ <-c
+}
+
+func TestRaceStructInd(t *testing.T) {
+ c := make(chan bool, 1)
+ type Item struct {
+ x, y int
+ }
+ i := Item{}
+ go func(p *Item) {
+ *p = Item{}
+ c <- true
+ }(&i)
+ i.y = 42
+ <-c
+}
+
+func TestRaceAsFunc1(t *testing.T) {
+ var s []byte
+ c := make(chan bool, 1)
+ go func() {
+ var err error
+ s, err = func() ([]byte, error) {
+ t := []byte("hello world")
+ return t, nil
+ }()
+ c <- true
+ _ = err
+ }()
+ _ = string(s)
+ <-c
+}
+
+func TestRaceAsFunc2(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ go func() {
+ func(x int) {
+ }(x)
+ c <- true
+ }()
+ x = 42
+ <-c
+}
+
+func TestRaceAsFunc3(t *testing.T) {
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ go func() {
+ func(x int) {
+ mu.Lock()
+ }(x) // Read of x must be outside of the mutex.
+ mu.Unlock()
+ c <- true
+ }()
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-c
+}
+
+func TestNoRaceAsFunc4(t *testing.T) {
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ x := 0
+ _ = x
+ go func() {
+ x = func() int { // Write of x must be under the mutex.
+ mu.Lock()
+ return 42
+ }()
+ mu.Unlock()
+ c <- true
+ }()
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-c
+}
+
+func TestRaceHeapParam(t *testing.T) {
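+	// The named result x is heap-allocated; the goroutine's write races
+	// with the implicit read of x at return.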
+ done := make(chan bool)
+ x := func() (x int) {
+ go func() {
+ x = 42
+ done <- true
+ }()
+ return
+ }()
+ _ = x
+ <-done
+}
+
+func TestNoRaceEmptyStruct(t *testing.T) {
+ type Empty struct{}
+ type X struct {
+ y int64
+ Empty
+ }
+ type Y struct {
+ x X
+ y int64
+ }
+ c := make(chan X)
+ var y Y
+ go func() {
+ x := y.x
+ c <- x
+ }()
+ y.y = 42
+ <-c
+}
+
+func TestRaceNestedStruct(t *testing.T) {
+ type X struct {
+ x, y int
+ }
+ type Y struct {
+ x X
+ }
+ c := make(chan Y)
+ var y Y
+ go func() {
+ c <- y
+ }()
+ y.x.y = 42
+ <-c
+}
+
+func TestRaceIssue5567(t *testing.T) {
+ testRaceRead(t, false)
+}
+
+func TestRaceIssue51618(t *testing.T) {
+ testRaceRead(t, true)
+}
+
+func testRaceRead(t *testing.T, pread bool) {
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(4))
+ in := make(chan []byte)
+ res := make(chan error)
+ go func() {
+ var err error
+ defer func() {
+ close(in)
+ res <- err
+ }()
+ path := "mop_test.go"
+ f, err := os.Open(path)
+ if err != nil {
+ return
+ }
+ defer f.Close()
+ var n, total int
+ b := make([]byte, 17) // the race is on b buffer
+ for err == nil {
+ if pread {
+ n, err = f.ReadAt(b, int64(total))
+ } else {
+ n, err = f.Read(b)
+ }
+ total += n
+ if n > 0 {
+ in <- b[:n]
+ }
+ }
+ if err == io.EOF {
+ err = nil
+ }
+ }()
+ h := crc32.New(crc32.MakeTable(0x12345678))
+ for b := range in {
+ h.Write(b)
+ }
+ _ = h.Sum(nil)
+ err := <-res
+ if err != nil {
+ t.Fatal(err)
+ }
+}
+
+func TestRaceIssue5654(t *testing.T) {
+ text := `Friends, Romans, countrymen, lend me your ears;
+I come to bury Caesar, not to praise him.
+The evil that men do lives after them;
+The good is oft interred with their bones;
+So let it be with Caesar. The noble Brutus
+Hath told you Caesar was ambitious:
+If it were so, it was a grievous fault,
+And grievously hath Caesar answer'd it.
+Here, under leave of Brutus and the rest -
+For Brutus is an honourable man;
+So are they all, all honourable men -
+Come I to speak in Caesar's funeral.
+He was my friend, faithful and just to me:
+But Brutus says he was ambitious;
+And Brutus is an honourable man.`
+
+ data := bytes.NewBufferString(text)
+ in := make(chan []byte)
+
+ go func() {
+ buf := make([]byte, 16)
+ var n int
+ var err error
+ for ; err == nil; n, err = data.Read(buf) {
+ in <- buf[:n]
+ }
+ close(in)
+ }()
+ res := ""
+ for s := range in {
+ res += string(s)
+ }
+ _ = res
+}
+
+type Base int
+
+func (b *Base) Foo() int {
+ return 42
+}
+
+func (b Base) Bar() int {
+ return int(b)
+}
+
+func TestNoRaceMethodThunk(t *testing.T) {
+ type Derived struct {
+ pad int
+ Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Foo()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Foo()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk2(t *testing.T) {
+ type Derived struct {
+ pad int
+ Base
+ }
+ var d Derived
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ d = Derived{}
+ <-done
+}
+
+func TestRaceMethodThunk3(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ d.Base = new(Base)
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ d.Base = new(Base)
+ <-done
+}
+
+func TestRaceMethodThunk4(t *testing.T) {
+ type Derived struct {
+ pad int
+ *Base
+ }
+ var d Derived
+ d.Base = new(Base)
+ done := make(chan bool)
+ go func() {
+ _ = d.Bar()
+ done <- true
+ }()
+ *(*int)(d.Base) = 42
+ <-done
+}
+
+func TestNoRaceTinyAlloc(t *testing.T) {
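+	// Tiny allocations made by different goroutines may share a heap block;
+	// the race detector must not report their writes as conflicting.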
+ const P = 4
+ const N = 1e6
+ var tinySink *byte
+ _ = tinySink
+ done := make(chan bool)
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < N; i++ {
+ var b byte
+ if b != 0 {
+ tinySink = &b // make it heap allocated
+ }
+ b = 42
+ }
+ done <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+}
+
+func TestNoRaceIssue60934(t *testing.T) {
+ // Test that runtime.RaceDisable state doesn't accidentally get applied to
+ // new goroutines.
+
+ // Create several goroutines that end after calling runtime.RaceDisable.
+ var wg sync.WaitGroup
+ ready := make(chan struct{})
+ wg.Add(32)
+ for i := 0; i < 32; i++ {
+ go func() {
+ <-ready // ensure we have multiple goroutines running at the same time
+ runtime.RaceDisable()
+ wg.Done()
+ }()
+ }
+ close(ready)
+ wg.Wait()
+
+ // Make sure race detector still works. If the runtime.RaceDisable state
+ // leaks, the happens-before edges here will be ignored and a race on x will
+ // be reported.
+ var x int
+ ch := make(chan struct{}, 0)
+ wg.Add(2)
+ go func() {
+ x = 1
+ ch <- struct{}{}
+ wg.Done()
+ }()
+ go func() {
+ <-ch
+ _ = x
+ wg.Done()
+ }()
+ wg.Wait()
+}
diff --git a/src/runtime/race/testdata/mutex_test.go b/src/runtime/race/testdata/mutex_test.go
new file mode 100644
index 0000000..9dbed9a
--- /dev/null
+++ b/src/runtime/race/testdata/mutex_test.go
@@ -0,0 +1,150 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceMutex(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu.Lock()
+ x = 2
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMutex(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ x = 1
+ mu.Lock()
+ defer mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ x = 2
+ mu.Lock()
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceMutex2(t *testing.T) {
+ var mu1 sync.Mutex
+ var mu2 sync.Mutex
+ var x int8 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu1.Lock()
+ defer mu1.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu2.Lock()
+ x = 2
+ mu2.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceMutexPureHappensBefore(t *testing.T) {
+ var mu sync.Mutex
+ var x int16 = 0
+ _ = x
+ written := false
+ ch := make(chan bool, 2)
+ go func() {
+ x = 1
+ mu.Lock()
+ written = true
+ mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ time.Sleep(100 * time.Microsecond)
+ mu.Lock()
+ for !written {
+ mu.Unlock()
+ time.Sleep(100 * time.Microsecond)
+ mu.Lock()
+ }
+ mu.Unlock()
+ x = 1
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceMutexSemaphore(t *testing.T) {
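+	// mu is locked up front and used as a semaphore: the first goroutine's
+	// Unlock happens before the second goroutine's Lock, ordering the writes to x.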
+ var mu sync.Mutex
+ ch := make(chan bool, 2)
+ x := 0
+ _ = x
+ mu.Lock()
+ go func() {
+ x = 1
+ mu.Unlock()
+ ch <- true
+ }()
+ go func() {
+ mu.Lock()
+ x = 2
+ mu.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+// from doc/go_mem.html
+func TestNoRaceMutexExampleFromHtml(t *testing.T) {
+ var l sync.Mutex
+ a := ""
+
+ l.Lock()
+ go func() {
+ a = "hello, world"
+ l.Unlock()
+ }()
+ l.Lock()
+ _ = a
+}
+
+func TestRaceMutexOverwrite(t *testing.T) {
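+	// Overwriting the sync.Mutex value itself is an ordinary write that
+	// races with the Lock call below.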
+ c := make(chan bool, 1)
+ var mu sync.Mutex
+ go func() {
+ mu = sync.Mutex{}
+ c <- true
+ }()
+ mu.Lock()
+ <-c
+}
diff --git a/src/runtime/race/testdata/pool_test.go b/src/runtime/race/testdata/pool_test.go
new file mode 100644
index 0000000..a96913e
--- /dev/null
+++ b/src/runtime/race/testdata/pool_test.go
@@ -0,0 +1,47 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestRacePool(t *testing.T) {
+ // Pool randomly drops the argument on the floor during Put.
+ // Repeat so that at least one iteration gets reuse.
+ for i := 0; i < 10; i++ {
+ c := make(chan int)
+ p := &sync.Pool{New: func() any { return make([]byte, 10) }}
+ x := p.Get().([]byte)
+ x[0] = 1
+ p.Put(x)
+ go func() {
+ y := p.Get().([]byte)
+ y[0] = 2
+ c <- 1
+ }()
+ x[0] = 3
+ <-c
+ }
+}
+
+func TestNoRacePool(t *testing.T) {
+ for i := 0; i < 10; i++ {
+ p := &sync.Pool{New: func() any { return make([]byte, 10) }}
+ x := p.Get().([]byte)
+ x[0] = 1
+ p.Put(x)
+ go func() {
+ y := p.Get().([]byte)
+ y[0] = 2
+ p.Put(y)
+ }()
+ time.Sleep(100 * time.Millisecond)
+ x = p.Get().([]byte)
+ x[0] = 3
+ }
+}
diff --git a/src/runtime/race/testdata/reflect_test.go b/src/runtime/race/testdata/reflect_test.go
new file mode 100644
index 0000000..b567400
--- /dev/null
+++ b/src/runtime/race/testdata/reflect_test.go
@@ -0,0 +1,46 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "reflect"
+ "testing"
+)
+
+func TestRaceReflectRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ i := 0
+ v := reflect.ValueOf(&i)
+ go func() {
+ v.Elem().Set(reflect.ValueOf(1))
+ ch <- true
+ }()
+ _ = v.Elem().Int()
+ <-ch
+}
+
+func TestRaceReflectWW(t *testing.T) {
+ ch := make(chan bool, 1)
+ i := 0
+ v := reflect.ValueOf(&i)
+ go func() {
+ v.Elem().Set(reflect.ValueOf(1))
+ ch <- true
+ }()
+ v.Elem().Set(reflect.ValueOf(2))
+ <-ch
+}
+
+func TestRaceReflectCopyWW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]byte, 2)
+ v := reflect.ValueOf(a)
+ go func() {
+ reflect.Copy(v, v)
+ ch <- true
+ }()
+ reflect.Copy(v, v)
+ <-ch
+}
diff --git a/src/runtime/race/testdata/regression_test.go b/src/runtime/race/testdata/regression_test.go
new file mode 100644
index 0000000..6a7802f
--- /dev/null
+++ b/src/runtime/race/testdata/regression_test.go
@@ -0,0 +1,189 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Code patterns that caused problems in the past.
+
+package race_test
+
+import (
+ "testing"
+)
+
+type LogImpl struct {
+ x int
+}
+
+func NewLog() (l LogImpl) {
+ c := make(chan bool)
+ go func() {
+ _ = l
+ c <- true
+ }()
+ l = LogImpl{}
+ <-c
+ return
+}
+
+var _ LogImpl = NewLog()
+
+func MakeMap() map[int]int {
+ return make(map[int]int)
+}
+
+func InstrumentMapLen() {
+ _ = len(MakeMap())
+}
+
+func InstrumentMapLen2() {
+ m := make(map[int]map[int]int)
+ _ = len(m[0])
+}
+
+func InstrumentMapLen3() {
+ m := make(map[int]*map[int]int)
+ _ = len(*m[0])
+}
+
+func TestRaceUnaddressableMapLen(t *testing.T) {
+ m := make(map[int]map[int]int)
+ ch := make(chan int, 1)
+ m[0] = make(map[int]int)
+ go func() {
+ _ = len(m[0])
+ ch <- 0
+ }()
+ m[0][0] = 1
+ <-ch
+}
+
+type Rect struct {
+ x, y int
+}
+
+type Image struct {
+ min, max Rect
+}
+
+//go:noinline
+func NewImage() Image {
+ return Image{}
+}
+
+func AddrOfTemp() {
+ _ = NewImage().min
+}
+
+type TypeID int
+
+func (t *TypeID) encodeType(x int) (tt TypeID, err error) {
+ switch x {
+ case 0:
+ return t.encodeType(x * x)
+ }
+ return 0, nil
+}
+
+type stack []int
+
+func (s *stack) push(x int) {
+ *s = append(*s, x)
+}
+
+func (s *stack) pop() int {
+ i := len(*s)
+ n := (*s)[i-1]
+ *s = (*s)[:i-1]
+ return n
+}
+
+func TestNoRaceStackPushPop(t *testing.T) {
+ var s stack
+ go func(s *stack) {}(&s)
+ s.push(1)
+ x := s.pop()
+ _ = x
+}
+
+type RpcChan struct {
+ c chan bool
+}
+
+var makeChanCalls int
+
+//go:noinline
+func makeChan() *RpcChan {
+ makeChanCalls++
+ c := &RpcChan{make(chan bool, 1)}
+ c.c <- true
+ return c
+}
+
+func call() bool {
+ x := <-makeChan().c
+ return x
+}
+
+func TestNoRaceRpcChan(t *testing.T) {
+ makeChanCalls = 0
+ _ = call()
+ if makeChanCalls != 1 {
+ t.Fatalf("makeChanCalls %d, expected 1\n", makeChanCalls)
+ }
+}
+
+func divInSlice() {
+ v := make([]int64, 10)
+ i := 1
+ _ = v[(i*4)/3]
+}
+
+func TestNoRaceReturn(t *testing.T) {
+ c := make(chan int)
+ noRaceReturn(c)
+ <-c
+}
+
+// Return used to do an implicit a = a, causing a read/write race
+// with the goroutine. The compiler now has an optimization that avoids it.
+// See issue 4014.
+func noRaceReturn(c chan int) (a, b int) {
+ a = 42
+ go func() {
+ _ = a
+ c <- 1
+ }()
+ return a, 10
+}
+
+func issue5431() {
+ var p **inltype
+ if inlinetest(p).x && inlinetest(p).y {
+ } else if inlinetest(p).x || inlinetest(p).y {
+ }
+}
+
+type inltype struct {
+ x, y bool
+}
+
+func inlinetest(p **inltype) *inltype {
+ return *p
+}
+
+type iface interface {
+ Foo() *struct{ b bool }
+}
+
+type Int int
+
+func (i Int) Foo() *struct{ b bool } {
+ return &struct{ b bool }{false}
+}
+
+func TestNoRaceForInfiniteLoop(t *testing.T) {
+ var x Int
+ // interface conversion causes nodes to be put on init list
+ for iface(x).Foo().b {
+ }
+}
diff --git a/src/runtime/race/testdata/rwmutex_test.go b/src/runtime/race/testdata/rwmutex_test.go
new file mode 100644
index 0000000..39219e5
--- /dev/null
+++ b/src/runtime/race/testdata/rwmutex_test.go
@@ -0,0 +1,154 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestRaceMutexRWMutex(t *testing.T) {
+ var mu1 sync.Mutex
+ var mu2 sync.RWMutex
+ var x int16 = 0
+ _ = x
+ ch := make(chan bool, 2)
+ go func() {
+ mu1.Lock()
+ defer mu1.Unlock()
+ x = 1
+ ch <- true
+ }()
+ go func() {
+ mu2.Lock()
+ x = 2
+ mu2.Unlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestNoRaceRWMutex(t *testing.T) {
+ var mu sync.RWMutex
+ var x, y int64 = 0, 1
+ _ = y
+ ch := make(chan bool, 2)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+}
+
+func TestRaceRWMutexMultipleReaders(t *testing.T) {
+ var mu sync.RWMutex
+ var x, y int64 = 0, 1
+ ch := make(chan bool, 4)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ // Use three readers so that no matter what order they're
+ // scheduled in, two will be on the same side of the write
+ // lock above.
+ go func() {
+ mu.RLock()
+ y = x + 1
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x + 2
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y = x + 3
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+ <-ch
+ <-ch
+ _ = y
+}
+
+func TestNoRaceRWMutexMultipleReaders(t *testing.T) {
+ var mu sync.RWMutex
+ x := int64(0)
+ ch := make(chan bool, 4)
+ go func() {
+ mu.Lock()
+ defer mu.Unlock()
+ x = 2
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 1
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 2
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ mu.RLock()
+ y := x + 3
+ _ = y
+ mu.RUnlock()
+ ch <- true
+ }()
+ <-ch
+ <-ch
+ <-ch
+ <-ch
+}
+
+func TestNoRaceRWMutexTransitive(t *testing.T) {
+ var mu sync.RWMutex
+ x := int64(0)
+ ch := make(chan bool, 2)
+ go func() {
+ mu.RLock()
+ _ = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ go func() {
+ time.Sleep(1e7)
+ mu.RLock()
+ _ = x
+ mu.RUnlock()
+ ch <- true
+ }()
+ time.Sleep(2e7)
+ mu.Lock()
+ x = 42
+ mu.Unlock()
+ <-ch
+ <-ch
+}
diff --git a/src/runtime/race/testdata/select_test.go b/src/runtime/race/testdata/select_test.go
new file mode 100644
index 0000000..9a43f9b
--- /dev/null
+++ b/src/runtime/race/testdata/select_test.go
@@ -0,0 +1,293 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+func TestNoRaceSelect1(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+
+ go func() {
+ x = 1
+ // At least two channels are needed because
+ // otherwise the compiler optimizes select out.
+ // See comment in runtime/select.go:^func selectgo.
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ compl <- true
+ }()
+ select {
+ case <-c:
+ case c1 <- true:
+ }
+ x = 2
+ <-compl
+}
+
+func TestNoRaceSelect2(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ select {
+ case <-c:
+ case <-c1:
+ }
+ x = 1
+ compl <- true
+ }()
+ x = 2
+ close(c)
+ runtime.Gosched()
+ <-compl
+}
+
+func TestNoRaceSelect3(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool, 10)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case c <- true:
+ case <-c1:
+ }
+ compl <- true
+ }()
+ <-c
+ x = 2
+ <-compl
+}
+
+func TestNoRaceSelect4(t *testing.T) {
+ type Task struct {
+ f func()
+ done chan bool
+ }
+
+ queue := make(chan Task)
+ dummy := make(chan bool)
+
+ go func() {
+ for {
+ select {
+ case t := <-queue:
+ t.f()
+ t.done <- true
+ }
+ }
+ }()
+
+ doit := func(f func()) {
+ done := make(chan bool, 1)
+ select {
+ case queue <- Task{f, done}:
+ case <-dummy:
+ }
+ select {
+ case <-done:
+ case <-dummy:
+ }
+ }
+
+ var x int
+ doit(func() {
+ x = 1
+ })
+ _ = x
+}
+
+func TestNoRaceSelect5(t *testing.T) {
+ test := func(sel, needSched bool) {
+ var x int
+ _ = x
+ ch := make(chan bool)
+ c1 := make(chan bool)
+
+ done := make(chan bool, 2)
+ go func() {
+ if needSched {
+ runtime.Gosched()
+ }
+ // println(1)
+ x = 1
+ if sel {
+ select {
+ case ch <- true:
+ case <-c1:
+ }
+ } else {
+ ch <- true
+ }
+ done <- true
+ }()
+
+ go func() {
+ // println(2)
+ if sel {
+ select {
+ case <-ch:
+ case <-c1:
+ }
+ } else {
+ <-ch
+ }
+ x = 1
+ done <- true
+ }()
+ <-done
+ <-done
+ }
+
+ test(true, true)
+ test(true, false)
+ test(false, true)
+ test(false, false)
+}
+
+func TestRaceSelect1(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool, 2)
+ c := make(chan bool)
+ c1 := make(chan bool)
+
+ go func() {
+ <-c
+ <-c
+ }()
+ f := func() {
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ x = 1
+ compl <- true
+ }
+ go f()
+ go f()
+ <-compl
+ <-compl
+}
+
+func TestRaceSelect2(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case <-c:
+ case <-c1:
+ }
+ compl <- true
+ }()
+ close(c)
+ x = 2
+ <-compl
+}
+
+func TestRaceSelect3(t *testing.T) {
+ var x int
+ _ = x
+ compl := make(chan bool)
+ c := make(chan bool)
+ c1 := make(chan bool)
+ go func() {
+ x = 1
+ select {
+ case c <- true:
+ case c1 <- true:
+ }
+ compl <- true
+ }()
+ x = 2
+ select {
+ case <-c:
+ }
+ <-compl
+}
+
+func TestRaceSelect4(t *testing.T) {
+ done := make(chan bool, 1)
+ var x int
+ go func() {
+ select {
+ default:
+ x = 2
+ }
+ done <- true
+ }()
+ _ = x
+ <-done
+}
+
+// The idea behind this test: there are two variables;
+// access to one of them is synchronized, access to the other is not.
+// Select must (unconditionally) choose the non-synchronized variable,
+// thus causing exactly one race.
+// Currently this test does not look like it accomplishes this goal.
+func TestRaceSelect5(t *testing.T) {
+ done := make(chan bool, 1)
+ c1 := make(chan bool, 1)
+ c2 := make(chan bool)
+ var x, y int
+ go func() {
+ select {
+ case c1 <- true:
+ x = 1
+ case c2 <- true:
+ y = 1
+ }
+ done <- true
+ }()
+ _ = x
+ _ = y
+ <-done
+}
+
+// Select statements may introduce flakiness: whether this test
+// contains a race depends on the scheduling (some may argue that
+// the code contains this race by definition).
+/*
+func TestFlakyDefault(t *testing.T) {
+ var x int
+ c := make(chan bool, 1)
+ done := make(chan bool, 1)
+ go func() {
+ select {
+ case <-c:
+ x = 2
+ default:
+ x = 3
+ }
+ done <- true
+ }()
+ x = 1
+ c <- true
+ _ = x
+ <-done
+}
+*/
diff --git a/src/runtime/race/testdata/slice_test.go b/src/runtime/race/testdata/slice_test.go
new file mode 100644
index 0000000..9009a9a
--- /dev/null
+++ b/src/runtime/race/testdata/slice_test.go
@@ -0,0 +1,608 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+)
+
+func TestRaceSliceRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 2)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ _ = a[1]
+ <-ch
+}
+
+func TestNoRaceSliceRW(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 2)
+ go func() {
+ a[0] = 1
+ ch <- true
+ }()
+ _ = a[1]
+ <-ch
+}
+
+func TestRaceSliceWW(t *testing.T) {
+ a := make([]int, 10)
+ ch := make(chan bool, 1)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestNoRaceArrayWW(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ a[0] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestRaceArrayWW(t *testing.T) {
+ var a [5]int
+ ch := make(chan bool, 1)
+ go func() {
+ a[1] = 1
+ ch <- true
+ }()
+ a[1] = 2
+ <-ch
+}
+
+func TestNoRaceSliceWriteLen(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]bool, 1)
+ go func() {
+ a[0] = true
+ ch <- true
+ }()
+ _ = len(a)
+ <-ch
+}
+
+func TestNoRaceSliceWriteCap(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]uint64, 100)
+ go func() {
+ a[50] = 123
+ ch <- true
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestRaceSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ _ = a[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRaceSliceWriteCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ a[5] = 1
+ ch <- true
+ }()
+ copy(a[:5], b[:5])
+ <-ch
+}
+
+func TestRaceSliceCopyWrite2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ b[5] = 1
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestRaceSliceCopyWrite3(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]byte, 10)
+ go func() {
+ a[7] = 1
+ ch <- true
+ }()
+ copy(a, "qwertyqwerty")
+ <-ch
+}
+
+func TestNoRaceSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]int, 10)
+ b := make([]int, 10)
+ go func() {
+ _ = b[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestRacePointerSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ _ = a[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRacePointerSliceWriteCopy(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ a[5] = new(int)
+ ch <- true
+ }()
+ copy(a[:5], b[:5])
+ <-ch
+}
+
+func TestRacePointerSliceCopyWrite2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ b[5] = new(int)
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRacePointerSliceCopyRead(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]*int, 10)
+ b := make([]*int, 10)
+ go func() {
+ _ = b[5]
+ ch <- true
+ }()
+ copy(a, b)
+ <-ch
+}
+
+func TestNoRaceSliceWriteSlice2(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ _ = a[0:5]
+ <-ch
+}
+
+func TestRaceSliceWriteSlice(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ a = a[5:10]
+ <-ch
+}
+
+func TestNoRaceSliceWriteSlice(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]float64, 10)
+ go func() {
+ a[2] = 1.0
+ ch <- true
+ }()
+ _ = a[5:10]
+ <-ch
+}
+
+func TestNoRaceSliceLenCap(t *testing.T) {
+ ch := make(chan bool, 1)
+ a := make([]struct{}, 10)
+ go func() {
+ _ = len(a)
+ ch <- true
+ }()
+ _ = cap(a)
+ <-ch
+}
+
+func TestNoRaceStructSlicesRangeWrite(t *testing.T) {
+ type Str struct {
+ a []int
+ b []int
+ }
+ ch := make(chan bool, 1)
+ var s Str
+ s.a = make([]int, 10)
+ s.b = make([]int, 10)
+ go func() {
+ for range s.a {
+ }
+ ch <- true
+ }()
+ s.b[5] = 5
+ <-ch
+}
+
+func TestRaceSliceDifferent(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ s2 := s
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+	// False negative: s2 is PAUTO without PHEAP,
+	// so we do not instrument it.
+ s2[3] = 3
+ <-c
+}
+
+func TestRaceSliceRangeWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ for _, v := range s {
+ _ = v
+ }
+ <-c
+}
+
+func TestNoRaceSliceRangeWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestRaceSliceRangeAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s = append(s, 3)
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestNoRaceSliceRangeAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 3)
+ c <- true
+ }()
+ for range s {
+ }
+ <-c
+}
+
+func TestRaceSliceVarWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s[3] = 3
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarRead(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = s[3]
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarRange(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ for range s {
+ }
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 10)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarCopy(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ copy(s, s2)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceVarCopy2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ copy(s2, s)
+ c <- true
+ }()
+ s = make([]int, 20)
+ <-c
+}
+
+func TestRaceSliceAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10, 20)
+ go func() {
+ _ = append(s, 1)
+ c <- true
+ }()
+ _ = append(s, 2)
+ <-c
+}
+
+func TestRaceSliceAppendWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ _ = append(s, 1)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceAppendSlice(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ go func() {
+ s2 := make([]int, 10)
+ _ = append(s, s2...)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceAppendSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ s2foobar := make([]int, 10)
+ go func() {
+ _ = append(s, s2foobar...)
+ c <- true
+ }()
+ s2foobar[5] = 42
+ <-c
+}
+
+func TestRaceSliceAppendString(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]byte, 10)
+ go func() {
+ _ = append(s, "qwerty"...)
+ c <- true
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRacePointerSliceAppend(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10, 20)
+ go func() {
+ _ = append(s, new(int))
+ c <- true
+ }()
+ _ = append(s, new(int))
+ <-c
+}
+
+func TestRacePointerSliceAppendWrite(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ go func() {
+ _ = append(s, new(int))
+ c <- true
+ }()
+ s[0] = new(int)
+ <-c
+}
+
+func TestRacePointerSliceAppendSlice(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ go func() {
+ s2 := make([]*int, 10)
+ _ = append(s, s2...)
+ c <- true
+ }()
+ s[0] = new(int)
+ <-c
+}
+
+func TestRacePointerSliceAppendSlice2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]*int, 10)
+ s2foobar := make([]*int, 10)
+ go func() {
+ _ = append(s, s2foobar...)
+ c <- true
+ }()
+ println("WRITE:", &s2foobar[5])
+ s2foobar[5] = nil
+ <-c
+}
+
+func TestNoRaceSliceIndexAccess(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ _ = v
+ c <- true
+ }()
+ s[v] = 1
+ <-c
+}
+
+func TestNoRaceSliceIndexAccess2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ _ = v
+ c <- true
+ }()
+ _ = s[v]
+ <-c
+}
+
+func TestRaceSliceIndexAccess(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ v = 1
+ c <- true
+ }()
+ s[v] = 1
+ <-c
+}
+
+func TestRaceSliceIndexAccess2(t *testing.T) {
+ c := make(chan bool, 1)
+ s := make([]int, 10)
+ v := 0
+ go func() {
+ v = 1
+ c <- true
+ }()
+ _ = s[v]
+ <-c
+}
+
+func TestRaceSliceByteToString(t *testing.T) {
+ c := make(chan string)
+ s := make([]byte, 10)
+ go func() {
+ c <- string(s)
+ }()
+ s[0] = 42
+ <-c
+}
+
+func TestRaceSliceRuneToString(t *testing.T) {
+ c := make(chan string)
+ s := make([]rune, 10)
+ go func() {
+ c <- string(s)
+ }()
+ s[9] = 42
+ <-c
+}
+
+func TestRaceConcatString(t *testing.T) {
+ s := "hello"
+ c := make(chan string, 1)
+ go func() {
+ c <- s + " world"
+ }()
+ s = "world"
+ <-c
+}
+
+func TestRaceCompareString(t *testing.T) {
+ s1 := "hello"
+ s2 := "world"
+ c := make(chan bool, 1)
+ go func() {
+ c <- s1 == s2
+ }()
+ s1 = s2
+ <-c
+}
+
+func TestRaceSlice3(t *testing.T) {
+ done := make(chan bool)
+ x := make([]int, 10)
+ i := 2
+ go func() {
+ i = 3
+ done <- true
+ }()
+ _ = x[:1:i]
+ <-done
+}
+
+var saved string
+
+func TestRaceSlice4(t *testing.T) {
+ // See issue 36794.
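+	// string(data) reads data's bytes while copy overwrites them.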
+ data := []byte("hello there")
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ _ = string(data)
+ wg.Done()
+ }()
+ copy(data, data[2:])
+ wg.Wait()
+}
diff --git a/src/runtime/race/testdata/sync_test.go b/src/runtime/race/testdata/sync_test.go
new file mode 100644
index 0000000..b5fcd6c
--- /dev/null
+++ b/src/runtime/race/testdata/sync_test.go
@@ -0,0 +1,202 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceCond(t *testing.T) {
+ x := 0
+ _ = x
+ condition := 0
+ var mu sync.Mutex
+ cond := sync.NewCond(&mu)
+ go func() {
+ x = 1
+ mu.Lock()
+ condition = 1
+ cond.Signal()
+ mu.Unlock()
+ }()
+ mu.Lock()
+ for condition != 1 {
+ cond.Wait()
+ }
+ mu.Unlock()
+ x = 2
+}
+
+func TestRaceCond(t *testing.T) {
+ done := make(chan bool)
+ var mu sync.Mutex
+ cond := sync.NewCond(&mu)
+ x := 0
+ _ = x
+ condition := 0
+ go func() {
+ time.Sleep(10 * time.Millisecond) // Enter cond.Wait loop
+ x = 1
+ mu.Lock()
+ condition = 1
+ cond.Signal()
+ mu.Unlock()
+ time.Sleep(10 * time.Millisecond) // Exit cond.Wait loop
+ mu.Lock()
+ x = 3
+ mu.Unlock()
+ done <- true
+ }()
+ mu.Lock()
+ for condition != 1 {
+ cond.Wait()
+ }
+ mu.Unlock()
+ x = 2
+ <-done
+}
+
+// We do not currently parse this test's output automatically.
+// The goroutine creation stack is intended to be inspected
+// manually to verify that it contains no off-by-one errors.
+func TestRaceAnnounceThreads(t *testing.T) {
+ const N = 7
+ allDone := make(chan bool, N)
+
+ var x int
+ _ = x
+
+ var f, g, h func()
+ f = func() {
+ x = 1
+ go g()
+ go func() {
+ x = 1
+ allDone <- true
+ }()
+ x = 2
+ allDone <- true
+ }
+
+ g = func() {
+ for i := 0; i < 2; i++ {
+ go func() {
+ x = 1
+ allDone <- true
+ }()
+ allDone <- true
+ }
+ }
+
+ h = func() {
+ x = 1
+ x = 2
+ go f()
+ allDone <- true
+ }
+
+ go h()
+
+ for i := 0; i < N; i++ {
+ <-allDone
+ }
+}
+
+func TestNoRaceAfterFunc1(t *testing.T) {
+ i := 2
+ c := make(chan bool)
+ var f func()
+ f = func() {
+ i--
+ if i >= 0 {
+ time.AfterFunc(0, f)
+ } else {
+ c <- true
+ }
+ }
+
+ time.AfterFunc(0, f)
+ <-c
+}
+
+func TestNoRaceAfterFunc2(t *testing.T) {
+ var x int
+ _ = x
+ timer := time.AfterFunc(10, func() {
+ x = 1
+ })
+ defer timer.Stop()
+}
+
+func TestNoRaceAfterFunc3(t *testing.T) {
+ c := make(chan bool, 1)
+ x := 0
+ _ = x
+ time.AfterFunc(1e7, func() {
+ x = 1
+ c <- true
+ })
+ <-c
+}
+
+func TestRaceAfterFunc3(t *testing.T) {
+ c := make(chan bool, 2)
+ x := 0
+ _ = x
+ time.AfterFunc(1e7, func() {
+ x = 1
+ c <- true
+ })
+ time.AfterFunc(2e7, func() {
+ x = 2
+ c <- true
+ })
+ <-c
+ <-c
+}
+
+// This test's output is intended to be observed manually.
+// One should check that the goroutine creation stack is comprehensible.
+func TestRaceGoroutineCreationStack(t *testing.T) {
+ var x int
+ _ = x
+ var ch = make(chan bool, 1)
+
+ f1 := func() {
+ x = 1
+ ch <- true
+ }
+ f2 := func() { go f1() }
+ f3 := func() { go f2() }
+ f4 := func() { go f3() }
+
+ go f4()
+ x = 2
+ <-ch
+}
+
+// A nil pointer in a mutex method call should not
+// corrupt the race detector state.
+// Used to hang indefinitely.
+func TestNoRaceNilMutexCrash(t *testing.T) {
+ var mutex sync.Mutex
+ panics := 0
+ defer func() {
+ if x := recover(); x != nil {
+ mutex.Lock()
+ panics++
+ mutex.Unlock()
+ } else {
+ panic("no panic")
+ }
+ }()
+ var othermutex *sync.RWMutex
+ othermutex.RLock()
+}
diff --git a/src/runtime/race/testdata/waitgroup_test.go b/src/runtime/race/testdata/waitgroup_test.go
new file mode 100644
index 0000000..1693373
--- /dev/null
+++ b/src/runtime/race/testdata/waitgroup_test.go
@@ -0,0 +1,360 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package race_test
+
+import (
+ "runtime"
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestNoRaceWaitGroup(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ n := 1
+ for i := 0; i < n; i++ {
+ wg.Add(1)
+ j := i
+ go func() {
+ x = j
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func TestRaceWaitGroup(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ n := 2
+ for i := 0; i < n; i++ {
+ wg.Add(1)
+ j := i
+ go func() {
+ x = j
+ wg.Done()
+ }()
+ }
+ wg.Wait()
+}
+
+func TestNoRaceWaitGroup2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ x = 1
+ wg.Done()
+ }()
+ wg.Wait()
+ x = 2
+}
+
+// incrementing counter in Add and locking wg's mutex
+func TestRaceWaitGroupAsMutex(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ c := make(chan bool, 2)
+ go func() {
+ wg.Wait()
+ time.Sleep(100 * time.Millisecond)
+ wg.Add(+1)
+ x = 1
+ wg.Add(-1)
+ c <- true
+ }()
+ go func() {
+ wg.Wait()
+ time.Sleep(100 * time.Millisecond)
+ wg.Add(+1)
+ x = 2
+ wg.Add(-1)
+ c <- true
+ }()
+ <-c
+ <-c
+}
+
+// Incorrect usage: Add is too late.
+func TestRaceWaitGroupWrongWait(t *testing.T) {
+ c := make(chan bool, 2)
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ go func() {
+ wg.Add(1)
+ runtime.Gosched()
+ x = 1
+ wg.Done()
+ c <- true
+ }()
+ go func() {
+ wg.Add(1)
+ runtime.Gosched()
+ x = 2
+ wg.Done()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestRaceWaitGroupWrongAdd(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ go func() {
+ wg.Add(1)
+ time.Sleep(100 * time.Millisecond)
+ wg.Done()
+ c <- true
+ }()
+ go func() {
+ wg.Add(1)
+ time.Sleep(100 * time.Millisecond)
+ wg.Done()
+ c <- true
+ }()
+ time.Sleep(50 * time.Millisecond)
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ go func() {
+ wg.Wait()
+ c <- true
+ }()
+ go func() {
+ wg.Wait()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait2(t *testing.T) {
+ c := make(chan bool, 2)
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ wg.Done()
+ wg.Wait()
+ c <- true
+ }()
+ go func() {
+ wg.Done()
+ wg.Wait()
+ c <- true
+ }()
+ wg.Wait()
+ <-c
+ <-c
+}
+
+func TestNoRaceWaitGroupMultipleWait3(t *testing.T) {
+ const P = 3
+ var data [P]int
+ done := make(chan bool, P)
+ var wg sync.WaitGroup
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p] = 42
+ wg.Done()
+ }(p)
+ }
+ for p := 0; p < P; p++ {
+ go func() {
+ wg.Wait()
+ for p1 := 0; p1 < P; p1++ {
+ _ = data[p1]
+ }
+ done <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-done
+ }
+}
+
+// Correct usage but still a race
+func TestRaceWaitGroup2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ x = 1
+ wg.Done()
+ }()
+ go func() {
+ x = 2
+ wg.Done()
+ }()
+ wg.Wait()
+}
+
+func TestNoRaceWaitGroupPanicRecover(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ defer func() {
+ err := recover()
+ if err != "sync: negative WaitGroup counter" {
+ t.Fatalf("Unexpected panic: %#v", err)
+ }
+ x = 2
+ }()
+ x = 1
+ wg.Add(-1)
+}
+
+// TODO: this is actually a panic-synchronization test, not a
+// WaitGroup test. Move it to another *_test file.
+// Is it possible to get a race by synchronizing via panic?
+func TestNoRaceWaitGroupPanicRecover2(t *testing.T) {
+ var x int
+ _ = x
+ var wg sync.WaitGroup
+ ch := make(chan bool, 1)
+ var f func() = func() {
+ x = 2
+ ch <- true
+ }
+ go func() {
+ defer func() {
+ err := recover()
+ if err != "sync: negative WaitGroup counter" {
+ }
+ go f()
+ }()
+ x = 1
+ wg.Add(-1)
+ }()
+
+ <-ch
+}
+
+func TestNoRaceWaitGroupTransitive(t *testing.T) {
+ x, y := 0, 0
+ var wg sync.WaitGroup
+ wg.Add(2)
+ go func() {
+ x = 42
+ wg.Done()
+ }()
+ go func() {
+ time.Sleep(1e7)
+ y = 42
+ wg.Done()
+ }()
+ wg.Wait()
+ _ = x
+ _ = y
+}
+
+func TestNoRaceWaitGroupReuse(t *testing.T) {
+ const P = 3
+ var data [P]int
+ var wg sync.WaitGroup
+ for try := 0; try < 3; try++ {
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ }
+}
+
+func TestNoRaceWaitGroupReuse2(t *testing.T) {
+ const P = 3
+ var data [P]int
+ var wg sync.WaitGroup
+ for try := 0; try < 3; try++ {
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ done := make(chan bool)
+ go func() {
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ done <- true
+ }()
+ wg.Wait()
+ <-done
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ }
+}
+
+func TestRaceWaitGroupReuse(t *testing.T) {
+ const P = 3
+ const T = 3
+ done := make(chan bool, T)
+ var wg sync.WaitGroup
+ for try := 0; try < T; try++ {
+ var data [P]int
+ wg.Add(P)
+ for p := 0; p < P; p++ {
+ go func(p int) {
+ time.Sleep(50 * time.Millisecond)
+ data[p]++
+ wg.Done()
+ }(p)
+ }
+ go func() {
+ wg.Wait()
+ for p := 0; p < P; p++ {
+ data[p]++
+ }
+ done <- true
+ }()
+ time.Sleep(100 * time.Millisecond)
+ wg.Wait()
+ }
+ for try := 0; try < T; try++ {
+ <-done
+ }
+}
+
+func TestNoRaceWaitGroupConcurrentAdd(t *testing.T) {
+ const P = 4
+ waiting := make(chan bool, P)
+ var wg sync.WaitGroup
+ for p := 0; p < P; p++ {
+ go func() {
+ wg.Add(1)
+ waiting <- true
+ wg.Done()
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-waiting
+ }
+ wg.Wait()
+}
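
The reuse tests above turn on one rule from the sync.WaitGroup documentation: a WaitGroup may be reused, but a new positive Add must not start until every Wait from the previous round has returned (TestRaceWaitGroupReuse breaks this, since the waiter goroutine's Wait may still be in flight when the next try calls Add). A minimal sketch of the safe reuse pattern, illustrative only and not taken from this patch:

	package main

	import "sync"

	func main() {
		var wg sync.WaitGroup
		data := make([]int, 3)
		for round := 0; round < 3; round++ {
			// Safe: by this point every Wait from the previous round has returned.
			wg.Add(len(data))
			for i := range data {
				go func(i int) {
					data[i]++
					wg.Done()
				}(i)
			}
			wg.Wait() // all workers of this round are done; data is safe to reuse
		}
	}
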
diff --git a/src/runtime/race/timer_test.go b/src/runtime/race/timer_test.go
new file mode 100644
index 0000000..dd59005
--- /dev/null
+++ b/src/runtime/race/timer_test.go
@@ -0,0 +1,33 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+package race_test
+
+import (
+ "sync"
+ "testing"
+ "time"
+)
+
+func TestTimers(t *testing.T) {
+ const goroutines = 8
+ var wg sync.WaitGroup
+ wg.Add(goroutines)
+ var mu sync.Mutex
+ for i := 0; i < goroutines; i++ {
+ go func() {
+ defer wg.Done()
+ ticker := time.NewTicker(1)
+ defer ticker.Stop()
+ for c := 0; c < 1000; c++ {
+ <-ticker.C
+ mu.Lock()
+ mu.Unlock()
+ }
+ }()
+ }
+ wg.Wait()
+}
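
TestTimers drives eight goroutines through ticker wakeups and a shared mutex so the detector can exercise the timer and channel synchronization paths. For contrast, a deliberately unsynchronized counterpart (a hypothetical example, not from this tree) is the kind of program the detector flags when built with -race, e.g. via go test -race or go run -race:

	package main

	import (
		"fmt"
		"time"
	)

	func main() {
		n := 0
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		go func() {
			for range ticker.C {
				n++ // unsynchronized write to n
			}
		}()
		time.Sleep(10 * time.Millisecond)
		fmt.Println(n) // unsynchronized read of n: typically reported under -race
	}
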
diff --git a/src/runtime/race0.go b/src/runtime/race0.go
new file mode 100644
index 0000000..f36d438
--- /dev/null
+++ b/src/runtime/race0.go
@@ -0,0 +1,44 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !race
+
+// Dummy race detection API, used when not built with -race.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+const raceenabled = false
+
+// Because raceenabled is false, none of these functions should be called.
+
+func raceReadObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func raceWriteObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func raceinit() (uintptr, uintptr) { throw("race"); return 0, 0 }
+func racefini() { throw("race") }
+func raceproccreate() uintptr { throw("race"); return 0 }
+func raceprocdestroy(ctx uintptr) { throw("race") }
+func racemapshadow(addr unsafe.Pointer, size uintptr) { throw("race") }
+func racewritepc(addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func racereadpc(addr unsafe.Pointer, callerpc, pc uintptr) { throw("race") }
+func racereadrangepc(addr unsafe.Pointer, sz, callerpc, pc uintptr) { throw("race") }
+func racewriterangepc(addr unsafe.Pointer, sz, callerpc, pc uintptr) { throw("race") }
+func raceacquire(addr unsafe.Pointer) { throw("race") }
+func raceacquireg(gp *g, addr unsafe.Pointer) { throw("race") }
+func raceacquirectx(racectx uintptr, addr unsafe.Pointer) { throw("race") }
+func racerelease(addr unsafe.Pointer) { throw("race") }
+func racereleaseg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racereleaseacquire(addr unsafe.Pointer) { throw("race") }
+func racereleaseacquireg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racereleasemerge(addr unsafe.Pointer) { throw("race") }
+func racereleasemergeg(gp *g, addr unsafe.Pointer) { throw("race") }
+func racefingo() { throw("race") }
+func racemalloc(p unsafe.Pointer, sz uintptr) { throw("race") }
+func racefree(p unsafe.Pointer, sz uintptr) { throw("race") }
+func racegostart(pc uintptr) uintptr { throw("race"); return 0 }
+func racegoend() { throw("race") }
+func racectxend(racectx uintptr) { throw("race") }
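
race0.go is the !race half of a build-tag pair: the same entry points exist in both builds, and because raceenabled is a constant, call sites guarded by it compile to nothing in the non-race build. The same pattern can be sketched at user level; the package and file names below are hypothetical:

	// File: detect_race.go
	//go:build race

	package detect

	const Enabled = true

	// File: detect_norace.go
	//go:build !race

	package detect

	const Enabled = false

	// Callers gate detector-only work on the constant, so the non-race
	// build eliminates the branch entirely:
	//
	//	if detect.Enabled {
	//		// extra bookkeeping compiled in only under -race
	//	}
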
diff --git a/src/runtime/race_amd64.s b/src/runtime/race_amd64.s
new file mode 100644
index 0000000..4fa130e
--- /dev/null
+++ b/src/runtime/race_amd64.s
@@ -0,0 +1,457 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+// The following thunks allow calling the gcc-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the amd64 calling convention.
+// Arguments are passed in DI, SI, DX, CX, R8, R9, the rest is on stack.
+// Callee-saved registers are: BX, BP, R12-R15.
+// SP must be 16-byte aligned.
+// On Windows:
+// Arguments are passed in CX, DX, R8, R9, the rest is on stack.
+// Callee-saved registers are: BX, BP, DI, SI, R12-R15.
+// SP must be 16-byte aligned. Windows also requires "stack-backing" for the 4 register arguments:
+// https://msdn.microsoft.com/en-us/library/ms235286.aspx
+// We do not do this, because it seems to be intended for vararg/unprototyped functions.
+// Gcc-compiled race runtime does not try to use that space.
+
+#ifdef GOOS_windows
+#define RARG0 CX
+#define RARG1 DX
+#define RARG2 R8
+#define RARG3 R9
+#else
+#define RARG0 DI
+#define RARG1 SI
+#define RARG2 DX
+#define RARG3 CX
+#endif
+
+// func runtime·raceread(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·raceread<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVQ AX, RARG1
+ MOVQ (SP), RARG2
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVQ $__tsan_read(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceRead(addr uintptr)
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because raceread reads caller pc.
+ JMP runtime·raceread(SB)
+
+// void runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ callpc+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVQ $__tsan_read_pc(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewrite(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·racewrite<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVQ AX, RARG1
+ MOVQ (SP), RARG2
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVQ $__tsan_write(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because racewrite reads caller pc.
+ JMP runtime·racewrite(SB)
+
+// void runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ callpc+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVQ $__tsan_write_pc(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racereadrange(addr, size uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·racereadrange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVQ AX, RARG1
+ MOVQ BX, RARG2
+ MOVQ (SP), RARG3
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_read_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racereadrange reads caller pc.
+ JMP runtime·racereadrange(SB)
+
+// void runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_read_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewriterange(addr, size uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would render runtime.getcallerpc ineffective.
+TEXT runtime·racewriterange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVQ AX, RARG1
+ MOVQ BX, RARG2
+ MOVQ (SP), RARG3
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_write_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racewriterange reads caller pc.
+ JMP runtime·racewriterange(SB)
+
+// void runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVQ addr+0(FP), RARG1
+ MOVQ size+8(FP), RARG2
+ MOVQ pc+16(FP), RARG3
+ ADDQ $1, RARG3 // pc is function start, tsan wants return address
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVQ $__tsan_write_range(SB), AX
+ JMP racecalladdr<>(SB)
+
+// If addr (RARG1) is out of range, do nothing.
+// Otherwise, setup goroutine context and invoke racecall. Other arguments already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ CMPQ RARG1, runtime·racearenastart(SB)
+ JB data
+ CMPQ RARG1, runtime·racearenaend(SB)
+ JB call
+data:
+ CMPQ RARG1, runtime·racedatastart(SB)
+ JB ret
+ CMPQ RARG1, runtime·racedataend(SB)
+ JAE ret
+call:
+ MOVQ AX, AX // w/o this 6a miscompiles this function
+ JMP racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented code.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVQ callpc+0(FP), R11
+ JMP racefuncenter<>(SB)
+
+// Common code for racefuncenter
+// R11 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVQ DX, BX // save function entry context (for closures)
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ R11, RARG1
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVQ $__tsan_func_enter(SB), AX
+ // racecall<> preserves BX
+ CALL racecall<>(SB)
+ MOVQ BX, DX // restore function entry context
+ RET
+
+// func runtime·racefuncexit()
+// Called from instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVQ $__tsan_func_exit(SB), AX
+ JMP racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+
+// Load
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT|NOFRAME, $0-12
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_load(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT|NOFRAME, $0-16
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_load(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+// Store
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT|NOFRAME, $0-12
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_store(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT|NOFRAME, $0-16
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_store(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+// Swap
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT|NOFRAME, $0-20
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT|NOFRAME, $0-24
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+// Add
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT|NOFRAME, $0-20
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_fetch_add(SB), AX
+ CALL racecallatomic<>(SB)
+ MOVL add+8(FP), AX // convert fetch_add to add_fetch
+ ADDL AX, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT|NOFRAME, $0-24
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_fetch_add(SB), AX
+ CALL racecallatomic<>(SB)
+ MOVQ add+8(FP), AX // convert fetch_add to add_fetch
+ ADDQ AX, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT|NOFRAME, $0-17
+ GO_ARGS
+ MOVQ $__tsan_go_atomic32_compare_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT|NOFRAME, $0-25
+ GO_ARGS
+ MOVQ $__tsan_go_atomic64_compare_exchange(SB), AX
+ CALL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+// Generic atomic operation implementation.
+// AX already contains target function.
+TEXT racecallatomic<>(SB), NOSPLIT|NOFRAME, $0-0
+ // Trigger SIGSEGV early.
+ MOVQ 16(SP), R12
+ MOVBLZX (R12), R13
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ CMPQ R12, runtime·racearenastart(SB)
+ JB racecallatomic_data
+ CMPQ R12, runtime·racearenaend(SB)
+ JB racecallatomic_ok
+racecallatomic_data:
+ CMPQ R12, runtime·racedatastart(SB)
+ JB racecallatomic_ignore
+ CMPQ R12, runtime·racedataend(SB)
+ JAE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ 8(SP), RARG1 // caller pc
+ MOVQ (SP), RARG2 // pc
+ LEAQ 16(SP), RARG3 // arguments
+ JMP racecall<>(SB) // does not return
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVQ AX, BX // remember the original function
+ MOVQ $__tsan_go_ignore_sync_begin(SB), AX
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ CALL racecall<>(SB)
+ MOVQ BX, AX // restore the original function
+ // Call the atomic function.
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ MOVQ 8(SP), RARG1 // caller pc
+ MOVQ (SP), RARG2 // pc
+ LEAQ 16(SP), RARG3 // arguments
+ CALL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVQ $__tsan_go_ignore_sync_end(SB), AX
+ MOVQ g_racectx(R14), RARG0 // goroutine context
+ JMP racecall<>(SB)
+
+// void runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVQ fn+0(FP), AX
+ MOVQ arg0+8(FP), RARG0
+ MOVQ arg1+16(FP), RARG1
+ MOVQ arg2+24(FP), RARG2
+ MOVQ arg3+32(FP), RARG3
+ JMP racecall<>(SB)
+
+// Switches SP to g0 stack and calls (AX). Arguments already set.
+TEXT racecall<>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVQ g_m(R14), R13
+ // Switch to g0 stack.
+ MOVQ SP, R12 // callee-saved, preserved across the CALL
+ MOVQ m_g0(R13), R10
+ CMPQ R10, R14
+ JE call // already on g0
+ MOVQ (g_sched+gobuf_sp)(R10), SP
+call:
+ ANDQ $~15, SP // alignment for gcc ABI
+ CALL AX
+ MOVQ R12, SP
+ // Back to Go world, set special registers.
+ // The g register (R14) is preserved in C.
+ XORPS X15, X15
+ RET
+
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// Direct Go->C race call has only switched SP, finish g->g0 switch by setting correct g.
+// The overall effect of Go->C->Go call chain is similar to that of mcall.
+// RARG0 contains command code. RARG1 contains command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT|NOFRAME, $0-0
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ CMPQ RARG0, $0
+ JNE rest
+ get_tls(RARG0)
+ MOVQ g(RARG0), RARG0
+ MOVQ g_m(RARG0), RARG0
+ MOVQ m_p(RARG0), RARG0
+ MOVQ p_raceprocctx(RARG0), RARG0
+ MOVQ RARG0, (RARG1)
+ RET
+
+rest:
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+ // Set g = g0.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ MOVQ g_m(R14), R13
+ MOVQ m_g0(R13), R15
+ CMPQ R13, R15
+ JEQ noswitch // branch if already on g0
+ MOVQ R15, g(R12) // g = m->g0
+ MOVQ R15, R14 // set g register
+ PUSHQ RARG1 // func arg
+ PUSHQ RARG0 // func arg
+ CALL runtime·racecallback(SB)
+ POPQ R12
+ POPQ R12
+ // All registers are smashed after Go code, reload.
+ get_tls(R12)
+ MOVQ g(R12), R13
+ MOVQ g_m(R13), R13
+ MOVQ m_curg(R13), R14
+ MOVQ R14, g(R12) // g = m->curg
+ret:
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+noswitch:
+ // already on g0
+ PUSHQ RARG1 // func arg
+ PUSHQ RARG0 // func arg
+ CALL runtime·racecallback(SB)
+ POPQ R12
+ POPQ R12
+ JMP ret
diff --git a/src/runtime/race_arm64.s b/src/runtime/race_arm64.s
new file mode 100644
index 0000000..c818345
--- /dev/null
+++ b/src/runtime/race_arm64.s
@@ -0,0 +1,498 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+#include "go_asm.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "tls_arm64.h"
+#include "cgo/abi_arm64.h"
+
+// The following thunks allow calling the gcc-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the arm64 calling convention.
+// Arguments are passed in R0...R7, the rest is on stack.
+// Callee-saved registers are: R19...R28.
+// Temporary registers are: R9...R15
+// SP must be 16-byte aligned.
+
+// When calling racecalladdr, R9 is the call target address.
+
+// The race ctx, ThreadState *thr below, is passed in R0 and loaded in racecalladdr.
+
+// Darwin may return unaligned thread pointer. Align it. (See tls_arm64.s)
+// No-op on other OSes.
+#ifdef TLS_darwin
+#define TP_ALIGN AND $~7, R0
+#else
+#define TP_ALIGN
+#endif
+
+// Load g from TLS. (See tls_arm64.s)
+#define load_g \
+ MRS_TPIDR_R0 \
+ TP_ALIGN \
+ MOVD runtime·tls_g(SB), R11 \
+ MOVD (R0)(R11), g
+
+// func runtime·raceread(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would make caller's PC ineffective.
+TEXT runtime·raceread<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVD R0, R1 // addr
+ MOVD LR, R2
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_read(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceRead(addr uintptr)
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because raceread reads caller pc.
+ JMP runtime·raceread(SB)
+
+// func runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD callpc+8(FP), R2
+ MOVD pc+16(FP), R3
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_read_pc(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewrite(addr uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would make caller's PC ineffective.
+TEXT runtime·racewrite<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVD R0, R1 // addr
+ MOVD LR, R2
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_write(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because racewrite reads caller pc.
+ JMP runtime·racewrite(SB)
+
+// func runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD callpc+8(FP), R2
+ MOVD pc+16(FP), R3
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_write_pc(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racereadrange(addr, size uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would make caller's PC ineffective.
+TEXT runtime·racereadrange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVD R1, R2 // size
+ MOVD R0, R1 // addr
+ MOVD LR, R3
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racereadrange reads caller pc.
+ JMP runtime·racereadrange(SB)
+
+// func runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD pc+16(FP), R3
+ ADD $4, R3 // pc is function start, tsan wants return address.
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewriterange(addr, size uintptr)
+// Called from instrumented code.
+// Defined as ABIInternal so as to avoid introducing a wrapper,
+// which would make caller's PC ineffective.
+TEXT runtime·racewriterange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVD R1, R2 // size
+ MOVD R0, R1 // addr
+ MOVD LR, R3
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racewriterange reads caller pc.
+ JMP runtime·racewriterange(SB)
+
+// func runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R1
+ MOVD size+8(FP), R2
+ MOVD pc+16(FP), R3
+ ADD $4, R3 // pc is function start, tsan wants return address.
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R9
+ JMP racecalladdr<>(SB)
+
+// If addr (R1) is out of range, do nothing.
+// Otherwise, setup goroutine context and invoke racecall. Other arguments already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R10
+ CMP R10, R1
+ BLT data
+ MOVD runtime·racearenaend(SB), R10
+ CMP R10, R1
+ BLT call
+data:
+ MOVD runtime·racedatastart(SB), R10
+ CMP R10, R1
+ BLT ret
+ MOVD runtime·racedataend(SB), R10
+ CMP R10, R1
+ BGT ret
+call:
+ JMP racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented code.
+TEXT runtime·racefuncenter<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVD R0, R9 // callpc
+ JMP racefuncenter<>(SB)
+
+// Common code for racefuncenter
+// R9 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0 // goroutine racectx
+ MOVD R9, R1
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVD $__tsan_func_enter(SB), R9
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racefuncexit()
+// Called from instrumented code.
+TEXT runtime·racefuncexit<ABIInternal>(SB), NOSPLIT, $0-0
+ load_g
+ MOVD g_racectx(g), R0 // race context
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVD $__tsan_func_exit(SB), R9
+ JMP racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+// R3 = addr of arguments passed to this function; it can
+// be fetched at 40(RSP) in racecallatomic after two BL calls.
+// R0, R1, R2 are set in racecallatomic.
+
+// Load
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_load(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_load(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+// Store
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_store(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_store(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+// Swap
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+// Add
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_fetch_add(SB), R9
+ BL racecallatomic<>(SB)
+ MOVW add+8(FP), R0 // convert fetch_add to add_fetch
+ MOVW ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_fetch_add(SB), R9
+ BL racecallatomic<>(SB)
+ MOVD add+8(FP), R0 // convert fetch_add to add_fetch
+ MOVD ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVD R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_compare_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_compare_exchange(SB), R9
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+// Generic atomic operation implementation.
+// R9 = addr of target function
+TEXT racecallatomic<>(SB), NOSPLIT, $0
+ // Set up these registers
+ // R0 = *ThreadState
+ // R1 = caller pc
+ // R2 = pc
+ // R3 = addr of incoming arg list
+
+ // Trigger SIGSEGV early.
+	MOVD	40(RSP), R3	// 1st arg is addr; after two BL calls it is at 40(RSP)
+ MOVB (R3), R13 // segv here if addr is bad
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_data
+ MOVD runtime·racearenaend(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_ok
+racecallatomic_data:
+ MOVD runtime·racedatastart(SB), R10
+ CMP R10, R3
+ BLT racecallatomic_ignore
+ MOVD runtime·racedataend(SB), R10
+ CMP R10, R3
+ BGE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ MOVD 16(RSP), R1 // caller pc
+ MOVD R9, R2 // pc
+ ADD $40, RSP, R3
+ JMP racecall<>(SB) // does not return
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVD R9, R21 // remember the original function
+ MOVD $__tsan_go_ignore_sync_begin(SB), R9
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ BL racecall<>(SB)
+ MOVD R21, R9 // restore the original function
+ // Call the atomic function.
+ // racecall will call LLVM race code which might clobber R28 (g)
+ load_g
+ MOVD g_racectx(g), R0 // goroutine context
+ MOVD 16(RSP), R1 // caller pc
+ MOVD R9, R2 // pc
+ ADD $40, RSP, R3 // arguments
+ BL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVD $__tsan_go_ignore_sync_end(SB), R9
+ MOVD g_racectx(g), R0 // goroutine context
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVD fn+0(FP), R9
+ MOVD arg0+8(FP), R0
+ MOVD arg1+16(FP), R1
+ MOVD arg2+24(FP), R2
+ MOVD arg3+32(FP), R3
+ JMP racecall<>(SB)
+
+// Switches SP to g0 stack and calls (R9). Arguments already set.
+// Clobbers R19, R20.
+TEXT racecall<>(SB), NOSPLIT|NOFRAME, $0-0
+ MOVD g_m(g), R10
+ // Switch to g0 stack.
+ MOVD RSP, R19 // callee-saved, preserved across the CALL
+ MOVD R30, R20 // callee-saved, preserved across the CALL
+ MOVD m_g0(R10), R11
+ CMP R11, g
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R11), R12
+ MOVD R12, RSP
+call:
+ BL R9
+ MOVD R19, RSP
+ JMP (R20)
+
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// Direct Go->C race call has only switched SP, finish g->g0 switch by setting correct g.
+// The overall effect of Go->C->Go call chain is similar to that of mcall.
+// R0 contains command code. R1 contains command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT|NOFRAME, $0
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ CBNZ R0, rest
+ MOVD g, R13
+#ifdef TLS_darwin
+ MOVD R27, R12 // save R27 a.k.a. REGTMP (callee-save in C). load_g clobbers it
+#endif
+ load_g
+#ifdef TLS_darwin
+ MOVD R12, R27
+#endif
+ MOVD g_m(g), R0
+ MOVD m_p(R0), R0
+ MOVD p_raceprocctx(R0), R0
+ MOVD R0, (R1)
+ MOVD R13, g
+ JMP (LR)
+rest:
+ // Save callee-saved registers (Go code won't respect that).
+ // 8(RSP) and 16(RSP) are for args passed through racecallback
+ SUB $176, RSP
+ MOVD LR, 0(RSP)
+
+ SAVE_R19_TO_R28(8*3)
+ SAVE_F8_TO_F15(8*13)
+ MOVD R29, (8*21)(RSP)
+ // Set g = g0.
+	// load_g will clobber R0, so save R0 first
+ MOVD R0, R13
+ load_g
+ // restore R0
+ MOVD R13, R0
+ MOVD g_m(g), R13
+ MOVD m_g0(R13), R14
+ CMP R14, g
+ BEQ noswitch // branch if already on g0
+ MOVD R14, g
+
+ MOVD R0, 8(RSP) // func arg
+ MOVD R1, 16(RSP) // func arg
+ BL runtime·racecallback(SB)
+
+ // All registers are smashed after Go code, reload.
+ MOVD g_m(g), R13
+ MOVD m_curg(R13), g // g = m->curg
+ret:
+ // Restore callee-saved registers.
+ MOVD 0(RSP), LR
+ MOVD (8*21)(RSP), R29
+ RESTORE_F8_TO_F15(8*13)
+ RESTORE_R19_TO_R28(8*3)
+ ADD $176, RSP
+ JMP (LR)
+
+noswitch:
+ // already on g0
+ MOVD R0, 8(RSP) // func arg
+ MOVD R1, 16(RSP) // func arg
+ BL runtime·racecallback(SB)
+ JMP ret
+
+#ifndef TLSG_IS_VARIABLE
+// tls_g, g value for each thread in TLS
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
+#endif
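
Several wrappers above (RaceRead, RaceWrite, RaceReadRange, RaceWriteRange) must tail-jump into their lower-case counterparts because those read the caller's pc; an extra frame would shift the reported location from user code to the wrapper. The same effect is visible in plain Go with runtime.Caller; the sketch below is illustrative only:

	package main

	import (
		"fmt"
		"runtime"
	)

	// where reports the function that called it.
	func where() string {
		pc, _, _, _ := runtime.Caller(1)
		return runtime.FuncForPC(pc).Name()
	}

	// wrapper adds a stack frame, so where() now attributes the call to
	// wrapper rather than to main - the reason RaceRead must tail-jump
	// into raceread instead of calling it normally.
	func wrapper() string { return where() }

	func main() {
		fmt.Println(where())   // main.main
		fmt.Println(wrapper()) // main.wrapper
	}
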
diff --git a/src/runtime/race_ppc64le.s b/src/runtime/race_ppc64le.s
new file mode 100644
index 0000000..39cfffc
--- /dev/null
+++ b/src/runtime/race_ppc64le.s
@@ -0,0 +1,505 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+#include "cgo/abi_ppc64x.h"
+
+// The following functions allow calling the clang-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the ppc64le calling convention.
+// Arguments are passed in R3, R4, R5 ...
+// SP must be 16-byte aligned.
+
+// Note that for ppc64x, LLVM follows the standard ABI and
+// expects arguments in registers, so these functions move
+// the arguments from storage to the registers expected
+// by the ABI.
+
+// When calling from Go to Clang tsan code:
+// R3 is the 1st argument and is usually the ThreadState*
+// R4-? are the 2nd, 3rd, 4th, etc. arguments
+
+// When calling racecalladdr:
+// R8 is the call target address
+
+// The race ctx is passed in R3 and loaded in
+// racecalladdr.
+//
+// The sequence used to get the race ctx:
+// MOVD runtime·tls_g(SB), R10 // Address of TLS variable
+// MOVD 0(R10), g // g = R30
+// MOVD g_racectx(g), R3 // racectx == ThreadState
+
+// func runtime·RaceRead(addr uintptr)
+// Called from instrumented Go code
+TEXT runtime·raceread<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVD R3, R4 // addr
+	MOVD	LR, R5	// caller pc, set by the caller's BL
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_read(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ BR runtime·raceread(SB)
+
+// void runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD callpc+8(FP), R5
+ MOVD pc+16(FP), R6
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_read_pc(SB), R8
+ BR racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+// Called from instrumented Go code
+TEXT runtime·racewrite<ABIInternal>(SB), NOSPLIT, $0-8
+ MOVD R3, R4 // addr
+ MOVD LR, R5 // caller has set LR via BL inst
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_write(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ JMP runtime·racewrite(SB)
+
+// void runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD callpc+8(FP), R5
+ MOVD pc+16(FP), R6
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_write_pc(SB), R8
+ BR racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+// Called from instrumented Go code.
+TEXT runtime·racereadrange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVD R4, R5 // size
+ MOVD R3, R4 // addr
+ MOVD LR, R6
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R8
+ BR racecalladdr<>(SB)
+
+// void runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD pc+16(FP), R6
+ ADD $4, R6 // tsan wants return addr
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ BR runtime·racereadrange(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+// Called from instrumented Go code.
+TEXT runtime·racewriterange<ABIInternal>(SB), NOSPLIT, $0-16
+ MOVD R4, R5 // size
+ MOVD R3, R4 // addr
+ MOVD LR, R6
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R8
+ BR racecalladdr<>(SB)
+
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ BR runtime·racewriterange(SB)
+
+// void runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+// Called from instrumented Go code
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ MOVD addr+0(FP), R4
+ MOVD size+8(FP), R5
+ MOVD pc+16(FP), R6
+	ADD	$4, R6	// pc is function start, tsan wants return address
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R8
+ BR racecalladdr<>(SB)
+
+// Call a __tsan function from Go code.
+// R8 = tsan function address
+// R3 = *ThreadState a.k.a. g_racectx from g
+// R4 = addr passed to __tsan function
+//
+// Otherwise, setup goroutine context and invoke racecall. Other arguments already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+ MOVD g_racectx(g), R3 // goroutine context
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R9
+ CMP R4, R9
+ BLT data
+ MOVD runtime·racearenaend(SB), R9
+ CMP R4, R9
+ BLT call
+data:
+ MOVD runtime·racedatastart(SB), R9
+ CMP R4, R9
+ BLT ret
+ MOVD runtime·racedataend(SB), R9
+ CMP R4, R9
+ BGT ret
+call:
+ // Careful!! racecall will save LR on its
+ // stack, which is OK as long as racecalladdr
+ // doesn't change in a way that generates a stack.
+ // racecall should return to the caller of
+	// racecalladdr.
+ BR racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented Go code.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVD callpc+0(FP), R8
+ BR racefuncenter<>(SB)
+
+// Common code for racefuncenter
+// R11 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ MOVD R8, R4 // caller pc set by caller in R8
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVD $__tsan_func_enter(SB), R8
+ BR racecall<>(SB)
+ RET
+
+// func runtime·racefuncexit()
+// Called from Go instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVD $__tsan_func_exit(SB), R8
+ BR racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+// Some use the __tsan versions instead
+// R6 = addr of arguments passed to this function
+// R3, R4, R5 set in racecallatomic
+
+// Load atomic in tsan
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ // void __tsan_go_atomic32_load(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_load(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ // void __tsan_go_atomic64_load(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_load(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ BR sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·LoadInt64(SB)
+
+// Store atomic in tsan
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ // void __tsan_go_atomic32_store(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_store(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ // void __tsan_go_atomic64_store(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_store(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ BR sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ BR sync∕atomic·StoreInt64(SB)
+
+// Swap in tsan
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ // void __tsan_go_atomic32_exchange(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ // void __tsan_go_atomic64_exchange(ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic64_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ BR sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·SwapInt64(SB)
+
+// Add atomic in tsan
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ // void __tsan_go_atomic32_fetch_add(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic32_fetch_add(SB), R8
+ ADD $64, R1, R6 // addr of caller's 1st arg
+ BL racecallatomic<>(SB)
+ // The tsan fetch_add result is not as expected by Go,
+ // so the 'add' must be added to the result.
+	MOVW	add+8(FP), R3	// The tsan fetch_add does not return the
+	MOVW	ret+16(FP), R4	// result as expected by Go, so fix it.
+ ADD R3, R4, R3
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ // void __tsan_go_atomic64_fetch_add(ThreadState *thr, uptr cpc, uptr pc, u8 *a);
+ MOVD $__tsan_go_atomic64_fetch_add(SB), R8
+ ADD $64, R1, R6 // addr of caller's 1st arg
+ BL racecallatomic<>(SB)
+ // The tsan fetch_add result is not as expected by Go,
+ // so the 'add' must be added to the result.
+ MOVD add+8(FP), R3
+ MOVD ret+16(FP), R4
+ ADD R3, R4, R3
+ MOVD R3, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ BR sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ BR sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap in tsan
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ // void __tsan_go_atomic32_compare_exchange(
+ // ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic32_compare_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+	// void __tsan_go_atomic64_compare_exchange(
+ // ThreadState *thr, uptr cpc, uptr pc, u8 *a)
+ MOVD $__tsan_go_atomic64_compare_exchange(SB), R8
+ ADD $32, R1, R6 // addr of caller's 1st arg
+ BR racecallatomic<>(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ BR sync∕atomic·CompareAndSwapInt64(SB)
+
+// Common function used to call tsan's atomic functions
+// R3 = *ThreadState
+// R4 = TODO: What's this supposed to be?
+// R5 = caller pc
+// R6 = addr of incoming arg list
+// R8 contains addr of target function.
+TEXT racecallatomic<>(SB), NOSPLIT, $0-0
+ // Trigger SIGSEGV early if address passed to atomic function is bad.
+ MOVD (R6), R7 // 1st arg is addr
+ MOVB (R7), R9 // segv here if addr is bad
+ // Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
+ MOVD runtime·racearenastart(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_data
+ MOVD runtime·racearenaend(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_ok
+racecallatomic_data:
+ MOVD runtime·racedatastart(SB), R9
+ CMP R7, R9
+ BLT racecallatomic_ignore
+ MOVD runtime·racedataend(SB), R9
+ CMP R7, R9
+ BGE racecallatomic_ignore
+racecallatomic_ok:
+ // Addr is within the good range, call the atomic function.
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+ MOVD g_racectx(g), R3 // goroutine racectx aka *ThreadState
+ MOVD R8, R5 // pc is the function called
+ MOVD (R1), R4 // caller pc from stack
+ BL racecall<>(SB) // BL needed to maintain stack consistency
+	RET
+racecallatomic_ignore:
+ // Addr is outside the good range.
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
+ // An attempt to synchronize on the address would cause crash.
+ MOVD R8, R15 // save the original function
+ MOVD R6, R17 // save the original arg list addr
+ MOVD $__tsan_go_ignore_sync_begin(SB), R8 // func addr to call
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+ MOVD g_racectx(g), R3 // goroutine context
+ BL racecall<>(SB)
+ MOVD R15, R8 // restore the original function
+ MOVD R17, R6 // restore arg list addr
+ // Call the atomic function.
+ // racecall will call LLVM race code which might clobber r30 (g)
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+
+ MOVD g_racectx(g), R3
+ MOVD R8, R4 // pc being called same TODO as above
+ MOVD (R1), R5 // caller pc from latest LR
+ BL racecall<>(SB)
+ // Call __tsan_go_ignore_sync_end.
+ MOVD $__tsan_go_ignore_sync_end(SB), R8
+ MOVD g_racectx(g), R3 // goroutine context g should still be good?
+ BL racecall<>(SB)
+ RET
+
+// void runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVD fn+0(FP), R8
+ MOVD arg0+8(FP), R3
+ MOVD arg1+16(FP), R4
+ MOVD arg2+24(FP), R5
+ MOVD arg3+32(FP), R6
+ JMP racecall<>(SB)
+
+// Finds g0 and sets its stack
+// Arguments were loaded for call from Go to C
+TEXT racecall<>(SB), NOSPLIT, $0-0
+ // Set the LR slot for the ppc64 ABI
+ MOVD LR, R10
+ MOVD R10, 0(R1) // Go expectation
+ MOVD R10, 16(R1) // C ABI
+ // Get info from the current goroutine
+ MOVD runtime·tls_g(SB), R10 // g offset in TLS
+ MOVD 0(R10), g
+ MOVD g_m(g), R7 // m for g
+ MOVD R1, R16 // callee-saved, preserved across C call
+ MOVD m_g0(R7), R10 // g0 for m
+ CMP R10, g // same g0?
+ BEQ call // already on g0
+ MOVD (g_sched+gobuf_sp)(R10), R1 // switch R1
+call:
+ // prepare frame for C ABI
+ SUB $32, R1 // create frame for callee saving LR, CR, R2 etc.
+ RLDCR $0, R1, $~15, R1 // align SP to 16 bytes
+ MOVD R8, CTR // R8 = caller addr
+ MOVD R8, R12 // expected by PPC64 ABI
+ BL (CTR)
+ XOR R0, R0 // clear R0 on return from Clang
+ MOVD R16, R1 // restore R1; R16 nonvol in Clang
+ MOVD runtime·tls_g(SB), R10 // find correct g
+ MOVD 0(R10), g
+ MOVD 16(R1), R10 // LR was saved away, restore for return
+ MOVD R10, LR
+ RET
+
+// C->Go callback thunk that allows calling runtime·racesymbolize from C code.
+// Direct Go->C race call has only switched SP, finish g->g0 switch by setting correct g.
+// The overall effect of Go->C->Go call chain is similar to that of mcall.
+// R3 contains the command code. R4 contains the command-specific context.
+// See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT|NOFRAME, $0
+ // Handle command raceGetProcCmd (0) here.
+ // First, code below assumes that we are on curg, while raceGetProcCmd
+ // can be executed on g0. Second, it is called frequently, so will
+ // benefit from this fast path.
+ MOVD $0, R0 // clear R0 since we came from C code
+ CMP R3, $0
+ BNE rest
+	// Inline raceGetProcCmd without clobbering callee-save registers.
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), R11
+ MOVD g_m(R11), R3
+ MOVD m_p(R3), R3
+ MOVD p_raceprocctx(R3), R3
+ MOVD R3, (R4)
+ RET
+
+rest:
+ // Save registers according to the host PPC64 ABI
+ // and reserve 16B for argument storage.
+ STACK_AND_SAVE_HOST_TO_GO_ABI(16)
+
+ // Load g, and switch to g0 if not already on it.
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10), g
+
+ MOVD g_m(g), R7
+ MOVD m_g0(R7), R8
+ CMP g, R8
+ BEQ noswitch
+
+ MOVD R8, g // set g = m->g0
+
+noswitch:
+ BL runtime·racecallback<ABIInternal>(SB)
+
+ UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(16)
+ RET
+
+// tls_g, g value for each thread in TLS
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
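
The CompareAndSwap thunks in each port forward to __tsan_go_atomic*_compare_exchange while keeping sync/atomic's contract: the value is replaced only if it still equals old, and the boolean result reports whether the swap happened. Shown for reference (standard library behavior, not code from this patch):

	package main

	import (
		"fmt"
		"sync/atomic"
	)

	func main() {
		var x int64 = 1
		// Swap succeeds because x == 1.
		fmt.Println(atomic.CompareAndSwapInt64(&x, 1, 2), x) // true 2
		// Swap fails because x is now 2, not 1; x is left unchanged.
		fmt.Println(atomic.CompareAndSwapInt64(&x, 1, 3), x) // false 2
	}
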
diff --git a/src/runtime/race_s390x.s b/src/runtime/race_s390x.s
new file mode 100644
index 0000000..beb7f83
--- /dev/null
+++ b/src/runtime/race_s390x.s
@@ -0,0 +1,391 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build race
+// +build race
+
+#include "go_asm.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// The following thunks allow calling the gcc-compiled race runtime directly
+// from Go code without going all the way through cgo.
+// First, it's much faster (up to 50% speedup for real Go programs).
+// Second, it eliminates race-related special cases from cgocall and scheduler.
+// Third, in the long term it will allow removing the cyclic runtime/race dependency on cmd/go.
+
+// A brief recap of the s390x C calling convention.
+// Arguments are passed in R2...R6, the rest is on stack.
+// Callee-saved registers are: R6...R13, R15.
+// Temporary registers are: R0...R5, R14.
+
+// When calling racecalladdr, R1 is the call target address.
+
+// The race ctx, ThreadState *thr below, is passed in R2 and loaded in racecalladdr.
+
+// func runtime·raceread(addr uintptr)
+// Called from instrumented code.
+TEXT runtime·raceread(SB), NOSPLIT, $0-8
+ // void __tsan_read(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_read(SB), R1
+ MOVD addr+0(FP), R3
+ MOVD R14, R4
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceRead(addr uintptr)
+TEXT runtime·RaceRead(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because raceread reads caller pc.
+ JMP runtime·raceread(SB)
+
+// func runtime·racereadpc(void *addr, void *callpc, void *pc)
+TEXT runtime·racereadpc(SB), NOSPLIT, $0-24
+ // void __tsan_read_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_read_pc(SB), R1
+ LMG addr+0(FP), R3, R5
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewrite(addr uintptr)
+// Called from instrumented code.
+TEXT runtime·racewrite(SB), NOSPLIT, $0-8
+ // void __tsan_write(ThreadState *thr, void *addr, void *pc);
+ MOVD $__tsan_write(SB), R1
+ MOVD addr+0(FP), R3
+ MOVD R14, R4
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWrite(addr uintptr)
+TEXT runtime·RaceWrite(SB), NOSPLIT, $0-8
+ // This needs to be a tail call, because racewrite reads caller pc.
+ JMP runtime·racewrite(SB)
+
+// func runtime·racewritepc(void *addr, void *callpc, void *pc)
+TEXT runtime·racewritepc(SB), NOSPLIT, $0-24
+ // void __tsan_write_pc(ThreadState *thr, void *addr, void *callpc, void *pc);
+ MOVD $__tsan_write_pc(SB), R1
+ LMG addr+0(FP), R3, R5
+ JMP racecalladdr<>(SB)
+
+// func runtime·racereadrange(addr, size uintptr)
+// Called from instrumented code.
+TEXT runtime·racereadrange(SB), NOSPLIT, $0-16
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R1
+ LMG addr+0(FP), R3, R4
+ MOVD R14, R5
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceReadRange(addr, size uintptr)
+TEXT runtime·RaceReadRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racereadrange reads caller pc.
+ JMP runtime·racereadrange(SB)
+
+// func runtime·racereadrangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racereadrangepc1(SB), NOSPLIT, $0-24
+ // void __tsan_read_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_read_range(SB), R1
+ LMG addr+0(FP), R3, R5
+ // pc is an interceptor address, but TSan expects it to point to the
+ // middle of an interceptor (see LLVM's SCOPED_INTERCEPTOR_RAW).
+ ADD $2, R5
+ JMP racecalladdr<>(SB)
+
+// func runtime·racewriterange(addr, size uintptr)
+// Called from instrumented code.
+TEXT runtime·racewriterange(SB), NOSPLIT, $0-16
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R1
+ LMG addr+0(FP), R3, R4
+ MOVD R14, R5
+ JMP racecalladdr<>(SB)
+
+// func runtime·RaceWriteRange(addr, size uintptr)
+TEXT runtime·RaceWriteRange(SB), NOSPLIT, $0-16
+ // This needs to be a tail call, because racewriterange reads caller pc.
+ JMP runtime·racewriterange(SB)
+
+// func runtime·racewriterangepc1(void *addr, uintptr sz, void *pc)
+TEXT runtime·racewriterangepc1(SB), NOSPLIT, $0-24
+ // void __tsan_write_range(ThreadState *thr, void *addr, uintptr size, void *pc);
+ MOVD $__tsan_write_range(SB), R1
+ LMG addr+0(FP), R3, R5
+ // pc is an interceptor address, but TSan expects it to point to the
+ // middle of an interceptor (see LLVM's SCOPED_INTERCEPTOR_RAW).
+ ADD $2, R5
+ JMP racecalladdr<>(SB)
+
+// If R3 is out of range, do nothing. Otherwise, setup goroutine context and
+// invoke racecall. Other arguments are already set.
+TEXT racecalladdr<>(SB), NOSPLIT, $0-0
+ MOVD runtime·racearenastart(SB), R0
+ CMPUBLT R3, R0, data // Before racearena start?
+ MOVD runtime·racearenaend(SB), R0
+ CMPUBLT R3, R0, call // Before racearena end?
+data:
+ MOVD runtime·racedatastart(SB), R0
+ CMPUBLT R3, R0, ret // Before racedata start?
+ MOVD runtime·racedataend(SB), R0
+ CMPUBGE R3, R0, ret // At or after racedata end?
+call:
+ MOVD g_racectx(g), R2
+ JMP racecall<>(SB)
+ret:
+ RET
+
+// func runtime·racefuncenter(pc uintptr)
+// Called from instrumented code.
+TEXT runtime·racefuncenter(SB), NOSPLIT, $0-8
+ MOVD callpc+0(FP), R3
+ JMP racefuncenter<>(SB)
+
+// Common code for racefuncenter
+// R3 = caller's return address
+TEXT racefuncenter<>(SB), NOSPLIT, $0-0
+ // void __tsan_func_enter(ThreadState *thr, void *pc);
+ MOVD $__tsan_func_enter(SB), R1
+ MOVD g_racectx(g), R2
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racefuncexit()
+// Called from instrumented code.
+TEXT runtime·racefuncexit(SB), NOSPLIT, $0-0
+ // void __tsan_func_exit(ThreadState *thr);
+ MOVD $__tsan_func_exit(SB), R1
+ MOVD g_racectx(g), R2
+ JMP racecall<>(SB)
+
+// Atomic operations for sync/atomic package.
+
+// Load
+
+TEXT sync∕atomic·LoadInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_load(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_load(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·LoadUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·LoadInt32(SB)
+
+TEXT sync∕atomic·LoadUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+TEXT sync∕atomic·LoadPointer(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·LoadInt64(SB)
+
+// Store
+
+TEXT sync∕atomic·StoreInt32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_store(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreInt64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_store(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·StoreUint32(SB), NOSPLIT, $0-12
+ GO_ARGS
+ JMP sync∕atomic·StoreInt32(SB)
+
+TEXT sync∕atomic·StoreUint64(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+TEXT sync∕atomic·StoreUintptr(SB), NOSPLIT, $0-16
+ GO_ARGS
+ JMP sync∕atomic·StoreInt64(SB)
+
+// Swap
+
+TEXT sync∕atomic·SwapInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_exchange(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_exchange(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·SwapUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·SwapInt32(SB)
+
+TEXT sync∕atomic·SwapUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+TEXT sync∕atomic·SwapUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·SwapInt64(SB)
+
+// Add
+
+TEXT sync∕atomic·AddInt32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_fetch_add(SB), R1
+ BL racecallatomic<>(SB)
+ // TSan performed fetch_add, but Go needs add_fetch.
+ MOVW add+8(FP), R0
+ MOVW ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddInt64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_fetch_add(SB), R1
+ BL racecallatomic<>(SB)
+ // TSan performed fetch_add, but Go needs add_fetch.
+ MOVD add+8(FP), R0
+ MOVD ret+16(FP), R1
+ ADD R0, R1, R0
+ MOVD R0, ret+16(FP)
+ RET
+
+TEXT sync∕atomic·AddUint32(SB), NOSPLIT, $0-20
+ GO_ARGS
+ JMP sync∕atomic·AddInt32(SB)
+
+TEXT sync∕atomic·AddUint64(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+TEXT sync∕atomic·AddUintptr(SB), NOSPLIT, $0-24
+ GO_ARGS
+ JMP sync∕atomic·AddInt64(SB)
+
+// CompareAndSwap
+
+TEXT sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ MOVD $__tsan_go_atomic32_compare_exchange(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapInt64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ MOVD $__tsan_go_atomic64_compare_exchange(SB), R1
+ BL racecallatomic<>(SB)
+ RET
+
+TEXT sync∕atomic·CompareAndSwapUint32(SB), NOSPLIT, $0-17
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt32(SB)
+
+TEXT sync∕atomic·CompareAndSwapUint64(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+TEXT sync∕atomic·CompareAndSwapUintptr(SB), NOSPLIT, $0-25
+ GO_ARGS
+ JMP sync∕atomic·CompareAndSwapInt64(SB)
+
+// Common code for atomic operations. Calls R1.
+TEXT racecallatomic<>(SB), NOSPLIT, $0
+ MOVD 24(R15), R5 // Address (arg1, after 2xBL).
+ // If we pass an invalid pointer to the TSan runtime, it will cause a
+ // "fatal error: unknown caller pc". So trigger a SEGV here instead.
+ MOVB (R5), R0
+ MOVD runtime·racearenastart(SB), R0
+ CMPUBLT R5, R0, racecallatomic_data // Before racearena start?
+ MOVD runtime·racearenaend(SB), R0
+ CMPUBLT R5, R0, racecallatomic_ok // Before racearena end?
+racecallatomic_data:
+ MOVD runtime·racedatastart(SB), R0
+ CMPUBLT R5, R0, racecallatomic_ignore // Before racedata start?
+ MOVD runtime·racedataend(SB), R0
+ CMPUBGE R5, R0, racecallatomic_ignore // At or after racedata end?
+racecallatomic_ok:
+ MOVD g_racectx(g), R2 // ThreadState *.
+ MOVD 8(R15), R3 // Caller PC.
+ MOVD R14, R4 // PC.
+ ADD $24, R15, R5 // Arguments.
+ // Tail call fails to restore R15, so use a normal one.
+ BL racecall<>(SB)
+ RET
+racecallatomic_ignore:
+ // Call __tsan_go_ignore_sync_begin to ignore synchronization during
+ // the atomic op. An attempt to synchronize on the address would cause
+ // a crash.
+ MOVD R1, R6 // Save target function.
+ MOVD R14, R7 // Save PC.
+ MOVD $__tsan_go_ignore_sync_begin(SB), R1
+ MOVD g_racectx(g), R2 // ThreadState *.
+ BL racecall<>(SB)
+ MOVD R6, R1 // Restore target function.
+ MOVD g_racectx(g), R2 // ThreadState *.
+ MOVD 8(R15), R3 // Caller PC.
+ MOVD R7, R4 // PC.
+ ADD $24, R15, R5 // Arguments.
+ BL racecall<>(SB)
+ MOVD $__tsan_go_ignore_sync_end(SB), R1
+ MOVD g_racectx(g), R2 // ThreadState *.
+ BL racecall<>(SB)
+ RET
+
+// func runtime·racecall(void(*f)(...), ...)
+// Calls C function f from race runtime and passes up to 4 arguments to it.
+// The arguments are never heap-object-preserving pointers, so we pretend there
+// are no arguments.
+TEXT runtime·racecall(SB), NOSPLIT, $0-0
+ MOVD fn+0(FP), R1
+ MOVD arg0+8(FP), R2
+ MOVD arg1+16(FP), R3
+ MOVD arg2+24(FP), R4
+ MOVD arg3+32(FP), R5
+ JMP racecall<>(SB)
+
+// Switches SP to g0 stack and calls R1. Arguments are already set.
+TEXT racecall<>(SB), NOSPLIT, $0-0
+ BL runtime·save_g(SB) // Save g for callbacks.
+ MOVD R15, R7 // Save SP.
+ MOVD g_m(g), R8 // R8 = thread.
+ MOVD m_g0(R8), R8 // R8 = g0.
+ CMPBEQ R8, g, call // Already on g0?
+ MOVD (g_sched+gobuf_sp)(R8), R15 // Switch SP to g0.
+call: SUB $160, R15 // Allocate C frame.
+ BL R1 // Call C code.
+ MOVD R7, R15 // Restore SP.
+ RET // Return to Go.
+
+// C->Go callback thunk that allows C code to call runtime·racesymbolize.
+// racecall has only switched SP, so finish the g->g0 switch by setting the
+// correct g. R2 contains the command code, R3 contains command-specific
+// context. See racecallback for command codes.
+TEXT runtime·racecallbackthunk(SB), NOSPLIT|NOFRAME, $0
+ STMG R6, R15, 48(R15) // Save non-volatile regs.
+ BL runtime·load_g(SB) // Saved by racecall.
+ CMPBNE R2, $0, rest // raceGetProcCmd?
+ MOVD g_m(g), R2 // R2 = thread.
+ MOVD m_p(R2), R2 // R2 = processor.
+ MVC $8, p_raceprocctx(R2), (R3) // *R3 = ThreadState *.
+ LMG 48(R15), R6, R15 // Restore non-volatile regs.
+ BR R14 // Return to C.
+rest: MOVD g_m(g), R4 // R4 = current thread.
+ MOVD m_g0(R4), g // Switch to g0.
+ SUB $24, R15 // Allocate Go argument slots.
+ STMG R2, R3, 8(R15) // Fill Go frame.
+ BL runtime·racecallback(SB) // Call Go code.
+ LMG 72(R15), R6, R15 // Restore non-volatile regs.
+ BR R14 // Return to C.
diff --git a/src/runtime/rand_test.go b/src/runtime/rand_test.go
new file mode 100644
index 0000000..92d07eb
--- /dev/null
+++ b/src/runtime/rand_test.go
@@ -0,0 +1,53 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "strconv"
+ "testing"
+)
+
+func BenchmarkFastrand(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ Fastrand()
+ }
+ })
+}
+
+func BenchmarkFastrand64(b *testing.B) {
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ Fastrand64()
+ }
+ })
+}
+
+func BenchmarkFastrandHashiter(b *testing.B) {
+ var m = make(map[int]int, 10)
+ for i := 0; i < 10; i++ {
+ m[i] = i
+ }
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ for range m {
+ break
+ }
+ }
+ })
+}
+
+var sink32 uint32
+
+func BenchmarkFastrandn(b *testing.B) {
+ for n := uint32(2); n <= 5; n++ {
+ b.Run(strconv.Itoa(int(n)), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sink32 = Fastrandn(n)
+ }
+ })
+ }
+}
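These benchmarks are normally run through the go test machinery, for example with go test -run='^$' -bench=Fastrand runtime. The same parallel measurement pattern can also be driven programmatically via testing.Benchmark; the sketch below is illustrative only, and fastrandStandIn is a hypothetical substitute for the test-only runtime.Fastrand export used above.

package main

import (
	"fmt"
	"testing"
)

// fastrandStandIn is a hypothetical stand-in for the test-only Fastrand
// export; any cheap uint32 source serves for the illustration.
func fastrandStandIn() uint32 { return 0x9e3779b9 }

func main() {
	// testing.Benchmark grows b.N until timings stabilize, mirroring what
	// "go test -bench" does for the benchmarks in this file.
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				_ = fastrandStandIn()
			}
		})
	})
	fmt.Println(res)
}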
diff --git a/src/runtime/rdebug.go b/src/runtime/rdebug.go
new file mode 100644
index 0000000..7ecb2a5
--- /dev/null
+++ b/src/runtime/rdebug.go
@@ -0,0 +1,22 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname setMaxStack runtime/debug.setMaxStack
+func setMaxStack(in int) (out int) {
+ out = int(maxstacksize)
+ maxstacksize = uintptr(in)
+ return out
+}
+
+//go:linkname setPanicOnFault runtime/debug.setPanicOnFault
+func setPanicOnFault(new bool) (old bool) {
+ gp := getg()
+ old = gp.paniconfault
+ gp.paniconfault = new
+ return old
+}
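Both helpers are reached from user code through their public counterparts in runtime/debug, which the go:linkname directives wire up. A small usage sketch (not part of the patch):

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// debug.SetMaxStack is linknamed to runtime.setMaxStack above; it sets
	// the maximum size of a single goroutine's stack in bytes and returns
	// the previous limit.
	prev := debug.SetMaxStack(64 << 20)
	fmt.Println("previous stack limit:", prev)

	// debug.SetPanicOnFault is linknamed to runtime.setPanicOnFault; it
	// applies only to the calling goroutine and makes unexpected memory
	// faults panic instead of crashing the program.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
}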
diff --git a/src/runtime/retry.go b/src/runtime/retry.go
new file mode 100644
index 0000000..2e2f813
--- /dev/null
+++ b/src/runtime/retry.go
@@ -0,0 +1,23 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+// retryOnEAGAIN retries a function until it does not return EAGAIN.
+// It will use an increasing delay between calls, and retry up to 20 times.
+// The function argument is expected to return an errno value,
+// and retryOnEAGAIN will return any errno value other than EAGAIN.
+// If all retries return EAGAIN, then retryOnEAGAIN will return EAGAIN.
+func retryOnEAGAIN(fn func() int32) int32 {
+ for tries := 0; tries < 20; tries++ {
+ errno := fn()
+ if errno != _EAGAIN {
+ return errno
+ }
+ usleep_no_g(uint32(tries+1) * 1000) // wait (tries+1) milliseconds (usleep_no_g takes microseconds)
+ }
+ return _EAGAIN
+}
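The runtime uses retryOnEAGAIN to smooth over transient EAGAIN failures, for example when the kernel briefly refuses to create another thread. The same pattern translated into ordinary user code, as a hedged sketch (the runtime version works with raw errno values and usleep_no_g rather than error values and time.Sleep):

package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// retryOnEAGAIN mirrors the runtime helper above: call fn up to 20 times,
// backing off a little longer after each EAGAIN, and return fn's result as
// soon as it yields anything else.
func retryOnEAGAIN(fn func() error) error {
	var err error
	for tries := 0; tries < 20; tries++ {
		err = fn()
		if !errors.Is(err, syscall.EAGAIN) {
			return err
		}
		time.Sleep(time.Duration(tries+1) * time.Millisecond)
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnEAGAIN(func() error {
		attempts++
		if attempts < 3 {
			return syscall.EAGAIN // simulate a transient failure
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}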
diff --git a/src/runtime/rt0_aix_ppc64.s b/src/runtime/rt0_aix_ppc64.s
new file mode 100644
index 0000000..1670a80
--- /dev/null
+++ b/src/runtime/rt0_aix_ppc64.s
@@ -0,0 +1,190 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+// _rt0_ppc64_aix is a function descriptor of the entrypoint function
+// __start. This name is needed by cmd/link.
+DEFINE_PPC64X_FUNCDESC(_rt0_ppc64_aix, __start<>)
+
+// The starting function must return in the loader to
+// initialise some libraries, especially libthread, which
+// creates the main thread and adds the TLS in R13.
+// R19 contains a function descriptor to the loader function
+// which needs to be called.
+// This code is similar to the __start function in C.
+TEXT __start<>(SB),NOSPLIT,$-8
+ XOR R0, R0
+ MOVD $libc___n_pthreads(SB), R4
+ MOVD 0(R4), R4
+ MOVD $libc___mod_init(SB), R5
+ MOVD 0(R5), R5
+ MOVD 0(R19), R0
+ MOVD R2, 40(R1)
+ MOVD 8(R19), R2
+ MOVD R18, R3
+ MOVD R0, CTR
+ BL (CTR) // Return to AIX loader
+
+ // Launch rt0_go
+ MOVD 40(R1), R2
+ MOVD R14, R3 // argc
+ MOVD R15, R4 // argv
+ BL _main(SB)
+
+
+DEFINE_PPC64X_FUNCDESC(main, _main)
+TEXT _main(SB),NOSPLIT,$-8
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+
+TEXT _rt0_ppc64_aix_lib(SB),NOSPLIT,$-8
+ // Start with standard C stack frame layout and linkage.
+ MOVD LR, R0
+ MOVD R0, 16(R1) // Save LR in caller's frame.
+ MOVW CR, R0 // Save CR in caller's frame
+ MOVD R0, 8(R1)
+
+ MOVDU R1, -344(R1) // Allocate frame.
+
+ // Preserve callee-save registers.
+ MOVD R14, 48(R1)
+ MOVD R15, 56(R1)
+ MOVD R16, 64(R1)
+ MOVD R17, 72(R1)
+ MOVD R18, 80(R1)
+ MOVD R19, 88(R1)
+ MOVD R20, 96(R1)
+ MOVD R21,104(R1)
+ MOVD R22, 112(R1)
+ MOVD R23, 120(R1)
+ MOVD R24, 128(R1)
+ MOVD R25, 136(R1)
+ MOVD R26, 144(R1)
+ MOVD R27, 152(R1)
+ MOVD R28, 160(R1)
+ MOVD R29, 168(R1)
+ MOVD g, 176(R1) // R30
+ MOVD R31, 184(R1)
+ FMOVD F14, 192(R1)
+ FMOVD F15, 200(R1)
+ FMOVD F16, 208(R1)
+ FMOVD F17, 216(R1)
+ FMOVD F18, 224(R1)
+ FMOVD F19, 232(R1)
+ FMOVD F20, 240(R1)
+ FMOVD F21, 248(R1)
+ FMOVD F22, 256(R1)
+ FMOVD F23, 264(R1)
+ FMOVD F24, 272(R1)
+ FMOVD F25, 280(R1)
+ FMOVD F26, 288(R1)
+ FMOVD F27, 296(R1)
+ FMOVD F28, 304(R1)
+ FMOVD F29, 312(R1)
+ FMOVD F30, 320(R1)
+ FMOVD F31, 328(R1)
+
+ // Synchronous initialization.
+ MOVD $runtime·reginit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ MOVBZ runtime·isarchive(SB), R3 // Check buildmode = c-archive
+ CMP $0, R3
+ BEQ done
+
+ MOVD R14, _rt0_ppc64_aix_lib_argc<>(SB)
+ MOVD R15, _rt0_ppc64_aix_lib_argv<>(SB)
+
+ MOVD $runtime·libpreinit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R12
+ CMP $0, R12
+ BEQ nocgo
+ MOVD $_rt0_ppc64_aix_lib_go(SB), R3
+ MOVD $0, R4
+ MOVD R2, 40(R1)
+ MOVD 8(R12), R2
+ MOVD (R12), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD 40(R1), R2
+ BR done
+
+nocgo:
+ MOVD $0x800000, R12 // stacksize = 8192KB
+ MOVD R12, 8(R1)
+ MOVD $_rt0_ppc64_aix_lib_go(SB), R12
+ MOVD R12, 16(R1)
+ MOVD $runtime·newosproc0(SB),R12
+ MOVD R12, CTR
+ BL (CTR)
+
+done:
+ // Restore saved registers.
+ MOVD 48(R1), R14
+ MOVD 56(R1), R15
+ MOVD 64(R1), R16
+ MOVD 72(R1), R17
+ MOVD 80(R1), R18
+ MOVD 88(R1), R19
+ MOVD 96(R1), R20
+ MOVD 104(R1), R21
+ MOVD 112(R1), R22
+ MOVD 120(R1), R23
+ MOVD 128(R1), R24
+ MOVD 136(R1), R25
+ MOVD 144(R1), R26
+ MOVD 152(R1), R27
+ MOVD 160(R1), R28
+ MOVD 168(R1), R29
+ MOVD 176(R1), g // R30
+ MOVD 184(R1), R31
+ FMOVD 192(R1), F14
+ FMOVD 200(R1), F15
+ FMOVD 208(R1), F16
+ FMOVD 216(R1), F17
+ FMOVD 224(R1), F18
+ FMOVD 232(R1), F19
+ FMOVD 240(R1), F20
+ FMOVD 248(R1), F21
+ FMOVD 256(R1), F22
+ FMOVD 264(R1), F23
+ FMOVD 272(R1), F24
+ FMOVD 280(R1), F25
+ FMOVD 288(R1), F26
+ FMOVD 296(R1), F27
+ FMOVD 304(R1), F28
+ FMOVD 312(R1), F29
+ FMOVD 320(R1), F30
+ FMOVD 328(R1), F31
+
+ ADD $344, R1
+
+ MOVD 8(R1), R0
+ MOVFL R0, $0xff
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ RET
+
+DEFINE_PPC64X_FUNCDESC(_rt0_ppc64_aix_lib_go, __rt0_ppc64_aix_lib_go)
+
+TEXT __rt0_ppc64_aix_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_ppc64_aix_lib_argc<>(SB), R3
+ MOVD _rt0_ppc64_aix_lib_argv<>(SB), R4
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+DATA _rt0_ppc64_aix_lib_argc<>(SB)/8, $0
+GLOBL _rt0_ppc64_aix_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_ppc64_aix_lib_argv<>(SB)/8, $0
+GLOBL _rt0_ppc64_aix_lib_argv<>(SB),NOPTR, $8
diff --git a/src/runtime/rt0_android_386.s b/src/runtime/rt0_android_386.s
new file mode 100644
index 0000000..3a1b06b
--- /dev/null
+++ b/src/runtime/rt0_android_386.s
@@ -0,0 +1,27 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_android(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_android_lib(SB),NOSPLIT,$0
+ PUSHL $_rt0_386_android_argv(SB) // argv
+ PUSHL $1 // argc
+ CALL _rt0_386_lib(SB)
+ POPL AX
+ POPL AX
+ RET
+
+DATA _rt0_386_android_argv+0x00(SB)/4,$_rt0_386_android_argv0(SB)
+DATA _rt0_386_android_argv+0x04(SB)/4,$0 // argv terminate
+DATA _rt0_386_android_argv+0x08(SB)/4,$0 // envp terminate
+DATA _rt0_386_android_argv+0x0c(SB)/4,$0 // auxv terminate
+GLOBL _rt0_386_android_argv(SB),NOPTR,$0x10
+
+// TODO: wire up necessary VDSO (see os_linux_386.go)
+
+DATA _rt0_386_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_386_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_amd64.s b/src/runtime/rt0_android_amd64.s
new file mode 100644
index 0000000..6bda3bf
--- /dev/null
+++ b/src/runtime/rt0_android_amd64.s
@@ -0,0 +1,22 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_android(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_android_lib(SB),NOSPLIT,$0
+ MOVQ $1, DI // argc
+ MOVQ $_rt0_amd64_android_argv(SB), SI // argv
+ JMP _rt0_amd64_lib(SB)
+
+DATA _rt0_amd64_android_argv+0x00(SB)/8,$_rt0_amd64_android_argv0(SB)
+DATA _rt0_amd64_android_argv+0x08(SB)/8,$0 // end argv
+DATA _rt0_amd64_android_argv+0x10(SB)/8,$0 // end envv
+DATA _rt0_amd64_android_argv+0x18(SB)/8,$0 // end auxv
+GLOBL _rt0_amd64_android_argv(SB),NOPTR,$0x20
+
+DATA _rt0_amd64_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_amd64_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_arm.s b/src/runtime/rt0_android_arm.s
new file mode 100644
index 0000000..cc5b78e
--- /dev/null
+++ b/src/runtime/rt0_android_arm.s
@@ -0,0 +1,25 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_android(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ MOVW $_rt0_arm_linux1(SB), R4
+ B (R4)
+
+TEXT _rt0_arm_android_lib(SB),NOSPLIT,$0
+ MOVW $1, R0 // argc
+ MOVW $_rt0_arm_android_argv(SB), R1 // **argv
+ B _rt0_arm_lib(SB)
+
+DATA _rt0_arm_android_argv+0x00(SB)/4,$_rt0_arm_android_argv0(SB)
+DATA _rt0_arm_android_argv+0x04(SB)/4,$0 // end argv
+DATA _rt0_arm_android_argv+0x08(SB)/4,$0 // end envv
+DATA _rt0_arm_android_argv+0x0c(SB)/4,$0 // end auxv
+GLOBL _rt0_arm_android_argv(SB),NOPTR,$0x10
+
+DATA _rt0_arm_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_arm_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_android_arm64.s b/src/runtime/rt0_android_arm64.s
new file mode 100644
index 0000000..4135bf0
--- /dev/null
+++ b/src/runtime/rt0_android_arm64.s
@@ -0,0 +1,26 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm64_android(SB),NOSPLIT|NOFRAME,$0
+ MOVD $_rt0_arm64_linux(SB), R4
+ B (R4)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_android_lib(SB),NOSPLIT|NOFRAME,$0
+ MOVW $1, R0 // argc
+ MOVD $_rt0_arm64_android_argv(SB), R1 // **argv
+ MOVD $_rt0_arm64_linux_lib(SB), R4
+ B (R4)
+
+DATA _rt0_arm64_android_argv+0x00(SB)/8,$_rt0_arm64_android_argv0(SB)
+DATA _rt0_arm64_android_argv+0x08(SB)/8,$0 // end argv
+DATA _rt0_arm64_android_argv+0x10(SB)/8,$0 // end envv
+DATA _rt0_arm64_android_argv+0x18(SB)/8,$0 // end auxv
+GLOBL _rt0_arm64_android_argv(SB),NOPTR,$0x20
+
+DATA _rt0_arm64_android_argv0(SB)/8, $"gojni"
+GLOBL _rt0_arm64_android_argv0(SB),RODATA,$8
diff --git a/src/runtime/rt0_darwin_amd64.s b/src/runtime/rt0_darwin_amd64.s
new file mode 100644
index 0000000..ed804d4
--- /dev/null
+++ b/src/runtime/rt0_darwin_amd64.s
@@ -0,0 +1,13 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_darwin(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+// When linking with -shared, this symbol is called when the shared library
+// is loaded.
+TEXT _rt0_amd64_darwin_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_darwin_arm64.s b/src/runtime/rt0_darwin_arm64.s
new file mode 100644
index 0000000..697104a
--- /dev/null
+++ b/src/runtime/rt0_darwin_arm64.s
@@ -0,0 +1,63 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+TEXT _rt0_arm64_darwin(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R16 // sys_exit
+ SVC $0x80
+ B exit
+
+// When linking with -buildmode=c-archive or -buildmode=c-shared,
+// this symbol is called from a global initialization function.
+//
+// Note that all currently shipping darwin/arm64 platforms require
+// cgo and do not support c-shared.
+TEXT _rt0_arm64_darwin_lib(SB),NOSPLIT,$152
+ // Preserve callee-save registers.
+ SAVE_R19_TO_R28(8)
+ SAVE_F8_TO_F15(88)
+
+ MOVD R0, _rt0_arm64_darwin_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_darwin_lib_argv<>(SB)
+
+ MOVD $0, g // initialize g to nil
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ MOVD $_rt0_arm64_darwin_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8)
+ RESTORE_F8_TO_F15(88)
+
+ RET
+
+TEXT _rt0_arm64_darwin_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_darwin_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_darwin_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB), R4
+ B (R4)
+
+DATA _rt0_arm64_darwin_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_darwin_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_darwin_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_darwin_lib_argv<>(SB),NOPTR, $8
+
+// external linking entry point.
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_arm64_darwin(SB)
diff --git a/src/runtime/rt0_dragonfly_amd64.s b/src/runtime/rt0_dragonfly_amd64.s
new file mode 100644
index 0000000..e76f9b9
--- /dev/null
+++ b/src/runtime/rt0_dragonfly_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On Dragonfly argc/argv are passed in DI, not SP, so we can't use _rt0_amd64.
+TEXT _rt0_amd64_dragonfly(SB),NOSPLIT,$-8
+ LEAQ 8(DI), SI // argv
+ MOVQ 0(DI), DI // argc
+ JMP runtime·rt0_go(SB)
+
+TEXT _rt0_amd64_dragonfly_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_freebsd_386.s b/src/runtime/rt0_freebsd_386.s
new file mode 100644
index 0000000..1808059
--- /dev/null
+++ b/src/runtime/rt0_freebsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_freebsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_freebsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_freebsd_amd64.s b/src/runtime/rt0_freebsd_amd64.s
new file mode 100644
index 0000000..ccc48f6
--- /dev/null
+++ b/src/runtime/rt0_freebsd_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On FreeBSD argc/argv are passed in DI, not SP, so we can't use _rt0_amd64.
+TEXT _rt0_amd64_freebsd(SB),NOSPLIT,$-8
+ LEAQ 8(DI), SI // argv
+ MOVQ 0(DI), DI // argc
+ JMP runtime·rt0_go(SB)
+
+TEXT _rt0_amd64_freebsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_freebsd_arm.s b/src/runtime/rt0_freebsd_arm.s
new file mode 100644
index 0000000..62ecd9a
--- /dev/null
+++ b/src/runtime/rt0_freebsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_freebsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_freebsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_freebsd_arm64.s b/src/runtime/rt0_freebsd_arm64.s
new file mode 100644
index 0000000..e517ae0
--- /dev/null
+++ b/src/runtime/rt0_freebsd_arm64.s
@@ -0,0 +1,74 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+// On FreeBSD argc/argv are passed in R0, not RSP
+TEXT _rt0_arm64_freebsd(SB),NOSPLIT|NOFRAME,$0
+ ADD $8, R0, R1 // argv
+ MOVD 0(R0), R0 // argc
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_freebsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ SAVE_R19_TO_R28(24)
+ SAVE_F8_TO_F15(104)
+
+ // Initialize g as null in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_freebsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_freebsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_freebsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_freebsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(24)
+ RESTORE_F8_TO_F15(104)
+ RET
+
+TEXT _rt0_arm64_freebsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_freebsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_freebsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_freebsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_freebsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_freebsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_freebsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R8 // SYS_exit
+ SVC
+ B exit
diff --git a/src/runtime/rt0_freebsd_riscv64.s b/src/runtime/rt0_freebsd_riscv64.s
new file mode 100644
index 0000000..dc46b70
--- /dev/null
+++ b/src/runtime/rt0_freebsd_riscv64.s
@@ -0,0 +1,112 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// On FreeBSD argc/argv are passed in A0, not X2
+TEXT _rt0_riscv64_freebsd(SB),NOSPLIT|NOFRAME,$0
+ ADD $8, A0, A1 // argv
+ MOV 0(A0), A0 // argc
+ JMP main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_riscv64_freebsd_lib(SB),NOSPLIT,$224
+ // Preserve callee-save registers, along with X1 (LR).
+ MOV X1, (8*3)(X2)
+ MOV X8, (8*4)(X2)
+ MOV X9, (8*5)(X2)
+ MOV X18, (8*6)(X2)
+ MOV X19, (8*7)(X2)
+ MOV X20, (8*8)(X2)
+ MOV X21, (8*9)(X2)
+ MOV X22, (8*10)(X2)
+ MOV X23, (8*11)(X2)
+ MOV X24, (8*12)(X2)
+ MOV X25, (8*13)(X2)
+ MOV X26, (8*14)(X2)
+ MOV g, (8*15)(X2)
+ MOVD F8, (8*16)(X2)
+ MOVD F9, (8*17)(X2)
+ MOVD F18, (8*18)(X2)
+ MOVD F19, (8*19)(X2)
+ MOVD F20, (8*20)(X2)
+ MOVD F21, (8*21)(X2)
+ MOVD F22, (8*22)(X2)
+ MOVD F23, (8*23)(X2)
+ MOVD F24, (8*24)(X2)
+ MOVD F25, (8*25)(X2)
+ MOVD F26, (8*26)(X2)
+ MOVD F27, (8*27)(X2)
+
+ // Initialize g as nil in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOV X0, g
+
+ MOV A0, _rt0_riscv64_freebsd_lib_argc<>(SB)
+ MOV A1, _rt0_riscv64_freebsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOV $runtime·libpreinit(SB), T0
+ JALR RA, T0
+
+ // Create a new thread to do the runtime initialization and return.
+ MOV _cgo_sys_thread_create(SB), T0
+ BEQZ T0, nocgo
+ MOV $_rt0_riscv64_freebsd_lib_go(SB), A0
+ MOV $0, A1
+ JALR RA, T0
+ JMP restore
+
+nocgo:
+ MOV $0x800000, A0 // stacksize = 8192KB
+ MOV $_rt0_riscv64_freebsd_lib_go(SB), A1
+ MOV A0, 8(X2)
+ MOV A1, 16(X2)
+ MOV $runtime·newosproc0(SB), T0
+ JALR RA, T0
+
+restore:
+ // Restore callee-save registers, along with X1 (LR).
+ MOV (8*3)(X2), X1
+ MOV (8*4)(X2), X8
+ MOV (8*5)(X2), X9
+ MOV (8*6)(X2), X18
+ MOV (8*7)(X2), X19
+ MOV (8*8)(X2), X20
+ MOV (8*9)(X2), X21
+ MOV (8*10)(X2), X22
+ MOV (8*11)(X2), X23
+ MOV (8*12)(X2), X24
+ MOV (8*13)(X2), X25
+ MOV (8*14)(X2), X26
+ MOV (8*15)(X2), g
+ MOVD (8*16)(X2), F8
+ MOVD (8*17)(X2), F9
+ MOVD (8*18)(X2), F18
+ MOVD (8*19)(X2), F19
+ MOVD (8*20)(X2), F20
+ MOVD (8*21)(X2), F21
+ MOVD (8*22)(X2), F22
+ MOVD (8*23)(X2), F23
+ MOVD (8*24)(X2), F24
+ MOVD (8*25)(X2), F25
+ MOVD (8*26)(X2), F26
+ MOVD (8*27)(X2), F27
+
+ RET
+
+TEXT _rt0_riscv64_freebsd_lib_go(SB),NOSPLIT,$0
+ MOV _rt0_riscv64_freebsd_lib_argc<>(SB), A0
+ MOV _rt0_riscv64_freebsd_lib_argv<>(SB), A1
+ MOV $runtime·rt0_go(SB), T0
+ JALR ZERO, T0
+
+DATA _rt0_riscv64_freebsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_riscv64_freebsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_riscv64_freebsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_riscv64_freebsd_lib_argv<>(SB),NOPTR, $8
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOV $runtime·rt0_go(SB), T0
+ JALR ZERO, T0
diff --git a/src/runtime/rt0_illumos_amd64.s b/src/runtime/rt0_illumos_amd64.s
new file mode 100644
index 0000000..54d35b7
--- /dev/null
+++ b/src/runtime/rt0_illumos_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_illumos(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_illumos_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_ios_amd64.s b/src/runtime/rt0_ios_amd64.s
new file mode 100644
index 0000000..c699032
--- /dev/null
+++ b/src/runtime/rt0_ios_amd64.s
@@ -0,0 +1,14 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// internal linking executable entry point.
+// ios/amd64 only supports external linking.
+TEXT _rt0_amd64_ios(SB),NOSPLIT|NOFRAME,$0
+ UNDEF
+
+// library entry point.
+TEXT _rt0_amd64_ios_lib(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_amd64_darwin_lib(SB)
diff --git a/src/runtime/rt0_ios_arm64.s b/src/runtime/rt0_ios_arm64.s
new file mode 100644
index 0000000..dcc8365
--- /dev/null
+++ b/src/runtime/rt0_ios_arm64.s
@@ -0,0 +1,14 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// internal linking executable entry point.
+// ios/arm64 only supports external linking.
+TEXT _rt0_arm64_ios(SB),NOSPLIT|NOFRAME,$0
+ UNDEF
+
+// library entry point.
+TEXT _rt0_arm64_ios_lib(SB),NOSPLIT|NOFRAME,$0
+ JMP _rt0_arm64_darwin_lib(SB)
diff --git a/src/runtime/rt0_js_wasm.s b/src/runtime/rt0_js_wasm.s
new file mode 100644
index 0000000..34a6047
--- /dev/null
+++ b/src/runtime/rt0_js_wasm.s
@@ -0,0 +1,67 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+// _rt0_wasm_js is not used itself. It only exists to mark the exported functions as alive.
+TEXT _rt0_wasm_js(SB),NOSPLIT,$0
+ I32Const $wasm_export_run(SB)
+ Drop
+ I32Const $wasm_export_resume(SB)
+ Drop
+ I32Const $wasm_export_getsp(SB)
+ Drop
+
+// wasm_export_run gets called from JavaScript. It initializes the Go runtime and executes Go code until it needs
+// to wait for an event. It does NOT follow the Go ABI. It has two WebAssembly parameters:
+// R0: argc (i32)
+// R1: argv (i32)
+TEXT wasm_export_run(SB),NOSPLIT,$0
+ MOVD $runtime·wasmStack+(m0Stack__size-16)(SB), SP
+
+ Get SP
+ Get R0 // argc
+ I64ExtendI32U
+ I64Store $0
+
+ Get SP
+ Get R1 // argv
+ I64ExtendI32U
+ I64Store $8
+
+ I32Const $0 // entry PC_B
+ Call runtime·rt0_go(SB)
+ Drop
+ Call wasm_pc_f_loop(SB)
+
+ Return
+
+// wasm_export_resume gets called from JavaScript. It resumes the execution of Go code until it needs to wait for
+// an event.
+TEXT wasm_export_resume(SB),NOSPLIT,$0
+ I32Const $0
+ Call runtime·handleEvent(SB)
+ Drop
+ Call wasm_pc_f_loop(SB)
+
+ Return
+
+// wasm_export_getsp gets called from JavaScript to retrieve the SP.
+TEXT wasm_export_getsp(SB),NOSPLIT,$0
+ Get SP
+ Return
+
+TEXT runtime·pause(SB), NOSPLIT, $0-8
+ MOVD newsp+0(FP), SP
+ I32Const $1
+ Set PAUSE
+ RETUNWIND
+
+TEXT runtime·exit(SB), NOSPLIT, $0-4
+ I32Const $0
+ Call runtime·wasmExit(SB)
+ I32Const $1
+ Set PAUSE
+ RETUNWIND
diff --git a/src/runtime/rt0_linux_386.s b/src/runtime/rt0_linux_386.s
new file mode 100644
index 0000000..325066f
--- /dev/null
+++ b/src/runtime/rt0_linux_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_linux(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_linux_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_linux_amd64.s b/src/runtime/rt0_linux_amd64.s
new file mode 100644
index 0000000..94ff709
--- /dev/null
+++ b/src/runtime/rt0_linux_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_linux_arm.s b/src/runtime/rt0_linux_arm.s
new file mode 100644
index 0000000..8a5722f
--- /dev/null
+++ b/src/runtime/rt0_linux_arm.s
@@ -0,0 +1,33 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_linux(SB),NOSPLIT|NOFRAME,$0
+ MOVW (R13), R0 // argc
+ MOVW $4(R13), R1 // argv
+ MOVW $_rt0_arm_linux1(SB), R4
+ B (R4)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm_linux_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
+
+TEXT _rt0_arm_linux1(SB),NOSPLIT|NOFRAME,$0
+ // We first need to detect the kernel ABI, and warn the user
+ // if the system only supports OABI.
+ // The strategy here is to call some EABI syscall to see if
+ // SIGILL is received.
+ // If you get a SIGILL here, you have the wrong kernel.
+
+ // Save argc and argv (syscall will clobber at least R0).
+ MOVM.DB.W [R0-R1], (R13)
+
+ // do an EABI syscall
+ MOVW $20, R7 // sys_getpid
+ SWI $0 // this will trigger SIGILL on OABI systems
+
+ MOVM.IA.W (R13), [R0-R1]
+ B runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_linux_arm64.s b/src/runtime/rt0_linux_arm64.s
new file mode 100644
index 0000000..0eb8fc2
--- /dev/null
+++ b/src/runtime/rt0_linux_arm64.s
@@ -0,0 +1,73 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+TEXT _rt0_arm64_linux(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_linux_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ SAVE_R19_TO_R28(24)
+ SAVE_F8_TO_F15(104)
+
+ // Initialize g as null in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_linux_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_linux_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_linux_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(24)
+ RESTORE_F8_TO_F15(104)
+ RET
+
+TEXT _rt0_arm64_linux_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_linux_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_linux_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_linux_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $94, R8 // sys_exit
+ SVC
+ B exit
diff --git a/src/runtime/rt0_linux_loong64.s b/src/runtime/rt0_linux_loong64.s
new file mode 100644
index 0000000..b52f7d5
--- /dev/null
+++ b/src/runtime/rt0_linux_loong64.s
@@ -0,0 +1,72 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_loong64.h"
+
+TEXT _rt0_loong64_linux(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ MOVW 0(R3), R4 // argc
+ ADDV $8, R3, R5 // argv
+ JMP main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_loong64_linux_lib(SB),NOSPLIT,$168
+ // Preserve callee-save registers.
+ SAVE_R22_TO_R31(3*8)
+ SAVE_F24_TO_F31(13*8)
+
+ // Initialize g as nil in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOVV R0, g
+
+ MOVV R4, _rt0_loong64_linux_lib_argc<>(SB)
+ MOVV R5, _rt0_loong64_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVV $runtime·libpreinit(SB), R19
+ JAL (R19)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVV _cgo_sys_thread_create(SB), R19
+ BEQ R19, nocgo
+ MOVV $_rt0_loong64_linux_lib_go(SB), R4
+ MOVV $0, R5
+ JAL (R19)
+ JMP restore
+
+nocgo:
+ MOVV $0x800000, R4 // stacksize = 8192KB
+ MOVV $_rt0_loong64_linux_lib_go(SB), R5
+ MOVV R4, 8(R3)
+ MOVV R5, 16(R3)
+ MOVV $runtime·newosproc0(SB), R19
+ JAL (R19)
+
+restore:
+ // Restore callee-save registers.
+ RESTORE_R22_TO_R31(3*8)
+ RESTORE_F24_TO_F31(13*8)
+ RET
+
+TEXT _rt0_loong64_linux_lib_go(SB),NOSPLIT,$0
+ MOVV _rt0_loong64_linux_lib_argc<>(SB), R4
+ MOVV _rt0_loong64_linux_lib_argv<>(SB), R5
+ MOVV $runtime·rt0_go(SB),R19
+ JMP (R19)
+
+DATA _rt0_loong64_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_loong64_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_loong64_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_loong64_linux_lib_argv<>(SB),NOPTR, $8
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ // in external linking, glibc jumps to main with argc in R4
+ // and argv in R5
+
+ MOVV $runtime·rt0_go(SB), R19
+ JMP (R19)
diff --git a/src/runtime/rt0_linux_mips64x.s b/src/runtime/rt0_linux_mips64x.s
new file mode 100644
index 0000000..e9328b7
--- /dev/null
+++ b/src/runtime/rt0_linux_mips64x.s
@@ -0,0 +1,38 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+#include "textflag.h"
+
+TEXT _rt0_mips64_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mips64le_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+#ifdef GOARCH_mips64
+ MOVW 4(R29), R4 // argc, big-endian ABI places int32 at offset 4
+#else
+ MOVW 0(R29), R4 // argc
+#endif
+ ADDV $8, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ // in external linking, glibc jumps to main with argc in R4
+ // and argv in R5
+
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ MOVV $runtime·rt0_go(SB), R1
+ JMP (R1)
diff --git a/src/runtime/rt0_linux_mipsx.s b/src/runtime/rt0_linux_mipsx.s
new file mode 100644
index 0000000..3cbb7fc
--- /dev/null
+++ b/src/runtime/rt0_linux_mipsx.s
@@ -0,0 +1,27 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+#include "textflag.h"
+
+TEXT _rt0_mips_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mipsle_linux(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ MOVW 0(R29), R4 // argc
+ ADD $4, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ // In external linking, libc jumps to main with argc in R4, argv in R5
+ MOVW $runtime·rt0_go(SB), R1
+ JMP (R1)
diff --git a/src/runtime/rt0_linux_ppc64.s b/src/runtime/rt0_linux_ppc64.s
new file mode 100644
index 0000000..f527170
--- /dev/null
+++ b/src/runtime/rt0_linux_ppc64.s
@@ -0,0 +1,28 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+DEFINE_PPC64X_FUNCDESC(_rt0_ppc64_linux, _main<>)
+DEFINE_PPC64X_FUNCDESC(main, _main<>)
+
+TEXT _main<>(SB),NOSPLIT,$-8
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+ //
+ // TODO(austin): Support ABI v1 dynamic linking entry point
+ XOR R0, R0 // Note, newer kernels may not always set R0 to 0.
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ MOVBZ runtime·iscgo(SB), R5
+ CMP R5, $0
+ BEQ nocgo
+ BR (CTR)
+nocgo:
+ MOVD 0(R1), R3 // argc
+ ADD $8, R1, R4 // argv
+ BR (CTR)
diff --git a/src/runtime/rt0_linux_ppc64le.s b/src/runtime/rt0_linux_ppc64le.s
new file mode 100644
index 0000000..417ada2
--- /dev/null
+++ b/src/runtime/rt0_linux_ppc64le.s
@@ -0,0 +1,101 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+#include "cgo/abi_ppc64x.h"
+
+TEXT _rt0_ppc64le_linux(SB),NOSPLIT,$0
+ XOR R0, R0 // Make sure R0 is zero before _main
+ BR _main<>(SB)
+
+TEXT _rt0_ppc64le_linux_lib(SB),NOSPLIT|NOFRAME,$0
+ // This is called with ELFv2 calling conventions. Convert to Go.
+ // Allocate argument storage for call to newosproc0.
+ STACK_AND_SAVE_HOST_TO_GO_ABI(16)
+
+ MOVD R3, _rt0_ppc64le_linux_lib_argc<>(SB)
+ MOVD R4, _rt0_ppc64le_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R12
+ CMP $0, R12
+ BEQ nocgo
+ MOVD $_rt0_ppc64le_linux_lib_go(SB), R3
+ MOVD $0, R4
+ MOVD R12, CTR
+ BL (CTR)
+ BR done
+
+nocgo:
+ MOVD $0x800000, R12 // stacksize = 8192KB
+ MOVD R12, 8+FIXED_FRAME(R1)
+ MOVD $_rt0_ppc64le_linux_lib_go(SB), R12
+ MOVD R12, 16+FIXED_FRAME(R1)
+ MOVD $runtime·newosproc0(SB),R12
+ MOVD R12, CTR
+ BL (CTR)
+
+done:
+ // Restore and return to ELFv2 caller.
+ UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(16)
+ RET
+
+TEXT _rt0_ppc64le_linux_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_ppc64le_linux_lib_argc<>(SB), R3
+ MOVD _rt0_ppc64le_linux_lib_argv<>(SB), R4
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
+
+DATA _rt0_ppc64le_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_ppc64le_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_ppc64le_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_ppc64le_linux_lib_argv<>(SB),NOPTR, $8
+
+TEXT _main<>(SB),NOSPLIT,$-8
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // The TLS pointer should be initialized to 0.
+ //
+ // In an ELFv2 compliant dynamically linked binary, R3 contains argc,
+ // R4 contains argv, R5 contains envp, R6 contains auxv, and R13
+ // contains the TLS pointer.
+ //
+ // When loading via glibc, the first doubleword on the stack holds a
+ // NULL value (that is, *(uintptr)(R1) == 0). This is used to
+ // differentiate static vs dynamically linked binaries.
+ //
+ // If loading with the musl loader, it doesn't follow the ELFv2 ABI. It
+ // passes argc/argv similarly to the linux kernel, R13 (TLS) is
+ // initialized, and R3/R4 are undefined.
+ MOVD (R1), R12
+ CMP R0, R12
+ BEQ tls_and_argcv_in_reg
+
+ // Arguments are passed via the stack (musl loader or a static binary)
+ MOVD 0(R1), R3 // argc
+ ADD $8, R1, R4 // argv
+
+ // Did the TLS pointer get set? If so, don't change it (e.g. musl).
+ CMP R0, R13
+ BNE tls_and_argcv_in_reg
+
+ MOVD $runtime·m0+m_tls(SB), R13 // TLS
+ ADD $0x7000, R13
+
+tls_and_argcv_in_reg:
+ BR main(SB)
+
+TEXT main(SB),NOSPLIT,$-8
+ MOVD $runtime·rt0_go(SB), R12
+ MOVD R12, CTR
+ BR (CTR)
diff --git a/src/runtime/rt0_linux_riscv64.s b/src/runtime/rt0_linux_riscv64.s
new file mode 100644
index 0000000..d6b8ac8
--- /dev/null
+++ b/src/runtime/rt0_linux_riscv64.s
@@ -0,0 +1,112 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_riscv64_linux(SB),NOSPLIT|NOFRAME,$0
+ MOV 0(X2), A0 // argc
+ ADD $8, X2, A1 // argv
+ JMP main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_riscv64_linux_lib(SB),NOSPLIT,$224
+ // Preserve callee-save registers, along with X1 (LR).
+ MOV X1, (8*3)(X2)
+ MOV X8, (8*4)(X2)
+ MOV X9, (8*5)(X2)
+ MOV X18, (8*6)(X2)
+ MOV X19, (8*7)(X2)
+ MOV X20, (8*8)(X2)
+ MOV X21, (8*9)(X2)
+ MOV X22, (8*10)(X2)
+ MOV X23, (8*11)(X2)
+ MOV X24, (8*12)(X2)
+ MOV X25, (8*13)(X2)
+ MOV X26, (8*14)(X2)
+ MOV g, (8*15)(X2)
+ MOVD F8, (8*16)(X2)
+ MOVD F9, (8*17)(X2)
+ MOVD F18, (8*18)(X2)
+ MOVD F19, (8*19)(X2)
+ MOVD F20, (8*20)(X2)
+ MOVD F21, (8*21)(X2)
+ MOVD F22, (8*22)(X2)
+ MOVD F23, (8*23)(X2)
+ MOVD F24, (8*24)(X2)
+ MOVD F25, (8*25)(X2)
+ MOVD F26, (8*26)(X2)
+ MOVD F27, (8*27)(X2)
+
+ // Initialize g as nil in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOV X0, g
+
+ MOV A0, _rt0_riscv64_linux_lib_argc<>(SB)
+ MOV A1, _rt0_riscv64_linux_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOV $runtime·libpreinit(SB), T0
+ JALR RA, T0
+
+ // Create a new thread to do the runtime initialization and return.
+ MOV _cgo_sys_thread_create(SB), T0
+ BEQZ T0, nocgo
+ MOV $_rt0_riscv64_linux_lib_go(SB), A0
+ MOV $0, A1
+ JALR RA, T0
+ JMP restore
+
+nocgo:
+ MOV $0x800000, A0 // stacksize = 8192KB
+ MOV $_rt0_riscv64_linux_lib_go(SB), A1
+ MOV A0, 8(X2)
+ MOV A1, 16(X2)
+ MOV $runtime·newosproc0(SB), T0
+ JALR RA, T0
+
+restore:
+ // Restore callee-save registers, along with X1 (LR).
+ MOV (8*3)(X2), X1
+ MOV (8*4)(X2), X8
+ MOV (8*5)(X2), X9
+ MOV (8*6)(X2), X18
+ MOV (8*7)(X2), X19
+ MOV (8*8)(X2), X20
+ MOV (8*9)(X2), X21
+ MOV (8*10)(X2), X22
+ MOV (8*11)(X2), X23
+ MOV (8*12)(X2), X24
+ MOV (8*13)(X2), X25
+ MOV (8*14)(X2), X26
+ MOV (8*15)(X2), g
+ MOVD (8*16)(X2), F8
+ MOVD (8*17)(X2), F9
+ MOVD (8*18)(X2), F18
+ MOVD (8*19)(X2), F19
+ MOVD (8*20)(X2), F20
+ MOVD (8*21)(X2), F21
+ MOVD (8*22)(X2), F22
+ MOVD (8*23)(X2), F23
+ MOVD (8*24)(X2), F24
+ MOVD (8*25)(X2), F25
+ MOVD (8*26)(X2), F26
+ MOVD (8*27)(X2), F27
+
+ RET
+
+TEXT _rt0_riscv64_linux_lib_go(SB),NOSPLIT,$0
+ MOV _rt0_riscv64_linux_lib_argc<>(SB), A0
+ MOV _rt0_riscv64_linux_lib_argv<>(SB), A1
+ MOV $runtime·rt0_go(SB), T0
+ JALR ZERO, T0
+
+DATA _rt0_riscv64_linux_lib_argc<>(SB)/8, $0
+GLOBL _rt0_riscv64_linux_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_riscv64_linux_lib_argv<>(SB)/8, $0
+GLOBL _rt0_riscv64_linux_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOV $runtime·rt0_go(SB), T0
+ JALR ZERO, T0
diff --git a/src/runtime/rt0_linux_s390x.s b/src/runtime/rt0_linux_s390x.s
new file mode 100644
index 0000000..4b62c5a
--- /dev/null
+++ b/src/runtime/rt0_linux_s390x.s
@@ -0,0 +1,23 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_s390x_linux(SB), NOSPLIT|NOFRAME, $0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+
+ MOVD 0(R15), R2 // argc
+ ADD $8, R15, R3 // argv
+ BR main(SB)
+
+TEXT _rt0_s390x_linux_lib(SB), NOSPLIT, $0
+ MOVD $_rt0_s390x_lib(SB), R1
+ BR R1
+
+TEXT main(SB), NOSPLIT|NOFRAME, $0
+ MOVD $runtime·rt0_go(SB), R1
+ BR R1
diff --git a/src/runtime/rt0_netbsd_386.s b/src/runtime/rt0_netbsd_386.s
new file mode 100644
index 0000000..cefc04a
--- /dev/null
+++ b/src/runtime/rt0_netbsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_netbsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_netbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_netbsd_amd64.s b/src/runtime/rt0_netbsd_amd64.s
new file mode 100644
index 0000000..77c7187
--- /dev/null
+++ b/src/runtime/rt0_netbsd_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_netbsd(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_netbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_netbsd_arm.s b/src/runtime/rt0_netbsd_arm.s
new file mode 100644
index 0000000..503c32a
--- /dev/null
+++ b/src/runtime/rt0_netbsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_netbsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_netbsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_netbsd_arm64.s b/src/runtime/rt0_netbsd_arm64.s
new file mode 100644
index 0000000..691a8e4
--- /dev/null
+++ b/src/runtime/rt0_netbsd_arm64.s
@@ -0,0 +1,71 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+TEXT _rt0_arm64_netbsd(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_netbsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ SAVE_R19_TO_R28(24)
+ SAVE_F8_TO_F15(104)
+
+ // Initialize g as null in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_netbsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_netbsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_netbsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_netbsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(24)
+ RESTORE_F8_TO_F15(104)
+ RET
+
+TEXT _rt0_arm64_netbsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_netbsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_netbsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_netbsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_netbsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_netbsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_netbsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ SVC $1 // sys_exit
diff --git a/src/runtime/rt0_openbsd_386.s b/src/runtime/rt0_openbsd_386.s
new file mode 100644
index 0000000..959f4d6
--- /dev/null
+++ b/src/runtime/rt0_openbsd_386.s
@@ -0,0 +1,17 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_openbsd(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+TEXT _rt0_386_openbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_386_lib(SB)
+
+TEXT main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_openbsd_amd64.s b/src/runtime/rt0_openbsd_amd64.s
new file mode 100644
index 0000000..c2f3f23
--- /dev/null
+++ b/src/runtime/rt0_openbsd_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_openbsd(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_openbsd_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_openbsd_arm.s b/src/runtime/rt0_openbsd_arm.s
new file mode 100644
index 0000000..3511c96
--- /dev/null
+++ b/src/runtime/rt0_openbsd_arm.s
@@ -0,0 +1,11 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_arm_openbsd(SB),NOSPLIT,$0
+ B _rt0_arm(SB)
+
+TEXT _rt0_arm_openbsd_lib(SB),NOSPLIT,$0
+ B _rt0_arm_lib(SB)
diff --git a/src/runtime/rt0_openbsd_arm64.s b/src/runtime/rt0_openbsd_arm64.s
new file mode 100644
index 0000000..49d49b3
--- /dev/null
+++ b/src/runtime/rt0_openbsd_arm64.s
@@ -0,0 +1,79 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+// See comment in runtime/sys_openbsd_arm64.s re this construction.
+#define INVOKE_SYSCALL \
+ SVC; \
+ NOOP; \
+ NOOP
+
+TEXT _rt0_arm64_openbsd(SB),NOSPLIT|NOFRAME,$0
+ MOVD 0(RSP), R0 // argc
+ ADD $8, RSP, R1 // argv
+ BL main(SB)
+
+// When building with -buildmode=c-shared, this symbol is called when the shared
+// library is loaded.
+TEXT _rt0_arm64_openbsd_lib(SB),NOSPLIT,$184
+ // Preserve callee-save registers.
+ SAVE_R19_TO_R28(24)
+ SAVE_F8_TO_F15(104)
+
+ // Initialize g as null in case of using g later e.g. sigaction in cgo_sigaction.go
+ MOVD ZR, g
+
+ MOVD R0, _rt0_arm64_openbsd_lib_argc<>(SB)
+ MOVD R1, _rt0_arm64_openbsd_lib_argv<>(SB)
+
+ // Synchronous initialization.
+ MOVD $runtime·libpreinit(SB), R4
+ BL (R4)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVD _cgo_sys_thread_create(SB), R4
+ CBZ R4, nocgo
+ MOVD $_rt0_arm64_openbsd_lib_go(SB), R0
+ MOVD $0, R1
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL (R4)
+ ADD $16, RSP
+ B restore
+
+nocgo:
+ MOVD $0x800000, R0 // stacksize = 8192KB
+ MOVD $_rt0_arm64_openbsd_lib_go(SB), R1
+ MOVD R0, 8(RSP)
+ MOVD R1, 16(RSP)
+ MOVD $runtime·newosproc0(SB),R4
+ BL (R4)
+
+restore:
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(24)
+ RESTORE_F8_TO_F15(104)
+ RET
+
+TEXT _rt0_arm64_openbsd_lib_go(SB),NOSPLIT,$0
+ MOVD _rt0_arm64_openbsd_lib_argc<>(SB), R0
+ MOVD _rt0_arm64_openbsd_lib_argv<>(SB), R1
+ MOVD $runtime·rt0_go(SB),R4
+ B (R4)
+
+DATA _rt0_arm64_openbsd_lib_argc<>(SB)/8, $0
+GLOBL _rt0_arm64_openbsd_lib_argc<>(SB),NOPTR, $8
+DATA _rt0_arm64_openbsd_lib_argv<>(SB)/8, $0
+GLOBL _rt0_arm64_openbsd_lib_argv<>(SB),NOPTR, $8
+
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ BL (R2)
+exit:
+ MOVD $0, R0
+ MOVD $1, R8 // sys_exit
+ INVOKE_SYSCALL
+ B exit
diff --git a/src/runtime/rt0_openbsd_mips64.s b/src/runtime/rt0_openbsd_mips64.s
new file mode 100644
index 0000000..82a8dfa
--- /dev/null
+++ b/src/runtime/rt0_openbsd_mips64.s
@@ -0,0 +1,36 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_mips64_openbsd(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _rt0_mips64le_openbsd(SB),NOSPLIT,$0
+ JMP _main<>(SB)
+
+TEXT _main<>(SB),NOSPLIT|NOFRAME,$0
+ // In a statically linked binary, the stack contains argc,
+ // argv as argc string pointers followed by a NULL, envv as a
+ // sequence of string pointers followed by a NULL, and auxv.
+ // There is no TLS base pointer.
+#ifdef GOARCH_mips64
+ MOVW 4(R29), R4 // argc, big-endian ABI places int32 at offset 4
+#else
+ MOVW 0(R29), R4 // argc
+#endif
+ ADDV $8, R29, R5 // argv
+ JMP main(SB)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ // in external linking, glibc jumps to main with argc in R4
+ // and argv in R5
+
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ MOVV $runtime·rt0_go(SB), R1
+ JMP (R1)
diff --git a/src/runtime/rt0_plan9_386.s b/src/runtime/rt0_plan9_386.s
new file mode 100644
index 0000000..6471615
--- /dev/null
+++ b/src/runtime/rt0_plan9_386.s
@@ -0,0 +1,21 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_plan9(SB),NOSPLIT,$12
+ MOVL AX, _tos(SB)
+ LEAL 8(SP), AX
+ MOVL AX, _privates(SB)
+ MOVL $1, _nprivates(SB)
+ CALL runtime·asminit(SB)
+ MOVL inargc-4(FP), AX
+ MOVL AX, 0(SP)
+ LEAL inargv+0(FP), AX
+ MOVL AX, 4(SP)
+ JMP runtime·rt0_go(SB)
+
+GLOBL _tos(SB), NOPTR, $4
+GLOBL _privates(SB), NOPTR, $4
+GLOBL _nprivates(SB), NOPTR, $4
diff --git a/src/runtime/rt0_plan9_amd64.s b/src/runtime/rt0_plan9_amd64.s
new file mode 100644
index 0000000..6fd493a
--- /dev/null
+++ b/src/runtime/rt0_plan9_amd64.s
@@ -0,0 +1,19 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_plan9(SB),NOSPLIT,$24
+ MOVQ AX, _tos(SB)
+ LEAQ 16(SP), AX
+ MOVQ AX, _privates(SB)
+ MOVL $1, _nprivates(SB)
+ MOVL inargc-8(FP), DI
+ LEAQ inargv+0(FP), SI
+ MOVQ $runtime·rt0_go(SB), AX
+ JMP AX
+
+GLOBL _tos(SB), NOPTR, $8
+GLOBL _privates(SB), NOPTR, $8
+GLOBL _nprivates(SB), NOPTR, $4
diff --git a/src/runtime/rt0_plan9_arm.s b/src/runtime/rt0_plan9_arm.s
new file mode 100644
index 0000000..697a78d
--- /dev/null
+++ b/src/runtime/rt0_plan9_arm.s
@@ -0,0 +1,15 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+// In Plan 9, argc is at the top of the stack, followed by pointers to the arguments.
+
+TEXT _rt0_arm_plan9(SB),NOSPLIT|NOFRAME,$0
+ MOVW R0, _tos(SB)
+ MOVW 0(R13), R0
+ MOVW $4(R13), R1
+ B runtime·rt0_go(SB)
+
+GLOBL _tos(SB), NOPTR, $4
diff --git a/src/runtime/rt0_solaris_amd64.s b/src/runtime/rt0_solaris_amd64.s
new file mode 100644
index 0000000..5c46ded
--- /dev/null
+++ b/src/runtime/rt0_solaris_amd64.s
@@ -0,0 +1,11 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_amd64_solaris(SB),NOSPLIT,$-8
+ JMP _rt0_amd64(SB)
+
+TEXT _rt0_amd64_solaris_lib(SB),NOSPLIT,$0
+ JMP _rt0_amd64_lib(SB)
diff --git a/src/runtime/rt0_wasip1_wasm.s b/src/runtime/rt0_wasip1_wasm.s
new file mode 100644
index 0000000..6dc2393
--- /dev/null
+++ b/src/runtime/rt0_wasip1_wasm.s
@@ -0,0 +1,16 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "textflag.h"
+
+TEXT _rt0_wasm_wasip1(SB),NOSPLIT,$0
+ MOVD $runtime·wasmStack+(m0Stack__size-16)(SB), SP
+
+ I32Const $0 // entry PC_B
+ Call runtime·rt0_go(SB)
+ Drop
+ Call wasm_pc_f_loop(SB)
+
+ Return
diff --git a/src/runtime/rt0_windows_386.s b/src/runtime/rt0_windows_386.s
new file mode 100644
index 0000000..fa39edd
--- /dev/null
+++ b/src/runtime/rt0_windows_386.s
@@ -0,0 +1,47 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT _rt0_386_windows(SB),NOSPLIT,$0
+ JMP _rt0_386(SB)
+
+// When building with -buildmode=(c-shared or c-archive), this
+// symbol is called. For dynamic libraries it is called when the
+// library is loaded. For static libraries it is called when the
+// final executable starts, during the C runtime initialization
+// phase.
+TEXT _rt0_386_windows_lib(SB),NOSPLIT,$0x1C
+ MOVL BP, 0x08(SP)
+ MOVL BX, 0x0C(SP)
+ MOVL AX, 0x10(SP)
+ MOVL CX, 0x14(SP)
+ MOVL DX, 0x18(SP)
+
+ // Create a new thread to do the runtime initialization and return.
+ MOVL _cgo_sys_thread_create(SB), AX
+ MOVL $_rt0_386_windows_lib_go(SB), 0x00(SP)
+ MOVL $0, 0x04(SP)
+
+ // Top two items on the stack are passed to _cgo_sys_thread_create
+ // as parameters. This is the calling convention on 32-bit Windows.
+ CALL AX
+
+ MOVL 0x08(SP), BP
+ MOVL 0x0C(SP), BX
+ MOVL 0x10(SP), AX
+ MOVL 0x14(SP), CX
+ MOVL 0x18(SP), DX
+ RET
+
+TEXT _rt0_386_windows_lib_go(SB),NOSPLIT,$0
+ PUSHL $0
+ PUSHL $0
+ JMP runtime·rt0_go(SB)
+
+TEXT _main(SB),NOSPLIT,$0
+ // Remove the return address from the stack.
+ // rt0_go doesn't expect it to be there.
+ ADDL $4, SP
+ JMP runtime·rt0_go(SB)
diff --git a/src/runtime/rt0_windows_amd64.s b/src/runtime/rt0_windows_amd64.s
new file mode 100644
index 0000000..bd18bdd
--- /dev/null
+++ b/src/runtime/rt0_windows_amd64.s
@@ -0,0 +1,36 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+TEXT _rt0_amd64_windows(SB),NOSPLIT|NOFRAME,$-8
+ JMP _rt0_amd64(SB)
+
+// When building with -buildmode=(c-shared or c-archive), this
+// symbol is called. For dynamic libraries it is called when the
+// library is loaded. For static libraries it is called when the
+// final executable starts, during the C runtime initialization
+// phase.
+// Leave space for four pointers on the stack as required
+// by the Windows amd64 calling convention.
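+// Schematically, the call below is _cgo_sys_thread_create(fn, arg), with fn in
+// CX and arg in DX (the first two integer arguments on Windows amd64), and the
+// 32 bytes of shadow space at 0(SP).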
+TEXT _rt0_amd64_windows_lib(SB),NOSPLIT|NOFRAME,$40
+ // Create a new thread to do the runtime initialization and return.
+ MOVQ BX, 32(SP) // callee-saved, preserved across the CALL
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ MOVQ _cgo_sys_thread_create(SB), AX
+ MOVQ $_rt0_amd64_windows_lib_go(SB), CX
+ MOVQ $0, DX
+ CALL AX
+ MOVQ BX, SP
+ MOVQ 32(SP), BX
+ RET
+
+TEXT _rt0_amd64_windows_lib_go(SB),NOSPLIT|NOFRAME,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $runtime·rt0_go(SB), AX
+ JMP AX
diff --git a/src/runtime/rt0_windows_arm.s b/src/runtime/rt0_windows_arm.s
new file mode 100644
index 0000000..c5787d0
--- /dev/null
+++ b/src/runtime/rt0_windows_arm.s
@@ -0,0 +1,12 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program.
+TEXT _rt0_arm_windows(SB),NOSPLIT|NOFRAME,$0
+ B ·rt0_go(SB)
diff --git a/src/runtime/rt0_windows_arm64.s b/src/runtime/rt0_windows_arm64.s
new file mode 100644
index 0000000..8802c2b
--- /dev/null
+++ b/src/runtime/rt0_windows_arm64.s
@@ -0,0 +1,29 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// This is the entry point for the program from the
+// kernel for an ordinary -buildmode=exe program.
+TEXT _rt0_arm64_windows(SB),NOSPLIT|NOFRAME,$0
+ B ·rt0_go(SB)
+
+TEXT _rt0_arm64_windows_lib(SB),NOSPLIT|NOFRAME,$0
+ MOVD $_rt0_arm64_windows_lib_go(SB), R0
+ MOVD $0, R1
+ MOVD _cgo_sys_thread_create(SB), R2
+ B (R2)
+
+TEXT _rt0_arm64_windows_lib_go(SB),NOSPLIT|NOFRAME,$0
+ MOVD $0, R0
+ MOVD $0, R1
+ MOVD $runtime·rt0_go(SB), R2
+ B (R2)
+
+TEXT main(SB),NOSPLIT|NOFRAME,$0
+ MOVD $runtime·rt0_go(SB), R2
+ B (R2)
+
diff --git a/src/runtime/runtime-gdb.py b/src/runtime/runtime-gdb.py
new file mode 100644
index 0000000..46f014f
--- /dev/null
+++ b/src/runtime/runtime-gdb.py
@@ -0,0 +1,612 @@
+# Copyright 2010 The Go Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style
+# license that can be found in the LICENSE file.
+
+"""GDB Pretty printers and convenience functions for Go's runtime structures.
+
+This script is loaded by GDB when it finds a .debug_gdb_scripts
+section in the compiled binary. The [68]l linkers emit this with a
+path to this file based on the path to the runtime package.
+"""
+
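+# If auto-loading does not kick in, the script can also be loaded by hand from
+# a gdb session, for example (hypothetical paths; adjust to the local GOROOT):
+#   (gdb) add-auto-load-safe-path /path/to/go/src/runtime
+#   (gdb) source /path/to/go/src/runtime/runtime-gdb.py
+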
+# Known issues:
+# - pretty printing only works for the 'native' strings. E.g. 'type
+# foo string' will make foo a plain struct in the eyes of gdb,
+# circumventing the pretty print triggering.
+
+
+from __future__ import print_function
+import re
+import sys
+import gdb
+
+print("Loading Go Runtime support.", file=sys.stderr)
+#http://python3porting.com/differences.html
+if sys.version > '3':
+ xrange = range
+# allow manual reloading while developing
+goobjfile = gdb.current_objfile() or gdb.objfiles()[0]
+goobjfile.pretty_printers = []
+
+# G state (runtime2.go)
+
+def read_runtime_const(varname, default):
+ try:
+ return int(gdb.parse_and_eval(varname))
+ except Exception:
+ return int(default)
+
+
+G_IDLE = read_runtime_const("'runtime._Gidle'", 0)
+G_RUNNABLE = read_runtime_const("'runtime._Grunnable'", 1)
+G_RUNNING = read_runtime_const("'runtime._Grunning'", 2)
+G_SYSCALL = read_runtime_const("'runtime._Gsyscall'", 3)
+G_WAITING = read_runtime_const("'runtime._Gwaiting'", 4)
+G_MORIBUND_UNUSED = read_runtime_const("'runtime._Gmoribund_unused'", 5)
+G_DEAD = read_runtime_const("'runtime._Gdead'", 6)
+G_ENQUEUE_UNUSED = read_runtime_const("'runtime._Genqueue_unused'", 7)
+G_COPYSTACK = read_runtime_const("'runtime._Gcopystack'", 8)
+G_SCAN = read_runtime_const("'runtime._Gscan'", 0x1000)
+G_SCANRUNNABLE = G_SCAN+G_RUNNABLE
+G_SCANRUNNING = G_SCAN+G_RUNNING
+G_SCANSYSCALL = G_SCAN+G_SYSCALL
+G_SCANWAITING = G_SCAN+G_WAITING
+
+sts = {
+ G_IDLE: 'idle',
+ G_RUNNABLE: 'runnable',
+ G_RUNNING: 'running',
+ G_SYSCALL: 'syscall',
+ G_WAITING: 'waiting',
+ G_MORIBUND_UNUSED: 'moribund',
+ G_DEAD: 'dead',
+ G_ENQUEUE_UNUSED: 'enqueue',
+ G_COPYSTACK: 'copystack',
+ G_SCAN: 'scan',
+ G_SCANRUNNABLE: 'runnable+s',
+ G_SCANRUNNING: 'running+s',
+ G_SCANSYSCALL: 'syscall+s',
+ G_SCANWAITING: 'waiting+s',
+}
+
+
+#
+# Value wrappers
+#
+
+class SliceValue:
+ "Wrapper for slice values."
+
+ def __init__(self, val):
+ self.val = val
+
+ @property
+ def len(self):
+ return int(self.val['len'])
+
+ @property
+ def cap(self):
+ return int(self.val['cap'])
+
+ def __getitem__(self, i):
+ if i < 0 or i >= self.len:
+ raise IndexError(i)
+ ptr = self.val["array"]
+ return (ptr + i).dereference()
+
+
+#
+# Pretty Printers
+#
+
+# The patterns for matching types are permissive because gdb 8.2 switched to matching on (we think) typedef names instead of C syntax names.
+class StringTypePrinter:
+ "Pretty print Go strings."
+
+ pattern = re.compile(r'^(struct string( \*)?|string)$')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'string'
+
+ def to_string(self):
+ l = int(self.val['len'])
+ return self.val['str'].string("utf-8", "ignore", l)
+
+
+class SliceTypePrinter:
+ "Pretty print slices."
+
+ pattern = re.compile(r'^(struct \[\]|\[\])')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'array'
+
+ def to_string(self):
+ t = str(self.val.type)
+ if (t.startswith("struct ")):
+ return t[len("struct "):]
+ return t
+
+ def children(self):
+ sval = SliceValue(self.val)
+ if sval.len > sval.cap:
+ return
+ for idx, item in enumerate(sval):
+ yield ('[{0}]'.format(idx), item)
+
+
+class MapTypePrinter:
+ """Pretty print map[K]V types.
+
+ Map-typed Go variables are really pointers. Dereference them in gdb
+ to inspect their contents with this pretty printer.
+ """
+
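+ # For example, a gdb session might show (hypothetical values, in the
+ # format checked by runtime-gdb_test.go):
+ #   (gdb) p mapvar
+ #   $1 = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
+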
+ pattern = re.compile(r'^map\[.*\].*$')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'map'
+
+ def to_string(self):
+ return str(self.val.type)
+
+ def children(self):
+ MapBucketCount = 8 # see internal/abi.go:MapBucketCount
+ B = self.val['B']
+ buckets = self.val['buckets']
+ oldbuckets = self.val['oldbuckets']
+ flags = self.val['flags']
+ inttype = self.val['hash0'].type
+ cnt = 0
+ for bucket in xrange(2 ** int(B)):
+ bp = buckets + bucket
+ if oldbuckets:
+ oldbucket = bucket & (2 ** (B - 1) - 1)
+ oldbp = oldbuckets + oldbucket
+ oldb = oldbp.dereference()
+ if (oldb['overflow'].cast(inttype) & 1) == 0: # old bucket not evacuated yet
+ if bucket >= 2 ** (B - 1):
+ continue # already did old bucket
+ bp = oldbp
+ while bp:
+ b = bp.dereference()
+ for i in xrange(MapBucketCount):
+ if b['tophash'][i] != 0:
+ k = b['keys'][i]
+ v = b['values'][i]
+ if flags & 1:
+ k = k.dereference()
+ if flags & 2:
+ v = v.dereference()
+ yield str(cnt), k
+ yield str(cnt + 1), v
+ cnt += 2
+ bp = b['overflow']
+
+
+class ChanTypePrinter:
+ """Pretty print chan[T] types.
+
+ Chan-typed Go variables are really pointers. Dereference them in gdb
+ to inspect their contents with this pretty printer.
+ """
+
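+ # For example (hypothetical values, in the format checked by
+ # runtime-gdb_test.go):
+ #   (gdb) p chanint
+ #   $1 = chan int = {99, 11}
+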
+ pattern = re.compile(r'^chan ')
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'array'
+
+ def to_string(self):
+ return str(self.val.type)
+
+ def children(self):
+ # see chan.c chanbuf(). et is the type stolen from hchan<T>::recvq->first->elem
+ et = [x.type for x in self.val['recvq']['first'].type.target().fields() if x.name == 'elem'][0]
+ ptr = (self.val.address["buf"]).cast(et)
+ for i in range(self.val["qcount"]):
+ j = (self.val["recvx"] + i) % self.val["dataqsiz"]
+ yield ('[{0}]'.format(i), (ptr + j).dereference())
+
+
+def paramtypematch(t, pattern):
+ return t.code == gdb.TYPE_CODE_TYPEDEF and str(t).startswith(".param") and pattern.match(str(t.target()))
+
+#
+# Register all the *Printer classes above.
+#
+
+def makematcher(klass):
+ def matcher(val):
+ try:
+ if klass.pattern.match(str(val.type)):
+ return klass(val)
+ elif paramtypematch(val.type, klass.pattern):
+ return klass(val.cast(val.type.target()))
+ except Exception:
+ pass
+ return matcher
+
+goobjfile.pretty_printers.extend([makematcher(var) for var in vars().values() if hasattr(var, 'pattern')])
+#
+# Utilities
+#
+
+def pc_to_int(pc):
+ # python2 will not cast pc (type void*) to an int cleanly
+ # instead python2 and python3 work with the hex string representation
+ # of the void pointer which we can parse back into an int.
+ # int(pc) will not work.
+ try:
+ # python3 / newer versions of gdb
+ pc = int(pc)
+ except gdb.error:
+ # str(pc) can return things like
+ # "0x429d6c <runtime.gopark+284>", so
+ # chop at first space.
+ pc = int(str(pc).split(None, 1)[0], 16)
+ return pc
+
+
+#
+# For reference, this is what we're trying to do:
+# eface: p *(*(struct 'runtime.rtype'*)'main.e'->type_->data)->string
+# iface: p *(*(struct 'runtime.rtype'*)'main.s'->tab->Type->data)->string
+#
+# Interface types can't be recognized by their name; instead we check
+# whether they have the expected fields. Unfortunately the mapping of
+# fields to Python attributes in gdb.py isn't complete: you can't test
+# for presence other than by trapping.
+
+
+def is_iface(val):
+ try:
+ return str(val['tab'].type) == "struct runtime.itab *" and str(val['data'].type) == "void *"
+ except gdb.error:
+ pass
+
+
+def is_eface(val):
+ try:
+ return str(val['_type'].type) == "struct runtime._type *" and str(val['data'].type) == "void *"
+ except gdb.error:
+ pass
+
+
+def lookup_type(name):
+ try:
+ return gdb.lookup_type(name)
+ except gdb.error:
+ pass
+ try:
+ return gdb.lookup_type('struct ' + name)
+ except gdb.error:
+ pass
+ try:
+ return gdb.lookup_type('struct ' + name[1:]).pointer()
+ except gdb.error:
+ pass
+
+
+def iface_commontype(obj):
+ if is_iface(obj):
+ go_type_ptr = obj['tab']['_type']
+ elif is_eface(obj):
+ go_type_ptr = obj['_type']
+ else:
+ return
+
+ return go_type_ptr.cast(gdb.lookup_type("struct reflect.rtype").pointer()).dereference()
+
+
+def iface_dtype(obj):
+ "Decode type of the data field of an eface or iface struct."
+ # known issue: dtype_name decoded from runtime.rtype is "nested.Foo"
+ # but the dwarf table lists it as "full/path/to/nested.Foo"
+
+ dynamic_go_type = iface_commontype(obj)
+ if dynamic_go_type is None:
+ return
+ dtype_name = dynamic_go_type['string'].dereference()['str'].string()
+
+ dynamic_gdb_type = lookup_type(dtype_name)
+ if dynamic_gdb_type is None:
+ return
+
+ type_size = int(dynamic_go_type['size'])
+ uintptr_size = int(dynamic_go_type['size'].type.sizeof) # size is itself a uintptr
+ if type_size > uintptr_size:
+ dynamic_gdb_type = dynamic_gdb_type.pointer()
+
+ return dynamic_gdb_type
+
+
+def iface_dtype_name(obj):
+ "Decode type name of the data field of an eface or iface struct."
+
+ dynamic_go_type = iface_commontype(obj)
+ if dynamic_go_type is None:
+ return
+ return dynamic_go_type['string'].dereference()['str'].string()
+
+
+class IfacePrinter:
+ """Pretty print interface values
+
+ Casts the data field to the appropriate dynamic type."""
+
+ def __init__(self, val):
+ self.val = val
+
+ def display_hint(self):
+ return 'string'
+
+ def to_string(self):
+ if self.val['data'] == 0:
+ return 0x0
+ try:
+ dtype = iface_dtype(self.val)
+ except Exception:
+ return "<bad dynamic type>"
+
+ if dtype is None: # trouble looking up, print something reasonable
+ return "({typename}){data}".format(
+ typename=iface_dtype_name(self.val), data=self.val['data'])
+
+ try:
+ return self.val['data'].cast(dtype).dereference()
+ except Exception:
+ pass
+ return self.val['data'].cast(dtype)
+
+
+def ifacematcher(val):
+ if is_iface(val) or is_eface(val):
+ return IfacePrinter(val)
+
+goobjfile.pretty_printers.append(ifacematcher)
+
+#
+# Convenience Functions
+#
+
+
+class GoLenFunc(gdb.Function):
+ "Length of strings, slices, maps or channels"
+
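+ # Invoked as a gdb convenience function, e.g. (hypothetical session):
+ #   (gdb) p $len(gslice)
+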
+ how = ((StringTypePrinter, 'len'), (SliceTypePrinter, 'len'), (MapTypePrinter, 'count'), (ChanTypePrinter, 'qcount'))
+
+ def __init__(self):
+ gdb.Function.__init__(self, "len")
+
+ def invoke(self, obj):
+ typename = str(obj.type)
+ for klass, fld in self.how:
+ if klass.pattern.match(typename) or paramtypematch(obj.type, klass.pattern):
+ return obj[fld]
+
+
+class GoCapFunc(gdb.Function):
+ "Capacity of slices or channels"
+
+ how = ((SliceTypePrinter, 'cap'), (ChanTypePrinter, 'dataqsiz'))
+
+ def __init__(self):
+ gdb.Function.__init__(self, "cap")
+
+ def invoke(self, obj):
+ typename = str(obj.type)
+ for klass, fld in self.how:
+ if klass.pattern.match(typename) or paramtypematch(obj.type, klass.pattern):
+ return obj[fld]
+
+
+class DTypeFunc(gdb.Function):
+ """Cast Interface values to their dynamic type.
+
+ For non-interface types this behaves as the identity operation.
+ """
+
+ def __init__(self):
+ gdb.Function.__init__(self, "dtype")
+
+ def invoke(self, obj):
+ try:
+ return obj['data'].cast(iface_dtype(obj))
+ except gdb.error:
+ pass
+ return obj
+
+#
+# Commands
+#
+
+def linked_list(ptr, linkfield):
+ while ptr:
+ yield ptr
+ ptr = ptr[linkfield]
+
+
+class GoroutinesCmd(gdb.Command):
+ "List all goroutines."
+
+ def __init__(self):
+ gdb.Command.__init__(self, "info goroutines", gdb.COMMAND_STACK, gdb.COMPLETE_NONE)
+
+ def invoke(self, _arg, _from_tty):
+ # args = gdb.string_to_argv(arg)
+ vp = gdb.lookup_type('void').pointer()
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ if ptr['atomicstatus']['value'] == G_DEAD:
+ continue
+ s = ' '
+ if ptr['m']:
+ s = '*'
+ pc = ptr['sched']['pc'].cast(vp)
+ pc = pc_to_int(pc)
+ blk = gdb.block_for_pc(pc)
+ status = int(ptr['atomicstatus']['value'])
+ st = sts.get(status, "unknown(%d)" % status)
+ print(s, ptr['goid'], "{0:8s}".format(st), blk.function)
+
+
+def find_goroutine(goid):
+ """
+ find_goroutine attempts to find the goroutine identified by goid.
+ It returns a tuple of gdb.Value's representing the stack pointer
+ and program counter pointer for the goroutine.
+
+ @param int goid
+
+ @return tuple (gdb.Value, gdb.Value)
+ """
+ vp = gdb.lookup_type('void').pointer()
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ if ptr['atomicstatus']['value'] == G_DEAD:
+ continue
+ if ptr['goid'] == goid:
+ break
+ else:
+ return None, None
+ # Get the goroutine's saved state.
+ pc, sp = ptr['sched']['pc'], ptr['sched']['sp']
+ status = ptr['atomicstatus']['value']&~G_SCAN
+ # The goroutine is neither running nor in a syscall, so use the info saved in the goroutine.
+ if status != G_RUNNING and status != G_SYSCALL:
+ return pc.cast(vp), sp.cast(vp)
+
+ # If the goroutine is in a syscall, use syscallpc/sp.
+ pc, sp = ptr['syscallpc'], ptr['syscallsp']
+ if sp != 0:
+ return pc.cast(vp), sp.cast(vp)
+ # Otherwise, the goroutine is running, so it doesn't have
+ # saved scheduler state. Find G's OS thread.
+ m = ptr['m']
+ if m == 0:
+ return None, None
+ for thr in gdb.selected_inferior().threads():
+ if thr.ptid[1] == m['procid']:
+ break
+ else:
+ return None, None
+ # Get scheduler state from the G's OS thread state.
+ curthr = gdb.selected_thread()
+ try:
+ thr.switch()
+ pc = gdb.parse_and_eval('$pc')
+ sp = gdb.parse_and_eval('$sp')
+ finally:
+ curthr.switch()
+ return pc.cast(vp), sp.cast(vp)
+
+
+class GoroutineCmd(gdb.Command):
+ """Execute gdb command in the context of goroutine <goid>.
+
+ Switch PC and SP to the ones in the goroutine's G structure,
+ execute an arbitrary gdb command, and restore PC and SP.
+
+ Usage: (gdb) goroutine <goid> <gdbcmd>
+
+ You could pass "all" as <goid> to apply <gdbcmd> to all goroutines.
+
+ For example: (gdb) goroutine all <gdbcmd>
+
+ Note that it is ill-defined to modify state in the context of a goroutine.
+ Restrict yourself to inspecting values.
+ """
+
+ def __init__(self):
+ gdb.Command.__init__(self, "goroutine", gdb.COMMAND_STACK, gdb.COMPLETE_NONE)
+
+ def invoke(self, arg, _from_tty):
+ goid_str, cmd = arg.split(None, 1)
+ goids = []
+
+ if goid_str == 'all':
+ for ptr in SliceValue(gdb.parse_and_eval("'runtime.allgs'")):
+ goids.append(int(ptr['goid']))
+ else:
+ goids = [int(gdb.parse_and_eval(goid_str))]
+
+ for goid in goids:
+ self.invoke_per_goid(goid, cmd)
+
+ def invoke_per_goid(self, goid, cmd):
+ pc, sp = find_goroutine(goid)
+ if not pc:
+ print("No such goroutine: ", goid)
+ return
+ pc = pc_to_int(pc)
+ save_frame = gdb.selected_frame()
+ gdb.parse_and_eval('$save_sp = $sp')
+ gdb.parse_and_eval('$save_pc = $pc')
+ # In GDB, assignments to sp must be done from the
+ # top-most frame, so select frame 0 first.
+ gdb.execute('select-frame 0')
+ gdb.parse_and_eval('$sp = {0}'.format(str(sp)))
+ gdb.parse_and_eval('$pc = {0}'.format(str(pc)))
+ try:
+ gdb.execute(cmd)
+ finally:
+ # In GDB, assignments to sp must be done from the
+ # top-most frame, so select frame 0 first.
+ gdb.execute('select-frame 0')
+ gdb.parse_and_eval('$pc = $save_pc')
+ gdb.parse_and_eval('$sp = $save_sp')
+ save_frame.select()
+
+
+class GoIfaceCmd(gdb.Command):
+ "Print Static and dynamic interface types"
+
+ def __init__(self):
+ gdb.Command.__init__(self, "iface", gdb.COMMAND_DATA, gdb.COMPLETE_SYMBOL)
+
+ def invoke(self, arg, _from_tty):
+ for obj in gdb.string_to_argv(arg):
+ try:
+ #TODO fix quoting for qualified variable names
+ obj = gdb.parse_and_eval(str(obj))
+ except Exception as e:
+ print("Can't parse ", obj, ": ", e)
+ continue
+
+ if obj['data'] == 0:
+ dtype = "nil"
+ else:
+ dtype = iface_dtype(obj)
+
+ if dtype is None:
+ print("Not an interface: ", obj.type)
+ continue
+
+ print("{0}: {1}".format(obj.type, dtype))
+
+# TODO: print interface's methods and dynamic type's func pointers thereof.
+#rsc: "to find the number of entries in the itab's Fn field look at
+# itab.inter->numMethods
+# i am sure i have the names wrong but look at the interface type
+# and its method count"
+# so Itype will start with a commontype which has kind = interface
+
+#
+# Register all convenience functions and CLI commands
+#
+GoLenFunc()
+GoCapFunc()
+DTypeFunc()
+GoroutinesCmd()
+GoroutineCmd()
+GoIfaceCmd()
diff --git a/src/runtime/runtime-gdb_test.go b/src/runtime/runtime-gdb_test.go
new file mode 100644
index 0000000..8c759bf
--- /dev/null
+++ b/src/runtime/runtime-gdb_test.go
@@ -0,0 +1,793 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "flag"
+ "fmt"
+ "internal/abi"
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+ "time"
+)
+
+// NOTE: In some configurations, GDB will segfault when sent a SIGWINCH signal.
+// Some runtime tests send SIGWINCH to the entire process group, so those tests
+// must never run in parallel with GDB tests.
+//
+// See issue 39021 and https://sourceware.org/bugzilla/show_bug.cgi?id=26056.
+
+func checkGdbEnvironment(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ switch runtime.GOOS {
+ case "darwin":
+ t.Skip("gdb does not work on darwin")
+ case "netbsd":
+ t.Skip("gdb does not work with threads on NetBSD; see https://golang.org/issue/22893 and https://gnats.netbsd.org/52548")
+ case "linux":
+ if runtime.GOARCH == "ppc64" {
+ t.Skip("skipping gdb tests on linux/ppc64; see https://golang.org/issue/17366")
+ }
+ if runtime.GOARCH == "mips" {
+ t.Skip("skipping gdb tests on linux/mips; see https://golang.org/issue/25939")
+ }
+ // Disable GDB tests on alpine until issue #54352 resolved.
+ if strings.HasSuffix(testenv.Builder(), "-alpine") {
+ t.Skip("skipping gdb tests on alpine; see https://golang.org/issue/54352")
+ }
+ case "freebsd":
+ t.Skip("skipping gdb tests on FreeBSD; see https://golang.org/issue/29508")
+ case "aix":
+ if testing.Short() {
+ t.Skip("skipping gdb tests on AIX; see https://golang.org/issue/35710")
+ }
+ case "plan9":
+ t.Skip("there is no gdb on Plan 9")
+ }
+ if final := os.Getenv("GOROOT_FINAL"); final != "" && testenv.GOROOT(t) != final {
+ t.Skip("gdb test can fail with GOROOT_FINAL pending")
+ }
+}
+
+func checkGdbVersion(t *testing.T) {
+ // Issue 11214 reports various failures with older versions of gdb.
+ out, err := exec.Command("gdb", "--version").CombinedOutput()
+ if err != nil {
+ t.Skipf("skipping: error executing gdb: %v", err)
+ }
+ re := regexp.MustCompile(`([0-9]+)\.([0-9]+)`)
+ matches := re.FindSubmatch(out)
+ if len(matches) < 3 {
+ t.Skipf("skipping: can't determine gdb version from\n%s\n", out)
+ }
+ major, err1 := strconv.Atoi(string(matches[1]))
+ minor, err2 := strconv.Atoi(string(matches[2]))
+ if err1 != nil || err2 != nil {
+ t.Skipf("skipping: can't determine gdb version: %v, %v", err1, err2)
+ }
+ if major < 7 || (major == 7 && minor < 7) {
+ t.Skipf("skipping: gdb version %d.%d too old", major, minor)
+ }
+ t.Logf("gdb version %d.%d", major, minor)
+}
+
+func checkGdbPython(t *testing.T) {
+ if runtime.GOOS == "solaris" || runtime.GOOS == "illumos" {
+ t.Skip("skipping gdb python tests on illumos and solaris; see golang.org/issue/20821")
+ }
+
+ cmd := exec.Command("gdb", "-nx", "-q", "--batch", "-iex", "python import sys; print('go gdb python support')")
+ out, err := cmd.CombinedOutput()
+
+ if err != nil {
+ t.Skipf("skipping due to issue running gdb: %v", err)
+ }
+ if strings.TrimSpace(string(out)) != "go gdb python support" {
+ t.Skipf("skipping due to lack of python gdb support: %s", out)
+ }
+}
+
+// checkCleanBacktrace checks that the given backtrace is well formed and does
+// not contain any error messages from GDB.
+func checkCleanBacktrace(t *testing.T, backtrace string) {
+ backtrace = strings.TrimSpace(backtrace)
+ lines := strings.Split(backtrace, "\n")
+ if len(lines) == 0 {
+ t.Fatalf("empty backtrace")
+ }
+ for i, l := range lines {
+ if !strings.HasPrefix(l, fmt.Sprintf("#%v ", i)) {
+ t.Fatalf("malformed backtrace at line %v: %v", i, l)
+ }
+ }
+ // TODO(mundaym): check for unknown frames (e.g. "??").
+}
+
+// NOTE: the maps below are allocated larger than abi.MapBucketCount
+// to ensure that they are not "optimized out".
+
+var helloSource = `
+import "fmt"
+import "runtime"
+var gslice []string
+func main() {
+ mapvar := make(map[string]string, ` + strconv.FormatInt(abi.MapBucketCount+9, 10) + `)
+ slicemap := make(map[string][]string,` + strconv.FormatInt(abi.MapBucketCount+3, 10) + `)
+ chanint := make(chan int, 10)
+ chanstr := make(chan string, 10)
+ chanint <- 99
+ chanint <- 11
+ chanstr <- "spongepants"
+ chanstr <- "squarebob"
+ mapvar["abc"] = "def"
+ mapvar["ghi"] = "jkl"
+ slicemap["a"] = []string{"b","c","d"}
+ slicemap["e"] = []string{"f","g","h"}
+ strvar := "abc"
+ ptrvar := &strvar
+ slicevar := make([]string, 0, 16)
+ slicevar = append(slicevar, mapvar["abc"])
+ fmt.Println("hi")
+ runtime.KeepAlive(ptrvar)
+ _ = ptrvar // set breakpoint here
+ gslice = slicevar
+ fmt.Printf("%v, %v, %v\n", slicemap, <-chanint, <-chanstr)
+ runtime.KeepAlive(mapvar)
+} // END_OF_PROGRAM
+`
+
+func lastLine(src []byte) int {
+ eop := []byte("END_OF_PROGRAM")
+ for i, l := range bytes.Split(src, []byte("\n")) {
+ if bytes.Contains(l, eop) {
+ return i
+ }
+ }
+ return 0
+}
+
+func TestGdbPython(t *testing.T) {
+ testGdbPython(t, false)
+}
+
+func TestGdbPythonCgo(t *testing.T) {
+ if strings.HasPrefix(runtime.GOARCH, "mips") {
+ testenv.SkipFlaky(t, 37794)
+ }
+ testGdbPython(t, true)
+}
+
+func testGdbPython(t *testing.T, cgo bool) {
+ if cgo {
+ testenv.MustHaveCGO(t)
+ }
+
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+ checkGdbPython(t)
+
+ dir := t.TempDir()
+
+ var buf bytes.Buffer
+ buf.WriteString("package main\n")
+ if cgo {
+ buf.WriteString(`import "C"` + "\n")
+ }
+ buf.WriteString(helloSource)
+
+ src := buf.Bytes()
+
+ // Locate breakpoint line
+ var bp int
+ lines := bytes.Split(src, []byte("\n"))
+ for i, line := range lines {
+ if bytes.Contains(line, []byte("breakpoint")) {
+ bp = i
+ break
+ }
+ }
+
+ err := os.WriteFile(filepath.Join(dir, "main.go"), src, 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ nLines := lastLine(src)
+
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ args := []string{"-nx", "-q", "--batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "set print thread-events off",
+ }
+ if cgo {
+ // When we build the cgo version of the program, the system's
+ // linker is used. Some external linkers, like GNU gold,
+ // compress the .debug_gdb_scripts into .zdebug_gdb_scripts.
+ // Until gold and gdb can work together, temporarily load the
+ // python script directly.
+ args = append(args,
+ "-ex", "source "+filepath.Join(testenv.GOROOT(t), "src", "runtime", "runtime-gdb.py"),
+ )
+ } else {
+ args = append(args,
+ "-ex", "info auto-load python-scripts",
+ )
+ }
+ args = append(args,
+ "-ex", "set python print-stack full",
+ "-ex", fmt.Sprintf("br main.go:%d", bp),
+ "-ex", "run",
+ "-ex", "echo BEGIN info goroutines\n",
+ "-ex", "info goroutines",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print mapvar\n",
+ "-ex", "print mapvar",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print slicemap\n",
+ "-ex", "print slicemap",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print strvar\n",
+ "-ex", "print strvar",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print chanint\n",
+ "-ex", "print chanint",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN print chanstr\n",
+ "-ex", "print chanstr",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN info locals\n",
+ "-ex", "info locals",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN goroutine 1 bt\n",
+ "-ex", "goroutine 1 bt",
+ "-ex", "echo END\n",
+ "-ex", "echo BEGIN goroutine all bt\n",
+ "-ex", "goroutine all bt",
+ "-ex", "echo END\n",
+ "-ex", "clear main.go:15", // clear the previous break point
+ "-ex", fmt.Sprintf("br main.go:%d", nLines), // new break point at the end of main
+ "-ex", "c",
+ "-ex", "echo BEGIN goroutine 1 bt at the end\n",
+ "-ex", "goroutine 1 bt",
+ "-ex", "echo END\n",
+ filepath.Join(dir, "a.exe"),
+ )
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ firstLine, _, _ := bytes.Cut(got, []byte("\n"))
+ if string(firstLine) != "Loading Go Runtime support." {
+ // This can happen when using all.bash with
+ // GOROOT_FINAL set, because the tests are run before
+ // the final installation of the files.
+ cmd := exec.Command(testenv.GoToolPath(t), "env", "GOROOT")
+ cmd.Env = []string{}
+ out, err := cmd.CombinedOutput()
+ if err != nil && bytes.Contains(out, []byte("cannot find GOROOT")) {
+ t.Skipf("skipping because GOROOT=%s does not exist", testenv.GOROOT(t))
+ }
+
+ _, file, _, _ := runtime.Caller(1)
+
+ t.Logf("package testing source file: %s", file)
+ t.Fatalf("failed to load Go runtime support: %s\n%s", firstLine, got)
+ }
+
+ // Extract named BEGIN...END blocks from output
+ partRe := regexp.MustCompile(`(?ms)^BEGIN ([^\n]*)\n(.*?)\nEND`)
+ blocks := map[string]string{}
+ for _, subs := range partRe.FindAllSubmatch(got, -1) {
+ blocks[string(subs[1])] = string(subs[2])
+ }
+
+ infoGoroutinesRe := regexp.MustCompile(`\*\s+\d+\s+running\s+`)
+ if bl := blocks["info goroutines"]; !infoGoroutinesRe.MatchString(bl) {
+ t.Fatalf("info goroutines failed: %s", bl)
+ }
+
+ printMapvarRe1 := regexp.MustCompile(`^\$[0-9]+ = map\[string\]string = {\[(0x[0-9a-f]+\s+)?"abc"\] = (0x[0-9a-f]+\s+)?"def", \[(0x[0-9a-f]+\s+)?"ghi"\] = (0x[0-9a-f]+\s+)?"jkl"}$`)
+ printMapvarRe2 := regexp.MustCompile(`^\$[0-9]+ = map\[string\]string = {\[(0x[0-9a-f]+\s+)?"ghi"\] = (0x[0-9a-f]+\s+)?"jkl", \[(0x[0-9a-f]+\s+)?"abc"\] = (0x[0-9a-f]+\s+)?"def"}$`)
+ if bl := blocks["print mapvar"]; !printMapvarRe1.MatchString(bl) &&
+ !printMapvarRe2.MatchString(bl) {
+ t.Fatalf("print mapvar failed: %s", bl)
+ }
+
+ // Two possible key orderings, and possible differences in spacing.
+ sliceMapSfx1 := `map[string][]string = {["e"] = []string = {"f", "g", "h"}, ["a"] = []string = {"b", "c", "d"}}`
+ sliceMapSfx2 := `map[string][]string = {["a"] = []string = {"b", "c", "d"}, ["e"] = []string = {"f", "g", "h"}}`
+ if bl := strings.ReplaceAll(blocks["print slicemap"], " ", " "); !strings.HasSuffix(bl, sliceMapSfx1) && !strings.HasSuffix(bl, sliceMapSfx2) {
+ t.Fatalf("print slicemap failed: %s", bl)
+ }
+
+ chanIntSfx := `chan int = {99, 11}`
+ if bl := strings.ReplaceAll(blocks["print chanint"], " ", " "); !strings.HasSuffix(bl, chanIntSfx) {
+ t.Fatalf("print chanint failed: %s", bl)
+ }
+
+ chanStrSfx := `chan string = {"spongepants", "squarebob"}`
+ if bl := strings.ReplaceAll(blocks["print chanstr"], " ", " "); !strings.HasSuffix(bl, chanStrSfx) {
+ t.Fatalf("print chanstr failed: %s", bl)
+ }
+
+ strVarRe := regexp.MustCompile(`^\$[0-9]+ = (0x[0-9a-f]+\s+)?"abc"$`)
+ if bl := blocks["print strvar"]; !strVarRe.MatchString(bl) {
+ t.Fatalf("print strvar failed: %s", bl)
+ }
+
+ // The exact format of composite values has changed over time.
+ // For issue 16338: ssa decompose phase split a slice into
+ // a collection of scalar vars holding its fields. In such cases
+ // the DWARF variable location expression should be of the
+ // form "var.field" and not just "field".
+ // However, the newer dwarf location list code reconstituted
+ // aggregates from their fields and reverted their printing
+ // back to its original form.
+ // Only test that all variables are listed in 'info locals' since
+ // different versions of gdb print variables in different
+ // order and with differing amount of information and formats.
+
+ if bl := blocks["info locals"]; !strings.Contains(bl, "slicevar") ||
+ !strings.Contains(bl, "mapvar") ||
+ !strings.Contains(bl, "strvar") {
+ t.Fatalf("info locals failed: %s", bl)
+ }
+
+ // Check that the backtraces are well formed.
+ checkCleanBacktrace(t, blocks["goroutine 1 bt"])
+ checkCleanBacktrace(t, blocks["goroutine 1 bt at the end"])
+
+ btGoroutine1Re := regexp.MustCompile(`(?m)^#0\s+(0x[0-9a-f]+\s+in\s+)?main\.main.+at`)
+ if bl := blocks["goroutine 1 bt"]; !btGoroutine1Re.MatchString(bl) {
+ t.Fatalf("goroutine 1 bt failed: %s", bl)
+ }
+
+ if bl := blocks["goroutine all bt"]; !btGoroutine1Re.MatchString(bl) {
+ t.Fatalf("goroutine all bt failed: %s", bl)
+ }
+
+ btGoroutine1AtTheEndRe := regexp.MustCompile(`(?m)^#0\s+(0x[0-9a-f]+\s+in\s+)?main\.main.+at`)
+ if bl := blocks["goroutine 1 bt at the end"]; !btGoroutine1AtTheEndRe.MatchString(bl) {
+ t.Fatalf("goroutine 1 bt at the end failed: %s", bl)
+ }
+}
+
+const backtraceSource = `
+package main
+
+//go:noinline
+func aaa() bool { return bbb() }
+
+//go:noinline
+func bbb() bool { return ccc() }
+
+//go:noinline
+func ccc() bool { return ddd() }
+
+//go:noinline
+func ddd() bool { return f() }
+
+//go:noinline
+func eee() bool { return true }
+
+var f = eee
+
+func main() {
+ _ = aaa()
+}
+`
+
+// TestGdbBacktrace tests that gdb can unwind the stack correctly
+// using only the DWARF debug info.
+func TestGdbBacktrace(t *testing.T) {
+ if runtime.GOOS == "netbsd" {
+ testenv.SkipFlaky(t, 15603)
+ }
+ if flag.Lookup("test.parallel").Value.(flag.Getter).Get().(int) < 2 {
+ // It is possible that this test will hang for a long time due to an
+ // apparent GDB bug reported in https://go.dev/issue/37405.
+ // If test parallelism is high enough, that might be ok: the other parallel
+ // tests will finish, and then this test will finish right before it would
+ // time out. However, if tests are running sequentially, a hang in this test
+ // would likely cause the remaining tests to run out of time.
+ testenv.SkipFlaky(t, 37405)
+ }
+
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(backtraceSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ start := time.Now()
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break main.eee",
+ "-ex", "run",
+ "-ex", "backtrace",
+ "-ex", "continue",
+ filepath.Join(dir, "a.exe"),
+ }
+ cmd = testenv.Command(t, "gdb", args...)
+
+ // Work around the GDB hang reported in https://go.dev/issue/37405.
+ // Sometimes (rarely), the GDB process hangs completely when the Go program
+ // exits, and we suspect that the bug is on the GDB side.
+ //
+ // The default Cancel function added by testenv.Command will mark the test as
+ // failed if it is in danger of timing out, but we want to instead mark it as
+ // skipped. Change the Cancel function to kill the process and merely log
+ // instead of failing the test.
+ //
+ // (This approach does not scale: if the test parallelism is less than or
+ // equal to the number of tests that run right up to the deadline, then the
+ // remaining parallel tests are likely to time out. But as long as it's just
+ // this one flaky test, it's probably fine..?)
+ //
+ // If there is no deadline set on the test at all, relying on the timeout set
+ // by testenv.Command will cause the test to hang indefinitely, but that's
+ // what “no deadline” means, after all — and it's probably the right behavior
+ // anyway if someone is trying to investigate and fix the GDB bug.
+ cmd.Cancel = func() error {
+ t.Logf("GDB command timed out after %v: %v", time.Since(start), cmd)
+ return cmd.Process.Kill()
+ }
+
+ got, err := cmd.CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ switch {
+ case bytes.Contains(got, []byte("internal-error: wait returned unexpected status 0x0")):
+ // GDB bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28551
+ testenv.SkipFlaky(t, 43068)
+ case bytes.Contains(got, []byte("Couldn't get registers: No such process.")),
+ bytes.Contains(got, []byte("Unable to fetch general registers.: No such process.")),
+ bytes.Contains(got, []byte("reading register pc (#64): No such process.")):
+ // GDB bug: https://sourceware.org/bugzilla/show_bug.cgi?id=9086
+ testenv.SkipFlaky(t, 50838)
+ case bytes.Contains(got, []byte("waiting for new child: No child processes.")):
+ // GDB bug: Sometimes it fails to wait for a clone child.
+ testenv.SkipFlaky(t, 60553)
+ case bytes.Contains(got, []byte(" exited normally]\n")):
+ // GDB bug: Sometimes the inferior exits fine,
+ // but then GDB hangs.
+ testenv.SkipFlaky(t, 37405)
+ }
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // Check that the backtrace matches the source code.
+ bt := []string{
+ "eee",
+ "ddd",
+ "ccc",
+ "bbb",
+ "aaa",
+ "main",
+ }
+ for i, name := range bt {
+ s := fmt.Sprintf("#%v.*main\\.%v", i, name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
+
+const autotmpTypeSource = `
+package main
+
+type astruct struct {
+ a, b int
+}
+
+func main() {
+ var iface interface{} = map[string]astruct{}
+ var iface2 interface{} = []astruct{}
+ println(iface, iface2)
+}
+`
+
+// TestGdbAutotmpTypes ensures that types of autotmp variables appear in .debug_info
+// See bug #17830.
+func TestGdbAutotmpTypes(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ if runtime.GOOS == "aix" && testing.Short() {
+ t.Skip("TestGdbAutotmpTypes is too slow on aix/ppc64")
+ }
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(autotmpTypeSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ // Some gdb versions may set scheduler-locking to "step" by default. This prevents background
+ // tasks (e.g. GC) from completing, which may result in a hang when executing the step command.
+ // See #49852.
+ "-ex", "set scheduler-locking off",
+ "-ex", "break main.main",
+ "-ex", "run",
+ "-ex", "step",
+ "-ex", "info types astruct",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ sgot := string(got)
+
+ // Check that the expected types appear in the gdb output.
+ types := []string{
+ "[]main.astruct;",
+ "bucket<string,main.astruct>;",
+ "hash<string,main.astruct>;",
+ "main.astruct;",
+ "hash<string,main.astruct> * map[string]main.astruct;",
+ }
+ for _, name := range types {
+ if !strings.Contains(sgot, name) {
+ t.Fatalf("could not find %s in 'info typrs astruct' output", name)
+ }
+ }
+}
+
+const constsSource = `
+package main
+
+const aConstant int = 42
+const largeConstant uint64 = ^uint64(0)
+const minusOne int64 = -1
+
+func main() {
+ println("hello world")
+}
+`
+
+func TestGdbConst(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(constsSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break main.main",
+ "-ex", "run",
+ "-ex", "print main.aConstant",
+ "-ex", "print main.largeConstant",
+ "-ex", "print main.minusOne",
+ "-ex", "print 'runtime.mSpanInUse'",
+ "-ex", "print 'runtime._PageSize'",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ sgot := strings.ReplaceAll(string(got), "\r\n", "\n")
+
+ if !strings.Contains(sgot, "\n$1 = 42\n$2 = 18446744073709551615\n$3 = -1\n$4 = 1 '\\001'\n$5 = 8192") {
+ t.Fatalf("output mismatch")
+ }
+}
+
+const panicSource = `
+package main
+
+import "runtime/debug"
+
+func main() {
+ debug.SetTraceback("crash")
+ crash()
+}
+
+func crash() {
+ panic("panic!")
+}
+`
+
+// TestGdbPanic tests that gdb can unwind the stack correctly
+// from SIGABRTs from Go panics.
+func TestGdbPanic(t *testing.T) {
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ if runtime.GOOS == "windows" {
+ t.Skip("no signals on windows")
+ }
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(panicSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "run",
+ "-ex", "backtrace",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // Check that the backtrace matches the source code.
+ bt := []string{
+ `crash`,
+ `main`,
+ }
+ for _, name := range bt {
+ s := fmt.Sprintf("(#.* .* in )?main\\.%v", name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
+
+const InfCallstackSource = `
+package main
+import "C"
+import "time"
+
+func loop() {
+ for i := 0; i < 1000; i++ {
+ time.Sleep(time.Millisecond*5)
+ }
+}
+
+func main() {
+ go loop()
+ time.Sleep(time.Second * 1)
+}
+`
+
+// TestGdbInfCallstack tests that gdb can unwind the callstack of cgo programs
+// on arm64 platforms without endless frames of function 'crossfunc1'.
+// https://golang.org/issue/37238
+func TestGdbInfCallstack(t *testing.T) {
+ checkGdbEnvironment(t)
+
+ testenv.MustHaveCGO(t)
+ if runtime.GOARCH != "arm64" {
+ t.Skip("skipping infinite callstack test on non-arm64 arches")
+ }
+
+ t.Parallel()
+ checkGdbVersion(t)
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(InfCallstackSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ // Execute gdb commands.
+ // 'setg_gcc' is the first point where we can reproduce the issue with just one 'run' command.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "set startup-with-shell off",
+ "-ex", "break setg_gcc",
+ "-ex", "run",
+ "-ex", "backtrace 3",
+ "-ex", "disable 1",
+ "-ex", "continue",
+ filepath.Join(dir, "a.exe"),
+ }
+ got, err := exec.Command("gdb", args...).CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // Check that the backtrace matches
+ // We check only the 3 innermost frames, as they are certainly present according to gcc_<OS>_arm64.c.
+ bt := []string{
+ `setg_gcc`,
+ `crosscall1`,
+ `threadentry`,
+ }
+ for i, name := range bt {
+ s := fmt.Sprintf("#%v.*%v", i, name)
+ re := regexp.MustCompile(s)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find '%v' in backtrace", s)
+ }
+ }
+}
diff --git a/src/runtime/runtime-gdb_unix_test.go b/src/runtime/runtime-gdb_unix_test.go
new file mode 100644
index 0000000..f9cc648
--- /dev/null
+++ b/src/runtime/runtime-gdb_unix_test.go
@@ -0,0 +1,212 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "bytes"
+ "internal/testenv"
+ "io"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "regexp"
+ "runtime"
+ "syscall"
+ "testing"
+)
+
+const coreSignalSource = `
+package main
+
+import (
+ "flag"
+ "fmt"
+ "os"
+ "runtime/debug"
+ "syscall"
+)
+
+var pipeFD = flag.Int("pipe-fd", -1, "FD of write end of control pipe")
+
+func enableCore() {
+ debug.SetTraceback("crash")
+
+ var lim syscall.Rlimit
+ err := syscall.Getrlimit(syscall.RLIMIT_CORE, &lim)
+ if err != nil {
+ panic(fmt.Sprintf("error getting rlimit: %v", err))
+ }
+ lim.Cur = lim.Max
+ fmt.Fprintf(os.Stderr, "Setting RLIMIT_CORE = %+#v\n", lim)
+ err = syscall.Setrlimit(syscall.RLIMIT_CORE, &lim)
+ if err != nil {
+ panic(fmt.Sprintf("error setting rlimit: %v", err))
+ }
+}
+
+func main() {
+ flag.Parse()
+
+ enableCore()
+
+ // Ready to go. Notify parent.
+ if err := syscall.Close(*pipeFD); err != nil {
+ panic(fmt.Sprintf("error closing control pipe fd %d: %v", *pipeFD, err))
+ }
+
+ for {}
+}
+`
+
+// TestGdbCoreSignalBacktrace tests that gdb can unwind the stack correctly
+// through a signal handler in a core file
+func TestGdbCoreSignalBacktrace(t *testing.T) {
+ if runtime.GOOS != "linux" {
+ // N.B. This test isn't fundamentally Linux-only, but it needs
+ // to know how to enable/find core files on each OS.
+ t.Skip("Test only supported on Linux")
+ }
+ if runtime.GOARCH != "386" && runtime.GOARCH != "amd64" {
+ // TODO(go.dev/issue/25218): Other architectures use sigreturn
+ // via VDSO, which we somehow don't handle correctly.
+ t.Skip("Backtrace through signal handler only works on 386 and amd64")
+ }
+
+ checkGdbEnvironment(t)
+ t.Parallel()
+ checkGdbVersion(t)
+
+ // Ensure there is enough RLIMIT_CORE available to generate a full core.
+ var lim syscall.Rlimit
+ err := syscall.Getrlimit(syscall.RLIMIT_CORE, &lim)
+ if err != nil {
+ t.Fatalf("error getting rlimit: %v", err)
+ }
+ // Minimum RLIMIT_CORE max to allow. This is a conservative estimate.
+ // Most systems allow infinity.
+ const minRlimitCore = 100 << 20 // 100 MB
+ if lim.Max < minRlimitCore {
+ t.Skipf("RLIMIT_CORE max too low: %#+v", lim)
+ }
+
+ // Make sure core pattern will send core to the current directory.
+ b, err := os.ReadFile("/proc/sys/kernel/core_pattern")
+ if err != nil {
+ t.Fatalf("error reading core_pattern: %v", err)
+ }
+ if string(b) != "core\n" {
+ t.Skipf("Unexpected core pattern %q", string(b))
+ }
+
+ dir := t.TempDir()
+
+ // Build the source code.
+ src := filepath.Join(dir, "main.go")
+ err = os.WriteFile(src, []byte(coreSignalSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create file: %v", err)
+ }
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", "a.exe", "main.go")
+ cmd.Dir = dir
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ r, w, err := os.Pipe()
+ if err != nil {
+ t.Fatalf("error creating control pipe: %v", err)
+ }
+ defer r.Close()
+
+ // Start the test binary.
+ cmd = testenv.Command(t, "./a.exe", "-pipe-fd=3")
+ cmd.Dir = dir
+ cmd.ExtraFiles = []*os.File{w}
+ var output bytes.Buffer
+ cmd.Stdout = &output // for test logging
+ cmd.Stderr = &output
+
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("error starting test binary: %v", err)
+ }
+ w.Close()
+
+ // Wait for child to be ready.
+ var buf [1]byte
+ if _, err := r.Read(buf[:]); err != io.EOF {
+ t.Fatalf("control pipe read get err %v want io.EOF", err)
+ }
+
+ // 💥
+ if err := cmd.Process.Signal(os.Signal(syscall.SIGABRT)); err != nil {
+ t.Fatalf("erroring signaling child: %v", err)
+ }
+
+ err = cmd.Wait()
+ t.Logf("child output:\n%s", output.String())
+ if err == nil {
+ t.Fatalf("Wait succeeded, want SIGABRT")
+ }
+ ee, ok := err.(*exec.ExitError)
+ if !ok {
+ t.Fatalf("Wait err got %T %v, want exec.ExitError", ee, ee)
+ }
+ ws, ok := ee.Sys().(syscall.WaitStatus)
+ if !ok {
+ t.Fatalf("Sys got %T %v, want syscall.WaitStatus", ee.Sys(), ee.Sys())
+ }
+ if ws.Signal() != syscall.SIGABRT {
+ t.Fatalf("Signal got %d want SIGABRT", ws.Signal())
+ }
+ if !ws.CoreDump() {
+ t.Fatalf("CoreDump got %v want true", ws.CoreDump())
+ }
+
+ // Execute gdb commands.
+ args := []string{"-nx", "-batch",
+ "-iex", "add-auto-load-safe-path " + filepath.Join(testenv.GOROOT(t), "src", "runtime"),
+ "-ex", "backtrace",
+ filepath.Join(dir, "a.exe"),
+ filepath.Join(dir, "core"),
+ }
+ cmd = testenv.Command(t, "gdb", args...)
+
+ got, err := cmd.CombinedOutput()
+ t.Logf("gdb output:\n%s", got)
+ if err != nil {
+ t.Fatalf("gdb exited with error: %v", err)
+ }
+
+ // We don't know which thread the fatal signal will land on, but we can still check for basics:
+ //
+ // 1. A frame in the signal handler: runtime.sigtramp
+ // 2. GDB detection of the signal handler: <signal handler called>
+ // 3. A frame before the signal handler: this could be main.main, or somewhere in the scheduler
+
+ re := regexp.MustCompile(`#.* runtime\.sigtramp `)
+ if found := re.Find(got) != nil; !found {
+ t.Fatalf("could not find sigtramp in backtrace")
+ }
+
+ re = regexp.MustCompile("#.* <signal handler called>")
+ loc := re.FindIndex(got)
+ if loc == nil {
+ t.Fatalf("could not find signal handler marker in backtrace")
+ }
+ rest := got[loc[1]:]
+
+ // Look for any frames after the signal handler. We want to see
+ // symbolized frames, not garbage unknown frames.
+ //
+ // Since the signal might not be delivered to the main thread we can't
+ // look for main.main. Every thread should have a runtime frame though.
+ re = regexp.MustCompile(`#.* runtime\.`)
+ if found := re.Find(rest) != nil; !found {
+ t.Fatalf("could not find runtime symbol in backtrace after signal handler:\n%s", rest)
+ }
+}
diff --git a/src/runtime/runtime-lldb_test.go b/src/runtime/runtime-lldb_test.go
new file mode 100644
index 0000000..19a6cc6
--- /dev/null
+++ b/src/runtime/runtime-lldb_test.go
@@ -0,0 +1,185 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/testenv"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+var lldbPath string
+
+func checkLldbPython(t *testing.T) {
+ cmd := exec.Command("lldb", "-P")
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("skipping due to issue running lldb: %v\n%s", err, out)
+ }
+ lldbPath = strings.TrimSpace(string(out))
+
+ cmd = exec.Command("/usr/bin/python2.7", "-c", "import sys;sys.path.append(sys.argv[1]);import lldb; print('go lldb python support')", lldbPath)
+ out, err = cmd.CombinedOutput()
+
+ if err != nil {
+ t.Skipf("skipping due to issue running python: %v\n%s", err, out)
+ }
+ if string(out) != "go lldb python support\n" {
+ t.Skipf("skipping due to lack of python lldb support: %s", out)
+ }
+
+ if runtime.GOOS == "darwin" {
+ // Try to see if we have debugging permissions.
+ cmd = exec.Command("/usr/sbin/DevToolsSecurity", "-status")
+ out, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("DevToolsSecurity failed: %v", err)
+ } else if !strings.Contains(string(out), "enabled") {
+ t.Skip(string(out))
+ }
+ cmd = exec.Command("/usr/bin/groups")
+ out, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Skipf("groups failed: %v", err)
+ } else if !strings.Contains(string(out), "_developer") {
+ t.Skip("Not in _developer group")
+ }
+ }
+}
+
+const lldbHelloSource = `
+package main
+import "fmt"
+func main() {
+ mapvar := make(map[string]string,5)
+ mapvar["abc"] = "def"
+ mapvar["ghi"] = "jkl"
+ intvar := 42
+ ptrvar := &intvar
+ fmt.Println("hi") // line 10
+ _ = ptrvar
+}
+`
+
+const lldbScriptSource = `
+import sys
+sys.path.append(sys.argv[1])
+import lldb
+import os
+
+TIMEOUT_SECS = 5
+
+debugger = lldb.SBDebugger.Create()
+debugger.SetAsync(True)
+target = debugger.CreateTargetWithFileAndArch("a.exe", None)
+if target:
+ print "Created target"
+ main_bp = target.BreakpointCreateByLocation("main.go", 10)
+ if main_bp:
+ print "Created breakpoint"
+ process = target.LaunchSimple(None, None, os.getcwd())
+ if process:
+ print "Process launched"
+ listener = debugger.GetListener()
+ process.broadcaster.AddListener(listener, lldb.SBProcess.eBroadcastBitStateChanged)
+ while True:
+ event = lldb.SBEvent()
+ if listener.WaitForEvent(TIMEOUT_SECS, event):
+ if lldb.SBProcess.GetRestartedFromEvent(event):
+ continue
+ state = process.GetState()
+ if state in [lldb.eStateUnloaded, lldb.eStateLaunching, lldb.eStateRunning]:
+ continue
+ else:
+ print "Timeout launching"
+ break
+ if state == lldb.eStateStopped:
+ for t in process.threads:
+ if t.GetStopReason() == lldb.eStopReasonBreakpoint:
+ print "Hit breakpoint"
+ frame = t.GetFrameAtIndex(0)
+ if frame:
+ if frame.line_entry:
+ print "Stopped at %s:%d" % (frame.line_entry.file.basename, frame.line_entry.line)
+ if frame.function:
+ print "Stopped in %s" % (frame.function.name,)
+ var = frame.FindVariable('intvar')
+ if var:
+ print "intvar = %s" % (var.GetValue(),)
+ else:
+ print "no intvar"
+ else:
+ print "Process state", state
+ process.Destroy()
+else:
+ print "Failed to create target a.exe"
+
+lldb.SBDebugger.Destroy(debugger)
+sys.exit()
+`
+
+const expectedLldbOutput = `Created target
+Created breakpoint
+Process launched
+Hit breakpoint
+Stopped at main.go:10
+Stopped in main.main
+intvar = 42
+`
+
+func TestLldbPython(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ if final := os.Getenv("GOROOT_FINAL"); final != "" && runtime.GOROOT() != final {
+		t.Skip("lldb test can fail with GOROOT_FINAL pending")
+ }
+ testenv.SkipFlaky(t, 31188)
+
+ checkLldbPython(t)
+
+ dir := t.TempDir()
+
+ src := filepath.Join(dir, "main.go")
+ err := os.WriteFile(src, []byte(lldbHelloSource), 0644)
+ if err != nil {
+ t.Fatalf("failed to create src file: %v", err)
+ }
+
+ mod := filepath.Join(dir, "go.mod")
+ err = os.WriteFile(mod, []byte("module lldbtest"), 0644)
+ if err != nil {
+ t.Fatalf("failed to create mod file: %v", err)
+ }
+
+ // As of 2018-07-17, lldb doesn't support compressed DWARF, so
+ // disable it for this test.
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-gcflags=all=-N -l", "-ldflags=-compressdwarf=false", "-o", "a.exe")
+ cmd.Dir = dir
+ cmd.Env = append(os.Environ(), "GOPATH=") // issue 31100
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("building source %v\n%s", err, out)
+ }
+
+ src = filepath.Join(dir, "script.py")
+ err = os.WriteFile(src, []byte(lldbScriptSource), 0755)
+ if err != nil {
+ t.Fatalf("failed to create script: %v", err)
+ }
+
+ cmd = exec.Command("/usr/bin/python2.7", "script.py", lldbPath)
+ cmd.Dir = dir
+ got, _ := cmd.CombinedOutput()
+
+ if string(got) != expectedLldbOutput {
+ if strings.Contains(string(got), "Timeout launching") {
+ t.Skip("Timeout launching")
+ }
+ t.Fatalf("Unexpected lldb output:\n%s", got)
+ }
+}
diff --git a/src/runtime/runtime-seh_windows_test.go b/src/runtime/runtime-seh_windows_test.go
new file mode 100644
index 0000000..27e4f49
--- /dev/null
+++ b/src/runtime/runtime-seh_windows_test.go
@@ -0,0 +1,191 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "internal/abi"
+ "internal/syscall/windows"
+ "runtime"
+ "slices"
+ "testing"
+ "unsafe"
+)
+
+func sehf1() int {
+ return sehf1()
+}
+
+func sehf2() {}
+
+func TestSehLookupFunctionEntry(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skip("skipping amd64-only test")
+ }
+ // This test checks that Win32 is able to retrieve
+ // function metadata stored in the .pdata section
+ // by the Go linker.
+ // Win32 unwinding will fail if this test fails,
+ // as RtlUnwindEx uses RtlLookupFunctionEntry internally.
+	// If that's the case, don't bother investigating further;
+	// first fix the .pdata generation.
+ sehf1pc := abi.FuncPCABIInternal(sehf1)
+ var fnwithframe func()
+ fnwithframe = func() {
+ fnwithframe()
+ }
+ fnwithoutframe := func() {}
+ tests := []struct {
+ name string
+ pc uintptr
+ hasframe bool
+ }{
+ {"no frame func", abi.FuncPCABIInternal(sehf2), false},
+ {"no func", sehf1pc - 1, false},
+ {"func at entry", sehf1pc, true},
+ {"func in prologue", sehf1pc + 1, true},
+ {"anonymous func with frame", abi.FuncPCABIInternal(fnwithframe), true},
+ {"anonymous func without frame", abi.FuncPCABIInternal(fnwithoutframe), false},
+ {"pc at func body", runtime.NewContextStub().GetPC(), true},
+ }
+ for _, tt := range tests {
+ var base uintptr
+ fn := windows.RtlLookupFunctionEntry(tt.pc, &base, nil)
+ if !tt.hasframe {
+ if fn != 0 {
+ t.Errorf("%s: unexpected frame", tt.name)
+ }
+ continue
+ }
+ if fn == 0 {
+ t.Errorf("%s: missing frame", tt.name)
+ }
+ }
+}
+
+func sehCallers() []uintptr {
+	// We don't need a real context;
+	// RtlVirtualUnwind just needs a context with
+	// a valid pc, sp, and fp (aka bp).
+ ctx := runtime.NewContextStub()
+
+ pcs := make([]uintptr, 15)
+ var base, frame uintptr
+ var n int
+ for i := 0; i < len(pcs); i++ {
+ fn := windows.RtlLookupFunctionEntry(ctx.GetPC(), &base, nil)
+ if fn == 0 {
+ break
+ }
+ pcs[i] = ctx.GetPC()
+ n++
+ windows.RtlVirtualUnwind(0, base, ctx.GetPC(), fn, uintptr(unsafe.Pointer(ctx)), nil, &frame, nil)
+ }
+ return pcs[:n]
+}
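For comparison, the portable way to capture and symbolize a call stack (the pure-Go unwinder these tests compare SEH against, which does report inlined frames) is runtime.Callers plus runtime.CallersFrames. A brief, standalone sketch; the helper name is ours, not part of this file:

package main

import (
	"fmt"
	"runtime"
)

// callers returns the function names of the current call stack,
// symbolized the portable way (inlined frames included).
func callers() []string {
	pcs := make([]uintptr, 15)
	n := runtime.Callers(1, pcs) // skip runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	fmt.Println(callers()) // e.g. [main.callers main.main runtime.main ...]
}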
+
+// SEH unwinding does not report inlined frames.
+//
+//go:noinline
+func sehf3(pan bool) []uintptr {
+ return sehf4(pan)
+}
+
+//go:noinline
+func sehf4(pan bool) []uintptr {
+ var pcs []uintptr
+ if pan {
+ panic("sehf4")
+ }
+ pcs = sehCallers()
+ return pcs
+}
+
+func testSehCallersEqual(t *testing.T, pcs []uintptr, want []string) {
+ t.Helper()
+ got := make([]string, 0, len(want))
+ for _, pc := range pcs {
+ fn := runtime.FuncForPC(pc)
+ if fn == nil || len(got) >= len(want) {
+ break
+ }
+ name := fn.Name()
+ switch name {
+ case "runtime.deferCallSave", "runtime.runOpenDeferFrame", "runtime.panicmem":
+			// These functions are skipped as they appear inconsistently depending
+			// on whether inlining is on or off.
+ continue
+ }
+ got = append(got, name)
+ }
+ if !slices.Equal(want, got) {
+ t.Fatalf("wanted %v, got %v", want, got)
+ }
+}
+
+func TestSehUnwind(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skip("skipping amd64-only test")
+ }
+ pcs := sehf3(false)
+ testSehCallersEqual(t, pcs, []string{"runtime_test.sehCallers", "runtime_test.sehf4",
+ "runtime_test.sehf3", "runtime_test.TestSehUnwind"})
+}
+
+func TestSehUnwindPanic(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skip("skipping amd64-only test")
+ }
+ want := []string{"runtime_test.sehCallers", "runtime_test.TestSehUnwindPanic.func1", "runtime.gopanic",
+ "runtime_test.sehf4", "runtime_test.sehf3", "runtime_test.TestSehUnwindPanic"}
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := sehCallers()
+ testSehCallersEqual(t, pcs, want)
+ }()
+ sehf3(true)
+}
+
+func TestSehUnwindDoublePanic(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skip("skipping amd64-only test")
+ }
+ want := []string{"runtime_test.sehCallers", "runtime_test.TestSehUnwindDoublePanic.func1.1", "runtime.gopanic",
+ "runtime_test.TestSehUnwindDoublePanic.func1", "runtime.gopanic", "runtime_test.TestSehUnwindDoublePanic"}
+ defer func() {
+ defer func() {
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := sehCallers()
+ testSehCallersEqual(t, pcs, want)
+ }()
+ if recover() == nil {
+ t.Fatal("did not panic")
+ }
+ panic(2)
+ }()
+ panic(1)
+}
+
+func TestSehUnwindNilPointerPanic(t *testing.T) {
+ if runtime.GOARCH != "amd64" {
+ t.Skip("skipping amd64-only test")
+ }
+ want := []string{"runtime_test.sehCallers", "runtime_test.TestSehUnwindNilPointerPanic.func1", "runtime.gopanic",
+ "runtime.sigpanic", "runtime_test.TestSehUnwindNilPointerPanic"}
+ defer func() {
+ if r := recover(); r == nil {
+ t.Fatal("did not panic")
+ }
+ pcs := sehCallers()
+ testSehCallersEqual(t, pcs, want)
+ }()
+ var p *int
+ if *p == 3 {
+ t.Fatal("did not see nil pointer panic")
+ }
+}
diff --git a/src/runtime/runtime.go b/src/runtime/runtime.go
new file mode 100644
index 0000000..15119cf
--- /dev/null
+++ b/src/runtime/runtime.go
@@ -0,0 +1,167 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+//go:generate go run wincallback.go
+//go:generate go run mkduff.go
+//go:generate go run mkfastlog2table.go
+//go:generate go run mklockrank.go -o lockrank.go
+
+var ticks ticksType
+
+type ticksType struct {
+ lock mutex
+ val atomic.Int64
+}
+
+// Note: Called by runtime/pprof in addition to runtime code.
+func tickspersecond() int64 {
+ r := ticks.val.Load()
+ if r != 0 {
+ return r
+ }
+ lock(&ticks.lock)
+ r = ticks.val.Load()
+ if r == 0 {
+ t0 := nanotime()
+ c0 := cputicks()
+ usleep(100 * 1000)
+ t1 := nanotime()
+ c1 := cputicks()
+ if t1 == t0 {
+ t1++
+ }
+ r = (c1 - c0) * 1000 * 1000 * 1000 / (t1 - t0)
+ if r == 0 {
+ r++
+ }
+ ticks.val.Store(r)
+ }
+ unlock(&ticks.lock)
+ return r
+}
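tickspersecond is a double-checked cache: a lock-free load on the fast path, then a re-check under the lock before calibrating cputicks against roughly 100ms of nanotime. A minimal standalone sketch of the same pattern using only the standard library; the names perSecond and measure are illustrative, not runtime APIs:

package ticks

import (
	"sync"
	"sync/atomic"
)

var (
	mu     sync.Mutex
	cached atomic.Int64
)

// perSecond returns the cached measurement, computing it at most once.
// measure stands in for the cputicks/nanotime calibration above.
func perSecond(measure func() int64) int64 {
	if r := cached.Load(); r != 0 {
		return r // fast path: already measured
	}
	mu.Lock()
	defer mu.Unlock()
	if r := cached.Load(); r != 0 {
		return r // another goroutine measured it while we waited for the lock
	}
	r := measure()
	if r == 0 {
		r = 1 // keep 0 reserved as the "not yet measured" sentinel
	}
	cached.Store(r)
	return r
}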
+
+var envs []string
+var argslice []string
+
+//go:linkname syscall_runtime_envs syscall.runtime_envs
+func syscall_runtime_envs() []string { return append([]string{}, envs...) }
+
+//go:linkname syscall_Getpagesize syscall.Getpagesize
+func syscall_Getpagesize() int { return int(physPageSize) }
+
+//go:linkname os_runtime_args os.runtime_args
+func os_runtime_args() []string { return append([]string{}, argslice...) }
+
+//go:linkname syscall_Exit syscall.Exit
+//go:nosplit
+func syscall_Exit(code int) {
+ exit(int32(code))
+}
+
+var godebugDefault string
+var godebugUpdate atomic.Pointer[func(string, string)]
+var godebugEnv atomic.Pointer[string] // set by parsedebugvars
+var godebugNewIncNonDefault atomic.Pointer[func(string) func()]
+
+//go:linkname godebug_setUpdate internal/godebug.setUpdate
+func godebug_setUpdate(update func(string, string)) {
+ p := new(func(string, string))
+ *p = update
+ godebugUpdate.Store(p)
+ godebugNotify(false)
+}
+
+//go:linkname godebug_setNewIncNonDefault internal/godebug.setNewIncNonDefault
+func godebug_setNewIncNonDefault(newIncNonDefault func(string) func()) {
+ p := new(func(string) func())
+ *p = newIncNonDefault
+ godebugNewIncNonDefault.Store(p)
+}
+
+// A godebugInc provides access to internal/godebug's IncNonDefault function
+// for a given GODEBUG setting.
+// Calls before internal/godebug registers itself are dropped on the floor.
+type godebugInc struct {
+ name string
+ inc atomic.Pointer[func()]
+}
+
+func (g *godebugInc) IncNonDefault() {
+ inc := g.inc.Load()
+ if inc == nil {
+ newInc := godebugNewIncNonDefault.Load()
+ if newInc == nil {
+ return
+ }
+ inc = new(func())
+ *inc = (*newInc)(g.name)
+ if raceenabled {
+ racereleasemerge(unsafe.Pointer(&g.inc))
+ }
+ if !g.inc.CompareAndSwap(nil, inc) {
+ inc = g.inc.Load()
+ }
+ }
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&g.inc))
+ }
+ (*inc)()
+}
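godebug_setUpdate, godebug_setNewIncNonDefault, and IncNonDefault all follow the same shape: a callback is published through an atomic pointer, calls made before registration are dropped, and the first caller to resolve the function publishes it so later calls reuse it. A hedged standalone sketch of the registration half; the package and function names here are invented for illustration:

package lazyhook

import "sync/atomic"

// hook is installed at most once by Install; Call is a no-op until then.
var hook atomic.Pointer[func(string)]

// Install publishes fn so that subsequent Call invocations use it.
func Install(fn func(string)) {
	p := new(func(string))
	*p = fn
	hook.Store(p)
}

// Call invokes the installed hook, or silently drops the call if nothing
// has been installed yet (the "dropped on the floor" behavior above).
func Call(arg string) {
	if p := hook.Load(); p != nil {
		(*p)(arg)
	}
}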
+
+func godebugNotify(envChanged bool) {
+ update := godebugUpdate.Load()
+ var env string
+ if p := godebugEnv.Load(); p != nil {
+ env = *p
+ }
+ if envChanged {
+ reparsedebugvars(env)
+ }
+ if update != nil {
+ (*update)(godebugDefault, env)
+ }
+}
+
+//go:linkname syscall_runtimeSetenv syscall.runtimeSetenv
+func syscall_runtimeSetenv(key, value string) {
+ setenv_c(key, value)
+ if key == "GODEBUG" {
+ p := new(string)
+ *p = value
+ godebugEnv.Store(p)
+ godebugNotify(true)
+ }
+}
+
+//go:linkname syscall_runtimeUnsetenv syscall.runtimeUnsetenv
+func syscall_runtimeUnsetenv(key string) {
+ unsetenv_c(key)
+ if key == "GODEBUG" {
+ godebugEnv.Store(nil)
+ godebugNotify(true)
+ }
+}
+
+// writeErrStr writes a string to descriptor 2.
+//
+//go:nosplit
+func writeErrStr(s string) {
+ write(2, unsafe.Pointer(unsafe.StringData(s)), int32(len(s)))
+}
+
+// auxv is populated on relevant platforms but defined here for all platforms
+// so x/sys/cpu can assume the getAuxv symbol exists without keeping its list
+// of auxv-using GOOS build tags in sync.
+//
+// It contains an even number of elements, (tag, value) pairs.
+var auxv []uintptr
+
+func getAuxv() []uintptr { return auxv } // accessed from x/sys/cpu; see issue 57336
diff --git a/src/runtime/runtime1.go b/src/runtime/runtime1.go
new file mode 100644
index 0000000..7174c63
--- /dev/null
+++ b/src/runtime/runtime1.go
@@ -0,0 +1,657 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/bytealg"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Keep a cached value to make gotraceback fast,
+// since we call it on every call to gentraceback.
+// The cached value is a uint32 in which the low bits
+// are the "crash" and "all" settings and the remaining
+// bits are the traceback value (0 off, 1 on, 2 include system).
+const (
+ tracebackCrash = 1 << iota
+ tracebackAll
+ tracebackShift = iota
+)
+
+var traceback_cache uint32 = 2 << tracebackShift
+var traceback_env uint32
+
+// gotraceback returns the current traceback settings.
+//
+// If level is 0, suppress all tracebacks.
+// If level is 1, show tracebacks, but exclude runtime frames.
+// If level is 2, show tracebacks including runtime frames.
+// If all is set, print all goroutine stacks. Otherwise, print just the current goroutine.
+// If crash is set, crash (core dump, etc) after tracebacking.
+//
+//go:nosplit
+func gotraceback() (level int32, all, crash bool) {
+ gp := getg()
+ t := atomic.Load(&traceback_cache)
+ crash = t&tracebackCrash != 0
+ all = gp.m.throwing >= throwTypeUser || t&tracebackAll != 0
+ if gp.m.traceback != 0 {
+ level = int32(gp.m.traceback)
+ } else if gp.m.throwing >= throwTypeRuntime {
+ // Always include runtime frames in runtime throws unless
+ // otherwise overridden by m.traceback.
+ level = 2
+ } else {
+ level = int32(t >> tracebackShift)
+ }
+ return
+}
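The packing is easiest to see with concrete values: GOTRACEBACK=crash is stored as 2<<tracebackShift | tracebackAll | tracebackCrash, which decodes back to level 2 with both flags set. A small standalone illustration, with the constants duplicated from above so it compiles on its own:

package main

import "fmt"

const (
	tracebackCrash = 1 << iota // 0b001
	tracebackAll               // 0b010
	tracebackShift = iota      // 2: the level lives in the remaining bits
)

func main() {
	t := uint32(2<<tracebackShift | tracebackAll | tracebackCrash) // GOTRACEBACK=crash
	fmt.Println("level:", t>>tracebackShift)     // 2
	fmt.Println("all:  ", t&tracebackAll != 0)   // true
	fmt.Println("crash:", t&tracebackCrash != 0) // true
}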
+
+var (
+ argc int32
+ argv **byte
+)
+
+// nosplit for use in linux startup sysargs.
+//
+//go:nosplit
+func argv_index(argv **byte, i int32) *byte {
+ return *(**byte)(add(unsafe.Pointer(argv), uintptr(i)*goarch.PtrSize))
+}
+
+func args(c int32, v **byte) {
+ argc = c
+ argv = v
+ sysargs(c, v)
+}
+
+func goargs() {
+ if GOOS == "windows" {
+ return
+ }
+ argslice = make([]string, argc)
+ for i := int32(0); i < argc; i++ {
+ argslice[i] = gostringnocopy(argv_index(argv, i))
+ }
+}
+
+func goenvs_unix() {
+ // TODO(austin): ppc64 in dynamic linking mode doesn't
+ // guarantee env[] will immediately follow argv. Might cause
+ // problems.
+ n := int32(0)
+ for argv_index(argv, argc+1+n) != nil {
+ n++
+ }
+
+ envs = make([]string, n)
+ for i := int32(0); i < n; i++ {
+ envs[i] = gostring(argv_index(argv, argc+1+i))
+ }
+}
+
+func environ() []string {
+ return envs
+}
+
+// TODO: These should be locals in testAtomic64, but we don't 8-byte
+// align stack variables on 386.
+var test_z64, test_x64 uint64
+
+func testAtomic64() {
+ test_z64 = 42
+ test_x64 = 0
+ if atomic.Cas64(&test_z64, test_x64, 1) {
+ throw("cas64 failed")
+ }
+ if test_x64 != 0 {
+ throw("cas64 failed")
+ }
+ test_x64 = 42
+ if !atomic.Cas64(&test_z64, test_x64, 1) {
+ throw("cas64 failed")
+ }
+ if test_x64 != 42 || test_z64 != 1 {
+ throw("cas64 failed")
+ }
+ if atomic.Load64(&test_z64) != 1 {
+ throw("load64 failed")
+ }
+ atomic.Store64(&test_z64, (1<<40)+1)
+ if atomic.Load64(&test_z64) != (1<<40)+1 {
+ throw("store64 failed")
+ }
+ if atomic.Xadd64(&test_z64, (1<<40)+1) != (2<<40)+2 {
+ throw("xadd64 failed")
+ }
+ if atomic.Load64(&test_z64) != (2<<40)+2 {
+ throw("xadd64 failed")
+ }
+ if atomic.Xchg64(&test_z64, (3<<40)+3) != (2<<40)+2 {
+ throw("xchg64 failed")
+ }
+ if atomic.Load64(&test_z64) != (3<<40)+3 {
+ throw("xchg64 failed")
+ }
+}
+
+func check() {
+ var (
+ a int8
+ b uint8
+ c int16
+ d uint16
+ e int32
+ f uint32
+ g int64
+ h uint64
+ i, i1 float32
+ j, j1 float64
+ k unsafe.Pointer
+ l *uint16
+ m [4]byte
+ )
+ type x1t struct {
+ x uint8
+ }
+ type y1t struct {
+ x1 x1t
+ y uint8
+ }
+ var x1 x1t
+ var y1 y1t
+
+ if unsafe.Sizeof(a) != 1 {
+ throw("bad a")
+ }
+ if unsafe.Sizeof(b) != 1 {
+ throw("bad b")
+ }
+ if unsafe.Sizeof(c) != 2 {
+ throw("bad c")
+ }
+ if unsafe.Sizeof(d) != 2 {
+ throw("bad d")
+ }
+ if unsafe.Sizeof(e) != 4 {
+ throw("bad e")
+ }
+ if unsafe.Sizeof(f) != 4 {
+ throw("bad f")
+ }
+ if unsafe.Sizeof(g) != 8 {
+ throw("bad g")
+ }
+ if unsafe.Sizeof(h) != 8 {
+ throw("bad h")
+ }
+ if unsafe.Sizeof(i) != 4 {
+ throw("bad i")
+ }
+ if unsafe.Sizeof(j) != 8 {
+ throw("bad j")
+ }
+ if unsafe.Sizeof(k) != goarch.PtrSize {
+ throw("bad k")
+ }
+ if unsafe.Sizeof(l) != goarch.PtrSize {
+ throw("bad l")
+ }
+ if unsafe.Sizeof(x1) != 1 {
+ throw("bad unsafe.Sizeof x1")
+ }
+ if unsafe.Offsetof(y1.y) != 1 {
+ throw("bad offsetof y1.y")
+ }
+ if unsafe.Sizeof(y1) != 2 {
+ throw("bad unsafe.Sizeof y1")
+ }
+
+ if timediv(12345*1000000000+54321, 1000000000, &e) != 12345 || e != 54321 {
+ throw("bad timediv")
+ }
+
+ var z uint32
+ z = 1
+ if !atomic.Cas(&z, 1, 2) {
+ throw("cas1")
+ }
+ if z != 2 {
+ throw("cas2")
+ }
+
+ z = 4
+ if atomic.Cas(&z, 5, 6) {
+ throw("cas3")
+ }
+ if z != 4 {
+ throw("cas4")
+ }
+
+ z = 0xffffffff
+ if !atomic.Cas(&z, 0xffffffff, 0xfffffffe) {
+ throw("cas5")
+ }
+ if z != 0xfffffffe {
+ throw("cas6")
+ }
+
+ m = [4]byte{1, 1, 1, 1}
+ atomic.Or8(&m[1], 0xf0)
+ if m[0] != 1 || m[1] != 0xf1 || m[2] != 1 || m[3] != 1 {
+ throw("atomicor8")
+ }
+
+ m = [4]byte{0xff, 0xff, 0xff, 0xff}
+ atomic.And8(&m[1], 0x1)
+ if m[0] != 0xff || m[1] != 0x1 || m[2] != 0xff || m[3] != 0xff {
+ throw("atomicand8")
+ }
+
+ *(*uint64)(unsafe.Pointer(&j)) = ^uint64(0)
+ if j == j {
+ throw("float64nan")
+ }
+ if !(j != j) {
+ throw("float64nan1")
+ }
+
+ *(*uint64)(unsafe.Pointer(&j1)) = ^uint64(1)
+ if j == j1 {
+ throw("float64nan2")
+ }
+ if !(j != j1) {
+ throw("float64nan3")
+ }
+
+ *(*uint32)(unsafe.Pointer(&i)) = ^uint32(0)
+ if i == i {
+ throw("float32nan")
+ }
+	if !(i != i) {
+		throw("float32nan1")
+	}
+
+ *(*uint32)(unsafe.Pointer(&i1)) = ^uint32(1)
+ if i == i1 {
+ throw("float32nan2")
+ }
+	if !(i != i1) {
+		throw("float32nan3")
+	}
+
+ testAtomic64()
+
+ if fixedStack != round2(fixedStack) {
+ throw("FixedStack is not power-of-2")
+ }
+
+ if !checkASM() {
+ throw("assembly checks failed")
+ }
+}
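The float checks above rely on the IEEE 754 rule that a NaN compares unequal to every value, including itself; the all-ones bit patterns ^uint64(0) and ^uint32(0) are NaNs. A tiny standalone illustration using math.Float64frombits instead of unsafe:

package main

import (
	"fmt"
	"math"
)

func main() {
	j := math.Float64frombits(^uint64(0))      // all-ones bit pattern is a NaN
	fmt.Println(j == j, j != j, math.IsNaN(j)) // false true true
}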
+
+type dbgVar struct {
+ name string
+ value *int32 // for variables that can only be set at startup
+ atomic *atomic.Int32 // for variables that can be changed during execution
+ def int32 // default value (ideally zero)
+}
+
+// Holds variables parsed from GODEBUG env var,
+// except for "memprofilerate" since there is an
+// existing int var for that value, which may
+// already have an initial value.
+var debug struct {
+ cgocheck int32
+ clobberfree int32
+ disablethp int32
+ dontfreezetheworld int32
+ efence int32
+ gccheckmark int32
+ gcpacertrace int32
+ gcshrinkstackoff int32
+ gcstoptheworld int32
+ gctrace int32
+ invalidptr int32
+ madvdontneed int32 // for Linux; issue 28466
+ scavtrace int32
+ scheddetail int32
+ schedtrace int32
+ tracebackancestors int32
+ asyncpreemptoff int32
+ harddecommit int32
+ adaptivestackstart int32
+ tracefpunwindoff int32
+
+ // debug.malloc is used as a combined debug check
+ // in the malloc function and should be set
+ // if any of the below debug options is != 0.
+ malloc bool
+ allocfreetrace int32
+ inittrace int32
+ sbrk int32
+
+ panicnil atomic.Int32
+}
+
+var dbgvars = []*dbgVar{
+ {name: "allocfreetrace", value: &debug.allocfreetrace},
+ {name: "clobberfree", value: &debug.clobberfree},
+ {name: "cgocheck", value: &debug.cgocheck},
+ {name: "disablethp", value: &debug.disablethp},
+ {name: "dontfreezetheworld", value: &debug.dontfreezetheworld},
+ {name: "efence", value: &debug.efence},
+ {name: "gccheckmark", value: &debug.gccheckmark},
+ {name: "gcpacertrace", value: &debug.gcpacertrace},
+ {name: "gcshrinkstackoff", value: &debug.gcshrinkstackoff},
+ {name: "gcstoptheworld", value: &debug.gcstoptheworld},
+ {name: "gctrace", value: &debug.gctrace},
+ {name: "invalidptr", value: &debug.invalidptr},
+ {name: "madvdontneed", value: &debug.madvdontneed},
+ {name: "sbrk", value: &debug.sbrk},
+ {name: "scavtrace", value: &debug.scavtrace},
+ {name: "scheddetail", value: &debug.scheddetail},
+ {name: "schedtrace", value: &debug.schedtrace},
+ {name: "tracebackancestors", value: &debug.tracebackancestors},
+ {name: "asyncpreemptoff", value: &debug.asyncpreemptoff},
+ {name: "inittrace", value: &debug.inittrace},
+ {name: "harddecommit", value: &debug.harddecommit},
+ {name: "adaptivestackstart", value: &debug.adaptivestackstart},
+ {name: "tracefpunwindoff", value: &debug.tracefpunwindoff},
+ {name: "panicnil", atomic: &debug.panicnil},
+}
+
+func parsedebugvars() {
+ // defaults
+ debug.cgocheck = 1
+ debug.invalidptr = 1
+ debug.adaptivestackstart = 1 // set this to 0 to turn larger initial goroutine stacks off
+ if GOOS == "linux" {
+ // On Linux, MADV_FREE is faster than MADV_DONTNEED,
+ // but doesn't affect many of the statistics that
+ // MADV_DONTNEED does until the memory is actually
+ // reclaimed. This generally leads to poor user
+ // experience, like confusing stats in top and other
+ // monitoring tools; and bad integration with
+ // management systems that respond to memory usage.
+ // Hence, default to MADV_DONTNEED.
+ debug.madvdontneed = 1
+ }
+
+ godebug := gogetenv("GODEBUG")
+
+ p := new(string)
+ *p = godebug
+ godebugEnv.Store(p)
+
+ // apply runtime defaults, if any
+ for _, v := range dbgvars {
+ if v.def != 0 {
+ // Every var should have either v.value or v.atomic set.
+ if v.value != nil {
+ *v.value = v.def
+ } else if v.atomic != nil {
+ v.atomic.Store(v.def)
+ }
+ }
+ }
+
+ // apply compile-time GODEBUG settings
+ parsegodebug(godebugDefault, nil)
+
+ // apply environment settings
+ parsegodebug(godebug, nil)
+
+ debug.malloc = (debug.allocfreetrace | debug.inittrace | debug.sbrk) != 0
+
+ setTraceback(gogetenv("GOTRACEBACK"))
+ traceback_env = traceback_cache
+}
+
+// reparsedebugvars reparses the runtime's debug variables
+// because the environment variable has been changed to env.
+func reparsedebugvars(env string) {
+ seen := make(map[string]bool)
+ // apply environment settings
+ parsegodebug(env, seen)
+ // apply compile-time GODEBUG settings for as-yet-unseen variables
+ parsegodebug(godebugDefault, seen)
+ // apply defaults for as-yet-unseen variables
+ for _, v := range dbgvars {
+ if v.atomic != nil && !seen[v.name] {
+ v.atomic.Store(0)
+ }
+ }
+}
+
+// parsegodebug parses the godebug string, updating variables listed in dbgvars.
+// If seen == nil, this is startup time and we process the string left to right
+// overwriting older settings with newer ones.
+// If seen != nil, $GODEBUG has changed and we are doing an
+// incremental update. To avoid flapping in the case where a value is
+// set multiple times (perhaps in the default and the environment,
+// or perhaps twice in the environment), we process the string right-to-left
+// and only change values not already seen. After doing this for both
+// the environment and the default settings, the caller must also call
+// cleargodebug(seen) to reset any now-unset values back to their defaults.
+func parsegodebug(godebug string, seen map[string]bool) {
+ for p := godebug; p != ""; {
+ var field string
+ if seen == nil {
+ // startup: process left to right, overwriting older settings with newer
+ i := bytealg.IndexByteString(p, ',')
+ if i < 0 {
+ field, p = p, ""
+ } else {
+ field, p = p[:i], p[i+1:]
+ }
+ } else {
+ // incremental update: process right to left, updating and skipping seen
+ i := len(p) - 1
+ for i >= 0 && p[i] != ',' {
+ i--
+ }
+ if i < 0 {
+ p, field = "", p
+ } else {
+ p, field = p[:i], p[i+1:]
+ }
+ }
+ i := bytealg.IndexByteString(field, '=')
+ if i < 0 {
+ continue
+ }
+ key, value := field[:i], field[i+1:]
+ if seen[key] {
+ continue
+ }
+ if seen != nil {
+ seen[key] = true
+ }
+
+ // Update MemProfileRate directly here since it
+ // is int, not int32, and should only be updated
+ // if specified in GODEBUG.
+ if seen == nil && key == "memprofilerate" {
+ if n, ok := atoi(value); ok {
+ MemProfileRate = n
+ }
+ } else {
+ for _, v := range dbgvars {
+ if v.name == key {
+ if n, ok := atoi32(value); ok {
+ if seen == nil && v.value != nil {
+ *v.value = n
+ } else if v.atomic != nil {
+ v.atomic.Store(n)
+ }
+ }
+ }
+ }
+ }
+ }
+
+ if debug.cgocheck > 1 {
+ throw("cgocheck > 1 mode is no longer supported at runtime. Use GOEXPERIMENT=cgocheck2 at build time instead.")
+ }
+}
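At startup (seen == nil) the string is scanned left to right and newer settings overwrite older ones, so the last occurrence of a key wins; the incremental path walks right to left with the seen map to get the same result without reapplying defaults over the environment. A simplified standalone sketch of just the startup rule (not the runtime's parser):

package main

import (
	"fmt"
	"strings"
)

// parse applies the startup rule: later key=value pairs override earlier ones.
func parse(godebug string) map[string]string {
	settings := make(map[string]string)
	for _, field := range strings.Split(godebug, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue // fields without '=' are ignored
		}
		settings[key] = value
	}
	return settings
}

func main() {
	fmt.Println(parse("gctrace=1,invalidptr=0,gctrace=2"))
	// map[gctrace:2 invalidptr:0]
}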
+
+//go:linkname setTraceback runtime/debug.SetTraceback
+func setTraceback(level string) {
+ var t uint32
+ switch level {
+ case "none":
+ t = 0
+ case "single", "":
+ t = 1 << tracebackShift
+ case "all":
+ t = 1<<tracebackShift | tracebackAll
+ case "system":
+ t = 2<<tracebackShift | tracebackAll
+ case "crash":
+ t = 2<<tracebackShift | tracebackAll | tracebackCrash
+ case "wer":
+ if GOOS == "windows" {
+ t = 2<<tracebackShift | tracebackAll | tracebackCrash
+ enableWER()
+ break
+ }
+ fallthrough
+ default:
+ t = tracebackAll
+ if n, ok := atoi(level); ok && n == int(uint32(n)) {
+ t |= uint32(n) << tracebackShift
+ }
+ }
+	// When C owns the process, simply exiting the process on fatal errors
+	// and panics is surprising. Be louder and abort instead.
+ if islibrary || isarchive {
+ t |= tracebackCrash
+ }
+
+ t |= traceback_env
+
+ atomic.Store(&traceback_cache, t)
+}
+
+// Poor man's 64-bit division.
+// This is a very special function; do not use it unless you are sure what you are doing.
+// int64 division is lowered into a _divv() call on 386, which does not fit into nosplit functions.
+// Handles overflow in a time-specific manner.
+// This keeps us within no-split stack limits on 32-bit processors.
+//
+//go:nosplit
+func timediv(v int64, div int32, rem *int32) int32 {
+ res := int32(0)
+ for bit := 30; bit >= 0; bit-- {
+ if v >= int64(div)<<uint(bit) {
+ v = v - (int64(div) << uint(bit))
+ // Before this for loop, res was 0, thus all these
+ // power of 2 increments are now just bitsets.
+ res |= 1 << uint(bit)
+ }
+ }
+ if v >= int64(div) {
+ if rem != nil {
+ *rem = 0
+ }
+ return 0x7fffffff
+ }
+ if rem != nil {
+ *rem = int32(v)
+ }
+ return res
+}
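timediv is binary long division: for each bit from 30 down it subtracts div<<bit when that fits and sets the corresponding bit of the quotient, so no hardware divide is needed. A standalone sketch with the same worked example check() uses; this is not the nosplit runtime code and omits the overflow clamp:

package main

import "fmt"

// div performs shift-and-subtract division for quotients below 1<<31.
func div(v int64, d int32) (quo, rem int32) {
	for bit := 30; bit >= 0; bit-- {
		if v >= int64(d)<<uint(bit) {
			v -= int64(d) << uint(bit)
			quo |= 1 << uint(bit)
		}
	}
	return quo, int32(v)
}

func main() {
	q, r := div(12345*1000000000+54321, 1000000000)
	fmt.Println(q, r) // 12345 54321
}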
+
+// Helpers for Go. Must be NOSPLIT, must only call NOSPLIT functions, and must not block.
+
+//go:nosplit
+func acquirem() *m {
+ gp := getg()
+ gp.m.locks++
+ return gp.m
+}
+
+//go:nosplit
+func releasem(mp *m) {
+ gp := getg()
+ mp.locks--
+ if mp.locks == 0 && gp.preempt {
+ // restore the preemption request in case we've cleared it in newstack
+ gp.stackguard0 = stackPreempt
+ }
+}
+
+//go:linkname reflect_typelinks reflect.typelinks
+func reflect_typelinks() ([]unsafe.Pointer, [][]int32) {
+ modules := activeModules()
+ sections := []unsafe.Pointer{unsafe.Pointer(modules[0].types)}
+ ret := [][]int32{modules[0].typelinks}
+ for _, md := range modules[1:] {
+ sections = append(sections, unsafe.Pointer(md.types))
+ ret = append(ret, md.typelinks)
+ }
+ return sections, ret
+}
+
+// reflect_resolveNameOff resolves a name offset from a base pointer.
+//
+//go:linkname reflect_resolveNameOff reflect.resolveNameOff
+func reflect_resolveNameOff(ptrInModule unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(resolveNameOff(ptrInModule, nameOff(off)).Bytes)
+}
+
+// reflect_resolveTypeOff resolves an *rtype offset from a base type.
+//
+//go:linkname reflect_resolveTypeOff reflect.resolveTypeOff
+func reflect_resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(toRType((*_type)(rtype)).typeOff(typeOff(off)))
+}
+
+// reflect_resolveTextOff resolves a function pointer offset from a base type.
+//
+//go:linkname reflect_resolveTextOff reflect.resolveTextOff
+func reflect_resolveTextOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return toRType((*_type)(rtype)).textOff(textOff(off))
+
+}
+
+// reflectlite_resolveNameOff resolves a name offset from a base pointer.
+//
+//go:linkname reflectlite_resolveNameOff internal/reflectlite.resolveNameOff
+func reflectlite_resolveNameOff(ptrInModule unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(resolveNameOff(ptrInModule, nameOff(off)).Bytes)
+}
+
+// reflectlite_resolveTypeOff resolves an *rtype offset from a base type.
+//
+//go:linkname reflectlite_resolveTypeOff internal/reflectlite.resolveTypeOff
+func reflectlite_resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer {
+ return unsafe.Pointer(toRType((*_type)(rtype)).typeOff(typeOff(off)))
+}
+
+// reflect_addReflectOff adds a pointer to the reflection offset lookup map.
+//
+//go:linkname reflect_addReflectOff reflect.addReflectOff
+func reflect_addReflectOff(ptr unsafe.Pointer) int32 {
+ reflectOffsLock()
+ if reflectOffs.m == nil {
+ reflectOffs.m = make(map[int32]unsafe.Pointer)
+ reflectOffs.minv = make(map[unsafe.Pointer]int32)
+ reflectOffs.next = -1
+ }
+ id, found := reflectOffs.minv[ptr]
+ if !found {
+ id = reflectOffs.next
+ reflectOffs.next-- // use negative offsets as IDs to aid debugging
+ reflectOffs.m[id] = ptr
+ reflectOffs.minv[ptr] = id
+ }
+ reflectOffsUnlock()
+ return id
+}
diff --git a/src/runtime/runtime2.go b/src/runtime/runtime2.go
new file mode 100644
index 0000000..f4c76ab
--- /dev/null
+++ b/src/runtime/runtime2.go
@@ -0,0 +1,1193 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// defined constants
+const (
+ // G status
+ //
+ // Beyond indicating the general state of a G, the G status
+ // acts like a lock on the goroutine's stack (and hence its
+ // ability to execute user code).
+ //
+ // If you add to this list, add to the list
+ // of "okay during garbage collection" status
+ // in mgcmark.go too.
+ //
+ // TODO(austin): The _Gscan bit could be much lighter-weight.
+ // For example, we could choose not to run _Gscanrunnable
+ // goroutines found in the run queue, rather than CAS-looping
+ // until they become _Grunnable. And transitions like
+ // _Gscanwaiting -> _Gscanrunnable are actually okay because
+ // they don't affect stack ownership.
+
+ // _Gidle means this goroutine was just allocated and has not
+ // yet been initialized.
+ _Gidle = iota // 0
+
+ // _Grunnable means this goroutine is on a run queue. It is
+ // not currently executing user code. The stack is not owned.
+ _Grunnable // 1
+
+ // _Grunning means this goroutine may execute user code. The
+ // stack is owned by this goroutine. It is not on a run queue.
+ // It is assigned an M and a P (g.m and g.m.p are valid).
+ _Grunning // 2
+
+ // _Gsyscall means this goroutine is executing a system call.
+ // It is not executing user code. The stack is owned by this
+ // goroutine. It is not on a run queue. It is assigned an M.
+ _Gsyscall // 3
+
+ // _Gwaiting means this goroutine is blocked in the runtime.
+ // It is not executing user code. It is not on a run queue,
+ // but should be recorded somewhere (e.g., a channel wait
+ // queue) so it can be ready()d when necessary. The stack is
+ // not owned *except* that a channel operation may read or
+ // write parts of the stack under the appropriate channel
+ // lock. Otherwise, it is not safe to access the stack after a
+ // goroutine enters _Gwaiting (e.g., it may get moved).
+ _Gwaiting // 4
+
+ // _Gmoribund_unused is currently unused, but hardcoded in gdb
+ // scripts.
+ _Gmoribund_unused // 5
+
+ // _Gdead means this goroutine is currently unused. It may be
+ // just exited, on a free list, or just being initialized. It
+ // is not executing user code. It may or may not have a stack
+ // allocated. The G and its stack (if any) are owned by the M
+ // that is exiting the G or that obtained the G from the free
+ // list.
+ _Gdead // 6
+
+ // _Genqueue_unused is currently unused.
+ _Genqueue_unused // 7
+
+ // _Gcopystack means this goroutine's stack is being moved. It
+ // is not executing user code and is not on a run queue. The
+ // stack is owned by the goroutine that put it in _Gcopystack.
+ _Gcopystack // 8
+
+ // _Gpreempted means this goroutine stopped itself for a
+ // suspendG preemption. It is like _Gwaiting, but nothing is
+ // yet responsible for ready()ing it. Some suspendG must CAS
+ // the status to _Gwaiting to take responsibility for
+ // ready()ing this G.
+ _Gpreempted // 9
+
+ // _Gscan combined with one of the above states other than
+ // _Grunning indicates that GC is scanning the stack. The
+ // goroutine is not executing user code and the stack is owned
+ // by the goroutine that set the _Gscan bit.
+ //
+ // _Gscanrunning is different: it is used to briefly block
+ // state transitions while GC signals the G to scan its own
+ // stack. This is otherwise like _Grunning.
+ //
+ // atomicstatus&~Gscan gives the state the goroutine will
+ // return to when the scan completes.
+ _Gscan = 0x1000
+ _Gscanrunnable = _Gscan + _Grunnable // 0x1001
+ _Gscanrunning = _Gscan + _Grunning // 0x1002
+ _Gscansyscall = _Gscan + _Gsyscall // 0x1003
+ _Gscanwaiting = _Gscan + _Gwaiting // 0x1004
+ _Gscanpreempted = _Gscan + _Gpreempted // 0x1009
+)
+
+const (
+ // P status
+
+ // _Pidle means a P is not being used to run user code or the
+ // scheduler. Typically, it's on the idle P list and available
+ // to the scheduler, but it may just be transitioning between
+ // other states.
+ //
+ // The P is owned by the idle list or by whatever is
+ // transitioning its state. Its run queue is empty.
+ _Pidle = iota
+
+ // _Prunning means a P is owned by an M and is being used to
+ // run user code or the scheduler. Only the M that owns this P
+ // is allowed to change the P's status from _Prunning. The M
+ // may transition the P to _Pidle (if it has no more work to
+ // do), _Psyscall (when entering a syscall), or _Pgcstop (to
+ // halt for the GC). The M may also hand ownership of the P
+ // off directly to another M (e.g., to schedule a locked G).
+ _Prunning
+
+ // _Psyscall means a P is not running user code. It has
+ // affinity to an M in a syscall but is not owned by it and
+ // may be stolen by another M. This is similar to _Pidle but
+ // uses lightweight transitions and maintains M affinity.
+ //
+ // Leaving _Psyscall must be done with a CAS, either to steal
+ // or retake the P. Note that there's an ABA hazard: even if
+ // an M successfully CASes its original P back to _Prunning
+ // after a syscall, it must understand the P may have been
+ // used by another M in the interim.
+ _Psyscall
+
+ // _Pgcstop means a P is halted for STW and owned by the M
+ // that stopped the world. The M that stopped the world
+ // continues to use its P, even in _Pgcstop. Transitioning
+ // from _Prunning to _Pgcstop causes an M to release its P and
+ // park.
+ //
+ // The P retains its run queue and startTheWorld will restart
+ // the scheduler on Ps with non-empty run queues.
+ _Pgcstop
+
+ // _Pdead means a P is no longer used (GOMAXPROCS shrank). We
+ // reuse Ps if GOMAXPROCS increases. A dead P is mostly
+ // stripped of its resources, though a few things remain
+ // (e.g., trace buffers).
+ _Pdead
+)
+
+// Mutual exclusion locks. In the uncontended case,
+// as fast as spin locks (just a few user-level instructions),
+// but on the contention path they sleep in the kernel.
+// A zeroed Mutex is unlocked (no need to initialize each lock).
+// Initialization is helpful for static lock ranking, but not required.
+type mutex struct {
+ // Empty struct if lock ranking is disabled, otherwise includes the lock rank
+ lockRankStruct
+ // Futex-based impl treats it as uint32 key,
+ // while sema-based impl as M* waitm.
+ // Used to be a union, but unions break precise GC.
+ key uintptr
+}
+
+// sleep and wakeup on one-time events.
+// before any calls to notesleep or notewakeup,
+// must call noteclear to initialize the Note.
+// then, exactly one thread can call notesleep
+// and exactly one thread can call notewakeup (once).
+// once notewakeup has been called, the notesleep
+// will return. future notesleep will return immediately.
+// subsequent noteclear must be called only after
+// previous notesleep has returned, e.g. it's disallowed
+// to call noteclear straight after notewakeup.
+//
+// notetsleep is like notesleep but wakes up after
+// a given number of nanoseconds even if the event
+// has not yet happened. if a goroutine uses notetsleep to
+// wake up early, it must wait to call noteclear until it
+// can be sure that no other goroutine is calling
+// notewakeup.
+//
+// notesleep/notetsleep are generally called on g0,
+// notetsleepg is similar to notetsleep but is called on user g.
+type note struct {
+ // Futex-based impl treats it as uint32 key,
+ // while sema-based impl as M* waitm.
+ // Used to be a union, but unions break precise GC.
+ key uintptr
+}
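For intuition, the one-shot protocol described above (clear, then a single wakeup releases the sleeper, and any later sleep returns immediately) corresponds to closing a channel in portable Go. A hedged sketch, purely illustrative and not the futex or semaphore implementation:

package event

// Note mimics the runtime note protocol with a channel: Clear resets the
// event, Wakeup fires it exactly once, and Sleep blocks until it has fired.
type Note struct{ ch chan struct{} }

func (n *Note) Clear()  { n.ch = make(chan struct{}) }
func (n *Note) Wakeup() { close(n.ch) } // call at most once per Clear
func (n *Note) Sleep()  { <-n.ch }      // later Sleeps return immediately after Wakeup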
+
+type funcval struct {
+ fn uintptr
+ // variable-size, fn-specific data here
+}
+
+type iface struct {
+ tab *itab
+ data unsafe.Pointer
+}
+
+type eface struct {
+ _type *_type
+ data unsafe.Pointer
+}
+
+func efaceOf(ep *any) *eface {
+ return (*eface)(unsafe.Pointer(ep))
+}
+
+// The guintptr, muintptr, and puintptr are all used to bypass write barriers.
+// It is particularly important to avoid write barriers when the current P has
+// been released, because the GC thinks the world is stopped, and an
+// unexpected write barrier would not be synchronized with the GC,
+// which can lead to a half-executed write barrier that has marked the object
+// but not queued it. If the GC skips the object and completes before the
+// queuing can occur, it will incorrectly free the object.
+//
+// We tried using special assignment functions invoked only when not
+// holding a running P, but then some updates to a particular memory
+// word went through write barriers and some did not. This breaks the
+// write barrier shadow checking mode, and it is also scary: better to have
+// a word that is completely ignored by the GC than to have one for which
+// only a few updates are ignored.
+//
+// Gs and Ps are always reachable via true pointers in the
+// allgs and allp lists or (during allocation before they reach those lists)
+// from stack variables.
+//
+// Ms are always reachable via true pointers either from allm or
+// freem. Unlike Gs and Ps we do free Ms, so it's important that
+// nothing ever hold an muintptr across a safe point.
+
+// A guintptr holds a goroutine pointer, but typed as a uintptr
+// to bypass write barriers. It is used in the Gobuf goroutine state
+// and in scheduling lists that are manipulated without a P.
+//
+// The Gobuf.g goroutine pointer is almost always updated by assembly code.
+// In one of the few places it is updated by Go code - func save - it must be
+// treated as a uintptr to avoid a write barrier being emitted at a bad time.
+// Instead of figuring out how to emit the write barriers missing in the
+// assembly manipulation, we change the type of the field to uintptr,
+// so that it does not require write barriers at all.
+//
+// Goroutine structs are published in the allg list and never freed.
+// That will keep the goroutine structs from being collected.
+// There is never a time that Gobuf.g's contain the only references
+// to a goroutine: the publishing of the goroutine in allg comes first.
+// Goroutine pointers are also kept in non-GC-visible places like TLS,
+// so I can't see them ever moving. If we did want to start moving data
+// in the GC, we'd need to allocate the goroutine structs from an
+// alternate arena. Using guintptr doesn't make that problem any worse.
+// Note that pollDesc.rg, pollDesc.wg also store g in uintptr form,
+// so they would need to be updated too if g's start moving.
+type guintptr uintptr
+
+//go:nosplit
+func (gp guintptr) ptr() *g { return (*g)(unsafe.Pointer(gp)) }
+
+//go:nosplit
+func (gp *guintptr) set(g *g) { *gp = guintptr(unsafe.Pointer(g)) }
+
+//go:nosplit
+func (gp *guintptr) cas(old, new guintptr) bool {
+ return atomic.Casuintptr((*uintptr)(unsafe.Pointer(gp)), uintptr(old), uintptr(new))
+}
+
+//go:nosplit
+func (gp *g) guintptr() guintptr {
+ return guintptr(unsafe.Pointer(gp))
+}
+
+// setGNoWB performs *gp = new without a write barrier.
+// For times when it's impractical to use a guintptr.
+//
+//go:nosplit
+//go:nowritebarrier
+func setGNoWB(gp **g, new *g) {
+ (*guintptr)(unsafe.Pointer(gp)).set(new)
+}
+
+type puintptr uintptr
+
+//go:nosplit
+func (pp puintptr) ptr() *p { return (*p)(unsafe.Pointer(pp)) }
+
+//go:nosplit
+func (pp *puintptr) set(p *p) { *pp = puintptr(unsafe.Pointer(p)) }
+
+// muintptr is a *m that is not tracked by the garbage collector.
+//
+// Because we do free Ms, there are some additional constraints on
+// muintptrs:
+//
+// 1. Never hold an muintptr locally across a safe point.
+//
+// 2. Any muintptr in the heap must be owned by the M itself so it can
+// ensure it is not in use when the last true *m is released.
+type muintptr uintptr
+
+//go:nosplit
+func (mp muintptr) ptr() *m { return (*m)(unsafe.Pointer(mp)) }
+
+//go:nosplit
+func (mp *muintptr) set(m *m) { *mp = muintptr(unsafe.Pointer(m)) }
+
+// setMNoWB performs *mp = new without a write barrier.
+// For times when it's impractical to use an muintptr.
+//
+//go:nosplit
+//go:nowritebarrier
+func setMNoWB(mp **m, new *m) {
+ (*muintptr)(unsafe.Pointer(mp)).set(new)
+}
+
+type gobuf struct {
+ // The offsets of sp, pc, and g are known to (hard-coded in) libmach.
+ //
+ // ctxt is unusual with respect to GC: it may be a
+ // heap-allocated funcval, so GC needs to track it, but it
+ // needs to be set and cleared from assembly, where it's
+ // difficult to have write barriers. However, ctxt is really a
+ // saved, live register, and we only ever exchange it between
+ // the real register and the gobuf. Hence, we treat it as a
+ // root during stack scanning, which means assembly that saves
+ // and restores it doesn't need write barriers. It's still
+ // typed as a pointer so that any other writes from Go get
+ // write barriers.
+ sp uintptr
+ pc uintptr
+ g guintptr
+ ctxt unsafe.Pointer
+ ret uintptr
+ lr uintptr
+ bp uintptr // for framepointer-enabled architectures
+}
+
+// sudog represents a g in a wait list, such as for sending/receiving
+// on a channel.
+//
+// sudog is necessary because the g ↔ synchronization object relation
+// is many-to-many. A g can be on many wait lists, so there may be
+// many sudogs for one g; and many gs may be waiting on the same
+// synchronization object, so there may be many sudogs for one object.
+//
+// sudogs are allocated from a special pool. Use acquireSudog and
+// releaseSudog to allocate and free them.
+type sudog struct {
+ // The following fields are protected by the hchan.lock of the
+ // channel this sudog is blocking on. shrinkstack depends on
+ // this for sudogs involved in channel ops.
+
+ g *g
+
+ next *sudog
+ prev *sudog
+ elem unsafe.Pointer // data element (may point to stack)
+
+ // The following fields are never accessed concurrently.
+ // For channels, waitlink is only accessed by g.
+ // For semaphores, all fields (including the ones above)
+ // are only accessed when holding a semaRoot lock.
+
+ acquiretime int64
+ releasetime int64
+ ticket uint32
+
+ // isSelect indicates g is participating in a select, so
+ // g.selectDone must be CAS'd to win the wake-up race.
+ isSelect bool
+
+ // success indicates whether communication over channel c
+ // succeeded. It is true if the goroutine was awoken because a
+ // value was delivered over channel c, and false if awoken
+ // because c was closed.
+ success bool
+
+ parent *sudog // semaRoot binary tree
+ waitlink *sudog // g.waiting list or semaRoot
+ waittail *sudog // semaRoot
+ c *hchan // channel
+}
+
+type libcall struct {
+ fn uintptr
+ n uintptr // number of parameters
+ args uintptr // parameters
+ r1 uintptr // return values
+ r2 uintptr
+ err uintptr // error number
+}
+
+// Stack describes a Go execution stack.
+// The bounds of the stack are exactly [lo, hi),
+// with no implicit data structures on either side.
+type stack struct {
+ lo uintptr
+ hi uintptr
+}
+
+// heldLockInfo gives info on a held lock and the rank of that lock
+type heldLockInfo struct {
+ lockAddr uintptr
+ rank lockRank
+}
+
+type g struct {
+ // Stack parameters.
+ // stack describes the actual stack memory: [stack.lo, stack.hi).
+ // stackguard0 is the stack pointer compared in the Go stack growth prologue.
+ // It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
+ // stackguard1 is the stack pointer compared in the C stack growth prologue.
+ // It is stack.lo+StackGuard on g0 and gsignal stacks.
+ // It is ~0 on other goroutine stacks, to trigger a call to morestackc (and crash).
+ stack stack // offset known to runtime/cgo
+ stackguard0 uintptr // offset known to liblink
+ stackguard1 uintptr // offset known to liblink
+
+ _panic *_panic // innermost panic - offset known to liblink
+ _defer *_defer // innermost defer
+ m *m // current m; offset known to arm liblink
+ sched gobuf
+ syscallsp uintptr // if status==Gsyscall, syscallsp = sched.sp to use during gc
+ syscallpc uintptr // if status==Gsyscall, syscallpc = sched.pc to use during gc
+ stktopsp uintptr // expected sp at top of stack, to check in traceback
+ // param is a generic pointer parameter field used to pass
+ // values in particular contexts where other storage for the
+ // parameter would be difficult to find. It is currently used
+ // in three ways:
+ // 1. When a channel operation wakes up a blocked goroutine, it sets param to
+ // point to the sudog of the completed blocking operation.
+ // 2. By gcAssistAlloc1 to signal back to its caller that the goroutine completed
+ // the GC cycle. It is unsafe to do so in any other way, because the goroutine's
+ // stack may have moved in the meantime.
+ // 3. By debugCallWrap to pass parameters to a new goroutine because allocating a
+ // closure in the runtime is forbidden.
+ param unsafe.Pointer
+ atomicstatus atomic.Uint32
+ stackLock uint32 // sigprof/scang lock; TODO: fold in to atomicstatus
+ goid uint64
+ schedlink guintptr
+ waitsince int64 // approx time when the g become blocked
+ waitreason waitReason // if status==Gwaiting
+
+ preempt bool // preemption signal, duplicates stackguard0 = stackpreempt
+ preemptStop bool // transition to _Gpreempted on preemption; otherwise, just deschedule
+ preemptShrink bool // shrink stack at synchronous safe point
+
+ // asyncSafePoint is set if g is stopped at an asynchronous
+ // safe point. This means there are frames on the stack
+ // without precise pointer information.
+ asyncSafePoint bool
+
+ paniconfault bool // panic (instead of crash) on unexpected fault address
+ gcscandone bool // g has scanned stack; protected by _Gscan bit in status
+ throwsplit bool // must not split stack
+ // activeStackChans indicates that there are unlocked channels
+ // pointing into this goroutine's stack. If true, stack
+ // copying needs to acquire channel locks to protect these
+ // areas of the stack.
+ activeStackChans bool
+ // parkingOnChan indicates that the goroutine is about to
+ // park on a chansend or chanrecv. Used to signal an unsafe point
+ // for stack shrinking.
+ parkingOnChan atomic.Bool
+
+ raceignore int8 // ignore race detection events
+ tracking bool // whether we're tracking this G for sched latency statistics
+ trackingSeq uint8 // used to decide whether to track this G
+ trackingStamp int64 // timestamp of when the G last started being tracked
+ runnableTime int64 // the amount of time spent runnable, cleared when running, only used when tracking
+ lockedm muintptr
+ sig uint32
+ writebuf []byte
+ sigcode0 uintptr
+ sigcode1 uintptr
+ sigpc uintptr
+ parentGoid uint64 // goid of goroutine that created this goroutine
+ gopc uintptr // pc of go statement that created this goroutine
+ ancestors *[]ancestorInfo // ancestor information goroutine(s) that created this goroutine (only used if debug.tracebackancestors)
+ startpc uintptr // pc of goroutine function
+ racectx uintptr
+ waiting *sudog // sudog structures this g is waiting on (that have a valid elem ptr); in lock order
+ cgoCtxt []uintptr // cgo traceback context
+ labels unsafe.Pointer // profiler labels
+ timer *timer // cached timer for time.Sleep
+ selectDone atomic.Uint32 // are we participating in a select and did someone win the race?
+
+ // goroutineProfiled indicates the status of this goroutine's stack for the
+ // current in-progress goroutine profile
+ goroutineProfiled goroutineProfileStateHolder
+
+ // Per-G tracer state.
+ trace gTraceState
+
+ // Per-G GC state
+
+ // gcAssistBytes is this G's GC assist credit in terms of
+ // bytes allocated. If this is positive, then the G has credit
+ // to allocate gcAssistBytes bytes without assisting. If this
+ // is negative, then the G must correct this by performing
+ // scan work. We track this in bytes to make it fast to update
+ // and check for debt in the malloc hot path. The assist ratio
+ // determines how this corresponds to scan work debt.
+ gcAssistBytes int64
+}
+
+// gTrackingPeriod is the number of transitions out of _Grunning between
+// latency tracking runs.
+const gTrackingPeriod = 8
+
+const (
+ // tlsSlots is the number of pointer-sized slots reserved for TLS on some platforms,
+ // like Windows.
+ tlsSlots = 6
+ tlsSize = tlsSlots * goarch.PtrSize
+)
+
+// Values for m.freeWait.
+const (
+ freeMStack = 0 // M done, free stack and reference.
+ freeMRef = 1 // M done, free reference.
+ freeMWait = 2 // M still in use.
+)
+
+type m struct {
+ g0 *g // goroutine with scheduling stack
+ morebuf gobuf // gobuf arg to morestack
+ divmod uint32 // div/mod denominator for arm - known to liblink
+ _ uint32 // align next field to 8 bytes
+
+ // Fields not known to debuggers.
+ procid uint64 // for debuggers, but offset not hard-coded
+ gsignal *g // signal-handling g
+ goSigStack gsignalStack // Go-allocated signal handling stack
+ sigmask sigset // storage for saved signal mask
+ tls [tlsSlots]uintptr // thread-local storage (for x86 extern register)
+ mstartfn func()
+ curg *g // current running goroutine
+ caughtsig guintptr // goroutine running during fatal signal
+ p puintptr // attached p for executing go code (nil if not executing go code)
+ nextp puintptr
+ oldp puintptr // the p that was attached before executing a syscall
+ id int64
+ mallocing int32
+ throwing throwType
+ preemptoff string // if != "", keep curg running on this m
+ locks int32
+ dying int32
+ profilehz int32
+ spinning bool // m is out of work and is actively looking for work
+ blocked bool // m is blocked on a note
+ newSigstack bool // minit on C thread called sigaltstack
+ printlock int8
+ incgo bool // m is executing a cgo call
+ isextra bool // m is an extra m
+ isExtraInC bool // m is an extra m that is not executing Go code
+ freeWait atomic.Uint32 // Whether it is safe to free g0 and delete m (one of freeMRef, freeMStack, freeMWait)
+ fastrand uint64
+ needextram bool
+ traceback uint8
+ ncgocall uint64 // number of cgo calls in total
+ ncgo int32 // number of cgo calls currently in progress
+ cgoCallersUse atomic.Uint32 // if non-zero, cgoCallers in use temporarily
+ cgoCallers *cgoCallers // cgo traceback if crashing in cgo call
+ park note
+ alllink *m // on allm
+ schedlink muintptr
+ lockedg guintptr
+ createstack [32]uintptr // stack that created this thread.
+ lockedExt uint32 // tracking for external LockOSThread
+ lockedInt uint32 // tracking for internal lockOSThread
+ nextwaitm muintptr // next m waiting for lock
+
+ // wait* are used to carry arguments from gopark into park_m, because
+ // there's no stack to put them on. That is their sole purpose.
+ waitunlockf func(*g, unsafe.Pointer) bool
+ waitlock unsafe.Pointer
+ waitTraceBlockReason traceBlockReason
+ waitTraceSkip int
+
+ syscalltick uint32
+ freelink *m // on sched.freem
+ trace mTraceState
+
+ // these are here because they are too large to be on the stack
+ // of low-level NOSPLIT functions.
+ libcall libcall
+ libcallpc uintptr // for cpu profiler
+ libcallsp uintptr
+ libcallg guintptr
+ syscall libcall // stores syscall parameters on windows
+
+ vdsoSP uintptr // SP for traceback while in VDSO call (0 if not in call)
+ vdsoPC uintptr // PC for traceback while in VDSO call
+
+ // preemptGen counts the number of completed preemption
+ // signals. This is used to detect when a preemption is
+ // requested, but fails.
+ preemptGen atomic.Uint32
+
+ // Whether this is a pending preemption signal on this M.
+ signalPending atomic.Uint32
+
+ dlogPerM
+
+ mOS
+
+ // Up to 10 locks held by this m, maintained by the lock ranking code.
+ locksHeldLen int
+ locksHeld [10]heldLockInfo
+}
+
+type p struct {
+ id int32
+ status uint32 // one of pidle/prunning/...
+ link puintptr
+ schedtick uint32 // incremented on every scheduler call
+ syscalltick uint32 // incremented on every system call
+ sysmontick sysmontick // last tick observed by sysmon
+ m muintptr // back-link to associated m (nil if idle)
+ mcache *mcache
+ pcache pageCache
+ raceprocctx uintptr
+
+ deferpool []*_defer // pool of available defer structs (see panic.go)
+ deferpoolbuf [32]*_defer
+
+ // Cache of goroutine ids, amortizes accesses to runtime·sched.goidgen.
+ goidcache uint64
+ goidcacheend uint64
+
+ // Queue of runnable goroutines. Accessed without lock.
+ runqhead uint32
+ runqtail uint32
+ runq [256]guintptr
+ // runnext, if non-nil, is a runnable G that was ready'd by
+ // the current G and should be run next instead of what's in
+ // runq if there's time remaining in the running G's time
+ // slice. It will inherit the time left in the current time
+ // slice. If a set of goroutines is locked in a
+ // communicate-and-wait pattern, this schedules that set as a
+ // unit and eliminates the (potentially large) scheduling
+ // latency that otherwise arises from adding the ready'd
+ // goroutines to the end of the run queue.
+ //
+ // Note that while other P's may atomically CAS this to zero,
+ // only the owner P can CAS it to a valid G.
+ runnext guintptr
+
+ // Available G's (status == Gdead)
+ gFree struct {
+ gList
+ n int32
+ }
+
+ sudogcache []*sudog
+ sudogbuf [128]*sudog
+
+ // Cache of mspan objects from the heap.
+ mspancache struct {
+ // We need an explicit length here because this field is used
+ // in allocation codepaths where write barriers are not allowed,
+ // and eliminating the write barrier/keeping it eliminated from
+ // slice updates is tricky, more so than just managing the length
+ // ourselves.
+ len int
+ buf [128]*mspan
+ }
+
+ // Cache of a single pinner object to reduce allocations from repeated
+ // pinner creation.
+ pinnerCache *pinner
+
+ trace pTraceState
+
+ palloc persistentAlloc // per-P to avoid mutex
+
+ // The when field of the first entry on the timer heap.
+ // This is 0 if the timer heap is empty.
+ timer0When atomic.Int64
+
+ // The earliest known nextwhen field of a timer with
+ // timerModifiedEarlier status. Because the timer may have been
+ // modified again, there need not be any timer with this value.
+ // This is 0 if there are no timerModifiedEarlier timers.
+ timerModifiedEarliest atomic.Int64
+
+ // Per-P GC state
+ gcAssistTime int64 // Nanoseconds in assistAlloc
+ gcFractionalMarkTime int64 // Nanoseconds in fractional mark worker (atomic)
+
+ // limiterEvent tracks events for the GC CPU limiter.
+ limiterEvent limiterEvent
+
+ // gcMarkWorkerMode is the mode for the next mark worker to run in.
+ // That is, this is used to communicate with the worker goroutine
+ // selected for immediate execution by
+ // gcController.findRunnableGCWorker. When scheduling other goroutines,
+ // this field must be set to gcMarkWorkerNotWorker.
+ gcMarkWorkerMode gcMarkWorkerMode
+ // gcMarkWorkerStartTime is the nanotime() at which the most recent
+ // mark worker started.
+ gcMarkWorkerStartTime int64
+
+ // gcw is this P's GC work buffer cache. The work buffer is
+ // filled by write barriers, drained by mutator assists, and
+ // disposed on certain GC state transitions.
+ gcw gcWork
+
+ // wbBuf is this P's GC write barrier buffer.
+ //
+ // TODO: Consider caching this in the running G.
+ wbBuf wbBuf
+
+ runSafePointFn uint32 // if 1, run sched.safePointFn at next safe point
+
+ // statsSeq is a counter indicating whether this P is currently
+ // writing any stats. Its value is even when not, odd when it is.
+ statsSeq atomic.Uint32
+
+ // Lock for timers. We normally access the timers while running
+ // on this P, but the scheduler can also do it from a different P.
+ timersLock mutex
+
+ // Actions to take at some time. This is used to implement the
+ // standard library's time package.
+ // Must hold timersLock to access.
+ timers []*timer
+
+ // Number of timers in P's heap.
+ numTimers atomic.Uint32
+
+ // Number of timerDeleted timers in P's heap.
+ deletedTimers atomic.Uint32
+
+ // Race context used while executing timer functions.
+ timerRaceCtx uintptr
+
+ // maxStackScanDelta accumulates the amount of stack space held by
+ // live goroutines (i.e. those eligible for stack scanning).
+ // Flushed to gcController.maxStackScan once maxStackScanSlack
+ // or -maxStackScanSlack is reached.
+ maxStackScanDelta int64
+
+ // gc-time statistics about current goroutines
+ // Note that this differs from maxStackScan in that this
+ // accumulates the actual stack observed to be used at GC time (hi - sp),
+ // not an instantaneous measure of the total stack size that might need
+ // to be scanned (hi - lo).
+ scannedStackSize uint64 // stack size of goroutines scanned by this P
+ scannedStacks uint64 // number of goroutines scanned by this P
+
+	// preempt is set to indicate that this P should enter the
+ // scheduler ASAP (regardless of what G is running on it).
+ preempt bool
+
+ // pageTraceBuf is a buffer for writing out page allocation/free/scavenge traces.
+ //
+ // Used only if GOEXPERIMENT=pagetrace.
+ pageTraceBuf pageTraceBuf
+
+ // Padding is no longer needed. False sharing is now not a worry because p is large enough
+ // that its size class is an integer multiple of the cache line size (for any of our architectures).
+}
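+
+// The runnext handoff described above is easiest to see from ordinary
+// Go code. Purely an illustrative sketch, not runtime code: in a
+// synchronous ping-pong, each send readies the peer and immediately
+// blocks the sender, so the ready'd goroutine typically lands in
+// runnext and runs next on the same P with the remaining time slice.
+//
+//	req, resp := make(chan int), make(chan int)
+//	go func() {
+//		for v := range req {
+//			resp <- v + 1
+//		}
+//	}()
+//	req <- 1
+//	<-resp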
+
+type schedt struct {
+ goidgen atomic.Uint64
+ lastpoll atomic.Int64 // time of last network poll, 0 if currently polling
+ pollUntil atomic.Int64 // time to which current poll is sleeping
+
+ lock mutex
+
+ // When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
+ // sure to call checkdead().
+
+ midle muintptr // idle m's waiting for work
+ nmidle int32 // number of idle m's waiting for work
+ nmidlelocked int32 // number of locked m's waiting for work
+ mnext int64 // number of m's that have been created and next M ID
+ maxmcount int32 // maximum number of m's allowed (or die)
+ nmsys int32 // number of system m's not counted for deadlock
+ nmfreed int64 // cumulative number of freed m's
+
+ ngsys atomic.Int32 // number of system goroutines
+
+ pidle puintptr // idle p's
+ npidle atomic.Int32
+ nmspinning atomic.Int32 // See "Worker thread parking/unparking" comment in proc.go.
+ needspinning atomic.Uint32 // See "Delicate dance" comment in proc.go. Boolean. Must hold sched.lock to set to 1.
+
+ // Global runnable queue.
+ runq gQueue
+ runqsize int32
+
+ // disable controls selective disabling of the scheduler.
+ //
+ // Use schedEnableUser to control this.
+ //
+ // disable is protected by sched.lock.
+ disable struct {
+ // user disables scheduling of user goroutines.
+ user bool
+ runnable gQueue // pending runnable Gs
+ n int32 // length of runnable
+ }
+
+ // Global cache of dead G's.
+ gFree struct {
+ lock mutex
+ stack gList // Gs with stacks
+ noStack gList // Gs without stacks
+ n int32
+ }
+
+ // Central cache of sudog structs.
+ sudoglock mutex
+ sudogcache *sudog
+
+ // Central pool of available defer structs.
+ deferlock mutex
+ deferpool *_defer
+
+ // freem is the list of m's waiting to be freed when their
+ // m.exited is set. Linked through m.freelink.
+ freem *m
+
+ gcwaiting atomic.Bool // gc is waiting to run
+ stopwait int32
+ stopnote note
+ sysmonwait atomic.Bool
+ sysmonnote note
+
+	// safePointFn should be called on each P at the next GC
+ // safepoint if p.runSafePointFn is set.
+ safePointFn func(*p)
+ safePointWait int32
+ safePointNote note
+
+ profilehz int32 // cpu profiling rate
+
+ procresizetime int64 // nanotime() of last change to gomaxprocs
+ totaltime int64 // ∫gomaxprocs dt up to procresizetime
+
+ // sysmonlock protects sysmon's actions on the runtime.
+ //
+ // Acquire and hold this mutex to block sysmon from interacting
+ // with the rest of the runtime.
+ sysmonlock mutex
+
+ // timeToRun is a distribution of scheduling latencies, defined
+ // as the sum of time a G spends in the _Grunnable state before
+ // it transitions to _Grunning.
+ timeToRun timeHistogram
+
+ // idleTime is the total CPU time Ps have "spent" idle.
+ //
+ // Reset on each GC cycle.
+ idleTime atomic.Int64
+
+ // totalMutexWaitTime is the sum of time goroutines have spent in _Gwaiting
+ // with a waitreason of the form waitReasonSync{RW,}Mutex{R,}Lock.
+ totalMutexWaitTime atomic.Int64
+}
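+
+// The timeToRun histogram above backs the runtime/metrics sample
+// "/sched/latencies:seconds". A minimal sketch of reading it from
+// application code (illustrative only):
+//
+//	// import "runtime/metrics"
+//	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
+//	metrics.Read(s)
+//	hist := s[0].Value.Float64Histogram() // run-queue wait time distribution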
+
+// Values for the flags field of a sigTabT.
+const (
+ _SigNotify = 1 << iota // let signal.Notify have signal, even if from kernel
+ _SigKill // if signal.Notify doesn't take it, exit quietly
+ _SigThrow // if signal.Notify doesn't take it, exit loudly
+ _SigPanic // if the signal is from the kernel, panic
+ _SigDefault // if the signal isn't explicitly requested, don't monitor it
+ _SigGoExit // cause all runtime procs to exit (only used on Plan 9).
+ _SigSetStack // Don't explicitly install handler, but add SA_ONSTACK to existing libc handler
+ _SigUnblock // always unblock; see blockableSig
+ _SigIgn // _SIG_DFL action is to ignore the signal
+)
+
+// Layout of in-memory per-function information prepared by linker
+// See https://golang.org/s/go12symtab.
+// Keep in sync with linker (../cmd/link/internal/ld/pcln.go:/pclntab)
+// and with package debug/gosym and with symtab.go in package runtime.
+type _func struct {
+ sys.NotInHeap // Only in static data
+
+ entryOff uint32 // start pc, as offset from moduledata.text/pcHeader.textStart
+ nameOff int32 // function name, as index into moduledata.funcnametab.
+
+ args int32 // in/out args size
+ deferreturn uint32 // offset of start of a deferreturn call instruction from entry, if any.
+
+ pcsp uint32
+ pcfile uint32
+ pcln uint32
+ npcdata uint32
+ cuOffset uint32 // runtime.cutab offset of this function's CU
+ startLine int32 // line number of start of function (func keyword/TEXT directive)
+ funcID abi.FuncID // set for certain special runtime functions
+ flag abi.FuncFlag
+ _ [1]byte // pad
+ nfuncdata uint8 // must be last, must end on a uint32-aligned boundary
+
+ // The end of the struct is followed immediately by two variable-length
+ // arrays that reference the pcdata and funcdata locations for this
+ // function.
+
+ // pcdata contains the offset into moduledata.pctab for the start of
+ // that index's table. e.g.,
+ // &moduledata.pctab[_func.pcdata[_PCDATA_UnsafePoint]] is the start of
+ // the unsafe point table.
+ //
+ // An offset of 0 indicates that there is no table.
+ //
+ // pcdata [npcdata]uint32
+
+ // funcdata contains the offset past moduledata.gofunc which contains a
+ // pointer to that index's funcdata. e.g.,
+ // *(moduledata.gofunc + _func.funcdata[_FUNCDATA_ArgsPointerMaps]) is
+ // the argument pointer map.
+ //
+ // An offset of ^uint32(0) indicates that there is no entry.
+ //
+ // funcdata [nfuncdata]uint32
+}
+
+// Pseudo-Func that is returned for PCs that occur in inlined code.
+// A *Func can be either a *_func or a *funcinl, and they are distinguished
+// by the first uintptr.
+//
+// TODO(austin): Can we merge this with inlinedCall?
+type funcinl struct {
+ ones uint32 // set to ^0 to distinguish from _func
+ entry uintptr // entry of the real (the "outermost") frame
+ name string
+ file string
+ line int32
+ startLine int32
+}
+
+// layout of Itab known to compilers
+// allocated in non-garbage-collected memory
+// Needs to be in sync with
+// ../cmd/compile/internal/reflectdata/reflect.go:/^func.WriteTabs.
+type itab struct {
+ inter *interfacetype
+ _type *_type
+ hash uint32 // copy of _type.hash. Used for type switches.
+ _ [4]byte
+ fun [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
+}
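+
+// Illustrative only: an itab is materialized when a concrete type is
+// stored in a non-empty interface, e.g.
+//
+//	var r io.Reader = os.Stdin // interface value is (itab for (io.Reader, *os.File), os.Stdin)
+//
+// and the hash field above is the copy consulted by type switches.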
+
+// Lock-free stack node.
+// Also known to export_test.go.
+type lfnode struct {
+ next uint64
+ pushcnt uintptr
+}
+
+type forcegcstate struct {
+ lock mutex
+ g *g
+ idle atomic.Bool
+}
+
+// extendRandom extends the random numbers in r[:n] to the whole slice r.
+// Treats n<0 as n==0.
+func extendRandom(r []byte, n int) {
+ if n < 0 {
+ n = 0
+ }
+ for n < len(r) {
+ // Extend random bits using hash function & time seed
+ w := n
+ if w > 16 {
+ w = 16
+ }
+ h := memhash(unsafe.Pointer(&r[n-w]), uintptr(nanotime()), uintptr(w))
+ for i := 0; i < goarch.PtrSize && n < len(r); i++ {
+ r[n] = byte(h)
+ n++
+ h >>= 8
+ }
+ }
+}
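+
+// The same "stretch a short random prefix with a hash" idea, sketched
+// for ordinary Go code with hash/maphash (illustrative only; the
+// runtime version above cannot take such dependencies):
+//
+//	func extendRandomSketch(r []byte, n int) {
+//		if n < 0 {
+//			n = 0
+//		}
+//		var h maphash.Hash // randomly seeded on first use
+//		for n < len(r) {
+//			w := n
+//			if w > 16 {
+//				w = 16
+//			}
+//			h.Reset()
+//			h.Write(r[n-w : n]) // mix in the bytes produced so far
+//			sum := h.Sum64()
+//			for i := 0; i < 8 && n < len(r); i++ {
+//				r[n] = byte(sum)
+//				n++
+//				sum >>= 8
+//			}
+//		}
+//	}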
+
+// A _defer holds an entry on the list of deferred calls.
+// If you add a field here, add code to clear it in deferProcStack.
+// This struct must match the code in cmd/compile/internal/ssagen/ssa.go:deferstruct
+// and cmd/compile/internal/ssagen/ssa.go:(*state).call.
+// Some defers will be allocated on the stack and some on the heap.
+// All defers are logically part of the stack, so write barriers to
+// initialize them are not required. All defers must be manually scanned,
+// and for heap defers, marked.
+type _defer struct {
+ started bool
+ heap bool
+ // openDefer indicates that this _defer is for a frame with open-coded
+ // defers. We have only one defer record for the entire frame (which may
+ // currently have 0, 1, or more defers active).
+ openDefer bool
+ sp uintptr // sp at time of defer
+ pc uintptr // pc at time of defer
+ fn func() // can be nil for open-coded defers
+ _panic *_panic // panic that is running defer
+ link *_defer // next defer on G; can point to either heap or stack!
+
+ // If openDefer is true, the fields below record values about the stack
+ // frame and associated function that has the open-coded defer(s). sp
+ // above will be the sp for the frame, and pc will be address of the
+ // deferreturn call in the function.
+ fd unsafe.Pointer // funcdata for the function associated with the frame
+ varp uintptr // value of varp for the stack frame
+ // framepc is the current pc associated with the stack frame. Together,
+ // with sp above (which is the sp associated with the stack frame),
+ // framepc/sp can be used as pc/sp pair to continue a stack trace via
+ // gentraceback().
+ framepc uintptr
+}
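+
+// Illustrative only (f is a placeholder): whether a frame's defers can
+// be open-coded, i.e. described by a single record with openDefer set,
+// depends on how the defers appear in the source. Roughly:
+//
+//	func openCoded() {
+//		defer f() // fixed set of defers at the top level of the
+//		defer f() // function: one open-coded record for the frame
+//	}
+//
+//	func notOpenCoded(n int) {
+//		for i := 0; i < n; i++ {
+//			// defer in a loop: a _defer record is allocated
+//			// per iteration instead of open-coding
+//			defer f()
+//		}
+//	}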
+
+// A _panic holds information about an active panic.
+//
+// A _panic value must only ever live on the stack.
+//
+// The argp and link fields are stack pointers, but don't need special
+// handling during stack growth: because they are pointer-typed and
+// _panic values only live on the stack, regular stack pointer
+// adjustment takes care of them.
+type _panic struct {
+ argp unsafe.Pointer // pointer to arguments of deferred call run during panic; cannot move - known to liblink
+ arg any // argument to panic
+ link *_panic // link to earlier panic
+ pc uintptr // where to return to in runtime if this panic is bypassed
+ sp unsafe.Pointer // where to return to in runtime if this panic is bypassed
+ recovered bool // whether this panic is over
+ aborted bool // the panic was aborted
+ goexit bool
+}
+
+// ancestorInfo records details of where a goroutine was started.
+type ancestorInfo struct {
+ pcs []uintptr // pcs from the stack of this goroutine
+ goid uint64 // goroutine id of this goroutine; original goroutine possibly dead
+ gopc uintptr // pc of go statement that created this goroutine
+}
+
+// A waitReason explains why a goroutine has been stopped.
+// See gopark. Do not re-use waitReasons, add new ones.
+type waitReason uint8
+
+const (
+ waitReasonZero waitReason = iota // ""
+ waitReasonGCAssistMarking // "GC assist marking"
+ waitReasonIOWait // "IO wait"
+ waitReasonChanReceiveNilChan // "chan receive (nil chan)"
+ waitReasonChanSendNilChan // "chan send (nil chan)"
+ waitReasonDumpingHeap // "dumping heap"
+ waitReasonGarbageCollection // "garbage collection"
+ waitReasonGarbageCollectionScan // "garbage collection scan"
+ waitReasonPanicWait // "panicwait"
+ waitReasonSelect // "select"
+ waitReasonSelectNoCases // "select (no cases)"
+ waitReasonGCAssistWait // "GC assist wait"
+ waitReasonGCSweepWait // "GC sweep wait"
+ waitReasonGCScavengeWait // "GC scavenge wait"
+ waitReasonChanReceive // "chan receive"
+ waitReasonChanSend // "chan send"
+ waitReasonFinalizerWait // "finalizer wait"
+ waitReasonForceGCIdle // "force gc (idle)"
+ waitReasonSemacquire // "semacquire"
+ waitReasonSleep // "sleep"
+ waitReasonSyncCondWait // "sync.Cond.Wait"
+ waitReasonSyncMutexLock // "sync.Mutex.Lock"
+ waitReasonSyncRWMutexRLock // "sync.RWMutex.RLock"
+ waitReasonSyncRWMutexLock // "sync.RWMutex.Lock"
+ waitReasonTraceReaderBlocked // "trace reader (blocked)"
+ waitReasonWaitForGCCycle // "wait for GC cycle"
+ waitReasonGCWorkerIdle // "GC worker (idle)"
+ waitReasonGCWorkerActive // "GC worker (active)"
+ waitReasonPreempted // "preempted"
+ waitReasonDebugCall // "debug call"
+ waitReasonGCMarkTermination // "GC mark termination"
+ waitReasonStoppingTheWorld // "stopping the world"
+)
+
+var waitReasonStrings = [...]string{
+ waitReasonZero: "",
+ waitReasonGCAssistMarking: "GC assist marking",
+ waitReasonIOWait: "IO wait",
+ waitReasonChanReceiveNilChan: "chan receive (nil chan)",
+ waitReasonChanSendNilChan: "chan send (nil chan)",
+ waitReasonDumpingHeap: "dumping heap",
+ waitReasonGarbageCollection: "garbage collection",
+ waitReasonGarbageCollectionScan: "garbage collection scan",
+ waitReasonPanicWait: "panicwait",
+ waitReasonSelect: "select",
+ waitReasonSelectNoCases: "select (no cases)",
+ waitReasonGCAssistWait: "GC assist wait",
+ waitReasonGCSweepWait: "GC sweep wait",
+ waitReasonGCScavengeWait: "GC scavenge wait",
+ waitReasonChanReceive: "chan receive",
+ waitReasonChanSend: "chan send",
+ waitReasonFinalizerWait: "finalizer wait",
+ waitReasonForceGCIdle: "force gc (idle)",
+ waitReasonSemacquire: "semacquire",
+ waitReasonSleep: "sleep",
+ waitReasonSyncCondWait: "sync.Cond.Wait",
+ waitReasonSyncMutexLock: "sync.Mutex.Lock",
+ waitReasonSyncRWMutexRLock: "sync.RWMutex.RLock",
+ waitReasonSyncRWMutexLock: "sync.RWMutex.Lock",
+ waitReasonTraceReaderBlocked: "trace reader (blocked)",
+ waitReasonWaitForGCCycle: "wait for GC cycle",
+ waitReasonGCWorkerIdle: "GC worker (idle)",
+ waitReasonGCWorkerActive: "GC worker (active)",
+ waitReasonPreempted: "preempted",
+ waitReasonDebugCall: "debug call",
+ waitReasonGCMarkTermination: "GC mark termination",
+ waitReasonStoppingTheWorld: "stopping the world",
+}
+
+func (w waitReason) String() string {
+ if w < 0 || w >= waitReason(len(waitReasonStrings)) {
+ return "unknown wait reason"
+ }
+ return waitReasonStrings[w]
+}
+
+func (w waitReason) isMutexWait() bool {
+ return w == waitReasonSyncMutexLock ||
+ w == waitReasonSyncRWMutexRLock ||
+ w == waitReasonSyncRWMutexLock
+}
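+
+// The strings above are what goroutine dumps and tracebacks print in
+// the status position; for example, a goroutine parked in a channel
+// receive appears as
+//
+//	goroutine 17 [chan receive]:
+//
+// isMutexWait is how mutex-flavored waits are recognized when
+// accumulating sched.totalMutexWaitTime.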
+
+var (
+ allm *m
+ gomaxprocs int32
+ ncpu int32
+ forcegc forcegcstate
+ sched schedt
+ newprocs int32
+
+ // allpLock protects P-less reads and size changes of allp, idlepMask,
+ // and timerpMask, and all writes to allp.
+ allpLock mutex
+ // len(allp) == gomaxprocs; may change at safe points, otherwise
+ // immutable.
+ allp []*p
+ // Bitmask of Ps in _Pidle list, one bit per P. Reads and writes must
+ // be atomic. Length may change at safe points.
+ //
+ // Each P must update only its own bit. In order to maintain
+	// consistency, a P going idle must update the idle mask simultaneously with
+ // updates to the idle P list under the sched.lock, otherwise a racing
+ // pidleget may clear the mask before pidleput sets the mask,
+ // corrupting the bitmap.
+ //
+ // N.B., procresize takes ownership of all Ps in stopTheWorldWithSema.
+ idlepMask pMask
+ // Bitmask of Ps that may have a timer, one bit per P. Reads and writes
+ // must be atomic. Length may change at safe points.
+ timerpMask pMask
+
+ // Pool of GC parked background workers. Entries are type
+ // *gcBgMarkWorkerNode.
+ gcBgMarkWorkerPool lfstack
+
+ // Total number of gcBgMarkWorker goroutines. Protected by worldsema.
+ gcBgMarkWorkerCount int32
+
+ // Information about what cpu features are available.
+ // Packages outside the runtime should not use these
+ // as they are not an external api.
+ // Set on startup in asm_{386,amd64}.s
+ processorVersionInfo uint32
+ isIntel bool
+
+ goarm uint8 // set by cmd/link on arm systems
+)
+
+// Set by the linker so the runtime can determine the buildmode.
+var (
+ islibrary bool // -buildmode=c-shared
+ isarchive bool // -buildmode=c-archive
+)
+
+// Must agree with internal/buildcfg.FramePointerEnabled.
+const framepointer_enabled = GOARCH == "amd64" || GOARCH == "arm64"
diff --git a/src/runtime/runtime_boring.go b/src/runtime/runtime_boring.go
new file mode 100644
index 0000000..5a98b20
--- /dev/null
+++ b/src/runtime/runtime_boring.go
@@ -0,0 +1,19 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname boring_runtime_arg0 crypto/internal/boring.runtime_arg0
+func boring_runtime_arg0() string {
+ // On Windows, argslice is not set, and it's too much work to find argv0.
+ if len(argslice) == 0 {
+ return ""
+ }
+ return argslice[0]
+}
+
+//go:linkname fipstls_runtime_arg0 crypto/internal/boring/fipstls.runtime_arg0
+func fipstls_runtime_arg0() string { return boring_runtime_arg0() }
diff --git a/src/runtime/runtime_linux_test.go b/src/runtime/runtime_linux_test.go
new file mode 100644
index 0000000..6af5561
--- /dev/null
+++ b/src/runtime/runtime_linux_test.go
@@ -0,0 +1,65 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ . "runtime"
+ "syscall"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+var pid, tid int
+
+func init() {
+ // Record pid and tid of init thread for use during test.
+ // The call to LockOSThread is just to exercise it;
+ // we can't test that it does anything.
+ // Instead we're testing that the conditions are good
+ // for how it is used in init (must be on main thread).
+ pid, tid = syscall.Getpid(), syscall.Gettid()
+ LockOSThread()
+
+ sysNanosleep = func(d time.Duration) {
+ // Invoke a blocking syscall directly; calling time.Sleep()
+ // would deschedule the goroutine instead.
+ ts := syscall.NsecToTimespec(d.Nanoseconds())
+ for {
+ if err := syscall.Nanosleep(&ts, &ts); err != syscall.EINTR {
+ return
+ }
+ }
+ }
+}
+
+func TestLockOSThread(t *testing.T) {
+ if pid != tid {
+ t.Fatalf("pid=%d but tid=%d", pid, tid)
+ }
+}
+
+// Test that error values are negative.
+// Use a misaligned pointer to get -EINVAL.
+func TestMincoreErrorSign(t *testing.T) {
+ var dst byte
+ v := Mincore(Add(unsafe.Pointer(new(int32)), 1), 1, &dst)
+
+ const EINVAL = 0x16
+ if v != -EINVAL {
+ t.Errorf("mincore = %v, want %v", v, -EINVAL)
+ }
+}
+
+func TestKernelStructSize(t *testing.T) {
+ // Check that the Go definitions of structures exchanged with the kernel are
+ // the same size as what the kernel defines.
+ if have, want := unsafe.Sizeof(Siginfo{}), uintptr(SiginfoMaxSize); have != want {
+ t.Errorf("Go's siginfo struct is %d bytes long; kernel expects %d", have, want)
+ }
+ if have, want := unsafe.Sizeof(Sigevent{}), uintptr(SigeventMaxSize); have != want {
+ t.Errorf("Go's sigevent struct is %d bytes long; kernel expects %d", have, want)
+ }
+}
diff --git a/src/runtime/runtime_mmap_test.go b/src/runtime/runtime_mmap_test.go
new file mode 100644
index 0000000..456f913
--- /dev/null
+++ b/src/runtime/runtime_mmap_test.go
@@ -0,0 +1,53 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+// Test that the error value returned by mmap is positive, as that is
+// what the code in mem_bsd.go, mem_darwin.go, and mem_linux.go expects.
+// See the uses of ENOMEM in sysMap in those files.
+func TestMmapErrorSign(t *testing.T) {
+ p, err := runtime.Mmap(nil, ^uintptr(0)&^(runtime.GetPhysPageSize()-1), 0, runtime.MAP_ANON|runtime.MAP_PRIVATE, -1, 0)
+
+ if p != nil || err != runtime.ENOMEM {
+ t.Errorf("mmap = %v, %v, want nil, %v", p, err, runtime.ENOMEM)
+ }
+}
+
+func TestPhysPageSize(t *testing.T) {
+ // Mmap fails if the address is not page aligned, so we can
+ // use this to test if the page size is the true page size.
+ ps := runtime.GetPhysPageSize()
+
+ // Get a region of memory to play with. This should be page-aligned.
+ b, err := runtime.Mmap(nil, 2*ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE, -1, 0)
+ if err != 0 {
+ t.Fatalf("Mmap: %v", err)
+ }
+
+ if runtime.GOOS == "aix" {
+ // AIX does not allow mapping a range that is already mapped.
+ runtime.Munmap(unsafe.Pointer(uintptr(b)), 2*ps)
+ }
+
+ // Mmap should fail at a half page into the buffer.
+ _, err = runtime.Mmap(unsafe.Pointer(uintptr(b)+ps/2), ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE|runtime.MAP_FIXED, -1, 0)
+ if err == 0 {
+ t.Errorf("Mmap should have failed with half-page alignment %d, but succeeded: %v", ps/2, err)
+ }
+
+ // Mmap should succeed at a full page into the buffer.
+ _, err = runtime.Mmap(unsafe.Pointer(uintptr(b)+ps), ps, 0, runtime.MAP_ANON|runtime.MAP_PRIVATE|runtime.MAP_FIXED, -1, 0)
+ if err != 0 {
+ t.Errorf("Mmap at full-page alignment %d failed: %v", ps, err)
+ }
+}
diff --git a/src/runtime/runtime_test.go b/src/runtime/runtime_test.go
new file mode 100644
index 0000000..0839cd9
--- /dev/null
+++ b/src/runtime/runtime_test.go
@@ -0,0 +1,543 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "flag"
+ "fmt"
+ "io"
+ . "runtime"
+ "runtime/debug"
+ "sort"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+ "unsafe"
+)
+
+// flagQuick is set by the -quick option to skip some relatively slow tests.
+// This is used by the cmd/dist test runtime:cpu124.
+// The cmd/dist test passes both -test.short and -quick;
+// there are tests that only check testing.Short, and those tests will
+// not be skipped if only -quick is used.
+var flagQuick = flag.Bool("quick", false, "skip slow tests, for cmd/dist test runtime:cpu124")
+
+func init() {
+ // We're testing the runtime, so make tracebacks show things
+ // in the runtime. This only raises the level, so it won't
+ // override GOTRACEBACK=crash from the user.
+ SetTracebackEnv("system")
+}
+
+var errf error
+
+func errfn() error {
+ return errf
+}
+
+func errfn1() error {
+ return io.EOF
+}
+
+func BenchmarkIfaceCmp100(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if errfn() == io.EOF {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkIfaceCmpNil100(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if errfn1() == nil {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+var efaceCmp1 any
+var efaceCmp2 any
+
+func BenchmarkEfaceCmpDiff(b *testing.B) {
+ x := 5
+ efaceCmp1 = &x
+ y := 6
+ efaceCmp2 = &y
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if efaceCmp1 == efaceCmp2 {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkEfaceCmpDiffIndirect(b *testing.B) {
+ efaceCmp1 = [2]int{1, 2}
+ efaceCmp2 = [2]int{1, 2}
+ for i := 0; i < b.N; i++ {
+ for j := 0; j < 100; j++ {
+ if efaceCmp1 != efaceCmp2 {
+ b.Fatal("bad comparison")
+ }
+ }
+ }
+}
+
+func BenchmarkDefer(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer1()
+ }
+}
+
+func defer1() {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+}
+
+func BenchmarkDefer10(b *testing.B) {
+ for i := 0; i < b.N/10; i++ {
+ defer2()
+ }
+}
+
+func defer2() {
+ for i := 0; i < 10; i++ {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+ }
+}
+
+func BenchmarkDeferMany(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer func(x, y, z int) {
+ if recover() != nil || x != 1 || y != 2 || z != 3 {
+ panic("bad recover")
+ }
+ }(1, 2, 3)
+ }
+}
+
+func BenchmarkPanicRecover(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ defer3()
+ }
+}
+
+func defer3() {
+ defer func(x, y, z int) {
+ if recover() == nil {
+ panic("failed recover")
+ }
+ }(1, 2, 3)
+ panic("hi")
+}
+
+// golang.org/issue/7063
+func TestStopCPUProfilingWithProfilerOff(t *testing.T) {
+ SetCPUProfileRate(0)
+}
+
+// Addresses to test for faulting behavior.
+// This is less a test of SetPanicOnFault and more a check that
+// the operating system and the runtime can process these faults
+// correctly. That is, we're indirectly testing that without SetPanicOnFault
+// these would manage to turn into ordinary crashes.
+// Note that these are truncated on 32-bit systems, so the bottom 32 bits
+// of the larger addresses must themselves be invalid addresses.
+// We might get unlucky and the OS might have mapped one of these
+// addresses, but probably not: they're all in the first page, very high
+// addresses that normally an OS would reserve for itself, or malformed
+// addresses. Even so, we might have to remove one or two on different
+// systems. We will see.
+
+var faultAddrs = []uint64{
+ // low addresses
+ 0,
+ 1,
+ 0xfff,
+ // high (kernel) addresses
+ // or else malformed.
+ 0xffffffffffffffff,
+ 0xfffffffffffff001,
+ 0xffffffffffff0001,
+ 0xfffffffffff00001,
+ 0xffffffffff000001,
+ 0xfffffffff0000001,
+ 0xffffffff00000001,
+ 0xfffffff000000001,
+ 0xffffff0000000001,
+ 0xfffff00000000001,
+ 0xffff000000000001,
+ 0xfff0000000000001,
+ 0xff00000000000001,
+ 0xf000000000000001,
+ 0x8000000000000001,
+}
+
+func TestSetPanicOnFault(t *testing.T) {
+ old := debug.SetPanicOnFault(true)
+ defer debug.SetPanicOnFault(old)
+
+ nfault := 0
+ for _, addr := range faultAddrs {
+ testSetPanicOnFault(t, uintptr(addr), &nfault)
+ }
+ if nfault == 0 {
+ t.Fatalf("none of the addresses faulted")
+ }
+}
+
+// testSetPanicOnFault tests one potentially faulting address.
+// It deliberately constructs and uses an invalid pointer,
+// so mark it as nocheckptr.
+//
+//go:nocheckptr
+func testSetPanicOnFault(t *testing.T, addr uintptr, nfault *int) {
+ if GOOS == "js" || GOOS == "wasip1" {
+ t.Skip(GOOS + " does not support catching faults")
+ }
+
+ defer func() {
+ if err := recover(); err != nil {
+ *nfault++
+ }
+ }()
+
+ // The read should fault, except that sometimes we hit
+ // addresses that have had C or kernel pages mapped there
+ // readable by user code. So just log the content.
+ // If no addresses fault, we'll fail the test.
+ v := *(*byte)(unsafe.Pointer(addr))
+ t.Logf("addr %#x: %#x\n", addr, v)
+}
+
+func eqstring_generic(s1, s2 string) bool {
+ if len(s1) != len(s2) {
+ return false
+ }
+ // optimization in assembly versions:
+ // if s1.str == s2.str { return true }
+ for i := 0; i < len(s1); i++ {
+ if s1[i] != s2[i] {
+ return false
+ }
+ }
+ return true
+}
+
+func TestEqString(t *testing.T) {
+ // This isn't really an exhaustive test of == on strings, it's
+ // just a convenient way of documenting (via eqstring_generic)
+ // what == does.
+ s := []string{
+ "",
+ "a",
+ "c",
+ "aaa",
+ "ccc",
+ "cccc"[:3], // same contents, different string
+ "1234567890",
+ }
+ for _, s1 := range s {
+ for _, s2 := range s {
+ x := s1 == s2
+ y := eqstring_generic(s1, s2)
+ if x != y {
+ t.Errorf(`("%s" == "%s") = %t, want %t`, s1, s2, x, y)
+ }
+ }
+ }
+}
+
+func TestTrailingZero(t *testing.T) {
+ // make sure we add padding for structs with trailing zero-sized fields
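+	// (the padding keeps a pointer to the trailing field from pointing just past the object)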
+ type T1 struct {
+ n int32
+ z [0]byte
+ }
+ if unsafe.Sizeof(T1{}) != 8 {
+ t.Errorf("sizeof(%#v)==%d, want 8", T1{}, unsafe.Sizeof(T1{}))
+ }
+ type T2 struct {
+ n int64
+ z struct{}
+ }
+ if unsafe.Sizeof(T2{}) != 8+unsafe.Sizeof(uintptr(0)) {
+ t.Errorf("sizeof(%#v)==%d, want %d", T2{}, unsafe.Sizeof(T2{}), 8+unsafe.Sizeof(uintptr(0)))
+ }
+ type T3 struct {
+ n byte
+ z [4]struct{}
+ }
+ if unsafe.Sizeof(T3{}) != 2 {
+ t.Errorf("sizeof(%#v)==%d, want 2", T3{}, unsafe.Sizeof(T3{}))
+ }
+ // make sure padding can double for both zerosize and alignment
+ type T4 struct {
+ a int32
+ b int16
+ c int8
+ z struct{}
+ }
+ if unsafe.Sizeof(T4{}) != 8 {
+ t.Errorf("sizeof(%#v)==%d, want 8", T4{}, unsafe.Sizeof(T4{}))
+ }
+ // make sure we don't pad a zero-sized thing
+ type T5 struct {
+ }
+ if unsafe.Sizeof(T5{}) != 0 {
+ t.Errorf("sizeof(%#v)==%d, want 0", T5{}, unsafe.Sizeof(T5{}))
+ }
+}
+
+func TestAppendGrowth(t *testing.T) {
+ var x []int64
+ check := func(want int) {
+ if cap(x) != want {
+ t.Errorf("len=%d, cap=%d, want cap=%d", len(x), cap(x), want)
+ }
+ }
+
+ check(0)
+ want := 1
+ for i := 1; i <= 100; i++ {
+ x = append(x, 1)
+ check(want)
+ if i&(i-1) == 0 {
+ want = 2 * i
+ }
+ }
+}
+
+var One = []int64{1}
+
+func TestAppendSliceGrowth(t *testing.T) {
+ var x []int64
+ check := func(want int) {
+ if cap(x) != want {
+ t.Errorf("len=%d, cap=%d, want cap=%d", len(x), cap(x), want)
+ }
+ }
+
+ check(0)
+ want := 1
+ for i := 1; i <= 100; i++ {
+ x = append(x, One...)
+ check(want)
+ if i&(i-1) == 0 {
+ want = 2 * i
+ }
+ }
+}
+
+func TestGoroutineProfileTrivial(t *testing.T) {
+ // Calling GoroutineProfile twice in a row should find the same number of goroutines,
+ // but it's possible there are goroutines just about to exit, so we might end up
+ // with fewer in the second call. Try a few times; it should converge once those
+ // zombies are gone.
+ for i := 0; ; i++ {
+ n1, ok := GoroutineProfile(nil) // should fail, there's at least 1 goroutine
+ if n1 < 1 || ok {
+ t.Fatalf("GoroutineProfile(nil) = %d, %v, want >0, false", n1, ok)
+ }
+ n2, ok := GoroutineProfile(make([]StackRecord, n1))
+ if n2 == n1 && ok {
+ break
+ }
+ t.Logf("GoroutineProfile(%d) = %d, %v, want %d, true", n1, n2, ok, n1)
+ if i >= 10 {
+ t.Fatalf("GoroutineProfile not converging")
+ }
+ }
+}
+
+func BenchmarkGoroutineProfile(b *testing.B) {
+ run := func(fn func() bool) func(b *testing.B) {
+ runOne := func(b *testing.B) {
+ latencies := make([]time.Duration, 0, b.N)
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ start := time.Now()
+ ok := fn()
+ if !ok {
+ b.Fatal("goroutine profile failed")
+ }
+ latencies = append(latencies, time.Since(start))
+ }
+ b.StopTimer()
+
+ // Sort latencies then report percentiles.
+ sort.Slice(latencies, func(i, j int) bool {
+ return latencies[i] < latencies[j]
+ })
+ b.ReportMetric(float64(latencies[len(latencies)*50/100]), "p50-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*90/100]), "p90-ns")
+ b.ReportMetric(float64(latencies[len(latencies)*99/100]), "p99-ns")
+ }
+ return func(b *testing.B) {
+ b.Run("idle", runOne)
+
+ b.Run("loaded", func(b *testing.B) {
+ stop := applyGCLoad(b)
+ runOne(b)
+ // Make sure to stop the timer before we wait! The load created above
+ // is very heavy-weight and not easy to stop, so we could end up
+ // confusing the benchmarking framework for small b.N.
+ b.StopTimer()
+ stop()
+ })
+ }
+ }
+
+ // Measure the cost of counting goroutines
+ b.Run("small-nil", run(func() bool {
+ GoroutineProfile(nil)
+ return true
+ }))
+
+ // Measure the cost with a small set of goroutines
+ n := NumGoroutine()
+ p := make([]StackRecord, 2*n+2*GOMAXPROCS(0))
+ b.Run("small", run(func() bool {
+ _, ok := GoroutineProfile(p)
+ return ok
+ }))
+
+ // Measure the cost with a large set of goroutines
+ ch := make(chan int)
+ var ready, done sync.WaitGroup
+ for i := 0; i < 5000; i++ {
+ ready.Add(1)
+ done.Add(1)
+ go func() { ready.Done(); <-ch; done.Done() }()
+ }
+ ready.Wait()
+
+ // Count goroutines with a large allgs list
+ b.Run("large-nil", run(func() bool {
+ GoroutineProfile(nil)
+ return true
+ }))
+
+ n = NumGoroutine()
+ p = make([]StackRecord, 2*n+2*GOMAXPROCS(0))
+ b.Run("large", run(func() bool {
+ _, ok := GoroutineProfile(p)
+ return ok
+ }))
+
+ close(ch)
+ done.Wait()
+
+ // Count goroutines with a large (but unused) allgs list
+ b.Run("sparse-nil", run(func() bool {
+ GoroutineProfile(nil)
+ return true
+ }))
+
+ // Measure the cost of a large (but unused) allgs list
+ n = NumGoroutine()
+ p = make([]StackRecord, 2*n+2*GOMAXPROCS(0))
+ b.Run("sparse", run(func() bool {
+ _, ok := GoroutineProfile(p)
+ return ok
+ }))
+}
+
+func TestVersion(t *testing.T) {
+ // Test that version does not contain \r or \n.
+ vers := Version()
+ if strings.Contains(vers, "\r") || strings.Contains(vers, "\n") {
+ t.Fatalf("cr/nl in version: %q", vers)
+ }
+}
+
+func TestTimediv(t *testing.T) {
+ for _, tc := range []struct {
+ num int64
+ div int32
+ ret int32
+ rem int32
+ }{
+ {
+ num: 8,
+ div: 2,
+ ret: 4,
+ rem: 0,
+ },
+ {
+ num: 9,
+ div: 2,
+ ret: 4,
+ rem: 1,
+ },
+ {
+ // Used by runtime.check.
+ num: 12345*1000000000 + 54321,
+ div: 1000000000,
+ ret: 12345,
+ rem: 54321,
+ },
+ {
+ num: 1<<32 - 1,
+ div: 2,
+ ret: 1<<31 - 1, // no overflow.
+ rem: 1,
+ },
+ {
+ num: 1 << 32,
+ div: 2,
+ ret: 1<<31 - 1, // overflow.
+ rem: 0,
+ },
+ {
+ num: 1 << 40,
+ div: 2,
+ ret: 1<<31 - 1, // overflow.
+ rem: 0,
+ },
+ {
+ num: 1<<40 + 1,
+ div: 1 << 10,
+ ret: 1 << 30,
+ rem: 1,
+ },
+ } {
+ name := fmt.Sprintf("%d div %d", tc.num, tc.div)
+ t.Run(name, func(t *testing.T) {
+ // Double check that the inputs make sense using
+ // standard 64-bit division.
+ ret64 := tc.num / int64(tc.div)
+ rem64 := tc.num % int64(tc.div)
+ if ret64 != int64(int32(ret64)) {
+ // Simulate timediv overflow value.
+ ret64 = 1<<31 - 1
+ rem64 = 0
+ }
+ if ret64 != int64(tc.ret) {
+ t.Errorf("%d / %d got ret %d rem %d want ret %d rem %d", tc.num, tc.div, ret64, rem64, tc.ret, tc.rem)
+ }
+
+ var rem int32
+ ret := Timediv(tc.num, tc.div, &rem)
+ if ret != tc.ret || rem != tc.rem {
+ t.Errorf("timediv %d / %d got ret %d rem %d want ret %d rem %d", tc.num, tc.div, ret, rem, tc.ret, tc.rem)
+ }
+ })
+ }
+}
diff --git a/src/runtime/runtime_unix_test.go b/src/runtime/runtime_unix_test.go
new file mode 100644
index 0000000..642a946
--- /dev/null
+++ b/src/runtime/runtime_unix_test.go
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Only works on systems with syscall.Close.
+// We need a fast system call to provoke the race,
+// and Close(-1) is nearly universally fast.
+
+//go:build aix || darwin || dragonfly || freebsd || linux || netbsd || openbsd || plan9
+
+package runtime_test
+
+import (
+ "runtime"
+ "sync"
+ "sync/atomic"
+ "syscall"
+ "testing"
+)
+
+func TestGoroutineProfile(t *testing.T) {
+ // GoroutineProfile used to use the wrong starting sp for
+ // goroutines coming out of system calls, causing possible
+ // crashes.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(100))
+
+ var stop uint32
+ defer atomic.StoreUint32(&stop, 1) // in case of panic
+
+ var wg sync.WaitGroup
+ for i := 0; i < 4; i++ {
+ wg.Add(1)
+ go func() {
+ for atomic.LoadUint32(&stop) == 0 {
+ syscall.Close(-1)
+ }
+ wg.Done()
+ }()
+ }
+
+ max := 10000
+ if testing.Short() {
+ max = 100
+ }
+ stk := make([]runtime.StackRecord, 128)
+ for n := 0; n < max; n++ {
+ _, ok := runtime.GoroutineProfile(stk)
+ if !ok {
+ t.Fatalf("GoroutineProfile failed")
+ }
+ }
+
+ // If the program didn't crash, we passed.
+ atomic.StoreUint32(&stop, 1)
+ wg.Wait()
+}
diff --git a/src/runtime/rwmutex.go b/src/runtime/rwmutex.go
new file mode 100644
index 0000000..34d8f67
--- /dev/null
+++ b/src/runtime/rwmutex.go
@@ -0,0 +1,167 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+)
+
+// This is a copy of sync/rwmutex.go rewritten to work in the runtime.
+
+// A rwmutex is a reader/writer mutual exclusion lock.
+// The lock can be held by an arbitrary number of readers or a single writer.
+// This is a variant of sync.RWMutex, for the runtime package.
+// Like mutex, rwmutex blocks the calling M.
+// It does not interact with the goroutine scheduler.
+type rwmutex struct {
+ rLock mutex // protects readers, readerPass, writer
+ readers muintptr // list of pending readers
+ readerPass uint32 // number of pending readers to skip readers list
+
+ wLock mutex // serializes writers
+ writer muintptr // pending writer waiting for completing readers
+
+ readerCount atomic.Int32 // number of pending readers
+ readerWait atomic.Int32 // number of departing readers
+
+ readRank lockRank // semantic lock rank for read locking
+}
+
+// Lock ranking an rwmutex has two aspects:
+//
+// Semantic ranking: this rwmutex represents some higher level lock that
+// protects some resource (e.g., allocmLock protects creation of new Ms). The
+// read and write locks of that resource need to be represented in the lock
+// rank.
+//
+// Internal ranking: as an implementation detail, rwmutex uses two mutexes:
+// rLock and wLock. These have lock order requirements: wLock must be locked
+// before rLock. This also needs to be represented in the lock rank.
+//
+// Semantic ranking is represented by acquiring readRank during read lock and
+// writeRank during write lock.
+//
+// wLock is held for the duration of a write lock, so it uses writeRank
+// directly, both for semantic and internal ranking. rLock is only held
+// temporarily inside the rlock/lock methods, so it uses readRankInternal to
+// represent internal ranking. Semantic ranking is represented by a separate
+// acquire of readRank for the duration of a read lock.
+//
+// The lock ranking must document this ordering:
+// - readRankInternal is a leaf lock.
+// - readRank is taken before readRankInternal.
+// - writeRank is taken before readRankInternal.
+// - readRank is placed in the lock order wherever a read lock of this rwmutex
+// belongs.
+// - writeRank is placed in the lock order wherever a write lock of this
+// rwmutex belongs.
+func (rw *rwmutex) init(readRank, readRankInternal, writeRank lockRank) {
+ rw.readRank = readRank
+
+ lockInit(&rw.rLock, readRankInternal)
+ lockInit(&rw.wLock, writeRank)
+}
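+
+// A minimal sketch of setting up an rwmutex with the three ranks
+// described above. fooLock and the lockRankFoo* names are hypothetical
+// placeholders for a real resource and its ranks:
+//
+//	var fooLock rwmutex
+//
+//	func fooLockInit() {
+//		fooLock.init(lockRankFooR, lockRankFooRInternal, lockRankFooW)
+//	}
+//
+// Readers then pair rlock with runlock and the writer pairs lock with
+// unlock, just as sync.RWMutex pairs RLock/RUnlock and Lock/Unlock.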
+
+const rwmutexMaxReaders = 1 << 30
+
+// rlock locks rw for reading.
+func (rw *rwmutex) rlock() {
+ // The reader must not be allowed to lose its P or else other
+ // things blocking on the lock may consume all of the Ps and
+ // deadlock (issue #20903). Alternatively, we could drop the P
+ // while sleeping.
+ acquirem()
+
+ acquireLockRank(rw.readRank)
+ lockWithRankMayAcquire(&rw.rLock, getLockRank(&rw.rLock))
+
+ if rw.readerCount.Add(1) < 0 {
+ // A writer is pending. Park on the reader queue.
+ systemstack(func() {
+ lock(&rw.rLock)
+ if rw.readerPass > 0 {
+ // Writer finished.
+ rw.readerPass -= 1
+ unlock(&rw.rLock)
+ } else {
+ // Queue this reader to be woken by
+ // the writer.
+ m := getg().m
+ m.schedlink = rw.readers
+ rw.readers.set(m)
+ unlock(&rw.rLock)
+ notesleep(&m.park)
+ noteclear(&m.park)
+ }
+ })
+ }
+}
+
+// runlock undoes a single rlock call on rw.
+func (rw *rwmutex) runlock() {
+ if r := rw.readerCount.Add(-1); r < 0 {
+ if r+1 == 0 || r+1 == -rwmutexMaxReaders {
+ throw("runlock of unlocked rwmutex")
+ }
+ // A writer is pending.
+ if rw.readerWait.Add(-1) == 0 {
+ // The last reader unblocks the writer.
+ lock(&rw.rLock)
+ w := rw.writer.ptr()
+ if w != nil {
+ notewakeup(&w.park)
+ }
+ unlock(&rw.rLock)
+ }
+ }
+ releaseLockRank(rw.readRank)
+ releasem(getg().m)
+}
+
+// lock locks rw for writing.
+func (rw *rwmutex) lock() {
+ // Resolve competition with other writers and stick to our P.
+ lock(&rw.wLock)
+ m := getg().m
+ // Announce that there is a pending writer.
+ r := rw.readerCount.Add(-rwmutexMaxReaders) + rwmutexMaxReaders
+ // Wait for any active readers to complete.
+ lock(&rw.rLock)
+ if r != 0 && rw.readerWait.Add(r) != 0 {
+ // Wait for reader to wake us up.
+ systemstack(func() {
+ rw.writer.set(m)
+ unlock(&rw.rLock)
+ notesleep(&m.park)
+ noteclear(&m.park)
+ })
+ } else {
+ unlock(&rw.rLock)
+ }
+}
+
+// unlock unlocks rw for writing.
+func (rw *rwmutex) unlock() {
+ // Announce to readers that there is no active writer.
+ r := rw.readerCount.Add(rwmutexMaxReaders)
+ if r >= rwmutexMaxReaders {
+ throw("unlock of unlocked rwmutex")
+ }
+ // Unblock blocked readers.
+ lock(&rw.rLock)
+ for rw.readers.ptr() != nil {
+ reader := rw.readers.ptr()
+ rw.readers = reader.schedlink
+ reader.schedlink.set(nil)
+ notewakeup(&reader.park)
+ r -= 1
+ }
+ // If r > 0, there are pending readers that aren't on the
+ // queue. Tell them to skip waiting.
+ rw.readerPass += uint32(r)
+ unlock(&rw.rLock)
+ // Allow other writers to proceed.
+ unlock(&rw.wLock)
+}
diff --git a/src/runtime/rwmutex_test.go b/src/runtime/rwmutex_test.go
new file mode 100644
index 0000000..bdeb9c4
--- /dev/null
+++ b/src/runtime/rwmutex_test.go
@@ -0,0 +1,195 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// GOMAXPROCS=10 go test
+
+// This is a copy of sync/rwmutex_test.go rewritten to test the
+// runtime rwmutex.
+
+package runtime_test
+
+import (
+ "fmt"
+ . "runtime"
+ "runtime/debug"
+ "sync/atomic"
+ "testing"
+)
+
+func parallelReader(m *RWMutex, clocked chan bool, cunlock *atomic.Bool, cdone chan bool) {
+ m.RLock()
+ clocked <- true
+ for !cunlock.Load() {
+ }
+ m.RUnlock()
+ cdone <- true
+}
+
+func doTestParallelReaders(numReaders int) {
+ GOMAXPROCS(numReaders + 1)
+ var m RWMutex
+ m.Init()
+ clocked := make(chan bool, numReaders)
+ var cunlock atomic.Bool
+ cdone := make(chan bool)
+ for i := 0; i < numReaders; i++ {
+ go parallelReader(&m, clocked, &cunlock, cdone)
+ }
+ // Wait for all parallel RLock()s to succeed.
+ for i := 0; i < numReaders; i++ {
+ <-clocked
+ }
+ cunlock.Store(true)
+ // Wait for the goroutines to finish.
+ for i := 0; i < numReaders; i++ {
+ <-cdone
+ }
+}
+
+func TestParallelRWMutexReaders(t *testing.T) {
+ if GOARCH == "wasm" {
+ t.Skip("wasm has no threads yet")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ // If runtime triggers a forced GC during this test then it will deadlock,
+ // since the goroutines can't be stopped/preempted.
+ // Disable GC for this test (see issue #10958).
+ defer debug.SetGCPercent(debug.SetGCPercent(-1))
+ // SetGCPercent waits until the mark phase is over, but the runtime
+ // also preempts at the start of the sweep phase, so make sure that's
+ // done too.
+ GC()
+
+ doTestParallelReaders(1)
+ doTestParallelReaders(3)
+ doTestParallelReaders(4)
+}
+
+func reader(rwm *RWMutex, num_iterations int, activity *int32, cdone chan bool) {
+ for i := 0; i < num_iterations; i++ {
+ rwm.RLock()
+ n := atomic.AddInt32(activity, 1)
+ if n < 1 || n >= 10000 {
+ panic(fmt.Sprintf("wlock(%d)\n", n))
+ }
+ for i := 0; i < 100; i++ {
+ }
+ atomic.AddInt32(activity, -1)
+ rwm.RUnlock()
+ }
+ cdone <- true
+}
+
+func writer(rwm *RWMutex, num_iterations int, activity *int32, cdone chan bool) {
+ for i := 0; i < num_iterations; i++ {
+ rwm.Lock()
+ n := atomic.AddInt32(activity, 10000)
+ if n != 10000 {
+ panic(fmt.Sprintf("wlock(%d)\n", n))
+ }
+ for i := 0; i < 100; i++ {
+ }
+ atomic.AddInt32(activity, -10000)
+ rwm.Unlock()
+ }
+ cdone <- true
+}
+
+func HammerRWMutex(gomaxprocs, numReaders, num_iterations int) {
+ GOMAXPROCS(gomaxprocs)
+ // Number of active readers + 10000 * number of active writers.
+ var activity int32
+ var rwm RWMutex
+ rwm.Init()
+ cdone := make(chan bool)
+ go writer(&rwm, num_iterations, &activity, cdone)
+ var i int
+ for i = 0; i < numReaders/2; i++ {
+ go reader(&rwm, num_iterations, &activity, cdone)
+ }
+ go writer(&rwm, num_iterations, &activity, cdone)
+ for ; i < numReaders; i++ {
+ go reader(&rwm, num_iterations, &activity, cdone)
+ }
+ // Wait for the 2 writers and all readers to finish.
+ for i := 0; i < 2+numReaders; i++ {
+ <-cdone
+ }
+}
+
+func TestRWMutex(t *testing.T) {
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ n := 1000
+ if testing.Short() {
+ n = 5
+ }
+ HammerRWMutex(1, 1, n)
+ HammerRWMutex(1, 3, n)
+ HammerRWMutex(1, 10, n)
+ HammerRWMutex(4, 1, n)
+ HammerRWMutex(4, 3, n)
+ HammerRWMutex(4, 10, n)
+ HammerRWMutex(10, 1, n)
+ HammerRWMutex(10, 3, n)
+ HammerRWMutex(10, 10, n)
+ HammerRWMutex(10, 5, n)
+}
+
+func BenchmarkRWMutexUncontended(b *testing.B) {
+ type PaddedRWMutex struct {
+ RWMutex
+ pad [32]uint32
+ }
+ b.RunParallel(func(pb *testing.PB) {
+ var rwm PaddedRWMutex
+ rwm.Init()
+ for pb.Next() {
+ rwm.RLock()
+ rwm.RLock()
+ rwm.RUnlock()
+ rwm.RUnlock()
+ rwm.Lock()
+ rwm.Unlock()
+ }
+ })
+}
+
+func benchmarkRWMutex(b *testing.B, localWork, writeRatio int) {
+ var rwm RWMutex
+ rwm.Init()
+ b.RunParallel(func(pb *testing.PB) {
+ foo := 0
+ for pb.Next() {
+ foo++
+ if foo%writeRatio == 0 {
+ rwm.Lock()
+ rwm.Unlock()
+ } else {
+ rwm.RLock()
+ for i := 0; i != localWork; i += 1 {
+ foo *= 2
+ foo /= 2
+ }
+ rwm.RUnlock()
+ }
+ }
+ _ = foo
+ })
+}
+
+func BenchmarkRWMutexWrite100(b *testing.B) {
+ benchmarkRWMutex(b, 0, 100)
+}
+
+func BenchmarkRWMutexWrite10(b *testing.B) {
+ benchmarkRWMutex(b, 0, 10)
+}
+
+func BenchmarkRWMutexWorkWrite100(b *testing.B) {
+ benchmarkRWMutex(b, 100, 100)
+}
+
+func BenchmarkRWMutexWorkWrite10(b *testing.B) {
+ benchmarkRWMutex(b, 100, 10)
+}
diff --git a/src/runtime/security_aix.go b/src/runtime/security_aix.go
new file mode 100644
index 0000000..c11b9c3
--- /dev/null
+++ b/src/runtime/security_aix.go
@@ -0,0 +1,17 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// secureMode is only ever mutated in schedinit, so we don't need to worry about
+// synchronization primitives.
+var secureMode bool
+
+func initSecureMode() {
+ secureMode = !(getuid() == geteuid() && getgid() == getegid())
+}
+
+func isSecureMode() bool {
+ return secureMode
+}
diff --git a/src/runtime/security_issetugid.go b/src/runtime/security_issetugid.go
new file mode 100644
index 0000000..5048632
--- /dev/null
+++ b/src/runtime/security_issetugid.go
@@ -0,0 +1,19 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || dragonfly || freebsd || illumos || netbsd || openbsd || solaris
+
+package runtime
+
+// secureMode is only ever mutated in schedinit, so we don't need to worry about
+// synchronization primitives.
+var secureMode bool
+
+func initSecureMode() {
+ secureMode = issetugid() == 1
+}
+
+func isSecureMode() bool {
+ return secureMode
+}
diff --git a/src/runtime/security_linux.go b/src/runtime/security_linux.go
new file mode 100644
index 0000000..181f3a1
--- /dev/null
+++ b/src/runtime/security_linux.go
@@ -0,0 +1,15 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe"
+
+func initSecureMode() {
+ // We have already initialized the secureMode bool in sysauxv.
+}
+
+func isSecureMode() bool {
+ return secureMode
+}
diff --git a/src/runtime/security_nonunix.go b/src/runtime/security_nonunix.go
new file mode 100644
index 0000000..fc9571c
--- /dev/null
+++ b/src/runtime/security_nonunix.go
@@ -0,0 +1,13 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !unix
+
+package runtime
+
+func isSecureMode() bool {
+ return false
+}
+
+func secure() {}
diff --git a/src/runtime/security_test.go b/src/runtime/security_test.go
new file mode 100644
index 0000000..5cd90f9
--- /dev/null
+++ b/src/runtime/security_test.go
@@ -0,0 +1,145 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "bytes"
+ "context"
+ "fmt"
+ "internal/testenv"
+ "io"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "testing"
+ "time"
+)
+
+func privesc(command string, args ...string) error {
+ ctx, cancel := context.WithTimeout(context.Background(), time.Second*5)
+ defer cancel()
+ var cmd *exec.Cmd
+ if runtime.GOOS == "darwin" {
+ cmd = exec.CommandContext(ctx, "sudo", append([]string{"-n", command}, args...)...)
+ } else if runtime.GOOS == "openbsd" {
+ cmd = exec.CommandContext(ctx, "doas", append([]string{"-n", command}, args...)...)
+ } else {
+ cmd = exec.CommandContext(ctx, "su", highPrivUser, "-c", fmt.Sprintf("%s %s", command, strings.Join(args, " ")))
+ }
+ _, err := cmd.CombinedOutput()
+ return err
+}
+
+const highPrivUser = "root"
+
+func setSetuid(t *testing.T, user, bin string) {
+ t.Helper()
+ // We escalate privileges here even if we are root, because for some reason on some builders
+ // (at least freebsd-amd64-13_0) the default PATH doesn't include /usr/sbin, which is where
+ // chown lives, but using 'su root -c' gives us the correct PATH.
+
+ // buildTestProg uses os.MkdirTemp which creates directories with 0700, which prevents
+ // setuid binaries from executing because of the missing g+rx, so we need to set the parent
+ // directory to better permissions before anything else. We created this directory, so we
+ // shouldn't need to do any privilege trickery.
+ if err := privesc("chmod", "0777", filepath.Dir(bin)); err != nil {
+ t.Skipf("unable to set permissions on %q, likely no passwordless sudo/su: %s", filepath.Dir(bin), err)
+ }
+
+ if err := privesc("chown", user, bin); err != nil {
+ t.Skipf("unable to set permissions on test binary, likely no passwordless sudo/su: %s", err)
+ }
+ if err := privesc("chmod", "u+s", bin); err != nil {
+ t.Skipf("unable to set permissions on test binary, likely no passwordless sudo/su: %s", err)
+ }
+}
+
+func TestSUID(t *testing.T) {
+ // This test is relatively simple, we build a test program which opens a
+ // file passed via the TEST_OUTPUT envvar, prints the value of the
+ // GOTRACEBACK envvar to stdout, and prints "hello" to stderr. We then chown
+ // the program to "nobody" and set u+s on it. We execute the program, only
+ // passing it two files, for stdin and stdout, and passing
+ // GOTRACEBACK=system in the env.
+ //
+ // We expect that the program will trigger the SUID protections, resetting
+ // the value of GOTRACEBACK, and opening the missing stderr descriptor, such
+ // that the program prints "GOTRACEBACK=none" to stdout, and nothing gets
+ // written to the file pointed at by TEST_OUTPUT.
+
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ testenv.MustHaveGoBuild(t)
+
+ helloBin, err := buildTestProg(t, "testsuid")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ f, err := os.CreateTemp(t.TempDir(), "suid-output")
+ if err != nil {
+ t.Fatal(err)
+ }
+ tempfilePath := f.Name()
+ f.Close()
+
+ lowPrivUser := "nobody"
+ setSetuid(t, lowPrivUser, helloBin)
+
+ b := bytes.NewBuffer(nil)
+ pr, pw, err := os.Pipe()
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ proc, err := os.StartProcess(helloBin, []string{helloBin}, &os.ProcAttr{
+ Env: []string{"GOTRACEBACK=system", "TEST_OUTPUT=" + tempfilePath},
+ Files: []*os.File{os.Stdin, pw},
+ })
+ if err != nil {
+ if os.IsPermission(err) {
+ t.Skip("don't have execute permission on setuid binary, possibly directory permission issue?")
+ }
+ t.Fatal(err)
+ }
+ done := make(chan bool, 1)
+ go func() {
+ io.Copy(b, pr)
+ pr.Close()
+ done <- true
+ }()
+ ps, err := proc.Wait()
+ if err != nil {
+ t.Fatal(err)
+ }
+ pw.Close()
+ <-done
+ output := b.String()
+
+ if ps.ExitCode() == 99 {
+ t.Skip("binary wasn't setuid (uid == euid), unable to effectively test")
+ }
+
+ expected := "GOTRACEBACK=none\n"
+ if output != expected {
+ t.Errorf("unexpected output, got: %q, want %q", output, expected)
+ }
+
+ fc, err := os.ReadFile(tempfilePath)
+ if err != nil {
+ t.Fatal(err)
+ }
+ if string(fc) != "" {
+ t.Errorf("unexpected file content, got: %q", string(fc))
+ }
+
+ // TODO: check the registers aren't leaked?
+}
diff --git a/src/runtime/security_unix.go b/src/runtime/security_unix.go
new file mode 100644
index 0000000..16fc87e
--- /dev/null
+++ b/src/runtime/security_unix.go
@@ -0,0 +1,72 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+func secure() {
+ initSecureMode()
+
+ if !isSecureMode() {
+ return
+ }
+
+ // When secure mode is enabled, we do two things:
+ // 1. ensure the file descriptors 0, 1, and 2 are open, and if not open them,
+ // pointing at /dev/null (or fail)
+ // 2. enforce specific environment variable values (currently we only force
+ // GOTRACEBACK=none)
+ //
+ // Other packages may also disable specific functionality when secure mode
+ // is enabled (determined by using linkname to call isSecureMode).
+ //
+ // NOTE: we may eventually want to enforce (1) regardless of whether secure
+ // mode is enabled or not.
+
+ secureFDs()
+ secureEnv()
+}
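+
+// Other packages reach isSecureMode via go:linkname, roughly as in this
+// sketch (the consuming package, function name, and use here are
+// hypothetical; a package doing this typically also carries an empty .s
+// file so the bodyless declaration is accepted):
+//
+//	import _ "unsafe" // for go:linkname
+//
+//	//go:linkname runtime_isSecureMode runtime.isSecureMode
+//	func runtime_isSecureMode() bool
+//
+//	func tracebackAllowed() bool { return !runtime_isSecureMode() }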
+
+func secureEnv() {
+ var hasTraceback bool
+ for i := 0; i < len(envs); i++ {
+ if hasPrefix(envs[i], "GOTRACEBACK=") {
+ hasTraceback = true
+ envs[i] = "GOTRACEBACK=none"
+ }
+ }
+ if !hasTraceback {
+ envs = append(envs, "GOTRACEBACK=none")
+ }
+}
+
+func secureFDs() {
+ const (
+ // F_GETFD and EBADF are standard across all unixes, define
+ // them here rather than in each of the OS specific files
+ F_GETFD = 0x01
+ EBADF = 0x09
+ )
+
+ devNull := []byte("/dev/null\x00")
+ for i := 0; i < 3; i++ {
+ ret, errno := fcntl(int32(i), F_GETFD, 0)
+ if ret >= 0 {
+ continue
+ }
+ if errno != EBADF {
+ print("runtime: unexpected error while checking standard file descriptor ", i, ", errno=", errno, "\n")
+ throw("cannot secure fds")
+ }
+
+ if ret := open(&devNull[0], 2 /* O_RDWR */, 0); ret < 0 {
+ print("runtime: standard file descriptor ", i, " closed, unable to open /dev/null, errno=", errno, "\n")
+ throw("cannot secure fds")
+ } else if ret != int32(i) {
+ print("runtime: opened unexpected file descriptor ", ret, " when attempting to open ", i, "\n")
+ throw("cannot secure fds")
+ }
+ }
+}
diff --git a/src/runtime/select.go b/src/runtime/select.go
new file mode 100644
index 0000000..34c0637
--- /dev/null
+++ b/src/runtime/select.go
@@ -0,0 +1,632 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// This file contains the implementation of Go select statements.
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+const debugSelect = false
+
+// Select case descriptor.
+// Known to compiler.
+// Changes here must also be made in src/cmd/compile/internal/walk/select.go's scasetype.
+type scase struct {
+ c *hchan // chan
+ elem unsafe.Pointer // data element
+}
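+
+// Illustrative only: in a source-level select such as
+//
+//	select {
+//	case v := <-in:
+//		_ = v
+//	case out <- 1:
+//	default:
+//	}
+//
+// each communication case becomes one scase (a channel plus an element
+// pointer); the default case has no scase and is expressed by calling
+// selectgo with block=false.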
+
+var (
+ chansendpc = abi.FuncPCABIInternal(chansend)
+ chanrecvpc = abi.FuncPCABIInternal(chanrecv)
+)
+
+func selectsetpc(pc *uintptr) {
+ *pc = getcallerpc()
+}
+
+func sellock(scases []scase, lockorder []uint16) {
+ var c *hchan
+ for _, o := range lockorder {
+ c0 := scases[o].c
+ if c0 != c {
+ c = c0
+ lock(&c.lock)
+ }
+ }
+}
+
+func selunlock(scases []scase, lockorder []uint16) {
+ // We must be very careful here to not touch sel after we have unlocked
+ // the last lock, because sel can be freed right after the last unlock.
+ // Consider the following situation.
+ // First M calls runtime·park() in runtime·selectgo() passing the sel.
+ // Once runtime·park() has unlocked the last lock, another M makes
+ // the G that calls select runnable again and schedules it for execution.
+ // When the G runs on another M, it locks all the locks and frees sel.
+ // Now if the first M touches sel, it will access freed memory.
+ for i := len(lockorder) - 1; i >= 0; i-- {
+ c := scases[lockorder[i]].c
+ if i > 0 && c == scases[lockorder[i-1]].c {
+ continue // will unlock it on the next iteration
+ }
+ unlock(&c.lock)
+ }
+}
+
+func selparkcommit(gp *g, _ unsafe.Pointer) bool {
+ // There are unlocked sudogs that point into gp's stack. Stack
+ // copying must lock the channels of those sudogs.
+ // Set activeStackChans here instead of before we try parking
+ // because we could self-deadlock in stack growth on a
+ // channel lock.
+ gp.activeStackChans = true
+ // Mark that it's safe for stack shrinking to occur now,
+ // because any thread acquiring this G's stack for shrinking
+ // is guaranteed to observe activeStackChans after this store.
+ gp.parkingOnChan.Store(false)
+ // Make sure we unlock after setting activeStackChans and
+ // unsetting parkingOnChan. The moment we unlock any of the
+ // channel locks we risk gp getting readied by a channel operation
+ // and so gp could continue running before everything before the
+ // unlock is visible (even to gp itself).
+
+ // This must not access gp's stack (see gopark). In
+ // particular, it must not access the *hselect. That's okay,
+ // because by the time this is called, gp.waiting has all
+ // channels in lock order.
+ var lastc *hchan
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc && lastc != nil {
+ // As soon as we unlock the channel, fields in
+ // any sudog with that channel may change,
+ // including c and waitlink. Since multiple
+ // sudogs may have the same channel, we unlock
+ // only after we've passed the last instance
+ // of a channel.
+ unlock(&lastc.lock)
+ }
+ lastc = sg.c
+ }
+ if lastc != nil {
+ unlock(&lastc.lock)
+ }
+ return true
+}
+
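+// block parks the calling goroutine forever. The compiler emits a call to it
+// for a select statement with no cases (select {}), and reflect_rselect below
+// uses it when given an empty case list.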
+func block() {
+ gopark(nil, nil, waitReasonSelectNoCases, traceBlockForever, 1) // forever
+}
+
+// selectgo implements the select statement.
+//
+// cas0 points to an array of type [ncases]scase, and order0 points to
+// an array of type [2*ncases]uint16 where ncases must be <= 65536.
+// Both reside on the goroutine's stack (regardless of any escaping in
+// selectgo).
+//
+// For race detector builds, pc0 points to an array of type
+// [ncases]uintptr (also on the stack); for other builds, it's set to
+// nil.
+//
+// selectgo returns the index of the chosen scase, which matches the
+// ordinal position of its respective select{recv,send,default} call.
+// Also, if the chosen scase was a receive operation, it reports whether
+// a value was received.
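+//
+// As an illustration (a sketch of the compiler's lowering, not the exact
+// generated code), a compiled select ends up invoking selectgo roughly as:
+//
+//	var cases [ncases]scase    // send cases first, then receive cases
+//	var order [2*ncases]uint16 // scratch for pollorder and lockorder
+//	// ... fill in cases[i].c and cases[i].elem ...
+//	chosen, recvOK := selectgo(&cases[0], &order[0], nil, nsends, nrecvs, !hasDefault)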
+func selectgo(cas0 *scase, order0 *uint16, pc0 *uintptr, nsends, nrecvs int, block bool) (int, bool) {
+ if debugSelect {
+ print("select: cas0=", cas0, "\n")
+ }
+
+ // NOTE: In order to maintain a lean stack size, the number of scases
+ // is capped at 65536.
+ cas1 := (*[1 << 16]scase)(unsafe.Pointer(cas0))
+ order1 := (*[1 << 17]uint16)(unsafe.Pointer(order0))
+
+ ncases := nsends + nrecvs
+ scases := cas1[:ncases:ncases]
+ pollorder := order1[:ncases:ncases]
+ lockorder := order1[ncases:][:ncases:ncases]
+ // NOTE: pollorder/lockorder's underlying array was not zero-initialized by compiler.
+
+ // Even when raceenabled is true, there might be select
+ // statements in packages compiled without -race (e.g.,
+ // ensureSigM in runtime/signal_unix.go).
+ var pcs []uintptr
+ if raceenabled && pc0 != nil {
+ pc1 := (*[1 << 16]uintptr)(unsafe.Pointer(pc0))
+ pcs = pc1[:ncases:ncases]
+ }
+ casePC := func(casi int) uintptr {
+ if pcs == nil {
+ return 0
+ }
+ return pcs[casi]
+ }
+
+ var t0 int64
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ }
+
+ // The compiler rewrites selects that statically have
+ // only 0 or 1 cases plus default into simpler constructs.
+ // The only way we can end up with such small sel.ncase
+ // values here is for a larger select in which most channels
+ // have been nilled out. The general code handles those
+ // cases correctly, and they are rare enough not to bother
+ // optimizing (and needing to test).
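+	// For example, a statically written one-case select with a default,
+	// such as select { case v := <-c: ... default: ... }, is lowered to a
+	// selectnbrecv call and never reaches this function.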
+
+ // generate permuted order
+ norder := 0
+ for i := range scases {
+ cas := &scases[i]
+
+ // Omit cases without channels from the poll and lock orders.
+ if cas.c == nil {
+ cas.elem = nil // allow GC
+ continue
+ }
+
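+		// Insert case i at a random position among the cases collected so
+		// far (an inside-out Fisher-Yates shuffle), so that pollorder ends
+		// up a uniformly random permutation of the pollable cases.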
+ j := fastrandn(uint32(norder + 1))
+ pollorder[norder] = pollorder[j]
+ pollorder[j] = uint16(i)
+ norder++
+ }
+ pollorder = pollorder[:norder]
+ lockorder = lockorder[:norder]
+
+ // sort the cases by Hchan address to get the locking order.
+ // simple heap sort, to guarantee n log n time and constant stack footprint.
+ for i := range lockorder {
+ j := i
+ // Start with the pollorder to permute cases on the same channel.
+ c := scases[pollorder[i]].c
+ for j > 0 && scases[lockorder[(j-1)/2]].c.sortkey() < c.sortkey() {
+ k := (j - 1) / 2
+ lockorder[j] = lockorder[k]
+ j = k
+ }
+ lockorder[j] = pollorder[i]
+ }
+ for i := len(lockorder) - 1; i >= 0; i-- {
+ o := lockorder[i]
+ c := scases[o].c
+ lockorder[i] = lockorder[0]
+ j := 0
+ for {
+ k := j*2 + 1
+ if k >= i {
+ break
+ }
+ if k+1 < i && scases[lockorder[k]].c.sortkey() < scases[lockorder[k+1]].c.sortkey() {
+ k++
+ }
+ if c.sortkey() < scases[lockorder[k]].c.sortkey() {
+ lockorder[j] = lockorder[k]
+ j = k
+ continue
+ }
+ break
+ }
+ lockorder[j] = o
+ }
+
+ if debugSelect {
+ for i := 0; i+1 < len(lockorder); i++ {
+ if scases[lockorder[i]].c.sortkey() > scases[lockorder[i+1]].c.sortkey() {
+ print("i=", i, " x=", lockorder[i], " y=", lockorder[i+1], "\n")
+ throw("select: broken sort")
+ }
+ }
+ }
+
+ // lock all the channels involved in the select
+ sellock(scases, lockorder)
+
+ var (
+ gp *g
+ sg *sudog
+ c *hchan
+ k *scase
+ sglist *sudog
+ sgnext *sudog
+ qp unsafe.Pointer
+ nextp **sudog
+ )
+
+ // pass 1 - look for something already waiting
+ var casi int
+ var cas *scase
+ var caseSuccess bool
+ var caseReleaseTime int64 = -1
+ var recvOK bool
+ for _, casei := range pollorder {
+ casi = int(casei)
+ cas = &scases[casi]
+ c = cas.c
+
+ if casi >= nsends {
+ sg = c.sendq.dequeue()
+ if sg != nil {
+ goto recv
+ }
+ if c.qcount > 0 {
+ goto bufrecv
+ }
+ if c.closed != 0 {
+ goto rclose
+ }
+ } else {
+ if raceenabled {
+ racereadpc(c.raceaddr(), casePC(casi), chansendpc)
+ }
+ if c.closed != 0 {
+ goto sclose
+ }
+ sg = c.recvq.dequeue()
+ if sg != nil {
+ goto send
+ }
+ if c.qcount < c.dataqsiz {
+ goto bufsend
+ }
+ }
+ }
+
+ if !block {
+ selunlock(scases, lockorder)
+ casi = -1
+ goto retc
+ }
+
+ // pass 2 - enqueue on all chans
+ gp = getg()
+ if gp.waiting != nil {
+ throw("gp.waiting != nil")
+ }
+ nextp = &gp.waiting
+ for _, casei := range lockorder {
+ casi = int(casei)
+ cas = &scases[casi]
+ c = cas.c
+ sg := acquireSudog()
+ sg.g = gp
+ sg.isSelect = true
+ // No stack splits between assigning elem and enqueuing
+ // sg on gp.waiting where copystack can find it.
+ sg.elem = cas.elem
+ sg.releasetime = 0
+ if t0 != 0 {
+ sg.releasetime = -1
+ }
+ sg.c = c
+ // Construct waiting list in lock order.
+ *nextp = sg
+ nextp = &sg.waitlink
+
+ if casi < nsends {
+ c.sendq.enqueue(sg)
+ } else {
+ c.recvq.enqueue(sg)
+ }
+ }
+
+ // wait for someone to wake us up
+ gp.param = nil
+ // Signal to anyone trying to shrink our stack that we're about
+ // to park on a channel. The window between when this G's status
+ // changes and when we set gp.activeStackChans is not safe for
+ // stack shrinking.
+ gp.parkingOnChan.Store(true)
+ gopark(selparkcommit, nil, waitReasonSelect, traceBlockSelect, 1)
+ gp.activeStackChans = false
+
+ sellock(scases, lockorder)
+
+ gp.selectDone.Store(0)
+ sg = (*sudog)(gp.param)
+ gp.param = nil
+
+ // pass 3 - dequeue from unsuccessful chans
+ // otherwise they stack up on quiet channels
+ // record the successful case, if any.
+ // We singly-linked up the SudoGs in lock order.
+ casi = -1
+ cas = nil
+ caseSuccess = false
+ sglist = gp.waiting
+ // Clear all elem before unlinking from gp.waiting.
+ for sg1 := gp.waiting; sg1 != nil; sg1 = sg1.waitlink {
+ sg1.isSelect = false
+ sg1.elem = nil
+ sg1.c = nil
+ }
+ gp.waiting = nil
+
+ for _, casei := range lockorder {
+ k = &scases[casei]
+ if sg == sglist {
+ // sg has already been dequeued by the G that woke us up.
+ casi = int(casei)
+ cas = k
+ caseSuccess = sglist.success
+ if sglist.releasetime > 0 {
+ caseReleaseTime = sglist.releasetime
+ }
+ } else {
+ c = k.c
+ if int(casei) < nsends {
+ c.sendq.dequeueSudoG(sglist)
+ } else {
+ c.recvq.dequeueSudoG(sglist)
+ }
+ }
+ sgnext = sglist.waitlink
+ sglist.waitlink = nil
+ releaseSudog(sglist)
+ sglist = sgnext
+ }
+
+ if cas == nil {
+ throw("selectgo: bad wakeup")
+ }
+
+ c = cas.c
+
+ if debugSelect {
+ print("wait-return: cas0=", cas0, " c=", c, " cas=", cas, " send=", casi < nsends, "\n")
+ }
+
+ if casi < nsends {
+ if !caseSuccess {
+ goto sclose
+ }
+ } else {
+ recvOK = caseSuccess
+ }
+
+ if raceenabled {
+ if casi < nsends {
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ } else if cas.elem != nil {
+ raceWriteObjectPC(c.elemtype, cas.elem, casePC(casi), chanrecvpc)
+ }
+ }
+ if msanenabled {
+ if casi < nsends {
+ msanread(cas.elem, c.elemtype.Size_)
+ } else if cas.elem != nil {
+ msanwrite(cas.elem, c.elemtype.Size_)
+ }
+ }
+ if asanenabled {
+ if casi < nsends {
+ asanread(cas.elem, c.elemtype.Size_)
+ } else if cas.elem != nil {
+ asanwrite(cas.elem, c.elemtype.Size_)
+ }
+ }
+
+ selunlock(scases, lockorder)
+ goto retc
+
+bufrecv:
+ // can receive from buffer
+ if raceenabled {
+ if cas.elem != nil {
+ raceWriteObjectPC(c.elemtype, cas.elem, casePC(casi), chanrecvpc)
+ }
+ racenotify(c, c.recvx, nil)
+ }
+ if msanenabled && cas.elem != nil {
+ msanwrite(cas.elem, c.elemtype.Size_)
+ }
+ if asanenabled && cas.elem != nil {
+ asanwrite(cas.elem, c.elemtype.Size_)
+ }
+ recvOK = true
+ qp = chanbuf(c, c.recvx)
+ if cas.elem != nil {
+ typedmemmove(c.elemtype, cas.elem, qp)
+ }
+ typedmemclr(c.elemtype, qp)
+ c.recvx++
+ if c.recvx == c.dataqsiz {
+ c.recvx = 0
+ }
+ c.qcount--
+ selunlock(scases, lockorder)
+ goto retc
+
+bufsend:
+ // can send to buffer
+ if raceenabled {
+ racenotify(c, c.sendx, nil)
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ }
+ if msanenabled {
+ msanread(cas.elem, c.elemtype.Size_)
+ }
+ if asanenabled {
+ asanread(cas.elem, c.elemtype.Size_)
+ }
+ typedmemmove(c.elemtype, chanbuf(c, c.sendx), cas.elem)
+ c.sendx++
+ if c.sendx == c.dataqsiz {
+ c.sendx = 0
+ }
+ c.qcount++
+ selunlock(scases, lockorder)
+ goto retc
+
+recv:
+ // can receive from sleeping sender (sg)
+ recv(c, sg, cas.elem, func() { selunlock(scases, lockorder) }, 2)
+ if debugSelect {
+ print("syncrecv: cas0=", cas0, " c=", c, "\n")
+ }
+ recvOK = true
+ goto retc
+
+rclose:
+ // read at end of closed channel
+ selunlock(scases, lockorder)
+ recvOK = false
+ if cas.elem != nil {
+ typedmemclr(c.elemtype, cas.elem)
+ }
+ if raceenabled {
+ raceacquire(c.raceaddr())
+ }
+ goto retc
+
+send:
+ // can send to a sleeping receiver (sg)
+ if raceenabled {
+ raceReadObjectPC(c.elemtype, cas.elem, casePC(casi), chansendpc)
+ }
+ if msanenabled {
+ msanread(cas.elem, c.elemtype.Size_)
+ }
+ if asanenabled {
+ asanread(cas.elem, c.elemtype.Size_)
+ }
+ send(c, sg, cas.elem, func() { selunlock(scases, lockorder) }, 2)
+ if debugSelect {
+ print("syncsend: cas0=", cas0, " c=", c, "\n")
+ }
+ goto retc
+
+retc:
+ if caseReleaseTime > 0 {
+ blockevent(caseReleaseTime-t0, 1)
+ }
+ return casi, recvOK
+
+sclose:
+ // send on closed channel
+ selunlock(scases, lockorder)
+ panic(plainError("send on closed channel"))
+}
+
+func (c *hchan) sortkey() uintptr {
+ return uintptr(unsafe.Pointer(c))
+}
+
+// A runtimeSelect is a single case passed to rselect.
+// This must match ../reflect/value.go:/runtimeSelect
+type runtimeSelect struct {
+ dir selectDir
+ typ unsafe.Pointer // channel type (not used here)
+ ch *hchan // channel
+ val unsafe.Pointer // ptr to data (SendDir) or ptr to receive buffer (RecvDir)
+}
+
+// These values must match ../reflect/value.go:/SelectDir.
+type selectDir int
+
+const (
+ _ selectDir = iota
+ selectSend // case Chan <- Send
+ selectRecv // case <-Chan:
+ selectDefault // default
+)
+
+//go:linkname reflect_rselect reflect.rselect
+func reflect_rselect(cases []runtimeSelect) (int, bool) {
+ if len(cases) == 0 {
+ block()
+ }
+ sel := make([]scase, len(cases))
+ orig := make([]int, len(cases))
+ nsends, nrecvs := 0, 0
+ dflt := -1
+ for i, rc := range cases {
+ var j int
+ switch rc.dir {
+ case selectDefault:
+ dflt = i
+ continue
+ case selectSend:
+ j = nsends
+ nsends++
+ case selectRecv:
+ nrecvs++
+ j = len(cases) - nrecvs
+ }
+
+ sel[j] = scase{c: rc.ch, elem: rc.val}
+ orig[j] = i
+ }
+
+ // Only a default case.
+ if nsends+nrecvs == 0 {
+ return dflt, false
+ }
+
+ // Compact sel and orig if necessary.
+ if nsends+nrecvs < len(cases) {
+ copy(sel[nsends:], sel[len(cases)-nrecvs:])
+ copy(orig[nsends:], orig[len(cases)-nrecvs:])
+ }
+
+ order := make([]uint16, 2*(nsends+nrecvs))
+ var pc0 *uintptr
+ if raceenabled {
+ pcs := make([]uintptr, nsends+nrecvs)
+ for i := range pcs {
+ selectsetpc(&pcs[i])
+ }
+ pc0 = &pcs[0]
+ }
+
+ chosen, recvOK := selectgo(&sel[0], &order[0], pc0, nsends, nrecvs, dflt == -1)
+
+ // Translate chosen back to caller's ordering.
+ if chosen < 0 {
+ chosen = dflt
+ } else {
+ chosen = orig[chosen]
+ }
+ return chosen, recvOK
+}
+
+func (q *waitq) dequeueSudoG(sgp *sudog) {
+ x := sgp.prev
+ y := sgp.next
+ if x != nil {
+ if y != nil {
+ // middle of queue
+ x.next = y
+ y.prev = x
+ sgp.next = nil
+ sgp.prev = nil
+ return
+ }
+ // end of queue
+ x.next = nil
+ q.last = x
+ sgp.prev = nil
+ return
+ }
+ if y != nil {
+ // start of queue
+ y.prev = nil
+ q.first = y
+ sgp.next = nil
+ return
+ }
+
+ // x==y==nil. Either sgp is the only element in the queue,
+ // or it has already been removed. Use q.first to disambiguate.
+ if q.first == sgp {
+ q.first = nil
+ q.last = nil
+ }
+}
diff --git a/src/runtime/sema.go b/src/runtime/sema.go
new file mode 100644
index 0000000..d0a8117
--- /dev/null
+++ b/src/runtime/sema.go
@@ -0,0 +1,633 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Semaphore implementation exposed to Go.
+// Intended use is to provide a sleep and wakeup
+// primitive that can be used in the contended case
+// of other synchronization primitives.
+// Thus it targets the same goal as Linux's futex,
+// but it has much simpler semantics.
+//
+// That is, don't think of these as semaphores.
+// Think of them as a way to implement sleep and wakeup
+// such that every sleep is paired with a single wakeup,
+// even if, due to races, the wakeup happens before the sleep.
+//
+// See Mullender and Cox, ``Semaphores in Plan 9,''
+// https://swtch.com/semaphore.pdf
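+//
+// As a sketch of the intended pairing (illustrative only, not code from the
+// sync package), a contended lock with a uint32 count at &l.sema does roughly:
+//
+//	semacquire(&l.sema) // waiter: sleep until a wakeup is posted
+//	...
+//	semrelease(&l.sema) // releaser: post exactly one wakeup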
+
+package runtime
+
+import (
+ "internal/cpu"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// Asynchronous semaphore for sync.Mutex.
+
+// A semaRoot holds a balanced tree of sudog with distinct addresses (s.elem).
+// Each of those sudogs may in turn point (through s.waitlink) to a list
+// of other sudogs waiting on the same address.
+// The operations on the inner lists of sudogs with the same address
+// are all O(1). The scanning of the top-level semaRoot list is O(log n),
+// where n is the number of distinct addresses with goroutines blocked
+// on them that hash to the given semaRoot.
+// See golang.org/issue/17953 for a program that worked badly
+// before we introduced the second level of list, and
+// BenchmarkSemTable/OneAddrCollision/* for a benchmark that exercises this.
+type semaRoot struct {
+ lock mutex
+ treap *sudog // root of balanced tree of unique waiters.
+ nwait atomic.Uint32 // Number of waiters. Read w/o the lock.
+}
+
+var semtable semTable
+
+// Prime to not correlate with any user patterns.
+const semTabSize = 251
+
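+// A semTable is a fixed-size table of semaRoots indexed by a hash of the
+// semaphore address, with each entry padded out to a cache line so that
+// unrelated roots do not share a line.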
+type semTable [semTabSize]struct {
+ root semaRoot
+ pad [cpu.CacheLinePadSize - unsafe.Sizeof(semaRoot{})]byte
+}
+
+func (t *semTable) rootFor(addr *uint32) *semaRoot {
+ return &t[(uintptr(unsafe.Pointer(addr))>>3)%semTabSize].root
+}
+
+//go:linkname sync_runtime_Semacquire sync.runtime_Semacquire
+func sync_runtime_Semacquire(addr *uint32) {
+ semacquire1(addr, false, semaBlockProfile, 0, waitReasonSemacquire)
+}
+
+//go:linkname poll_runtime_Semacquire internal/poll.runtime_Semacquire
+func poll_runtime_Semacquire(addr *uint32) {
+ semacquire1(addr, false, semaBlockProfile, 0, waitReasonSemacquire)
+}
+
+//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
+func sync_runtime_Semrelease(addr *uint32, handoff bool, skipframes int) {
+ semrelease1(addr, handoff, skipframes)
+}
+
+//go:linkname sync_runtime_SemacquireMutex sync.runtime_SemacquireMutex
+func sync_runtime_SemacquireMutex(addr *uint32, lifo bool, skipframes int) {
+ semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes, waitReasonSyncMutexLock)
+}
+
+//go:linkname sync_runtime_SemacquireRWMutexR sync.runtime_SemacquireRWMutexR
+func sync_runtime_SemacquireRWMutexR(addr *uint32, lifo bool, skipframes int) {
+ semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes, waitReasonSyncRWMutexRLock)
+}
+
+//go:linkname sync_runtime_SemacquireRWMutex sync.runtime_SemacquireRWMutex
+func sync_runtime_SemacquireRWMutex(addr *uint32, lifo bool, skipframes int) {
+ semacquire1(addr, lifo, semaBlockProfile|semaMutexProfile, skipframes, waitReasonSyncRWMutexLock)
+}
+
+//go:linkname poll_runtime_Semrelease internal/poll.runtime_Semrelease
+func poll_runtime_Semrelease(addr *uint32) {
+ semrelease(addr)
+}
+
+func readyWithTime(s *sudog, traceskip int) {
+ if s.releasetime != 0 {
+ s.releasetime = cputicks()
+ }
+ goready(s.g, traceskip)
+}
+
+type semaProfileFlags int
+
+const (
+ semaBlockProfile semaProfileFlags = 1 << iota
+ semaMutexProfile
+)
+
+// Called from runtime.
+func semacquire(addr *uint32) {
+ semacquire1(addr, false, 0, 0, waitReasonSemacquire)
+}
+
+func semacquire1(addr *uint32, lifo bool, profile semaProfileFlags, skipframes int, reason waitReason) {
+ gp := getg()
+ if gp != gp.m.curg {
+ throw("semacquire not on the G stack")
+ }
+
+ // Easy case.
+ if cansemacquire(addr) {
+ return
+ }
+
+ // Harder case:
+ // increment waiter count
+ // try cansemacquire one more time, return if succeeded
+ // enqueue itself as a waiter
+ // sleep
+ // (waiter descriptor is dequeued by signaler)
+ s := acquireSudog()
+ root := semtable.rootFor(addr)
+ t0 := int64(0)
+ s.releasetime = 0
+ s.acquiretime = 0
+ s.ticket = 0
+ if profile&semaBlockProfile != 0 && blockprofilerate > 0 {
+ t0 = cputicks()
+ s.releasetime = -1
+ }
+ if profile&semaMutexProfile != 0 && mutexprofilerate > 0 {
+ if t0 == 0 {
+ t0 = cputicks()
+ }
+ s.acquiretime = t0
+ }
+ for {
+ lockWithRank(&root.lock, lockRankRoot)
+ // Add ourselves to nwait to disable "easy case" in semrelease.
+ root.nwait.Add(1)
+ // Check cansemacquire to avoid missed wakeup.
+ if cansemacquire(addr) {
+ root.nwait.Add(-1)
+ unlock(&root.lock)
+ break
+ }
+ // Any semrelease after the cansemacquire knows we're waiting
+ // (we set nwait above), so go to sleep.
+ root.queue(addr, s, lifo)
+ goparkunlock(&root.lock, reason, traceBlockSync, 4+skipframes)
+ if s.ticket != 0 || cansemacquire(addr) {
+ break
+ }
+ }
+ if s.releasetime > 0 {
+ blockevent(s.releasetime-t0, 3+skipframes)
+ }
+ releaseSudog(s)
+}
+
+func semrelease(addr *uint32) {
+ semrelease1(addr, false, 0)
+}
+
+func semrelease1(addr *uint32, handoff bool, skipframes int) {
+ root := semtable.rootFor(addr)
+ atomic.Xadd(addr, 1)
+
+ // Easy case: no waiters?
+ // This check must happen after the xadd, to avoid a missed wakeup
+ // (see loop in semacquire).
+ if root.nwait.Load() == 0 {
+ return
+ }
+
+ // Harder case: search for a waiter and wake it.
+ lockWithRank(&root.lock, lockRankRoot)
+ if root.nwait.Load() == 0 {
+ // The count is already consumed by another goroutine,
+ // so no need to wake up another goroutine.
+ unlock(&root.lock)
+ return
+ }
+ s, t0 := root.dequeue(addr)
+ if s != nil {
+ root.nwait.Add(-1)
+ }
+ unlock(&root.lock)
+ if s != nil { // May be slow or even yield, so unlock first
+ acquiretime := s.acquiretime
+ if acquiretime != 0 {
+ mutexevent(t0-acquiretime, 3+skipframes)
+ }
+ if s.ticket != 0 {
+ throw("corrupted semaphore ticket")
+ }
+ if handoff && cansemacquire(addr) {
+ s.ticket = 1
+ }
+ readyWithTime(s, 5+skipframes)
+ if s.ticket == 1 && getg().m.locks == 0 {
+ // Direct G handoff
+ // readyWithTime has added the waiter G as runnext in the
+ // current P; we now call the scheduler so that we start running
+ // the waiter G immediately.
+ // Note that waiter inherits our time slice: this is desirable
+ // to avoid having a highly contended semaphore hog the P
+ // indefinitely. goyield is like Gosched, but it emits a
+ // "preempted" trace event instead and, more importantly, puts
+ // the current G on the local runq instead of the global one.
+ // We only do this in the starving regime (handoff=true), as in
+ // the non-starving case it is possible for a different waiter
+ // to acquire the semaphore while we are yielding/scheduling,
+ // and this would be wasteful. We wait instead to enter starving
+ // regime, and then we start to do direct handoffs of ticket and
+ // P.
+ // See issue 33747 for discussion.
+ goyield()
+ }
+ }
+}
+
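+// cansemacquire attempts to atomically consume one unit of the semaphore
+// count at addr and reports whether it succeeded. It fails only when the
+// count is zero, i.e. when there is no wakeup available to consume.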
+func cansemacquire(addr *uint32) bool {
+ for {
+ v := atomic.Load(addr)
+ if v == 0 {
+ return false
+ }
+ if atomic.Cas(addr, v, v-1) {
+ return true
+ }
+ }
+}
+
+// queue adds s to the blocked goroutines in semaRoot.
+func (root *semaRoot) queue(addr *uint32, s *sudog, lifo bool) {
+ s.g = getg()
+ s.elem = unsafe.Pointer(addr)
+ s.next = nil
+ s.prev = nil
+
+ var last *sudog
+ pt := &root.treap
+ for t := *pt; t != nil; t = *pt {
+ if t.elem == unsafe.Pointer(addr) {
+ // Already have addr in list.
+ if lifo {
+ // Substitute s in t's place in treap.
+ *pt = s
+ s.ticket = t.ticket
+ s.acquiretime = t.acquiretime
+ s.parent = t.parent
+ s.prev = t.prev
+ s.next = t.next
+ if s.prev != nil {
+ s.prev.parent = s
+ }
+ if s.next != nil {
+ s.next.parent = s
+ }
+ // Add t first in s's wait list.
+ s.waitlink = t
+ s.waittail = t.waittail
+ if s.waittail == nil {
+ s.waittail = t
+ }
+ t.parent = nil
+ t.prev = nil
+ t.next = nil
+ t.waittail = nil
+ } else {
+ // Add s to end of t's wait list.
+ if t.waittail == nil {
+ t.waitlink = s
+ } else {
+ t.waittail.waitlink = s
+ }
+ t.waittail = s
+ s.waitlink = nil
+ }
+ return
+ }
+ last = t
+ if uintptr(unsafe.Pointer(addr)) < uintptr(t.elem) {
+ pt = &t.prev
+ } else {
+ pt = &t.next
+ }
+ }
+
+ // Add s as new leaf in tree of unique addrs.
+ // The balanced tree is a treap using ticket as the random heap priority.
+ // That is, it is a binary tree ordered according to the elem addresses,
+ // but then among the space of possible binary trees respecting those
+ // addresses, it is kept balanced on average by maintaining a heap ordering
+ // on the ticket: s.ticket <= both s.prev.ticket and s.next.ticket.
+ // https://en.wikipedia.org/wiki/Treap
+ // https://faculty.washington.edu/aragon/pubs/rst89.pdf
+ //
+	// s.ticket is compared with zero in a couple of places, therefore set the
+	// lowest bit. It will not noticeably affect the treap's quality.
+ s.ticket = fastrand() | 1
+ s.parent = last
+ *pt = s
+
+ // Rotate up into tree according to ticket (priority).
+ for s.parent != nil && s.parent.ticket > s.ticket {
+ if s.parent.prev == s {
+ root.rotateRight(s.parent)
+ } else {
+ if s.parent.next != s {
+ panic("semaRoot queue")
+ }
+ root.rotateLeft(s.parent)
+ }
+ }
+}
+
+// dequeue searches for and finds the first goroutine
+// in semaRoot blocked on addr.
+// If the sudog was being profiled, dequeue returns the time
+// at which it was woken up as now. Otherwise now is 0.
+func (root *semaRoot) dequeue(addr *uint32) (found *sudog, now int64) {
+ ps := &root.treap
+ s := *ps
+ for ; s != nil; s = *ps {
+ if s.elem == unsafe.Pointer(addr) {
+ goto Found
+ }
+ if uintptr(unsafe.Pointer(addr)) < uintptr(s.elem) {
+ ps = &s.prev
+ } else {
+ ps = &s.next
+ }
+ }
+ return nil, 0
+
+Found:
+ now = int64(0)
+ if s.acquiretime != 0 {
+ now = cputicks()
+ }
+ if t := s.waitlink; t != nil {
+ // Substitute t, also waiting on addr, for s in root tree of unique addrs.
+ *ps = t
+ t.ticket = s.ticket
+ t.parent = s.parent
+ t.prev = s.prev
+ if t.prev != nil {
+ t.prev.parent = t
+ }
+ t.next = s.next
+ if t.next != nil {
+ t.next.parent = t
+ }
+ if t.waitlink != nil {
+ t.waittail = s.waittail
+ } else {
+ t.waittail = nil
+ }
+ t.acquiretime = now
+ s.waitlink = nil
+ s.waittail = nil
+ } else {
+ // Rotate s down to be leaf of tree for removal, respecting priorities.
+ for s.next != nil || s.prev != nil {
+ if s.next == nil || s.prev != nil && s.prev.ticket < s.next.ticket {
+ root.rotateRight(s)
+ } else {
+ root.rotateLeft(s)
+ }
+ }
+ // Remove s, now a leaf.
+ if s.parent != nil {
+ if s.parent.prev == s {
+ s.parent.prev = nil
+ } else {
+ s.parent.next = nil
+ }
+ } else {
+ root.treap = nil
+ }
+ }
+ s.parent = nil
+ s.elem = nil
+ s.next = nil
+ s.prev = nil
+ s.ticket = 0
+ return s, now
+}
+
+// rotateLeft rotates the tree rooted at node x.
+// turning (x a (y b c)) into (y (x a b) c).
+func (root *semaRoot) rotateLeft(x *sudog) {
+ // p -> (x a (y b c))
+ p := x.parent
+ y := x.next
+ b := y.prev
+
+ y.prev = x
+ x.parent = y
+ x.next = b
+ if b != nil {
+ b.parent = x
+ }
+
+ y.parent = p
+ if p == nil {
+ root.treap = y
+ } else if p.prev == x {
+ p.prev = y
+ } else {
+ if p.next != x {
+ throw("semaRoot rotateLeft")
+ }
+ p.next = y
+ }
+}
+
+// rotateRight rotates the tree rooted at node y.
+// turning (y (x a b) c) into (x a (y b c)).
+func (root *semaRoot) rotateRight(y *sudog) {
+ // p -> (y (x a b) c)
+ p := y.parent
+ x := y.prev
+ b := x.next
+
+ x.next = y
+ y.parent = x
+ y.prev = b
+ if b != nil {
+ b.parent = y
+ }
+
+ x.parent = p
+ if p == nil {
+ root.treap = x
+ } else if p.prev == y {
+ p.prev = x
+ } else {
+ if p.next != y {
+ throw("semaRoot rotateRight")
+ }
+ p.next = x
+ }
+}
+
+// notifyList is a ticket-based notification list used to implement sync.Cond.
+//
+// It must be kept in sync with the sync package.
+type notifyList struct {
+ // wait is the ticket number of the next waiter. It is atomically
+ // incremented outside the lock.
+ wait atomic.Uint32
+
+ // notify is the ticket number of the next waiter to be notified. It can
+ // be read outside the lock, but is only written to with lock held.
+ //
+ // Both wait & notify can wrap around, and such cases will be correctly
+ // handled as long as their "unwrapped" difference is bounded by 2^31.
+ // For this not to be the case, we'd need to have 2^31+ goroutines
+ // blocked on the same condvar, which is currently not possible.
+ notify uint32
+
+ // List of parked waiters.
+ lock mutex
+ head *sudog
+ tail *sudog
+}
+
+// less checks if a < b, considering a & b running counts that may overflow the
+// 32-bit range, and that their "unwrapped" difference is always less than 2^31.
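+// For example, less(^uint32(0), 1) is true: int32(0xFFFFFFFF-1) is -2, so a
+// ticket taken just before the counter wraps still sorts before one taken
+// just after the wrap.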
+func less(a, b uint32) bool {
+ return int32(a-b) < 0
+}
+
+// notifyListAdd adds the caller to a notify list such that it can receive
+// notifications. The caller must eventually call notifyListWait to wait for
+// such a notification, passing the returned ticket number.
+//
+//go:linkname notifyListAdd sync.runtime_notifyListAdd
+func notifyListAdd(l *notifyList) uint32 {
+ // This may be called concurrently, for example, when called from
+ // sync.Cond.Wait while holding a RWMutex in read mode.
+ return l.wait.Add(1) - 1
+}
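+
+// For illustration (a sketch of how sync.Cond is expected to pair these
+// calls; the actual code lives in the sync package):
+//
+//	t := notifyListAdd(l) // take a ticket while still holding c.L
+//	c.L.Unlock()
+//	notifyListWait(l, t)  // park until a notification covers ticket t
+//	c.L.Lock()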
+
+// notifyListWait waits for a notification. If one has been sent since
+// notifyListAdd was called, it returns immediately. Otherwise, it blocks.
+//
+//go:linkname notifyListWait sync.runtime_notifyListWait
+func notifyListWait(l *notifyList, t uint32) {
+ lockWithRank(&l.lock, lockRankNotifyList)
+
+ // Return right away if this ticket has already been notified.
+ if less(t, l.notify) {
+ unlock(&l.lock)
+ return
+ }
+
+ // Enqueue itself.
+ s := acquireSudog()
+ s.g = getg()
+ s.ticket = t
+ s.releasetime = 0
+ t0 := int64(0)
+ if blockprofilerate > 0 {
+ t0 = cputicks()
+ s.releasetime = -1
+ }
+ if l.tail == nil {
+ l.head = s
+ } else {
+ l.tail.next = s
+ }
+ l.tail = s
+ goparkunlock(&l.lock, waitReasonSyncCondWait, traceBlockCondWait, 3)
+ if t0 != 0 {
+ blockevent(s.releasetime-t0, 2)
+ }
+ releaseSudog(s)
+}
+
+// notifyListNotifyAll notifies all entries in the list.
+//
+//go:linkname notifyListNotifyAll sync.runtime_notifyListNotifyAll
+func notifyListNotifyAll(l *notifyList) {
+ // Fast-path: if there are no new waiters since the last notification
+ // we don't need to acquire the lock.
+ if l.wait.Load() == atomic.Load(&l.notify) {
+ return
+ }
+
+ // Pull the list out into a local variable, waiters will be readied
+ // outside the lock.
+ lockWithRank(&l.lock, lockRankNotifyList)
+ s := l.head
+ l.head = nil
+ l.tail = nil
+
+ // Update the next ticket to be notified. We can set it to the current
+ // value of wait because any previous waiters are already in the list
+ // or will notice that they have already been notified when trying to
+ // add themselves to the list.
+ atomic.Store(&l.notify, l.wait.Load())
+ unlock(&l.lock)
+
+ // Go through the local list and ready all waiters.
+ for s != nil {
+ next := s.next
+ s.next = nil
+ readyWithTime(s, 4)
+ s = next
+ }
+}
+
+// notifyListNotifyOne notifies one entry in the list.
+//
+//go:linkname notifyListNotifyOne sync.runtime_notifyListNotifyOne
+func notifyListNotifyOne(l *notifyList) {
+ // Fast-path: if there are no new waiters since the last notification
+ // we don't need to acquire the lock at all.
+ if l.wait.Load() == atomic.Load(&l.notify) {
+ return
+ }
+
+ lockWithRank(&l.lock, lockRankNotifyList)
+
+ // Re-check under the lock if we need to do anything.
+ t := l.notify
+ if t == l.wait.Load() {
+ unlock(&l.lock)
+ return
+ }
+
+ // Update the next notify ticket number.
+ atomic.Store(&l.notify, t+1)
+
+ // Try to find the g that needs to be notified.
+ // If it hasn't made it to the list yet we won't find it,
+ // but it won't park itself once it sees the new notify number.
+ //
+ // This scan looks linear but essentially always stops quickly.
+	// Because goroutines queue separately from taking their ticket numbers,
+ // there may be minor reorderings in the list, but we
+ // expect the g we're looking for to be near the front.
+ // The g has others in front of it on the list only to the
+ // extent that it lost the race, so the iteration will not
+ // be too long. This applies even when the g is missing:
+ // it hasn't yet gotten to sleep and has lost the race to
+ // the (few) other g's that we find on the list.
+ for p, s := (*sudog)(nil), l.head; s != nil; p, s = s, s.next {
+ if s.ticket == t {
+ n := s.next
+ if p != nil {
+ p.next = n
+ } else {
+ l.head = n
+ }
+ if n == nil {
+ l.tail = p
+ }
+ unlock(&l.lock)
+ s.next = nil
+ readyWithTime(s, 4)
+ return
+ }
+ }
+ unlock(&l.lock)
+}
+
+//go:linkname notifyListCheck sync.runtime_notifyListCheck
+func notifyListCheck(sz uintptr) {
+ if sz != unsafe.Sizeof(notifyList{}) {
+ print("runtime: bad notifyList size - sync=", sz, " runtime=", unsafe.Sizeof(notifyList{}), "\n")
+ throw("bad notifyList size")
+ }
+}
+
+//go:linkname sync_nanotime sync.runtime_nanotime
+func sync_nanotime() int64 {
+ return nanotime()
+}
diff --git a/src/runtime/sema_test.go b/src/runtime/sema_test.go
new file mode 100644
index 0000000..9943d2e
--- /dev/null
+++ b/src/runtime/sema_test.go
@@ -0,0 +1,170 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ . "runtime"
+ "sync"
+ "sync/atomic"
+ "testing"
+)
+
+// TestSemaHandoff checks that when semrelease+handoff is
+// requested, the G that releases the semaphore yields its
+// P directly to the first waiter in line.
+// See issue 33747 for discussion.
+func TestSemaHandoff(t *testing.T) {
+ const iter = 10000
+ ok := 0
+ for i := 0; i < iter; i++ {
+ if testSemaHandoff() {
+ ok++
+ }
+ }
+ // As long as two thirds of handoffs are direct, we
+ // consider the test successful. The scheduler is
+ // nondeterministic, so this test checks that we get the
+ // desired outcome in a significant majority of cases.
+ // The actual ratio of direct handoffs is much higher
+ // (>90%) but we use a lower threshold to minimize the
+ // chances that unrelated changes in the runtime will
+ // cause the test to fail or become flaky.
+ if ok < iter*2/3 {
+ t.Fatal("direct handoff < 2/3:", ok, iter)
+ }
+}
+
+func TestSemaHandoff1(t *testing.T) {
+ if GOMAXPROCS(-1) <= 1 {
+ t.Skip("GOMAXPROCS <= 1")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ GOMAXPROCS(1)
+ TestSemaHandoff(t)
+}
+
+func TestSemaHandoff2(t *testing.T) {
+ if GOMAXPROCS(-1) <= 2 {
+ t.Skip("GOMAXPROCS <= 2")
+ }
+ defer GOMAXPROCS(GOMAXPROCS(-1))
+ GOMAXPROCS(2)
+ TestSemaHandoff(t)
+}
+
+func testSemaHandoff() bool {
+ var sema, res uint32
+ done := make(chan struct{})
+
+ // We're testing that the current goroutine is able to yield its time slice
+ // to another goroutine. Stop the current goroutine from migrating to
+ // another CPU where it can win the race (and appear to have not yielded) by
+ // keeping the CPUs slightly busy.
+ var wg sync.WaitGroup
+ for i := 0; i < GOMAXPROCS(-1); i++ {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ }
+ Gosched()
+ }
+ }()
+ }
+
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ Semacquire(&sema)
+ atomic.CompareAndSwapUint32(&res, 0, 1)
+
+ Semrelease1(&sema, true, 0)
+ close(done)
+ }()
+ for SemNwait(&sema) == 0 {
+ Gosched() // wait for goroutine to block in Semacquire
+ }
+
+ // The crux of the test: we release the semaphore with handoff
+ // and immediately perform a CAS both here and in the waiter; we
+ // want the CAS in the waiter to execute first.
+ Semrelease1(&sema, true, 0)
+ atomic.CompareAndSwapUint32(&res, 0, 2)
+
+ wg.Wait() // wait for goroutines to finish to avoid data races
+
+ return res == 1 // did the waiter run first?
+}
+
+func BenchmarkSemTable(b *testing.B) {
+ for _, n := range []int{1000, 2000, 4000, 8000} {
+ b.Run(fmt.Sprintf("OneAddrCollision/n=%d", n), func(b *testing.B) {
+ tab := Escape(new(SemTable))
+ u := make([]uint32, SemTableSize+1)
+
+ b.ResetTimer()
+
+ for j := 0; j < b.N; j++ {
+ // Simulate two locks colliding on the same semaRoot.
+ //
+ // Specifically enqueue all the waiters for the first lock,
+ // then all the waiters for the second lock.
+ //
+ // Then, dequeue all the waiters from the first lock, then
+ // the second.
+ //
+ // Each enqueue/dequeue operation should be O(1), because
+ // there are exactly 2 locks. This could be O(n) if all
+ // the waiters for both locks are on the same list, as it
+ // once was.
+ for i := 0; i < n; i++ {
+ if i < n/2 {
+ tab.Enqueue(&u[0])
+ } else {
+ tab.Enqueue(&u[SemTableSize])
+ }
+ }
+ for i := 0; i < n; i++ {
+ var ok bool
+ if i < n/2 {
+ ok = tab.Dequeue(&u[0])
+ } else {
+ ok = tab.Dequeue(&u[SemTableSize])
+ }
+ if !ok {
+ b.Fatal("failed to dequeue")
+ }
+ }
+ }
+ })
+ b.Run(fmt.Sprintf("ManyAddrCollision/n=%d", n), func(b *testing.B) {
+ tab := Escape(new(SemTable))
+ u := make([]uint32, n*SemTableSize)
+
+ b.ResetTimer()
+
+ for j := 0; j < b.N; j++ {
+ // Simulate n locks colliding on the same semaRoot.
+ //
+ // Each enqueue/dequeue operation should be O(log n), because
+ // each semaRoot is a tree. This could be O(n) if it was
+ // some simpler data structure.
+ for i := 0; i < n; i++ {
+ tab.Enqueue(&u[i*SemTableSize])
+ }
+ for i := 0; i < n; i++ {
+ if !tab.Dequeue(&u[i*SemTableSize]) {
+ b.Fatal("failed to dequeue")
+ }
+ }
+ }
+ })
+ }
+}
diff --git a/src/runtime/semasleep_test.go b/src/runtime/semasleep_test.go
new file mode 100644
index 0000000..711d5df
--- /dev/null
+++ b/src/runtime/semasleep_test.go
@@ -0,0 +1,121 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows && !js && !wasip1
+
+package runtime_test
+
+import (
+ "io"
+ "os/exec"
+ "syscall"
+ "testing"
+ "time"
+)
+
+// Issue #27250. Spurious wakeups to pthread_cond_timedwait_relative_np
+// shouldn't cause semasleep to retry with the same timeout which would
+// cause indefinite spinning.
+func TestSpuriousWakeupsNeverHangSemasleep(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ t.Parallel() // Waits for a program to sleep for 1s.
+
+ exe, err := buildTestProg(t, "testprog")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ cmd := exec.Command(exe, "After1")
+ stdout, err := cmd.StdoutPipe()
+ if err != nil {
+ t.Fatalf("StdoutPipe: %v", err)
+ }
+ beforeStart := time.Now()
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Failed to start command: %v", err)
+ }
+
+ waiting := false
+ doneCh := make(chan error, 1)
+ t.Cleanup(func() {
+ cmd.Process.Kill()
+ if waiting {
+ <-doneCh
+ } else {
+ cmd.Wait()
+ }
+ })
+
+ // Wait for After1 to close its stdout so that we know the runtime's SIGIO
+ // handler is registered.
+ b, err := io.ReadAll(stdout)
+ if len(b) > 0 {
+ t.Logf("read from testprog stdout: %s", b)
+ }
+ if err != nil {
+ t.Fatalf("error reading from testprog: %v", err)
+ }
+
+ // Wait for child exit.
+ //
+ // Note that we must do this after waiting for the write/child end of
+ // stdout to close. Wait closes the read/parent end of stdout, so
+ // starting this goroutine prior to io.ReadAll introduces a race
+ // condition where ReadAll may get fs.ErrClosed if the child exits too
+ // quickly.
+ waiting = true
+ go func() {
+ doneCh <- cmd.Wait()
+ close(doneCh)
+ }()
+
+ // Wait for an arbitrary timeout longer than one second. The subprocess itself
+ // attempts to sleep for one second, but if the machine running the test is
+ // heavily loaded that subprocess may not schedule very quickly even if the
+ // bug remains fixed. (This is fine, because if the bug really is unfixed we
+ // can keep the process hung indefinitely, as long as we signal it often
+ // enough.)
+ timeout := 10 * time.Second
+
+ // The subprocess begins sleeping for 1s after it writes to stdout, so measure
+ // the timeout from here (not from when we started creating the process).
+ // That should reduce noise from process startup overhead.
+ ready := time.Now()
+
+ // With the repro running, we can continuously send to it
+ // a signal that the runtime considers non-terminal,
+ // such as SIGIO, to spuriously wake up
+ // pthread_cond_timedwait_relative_np.
+ ticker := time.NewTicker(200 * time.Millisecond)
+ defer ticker.Stop()
+ for {
+ select {
+ case now := <-ticker.C:
+ if now.Sub(ready) > timeout {
+ t.Error("Program failed to return on time and has to be killed, issue #27520 still exists")
+ // Send SIGQUIT to get a goroutine dump.
+ // Stop sending SIGIO so that the program can clean up and actually terminate.
+ cmd.Process.Signal(syscall.SIGQUIT)
+ return
+ }
+
+ // Send the pesky signal that toggles spinning
+ // indefinitely if #27520 is not fixed.
+ cmd.Process.Signal(syscall.SIGIO)
+
+ case err := <-doneCh:
+ if err != nil {
+ t.Fatalf("The program returned but unfortunately with an error: %v", err)
+ }
+ if time.Since(beforeStart) < 1*time.Second {
+ // The program was supposed to sleep for a full (monotonic) second;
+ // it should not return before that has elapsed.
+ t.Fatalf("The program stopped too quickly.")
+ }
+ return
+ }
+ }
+}
diff --git a/src/runtime/sigaction.go b/src/runtime/sigaction.go
new file mode 100644
index 0000000..05f44f6
--- /dev/null
+++ b/src/runtime/sigaction.go
@@ -0,0 +1,16 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && !amd64 && !arm64 && !ppc64le) || (freebsd && !amd64)
+
+package runtime
+
+// This version is used on Linux and FreeBSD systems on which we don't
+// use cgo to call the C version of sigaction.
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigaction(sig uint32, new, old *sigactiont) {
+ sysSigaction(sig, new, old)
+}
diff --git a/src/runtime/signal_386.go b/src/runtime/signal_386.go
new file mode 100644
index 0000000..aa66032
--- /dev/null
+++ b/src/runtime/signal_386.go
@@ -0,0 +1,59 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || linux || netbsd || openbsd
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("eax ", hex(c.eax()), "\n")
+ print("ebx ", hex(c.ebx()), "\n")
+ print("ecx ", hex(c.ecx()), "\n")
+ print("edx ", hex(c.edx()), "\n")
+ print("edi ", hex(c.edi()), "\n")
+ print("esi ", hex(c.esi()), "\n")
+ print("ebp ", hex(c.ebp()), "\n")
+ print("esp ", hex(c.esp()), "\n")
+ print("eip ", hex(c.eip()), "\n")
+ print("eflags ", hex(c.eflags()), "\n")
+ print("cs ", hex(c.cs()), "\n")
+ print("fs ", hex(c.fs()), "\n")
+ print("gs ", hex(c.gs()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.eip()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.esp()) }
+func (c *sigctxt) siglr() uintptr { return 0 }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ pc := uintptr(c.eip())
+ sp := uintptr(c.esp())
+
+ if shouldPushSigpanic(gp, pc, *(*uintptr)(unsafe.Pointer(sp))) {
+ c.pushCall(abi.FuncPCABIInternal(sigpanic), pc)
+ } else {
+ // Not safe to push the call. Just clobber the frame.
+ c.set_eip(uint32(abi.FuncPCABIInternal(sigpanic)))
+ }
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Make it look like we called target at resumePC.
+ sp := uintptr(c.esp())
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = resumePC
+ c.set_esp(uint32(sp))
+ c.set_eip(uint32(targetPC))
+}
diff --git a/src/runtime/signal_aix_ppc64.go b/src/runtime/signal_aix_ppc64.go
new file mode 100644
index 0000000..c6cb91a
--- /dev/null
+++ b/src/runtime/signal_aix_ppc64.go
@@ -0,0 +1,85 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build aix
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *context64 { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().gpr[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gpr[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gpr[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gpr[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gpr[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gpr[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gpr[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gpr[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gpr[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gpr[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gpr[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gpr[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gpr[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gpr[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gpr[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gpr[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().gpr[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().gpr[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().gpr[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().gpr[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().gpr[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().gpr[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().gpr[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().gpr[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().gpr[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().gpr[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().gpr[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().gpr[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().gpr[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().gpr[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().gpr[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().gpr[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().gpr[1] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().iar }
+
+func (c *sigctxt) ctr() uint64 { return c.regs().ctr }
+func (c *sigctxt) link() uint64 { return c.regs().lr }
+func (c *sigctxt) xer() uint32 { return c.regs().xer }
+func (c *sigctxt) ccr() uint32 { return c.regs().cr }
+func (c *sigctxt) fpscr() uint32 { return c.regs().fpscr }
+func (c *sigctxt) fpscrx() uint32 { return c.regs().fpscrx }
+
+// TODO(aix): find trap equivalent
+func (c *sigctxt) trap() uint32 { return 0x0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(c.info.si_addr) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gpr[0] = x }
+func (c *sigctxt) set_r12(x uint64) { c.regs().gpr[12] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().gpr[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().iar = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gpr[1] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().lr = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_amd64.go b/src/runtime/signal_amd64.go
new file mode 100644
index 0000000..8ade208
--- /dev/null
+++ b/src/runtime/signal_amd64.go
@@ -0,0 +1,87 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 && (darwin || dragonfly || freebsd || linux || netbsd || openbsd || solaris)
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("rax ", hex(c.rax()), "\n")
+ print("rbx ", hex(c.rbx()), "\n")
+ print("rcx ", hex(c.rcx()), "\n")
+ print("rdx ", hex(c.rdx()), "\n")
+ print("rdi ", hex(c.rdi()), "\n")
+ print("rsi ", hex(c.rsi()), "\n")
+ print("rbp ", hex(c.rbp()), "\n")
+ print("rsp ", hex(c.rsp()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\n")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\n")
+ print("r15 ", hex(c.r15()), "\n")
+ print("rip ", hex(c.rip()), "\n")
+ print("rflags ", hex(c.rflags()), "\n")
+ print("cs ", hex(c.cs()), "\n")
+ print("fs ", hex(c.fs()), "\n")
+ print("gs ", hex(c.gs()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.rip()) }
+
+func (c *sigctxt) setsigpc(x uint64) { c.set_rip(x) }
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.rsp()) }
+func (c *sigctxt) siglr() uintptr { return 0 }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // Work around Leopard bug that doesn't set FPE_INTDIV.
+ // Look at instruction to see if it is a divide.
+ // Not necessary in Snow Leopard (si_code will be != 0).
+ if GOOS == "darwin" && sig == _SIGFPE && gp.sigcode0 == 0 {
+ pc := (*[4]byte)(unsafe.Pointer(gp.sigpc))
+ i := 0
+ if pc[i]&0xF0 == 0x40 { // 64-bit REX prefix
+ i++
+ } else if pc[i] == 0x66 { // 16-bit instruction prefix
+ i++
+ }
+ if pc[i] == 0xF6 || pc[i] == 0xF7 {
+ gp.sigcode0 = _FPE_INTDIV
+ }
+ }
+
+ pc := uintptr(c.rip())
+ sp := uintptr(c.rsp())
+
+ // In case we are panicking from external code, we need to initialize
+ // Go special registers. We inject sigpanic0 (instead of sigpanic),
+ // which takes care of that.
+ if shouldPushSigpanic(gp, pc, *(*uintptr)(unsafe.Pointer(sp))) {
+ c.pushCall(abi.FuncPCABI0(sigpanic0), pc)
+ } else {
+ // Not safe to push the call. Just clobber the frame.
+ c.set_rip(uint64(abi.FuncPCABI0(sigpanic0)))
+ }
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Make it look like we called target at resumePC.
+ sp := uintptr(c.rsp())
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = resumePC
+ c.set_rsp(uint64(sp))
+ c.set_rip(uint64(targetPC))
+}
diff --git a/src/runtime/signal_arm.go b/src/runtime/signal_arm.go
new file mode 100644
index 0000000..fff302f
--- /dev/null
+++ b/src/runtime/signal_arm.go
@@ -0,0 +1,81 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build dragonfly || freebsd || linux || netbsd || openbsd
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("trap ", hex(c.trap()), "\n")
+ print("error ", hex(c.error()), "\n")
+ print("oldmask ", hex(c.oldmask()), "\n")
+ print("r0 ", hex(c.r0()), "\n")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\n")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\n")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\n")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("fp ", hex(c.fp()), "\n")
+ print("ip ", hex(c.ip()), "\n")
+ print("sp ", hex(c.sp()), "\n")
+ print("lr ", hex(c.lr()), "\n")
+ print("pc ", hex(c.pc()), "\n")
+ print("cpsr ", hex(c.cpsr()), "\n")
+ print("fault ", hex(c.fault()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.lr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange lr, and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LR to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.lr()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.lr())) {
+		// Make it look like the faulting PC called sigpanic.
+ c.set_lr(uint32(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r10(uint32(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint32(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.lr()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_lr(uint32(resumePC))
+ c.set_pc(uint32(targetPC))
+}
diff --git a/src/runtime/signal_arm64.go b/src/runtime/signal_arm64.go
new file mode 100644
index 0000000..4a96b3c
--- /dev/null
+++ b/src/runtime/signal_arm64.go
@@ -0,0 +1,107 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || freebsd || linux || netbsd || openbsd
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\n")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\n")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\n")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\n")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\n")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\n")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\n")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\n")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\n")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\n")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\n")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\n")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\n")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\n")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\n")
+ print("r29 ", hex(c.r29()), "\n")
+ print("lr ", hex(c.lr()), "\n")
+ print("sp ", hex(c.sp()), "\n")
+ print("pc ", hex(c.pc()), "\n")
+ print("fault ", hex(c.fault()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) setsigpc(x uint64) { c.set_pc(x) }
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.lr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange lr, and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LR to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.StackAlign // needs only sizeof uint64, but must align the stack
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.lr()
+ // Make sure a valid frame pointer is saved on the stack so that the
+ // frame pointer checks in adjustframe are happy, if they're enabled.
+ // Frame pointer unwinding won't visit the sigpanic frame, since
+ // sigpanic will save the same frame pointer before calling into a panic
+ // function.
+ *(*uint64)(unsafe.Pointer(uintptr(sp - goarch.PtrSize))) = c.r29()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.lr())) {
+		// Make it look like the faulting PC called sigpanic.
+ c.set_lr(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r28(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra space is known to gentraceback.
+ sp := c.sp() - 16 // SP needs 16-byte alignment
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.lr()
+ // Make sure a valid frame pointer is saved on the stack so that the
+ // frame pointer checks in adjustframe are happy, if they're enabled.
+ // This is not actually used for unwinding.
+ *(*uint64)(unsafe.Pointer(uintptr(sp - goarch.PtrSize))) = c.r29()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_lr(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_darwin.go b/src/runtime/signal_darwin.go
new file mode 100644
index 0000000..8090fb2
--- /dev/null
+++ b/src/runtime/signal_darwin.go
@@ -0,0 +1,40 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+}
diff --git a/src/runtime/signal_darwin_amd64.go b/src/runtime/signal_darwin_amd64.go
new file mode 100644
index 0000000..20544d8
--- /dev/null
+++ b/src/runtime/signal_darwin_amd64.go
@@ -0,0 +1,96 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
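+// regs is called from the signal handler and may run on the signal stack,
+// so it must not grow the stack or contain write barriers.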
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *regs64 { return &(*ucontext)(c.ctxt).uc_mcontext.ss }
+
+func (c *sigctxt) rax() uint64 { return c.regs().rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().cs }
+func (c *sigctxt) fs() uint64 { return c.regs().fs }
+func (c *sigctxt) gs() uint64 { return c.regs().gs }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGTRAP:
+ // OS X sets c.sigcode() == TRAP_BRKPT unconditionally for all SIGTRAPs,
+ // leaving no way to distinguish a breakpoint-induced SIGTRAP
+ // from an asynchronous signal SIGTRAP.
+ // They all look breakpoint-induced by default.
+ // Try looking at the code to see if it's a breakpoint.
+ // The assumption is that we're very unlikely to get an
+ // asynchronous SIGTRAP at just the moment that the
+ // PC started to point at unmapped memory.
+ pc := uintptr(c.rip())
+ // OS X will leave the pc just after the INT 3 instruction.
+ // INT 3 is usually 1 byte, but there is a 2-byte form.
+ code := (*[2]byte)(unsafe.Pointer(pc - 2))
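+ // The one-byte form is 0xCC (INT3); the two-byte form is 0xCD 0x03
+ // (INT imm8 with an immediate of 3). Treat the trap as user-generated
+ // only if neither encoding immediately precedes the PC.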
+ if code[1] != 0xCC && (code[0] != 0xCD || code[1] != 3) {
+ // SIGTRAP on something other than INT 3.
+ c.set_sigcode(_SI_USER)
+ }
+
+ case _SIGSEGV:
+ // x86-64 has 48-bit virtual addresses. The top 16 bits must echo bit 47.
+ // The hardware delivers a different kind of fault for a malformed address
+ // than it does for an attempt to access a valid but unmapped address.
+ // OS X 10.9.2 mishandles the malformed address case, making it look like
+ // a user-generated signal (like someone ran kill -SEGV ourpid).
+ // We pass user-generated signals to os/signal, or else ignore them.
+ // Doing that here - and returning to the faulting code - results in an
+ // infinite loop. It appears the best we can do is rewrite what the kernel
+ // delivers into something more like the truth. The address used below
+ // has very little chance of being the one that caused the fault, but it is
+ // malformed, it is clearly not a real pointer, and if it does get printed
+ // in real life, people will probably search for it and find this code.
+ // There are no Google hits for b01dfacedebac1e or 0xb01dfacedebac1e
+ // as I type this comment.
+ //
+ // Note: if this code is removed, please consider
+ // enabling TestSignalForwardingGo for darwin-amd64 in
+ // misc/cgo/testcarchive/carchive_test.go.
+ if c.sigcode() == _SI_USER {
+ c.set_sigcode(_SI_USER + 1)
+ c.set_sigaddr(0xb01dfacedebac1e)
+ }
+ }
+}
diff --git a/src/runtime/signal_darwin_arm64.go b/src/runtime/signal_darwin_arm64.go
new file mode 100644
index 0000000..690ffe4
--- /dev/null
+++ b/src/runtime/signal_darwin_arm64.go
@@ -0,0 +1,90 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *regs64 { return &(*ucontext)(c.ctxt).uc_mcontext.ss }
+
+func (c *sigctxt) r0() uint64 { return c.regs().x[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().x[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().x[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().x[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().x[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().x[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().x[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().x[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().x[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().x[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().x[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().x[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().x[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().x[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().x[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().x[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().x[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().x[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().x[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().x[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().x[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().x[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().x[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().x[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().x[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().x[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().x[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().x[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().x[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().fp }
+func (c *sigctxt) lr() uint64 { return c.regs().lr }
+func (c *sigctxt) sp() uint64 { return c.regs().sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().pc }
+
+func (c *sigctxt) fault() uintptr { return uintptr(unsafe.Pointer(c.info.si_addr)) }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(uintptr(unsafe.Pointer(c.info.si_addr))) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().lr = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().x[28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ c.info.si_addr = (*byte)(unsafe.Pointer(uintptr(x)))
+}
+
+//go:nosplit
+func (c *sigctxt) fixsigcode(sig uint32) {
+ switch sig {
+ case _SIGTRAP:
+ // OS X sets c.sigcode() == TRAP_BRKPT unconditionally for all SIGTRAPs,
+ // leaving no way to distinguish a breakpoint-induced SIGTRAP
+ // from an asynchronous signal SIGTRAP.
+ // They all look breakpoint-induced by default.
+ // Try looking at the code to see if it's a breakpoint.
+ // The assumption is that we're very unlikely to get an
+ // asynchronous SIGTRAP at just the moment that the
+ // PC started to point at unmapped memory.
+ pc := uintptr(c.pc())
+ // OS X will leave the pc just after the instruction.
+ code := (*uint32)(unsafe.Pointer(pc - 4))
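+ // 0xd4200000 encodes BRK #0, the AArch64 breakpoint instruction.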
+ if *code != 0xd4200000 {
+ // SIGTRAP on something other than breakpoint.
+ c.set_sigcode(_SI_USER)
+ }
+ }
+}
diff --git a/src/runtime/signal_dragonfly.go b/src/runtime/signal_dragonfly.go
new file mode 100644
index 0000000..f2b26e7
--- /dev/null
+++ b/src/runtime/signal_dragonfly.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
diff --git a/src/runtime/signal_dragonfly_amd64.go b/src/runtime/signal_dragonfly_amd64.go
new file mode 100644
index 0000000..c473edd
--- /dev/null
+++ b/src/runtime/signal_dragonfly_amd64.go
@@ -0,0 +1,51 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().mc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().mc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().mc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().mc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().mc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().mc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().mc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().mc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().mc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().mc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint64 { return c.regs().mc_ss }
+func (c *sigctxt) gs() uint64 { return c.regs().mc_ss }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().mc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().mc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_freebsd.go b/src/runtime/signal_freebsd.go
new file mode 100644
index 0000000..2812c69
--- /dev/null
+++ b/src/runtime/signal_freebsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigNotify, "SIGSYS: bad system call"}, // see golang.org/issues/15204
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
diff --git a/src/runtime/signal_freebsd_386.go b/src/runtime/signal_freebsd_386.go
new file mode 100644
index 0000000..f7cc0df
--- /dev/null
+++ b/src/runtime/signal_freebsd_386.go
@@ -0,0 +1,41 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().mc_eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().mc_ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().mc_ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().mc_edx }
+func (c *sigctxt) edi() uint32 { return c.regs().mc_edi }
+func (c *sigctxt) esi() uint32 { return c.regs().mc_esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().mc_ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().mc_esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().mc_eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().mc_eflags }
+func (c *sigctxt) cs() uint32 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint32 { return c.regs().mc_fs }
+func (c *sigctxt) gs() uint32 { return c.regs().mc_gs }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info.si_addr) }
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().mc_eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().mc_esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) { c.info.si_addr = uintptr(x) }
diff --git a/src/runtime/signal_freebsd_amd64.go b/src/runtime/signal_freebsd_amd64.go
new file mode 100644
index 0000000..20b86e7
--- /dev/null
+++ b/src/runtime/signal_freebsd_amd64.go
@@ -0,0 +1,51 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().mc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().mc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().mc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().mc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().mc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().mc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().mc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().mc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().mc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().mc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().mc_cs }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().mc_fs) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().mc_gs) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().mc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().mc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_freebsd_arm.go b/src/runtime/signal_freebsd_arm.go
new file mode 100644
index 0000000..2135c1e
--- /dev/null
+++ b/src/runtime/signal_freebsd_arm.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().__gregs[0] }
+func (c *sigctxt) r1() uint32 { return c.regs().__gregs[1] }
+func (c *sigctxt) r2() uint32 { return c.regs().__gregs[2] }
+func (c *sigctxt) r3() uint32 { return c.regs().__gregs[3] }
+func (c *sigctxt) r4() uint32 { return c.regs().__gregs[4] }
+func (c *sigctxt) r5() uint32 { return c.regs().__gregs[5] }
+func (c *sigctxt) r6() uint32 { return c.regs().__gregs[6] }
+func (c *sigctxt) r7() uint32 { return c.regs().__gregs[7] }
+func (c *sigctxt) r8() uint32 { return c.regs().__gregs[8] }
+func (c *sigctxt) r9() uint32 { return c.regs().__gregs[9] }
+func (c *sigctxt) r10() uint32 { return c.regs().__gregs[10] }
+func (c *sigctxt) fp() uint32 { return c.regs().__gregs[11] }
+func (c *sigctxt) ip() uint32 { return c.regs().__gregs[12] }
+func (c *sigctxt) sp() uint32 { return c.regs().__gregs[13] }
+func (c *sigctxt) lr() uint32 { return c.regs().__gregs[14] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().__gregs[15] }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().__gregs[16] }
+func (c *sigctxt) fault() uintptr { return uintptr(c.info.si_addr) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info.si_addr) }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().__gregs[15] = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().__gregs[13] = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().__gregs[14] = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().__gregs[10] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ c.info.si_addr = uintptr(x)
+}
diff --git a/src/runtime/signal_freebsd_arm64.go b/src/runtime/signal_freebsd_arm64.go
new file mode 100644
index 0000000..159e965
--- /dev/null
+++ b/src/runtime/signal_freebsd_arm64.go
@@ -0,0 +1,66 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().mc_gpregs.gp_x[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().mc_gpregs.gp_x[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().mc_gpregs.gp_x[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().mc_gpregs.gp_x[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().mc_gpregs.gp_x[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().mc_gpregs.gp_x[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().mc_gpregs.gp_x[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().mc_gpregs.gp_x[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().mc_gpregs.gp_x[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().mc_gpregs.gp_x[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().mc_gpregs.gp_x[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().mc_gpregs.gp_x[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().mc_gpregs.gp_x[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().mc_gpregs.gp_x[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().mc_gpregs.gp_x[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().mc_gpregs.gp_x[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().mc_gpregs.gp_x[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().mc_gpregs.gp_x[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().mc_gpregs.gp_x[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().mc_gpregs.gp_x[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().mc_gpregs.gp_x[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().mc_gpregs.gp_x[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().mc_gpregs.gp_x[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().mc_gpregs.gp_x[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().mc_gpregs.gp_x[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().mc_gpregs.gp_x[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().mc_gpregs.gp_x[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().mc_gpregs.gp_x[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().mc_gpregs.gp_x[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().mc_gpregs.gp_x[29] }
+func (c *sigctxt) lr() uint64 { return c.regs().mc_gpregs.gp_lr }
+func (c *sigctxt) sp() uint64 { return c.regs().mc_gpregs.gp_sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().mc_gpregs.gp_elr }
+
+func (c *sigctxt) fault() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().mc_gpregs.gp_elr = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().mc_gpregs.gp_sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().mc_gpregs.gp_lr = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().mc_gpregs.gp_x[28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_freebsd_riscv64.go b/src/runtime/signal_freebsd_riscv64.go
new file mode 100644
index 0000000..fbf6c63
--- /dev/null
+++ b/src/runtime/signal_freebsd_riscv64.go
@@ -0,0 +1,63 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) ra() uint64 { return c.regs().mc_gpregs.gp_ra }
+func (c *sigctxt) sp() uint64 { return c.regs().mc_gpregs.gp_sp }
+func (c *sigctxt) gp() uint64 { return c.regs().mc_gpregs.gp_gp }
+func (c *sigctxt) tp() uint64 { return c.regs().mc_gpregs.gp_tp }
+func (c *sigctxt) t0() uint64 { return c.regs().mc_gpregs.gp_t[0] }
+func (c *sigctxt) t1() uint64 { return c.regs().mc_gpregs.gp_t[1] }
+func (c *sigctxt) t2() uint64 { return c.regs().mc_gpregs.gp_t[2] }
+func (c *sigctxt) s0() uint64 { return c.regs().mc_gpregs.gp_s[0] }
+func (c *sigctxt) s1() uint64 { return c.regs().mc_gpregs.gp_s[1] }
+func (c *sigctxt) a0() uint64 { return c.regs().mc_gpregs.gp_a[0] }
+func (c *sigctxt) a1() uint64 { return c.regs().mc_gpregs.gp_a[1] }
+func (c *sigctxt) a2() uint64 { return c.regs().mc_gpregs.gp_a[2] }
+func (c *sigctxt) a3() uint64 { return c.regs().mc_gpregs.gp_a[3] }
+func (c *sigctxt) a4() uint64 { return c.regs().mc_gpregs.gp_a[4] }
+func (c *sigctxt) a5() uint64 { return c.regs().mc_gpregs.gp_a[5] }
+func (c *sigctxt) a6() uint64 { return c.regs().mc_gpregs.gp_a[6] }
+func (c *sigctxt) a7() uint64 { return c.regs().mc_gpregs.gp_a[7] }
+func (c *sigctxt) s2() uint64 { return c.regs().mc_gpregs.gp_s[2] }
+func (c *sigctxt) s3() uint64 { return c.regs().mc_gpregs.gp_s[3] }
+func (c *sigctxt) s4() uint64 { return c.regs().mc_gpregs.gp_s[4] }
+func (c *sigctxt) s5() uint64 { return c.regs().mc_gpregs.gp_s[5] }
+func (c *sigctxt) s6() uint64 { return c.regs().mc_gpregs.gp_s[6] }
+func (c *sigctxt) s7() uint64 { return c.regs().mc_gpregs.gp_s[7] }
+func (c *sigctxt) s8() uint64 { return c.regs().mc_gpregs.gp_s[8] }
+func (c *sigctxt) s9() uint64 { return c.regs().mc_gpregs.gp_s[9] }
+func (c *sigctxt) s10() uint64 { return c.regs().mc_gpregs.gp_s[10] }
+func (c *sigctxt) s11() uint64 { return c.regs().mc_gpregs.gp_s[11] }
+func (c *sigctxt) t3() uint64 { return c.regs().mc_gpregs.gp_t[3] }
+func (c *sigctxt) t4() uint64 { return c.regs().mc_gpregs.gp_t[4] }
+func (c *sigctxt) t5() uint64 { return c.regs().mc_gpregs.gp_t[5] }
+func (c *sigctxt) t6() uint64 { return c.regs().mc_gpregs.gp_t[6] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().mc_gpregs.gp_sepc }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().mc_gpregs.gp_sepc = x }
+func (c *sigctxt) set_ra(x uint64) { c.regs().mc_gpregs.gp_ra = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().mc_gpregs.gp_sp = x }
+func (c *sigctxt) set_gp(x uint64) { c.regs().mc_gpregs.gp_gp = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) { c.info.si_addr = x }
diff --git a/src/runtime/signal_linux_386.go b/src/runtime/signal_linux_386.go
new file mode 100644
index 0000000..321518c
--- /dev/null
+++ b/src/runtime/signal_linux_386.go
@@ -0,0 +1,46 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().edx }
+func (c *sigctxt) edi() uint32 { return c.regs().edi }
+func (c *sigctxt) esi() uint32 { return c.regs().esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().eflags }
+func (c *sigctxt) cs() uint32 { return uint32(c.regs().cs) }
+func (c *sigctxt) fs() uint32 { return uint32(c.regs().fs) }
+func (c *sigctxt) gs() uint32 { return uint32(c.regs().gs) }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_amd64.go b/src/runtime/signal_linux_amd64.go
new file mode 100644
index 0000000..573b118
--- /dev/null
+++ b/src/runtime/signal_linux_amd64.go
@@ -0,0 +1,56 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().eflags }
+func (c *sigctxt) cs() uint64 { return uint64(c.regs().cs) }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().fs) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().gs) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_arm.go b/src/runtime/signal_linux_arm.go
new file mode 100644
index 0000000..eb107d6
--- /dev/null
+++ b/src/runtime/signal_linux_arm.go
@@ -0,0 +1,58 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().r0 }
+func (c *sigctxt) r1() uint32 { return c.regs().r1 }
+func (c *sigctxt) r2() uint32 { return c.regs().r2 }
+func (c *sigctxt) r3() uint32 { return c.regs().r3 }
+func (c *sigctxt) r4() uint32 { return c.regs().r4 }
+func (c *sigctxt) r5() uint32 { return c.regs().r5 }
+func (c *sigctxt) r6() uint32 { return c.regs().r6 }
+func (c *sigctxt) r7() uint32 { return c.regs().r7 }
+func (c *sigctxt) r8() uint32 { return c.regs().r8 }
+func (c *sigctxt) r9() uint32 { return c.regs().r9 }
+func (c *sigctxt) r10() uint32 { return c.regs().r10 }
+func (c *sigctxt) fp() uint32 { return c.regs().fp }
+func (c *sigctxt) ip() uint32 { return c.regs().ip }
+func (c *sigctxt) sp() uint32 { return c.regs().sp }
+func (c *sigctxt) lr() uint32 { return c.regs().lr }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().pc }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().cpsr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().fault_address) }
+func (c *sigctxt) trap() uint32 { return c.regs().trap_no }
+func (c *sigctxt) error() uint32 { return c.regs().error_code }
+func (c *sigctxt) oldmask() uint32 { return c.regs().oldmask }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().lr = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().r10 = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_arm64.go b/src/runtime/signal_linux_arm64.go
new file mode 100644
index 0000000..4ccc030
--- /dev/null
+++ b/src/runtime/signal_linux_arm64.go
@@ -0,0 +1,71 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().regs[29] }
+func (c *sigctxt) lr() uint64 { return c.regs().regs[30] }
+func (c *sigctxt) sp() uint64 { return c.regs().sp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().pc }
+
+func (c *sigctxt) pstate() uint64 { return c.regs().pstate }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().fault_address) }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sp = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().regs[30] = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().regs[28] = x }
+
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_loong64.go b/src/runtime/signal_linux_loong64.go
new file mode 100644
index 0000000..51aaacb
--- /dev/null
+++ b/src/runtime/signal_linux_loong64.go
@@ -0,0 +1,75 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && loong64
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().sc_regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().sc_regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().sc_regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().sc_regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().sc_regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().sc_regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().sc_regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().sc_regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().sc_regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().sc_regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().sc_regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().sc_regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().sc_regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().sc_regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().sc_regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().sc_regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().sc_regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().sc_regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().sc_regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().sc_regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().sc_regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().sc_regs[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().sc_regs[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs[3] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_pc }
+
+func (c *sigctxt) link() uint64 { return c.regs().sc_regs[1] }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_r31(x uint64) { c.regs().sc_regs[31] = x }
+func (c *sigctxt) set_r22(x uint64) { c.regs().sc_regs[22] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs[3] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().sc_regs[1] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_mips64x.go b/src/runtime/signal_linux_mips64x.go
new file mode 100644
index 0000000..9c2a286
--- /dev/null
+++ b/src/runtime/signal_linux_mips64x.go
@@ -0,0 +1,77 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint64 { return c.regs().sc_regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().sc_regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().sc_regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().sc_regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().sc_regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().sc_regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().sc_regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().sc_regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().sc_regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().sc_regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().sc_regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().sc_regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().sc_regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().sc_regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().sc_regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().sc_regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().sc_regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().sc_regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().sc_regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().sc_regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().sc_regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().sc_regs[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().sc_regs[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs[29] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_pc }
+
+func (c *sigctxt) link() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) lo() uint64 { return c.regs().sc_mdlo }
+func (c *sigctxt) hi() uint64 { return c.regs().sc_mdhi }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_regs[28] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().sc_regs[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs[29] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().sc_regs[31] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_mipsx.go b/src/runtime/signal_linux_mipsx.go
new file mode 100644
index 0000000..f11bfc9
--- /dev/null
+++ b/src/runtime/signal_linux_mipsx.go
@@ -0,0 +1,64 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+func (c *sigctxt) r0() uint32 { return uint32(c.regs().sc_regs[0]) }
+func (c *sigctxt) r1() uint32 { return uint32(c.regs().sc_regs[1]) }
+func (c *sigctxt) r2() uint32 { return uint32(c.regs().sc_regs[2]) }
+func (c *sigctxt) r3() uint32 { return uint32(c.regs().sc_regs[3]) }
+func (c *sigctxt) r4() uint32 { return uint32(c.regs().sc_regs[4]) }
+func (c *sigctxt) r5() uint32 { return uint32(c.regs().sc_regs[5]) }
+func (c *sigctxt) r6() uint32 { return uint32(c.regs().sc_regs[6]) }
+func (c *sigctxt) r7() uint32 { return uint32(c.regs().sc_regs[7]) }
+func (c *sigctxt) r8() uint32 { return uint32(c.regs().sc_regs[8]) }
+func (c *sigctxt) r9() uint32 { return uint32(c.regs().sc_regs[9]) }
+func (c *sigctxt) r10() uint32 { return uint32(c.regs().sc_regs[10]) }
+func (c *sigctxt) r11() uint32 { return uint32(c.regs().sc_regs[11]) }
+func (c *sigctxt) r12() uint32 { return uint32(c.regs().sc_regs[12]) }
+func (c *sigctxt) r13() uint32 { return uint32(c.regs().sc_regs[13]) }
+func (c *sigctxt) r14() uint32 { return uint32(c.regs().sc_regs[14]) }
+func (c *sigctxt) r15() uint32 { return uint32(c.regs().sc_regs[15]) }
+func (c *sigctxt) r16() uint32 { return uint32(c.regs().sc_regs[16]) }
+func (c *sigctxt) r17() uint32 { return uint32(c.regs().sc_regs[17]) }
+func (c *sigctxt) r18() uint32 { return uint32(c.regs().sc_regs[18]) }
+func (c *sigctxt) r19() uint32 { return uint32(c.regs().sc_regs[19]) }
+func (c *sigctxt) r20() uint32 { return uint32(c.regs().sc_regs[20]) }
+func (c *sigctxt) r21() uint32 { return uint32(c.regs().sc_regs[21]) }
+func (c *sigctxt) r22() uint32 { return uint32(c.regs().sc_regs[22]) }
+func (c *sigctxt) r23() uint32 { return uint32(c.regs().sc_regs[23]) }
+func (c *sigctxt) r24() uint32 { return uint32(c.regs().sc_regs[24]) }
+func (c *sigctxt) r25() uint32 { return uint32(c.regs().sc_regs[25]) }
+func (c *sigctxt) r26() uint32 { return uint32(c.regs().sc_regs[26]) }
+func (c *sigctxt) r27() uint32 { return uint32(c.regs().sc_regs[27]) }
+func (c *sigctxt) r28() uint32 { return uint32(c.regs().sc_regs[28]) }
+func (c *sigctxt) r29() uint32 { return uint32(c.regs().sc_regs[29]) }
+func (c *sigctxt) r30() uint32 { return uint32(c.regs().sc_regs[30]) }
+func (c *sigctxt) r31() uint32 { return uint32(c.regs().sc_regs[31]) }
+func (c *sigctxt) sp() uint32 { return uint32(c.regs().sc_regs[29]) }
+func (c *sigctxt) pc() uint32 { return uint32(c.regs().sc_pc) }
+func (c *sigctxt) link() uint32 { return uint32(c.regs().sc_regs[31]) }
+func (c *sigctxt) lo() uint32 { return uint32(c.regs().sc_mdlo) }
+func (c *sigctxt) hi() uint32 { return uint32(c.regs().sc_mdhi) }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 { return c.info.si_addr }
+
+func (c *sigctxt) set_r30(x uint32) { c.regs().sc_regs[30] = uint64(x) }
+func (c *sigctxt) set_pc(x uint32) { c.regs().sc_pc = uint64(x) }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sc_regs[29] = uint64(x) }
+func (c *sigctxt) set_link(x uint32) { c.regs().sc_regs[31] = uint64(x) }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) { c.info.si_addr = x }
diff --git a/src/runtime/signal_linux_ppc64x.go b/src/runtime/signal_linux_ppc64x.go
new file mode 100644
index 0000000..3175428
--- /dev/null
+++ b/src/runtime/signal_linux_ppc64x.go
@@ -0,0 +1,81 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *ptregs { return (*ucontext)(c.ctxt).uc_mcontext.regs }
+
+func (c *sigctxt) r0() uint64 { return c.regs().gpr[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gpr[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gpr[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gpr[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gpr[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gpr[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gpr[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gpr[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gpr[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gpr[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gpr[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gpr[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gpr[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gpr[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gpr[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gpr[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().gpr[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().gpr[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().gpr[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().gpr[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().gpr[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().gpr[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().gpr[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().gpr[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().gpr[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().gpr[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().gpr[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().gpr[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().gpr[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().gpr[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().gpr[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().gpr[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().gpr[1] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().nip }
+
+func (c *sigctxt) trap() uint64 { return c.regs().trap }
+func (c *sigctxt) ctr() uint64 { return c.regs().ctr }
+func (c *sigctxt) link() uint64 { return c.regs().link }
+func (c *sigctxt) xer() uint64 { return c.regs().xer }
+func (c *sigctxt) ccr() uint64 { return c.regs().ccr }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.regs().dar) }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gpr[0] = x }
+func (c *sigctxt) set_r12(x uint64) { c.regs().gpr[12] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().gpr[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().nip = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gpr[1] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().link = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_riscv64.go b/src/runtime/signal_linux_riscv64.go
new file mode 100644
index 0000000..b26450d
--- /dev/null
+++ b/src/runtime/signal_linux_riscv64.go
@@ -0,0 +1,68 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext { return &(*ucontext)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) ra() uint64 { return c.regs().sc_regs.ra }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs.sp }
+func (c *sigctxt) gp() uint64 { return c.regs().sc_regs.gp }
+func (c *sigctxt) tp() uint64 { return c.regs().sc_regs.tp }
+func (c *sigctxt) t0() uint64 { return c.regs().sc_regs.t0 }
+func (c *sigctxt) t1() uint64 { return c.regs().sc_regs.t1 }
+func (c *sigctxt) t2() uint64 { return c.regs().sc_regs.t2 }
+func (c *sigctxt) s0() uint64 { return c.regs().sc_regs.s0 }
+func (c *sigctxt) s1() uint64 { return c.regs().sc_regs.s1 }
+func (c *sigctxt) a0() uint64 { return c.regs().sc_regs.a0 }
+func (c *sigctxt) a1() uint64 { return c.regs().sc_regs.a1 }
+func (c *sigctxt) a2() uint64 { return c.regs().sc_regs.a2 }
+func (c *sigctxt) a3() uint64 { return c.regs().sc_regs.a3 }
+func (c *sigctxt) a4() uint64 { return c.regs().sc_regs.a4 }
+func (c *sigctxt) a5() uint64 { return c.regs().sc_regs.a5 }
+func (c *sigctxt) a6() uint64 { return c.regs().sc_regs.a6 }
+func (c *sigctxt) a7() uint64 { return c.regs().sc_regs.a7 }
+func (c *sigctxt) s2() uint64 { return c.regs().sc_regs.s2 }
+func (c *sigctxt) s3() uint64 { return c.regs().sc_regs.s3 }
+func (c *sigctxt) s4() uint64 { return c.regs().sc_regs.s4 }
+func (c *sigctxt) s5() uint64 { return c.regs().sc_regs.s5 }
+func (c *sigctxt) s6() uint64 { return c.regs().sc_regs.s6 }
+func (c *sigctxt) s7() uint64 { return c.regs().sc_regs.s7 }
+func (c *sigctxt) s8() uint64 { return c.regs().sc_regs.s8 }
+func (c *sigctxt) s9() uint64 { return c.regs().sc_regs.s9 }
+func (c *sigctxt) s10() uint64 { return c.regs().sc_regs.s10 }
+func (c *sigctxt) s11() uint64 { return c.regs().sc_regs.s11 }
+func (c *sigctxt) t3() uint64 { return c.regs().sc_regs.t3 }
+func (c *sigctxt) t4() uint64 { return c.regs().sc_regs.t4 }
+func (c *sigctxt) t5() uint64 { return c.regs().sc_regs.t5 }
+func (c *sigctxt) t6() uint64 { return c.regs().sc_regs.t6 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_regs.pc }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_regs.pc = x }
+func (c *sigctxt) set_ra(x uint64) { c.regs().sc_regs.ra = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs.sp = x }
+func (c *sigctxt) set_gp(x uint64) { c.regs().sc_regs.gp = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
diff --git a/src/runtime/signal_linux_s390x.go b/src/runtime/signal_linux_s390x.go
new file mode 100644
index 0000000..18c3b11
--- /dev/null
+++ b/src/runtime/signal_linux_s390x.go
@@ -0,0 +1,127 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().gregs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().gregs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().gregs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().gregs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().gregs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().gregs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().gregs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().gregs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().gregs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().gregs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().gregs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().gregs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().gregs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().gregs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().gregs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().gregs[15] }
+func (c *sigctxt) link() uint64 { return c.regs().gregs[14] }
+func (c *sigctxt) sp() uint64 { return c.regs().gregs[15] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().psw_addr }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return c.info.si_addr }
+
+func (c *sigctxt) set_r0(x uint64) { c.regs().gregs[0] = x }
+func (c *sigctxt) set_r13(x uint64) { c.regs().gregs[13] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().gregs[14] = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().gregs[15] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().psw_addr = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(add(unsafe.Pointer(c.info), 2*goarch.PtrSize)) = uintptr(x)
+}
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := uintptr(gp.sigpc)
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r0(0)
+ c.set_r13(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 8
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
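The preparePanic comment above describes the trick in prose: save LINK below SP so leaf-function panics still unwind, then make it look as if the faulting PC called sigpanic. A toy user-space model of that sequence, with plain Go values standing in for the signal context and stack (the shouldPushSigpanic check is omitted here):

package main

import "fmt"

const minFrameSize = 8 // stands in for sys.MinFrameSize in this toy model

type fakeCtxt struct {
	pc, link, sp uint64
	stack        []uint64 // word-addressed at sp/8 in this toy model
}

func preparePanic(c *fakeCtxt, faultingPC, sigpanicPC uint64) {
	// Always save LINK to the stack so panics in leaf functions unwind.
	c.sp -= minFrameSize
	c.stack[c.sp/8] = c.link
	// Pretend the faulting PC called sigpanic directly. (The real code only
	// rewrites LINK when shouldPushSigpanic says it is safe to do so.)
	c.link = faultingPC
	c.pc = sigpanicPC
}

func main() {
	c := &fakeCtxt{pc: 0x1000, link: 0x2000, sp: 64, stack: make([]uint64, 16)}
	preparePanic(c, c.pc, 0x9000)
	fmt.Printf("pc=%#x link=%#x sp=%d saved-link=%#x\n",
		c.pc, c.link, c.sp, c.stack[c.sp/8])
}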
diff --git a/src/runtime/signal_loong64.go b/src/runtime/signal_loong64.go
new file mode 100644
index 0000000..ac842c0
--- /dev/null
+++ b/src/runtime/signal_loong64.go
@@ -0,0 +1,96 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && loong64
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - goarch.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r22(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 8
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_mips64x.go b/src/runtime/signal_mips64x.go
new file mode 100644
index 0000000..cee1bf7
--- /dev/null
+++ b/src/runtime/signal_mips64x.go
@@ -0,0 +1,100 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux || openbsd) && (mips64 || mips64le)
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+ print("lo ", hex(c.lo()), "\t")
+ print("hi ", hex(c.hi()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - goarch.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ sigpanicPC := uint64(abi.FuncPCABIInternal(sigpanic))
+ c.set_r28(sigpanicPC >> 32 << 32) // RSB register
+ c.set_r30(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(sigpanicPC)
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 8
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
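The mips64 variant above also seeds the RSB register with sigpanicPC >> 32 << 32, i.e. the sigpanic address with its low 32 bits cleared. A tiny check of that masking, using a made-up address:

package main

import "fmt"

func main() {
	sigpanicPC := uint64(0x00000120_0045abcd) // hypothetical text address
	rsb := sigpanicPC >> 32 << 32
	fmt.Printf("pc  = %#016x\nrsb = %#016x\n", sigpanicPC, rsb)
	// rsb == 0x0000012000000000: upper 32 bits kept, lower 32 bits zeroed.
}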
diff --git a/src/runtime/signal_mipsx.go b/src/runtime/signal_mipsx.go
new file mode 100644
index 0000000..ba92655
--- /dev/null
+++ b/src/runtime/signal_mipsx.go
@@ -0,0 +1,95 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("link ", hex(c.link()), "\n")
+ print("lo ", hex(c.lo()), "\t")
+ print("hi ", hex(c.hi()), "\n")
+}
+
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint32(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r30(uint32(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint32(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - 4
+ c.set_sp(sp)
+ *(*uint32)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint32(resumePC))
+ c.set_pc(uint32(targetPC))
+}
diff --git a/src/runtime/signal_netbsd.go b/src/runtime/signal_netbsd.go
new file mode 100644
index 0000000..ca51084
--- /dev/null
+++ b/src/runtime/signal_netbsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify + _SigIgn, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {_SigNotify, "SIGTHR: reserved"},
+}
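Each sigtable entry above sums flags such as _SigNotify + _SigKill; that works because the flags are distinct bits, so the sum is the same as a bitwise OR and can be tested later with &. A small stand-alone illustration with stand-in constants (the real _Sig* values are defined elsewhere in the runtime):

package main

import "fmt"

// Stand-ins for the runtime's _Sig* flags, which are declared as 1 << iota.
const (
	sigNotify = 1 << iota
	sigKill
	sigThrow
	sigPanic
)

type sigTabT struct {
	flags int
	name  string
}

func main() {
	sigint := sigTabT{sigNotify + sigKill, "SIGINT: interrupt"}
	fmt.Println("notify?", sigint.flags&sigNotify != 0) // true
	fmt.Println("panic? ", sigint.flags&sigPanic != 0)  // false
}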
diff --git a/src/runtime/signal_netbsd_386.go b/src/runtime/signal_netbsd_386.go
new file mode 100644
index 0000000..845a575
--- /dev/null
+++ b/src/runtime/signal_netbsd_386.go
@@ -0,0 +1,45 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt { return &(*ucontextt)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) eax() uint32 { return c.regs().__gregs[_REG_EAX] }
+func (c *sigctxt) ebx() uint32 { return c.regs().__gregs[_REG_EBX] }
+func (c *sigctxt) ecx() uint32 { return c.regs().__gregs[_REG_ECX] }
+func (c *sigctxt) edx() uint32 { return c.regs().__gregs[_REG_EDX] }
+func (c *sigctxt) edi() uint32 { return c.regs().__gregs[_REG_EDI] }
+func (c *sigctxt) esi() uint32 { return c.regs().__gregs[_REG_ESI] }
+func (c *sigctxt) ebp() uint32 { return c.regs().__gregs[_REG_EBP] }
+func (c *sigctxt) esp() uint32 { return c.regs().__gregs[_REG_UESP] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().__gregs[_REG_EIP] }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().__gregs[_REG_EFL] }
+func (c *sigctxt) cs() uint32 { return c.regs().__gregs[_REG_CS] }
+func (c *sigctxt) fs() uint32 { return c.regs().__gregs[_REG_FS] }
+func (c *sigctxt) gs() uint32 { return c.regs().__gregs[_REG_GS] }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info._code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(unsafe.Pointer(&c.info._reason[0]))
+}
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().__gregs[_REG_EIP] = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().__gregs[_REG_UESP] = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(unsafe.Pointer(&c.info._reason[0])) = x
+}
diff --git a/src/runtime/signal_netbsd_amd64.go b/src/runtime/signal_netbsd_amd64.go
new file mode 100644
index 0000000..2112efe
--- /dev/null
+++ b/src/runtime/signal_netbsd_amd64.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt {
+ return (*mcontextt)(unsafe.Pointer(&(*ucontextt)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().__gregs[_REG_RAX] }
+func (c *sigctxt) rbx() uint64 { return c.regs().__gregs[_REG_RBX] }
+func (c *sigctxt) rcx() uint64 { return c.regs().__gregs[_REG_RCX] }
+func (c *sigctxt) rdx() uint64 { return c.regs().__gregs[_REG_RDX] }
+func (c *sigctxt) rdi() uint64 { return c.regs().__gregs[_REG_RDI] }
+func (c *sigctxt) rsi() uint64 { return c.regs().__gregs[_REG_RSI] }
+func (c *sigctxt) rbp() uint64 { return c.regs().__gregs[_REG_RBP] }
+func (c *sigctxt) rsp() uint64 { return c.regs().__gregs[_REG_RSP] }
+func (c *sigctxt) r8() uint64 { return c.regs().__gregs[_REG_R8] }
+func (c *sigctxt) r9() uint64 { return c.regs().__gregs[_REG_R9] }
+func (c *sigctxt) r10() uint64 { return c.regs().__gregs[_REG_R10] }
+func (c *sigctxt) r11() uint64 { return c.regs().__gregs[_REG_R11] }
+func (c *sigctxt) r12() uint64 { return c.regs().__gregs[_REG_R12] }
+func (c *sigctxt) r13() uint64 { return c.regs().__gregs[_REG_R13] }
+func (c *sigctxt) r14() uint64 { return c.regs().__gregs[_REG_R14] }
+func (c *sigctxt) r15() uint64 { return c.regs().__gregs[_REG_R15] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().__gregs[_REG_RIP] }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().__gregs[_REG_RFLAGS] }
+func (c *sigctxt) cs() uint64 { return c.regs().__gregs[_REG_CS] }
+func (c *sigctxt) fs() uint64 { return c.regs().__gregs[_REG_FS] }
+func (c *sigctxt) gs() uint64 { return c.regs().__gregs[_REG_GS] }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info._code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(unsafe.Pointer(&c.info._reason[0]))
+}
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().__gregs[_REG_RIP] = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().__gregs[_REG_RSP] = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(unsafe.Pointer(&c.info._reason[0])) = x
+}
diff --git a/src/runtime/signal_netbsd_arm.go b/src/runtime/signal_netbsd_arm.go
new file mode 100644
index 0000000..fdb3078
--- /dev/null
+++ b/src/runtime/signal_netbsd_arm.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt { return &(*ucontextt)(c.ctxt).uc_mcontext }
+
+func (c *sigctxt) r0() uint32 { return c.regs().__gregs[_REG_R0] }
+func (c *sigctxt) r1() uint32 { return c.regs().__gregs[_REG_R1] }
+func (c *sigctxt) r2() uint32 { return c.regs().__gregs[_REG_R2] }
+func (c *sigctxt) r3() uint32 { return c.regs().__gregs[_REG_R3] }
+func (c *sigctxt) r4() uint32 { return c.regs().__gregs[_REG_R4] }
+func (c *sigctxt) r5() uint32 { return c.regs().__gregs[_REG_R5] }
+func (c *sigctxt) r6() uint32 { return c.regs().__gregs[_REG_R6] }
+func (c *sigctxt) r7() uint32 { return c.regs().__gregs[_REG_R7] }
+func (c *sigctxt) r8() uint32 { return c.regs().__gregs[_REG_R8] }
+func (c *sigctxt) r9() uint32 { return c.regs().__gregs[_REG_R9] }
+func (c *sigctxt) r10() uint32 { return c.regs().__gregs[_REG_R10] }
+func (c *sigctxt) fp() uint32 { return c.regs().__gregs[_REG_R11] }
+func (c *sigctxt) ip() uint32 { return c.regs().__gregs[_REG_R12] }
+func (c *sigctxt) sp() uint32 { return c.regs().__gregs[_REG_R13] }
+func (c *sigctxt) lr() uint32 { return c.regs().__gregs[_REG_R14] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().__gregs[_REG_R15] }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().__gregs[_REG_CPSR] }
+func (c *sigctxt) fault() uintptr { return uintptr(c.info._reason) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info._code) }
+func (c *sigctxt) sigaddr() uint32 { return uint32(c.info._reason) }
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().__gregs[_REG_R15] = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().__gregs[_REG_R13] = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().__gregs[_REG_R14] = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().__gregs[_REG_R10] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ c.info._reason = uintptr(x)
+}
diff --git a/src/runtime/signal_netbsd_arm64.go b/src/runtime/signal_netbsd_arm64.go
new file mode 100644
index 0000000..8dfdfea
--- /dev/null
+++ b/src/runtime/signal_netbsd_arm64.go
@@ -0,0 +1,73 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontextt {
+ return (*mcontextt)(unsafe.Pointer(&(*ucontextt)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().__gregs[_REG_X0] }
+func (c *sigctxt) r1() uint64 { return c.regs().__gregs[_REG_X1] }
+func (c *sigctxt) r2() uint64 { return c.regs().__gregs[_REG_X2] }
+func (c *sigctxt) r3() uint64 { return c.regs().__gregs[_REG_X3] }
+func (c *sigctxt) r4() uint64 { return c.regs().__gregs[_REG_X4] }
+func (c *sigctxt) r5() uint64 { return c.regs().__gregs[_REG_X5] }
+func (c *sigctxt) r6() uint64 { return c.regs().__gregs[_REG_X6] }
+func (c *sigctxt) r7() uint64 { return c.regs().__gregs[_REG_X7] }
+func (c *sigctxt) r8() uint64 { return c.regs().__gregs[_REG_X8] }
+func (c *sigctxt) r9() uint64 { return c.regs().__gregs[_REG_X9] }
+func (c *sigctxt) r10() uint64 { return c.regs().__gregs[_REG_X10] }
+func (c *sigctxt) r11() uint64 { return c.regs().__gregs[_REG_X11] }
+func (c *sigctxt) r12() uint64 { return c.regs().__gregs[_REG_X12] }
+func (c *sigctxt) r13() uint64 { return c.regs().__gregs[_REG_X13] }
+func (c *sigctxt) r14() uint64 { return c.regs().__gregs[_REG_X14] }
+func (c *sigctxt) r15() uint64 { return c.regs().__gregs[_REG_X15] }
+func (c *sigctxt) r16() uint64 { return c.regs().__gregs[_REG_X16] }
+func (c *sigctxt) r17() uint64 { return c.regs().__gregs[_REG_X17] }
+func (c *sigctxt) r18() uint64 { return c.regs().__gregs[_REG_X18] }
+func (c *sigctxt) r19() uint64 { return c.regs().__gregs[_REG_X19] }
+func (c *sigctxt) r20() uint64 { return c.regs().__gregs[_REG_X20] }
+func (c *sigctxt) r21() uint64 { return c.regs().__gregs[_REG_X21] }
+func (c *sigctxt) r22() uint64 { return c.regs().__gregs[_REG_X22] }
+func (c *sigctxt) r23() uint64 { return c.regs().__gregs[_REG_X23] }
+func (c *sigctxt) r24() uint64 { return c.regs().__gregs[_REG_X24] }
+func (c *sigctxt) r25() uint64 { return c.regs().__gregs[_REG_X25] }
+func (c *sigctxt) r26() uint64 { return c.regs().__gregs[_REG_X26] }
+func (c *sigctxt) r27() uint64 { return c.regs().__gregs[_REG_X27] }
+func (c *sigctxt) r28() uint64 { return c.regs().__gregs[_REG_X28] }
+func (c *sigctxt) r29() uint64 { return c.regs().__gregs[_REG_X29] }
+func (c *sigctxt) lr() uint64 { return c.regs().__gregs[_REG_X30] }
+func (c *sigctxt) sp() uint64 { return c.regs().__gregs[_REG_X31] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().__gregs[_REG_ELR] }
+
+func (c *sigctxt) fault() uintptr { return uintptr(c.info._reason) }
+func (c *sigctxt) trap() uint64 { return 0 }
+func (c *sigctxt) error() uint64 { return 0 }
+func (c *sigctxt) oldmask() uint64 { return 0 }
+
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info._code) }
+func (c *sigctxt) sigaddr() uint64 { return uint64(c.info._reason) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().__gregs[_REG_ELR] = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().__gregs[_REG_X31] = x }
+func (c *sigctxt) set_lr(x uint64) { c.regs().__gregs[_REG_X30] = x }
+func (c *sigctxt) set_r28(x uint64) { c.regs().__gregs[_REG_X28] = x }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info._code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ c.info._reason = uintptr(x)
+}
diff --git a/src/runtime/signal_openbsd.go b/src/runtime/signal_openbsd.go
new file mode 100644
index 0000000..d2c5c5e
--- /dev/null
+++ b/src/runtime/signal_openbsd.go
@@ -0,0 +1,41 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT: emulate instruction executed"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 17 */ {0, "SIGSTOP: stop"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 19 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue after stop"},
+ /* 20 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify, "SIGINFO: status request from keyboard"},
+ /* 30 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 31 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 32 */ {0, "SIGTHR: reserved"}, // thread AST - cannot be registered.
+}
diff --git a/src/runtime/signal_openbsd_386.go b/src/runtime/signal_openbsd_386.go
new file mode 100644
index 0000000..2fc4b1d
--- /dev/null
+++ b/src/runtime/signal_openbsd_386.go
@@ -0,0 +1,47 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) eax() uint32 { return c.regs().sc_eax }
+func (c *sigctxt) ebx() uint32 { return c.regs().sc_ebx }
+func (c *sigctxt) ecx() uint32 { return c.regs().sc_ecx }
+func (c *sigctxt) edx() uint32 { return c.regs().sc_edx }
+func (c *sigctxt) edi() uint32 { return c.regs().sc_edi }
+func (c *sigctxt) esi() uint32 { return c.regs().sc_esi }
+func (c *sigctxt) ebp() uint32 { return c.regs().sc_ebp }
+func (c *sigctxt) esp() uint32 { return c.regs().sc_esp }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) eip() uint32 { return c.regs().sc_eip }
+
+func (c *sigctxt) eflags() uint32 { return c.regs().sc_eflags }
+func (c *sigctxt) cs() uint32 { return c.regs().sc_cs }
+func (c *sigctxt) fs() uint32 { return c.regs().sc_fs }
+func (c *sigctxt) gs() uint32 { return c.regs().sc_gs }
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(add(unsafe.Pointer(c.info), 12))
+}
+
+func (c *sigctxt) set_eip(x uint32) { c.regs().sc_eip = x }
+func (c *sigctxt) set_esp(x uint32) { c.regs().sc_esp = x }
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(add(unsafe.Pointer(c.info), 12)) = x
+}
diff --git a/src/runtime/signal_openbsd_amd64.go b/src/runtime/signal_openbsd_amd64.go
new file mode 100644
index 0000000..091a88a
--- /dev/null
+++ b/src/runtime/signal_openbsd_amd64.go
@@ -0,0 +1,55 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) rax() uint64 { return c.regs().sc_rax }
+func (c *sigctxt) rbx() uint64 { return c.regs().sc_rbx }
+func (c *sigctxt) rcx() uint64 { return c.regs().sc_rcx }
+func (c *sigctxt) rdx() uint64 { return c.regs().sc_rdx }
+func (c *sigctxt) rdi() uint64 { return c.regs().sc_rdi }
+func (c *sigctxt) rsi() uint64 { return c.regs().sc_rsi }
+func (c *sigctxt) rbp() uint64 { return c.regs().sc_rbp }
+func (c *sigctxt) rsp() uint64 { return c.regs().sc_rsp }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_r8 }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_r9 }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_r10 }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_r11 }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_r12 }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_r13 }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_r14 }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_r15 }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return c.regs().sc_rip }
+
+func (c *sigctxt) rflags() uint64 { return c.regs().sc_rflags }
+func (c *sigctxt) cs() uint64 { return c.regs().sc_cs }
+func (c *sigctxt) fs() uint64 { return c.regs().sc_fs }
+func (c *sigctxt) gs() uint64 { return c.regs().sc_gs }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().sc_rip = x }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().sc_rsp = x }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_arm.go b/src/runtime/signal_openbsd_arm.go
new file mode 100644
index 0000000..f796550
--- /dev/null
+++ b/src/runtime/signal_openbsd_arm.go
@@ -0,0 +1,59 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint32 { return c.regs().sc_r0 }
+func (c *sigctxt) r1() uint32 { return c.regs().sc_r1 }
+func (c *sigctxt) r2() uint32 { return c.regs().sc_r2 }
+func (c *sigctxt) r3() uint32 { return c.regs().sc_r3 }
+func (c *sigctxt) r4() uint32 { return c.regs().sc_r4 }
+func (c *sigctxt) r5() uint32 { return c.regs().sc_r5 }
+func (c *sigctxt) r6() uint32 { return c.regs().sc_r6 }
+func (c *sigctxt) r7() uint32 { return c.regs().sc_r7 }
+func (c *sigctxt) r8() uint32 { return c.regs().sc_r8 }
+func (c *sigctxt) r9() uint32 { return c.regs().sc_r9 }
+func (c *sigctxt) r10() uint32 { return c.regs().sc_r10 }
+func (c *sigctxt) fp() uint32 { return c.regs().sc_r11 }
+func (c *sigctxt) ip() uint32 { return c.regs().sc_r12 }
+func (c *sigctxt) sp() uint32 { return c.regs().sc_usr_sp }
+func (c *sigctxt) lr() uint32 { return c.regs().sc_usr_lr }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint32 { return c.regs().sc_pc }
+
+func (c *sigctxt) cpsr() uint32 { return c.regs().sc_spsr }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+func (c *sigctxt) trap() uint32 { return 0 }
+func (c *sigctxt) error() uint32 { return 0 }
+func (c *sigctxt) oldmask() uint32 { return 0 }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint32 {
+ return *(*uint32)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_pc(x uint32) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint32) { c.regs().sc_usr_sp = x }
+func (c *sigctxt) set_lr(x uint32) { c.regs().sc_usr_lr = x }
+func (c *sigctxt) set_r10(x uint32) { c.regs().sc_r10 = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint32) {
+ *(*uint32)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_arm64.go b/src/runtime/signal_openbsd_arm64.go
new file mode 100644
index 0000000..3747b4f
--- /dev/null
+++ b/src/runtime/signal_openbsd_arm64.go
@@ -0,0 +1,75 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint64 { return (uint64)(c.regs().sc_x[0]) }
+func (c *sigctxt) r1() uint64 { return (uint64)(c.regs().sc_x[1]) }
+func (c *sigctxt) r2() uint64 { return (uint64)(c.regs().sc_x[2]) }
+func (c *sigctxt) r3() uint64 { return (uint64)(c.regs().sc_x[3]) }
+func (c *sigctxt) r4() uint64 { return (uint64)(c.regs().sc_x[4]) }
+func (c *sigctxt) r5() uint64 { return (uint64)(c.regs().sc_x[5]) }
+func (c *sigctxt) r6() uint64 { return (uint64)(c.regs().sc_x[6]) }
+func (c *sigctxt) r7() uint64 { return (uint64)(c.regs().sc_x[7]) }
+func (c *sigctxt) r8() uint64 { return (uint64)(c.regs().sc_x[8]) }
+func (c *sigctxt) r9() uint64 { return (uint64)(c.regs().sc_x[9]) }
+func (c *sigctxt) r10() uint64 { return (uint64)(c.regs().sc_x[10]) }
+func (c *sigctxt) r11() uint64 { return (uint64)(c.regs().sc_x[11]) }
+func (c *sigctxt) r12() uint64 { return (uint64)(c.regs().sc_x[12]) }
+func (c *sigctxt) r13() uint64 { return (uint64)(c.regs().sc_x[13]) }
+func (c *sigctxt) r14() uint64 { return (uint64)(c.regs().sc_x[14]) }
+func (c *sigctxt) r15() uint64 { return (uint64)(c.regs().sc_x[15]) }
+func (c *sigctxt) r16() uint64 { return (uint64)(c.regs().sc_x[16]) }
+func (c *sigctxt) r17() uint64 { return (uint64)(c.regs().sc_x[17]) }
+func (c *sigctxt) r18() uint64 { return (uint64)(c.regs().sc_x[18]) }
+func (c *sigctxt) r19() uint64 { return (uint64)(c.regs().sc_x[19]) }
+func (c *sigctxt) r20() uint64 { return (uint64)(c.regs().sc_x[20]) }
+func (c *sigctxt) r21() uint64 { return (uint64)(c.regs().sc_x[21]) }
+func (c *sigctxt) r22() uint64 { return (uint64)(c.regs().sc_x[22]) }
+func (c *sigctxt) r23() uint64 { return (uint64)(c.regs().sc_x[23]) }
+func (c *sigctxt) r24() uint64 { return (uint64)(c.regs().sc_x[24]) }
+func (c *sigctxt) r25() uint64 { return (uint64)(c.regs().sc_x[25]) }
+func (c *sigctxt) r26() uint64 { return (uint64)(c.regs().sc_x[26]) }
+func (c *sigctxt) r27() uint64 { return (uint64)(c.regs().sc_x[27]) }
+func (c *sigctxt) r28() uint64 { return (uint64)(c.regs().sc_x[28]) }
+func (c *sigctxt) r29() uint64 { return (uint64)(c.regs().sc_x[29]) }
+func (c *sigctxt) lr() uint64 { return (uint64)(c.regs().sc_lr) }
+func (c *sigctxt) sp() uint64 { return (uint64)(c.regs().sc_sp) }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return (uint64)(c.regs().sc_lr) } /* XXX */
+
+func (c *sigctxt) fault() uint64 { return c.sigaddr() }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return uint64(c.regs().sc_elr) }
+
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_elr = uintptr(x) }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_sp = uintptr(x) }
+func (c *sigctxt) set_lr(x uint64) { c.regs().sc_lr = uintptr(x) }
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_x[28] = uintptr(x) }
+
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_openbsd_mips64.go b/src/runtime/signal_openbsd_mips64.go
new file mode 100644
index 0000000..54ed523
--- /dev/null
+++ b/src/runtime/signal_openbsd_mips64.go
@@ -0,0 +1,78 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "unsafe"
+)
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *sigcontext {
+ return (*sigcontext)(c.ctxt)
+}
+
+func (c *sigctxt) r0() uint64 { return c.regs().sc_regs[0] }
+func (c *sigctxt) r1() uint64 { return c.regs().sc_regs[1] }
+func (c *sigctxt) r2() uint64 { return c.regs().sc_regs[2] }
+func (c *sigctxt) r3() uint64 { return c.regs().sc_regs[3] }
+func (c *sigctxt) r4() uint64 { return c.regs().sc_regs[4] }
+func (c *sigctxt) r5() uint64 { return c.regs().sc_regs[5] }
+func (c *sigctxt) r6() uint64 { return c.regs().sc_regs[6] }
+func (c *sigctxt) r7() uint64 { return c.regs().sc_regs[7] }
+func (c *sigctxt) r8() uint64 { return c.regs().sc_regs[8] }
+func (c *sigctxt) r9() uint64 { return c.regs().sc_regs[9] }
+func (c *sigctxt) r10() uint64 { return c.regs().sc_regs[10] }
+func (c *sigctxt) r11() uint64 { return c.regs().sc_regs[11] }
+func (c *sigctxt) r12() uint64 { return c.regs().sc_regs[12] }
+func (c *sigctxt) r13() uint64 { return c.regs().sc_regs[13] }
+func (c *sigctxt) r14() uint64 { return c.regs().sc_regs[14] }
+func (c *sigctxt) r15() uint64 { return c.regs().sc_regs[15] }
+func (c *sigctxt) r16() uint64 { return c.regs().sc_regs[16] }
+func (c *sigctxt) r17() uint64 { return c.regs().sc_regs[17] }
+func (c *sigctxt) r18() uint64 { return c.regs().sc_regs[18] }
+func (c *sigctxt) r19() uint64 { return c.regs().sc_regs[19] }
+func (c *sigctxt) r20() uint64 { return c.regs().sc_regs[20] }
+func (c *sigctxt) r21() uint64 { return c.regs().sc_regs[21] }
+func (c *sigctxt) r22() uint64 { return c.regs().sc_regs[22] }
+func (c *sigctxt) r23() uint64 { return c.regs().sc_regs[23] }
+func (c *sigctxt) r24() uint64 { return c.regs().sc_regs[24] }
+func (c *sigctxt) r25() uint64 { return c.regs().sc_regs[25] }
+func (c *sigctxt) r26() uint64 { return c.regs().sc_regs[26] }
+func (c *sigctxt) r27() uint64 { return c.regs().sc_regs[27] }
+func (c *sigctxt) r28() uint64 { return c.regs().sc_regs[28] }
+func (c *sigctxt) r29() uint64 { return c.regs().sc_regs[29] }
+func (c *sigctxt) r30() uint64 { return c.regs().sc_regs[30] }
+func (c *sigctxt) r31() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) sp() uint64 { return c.regs().sc_regs[29] }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) pc() uint64 { return c.regs().sc_pc }
+
+func (c *sigctxt) link() uint64 { return c.regs().sc_regs[31] }
+func (c *sigctxt) lo() uint64 { return c.regs().mullo }
+func (c *sigctxt) hi() uint64 { return c.regs().mulhi }
+
+func (c *sigctxt) sigcode() uint32 { return uint32(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 {
+ return *(*uint64)(add(unsafe.Pointer(c.info), 16))
+}
+
+func (c *sigctxt) set_r28(x uint64) { c.regs().sc_regs[28] = x }
+func (c *sigctxt) set_r30(x uint64) { c.regs().sc_regs[30] = x }
+func (c *sigctxt) set_pc(x uint64) { c.regs().sc_pc = x }
+func (c *sigctxt) set_sp(x uint64) { c.regs().sc_regs[29] = x }
+func (c *sigctxt) set_link(x uint64) { c.regs().sc_regs[31] = x }
+
+func (c *sigctxt) set_sigcode(x uint32) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uint64)(add(unsafe.Pointer(c.info), 16)) = x
+}
diff --git a/src/runtime/signal_plan9.go b/src/runtime/signal_plan9.go
new file mode 100644
index 0000000..d3894c8
--- /dev/null
+++ b/src/runtime/signal_plan9.go
@@ -0,0 +1,57 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+type sigTabT struct {
+ flags int
+ name string
+}
+
+// Incoming notes are compared against this table using strncmp, so the
+// order matters: longer patterns must appear before their prefixes.
+// There are _SIG constants in os2_plan9.go for the table index of some
+// of these.
+//
+// If you add entries to this table, you must respect the prefix ordering
+// and also update the constant values in os2_plan9.go.
+var sigtable = [...]sigTabT{
+ // Traps that we cannot recover from.
+ {_SigThrow, "sys: trap: debug exception"},
+ {_SigThrow, "sys: trap: invalid opcode"},
+
+ // We can recover from some memory errors in runtime·sigpanic.
+ {_SigPanic, "sys: trap: fault read"}, // SIGRFAULT
+ {_SigPanic, "sys: trap: fault write"}, // SIGWFAULT
+
+ // We can also recover from math errors.
+ {_SigPanic, "sys: trap: divide error"}, // SIGINTDIV
+ {_SigPanic, "sys: fp:"}, // SIGFLOAT
+
+ // All other traps are normally handled as if they were marked SigThrow.
+ // We mark them SigPanic here so that debug.SetPanicOnFault will work.
+ {_SigPanic, "sys: trap:"}, // SIGTRAP
+
+ // Writes to a closed pipe can be handled if desired, otherwise they're ignored.
+ {_SigNotify, "sys: write on closed pipe"},
+
+ // Other system notes are more serious and cannot be recovered.
+ {_SigThrow, "sys:"},
+
+ // Issued to all other procs when calling runtime·exit.
+ {_SigGoExit, "go: exit "},
+
+ // Kill is sent by external programs to cause an exit.
+ {_SigKill, "kill"},
+
+ // Interrupts can be handled if desired, otherwise they cause an exit.
+ {_SigNotify + _SigKill, "interrupt"},
+ {_SigNotify + _SigKill, "hangup"},
+
+ // Alarms can be handled if desired, otherwise they're ignored.
+ {_SigNotify, "alarm"},
+
+ // Aborts can be handled if desired, otherwise they cause a stack trace.
+ {_SigNotify + _SigThrow, "abort"},
+}
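The Plan 9 table above is matched by prefix, which is why longer note strings must precede their shorter prefixes. A sketch of that lookup using strings.HasPrefix in place of the runtime's strncmp-style comparison, with stand-in flag constants:

package main

import (
	"fmt"
	"strings"
)

const (
	sigPanic = 1 << iota // stand-ins for the runtime's _Sig* flags
	sigThrow
)

type sigTabT struct {
	flags int
	name  string
}

// Order matters: "sys: trap: fault read" must come before "sys: trap:" and "sys:".
var table = []sigTabT{
	{sigPanic, "sys: trap: fault read"},
	{sigPanic, "sys: trap:"},
	{sigThrow, "sys:"},
}

func lookup(note string) sigTabT {
	for _, t := range table {
		if strings.HasPrefix(note, t.name) {
			return t
		}
	}
	return sigTabT{}
}

func main() {
	got := lookup("sys: trap: fault read addr=0x0")
	fmt.Println(got.name) // "sys: trap: fault read", not the shorter "sys:"
}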
diff --git a/src/runtime/signal_ppc64x.go b/src/runtime/signal_ppc64x.go
new file mode 100644
index 0000000..bdd3540
--- /dev/null
+++ b/src/runtime/signal_ppc64x.go
@@ -0,0 +1,111 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (aix || linux) && (ppc64 || ppc64le)
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("r0 ", hex(c.r0()), "\t")
+ print("r1 ", hex(c.r1()), "\n")
+ print("r2 ", hex(c.r2()), "\t")
+ print("r3 ", hex(c.r3()), "\n")
+ print("r4 ", hex(c.r4()), "\t")
+ print("r5 ", hex(c.r5()), "\n")
+ print("r6 ", hex(c.r6()), "\t")
+ print("r7 ", hex(c.r7()), "\n")
+ print("r8 ", hex(c.r8()), "\t")
+ print("r9 ", hex(c.r9()), "\n")
+ print("r10 ", hex(c.r10()), "\t")
+ print("r11 ", hex(c.r11()), "\n")
+ print("r12 ", hex(c.r12()), "\t")
+ print("r13 ", hex(c.r13()), "\n")
+ print("r14 ", hex(c.r14()), "\t")
+ print("r15 ", hex(c.r15()), "\n")
+ print("r16 ", hex(c.r16()), "\t")
+ print("r17 ", hex(c.r17()), "\n")
+ print("r18 ", hex(c.r18()), "\t")
+ print("r19 ", hex(c.r19()), "\n")
+ print("r20 ", hex(c.r20()), "\t")
+ print("r21 ", hex(c.r21()), "\n")
+ print("r22 ", hex(c.r22()), "\t")
+ print("r23 ", hex(c.r23()), "\n")
+ print("r24 ", hex(c.r24()), "\t")
+ print("r25 ", hex(c.r25()), "\n")
+ print("r26 ", hex(c.r26()), "\t")
+ print("r27 ", hex(c.r27()), "\n")
+ print("r28 ", hex(c.r28()), "\t")
+ print("r29 ", hex(c.r29()), "\n")
+ print("r30 ", hex(c.r30()), "\t")
+ print("r31 ", hex(c.r31()), "\n")
+ print("pc ", hex(c.pc()), "\t")
+ print("ctr ", hex(c.ctr()), "\n")
+ print("link ", hex(c.link()), "\t")
+ print("xer ", hex(c.xer()), "\n")
+ print("ccr ", hex(c.ccr()), "\t")
+ print("trap ", hex(c.trap()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.link()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange link and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save LINK to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.link())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_link(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_r0(0)
+ c.set_r30(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_r12(uint64(abi.FuncPCABIInternal(sigpanic)))
+ c.set_pc(uint64(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra space is known to gentraceback.
+ sp := c.sp() - sys.MinFrameSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.link()
+ // In PIC mode, we'll set up (i.e. clobber) R2 on function
+ // entry. Save it ahead of time.
+ // In PIC mode it requires R12 points to the function entry,
+ // so we'll set it up when pushing the call. Save it ahead
+ // of time as well.
+ // 8(SP) and 16(SP) are unused space in the reserved
+ // MinFrameSize (32) bytes.
+ *(*uint64)(unsafe.Pointer(uintptr(sp) + 8)) = c.r2()
+ *(*uint64)(unsafe.Pointer(uintptr(sp) + 16)) = c.r12()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_link(uint64(resumePC))
+ c.set_r12(uint64(targetPC))
+ c.set_pc(uint64(targetPC))
+}
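The ppc64 pushCall above saves more than the LR: the otherwise unused 8(SP) and 16(SP) slots of the reserved frame hold R2 (the TOC pointer) and R12 (the function-entry register in PIC mode). A toy model of that frame layout with plain Go values and made-up addresses:

package main

import "fmt"

const minFrameSize = 32 // mirrors sys.MinFrameSize on ppc64 in this toy model

type fakeCtxt struct {
	pc, link, sp, r2, r12 uint64
	mem                   map[uint64]uint64 // address -> 8-byte word
}

func pushCall(c *fakeCtxt, targetPC, resumePC uint64) {
	c.sp -= minFrameSize
	c.mem[c.sp] = c.link   // saved LR, known to the unwinder
	c.mem[c.sp+8] = c.r2   // TOC pointer, clobbered on function entry
	c.mem[c.sp+16] = c.r12 // function-entry register in PIC mode
	c.link = resumePC
	c.r12 = targetPC
	c.pc = targetPC
}

func main() {
	c := &fakeCtxt{pc: 0x1000, link: 0x2000, sp: 0x8000, r2: 0x3000, r12: 0x4000,
		mem: map[uint64]uint64{}}
	pushCall(c, 0x9000, 0x1004)
	fmt.Printf("sp=%#x lr=%#x r2=%#x r12=%#x pc=%#x link=%#x\n",
		c.sp, c.mem[c.sp], c.mem[c.sp+8], c.mem[c.sp+16], c.pc, c.link)
}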
diff --git a/src/runtime/signal_riscv64.go b/src/runtime/signal_riscv64.go
new file mode 100644
index 0000000..b8d7b97
--- /dev/null
+++ b/src/runtime/signal_riscv64.go
@@ -0,0 +1,94 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux || freebsd) && riscv64
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+func dumpregs(c *sigctxt) {
+ print("ra ", hex(c.ra()), "\t")
+ print("sp ", hex(c.sp()), "\n")
+ print("gp ", hex(c.gp()), "\t")
+ print("tp ", hex(c.tp()), "\n")
+ print("t0 ", hex(c.t0()), "\t")
+ print("t1 ", hex(c.t1()), "\n")
+ print("t2 ", hex(c.t2()), "\t")
+ print("s0 ", hex(c.s0()), "\n")
+ print("s1 ", hex(c.s1()), "\t")
+ print("a0 ", hex(c.a0()), "\n")
+ print("a1 ", hex(c.a1()), "\t")
+ print("a2 ", hex(c.a2()), "\n")
+ print("a3 ", hex(c.a3()), "\t")
+ print("a4 ", hex(c.a4()), "\n")
+ print("a5 ", hex(c.a5()), "\t")
+ print("a6 ", hex(c.a6()), "\n")
+ print("a7 ", hex(c.a7()), "\t")
+ print("s2 ", hex(c.s2()), "\n")
+ print("s3 ", hex(c.s3()), "\t")
+ print("s4 ", hex(c.s4()), "\n")
+ print("s5 ", hex(c.s5()), "\t")
+ print("s6 ", hex(c.s6()), "\n")
+ print("s7 ", hex(c.s7()), "\t")
+ print("s8 ", hex(c.s8()), "\n")
+ print("s9 ", hex(c.s9()), "\t")
+ print("s10 ", hex(c.s10()), "\n")
+ print("s11 ", hex(c.s11()), "\t")
+ print("t3 ", hex(c.t3()), "\n")
+ print("t4 ", hex(c.t4()), "\t")
+ print("t5 ", hex(c.t5()), "\n")
+ print("t6 ", hex(c.t6()), "\t")
+ print("pc ", hex(c.pc()), "\n")
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) sigpc() uintptr { return uintptr(c.pc()) }
+
+func (c *sigctxt) sigsp() uintptr { return uintptr(c.sp()) }
+func (c *sigctxt) siglr() uintptr { return uintptr(c.ra()) }
+func (c *sigctxt) fault() uintptr { return uintptr(c.sigaddr()) }
+
+// preparePanic sets up the stack to look like a call to sigpanic.
+func (c *sigctxt) preparePanic(sig uint32, gp *g) {
+ // We arrange RA and pc to pretend the panicking
+ // function calls sigpanic directly.
+ // Always save RA to stack so that panics in leaf
+ // functions are correctly handled. This smashes
+ // the stack frame but we're not going back there
+ // anyway.
+ sp := c.sp() - goarch.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.ra()
+
+ pc := gp.sigpc
+
+ if shouldPushSigpanic(gp, pc, uintptr(c.ra())) {
+ // Make it look like the faulting PC called sigpanic.
+ c.set_ra(uint64(pc))
+ }
+
+ // In case we are panicking from external C code
+ c.set_gp(uint64(uintptr(unsafe.Pointer(gp))))
+ c.set_pc(uint64(abi.FuncPCABIInternal(sigpanic)))
+}
+
+func (c *sigctxt) pushCall(targetPC, resumePC uintptr) {
+ // Push the LR to stack, as we'll clobber it in order to
+ // push the call. The function being pushed is responsible
+ // for restoring the LR and setting the SP back.
+ // This extra slot is known to gentraceback.
+ sp := c.sp() - goarch.PtrSize
+ c.set_sp(sp)
+ *(*uint64)(unsafe.Pointer(uintptr(sp))) = c.ra()
+ // Set up PC and LR to pretend the function being signaled
+ // calls targetPC at resumePC.
+ c.set_ra(uint64(resumePC))
+ c.set_pc(uint64(targetPC))
+}
diff --git a/src/runtime/signal_solaris.go b/src/runtime/signal_solaris.go
new file mode 100644
index 0000000..25f8ad5
--- /dev/null
+++ b/src/runtime/signal_solaris.go
@@ -0,0 +1,83 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt (rubout)"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit (ASCII FS)"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction (not reset when caught)"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap (not reset when caught)"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: used by abort, replace SIGIOT in the future"},
+ /* 7 */ {_SigThrow, "SIGEMT: EMT instruction"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating point exception"},
+ /* 9 */ {0, "SIGKILL: kill (cannot be caught or ignored)"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad argument to system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write on a pipe with no one to read it"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: software termination signal from kill"},
+ /* 16 */ {_SigNotify, "SIGUSR1: user defined signal 1"},
+ /* 17 */ {_SigNotify, "SIGUSR2: user defined signal 2"},
+ /* 18 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status change alias (POSIX)"},
+ /* 19 */ {_SigNotify, "SIGPWR: power-fail restart"},
+ /* 20 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 21 */ {_SigNotify + _SigIgn, "SIGURG: urgent socket condition"},
+ /* 22 */ {_SigNotify, "SIGPOLL: pollable event occurred"},
+ /* 23 */ {0, "SIGSTOP: stop (cannot be caught or ignored)"},
+ /* 24 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: user stop requested from tty"},
+ /* 25 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: stopped process has been continued"},
+ /* 26 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background tty read attempted"},
+ /* 27 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background tty write attempted"},
+ /* 28 */ {_SigNotify, "SIGVTALRM: virtual timer expired"},
+ /* 29 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling timer expired"},
+ /* 30 */ {_SigNotify, "SIGXCPU: exceeded cpu limit"},
+ /* 31 */ {_SigNotify, "SIGXFSZ: exceeded file size limit"},
+ /* 32 */ {_SigNotify, "SIGWAITING: reserved signal no longer used by"},
+ /* 33 */ {_SigNotify, "SIGLWP: reserved signal no longer used by"},
+ /* 34 */ {_SigNotify, "SIGFREEZE: special signal used by CPR"},
+ /* 35 */ {_SigNotify, "SIGTHAW: special signal used by CPR"},
+ /* 36 */ {_SigSetStack + _SigUnblock, "SIGCANCEL: reserved signal for thread cancellation"}, // Oracle's spelling of cancellation.
+ /* 37 */ {_SigNotify, "SIGLOST: resource lost (eg, record-lock lost)"},
+ /* 38 */ {_SigNotify, "SIGXRES: resource control exceeded"},
+ /* 39 */ {_SigNotify, "SIGJVM1: reserved signal for Java Virtual Machine"},
+ /* 40 */ {_SigNotify, "SIGJVM2: reserved signal for Java Virtual Machine"},
+
+ /* TODO(aram): what should we do about these signals? _SigDefault or _SigNotify? Is this set static? */
+ /* 41 */ {_SigNotify, "real time signal"},
+ /* 42 */ {_SigNotify, "real time signal"},
+ /* 43 */ {_SigNotify, "real time signal"},
+ /* 44 */ {_SigNotify, "real time signal"},
+ /* 45 */ {_SigNotify, "real time signal"},
+ /* 46 */ {_SigNotify, "real time signal"},
+ /* 47 */ {_SigNotify, "real time signal"},
+ /* 48 */ {_SigNotify, "real time signal"},
+ /* 49 */ {_SigNotify, "real time signal"},
+ /* 50 */ {_SigNotify, "real time signal"},
+ /* 51 */ {_SigNotify, "real time signal"},
+ /* 52 */ {_SigNotify, "real time signal"},
+ /* 53 */ {_SigNotify, "real time signal"},
+ /* 54 */ {_SigNotify, "real time signal"},
+ /* 55 */ {_SigNotify, "real time signal"},
+ /* 56 */ {_SigNotify, "real time signal"},
+ /* 57 */ {_SigNotify, "real time signal"},
+ /* 58 */ {_SigNotify, "real time signal"},
+ /* 59 */ {_SigNotify, "real time signal"},
+ /* 60 */ {_SigNotify, "real time signal"},
+ /* 61 */ {_SigNotify, "real time signal"},
+ /* 62 */ {_SigNotify, "real time signal"},
+ /* 63 */ {_SigNotify, "real time signal"},
+ /* 64 */ {_SigNotify, "real time signal"},
+ /* 65 */ {_SigNotify, "real time signal"},
+ /* 66 */ {_SigNotify, "real time signal"},
+ /* 67 */ {_SigNotify, "real time signal"},
+ /* 68 */ {_SigNotify, "real time signal"},
+ /* 69 */ {_SigNotify, "real time signal"},
+ /* 70 */ {_SigNotify, "real time signal"},
+ /* 71 */ {_SigNotify, "real time signal"},
+ /* 72 */ {_SigNotify, "real time signal"},
+}
diff --git a/src/runtime/signal_solaris_amd64.go b/src/runtime/signal_solaris_amd64.go
new file mode 100644
index 0000000..b1da313
--- /dev/null
+++ b/src/runtime/signal_solaris_amd64.go
@@ -0,0 +1,53 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+type sigctxt struct {
+ info *siginfo
+ ctxt unsafe.Pointer
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) regs() *mcontext {
+ return (*mcontext)(unsafe.Pointer(&(*ucontext)(c.ctxt).uc_mcontext))
+}
+
+func (c *sigctxt) rax() uint64 { return uint64(c.regs().gregs[_REG_RAX]) }
+func (c *sigctxt) rbx() uint64 { return uint64(c.regs().gregs[_REG_RBX]) }
+func (c *sigctxt) rcx() uint64 { return uint64(c.regs().gregs[_REG_RCX]) }
+func (c *sigctxt) rdx() uint64 { return uint64(c.regs().gregs[_REG_RDX]) }
+func (c *sigctxt) rdi() uint64 { return uint64(c.regs().gregs[_REG_RDI]) }
+func (c *sigctxt) rsi() uint64 { return uint64(c.regs().gregs[_REG_RSI]) }
+func (c *sigctxt) rbp() uint64 { return uint64(c.regs().gregs[_REG_RBP]) }
+func (c *sigctxt) rsp() uint64 { return uint64(c.regs().gregs[_REG_RSP]) }
+func (c *sigctxt) r8() uint64 { return uint64(c.regs().gregs[_REG_R8]) }
+func (c *sigctxt) r9() uint64 { return uint64(c.regs().gregs[_REG_R9]) }
+func (c *sigctxt) r10() uint64 { return uint64(c.regs().gregs[_REG_R10]) }
+func (c *sigctxt) r11() uint64 { return uint64(c.regs().gregs[_REG_R11]) }
+func (c *sigctxt) r12() uint64 { return uint64(c.regs().gregs[_REG_R12]) }
+func (c *sigctxt) r13() uint64 { return uint64(c.regs().gregs[_REG_R13]) }
+func (c *sigctxt) r14() uint64 { return uint64(c.regs().gregs[_REG_R14]) }
+func (c *sigctxt) r15() uint64 { return uint64(c.regs().gregs[_REG_R15]) }
+
+//go:nosplit
+//go:nowritebarrierrec
+func (c *sigctxt) rip() uint64 { return uint64(c.regs().gregs[_REG_RIP]) }
+
+func (c *sigctxt) rflags() uint64 { return uint64(c.regs().gregs[_REG_RFLAGS]) }
+func (c *sigctxt) cs() uint64 { return uint64(c.regs().gregs[_REG_CS]) }
+func (c *sigctxt) fs() uint64 { return uint64(c.regs().gregs[_REG_FS]) }
+func (c *sigctxt) gs() uint64 { return uint64(c.regs().gregs[_REG_GS]) }
+func (c *sigctxt) sigcode() uint64 { return uint64(c.info.si_code) }
+func (c *sigctxt) sigaddr() uint64 { return *(*uint64)(unsafe.Pointer(&c.info.__data[0])) }
+
+func (c *sigctxt) set_rip(x uint64) { c.regs().gregs[_REG_RIP] = int64(x) }
+func (c *sigctxt) set_rsp(x uint64) { c.regs().gregs[_REG_RSP] = int64(x) }
+func (c *sigctxt) set_sigcode(x uint64) { c.info.si_code = int32(x) }
+func (c *sigctxt) set_sigaddr(x uint64) {
+ *(*uintptr)(unsafe.Pointer(&c.info.__data[0])) = uintptr(x)
+}
diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go
new file mode 100644
index 0000000..ae842e9
--- /dev/null
+++ b/src/runtime/signal_unix.go
@@ -0,0 +1,1371 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// sigTabT is the type of an entry in the global sigtable array.
+// sigtable is inherently system dependent, and appears in OS-specific files,
+// but sigTabT is the same for all Unixy systems.
+// The sigtable array is indexed by a system signal number to get the flags
+// and printable name of each signal.
+type sigTabT struct {
+ flags int32
+ name string
+}
+
+//go:linkname os_sigpipe os.sigpipe
+func os_sigpipe() {
+ systemstack(sigpipe)
+}
+
+func signame(sig uint32) string {
+ if sig >= uint32(len(sigtable)) {
+ return ""
+ }
+ return sigtable[sig].name
+}
+
+const (
+ _SIG_DFL uintptr = 0
+ _SIG_IGN uintptr = 1
+)
+
+// sigPreempt is the signal used for non-cooperative preemption.
+//
+// There's no good way to choose this signal, but there are some
+// heuristics:
+//
+// 1. It should be a signal that's passed-through by debuggers by
+// default. On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO,
+// SIGVTALRM, SIGPROF, and SIGWINCH, plus some glibc-internal signals.
+//
+// 2. It shouldn't be used internally by libc in mixed Go/C binaries
+// because libc may assume it's the only thing that can handle these
+// signals. For example SIGCANCEL or SIGSETXID.
+//
+// 3. It should be a signal that can happen spuriously without
+// consequences. For example, SIGALRM is a bad choice because the
+// signal handler can't tell if it was caused by the real process
+// alarm or not (arguably this means the signal is broken, but I
+// digress). SIGUSR1 and SIGUSR2 are also bad because those are often
+// used in meaningful ways by applications.
+//
+// 4. We need to deal with platforms without real-time signals (like
+// macOS), so those are out.
+//
+// We use SIGURG because it meets all of these criteria, is extremely
+// unlikely to be used by an application for its "real" meaning (both
+// because out-of-band data is basically unused and because SIGURG
+// doesn't report which socket has the condition, making it pretty
+// useless), and even if it is, the application has to be ready for
+// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
+// likely to be used for real.
+const sigPreempt = _SIGURG
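A user-visible consequence of this choice, shown as a minimal sketch using the public os/signal API (not part of the patch): a program that asks for SIGURG must tolerate extra deliveries, since the runtime also raises SIGURG for preemption and lets it through to the application.

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	c := make(chan os.Signal, 16)
	// The runtime keeps using SIGURG for preemption; Notify simply also
	// forwards each delivery here, including the spurious ones.
	signal.Notify(c, syscall.SIGURG)
	for sig := range c {
		fmt.Println("got", sig, "- real out-of-band data or a runtime preemption signal")
	}
}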
+
+// Stores the signal handlers registered before Go installed its own.
+// These signal handlers will be invoked in cases where Go doesn't want to
+// handle a particular signal (e.g., signal occurred on a non-Go thread).
+// See sigfwdgo for more information on when the signals are forwarded.
+//
+// This is read by the signal handler; accesses should use
+// atomic.Loaduintptr and atomic.Storeuintptr.
+var fwdSig [_NSIG]uintptr
+
+// handlingSig is indexed by signal number and is non-zero if we are
+// currently handling the signal. Or, to put it another way, whether
+// the signal handler is currently set to the Go signal handler or not.
+// This is uint32 rather than bool so that we can use atomic instructions.
+var handlingSig [_NSIG]uint32
+
+// channels for synchronizing signal mask updates with the signal mask
+// thread
+var (
+ disableSigChan chan uint32
+ enableSigChan chan uint32
+ maskUpdatedChan chan struct{}
+)
+
+func init() {
+ // _NSIG is the number of signals on this operating system.
+ // sigtable should describe what to do for all the possible signals.
+ if len(sigtable) != _NSIG {
+ print("runtime: len(sigtable)=", len(sigtable), " _NSIG=", _NSIG, "\n")
+ throw("bad sigtable len")
+ }
+}
+
+var signalsOK bool
+
+// Initialize signals.
+// Called by libpreinit so runtime may not be initialized.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func initsig(preinit bool) {
+ if !preinit {
+ // It's now OK for signal handlers to run.
+ signalsOK = true
+ }
+
+ // For c-archive/c-shared this is called by libpreinit with
+ // preinit == true.
+ if (isarchive || islibrary) && !preinit {
+ return
+ }
+
+ for i := uint32(0); i < _NSIG; i++ {
+ t := &sigtable[i]
+ if t.flags == 0 || t.flags&_SigDefault != 0 {
+ continue
+ }
+
+ // We don't need to use atomic operations here because
+ // there shouldn't be any other goroutines running yet.
+ fwdSig[i] = getsig(i)
+
+ if !sigInstallGoHandler(i) {
+ // Even if we are not installing a signal handler,
+ // set SA_ONSTACK if necessary.
+ if fwdSig[i] != _SIG_DFL && fwdSig[i] != _SIG_IGN {
+ setsigstack(i)
+ } else if fwdSig[i] == _SIG_IGN {
+ sigInitIgnored(i)
+ }
+ continue
+ }
+
+ handlingSig[i] = 1
+ setsig(i, abi.FuncPCABIInternal(sighandler))
+ }
+}
+
+//go:nosplit
+//go:nowritebarrierrec
+func sigInstallGoHandler(sig uint32) bool {
+ // For some signals, we respect an inherited SIG_IGN handler
+ // rather than insist on installing our own default handler.
+ // Even these signals can be fetched using the os/signal package.
+ switch sig {
+ case _SIGHUP, _SIGINT:
+ if atomic.Loaduintptr(&fwdSig[sig]) == _SIG_IGN {
+ return false
+ }
+ }
+
+ if (GOOS == "linux" || GOOS == "android") && !iscgo && sig == sigPerThreadSyscall {
+ // sigPerThreadSyscall is the same signal used by glibc for
+ // per-thread syscalls on Linux. We use it for the same purpose
+ // in non-cgo binaries.
+ return true
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigSetStack != 0 {
+ return false
+ }
+
+ // When built using c-archive or c-shared, only install signal
+ // handlers for synchronous signals and SIGPIPE and sigPreempt.
+ if (isarchive || islibrary) && t.flags&_SigPanic == 0 && sig != _SIGPIPE && sig != sigPreempt {
+ return false
+ }
+
+ return true
+}
+
+// sigenable enables the Go signal handler to catch the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.enableSignal and signal_enable.
+func sigenable(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ ensureSigM()
+ enableSigChan <- sig
+ <-maskUpdatedChan
+ if atomic.Cas(&handlingSig[sig], 0, 1) {
+ atomic.Storeuintptr(&fwdSig[sig], getsig(sig))
+ setsig(sig, abi.FuncPCABIInternal(sighandler))
+ }
+ }
+}
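For reference, a minimal user-level sketch (not part of the patch) of the path that reaches sigenable: signal.Notify registers the channel under the os/signal handlers lock and enables each requested signal in the runtime.

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	c := make(chan os.Signal, 1)
	// Each signal named here is enabled in the runtime (sigenable), which
	// installs the Go handler and asks the mask thread started by
	// ensureSigM to unblock the signal.
	signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)
	fmt.Println("waiting for SIGINT or SIGTERM...")
	fmt.Println("received", <-c)
}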
+
+// sigdisable disables the Go signal handler for the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.disableSignal and signal_disable.
+func sigdisable(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ ensureSigM()
+ disableSigChan <- sig
+ <-maskUpdatedChan
+
+ // If initsig does not install a signal handler for a
+ // signal, then to go back to the state before Notify
+ // we should remove the one we installed.
+ if !sigInstallGoHandler(sig) {
+ atomic.Store(&handlingSig[sig], 0)
+ setsig(sig, atomic.Loaduintptr(&fwdSig[sig]))
+ }
+ }
+}
+
+// sigignore ignores the signal sig.
+// It is only called while holding the os/signal.handlers lock,
+// via os/signal.ignoreSignal and signal_ignore.
+func sigignore(sig uint32) {
+ if sig >= uint32(len(sigtable)) {
+ return
+ }
+
+ // SIGPROF is handled specially for profiling.
+ if sig == _SIGPROF {
+ return
+ }
+
+ t := &sigtable[sig]
+ if t.flags&_SigNotify != 0 {
+ atomic.Store(&handlingSig[sig], 0)
+ setsig(sig, _SIG_IGN)
+ }
+}
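Correspondingly, a short sketch (not part of the patch) of the user-level calls whose paths end in sigdisable and sigignore:

package main

import (
	"os"
	"os/signal"
	"syscall"
)

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGHUP)

	// Stop undoes Notify for this channel; once no channel wants SIGHUP,
	// the runtime disable path (sigdisable) can restore the old handler.
	signal.Stop(c)

	// Ignore and Reset change the process-wide disposition: Ignore goes
	// through the sigignore path above, Reset restores the default behavior.
	signal.Ignore(syscall.SIGHUP)
	signal.Reset(syscall.SIGHUP)
}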
+
+// clearSignalHandlers clears all signal handlers that are not ignored
+// back to the default. This is called by the child after a fork, so that
+// we can enable the signal mask for the exec without worrying about
+// running a signal handler in the child.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func clearSignalHandlers() {
+ for i := uint32(0); i < _NSIG; i++ {
+ if atomic.Load(&handlingSig[i]) != 0 {
+ setsig(i, _SIG_DFL)
+ }
+ }
+}
+
+// setProcessCPUProfilerTimer is called when the profiling timer changes.
+// It is called with prof.signalLock held. hz is the new timer, and is 0 if
+// profiling is being disabled. Enable or disable the signal as
+// required for -buildmode=c-archive.
+func setProcessCPUProfilerTimer(hz int32) {
+ if hz != 0 {
+ // Enable the Go signal handler if not enabled.
+ if atomic.Cas(&handlingSig[_SIGPROF], 0, 1) {
+ h := getsig(_SIGPROF)
+ // If no signal handler was installed before, then we record
+ // _SIG_IGN here. When we turn off profiling (below) we'll start
+ // ignoring SIGPROF signals. We do this, rather than change
+ // to SIG_DFL, because there may be a pending SIGPROF
+ // signal that has not yet been delivered to some other thread.
+ // If we change to SIG_DFL when turning off profiling, the
+ // program will crash when that SIGPROF is delivered. We assume
+ // that programs that use profiling don't want to crash on a
+ // stray SIGPROF. See issue 19320.
+ // We do the change here instead of when turning off profiling,
+ // because there we may race with a signal handler running
+ // concurrently, in particular, sigfwdgo may observe _SIG_DFL and
+ // die. See issue 43828.
+ if h == _SIG_DFL {
+ h = _SIG_IGN
+ }
+ atomic.Storeuintptr(&fwdSig[_SIGPROF], h)
+ setsig(_SIGPROF, abi.FuncPCABIInternal(sighandler))
+ }
+
+ var it itimerval
+ it.it_interval.tv_sec = 0
+ it.it_interval.set_usec(1000000 / hz)
+ it.it_value = it.it_interval
+ setitimer(_ITIMER_PROF, &it, nil)
+ } else {
+ setitimer(_ITIMER_PROF, &itimerval{}, nil)
+
+ // If the Go signal handler should be disabled by default,
+ // switch back to the signal handler that was installed
+ // when we enabled profiling. We don't try to handle the case
+ // of a program that changes the SIGPROF handler while Go
+ // profiling is enabled.
+ if !sigInstallGoHandler(_SIGPROF) {
+ if atomic.Cas(&handlingSig[_SIGPROF], 1, 0) {
+ h := atomic.Loaduintptr(&fwdSig[_SIGPROF])
+ setsig(_SIGPROF, h)
+ }
+ }
+ }
+}
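As a usage sketch (not part of the patch), the usual way this SIGPROF/ITIMER_PROF machinery is switched on is through runtime/pprof, which sets a non-zero hz on start and zero on stop:

package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// StartCPUProfile sets a non-zero profiling rate, which reaches
	// setProcessCPUProfilerTimer and arms ITIMER_PROF; StopCPUProfile
	// sets the rate back to 0 and disarms the timer.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	busyWork()
}

func busyWork() {
	sum := 0
	for i := 0; i < 100_000_000; i++ {
		sum += i
	}
	_ = sum
}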
+
+// setThreadCPUProfilerHz makes any thread-specific changes required to
+// implement profiling at a rate of hz.
+// No changes required on Unix systems when using setitimer.
+func setThreadCPUProfilerHz(hz int32) {
+ getg().m.profilehz = hz
+}
+
+func sigpipe() {
+ if signal_ignored(_SIGPIPE) || sigsend(_SIGPIPE) {
+ return
+ }
+ dieFromSignal(_SIGPIPE)
+}
+
+// doSigPreempt handles a preemption signal on gp.
+func doSigPreempt(gp *g, ctxt *sigctxt) {
+ // Check if this G wants to be preempted and is safe to
+ // preempt.
+ if wantAsyncPreempt(gp) {
+ if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
+ // Adjust the PC and inject a call to asyncPreempt.
+ ctxt.pushCall(abi.FuncPCABI0(asyncPreempt), newpc)
+ }
+ }
+
+ // Acknowledge the preemption.
+ gp.m.preemptGen.Add(1)
+ gp.m.signalPending.Store(0)
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ pendingPreemptSignals.Add(-1)
+ }
+}
+
+const preemptMSupported = true
+
+// preemptM sends a preemption request to mp. This request may be
+// handled asynchronously and may be coalesced with other requests to
+// the M. When the request is received, if the running G or P are
+// marked for preemption and the goroutine is at an asynchronous
+// safe-point, it will preempt the goroutine. It always atomically
+// increments mp.preemptGen after handling a preemption request.
+func preemptM(mp *m) {
+ // On Darwin, don't try to preempt threads during exec.
+ // Issue #41702.
+ if GOOS == "darwin" || GOOS == "ios" {
+ execLock.rlock()
+ }
+
+ if mp.signalPending.CompareAndSwap(0, 1) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ pendingPreemptSignals.Add(1)
+ }
+
+ // If multiple threads are preempting the same M, they may send many
+ // signals to the same M such that it hardly makes progress, causing
+ // a live-lock problem. Apparently this could happen on darwin. See
+ // issue #37741.
+ // Only send a signal if there isn't already one pending.
+ signalM(mp, sigPreempt)
+ }
+
+ if GOOS == "darwin" || GOOS == "ios" {
+ execLock.runlock()
+ }
+}
+
+// sigFetchG fetches the value of G safely when running in a signal handler.
+// On some architectures, the g value may be clobbered when running in a VDSO.
+// See issue #32912.
+//
+//go:nosplit
+func sigFetchG(c *sigctxt) *g {
+ switch GOARCH {
+ case "arm", "arm64", "loong64", "ppc64", "ppc64le", "riscv64", "s390x":
+ if !iscgo && inVDSOPage(c.sigpc()) {
+ // When using cgo, we save the g on TLS and load it from there
+ // in sigtramp. Just use that.
+ // Otherwise, before making a VDSO call we save the g to the
+ // bottom of the signal stack. Fetch from there.
+ // TODO: in efence mode, stack is sysAlloc'd, so this wouldn't
+ // work.
+ sp := getcallersp()
+ s := spanOf(sp)
+ if s != nil && s.state.get() == mSpanManual && s.base() < sp && sp < s.limit {
+ gp := *(**g)(unsafe.Pointer(s.base()))
+ return gp
+ }
+ return nil
+ }
+ }
+ return getg()
+}
+
+// sigtrampgo is called from the signal handler function, sigtramp,
+// written in assembly code.
+// This is called by the signal handler, and the world may be stopped.
+//
+// It must be nosplit because getg() is still the G that was running
+// (if any) when the signal was delivered, but it's (usually) called
+// on the gsignal stack. Until this switches the G to gsignal, the
+// stack bounds check won't work.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigtrampgo(sig uint32, info *siginfo, ctx unsafe.Pointer) {
+ if sigfwdgo(sig, info, ctx) {
+ return
+ }
+ c := &sigctxt{info, ctx}
+ gp := sigFetchG(c)
+ setg(gp)
+ if gp == nil || (gp.m != nil && gp.m.isExtraInC) {
+ if sig == _SIGPROF {
+ // Some platforms (Linux) have per-thread timers, which we use in
+ // combination with the process-wide timer. Avoid double-counting.
+ if validSIGPROF(nil, c) {
+ sigprofNonGoPC(c.sigpc())
+ }
+ return
+ }
+ if sig == sigPreempt && preemptMSupported && debug.asyncpreemptoff == 0 {
+ // This is probably a signal from preemptM sent
+ // while executing Go code but received while
+ // executing non-Go code.
+ // We got past sigfwdgo, so we know that there is
+ // no non-Go signal handler for sigPreempt.
+ // The default behavior for sigPreempt is to ignore
+ // the signal, so badsignal will be a no-op anyway.
+ if GOOS == "darwin" || GOOS == "ios" {
+ pendingPreemptSignals.Add(-1)
+ }
+ return
+ }
+ c.fixsigcode(sig)
+ // Set g to nil here so badsignal will use g0 via needm.
+ // TODO: reuse the current m here by using the gsignal and adjustSignalStack;
+ // since the current g may be a normal goroutine while actually running on the signal stack,
+ // it may hit a stack split that is not expected here.
+ if gp != nil {
+ setg(nil)
+ }
+ badsignal(uintptr(sig), c)
+ // Restore g
+ if gp != nil {
+ setg(gp)
+ }
+ return
+ }
+
+ setg(gp.m.gsignal)
+
+ // If some non-Go code called sigaltstack, adjust.
+ var gsignalStack gsignalStack
+ setStack := adjustSignalStack(sig, gp.m, &gsignalStack)
+ if setStack {
+ gp.m.gsignal.stktopsp = getcallersp()
+ }
+
+ if gp.stackguard0 == stackFork {
+ signalDuringFork(sig)
+ }
+
+ c.fixsigcode(sig)
+ sighandler(sig, info, ctx, gp)
+ setg(gp)
+ if setStack {
+ restoreGsignalStack(&gsignalStack)
+ }
+}
+
+// If the signal handler receives a SIGPROF signal on a non-Go thread,
+// it tries to collect a traceback into sigprofCallers.
+// sigprofCallersUse is set to non-zero while sigprofCallers holds a traceback.
+var sigprofCallers cgoCallers
+var sigprofCallersUse uint32
+
+// sigprofNonGo is called if we receive a SIGPROF signal on a non-Go thread,
+// and the signal handler collected a stack trace in sigprofCallers.
+// When this is called, sigprofCallersUse will be non-zero.
+// g is nil, and what we can do is very limited.
+//
+// It is called from the signal handling functions written in assembly code that
+// are active for cgo programs, cgoSigtramp and sigprofNonGoWrapper, which have
+// not verified that the SIGPROF delivery corresponds to the best available
+// profiling source for this thread.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigprofNonGo(sig uint32, info *siginfo, ctx unsafe.Pointer) {
+ if prof.hz.Load() != 0 {
+ c := &sigctxt{info, ctx}
+ // Some platforms (Linux) have per-thread timers, which we use in
+ // combination with the process-wide timer. Avoid double-counting.
+ if validSIGPROF(nil, c) {
+ n := 0
+ for n < len(sigprofCallers) && sigprofCallers[n] != 0 {
+ n++
+ }
+ cpuprof.addNonGo(sigprofCallers[:n])
+ }
+ }
+
+ atomic.Store(&sigprofCallersUse, 0)
+}
+
+// sigprofNonGoPC is called when a profiling signal arrived on a
+// non-Go thread and we have a single PC value, not a stack trace.
+// g is nil, and what we can do is very limited.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigprofNonGoPC(pc uintptr) {
+ if prof.hz.Load() != 0 {
+ stk := []uintptr{
+ pc,
+ abi.FuncPCABIInternal(_ExternalCode) + sys.PCQuantum,
+ }
+ cpuprof.addNonGo(stk)
+ }
+}
+
+// adjustSignalStack adjusts the current stack guard based on the
+// stack pointer that is actually in use while handling a signal.
+// We do this in case some non-Go code called sigaltstack.
+// This reports whether the stack was adjusted, and if so stores the old
+// signal stack in *gsigstack.
+//
+//go:nosplit
+func adjustSignalStack(sig uint32, mp *m, gsigStack *gsignalStack) bool {
+ sp := uintptr(unsafe.Pointer(&sig))
+ if sp >= mp.gsignal.stack.lo && sp < mp.gsignal.stack.hi {
+ return false
+ }
+
+ var st stackt
+ sigaltstack(nil, &st)
+ stsp := uintptr(unsafe.Pointer(st.ss_sp))
+ if st.ss_flags&_SS_DISABLE == 0 && sp >= stsp && sp < stsp+st.ss_size {
+ setGsignalStack(&st, gsigStack)
+ return true
+ }
+
+ if sp >= mp.g0.stack.lo && sp < mp.g0.stack.hi {
+ // The signal was delivered on the g0 stack.
+ // This can happen when linked with C code
+ // using the thread sanitizer, which collects
+ // signals then delivers them itself by calling
+ // the signal handler directly when C code,
+ // including C code called via cgo, calls a
+ // TSAN-intercepted function such as malloc.
+ //
+ // We check this condition last as g0.stack.lo
+ // may be not very accurate (see mstart).
+ st := stackt{ss_size: mp.g0.stack.hi - mp.g0.stack.lo}
+ setSignalstackSP(&st, mp.g0.stack.lo)
+ setGsignalStack(&st, gsigStack)
+ return true
+ }
+
+ // sp is not within gsignal stack, g0 stack, or sigaltstack. Bad.
+ setg(nil)
+ needm(true)
+ if st.ss_flags&_SS_DISABLE != 0 {
+ noSignalStack(sig)
+ } else {
+ sigNotOnStack(sig, sp, mp)
+ }
+ dropm()
+ return false
+}
+
+// crashing is the number of m's we have waited for when implementing
+// GOTRACEBACK=crash when a signal is received.
+var crashing int32
+
+// testSigtrap and testSigusr1 are used by the runtime tests. If
+// non-nil, they are called on SIGTRAP and SIGUSR1 respectively. If the
+// hook returns true, the normal behavior on that signal is suppressed.
+var testSigtrap func(info *siginfo, ctxt *sigctxt, gp *g) bool
+var testSigusr1 func(gp *g) bool
+
+// sighandler is invoked when a signal occurs. The global g will be
+// set to a gsignal goroutine and we will be running on the alternate
+// signal stack. The parameter gp will be the value of the global g
+// when the signal occurred. The sig, info, and ctxt parameters are
+// from the system signal handler: they are the parameters passed when
+// the SA is passed to the sigaction system call.
+//
+// The garbage collector may have stopped the world, so write barriers
+// are not allowed.
+//
+//go:nowritebarrierrec
+func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
+ // The g executing the signal handler. This is almost always
+ // mp.gsignal. See delayedSignal for an exception.
+ gsignal := getg()
+ mp := gsignal.m
+ c := &sigctxt{info, ctxt}
+
+ // Cgo TSAN (not the Go race detector) intercepts signals and calls the
+ // signal handler at a later time. When the signal handler is called, the
+ // memory may have changed, but the signal context remains old. The
+ // unmatched signal context and memory makes it unsafe to unwind or inspect
+ // the stack. So we ignore delayed non-fatal signals that will cause a stack
+ // inspection (profiling signal and preemption signal).
+ // cgo_yield is only non-nil for TSAN, and is specifically used to trigger
+ // signal delivery. We use that as an indicator of delayed signals.
+ // For delayed signals, the handler is called on the g0 stack (see
+ // adjustSignalStack).
+ delayedSignal := *cgo_yield != nil && mp != nil && gsignal.stack == mp.g0.stack
+
+ if sig == _SIGPROF {
+ // Some platforms (Linux) have per-thread timers, which we use in
+ // combination with the process-wide timer. Avoid double-counting.
+ if !delayedSignal && validSIGPROF(mp, c) {
+ sigprof(c.sigpc(), c.sigsp(), c.siglr(), gp, mp)
+ }
+ return
+ }
+
+ if sig == _SIGTRAP && testSigtrap != nil && testSigtrap(info, (*sigctxt)(noescape(unsafe.Pointer(c))), gp) {
+ return
+ }
+
+ if sig == _SIGUSR1 && testSigusr1 != nil && testSigusr1(gp) {
+ return
+ }
+
+ if (GOOS == "linux" || GOOS == "android") && sig == sigPerThreadSyscall {
+ // sigPerThreadSyscall is the same signal used by glibc for
+ // per-thread syscalls on Linux. We use it for the same purpose
+ // in non-cgo binaries. Since this signal is not _SigNotify,
+ // there is nothing more to do once we run the syscall.
+ runPerThreadSyscall()
+ return
+ }
+
+ if sig == sigPreempt && debug.asyncpreemptoff == 0 && !delayedSignal {
+ // Might be a preemption signal.
+ doSigPreempt(gp, c)
+ // Even if this was definitely a preemption signal, it
+ // may have been coalesced with another signal, so we
+ // still let it through to the application.
+ }
+
+ flags := int32(_SigThrow)
+ if sig < uint32(len(sigtable)) {
+ flags = sigtable[sig].flags
+ }
+ if !c.sigFromUser() && flags&_SigPanic != 0 && (gp.throwsplit || gp != mp.curg) {
+ // We can't safely sigpanic because it may grow the
+ // stack. Abort in the signal handler instead.
+ //
+ // Also don't inject a sigpanic if we are not on a
+ // user G stack. Either we're in the runtime, or we're
+ // running C code. Either way we cannot recover.
+ flags = _SigThrow
+ }
+ if isAbortPC(c.sigpc()) {
+ // On many architectures, the abort function just
+ // causes a memory fault. Don't turn that into a panic.
+ flags = _SigThrow
+ }
+ if !c.sigFromUser() && flags&_SigPanic != 0 {
+ // The signal is going to cause a panic.
+ // Arrange the stack so that it looks like the point
+ // where the signal occurred made a call to the
+ // function sigpanic. Then set the PC to sigpanic.
+
+ // Have to pass arguments out of band since
+ // augmenting the stack frame would break
+ // the unwinding code.
+ gp.sig = sig
+ gp.sigcode0 = uintptr(c.sigcode())
+ gp.sigcode1 = uintptr(c.fault())
+ gp.sigpc = c.sigpc()
+
+ c.preparePanic(sig, gp)
+ return
+ }
+
+ if c.sigFromUser() || flags&_SigNotify != 0 {
+ if sigsend(sig) {
+ return
+ }
+ }
+
+ if c.sigFromUser() && signal_ignored(sig) {
+ return
+ }
+
+ if flags&_SigKill != 0 {
+ dieFromSignal(sig)
+ }
+
+ // _SigThrow means that we should exit now.
+ // If we get here with _SigPanic, it means that the signal
+ // was sent to us by a program (c.sigFromUser() is true);
+ // in that case, if we didn't handle it in sigsend, we exit now.
+ if flags&(_SigThrow|_SigPanic) == 0 {
+ return
+ }
+
+ mp.throwing = throwTypeRuntime
+ mp.caughtsig.set(gp)
+
+ if crashing == 0 {
+ startpanic_m()
+ }
+
+ gp = fatalsignal(sig, c, gp, mp)
+
+ level, _, docrash := gotraceback()
+ if level > 0 {
+ goroutineheader(gp)
+ tracebacktrap(c.sigpc(), c.sigsp(), c.siglr(), gp)
+ if crashing > 0 && gp != mp.curg && mp.curg != nil && readgstatus(mp.curg)&^_Gscan == _Grunning {
+ // tracebackothers on original m skipped this one; trace it now.
+ goroutineheader(mp.curg)
+ traceback(^uintptr(0), ^uintptr(0), 0, mp.curg)
+ } else if crashing == 0 {
+ tracebackothers(gp)
+ print("\n")
+ }
+ dumpregs(c)
+ }
+
+ if docrash {
+ crashing++
+ if crashing < mcount()-int32(extraMLength.Load()) {
+ // There are other m's that need to dump their stacks.
+ // Relay SIGQUIT to the next m by sending it to the current process.
+ // All m's that have already received SIGQUIT have signal masks blocking
+ // receipt of any signals, so the SIGQUIT will go to an m that hasn't seen it yet.
+ // When the last m receives the SIGQUIT, it will fall through to the call to
+ // crash below. Just in case the relaying gets botched, each m involved in
+ // the relay sleeps for 5 seconds and then does the crash/exit itself.
+ // In expected operation, the last m has received the SIGQUIT and run
+ // crash/exit and the process is gone, all long before any of the
+ // 5-second sleeps have finished.
+ print("\n-----\n\n")
+ raiseproc(_SIGQUIT)
+ usleep(5 * 1000 * 1000)
+ }
+ crash()
+ }
+
+ printDebugLog()
+
+ exit(2)
+}
+
+func fatalsignal(sig uint32, c *sigctxt, gp *g, mp *m) *g {
+ if sig < uint32(len(sigtable)) {
+ print(sigtable[sig].name, "\n")
+ } else {
+ print("Signal ", sig, "\n")
+ }
+
+ if isSecureMode() {
+ exit(2)
+ }
+
+ print("PC=", hex(c.sigpc()), " m=", mp.id, " sigcode=", c.sigcode(), "\n")
+ if mp.incgo && gp == mp.g0 && mp.curg != nil {
+ print("signal arrived during cgo execution\n")
+ // Switch to curg so that we get a traceback of the Go code
+ // leading up to the cgocall, which switched from curg to g0.
+ gp = mp.curg
+ }
+ if sig == _SIGILL || sig == _SIGFPE {
+ // It would be nice to know how long the instruction is.
+ // Unfortunately, that's complicated to do in general (mostly for x86
+ // and s390x, but other archs have non-standard instruction lengths also).
+ // Opt to print 16 bytes, which covers most instructions.
+ const maxN = 16
+ n := uintptr(maxN)
+ // We have to be careful, though. If we're near the end of
+ // a page and the following page isn't mapped, we could
+ // segfault. So make sure we don't straddle a page (even though
+ // that could lead to printing an incomplete instruction).
+ // We're assuming here we can read at least the page containing the PC.
+ // I suppose it is possible that the page is mapped executable but not readable?
+ pc := c.sigpc()
+ if n > physPageSize-pc%physPageSize {
+ n = physPageSize - pc%physPageSize
+ }
+ print("instruction bytes:")
+ b := (*[maxN]byte)(unsafe.Pointer(pc))
+ for i := uintptr(0); i < n; i++ {
+ print(" ", hex(b[i]))
+ }
+ println()
+ }
+ print("\n")
+ return gp
+}
+
+// sigpanic turns a synchronous signal into a run-time panic.
+// If the signal handler sees a synchronous panic, it arranges the
+// stack to look like the function where the signal occurred called
+// sigpanic, sets the signal's PC value to sigpanic, and returns from
+// the signal handler. The effect is that the program will act as
+// though the function that got the signal simply called sigpanic
+// instead.
+//
+// This must NOT be nosplit because the linker doesn't know where
+// sigpanic calls can be injected.
+//
+// The signal handler must not inject a call to sigpanic if
+// getg().throwsplit, since sigpanic may need to grow the stack.
+//
+// This is exported via linkname to assembly in runtime/cgo.
+//
+//go:linkname sigpanic
+func sigpanic() {
+ gp := getg()
+ if !canpanic() {
+ throw("unexpected signal during runtime execution")
+ }
+
+ switch gp.sig {
+ case _SIGBUS:
+ if gp.sigcode0 == _BUS_ADRERR && gp.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ // Support runtime/debug.SetPanicOnFault.
+ if gp.paniconfault {
+ panicmemAddr(gp.sigcode1)
+ }
+ print("unexpected fault address ", hex(gp.sigcode1), "\n")
+ throw("fault")
+ case _SIGSEGV:
+ if (gp.sigcode0 == 0 || gp.sigcode0 == _SEGV_MAPERR || gp.sigcode0 == _SEGV_ACCERR) && gp.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ // Support runtime/debug.SetPanicOnFault.
+ if gp.paniconfault {
+ panicmemAddr(gp.sigcode1)
+ }
+ if inUserArenaChunk(gp.sigcode1) {
+ // We could check that the arena chunk is explicitly set to fault,
+ // but the fact that we faulted on accessing it is enough to prove
+ // that it is.
+ print("accessed data from freed user arena ", hex(gp.sigcode1), "\n")
+ } else {
+ print("unexpected fault address ", hex(gp.sigcode1), "\n")
+ }
+ throw("fault")
+ case _SIGFPE:
+ switch gp.sigcode0 {
+ case _FPE_INTDIV:
+ panicdivide()
+ case _FPE_INTOVF:
+ panicoverflow()
+ }
+ panicfloat()
+ }
+
+ if gp.sig >= uint32(len(sigtable)) {
+ // can't happen: we looked up gp.sig in sigtable to decide to call sigpanic
+ throw("unexpected signal value")
+ }
+ panic(errorString(sigtable[gp.sig].name))
+}
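A hedged user-level sketch of the SetPanicOnFault branch above (not part of the patch; the probe address is an assumption and merely needs to be unmapped and at least 0x1000): with the debug flag set, the fault becomes a recoverable panic via panicmemAddr instead of a crash.

package main

import (
	"fmt"
	"runtime/debug"
	"unsafe"
)

func main() {
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered from fault:", r)
		}
	}()

	// Deliberately read an address that is very likely unmapped and is
	// >= 0x1000, so sigpanic takes the paniconfault path rather than the
	// plain nil-dereference path.
	p := (*byte)(unsafe.Pointer(uintptr(0x100000)))
	_ = *p
}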
+
+// dieFromSignal kills the program with a signal.
+// This provides the expected exit status for the shell.
+// This is only called with fatal signals expected to kill the process.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func dieFromSignal(sig uint32) {
+ unblocksig(sig)
+ // Mark the signal as unhandled to ensure it is forwarded.
+ atomic.Store(&handlingSig[sig], 0)
+ raise(sig)
+
+ // That should have killed us. On some systems, though, raise
+ // sends the signal to the whole process rather than to just
+ // the current thread, which means that the signal may not yet
+ // have been delivered. Give other threads a chance to run and
+ // pick up the signal.
+ osyield()
+ osyield()
+ osyield()
+
+ // If that didn't work, try _SIG_DFL.
+ setsig(sig, _SIG_DFL)
+ raise(sig)
+
+ osyield()
+ osyield()
+ osyield()
+
+ // If we are still somehow running, just exit with the wrong status.
+ exit(2)
+}
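To observe the "expected exit status for the shell" from the parent side, here is a small sketch for a Unix system (not part of the patch; the child command is illustrative):

package main

import (
	"errors"
	"fmt"
	"os/exec"
)

func main() {
	// The child kills itself; the parent observes the death-by-signal
	// status that dieFromSignal arranges for fatal signals.
	cmd := exec.Command("sh", "-c", "kill -9 $$")
	err := cmd.Run()
	var ee *exec.ExitError
	if errors.As(err, &ee) {
		fmt.Println("child status:", ee) // e.g. "signal: killed"
	}
}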
+
+// raisebadsignal is called when a signal is received on a non-Go
+// thread, and the Go program does not want to handle it (that is, the
+// program has not called os/signal.Notify for the signal).
+func raisebadsignal(sig uint32, c *sigctxt) {
+ if sig == _SIGPROF {
+ // Ignore profiling signals that arrive on non-Go threads.
+ return
+ }
+
+ var handler uintptr
+ if sig >= _NSIG {
+ handler = _SIG_DFL
+ } else {
+ handler = atomic.Loaduintptr(&fwdSig[sig])
+ }
+
+ // Reset the signal handler and raise the signal.
+ // We are currently running inside a signal handler, so the
+ // signal is blocked. We need to unblock it before raising the
+ // signal, or the signal we raise will be ignored until we return
+ // from the signal handler. We know that the signal was unblocked
+ // before entering the handler, or else we would not have received
+ // it. That means that we don't have to worry about blocking it
+ // again.
+ unblocksig(sig)
+ setsig(sig, handler)
+
+ // If we're linked into a non-Go program we want to try to
+ // avoid modifying the original context in which the signal
+ // was raised. If the handler is the default, we know it
+ // is non-recoverable, so we don't have to worry about
+ // re-installing sighandler. At this point we can just
+ // return and the signal will be re-raised and caught by
+ // the default handler with the correct context.
+ //
+ // On FreeBSD, the libthr sigaction code prevents
+ // this from working so we fall through to raise.
+ if GOOS != "freebsd" && (isarchive || islibrary) && handler == _SIG_DFL && !c.sigFromUser() {
+ return
+ }
+
+ raise(sig)
+
+ // Give the signal a chance to be delivered.
+ // In almost all real cases the program is about to crash,
+ // so sleeping here is not a waste of time.
+ usleep(1000)
+
+ // If the signal didn't cause the program to exit, restore the
+ // Go signal handler and carry on.
+ //
+ // We may receive another instance of the signal before we
+ // restore the Go handler, but that is not so bad: we know
+ // that the Go program has been ignoring the signal.
+ setsig(sig, abi.FuncPCABIInternal(sighandler))
+}
+
+//go:nosplit
+func crash() {
+ dieFromSignal(_SIGABRT)
+}
+
+// ensureSigM starts one global, sleeping thread to make sure at least one thread
+// is available to catch signals enabled for os/signal.
+func ensureSigM() {
+ if maskUpdatedChan != nil {
+ return
+ }
+ maskUpdatedChan = make(chan struct{})
+ disableSigChan = make(chan uint32)
+ enableSigChan = make(chan uint32)
+ go func() {
+ // Signal masks are per-thread, so make sure this goroutine stays on one
+ // thread.
+ LockOSThread()
+ defer UnlockOSThread()
+ // The sigBlocked mask contains the signals not active for os/signal,
+ // initially all signals except the essential. When signal.Notify()/Stop is called,
+ // sigenable/sigdisable in turn notify this thread to update its signal
+ // mask accordingly.
+ sigBlocked := sigset_all
+ for i := range sigtable {
+ if !blockableSig(uint32(i)) {
+ sigdelset(&sigBlocked, i)
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &sigBlocked, nil)
+ for {
+ select {
+ case sig := <-enableSigChan:
+ if sig > 0 {
+ sigdelset(&sigBlocked, int(sig))
+ }
+ case sig := <-disableSigChan:
+ if sig > 0 && blockableSig(sig) {
+ sigaddset(&sigBlocked, int(sig))
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &sigBlocked, nil)
+ maskUpdatedChan <- struct{}{}
+ }
+ }()
+}
+
+// This is called when we receive a signal when there is no signal stack.
+// This can only happen if non-Go code calls sigaltstack to disable the
+// signal stack.
+func noSignalStack(sig uint32) {
+ println("signal", sig, "received on thread with no signal stack")
+ throw("non-Go code disabled sigaltstack")
+}
+
+// This is called if we receive a signal when there is a signal stack
+// but we are not on it. This can only happen if non-Go code called
+// sigaction without setting the SS_ONSTACK flag.
+func sigNotOnStack(sig uint32, sp uintptr, mp *m) {
+ println("signal", sig, "received but handler not on signal stack")
+ print("mp.gsignal stack [", hex(mp.gsignal.stack.lo), " ", hex(mp.gsignal.stack.hi), "], ")
+ print("mp.g0 stack [", hex(mp.g0.stack.lo), " ", hex(mp.g0.stack.hi), "], sp=", hex(sp), "\n")
+ throw("non-Go code set up signal handler without SA_ONSTACK flag")
+}
+
+// signalDuringFork is called if we receive a signal while doing a fork.
+// We do not want signals at that time, as a signal sent to the process
+// group may be delivered to the child process, causing confusion.
+// This should never be called, because we block signals across the fork;
+// this function is just a safety check. See issue 18600 for background.
+func signalDuringFork(sig uint32) {
+ println("signal", sig, "received during fork")
+ throw("signal received during fork")
+}
+
+// This runs on a foreign stack, without an m or a g. No stack split.
+//
+//go:nosplit
+//go:norace
+//go:nowritebarrierrec
+func badsignal(sig uintptr, c *sigctxt) {
+ if !iscgo && !cgoHasExtraM {
+ // There is no extra M. needm will not be able to grab
+ // an M. Instead of hanging, just crash.
+ // Cannot call split-stack function as there is no G.
+ writeErrStr("fatal: bad g in signal handler\n")
+ exit(2)
+ *(*uintptr)(unsafe.Pointer(uintptr(123))) = 2
+ }
+ needm(true)
+ if !sigsend(uint32(sig)) {
+ // A foreign thread received the signal sig, and the
+ // Go code does not want to handle it.
+ raisebadsignal(uint32(sig), c)
+ }
+ dropm()
+}
+
+//go:noescape
+func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+
+// Determines if the signal should be handled by Go and if not, forwards the
+// signal to the handler that was installed before Go's. Returns whether the
+// signal was forwarded.
+// This is called by the signal handler, and the world may be stopped.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigfwdgo(sig uint32, info *siginfo, ctx unsafe.Pointer) bool {
+ if sig >= uint32(len(sigtable)) {
+ return false
+ }
+ fwdFn := atomic.Loaduintptr(&fwdSig[sig])
+ flags := sigtable[sig].flags
+
+ // If we aren't handling the signal, forward it.
+ if atomic.Load(&handlingSig[sig]) == 0 || !signalsOK {
+ // If the signal is ignored, doing nothing is the same as forwarding.
+ if fwdFn == _SIG_IGN || (fwdFn == _SIG_DFL && flags&_SigIgn != 0) {
+ return true
+ }
+ // We are not handling the signal and there is no other handler to forward to.
+ // Crash with the default behavior.
+ if fwdFn == _SIG_DFL {
+ setsig(sig, _SIG_DFL)
+ dieFromSignal(sig)
+ return false
+ }
+
+ sigfwd(fwdFn, sig, info, ctx)
+ return true
+ }
+
+ // This function and its caller sigtrampgo assume SIGPIPE is delivered on the
+ // originating thread. This property does not hold on macOS (golang.org/issue/33384),
+ // so we have no choice but to ignore SIGPIPE.
+ if (GOOS == "darwin" || GOOS == "ios") && sig == _SIGPIPE {
+ return true
+ }
+
+ // If there is no handler to forward to, no need to forward.
+ if fwdFn == _SIG_DFL {
+ return false
+ }
+
+ c := &sigctxt{info, ctx}
+ // Only forward synchronous signals and SIGPIPE.
+ // Unfortunately, user-generated SIGPIPEs will also be forwarded, because si_code
+ // is set to _SI_USER even for a SIGPIPE raised from a write to a closed socket
+ // or pipe.
+ if (c.sigFromUser() || flags&_SigPanic == 0) && sig != _SIGPIPE {
+ return false
+ }
+ // Determine if the signal occurred inside Go code. We test that:
+ // (1) we weren't in the VDSO page,
+ // (2) we were in a goroutine (i.e., m.curg != nil),
+ // (3) we weren't in cgo, and
+ // (4) we weren't in a dropped extra m.
+ gp := sigFetchG(c)
+ if gp != nil && gp.m != nil && gp.m.curg != nil && !gp.m.isExtraInC && !gp.m.incgo {
+ return false
+ }
+
+ // Signal not handled by Go, forward it.
+ if fwdFn != _SIG_IGN {
+ sigfwd(fwdFn, sig, info, ctx)
+ }
+
+ return true
+}
+
+// sigsave saves the current thread's signal mask into *p.
+// This is used to preserve the non-Go signal mask when a non-Go
+// thread calls a Go function.
+// This is nosplit and nowritebarrierrec because it is called by needm
+// which may be called on a non-Go thread with no g available.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigsave(p *sigset) {
+ sigprocmask(_SIG_SETMASK, nil, p)
+}
+
+// msigrestore sets the current thread's signal mask to sigmask.
+// This is used to restore the non-Go signal mask when a non-Go thread
+// calls a Go function.
+// This is nosplit and nowritebarrierrec because it is called by dropm
+// after g has been cleared.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func msigrestore(sigmask sigset) {
+ sigprocmask(_SIG_SETMASK, &sigmask, nil)
+}
+
+// sigsetAllExiting is used by sigblock(true) when a thread is
+// exiting. sigset_all is defined in OS specific code, and per GOOS
+// behavior may override this default for sigsetAllExiting: see
+// osinit().
+var sigsetAllExiting = sigset_all
+
+// sigblock blocks signals in the current thread's signal mask.
+// This is used to block signals while setting up and tearing down g
+// when a non-Go thread calls a Go function. When a thread is exiting
+// we use the sigsetAllExiting value, otherwise the OS specific
+// definition of sigset_all is used.
+// This is nosplit and nowritebarrierrec because it is called by needm
+// which may be called on a non-Go thread with no g available.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func sigblock(exiting bool) {
+ if exiting {
+ sigprocmask(_SIG_SETMASK, &sigsetAllExiting, nil)
+ return
+ }
+ sigprocmask(_SIG_SETMASK, &sigset_all, nil)
+}
+
+// unblocksig removes sig from the current thread's signal mask.
+// This is nosplit and nowritebarrierrec because it is called from
+// dieFromSignal, which can be called by sigfwdgo while running in the
+// signal handler, on the signal stack, with no g available.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func unblocksig(sig uint32) {
+ var set sigset
+ sigaddset(&set, int(sig))
+ sigprocmask(_SIG_UNBLOCK, &set, nil)
+}
+
+// minitSignals is called when initializing a new m to set the
+// thread's alternate signal stack and signal mask.
+func minitSignals() {
+ minitSignalStack()
+ minitSignalMask()
+}
+
+// minitSignalStack is called when initializing a new m to set the
+// alternate signal stack. If the alternate signal stack is not set
+// for the thread (the normal case) then set the alternate signal
+// stack to the gsignal stack. If the alternate signal stack is set
+// for the thread (the case when a non-Go thread sets the alternate
+// signal stack and then calls a Go function) then set the gsignal
+// stack to the alternate signal stack. We also set the alternate
+// signal stack to the gsignal stack if cgo is not used (regardless
+// of whether it is already set). Record which choice was made in
+// newSigstack, so that it can be undone in unminit.
+func minitSignalStack() {
+ mp := getg().m
+ var st stackt
+ sigaltstack(nil, &st)
+ if st.ss_flags&_SS_DISABLE != 0 || !iscgo {
+ signalstack(&mp.gsignal.stack)
+ mp.newSigstack = true
+ } else {
+ setGsignalStack(&st, &mp.goSigStack)
+ mp.newSigstack = false
+ }
+}
+
+// minitSignalMask is called when initializing a new m to set the
+// thread's signal mask. When this is called all signals have been
+// blocked for the thread. This starts with m.sigmask, which was set
+// either from initSigmask for a newly created thread or by calling
+// sigsave if this is a non-Go thread calling a Go function. It
+// removes all essential signals from the mask, thus causing those
+// signals to not be blocked. Then it sets the thread's signal mask.
+// After this is called the thread can receive signals.
+func minitSignalMask() {
+ nmask := getg().m.sigmask
+ for i := range sigtable {
+ if !blockableSig(uint32(i)) {
+ sigdelset(&nmask, i)
+ }
+ }
+ sigprocmask(_SIG_SETMASK, &nmask, nil)
+}
+
+// unminitSignals is called from dropm, via unminit, to undo the
+// effect of calling minit on a non-Go thread.
+//
+//go:nosplit
+func unminitSignals() {
+ if getg().m.newSigstack {
+ st := stackt{ss_flags: _SS_DISABLE}
+ sigaltstack(&st, nil)
+ } else {
+ // We got the signal stack from someone else. Restore
+ // the Go-allocated stack in case this M gets reused
+ // for another thread (e.g., it's an extram). Also, on
+ // Android, libc allocates a signal stack for all
+ // threads, so it's important to restore the Go stack
+ // even on Go-created threads so we can free it.
+ restoreGsignalStack(&getg().m.goSigStack)
+ }
+}
+
+// blockableSig reports whether sig may be blocked by the signal mask.
+// We never want to block the signals marked _SigUnblock;
+// these are the synchronous signals that turn into a Go panic.
+// We never want to block the preemption signal if it is being used.
+// In a Go program--not a c-archive/c-shared--we never want to block
+// the signals marked _SigKill or _SigThrow, as otherwise it's possible
+// for all running threads to block them and delay their delivery until
+// we start a new thread. When linked into a C program we let the C code
+// decide on the disposition of those signals.
+func blockableSig(sig uint32) bool {
+ flags := sigtable[sig].flags
+ if flags&_SigUnblock != 0 {
+ return false
+ }
+ if sig == sigPreempt && preemptMSupported && debug.asyncpreemptoff == 0 {
+ return false
+ }
+ if isarchive || islibrary {
+ return true
+ }
+ return flags&(_SigKill|_SigThrow) == 0
+}
+
+// gsignalStack saves the fields of the gsignal stack changed by
+// setGsignalStack.
+type gsignalStack struct {
+ stack stack
+ stackguard0 uintptr
+ stackguard1 uintptr
+ stktopsp uintptr
+}
+
+// setGsignalStack sets the gsignal stack of the current m to an
+// alternate signal stack returned from the sigaltstack system call.
+// It saves the old values in *old for use by restoreGsignalStack.
+// This is used when handling a signal if non-Go code has set the
+// alternate signal stack.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func setGsignalStack(st *stackt, old *gsignalStack) {
+ gp := getg()
+ if old != nil {
+ old.stack = gp.m.gsignal.stack
+ old.stackguard0 = gp.m.gsignal.stackguard0
+ old.stackguard1 = gp.m.gsignal.stackguard1
+ old.stktopsp = gp.m.gsignal.stktopsp
+ }
+ stsp := uintptr(unsafe.Pointer(st.ss_sp))
+ gp.m.gsignal.stack.lo = stsp
+ gp.m.gsignal.stack.hi = stsp + st.ss_size
+ gp.m.gsignal.stackguard0 = stsp + stackGuard
+ gp.m.gsignal.stackguard1 = stsp + stackGuard
+}
+
+// restoreGsignalStack restores the gsignal stack to the value it had
+// before entering the signal handler.
+//
+//go:nosplit
+//go:nowritebarrierrec
+func restoreGsignalStack(st *gsignalStack) {
+ gp := getg().m.gsignal
+ gp.stack = st.stack
+ gp.stackguard0 = st.stackguard0
+ gp.stackguard1 = st.stackguard1
+ gp.stktopsp = st.stktopsp
+}
+
+// signalstack sets the current thread's alternate signal stack to s.
+//
+//go:nosplit
+func signalstack(s *stack) {
+ st := stackt{ss_size: s.hi - s.lo}
+ setSignalstackSP(&st, s.lo)
+ sigaltstack(&st, nil)
+}
+
+// setsigsegv is used on darwin/arm64 to fake a segmentation fault.
+//
+// This is exported via linkname to assembly in runtime/cgo.
+//
+//go:nosplit
+//go:linkname setsigsegv
+func setsigsegv(pc uintptr) {
+ gp := getg()
+ gp.sig = _SIGSEGV
+ gp.sigpc = pc
+ gp.sigcode0 = _SEGV_MAPERR
+ gp.sigcode1 = 0 // TODO: emulate si_addr
+}
diff --git a/src/runtime/signal_windows.go b/src/runtime/signal_windows.go
new file mode 100644
index 0000000..8e0e39c
--- /dev/null
+++ b/src/runtime/signal_windows.go
@@ -0,0 +1,445 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+const (
+ _SEM_FAILCRITICALERRORS = 0x0001
+ _SEM_NOGPFAULTERRORBOX = 0x0002
+ _SEM_NOOPENFILEERRORBOX = 0x8000
+
+ _WER_FAULT_REPORTING_NO_UI = 0x0020
+)
+
+func preventErrorDialogs() {
+ errormode := stdcall0(_GetErrorMode)
+ stdcall1(_SetErrorMode, errormode|_SEM_FAILCRITICALERRORS|_SEM_NOGPFAULTERRORBOX|_SEM_NOOPENFILEERRORBOX)
+
+ // Disable WER fault reporting UI.
+ // Do this even if WER is disabled as a whole,
+ // as WER might be enabled later with setTraceback("wer")
+ // and we still want the fault reporting UI to be disabled if this happens.
+ var werflags uintptr
+ stdcall2(_WerGetFlags, currentProcess, uintptr(unsafe.Pointer(&werflags)))
+ stdcall1(_WerSetFlags, werflags|_WER_FAULT_REPORTING_NO_UI)
+}
+
+// enableWER re-enables Windows error reporting without fault reporting UI.
+func enableWER() {
+ // re-enable Windows Error Reporting
+ errormode := stdcall0(_GetErrorMode)
+ if errormode&_SEM_NOGPFAULTERRORBOX != 0 {
+ stdcall1(_SetErrorMode, errormode^_SEM_NOGPFAULTERRORBOX)
+ }
+}
+
+// in sys_windows_386.s, sys_windows_amd64.s, sys_windows_arm.s, and sys_windows_arm64.s
+func exceptiontramp()
+func firstcontinuetramp()
+func lastcontinuetramp()
+func sigresume()
+
+func initExceptionHandler() {
+ stdcall2(_AddVectoredExceptionHandler, 1, abi.FuncPCABI0(exceptiontramp))
+ if _AddVectoredContinueHandler == nil || GOARCH == "386" {
+ // Use SetUnhandledExceptionFilter for windows-386 or
+ // if VectoredContinueHandler is unavailable.
+ // Note: the SetUnhandledExceptionFilter handler won't be called if a debugger is attached.
+ stdcall1(_SetUnhandledExceptionFilter, abi.FuncPCABI0(lastcontinuetramp))
+ } else {
+ stdcall2(_AddVectoredContinueHandler, 1, abi.FuncPCABI0(firstcontinuetramp))
+ stdcall2(_AddVectoredContinueHandler, 0, abi.FuncPCABI0(lastcontinuetramp))
+ }
+}
+
+// isAbort reports whether context r describes an exception raised
+// by a call to the runtime.abort function.
+//
+//go:nosplit
+func isAbort(r *context) bool {
+ pc := r.ip()
+ if GOARCH == "386" || GOARCH == "amd64" || GOARCH == "arm" {
+ // In the case of an abort, the exception IP is one byte after
+ // the INT3 (this differs from UNIX OSes). Note that on ARM,
+ // this means that the exception IP is no longer aligned.
+ pc--
+ }
+ return isAbortPC(pc)
+}
+
+// isgoexception reports whether this exception should be translated
+// into a Go panic or throw.
+//
+// It is nosplit to avoid growing the stack in case we're aborting
+// because of a stack overflow.
+//
+//go:nosplit
+func isgoexception(info *exceptionrecord, r *context) bool {
+ // Only handle exception if executing instructions in Go binary
+ // (not Windows library code).
+ // TODO(mwhudson): needs to loop to support shared libs
+ if r.ip() < firstmoduledata.text || firstmoduledata.etext < r.ip() {
+ return false
+ }
+
+ // Go will only handle some exceptions.
+ switch info.exceptioncode {
+ default:
+ return false
+ case _EXCEPTION_ACCESS_VIOLATION:
+ case _EXCEPTION_IN_PAGE_ERROR:
+ case _EXCEPTION_INT_DIVIDE_BY_ZERO:
+ case _EXCEPTION_INT_OVERFLOW:
+ case _EXCEPTION_FLT_DENORMAL_OPERAND:
+ case _EXCEPTION_FLT_DIVIDE_BY_ZERO:
+ case _EXCEPTION_FLT_INEXACT_RESULT:
+ case _EXCEPTION_FLT_OVERFLOW:
+ case _EXCEPTION_FLT_UNDERFLOW:
+ case _EXCEPTION_BREAKPOINT:
+ case _EXCEPTION_ILLEGAL_INSTRUCTION: // breakpoint arrives this way on arm64
+ }
+ return true
+}
+
+const (
+ callbackVEH = iota
+ callbackFirstVCH
+ callbackLastVCH
+)
+
+// sigFetchGSafe is like getg() but without panicking
+// when TLS is not set.
+// Only implemented on windows/386, which is the only
+// arch that loads TLS when calling getg(). Others
+// use a dedicated register.
+func sigFetchGSafe() *g
+
+func sigFetchG() *g {
+ if GOARCH == "386" {
+ return sigFetchGSafe()
+ }
+ return getg()
+}
+
+// sigtrampgo is called from the exception handler function, sigtramp,
+// written in assembly code.
+// Return EXCEPTION_CONTINUE_EXECUTION if the exception is handled,
+// else return EXCEPTION_CONTINUE_SEARCH.
+//
+// It is nosplit for the same reason as exceptionhandler.
+//
+//go:nosplit
+func sigtrampgo(ep *exceptionpointers, kind int) int32 {
+ gp := sigFetchG()
+ if gp == nil {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ var fn func(info *exceptionrecord, r *context, gp *g) int32
+ switch kind {
+ case callbackVEH:
+ fn = exceptionhandler
+ case callbackFirstVCH:
+ fn = firstcontinuehandler
+ case callbackLastVCH:
+ fn = lastcontinuehandler
+ default:
+ throw("unknown sigtramp callback")
+ }
+
+ // Check if we are running on the g0 stack, and if we are,
+ // call fn directly instead of creating the closure
+ // for the systemstack argument.
+ //
+ // A closure can't be marked as nosplit, so it might
+ // call morestack if we are at the g0 stack limit.
+ // If that happens, the runtime will call abort
+ // and end up in sigtrampgo again.
+ // TODO: revisit this workaround if/when closures
+ // can be compiled as nosplit.
+ //
+ // Note that this scenario should only occur on
+ // TestG0StackOverflow. Any other occurrence should
+ // be treated as a bug.
+ var ret int32
+ if gp != gp.m.g0 {
+ systemstack(func() {
+ ret = fn(ep.record, ep.context, gp)
+ })
+ } else {
+ ret = fn(ep.record, ep.context, gp)
+ }
+ if ret == _EXCEPTION_CONTINUE_SEARCH {
+ return ret
+ }
+
+ // Check if we need to set up the control flow guard workaround.
+ // On Windows, the stack pointer in the context must lie within
+ // system stack limits when we resume from exception.
+ // Store the resume SP and PC in alternate registers
+ // and return to sigresume on the g0 stack.
+ // sigresume makes no use of the stack at all,
+ // loading SP from RX and jumping to RY, where RX and RY are two scratch registers.
+ // Note that blindly smashing RX and RY is only safe because we know sigpanic
+ // will not actually return to the original frame, so the registers
+ // are effectively dead. But this does mean we can't use the
+ // same mechanism for async preemption.
+ if ep.context.ip() == abi.FuncPCABI0(sigresume) {
+ // sigresume has already been set up by a previous exception.
+ return ret
+ }
+ prepareContextForSigResume(ep.context)
+ ep.context.set_sp(gp.m.g0.sched.sp)
+ ep.context.set_ip(abi.FuncPCABI0(sigresume))
+ return ret
+}
+
+// Called by sigtramp from Windows VEH handler.
+// Return value signals whether the exception has been handled (EXCEPTION_CONTINUE_EXECUTION)
+// or should be made available to other handlers in the chain (EXCEPTION_CONTINUE_SEARCH).
+//
+// This is nosplit to avoid growing the stack until we've checked for
+// _EXCEPTION_BREAKPOINT, which is raised by abort() if we overflow the g0 stack.
+//
+//go:nosplit
+func exceptionhandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if !isgoexception(info, r) {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ if gp.throwsplit || isAbort(r) {
+ // We can't safely sigpanic because it may grow the stack.
+ // Or this is a call to abort.
+ // Don't go through any more of the Windows handler chain.
+ // Crash now.
+ winthrow(info, r, gp)
+ }
+
+ // After this point, it is safe to grow the stack.
+
+ // Make it look like a call to the signal func.
+ // Have to pass arguments out of band since
+ // augmenting the stack frame would break
+ // the unwinding code.
+ gp.sig = info.exceptioncode
+ gp.sigcode0 = info.exceptioninformation[0]
+ gp.sigcode1 = info.exceptioninformation[1]
+ gp.sigpc = r.ip()
+
+ // Only push runtime·sigpanic if r.ip() != 0.
+ // If r.ip() == 0, probably panicked because of a
+ // call to a nil func. Not pushing that onto sp will
+ // make the trace look like a call to runtime·sigpanic instead.
+ // (Otherwise the trace will end at runtime·sigpanic and we
+ // won't get to see who faulted.)
+ // Also don't push a sigpanic frame if the faulting PC
+ // is the entry of asyncPreempt. In this case, we suspended
+ // the thread right between the fault and the exception handler
+ // starting to run, and we have pushed an asyncPreempt call.
+	// The exception is not from asyncPreempt, so we must not push a
+	// sigpanic call that would make it look as if it were. Instead, just
+	// overwrite the PC. (See issue #35773)
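+	// The net effect of the code below is a synthesized frame: the faulting
+	// PC (or, on LR machines, the old LR) is saved in the new stack slot and
+	// r.ip is redirected to sigpanic0, so the unwinder sees sigpanic called
+	// from the faulting instruction.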
+ if r.ip() != 0 && r.ip() != abi.FuncPCABI0(asyncPreempt) {
+ sp := unsafe.Pointer(r.sp())
+ delta := uintptr(sys.StackAlign)
+ sp = add(sp, -delta)
+ r.set_sp(uintptr(sp))
+ if usesLR {
+ *((*uintptr)(sp)) = r.lr()
+ r.set_lr(r.ip())
+ } else {
+ *((*uintptr)(sp)) = r.ip()
+ }
+ }
+ r.set_ip(abi.FuncPCABI0(sigpanic0))
+ return _EXCEPTION_CONTINUE_EXECUTION
+}
+
+// It seems Windows searches the ContinueHandler list even
+// if the ExceptionHandler returns EXCEPTION_CONTINUE_EXECUTION.
+// firstcontinuehandler stops that search
+// if exceptionhandler has already handled the exception.
+//
+// It is nosplit for the same reason as exceptionhandler.
+//
+//go:nosplit
+func firstcontinuehandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if !isgoexception(info, r) {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+ return _EXCEPTION_CONTINUE_EXECUTION
+}
+
+// lastcontinuehandler is reached because the runtime cannot handle
+// the current exception. It prints crash info and exits.
+//
+// It is nosplit for the same reason as exceptionhandler.
+//
+//go:nosplit
+func lastcontinuehandler(info *exceptionrecord, r *context, gp *g) int32 {
+ if islibrary || isarchive {
+		// The Go DLL/archive has been loaded in a non-Go program.
+		// If the exception does not originate from Go, the Go runtime
+		// should not take responsibility for crashing the process.
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+	// VEH is called before SEH, but arm64 MSVC DLLs use SEH to trap
+	// illegal instructions during runtime initialization to determine
+	// CPU features, so if we make it to the last handler and we're
+	// arm64 and it's an illegal instruction and this is coming from
+	// non-Go code, then assume it's this CPU-feature probing, and
+	// pass the exception onward to SEH.
+ if GOARCH == "arm64" && info.exceptioncode == _EXCEPTION_ILLEGAL_INSTRUCTION &&
+ (r.ip() < firstmoduledata.text || firstmoduledata.etext < r.ip()) {
+ return _EXCEPTION_CONTINUE_SEARCH
+ }
+
+ winthrow(info, r, gp)
+ return 0 // not reached
+}
+
+// Always called on g0. gp is the G where the exception occurred.
+//
+//go:nosplit
+func winthrow(info *exceptionrecord, r *context, gp *g) {
+ g0 := getg()
+
+ if panicking.Load() != 0 { // traceback already printed
+ exit(2)
+ }
+ panicking.Store(1)
+
+ // In case we're handling a g0 stack overflow, blow away the
+ // g0 stack bounds so we have room to print the traceback. If
+ // this somehow overflows the stack, the OS will trap it.
+ g0.stack.lo = 0
+ g0.stackguard0 = g0.stack.lo + stackGuard
+ g0.stackguard1 = g0.stackguard0
+
+ print("Exception ", hex(info.exceptioncode), " ", hex(info.exceptioninformation[0]), " ", hex(info.exceptioninformation[1]), " ", hex(r.ip()), "\n")
+
+ print("PC=", hex(r.ip()), "\n")
+ if g0.m.incgo && gp == g0.m.g0 && g0.m.curg != nil {
+ if iscgo {
+ print("signal arrived during external code execution\n")
+ }
+ gp = g0.m.curg
+ }
+ print("\n")
+
+ g0.m.throwing = throwTypeRuntime
+ g0.m.caughtsig.set(gp)
+
+ level, _, docrash := gotraceback()
+ if level > 0 {
+ tracebacktrap(r.ip(), r.sp(), r.lr(), gp)
+ tracebackothers(gp)
+ dumpregs(r)
+ }
+
+ if docrash {
+ dieFromException(info, r)
+ }
+
+ exit(2)
+}
+
+func sigpanic() {
+ gp := getg()
+ if !canpanic() {
+ throw("unexpected signal during runtime execution")
+ }
+
+ switch gp.sig {
+ case _EXCEPTION_ACCESS_VIOLATION, _EXCEPTION_IN_PAGE_ERROR:
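+		// A fault address below 0x1000 lies in the never-mapped first page,
+		// so treat it as a plain nil pointer dereference.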
+ if gp.sigcode1 < 0x1000 {
+ panicmem()
+ }
+ if gp.paniconfault {
+ panicmemAddr(gp.sigcode1)
+ }
+ if inUserArenaChunk(gp.sigcode1) {
+ // We could check that the arena chunk is explicitly set to fault,
+ // but the fact that we faulted on accessing it is enough to prove
+ // that it is.
+ print("accessed data from freed user arena ", hex(gp.sigcode1), "\n")
+ } else {
+ print("unexpected fault address ", hex(gp.sigcode1), "\n")
+ }
+ throw("fault")
+ case _EXCEPTION_INT_DIVIDE_BY_ZERO:
+ panicdivide()
+ case _EXCEPTION_INT_OVERFLOW:
+ panicoverflow()
+ case _EXCEPTION_FLT_DENORMAL_OPERAND,
+ _EXCEPTION_FLT_DIVIDE_BY_ZERO,
+ _EXCEPTION_FLT_INEXACT_RESULT,
+ _EXCEPTION_FLT_OVERFLOW,
+ _EXCEPTION_FLT_UNDERFLOW:
+ panicfloat()
+ }
+ throw("fault")
+}
+
+// Following are not implemented.
+
+func initsig(preinit bool) {
+}
+
+func sigenable(sig uint32) {
+}
+
+func sigdisable(sig uint32) {
+}
+
+func sigignore(sig uint32) {
+}
+
+func signame(sig uint32) string {
+ return ""
+}
+
+//go:nosplit
+func crash() {
+ dieFromException(nil, nil)
+}
+
+// dieFromException raises an exception that bypasses all exception handlers.
+// This provides the expected exit status for the shell.
+//
+//go:nosplit
+func dieFromException(info *exceptionrecord, r *context) {
+ if info == nil {
+ gp := getg()
+ if gp.sig != 0 {
+ // Try to reconstruct an exception record from
+ // the exception information stored in gp.
+ info = &exceptionrecord{
+ exceptionaddress: gp.sigpc,
+ exceptioncode: gp.sig,
+ numberparameters: 2,
+ }
+ info.exceptioninformation[0] = gp.sigcode0
+ info.exceptioninformation[1] = gp.sigcode1
+ } else {
+ // By default, a failing Go application exits with exit code 2.
+ // Use this value when gp does not contain exception info.
+ info = &exceptionrecord{
+ exceptioncode: 2,
+ }
+ }
+ }
+ const FAIL_FAST_GENERATE_EXCEPTION_ADDRESS = 0x1
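+	// Per the RaiseFailFastException documentation, this flag asks the system
+	// to derive the exception address from the call site.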
+ stdcall3(_RaiseFailFastException, uintptr(unsafe.Pointer(info)), uintptr(unsafe.Pointer(r)), FAIL_FAST_GENERATE_EXCEPTION_ADDRESS)
+}
+
+// gsignalStack is unused on Windows.
+type gsignalStack struct{}
diff --git a/src/runtime/signal_windows_test.go b/src/runtime/signal_windows_test.go
new file mode 100644
index 0000000..431c372
--- /dev/null
+++ b/src/runtime/signal_windows_test.go
@@ -0,0 +1,315 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bufio"
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "os/exec"
+ "path/filepath"
+ "runtime"
+ "strings"
+ "syscall"
+ "testing"
+)
+
+func TestVectoredHandlerExceptionInNonGoThread(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ if strings.HasPrefix(testenv.Builder(), "windows-amd64-2012") {
+ testenv.SkipFlaky(t, 49681)
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveCGO(t)
+ testenv.MustHaveExecPath(t, "gcc")
+ testprog.Lock()
+ defer testprog.Unlock()
+ dir := t.TempDir()
+
+ // build c program
+ dll := filepath.Join(dir, "veh.dll")
+ cmd := exec.Command("gcc", "-shared", "-o", dll, "testdata/testwinlibthrow/veh.c")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+		t.Fatalf("failed to build c dll: %s\n%s", err, out)
+ }
+
+ // build go exe
+ exe := filepath.Join(dir, "test.exe")
+ cmd = exec.Command(testenv.GoToolPath(t), "build", "-o", exe, "testdata/testwinlibthrow/main.go")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+		t.Fatalf("failed to build go exe: %s\n%s", err, out)
+ }
+
+ // run test program in same thread
+ cmd = exec.Command(exe)
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err == nil {
+ t.Fatal("error expected")
+ }
+ if _, ok := err.(*exec.ExitError); ok && len(out) > 0 {
+ if !bytes.Contains(out, []byte("Exception 0x2a")) {
+ t.Fatalf("unexpected failure while running executable: %s\n%s", err, out)
+ }
+ } else {
+ t.Fatalf("unexpected error while running executable: %s\n%s", err, out)
+ }
+ // run test program in a new thread
+ cmd = exec.Command(exe, "thread")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err == nil {
+ t.Fatal("error expected")
+ }
+ if err, ok := err.(*exec.ExitError); ok {
+ if err.ExitCode() != 42 {
+ t.Fatalf("unexpected failure while running executable: %s\n%s", err, out)
+ }
+ } else {
+ t.Fatalf("unexpected error while running executable: %s\n%s", err, out)
+ }
+}
+
+func TestVectoredHandlerDontCrashOnLibrary(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ if runtime.GOARCH == "arm" {
+ //TODO: remove this skip and update testwinlib/main.c
+ // once windows/arm supports c-shared buildmode.
+ // See go.dev/issues/43800.
+ t.Skip("this test can't run on windows/arm")
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveCGO(t)
+ testenv.MustHaveExecPath(t, "gcc")
+ testprog.Lock()
+ defer testprog.Unlock()
+ dir := t.TempDir()
+
+ // build go dll
+ dll := filepath.Join(dir, "testwinlib.dll")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", dll, "-buildmode", "c-shared", "testdata/testwinlib/main.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go library: %s\n%s", err, out)
+ }
+
+ // build c program
+ exe := filepath.Join(dir, "test.exe")
+ cmd = exec.Command("gcc", "-L"+dir, "-I"+dir, "-ltestwinlib", "-o", exe, "testdata/testwinlib/main.c")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build c exe: %s\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failure while running executable: %s\n%s", err, out)
+ }
+ expectedOutput := "exceptionCount: 1\ncontinueCount: 1\n"
+	// Normalize Windows line endings before comparing.
+ cleanedOut := strings.ReplaceAll(string(out), "\r\n", "\n")
+ if cleanedOut != expectedOutput {
+ t.Errorf("expected output %q, got %q", expectedOutput, cleanedOut)
+ }
+}
+
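+// sendCtrlBreak delivers a CTRL_BREAK_EVENT to the console process group
+// identified by pid. Callers start the target with CREATE_NEW_PROCESS_GROUP
+// so that it forms its own group and can be targeted individually.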
+func sendCtrlBreak(pid int) error {
+ kernel32, err := syscall.LoadDLL("kernel32.dll")
+ if err != nil {
+ return fmt.Errorf("LoadDLL: %v\n", err)
+ }
+ generateEvent, err := kernel32.FindProc("GenerateConsoleCtrlEvent")
+ if err != nil {
+ return fmt.Errorf("FindProc: %v\n", err)
+ }
+ result, _, err := generateEvent.Call(syscall.CTRL_BREAK_EVENT, uintptr(pid))
+ if result == 0 {
+ return fmt.Errorf("GenerateConsoleCtrlEvent: %v\n", err)
+ }
+ return nil
+}
+
+// TestCtrlHandler tests that Go can gracefully handle closing the console window.
+// See https://golang.org/issues/41884.
+func TestCtrlHandler(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ t.Parallel()
+
+ // build go program
+ exe := filepath.Join(t.TempDir(), "test.exe")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", exe, "testdata/testwinsignal/main.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go exe: %v\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ var stdout strings.Builder
+ var stderr strings.Builder
+ cmd.Stdout = &stdout
+ cmd.Stderr = &stderr
+ inPipe, err := cmd.StdinPipe()
+ if err != nil {
+ t.Fatalf("Failed to create stdin pipe: %v", err)
+ }
+ // keep inPipe alive until the end of the test
+ defer inPipe.Close()
+
+ // in a new command window
+ const _CREATE_NEW_CONSOLE = 0x00000010
+ cmd.SysProcAttr = &syscall.SysProcAttr{
+ CreationFlags: _CREATE_NEW_CONSOLE,
+ HideWindow: true,
+ }
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Start failed: %v", err)
+ }
+ defer func() {
+ cmd.Process.Kill()
+ cmd.Wait()
+ }()
+
+ // check child exited gracefully, did not timeout
+ if err := cmd.Wait(); err != nil {
+ t.Fatalf("Program exited with error: %v\n%s", err, &stderr)
+ }
+
+ // check child received, handled SIGTERM
+ if expected, got := syscall.SIGTERM.String(), strings.TrimSpace(stdout.String()); expected != got {
+ t.Fatalf("Expected '%s' got: %s", expected, got)
+ }
+}
+
+// TestLibraryCtrlHandler tests that a Go DLL allows the calling program to handle console control events.
+// See https://golang.org/issues/35965.
+func TestLibraryCtrlHandler(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skip("this test can only run on windows/amd64")
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveCGO(t)
+ testenv.MustHaveExecPath(t, "gcc")
+ testprog.Lock()
+ defer testprog.Unlock()
+ dir := t.TempDir()
+
+ // build go dll
+ dll := filepath.Join(dir, "dummy.dll")
+ cmd := exec.Command(testenv.GoToolPath(t), "build", "-o", dll, "-buildmode", "c-shared", "testdata/testwinlibsignal/dummy.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go library: %s\n%s", err, out)
+ }
+
+ // build c program
+ exe := filepath.Join(dir, "test.exe")
+ cmd = exec.Command("gcc", "-o", exe, "testdata/testwinlibsignal/main.c")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build c exe: %s\n%s", err, out)
+ }
+
+ // run test program
+ cmd = exec.Command(exe)
+ var stderr bytes.Buffer
+ cmd.Stderr = &stderr
+ outPipe, err := cmd.StdoutPipe()
+ if err != nil {
+ t.Fatalf("Failed to create stdout pipe: %v", err)
+ }
+ outReader := bufio.NewReader(outPipe)
+
+ cmd.SysProcAttr = &syscall.SysProcAttr{
+ CreationFlags: syscall.CREATE_NEW_PROCESS_GROUP,
+ }
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Start failed: %v", err)
+ }
+
+ errCh := make(chan error, 1)
+ go func() {
+ if line, err := outReader.ReadString('\n'); err != nil {
+ errCh <- fmt.Errorf("could not read stdout: %v", err)
+ } else if strings.TrimSpace(line) != "ready" {
+ errCh <- fmt.Errorf("unexpected message: %v", line)
+ } else {
+ errCh <- sendCtrlBreak(cmd.Process.Pid)
+ }
+ }()
+
+ if err := <-errCh; err != nil {
+ t.Fatal(err)
+ }
+ if err := cmd.Wait(); err != nil {
+ t.Fatalf("Program exited with error: %v\n%s", err, &stderr)
+ }
+}
+
+func TestIssue59213(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("skipping windows only test")
+ }
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ testenv.MustHaveGoBuild(t)
+ testenv.MustHaveCGO(t)
+
+ goEnv := func(arg string) string {
+ cmd := testenv.Command(t, testenv.GoToolPath(t), "env", arg)
+ cmd.Stderr = new(bytes.Buffer)
+
+ line, err := cmd.Output()
+ if err != nil {
+ t.Fatalf("%v: %v\n%s", cmd, err, cmd.Stderr)
+ }
+ out := string(bytes.TrimSpace(line))
+ t.Logf("%v: %q", cmd, out)
+ return out
+ }
+
+ cc := goEnv("CC")
+ cgoCflags := goEnv("CGO_CFLAGS")
+
+ t.Parallel()
+
+ tmpdir := t.TempDir()
+ dllfile := filepath.Join(tmpdir, "test.dll")
+ exefile := filepath.Join(tmpdir, "gotest.exe")
+
+ // build go dll
+ cmd := testenv.Command(t, testenv.GoToolPath(t), "build", "-o", dllfile, "-buildmode", "c-shared", "testdata/testwintls/main.go")
+ out, err := testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build go library: %s\n%s", err, out)
+ }
+
+ // build c program
+ cmd = testenv.Command(t, cc, "-o", exefile, "testdata/testwintls/main.c")
+ testenv.CleanCmdEnv(cmd)
+ cmd.Env = append(cmd.Env, "CGO_CFLAGS="+cgoCflags)
+ out, err = cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build c exe: %s\n%s", err, out)
+ }
+
+ // run test program
+ cmd = testenv.Command(t, exefile, dllfile, "GoFunc")
+ out, err = testenv.CleanCmdEnv(cmd).CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed: %s\n%s", err, out)
+ }
+}
diff --git a/src/runtime/sigqueue.go b/src/runtime/sigqueue.go
new file mode 100644
index 0000000..51e424d
--- /dev/null
+++ b/src/runtime/sigqueue.go
@@ -0,0 +1,275 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file implements runtime support for signal handling.
+//
+// Most synchronization primitives are not available from
+// the signal handler (it cannot block, allocate memory, or use locks)
+// so the handler communicates with a processing goroutine
+// via struct sig, below.
+//
+// sigsend is called by the signal handler to queue a new signal.
+// signal_recv is called by the Go program to receive a newly queued signal.
+//
+// Synchronization between sigsend and signal_recv is based on the sig.state
+// variable. It can be in three states:
+// * sigReceiving means that signal_recv is blocked on sig.Note and there are
+// no new pending signals.
+// * sigSending means that sig.mask *may* contain new pending signals,
+// signal_recv can't be blocked in this state.
+// * sigIdle means that there are no new pending signals and signal_recv is not
+// blocked.
+//
+// Transitions between states are done atomically with CAS.
+//
+// When signal_recv is unblocked, it resets sig.Note and rechecks sig.mask.
+// If several sigsends and signal_recv execute concurrently, it can lead to
+// unnecessary rechecks of sig.mask, but it cannot lead to missed signals
+// nor deadlocks.
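+//
+// Schematically, the state transitions are:
+//
+//	sigsend:     sigIdle -> sigSending     (bit queued; receiver will notice)
+//	             sigReceiving -> sigIdle   (wake the sleeping receiver)
+//	signal_recv: sigIdle -> sigReceiving   (about to sleep on sig.note)
+//	             sigSending -> sigIdle     (pending bits exist; consume them)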
+
+//go:build !plan9
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ _ "unsafe" // for go:linkname
+)
+
+// sig handles communication between the signal handler and os/signal.
+// Other than the inuse and recv fields, the fields are accessed atomically.
+//
+// The wanted and ignored fields are only written by one goroutine at
+// a time; access is controlled by the handlers Mutex in os/signal.
+// The fields are only read by that one goroutine and by the signal handler.
+// We access them atomically to minimize the race between setting them
+// in the goroutine calling os/signal and the signal handler,
+// which may be running in a different thread. That race is unavoidable,
+// as there is no connection between handling a signal and receiving one,
+// but atomic instructions should minimize it.
+var sig struct {
+ note note
+ mask [(_NSIG + 31) / 32]uint32
+ wanted [(_NSIG + 31) / 32]uint32
+ ignored [(_NSIG + 31) / 32]uint32
+ recv [(_NSIG + 31) / 32]uint32
+ state atomic.Uint32
+ delivering atomic.Uint32
+ inuse bool
+}
+
+const (
+ sigIdle = iota
+ sigReceiving
+ sigSending
+)
+
+// sigsend delivers a signal from sighandler to the internal signal delivery queue.
+// It reports whether the signal was sent. If not, the caller typically crashes the program.
+// It runs from the signal handler, so it's limited in what it can do.
+func sigsend(s uint32) bool {
+ bit := uint32(1) << uint(s&31)
+ if s >= uint32(32*len(sig.wanted)) {
+ return false
+ }
+
+ sig.delivering.Add(1)
+ // We are running in the signal handler; defer is not available.
+
+ if w := atomic.Load(&sig.wanted[s/32]); w&bit == 0 {
+ sig.delivering.Add(-1)
+ return false
+ }
+
+ // Add signal to outgoing queue.
+ for {
+ mask := sig.mask[s/32]
+ if mask&bit != 0 {
+ sig.delivering.Add(-1)
+ return true // signal already in queue
+ }
+ if atomic.Cas(&sig.mask[s/32], mask, mask|bit) {
+ break
+ }
+ }
+
+ // Notify receiver that queue has new bit.
+Send:
+ for {
+ switch sig.state.Load() {
+ default:
+ throw("sigsend: inconsistent state")
+ case sigIdle:
+ if sig.state.CompareAndSwap(sigIdle, sigSending) {
+ break Send
+ }
+ case sigSending:
+ // notification already pending
+ break Send
+ case sigReceiving:
+ if sig.state.CompareAndSwap(sigReceiving, sigIdle) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteWakeup(&sig.note)
+ break Send
+ }
+ notewakeup(&sig.note)
+ break Send
+ }
+ }
+ }
+
+ sig.delivering.Add(-1)
+ return true
+}
+
+// Called to receive the next queued signal.
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_recv os/signal.signal_recv
+func signal_recv() uint32 {
+ for {
+ // Serve any signals from local copy.
+ for i := uint32(0); i < _NSIG; i++ {
+ if sig.recv[i/32]&(1<<(i&31)) != 0 {
+ sig.recv[i/32] &^= 1 << (i & 31)
+ return i
+ }
+ }
+
+ // Wait for updates to be available from signal sender.
+ Receive:
+ for {
+ switch sig.state.Load() {
+ default:
+ throw("signal_recv: inconsistent state")
+ case sigIdle:
+ if sig.state.CompareAndSwap(sigIdle, sigReceiving) {
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteSleep(&sig.note)
+ break Receive
+ }
+ notetsleepg(&sig.note, -1)
+ noteclear(&sig.note)
+ break Receive
+ }
+ case sigSending:
+ if sig.state.CompareAndSwap(sigSending, sigIdle) {
+ break Receive
+ }
+ }
+ }
+
+ // Incorporate updates from sender into local copy.
+ for i := range sig.mask {
+ sig.recv[i] = atomic.Xchg(&sig.mask[i], 0)
+ }
+ }
+}
+
+// signalWaitUntilIdle waits until the signal delivery mechanism is idle.
+// This is used to ensure that we do not drop a signal notification due
+// to a race between disabling a signal and receiving a signal.
+// This assumes that signal delivery has already been disabled for
+// the signal(s) in question, and here we are just waiting to make sure
+// that all the signals have been delivered to the user channels
+// by the os/signal package.
+//
+//go:linkname signalWaitUntilIdle os/signal.signalWaitUntilIdle
+func signalWaitUntilIdle() {
+ // Although the signals we care about have been removed from
+ // sig.wanted, it is possible that another thread has received
+ // a signal, has read from sig.wanted, is now updating sig.mask,
+ // and has not yet woken up the processor thread. We need to wait
+ // until all current signal deliveries have completed.
+ for sig.delivering.Load() != 0 {
+ Gosched()
+ }
+
+ // Although WaitUntilIdle seems like the right name for this
+ // function, the state we are looking for is sigReceiving, not
+ // sigIdle. The sigIdle state is really more like sigProcessing.
+ for sig.state.Load() != sigReceiving {
+ Gosched()
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_enable os/signal.signal_enable
+func signal_enable(s uint32) {
+ if !sig.inuse {
+ // This is the first call to signal_enable. Initialize.
+ sig.inuse = true // enable reception of signals; cannot disable
+ if GOOS == "darwin" || GOOS == "ios" {
+ sigNoteSetup(&sig.note)
+ } else {
+ noteclear(&sig.note)
+ }
+ }
+
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+
+ w := sig.wanted[s/32]
+ w |= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+
+ i := sig.ignored[s/32]
+ i &^= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+
+ sigenable(s)
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_disable os/signal.signal_disable
+func signal_disable(s uint32) {
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+ sigdisable(s)
+
+ w := sig.wanted[s/32]
+ w &^= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_ignore os/signal.signal_ignore
+func signal_ignore(s uint32) {
+ if s >= uint32(len(sig.wanted)*32) {
+ return
+ }
+ sigignore(s)
+
+ w := sig.wanted[s/32]
+ w &^= 1 << (s & 31)
+ atomic.Store(&sig.wanted[s/32], w)
+
+ i := sig.ignored[s/32]
+ i |= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+}
+
+// sigInitIgnored marks the signal as already ignored. This is called at
+// program start by initsig. In a shared library initsig is called by
+// libpreinit, so the runtime may not be initialized yet.
+//
+//go:nosplit
+func sigInitIgnored(s uint32) {
+ i := sig.ignored[s/32]
+ i |= 1 << (s & 31)
+ atomic.Store(&sig.ignored[s/32], i)
+}
+
+// Checked by signal handlers.
+//
+//go:linkname signal_ignored os/signal.signal_ignored
+func signal_ignored(s uint32) bool {
+ i := atomic.Load(&sig.ignored[s/32])
+ return i&(1<<(s&31)) != 0
+}
diff --git a/src/runtime/sigqueue_note.go b/src/runtime/sigqueue_note.go
new file mode 100644
index 0000000..fb1a517
--- /dev/null
+++ b/src/runtime/sigqueue_note.go
@@ -0,0 +1,24 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The current implementation of notes on Darwin is not async-signal-safe,
+// so on Darwin the sigqueue code uses different functions to wake up the
+// signal_recv thread. This file holds the non-Darwin implementations of
+// those functions. These functions will never be called.
+
+//go:build !darwin && !plan9
+
+package runtime
+
+func sigNoteSetup(*note) {
+ throw("sigNoteSetup")
+}
+
+func sigNoteSleep(*note) {
+ throw("sigNoteSleep")
+}
+
+func sigNoteWakeup(*note) {
+ throw("sigNoteWakeup")
+}
diff --git a/src/runtime/sigqueue_plan9.go b/src/runtime/sigqueue_plan9.go
new file mode 100644
index 0000000..9ed6fb5
--- /dev/null
+++ b/src/runtime/sigqueue_plan9.go
@@ -0,0 +1,161 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file implements runtime support for signal handling.
+
+package runtime
+
+import _ "unsafe"
+
+const qsize = 64
+
+var sig struct {
+ q noteQueue
+ inuse bool
+
+ lock mutex
+ note note
+ sleeping bool
+}
+
+type noteData struct {
+ s [_ERRMAX]byte
+ n int // n bytes of s are valid
+}
+
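+// noteQueue is a fixed-size ring buffer of note strings.
+// ri == wi means the queue is either empty or full; the full flag
+// distinguishes the two cases.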
+type noteQueue struct {
+ lock mutex
+ data [qsize]noteData
+ ri int
+ wi int
+ full bool
+}
+
+// It is not allowed to allocate memory in the signal handler.
+func (q *noteQueue) push(item *byte) bool {
+ lock(&q.lock)
+ if q.full {
+ unlock(&q.lock)
+ return false
+ }
+ s := gostringnocopy(item)
+ copy(q.data[q.wi].s[:], s)
+ q.data[q.wi].n = len(s)
+ q.wi++
+ if q.wi == qsize {
+ q.wi = 0
+ }
+ if q.wi == q.ri {
+ q.full = true
+ }
+ unlock(&q.lock)
+ return true
+}
+
+func (q *noteQueue) pop() string {
+ lock(&q.lock)
+ q.full = false
+ if q.ri == q.wi {
+ unlock(&q.lock)
+ return ""
+ }
+ note := &q.data[q.ri]
+ item := string(note.s[:note.n])
+ q.ri++
+ if q.ri == qsize {
+ q.ri = 0
+ }
+ unlock(&q.lock)
+ return item
+}
+
+// Called from sighandler to send a signal back out of the signal handling thread.
+// Reports whether the signal was sent. If not, the caller typically crashes the program.
+func sendNote(s *byte) bool {
+ if !sig.inuse {
+ return false
+ }
+
+ // Add signal to outgoing queue.
+ if !sig.q.push(s) {
+ return false
+ }
+
+ lock(&sig.lock)
+ if sig.sleeping {
+ sig.sleeping = false
+ notewakeup(&sig.note)
+ }
+ unlock(&sig.lock)
+
+ return true
+}
+
+// Called to receive the next queued signal.
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_recv os/signal.signal_recv
+func signal_recv() string {
+ for {
+ note := sig.q.pop()
+ if note != "" {
+ return note
+ }
+
+ lock(&sig.lock)
+ sig.sleeping = true
+ noteclear(&sig.note)
+ unlock(&sig.lock)
+ notetsleepg(&sig.note, -1)
+ }
+}
+
+// signalWaitUntilIdle waits until the signal delivery mechanism is idle.
+// This is used to ensure that we do not drop a signal notification due
+// to a race between disabling a signal and receiving a signal.
+// This assumes that signal delivery has already been disabled for
+// the signal(s) in question, and here we are just waiting to make sure
+// that all the signals have been delivered to the user channels
+// by the os/signal package.
+//
+//go:linkname signalWaitUntilIdle os/signal.signalWaitUntilIdle
+func signalWaitUntilIdle() {
+ for {
+ lock(&sig.lock)
+ sleeping := sig.sleeping
+ unlock(&sig.lock)
+ if sleeping {
+ return
+ }
+ Gosched()
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_enable os/signal.signal_enable
+func signal_enable(s uint32) {
+ if !sig.inuse {
+ // This is the first call to signal_enable. Initialize.
+ sig.inuse = true // enable reception of signals; cannot disable
+ noteclear(&sig.note)
+ }
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_disable os/signal.signal_disable
+func signal_disable(s uint32) {
+}
+
+// Must only be called from a single goroutine at a time.
+//
+//go:linkname signal_ignore os/signal.signal_ignore
+func signal_ignore(s uint32) {
+}
+
+//go:linkname signal_ignored os/signal.signal_ignored
+func signal_ignored(s uint32) bool {
+ return false
+}
diff --git a/src/runtime/sigtab_aix.go b/src/runtime/sigtab_aix.go
new file mode 100644
index 0000000..42e5606
--- /dev/null
+++ b/src/runtime/sigtab_aix.go
@@ -0,0 +1,264 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ 0: {0, "SIGNONE: no trap"},
+ _SIGHUP: {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ _SIGINT: {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ _SIGQUIT: {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ _SIGILL: {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ _SIGTRAP: {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ _SIGABRT: {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ _SIGBUS: {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ _SIGFPE: {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ _SIGKILL: {0, "SIGKILL: kill"},
+ _SIGUSR1: {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ _SIGSEGV: {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ _SIGUSR2: {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ _SIGPIPE: {_SigNotify, "SIGPIPE: write to broken pipe"},
+ _SIGALRM: {_SigNotify, "SIGALRM: alarm clock"},
+ _SIGTERM: {_SigNotify + _SigKill, "SIGTERM: termination"},
+ _SIGCHLD: {_SigNotify + _SigUnblock, "SIGCHLD: child status has changed"},
+ _SIGCONT: {_SigNotify + _SigDefault, "SIGCONT: continue"},
+ _SIGSTOP: {0, "SIGSTOP: stop"},
+ _SIGTSTP: {_SigNotify + _SigDefault, "SIGTSTP: keyboard stop"},
+ _SIGTTIN: {_SigNotify + _SigDefault, "SIGTTIN: background read from tty"},
+ _SIGTTOU: {_SigNotify + _SigDefault, "SIGTTOU: background write to tty"},
+ _SIGURG: {_SigNotify, "SIGURG: urgent condition on socket"},
+ _SIGXCPU: {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ _SIGXFSZ: {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ _SIGVTALRM: {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ _SIGPROF: {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ _SIGWINCH: {_SigNotify, "SIGWINCH: window size change"},
+ _SIGSYS: {_SigThrow, "SIGSYS: bad system call"},
+ _SIGIO: {_SigNotify, "SIGIO: i/o now possible"},
+ _SIGPWR: {_SigNotify, "SIGPWR: power failure restart"},
+ _SIGEMT: {_SigThrow, "SIGEMT: emulate instruction executed"},
+ _SIGWAITING: {0, "SIGWAITING: reserved signal no longer used by"},
+ 26: {_SigNotify, "signal 26"},
+ 27: {_SigNotify, "signal 27"},
+ 33: {_SigNotify, "signal 33"},
+ 35: {_SigNotify, "signal 35"},
+ 36: {_SigNotify, "signal 36"},
+ 37: {_SigNotify, "signal 37"},
+ 38: {_SigNotify, "signal 38"},
+ 40: {_SigNotify, "signal 40"},
+ 41: {_SigNotify, "signal 41"},
+ 42: {_SigNotify, "signal 42"},
+ 43: {_SigNotify, "signal 43"},
+ 44: {_SigNotify, "signal 44"},
+ 45: {_SigNotify, "signal 45"},
+ 46: {_SigNotify, "signal 46"},
+ 47: {_SigNotify, "signal 47"},
+ 48: {_SigNotify, "signal 48"},
+ 49: {_SigNotify, "signal 49"},
+ 50: {_SigNotify, "signal 50"},
+ 51: {_SigNotify, "signal 51"},
+ 52: {_SigNotify, "signal 52"},
+ 53: {_SigNotify, "signal 53"},
+ 54: {_SigNotify, "signal 54"},
+ 55: {_SigNotify, "signal 55"},
+ 56: {_SigNotify, "signal 56"},
+ 57: {_SigNotify, "signal 57"},
+ 58: {_SigNotify, "signal 58"},
+ 59: {_SigNotify, "signal 59"},
+ 60: {_SigNotify, "signal 60"},
+ 61: {_SigNotify, "signal 61"},
+ 62: {_SigNotify, "signal 62"},
+ 63: {_SigNotify, "signal 63"},
+ 64: {_SigNotify, "signal 64"},
+ 65: {_SigNotify, "signal 65"},
+ 66: {_SigNotify, "signal 66"},
+ 67: {_SigNotify, "signal 67"},
+ 68: {_SigNotify, "signal 68"},
+ 69: {_SigNotify, "signal 69"},
+ 70: {_SigNotify, "signal 70"},
+ 71: {_SigNotify, "signal 71"},
+ 72: {_SigNotify, "signal 72"},
+ 73: {_SigNotify, "signal 73"},
+ 74: {_SigNotify, "signal 74"},
+ 75: {_SigNotify, "signal 75"},
+ 76: {_SigNotify, "signal 76"},
+ 77: {_SigNotify, "signal 77"},
+ 78: {_SigNotify, "signal 78"},
+ 79: {_SigNotify, "signal 79"},
+ 80: {_SigNotify, "signal 80"},
+ 81: {_SigNotify, "signal 81"},
+ 82: {_SigNotify, "signal 82"},
+ 83: {_SigNotify, "signal 83"},
+ 84: {_SigNotify, "signal 84"},
+ 85: {_SigNotify, "signal 85"},
+ 86: {_SigNotify, "signal 86"},
+ 87: {_SigNotify, "signal 87"},
+ 88: {_SigNotify, "signal 88"},
+ 89: {_SigNotify, "signal 89"},
+ 90: {_SigNotify, "signal 90"},
+ 91: {_SigNotify, "signal 91"},
+ 92: {_SigNotify, "signal 92"},
+ 93: {_SigNotify, "signal 93"},
+ 94: {_SigNotify, "signal 94"},
+ 95: {_SigNotify, "signal 95"},
+ 96: {_SigNotify, "signal 96"},
+ 97: {_SigNotify, "signal 97"},
+ 98: {_SigNotify, "signal 98"},
+ 99: {_SigNotify, "signal 99"},
+ 100: {_SigNotify, "signal 100"},
+ 101: {_SigNotify, "signal 101"},
+ 102: {_SigNotify, "signal 102"},
+ 103: {_SigNotify, "signal 103"},
+ 104: {_SigNotify, "signal 104"},
+ 105: {_SigNotify, "signal 105"},
+ 106: {_SigNotify, "signal 106"},
+ 107: {_SigNotify, "signal 107"},
+ 108: {_SigNotify, "signal 108"},
+ 109: {_SigNotify, "signal 109"},
+ 110: {_SigNotify, "signal 110"},
+ 111: {_SigNotify, "signal 111"},
+ 112: {_SigNotify, "signal 112"},
+ 113: {_SigNotify, "signal 113"},
+ 114: {_SigNotify, "signal 114"},
+ 115: {_SigNotify, "signal 115"},
+ 116: {_SigNotify, "signal 116"},
+ 117: {_SigNotify, "signal 117"},
+ 118: {_SigNotify, "signal 118"},
+ 119: {_SigNotify, "signal 119"},
+ 120: {_SigNotify, "signal 120"},
+ 121: {_SigNotify, "signal 121"},
+ 122: {_SigNotify, "signal 122"},
+ 123: {_SigNotify, "signal 123"},
+ 124: {_SigNotify, "signal 124"},
+ 125: {_SigNotify, "signal 125"},
+ 126: {_SigNotify, "signal 126"},
+ 127: {_SigNotify, "signal 127"},
+ 128: {_SigNotify, "signal 128"},
+ 129: {_SigNotify, "signal 129"},
+ 130: {_SigNotify, "signal 130"},
+ 131: {_SigNotify, "signal 131"},
+ 132: {_SigNotify, "signal 132"},
+ 133: {_SigNotify, "signal 133"},
+ 134: {_SigNotify, "signal 134"},
+ 135: {_SigNotify, "signal 135"},
+ 136: {_SigNotify, "signal 136"},
+ 137: {_SigNotify, "signal 137"},
+ 138: {_SigNotify, "signal 138"},
+ 139: {_SigNotify, "signal 139"},
+ 140: {_SigNotify, "signal 140"},
+ 141: {_SigNotify, "signal 141"},
+ 142: {_SigNotify, "signal 142"},
+ 143: {_SigNotify, "signal 143"},
+ 144: {_SigNotify, "signal 144"},
+ 145: {_SigNotify, "signal 145"},
+ 146: {_SigNotify, "signal 146"},
+ 147: {_SigNotify, "signal 147"},
+ 148: {_SigNotify, "signal 148"},
+ 149: {_SigNotify, "signal 149"},
+ 150: {_SigNotify, "signal 150"},
+ 151: {_SigNotify, "signal 151"},
+ 152: {_SigNotify, "signal 152"},
+ 153: {_SigNotify, "signal 153"},
+ 154: {_SigNotify, "signal 154"},
+ 155: {_SigNotify, "signal 155"},
+ 156: {_SigNotify, "signal 156"},
+ 157: {_SigNotify, "signal 157"},
+ 158: {_SigNotify, "signal 158"},
+ 159: {_SigNotify, "signal 159"},
+ 160: {_SigNotify, "signal 160"},
+ 161: {_SigNotify, "signal 161"},
+ 162: {_SigNotify, "signal 162"},
+ 163: {_SigNotify, "signal 163"},
+ 164: {_SigNotify, "signal 164"},
+ 165: {_SigNotify, "signal 165"},
+ 166: {_SigNotify, "signal 166"},
+ 167: {_SigNotify, "signal 167"},
+ 168: {_SigNotify, "signal 168"},
+ 169: {_SigNotify, "signal 169"},
+ 170: {_SigNotify, "signal 170"},
+ 171: {_SigNotify, "signal 171"},
+ 172: {_SigNotify, "signal 172"},
+ 173: {_SigNotify, "signal 173"},
+ 174: {_SigNotify, "signal 174"},
+ 175: {_SigNotify, "signal 175"},
+ 176: {_SigNotify, "signal 176"},
+ 177: {_SigNotify, "signal 177"},
+ 178: {_SigNotify, "signal 178"},
+ 179: {_SigNotify, "signal 179"},
+ 180: {_SigNotify, "signal 180"},
+ 181: {_SigNotify, "signal 181"},
+ 182: {_SigNotify, "signal 182"},
+ 183: {_SigNotify, "signal 183"},
+ 184: {_SigNotify, "signal 184"},
+ 185: {_SigNotify, "signal 185"},
+ 186: {_SigNotify, "signal 186"},
+ 187: {_SigNotify, "signal 187"},
+ 188: {_SigNotify, "signal 188"},
+ 189: {_SigNotify, "signal 189"},
+ 190: {_SigNotify, "signal 190"},
+ 191: {_SigNotify, "signal 191"},
+ 192: {_SigNotify, "signal 192"},
+ 193: {_SigNotify, "signal 193"},
+ 194: {_SigNotify, "signal 194"},
+ 195: {_SigNotify, "signal 195"},
+ 196: {_SigNotify, "signal 196"},
+ 197: {_SigNotify, "signal 197"},
+ 198: {_SigNotify, "signal 198"},
+ 199: {_SigNotify, "signal 199"},
+ 200: {_SigNotify, "signal 200"},
+ 201: {_SigNotify, "signal 201"},
+ 202: {_SigNotify, "signal 202"},
+ 203: {_SigNotify, "signal 203"},
+ 204: {_SigNotify, "signal 204"},
+ 205: {_SigNotify, "signal 205"},
+ 206: {_SigNotify, "signal 206"},
+ 207: {_SigNotify, "signal 207"},
+ 208: {_SigNotify, "signal 208"},
+ 209: {_SigNotify, "signal 209"},
+ 210: {_SigNotify, "signal 210"},
+ 211: {_SigNotify, "signal 211"},
+ 212: {_SigNotify, "signal 212"},
+ 213: {_SigNotify, "signal 213"},
+ 214: {_SigNotify, "signal 214"},
+ 215: {_SigNotify, "signal 215"},
+ 216: {_SigNotify, "signal 216"},
+ 217: {_SigNotify, "signal 217"},
+ 218: {_SigNotify, "signal 218"},
+ 219: {_SigNotify, "signal 219"},
+ 220: {_SigNotify, "signal 220"},
+ 221: {_SigNotify, "signal 221"},
+ 222: {_SigNotify, "signal 222"},
+ 223: {_SigNotify, "signal 223"},
+ 224: {_SigNotify, "signal 224"},
+ 225: {_SigNotify, "signal 225"},
+ 226: {_SigNotify, "signal 226"},
+ 227: {_SigNotify, "signal 227"},
+ 228: {_SigNotify, "signal 228"},
+ 229: {_SigNotify, "signal 229"},
+ 230: {_SigNotify, "signal 230"},
+ 231: {_SigNotify, "signal 231"},
+ 232: {_SigNotify, "signal 232"},
+ 233: {_SigNotify, "signal 233"},
+ 234: {_SigNotify, "signal 234"},
+ 235: {_SigNotify, "signal 235"},
+ 236: {_SigNotify, "signal 236"},
+ 237: {_SigNotify, "signal 237"},
+ 238: {_SigNotify, "signal 238"},
+ 239: {_SigNotify, "signal 239"},
+ 240: {_SigNotify, "signal 240"},
+ 241: {_SigNotify, "signal 241"},
+ 242: {_SigNotify, "signal 242"},
+ 243: {_SigNotify, "signal 243"},
+ 244: {_SigNotify, "signal 244"},
+ 245: {_SigNotify, "signal 245"},
+ 246: {_SigNotify, "signal 246"},
+ 247: {_SigNotify, "signal 247"},
+ 248: {_SigNotify, "signal 248"},
+ 249: {_SigNotify, "signal 249"},
+ 250: {_SigNotify, "signal 250"},
+ 251: {_SigNotify, "signal 251"},
+ 252: {_SigNotify, "signal 252"},
+ 253: {_SigNotify, "signal 253"},
+ 254: {_SigNotify, "signal 254"},
+ 255: {_SigNotify, "signal 255"},
+}
diff --git a/src/runtime/sigtab_linux_generic.go b/src/runtime/sigtab_linux_generic.go
new file mode 100644
index 0000000..fe93bba
--- /dev/null
+++ b/src/runtime/sigtab_linux_generic.go
@@ -0,0 +1,75 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !mips && !mipsle && !mips64 && !mips64le && linux
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigThrow + _SigUnblock, "SIGSTKFLT: stack fault"},
+ /* 17 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 18 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue"},
+ /* 19 */ {0, "SIGSTOP: stop, unblockable"},
+ /* 20 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 21 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 22 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 23 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 24 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 25 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 26 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 27 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 28 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 29 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 30 */ {_SigNotify, "SIGPWR: power failure restart"},
+ /* 31 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 32 */ {_SigSetStack + _SigUnblock, "signal 32"}, /* SIGCANCEL; see issue 6997 */
+ /* 33 */ {_SigSetStack + _SigUnblock, "signal 33"}, /* SIGSETXID; see issues 3871, 9400, 12498 */
+ /* 34 */ {_SigSetStack + _SigUnblock, "signal 34"}, /* musl SIGSYNCCALL; see issue 39343 */
+ /* 35 */ {_SigNotify, "signal 35"},
+ /* 36 */ {_SigNotify, "signal 36"},
+ /* 37 */ {_SigNotify, "signal 37"},
+ /* 38 */ {_SigNotify, "signal 38"},
+ /* 39 */ {_SigNotify, "signal 39"},
+ /* 40 */ {_SigNotify, "signal 40"},
+ /* 41 */ {_SigNotify, "signal 41"},
+ /* 42 */ {_SigNotify, "signal 42"},
+ /* 43 */ {_SigNotify, "signal 43"},
+ /* 44 */ {_SigNotify, "signal 44"},
+ /* 45 */ {_SigNotify, "signal 45"},
+ /* 46 */ {_SigNotify, "signal 46"},
+ /* 47 */ {_SigNotify, "signal 47"},
+ /* 48 */ {_SigNotify, "signal 48"},
+ /* 49 */ {_SigNotify, "signal 49"},
+ /* 50 */ {_SigNotify, "signal 50"},
+ /* 51 */ {_SigNotify, "signal 51"},
+ /* 52 */ {_SigNotify, "signal 52"},
+ /* 53 */ {_SigNotify, "signal 53"},
+ /* 54 */ {_SigNotify, "signal 54"},
+ /* 55 */ {_SigNotify, "signal 55"},
+ /* 56 */ {_SigNotify, "signal 56"},
+ /* 57 */ {_SigNotify, "signal 57"},
+ /* 58 */ {_SigNotify, "signal 58"},
+ /* 59 */ {_SigNotify, "signal 59"},
+ /* 60 */ {_SigNotify, "signal 60"},
+ /* 61 */ {_SigNotify, "signal 61"},
+ /* 62 */ {_SigNotify, "signal 62"},
+ /* 63 */ {_SigNotify, "signal 63"},
+ /* 64 */ {_SigNotify, "signal 64"},
+}
diff --git a/src/runtime/sigtab_linux_mipsx.go b/src/runtime/sigtab_linux_mipsx.go
new file mode 100644
index 0000000..295ced5
--- /dev/null
+++ b/src/runtime/sigtab_linux_mipsx.go
@@ -0,0 +1,139 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (mips || mipsle || mips64 || mips64le) && linux
+
+package runtime
+
+var sigtable = [...]sigTabT{
+ /* 0 */ {0, "SIGNONE: no trap"},
+ /* 1 */ {_SigNotify + _SigKill, "SIGHUP: terminal line hangup"},
+ /* 2 */ {_SigNotify + _SigKill, "SIGINT: interrupt"},
+ /* 3 */ {_SigNotify + _SigThrow, "SIGQUIT: quit"},
+ /* 4 */ {_SigThrow + _SigUnblock, "SIGILL: illegal instruction"},
+ /* 5 */ {_SigThrow + _SigUnblock, "SIGTRAP: trace trap"},
+ /* 6 */ {_SigNotify + _SigThrow, "SIGABRT: abort"},
+ /* 7 */ {_SigThrow, "SIGEMT"},
+ /* 8 */ {_SigPanic + _SigUnblock, "SIGFPE: floating-point exception"},
+ /* 9 */ {0, "SIGKILL: kill"},
+ /* 10 */ {_SigPanic + _SigUnblock, "SIGBUS: bus error"},
+ /* 11 */ {_SigPanic + _SigUnblock, "SIGSEGV: segmentation violation"},
+ /* 12 */ {_SigThrow, "SIGSYS: bad system call"},
+ /* 13 */ {_SigNotify, "SIGPIPE: write to broken pipe"},
+ /* 14 */ {_SigNotify, "SIGALRM: alarm clock"},
+ /* 15 */ {_SigNotify + _SigKill, "SIGTERM: termination"},
+ /* 16 */ {_SigNotify, "SIGUSR1: user-defined signal 1"},
+ /* 17 */ {_SigNotify, "SIGUSR2: user-defined signal 2"},
+ /* 18 */ {_SigNotify + _SigUnblock + _SigIgn, "SIGCHLD: child status has changed"},
+ /* 19 */ {_SigNotify, "SIGPWR: power failure restart"},
+ /* 20 */ {_SigNotify + _SigIgn, "SIGWINCH: window size change"},
+ /* 21 */ {_SigNotify + _SigIgn, "SIGURG: urgent condition on socket"},
+ /* 22 */ {_SigNotify, "SIGIO: i/o now possible"},
+ /* 23 */ {0, "SIGSTOP: stop, unblockable"},
+ /* 24 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTSTP: keyboard stop"},
+ /* 25 */ {_SigNotify + _SigDefault + _SigIgn, "SIGCONT: continue"},
+ /* 26 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTIN: background read from tty"},
+ /* 27 */ {_SigNotify + _SigDefault + _SigIgn, "SIGTTOU: background write to tty"},
+ /* 28 */ {_SigNotify, "SIGVTALRM: virtual alarm clock"},
+ /* 29 */ {_SigNotify + _SigUnblock, "SIGPROF: profiling alarm clock"},
+ /* 30 */ {_SigNotify, "SIGXCPU: cpu limit exceeded"},
+ /* 31 */ {_SigNotify, "SIGXFSZ: file size limit exceeded"},
+ /* 32 */ {_SigSetStack + _SigUnblock, "signal 32"}, /* SIGCANCEL; see issue 6997 */
+ /* 33 */ {_SigSetStack + _SigUnblock, "signal 33"}, /* SIGSETXID; see issues 3871, 9400, 12498 */
+ /* 34 */ {_SigSetStack + _SigUnblock, "signal 34"}, /* musl SIGSYNCCALL; see issue 39343 */
+ /* 35 */ {_SigNotify, "signal 35"},
+ /* 36 */ {_SigNotify, "signal 36"},
+ /* 37 */ {_SigNotify, "signal 37"},
+ /* 38 */ {_SigNotify, "signal 38"},
+ /* 39 */ {_SigNotify, "signal 39"},
+ /* 40 */ {_SigNotify, "signal 40"},
+ /* 41 */ {_SigNotify, "signal 41"},
+ /* 42 */ {_SigNotify, "signal 42"},
+ /* 43 */ {_SigNotify, "signal 43"},
+ /* 44 */ {_SigNotify, "signal 44"},
+ /* 45 */ {_SigNotify, "signal 45"},
+ /* 46 */ {_SigNotify, "signal 46"},
+ /* 47 */ {_SigNotify, "signal 47"},
+ /* 48 */ {_SigNotify, "signal 48"},
+ /* 49 */ {_SigNotify, "signal 49"},
+ /* 50 */ {_SigNotify, "signal 50"},
+ /* 51 */ {_SigNotify, "signal 51"},
+ /* 52 */ {_SigNotify, "signal 52"},
+ /* 53 */ {_SigNotify, "signal 53"},
+ /* 54 */ {_SigNotify, "signal 54"},
+ /* 55 */ {_SigNotify, "signal 55"},
+ /* 56 */ {_SigNotify, "signal 56"},
+ /* 57 */ {_SigNotify, "signal 57"},
+ /* 58 */ {_SigNotify, "signal 58"},
+ /* 59 */ {_SigNotify, "signal 59"},
+ /* 60 */ {_SigNotify, "signal 60"},
+ /* 61 */ {_SigNotify, "signal 61"},
+ /* 62 */ {_SigNotify, "signal 62"},
+ /* 63 */ {_SigNotify, "signal 63"},
+ /* 64 */ {_SigNotify, "signal 64"},
+ /* 65 */ {_SigNotify, "signal 65"},
+ /* 66 */ {_SigNotify, "signal 66"},
+ /* 67 */ {_SigNotify, "signal 67"},
+ /* 68 */ {_SigNotify, "signal 68"},
+ /* 69 */ {_SigNotify, "signal 69"},
+ /* 70 */ {_SigNotify, "signal 70"},
+ /* 71 */ {_SigNotify, "signal 71"},
+ /* 72 */ {_SigNotify, "signal 72"},
+ /* 73 */ {_SigNotify, "signal 73"},
+ /* 74 */ {_SigNotify, "signal 74"},
+ /* 75 */ {_SigNotify, "signal 75"},
+ /* 76 */ {_SigNotify, "signal 76"},
+ /* 77 */ {_SigNotify, "signal 77"},
+ /* 78 */ {_SigNotify, "signal 78"},
+ /* 79 */ {_SigNotify, "signal 79"},
+ /* 80 */ {_SigNotify, "signal 80"},
+ /* 81 */ {_SigNotify, "signal 81"},
+ /* 82 */ {_SigNotify, "signal 82"},
+ /* 83 */ {_SigNotify, "signal 83"},
+ /* 84 */ {_SigNotify, "signal 84"},
+ /* 85 */ {_SigNotify, "signal 85"},
+ /* 86 */ {_SigNotify, "signal 86"},
+ /* 87 */ {_SigNotify, "signal 87"},
+ /* 88 */ {_SigNotify, "signal 88"},
+ /* 89 */ {_SigNotify, "signal 89"},
+ /* 90 */ {_SigNotify, "signal 90"},
+ /* 91 */ {_SigNotify, "signal 91"},
+ /* 92 */ {_SigNotify, "signal 92"},
+ /* 93 */ {_SigNotify, "signal 93"},
+ /* 94 */ {_SigNotify, "signal 94"},
+ /* 95 */ {_SigNotify, "signal 95"},
+ /* 96 */ {_SigNotify, "signal 96"},
+ /* 97 */ {_SigNotify, "signal 97"},
+ /* 98 */ {_SigNotify, "signal 98"},
+ /* 99 */ {_SigNotify, "signal 99"},
+ /* 100 */ {_SigNotify, "signal 100"},
+ /* 101 */ {_SigNotify, "signal 101"},
+ /* 102 */ {_SigNotify, "signal 102"},
+ /* 103 */ {_SigNotify, "signal 103"},
+ /* 104 */ {_SigNotify, "signal 104"},
+ /* 105 */ {_SigNotify, "signal 105"},
+ /* 106 */ {_SigNotify, "signal 106"},
+ /* 107 */ {_SigNotify, "signal 107"},
+ /* 108 */ {_SigNotify, "signal 108"},
+ /* 109 */ {_SigNotify, "signal 109"},
+ /* 110 */ {_SigNotify, "signal 110"},
+ /* 111 */ {_SigNotify, "signal 111"},
+ /* 112 */ {_SigNotify, "signal 112"},
+ /* 113 */ {_SigNotify, "signal 113"},
+ /* 114 */ {_SigNotify, "signal 114"},
+ /* 115 */ {_SigNotify, "signal 115"},
+ /* 116 */ {_SigNotify, "signal 116"},
+ /* 117 */ {_SigNotify, "signal 117"},
+ /* 118 */ {_SigNotify, "signal 118"},
+ /* 119 */ {_SigNotify, "signal 119"},
+ /* 120 */ {_SigNotify, "signal 120"},
+ /* 121 */ {_SigNotify, "signal 121"},
+ /* 122 */ {_SigNotify, "signal 122"},
+ /* 123 */ {_SigNotify, "signal 123"},
+ /* 124 */ {_SigNotify, "signal 124"},
+ /* 125 */ {_SigNotify, "signal 125"},
+ /* 126 */ {_SigNotify, "signal 126"},
+ /* 127 */ {_SigNotify, "signal 127"},
+ /* 128 */ {_SigNotify, "signal 128"},
+}
diff --git a/src/runtime/sizeclasses.go b/src/runtime/sizeclasses.go
new file mode 100644
index 0000000..9314623
--- /dev/null
+++ b/src/runtime/sizeclasses.go
@@ -0,0 +1,98 @@
+// Code generated by mksizeclasses.go; DO NOT EDIT.
+//go:generate go run mksizeclasses.go
+
+package runtime
+
+// class bytes/obj bytes/span objects tail waste max waste min align
+// 1 8 8192 1024 0 87.50% 8
+// 2 16 8192 512 0 43.75% 16
+// 3 24 8192 341 8 29.24% 8
+// 4 32 8192 256 0 21.88% 32
+// 5 48 8192 170 32 31.52% 16
+// 6 64 8192 128 0 23.44% 64
+// 7 80 8192 102 32 19.07% 16
+// 8 96 8192 85 32 15.95% 32
+// 9 112 8192 73 16 13.56% 16
+// 10 128 8192 64 0 11.72% 128
+// 11 144 8192 56 128 11.82% 16
+// 12 160 8192 51 32 9.73% 32
+// 13 176 8192 46 96 9.59% 16
+// 14 192 8192 42 128 9.25% 64
+// 15 208 8192 39 80 8.12% 16
+// 16 224 8192 36 128 8.15% 32
+// 17 240 8192 34 32 6.62% 16
+// 18 256 8192 32 0 5.86% 256
+// 19 288 8192 28 128 12.16% 32
+// 20 320 8192 25 192 11.80% 64
+// 21 352 8192 23 96 9.88% 32
+// 22 384 8192 21 128 9.51% 128
+// 23 416 8192 19 288 10.71% 32
+// 24 448 8192 18 128 8.37% 64
+// 25 480 8192 17 32 6.82% 32
+// 26 512 8192 16 0 6.05% 512
+// 27 576 8192 14 128 12.33% 64
+// 28 640 8192 12 512 15.48% 128
+// 29 704 8192 11 448 13.93% 64
+// 30 768 8192 10 512 13.94% 256
+// 31 896 8192 9 128 15.52% 128
+// 32 1024 8192 8 0 12.40% 1024
+// 33 1152 8192 7 128 12.41% 128
+// 34 1280 8192 6 512 15.55% 256
+// 35 1408 16384 11 896 14.00% 128
+// 36 1536 8192 5 512 14.00% 512
+// 37 1792 16384 9 256 15.57% 256
+// 38 2048 8192 4 0 12.45% 2048
+// 39 2304 16384 7 256 12.46% 256
+// 40 2688 8192 3 128 15.59% 128
+// 41 3072 24576 8 0 12.47% 1024
+// 42 3200 16384 5 384 6.22% 128
+// 43 3456 24576 7 384 8.83% 128
+// 44 4096 8192 2 0 15.60% 4096
+// 45 4864 24576 5 256 16.65% 256
+// 46 5376 16384 3 256 10.92% 256
+// 47 6144 24576 4 0 12.48% 2048
+// 48 6528 32768 5 128 6.23% 128
+// 49 6784 40960 6 256 4.36% 128
+// 50 6912 49152 7 768 3.37% 256
+// 51 8192 8192 1 0 15.61% 8192
+// 52 9472 57344 6 512 14.28% 256
+// 53 9728 49152 5 512 3.64% 512
+// 54 10240 40960 4 0 4.99% 2048
+// 55 10880 32768 3 128 6.24% 128
+// 56 12288 24576 2 0 11.45% 4096
+// 57 13568 40960 3 256 9.99% 256
+// 58 14336 57344 4 0 5.35% 2048
+// 59 16384 16384 1 0 12.49% 8192
+// 60 18432 73728 4 0 11.11% 2048
+// 61 19072 57344 3 128 3.57% 128
+// 62 20480 40960 2 0 6.87% 4096
+// 63 21760 65536 3 256 6.25% 256
+// 64 24576 24576 1 0 11.45% 8192
+// 65 27264 81920 3 128 10.00% 128
+// 66 28672 57344 2 0 4.91% 4096
+// 67 32768 32768 1 0 12.50% 8192
+
+// alignment bits min obj size
+// 8 3 8
+// 16 4 32
+// 32 5 256
+// 64 6 512
+// 128 7 768
+// 4096 12 28672
+// 8192 13 32768
+
+const (
+ _MaxSmallSize = 32768
+ smallSizeDiv = 8
+ smallSizeMax = 1024
+ largeSizeDiv = 128
+ _NumSizeClasses = 68
+ _PageShift = 13
+ maxObjsPerSpan = 1024
+)
+
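+// Illustrative lookup (not part of the generated tables): a 100-byte
+// allocation maps to size_to_class8[(100+smallSizeDiv-1)/smallSizeDiv] =
+// size_to_class8[13] = 9, and class_to_size[9] = 112, so it is rounded up
+// to a 112-byte object in size class 9.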
+var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 24, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
+var class_to_allocnpages = [_NumSizeClasses]uint8{0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 3, 2, 3, 4, 5, 6, 1, 7, 6, 5, 4, 3, 5, 7, 2, 9, 7, 5, 8, 3, 10, 7, 4}
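+// class_to_divmagic[c] is ^uint32(0)/class_to_size[c] + 1, i.e. ceil(2^32/size).
+// The heap uses it to compute an object's index within a span with a
+// multiply-and-shift instead of a division by the object size.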
+var class_to_divmagic = [_NumSizeClasses]uint32{0, ^uint32(0)/8 + 1, ^uint32(0)/16 + 1, ^uint32(0)/24 + 1, ^uint32(0)/32 + 1, ^uint32(0)/48 + 1, ^uint32(0)/64 + 1, ^uint32(0)/80 + 1, ^uint32(0)/96 + 1, ^uint32(0)/112 + 1, ^uint32(0)/128 + 1, ^uint32(0)/144 + 1, ^uint32(0)/160 + 1, ^uint32(0)/176 + 1, ^uint32(0)/192 + 1, ^uint32(0)/208 + 1, ^uint32(0)/224 + 1, ^uint32(0)/240 + 1, ^uint32(0)/256 + 1, ^uint32(0)/288 + 1, ^uint32(0)/320 + 1, ^uint32(0)/352 + 1, ^uint32(0)/384 + 1, ^uint32(0)/416 + 1, ^uint32(0)/448 + 1, ^uint32(0)/480 + 1, ^uint32(0)/512 + 1, ^uint32(0)/576 + 1, ^uint32(0)/640 + 1, ^uint32(0)/704 + 1, ^uint32(0)/768 + 1, ^uint32(0)/896 + 1, ^uint32(0)/1024 + 1, ^uint32(0)/1152 + 1, ^uint32(0)/1280 + 1, ^uint32(0)/1408 + 1, ^uint32(0)/1536 + 1, ^uint32(0)/1792 + 1, ^uint32(0)/2048 + 1, ^uint32(0)/2304 + 1, ^uint32(0)/2688 + 1, ^uint32(0)/3072 + 1, ^uint32(0)/3200 + 1, ^uint32(0)/3456 + 1, ^uint32(0)/4096 + 1, ^uint32(0)/4864 + 1, ^uint32(0)/5376 + 1, ^uint32(0)/6144 + 1, ^uint32(0)/6528 + 1, ^uint32(0)/6784 + 1, ^uint32(0)/6912 + 1, ^uint32(0)/8192 + 1, ^uint32(0)/9472 + 1, ^uint32(0)/9728 + 1, ^uint32(0)/10240 + 1, ^uint32(0)/10880 + 1, ^uint32(0)/12288 + 1, ^uint32(0)/13568 + 1, ^uint32(0)/14336 + 1, ^uint32(0)/16384 + 1, ^uint32(0)/18432 + 1, ^uint32(0)/19072 + 1, ^uint32(0)/20480 + 1, ^uint32(0)/21760 + 1, ^uint32(0)/24576 + 1, ^uint32(0)/27264 + 1, ^uint32(0)/28672 + 1, ^uint32(0)/32768 + 1}
+var size_to_class8 = [smallSizeMax/smallSizeDiv + 1]uint8{0, 1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32}
+var size_to_class128 = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv + 1]uint8{32, 33, 34, 35, 36, 37, 37, 38, 38, 39, 39, 40, 40, 40, 41, 41, 41, 42, 43, 43, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 49, 49, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 54, 54, 54, 54, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67}
diff --git a/src/runtime/sizeof_test.go b/src/runtime/sizeof_test.go
new file mode 100644
index 0000000..fb91954
--- /dev/null
+++ b/src/runtime/sizeof_test.go
@@ -0,0 +1,38 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "reflect"
+ "runtime"
+ "testing"
+ "unsafe"
+)
+
+// Assert that the sizes of important structures do not change unexpectedly.
+
+func TestSizeof(t *testing.T) {
+ const _64bit = unsafe.Sizeof(uintptr(0)) == 8
+
+ var tests = []struct {
+ val any // type as a value
+ _32bit uintptr // size on 32bit platforms
+ _64bit uintptr // size on 64bit platforms
+ }{
+ {runtime.G{}, 252, 408}, // g, but exported for testing
+ {runtime.Sudog{}, 56, 88}, // sudog, but exported for testing
+ }
+
+ for _, tt := range tests {
+ want := tt._32bit
+ if _64bit {
+ want = tt._64bit
+ }
+ got := reflect.TypeOf(tt.val).Size()
+ if want != got {
+ t.Errorf("unsafe.Sizeof(%T) = %d, want %d", tt.val, got, want)
+ }
+ }
+}
diff --git a/src/runtime/slice.go b/src/runtime/slice.go
new file mode 100644
index 0000000..228697a
--- /dev/null
+++ b/src/runtime/slice.go
@@ -0,0 +1,355 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/math"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
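+// slice is the runtime's representation of a Go slice value.
+// Its layout (data pointer, length, capacity) must stay in sync with the
+// layout the compiler assumes for slice values.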
+type slice struct {
+ array unsafe.Pointer
+ len int
+ cap int
+}
+
+// A notInHeapSlice is a slice backed by runtime/internal/sys.NotInHeap memory.
+type notInHeapSlice struct {
+ array *notInHeap
+ len int
+ cap int
+}
+
+func panicmakeslicelen() {
+ panic(errorString("makeslice: len out of range"))
+}
+
+func panicmakeslicecap() {
+ panic(errorString("makeslice: cap out of range"))
+}
+
+// makeslicecopy allocates a slice of "tolen" elements of type "et",
+// then copies "fromlen" elements of type "et" into that new allocation from "from".
+func makeslicecopy(et *_type, tolen int, fromlen int, from unsafe.Pointer) unsafe.Pointer {
+ var tomem, copymem uintptr
+ if uintptr(tolen) > uintptr(fromlen) {
+ var overflow bool
+ tomem, overflow = math.MulUintptr(et.Size_, uintptr(tolen))
+ if overflow || tomem > maxAlloc || tolen < 0 {
+ panicmakeslicelen()
+ }
+ copymem = et.Size_ * uintptr(fromlen)
+ } else {
+		// fromlen is a known good length that is equal to or greater than tolen,
+		// so tolen is also a valid slice length, since the from and to slices
+		// have the same element width.
+ tomem = et.Size_ * uintptr(tolen)
+ copymem = tomem
+ }
+
+ var to unsafe.Pointer
+ if et.PtrBytes == 0 {
+ to = mallocgc(tomem, nil, false)
+ if copymem < tomem {
+ memclrNoHeapPointers(add(to, copymem), tomem-copymem)
+ }
+ } else {
+ // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
+ to = mallocgc(tomem, et, true)
+ if copymem > 0 && writeBarrier.enabled {
+			// Only shade the pointers in the source (from), since we know the
+			// destination only contains nil pointers: it was cleared during allocation.
+ bulkBarrierPreWriteSrcOnly(uintptr(to), uintptr(from), copymem)
+ }
+ }
+
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(makeslicecopy)
+ racereadrangepc(from, copymem, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(from, copymem)
+ }
+ if asanenabled {
+ asanread(from, copymem)
+ }
+
+ memmove(to, from, copymem)
+
+ return to
+}
+
+func makeslice(et *_type, len, cap int) unsafe.Pointer {
+ mem, overflow := math.MulUintptr(et.Size_, uintptr(cap))
+ if overflow || mem > maxAlloc || len < 0 || len > cap {
+ // NOTE: Produce a 'len out of range' error instead of a
+ // 'cap out of range' error when someone does make([]T, bignumber).
+ // 'cap out of range' is true too, but since the cap is only being
+ // supplied implicitly, saying len is clearer.
+ // See golang.org/issue/4085.
+ mem, overflow := math.MulUintptr(et.Size_, uintptr(len))
+ if overflow || mem > maxAlloc || len < 0 {
+ panicmakeslicelen()
+ }
+ panicmakeslicecap()
+ }
+
+ return mallocgc(mem, et, true)
+}
+
+func makeslice64(et *_type, len64, cap64 int64) unsafe.Pointer {
+ len := int(len64)
+ if int64(len) != len64 {
+ panicmakeslicelen()
+ }
+
+ cap := int(cap64)
+ if int64(cap) != cap64 {
+ panicmakeslicecap()
+ }
+
+ return makeslice(et, len, cap)
+}
+
+// This is a wrapper over runtime/internal/math.MulUintptr,
+// so the compiler can recognize and treat it as an intrinsic.
+func mulUintptr(a, b uintptr) (uintptr, bool) {
+ return math.MulUintptr(a, b)
+}
+
+// growslice allocates new backing store for a slice.
+//
+// arguments:
+//
+// oldPtr = pointer to the slice's backing array
+// newLen = new length (= oldLen + num)
+// oldCap = original slice's capacity.
+// num = number of elements being added
+// et = element type
+//
+// return values:
+//
+// newPtr = pointer to the new backing store
+// newLen = same value as the argument
+// newCap = capacity of the new backing store
+//
+// Requires that uint(newLen) > uint(oldCap).
+// Assumes the original slice length is newLen - num
+//
+// A new backing store is allocated with space for at least newLen elements.
+// Existing entries [0, oldLen) are copied over to the new backing store.
+// Added entries [oldLen, newLen) are not initialized by growslice
+// (although for pointer-containing element types, they are zeroed). They
+// must be initialized by the caller.
+// Trailing entries [newLen, newCap) are zeroed.
+//
+// growslice's odd calling convention makes the generated code that calls
+// this function simpler. In particular, it accepts and returns the
+// new length so that the old length is not live (does not need to be
+// spilled/restored) and the new length is returned (also does not need
+// to be spilled/restored).
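+//
+// As a rough illustration of this convention (not part of the contract):
+// appending two elements to a slice with len == cap == 3 ends up in a call
+// shaped like growslice(oldPtr, 5, 3, 2, et), and the result describes a
+// new backing store with len 5 and cap >= 5.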
+func growslice(oldPtr unsafe.Pointer, newLen, oldCap, num int, et *_type) slice {
+ oldLen := newLen - num
+ if raceenabled {
+ callerpc := getcallerpc()
+ racereadrangepc(oldPtr, uintptr(oldLen*int(et.Size_)), callerpc, abi.FuncPCABIInternal(growslice))
+ }
+ if msanenabled {
+ msanread(oldPtr, uintptr(oldLen*int(et.Size_)))
+ }
+ if asanenabled {
+ asanread(oldPtr, uintptr(oldLen*int(et.Size_)))
+ }
+
+ if newLen < 0 {
+ panic(errorString("growslice: len out of range"))
+ }
+
+ if et.Size_ == 0 {
+ // append should not create a slice with nil pointer but non-zero len.
+ // We assume that append doesn't need to preserve oldPtr in this case.
+ return slice{unsafe.Pointer(&zerobase), newLen, newLen}
+ }
+
+ newcap := oldCap
+ doublecap := newcap + newcap
+ if newLen > doublecap {
+ newcap = newLen
+ } else {
+ const threshold = 256
+ if oldCap < threshold {
+ newcap = doublecap
+ } else {
+ // Check 0 < newcap to detect overflow
+ // and prevent an infinite loop.
+ for 0 < newcap && newcap < newLen {
+ // Transition from growing 2x for small slices
+ // to growing 1.25x for large slices. This formula
+ // gives a smooth-ish transition between the two.
+ newcap += (newcap + 3*threshold) / 4
+ }
+ // Set newcap to the requested cap when
+ // the newcap calculation overflowed.
+ if newcap <= 0 {
+ newcap = newLen
+ }
+ }
+ }
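+	// Illustrative numbers for the formula above (threshold = 256): one step
+	// from newcap = 256 gives 256 + (256+768)/4 = 512 (2x); from 1024 it
+	// gives 1024 + (1024+768)/4 = 1472 (~1.44x); for very large capacities
+	// the growth factor approaches 1.25x.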
+
+ var overflow bool
+ var lenmem, newlenmem, capmem uintptr
+	// Specialize for common values of et.Size_.
+	// For size 1 we don't need any division/multiplication.
+	// For goarch.PtrSize, the compiler will optimize division/multiplication
+	// into a shift by a constant.
+	// For other powers of 2, use a variable shift.
+ switch {
+ case et.Size_ == 1:
+ lenmem = uintptr(oldLen)
+ newlenmem = uintptr(newLen)
+ capmem = roundupsize(uintptr(newcap))
+ overflow = uintptr(newcap) > maxAlloc
+ newcap = int(capmem)
+ case et.Size_ == goarch.PtrSize:
+ lenmem = uintptr(oldLen) * goarch.PtrSize
+ newlenmem = uintptr(newLen) * goarch.PtrSize
+ capmem = roundupsize(uintptr(newcap) * goarch.PtrSize)
+ overflow = uintptr(newcap) > maxAlloc/goarch.PtrSize
+ newcap = int(capmem / goarch.PtrSize)
+ case isPowerOfTwo(et.Size_):
+ var shift uintptr
+ if goarch.PtrSize == 8 {
+ // Mask shift for better code generation.
+ shift = uintptr(sys.TrailingZeros64(uint64(et.Size_))) & 63
+ } else {
+ shift = uintptr(sys.TrailingZeros32(uint32(et.Size_))) & 31
+ }
+ lenmem = uintptr(oldLen) << shift
+ newlenmem = uintptr(newLen) << shift
+ capmem = roundupsize(uintptr(newcap) << shift)
+ overflow = uintptr(newcap) > (maxAlloc >> shift)
+ newcap = int(capmem >> shift)
+ capmem = uintptr(newcap) << shift
+ default:
+ lenmem = uintptr(oldLen) * et.Size_
+ newlenmem = uintptr(newLen) * et.Size_
+ capmem, overflow = math.MulUintptr(et.Size_, uintptr(newcap))
+ capmem = roundupsize(capmem)
+ newcap = int(capmem / et.Size_)
+ capmem = uintptr(newcap) * et.Size_
+ }
+
+ // The check of overflow in addition to capmem > maxAlloc is needed
+ // to prevent an overflow which can be used to trigger a segfault
+ // on 32bit architectures with this example program:
+ //
+ // type T [1<<27 + 1]int64
+ //
+ // var d T
+ // var s []T
+ //
+ // func main() {
+ // s = append(s, d, d, d, d)
+ // print(len(s), "\n")
+ // }
+ if overflow || capmem > maxAlloc {
+ panic(errorString("growslice: len out of range"))
+ }
+
+ var p unsafe.Pointer
+ if et.PtrBytes == 0 {
+ p = mallocgc(capmem, nil, false)
+ // The append() that calls growslice is going to overwrite from oldLen to newLen.
+ // Only clear the part that will not be overwritten.
+ // The reflect_growslice() that calls growslice will manually clear
+ // the region not cleared here.
+ memclrNoHeapPointers(add(p, newlenmem), capmem-newlenmem)
+ } else {
+ // Note: can't use rawmem (which avoids zeroing of memory), because then GC can scan uninitialized memory.
+ p = mallocgc(capmem, et, true)
+ if lenmem > 0 && writeBarrier.enabled {
+ // Only shade the pointers in oldPtr since we know the destination slice p
+ // only contains nil pointers because it has been cleared during alloc.
+ bulkBarrierPreWriteSrcOnly(uintptr(p), uintptr(oldPtr), lenmem-et.Size_+et.PtrBytes)
+ }
+ }
+ memmove(p, oldPtr, lenmem)
+
+ return slice{p, newLen, newcap}
+}
+
+//go:linkname reflect_growslice reflect.growslice
+func reflect_growslice(et *_type, old slice, num int) slice {
+ // Semantically equivalent to slices.Grow, except that the caller
+ // is responsible for ensuring that old.len+num > old.cap.
+ num -= old.cap - old.len // preserve memory of old[old.len:old.cap]
+ new := growslice(old.array, old.cap+num, old.cap, num, et)
+ // growslice does not zero out new[old.cap:new.len] since it assumes that
+ // the memory will be overwritten by an append() that called growslice.
+ // Since the caller of reflect_growslice is not append(),
+ // zero out this region before returning the slice to the reflect package.
+ if et.PtrBytes == 0 {
+ oldcapmem := uintptr(old.cap) * et.Size_
+ newlenmem := uintptr(new.len) * et.Size_
+ memclrNoHeapPointers(add(new.array, oldcapmem), newlenmem-oldcapmem)
+ }
+ new.len = old.len // preserve the old length
+ return new
+}
+
+func isPowerOfTwo(x uintptr) bool {
+ return x&(x-1) == 0
+}
+
+// slicecopy is used to copy from a string or slice of pointerless elements into a slice.
+func slicecopy(toPtr unsafe.Pointer, toLen int, fromPtr unsafe.Pointer, fromLen int, width uintptr) int {
+ if fromLen == 0 || toLen == 0 {
+ return 0
+ }
+
+ n := fromLen
+ if toLen < n {
+ n = toLen
+ }
+
+ if width == 0 {
+ return n
+ }
+
+ size := uintptr(n) * width
+ if raceenabled {
+ callerpc := getcallerpc()
+ pc := abi.FuncPCABIInternal(slicecopy)
+ racereadrangepc(fromPtr, size, callerpc, pc)
+ racewriterangepc(toPtr, size, callerpc, pc)
+ }
+ if msanenabled {
+ msanread(fromPtr, size)
+ msanwrite(toPtr, size)
+ }
+ if asanenabled {
+ asanread(fromPtr, size)
+ asanwrite(toPtr, size)
+ }
+
+ if size == 1 { // common case worth about 2x to do here
+ // TODO: is this still worth it with new memmove impl?
+ *(*byte)(toPtr) = *(*byte)(fromPtr) // known to be a byte pointer
+ } else {
+ memmove(toPtr, fromPtr, size)
+ }
+ return n
+}
+
+//go:linkname bytealg_MakeNoZero internal/bytealg.MakeNoZero
+func bytealg_MakeNoZero(len int) []byte {
+ if uintptr(len) > maxAlloc {
+ panicmakeslicelen()
+ }
+ return unsafe.Slice((*byte)(mallocgc(uintptr(len), nil, false)), len)
+}
diff --git a/src/runtime/slice_test.go b/src/runtime/slice_test.go
new file mode 100644
index 0000000..cd2bc26
--- /dev/null
+++ b/src/runtime/slice_test.go
@@ -0,0 +1,501 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "testing"
+)
+
+const N = 20
+
+func BenchmarkMakeSliceCopy(b *testing.B) {
+ const length = 32
+ var bytes = make([]byte, 8*length)
+ var ints = make([]int, length)
+ var ptrs = make([]*byte, length)
+ b.Run("mallocmove", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, len(bytes))
+ copy(x, bytes)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, len(ints))
+ copy(x, ints)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, len(ptrs))
+ copy(x, ptrs)
+ }
+
+ })
+ })
+ b.Run("makecopy", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, 8*length)
+ copy(x, bytes)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, length)
+ copy(x, ints)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, length)
+ copy(x, ptrs)
+ }
+
+ })
+ })
+ b.Run("nilappend", func(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = append([]byte(nil), bytes...)
+ _ = x
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = append([]int(nil), ints...)
+ _ = x
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = append([]*byte(nil), ptrs...)
+ _ = x
+ }
+ })
+ })
+}
+
+type (
+ struct24 struct{ a, b, c int64 }
+ struct32 struct{ a, b, c, d int64 }
+ struct40 struct{ a, b, c, d, e int64 }
+)
+
+func BenchmarkMakeSlice(b *testing.B) {
+ const length = 2
+ b.Run("Byte", func(b *testing.B) {
+ var x []byte
+ for i := 0; i < b.N; i++ {
+ x = make([]byte, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Int16", func(b *testing.B) {
+ var x []int16
+ for i := 0; i < b.N; i++ {
+ x = make([]int16, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ var x []int
+ for i := 0; i < b.N; i++ {
+ x = make([]int, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ var x []*byte
+ for i := 0; i < b.N; i++ {
+ x = make([]*byte, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("24", func(b *testing.B) {
+ var x []struct24
+ for i := 0; i < b.N; i++ {
+ x = make([]struct24, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ var x []struct32
+ for i := 0; i < b.N; i++ {
+ x = make([]struct32, length, 2*length)
+ _ = x
+ }
+ })
+ b.Run("40", func(b *testing.B) {
+ var x []struct40
+ for i := 0; i < b.N; i++ {
+ x = make([]struct40, length, 2*length)
+ _ = x
+ }
+ })
+
+ })
+}
+
+func BenchmarkGrowSlice(b *testing.B) {
+ b.Run("Byte", func(b *testing.B) {
+ x := make([]byte, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]byte(nil), x...)
+ }
+ })
+ b.Run("Int16", func(b *testing.B) {
+ x := make([]int16, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]int16(nil), x...)
+ }
+ })
+ b.Run("Int", func(b *testing.B) {
+ x := make([]int, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]int(nil), x...)
+ }
+ })
+ b.Run("Ptr", func(b *testing.B) {
+ x := make([]*byte, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]*byte(nil), x...)
+ }
+ })
+ b.Run("Struct", func(b *testing.B) {
+ b.Run("24", func(b *testing.B) {
+ x := make([]struct24, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct24(nil), x...)
+ }
+ })
+ b.Run("32", func(b *testing.B) {
+ x := make([]struct32, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct32(nil), x...)
+ }
+ })
+ b.Run("40", func(b *testing.B) {
+ x := make([]struct40, 9)
+ for i := 0; i < b.N; i++ {
+ _ = append([]struct40(nil), x...)
+ }
+ })
+
+ })
+}
+
+var (
+ SinkIntSlice []int
+ SinkIntPointerSlice []*int
+)
+
+func BenchmarkExtendSlice(b *testing.B) {
+ var length = 4 // Use a variable to prevent stack allocation of slices.
+ b.Run("IntSlice", func(b *testing.B) {
+ s := make([]int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length/2], make([]int, length)...)
+ }
+ SinkIntSlice = s
+ })
+ b.Run("PointerSlice", func(b *testing.B) {
+ s := make([]*int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length/2], make([]*int, length)...)
+ }
+ SinkIntPointerSlice = s
+ })
+ b.Run("NoGrow", func(b *testing.B) {
+ s := make([]int, 0, length)
+ for i := 0; i < b.N; i++ {
+ s = append(s[:0:length], make([]int, length)...)
+ }
+ SinkIntSlice = s
+ })
+}
+
+func BenchmarkAppend(b *testing.B) {
+ b.StopTimer()
+ x := make([]int, 0, N)
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ for j := 0; j < N; j++ {
+ x = append(x, j)
+ }
+ }
+}
+
+func BenchmarkAppendGrowByte(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ var x []byte
+ for j := 0; j < 1<<20; j++ {
+ x = append(x, byte(j))
+ }
+ }
+}
+
+func BenchmarkAppendGrowString(b *testing.B) {
+ var s string
+ for i := 0; i < b.N; i++ {
+ var x []string
+ for j := 0; j < 1<<20; j++ {
+ x = append(x, s)
+ }
+ }
+}
+
+func BenchmarkAppendSlice(b *testing.B) {
+ for _, length := range []int{1, 4, 7, 8, 15, 16, 32} {
+ b.Run(fmt.Sprint(length, "Bytes"), func(b *testing.B) {
+ x := make([]byte, 0, N)
+ y := make([]byte, length)
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ x = append(x, y...)
+ }
+ })
+ }
+}
+
+var (
+ blackhole []byte
+)
+
+func BenchmarkAppendSliceLarge(b *testing.B) {
+ for _, length := range []int{1 << 10, 4 << 10, 16 << 10, 64 << 10, 256 << 10, 1024 << 10} {
+ y := make([]byte, length)
+ b.Run(fmt.Sprint(length, "Bytes"), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ blackhole = nil
+ blackhole = append(blackhole, y...)
+ }
+ })
+ }
+}
+
+func BenchmarkAppendStr(b *testing.B) {
+ for _, str := range []string{
+ "1",
+ "1234",
+ "12345678",
+ "1234567890123456",
+ "12345678901234567890123456789012",
+ } {
+ b.Run(fmt.Sprint(len(str), "Bytes"), func(b *testing.B) {
+ x := make([]byte, 0, N)
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ x = append(x, str...)
+ }
+ })
+ }
+}
+
+func BenchmarkAppendSpecialCase(b *testing.B) {
+ b.StopTimer()
+ x := make([]int, 0, N)
+ b.StartTimer()
+ for i := 0; i < b.N; i++ {
+ x = x[0:0]
+ for j := 0; j < N; j++ {
+ if len(x) < cap(x) {
+ x = x[:len(x)+1]
+ x[len(x)-1] = j
+ } else {
+ x = append(x, j)
+ }
+ }
+ }
+}
+
+var x []int
+
+func f() int {
+ x[:1][0] = 3
+ return 2
+}
+
+func TestSideEffectOrder(t *testing.T) {
+ x = make([]int, 0, 10)
+ x = append(x, 1, f())
+ if x[0] != 1 || x[1] != 2 {
+ t.Error("append failed: ", x[0], x[1])
+ }
+}
+
+func TestAppendOverlap(t *testing.T) {
+ x := []byte("1234")
+ x = append(x[1:], x...) // p > q in runtime·appendslice.
+ got := string(x)
+ want := "2341234"
+ if got != want {
+ t.Errorf("overlap failed: got %q want %q", got, want)
+ }
+}
+
+func BenchmarkCopy(b *testing.B) {
+ for _, l := range []int{1, 2, 4, 8, 12, 16, 32, 128, 1024} {
+ buf := make([]byte, 4096)
+ b.Run(fmt.Sprint(l, "Byte"), func(b *testing.B) {
+ s := make([]byte, l)
+ var n int
+ for i := 0; i < b.N; i++ {
+ n = copy(buf, s)
+ }
+ b.SetBytes(int64(n))
+ })
+ b.Run(fmt.Sprint(l, "String"), func(b *testing.B) {
+ s := string(make([]byte, l))
+ var n int
+ for i := 0; i < b.N; i++ {
+ n = copy(buf, s)
+ }
+ b.SetBytes(int64(n))
+ })
+ }
+}
+
+var (
+ sByte []byte
+ s1Ptr []uintptr
+ s2Ptr [][2]uintptr
+ s3Ptr [][3]uintptr
+ s4Ptr [][4]uintptr
+)
+
+// BenchmarkAppendInPlace tests the performance of append
+// when the result is being written back to the same slice.
+// In order for the in-place optimization to occur,
+// the slice must be referred to by address;
+// using a global is an easy way to trigger that.
+// We test the "grow" and "no grow" paths separately,
+// but not the "normal" (occasionally grow) path,
+// because it is a blend of the other two.
+// We use small numbers and small sizes in an attempt
+// to avoid benchmarking memory allocation and copying.
+// We use scalars instead of pointers in an attempt
+// to avoid benchmarking the write barriers.
+// We benchmark four common sizes (byte, pointer, string/interface, slice),
+// and one larger size.
+func BenchmarkAppendInPlace(b *testing.B) {
+ b.Run("NoGrow", func(b *testing.B) {
+ const C = 128
+
+ b.Run("Byte", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sByte = make([]byte, C)
+ for j := 0; j < C; j++ {
+ sByte = append(sByte, 0x77)
+ }
+ }
+ })
+
+ b.Run("1Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s1Ptr = make([]uintptr, C)
+ for j := 0; j < C; j++ {
+ s1Ptr = append(s1Ptr, 0x77)
+ }
+ }
+ })
+
+ b.Run("2Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s2Ptr = make([][2]uintptr, C)
+ for j := 0; j < C; j++ {
+ s2Ptr = append(s2Ptr, [2]uintptr{0x77, 0x88})
+ }
+ }
+ })
+
+ b.Run("3Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s3Ptr = make([][3]uintptr, C)
+ for j := 0; j < C; j++ {
+ s3Ptr = append(s3Ptr, [3]uintptr{0x77, 0x88, 0x99})
+ }
+ }
+ })
+
+ b.Run("4Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s4Ptr = make([][4]uintptr, C)
+ for j := 0; j < C; j++ {
+ s4Ptr = append(s4Ptr, [4]uintptr{0x77, 0x88, 0x99, 0xAA})
+ }
+ }
+ })
+
+ })
+
+ b.Run("Grow", func(b *testing.B) {
+ const C = 5
+
+ b.Run("Byte", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sByte = make([]byte, 0)
+ for j := 0; j < C; j++ {
+ sByte = append(sByte, 0x77)
+ sByte = sByte[:cap(sByte)]
+ }
+ }
+ })
+
+ b.Run("1Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s1Ptr = make([]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s1Ptr = append(s1Ptr, 0x77)
+ s1Ptr = s1Ptr[:cap(s1Ptr)]
+ }
+ }
+ })
+
+ b.Run("2Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s2Ptr = make([][2]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s2Ptr = append(s2Ptr, [2]uintptr{0x77, 0x88})
+ s2Ptr = s2Ptr[:cap(s2Ptr)]
+ }
+ }
+ })
+
+ b.Run("3Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s3Ptr = make([][3]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s3Ptr = append(s3Ptr, [3]uintptr{0x77, 0x88, 0x99})
+ s3Ptr = s3Ptr[:cap(s3Ptr)]
+ }
+ }
+ })
+
+ b.Run("4Ptr", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ s4Ptr = make([][4]uintptr, 0)
+ for j := 0; j < C; j++ {
+ s4Ptr = append(s4Ptr, [4]uintptr{0x77, 0x88, 0x99, 0xAA})
+ s4Ptr = s4Ptr[:cap(s4Ptr)]
+ }
+ }
+ })
+
+ })
+}
diff --git a/src/runtime/softfloat64.go b/src/runtime/softfloat64.go
new file mode 100644
index 0000000..42ef009
--- /dev/null
+++ b/src/runtime/softfloat64.go
@@ -0,0 +1,627 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Software IEEE754 64-bit floating point.
+// Only referred to (and thus linked in) by softfloat targets
+// and by tests in this directory.
+
+package runtime
+
+const (
+ mantbits64 uint = 52
+ expbits64 uint = 11
+ bias64 = -1<<(expbits64-1) + 1
+
+ nan64 uint64 = (1<<expbits64-1)<<mantbits64 + 1<<(mantbits64-1) // quiet NaN, 0 payload
+ inf64 uint64 = (1<<expbits64 - 1) << mantbits64
+ neg64 uint64 = 1 << (expbits64 + mantbits64)
+
+ mantbits32 uint = 23
+ expbits32 uint = 8
+ bias32 = -1<<(expbits32-1) + 1
+
+ nan32 uint32 = (1<<expbits32-1)<<mantbits32 + 1<<(mantbits32-1) // quiet NaN, 0 payload
+ inf32 uint32 = (1<<expbits32 - 1) << mantbits32
+ neg32 uint32 = 1 << (expbits32 + mantbits32)
+)
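+
+// For reference, these constants encode the standard IEEE754 layouts:
+// float64 has 1 sign bit, 11 exponent bits and 52 mantissa bits
+// (exponent bias 1023, hence bias64 == -1023); float32 has 1 sign bit,
+// 8 exponent bits and 23 mantissa bits (bias 127, hence bias32 == -127).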
+
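+// funpack64 splits f into its raw sign bit (still in bit 63), mantissa and
+// exponent. The implicit leading 1 bit is made explicit in mant and
+// denormals are normalized, so for finite nonzero inputs the value is
+// roughly mant * 2^(exp-52). For example, 1.0 (bits 0x3ff0000000000000)
+// unpacks to sign 0, mant 1<<52, exp 0.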
+func funpack64(f uint64) (sign, mant uint64, exp int, inf, nan bool) {
+ sign = f & (1 << (mantbits64 + expbits64))
+ mant = f & (1<<mantbits64 - 1)
+ exp = int(f>>mantbits64) & (1<<expbits64 - 1)
+
+ switch exp {
+ case 1<<expbits64 - 1:
+ if mant != 0 {
+ nan = true
+ return
+ }
+ inf = true
+ return
+
+ case 0:
+ // denormalized
+ if mant != 0 {
+ exp += bias64 + 1
+ for mant < 1<<mantbits64 {
+ mant <<= 1
+ exp--
+ }
+ }
+
+ default:
+ // add implicit top bit
+ mant |= 1 << mantbits64
+ exp += bias64
+ }
+ return
+}
+
+func funpack32(f uint32) (sign, mant uint32, exp int, inf, nan bool) {
+ sign = f & (1 << (mantbits32 + expbits32))
+ mant = f & (1<<mantbits32 - 1)
+ exp = int(f>>mantbits32) & (1<<expbits32 - 1)
+
+ switch exp {
+ case 1<<expbits32 - 1:
+ if mant != 0 {
+ nan = true
+ return
+ }
+ inf = true
+ return
+
+ case 0:
+ // denormalized
+ if mant != 0 {
+ exp += bias32 + 1
+ for mant < 1<<mantbits32 {
+ mant <<= 1
+ exp--
+ }
+ }
+
+ default:
+ // add implicit top bit
+ mant |= 1 << mantbits32
+ exp += bias32
+ }
+ return
+}
+
+func fpack64(sign, mant uint64, exp int, trunc uint64) uint64 {
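+// fpack64 is the inverse of funpack64: it reassembles sign, mant and exp
+// into IEEE754 bits, rounding to nearest-even. trunc acts as a sticky bit
+// recording any nonzero low-order bits the caller has already discarded.
+// Overflow packs to an infinity; underflow falls back to a denormal or zero.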
+ mant0, exp0, trunc0 := mant, exp, trunc
+ if mant == 0 {
+ return sign
+ }
+ for mant < 1<<mantbits64 {
+ mant <<= 1
+ exp--
+ }
+ for mant >= 4<<mantbits64 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant >= 2<<mantbits64 {
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ if mant >= 4<<mantbits64 {
+ mant >>= 1
+ exp++
+ }
+ }
+ mant >>= 1
+ exp++
+ }
+ if exp >= 1<<expbits64-1+bias64 {
+ return sign ^ inf64
+ }
+ if exp < bias64+1 {
+ if exp < bias64-int(mantbits64) {
+ return sign | 0
+ }
+ // repeat expecting denormal
+ mant, exp, trunc = mant0, exp0, trunc0
+ for exp < bias64 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ }
+ mant >>= 1
+ exp++
+ if mant < 1<<mantbits64 {
+ return sign | mant
+ }
+ }
+ return sign | uint64(exp-bias64)<<mantbits64 | mant&(1<<mantbits64-1)
+}
+
+func fpack32(sign, mant uint32, exp int, trunc uint32) uint32 {
+ mant0, exp0, trunc0 := mant, exp, trunc
+ if mant == 0 {
+ return sign
+ }
+ for mant < 1<<mantbits32 {
+ mant <<= 1
+ exp--
+ }
+ for mant >= 4<<mantbits32 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant >= 2<<mantbits32 {
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ if mant >= 4<<mantbits32 {
+ mant >>= 1
+ exp++
+ }
+ }
+ mant >>= 1
+ exp++
+ }
+ if exp >= 1<<expbits32-1+bias32 {
+ return sign ^ inf32
+ }
+ if exp < bias32+1 {
+ if exp < bias32-int(mantbits32) {
+ return sign | 0
+ }
+ // repeat expecting denormal
+ mant, exp, trunc = mant0, exp0, trunc0
+ for exp < bias32 {
+ trunc |= mant & 1
+ mant >>= 1
+ exp++
+ }
+ if mant&1 != 0 && (trunc != 0 || mant&2 != 0) {
+ mant++
+ }
+ mant >>= 1
+ exp++
+ if mant < 1<<mantbits32 {
+ return sign | mant
+ }
+ }
+ return sign | uint32(exp-bias32)<<mantbits32 | mant&(1<<mantbits32-1)
+}
+
+func fadd64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN + x or x + NaN = NaN
+ return nan64
+
+ case fi && gi && fs != gs: // +Inf + -Inf or -Inf + +Inf = NaN
+ return nan64
+
+ case fi: // ±Inf + g = ±Inf
+ return f
+
+ case gi: // f + ±Inf = ±Inf
+ return g
+
+ case fm == 0 && gm == 0 && fs != 0 && gs != 0: // -0 + -0 = -0
+ return f
+
+ case fm == 0: // 0 + g = g but 0 + -0 = +0
+ if gm == 0 {
+ g ^= gs
+ }
+ return g
+
+ case gm == 0: // f + 0 = f
+ return f
+
+ }
+
+ if fe < ge || fe == ge && fm < gm {
+ f, g, fs, fm, fe, gs, gm, ge = g, f, gs, gm, ge, fs, fm, fe
+ }
+
+ shift := uint(fe - ge)
+ fm <<= 2
+ gm <<= 2
+ trunc := gm & (1<<shift - 1)
+ gm >>= shift
+ if fs == gs {
+ fm += gm
+ } else {
+ fm -= gm
+ if trunc != 0 {
+ fm--
+ }
+ }
+ if fm == 0 {
+ fs = 0
+ }
+ return fpack64(fs, fm, fe-2, trunc)
+}
+
+func fsub64(f, g uint64) uint64 {
+ return fadd64(f, fneg64(g))
+}
+
+func fneg64(f uint64) uint64 {
+ return f ^ (1 << (mantbits64 + expbits64))
+}
+
+func fmul64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN * g or f * NaN = NaN
+ return nan64
+
+ case fi && gi: // Inf * Inf = Inf (with sign adjusted)
+ return f ^ gs
+
+ case fi && gm == 0, fm == 0 && gi: // 0 * Inf = Inf * 0 = NaN
+ return nan64
+
+ case fm == 0: // 0 * x = 0 (with sign adjusted)
+ return f ^ gs
+
+ case gm == 0: // x * 0 = 0 (with sign adjusted)
+ return g ^ fs
+ }
+
+ // 53-bit * 53-bit = 107- or 108-bit
+ lo, hi := mullu(fm, gm)
+ shift := mantbits64 - 1
+ trunc := lo & (1<<shift - 1)
+ mant := hi<<(64-shift) | lo>>shift
+ return fpack64(fs^gs, mant, fe+ge-1, trunc)
+}
+
+func fdiv64(f, g uint64) uint64 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ gs, gm, ge, gi, gn := funpack64(g)
+
+ // Special cases.
+ switch {
+ case fn || gn: // NaN / g = f / NaN = NaN
+ return nan64
+
+ case fi && gi: // ±Inf / ±Inf = NaN
+ return nan64
+
+ case !fi && !gi && fm == 0 && gm == 0: // 0 / 0 = NaN
+ return nan64
+
+ case fi, !gi && gm == 0: // Inf / g = f / 0 = Inf
+ return fs ^ gs ^ inf64
+
+	case gi, fm == 0: // f / Inf = 0 / g = 0
+ return fs ^ gs ^ 0
+ }
+ _, _, _, _ = fi, fn, gi, gn
+
+ // 53-bit<<54 / 53-bit = 53- or 54-bit.
+ shift := mantbits64 + 2
+ q, r := divlu(fm>>(64-shift), fm<<shift, gm)
+ return fpack64(fs^gs, q, fe-ge-2, r)
+}
+
+func f64to32(f uint64) uint32 {
+ fs, fm, fe, fi, fn := funpack64(f)
+ if fn {
+ return nan32
+ }
+ fs32 := uint32(fs >> 32)
+ if fi {
+ return fs32 ^ inf32
+ }
+ const d = mantbits64 - mantbits32 - 1
+ return fpack32(fs32, uint32(fm>>d), fe-1, uint32(fm&(1<<d-1)))
+}
+
+func f32to64(f uint32) uint64 {
+ const d = mantbits64 - mantbits32
+ fs, fm, fe, fi, fn := funpack32(f)
+ if fn {
+ return nan64
+ }
+ fs64 := uint64(fs) << 32
+ if fi {
+ return fs64 ^ inf64
+ }
+ return fpack64(fs64, uint64(fm)<<d, fe, 0)
+}
+
+func fcmp64(f, g uint64) (cmp int32, isnan bool) {
+ fs, fm, _, fi, fn := funpack64(f)
+ gs, gm, _, gi, gn := funpack64(g)
+
+ switch {
+ case fn, gn: // flag NaN
+ return 0, true
+
+ case !fi && !gi && fm == 0 && gm == 0: // ±0 == ±0
+ return 0, false
+
+ case fs > gs: // f < 0, g > 0
+ return -1, false
+
+ case fs < gs: // f > 0, g < 0
+ return +1, false
+
+ // Same sign, not NaN.
+ // Can compare encodings directly now.
+ // Reverse for sign.
+ case fs == 0 && f < g, fs != 0 && f > g:
+ return -1, false
+
+ case fs == 0 && f > g, fs != 0 && f < g:
+ return +1, false
+ }
+
+ // f == g
+ return 0, false
+}
+
+func f64toint(f uint64) (val int64, ok bool) {
+ fs, fm, fe, fi, fn := funpack64(f)
+
+ switch {
+ case fi, fn: // NaN
+ return 0, false
+
+ case fe < -1: // f < 0.5
+ return 0, false
+
+ case fe > 63: // f >= 2^63
+ if fs != 0 && fm == 0 { // f == -2^63
+ return -1 << 63, true
+ }
+ if fs != 0 {
+ return 0, false
+ }
+ return 0, false
+ }
+
+ for fe > int(mantbits64) {
+ fe--
+ fm <<= 1
+ }
+ for fe < int(mantbits64) {
+ fe++
+ fm >>= 1
+ }
+ val = int64(fm)
+ if fs != 0 {
+ val = -val
+ }
+ return val, true
+}
+
+func fintto64(val int64) (f uint64) {
+ fs := uint64(val) & (1 << 63)
+ mant := uint64(val)
+ if fs != 0 {
+ mant = -mant
+ }
+ return fpack64(fs, mant, int(mantbits64), 0)
+}
+func fintto32(val int64) (f uint32) {
+ fs := uint64(val) & (1 << 63)
+ mant := uint64(val)
+ if fs != 0 {
+ mant = -mant
+ }
+	// Reduce the mantissa until it fits into a uint32.
+	// Keep track of the bits we throw away: if any are nonzero,
+	// OR them into trunc so they act as a sticky bit for rounding.
+ exp := int(mantbits32)
+ var trunc uint32
+ for mant >= 1<<32 {
+ trunc |= uint32(mant) & 1
+ mant >>= 1
+ exp++
+ }
+
+ return fpack32(uint32(fs>>32), uint32(mant), exp, trunc)
+}
+
+// 64x64 -> 128 multiply.
+// Adapted from Hacker's Delight.
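+// The operands are split into 32-bit halves; the high word is assembled
+// from the partial products u1*v1, u1*v0, u0*v1 plus the carries out of
+// the low word, while the low word is simply the wrapping product u*v.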
+func mullu(u, v uint64) (lo, hi uint64) {
+ const (
+ s = 32
+ mask = 1<<s - 1
+ )
+ u0 := u & mask
+ u1 := u >> s
+ v0 := v & mask
+ v1 := v >> s
+ w0 := u0 * v0
+ t := u1*v0 + w0>>s
+ w1 := t & mask
+ w2 := t >> s
+ w1 += u0 * v1
+ return u * v, u1*v1 + w2 + w1>>s
+}
+
+// 128/64 -> 64 quotient, 64 remainder.
+// Adapted from Hacker's Delight.
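+// The dividend is u1<<64 + u0. If u1 >= v (which includes v == 0) the
+// quotient would not fit in 64 bits, so both results come back as all ones.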
+func divlu(u1, u0, v uint64) (q, r uint64) {
+ const b = 1 << 32
+
+ if u1 >= v {
+ return 1<<64 - 1, 1<<64 - 1
+ }
+
+ // s = nlz(v); v <<= s
+ s := uint(0)
+ for v&(1<<63) == 0 {
+ s++
+ v <<= 1
+ }
+
+ vn1 := v >> 32
+ vn0 := v & (1<<32 - 1)
+ un32 := u1<<s | u0>>(64-s)
+ un10 := u0 << s
+ un1 := un10 >> 32
+ un0 := un10 & (1<<32 - 1)
+ q1 := un32 / vn1
+ rhat := un32 - q1*vn1
+
+again1:
+ if q1 >= b || q1*vn0 > b*rhat+un1 {
+ q1--
+ rhat += vn1
+ if rhat < b {
+ goto again1
+ }
+ }
+
+ un21 := un32*b + un1 - q1*v
+ q0 := un21 / vn1
+ rhat = un21 - q0*vn1
+
+again2:
+ if q0 >= b || q0*vn0 > b*rhat+un0 {
+ q0--
+ rhat += vn1
+ if rhat < b {
+ goto again2
+ }
+ }
+
+ return q1*b + q0, (un21*b + un0 - q0*v) >> s
+}
+
+func fadd32(x, y uint32) uint32 {
+ return f64to32(fadd64(f32to64(x), f32to64(y)))
+}
+
+func fmul32(x, y uint32) uint32 {
+ return f64to32(fmul64(f32to64(x), f32to64(y)))
+}
+
+func fdiv32(x, y uint32) uint32 {
+ // TODO: are there double-rounding problems here? See issue 48807.
+ return f64to32(fdiv64(f32to64(x), f32to64(y)))
+}
+
+func feq32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp == 0 && !nan
+}
+
+func fgt32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp >= 1 && !nan
+}
+
+func fge32(x, y uint32) bool {
+ cmp, nan := fcmp64(f32to64(x), f32to64(y))
+ return cmp >= 0 && !nan
+}
+
+func feq64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp == 0 && !nan
+}
+
+func fgt64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp >= 1 && !nan
+}
+
+func fge64(x, y uint64) bool {
+ cmp, nan := fcmp64(x, y)
+ return cmp >= 0 && !nan
+}
+
+func fint32to32(x int32) uint32 {
+ return fintto32(int64(x))
+}
+
+func fint32to64(x int32) uint64 {
+ return fintto64(int64(x))
+}
+
+func fint64to32(x int64) uint32 {
+ return fintto32(x)
+}
+
+func fint64to64(x int64) uint64 {
+ return fintto64(x)
+}
+
+func f32toint32(x uint32) int32 {
+ val, _ := f64toint(f32to64(x))
+ return int32(val)
+}
+
+func f32toint64(x uint32) int64 {
+ val, _ := f64toint(f32to64(x))
+ return val
+}
+
+func f64toint32(x uint64) int32 {
+ val, _ := f64toint(x)
+ return int32(val)
+}
+
+func f64toint64(x uint64) int64 {
+ val, _ := f64toint(x)
+ return val
+}
+
+func f64touint64(x uint64) uint64 {
+ var m uint64 = 0x43e0000000000000 // float64 1<<63
+ if fgt64(m, x) {
+ return uint64(f64toint64(x))
+ }
+ y := fadd64(x, -m)
+ z := uint64(f64toint64(y))
+ return z | (1 << 63)
+}
+
+func f32touint64(x uint32) uint64 {
+ var m uint32 = 0x5f000000 // float32 1<<63
+ if fgt32(m, x) {
+ return uint64(f32toint64(x))
+ }
+ y := fadd32(x, -m)
+ z := uint64(f32toint64(y))
+ return z | (1 << 63)
+}
+
+func fuint64to64(x uint64) uint64 {
+ if int64(x) >= 0 {
+ return fint64to64(int64(x))
+ }
+ // See ../cmd/compile/internal/ssagen/ssa.go:uint64Tofloat
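+	// Sketch of the trick: x >= 2^63 cannot go through int64 directly, so
+	// halve it first, folding the discarded low bit back in as a sticky bit
+	// so rounding is unaffected, convert the half, then double the float.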
+ y := x & 1
+ z := x >> 1
+ z = z | y
+ r := fint64to64(int64(z))
+ return fadd64(r, r)
+}
+
+func fuint64to32(x uint64) uint32 {
+ if int64(x) >= 0 {
+ return fint64to32(int64(x))
+ }
+ // See ../cmd/compile/internal/ssagen/ssa.go:uint64Tofloat
+ y := x & 1
+ z := x >> 1
+ z = z | y
+ r := fint64to32(int64(z))
+ return fadd32(r, r)
+}
diff --git a/src/runtime/softfloat64_test.go b/src/runtime/softfloat64_test.go
new file mode 100644
index 0000000..3f53e8b
--- /dev/null
+++ b/src/runtime/softfloat64_test.go
@@ -0,0 +1,198 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "math"
+ "math/rand"
+ . "runtime"
+ "testing"
+)
+
+// turn uint64 op into float64 op
+func fop(f func(x, y uint64) uint64) func(x, y float64) float64 {
+ return func(x, y float64) float64 {
+ bx := math.Float64bits(x)
+ by := math.Float64bits(y)
+ return math.Float64frombits(f(bx, by))
+ }
+}
+
+func add(x, y float64) float64 { return x + y }
+func sub(x, y float64) float64 { return x - y }
+func mul(x, y float64) float64 { return x * y }
+func div(x, y float64) float64 { return x / y }
+
+func TestFloat64(t *testing.T) {
+ base := []float64{
+ 0,
+ math.Copysign(0, -1),
+ -1,
+ 1,
+ math.NaN(),
+ math.Inf(+1),
+ math.Inf(-1),
+ 0.1,
+ 1.5,
+ 1.9999999999999998, // all 1s mantissa
+ 1.3333333333333333, // 1.010101010101...
+ 1.1428571428571428, // 1.001001001001...
+ 1.112536929253601e-308, // first normal
+ 2,
+ 4,
+ 8,
+ 16,
+ 32,
+ 64,
+ 128,
+ 256,
+ 3,
+ 12,
+ 1234,
+ 123456,
+ -0.1,
+ -1.5,
+ -1.9999999999999998,
+ -1.3333333333333333,
+ -1.1428571428571428,
+ -2,
+ -3,
+ 1e-200,
+ 1e-300,
+ 1e-310,
+ 5e-324,
+ 1e-105,
+ 1e-305,
+ 1e+200,
+ 1e+306,
+ 1e+307,
+ 1e+308,
+ }
+ all := make([]float64, 200)
+ copy(all, base)
+ for i := len(base); i < len(all); i++ {
+ all[i] = rand.NormFloat64()
+ }
+
+ test(t, "+", add, fop(Fadd64), all)
+ test(t, "-", sub, fop(Fsub64), all)
+ if GOARCH != "386" { // 386 is not precise!
+ test(t, "*", mul, fop(Fmul64), all)
+ test(t, "/", div, fop(Fdiv64), all)
+ }
+}
+
+// 64 -hw-> 32 -hw-> 64
+func trunc32(f float64) float64 {
+ return float64(float32(f))
+}
+
+// 64 -sw-> 32 -hw-> 64
+func to32sw(f float64) float64 {
+ return float64(math.Float32frombits(F64to32(math.Float64bits(f))))
+}
+
+// 64 -hw-> 32 -sw-> 64
+func to64sw(f float64) float64 {
+ return math.Float64frombits(F32to64(math.Float32bits(float32(f))))
+}
+
+// float64 -hw-> int64 -hw-> float64
+func hwint64(f float64) float64 {
+ return float64(int64(f))
+}
+
+// float64 -hw-> int32 -hw-> float64
+func hwint32(f float64) float64 {
+ return float64(int32(f))
+}
+
+// float64 -sw-> int64 -hw-> float64
+func toint64sw(f float64) float64 {
+ i, ok := F64toint(math.Float64bits(f))
+ if !ok {
+ // There's no right answer for out of range.
+ // Match the hardware to pass the test.
+ i = int64(f)
+ }
+ return float64(i)
+}
+
+// float64 -hw-> int64 -sw-> float64
+func fromint64sw(f float64) float64 {
+ return math.Float64frombits(Fintto64(int64(f)))
+}
+
+var nerr int
+
+func err(t *testing.T, format string, args ...any) {
+ t.Errorf(format, args...)
+
+ // cut errors off after a while.
+ // otherwise we spend all our time
+ // allocating memory to hold the
+ // formatted output.
+ if nerr++; nerr >= 10 {
+ t.Fatal("too many errors")
+ }
+}
+
+func test(t *testing.T, op string, hw, sw func(float64, float64) float64, all []float64) {
+ for _, f := range all {
+ for _, g := range all {
+ h := hw(f, g)
+ s := sw(f, g)
+ if !same(h, s) {
+ err(t, "%g %s %g = sw %g, hw %g\n", f, op, g, s, h)
+ }
+ testu(t, "to32", trunc32, to32sw, h)
+ testu(t, "to64", trunc32, to64sw, h)
+ testu(t, "toint64", hwint64, toint64sw, h)
+ testu(t, "fromint64", hwint64, fromint64sw, h)
+ testcmp(t, f, h)
+ testcmp(t, h, f)
+ testcmp(t, g, h)
+ testcmp(t, h, g)
+ }
+ }
+}
+
+func testu(t *testing.T, op string, hw, sw func(float64) float64, v float64) {
+ h := hw(v)
+ s := sw(v)
+ if !same(h, s) {
+ err(t, "%s %g = sw %g, hw %g\n", op, v, s, h)
+ }
+}
+
+func hwcmp(f, g float64) (cmp int, isnan bool) {
+ switch {
+ case f < g:
+ return -1, false
+ case f > g:
+ return +1, false
+ case f == g:
+ return 0, false
+ }
+ return 0, true // must be NaN
+}
+
+func testcmp(t *testing.T, f, g float64) {
+ hcmp, hisnan := hwcmp(f, g)
+ scmp, sisnan := Fcmp64(math.Float64bits(f), math.Float64bits(g))
+ if int32(hcmp) != scmp || hisnan != sisnan {
+ err(t, "cmp(%g, %g) = sw %v, %v, hw %v, %v\n", f, g, scmp, sisnan, hcmp, hisnan)
+ }
+}
+
+func same(f, g float64) bool {
+ if math.IsNaN(f) && math.IsNaN(g) {
+ return true
+ }
+ if math.Copysign(1, f) != math.Copysign(1, g) {
+ return false
+ }
+ return f == g
+}
diff --git a/src/runtime/stack.go b/src/runtime/stack.go
new file mode 100644
index 0000000..45d66da
--- /dev/null
+++ b/src/runtime/stack.go
@@ -0,0 +1,1347 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/cpu"
+ "internal/goarch"
+ "internal/goos"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+/*
+Stack layout parameters.
+Included both by runtime (compiled via 6c) and linkers (compiled via gcc).
+
+The per-goroutine g->stackguard is set to point StackGuard bytes
+above the bottom of the stack. Each function compares its stack
+pointer against g->stackguard to check for overflow. To cut one
+instruction from the check sequence for functions with tiny frames,
+the stack is allowed to protrude StackSmall bytes below the stack
+guard. Functions with large frames don't bother with the check and
+always call morestack. The sequences are (for amd64, others are
+similar):
+
+ guard = g->stackguard
+ frame = function's stack frame size
+ argsize = size of function arguments (call + return)
+
+ stack frame size <= StackSmall:
+ CMPQ guard, SP
+ JHI 3(PC)
+ MOVQ m->morearg, $(argsize << 32)
+ CALL morestack(SB)
+
+ stack frame size > StackSmall but < StackBig
+ LEAQ (frame-StackSmall)(SP), R0
+ CMPQ guard, R0
+ JHI 3(PC)
+ MOVQ m->morearg, $(argsize << 32)
+ CALL morestack(SB)
+
+ stack frame size >= StackBig:
+ MOVQ m->morearg, $((argsize << 32) | frame)
+ CALL morestack(SB)
+
+The bottom StackGuard - StackSmall bytes are important: there has
+to be enough room to execute functions that refuse to check for
+stack overflow, either because they need to be adjacent to the
+actual caller's frame (deferproc) or because they handle the imminent
+stack overflow (morestack).
+
+For example, deferproc might call malloc, which does one of the
+above checks (without allocating a full frame), which might trigger
+a call to morestack. This sequence needs to fit in the bottom
+section of the stack. On amd64, morestack's frame is 40 bytes, and
+deferproc's frame is 56 bytes. That fits well within the
+StackGuard - StackSmall bytes at the bottom.
+The linkers explore all possible call traces involving non-splitting
+functions to make sure that this limit cannot be violated.
+*/
+
+const (
+ // stackSystem is a number of additional bytes to add
+ // to each stack below the usual guard area for OS-specific
+ // purposes like signal handling. Used on Windows, Plan 9,
+ // and iOS because they do not use a separate stack.
+ stackSystem = goos.IsWindows*512*goarch.PtrSize + goos.IsPlan9*512 + goos.IsIos*goarch.IsArm64*1024
+
+ // The minimum size of stack used by Go code
+ stackMin = 2048
+
+ // The minimum stack size to allocate.
+ // The hackery here rounds fixedStack0 up to a power of 2.
+ fixedStack0 = stackMin + stackSystem
+ fixedStack1 = fixedStack0 - 1
+ fixedStack2 = fixedStack1 | (fixedStack1 >> 1)
+ fixedStack3 = fixedStack2 | (fixedStack2 >> 2)
+ fixedStack4 = fixedStack3 | (fixedStack3 >> 4)
+ fixedStack5 = fixedStack4 | (fixedStack4 >> 8)
+ fixedStack6 = fixedStack5 | (fixedStack5 >> 16)
+ fixedStack = fixedStack6 + 1
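+	// For example, on a platform where stackSystem is 0 (e.g. linux/amd64),
+	// fixedStack0 is 2048, already a power of 2, so fixedStack is 2048 too.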
+
+ // stackNosplit is the maximum number of bytes that a chain of NOSPLIT
+ // functions can use.
+ // This arithmetic must match that in cmd/internal/objabi/stack.go:StackNosplit.
+ stackNosplit = abi.StackNosplitBase * sys.StackGuardMultiplier
+
+ // The stack guard is a pointer this many bytes above the
+ // bottom of the stack.
+ //
+ // The guard leaves enough room for a stackNosplit chain of NOSPLIT calls
+ // plus one stackSmall frame plus stackSystem bytes for the OS.
+ // This arithmetic must match that in cmd/internal/objabi/stack.go:StackLimit.
+ stackGuard = stackNosplit + stackSystem + abi.StackSmall
+)
+
+const (
+ // stackDebug == 0: no logging
+ // == 1: logging of per-stack operations
+ // == 2: logging of per-frame operations
+ // == 3: logging of per-word updates
+ // == 4: logging of per-word reads
+ stackDebug = 0
+ stackFromSystem = 0 // allocate stacks from system memory instead of the heap
+ stackFaultOnFree = 0 // old stacks are mapped noaccess to detect use after free
+ stackNoCache = 0 // disable per-P small stack caches
+
+ // check the BP links during traceback.
+ debugCheckBP = false
+)
+
+var (
+ stackPoisonCopy = 0 // fill stack that should not be accessed with garbage, to detect bad dereferences during copy
+)
+
+const (
+ uintptrMask = 1<<(8*goarch.PtrSize) - 1
+
+ // The values below can be stored to g.stackguard0 to force
+ // the next stack check to fail.
+ // These are all larger than any real SP.
+
+ // Goroutine preemption request.
+ // 0xfffffade in hex.
+ stackPreempt = uintptrMask & -1314
+
+ // Thread is forking. Causes a split stack check failure.
+ // 0xfffffb2e in hex.
+ stackFork = uintptrMask & -1234
+
+ // Force a stack movement. Used for debugging.
+ // 0xfffffeed in hex.
+ stackForceMove = uintptrMask & -275
+
+ // stackPoisonMin is the lowest allowed stack poison value.
+ stackPoisonMin = uintptrMask & -4096
+)
+
+// Global pool of spans that have free stacks.
+// Stacks are assigned an order according to size.
+//
+// order = log_2(size/FixedStack)
+//
+// There is a free list for each order.
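+//
+// With fixedStack == 2048 (the common non-Windows value), order 0 holds
+// 2 KiB stacks, order 1 holds 4 KiB stacks, and so on.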
+var stackpool [_NumStackOrders]struct {
+ item stackpoolItem
+ _ [(cpu.CacheLinePadSize - unsafe.Sizeof(stackpoolItem{})%cpu.CacheLinePadSize) % cpu.CacheLinePadSize]byte
+}
+
+type stackpoolItem struct {
+ _ sys.NotInHeap
+ mu mutex
+ span mSpanList
+}
+
+// Global pool of large stack spans.
+var stackLarge struct {
+ lock mutex
+ free [heapAddrBits - pageShift]mSpanList // free lists by log_2(s.npages)
+}
+
+func stackinit() {
+ if _StackCacheSize&_PageMask != 0 {
+ throw("cache size must be a multiple of page size")
+ }
+ for i := range stackpool {
+ stackpool[i].item.span.init()
+ lockInit(&stackpool[i].item.mu, lockRankStackpool)
+ }
+ for i := range stackLarge.free {
+ stackLarge.free[i].init()
+ lockInit(&stackLarge.lock, lockRankStackLarge)
+ }
+}
+
+// stacklog2 returns ⌊log_2(n)⌋.
+func stacklog2(n uintptr) int {
+ log2 := 0
+ for n > 1 {
+ n >>= 1
+ log2++
+ }
+ return log2
+}
+
+// Allocates a stack from the free pool. Must be called with
+// stackpool[order].item.mu held.
+func stackpoolalloc(order uint8) gclinkptr {
+ list := &stackpool[order].item.span
+ s := list.first
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+ if s == nil {
+ // no free stacks. Allocate another span worth.
+ s = mheap_.allocManual(_StackCacheSize>>_PageShift, spanAllocStack)
+ if s == nil {
+ throw("out of memory")
+ }
+ if s.allocCount != 0 {
+ throw("bad allocCount")
+ }
+ if s.manualFreeList.ptr() != nil {
+ throw("bad manualFreeList")
+ }
+ osStackAlloc(s)
+ s.elemsize = fixedStack << order
+ for i := uintptr(0); i < _StackCacheSize; i += s.elemsize {
+ x := gclinkptr(s.base() + i)
+ x.ptr().next = s.manualFreeList
+ s.manualFreeList = x
+ }
+ list.insert(s)
+ }
+ x := s.manualFreeList
+ if x.ptr() == nil {
+ throw("span has no free stacks")
+ }
+ s.manualFreeList = x.ptr().next
+ s.allocCount++
+ if s.manualFreeList.ptr() == nil {
+ // all stacks in s are allocated.
+ list.remove(s)
+ }
+ return x
+}
+
+// Adds stack x to the free pool. Must be called with stackpool[order].item.mu held.
+func stackpoolfree(x gclinkptr, order uint8) {
+ s := spanOfUnchecked(uintptr(x))
+ if s.state.get() != mSpanManual {
+ throw("freeing stack not in a stack span")
+ }
+ if s.manualFreeList.ptr() == nil {
+ // s will now have a free stack
+ stackpool[order].item.span.insert(s)
+ }
+ x.ptr().next = s.manualFreeList
+ s.manualFreeList = x
+ s.allocCount--
+ if gcphase == _GCoff && s.allocCount == 0 {
+ // Span is completely free. Return it to the heap
+ // immediately if we're sweeping.
+ //
+ // If GC is active, we delay the free until the end of
+ // GC to avoid the following type of situation:
+ //
+ // 1) GC starts, scans a SudoG but does not yet mark the SudoG.elem pointer
+ // 2) The stack that pointer points to is copied
+ // 3) The old stack is freed
+ // 4) The containing span is marked free
+ // 5) GC attempts to mark the SudoG.elem pointer. The
+ // marking fails because the pointer looks like a
+ // pointer into a free span.
+ //
+ // By not freeing, we prevent step #4 until GC is done.
+ stackpool[order].item.span.remove(s)
+ s.manualFreeList = 0
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ }
+}
+
+// stackcacherefill/stackcacherelease implement a global pool of stack segments.
+// The pool is required to prevent unlimited growth of per-thread caches.
+//
+//go:systemstack
+func stackcacherefill(c *mcache, order uint8) {
+ if stackDebug >= 1 {
+ print("stackcacherefill order=", order, "\n")
+ }
+
+ // Grab some stacks from the global cache.
+ // Grab half of the allowed capacity (to prevent thrashing).
+ var list gclinkptr
+ var size uintptr
+ lock(&stackpool[order].item.mu)
+ for size < _StackCacheSize/2 {
+ x := stackpoolalloc(order)
+ x.ptr().next = list
+ list = x
+ size += fixedStack << order
+ }
+ unlock(&stackpool[order].item.mu)
+ c.stackcache[order].list = list
+ c.stackcache[order].size = size
+}
+
+//go:systemstack
+func stackcacherelease(c *mcache, order uint8) {
+ if stackDebug >= 1 {
+ print("stackcacherelease order=", order, "\n")
+ }
+ x := c.stackcache[order].list
+ size := c.stackcache[order].size
+ lock(&stackpool[order].item.mu)
+ for size > _StackCacheSize/2 {
+ y := x.ptr().next
+ stackpoolfree(x, order)
+ x = y
+ size -= fixedStack << order
+ }
+ unlock(&stackpool[order].item.mu)
+ c.stackcache[order].list = x
+ c.stackcache[order].size = size
+}
+
+//go:systemstack
+func stackcache_clear(c *mcache) {
+ if stackDebug >= 1 {
+ print("stackcache clear\n")
+ }
+ for order := uint8(0); order < _NumStackOrders; order++ {
+ lock(&stackpool[order].item.mu)
+ x := c.stackcache[order].list
+ for x.ptr() != nil {
+ y := x.ptr().next
+ stackpoolfree(x, order)
+ x = y
+ }
+ c.stackcache[order].list = 0
+ c.stackcache[order].size = 0
+ unlock(&stackpool[order].item.mu)
+ }
+}
+
+// stackalloc allocates an n byte stack.
+//
+// stackalloc must run on the system stack because it uses per-P
+// resources and must not split the stack.
+//
+//go:systemstack
+func stackalloc(n uint32) stack {
+	// stackalloc must be called on the scheduler stack, so that we
+	// never try to grow the stack while stackalloc itself is running.
+	// Doing so would cause a deadlock (issue 1547).
+ thisg := getg()
+ if thisg != thisg.m.g0 {
+ throw("stackalloc not on scheduler stack")
+ }
+ if n&(n-1) != 0 {
+ throw("stack size not a power of 2")
+ }
+ if stackDebug >= 1 {
+ print("stackalloc ", n, "\n")
+ }
+
+ if debug.efence != 0 || stackFromSystem != 0 {
+ n = uint32(alignUp(uintptr(n), physPageSize))
+ v := sysAlloc(uintptr(n), &memstats.stacks_sys)
+ if v == nil {
+ throw("out of memory (stackalloc)")
+ }
+ return stack{uintptr(v), uintptr(v) + uintptr(n)}
+ }
+
+ // Small stacks are allocated with a fixed-size free-list allocator.
+ // If we need a stack of a bigger size, we fall back on allocating
+ // a dedicated span.
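+	// (For instance, on linux/amd64, where fixedStack is 2048 and there are
+	// four stack orders, stacks of 2K/4K/8K/16K come from these pools and
+	// anything larger gets a dedicated span.)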
+ var v unsafe.Pointer
+ if n < fixedStack<<_NumStackOrders && n < _StackCacheSize {
+ order := uint8(0)
+ n2 := n
+ for n2 > fixedStack {
+ order++
+ n2 >>= 1
+ }
+ var x gclinkptr
+ if stackNoCache != 0 || thisg.m.p == 0 || thisg.m.preemptoff != "" {
+ // thisg.m.p == 0 can happen in the guts of exitsyscall
+ // or procresize. Just get a stack from the global pool.
+ // Also don't touch stackcache during gc
+ // as it's flushed concurrently.
+ lock(&stackpool[order].item.mu)
+ x = stackpoolalloc(order)
+ unlock(&stackpool[order].item.mu)
+ } else {
+ c := thisg.m.p.ptr().mcache
+ x = c.stackcache[order].list
+ if x.ptr() == nil {
+ stackcacherefill(c, order)
+ x = c.stackcache[order].list
+ }
+ c.stackcache[order].list = x.ptr().next
+ c.stackcache[order].size -= uintptr(n)
+ }
+ v = unsafe.Pointer(x)
+ } else {
+ var s *mspan
+ npage := uintptr(n) >> _PageShift
+ log2npage := stacklog2(npage)
+
+ // Try to get a stack from the large stack cache.
+ lock(&stackLarge.lock)
+ if !stackLarge.free[log2npage].isEmpty() {
+ s = stackLarge.free[log2npage].first
+ stackLarge.free[log2npage].remove(s)
+ }
+ unlock(&stackLarge.lock)
+
+ lockWithRankMayAcquire(&mheap_.lock, lockRankMheap)
+
+ if s == nil {
+ // Allocate a new stack from the heap.
+ s = mheap_.allocManual(npage, spanAllocStack)
+ if s == nil {
+ throw("out of memory")
+ }
+ osStackAlloc(s)
+ s.elemsize = uintptr(n)
+ }
+ v = unsafe.Pointer(s.base())
+ }
+
+ if raceenabled {
+ racemalloc(v, uintptr(n))
+ }
+ if msanenabled {
+ msanmalloc(v, uintptr(n))
+ }
+ if asanenabled {
+ asanunpoison(v, uintptr(n))
+ }
+ if stackDebug >= 1 {
+ print(" allocated ", v, "\n")
+ }
+ return stack{uintptr(v), uintptr(v) + uintptr(n)}
+}
+
+// stackfree frees an n byte stack allocation at stk.
+//
+// stackfree must run on the system stack because it uses per-P
+// resources and must not split the stack.
+//
+//go:systemstack
+func stackfree(stk stack) {
+ gp := getg()
+ v := unsafe.Pointer(stk.lo)
+ n := stk.hi - stk.lo
+ if n&(n-1) != 0 {
+ throw("stack not a power of 2")
+ }
+ if stk.lo+n < stk.hi {
+ throw("bad stack size")
+ }
+ if stackDebug >= 1 {
+ println("stackfree", v, n)
+ memclrNoHeapPointers(v, n) // for testing, clobber stack data
+ }
+ if debug.efence != 0 || stackFromSystem != 0 {
+ if debug.efence != 0 || stackFaultOnFree != 0 {
+ sysFault(v, n)
+ } else {
+ sysFree(v, n, &memstats.stacks_sys)
+ }
+ return
+ }
+ if msanenabled {
+ msanfree(v, n)
+ }
+ if asanenabled {
+ asanpoison(v, n)
+ }
+ if n < fixedStack<<_NumStackOrders && n < _StackCacheSize {
+ order := uint8(0)
+ n2 := n
+ for n2 > fixedStack {
+ order++
+ n2 >>= 1
+ }
+ x := gclinkptr(v)
+ if stackNoCache != 0 || gp.m.p == 0 || gp.m.preemptoff != "" {
+ lock(&stackpool[order].item.mu)
+ stackpoolfree(x, order)
+ unlock(&stackpool[order].item.mu)
+ } else {
+ c := gp.m.p.ptr().mcache
+ if c.stackcache[order].size >= _StackCacheSize {
+ stackcacherelease(c, order)
+ }
+ x.ptr().next = c.stackcache[order].list
+ c.stackcache[order].list = x
+ c.stackcache[order].size += n
+ }
+ } else {
+ s := spanOfUnchecked(uintptr(v))
+ if s.state.get() != mSpanManual {
+ println(hex(s.base()), v)
+ throw("bad span state")
+ }
+ if gcphase == _GCoff {
+ // Free the stack immediately if we're
+ // sweeping.
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ } else {
+ // If the GC is running, we can't return a
+ // stack span to the heap because it could be
+ // reused as a heap span, and this state
+ // change would race with GC. Add it to the
+ // large stack cache instead.
+ log2npage := stacklog2(s.npages)
+ lock(&stackLarge.lock)
+ stackLarge.free[log2npage].insert(s)
+ unlock(&stackLarge.lock)
+ }
+ }
+}
+
+var maxstacksize uintptr = 1 << 20 // enough until runtime.main sets it for real
+
+var maxstackceiling = maxstacksize
+
+var ptrnames = []string{
+ 0: "scalar",
+ 1: "ptr",
+}
+
+// Stack frame layout
+//
+// (x86)
+// +------------------+
+// | args from caller |
+// +------------------+ <- frame->argp
+// | return address |
+// +------------------+
+// | caller's BP (*) | (*) if framepointer_enabled && varp > sp
+// +------------------+ <- frame->varp
+// | locals |
+// +------------------+
+// | args to callee |
+// +------------------+ <- frame->sp
+//
+// (arm)
+// +------------------+
+// | args from caller |
+// +------------------+ <- frame->argp
+// | caller's retaddr |
+// +------------------+
+// | caller's FP (*) | (*) on ARM64, if framepointer_enabled && varp > sp
+// +------------------+ <- frame->varp
+// | locals |
+// +------------------+
+// | args to callee |
+// +------------------+
+// | return address |
+// +------------------+ <- frame->sp
+//
+// varp > sp means that the function has a frame;
+// varp == sp means frameless function.
+
+type adjustinfo struct {
+ old stack
+ delta uintptr // ptr distance from old to new stack (newbase - oldbase)
+ cache pcvalueCache
+
+ // sghi is the highest sudog.elem on the stack.
+ sghi uintptr
+}
+
+// adjustpointer checks whether *vpp is in the old stack described by adjinfo.
+// If so, it rewrites *vpp to point into the new stack.
+func adjustpointer(adjinfo *adjustinfo, vpp unsafe.Pointer) {
+ pp := (*uintptr)(vpp)
+ p := *pp
+ if stackDebug >= 4 {
+ print(" ", pp, ":", hex(p), "\n")
+ }
+ if adjinfo.old.lo <= p && p < adjinfo.old.hi {
+ *pp = p + adjinfo.delta
+ if stackDebug >= 3 {
+ print(" adjust ptr ", pp, ":", hex(p), " -> ", hex(*pp), "\n")
+ }
+ }
+}
+
+// Information from the compiler about the layout of stack frames.
+// Note: this type must agree with reflect.bitVector.
+type bitvector struct {
+ n int32 // # of bits
+ bytedata *uint8
+}
+
+// ptrbit returns the i'th bit in bv.
+// ptrbit is less efficient than iterating directly over bitvector bits,
+// and should only be used in non-performance-critical code.
+// See adjustpointers for an example of a high-efficiency walk of a bitvector.
+func (bv *bitvector) ptrbit(i uintptr) uint8 {
+ b := *(addb(bv.bytedata, i/8))
+ return (b >> (i % 8)) & 1
+}
+
+// bv describes the memory starting at address scanp.
+// Adjust any pointers contained therein.
+func adjustpointers(scanp unsafe.Pointer, bv *bitvector, adjinfo *adjustinfo, f funcInfo) {
+ minp := adjinfo.old.lo
+ maxp := adjinfo.old.hi
+ delta := adjinfo.delta
+ num := uintptr(bv.n)
+ // If this frame might contain channel receive slots, use CAS
+ // to adjust pointers. If the slot hasn't been received into
+ // yet, it may contain stack pointers and a concurrent send
+ // could race with adjusting those pointers. (The sent value
+ // itself can never contain stack pointers.)
+ useCAS := uintptr(scanp) < adjinfo.sghi
+ for i := uintptr(0); i < num; i += 8 {
+ if stackDebug >= 4 {
+ for j := uintptr(0); j < 8; j++ {
+ print(" ", add(scanp, (i+j)*goarch.PtrSize), ":", ptrnames[bv.ptrbit(i+j)], ":", hex(*(*uintptr)(add(scanp, (i+j)*goarch.PtrSize))), " # ", i, " ", *addb(bv.bytedata, i/8), "\n")
+ }
+ }
+ b := *(addb(bv.bytedata, i/8))
+ for b != 0 {
+ j := uintptr(sys.TrailingZeros8(b))
+ b &= b - 1
+ pp := (*uintptr)(add(scanp, (i+j)*goarch.PtrSize))
+ retry:
+ p := *pp
+ if f.valid() && 0 < p && p < minLegalPointer && debug.invalidptr != 0 {
+ // Looks like a junk value in a pointer slot.
+ // Live analysis wrong?
+ getg().m.traceback = 2
+ print("runtime: bad pointer in frame ", funcname(f), " at ", pp, ": ", hex(p), "\n")
+ throw("invalid pointer found on stack")
+ }
+ if minp <= p && p < maxp {
+ if stackDebug >= 3 {
+ print("adjust ptr ", hex(p), " ", funcname(f), "\n")
+ }
+ if useCAS {
+ ppu := (*unsafe.Pointer)(unsafe.Pointer(pp))
+ if !atomic.Casp1(ppu, unsafe.Pointer(p), unsafe.Pointer(p+delta)) {
+ goto retry
+ }
+ } else {
+ *pp = p + delta
+ }
+ }
+ }
+ }
+}
+
+// Note: the argument/return area is adjusted by the callee.
+func adjustframe(frame *stkframe, adjinfo *adjustinfo) {
+ if frame.continpc == 0 {
+ // Frame is dead.
+ return
+ }
+ f := frame.fn
+ if stackDebug >= 2 {
+ print(" adjusting ", funcname(f), " frame=[", hex(frame.sp), ",", hex(frame.fp), "] pc=", hex(frame.pc), " continpc=", hex(frame.continpc), "\n")
+ }
+
+ // Adjust saved frame pointer if there is one.
+ if (goarch.ArchFamily == goarch.AMD64 || goarch.ArchFamily == goarch.ARM64) && frame.argp-frame.varp == 2*goarch.PtrSize {
+ if stackDebug >= 3 {
+ print(" saved bp\n")
+ }
+ if debugCheckBP {
+ // Frame pointers should always point to the next higher frame on
+ // the Go stack (or be nil, for the top frame on the stack).
+ bp := *(*uintptr)(unsafe.Pointer(frame.varp))
+ if bp != 0 && (bp < adjinfo.old.lo || bp >= adjinfo.old.hi) {
+ println("runtime: found invalid frame pointer")
+ print("bp=", hex(bp), " min=", hex(adjinfo.old.lo), " max=", hex(adjinfo.old.hi), "\n")
+ throw("bad frame pointer")
+ }
+ }
+ // On AMD64, this is the caller's frame pointer saved in the current
+ // frame.
+ // On ARM64, this is the frame pointer of the caller's caller saved
+ // by the caller in its frame (one word below its SP).
+ adjustpointer(adjinfo, unsafe.Pointer(frame.varp))
+ }
+
+ locals, args, objs := frame.getStackMap(&adjinfo.cache, true)
+
+ // Adjust local variables if stack frame has been allocated.
+ if locals.n > 0 {
+ size := uintptr(locals.n) * goarch.PtrSize
+ adjustpointers(unsafe.Pointer(frame.varp-size), &locals, adjinfo, f)
+ }
+
+ // Adjust arguments.
+ if args.n > 0 {
+ if stackDebug >= 3 {
+ print(" args\n")
+ }
+ adjustpointers(unsafe.Pointer(frame.argp), &args, adjinfo, funcInfo{})
+ }
+
+ // Adjust pointers in all stack objects (whether they are live or not).
+ // See comments in mgcmark.go:scanframeworker.
+ if frame.varp != 0 {
+ for i := range objs {
+ obj := &objs[i]
+ off := obj.off
+ base := frame.varp // locals base pointer
+ if off >= 0 {
+ base = frame.argp // arguments and return values base pointer
+ }
+ p := base + uintptr(off)
+ if p < frame.sp {
+ // Object hasn't been allocated in the frame yet.
+ // (Happens when the stack bounds check fails and
+ // we call into morestack.)
+ continue
+ }
+ ptrdata := obj.ptrdata()
+ gcdata := obj.gcdata()
+ var s *mspan
+ if obj.useGCProg() {
+ // See comments in mgcmark.go:scanstack
+ s = materializeGCProg(ptrdata, gcdata)
+ gcdata = (*byte)(unsafe.Pointer(s.startAddr))
+ }
+ for i := uintptr(0); i < ptrdata; i += goarch.PtrSize {
+ if *addb(gcdata, i/(8*goarch.PtrSize))>>(i/goarch.PtrSize&7)&1 != 0 {
+ adjustpointer(adjinfo, unsafe.Pointer(p+i))
+ }
+ }
+ if s != nil {
+ dematerializeGCProg(s)
+ }
+ }
+ }
+}
+
+func adjustctxt(gp *g, adjinfo *adjustinfo) {
+ adjustpointer(adjinfo, unsafe.Pointer(&gp.sched.ctxt))
+ if !framepointer_enabled {
+ return
+ }
+ if debugCheckBP {
+ bp := gp.sched.bp
+ if bp != 0 && (bp < adjinfo.old.lo || bp >= adjinfo.old.hi) {
+ println("runtime: found invalid top frame pointer")
+ print("bp=", hex(bp), " min=", hex(adjinfo.old.lo), " max=", hex(adjinfo.old.hi), "\n")
+ throw("bad top frame pointer")
+ }
+ }
+ oldfp := gp.sched.bp
+ adjustpointer(adjinfo, unsafe.Pointer(&gp.sched.bp))
+ if GOARCH == "arm64" {
+ // On ARM64, the frame pointer is saved one word *below* the SP,
+ // which is not copied or adjusted in any frame. Do it explicitly
+ // here.
+ if oldfp == gp.sched.sp-goarch.PtrSize {
+ memmove(unsafe.Pointer(gp.sched.bp), unsafe.Pointer(oldfp), goarch.PtrSize)
+ adjustpointer(adjinfo, unsafe.Pointer(gp.sched.bp))
+ }
+ }
+}
+
+func adjustdefers(gp *g, adjinfo *adjustinfo) {
+ // Adjust pointers in the Defer structs.
+ // We need to do this first because we need to adjust the
+ // defer.link fields so we always work on the new stack.
+ adjustpointer(adjinfo, unsafe.Pointer(&gp._defer))
+ for d := gp._defer; d != nil; d = d.link {
+ adjustpointer(adjinfo, unsafe.Pointer(&d.fn))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.sp))
+ adjustpointer(adjinfo, unsafe.Pointer(&d._panic))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.link))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.varp))
+ adjustpointer(adjinfo, unsafe.Pointer(&d.fd))
+ }
+}
+
+func adjustpanics(gp *g, adjinfo *adjustinfo) {
+ // Panics are on stack and already adjusted.
+ // Update pointer to head of list in G.
+ adjustpointer(adjinfo, unsafe.Pointer(&gp._panic))
+}
+
+func adjustsudogs(gp *g, adjinfo *adjustinfo) {
+ // the data elements pointed to by a SudoG structure
+ // might be in the stack.
+ for s := gp.waiting; s != nil; s = s.waitlink {
+ adjustpointer(adjinfo, unsafe.Pointer(&s.elem))
+ }
+}
+
+func fillstack(stk stack, b byte) {
+ for p := stk.lo; p < stk.hi; p++ {
+ *(*byte)(unsafe.Pointer(p)) = b
+ }
+}
+
+func findsghi(gp *g, stk stack) uintptr {
+ var sghi uintptr
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ p := uintptr(sg.elem) + uintptr(sg.c.elemsize)
+ if stk.lo <= p && p < stk.hi && p > sghi {
+ sghi = p
+ }
+ }
+ return sghi
+}
+
+// syncadjustsudogs adjusts gp's sudogs and copies the part of gp's
+// stack they refer to while synchronizing with concurrent channel
+// operations. It returns the number of bytes of stack copied.
+func syncadjustsudogs(gp *g, used uintptr, adjinfo *adjustinfo) uintptr {
+ if gp.waiting == nil {
+ return 0
+ }
+
+ // Lock channels to prevent concurrent send/receive.
+ var lastc *hchan
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc {
+ // There is a ranking cycle here between gscan bit and
+ // hchan locks. Normally, we only allow acquiring hchan
+ // locks and then getting a gscan bit. In this case, we
+ // already have the gscan bit. We allow acquiring hchan
+ // locks here as a special case, since a deadlock can't
+ // happen because the G involved must already be
+ // suspended. So, we get a special hchan lock rank here
+			// that is lower than gscan, but doesn't allow acquiring
+			// any locks other than hchan.
+ lockWithRank(&sg.c.lock, lockRankHchanLeaf)
+ }
+ lastc = sg.c
+ }
+
+ // Adjust sudogs.
+ adjustsudogs(gp, adjinfo)
+
+	// Copy the part of the stack the sudogs point into
+ // while holding the lock to prevent races on
+ // send/receive slots.
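+	// For example (illustrative values): with old.hi = 0xc000042000,
+	// used = 0x1000 and sghi = 0xc000041800, oldBot is 0xc000041000 and
+	// 0x800 bytes are copied to the new stack while the locks are held.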
+ var sgsize uintptr
+ if adjinfo.sghi != 0 {
+ oldBot := adjinfo.old.hi - used
+ newBot := oldBot + adjinfo.delta
+ sgsize = adjinfo.sghi - oldBot
+ memmove(unsafe.Pointer(newBot), unsafe.Pointer(oldBot), sgsize)
+ }
+
+ // Unlock channels.
+ lastc = nil
+ for sg := gp.waiting; sg != nil; sg = sg.waitlink {
+ if sg.c != lastc {
+ unlock(&sg.c.lock)
+ }
+ lastc = sg.c
+ }
+
+ return sgsize
+}
+
+// Copies gp's stack to a new stack of a different size.
+// Caller must have changed gp status to Gcopystack.
+func copystack(gp *g, newsize uintptr) {
+ if gp.syscallsp != 0 {
+ throw("stack growth not allowed in system call")
+ }
+ old := gp.stack
+ if old.lo == 0 {
+ throw("nil stackbase")
+ }
+ used := old.hi - gp.sched.sp
+ // Add just the difference to gcController.addScannableStack.
+ // g0 stacks never move, so this will never account for them.
+ // It's also fine if we have no P, addScannableStack can deal with
+ // that case.
+ gcController.addScannableStack(getg().m.p.ptr(), int64(newsize)-int64(old.hi-old.lo))
+
+ // allocate new stack
+ new := stackalloc(uint32(newsize))
+ if stackPoisonCopy != 0 {
+ fillstack(new, 0xfd)
+ }
+ if stackDebug >= 1 {
+ print("copystack gp=", gp, " [", hex(old.lo), " ", hex(old.hi-used), " ", hex(old.hi), "]", " -> [", hex(new.lo), " ", hex(new.hi-used), " ", hex(new.hi), "]/", newsize, "\n")
+ }
+
+ // Compute adjustment.
+ var adjinfo adjustinfo
+ adjinfo.old = old
+ adjinfo.delta = new.hi - old.hi
+
+ // Adjust sudogs, synchronizing with channel ops if necessary.
+ ncopy := used
+ if !gp.activeStackChans {
+ if newsize < old.hi-old.lo && gp.parkingOnChan.Load() {
+ // It's not safe for someone to shrink this stack while we're actively
+ // parking on a channel, but it is safe to grow since we do that
+ // ourselves and explicitly don't want to synchronize with channels
+ // since we could self-deadlock.
+ throw("racy sudog adjustment due to parking on channel")
+ }
+ adjustsudogs(gp, &adjinfo)
+ } else {
+		// sudogs may be pointing into the stack and gp has
+ // released channel locks, so other goroutines could
+ // be writing to gp's stack. Find the highest such
+ // pointer so we can handle everything there and below
+ // carefully. (This shouldn't be far from the bottom
+ // of the stack, so there's little cost in handling
+ // everything below it carefully.)
+ adjinfo.sghi = findsghi(gp, old)
+
+ // Synchronize with channel ops and copy the part of
+ // the stack they may interact with.
+ ncopy -= syncadjustsudogs(gp, used, &adjinfo)
+ }
+
+ // Copy the stack (or the rest of it) to the new location
+ memmove(unsafe.Pointer(new.hi-ncopy), unsafe.Pointer(old.hi-ncopy), ncopy)
+
+ // Adjust remaining structures that have pointers into stacks.
+ // We have to do most of these before we traceback the new
+ // stack because gentraceback uses them.
+ adjustctxt(gp, &adjinfo)
+ adjustdefers(gp, &adjinfo)
+ adjustpanics(gp, &adjinfo)
+ if adjinfo.sghi != 0 {
+ adjinfo.sghi += adjinfo.delta
+ }
+
+ // Swap out old stack for new one
+ gp.stack = new
+ gp.stackguard0 = new.lo + stackGuard // NOTE: might clobber a preempt request
+ gp.sched.sp = new.hi - used
+ gp.stktopsp += adjinfo.delta
+
+ // Adjust pointers in the new stack.
+ var u unwinder
+ for u.init(gp, 0); u.valid(); u.next() {
+ adjustframe(&u.frame, &adjinfo)
+ }
+
+ // free old stack
+ if stackPoisonCopy != 0 {
+ fillstack(old, 0xfc)
+ }
+ stackfree(old)
+}
+
+// round x up to a power of 2.
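+// For example, round2(1) == 1, round2(3) == 4, and round2(8) == 8.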
+func round2(x int32) int32 {
+ s := uint(0)
+ for 1<<s < x {
+ s++
+ }
+ return 1 << s
+}
+
+// Called from runtime·morestack when more stack is needed.
+// Allocate larger stack and relocate to new stack.
+// Stack growth is multiplicative, for constant amortized cost.
+//
+// g->atomicstatus will be Grunning or Gscanrunning upon entry.
+// If the scheduler is trying to stop this g, then it will set preemptStop.
+//
+// This must be nowritebarrierrec because it can be called as part of
+// stack growth from other nowritebarrierrec functions, but the
+// compiler doesn't check this.
+//
+//go:nowritebarrierrec
+func newstack() {
+ thisg := getg()
+ // TODO: double check all gp. shouldn't be getg().
+ if thisg.m.morebuf.g.ptr().stackguard0 == stackFork {
+ throw("stack growth after fork")
+ }
+ if thisg.m.morebuf.g.ptr() != thisg.m.curg {
+ print("runtime: newstack called from g=", hex(thisg.m.morebuf.g), "\n"+"\tm=", thisg.m, " m->curg=", thisg.m.curg, " m->g0=", thisg.m.g0, " m->gsignal=", thisg.m.gsignal, "\n")
+ morebuf := thisg.m.morebuf
+ traceback(morebuf.pc, morebuf.sp, morebuf.lr, morebuf.g.ptr())
+ throw("runtime: wrong goroutine in newstack")
+ }
+
+ gp := thisg.m.curg
+
+ if thisg.m.curg.throwsplit {
+ // Update syscallsp, syscallpc in case traceback uses them.
+ morebuf := thisg.m.morebuf
+ gp.syscallsp = morebuf.sp
+ gp.syscallpc = morebuf.pc
+ pcname, pcoff := "(unknown)", uintptr(0)
+ f := findfunc(gp.sched.pc)
+ if f.valid() {
+ pcname = funcname(f)
+ pcoff = gp.sched.pc - f.entry()
+ }
+ print("runtime: newstack at ", pcname, "+", hex(pcoff),
+ " sp=", hex(gp.sched.sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n",
+ "\tmorebuf={pc:", hex(morebuf.pc), " sp:", hex(morebuf.sp), " lr:", hex(morebuf.lr), "}\n",
+ "\tsched={pc:", hex(gp.sched.pc), " sp:", hex(gp.sched.sp), " lr:", hex(gp.sched.lr), " ctxt:", gp.sched.ctxt, "}\n")
+
+ thisg.m.traceback = 2 // Include runtime frames
+ traceback(morebuf.pc, morebuf.sp, morebuf.lr, gp)
+ throw("runtime: stack split at bad time")
+ }
+
+ morebuf := thisg.m.morebuf
+ thisg.m.morebuf.pc = 0
+ thisg.m.morebuf.lr = 0
+ thisg.m.morebuf.sp = 0
+ thisg.m.morebuf.g = 0
+
+ // NOTE: stackguard0 may change underfoot, if another thread
+ // is about to try to preempt gp. Read it just once and use that same
+ // value now and below.
+ stackguard0 := atomic.Loaduintptr(&gp.stackguard0)
+
+ // Be conservative about where we preempt.
+ // We are interested in preempting user Go code, not runtime code.
+ // If we're holding locks, mallocing, or preemption is disabled, don't
+ // preempt.
+ // This check is very early in newstack so that even the status change
+ // from Grunning to Gwaiting and back doesn't happen in this case.
+ // That status change by itself can be viewed as a small preemption,
+ // because the GC might change Gwaiting to Gscanwaiting, and then
+ // this goroutine has to wait for the GC to finish before continuing.
+ // If the GC is in some way dependent on this goroutine (for example,
+ // it needs a lock held by the goroutine), that small preemption turns
+ // into a real deadlock.
+ preempt := stackguard0 == stackPreempt
+ if preempt {
+ if !canPreemptM(thisg.m) {
+ // Let the goroutine keep running for now.
+ // gp->preempt is set, so it will be preempted next time.
+ gp.stackguard0 = gp.stack.lo + stackGuard
+ gogo(&gp.sched) // never return
+ }
+ }
+
+ if gp.stack.lo == 0 {
+ throw("missing stack in newstack")
+ }
+ sp := gp.sched.sp
+ if goarch.ArchFamily == goarch.AMD64 || goarch.ArchFamily == goarch.I386 || goarch.ArchFamily == goarch.WASM {
+ // The call to morestack cost a word.
+ sp -= goarch.PtrSize
+ }
+ if stackDebug >= 1 || sp < gp.stack.lo {
+ print("runtime: newstack sp=", hex(sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n",
+ "\tmorebuf={pc:", hex(morebuf.pc), " sp:", hex(morebuf.sp), " lr:", hex(morebuf.lr), "}\n",
+ "\tsched={pc:", hex(gp.sched.pc), " sp:", hex(gp.sched.sp), " lr:", hex(gp.sched.lr), " ctxt:", gp.sched.ctxt, "}\n")
+ }
+ if sp < gp.stack.lo {
+ print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->status=", hex(readgstatus(gp)), "\n ")
+ print("runtime: split stack overflow: ", hex(sp), " < ", hex(gp.stack.lo), "\n")
+ throw("runtime: split stack overflow")
+ }
+
+ if preempt {
+ if gp == thisg.m.g0 {
+ throw("runtime: preempt g0")
+ }
+ if thisg.m.p == 0 && thisg.m.locks == 0 {
+ throw("runtime: g is running but p is not")
+ }
+
+ if gp.preemptShrink {
+ // We're at a synchronous safe point now, so
+ // do the pending stack shrink.
+ gp.preemptShrink = false
+ shrinkstack(gp)
+ }
+
+ if gp.preemptStop {
+ preemptPark(gp) // never returns
+ }
+
+ // Act like goroutine called runtime.Gosched.
+ gopreempt_m(gp) // never return
+ }
+
+ // Allocate a bigger segment and move the stack.
+ oldsize := gp.stack.hi - gp.stack.lo
+ newsize := oldsize * 2
+
+ // Make sure we grow at least as much as needed to fit the new frame.
+ // (This is just an optimization - the caller of morestack will
+ // recheck the bounds on return.)
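+	// For example, an 8 KiB stack normally doubles to 16 KiB; if the largest
+	// frame this function can use plus the stack guard does not fit in the
+	// doubled stack after accounting for what is already in use, the loop
+	// below keeps doubling.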
+ if f := findfunc(gp.sched.pc); f.valid() {
+ max := uintptr(funcMaxSPDelta(f))
+ needed := max + stackGuard
+ used := gp.stack.hi - gp.sched.sp
+ for newsize-used < needed {
+ newsize *= 2
+ }
+ }
+
+ if stackguard0 == stackForceMove {
+ // Forced stack movement used for debugging.
+ // Don't double the stack (or we may quickly run out
+ // if this is done repeatedly).
+ newsize = oldsize
+ }
+
+ if newsize > maxstacksize || newsize > maxstackceiling {
+ if maxstacksize < maxstackceiling {
+ print("runtime: goroutine stack exceeds ", maxstacksize, "-byte limit\n")
+ } else {
+ print("runtime: goroutine stack exceeds ", maxstackceiling, "-byte limit\n")
+ }
+ print("runtime: sp=", hex(sp), " stack=[", hex(gp.stack.lo), ", ", hex(gp.stack.hi), "]\n")
+ throw("stack overflow")
+ }
+
+ // The goroutine must be executing in order to call newstack,
+ // so it must be Grunning (or Gscanrunning).
+ casgstatus(gp, _Grunning, _Gcopystack)
+
+ // The concurrent GC will not scan the stack while we are doing the copy since
+ // the gp is in a Gcopystack status.
+ copystack(gp, newsize)
+ if stackDebug >= 1 {
+ print("stack grow done\n")
+ }
+ casgstatus(gp, _Gcopystack, _Grunning)
+ gogo(&gp.sched)
+}
+
+//go:nosplit
+func nilfunc() {
+ *(*uint8)(nil) = 0
+}
+
+// adjust Gobuf as if it executed a call to fn
+// and then stopped before the first instruction in fn.
+func gostartcallfn(gobuf *gobuf, fv *funcval) {
+ var fn unsafe.Pointer
+ if fv != nil {
+ fn = unsafe.Pointer(fv.fn)
+ } else {
+ fn = unsafe.Pointer(abi.FuncPCABIInternal(nilfunc))
+ }
+ gostartcall(gobuf, fn, unsafe.Pointer(fv))
+}
+
+// isShrinkStackSafe returns whether it's safe to attempt to shrink
+// gp's stack. Shrinking the stack is only safe when we have precise
+// pointer maps for all frames on the stack.
+func isShrinkStackSafe(gp *g) bool {
+ // We can't copy the stack if we're in a syscall.
+ // The syscall might have pointers into the stack and
+ // often we don't have precise pointer maps for the innermost
+ // frames.
+ //
+ // We also can't copy the stack if we're at an asynchronous
+ // safe-point because we don't have precise pointer maps for
+ // all frames.
+ //
+ // We also can't *shrink* the stack in the window between the
+ // goroutine calling gopark to park on a channel and
+ // gp.activeStackChans being set.
+ return gp.syscallsp == 0 && !gp.asyncSafePoint && !gp.parkingOnChan.Load()
+}
+
+// Maybe shrink the stack being used by gp.
+//
+// gp must be stopped and we must own its stack. It may be in
+// _Grunning, but only if this is our own user G.
+func shrinkstack(gp *g) {
+ if gp.stack.lo == 0 {
+ throw("missing stack in shrinkstack")
+ }
+ if s := readgstatus(gp); s&_Gscan == 0 {
+ // We don't own the stack via _Gscan. We could still
+ // own it if this is our own user G and we're on the
+ // system stack.
+ if !(gp == getg().m.curg && getg() != getg().m.curg && s == _Grunning) {
+ // We don't own the stack.
+ throw("bad status in shrinkstack")
+ }
+ }
+ if !isShrinkStackSafe(gp) {
+ throw("shrinkstack at bad time")
+ }
+ // Check for self-shrinks while in a libcall. These may have
+ // pointers into the stack disguised as uintptrs, but these
+ // code paths should all be nosplit.
+ if gp == getg().m.curg && gp.m.libcallsp != 0 {
+ throw("shrinking stack in libcall")
+ }
+
+ if debug.gcshrinkstackoff > 0 {
+ return
+ }
+ f := findfunc(gp.startpc)
+ if f.valid() && f.funcID == abi.FuncID_gcBgMarkWorker {
+ // We're not allowed to shrink the gcBgMarkWorker
+ // stack (see gcBgMarkWorker for explanation).
+ return
+ }
+
+ oldsize := gp.stack.hi - gp.stack.lo
+ newsize := oldsize / 2
+ // Don't shrink the allocation below the minimum-sized stack
+ // allocation.
+ if newsize < fixedStack {
+ return
+ }
+ // Compute how much of the stack is currently in use and only
+ // shrink the stack if gp is using less than a quarter of its
+ // current stack. The currently used stack includes everything
+ // down to the SP plus the stack guard space that ensures
+ // there's room for nosplit functions.
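+	// For example, a 32 KiB stack is shrunk to 16 KiB only if less than
+	// 8 KiB (including the nosplit guard space) is currently in use.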
+ avail := gp.stack.hi - gp.stack.lo
+ if used := gp.stack.hi - gp.sched.sp + stackNosplit; used >= avail/4 {
+ return
+ }
+
+ if stackDebug > 0 {
+ print("shrinking stack ", oldsize, "->", newsize, "\n")
+ }
+
+ copystack(gp, newsize)
+}
+
+// freeStackSpans frees unused stack spans at the end of GC.
+func freeStackSpans() {
+ // Scan stack pools for empty stack spans.
+ for order := range stackpool {
+ lock(&stackpool[order].item.mu)
+ list := &stackpool[order].item.span
+ for s := list.first; s != nil; {
+ next := s.next
+ if s.allocCount == 0 {
+ list.remove(s)
+ s.manualFreeList = 0
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ }
+ s = next
+ }
+ unlock(&stackpool[order].item.mu)
+ }
+
+ // Free large stack spans.
+ lock(&stackLarge.lock)
+ for i := range stackLarge.free {
+ for s := stackLarge.free[i].first; s != nil; {
+ next := s.next
+ stackLarge.free[i].remove(s)
+ osStackFree(s)
+ mheap_.freeManual(s, spanAllocStack)
+ s = next
+ }
+ }
+ unlock(&stackLarge.lock)
+}
+
+// A stackObjectRecord is generated by the compiler for each stack object in a stack frame.
+// This record must match the generator code in cmd/compile/internal/liveness/plive.go:emitStackObjects.
+type stackObjectRecord struct {
+ // offset in frame
+ // if negative, offset from varp
+ // if non-negative, offset from argp
+ off int32
+ size int32
+	_ptrdata  int32  // ptrdata, or -ptrdata if a GC prog is used
+ gcdataoff uint32 // offset to gcdata from moduledata.rodata
+}
+
+func (r *stackObjectRecord) useGCProg() bool {
+ return r._ptrdata < 0
+}
+
+func (r *stackObjectRecord) ptrdata() uintptr {
+ x := r._ptrdata
+ if x < 0 {
+ return uintptr(-x)
+ }
+ return uintptr(x)
+}
+
+// gcdata returns pointer map or GC prog of the type.
+func (r *stackObjectRecord) gcdata() *byte {
+ ptr := uintptr(unsafe.Pointer(r))
+ var mod *moduledata
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.gofunc <= ptr && ptr < datap.end {
+ mod = datap
+ break
+ }
+ }
+ // If you get a panic here due to a nil mod,
+ // you may have made a copy of a stackObjectRecord.
+ // You must use the original pointer.
+ res := mod.rodata + uintptr(r.gcdataoff)
+ return (*byte)(unsafe.Pointer(res))
+}
+
+// This is exported as ABI0 via linkname so obj can call it.
+//
+//go:nosplit
+//go:linkname morestackc
+func morestackc() {
+ throw("attempt to execute system stack code on user stack")
+}
+
+// startingStackSize is the amount of stack that new goroutines start with.
+// It is a power of 2, and between fixedStack and maxstacksize, inclusive.
+// startingStackSize is updated every GC by tracking the average size of
+// stacks scanned during the GC.
+var startingStackSize uint32 = fixedStack
+
+func gcComputeStartingStackSize() {
+ if debug.adaptivestackstart == 0 {
+ return
+ }
+ // For details, see the design doc at
+ // https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing
+ // The basic algorithm is to track the average size of stacks
+ // and start goroutines with stack equal to that average size.
+ // Starting at the average size uses at most 2x the space that
+ // an ideal algorithm would have used.
+ // This is just a heuristic to avoid excessive stack growth work
+ // early in a goroutine's lifetime. See issue 18138. Stacks that
+ // are allocated too small can still grow, and stacks allocated
+ // too large can still shrink.
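+	// Illustrative example: 10 MiB of scanned stack over 2560 scanned stacks
+	// averages 4 KiB per stack; after adding stackGuard (a few hundred bytes
+	// on typical builds) and rounding up to a power of 2, new goroutines
+	// would start with 8 KiB stacks.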
+ var scannedStackSize uint64
+ var scannedStacks uint64
+ for _, p := range allp {
+ scannedStackSize += p.scannedStackSize
+ scannedStacks += p.scannedStacks
+ // Reset for next time
+ p.scannedStackSize = 0
+ p.scannedStacks = 0
+ }
+ if scannedStacks == 0 {
+ startingStackSize = fixedStack
+ return
+ }
+ avg := scannedStackSize/scannedStacks + stackGuard
+ // Note: we add stackGuard to ensure that a goroutine that
+ // uses the average space will not trigger a growth.
+ if avg > uint64(maxstacksize) {
+ avg = uint64(maxstacksize)
+ }
+ if avg < fixedStack {
+ avg = fixedStack
+ }
+ // Note: maxstacksize fits in 30 bits, so avg also does.
+ startingStackSize = uint32(round2(int32(avg)))
+}
diff --git a/src/runtime/stack_test.go b/src/runtime/stack_test.go
new file mode 100644
index 0000000..600e80d
--- /dev/null
+++ b/src/runtime/stack_test.go
@@ -0,0 +1,958 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/testenv"
+ "reflect"
+ "regexp"
+ . "runtime"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "testing"
+ "time"
+ _ "unsafe" // for go:linkname
+)
+
+// TestStackMem measures per-thread stack segment cache behavior.
+// The test consumed up to 500MB in the past.
+func TestStackMem(t *testing.T) {
+ const (
+ BatchSize = 32
+ BatchCount = 256
+ ArraySize = 1024
+ RecursionDepth = 128
+ )
+ if testing.Short() {
+ return
+ }
+ defer GOMAXPROCS(GOMAXPROCS(BatchSize))
+ s0 := new(MemStats)
+ ReadMemStats(s0)
+ for b := 0; b < BatchCount; b++ {
+ c := make(chan bool, BatchSize)
+ for i := 0; i < BatchSize; i++ {
+ go func() {
+ var f func(k int, a [ArraySize]byte)
+ f = func(k int, a [ArraySize]byte) {
+ if k == 0 {
+ time.Sleep(time.Millisecond)
+ return
+ }
+ f(k-1, a)
+ }
+ f(RecursionDepth, [ArraySize]byte{})
+ c <- true
+ }()
+ }
+ for i := 0; i < BatchSize; i++ {
+ <-c
+ }
+
+ // The goroutines have signaled via c that they are ready to exit.
+ // Give them a chance to exit by sleeping. If we don't wait, we
+ // might not reuse them on the next batch.
+ time.Sleep(10 * time.Millisecond)
+ }
+ s1 := new(MemStats)
+ ReadMemStats(s1)
+ consumed := int64(s1.StackSys - s0.StackSys)
+ t.Logf("Consumed %vMB for stack mem", consumed>>20)
+ estimate := int64(8 * BatchSize * ArraySize * RecursionDepth) // 8 is to reduce flakiness.
+ if consumed > estimate {
+ t.Fatalf("Stack mem: want %v, got %v", estimate, consumed)
+ }
+ // Due to broken stack memory accounting (https://golang.org/issue/7468),
+ // StackInuse can decrease during function execution, so we cast the values to int64.
+ inuse := int64(s1.StackInuse) - int64(s0.StackInuse)
+ t.Logf("Inuse %vMB for stack mem", inuse>>20)
+ if inuse > 4<<20 {
+ t.Fatalf("Stack inuse: want %v, got %v", 4<<20, inuse)
+ }
+}
+
+// Test stack growing in different contexts.
+func TestStackGrowth(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+
+ var wg sync.WaitGroup
+
+ // in a normal goroutine
+ var growDuration time.Duration // For debugging failures
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ start := time.Now()
+ growStack(nil)
+ growDuration = time.Since(start)
+ }()
+ wg.Wait()
+ t.Log("first growStack took", growDuration)
+
+ // in locked goroutine
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ LockOSThread()
+ growStack(nil)
+ UnlockOSThread()
+ }()
+ wg.Wait()
+
+ // in finalizer
+ var finalizerStart time.Time
+ var started atomic.Bool
+ var progress atomic.Uint32
+ wg.Add(1)
+ s := new(string) // Must be of a type that avoids the tiny allocator, or else the finalizer might not run.
+ SetFinalizer(s, func(ss *string) {
+ defer wg.Done()
+ finalizerStart = time.Now()
+ started.Store(true)
+ growStack(&progress)
+ })
+ setFinalizerTime := time.Now()
+ s = nil
+
+ if d, ok := t.Deadline(); ok {
+ // Pad the timeout by an arbitrary 5% to give the AfterFunc time to run.
+ timeout := time.Until(d) * 19 / 20
+ timer := time.AfterFunc(timeout, func() {
+ // Panic — instead of calling t.Error and returning from the test — so
+ // that we get a useful goroutine dump if the test times out, especially
+ // if GOTRACEBACK=system or GOTRACEBACK=crash is set.
+ if !started.Load() {
+ panic("finalizer did not start")
+ } else {
+ panic(fmt.Sprintf("finalizer started %s ago (%s after registration) and ran %d iterations, but did not return", time.Since(finalizerStart), finalizerStart.Sub(setFinalizerTime), progress.Load()))
+ }
+ })
+ defer timer.Stop()
+ }
+
+ GC()
+ wg.Wait()
+ t.Logf("finalizer started after %s and ran %d iterations in %v", finalizerStart.Sub(setFinalizerTime), progress.Load(), time.Since(finalizerStart))
+}
+
+// ... and in init
+//func init() {
+// growStack()
+//}
+
+func growStack(progress *atomic.Uint32) {
+ n := 1 << 10
+ if testing.Short() {
+ n = 1 << 8
+ }
+ for i := 0; i < n; i++ {
+ x := 0
+ growStackIter(&x, i)
+ if x != i+1 {
+ panic("stack is corrupted")
+ }
+ if progress != nil {
+ progress.Store(uint32(i))
+ }
+ }
+ GC()
+}
+
+// This function is not an anonymous func, so that the compiler can do escape
+// analysis and place x on the stack (so that stack growth can subsequently update the pointer).
+func growStackIter(p *int, n int) {
+ if n == 0 {
+ *p = n + 1
+ GC()
+ return
+ }
+ *p = n + 1
+ x := 0
+ growStackIter(&x, n-1)
+ if x != n {
+ panic("stack is corrupted")
+ }
+}
+
+func TestStackGrowthCallback(t *testing.T) {
+ t.Parallel()
+ var wg sync.WaitGroup
+
+ // test stack growth at chan op
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ c := make(chan int, 1)
+ growStackWithCallback(func() {
+ c <- 1
+ <-c
+ })
+ }()
+
+ // test stack growth at map op
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ m := make(map[int]int)
+ growStackWithCallback(func() {
+ _, _ = m[1]
+ m[1] = 1
+ })
+ }()
+
+ // test stack growth at goroutine creation
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ growStackWithCallback(func() {
+ done := make(chan bool)
+ go func() {
+ done <- true
+ }()
+ <-done
+ })
+ }()
+ wg.Wait()
+}
+
+func growStackWithCallback(cb func()) {
+ var f func(n int)
+ f = func(n int) {
+ if n == 0 {
+ cb()
+ return
+ }
+ f(n - 1)
+ }
+ for i := 0; i < 1<<10; i++ {
+ f(i)
+ }
+}
+
+// TestDeferPtrs tests the adjustment of Defer's argument pointers (p aka &y)
+// during a stack copy.
+func set(p *int, x int) {
+ *p = x
+}
+func TestDeferPtrs(t *testing.T) {
+ var y int
+
+ defer func() {
+ if y != 42 {
+ t.Errorf("defer's stack references were not adjusted appropriately")
+ }
+ }()
+ defer set(&y, 42)
+ growStack(nil)
+}
+
+type bigBuf [4 * 1024]byte
+
+// TestDeferPtrsGoexit is like TestDeferPtrs but exercises the possibility that the
+// stack grows as part of starting the deferred function. It calls Goexit at various
+// stack depths, forcing the deferred function (with >4kB of args) to be run at
+// the bottom of the stack. The goal is to find a stack depth less than 4kB from
+// the end of the stack. Each trial runs in a different goroutine so that an earlier
+// stack growth does not invalidate a later attempt.
+func TestDeferPtrsGoexit(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ c := make(chan int, 1)
+ go testDeferPtrsGoexit(c, i)
+ if n := <-c; n != 42 {
+ t.Fatalf("defer's stack references were not adjusted appropriately (i=%d n=%d)", i, n)
+ }
+ }
+}
+
+func testDeferPtrsGoexit(c chan int, i int) {
+ var y int
+ defer func() {
+ c <- y
+ }()
+ defer setBig(&y, 42, bigBuf{})
+ useStackAndCall(i, Goexit)
+}
+
+func setBig(p *int, x int, b bigBuf) {
+ *p = x
+}
+
+// TestDeferPtrsPanic is like TestDeferPtrsGoexit, but it's using panic instead
+// of Goexit to run the Defers. Those two are different execution paths
+// in the runtime.
+func TestDeferPtrsPanic(t *testing.T) {
+ for i := 0; i < 100; i++ {
+ c := make(chan int, 1)
+		go testDeferPtrsPanic(c, i)
+ if n := <-c; n != 42 {
+ t.Fatalf("defer's stack references were not adjusted appropriately (i=%d n=%d)", i, n)
+ }
+ }
+}
+
+func testDeferPtrsPanic(c chan int, i int) {
+ var y int
+ defer func() {
+ if recover() == nil {
+ c <- -1
+ return
+ }
+ c <- y
+ }()
+ defer setBig(&y, 42, bigBuf{})
+ useStackAndCall(i, func() { panic(1) })
+}
+
+//go:noinline
+func testDeferLeafSigpanic1() {
+ // Cause a sigpanic to be injected in this frame.
+ //
+ // This function has to be declared before
+ // TestDeferLeafSigpanic so the runtime will crash if we think
+ // this function's continuation PC is in
+ // TestDeferLeafSigpanic.
+ *(*int)(nil) = 0
+}
+
+// TestDeferLeafSigpanic tests defer matching around leaf functions
+// that sigpanic. This is tricky because on LR machines the outer
+// function and the inner function have the same SP, but it's critical
+// that we match up the defer correctly to get the right liveness map.
+// See issue #25499.
+func TestDeferLeafSigpanic(t *testing.T) {
+ // Push a defer that will walk the stack.
+ defer func() {
+ if err := recover(); err == nil {
+ t.Fatal("expected panic from nil pointer")
+ }
+ GC()
+ }()
+ // Call a leaf function. We must set up the exact call stack:
+ //
+ // deferring function -> leaf function -> sigpanic
+ //
+ // On LR machines, the leaf function will have the same SP as
+ // the SP pushed for the defer frame.
+ testDeferLeafSigpanic1()
+}
+
+// TestPanicUseStack checks that a chain of Panic structs on the stack are
+// updated correctly if the stack grows during the deferred execution that
+// happens as a result of the panic.
+func TestPanicUseStack(t *testing.T) {
+ pc := make([]uintptr, 10000)
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ useStackAndCall(100, func() {
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ useStackAndCall(200, func() {
+ defer func() {
+ recover()
+ Callers(0, pc) // force stack walk
+ }()
+ panic(3)
+ })
+ }()
+ panic(2)
+ })
+ }()
+ panic(1)
+}
+
+func TestPanicFar(t *testing.T) {
+ var xtree *xtreeNode
+ pc := make([]uintptr, 10000)
+ defer func() {
+ // At this point we created a large stack and unwound
+ // it via recovery. Force a stack walk, which will
+ // check the stack's consistency.
+ Callers(0, pc)
+ }()
+ defer func() {
+ recover()
+ }()
+ useStackAndCall(100, func() {
+ // Kick off the GC and make it do something nontrivial.
+ // (This used to force stack barriers to stick around.)
+ xtree = makeTree(18)
+ // Give the GC time to start scanning stacks.
+ time.Sleep(time.Millisecond)
+ panic(1)
+ })
+ _ = xtree
+}
+
+type xtreeNode struct {
+ l, r *xtreeNode
+}
+
+func makeTree(d int) *xtreeNode {
+ if d == 0 {
+ return new(xtreeNode)
+ }
+ return &xtreeNode{makeTree(d - 1), makeTree(d - 1)}
+}
+
+// use about n KB of stack and call f
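+// For example, useStackAndCall(3, f) consumes roughly 3 KB of stack before calling f.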
+func useStackAndCall(n int, f func()) {
+ if n == 0 {
+ f()
+ return
+ }
+ var b [1024]byte // makes frame about 1KB
+ useStackAndCall(n-1+int(b[99]), f)
+}
+
+func useStack(n int) {
+ useStackAndCall(n, func() {})
+}
+
+func growing(c chan int, done chan struct{}) {
+ for n := range c {
+ useStack(n)
+ done <- struct{}{}
+ }
+ done <- struct{}{}
+}
+
+func TestStackCache(t *testing.T) {
+ // Allocate a bunch of goroutines and grow their stacks.
+ // Repeat a few times to test the stack cache.
+ const (
+ R = 4
+ G = 200
+ S = 5
+ )
+ for i := 0; i < R; i++ {
+ var reqchans [G]chan int
+ done := make(chan struct{})
+ for j := 0; j < G; j++ {
+ reqchans[j] = make(chan int)
+ go growing(reqchans[j], done)
+ }
+ for s := 0; s < S; s++ {
+ for j := 0; j < G; j++ {
+ reqchans[j] <- 1 << uint(s)
+ }
+ for j := 0; j < G; j++ {
+ <-done
+ }
+ }
+ for j := 0; j < G; j++ {
+ close(reqchans[j])
+ }
+ for j := 0; j < G; j++ {
+ <-done
+ }
+ }
+}
+
+func TestStackOutput(t *testing.T) {
+ b := make([]byte, 1024)
+ stk := string(b[:Stack(b, false)])
+ if !strings.HasPrefix(stk, "goroutine ") {
+ t.Errorf("Stack (len %d):\n%s", len(stk), stk)
+ t.Errorf("Stack output should begin with \"goroutine \"")
+ }
+}
+
+func TestStackAllOutput(t *testing.T) {
+ b := make([]byte, 1024)
+ stk := string(b[:Stack(b, true)])
+ if !strings.HasPrefix(stk, "goroutine ") {
+ t.Errorf("Stack (len %d):\n%s", len(stk), stk)
+ t.Errorf("Stack output should begin with \"goroutine \"")
+ }
+}
+
+func TestStackPanic(t *testing.T) {
+ // Test that stack copying copies panics correctly. This is difficult
+ // to test because it is very unlikely that the stack will be copied
+ // in the middle of gopanic. But it can happen.
+ // To make this test effective, edit panic.go:gopanic and uncomment
+ // the GC() call just before freedefer(d).
+ defer func() {
+ if x := recover(); x == nil {
+ t.Errorf("recover failed")
+ }
+ }()
+ useStack(32)
+ panic("test panic")
+}
+
+func BenchmarkStackCopyPtr(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ i := 1000000
+ countp(&i)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func countp(n *int) {
+ if *n == 0 {
+ return
+ }
+ *n--
+ countp(n)
+}
+
+func BenchmarkStackCopy(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ count(1000000)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func count(n int) int {
+ if n == 0 {
+ return 0
+ }
+ return 1 + count(n-1)
+}
+
+func BenchmarkStackCopyNoCache(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ count1(1000000)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func count1(n int) int {
+ if n <= 0 {
+ return 0
+ }
+ return 1 + count2(n-1)
+}
+
+func count2(n int) int { return 1 + count3(n-1) }
+func count3(n int) int { return 1 + count4(n-1) }
+func count4(n int) int { return 1 + count5(n-1) }
+func count5(n int) int { return 1 + count6(n-1) }
+func count6(n int) int { return 1 + count7(n-1) }
+func count7(n int) int { return 1 + count8(n-1) }
+func count8(n int) int { return 1 + count9(n-1) }
+func count9(n int) int { return 1 + count10(n-1) }
+func count10(n int) int { return 1 + count11(n-1) }
+func count11(n int) int { return 1 + count12(n-1) }
+func count12(n int) int { return 1 + count13(n-1) }
+func count13(n int) int { return 1 + count14(n-1) }
+func count14(n int) int { return 1 + count15(n-1) }
+func count15(n int) int { return 1 + count16(n-1) }
+func count16(n int) int { return 1 + count17(n-1) }
+func count17(n int) int { return 1 + count18(n-1) }
+func count18(n int) int { return 1 + count19(n-1) }
+func count19(n int) int { return 1 + count20(n-1) }
+func count20(n int) int { return 1 + count21(n-1) }
+func count21(n int) int { return 1 + count22(n-1) }
+func count22(n int) int { return 1 + count23(n-1) }
+func count23(n int) int { return 1 + count1(n-1) }
+
+type stkobjT struct {
+ p *stkobjT
+ x int64
+ y [20]int // consume some stack
+}
+
+// Sum creates a linked list of stkobjTs.
+func Sum(n int64, p *stkobjT) {
+ if n == 0 {
+ return
+ }
+ s := stkobjT{p: p, x: n}
+ Sum(n-1, &s)
+ p.x += s.x
+}
+
+func BenchmarkStackCopyWithStkobj(b *testing.B) {
+ c := make(chan bool)
+ for i := 0; i < b.N; i++ {
+ go func() {
+ var s stkobjT
+ Sum(100000, &s)
+ c <- true
+ }()
+ <-c
+ }
+}
+
+func BenchmarkIssue18138(b *testing.B) {
+ // Channel with N "can run a goroutine" tokens
+ const N = 10
+ c := make(chan []byte, N)
+ for i := 0; i < N; i++ {
+ c <- make([]byte, 1)
+ }
+
+ for i := 0; i < b.N; i++ {
+ <-c // get token
+ go func() {
+ useStackPtrs(1000, false) // uses ~1MB max
+ m := make([]byte, 8192) // make GC trigger occasionally
+ c <- m // return token
+ }()
+ }
+}
+
+func useStackPtrs(n int, b bool) {
+ if b {
+ // This code contributes to the stack frame size, and hence to the
+ // stack copying cost. But since b is always false, it costs no
+ // execution time (not even the zeroing of a).
+ var a [128]*int // 1KB of pointers
+ a[n] = &n
+ n = *a[0]
+ }
+ if n == 0 {
+ return
+ }
+ useStackPtrs(n-1, b)
+}
+
+type structWithMethod struct{}
+
+func (s structWithMethod) caller() string {
+ _, file, line, ok := Caller(1)
+ if !ok {
+ panic("Caller failed")
+ }
+ return fmt.Sprintf("%s:%d", file, line)
+}
+
+func (s structWithMethod) callers() []uintptr {
+ pc := make([]uintptr, 16)
+ return pc[:Callers(0, pc)]
+}
+
+func (s structWithMethod) stack() string {
+ buf := make([]byte, 4<<10)
+ return string(buf[:Stack(buf, false)])
+}
+
+func (s structWithMethod) nop() {}
+
+func (s structWithMethod) inlinablePanic() { panic("panic") }
+
+func TestStackWrapperCaller(t *testing.T) {
+ var d structWithMethod
+ // Force the compiler to construct a wrapper method.
+ wrapper := (*structWithMethod).caller
+ // Check that the wrapper doesn't affect the stack trace.
+ if dc, ic := d.caller(), wrapper(&d); dc != ic {
+ t.Fatalf("direct caller %q != indirect caller %q", dc, ic)
+ }
+}
+
+func TestStackWrapperCallers(t *testing.T) {
+ var d structWithMethod
+ wrapper := (*structWithMethod).callers
+ // Check that <autogenerated> doesn't appear in the stack trace.
+ pcs := wrapper(&d)
+ frames := CallersFrames(pcs)
+ for {
+ fr, more := frames.Next()
+ if fr.File == "<autogenerated>" {
+ t.Fatalf("<autogenerated> appears in stack trace: %+v", fr)
+ }
+ if !more {
+ break
+ }
+ }
+}
+
+func TestStackWrapperStack(t *testing.T) {
+ var d structWithMethod
+ wrapper := (*structWithMethod).stack
+ // Check that <autogenerated> doesn't appear in the stack trace.
+ stk := wrapper(&d)
+ if strings.Contains(stk, "<autogenerated>") {
+ t.Fatalf("<autogenerated> appears in stack trace:\n%s", stk)
+ }
+}
+
+func TestStackWrapperStackInlinePanic(t *testing.T) {
+ // Test that inline unwinding correctly tracks the callee by creating a
+ // stack of the form wrapper -> inlined function -> panic. If we mess up
+ // callee tracking, it will look like the wrapper called panic and we'll see
+ // the wrapper in the stack trace.
+ var d structWithMethod
+ wrapper := (*structWithMethod).inlinablePanic
+ defer func() {
+ err := recover()
+ if err == nil {
+ t.Fatalf("expected panic")
+ }
+ buf := make([]byte, 4<<10)
+ stk := string(buf[:Stack(buf, false)])
+ if strings.Contains(stk, "<autogenerated>") {
+ t.Fatalf("<autogenerated> appears in stack trace:\n%s", stk)
+ }
+ // Self-check: make sure inlinablePanic got inlined.
+ if !testenv.OptimizationOff() {
+ if !strings.Contains(stk, "inlinablePanic(...)") {
+ t.Fatalf("inlinablePanic not inlined")
+ }
+ }
+ }()
+ wrapper(&d)
+}
+
+type I interface {
+ M()
+}
+
+func TestStackWrapperStackPanic(t *testing.T) {
+ t.Run("sigpanic", func(t *testing.T) {
+ // nil calls to interface methods cause a sigpanic.
+ testStackWrapperPanic(t, func() { I.M(nil) }, "runtime_test.I.M")
+ })
+ t.Run("panicwrap", func(t *testing.T) {
+ // Nil calls to value method wrappers call panicwrap.
+ wrapper := (*structWithMethod).nop
+ testStackWrapperPanic(t, func() { wrapper(nil) }, "runtime_test.(*structWithMethod).nop")
+ })
+}
+
+func testStackWrapperPanic(t *testing.T, cb func(), expect string) {
+ // Test that the stack trace from a panicking wrapper includes
+	// the wrapper, even though we elide these when they don't panic.
+ t.Run("CallersFrames", func(t *testing.T) {
+ defer func() {
+ err := recover()
+ if err == nil {
+ t.Fatalf("expected panic")
+ }
+ pcs := make([]uintptr, 10)
+ n := Callers(0, pcs)
+ frames := CallersFrames(pcs[:n])
+ for {
+ frame, more := frames.Next()
+ t.Log(frame.Function)
+ if frame.Function == expect {
+ return
+ }
+ if !more {
+ break
+ }
+ }
+ t.Fatalf("panicking wrapper %s missing from stack trace", expect)
+ }()
+ cb()
+ })
+ t.Run("Stack", func(t *testing.T) {
+ defer func() {
+ err := recover()
+ if err == nil {
+ t.Fatalf("expected panic")
+ }
+ buf := make([]byte, 4<<10)
+ stk := string(buf[:Stack(buf, false)])
+ if !strings.Contains(stk, "\n"+expect) {
+ t.Fatalf("panicking wrapper %s missing from stack trace:\n%s", expect, stk)
+ }
+ }()
+ cb()
+ })
+}
+
+func TestCallersFromWrapper(t *testing.T) {
+ // Test that invoking CallersFrames on a stack where the first
+ // PC is an autogenerated wrapper keeps the wrapper in the
+ // trace. Normally we elide these, assuming that the wrapper
+ // calls the thing you actually wanted to see, but in this
+ // case we need to keep it.
+ pc := reflect.ValueOf(I.M).Pointer()
+ frames := CallersFrames([]uintptr{pc})
+ frame, more := frames.Next()
+ if frame.Function != "runtime_test.I.M" {
+ t.Fatalf("want function %s, got %s", "runtime_test.I.M", frame.Function)
+ }
+ if more {
+ t.Fatalf("want 1 frame, got > 1")
+ }
+}
+
+func TestTracebackSystemstack(t *testing.T) {
+ if GOARCH == "ppc64" || GOARCH == "ppc64le" {
+ t.Skip("systemstack tail call not implemented on ppc64x")
+ }
+
+ // Test that profiles correctly jump over systemstack,
+ // including nested systemstack calls.
+ pcs := make([]uintptr, 20)
+ pcs = pcs[:TracebackSystemstack(pcs, 5)]
+ // Check that runtime.TracebackSystemstack appears five times
+ // and that we see TestTracebackSystemstack.
+ countIn, countOut := 0, 0
+ frames := CallersFrames(pcs)
+ var tb strings.Builder
+ for {
+ frame, more := frames.Next()
+ fmt.Fprintf(&tb, "\n%s+0x%x %s:%d", frame.Function, frame.PC-frame.Entry, frame.File, frame.Line)
+ switch frame.Function {
+ case "runtime.TracebackSystemstack":
+ countIn++
+ case "runtime_test.TestTracebackSystemstack":
+ countOut++
+ }
+ if !more {
+ break
+ }
+ }
+ if countIn != 5 || countOut != 1 {
+ t.Fatalf("expected 5 calls to TracebackSystemstack and 1 call to TestTracebackSystemstack, got:%s", tb.String())
+ }
+}
+
+func TestTracebackAncestors(t *testing.T) {
+ goroutineRegex := regexp.MustCompile(`goroutine [0-9]+ \[`)
+ for _, tracebackDepth := range []int{0, 1, 5, 50} {
+ output := runTestProg(t, "testprog", "TracebackAncestors", fmt.Sprintf("GODEBUG=tracebackancestors=%d", tracebackDepth))
+
+ numGoroutines := 3
+ numFrames := 2
+ ancestorsExpected := numGoroutines
+ if numGoroutines > tracebackDepth {
+ ancestorsExpected = tracebackDepth
+ }
+
+ matches := goroutineRegex.FindAllStringSubmatch(output, -1)
+ if len(matches) != 2 {
+ t.Fatalf("want 2 goroutines, got:\n%s", output)
+ }
+
+ // Check functions in the traceback.
+ fns := []string{"main.recurseThenCallGo", "main.main", "main.printStack", "main.TracebackAncestors"}
+ for _, fn := range fns {
+ if !strings.Contains(output, "\n"+fn+"(") {
+ t.Fatalf("expected %q function in traceback:\n%s", fn, output)
+ }
+ }
+
+ if want, count := "originating from goroutine", ancestorsExpected; strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+
+ if want, count := "main.recurseThenCallGo(...)", ancestorsExpected*(numFrames+1); strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+
+ if want, count := "main.recurseThenCallGo(0x", 1; strings.Count(output, want) != count {
+ t.Errorf("output does not contain %d instances of %q:\n%s", count, want, output)
+ }
+ }
+}
+
+// Test that defer closure is correctly scanned when the stack is scanned.
+func TestDeferLiveness(t *testing.T) {
+ output := runTestProg(t, "testprog", "DeferLiveness", "GODEBUG=clobberfree=1")
+ if output != "" {
+ t.Errorf("output:\n%s\n\nwant no output", output)
+ }
+}
+
+func TestDeferHeapAndStack(t *testing.T) {
+ P := 4 // processors
+	N := 10000 // iterations
+ D := 200 // stack depth
+
+ if testing.Short() {
+ P /= 2
+ N /= 10
+ D /= 10
+ }
+ c := make(chan bool)
+ for p := 0; p < P; p++ {
+ go func() {
+ for i := 0; i < N; i++ {
+ if deferHeapAndStack(D) != 2*D {
+ panic("bad result")
+ }
+ }
+ c <- true
+ }()
+ }
+ for p := 0; p < P; p++ {
+ <-c
+ }
+}
+
+// deferHeapAndStack(n) computes 2*n
+func deferHeapAndStack(n int) (r int) {
+ if n == 0 {
+ return 0
+ }
+ if n%2 == 0 {
+ // heap-allocated defers
+ for i := 0; i < 2; i++ {
+ defer func() {
+ r++
+ }()
+ }
+ } else {
+ // stack-allocated defers
+ defer func() {
+ r++
+ }()
+ defer func() {
+ r++
+ }()
+ }
+ r = deferHeapAndStack(n - 1)
+ escapeMe(new([1024]byte)) // force some GCs
+ return
+}
+
+// Pass a value to escapeMe to force it to escape.
+var escapeMe = func(x any) {}
+
+func TestFramePointerAdjust(t *testing.T) {
+ switch GOARCH {
+ case "amd64", "arm64":
+ default:
+ t.Skipf("frame pointer is not supported on %s", GOARCH)
+ }
+ output := runTestProg(t, "testprog", "FramePointerAdjust")
+ if output != "" {
+ t.Errorf("output:\n%s\n\nwant no output", output)
+ }
+}
+
+// TestSystemstackFramePointerAdjust is a regression test for issue 59692 that
+// ensures that the frame pointer of systemstack is correctly adjusted. See CL
+// 489015 for more details.
+func TestSystemstackFramePointerAdjust(t *testing.T) {
+ growAndShrinkStack(512, [1024]byte{})
+}
+
+// growAndShrinkStack grows the stack of the current goroutine in order to
+// shrink it again and verify that all frame pointers on the new stack have
+// been correctly adjusted. stackBallast is used to ensure we're not depending
+// on the current heuristics of stack shrinking too much.
+func growAndShrinkStack(n int, stackBallast [1024]byte) {
+ if n <= 0 {
+ return
+ }
+ growAndShrinkStack(n-1, stackBallast)
+ ShrinkStackAndVerifyFramePointers()
+}
diff --git a/src/runtime/start_line_amd64_test.go b/src/runtime/start_line_amd64_test.go
new file mode 100644
index 0000000..305ed0b
--- /dev/null
+++ b/src/runtime/start_line_amd64_test.go
@@ -0,0 +1,23 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime/internal/startlinetest"
+ "testing"
+)
+
+// TestStartLineAsm tests the start line metadata of an assembly function. This
+// is only tested on amd64 to avoid the need for a proliferation of per-arch
+// copies of this function.
+func TestStartLineAsm(t *testing.T) {
+ startlinetest.CallerStartLine = callerStartLine
+
+ const wantLine = 23
+ got := startlinetest.AsmFunc()
+ if got != wantLine {
+ t.Errorf("start line got %d want %d", got, wantLine)
+ }
+}
diff --git a/src/runtime/start_line_test.go b/src/runtime/start_line_test.go
new file mode 100644
index 0000000..0762351
--- /dev/null
+++ b/src/runtime/start_line_test.go
@@ -0,0 +1,138 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/testenv"
+ "runtime"
+ "testing"
+)
+
+// The tests in this file test the function start line metadata included in
+// _func and inlinedCall. TestStartLine hard-codes the start lines of functions
+// in this file. If code moves, the test will need to be updated.
+//
+// The "start line" of a function should be the line containing the func
+// keyword.
+
+func normalFunc() int {
+ return callerStartLine(false)
+}
+
+func multilineDeclarationFunc() int {
+ return multilineDeclarationFunc1(0, 0, 0)
+}
+
+//go:noinline
+func multilineDeclarationFunc1(
+ a, b, c int) int {
+ return callerStartLine(false)
+}
+
+func blankLinesFunc() int {
+
+ // Some
+ // lines
+ // without
+ // code
+
+ return callerStartLine(false)
+}
+
+func inlineFunc() int {
+ return inlineFunc1()
+}
+
+func inlineFunc1() int {
+ return callerStartLine(true)
+}
+
+var closureFn func() int
+
+func normalClosure() int {
+ // Assign to global to ensure this isn't inlined.
+ closureFn = func() int {
+ return callerStartLine(false)
+ }
+ return closureFn()
+}
+
+func inlineClosure() int {
+ return func() int {
+ return callerStartLine(true)
+ }()
+}
+
+func TestStartLine(t *testing.T) {
+ // We test inlined vs non-inlined variants. We can't do that if
+ // optimizations are disabled.
+ testenv.SkipIfOptimizationOff(t)
+
+ testCases := []struct {
+ name string
+ fn func() int
+ want int
+ }{
+ {
+ name: "normal",
+ fn: normalFunc,
+ want: 21,
+ },
+ {
+ name: "multiline-declaration",
+ fn: multilineDeclarationFunc,
+ want: 30,
+ },
+ {
+ name: "blank-lines",
+ fn: blankLinesFunc,
+ want: 35,
+ },
+ {
+ name: "inline",
+ fn: inlineFunc,
+ want: 49,
+ },
+ {
+ name: "normal-closure",
+ fn: normalClosure,
+ want: 57,
+ },
+ {
+ name: "inline-closure",
+ fn: inlineClosure,
+ want: 64,
+ },
+ }
+
+ for _, tc := range testCases {
+ t.Run(tc.name, func(t *testing.T) {
+ got := tc.fn()
+ if got != tc.want {
+ t.Errorf("start line got %d want %d", got, tc.want)
+ }
+ })
+ }
+}
+
+//go:noinline
+func callerStartLine(wantInlined bool) int {
+ var pcs [1]uintptr
+ n := runtime.Callers(2, pcs[:])
+ if n != 1 {
+ panic(fmt.Sprintf("no caller of callerStartLine? n = %d", n))
+ }
+
+ frames := runtime.CallersFrames(pcs[:])
+ frame, _ := frames.Next()
+
+ inlined := frame.Func == nil // Func always set to nil for inlined frames
+ if wantInlined != inlined {
+ panic(fmt.Sprintf("caller %s inlined got %v want %v", frame.Function, inlined, wantInlined))
+ }
+
+ return runtime.FrameStartLine(&frame)
+}
diff --git a/src/runtime/stkframe.go b/src/runtime/stkframe.go
new file mode 100644
index 0000000..5caacba
--- /dev/null
+++ b/src/runtime/stkframe.go
@@ -0,0 +1,289 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// A stkframe holds information about a single physical stack frame.
+type stkframe struct {
+ // fn is the function being run in this frame. If there is
+ // inlining, this is the outermost function.
+ fn funcInfo
+
+ // pc is the program counter within fn.
+ //
+ // The meaning of this is subtle:
+ //
+ // - Typically, this frame performed a regular function call
+ // and this is the return PC (just after the CALL
+ // instruction). In this case, pc-1 reflects the CALL
+ // instruction itself and is the correct source of symbolic
+ // information.
+ //
+ // - If this frame "called" sigpanic, then pc is the
+ // instruction that panicked, and pc is the correct address
+ // to use for symbolic information.
+ //
+ // - If this is the innermost frame, then PC is where
+ // execution will continue, but it may not be the
+ // instruction following a CALL. This may be from
+ // cooperative preemption, in which case this is the
+ // instruction after the call to morestack. Or this may be
+ // from a signal or an un-started goroutine, in which case
+ // PC could be any instruction, including the first
+ // instruction in a function. Conventionally, we use pc-1
+ // for symbolic information, unless pc == fn.entry(), in
+ // which case we use pc.
+ pc uintptr
+
+ // continpc is the PC where execution will continue in fn, or
+ // 0 if execution will not continue in this frame.
+ //
+ // This is usually the same as pc, unless this frame "called"
+ // sigpanic, in which case it's either the address of
+ // deferreturn or 0 if this frame will never execute again.
+ //
+ // This is the PC to use to look up GC liveness for this frame.
+ continpc uintptr
+
+ lr uintptr // program counter at caller aka link register
+ sp uintptr // stack pointer at pc
+ fp uintptr // stack pointer at caller aka frame pointer
+ varp uintptr // top of local variables
+ argp uintptr // pointer to function arguments
+}
+
+// reflectMethodValue is a partial duplicate of reflect.makeFuncImpl
+// and reflect.methodValue.
+type reflectMethodValue struct {
+ fn uintptr
+ stack *bitvector // ptrmap for both args and results
+ argLen uintptr // just args
+}
+
+// argBytes returns the argument frame size for a call to frame.fn.
+func (frame *stkframe) argBytes() uintptr {
+ if frame.fn.args != abi.ArgsSizeUnknown {
+ return uintptr(frame.fn.args)
+ }
+ // This is an uncommon and complicated case. Fall back to fully
+ // fetching the argument map to compute its size.
+ argMap, _ := frame.argMapInternal()
+ return uintptr(argMap.n) * goarch.PtrSize
+}
+
+// argMapInternal is used internally by stkframe to fetch special
+// argument maps.
+//
+// argMap.n is always populated with the size of the argument map.
+//
+// argMap.bytedata is only populated for dynamic argument maps (used
+// by reflect). If the caller requires the argument map, it should use
+// this if non-nil, and otherwise fetch the argument map using the
+// current PC.
+//
+// hasReflectStackObj indicates that this frame also has a reflect
+// function stack object, which the caller must synthesize.
+func (frame *stkframe) argMapInternal() (argMap bitvector, hasReflectStackObj bool) {
+ f := frame.fn
+ if f.args != abi.ArgsSizeUnknown {
+ argMap.n = f.args / goarch.PtrSize
+ return
+ }
+ // Extract argument bitmaps for reflect stubs from the calls they made to reflect.
+ switch funcname(f) {
+ case "reflect.makeFuncStub", "reflect.methodValueCall":
+ // These take a *reflect.methodValue as their
+ // context register and immediately save it to 0(SP).
+ // Get the methodValue from 0(SP).
+ arg0 := frame.sp + sys.MinFrameSize
+
+ minSP := frame.fp
+ if !usesLR {
+ // The CALL itself pushes a word.
+ // Undo that adjustment.
+ minSP -= goarch.PtrSize
+ }
+ if arg0 >= minSP {
+ // The function hasn't started yet.
+ // This only happens if f was the
+ // start function of a new goroutine
+ // that hasn't run yet *and* f takes
+ // no arguments and has no results
+ // (otherwise it will get wrapped in a
+ // closure). In this case, we can't
+ // reach into its locals because it
+ // doesn't have locals yet, but we
+ // also know its argument map is
+ // empty.
+ if frame.pc != f.entry() {
+ print("runtime: confused by ", funcname(f), ": no frame (sp=", hex(frame.sp), " fp=", hex(frame.fp), ") at entry+", hex(frame.pc-f.entry()), "\n")
+ throw("reflect mismatch")
+ }
+ return bitvector{}, false // No locals, so also no stack objects
+ }
+ hasReflectStackObj = true
+ mv := *(**reflectMethodValue)(unsafe.Pointer(arg0))
+ // Figure out whether the return values are valid.
+ // Reflect will update this value after it copies
+ // in the return values.
+ retValid := *(*bool)(unsafe.Pointer(arg0 + 4*goarch.PtrSize))
+ if mv.fn != f.entry() {
+ print("runtime: confused by ", funcname(f), "\n")
+ throw("reflect mismatch")
+ }
+ argMap = *mv.stack
+ if !retValid {
+ // argMap.n includes the results, but
+ // those aren't valid, so drop them.
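+			// For example, with argLen = 24 on a 64-bit platform only the
+			// first 3 pointer-sized words of the bitmap are kept until
+			// reflect marks the results valid.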
+ n := int32((uintptr(mv.argLen) &^ (goarch.PtrSize - 1)) / goarch.PtrSize)
+ if n < argMap.n {
+ argMap.n = n
+ }
+ }
+ }
+ return
+}
+
+// getStackMap returns the locals and arguments live pointer maps, and
+// stack object list for frame.
+func (frame *stkframe) getStackMap(cache *pcvalueCache, debug bool) (locals, args bitvector, objs []stackObjectRecord) {
+ targetpc := frame.continpc
+ if targetpc == 0 {
+ // Frame is dead. Return empty bitvectors.
+ return
+ }
+
+ f := frame.fn
+ pcdata := int32(-1)
+ if targetpc != f.entry() {
+ // Back up to the CALL. If we're at the function entry
+ // point, we want to use the entry map (-1), even if
+ // the first instruction of the function changes the
+ // stack map.
+ targetpc--
+ pcdata = pcdatavalue(f, abi.PCDATA_StackMapIndex, targetpc, cache)
+ }
+ if pcdata == -1 {
+ // We do not have a valid pcdata value but there might be a
+	// stackmap for this function. It is likely that we are looking
+	// at the function prologue; assume so and hope for the best.
+ pcdata = 0
+ }
+
+ // Local variables.
+ size := frame.varp - frame.sp
+ var minsize uintptr
+ switch goarch.ArchFamily {
+ case goarch.ARM64:
+ minsize = sys.StackAlign
+ default:
+ minsize = sys.MinFrameSize
+ }
+ if size > minsize {
+ stackid := pcdata
+ stkmap := (*stackmap)(funcdata(f, abi.FUNCDATA_LocalsPointerMaps))
+ if stkmap == nil || stkmap.n <= 0 {
+ print("runtime: frame ", funcname(f), " untyped locals ", hex(frame.varp-size), "+", hex(size), "\n")
+ throw("missing stackmap")
+ }
+ // If nbit == 0, there's no work to do.
+ if stkmap.nbit > 0 {
+ if stackid < 0 || stackid >= stkmap.n {
+ // don't know where we are
+ print("runtime: pcdata is ", stackid, " and ", stkmap.n, " locals stack map entries for ", funcname(f), " (targetpc=", hex(targetpc), ")\n")
+ throw("bad symbol table")
+ }
+ locals = stackmapdata(stkmap, stackid)
+ if stackDebug >= 3 && debug {
+ print(" locals ", stackid, "/", stkmap.n, " ", locals.n, " words ", locals.bytedata, "\n")
+ }
+ } else if stackDebug >= 3 && debug {
+ print(" no locals to adjust\n")
+ }
+ }
+
+ // Arguments. First fetch frame size and special-case argument maps.
+ var isReflect bool
+ args, isReflect = frame.argMapInternal()
+ if args.n > 0 && args.bytedata == nil {
+ // Non-empty argument frame, but not a special map.
+ // Fetch the argument map at pcdata.
+ stackmap := (*stackmap)(funcdata(f, abi.FUNCDATA_ArgsPointerMaps))
+ if stackmap == nil || stackmap.n <= 0 {
+ print("runtime: frame ", funcname(f), " untyped args ", hex(frame.argp), "+", hex(args.n*goarch.PtrSize), "\n")
+ throw("missing stackmap")
+ }
+ if pcdata < 0 || pcdata >= stackmap.n {
+ // don't know where we are
+ print("runtime: pcdata is ", pcdata, " and ", stackmap.n, " args stack map entries for ", funcname(f), " (targetpc=", hex(targetpc), ")\n")
+ throw("bad symbol table")
+ }
+ if stackmap.nbit == 0 {
+ args.n = 0
+ } else {
+ args = stackmapdata(stackmap, pcdata)
+ }
+ }
+
+ // stack objects.
+ if (GOARCH == "amd64" || GOARCH == "arm64" || GOARCH == "ppc64" || GOARCH == "ppc64le" || GOARCH == "riscv64") &&
+ unsafe.Sizeof(abi.RegArgs{}) > 0 && isReflect {
+ // For reflect.makeFuncStub and reflect.methodValueCall,
+ // we need to fake the stack object record.
+ // These frames contain an internal/abi.RegArgs at a hard-coded offset.
+ // This offset matches the assembly code on amd64 and arm64.
+ objs = methodValueCallFrameObjs[:]
+ } else {
+ p := funcdata(f, abi.FUNCDATA_StackObjects)
+ if p != nil {
+ n := *(*uintptr)(p)
+ p = add(p, goarch.PtrSize)
+ r0 := (*stackObjectRecord)(noescape(p))
+ objs = unsafe.Slice(r0, int(n))
+ // Note: the noescape above is needed to keep
+ // getStackMap from "leaking param content:
+ // frame". That leak propagates up to getgcmask, then
+ // GCMask, then verifyGCInfo, which converts the stack
+ // gcinfo tests into heap gcinfo tests :(
+ }
+ }
+
+ return
+}
+
+var methodValueCallFrameObjs [1]stackObjectRecord // initialized in stkobjinit
+
+func stkobjinit() {
+ var abiRegArgsEface any = abi.RegArgs{}
+ abiRegArgsType := efaceOf(&abiRegArgsEface)._type
+ if abiRegArgsType.Kind_&kindGCProg != 0 {
+ throw("abiRegArgsType needs GC Prog, update methodValueCallFrameObjs")
+ }
+ // Set methodValueCallFrameObjs[0].gcdataoff so that
+ // stackObjectRecord.gcdata() will work correctly with it.
+ ptr := uintptr(unsafe.Pointer(&methodValueCallFrameObjs[0]))
+ var mod *moduledata
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.gofunc <= ptr && ptr < datap.end {
+ mod = datap
+ break
+ }
+ }
+ if mod == nil {
+ throw("methodValueCallFrameObjs is not in a module")
+ }
+ methodValueCallFrameObjs[0] = stackObjectRecord{
+ off: -int32(alignUp(abiRegArgsType.Size_, 8)), // It's always the highest address local.
+ size: int32(abiRegArgsType.Size_),
+ _ptrdata: int32(abiRegArgsType.PtrBytes),
+ gcdataoff: uint32(uintptr(unsafe.Pointer(abiRegArgsType.GCData)) - mod.rodata),
+ }
+}
diff --git a/src/runtime/string.go b/src/runtime/string.go
new file mode 100644
index 0000000..7ac3e66
--- /dev/null
+++ b/src/runtime/string.go
@@ -0,0 +1,588 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/bytealg"
+ "internal/goarch"
+ "unsafe"
+)
+
+// The constant is known to the compiler.
+// There is no fundamental theory behind this number.
+const tmpStringBufSize = 32
+
+type tmpBuf [tmpStringBufSize]byte
+
+// concatstrings implements a Go string concatenation x+y+z+...
+// The operands are passed in the slice a.
+// If buf != nil, the compiler has determined that the result does not
+// escape the calling function, so the string data can be stored in buf
+// if small enough.
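+//
+// As an illustrative sketch (not literal compiler output), a non-escaping
+// concatenation such as s := x + y + z may be lowered to roughly:
+//
+//	var buf tmpBuf
+//	s := concatstring3(&buf, x, y, z)
+//
+// which in turn passes the operands to concatstrings in a slice.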
+func concatstrings(buf *tmpBuf, a []string) string {
+ idx := 0
+ l := 0
+ count := 0
+ for i, x := range a {
+ n := len(x)
+ if n == 0 {
+ continue
+ }
+ if l+n < l {
+ throw("string concatenation too long")
+ }
+ l += n
+ count++
+ idx = i
+ }
+ if count == 0 {
+ return ""
+ }
+
+ // If there is just one string and either it is not on the stack
+ // or our result does not escape the calling frame (buf != nil),
+ // then we can return that string directly.
+ if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
+ return a[idx]
+ }
+ s, b := rawstringtmp(buf, l)
+ for _, x := range a {
+ copy(b, x)
+ b = b[len(x):]
+ }
+ return s
+}
+
+func concatstring2(buf *tmpBuf, a0, a1 string) string {
+ return concatstrings(buf, []string{a0, a1})
+}
+
+func concatstring3(buf *tmpBuf, a0, a1, a2 string) string {
+ return concatstrings(buf, []string{a0, a1, a2})
+}
+
+func concatstring4(buf *tmpBuf, a0, a1, a2, a3 string) string {
+ return concatstrings(buf, []string{a0, a1, a2, a3})
+}
+
+func concatstring5(buf *tmpBuf, a0, a1, a2, a3, a4 string) string {
+ return concatstrings(buf, []string{a0, a1, a2, a3, a4})
+}
+
+// slicebytetostring converts a byte slice to a string.
+// It is inserted by the compiler into generated code.
+// ptr is a pointer to the first element of the slice;
+// n is the length of the slice.
+// buf is a fixed-size buffer for the result;
+// it is not nil if the result does not escape.
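+//
+// As an illustrative sketch (not literal compiler output), a conversion
+// s := string(b) whose result does not escape may be lowered to roughly:
+//
+//	var buf tmpBuf
+//	s := slicebytetostring(&buf, unsafe.SliceData(b), len(b))
+//
+// with buf being nil when the result escapes.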
+func slicebytetostring(buf *tmpBuf, ptr *byte, n int) string {
+ if n == 0 {
+ // Turns out to be a relatively common case.
+ // Consider that you want to parse out data between parens in "foo()bar",
+ // you find the indices and convert the subslice to string.
+ return ""
+ }
+ if raceenabled {
+ racereadrangepc(unsafe.Pointer(ptr),
+ uintptr(n),
+ getcallerpc(),
+ abi.FuncPCABIInternal(slicebytetostring))
+ }
+ if msanenabled {
+ msanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ if asanenabled {
+ asanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ if n == 1 {
+ p := unsafe.Pointer(&staticuint64s[*ptr])
+ if goarch.BigEndian {
+ p = add(p, 7)
+ }
+ return unsafe.String((*byte)(p), 1)
+ }
+
+ var p unsafe.Pointer
+ if buf != nil && n <= len(buf) {
+ p = unsafe.Pointer(buf)
+ } else {
+ p = mallocgc(uintptr(n), nil, false)
+ }
+ memmove(p, unsafe.Pointer(ptr), uintptr(n))
+ return unsafe.String((*byte)(p), n)
+}
+
+// stringDataOnStack reports whether the string's data is
+// stored on the current goroutine's stack.
+func stringDataOnStack(s string) bool {
+ ptr := uintptr(unsafe.Pointer(unsafe.StringData(s)))
+ stk := getg().stack
+ return stk.lo <= ptr && ptr < stk.hi
+}
+
+func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
+ if buf != nil && l <= len(buf) {
+ b = buf[:l]
+ s = slicebytetostringtmp(&b[0], len(b))
+ } else {
+ s, b = rawstring(l)
+ }
+ return
+}
+
+// slicebytetostringtmp returns a "string" referring to the actual []byte bytes.
+//
+// Callers need to ensure that the returned string will not be used after
+// the calling goroutine modifies the original slice or synchronizes with
+// another goroutine.
+//
+// This function is only called when instrumenting
+// and is otherwise intrinsified by the compiler.
+//
+// Some internal compiler optimizations use this function.
+// - Used for m[T1{... Tn{..., string(k), ...} ...}] and m[string(k)]
+// where k is []byte, T1 to Tn is a nesting of struct and array literals.
+// - Used for "<"+string(b)+">" concatenation where b is []byte.
+// - Used for string(b)=="foo" comparison where b is []byte.
+func slicebytetostringtmp(ptr *byte, n int) string {
+ if raceenabled && n > 0 {
+ racereadrangepc(unsafe.Pointer(ptr),
+ uintptr(n),
+ getcallerpc(),
+ abi.FuncPCABIInternal(slicebytetostringtmp))
+ }
+ if msanenabled && n > 0 {
+ msanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ if asanenabled && n > 0 {
+ asanread(unsafe.Pointer(ptr), uintptr(n))
+ }
+ return unsafe.String(ptr, n)
+}
+
+func stringtoslicebyte(buf *tmpBuf, s string) []byte {
+ var b []byte
+ if buf != nil && len(s) <= len(buf) {
+ *buf = tmpBuf{}
+ b = buf[:len(s)]
+ } else {
+ b = rawbyteslice(len(s))
+ }
+ copy(b, s)
+ return b
+}
+
+func stringtoslicerune(buf *[tmpStringBufSize]rune, s string) []rune {
+ // two passes.
+ // unlike slicerunetostring, no race because strings are immutable.
+ n := 0
+ for range s {
+ n++
+ }
+
+ var a []rune
+ if buf != nil && n <= len(buf) {
+ *buf = [tmpStringBufSize]rune{}
+ a = buf[:n]
+ } else {
+ a = rawruneslice(n)
+ }
+
+ n = 0
+ for _, r := range s {
+ a[n] = r
+ n++
+ }
+ return a
+}
+
+func slicerunetostring(buf *tmpBuf, a []rune) string {
+ if raceenabled && len(a) > 0 {
+ racereadrangepc(unsafe.Pointer(&a[0]),
+ uintptr(len(a))*unsafe.Sizeof(a[0]),
+ getcallerpc(),
+ abi.FuncPCABIInternal(slicerunetostring))
+ }
+ if msanenabled && len(a) > 0 {
+ msanread(unsafe.Pointer(&a[0]), uintptr(len(a))*unsafe.Sizeof(a[0]))
+ }
+ if asanenabled && len(a) > 0 {
+ asanread(unsafe.Pointer(&a[0]), uintptr(len(a))*unsafe.Sizeof(a[0]))
+ }
+ var dum [4]byte
+ size1 := 0
+ for _, r := range a {
+ size1 += encoderune(dum[:], r)
+ }
+ s, b := rawstringtmp(buf, size1+3)
+ size2 := 0
+ for _, r := range a {
+ // check for race
+ if size2 >= size1 {
+ break
+ }
+ size2 += encoderune(b[size2:], r)
+ }
+ return s[:size2]
+}
+
+type stringStruct struct {
+ str unsafe.Pointer
+ len int
+}
+
+// Variant with *byte pointer type for DWARF debugging.
+type stringStructDWARF struct {
+ str *byte
+ len int
+}
+
+func stringStructOf(sp *string) *stringStruct {
+ return (*stringStruct)(unsafe.Pointer(sp))
+}
+
+func intstring(buf *[4]byte, v int64) (s string) {
+ var b []byte
+ if buf != nil {
+ b = buf[:]
+ s = slicebytetostringtmp(&b[0], len(b))
+ } else {
+ s, b = rawstring(4)
+ }
+ if int64(rune(v)) != v {
+ v = runeError
+ }
+ n := encoderune(b, rune(v))
+ return s[:n]
+}
+
+// rawstring allocates storage for a new string. The returned
+// string and byte slice both refer to the same storage.
+// The storage is not zeroed. Callers should use
+// b to set the string contents and then drop b.
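+//
+// A typical usage pattern (src here stands for some source pointer,
+// as in gostringn below) is:
+//
+//	s, b := rawstring(n)
+//	memmove(unsafe.Pointer(&b[0]), src, uintptr(n))
+//	return s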
+func rawstring(size int) (s string, b []byte) {
+ p := mallocgc(uintptr(size), nil, false)
+ return unsafe.String((*byte)(p), size), unsafe.Slice((*byte)(p), size)
+}
+
+// rawbyteslice allocates a new byte slice. The byte slice is not zeroed.
+func rawbyteslice(size int) (b []byte) {
+ cap := roundupsize(uintptr(size))
+ p := mallocgc(cap, nil, false)
+ if cap != uintptr(size) {
+ memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
+ }
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
+ return
+}
+
+// rawruneslice allocates a new rune slice. The rune slice is not zeroed.
+func rawruneslice(size int) (b []rune) {
+ if uintptr(size) > maxAlloc/4 {
+ throw("out of memory")
+ }
+ mem := roundupsize(uintptr(size) * 4)
+ p := mallocgc(mem, nil, false)
+ if mem != uintptr(size)*4 {
+ memclrNoHeapPointers(add(p, uintptr(size)*4), mem-uintptr(size)*4)
+ }
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(mem / 4)}
+ return
+}
+
+// used by cmd/cgo
+func gobytes(p *byte, n int) (b []byte) {
+ if n == 0 {
+ return make([]byte, 0)
+ }
+
+ if n < 0 || uintptr(n) > maxAlloc {
+ panic(errorString("gobytes: length out of range"))
+ }
+
+ bp := mallocgc(uintptr(n), nil, false)
+ memmove(bp, unsafe.Pointer(p), uintptr(n))
+
+ *(*slice)(unsafe.Pointer(&b)) = slice{bp, n, n}
+ return
+}
+
+// This is exported via linkname to assembly in syscall (for Plan9).
+//
+//go:linkname gostring
+func gostring(p *byte) string {
+ l := findnull(p)
+ if l == 0 {
+ return ""
+ }
+ s, b := rawstring(l)
+ memmove(unsafe.Pointer(&b[0]), unsafe.Pointer(p), uintptr(l))
+ return s
+}
+
+// internal_syscall_gostring is a version of gostring for internal/syscall/unix.
+//
+//go:linkname internal_syscall_gostring internal/syscall/unix.gostring
+func internal_syscall_gostring(p *byte) string {
+ return gostring(p)
+}
+
+func gostringn(p *byte, l int) string {
+ if l == 0 {
+ return ""
+ }
+ s, b := rawstring(l)
+ memmove(unsafe.Pointer(&b[0]), unsafe.Pointer(p), uintptr(l))
+ return s
+}
+
+func hasPrefix(s, prefix string) bool {
+ return len(s) >= len(prefix) && s[:len(prefix)] == prefix
+}
+
+func hasSuffix(s, suffix string) bool {
+ return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
+}
+
+const (
+ maxUint64 = ^uint64(0)
+ maxInt64 = int64(maxUint64 >> 1)
+)
+
+// atoi64 parses an int64 from a string s.
+// The bool result reports whether s is a number
+// representable by a value of type int64.
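+//
+// For example (see also the atoi64 tests in string_test.go):
+//
+//	atoi64("12345")                = (12345, true)
+//	atoi64("-9223372036854775808") = (-1 << 63, true)
+//	atoi64("9223372036854775808")  = (0, false)  // overflows int64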
+func atoi64(s string) (int64, bool) {
+ if s == "" {
+ return 0, false
+ }
+
+ neg := false
+ if s[0] == '-' {
+ neg = true
+ s = s[1:]
+ }
+
+ un := uint64(0)
+ for i := 0; i < len(s); i++ {
+ c := s[i]
+ if c < '0' || c > '9' {
+ return 0, false
+ }
+ if un > maxUint64/10 {
+ // overflow
+ return 0, false
+ }
+ un *= 10
+ un1 := un + uint64(c) - '0'
+ if un1 < un {
+ // overflow
+ return 0, false
+ }
+ un = un1
+ }
+
+ if !neg && un > uint64(maxInt64) {
+ return 0, false
+ }
+ if neg && un > uint64(maxInt64)+1 {
+ return 0, false
+ }
+
+ n := int64(un)
+ if neg {
+ n = -n
+ }
+
+ return n, true
+}
+
+// atoi is like atoi64 but for integers
+// that fit into an int.
+func atoi(s string) (int, bool) {
+ if n, ok := atoi64(s); n == int64(int(n)) {
+ return int(n), ok
+ }
+ return 0, false
+}
+
+// atoi32 is like atoi but for integers
+// that fit into an int32.
+func atoi32(s string) (int32, bool) {
+ if n, ok := atoi64(s); n == int64(int32(n)) {
+ return int32(n), ok
+ }
+ return 0, false
+}
+
+// parseByteCount parses a string that represents a count of bytes.
+//
+// s must match the following regular expression:
+//
+// ^[0-9]+(([KMGT]i)?B)?$
+//
+// In other words, an integer byte count with an optional unit
+// suffix. Acceptable suffixes include one of
+// - KiB, MiB, GiB, TiB which represent binary IEC/ISO 80000 units, or
+// - B, which just represents bytes.
+//
+// Returns an int64 because that's what its callers want and receive,
+// but the result is always non-negative.
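+//
+// For example (see also the parseByteCount tests in string_test.go):
+//
+//	parseByteCount("64")   = (64, true)
+//	parseByteCount("1KiB") = (1 << 10, true)
+//	parseByteCount("1MiB") = (1 << 20, true)
+//	parseByteCount("1KB")  = (0, false)  // SI suffixes are rejected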
+func parseByteCount(s string) (int64, bool) {
+ // The empty string is not valid.
+ if s == "" {
+ return 0, false
+ }
+ // Handle the easy non-suffix case.
+ last := s[len(s)-1]
+ if last >= '0' && last <= '9' {
+ n, ok := atoi64(s)
+ if !ok || n < 0 {
+ return 0, false
+ }
+ return n, ok
+ }
+ // Failing a trailing digit, this must always end in 'B'.
+ // Also at this point there must be at least one digit before
+ // that B.
+ if last != 'B' || len(s) < 2 {
+ return 0, false
+ }
+ // The one before that must always be a digit or 'i'.
+ if c := s[len(s)-2]; c >= '0' && c <= '9' {
+ // Trivial 'B' suffix.
+ n, ok := atoi64(s[:len(s)-1])
+ if !ok || n < 0 {
+ return 0, false
+ }
+ return n, ok
+ } else if c != 'i' {
+ return 0, false
+ }
+ // Finally, we need at least 4 characters now, for the unit
+ // prefix and at least one digit.
+ if len(s) < 4 {
+ return 0, false
+ }
+ power := 0
+ switch s[len(s)-3] {
+ case 'K':
+ power = 1
+ case 'M':
+ power = 2
+ case 'G':
+ power = 3
+ case 'T':
+ power = 4
+ default:
+ // Invalid suffix.
+ return 0, false
+ }
+ m := uint64(1)
+ for i := 0; i < power; i++ {
+ m *= 1024
+ }
+ n, ok := atoi64(s[:len(s)-3])
+ if !ok || n < 0 {
+ return 0, false
+ }
+ un := uint64(n)
+ if un > maxUint64/m {
+ // Overflow.
+ return 0, false
+ }
+ un *= m
+ if un > uint64(maxInt64) {
+ // Overflow.
+ return 0, false
+ }
+ return int64(un), true
+}
+
+//go:nosplit
+func findnull(s *byte) int {
+ if s == nil {
+ return 0
+ }
+
+ // Avoid IndexByteString on Plan 9 because it uses SSE instructions
+ // on x86 machines, and those are classified as floating point instructions,
+ // which are illegal in a note handler.
+ if GOOS == "plan9" {
+ p := (*[maxAlloc/2 - 1]byte)(unsafe.Pointer(s))
+ l := 0
+ for p[l] != 0 {
+ l++
+ }
+ return l
+ }
+
+ // pageSize is the unit we scan at a time looking for NULL.
+ // It must be the minimum page size for any architecture Go
+ // runs on. It's okay (just a minor performance loss) if the
+ // actual system page size is larger than this value.
+ const pageSize = 4096
+
+ offset := 0
+ ptr := unsafe.Pointer(s)
+ // IndexByteString uses wide reads, so we need to be careful
+ // with page boundaries. Call IndexByteString on
+ // [ptr, endOfPage) interval.
+ safeLen := int(pageSize - uintptr(ptr)%pageSize)
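+	// For example, with pageSize = 4096 and ptr%pageSize == 4000,
+	// safeLen is 96: the first read covers only the tail of the
+	// current page, and each later iteration advances a full page.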
+
+ for {
+ t := *(*string)(unsafe.Pointer(&stringStruct{ptr, safeLen}))
+ // Check one page at a time.
+ if i := bytealg.IndexByteString(t, 0); i != -1 {
+ return offset + i
+ }
+ // Move to next page
+ ptr = unsafe.Pointer(uintptr(ptr) + uintptr(safeLen))
+ offset += safeLen
+ safeLen = pageSize
+ }
+}
+
+func findnullw(s *uint16) int {
+ if s == nil {
+ return 0
+ }
+ p := (*[maxAlloc/2/2 - 1]uint16)(unsafe.Pointer(s))
+ l := 0
+ for p[l] != 0 {
+ l++
+ }
+ return l
+}
+
+//go:nosplit
+func gostringnocopy(str *byte) string {
+ ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
+ s := *(*string)(unsafe.Pointer(&ss))
+ return s
+}
+
+func gostringw(strw *uint16) string {
+ var buf [8]byte
+ str := (*[maxAlloc/2/2 - 1]uint16)(unsafe.Pointer(strw))
+ n1 := 0
+ for i := 0; str[i] != 0; i++ {
+ n1 += encoderune(buf[:], rune(str[i]))
+ }
+ s, b := rawstring(n1 + 4)
+ n2 := 0
+ for i := 0; str[i] != 0; i++ {
+ // check for race
+ if n2 >= n1 {
+ break
+ }
+ n2 += encoderune(b[n2:], rune(str[i]))
+ }
+ b[n2] = 0 // for luck
+ return s[:n2]
+}
diff --git a/src/runtime/string_test.go b/src/runtime/string_test.go
new file mode 100644
index 0000000..cfc0ad7
--- /dev/null
+++ b/src/runtime/string_test.go
@@ -0,0 +1,606 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "strconv"
+ "strings"
+ "testing"
+ "unicode/utf8"
+)
+
+// Strings and slices that don't escape and fit into tmpBuf are stack allocated,
+// which defeats using AllocsPerRun to test other optimizations.
+const sizeNoStack = 100
+
+func BenchmarkCompareStringEqual(b *testing.B) {
+ bytes := []byte("Hello Gophers!")
+ s1, s2 := string(bytes), string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringIdentical(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := s1
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringSameLength(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := "Hello, Gophers"
+ for i := 0; i < b.N; i++ {
+ if s1 == s2 {
+ b.Fatal("s1 == s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringDifferentLength(b *testing.B) {
+ s1 := "Hello Gophers!"
+ s2 := "Hello, Gophers!"
+ for i := 0; i < b.N; i++ {
+ if s1 == s2 {
+ b.Fatal("s1 == s2")
+ }
+ }
+}
+
+func BenchmarkCompareStringBigUnaligned(b *testing.B) {
+ bytes := make([]byte, 0, 1<<20)
+ for len(bytes) < 1<<20 {
+ bytes = append(bytes, "Hello Gophers!"...)
+ }
+ s1, s2 := string(bytes), "hello"+string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2[len("hello"):] {
+ b.Fatal("s1 != s2")
+ }
+ }
+ b.SetBytes(int64(len(s1)))
+}
+
+func BenchmarkCompareStringBig(b *testing.B) {
+ bytes := make([]byte, 0, 1<<20)
+ for len(bytes) < 1<<20 {
+ bytes = append(bytes, "Hello Gophers!"...)
+ }
+ s1, s2 := string(bytes), string(bytes)
+ for i := 0; i < b.N; i++ {
+ if s1 != s2 {
+ b.Fatal("s1 != s2")
+ }
+ }
+ b.SetBytes(int64(len(s1)))
+}
+
+func BenchmarkConcatStringAndBytes(b *testing.B) {
+ s1 := []byte("Gophers!")
+ for i := 0; i < b.N; i++ {
+ _ = "Hello " + string(s1)
+ }
+}
+
+var escapeString string
+
+func BenchmarkSliceByteToString(b *testing.B) {
+ buf := []byte{'!'}
+ for n := 0; n < 8; n++ {
+ b.Run(strconv.Itoa(len(buf)), func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ escapeString = string(buf)
+ }
+ })
+ buf = append(buf, buf...)
+ }
+}
+
+var stringdata = []struct{ name, data string }{
+ {"ASCII", "01234567890"},
+ {"Japanese", "日本語日本語日本語"},
+ {"MixedLength", "$Ѐࠀက퀀𐀀\U00040000\U0010FFFF"},
+}
+
+var sinkInt int
+
+func BenchmarkRuneCount(b *testing.B) {
+ // Each sub-benchmark counts the runes in a string in a different way.
+ b.Run("lenruneslice", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sinkInt += len([]rune(sd.data))
+ }
+ })
+ }
+ })
+ b.Run("rangeloop", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ n := 0
+ for range sd.data {
+ n++
+ }
+ sinkInt += n
+ }
+ })
+ }
+ })
+ b.Run("utf8.RuneCountInString", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ sinkInt += utf8.RuneCountInString(sd.data)
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkRuneIterate(b *testing.B) {
+ b.Run("range", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+ b.Run("range1", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+ b.Run("range2", func(b *testing.B) {
+ for _, sd := range stringdata {
+ b.Run(sd.name, func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ for range sd.data {
+ }
+ }
+ })
+ }
+ })
+}
+
+func BenchmarkArrayEqual(b *testing.B) {
+ a1 := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
+ a2 := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ if a1 != a2 {
+ b.Fatal("not equal")
+ }
+ }
+}
+
+func TestStringW(t *testing.T) {
+ strings := []string{
+ "hello",
+ "a\u5566\u7788b",
+ }
+
+ for _, s := range strings {
+ var b []uint16
+ for _, c := range s {
+ b = append(b, uint16(c))
+ if c != rune(uint16(c)) {
+ t.Errorf("bad test: stringW can't handle >16 bit runes")
+ }
+ }
+ b = append(b, 0)
+ r := runtime.GostringW(b)
+ if r != s {
+ t.Errorf("gostringW(%v) = %s, want %s", b, r, s)
+ }
+ }
+}
+
+func TestLargeStringConcat(t *testing.T) {
+ output := runTestProg(t, "testprog", "stringconcat")
+ want := "panic: " + strings.Repeat("0", 1<<10) + strings.Repeat("1", 1<<10) +
+ strings.Repeat("2", 1<<10) + strings.Repeat("3", 1<<10)
+ if !strings.HasPrefix(output, want) {
+ t.Fatalf("output does not start with %q:\n%s", want, output)
+ }
+}
+
+func TestConcatTempString(t *testing.T) {
+ s := "bytes"
+ b := []byte(s)
+ n := testing.AllocsPerRun(1000, func() {
+ if "prefix "+string(b)+" suffix" != "prefix bytes suffix" {
+ t.Fatalf("strings are not equal: '%v' and '%v'", "prefix "+string(b)+" suffix", "prefix bytes suffix")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestCompareTempString(t *testing.T) {
+ s := strings.Repeat("x", sizeNoStack)
+ b := []byte(s)
+ n := testing.AllocsPerRun(1000, func() {
+ if string(b) != s {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) < s {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) > s {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) == s {
+ } else {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) <= s {
+ } else {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ if string(b) >= s {
+ } else {
+ t.Fatalf("strings are not equal: '%v' and '%v'", string(b), s)
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringIndexHaystack(t *testing.T) {
+ // See issue 25864.
+ haystack := []byte("hello")
+ needle := "ll"
+ n := testing.AllocsPerRun(1000, func() {
+ if strings.Index(string(haystack), needle) != 2 {
+ t.Fatalf("needle not found")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringIndexNeedle(t *testing.T) {
+ // See issue 25864.
+ haystack := "hello"
+ needle := []byte("ll")
+ n := testing.AllocsPerRun(1000, func() {
+ if strings.Index(haystack, string(needle)) != 2 {
+ t.Fatalf("needle not found")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestStringOnStack(t *testing.T) {
+ s := ""
+ for i := 0; i < 3; i++ {
+ s = "a" + s + "b" + s + "c"
+ }
+
+ if want := "aaabcbabccbaabcbabccc"; s != want {
+ t.Fatalf("want: '%v', got '%v'", want, s)
+ }
+}
+
+func TestIntString(t *testing.T) {
+ // Non-escaping result of intstring.
+ s := ""
+ for i := rune(0); i < 4; i++ {
+ s += string(i+'0') + string(i+'0'+1)
+ }
+ if want := "01122334"; s != want {
+ t.Fatalf("want '%v', got '%v'", want, s)
+ }
+
+ // Escaping result of intstring.
+ var a [4]string
+ for i := rune(0); i < 4; i++ {
+ a[i] = string(i + '0')
+ }
+ s = a[0] + a[1] + a[2] + a[3]
+ if want := "0123"; s != want {
+ t.Fatalf("want '%v', got '%v'", want, s)
+ }
+}
+
+func TestIntStringAllocs(t *testing.T) {
+ unknown := '0'
+ n := testing.AllocsPerRun(1000, func() {
+ s1 := string(unknown)
+ s2 := string(unknown + 1)
+ if s1 == s2 {
+ t.Fatalf("bad")
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func TestRangeStringCast(t *testing.T) {
+ s := strings.Repeat("x", sizeNoStack)
+ n := testing.AllocsPerRun(1000, func() {
+ for i, c := range []byte(s) {
+ if c != s[i] {
+ t.Fatalf("want '%c' at pos %v, got '%c'", s[i], i, c)
+ }
+ }
+ })
+ if n != 0 {
+ t.Fatalf("want 0 allocs, got %v", n)
+ }
+}
+
+func isZeroed(b []byte) bool {
+ for _, x := range b {
+ if x != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+func isZeroedR(r []rune) bool {
+ for _, x := range r {
+ if x != 0 {
+ return false
+ }
+ }
+ return true
+}
+
+func TestString2Slice(t *testing.T) {
+ // Make sure we don't return slices that expose
+ // an unzeroed section of stack-allocated temp buf
+ // between len and cap. See issue 14232.
+ s := "foož"
+ b := ([]byte)(s)
+ if !isZeroed(b[len(b):cap(b)]) {
+ t.Errorf("extra bytes not zeroed")
+ }
+ r := ([]rune)(s)
+ if !isZeroedR(r[len(r):cap(r)]) {
+ t.Errorf("extra runes not zeroed")
+ }
+}
+
+const intSize = 32 << (^uint(0) >> 63)
+
+type atoi64Test struct {
+ in string
+ out int64
+ ok bool
+}
+
+var atoi64tests = []atoi64Test{
+ {"", 0, false},
+ {"0", 0, true},
+ {"-0", 0, true},
+ {"1", 1, true},
+ {"-1", -1, true},
+ {"12345", 12345, true},
+ {"-12345", -12345, true},
+ {"012345", 12345, true},
+ {"-012345", -12345, true},
+ {"12345x", 0, false},
+ {"-12345x", 0, false},
+ {"98765432100", 98765432100, true},
+ {"-98765432100", -98765432100, true},
+ {"20496382327982653440", 0, false},
+ {"-20496382327982653440", 0, false},
+ {"9223372036854775807", 1<<63 - 1, true},
+ {"-9223372036854775807", -(1<<63 - 1), true},
+ {"9223372036854775808", 0, false},
+ {"-9223372036854775808", -1 << 63, true},
+ {"9223372036854775809", 0, false},
+ {"-9223372036854775809", 0, false},
+}
+
+func TestAtoi(t *testing.T) {
+ switch intSize {
+ case 32:
+ for i := range atoi32tests {
+ test := &atoi32tests[i]
+ out, ok := runtime.Atoi(test.in)
+ if test.out != int32(out) || test.ok != ok {
+ t.Errorf("atoi(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+ case 64:
+ for i := range atoi64tests {
+ test := &atoi64tests[i]
+ out, ok := runtime.Atoi(test.in)
+ if test.out != int64(out) || test.ok != ok {
+ t.Errorf("atoi(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+ }
+}
+
+type atoi32Test struct {
+ in string
+ out int32
+ ok bool
+}
+
+var atoi32tests = []atoi32Test{
+ {"", 0, false},
+ {"0", 0, true},
+ {"-0", 0, true},
+ {"1", 1, true},
+ {"-1", -1, true},
+ {"12345", 12345, true},
+ {"-12345", -12345, true},
+ {"012345", 12345, true},
+ {"-012345", -12345, true},
+ {"12345x", 0, false},
+ {"-12345x", 0, false},
+ {"987654321", 987654321, true},
+ {"-987654321", -987654321, true},
+ {"2147483647", 1<<31 - 1, true},
+ {"-2147483647", -(1<<31 - 1), true},
+ {"2147483648", 0, false},
+ {"-2147483648", -1 << 31, true},
+ {"2147483649", 0, false},
+ {"-2147483649", 0, false},
+}
+
+func TestAtoi32(t *testing.T) {
+ for i := range atoi32tests {
+ test := &atoi32tests[i]
+ out, ok := runtime.Atoi32(test.in)
+ if test.out != out || test.ok != ok {
+ t.Errorf("atoi32(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+}
+
+func TestParseByteCount(t *testing.T) {
+ for _, test := range []struct {
+ in string
+ out int64
+ ok bool
+ }{
+ // Good numeric inputs.
+ {"1", 1, true},
+ {"12345", 12345, true},
+ {"012345", 12345, true},
+ {"98765432100", 98765432100, true},
+ {"9223372036854775807", 1<<63 - 1, true},
+
+ // Good trivial suffix inputs.
+ {"1B", 1, true},
+ {"12345B", 12345, true},
+ {"012345B", 12345, true},
+ {"98765432100B", 98765432100, true},
+ {"9223372036854775807B", 1<<63 - 1, true},
+
+ // Good binary suffix inputs.
+ {"1KiB", 1 << 10, true},
+ {"05KiB", 5 << 10, true},
+ {"1MiB", 1 << 20, true},
+ {"10MiB", 10 << 20, true},
+ {"1GiB", 1 << 30, true},
+ {"100GiB", 100 << 30, true},
+ {"1TiB", 1 << 40, true},
+ {"99TiB", 99 << 40, true},
+
+ // Good zero inputs.
+ //
+ // -0 is an edge case, but no harm in supporting it.
+ {"-0", 0, true},
+ {"0", 0, true},
+ {"0B", 0, true},
+ {"0KiB", 0, true},
+ {"0MiB", 0, true},
+ {"0GiB", 0, true},
+ {"0TiB", 0, true},
+
+ // Bad inputs.
+ {"", 0, false},
+ {"-1", 0, false},
+ {"a12345", 0, false},
+ {"a12345B", 0, false},
+ {"12345x", 0, false},
+ {"0x12345", 0, false},
+
+ // Bad numeric inputs.
+ {"9223372036854775808", 0, false},
+ {"9223372036854775809", 0, false},
+ {"18446744073709551615", 0, false},
+ {"20496382327982653440", 0, false},
+ {"18446744073709551616", 0, false},
+ {"18446744073709551617", 0, false},
+ {"9999999999999999999999", 0, false},
+
+ // Bad trivial suffix inputs.
+ {"9223372036854775808B", 0, false},
+ {"9223372036854775809B", 0, false},
+ {"18446744073709551615B", 0, false},
+ {"20496382327982653440B", 0, false},
+ {"18446744073709551616B", 0, false},
+ {"18446744073709551617B", 0, false},
+ {"9999999999999999999999B", 0, false},
+
+ // Bad binary suffix inputs.
+ {"1Ki", 0, false},
+ {"05Ki", 0, false},
+ {"10Mi", 0, false},
+ {"100Gi", 0, false},
+ {"99Ti", 0, false},
+ {"22iB", 0, false},
+ {"B", 0, false},
+ {"iB", 0, false},
+ {"KiB", 0, false},
+ {"MiB", 0, false},
+ {"GiB", 0, false},
+ {"TiB", 0, false},
+ {"-120KiB", 0, false},
+ {"-891MiB", 0, false},
+ {"-704GiB", 0, false},
+ {"-42TiB", 0, false},
+ {"99999999999999999999KiB", 0, false},
+ {"99999999999999999MiB", 0, false},
+ {"99999999999999GiB", 0, false},
+ {"99999999999TiB", 0, false},
+ {"555EiB", 0, false},
+
+ // Mistaken SI suffix inputs.
+ {"0KB", 0, false},
+ {"0MB", 0, false},
+ {"0GB", 0, false},
+ {"0TB", 0, false},
+ {"1KB", 0, false},
+ {"05KB", 0, false},
+ {"1MB", 0, false},
+ {"10MB", 0, false},
+ {"1GB", 0, false},
+ {"100GB", 0, false},
+ {"1TB", 0, false},
+ {"99TB", 0, false},
+ {"1K", 0, false},
+ {"05K", 0, false},
+ {"10M", 0, false},
+ {"100G", 0, false},
+ {"99T", 0, false},
+ {"99999999999999999999KB", 0, false},
+ {"99999999999999999MB", 0, false},
+ {"99999999999999GB", 0, false},
+ {"99999999999TB", 0, false},
+ {"99999999999TiB", 0, false},
+ {"555EB", 0, false},
+ } {
+ out, ok := runtime.ParseByteCount(test.in)
+ if test.out != out || test.ok != ok {
+ t.Errorf("parseByteCount(%q) = (%v, %v) want (%v, %v)",
+ test.in, out, ok, test.out, test.ok)
+ }
+ }
+}
diff --git a/src/runtime/stubs.go b/src/runtime/stubs.go
new file mode 100644
index 0000000..65b7299
--- /dev/null
+++ b/src/runtime/stubs.go
@@ -0,0 +1,499 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/math"
+ "unsafe"
+)
+
+// Should be a built-in for unsafe.Pointer?
+//
+//go:nosplit
+func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
+ return unsafe.Pointer(uintptr(p) + x)
+}
+
+// getg returns the pointer to the current g.
+// The compiler rewrites calls to this function into instructions
+// that fetch the g directly (from TLS or from the dedicated register).
+func getg() *g
+
+// mcall switches from the g to the g0 stack and invokes fn(g),
+// where g is the goroutine that made the call.
+// mcall saves g's current PC/SP in g->sched so that it can be restored later.
+// It is up to fn to arrange for that later execution, typically by recording
+// g in a data structure, causing something to call ready(g) later.
+// mcall returns to the original goroutine g later, when g has been rescheduled.
+// fn must not return at all; typically it ends by calling schedule, to let the m
+// run other goroutines.
+//
+// mcall can only be called from g stacks (not g0, not gsignal).
+//
+// This must NOT be go:noescape: if fn is a stack-allocated closure,
+// fn puts g on a run queue, and g executes before fn returns, the
+// closure will be invalidated while it is still executing.
+func mcall(fn func(*g))
+
+// systemstack runs fn on a system stack.
+// If systemstack is called from the per-OS-thread (g0) stack, or
+// if systemstack is called from the signal handling (gsignal) stack,
+// systemstack calls fn directly and returns.
+// Otherwise, systemstack is being called from the limited stack
+// of an ordinary goroutine. In this case, systemstack switches
+// to the per-OS-thread stack, calls fn, and switches back.
+// It is common to use a func literal as the argument, in order
+// to share inputs and outputs with the code around the call
+// to systemstack:
+//
+// ... set up y ...
+// systemstack(func() {
+// x = bigcall(y)
+// })
+// ... use x ...
+//
+//go:noescape
+func systemstack(fn func())
+
+//go:nosplit
+//go:nowritebarrierrec
+func badsystemstack() {
+ writeErrStr("fatal: systemstack called from unexpected goroutine")
+}
+
+// memclrNoHeapPointers clears n bytes starting at ptr.
+//
+// Usually you should use typedmemclr. memclrNoHeapPointers should be
+// used only when the caller knows that *ptr contains no heap pointers
+// because either:
+//
+// *ptr is initialized memory and its type is pointer-free, or
+//
+// *ptr is uninitialized memory (e.g., memory that's being reused
+// for a new allocation) and hence contains only "junk".
+//
+// memclrNoHeapPointers ensures that if ptr is pointer-aligned, and n
+// is a multiple of the pointer size, then any pointer-aligned,
+// pointer-sized portion is cleared atomically. Despite the function
+// name, this is necessary because this function is the underlying
+// implementation of typedmemclr and memclrHasPointers. See the doc of
+// memmove for more details.
+//
+// The (CPU-specific) implementations of this function are in memclr_*.s.
+//
+//go:noescape
+func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
+
+//go:linkname reflect_memclrNoHeapPointers reflect.memclrNoHeapPointers
+func reflect_memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr) {
+ memclrNoHeapPointers(ptr, n)
+}
+
+// memmove copies n bytes from "from" to "to".
+//
+// memmove ensures that any pointer in "from" is written to "to" with
+// an indivisible write, so that racy reads cannot observe a
+// half-written pointer. This is necessary to prevent the garbage
+// collector from observing invalid pointers, and differs from memmove
+// in unmanaged languages. However, memmove is only required to do
+// this if "from" and "to" may contain pointers, which can only be the
+// case if "from", "to", and "n" are all word-aligned.
+//
+// Implementations are in memmove_*.s.
+//
+//go:noescape
+func memmove(to, from unsafe.Pointer, n uintptr)
+
+// Outside assembly calls memmove. Make sure it has ABI wrappers.
+//
+//go:linkname memmove
+
+//go:linkname reflect_memmove reflect.memmove
+func reflect_memmove(to, from unsafe.Pointer, n uintptr) {
+ memmove(to, from, n)
+}
+
+// exported value for testing
+const hashLoad = float32(loadFactorNum) / float32(loadFactorDen)
+
+//go:nosplit
+func fastrand() uint32 {
+ mp := getg().m
+ // Implement wyrand: https://github.com/wangyi-fudan/wyhash
+	// Only platforms where math.Mul64 can be lowered
+	// by the compiler should be in this list.
+ if goarch.IsAmd64|goarch.IsArm64|goarch.IsPpc64|
+ goarch.IsPpc64le|goarch.IsMips64|goarch.IsMips64le|
+ goarch.IsS390x|goarch.IsRiscv64|goarch.IsLoong64 == 1 {
+ mp.fastrand += 0xa0761d6478bd642f
+ hi, lo := math.Mul64(mp.fastrand, mp.fastrand^0xe7037ed1a0b428db)
+ return uint32(hi ^ lo)
+ }
+
+ // Implement xorshift64+: 2 32-bit xorshift sequences added together.
+ // Shift triplet [17,7,16] was calculated as indicated in Marsaglia's
+ // Xorshift paper: https://www.jstatsoft.org/article/view/v008i14/xorshift.pdf
+ // This generator passes the SmallCrush suite, part of TestU01 framework:
+ // http://simul.iro.umontreal.ca/testu01/tu01.html
+ t := (*[2]uint32)(unsafe.Pointer(&mp.fastrand))
+ s1, s0 := t[0], t[1]
+ s1 ^= s1 << 17
+ s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
+ t[0], t[1] = s0, s1
+ return s0 + s1
+}
+
+//go:nosplit
+func fastrandn(n uint32) uint32 {
+ // This is similar to fastrand() % n, but faster.
+ // See https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
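+	// For example, with n = 10 the 64-bit product fastrand()*10 is below
+	// 10<<32, so its high 32 bits yield a value in [0, 10).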
+ return uint32(uint64(fastrand()) * uint64(n) >> 32)
+}
+
+func fastrand64() uint64 {
+ mp := getg().m
+ // Implement wyrand: https://github.com/wangyi-fudan/wyhash
+	// Only platforms where math.Mul64 can be lowered
+	// by the compiler should be in this list.
+ if goarch.IsAmd64|goarch.IsArm64|goarch.IsPpc64|
+ goarch.IsPpc64le|goarch.IsMips64|goarch.IsMips64le|
+ goarch.IsS390x|goarch.IsRiscv64 == 1 {
+ mp.fastrand += 0xa0761d6478bd642f
+ hi, lo := math.Mul64(mp.fastrand, mp.fastrand^0xe7037ed1a0b428db)
+ return hi ^ lo
+ }
+
+ // Implement xorshift64+: 2 32-bit xorshift sequences added together.
+ // Xorshift paper: https://www.jstatsoft.org/article/view/v008i14/xorshift.pdf
+ // This generator passes the SmallCrush suite, part of TestU01 framework:
+ // http://simul.iro.umontreal.ca/testu01/tu01.html
+ t := (*[2]uint32)(unsafe.Pointer(&mp.fastrand))
+ s1, s0 := t[0], t[1]
+ s1 ^= s1 << 17
+ s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
+ r := uint64(s0 + s1)
+
+ s0, s1 = s1, s0
+ s1 ^= s1 << 17
+ s1 = s1 ^ s0 ^ s1>>7 ^ s0>>16
+ r += uint64(s0+s1) << 32
+
+ t[0], t[1] = s0, s1
+ return r
+}
+
+func fastrandu() uint {
+ if goarch.PtrSize == 4 {
+ return uint(fastrand())
+ }
+ return uint(fastrand64())
+}
+
+//go:linkname rand_fastrand64 math/rand.fastrand64
+func rand_fastrand64() uint64 { return fastrand64() }
+
+//go:linkname sync_fastrandn sync.fastrandn
+func sync_fastrandn(n uint32) uint32 { return fastrandn(n) }
+
+//go:linkname net_fastrandu net.fastrandu
+func net_fastrandu() uint { return fastrandu() }
+
+//go:linkname os_fastrand os.fastrand
+func os_fastrand() uint32 { return fastrand() }
+
+// in internal/bytealg/equal_*.s
+//
+//go:noescape
+func memequal(a, b unsafe.Pointer, size uintptr) bool
+
+// noescape hides a pointer from escape analysis. noescape is
+// the identity function but escape analysis doesn't think the
+// output depends on the input. noescape is inlined and currently
+// compiles down to zero instructions.
+// USE CAREFULLY!
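+//
+// A hypothetical example (x and its use here are purely illustrative):
+//
+//	p := noescape(unsafe.Pointer(&x)) // escape analysis no longer ties p to &x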
+//
+//go:nosplit
+func noescape(p unsafe.Pointer) unsafe.Pointer {
+ x := uintptr(p)
+ return unsafe.Pointer(x ^ 0)
+}
+
+// noEscapePtr hides a pointer from escape analysis. See noescape.
+// USE CAREFULLY!
+//
+//go:nosplit
+func noEscapePtr[T any](p *T) *T {
+ x := uintptr(unsafe.Pointer(p))
+ return (*T)(unsafe.Pointer(x ^ 0))
+}
+
+// Not all cgocallback frames are actually cgocallback,
+// so not all have these arguments. Mark them uintptr so that the GC
+// does not misinterpret memory when the arguments are not present.
+// cgocallback is not called from Go, only from crosscall2.
+// This in turn calls cgocallbackg, which is where we'll find
+// pointer-declared arguments.
+//
+// When fn is nil (frame is saved g), call dropm instead,
+// this is used when the C thread is exiting.
+func cgocallback(fn, frame, ctxt uintptr)
+
+func gogo(buf *gobuf)
+
+func asminit()
+func setg(gg *g)
+func breakpoint()
+
+// reflectcall calls fn with arguments described by stackArgs, stackArgsSize,
+// frameSize, and regArgs.
+//
+// Arguments passed on the stack and space for return values passed on the stack
+// must be laid out at the space pointed to by stackArgs (with total length
+// stackArgsSize) according to the ABI.
+//
+// stackRetOffset must be some value <= stackArgsSize that indicates the
+// offset within stackArgs where the return value space begins.
+//
+// frameSize is the total size of the argument frame at stackArgs and must
+// therefore be >= stackArgsSize. It must include additional space for spilling
+// register arguments for stack growth and preemption.
+//
+// TODO(mknyszek): Once we don't need the additional spill space, remove frameSize,
+// since frameSize will be redundant with stackArgsSize.
+//
+// Arguments passed in registers must be laid out in regArgs according to the ABI.
+// regArgs will hold any return values passed in registers after the call.
+//
+// reflectcall copies stack arguments from stackArgs to the goroutine stack, and
+// then copies stackArgsSize-stackRetOffset bytes back to the return space
+// in stackArgs once fn has completed. It also "unspills" argument registers from
+// regArgs before calling fn, and spills them back into regArgs immediately
+// following the call to fn. If there are results being returned on the stack,
+// the caller should pass the argument frame type as stackArgsType so that
+// reflectcall can execute appropriate write barriers during the copy.
+//
+// reflectcall expects regArgs.ReturnIsPtr to be populated indicating which
+// registers on the return path will contain Go pointers. It will then store
+// these pointers in regArgs.Ptrs such that they are visible to the GC.
+//
+// Package reflect passes a frame type. In package runtime, there is only
+// one call that copies results back, in callbackWrap in syscall_windows.go, and it
+// does NOT pass a frame type, meaning there are no write barriers invoked. See that
+// call site for justification.
+//
+// Package reflect accesses this symbol through a linkname.
+//
+// Arguments passed through to reflectcall do not escape. The type is used
+// only in a very limited callee of reflectcall, the stackArgs are copied, and
+// regArgs is only used in the reflectcall frame.
+//
+//go:noescape
+func reflectcall(stackArgsType *_type, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+
+func procyield(cycles uint32)
+
+type neverCallThisFunction struct{}
+
+// goexit is the return stub at the top of every goroutine call stack.
+// Each goroutine stack is constructed as if goexit called the
+// goroutine's entry point function, so that when the entry point
+// function returns, it will return to goexit, which will call goexit1
+// to perform the actual exit.
+//
+// This function must never be called directly. Call goexit1 instead.
+// gentraceback assumes that goexit terminates the stack. A direct
+// call on the stack will cause gentraceback to stop walking the stack
+// prematurely and if there is leftover state it may panic.
+func goexit(neverCallThisFunction)
+
+// publicationBarrier performs a store/store barrier (a "publication"
+// or "export" barrier). Some form of synchronization is required
+// between initializing an object and making that object accessible to
+// another processor. Without synchronization, the initialization
+// writes and the "publication" write may be reordered, allowing the
+// other processor to follow the pointer and observe an uninitialized
+// object. In general, higher-level synchronization should be used,
+// such as locking or an atomic pointer write. publicationBarrier is
+// for when those aren't an option, such as in the implementation of
+// the memory manager.
+//
+// There's no corresponding barrier for the read side because the read
+// side naturally has a data dependency order. All architectures that
+// Go supports or seems likely to ever support automatically enforce
+// data dependency ordering.
+func publicationBarrier()
+
+// getcallerpc returns the program counter (PC) of its caller's caller.
+// getcallersp returns the stack pointer (SP) of its caller's caller.
+// The implementation may be a compiler intrinsic; there is not
+// necessarily code implementing this on every platform.
+//
+// For example:
+//
+// func f(arg1, arg2, arg3 int) {
+// pc := getcallerpc()
+// sp := getcallersp()
+// }
+//
+// These two lines find the PC and SP immediately following
+// the call to f (where f will return).
+//
+// The call to getcallerpc and getcallersp must be done in the
+// frame being asked about.
+//
+// The result of getcallersp is correct at the time of the return,
+// but it may be invalidated by any subsequent call to a function
+// that might relocate the stack in order to grow or shrink it.
+// A general rule is that the result of getcallersp should be used
+// immediately and can only be passed to nosplit functions.
+
+//go:noescape
+func getcallerpc() uintptr
+
+//go:noescape
+func getcallersp() uintptr // implemented as an intrinsic on all platforms
+
+// getclosureptr returns the pointer to the current closure.
+// getclosureptr can only be used in an assignment statement
+// at the entry of a function. Moreover, the go:nosplit directive
+// must be specified at the declaration of the caller function,
+// so that the function prologue does not clobber the closure register.
+// For example:
+//
+// //go:nosplit
+// func f(arg1, arg2, arg3 int) {
+// dx := getclosureptr()
+// }
+//
+// The compiler rewrites calls to this function into instructions that fetch the
+// pointer from a well-known register (DX on x86 architecture, etc.) directly.
+func getclosureptr() uintptr
+
+//go:noescape
+func asmcgocall(fn, arg unsafe.Pointer) int32
+
+func morestack()
+func morestack_noctxt()
+func rt0_go()
+
+// return0 is a stub used to return 0 from deferproc.
+// It is called at the very end of deferproc to signal
+// the calling Go function that it should not jump
+// to deferreturn.
+// in asm_*.s
+func return0()
+
+// in asm_*.s
+// not called directly; definitions here supply type information for traceback.
+// These must have the same signature (arg pointer map) as reflectcall.
+func call16(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call32(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call64(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call128(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call256(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call512(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call1024(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call2048(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call4096(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call8192(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call16384(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call32768(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call65536(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call131072(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call262144(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call524288(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call1048576(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call2097152(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call4194304(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call8388608(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call16777216(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call33554432(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call67108864(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call134217728(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call268435456(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call536870912(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+func call1073741824(typ, fn, stackArgs unsafe.Pointer, stackArgsSize, stackRetOffset, frameSize uint32, regArgs *abi.RegArgs)
+
+func systemstack_switch()
+
+// alignUp rounds n up to a multiple of a. a must be a power of 2.
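+// For example, alignUp(13, 8) == 16 and alignUp(16, 8) == 16.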
+func alignUp(n, a uintptr) uintptr {
+ return (n + a - 1) &^ (a - 1)
+}
+
+// alignDown rounds n down to a multiple of a. a must be a power of 2.
+func alignDown(n, a uintptr) uintptr {
+ return n &^ (a - 1)
+}
+
+// divRoundUp returns ceil(n / a).
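+// For example, divRoundUp(10, 4) == 3.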
+func divRoundUp(n, a uintptr) uintptr {
+ // a is generally a power of two. This will get inlined and
+ // the compiler will optimize the division.
+ return (n + a - 1) / a
+}
+
+// checkASM reports whether assembly runtime checks have passed.
+func checkASM() bool
+
+func memequal_varlen(a, b unsafe.Pointer) bool
+
+// bool2int returns 0 if x is false or 1 if x is true.
+func bool2int(x bool) int {
+ // Avoid branches. In the SSA compiler, this compiles to
+ // exactly what you would want it to.
+ return int(uint8(*(*uint8)(unsafe.Pointer(&x))))
+}
+
+// abort crashes the runtime in situations where even throw might not
+// work. In general it should do something a debugger will recognize
+// (e.g., an INT3 on x86). A crash in abort is recognized by the
+// signal handler, which will attempt to tear down the runtime
+// immediately.
+func abort()
+
+// Called from compiled code; declared for vet; do NOT call from Go.
+func gcWriteBarrier1()
+func gcWriteBarrier2()
+func gcWriteBarrier3()
+func gcWriteBarrier4()
+func gcWriteBarrier5()
+func gcWriteBarrier6()
+func gcWriteBarrier7()
+func gcWriteBarrier8()
+func duffzero()
+func duffcopy()
+
+// Called from linker-generated .initarray; declared for go vet; do NOT call from Go.
+func addmoduledata()
+
+// Injected by the signal handler for panicking signals.
+// Initializes any registers that have fixed meaning at calls but
+// are scratch in bodies and calls sigpanic.
+// On many platforms it just jumps to sigpanic.
+func sigpanic0()
+
+// intArgRegs is used by the various register assignment
+// algorithm implementations in the runtime. These include:
+// - Finalizers (mfinal.go)
+// - Windows callbacks (syscall_windows.go)
+//
+// Both are stripped-down versions of the algorithm since they
+// only have to deal with a subset of cases (finalizers only
+// take a pointer or interface argument, Go Windows callbacks
+// don't support floating point).
+//
+// It should be modified with care and is generally only
+// modified when testing this package.
+//
+// It should never be set higher than its internal/abi
+// constant counterparts, because the system relies on a
+// structure that is at least large enough to hold the
+// registers the system supports.
+//
+// Protected by finlock.
+var intArgRegs = abi.IntArgRegs
diff --git a/src/runtime/stubs2.go b/src/runtime/stubs2.go
new file mode 100644
index 0000000..9637347
--- /dev/null
+++ b/src/runtime/stubs2.go
@@ -0,0 +1,44 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !aix && !darwin && !js && !openbsd && !plan9 && !solaris && !wasip1 && !windows
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// read calls the read system call.
+// It returns a non-negative number of bytes read or a negative errno value.
+func read(fd int32, p unsafe.Pointer, n int32) int32
+
+func closefd(fd int32) int32
+
+func exit(code int32)
+func usleep(usec uint32)
+
+//go:nosplit
+func usleep_no_g(usec uint32) {
+ usleep(usec)
+}
+
+// write1 calls the write system call.
+// It returns a non-negative number of bytes written or a negative errno value.
+//
+//go:noescape
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+
+//go:noescape
+func open(name *byte, mode, perm int32) int32
+
+// The return value is only set on Linux, to be used in osinit().
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+
+// exitThread terminates the current thread, writing *wait = freeMStack when
+// the stack is safe to reclaim.
+//
+//go:noescape
+func exitThread(wait *atomic.Uint32)
diff --git a/src/runtime/stubs3.go b/src/runtime/stubs3.go
new file mode 100644
index 0000000..c3749f3
--- /dev/null
+++ b/src/runtime/stubs3.go
@@ -0,0 +1,10 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !aix && !darwin && !freebsd && !openbsd && !plan9 && !solaris && !wasip1
+
+package runtime
+
+//go:wasmimport gojs runtime.nanotime1
+func nanotime1() int64
diff --git a/src/runtime/stubs_386.go b/src/runtime/stubs_386.go
new file mode 100644
index 0000000..a1dd023
--- /dev/null
+++ b/src/runtime/stubs_386.go
@@ -0,0 +1,24 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+func float64touint32(a float64) uint32
+func uint32tofloat64(a uint32) float64
+
+// stackcheck checks that SP is in range [g->stack.lo, g->stack.hi).
+func stackcheck()
+
+// Called from assembly only; declared for go vet.
+func setldt(slot uintptr, base unsafe.Pointer, size uintptr)
+func emptyfunc()
+
+//go:noescape
+func asmcgocall_no_g(fn, arg unsafe.Pointer)
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_amd64.go b/src/runtime/stubs_amd64.go
new file mode 100644
index 0000000..a86a496
--- /dev/null
+++ b/src/runtime/stubs_amd64.go
@@ -0,0 +1,53 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// Called from compiled code; declared for vet; do NOT call from Go.
+func gcWriteBarrierCX()
+func gcWriteBarrierDX()
+func gcWriteBarrierBX()
+func gcWriteBarrierBP()
+func gcWriteBarrierSI()
+func gcWriteBarrierR8()
+func gcWriteBarrierR9()
+
+// stackcheck checks that SP is in range [g->stack.lo, g->stack.hi).
+func stackcheck()
+
+// Called from assembly only; declared for go vet.
+func settls() // argument in DI
+
+// Retpolines, used by -spectre=ret flag in cmd/asm, cmd/compile.
+func retpolineAX()
+func retpolineCX()
+func retpolineDX()
+func retpolineBX()
+func retpolineBP()
+func retpolineSI()
+func retpolineDI()
+func retpolineR8()
+func retpolineR9()
+func retpolineR10()
+func retpolineR11()
+func retpolineR12()
+func retpolineR13()
+func retpolineR14()
+func retpolineR15()
+
+//go:noescape
+func asmcgocall_no_g(fn, arg unsafe.Pointer)
+
+// Used by reflectcall and the reflect package.
+//
+// Spills/loads arguments in registers to/from an internal/abi.RegArgs
+// respectively. Does not follow the Go ABI.
+func spillArgs()
+func unspillArgs()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr
diff --git a/src/runtime/stubs_arm.go b/src/runtime/stubs_arm.go
new file mode 100644
index 0000000..e19f1a8
--- /dev/null
+++ b/src/runtime/stubs_arm.go
@@ -0,0 +1,29 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// Called from compiler-generated code; declared for go vet.
+func udiv()
+func _div()
+func _divu()
+func _mod()
+func _modu()
+
+// Called from assembly only; declared for go vet.
+func usplitR0()
+func load_g()
+func save_g()
+func emptyfunc()
+func _initcgo()
+func read_tls_fallback()
+
+//go:noescape
+func asmcgocall_no_g(fn, arg unsafe.Pointer)
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_arm64.go b/src/runtime/stubs_arm64.go
new file mode 100644
index 0000000..df04e64
--- /dev/null
+++ b/src/runtime/stubs_arm64.go
@@ -0,0 +1,27 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+//go:noescape
+func asmcgocall_no_g(fn, arg unsafe.Pointer)
+
+func emptyfunc()
+
+// Used by reflectcall and the reflect package.
+//
+// Spills/loads arguments in registers to/from an internal/abi.RegArgs
+// respectively. Does not follow the Go ABI.
+func spillArgs()
+func unspillArgs()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr
diff --git a/src/runtime/stubs_linux.go b/src/runtime/stubs_linux.go
new file mode 100644
index 0000000..2367dc2
--- /dev/null
+++ b/src/runtime/stubs_linux.go
@@ -0,0 +1,20 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux
+
+package runtime
+
+import "unsafe"
+
+func sbrk0() uintptr
+
+// Called from write_err_android.go only, but defined in sys_linux_*.s;
+// declared here (instead of in write_err_android.go) for go vet on non-android builds.
+// The return value is the raw syscall result, which may encode an error number.
+//
+//go:noescape
+func access(name *byte, mode int32) int32
+func connect(fd int32, addr unsafe.Pointer, len int32) int32
+func socket(domain int32, typ int32, prot int32) int32
diff --git a/src/runtime/stubs_loong64.go b/src/runtime/stubs_loong64.go
new file mode 100644
index 0000000..556983c
--- /dev/null
+++ b/src/runtime/stubs_loong64.go
@@ -0,0 +1,15 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build loong64
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_mips64x.go b/src/runtime/stubs_mips64x.go
new file mode 100644
index 0000000..f0cf088
--- /dev/null
+++ b/src/runtime/stubs_mips64x.go
@@ -0,0 +1,20 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+package runtime
+
+import "unsafe"
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+//go:noescape
+func asmcgocall_no_g(fn, arg unsafe.Pointer)
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_mipsx.go b/src/runtime/stubs_mipsx.go
new file mode 100644
index 0000000..84ba147
--- /dev/null
+++ b/src/runtime/stubs_mipsx.go
@@ -0,0 +1,15 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_nonlinux.go b/src/runtime/stubs_nonlinux.go
new file mode 100644
index 0000000..1a06d7c
--- /dev/null
+++ b/src/runtime/stubs_nonlinux.go
@@ -0,0 +1,12 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !linux
+
+package runtime
+
+// sbrk0 returns the current process brk, or 0 if not implemented.
+func sbrk0() uintptr {
+ return 0
+}
diff --git a/src/runtime/stubs_ppc64.go b/src/runtime/stubs_ppc64.go
new file mode 100644
index 0000000..e23e338
--- /dev/null
+++ b/src/runtime/stubs_ppc64.go
@@ -0,0 +1,12 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux
+
+package runtime
+
+// This is needed for vet.
+//
+//go:noescape
+func callCgoSigaction(sig uintptr, new, old *sigactiont) int32
diff --git a/src/runtime/stubs_ppc64x.go b/src/runtime/stubs_ppc64x.go
new file mode 100644
index 0000000..0b7771e
--- /dev/null
+++ b/src/runtime/stubs_ppc64x.go
@@ -0,0 +1,21 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64le || ppc64
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+func reginit()
+
+// Spills/loads arguments in registers to/from an internal/abi.RegArgs
+// respectively. Does not follow the Go ABI.
+func spillArgs()
+func unspillArgs()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_riscv64.go b/src/runtime/stubs_riscv64.go
new file mode 100644
index 0000000..b07d7f8
--- /dev/null
+++ b/src/runtime/stubs_riscv64.go
@@ -0,0 +1,20 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+// Used by reflectcall and the reflect package.
+//
+// Spills/loads arguments in registers to/from an internal/abi.RegArgs
+// respectively. Does not follow the Go ABI.
+func spillArgs()
+func unspillArgs()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/stubs_s390x.go b/src/runtime/stubs_s390x.go
new file mode 100644
index 0000000..a2b07ff
--- /dev/null
+++ b/src/runtime/stubs_s390x.go
@@ -0,0 +1,13 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Called from assembly only; declared for go vet.
+func load_g()
+func save_g()
+
+// getfp returns the frame pointer register of its caller or 0 if not implemented.
+// TODO: Make this a compiler intrinsic
+func getfp() uintptr { return 0 }
diff --git a/src/runtime/symtab.go b/src/runtime/symtab.go
new file mode 100644
index 0000000..b47f2d8
--- /dev/null
+++ b/src/runtime/symtab.go
@@ -0,0 +1,1125 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Frames may be used to get function/file/line information for a
+// slice of PC values returned by Callers.
+type Frames struct {
+ // callers is a slice of PCs that have not yet been expanded to frames.
+ callers []uintptr
+
+ // frames is a slice of Frames that have yet to be returned.
+ frames []Frame
+ frameStore [2]Frame
+}
+
+// Frame is the information returned by Frames for each call frame.
+type Frame struct {
+ // PC is the program counter for the location in this frame.
+ // For a frame that calls another frame, this will be the
+ // program counter of a call instruction. Because of inlining,
+ // multiple frames may have the same PC value, but different
+ // symbolic information.
+ PC uintptr
+
+ // Func is the Func value of this call frame. This may be nil
+ // for non-Go code or fully inlined functions.
+ Func *Func
+
+ // Function is the package path-qualified function name of
+ // this call frame. If non-empty, this string uniquely
+ // identifies a single function in the program.
+ // This may be the empty string if not known.
+ // If Func is not nil then Function == Func.Name().
+ Function string
+
+ // File and Line are the file name and line number of the
+ // location in this frame. For non-leaf frames, this will be
+ // the location of a call. These may be the empty string and
+ // zero, respectively, if not known.
+ File string
+ Line int
+
+ // startLine is the line number of the beginning of the function in
+ // this frame. Specifically, it is the line number of the func keyword
+ // for Go functions. Note that //line directives can change the
+ // filename and/or line number arbitrarily within a function, meaning
+ // that the Line - startLine offset is not always meaningful.
+ //
+ // This may be zero if not known.
+ startLine int
+
+ // Entry point program counter for the function; may be zero
+ // if not known. If Func is not nil then Entry ==
+ // Func.Entry().
+ Entry uintptr
+
+ // The runtime's internal view of the function. This field
+ // is set (funcInfo.valid() returns true) only for Go functions,
+ // not for C functions.
+ funcInfo funcInfo
+}
+
+// CallersFrames takes a slice of PC values returned by Callers and
+// prepares to return function/file/line information.
+// Do not change the slice until you are done with the Frames.
+func CallersFrames(callers []uintptr) *Frames {
+ f := &Frames{callers: callers}
+ f.frames = f.frameStore[:0]
+ return f
+}
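+
+// As an informal usage sketch (not taken from this file; it assumes the
+// standard library "fmt" package and a regular caller outside the runtime),
+// the intended pattern is: capture PCs with runtime.Callers, wrap them with
+// CallersFrames, and call Next until more is false:
+//
+//	pc := make([]uintptr, 16)
+//	n := runtime.Callers(0, pc) // 0 includes the Callers frame itself
+//	frames := runtime.CallersFrames(pc[:n])
+//	for {
+//		frame, more := frames.Next()
+//		fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
+//		if !more {
+//			break
+//		}
+//	}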
+
+// Next returns a Frame representing the next call frame in the slice
+// of PC values. If it has already returned all call frames, Next
+// returns a zero Frame.
+//
+// The more result indicates whether the next call to Next will return
+// a valid Frame. It does not necessarily indicate whether this call
+// returned one.
+//
+// See the Frames example for idiomatic usage.
+func (ci *Frames) Next() (frame Frame, more bool) {
+ for len(ci.frames) < 2 {
+ // Find the next frame.
+ // We need to look for 2 frames so we know what
+ // to return for the "more" result.
+ if len(ci.callers) == 0 {
+ break
+ }
+ pc := ci.callers[0]
+ ci.callers = ci.callers[1:]
+ funcInfo := findfunc(pc)
+ if !funcInfo.valid() {
+ if cgoSymbolizer != nil {
+ // Pre-expand cgo frames. We could do this
+ // incrementally, too, but there's no way to
+ // avoid allocation in this case anyway.
+ ci.frames = append(ci.frames, expandCgoFrames(pc)...)
+ }
+ continue
+ }
+ f := funcInfo._Func()
+ entry := f.Entry()
+ if pc > entry {
+ // We store the pc of the start of the instruction following
+ // the instruction in question (the call or the inline mark).
+ // This is done for historical reasons, and to make FuncForPC
+ // work correctly for entries in the result of runtime.Callers.
+ pc--
+ }
+ // It's important that we interpret pc non-strictly, as cgoTraceback may
+ // have added bogus PCs with a valid funcInfo but invalid PCDATA.
+ u, uf := newInlineUnwinder(funcInfo, pc, nil)
+ sf := u.srcFunc(uf)
+ if u.isInlined(uf) {
+ // Note: entry is not modified. It always refers to a real frame, not an inlined one.
+ // File/line from funcline1 below are already correct.
+ f = nil
+ }
+ ci.frames = append(ci.frames, Frame{
+ PC: pc,
+ Func: f,
+ Function: funcNameForPrint(sf.name()),
+ Entry: entry,
+ startLine: int(sf.startLine),
+ funcInfo: funcInfo,
+ // Note: File,Line set below
+ })
+ }
+
+ // Pop one frame from the frame list. Keep the rest.
+ // Avoid allocation in the common case, which is 1 or 2 frames.
+ switch len(ci.frames) {
+ case 0: // In the rare case when there are no frames at all, we return Frame{}.
+ return
+ case 1:
+ frame = ci.frames[0]
+ ci.frames = ci.frameStore[:0]
+ case 2:
+ frame = ci.frames[0]
+ ci.frameStore[0] = ci.frames[1]
+ ci.frames = ci.frameStore[:1]
+ default:
+ frame = ci.frames[0]
+ ci.frames = ci.frames[1:]
+ }
+ more = len(ci.frames) > 0
+ if frame.funcInfo.valid() {
+ // Compute file/line just before we need to return it,
+ // as it can be expensive. This avoids computing file/line
+ // for the Frame we find but don't return. See issue 32093.
+ file, line := funcline1(frame.funcInfo, frame.PC, false)
+ frame.File, frame.Line = file, int(line)
+ }
+ return
+}
+
+// runtime_FrameStartLine returns the start line of the function in a Frame.
+//
+//go:linkname runtime_FrameStartLine runtime/pprof.runtime_FrameStartLine
+func runtime_FrameStartLine(f *Frame) int {
+ return f.startLine
+}
+
+// runtime_FrameSymbolName returns the full symbol name of the function in a Frame.
+// For generic functions this differs from f.Function in that this doesn't replace
+// the shape name with "...".
+//
+//go:linkname runtime_FrameSymbolName runtime/pprof.runtime_FrameSymbolName
+func runtime_FrameSymbolName(f *Frame) string {
+ if !f.funcInfo.valid() {
+ return f.Function
+ }
+ u, uf := newInlineUnwinder(f.funcInfo, f.PC, nil)
+ sf := u.srcFunc(uf)
+ return sf.name()
+}
+
+// runtime_expandFinalInlineFrame expands the final pc in stk to include all
+// "callers" if pc is inline.
+//
+//go:linkname runtime_expandFinalInlineFrame runtime/pprof.runtime_expandFinalInlineFrame
+func runtime_expandFinalInlineFrame(stk []uintptr) []uintptr {
+ // TODO: It would be more efficient to report only physical PCs to pprof and
+ // just expand the whole stack.
+ if len(stk) == 0 {
+ return stk
+ }
+ pc := stk[len(stk)-1]
+ tracepc := pc - 1
+
+ f := findfunc(tracepc)
+ if !f.valid() {
+ // Not a Go function.
+ return stk
+ }
+
+ var cache pcvalueCache
+ u, uf := newInlineUnwinder(f, tracepc, &cache)
+ if !u.isInlined(uf) {
+ // Nothing inline at tracepc.
+ return stk
+ }
+
+ // Treat the previous func as normal. We haven't actually checked, but
+ // since this pc was included in the stack, we know it shouldn't be
+ // elided.
+ calleeID := abi.FuncIDNormal
+
+ // Remove pc from stk; we'll re-add it below.
+ stk = stk[:len(stk)-1]
+
+ for ; uf.valid(); uf = u.next(uf) {
+ funcID := u.srcFunc(uf).funcID
+ if funcID == abi.FuncIDWrapper && elideWrapperCalling(calleeID) {
+ // ignore wrappers
+ } else {
+ stk = append(stk, uf.pc+1)
+ }
+ calleeID = funcID
+ }
+
+ return stk
+}
+
+// expandCgoFrames expands frame information for pc, known to be
+// a non-Go function, using the cgoSymbolizer hook. expandCgoFrames
+// returns nil if pc could not be expanded.
+func expandCgoFrames(pc uintptr) []Frame {
+ arg := cgoSymbolizerArg{pc: pc}
+ callCgoSymbolizer(&arg)
+
+ if arg.file == nil && arg.funcName == nil {
+ // No useful information from symbolizer.
+ return nil
+ }
+
+ var frames []Frame
+ for {
+ frames = append(frames, Frame{
+ PC: pc,
+ Func: nil,
+ Function: gostring(arg.funcName),
+ File: gostring(arg.file),
+ Line: int(arg.lineno),
+ Entry: arg.entry,
+ // funcInfo is zero, which implies !funcInfo.valid().
+ // That ensures that we use the File/Line info given here.
+ })
+ if arg.more == 0 {
+ break
+ }
+ callCgoSymbolizer(&arg)
+ }
+
+ // No more frames for this PC. Tell the symbolizer we are done.
+ // We don't try to maintain a single cgoSymbolizerArg for the
+ // whole use of Frames, because there would be no good way to tell
+ // the symbolizer when we are done.
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+
+ return frames
+}
+
+// NOTE: Func does not expose the actual unexported fields, because we return *Func
+// values to users, and we want to keep them from being able to overwrite the data
+// with (say) *f = Func{}.
+// All code operating on a *Func must call raw() to get the *_func
+// or funcInfo() to get the funcInfo instead.
+
+// A Func represents a Go function in the running binary.
+type Func struct {
+ opaque struct{} // unexported field to disallow conversions
+}
+
+func (f *Func) raw() *_func {
+ return (*_func)(unsafe.Pointer(f))
+}
+
+func (f *Func) funcInfo() funcInfo {
+ return f.raw().funcInfo()
+}
+
+func (f *_func) funcInfo() funcInfo {
+ // Find the module containing f. f is located in the pclntable.
+ // The unsafe.Pointer to uintptr conversions and arithmetic
+ // are safe because we are working with module addresses.
+ ptr := uintptr(unsafe.Pointer(f))
+ var mod *moduledata
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if len(datap.pclntable) == 0 {
+ continue
+ }
+ base := uintptr(unsafe.Pointer(&datap.pclntable[0]))
+ if base <= ptr && ptr < base+uintptr(len(datap.pclntable)) {
+ mod = datap
+ break
+ }
+ }
+ return funcInfo{f, mod}
+}
+
+// pcHeader holds data used by the pclntab lookups.
+type pcHeader struct {
+ magic uint32 // 0xFFFFFFF1
+ pad1, pad2 uint8 // 0,0
+ minLC uint8 // min instruction size
+ ptrSize uint8 // size of a ptr in bytes
+ nfunc int // number of functions in the module
+ nfiles uint // number of entries in the file tab
+ textStart uintptr // base for function entry PC offsets in this module, equal to moduledata.text
+ funcnameOffset uintptr // offset to the funcnametab variable from pcHeader
+ cuOffset uintptr // offset to the cutab variable from pcHeader
+ filetabOffset uintptr // offset to the filetab variable from pcHeader
+ pctabOffset uintptr // offset to the pctab variable from pcHeader
+ pclnOffset uintptr // offset to the pclntab variable from pcHeader
+}
+
+// moduledata records information about the layout of the executable
+// image. It is written by the linker. Any changes here must be
+// matched by changes to the code in cmd/link/internal/ld/symtab.go:symtab.
+// moduledata is stored in statically allocated non-pointer memory;
+// none of the pointers here are visible to the garbage collector.
+type moduledata struct {
+ sys.NotInHeap // Only in static data
+
+ pcHeader *pcHeader
+ funcnametab []byte
+ cutab []uint32
+ filetab []byte
+ pctab []byte
+ pclntable []byte
+ ftab []functab
+ findfunctab uintptr
+ minpc, maxpc uintptr
+
+ text, etext uintptr
+ noptrdata, enoptrdata uintptr
+ data, edata uintptr
+ bss, ebss uintptr
+ noptrbss, enoptrbss uintptr
+ covctrs, ecovctrs uintptr
+ end, gcdata, gcbss uintptr
+ types, etypes uintptr
+ rodata uintptr
+ gofunc uintptr // go.func.*
+
+ textsectmap []textsect
+ typelinks []int32 // offsets from types
+ itablinks []*itab
+
+ ptab []ptabEntry
+
+ pluginpath string
+ pkghashes []modulehash
+
+ // This slice records the initializing tasks that need to be
+ // done to start up the program. It is built by the linker.
+ inittasks []*initTask
+
+ modulename string
+ modulehashes []modulehash
+
+ hasmain uint8 // 1 if module contains the main function, 0 otherwise
+
+ gcdatamask, gcbssmask bitvector
+
+ typemap map[typeOff]*_type // offset to *_rtype in previous module
+
+ bad bool // module failed to load and should be ignored
+
+ next *moduledata
+}
+
+// A modulehash is used to compare the ABI of a new module or a
+// package in a new module with the loaded program.
+//
+// For each shared library a module links against, the linker creates an entry in the
+// moduledata.modulehashes slice containing the name of the module, the abi hash seen
+// at link time and a pointer to the runtime abi hash. These are checked in
+// moduledataverify1 below.
+//
+// For each loaded plugin, the pkghashes slice has a modulehash of the
+// newly loaded package that can be used to check the plugin's version of
+// a package against any previously loaded version of the package.
+// This is done in plugin.lastmoduleinit.
+type modulehash struct {
+ modulename string
+ linktimehash string
+ runtimehash *string
+}
+
+// pinnedTypemaps are the map[typeOff]*_type from the moduledata objects.
+//
+// These typemap objects are allocated at run time on the heap, but the
+// only direct reference to them is in the moduledata, created by the
+// linker and marked SNOPTRDATA so it is ignored by the GC.
+//
+// To make sure the map isn't collected, we keep a second reference here.
+var pinnedTypemaps []map[typeOff]*_type
+
+var firstmoduledata moduledata // linker symbol
+var lastmoduledatap *moduledata // linker symbol
+var modulesSlice *[]*moduledata // see activeModules
+
+// activeModules returns a slice of active modules.
+//
+// A module is active once its gcdatamask and gcbssmask have been
+// assembled and it is usable by the GC.
+//
+// This is nosplit/nowritebarrier because it is called by the
+// cgo pointer checking code.
+//
+//go:nosplit
+//go:nowritebarrier
+func activeModules() []*moduledata {
+ p := (*[]*moduledata)(atomic.Loadp(unsafe.Pointer(&modulesSlice)))
+ if p == nil {
+ return nil
+ }
+ return *p
+}
+
+// modulesinit creates the active modules slice out of all loaded modules.
+//
+// When a module is first loaded by the dynamic linker, an .init_array
+// function (written by cmd/link) is invoked to call addmoduledata,
+// appending the module to the linked list that starts with
+// firstmoduledata.
+//
+// There are two times this can happen in the lifecycle of a Go
+// program. First, if compiled with -linkshared, a number of modules
+// built with -buildmode=shared can be loaded at program initialization.
+// Second, a Go program can load a module while running that was built
+// with -buildmode=plugin.
+//
+// After loading, this function is called which initializes the
+// moduledata so it is usable by the GC and creates a new activeModules
+// list.
+//
+// Only one goroutine may call modulesinit at a time.
+func modulesinit() {
+ modules := new([]*moduledata)
+ for md := &firstmoduledata; md != nil; md = md.next {
+ if md.bad {
+ continue
+ }
+ *modules = append(*modules, md)
+ if md.gcdatamask == (bitvector{}) {
+ scanDataSize := md.edata - md.data
+ md.gcdatamask = progToPointerMask((*byte)(unsafe.Pointer(md.gcdata)), scanDataSize)
+ scanBSSSize := md.ebss - md.bss
+ md.gcbssmask = progToPointerMask((*byte)(unsafe.Pointer(md.gcbss)), scanBSSSize)
+ gcController.addGlobals(int64(scanDataSize + scanBSSSize))
+ }
+ }
+
+ // Modules appear in the moduledata linked list in the order they are
+ // loaded by the dynamic loader, with one exception: the
+ // firstmoduledata itself is the module that contains the runtime. This
+ // is not always the first module (when using -buildmode=shared, it
+ // is typically libstd.so, the second module). The order matters for
+ // typelinksinit, so we swap the first module with whatever module
+ // contains the main function.
+ //
+ // See Issue #18729.
+ for i, md := range *modules {
+ if md.hasmain != 0 {
+ (*modules)[0] = md
+ (*modules)[i] = &firstmoduledata
+ break
+ }
+ }
+
+ atomicstorep(unsafe.Pointer(&modulesSlice), unsafe.Pointer(modules))
+}
+
+type functab struct {
+ entryoff uint32 // relative to runtime.text
+ funcoff uint32
+}
+
+// Mapping information for secondary text sections
+
+type textsect struct {
+ vaddr uintptr // prelinked section vaddr
+ end uintptr // vaddr + section length
+ baseaddr uintptr // relocated section address
+}
+
+const minfunc = 16 // minimum function size
+const pcbucketsize = 256 * minfunc // size of bucket in the pc->func lookup table
+
+// findfuncbucket is an array of these structures.
+// Each bucket represents 4096 bytes of the text segment.
+// Each subbucket represents 256 bytes of the text segment.
+// To find a function given a pc, locate the bucket and subbucket for
+// that pc. Add together the idx and subbucket value to obtain a
+// function index. Then scan the functab array starting at that
+// index to find the target function.
+// This table uses 20 bytes for every 4096 bytes of code, or ~0.5% overhead.
+type findfuncbucket struct {
+ idx uint32
+ subbuckets [16]byte
+}
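+
+// Informal example of the lookup arithmetic (hypothetical numbers): for a
+// text offset x = 9029 (0x2345) from the module's minpc,
+//
+//	b   := x / pcbucketsize                      // 9029 / 4096 = 2
+//	i   := x % pcbucketsize / (pcbucketsize/16)  // 837 / 256 = 3
+//	idx := bucket[b].idx + uint32(bucket[b].subbuckets[i])
+//
+// findfunc below then scans datap.ftab forward from idx, so idx only has to
+// land at or before the target function, not exactly on it.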
+
+func moduledataverify() {
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ moduledataverify1(datap)
+ }
+}
+
+const debugPcln = false
+
+func moduledataverify1(datap *moduledata) {
+ // Check that the pclntab's format is valid.
+ hdr := datap.pcHeader
+ if hdr.magic != 0xfffffff1 || hdr.pad1 != 0 || hdr.pad2 != 0 ||
+ hdr.minLC != sys.PCQuantum || hdr.ptrSize != goarch.PtrSize || hdr.textStart != datap.text {
+ println("runtime: pcHeader: magic=", hex(hdr.magic), "pad1=", hdr.pad1, "pad2=", hdr.pad2,
+ "minLC=", hdr.minLC, "ptrSize=", hdr.ptrSize, "pcHeader.textStart=", hex(hdr.textStart),
+ "text=", hex(datap.text), "pluginpath=", datap.pluginpath)
+ throw("invalid function symbol table")
+ }
+
+ // ftab is a lookup table for finding a function by program counter.
+ nftab := len(datap.ftab) - 1
+ for i := 0; i < nftab; i++ {
+ // NOTE: ftab[nftab].entry is legal; it is the address beyond the final function.
+ if datap.ftab[i].entryoff > datap.ftab[i+1].entryoff {
+ f1 := funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[i].funcoff])), datap}
+ f2 := funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[i+1].funcoff])), datap}
+ f2name := "end"
+ if i+1 < nftab {
+ f2name = funcname(f2)
+ }
+ println("function symbol table not sorted by PC offset:", hex(datap.ftab[i].entryoff), funcname(f1), ">", hex(datap.ftab[i+1].entryoff), f2name, ", plugin:", datap.pluginpath)
+ for j := 0; j <= i; j++ {
+ println("\t", hex(datap.ftab[j].entryoff), funcname(funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[datap.ftab[j].funcoff])), datap}))
+ }
+ if GOOS == "aix" && isarchive {
+ println("-Wl,-bnoobjreorder is mandatory on aix/ppc64 with c-archive")
+ }
+ throw("invalid runtime symbol table")
+ }
+ }
+
+ min := datap.textAddr(datap.ftab[0].entryoff)
+ max := datap.textAddr(datap.ftab[nftab].entryoff)
+ if datap.minpc != min || datap.maxpc != max {
+ println("minpc=", hex(datap.minpc), "min=", hex(min), "maxpc=", hex(datap.maxpc), "max=", hex(max))
+ throw("minpc or maxpc invalid")
+ }
+
+ for _, modulehash := range datap.modulehashes {
+ if modulehash.linktimehash != *modulehash.runtimehash {
+ println("abi mismatch detected between", datap.modulename, "and", modulehash.modulename)
+ throw("abi mismatch")
+ }
+ }
+}
+
+// textAddr returns md.text + off, with special handling for multiple text sections.
+// off is a (virtual) offset computed at internal linking time,
+// before the external linker adjusts the sections' base addresses.
+//
+// The text, or instruction stream, is generated as one large buffer.
+// The off (offset) for a function is its offset within this buffer.
+// If the total text size gets too large, there can be issues on platforms like ppc64
+// if the targets of calls are too far away for the call instruction.
+// To resolve the large text issue, the text is split into multiple text sections
+// to allow the linker to generate long calls when necessary.
+// When this happens, the vaddr for each text section is set to its offset within the text.
+// Each function's offset is compared against the section vaddrs and ends to determine the containing section.
+// Then the section relative offset is added to the section's
+// relocated baseaddr to compute the function address.
+//
+// It is nosplit because it is part of the findfunc implementation.
+//
+//go:nosplit
+func (md *moduledata) textAddr(off32 uint32) uintptr {
+ off := uintptr(off32)
+ res := md.text + off
+ if len(md.textsectmap) > 1 {
+ for i, sect := range md.textsectmap {
+ // For the last section, include the end address (etext), as it is included in the functab.
+ if off >= sect.vaddr && off < sect.end || (i == len(md.textsectmap)-1 && off == sect.end) {
+ res = sect.baseaddr + off - sect.vaddr
+ break
+ }
+ }
+ if res > md.etext && GOARCH != "wasm" { // on wasm, functions do not live in the same address space as the linear memory
+ println("runtime: textAddr", hex(res), "out of range", hex(md.text), "-", hex(md.etext))
+ throw("runtime: text offset out of range")
+ }
+ }
+ return res
+}
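+
+// A made-up numerical sketch of the multi-section case: if a section has
+// vaddr=0x200000, end=0x400000 and was relocated to baseaddr=0x10200000,
+// then off=0x210000 falls inside it and resolves to
+//
+//	res = 0x10200000 + 0x210000 - 0x200000 = 0x10210000
+//
+// With a single text section, the result is simply md.text + off.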
+
+// textOff is the opposite of textAddr. It converts a PC to a (virtual) offset
+// to md.text, and reports whether the PC is in any Go text section.
+//
+// It is nosplit because it is part of the findfunc implementation.
+//
+//go:nosplit
+func (md *moduledata) textOff(pc uintptr) (uint32, bool) {
+ res := uint32(pc - md.text)
+ if len(md.textsectmap) > 1 {
+ for i, sect := range md.textsectmap {
+ if sect.baseaddr > pc {
+ // pc is not in any section.
+ return 0, false
+ }
+ end := sect.baseaddr + (sect.end - sect.vaddr)
+ // For the last section, include the end address (etext), as it is included in the functab.
+ if i == len(md.textsectmap)-1 {
+ end++
+ }
+ if pc < end {
+ res = uint32(pc - sect.baseaddr + sect.vaddr)
+ break
+ }
+ }
+ }
+ return res, true
+}
+
+// funcName returns the string at nameOff in the function name table.
+func (md *moduledata) funcName(nameOff int32) string {
+ if nameOff == 0 {
+ return ""
+ }
+ return gostringnocopy(&md.funcnametab[nameOff])
+}
+
+// FuncForPC returns a *Func describing the function that contains the
+// given program counter address, or else nil.
+//
+// If pc represents multiple functions because of inlining, it returns
+// the *Func describing the innermost function, but with an entry of
+// the outermost function.
+func FuncForPC(pc uintptr) *Func {
+ f := findfunc(pc)
+ if !f.valid() {
+ return nil
+ }
+ // This must interpret PC non-strictly so bad PCs (those between functions) don't crash the runtime.
+ // We just report the preceding function in that situation. See issue 29735.
+ // TODO: Perhaps we should report no function at all in that case.
+ // The runtime currently doesn't have function end info, alas.
+ u, uf := newInlineUnwinder(f, pc, nil)
+ if !u.isInlined(uf) {
+ return f._Func()
+ }
+ sf := u.srcFunc(uf)
+ file, line := u.fileLine(uf)
+ fi := &funcinl{
+ ones: ^uint32(0),
+ entry: f.entry(), // entry of the real (the outermost) function.
+ name: sf.name(),
+ file: file,
+ line: int32(line),
+ startLine: sf.startLine,
+ }
+ return (*Func)(unsafe.Pointer(fi))
+}
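+
+// An informal usage sketch from user code (assumes a normal, resolvable PC;
+// FuncForPC returns nil for PCs outside any Go function):
+//
+//	pc, _, _, ok := runtime.Caller(0)
+//	if f := runtime.FuncForPC(pc); ok && f != nil {
+//		println(f.Name()) // package path-qualified name
+//		file, line := f.FileLine(pc)
+//		println(file, line)
+//	}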
+
+// Name returns the name of the function.
+func (f *Func) Name() string {
+ if f == nil {
+ return ""
+ }
+ fn := f.raw()
+ if fn.isInlined() { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return funcNameForPrint(fi.name)
+ }
+ return funcNameForPrint(funcname(f.funcInfo()))
+}
+
+// Entry returns the entry address of the function.
+func (f *Func) Entry() uintptr {
+ fn := f.raw()
+ if fn.isInlined() { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.entry
+ }
+ return fn.funcInfo().entry()
+}
+
+// FileLine returns the file name and line number of the
+// source code corresponding to the program counter pc.
+// The result will not be accurate if pc is not a program
+// counter within f.
+func (f *Func) FileLine(pc uintptr) (file string, line int) {
+ fn := f.raw()
+ if fn.isInlined() { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.file, int(fi.line)
+ }
+ // Pass strict=false here, because anyone can call this function,
+ // and they might just be wrong about targetpc belonging to f.
+ file, line32 := funcline1(f.funcInfo(), pc, false)
+ return file, int(line32)
+}
+
+// startLine returns the starting line number of the function, i.e., the line
+// number of the func keyword.
+func (f *Func) startLine() int32 {
+ fn := f.raw()
+ if fn.isInlined() { // inlined version
+ fi := (*funcinl)(unsafe.Pointer(fn))
+ return fi.startLine
+ }
+ return fn.funcInfo().startLine
+}
+
+// findmoduledatap looks up the moduledata for a PC.
+//
+// It is nosplit because it's part of the isgoexception
+// implementation.
+//
+//go:nosplit
+func findmoduledatap(pc uintptr) *moduledata {
+ for datap := &firstmoduledata; datap != nil; datap = datap.next {
+ if datap.minpc <= pc && pc < datap.maxpc {
+ return datap
+ }
+ }
+ return nil
+}
+
+type funcInfo struct {
+ *_func
+ datap *moduledata
+}
+
+func (f funcInfo) valid() bool {
+ return f._func != nil
+}
+
+func (f funcInfo) _Func() *Func {
+ return (*Func)(unsafe.Pointer(f._func))
+}
+
+// isInlined reports whether f should be re-interpreted as a *funcinl.
+func (f *_func) isInlined() bool {
+ return f.entryOff == ^uint32(0) // see comment for funcinl.ones
+}
+
+// entry returns the entry PC for f.
+func (f funcInfo) entry() uintptr {
+ return f.datap.textAddr(f.entryOff)
+}
+
+// findfunc looks up function metadata for a PC.
+//
+// It is nosplit because it's part of the isgoexception
+// implementation.
+//
+//go:nosplit
+func findfunc(pc uintptr) funcInfo {
+ datap := findmoduledatap(pc)
+ if datap == nil {
+ return funcInfo{}
+ }
+ const nsub = uintptr(len(findfuncbucket{}.subbuckets))
+
+ pcOff, ok := datap.textOff(pc)
+ if !ok {
+ return funcInfo{}
+ }
+
+ x := uintptr(pcOff) + datap.text - datap.minpc // TODO: are datap.text and datap.minpc always equal?
+ b := x / pcbucketsize
+ i := x % pcbucketsize / (pcbucketsize / nsub)
+
+ ffb := (*findfuncbucket)(add(unsafe.Pointer(datap.findfunctab), b*unsafe.Sizeof(findfuncbucket{})))
+ idx := ffb.idx + uint32(ffb.subbuckets[i])
+
+ // Find the ftab entry.
+ for datap.ftab[idx+1].entryoff <= pcOff {
+ idx++
+ }
+
+ funcoff := datap.ftab[idx].funcoff
+ return funcInfo{(*_func)(unsafe.Pointer(&datap.pclntable[funcoff])), datap}
+}
+
+// A srcFunc represents a logical function in the source code. This may
+// correspond to an actual symbol in the binary text, or it may correspond to a
+// source function that has been inlined.
+type srcFunc struct {
+ datap *moduledata
+ nameOff int32
+ startLine int32
+ funcID abi.FuncID
+}
+
+func (f funcInfo) srcFunc() srcFunc {
+ if !f.valid() {
+ return srcFunc{}
+ }
+ return srcFunc{f.datap, f.nameOff, f.startLine, f.funcID}
+}
+
+func (s srcFunc) name() string {
+ if s.datap == nil {
+ return ""
+ }
+ return s.datap.funcName(s.nameOff)
+}
+
+type pcvalueCache struct {
+ entries [2][8]pcvalueCacheEnt
+}
+
+type pcvalueCacheEnt struct {
+ // targetpc and off together are the key of this cache entry.
+ targetpc uintptr
+ off uint32
+ // val is the value of this cached pcvalue entry.
+ val int32
+}
+
+// pcvalueCacheKey returns the outermost index in a pcvalueCache to use for targetpc.
+// It must be very cheap to calculate.
+// For now, align to goarch.PtrSize and reduce mod the number of entries.
+// In practice, this appears to be fairly randomly and evenly distributed.
+func pcvalueCacheKey(targetpc uintptr) uintptr {
+ return (targetpc / goarch.PtrSize) % uintptr(len(pcvalueCache{}.entries))
+}
+
+// Returns the PCData value, and the PC where this value starts.
+// TODO: the start PC is returned only when cache is nil.
+func pcvalue(f funcInfo, off uint32, targetpc uintptr, cache *pcvalueCache, strict bool) (int32, uintptr) {
+ if off == 0 {
+ return -1, 0
+ }
+
+ // Check the cache. This speeds up walks of deep stacks, which
+ // tend to have the same recursive functions over and over.
+ //
+ // This cache is small enough that full associativity is
+ // cheaper than doing the hashing for a less associative
+ // cache.
+ if cache != nil {
+ x := pcvalueCacheKey(targetpc)
+ for i := range cache.entries[x] {
+ // We check off first because we're more
+ // likely to have multiple entries with
+ // different offsets for the same targetpc
+ // than the other way around, so we'll usually
+ // fail in the first clause.
+ ent := &cache.entries[x][i]
+ if ent.off == off && ent.targetpc == targetpc {
+ return ent.val, 0
+ }
+ }
+ }
+
+ if !f.valid() {
+ if strict && panicking.Load() == 0 {
+ println("runtime: no module data for", hex(f.entry()))
+ throw("no module data")
+ }
+ return -1, 0
+ }
+ datap := f.datap
+ p := datap.pctab[off:]
+ pc := f.entry()
+ prevpc := pc
+ val := int32(-1)
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry())
+ if !ok {
+ break
+ }
+ if targetpc < pc {
+ // Replace a random entry in the cache. Random
+ // replacement prevents a performance cliff if
+ // a recursive stack's cycle is slightly
+ // larger than the cache.
+ // Put the new element at the beginning,
+ // since it is the most likely to be newly used.
+ if cache != nil {
+ x := pcvalueCacheKey(targetpc)
+ e := &cache.entries[x]
+ ci := fastrandn(uint32(len(cache.entries[x])))
+ e[ci] = e[0]
+ e[0] = pcvalueCacheEnt{
+ targetpc: targetpc,
+ off: off,
+ val: val,
+ }
+ }
+
+ return val, prevpc
+ }
+ prevpc = pc
+ }
+
+ // If there was a table, it should have covered all program counters.
+ // If not, something is wrong.
+ if panicking.Load() != 0 || !strict {
+ return -1, 0
+ }
+
+ print("runtime: invalid pc-encoded table f=", funcname(f), " pc=", hex(pc), " targetpc=", hex(targetpc), " tab=", p, "\n")
+
+ p = datap.pctab[off:]
+ pc = f.entry()
+ val = -1
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry())
+ if !ok {
+ break
+ }
+ print("\tvalue=", val, " until pc=", hex(pc), "\n")
+ }
+
+ throw("invalid runtime symbol table")
+ return -1, 0
+}
+
+func funcname(f funcInfo) string {
+ if !f.valid() {
+ return ""
+ }
+ return f.datap.funcName(f.nameOff)
+}
+
+func funcpkgpath(f funcInfo) string {
+ name := funcNameForPrint(funcname(f))
+ i := len(name) - 1
+ for ; i > 0; i-- {
+ if name[i] == '/' {
+ break
+ }
+ }
+ for ; i < len(name); i++ {
+ if name[i] == '.' {
+ break
+ }
+ }
+ return name[:i]
+}
+
+func funcfile(f funcInfo, fileno int32) string {
+ datap := f.datap
+ if !f.valid() {
+ return "?"
+ }
+ // Make sure the cu index and file offset are valid
+ if fileoff := datap.cutab[f.cuOffset+uint32(fileno)]; fileoff != ^uint32(0) {
+ return gostringnocopy(&datap.filetab[fileoff])
+ }
+ // pcln section is corrupt.
+ return "?"
+}
+
+func funcline1(f funcInfo, targetpc uintptr, strict bool) (file string, line int32) {
+ datap := f.datap
+ if !f.valid() {
+ return "?", 0
+ }
+ fileno, _ := pcvalue(f, f.pcfile, targetpc, nil, strict)
+ line, _ = pcvalue(f, f.pcln, targetpc, nil, strict)
+ if fileno == -1 || line == -1 || int(fileno) >= len(datap.filetab) {
+ // print("looking for ", hex(targetpc), " in ", funcname(f), " got file=", fileno, " line=", lineno, "\n")
+ return "?", 0
+ }
+ file = funcfile(f, fileno)
+ return
+}
+
+func funcline(f funcInfo, targetpc uintptr) (file string, line int32) {
+ return funcline1(f, targetpc, true)
+}
+
+func funcspdelta(f funcInfo, targetpc uintptr, cache *pcvalueCache) int32 {
+ x, _ := pcvalue(f, f.pcsp, targetpc, cache, true)
+ if debugPcln && x&(goarch.PtrSize-1) != 0 {
+ print("invalid spdelta ", funcname(f), " ", hex(f.entry()), " ", hex(targetpc), " ", hex(f.pcsp), " ", x, "\n")
+ throw("bad spdelta")
+ }
+ return x
+}
+
+// funcMaxSPDelta returns the maximum spdelta at any point in f.
+func funcMaxSPDelta(f funcInfo) int32 {
+ datap := f.datap
+ p := datap.pctab[f.pcsp:]
+ pc := f.entry()
+ val := int32(-1)
+ max := int32(0)
+ for {
+ var ok bool
+ p, ok = step(p, &pc, &val, pc == f.entry())
+ if !ok {
+ return max
+ }
+ if val > max {
+ max = val
+ }
+ }
+}
+
+func pcdatastart(f funcInfo, table uint32) uint32 {
+ return *(*uint32)(add(unsafe.Pointer(&f.nfuncdata), unsafe.Sizeof(f.nfuncdata)+uintptr(table)*4))
+}
+
+func pcdatavalue(f funcInfo, table uint32, targetpc uintptr, cache *pcvalueCache) int32 {
+ if table >= f.npcdata {
+ return -1
+ }
+ r, _ := pcvalue(f, pcdatastart(f, table), targetpc, cache, true)
+ return r
+}
+
+func pcdatavalue1(f funcInfo, table uint32, targetpc uintptr, cache *pcvalueCache, strict bool) int32 {
+ if table >= f.npcdata {
+ return -1
+ }
+ r, _ := pcvalue(f, pcdatastart(f, table), targetpc, cache, strict)
+ return r
+}
+
+// Like pcdatavalue, but also return the start PC of this PCData value.
+// It doesn't take a cache.
+func pcdatavalue2(f funcInfo, table uint32, targetpc uintptr) (int32, uintptr) {
+ if table >= f.npcdata {
+ return -1, 0
+ }
+ return pcvalue(f, pcdatastart(f, table), targetpc, nil, true)
+}
+
+// funcdata returns a pointer to the ith funcdata for f.
+// funcdata should be kept in sync with cmd/link:writeFuncs.
+func funcdata(f funcInfo, i uint8) unsafe.Pointer {
+ if i < 0 || i >= f.nfuncdata {
+ return nil
+ }
+ base := f.datap.gofunc // load gofunc address early so that we calculate during cache misses
+ p := uintptr(unsafe.Pointer(&f.nfuncdata)) + unsafe.Sizeof(f.nfuncdata) + uintptr(f.npcdata)*4 + uintptr(i)*4
+ off := *(*uint32)(unsafe.Pointer(p))
+ // Return off == ^uint32(0) ? 0 : f.datap.gofunc + uintptr(off), but without branches.
+ // The compiler calculates mask on most architectures using conditional assignment.
+ var mask uintptr
+ if off == ^uint32(0) {
+ mask = 1
+ }
+ mask--
+ raw := base + uintptr(off)
+ return unsafe.Pointer(raw & mask)
+}
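+
+// Spelling out the branch-free trick above with the two possible cases
+// (informal, describes the code rather than adding to it):
+//
+//	off == ^uint32(0): mask = 1, mask-- == 0,            raw & mask == 0 (nil)
+//	off != ^uint32(0): mask = 0, mask-- == ^uintptr(0),  raw & mask == base + off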
+
+// step advances to the next pc, value pair in the encoded table.
+func step(p []byte, pc *uintptr, val *int32, first bool) (newp []byte, ok bool) {
+ // For both uvdelta and pcdelta, the common case (~70%)
+ // is that they are a single byte. If so, avoid calling readvarint.
+ uvdelta := uint32(p[0])
+ if uvdelta == 0 && !first {
+ return nil, false
+ }
+ n := uint32(1)
+ if uvdelta&0x80 != 0 {
+ n, uvdelta = readvarint(p)
+ }
+ *val += int32(-(uvdelta & 1) ^ (uvdelta >> 1))
+ p = p[n:]
+
+ pcdelta := uint32(p[0])
+ n = 1
+ if pcdelta&0x80 != 0 {
+ n, pcdelta = readvarint(p)
+ }
+ p = p[n:]
+ *pc += uintptr(pcdelta * sys.PCQuantum)
+ return p, true
+}
+
+// readvarint reads a varint from p.
+func readvarint(p []byte) (read uint32, val uint32) {
+ var v, shift, n uint32
+ for {
+ b := p[n]
+ n++
+ v |= uint32(b&0x7F) << (shift & 31)
+ if b&0x80 == 0 {
+ break
+ }
+ shift += 7
+ }
+ return n, v
+}
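+
+// An informal decoding example for the pctab stream consumed by step and
+// readvarint (assuming sys.PCQuantum == 1, as on amd64):
+//
+//	uvdelta byte 0x05: -(5&1) ^ (5>>1) as int32 is -3, so val decreases by 3
+//	pcdelta byte 0x02: pc advances by 2 * PCQuantum = 2 bytes
+//	bytes 0x85 0x01:   a two-byte varint, (0x85&0x7F) | (0x01<<7) = 133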
+
+type stackmap struct {
+ n int32 // number of bitmaps
+ nbit int32 // number of bits in each bitmap
+ bytedata [1]byte // bitmaps, each starting on a byte boundary
+}
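+
+// For orientation (informal): with nbit == 20, each bitmap handled by
+// stackmapdata below occupies (20+7)>>3 == 3 bytes, so bitmap n starts at
+// bytedata[3*n] and the returned bitvector exposes its first 20 bits.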
+
+//go:nowritebarrier
+func stackmapdata(stkmap *stackmap, n int32) bitvector {
+ // Check this invariant only when stackDebug is on at all.
+ // The invariant is already checked by many of stackmapdata's callers,
+ // and disabling it by default allows stackmapdata to be inlined.
+ if stackDebug > 0 && (n < 0 || n >= stkmap.n) {
+ throw("stackmapdata: index out of range")
+ }
+ return bitvector{stkmap.nbit, addb(&stkmap.bytedata[0], uintptr(n*((stkmap.nbit+7)>>3)))}
+}
diff --git a/src/runtime/symtab_test.go b/src/runtime/symtab_test.go
new file mode 100644
index 0000000..cf20ea7
--- /dev/null
+++ b/src/runtime/symtab_test.go
@@ -0,0 +1,285 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "strings"
+ "testing"
+ "unsafe"
+)
+
+func TestCaller(t *testing.T) {
+ procs := runtime.GOMAXPROCS(-1)
+ c := make(chan bool, procs)
+ for p := 0; p < procs; p++ {
+ go func() {
+ for i := 0; i < 1000; i++ {
+ testCallerFoo(t)
+ }
+ c <- true
+ }()
+ defer func() {
+ <-c
+ }()
+ }
+}
+
+// These are marked noinline so that we can use FuncForPC
+// in testCallerBar.
+//
+//go:noinline
+func testCallerFoo(t *testing.T) {
+ testCallerBar(t)
+}
+
+//go:noinline
+func testCallerBar(t *testing.T) {
+ for i := 0; i < 2; i++ {
+ pc, file, line, ok := runtime.Caller(i)
+ f := runtime.FuncForPC(pc)
+ if !ok ||
+ !strings.HasSuffix(file, "symtab_test.go") ||
+ (i == 0 && !strings.HasSuffix(f.Name(), "testCallerBar")) ||
+ (i == 1 && !strings.HasSuffix(f.Name(), "testCallerFoo")) ||
+ line < 5 || line > 1000 ||
+ f.Entry() >= pc {
+ t.Errorf("incorrect symbol info %d: %t %d %d %s %s %d",
+ i, ok, f.Entry(), pc, f.Name(), file, line)
+ }
+ }
+}
+
+func lineNumber() int {
+ _, _, line, _ := runtime.Caller(1)
+ return line // return 0 for error
+}
+
+// Do not add/remove lines in this block without updating the line numbers.
+var firstLine = lineNumber() // 0
+var ( // 1
+ lineVar1 = lineNumber() // 2
+ lineVar2a, lineVar2b = lineNumber(), lineNumber() // 3
+) // 4
+var compLit = []struct { // 5
+ lineA, lineB int // 6
+}{ // 7
+ { // 8
+ lineNumber(), lineNumber(), // 9
+ }, // 10
+ { // 11
+ lineNumber(), // 12
+ lineNumber(), // 13
+ }, // 14
+ { // 15
+ lineB: lineNumber(), // 16
+ lineA: lineNumber(), // 17
+ }, // 18
+} // 19
+var arrayLit = [...]int{lineNumber(), // 20
+ lineNumber(), lineNumber(), // 21
+ lineNumber(), // 22
+} // 23
+var sliceLit = []int{lineNumber(), // 24
+ lineNumber(), lineNumber(), // 25
+ lineNumber(), // 26
+} // 27
+var mapLit = map[int]int{ // 28
+ 29: lineNumber(), // 29
+ 30: lineNumber(), // 30
+ lineNumber(): 31, // 31
+ lineNumber(): 32, // 32
+} // 33
+var intLit = lineNumber() + // 34
+ lineNumber() + // 35
+ lineNumber() // 36
+func trythis() { // 37
+ recordLines(lineNumber(), // 38
+ lineNumber(), // 39
+ lineNumber()) // 40
+}
+
+// Modifications below this line are okay.
+
+var l38, l39, l40 int
+
+func recordLines(a, b, c int) {
+ l38 = a
+ l39 = b
+ l40 = c
+}
+
+func TestLineNumber(t *testing.T) {
+ trythis()
+ for _, test := range []struct {
+ name string
+ val int
+ want int
+ }{
+ {"firstLine", firstLine, 0},
+ {"lineVar1", lineVar1, 2},
+ {"lineVar2a", lineVar2a, 3},
+ {"lineVar2b", lineVar2b, 3},
+ {"compLit[0].lineA", compLit[0].lineA, 9},
+ {"compLit[0].lineB", compLit[0].lineB, 9},
+ {"compLit[1].lineA", compLit[1].lineA, 12},
+ {"compLit[1].lineB", compLit[1].lineB, 13},
+ {"compLit[2].lineA", compLit[2].lineA, 17},
+ {"compLit[2].lineB", compLit[2].lineB, 16},
+
+ {"arrayLit[0]", arrayLit[0], 20},
+ {"arrayLit[1]", arrayLit[1], 21},
+ {"arrayLit[2]", arrayLit[2], 21},
+ {"arrayLit[3]", arrayLit[3], 22},
+
+ {"sliceLit[0]", sliceLit[0], 24},
+ {"sliceLit[1]", sliceLit[1], 25},
+ {"sliceLit[2]", sliceLit[2], 25},
+ {"sliceLit[3]", sliceLit[3], 26},
+
+ {"mapLit[29]", mapLit[29], 29},
+ {"mapLit[30]", mapLit[30], 30},
+ {"mapLit[31]", mapLit[31+firstLine] + firstLine, 31}, // nb it's the key not the value
+ {"mapLit[32]", mapLit[32+firstLine] + firstLine, 32}, // nb it's the key not the value
+
+ {"intLit", intLit - 2*firstLine, 34 + 35 + 36},
+
+ {"l38", l38, 38},
+ {"l39", l39, 39},
+ {"l40", l40, 40},
+ } {
+ if got := test.val - firstLine; got != test.want {
+ t.Errorf("%s on firstLine+%d want firstLine+%d (firstLine=%d, val=%d)",
+ test.name, got, test.want, firstLine, test.val)
+ }
+ }
+}
+
+func TestNilName(t *testing.T) {
+ defer func() {
+ if ex := recover(); ex != nil {
+ t.Fatalf("expected no nil panic, got=%v", ex)
+ }
+ }()
+ if got := (*runtime.Func)(nil).Name(); got != "" {
+ t.Errorf("Name() = %q, want %q", got, "")
+ }
+}
+
+var dummy int
+
+func inlined() {
+ // Side effect to prevent elimination of this entire function.
+ dummy = 42
+}
+
+// A function with an InlTree. Returns a PC within the function body.
+//
+// No inline to ensure this complete function appears in output.
+//
+//go:noinline
+func tracebackFunc(t *testing.T) uintptr {
+ // This body must be more complex than a single call to inlined to get
+ // an inline tree.
+ inlined()
+ inlined()
+
+ // Acquire a PC in this function.
+ pc, _, _, ok := runtime.Caller(0)
+ if !ok {
+ t.Fatalf("Caller(0) got ok false, want true")
+ }
+
+ return pc
+}
+
+// Test that CallersFrames handles PCs in the alignment region between
+// functions (int 3 on amd64) without crashing.
+//
+// Go will never generate a stack trace containing such an address, as it is
+// not a valid call site. However, the cgo traceback function passed to
+// runtime.SetCgoTraceback may not be completely accurate and may incorrectly
+// provide PCs in Go code or the alignment region between functions.
+//
+// Go obviously doesn't easily expose the problematic PCs to running programs,
+// so this test is a bit fragile. Some details:
+//
+// - tracebackFunc is our target function. We want to get a PC in the
+// alignment region following this function. This function also has other
+// functions inlined into it to ensure it has an InlTree (this was the source
+// of the bug in issue 44971).
+//
+// - We acquire a PC in tracebackFunc, walking forwards until FuncForPC says
+// we're in a new function. The last PC of the function according to FuncForPC
+// should be in the alignment region (assuming the function isn't already
+// perfectly aligned).
+//
+// This is a regression test for issue 44971.
+func TestFunctionAlignmentTraceback(t *testing.T) {
+ pc := tracebackFunc(t)
+
+ // Double-check we got the right PC.
+ f := runtime.FuncForPC(pc)
+ if !strings.HasSuffix(f.Name(), "tracebackFunc") {
+ t.Fatalf("Caller(0) = %+v, want tracebackFunc", f)
+ }
+
+ // Iterate forward until we find a different function, then back up one
+ // instruction, which is (hopefully) an alignment filler instruction.
+ for runtime.FuncForPC(pc) == f {
+ pc++
+ }
+ pc--
+
+ // Is this an alignment region filler instruction? We only check this
+ // on amd64 for simplicity. If this function has no filler, then we may
+ // get a false negative, but will never get a false positive.
+ if runtime.GOARCH == "amd64" {
+ code := *(*uint8)(unsafe.Pointer(pc))
+ if code != 0xcc { // INT $3
+ t.Errorf("PC %v code got %#x want 0xcc", pc, code)
+ }
+ }
+
+ // Finally ensure that Frames.Next doesn't crash when processing this
+ // PC.
+ frames := runtime.CallersFrames([]uintptr{pc})
+ frame, _ := frames.Next()
+ if frame.Func != f {
+ t.Errorf("frames.Next() got %+v want %+v", frame.Func, f)
+ }
+}
+
+func BenchmarkFunc(b *testing.B) {
+ pc, _, _, ok := runtime.Caller(0)
+ if !ok {
+ b.Fatal("failed to look up PC")
+ }
+ f := runtime.FuncForPC(pc)
+ b.Run("Name", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ name := f.Name()
+ if name != "runtime_test.BenchmarkFunc" {
+ b.Fatalf("unexpected name %q", name)
+ }
+ }
+ })
+ b.Run("Entry", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ pc := f.Entry()
+ if pc == 0 {
+ b.Fatal("zero PC")
+ }
+ }
+ })
+ b.Run("FileLine", func(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ file, line := f.FileLine(pc)
+ if !strings.HasSuffix(file, "symtab_test.go") || line == 0 {
+ b.Fatalf("unexpected file/line %q:%d", file, line)
+ }
+ }
+ })
+}
diff --git a/src/runtime/symtabinl.go b/src/runtime/symtabinl.go
new file mode 100644
index 0000000..2bb1c4b
--- /dev/null
+++ b/src/runtime/symtabinl.go
@@ -0,0 +1,116 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "internal/abi"
+
+// inlinedCall is the encoding of entries in the FUNCDATA_InlTree table.
+type inlinedCall struct {
+ funcID abi.FuncID // type of the called function
+ _ [3]byte
+ nameOff int32 // offset into pclntab for name of called function
+ parentPc int32 // position of an instruction whose source position is the call site (offset from entry)
+ startLine int32 // line number of start of function (func keyword/TEXT directive)
+}
+
+// An inlineUnwinder iterates over the stack of inlined calls at a PC by
+// decoding the inline table. The last step of iteration is always the frame of
+// the physical function, so there's always at least one frame.
+//
+// This is typically used as:
+//
+// for u, uf := newInlineUnwinder(...); uf.valid(); uf = u.next(uf) { ... }
+//
+// Implementation note: This is used in contexts that disallow write barriers.
+// Hence, the constructor returns this by value and pointer receiver methods
+// must not mutate pointer fields. Also, we keep the mutable state in a separate
+// struct mostly to keep both structs SSA-able, which generates much better
+// code.
+type inlineUnwinder struct {
+ f funcInfo
+ cache *pcvalueCache
+ inlTree *[1 << 20]inlinedCall
+}
+
+// An inlineFrame is a position in an inlineUnwinder.
+type inlineFrame struct {
+ // pc is the PC giving the file/line metadata of the current frame. This is
+ // always a "call PC" (not a "return PC"). This is 0 when the iterator is
+ // exhausted.
+ pc uintptr
+
+ // index is the index of the current record in inlTree, or -1 if we are in
+ // the outermost function.
+ index int32
+}
+
+// newInlineUnwinder creates an inlineUnwinder initially set to the inner-most
+// inlined frame at PC. PC should be a "call PC" (not a "return PC").
+//
+// This unwinder uses non-strict handling of PC because it's assumed this is
+// only ever used for symbolic debugging. If things go really wrong, it'll just
+// fall back to the outermost frame.
+func newInlineUnwinder(f funcInfo, pc uintptr, cache *pcvalueCache) (inlineUnwinder, inlineFrame) {
+ inldata := funcdata(f, abi.FUNCDATA_InlTree)
+ if inldata == nil {
+ return inlineUnwinder{f: f}, inlineFrame{pc: pc, index: -1}
+ }
+ inlTree := (*[1 << 20]inlinedCall)(inldata)
+ u := inlineUnwinder{f: f, cache: cache, inlTree: inlTree}
+ return u, u.resolveInternal(pc)
+}
+
+func (u *inlineUnwinder) resolveInternal(pc uintptr) inlineFrame {
+ return inlineFrame{
+ pc: pc,
+ // Conveniently, this returns -1 if there's an error, which is the same
+ // value we use for the outermost frame.
+ index: pcdatavalue1(u.f, abi.PCDATA_InlTreeIndex, pc, u.cache, false),
+ }
+}
+
+func (uf inlineFrame) valid() bool {
+ return uf.pc != 0
+}
+
+// next returns the frame representing uf's logical caller.
+func (u *inlineUnwinder) next(uf inlineFrame) inlineFrame {
+ if uf.index < 0 {
+ uf.pc = 0
+ return uf
+ }
+ parentPc := u.inlTree[uf.index].parentPc
+ return u.resolveInternal(u.f.entry() + uintptr(parentPc))
+}
+
+// isInlined returns whether uf is an inlined frame.
+func (u *inlineUnwinder) isInlined(uf inlineFrame) bool {
+ return uf.index >= 0
+}
+
+// srcFunc returns the srcFunc representing the given frame.
+func (u *inlineUnwinder) srcFunc(uf inlineFrame) srcFunc {
+ if uf.index < 0 {
+ return u.f.srcFunc()
+ }
+ t := &u.inlTree[uf.index]
+ return srcFunc{
+ u.f.datap,
+ t.nameOff,
+ t.startLine,
+ t.funcID,
+ }
+}
+
+// fileLine returns the file name and line number of the call within the given
+// frame. As a convenience, for the innermost frame, it returns the file and
+// line of the PC this unwinder was started at (often this is a call to another
+// physical function).
+//
+// It returns "?", 0 if something goes wrong.
+func (u *inlineUnwinder) fileLine(uf inlineFrame) (file string, line int) {
+ file, line32 := funcline1(u.f, uf.pc, false)
+ return file, int(line32)
+}
diff --git a/src/runtime/symtabinl_test.go b/src/runtime/symtabinl_test.go
new file mode 100644
index 0000000..9e75f79
--- /dev/null
+++ b/src/runtime/symtabinl_test.go
@@ -0,0 +1,122 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/sys"
+)
+
+func XTestInlineUnwinder(t TestingT) {
+ if TestenvOptimizationOff() {
+ t.Skip("skipping test with inlining optimizations disabled")
+ }
+
+ pc1 := abi.FuncPCABIInternal(tiuTest)
+ f := findfunc(pc1)
+ if !f.valid() {
+ t.Fatalf("failed to resolve tiuTest at PC %#x", pc1)
+ }
+
+ want := map[string]int{
+ "tiuInlined1:3 tiuTest:10": 0,
+ "tiuInlined1:3 tiuInlined2:6 tiuTest:11": 0,
+ "tiuInlined2:7 tiuTest:11": 0,
+ "tiuTest:12": 0,
+ }
+ wantStart := map[string]int{
+ "tiuInlined1": 2,
+ "tiuInlined2": 5,
+ "tiuTest": 9,
+ }
+
+ // Iterate over the PCs in tiuTest and walk the inline stack for each.
+ prevStack := "x"
+ var cache pcvalueCache
+ for pc := pc1; pc < pc1+1024 && findfunc(pc) == f; pc += sys.PCQuantum {
+ stack := ""
+ u, uf := newInlineUnwinder(f, pc, &cache)
+ if file, _ := u.fileLine(uf); file == "?" {
+ // We're probably in the trailing function padding, where findfunc
+ // still returns f but there's no symbolic information. Just keep
+ // going until we definitely hit the end. If we see a "?" in the
+ // middle of unwinding, that's a real problem.
+ //
+ // TODO: If we ever have function end information, use that to make
+ // this robust.
+ continue
+ }
+ for ; uf.valid(); uf = u.next(uf) {
+ file, line := u.fileLine(uf)
+ const wantFile = "symtabinl_test.go"
+ if !hasSuffix(file, wantFile) {
+ t.Errorf("tiuTest+%#x: want file ...%s, got %s", pc-pc1, wantFile, file)
+ }
+
+ sf := u.srcFunc(uf)
+
+ name := sf.name()
+ const namePrefix = "runtime."
+ if hasPrefix(name, namePrefix) {
+ name = name[len(namePrefix):]
+ }
+ if !hasPrefix(name, "tiu") {
+ t.Errorf("tiuTest+%#x: unexpected function %s", pc-pc1, name)
+ }
+
+ start := int(sf.startLine) - tiuStart
+ if start != wantStart[name] {
+ t.Errorf("tiuTest+%#x: want startLine %d, got %d", pc-pc1, wantStart[name], start)
+ }
+ if sf.funcID != abi.FuncIDNormal {
+ t.Errorf("tiuTest+%#x: bad funcID %v", pc-pc1, sf.funcID)
+ }
+
+ if len(stack) > 0 {
+ stack += " "
+ }
+ stack += FmtSprintf("%s:%d", name, line-tiuStart)
+ }
+
+ if stack != prevStack {
+ prevStack = stack
+
+ t.Logf("tiuTest+%#x: %s", pc-pc1, stack)
+
+ if _, ok := want[stack]; ok {
+ want[stack]++
+ }
+ }
+ }
+
+ // Check that we got all the stacks we wanted.
+ for stack, count := range want {
+ if count == 0 {
+ t.Errorf("missing stack %s", stack)
+ }
+ }
+}
+
+func lineNumber() int {
+ _, _, line, _ := Caller(1)
+ return line // return 0 for error
+}
+
+// Below here is the test data for XTestInlineUnwinder
+
+var tiuStart = lineNumber() // +0
+var tiu1, tiu2, tiu3 int // +1
+func tiuInlined1() { // +2
+ tiu1++ // +3
+} // +4
+func tiuInlined2() { // +5
+ tiuInlined1() // +6
+ tiu2++ // +7
+} // +8
+func tiuTest() { // +9
+ tiuInlined1() // +10
+ tiuInlined2() // +11
+ tiu3++ // +12
+} // +13
diff --git a/src/runtime/sys_aix_ppc64.s b/src/runtime/sys_aix_ppc64.s
new file mode 100644
index 0000000..6608197
--- /dev/null
+++ b/src/runtime/sys_aix_ppc64.s
@@ -0,0 +1,318 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for ppc64, AIX
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+
+// This function calls a C function with the function descriptor in R12
+TEXT callCfunction<>(SB), NOSPLIT|NOFRAME,$0
+ MOVD 0(R12), R12
+ MOVD R2, 40(R1)
+ MOVD 0(R12), R0
+ MOVD 8(R12), R2
+ MOVD R0, CTR
+ BR (CTR)
+
+
+// asmsyscall6 calls a library function with a function descriptor
+// stored in libcall_fn and stores the results in the libcall structure.
+// Up to 6 arguments can be passed to this C function.
+// Called by runtime.asmcgocall.
+// It reserves a stack of 288 bytes for the C function. It must
+// follow the AIX convention, so the first local variable must
+// be stored at offset 112, after the linker area (48 bytes)
+// and the argument area (64 bytes).
+// The AIX convention is described here:
+// https://www.ibm.com/docs/en/aix/7.2?topic=overview-runtime-process-stack
+// NOT USING GO CALLING CONVENTION
+// runtime.asmsyscall6 is a function descriptor to the real asmsyscall6.
+DATA runtime·asmsyscall6+0(SB)/8, $asmsyscall6<>(SB)
+DATA runtime·asmsyscall6+8(SB)/8, $TOC(SB)
+DATA runtime·asmsyscall6+16(SB)/8, $0
+GLOBL runtime·asmsyscall6(SB), NOPTR, $24
+
+TEXT asmsyscall6<>(SB),NOSPLIT,$256
+ // Save libcall for later
+ MOVD R3, 112(R1)
+ MOVD libcall_fn(R3), R12
+ MOVD libcall_args(R3), R9
+ MOVD 0(R9), R3
+ MOVD 8(R9), R4
+ MOVD 16(R9), R5
+ MOVD 24(R9), R6
+ MOVD 32(R9), R7
+ MOVD 40(R9), R8
+ BL callCfunction<>(SB)
+
+ // Restore R0 and TOC
+ XOR R0, R0
+ MOVD 40(R1), R2
+
+ // Store result in libcall
+ MOVD 112(R1), R5
+ MOVD R3, (libcall_r1)(R5)
+ MOVD $-1, R6
+ CMP R6, R3
+ BNE skiperrno
+
+ // Save errno in libcall
+ BL runtime·load_g(SB)
+ MOVD g_m(g), R4
+ MOVD (m_mOS + mOS_perrno)(R4), R9
+ MOVW 0(R9), R9
+ MOVD R9, (libcall_err)(R5)
+ RET
+skiperrno:
+ // Reset errno if no error has been returned
+ MOVD R0, (libcall_err)(R5)
+ RET
+
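+// For reference, the libcall structure whose field offsets are used above is
+// declared in Go (a sketch of the runtime2.go definition; that file is the
+// source of truth):
+//
+//	type libcall struct {
+//		fn   uintptr
+//		n    uintptr // number of parameters
+//		args uintptr // parameters
+//		r1   uintptr // return values
+//		r2   uintptr
+//		err  uintptr // error number
+//	}
+//
+// Go code fills in fn, n and args, calls asmcgocall with the addresses of
+// asmsyscall6 and of the libcall, and this routine stores r1 and err back
+// into that same structure.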
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R3
+ MOVD info+16(FP), R4
+ MOVD ctx+24(FP), R5
+ MOVD fn+0(FP), R12
+ // fn is a function descriptor
+ // R2 must be saved on restore
+ MOVD 0(R12), R0
+ MOVD R2, 40(R1)
+ MOVD 8(R12), R2
+ MOVD R0, CTR
+ BL (CTR)
+ MOVD 40(R1), R2
+ BL runtime·reginit(SB)
+ RET
+
+
+// runtime.sigtramp is a function descriptor to the real sigtramp.
+DATA runtime·sigtramp+0(SB)/8, $sigtramp<>(SB)
+DATA runtime·sigtramp+8(SB)/8, $TOC(SB)
+DATA runtime·sigtramp+16(SB)/8, $0
+GLOBL runtime·sigtramp(SB), NOPTR, $24
+
+// This function must not have any frame, as we want to control how
+// every register is used.
+// TODO(aix): Implement SetCgoTraceback handler.
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME|TOPFRAME,$0
+ MOVD LR, R0
+ MOVD R0, 16(R1)
+ // initialize essential registers (just in case)
+ BL runtime·reginit(SB)
+
+ // Note that we are executing on altsigstack here, so we have
+ // more stack available than NOSPLIT would have us believe.
+ // To defeat the linker, we make our own stack frame with
+ // more space.
+ SUB $144+FIXED_FRAME, R1
+
+ // Save registers
+ MOVD R31, 56(R1)
+ MOVD g, 64(R1)
+ MOVD R29, 72(R1)
+ MOVD R14, 80(R1)
+ MOVD R15, 88(R1)
+
+ BL runtime·load_g(SB)
+
+ CMP $0, g
+ BEQ sigtramp // g == nil
+ MOVD g_m(g), R6
+ CMP $0, R6
+ BEQ sigtramp // g.m == nil
+
+ // Save m->libcall. We need to do this because we
+ // might get interrupted by a signal in runtime·asmcgocall.
+ MOVD (m_libcall+libcall_fn)(R6), R7
+ MOVD R7, 96(R1)
+ MOVD (m_libcall+libcall_args)(R6), R7
+ MOVD R7, 104(R1)
+ MOVD (m_libcall+libcall_n)(R6), R7
+ MOVD R7, 112(R1)
+ MOVD (m_libcall+libcall_r1)(R6), R7
+ MOVD R7, 120(R1)
+ MOVD (m_libcall+libcall_r2)(R6), R7
+ MOVD R7, 128(R1)
+
+	// Save errno; it might be EINTR, and stuff we do here might reset it.
+ MOVD (m_mOS+mOS_perrno)(R6), R8
+ MOVD 0(R8), R8
+ MOVD R8, 136(R1)
+
+sigtramp:
+ MOVW R3, FIXED_FRAME+0(R1)
+ MOVD R4, FIXED_FRAME+8(R1)
+ MOVD R5, FIXED_FRAME+16(R1)
+ MOVD $runtime·sigtrampgo(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+
+ CMP $0, g
+ BEQ exit // g == nil
+ MOVD g_m(g), R6
+ CMP $0, R6
+ BEQ exit // g.m == nil
+
+ // restore libcall
+ MOVD 96(R1), R7
+ MOVD R7, (m_libcall+libcall_fn)(R6)
+ MOVD 104(R1), R7
+ MOVD R7, (m_libcall+libcall_args)(R6)
+ MOVD 112(R1), R7
+ MOVD R7, (m_libcall+libcall_n)(R6)
+ MOVD 120(R1), R7
+ MOVD R7, (m_libcall+libcall_r1)(R6)
+ MOVD 128(R1), R7
+ MOVD R7, (m_libcall+libcall_r2)(R6)
+
+ // restore errno
+ MOVD (m_mOS+mOS_perrno)(R6), R7
+ MOVD 136(R1), R8
+ MOVD R8, 0(R7)
+
+exit:
+ // restore registers
+ MOVD 56(R1),R31
+ MOVD 64(R1),g
+ MOVD 72(R1),R29
+ MOVD 80(R1), R14
+ MOVD 88(R1), R15
+
+	// Don't use RET because we need to restore R31!
+ ADD $144+FIXED_FRAME, R1
+ MOVD 16(R1), R0
+ MOVD R0, LR
+ BR (LR)
+
+// runtime.tstart is a function descriptor to the real tstart.
+DATA runtime·tstart+0(SB)/8, $tstart<>(SB)
+DATA runtime·tstart+8(SB)/8, $TOC(SB)
+DATA runtime·tstart+16(SB)/8, $0
+GLOBL runtime·tstart(SB), NOPTR, $24
+
+TEXT tstart<>(SB),NOSPLIT,$0
+ XOR R0, R0 // reset R0
+
+ // set g
+ MOVD m_g0(R3), g
+ BL runtime·save_g(SB)
+ MOVD R3, g_m(g)
+
+ // Layout new m scheduler stack on os stack.
+ MOVD R1, R3
+ MOVD R3, (g_stack+stack_hi)(g)
+ SUB $(const_threadStackSize), R3 // stack size
+ MOVD R3, (g_stack+stack_lo)(g)
+ ADD $const_stackGuard, R3
+ MOVD R3, g_stackguard0(g)
+ MOVD R3, g_stackguard1(g)
+
+ BL runtime·mstart(SB)
+
+ MOVD R0, R3
+ RET
+
+
+#define CSYSCALL() \
+ MOVD 0(R12), R12 \
+ MOVD R2, 40(R1) \
+ MOVD 0(R12), R0 \
+ MOVD 8(R12), R2 \
+ MOVD R0, CTR \
+ BL (CTR) \
+ MOVD 40(R1), R2 \
+ BL runtime·reginit(SB)
+
+
+// Runs on OS stack, called from runtime·osyield.
+TEXT runtime·osyield1(SB),NOSPLIT,$0
+ MOVD $libc_sched_yield(SB), R12
+ CSYSCALL()
+ RET
+
+
+// Runs on OS stack, called from runtime·sigprocmask.
+TEXT runtime·sigprocmask1(SB),NOSPLIT,$0-24
+ MOVD how+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD $libpthread_sigthreadmask(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·usleep.
+TEXT runtime·usleep1(SB),NOSPLIT,$0-4
+ MOVW us+0(FP), R3
+ MOVD $libc_usleep(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·exit.
+TEXT runtime·exit1(SB),NOSPLIT,$0-4
+ MOVW code+0(FP), R3
+ MOVD $libc_exit(SB), R12
+ CSYSCALL()
+ RET
+
+// Runs on OS stack, called from runtime·write1.
+TEXT runtime·write2(SB),NOSPLIT,$0-28
+ MOVD fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ MOVD $libc_write(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+24(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_attr_init.
+TEXT runtime·pthread_attr_init1(SB),NOSPLIT,$0-12
+ MOVD attr+0(FP), R3
+ MOVD $libpthread_attr_init(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+8(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_attr_setstacksize.
+TEXT runtime·pthread_attr_setstacksize1(SB),NOSPLIT,$0-20
+ MOVD attr+0(FP), R3
+ MOVD size+8(FP), R4
+ MOVD $libpthread_attr_setstacksize(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+16(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_setdetachstate.
+TEXT runtime·pthread_attr_setdetachstate1(SB),NOSPLIT,$0-20
+ MOVD attr+0(FP), R3
+ MOVW state+8(FP), R4
+ MOVD $libpthread_attr_setdetachstate(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+16(FP)
+ RET
+
+// Runs on OS stack, called from runtime·pthread_create.
+TEXT runtime·pthread_create1(SB),NOSPLIT,$0-36
+ MOVD tid+0(FP), R3
+ MOVD attr+8(FP), R4
+ MOVD fn+16(FP), R5
+ MOVD arg+24(FP), R6
+ MOVD $libpthread_create(SB), R12
+ CSYSCALL()
+ MOVW R3, ret+32(FP)
+ RET
+
+// Runs on OS stack, called from runtime·sigaction.
+TEXT runtime·sigaction1(SB),NOSPLIT,$0-24
+ MOVD sig+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD $libc_sigaction(SB), R12
+ CSYSCALL()
+ RET
diff --git a/src/runtime/sys_arm.go b/src/runtime/sys_arm.go
new file mode 100644
index 0000000..730b9c9
--- /dev/null
+++ b/src/runtime/sys_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
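+
+// For example (a sketch of how the scheduler uses this): newproc1 first points
+// buf.pc at goexit and then calls gostartcall(&newg.sched, fn, ctxt), so the
+// new goroutine runs fn and, when fn returns, "returns" into goexit.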
+
+// for testing
+func usplit(x uint32) (q, r uint32)
diff --git a/src/runtime/sys_arm64.go b/src/runtime/sys_arm64.go
new file mode 100644
index 0000000..230241d
--- /dev/null
+++ b/src/runtime/sys_arm64.go
@@ -0,0 +1,18 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_darwin.go b/src/runtime/sys_darwin.go
new file mode 100644
index 0000000..fa9a2fb
--- /dev/null
+++ b/src/runtime/sys_darwin.go
@@ -0,0 +1,603 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// The X versions of syscall expect the libc call to return a 64-bit result.
+// The non-X versions expect a 32-bit result.
+// This distinction is required because an error is indicated by returning -1,
+// and we need to know whether to check 32 or 64 bits of the result.
+// (Some libc functions that return 32 bits put junk in the upper 32 bits of AX.)
+
+//go:linkname syscall_syscall syscall.syscall
+//go:nosplit
+func syscall_syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, r1, r2, err uintptr }{fn, a1, a2, a3, r1, r2, err}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1, args.r2, args.err
+}
+func syscall()
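+
+// For example (an illustrative sketch, not real generated code; names are
+// hypothetical): a wrapper in the syscall package reaches syscall_syscall
+// through the //go:linkname pair above, passing the PC of a libc trampoline
+// as fn:
+//
+//	r1, _, errno := syscall_syscall(openTrampolinePC, namePtr, uintptr(mode), uintptr(perm))
+//	if errno != 0 {
+//		// handle error
+//	}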
+
+//go:linkname syscall_syscallX syscall.syscallX
+//go:nosplit
+func syscall_syscallX(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, r1, r2, err uintptr }{fn, a1, a2, a3, r1, r2, err}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscallX)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1, args.r2, args.err
+}
+func syscallX()
+
+//go:linkname syscall_syscall6 syscall.syscall6
+//go:nosplit
+func syscall_syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, a4, a5, a6, r1, r2, err uintptr }{fn, a1, a2, a3, a4, a5, a6, r1, r2, err}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1, args.r2, args.err
+}
+func syscall6()
+
+//go:linkname syscall_syscall9 syscall.syscall9
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall9(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall9)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall9()
+
+//go:linkname syscall_syscall6X syscall.syscall6X
+//go:nosplit
+func syscall_syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, a4, a5, a6, r1, r2, err uintptr }{fn, a1, a2, a3, a4, a5, a6, r1, r2, err}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6X)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1, args.r2, args.err
+}
+func syscall6X()
+
+//go:linkname syscall_syscallPtr syscall.syscallPtr
+//go:nosplit
+func syscall_syscallPtr(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, r1, r2, err uintptr }{fn, a1, a2, a3, r1, r2, err}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscallPtr)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1, args.r2, args.err
+}
+func syscallPtr()
+
+//go:linkname syscall_rawSyscall syscall.rawSyscall
+//go:nosplit
+func syscall_rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, r1, r2, err uintptr }{fn, a1, a2, a3, r1, r2, err}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall)), unsafe.Pointer(&args))
+ return args.r1, args.r2, args.err
+}
+
+//go:linkname syscall_rawSyscall6 syscall.rawSyscall6
+//go:nosplit
+func syscall_rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ args := struct{ fn, a1, a2, a3, a4, a5, a6, r1, r2, err uintptr }{fn, a1, a2, a3, a4, a5, a6, r1, r2, err}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6)), unsafe.Pointer(&args))
+ return args.r1, args.r2, args.err
+}
+
+// crypto_x509_syscall is used in crypto/x509/internal/macos to call into Security.framework and CF.
+
+//go:linkname crypto_x509_syscall crypto/x509/internal/macos.syscall
+//go:nosplit
+func crypto_x509_syscall(fn, a1, a2, a3, a4, a5 uintptr, f1 float64) (r1 uintptr) {
+ args := struct {
+ fn, a1, a2, a3, a4, a5 uintptr
+ f1 float64
+ r1 uintptr
+ }{fn, a1, a2, a3, a4, a5, f1, r1}
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall_x509)), unsafe.Pointer(&args))
+ exitsyscall()
+ return args.r1
+}
+func syscall_x509()
+
+// The *_trampoline functions convert from the Go calling convention to the C calling convention
+// and then call the underlying libc function. They are defined in sys_darwin_$ARCH.s.
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_init(attr *pthreadattr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_init_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ return ret
+}
+func pthread_attr_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_getstacksize(attr *pthreadattr, size *uintptr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_getstacksize_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ KeepAlive(size)
+ return ret
+}
+func pthread_attr_getstacksize_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_setdetachstate(attr *pthreadattr, state int) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_setdetachstate_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ return ret
+}
+func pthread_attr_setdetachstate_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_create(attr *pthreadattr, start uintptr, arg unsafe.Pointer) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_create_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ KeepAlive(arg) // Just for consistency. Arg of course needs to be kept alive for the start function.
+ return ret
+}
+func pthread_create_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raise(sig uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(raise_trampoline)), unsafe.Pointer(&sig))
+}
+func raise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_self() (t pthread) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_self_trampoline)), unsafe.Pointer(&t))
+ return
+}
+func pthread_self_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_kill(t pthread, sig uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_kill_trampoline)), unsafe.Pointer(&t))
+ return
+}
+func pthread_kill_trampoline()
+
+// osinit_hack is a clumsy hack to work around Apple libc bugs
+// causing fork+exec to hang in the child process intermittently.
+// See go.dev/issue/33565 and go.dev/issue/56784 for a few reports.
+//
+// The stacks obtained from the hung child processes are in
+// libSystem_atfork_child, which is supposed to reinitialize various
+// parts of the C library in the new process.
+//
+// One common stack dies in _notify_fork_child calling _notify_globals
+// (inlined) calling _os_alloc_once, because _os_alloc_once detects that
+// the once lock is held by the parent process and then calls
+// _os_once_gate_corruption_abort. The allocation is setting up the
+// globals for the notification subsystem. See the source code at [1].
+// To work around this, we can allocate the globals earlier in the Go
+// program's lifetime, before any execs are involved, by calling any
+// notify routine that is exported, calls _notify_globals, and doesn't do
+// anything too expensive otherwise. notify_is_valid_token(0) fits the bill.
+//
+// The other common stack dies in xpc_atfork_child calling
+// _objc_msgSend_uncached which ends up in
+// WAITING_FOR_ANOTHER_THREAD_TO_FINISH_CALLING_+initialize. Of course,
+// whatever thread the child is waiting for is in the parent process and
+// is not going to finish anything in the child process. There is no
+// public source code for these routines, so it is unclear exactly what
+// the problem is. An Apple engineer suggests using xpc_date_create_from_current,
+// which empirically does fix the problem.
+//
+// So osinit_hack_trampoline (in sys_darwin_$GOARCH.s) calls
+// notify_is_valid_token(0) and xpc_date_create_from_current(), which makes the
+// fork+exec hangs stop happening. If Apple fixes the libc bug in
+// some future version of macOS, then we can remove this awful code.
+//
+//go:nosplit
+func osinit_hack() {
+ if GOOS == "darwin" { // not ios
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(osinit_hack_trampoline)), nil)
+ }
+ return
+}
+func osinit_hack_trampoline()
+
+// mmap is used to do low-level memory allocation via mmap. Don't allow stack
+// splits, since this function (used by sysAlloc) is called in a lot of low-level
+// parts of the runtime and callers often assume it won't acquire any locks.
+//
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ args := struct {
+ addr unsafe.Pointer
+ n uintptr
+ prot, flags, fd int32
+ off uint32
+ ret1 unsafe.Pointer
+ ret2 int
+ }{addr, n, prot, flags, fd, off, nil, 0}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(mmap_trampoline)), unsafe.Pointer(&args))
+ return args.ret1, args.ret2
+}
+func mmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func munmap(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(munmap_trampoline)), unsafe.Pointer(&addr))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+}
+func munmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(madvise_trampoline)), unsafe.Pointer(&addr))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+}
+func madvise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func mlock(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(mlock_trampoline)), unsafe.Pointer(&addr))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+}
+func mlock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(read_trampoline)), unsafe.Pointer(&fd))
+ KeepAlive(p)
+ return ret
+}
+func read_trampoline()
+
+func pipe() (r, w int32, errno int32) {
+ var p [2]int32
+ errno = libcCall(unsafe.Pointer(abi.FuncPCABI0(pipe_trampoline)), noescape(unsafe.Pointer(&p)))
+ return p[0], p[1], errno
+}
+func pipe_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func closefd(fd int32) int32 {
+ return libcCall(unsafe.Pointer(abi.FuncPCABI0(close_trampoline)), unsafe.Pointer(&fd))
+}
+func close_trampoline()
+
+// This is exported via linkname to assembly in runtime/cgo.
+//
+//go:nosplit
+//go:cgo_unsafe_args
+//go:linkname exit
+func exit(code int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(exit_trampoline)), unsafe.Pointer(&code))
+}
+func exit_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep(usec uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+func usleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep_no_g(usec uint32) {
+ asmcgocall_no_g(unsafe.Pointer(abi.FuncPCABI0(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(write_trampoline)), unsafe.Pointer(&fd))
+ KeepAlive(p)
+ return ret
+}
+func write_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func open(name *byte, mode, perm int32) (ret int32) {
+ ret = libcCall(unsafe.Pointer(abi.FuncPCABI0(open_trampoline)), unsafe.Pointer(&name))
+ KeepAlive(name)
+ return
+}
+func open_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func nanotime1() int64 {
+ var r struct {
+ t int64 // raw timer
+ numer, denom uint32 // conversion factors. nanoseconds = t * numer / denom.
+ }
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(nanotime_trampoline)), unsafe.Pointer(&r))
+ // Note: Apple seems unconcerned about overflow here. See
+ // https://developer.apple.com/library/content/qa/qa1398/_index.html
+ // Note also, numer == denom == 1 is common.
+ t := r.t
+ if r.numer != 1 {
+ t *= int64(r.numer)
+ }
+ if r.denom != 1 {
+ t /= int64(r.denom)
+ }
+ return t
+}
+func nanotime_trampoline()
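+
+// For example (typical values, not guaranteed by Apple): Intel Macs report
+// numer == denom == 1, so t is already in nanoseconds, while Apple Silicon
+// machines with a 24 MHz counter commonly report numer/denom == 125/3, so
+// 24 ticks convert to 24*125/3 = 1000ns.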
+
+//go:nosplit
+//go:cgo_unsafe_args
+func walltime() (int64, int32) {
+ var t timespec
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(walltime_trampoline)), unsafe.Pointer(&t))
+ return t.tv_sec, int32(t.tv_nsec)
+}
+func walltime_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaction(sig uint32, new *usigactiont, old *usigactiont) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sigaction_trampoline)), unsafe.Pointer(&sig))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigaction_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigprocmask(how uint32, new *sigset, old *sigset) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sigprocmask_trampoline)), unsafe.Pointer(&how))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigprocmask_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaltstack(new *stackt, old *stackt) {
+ if new != nil && new.ss_flags&_SS_DISABLE != 0 && new.ss_size == 0 {
+ // Despite the fact that Darwin's sigaltstack man page says it ignores the size
+ // when SS_DISABLE is set, it doesn't. sigaltstack returns ENOMEM
+ // if we don't give it a reasonable size.
+ // ref: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140421/214296.html
+ new.ss_size = 32768
+ }
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sigaltstack_trampoline)), unsafe.Pointer(&new))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigaltstack_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raiseproc(sig uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(raiseproc_trampoline)), unsafe.Pointer(&sig))
+}
+func raiseproc_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func setitimer(mode int32, new, old *itimerval) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(setitimer_trampoline)), unsafe.Pointer(&mode))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func setitimer_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctl(mib *uint32, miblen uint32, oldp *byte, oldlenp *uintptr, newp *byte, newlen uintptr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(sysctl_trampoline)), unsafe.Pointer(&mib))
+ KeepAlive(mib)
+ KeepAlive(oldp)
+ KeepAlive(oldlenp)
+ KeepAlive(newp)
+ return ret
+}
+func sysctl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctlbyname(name *byte, oldp *byte, oldlenp *uintptr, newp *byte, newlen uintptr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(sysctlbyname_trampoline)), unsafe.Pointer(&name))
+ KeepAlive(name)
+ KeepAlive(oldp)
+ KeepAlive(oldlenp)
+ KeepAlive(newp)
+ return ret
+}
+func sysctlbyname_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32) {
+ args := struct {
+ fd, cmd, arg int32
+ ret, errno int32
+ }{fd, cmd, arg, 0, 0}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(fcntl_trampoline)), unsafe.Pointer(&args))
+ return args.ret, args.errno
+}
+func fcntl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kqueue() int32 {
+ v := libcCall(unsafe.Pointer(abi.FuncPCABI0(kqueue_trampoline)), nil)
+ return v
+}
+func kqueue_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(kevent_trampoline)), unsafe.Pointer(&kq))
+ KeepAlive(ch)
+ KeepAlive(ev)
+ KeepAlive(ts)
+ return ret
+}
+func kevent_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_init(m *pthreadmutex, attr *pthreadmutexattr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_mutex_init_trampoline)), unsafe.Pointer(&m))
+ KeepAlive(m)
+ KeepAlive(attr)
+ return ret
+}
+func pthread_mutex_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_lock(m *pthreadmutex) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_mutex_lock_trampoline)), unsafe.Pointer(&m))
+ KeepAlive(m)
+ return ret
+}
+func pthread_mutex_lock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_mutex_unlock(m *pthreadmutex) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_mutex_unlock_trampoline)), unsafe.Pointer(&m))
+ KeepAlive(m)
+ return ret
+}
+func pthread_mutex_unlock_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_init(c *pthreadcond, attr *pthreadcondattr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_cond_init_trampoline)), unsafe.Pointer(&c))
+ KeepAlive(c)
+ KeepAlive(attr)
+ return ret
+}
+func pthread_cond_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_wait(c *pthreadcond, m *pthreadmutex) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_cond_wait_trampoline)), unsafe.Pointer(&c))
+ KeepAlive(c)
+ KeepAlive(m)
+ return ret
+}
+func pthread_cond_wait_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_timedwait_relative_np(c *pthreadcond, m *pthreadmutex, t *timespec) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_cond_timedwait_relative_np_trampoline)), unsafe.Pointer(&c))
+ KeepAlive(c)
+ KeepAlive(m)
+ KeepAlive(t)
+ return ret
+}
+func pthread_cond_timedwait_relative_np_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_cond_signal(c *pthreadcond) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_cond_signal_trampoline)), unsafe.Pointer(&c))
+ KeepAlive(c)
+ return ret
+}
+func pthread_cond_signal_trampoline()
+
+// Not used on Darwin, but must be defined.
+func exitThread(wait *atomic.Uint32) {
+ throw("exitThread")
+}
+
+//go:nosplit
+func setNonblock(fd int32) {
+ flags, _ := fcntl(fd, _F_GETFL, 0)
+ if flags != -1 {
+ fcntl(fd, _F_SETFL, flags|_O_NONBLOCK)
+ }
+}
+
+func issetugid() int32 {
+ return libcCall(unsafe.Pointer(abi.FuncPCABI0(issetugid_trampoline)), nil)
+}
+func issetugid_trampoline()
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_attr_getstacksize pthread_attr_getstacksize "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_self pthread_self "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_kill pthread_kill "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_exit _exit "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_raise raise "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_open open "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_close close "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_read read "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_write write "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pipe pipe "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_mmap mmap "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_munmap munmap "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_madvise madvise "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_mlock mlock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_error __error "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_usleep usleep "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_mach_timebase_info mach_timebase_info "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_mach_absolute_time mach_absolute_time "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sigaction sigaction "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_sigmask pthread_sigmask "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_getpid getpid "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kill kill "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_setitimer setitimer "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sysctl sysctl "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_sysctlbyname sysctlbyname "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_fcntl fcntl "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kqueue kqueue "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_kevent kevent "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_pthread_mutex_init pthread_mutex_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_mutex_lock pthread_mutex_lock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_mutex_unlock pthread_mutex_unlock "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_init pthread_cond_init "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_wait pthread_cond_wait "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_timedwait_relative_np pthread_cond_timedwait_relative_np "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_cond_signal pthread_cond_signal "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_notify_is_valid_token notify_is_valid_token "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_xpc_date_create_from_current xpc_date_create_from_current "/usr/lib/libSystem.B.dylib"
+
+//go:cgo_import_dynamic libc_issetugid issetugid "/usr/lib/libSystem.B.dylib"
diff --git a/src/runtime/sys_darwin_amd64.s b/src/runtime/sys_darwin_amd64.s
new file mode 100644
index 0000000..8e8ad9c
--- /dev/null
+++ b/src/runtime/sys_darwin_amd64.s
@@ -0,0 +1,798 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other sys.stuff for AMD64, Darwin
+// System calls are implemented in libSystem; this file contains
+// trampolines that convert from the Go to the C calling convention.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+#define CLOCK_REALTIME 0
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 exit status
+ CALL libc_exit(SB)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 flags
+ MOVL 12(DI), DX // arg 3 mode
+ MOVQ 0(DI), DI // arg 1 pathname
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_open(SB)
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_close(SB)
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_read(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_error(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVQ 0(DI), DI // arg 1 fd
+ CALL libc_write(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_error(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe_trampoline(SB),NOSPLIT,$0
+ CALL libc_pipe(SB) // pointer already in DI
+ TESTL AX, AX
+ JEQ 3(PC)
+ CALL libc_error(SB) // return negative errno value
+ NEGL AX
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 which
+ CALL libc_setitimer(SB)
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ MOVQ 8(DI), SI // arg 2 len
+ MOVL 16(DI), DX // arg 3 advice
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ RET
+
+TEXT runtime·mlock_trampoline(SB), NOSPLIT, $0
+ UNDEF // unimplemented
+
+GLOBL timebase<>(SB),NOPTR,$(machTimebaseInfo__size)
+
+TEXT runtime·nanotime_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX
+ CALL libc_mach_absolute_time(SB)
+ MOVQ AX, 0(BX)
+ MOVL timebase<>+machTimebaseInfo_numer(SB), SI
+ MOVL timebase<>+machTimebaseInfo_denom(SB), DI // atomic read
+ TESTL DI, DI
+ JNE initialized
+
+ SUBQ $(machTimebaseInfo__size+15)/16*16, SP
+ MOVQ SP, DI
+ CALL libc_mach_timebase_info(SB)
+ MOVL machTimebaseInfo_numer(SP), SI
+ MOVL machTimebaseInfo_denom(SP), DI
+ ADDQ $(machTimebaseInfo__size+15)/16*16, SP
+
+ MOVL SI, timebase<>+machTimebaseInfo_numer(SB)
+ MOVL DI, AX
+ XCHGL AX, timebase<>+machTimebaseInfo_denom(SB) // atomic write
+
+initialized:
+ MOVL SI, 8(BX)
+ MOVL DI, 12(BX)
+ RET
+
+TEXT runtime·walltime_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, SI // arg 2 timespec
+ MOVL $CLOCK_REALTIME, DI // arg 1 clock_id
+ CALL libc_clock_gettime(SB)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 sig
+ CALL libc_sigaction(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 how
+ CALL libc_pthread_sigmask(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 old
+ MOVQ 0(DI), DI // arg 1 new
+ CALL libc_sigaltstack(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), BX // signal
+ CALL libc_getpid(SB)
+ MOVL AX, DI // arg 1 pid
+ MOVL BX, SI // arg 2 signal
+ CALL libc_kill(SB)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// This is the function registered during sigaction and is invoked when
+// a signal is received. It just redirects to the Go function sigtrampgo.
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Call into the Go signal handler
+ NOP SP // disable vet stack checking
+ ADJSP $24
+ MOVL DI, 0(SP) // sig
+ MOVQ SI, 8(SP) // info
+ MOVQ DX, 16(SP) // ctx
+ CALL ·sigprofNonGo(SB)
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, via sigprofNonGoWrapper, to convert
+ // the arguments to the Go calling convention.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGoWrapper<>(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX
+ MOVQ 0(BX), DI // arg 1 addr
+ MOVQ 8(BX), SI // arg 2 len
+ MOVL 16(BX), DX // arg 3 prot
+ MOVL 20(BX), CX // arg 4 flags
+ MOVL 24(BX), R8 // arg 5 fid
+ MOVL 28(BX), R9 // arg 6 offset
+ CALL libc_mmap(SB)
+ XORL DX, DX
+ CMPQ AX, $-1
+ JNE ok
+ CALL libc_error(SB)
+ MOVLQSX (AX), DX // errno
+ XORL AX, AX
+ok:
+ MOVQ AX, 32(BX)
+ MOVQ DX, 40(BX)
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 len
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_munmap(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 usec
+ CALL libc_usleep(SB)
+ RET
+
+TEXT runtime·settls(SB),NOSPLIT,$32
+ // Nothing to do on Darwin, pthread already set thread-local storage up.
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 miblen
+ MOVQ 16(DI), DX // arg 3 oldp
+ MOVQ 24(DI), CX // arg 4 oldlenp
+ MOVQ 32(DI), R8 // arg 5 newp
+ MOVQ 40(DI), R9 // arg 6 newlen
+ MOVQ 0(DI), DI // arg 1 mib
+ CALL libc_sysctl(SB)
+ RET
+
+TEXT runtime·sysctlbyname_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 oldp
+ MOVQ 16(DI), DX // arg 3 oldlenp
+ MOVQ 24(DI), CX // arg 4 newp
+ MOVQ 32(DI), R8 // arg 5 newlen
+ MOVQ 0(DI), DI // arg 1 name
+ CALL libc_sysctlbyname(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ CALL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 keventt
+ MOVL 16(DI), DX // arg 3 nch
+ MOVQ 24(DI), CX // arg 4 ev
+ MOVL 32(DI), R8 // arg 5 nev
+ MOVQ 40(DI), R9 // arg 6 ts
+ MOVL 0(DI), DI // arg 1 kq
+ CALL libc_kevent(SB)
+ CMPL AX, $-1
+ JNE ok
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX // errno
+ NEGQ AX // caller wants it as a negative error code
+ok:
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX
+ MOVL 0(BX), DI // arg 1 fd
+ MOVL 4(BX), SI // arg 2 cmd
+ MOVL 8(BX), DX // arg 3 arg
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_fcntl(SB)
+ XORL DX, DX
+ CMPQ AX, $-1
+ JNE noerr
+ CALL libc_error(SB)
+ MOVL (AX), DX
+ MOVL $-1, AX
+noerr:
+ MOVL AX, 12(BX)
+ MOVL DX, 16(BX)
+ RET
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT|NOFRAME,$0
+ // DI points to the m.
+ // We are already on m's g0 stack.
+
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ MOVQ m_g0(DI), DX // g
+
+ // Initialize TLS entry.
+ // See cmd/link/internal/ld/sym.go:computeTLSOffset.
+ MOVQ DX, 0x30(GS)
+
+ CALL runtime·mstart(SB)
+
+ POP_REGS_HOST_TO_ABI0()
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ XORL AX, AX
+ RET
+
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in DI.
+// A single int32 result is returned in AX.
+// (For more results, make an args/results structure.)
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 size
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 state
+ MOVQ 0(DI), DI // arg 1 attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$16
+ MOVQ 0(DI), SI // arg 2 attr
+ MOVQ 8(DI), DX // arg 3 start
+ MOVQ 16(DI), CX // arg 4 arg
+ MOVQ SP, DI // arg 1 &threadid (which we throw away)
+ CALL libc_pthread_create(SB)
+ RET
+
+TEXT runtime·raise_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 signal
+ CALL libc_raise(SB)
+ RET
+
+TEXT runtime·pthread_mutex_init_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 attr
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_init(SB)
+ RET
+
+TEXT runtime·pthread_mutex_lock_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_lock(SB)
+ RET
+
+TEXT runtime·pthread_mutex_unlock_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 mutex
+ CALL libc_pthread_mutex_unlock(SB)
+ RET
+
+TEXT runtime·pthread_cond_init_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 attr
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_init(SB)
+ RET
+
+TEXT runtime·pthread_cond_wait_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 mutex
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_wait(SB)
+ RET
+
+TEXT runtime·pthread_cond_timedwait_relative_np_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 mutex
+ MOVQ 16(DI), DX // arg 3 timeout
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_timedwait_relative_np(SB)
+ RET
+
+TEXT runtime·pthread_cond_signal_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 cond
+ CALL libc_pthread_cond_signal(SB)
+ RET
+
+TEXT runtime·pthread_self_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX // BX is caller-save
+ CALL libc_pthread_self(SB)
+ MOVQ AX, 0(BX) // return value
+ RET
+
+TEXT runtime·pthread_kill_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 sig
+ MOVQ 0(DI), DI // arg 1 thread
+ CALL libc_pthread_kill(SB)
+ RET
+
+TEXT runtime·osinit_hack_trampoline(SB),NOSPLIT,$0
+ MOVQ $0, DI // arg 1 val
+ CALL libc_notify_is_valid_token(SB)
+ CALL libc_xpc_date_create_from_current(SB)
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPL AX, $-1 // Note: high 32 bits are junk
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPQ AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscallPtr is like syscallX except that the libc function reports an
+// error by returning NULL and setting errno.
+TEXT runtime·syscallPtr(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // syscallPtr libc functions return NULL on error
+ // and set errno.
+ TESTQ AX, AX
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall9 calls a function in libc on behalf of the syscall package.
+// syscall9 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall9 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall9 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall9(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R13// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ (7*8)(DI), R10 // a7
+ MOVQ (8*8)(DI), R11 // a8
+ MOVQ (9*8)(DI), R12 // a9
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R13
+
+ MOVQ (SP), DI
+ MOVQ AX, (10*8)(DI) // r1
+ MOVQ DX, (11*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_error(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (12*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall_x509 is for crypto/x509. It is like syscall6 but does not check for errors,
+// takes 5 uintptrs and 1 float64, and only returns one value,
+// for use with standard C ABI functions.
+TEXT runtime·syscall_x509(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R11// fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), X0 // f1
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ CALL libc_issetugid(SB)
+ RET
diff --git a/src/runtime/sys_darwin_arm64.go b/src/runtime/sys_darwin_arm64.go
new file mode 100644
index 0000000..6170f4f
--- /dev/null
+++ b/src/runtime/sys_darwin_arm64.go
@@ -0,0 +1,65 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+// libc function wrappers. Must run on system stack.
+
+//go:nosplit
+//go:cgo_unsafe_args
+func g0_pthread_key_create(k *pthreadkey, destructor uintptr) int32 {
+ ret := asmcgocall(unsafe.Pointer(abi.FuncPCABI0(pthread_key_create_trampoline)), unsafe.Pointer(&k))
+ KeepAlive(k)
+ return ret
+}
+func pthread_key_create_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func g0_pthread_setspecific(k pthreadkey, value uintptr) int32 {
+ return asmcgocall(unsafe.Pointer(abi.FuncPCABI0(pthread_setspecific_trampoline)), unsafe.Pointer(&k))
+}
+func pthread_setspecific_trampoline()
+
+//go:cgo_import_dynamic libc_pthread_key_create pthread_key_create "/usr/lib/libSystem.B.dylib"
+//go:cgo_import_dynamic libc_pthread_setspecific pthread_setspecific "/usr/lib/libSystem.B.dylib"
+
+// tlsinit allocates a thread-local storage slot for g.
+//
+// It finds the first available slot using pthread_key_create and uses
+// it as the offset value for runtime.tlsg.
+//
+// This runs at startup on g0 stack, but before g is set, so it must
+// not split stack (transitively). g is expected to be nil, so things
+// (e.g. asmcgocall) will skip saving or reading g.
+//
+//go:nosplit
+func tlsinit(tlsg *uintptr, tlsbase *[_PTHREAD_KEYS_MAX]uintptr) {
+ var k pthreadkey
+ err := g0_pthread_key_create(&k, 0)
+ if err != 0 {
+ abort()
+ }
+
+ const magic = 0xc476c475c47957
+ err = g0_pthread_setspecific(k, magic)
+ if err != 0 {
+ abort()
+ }
+
+ for i, x := range tlsbase {
+ if x == magic {
+ *tlsg = uintptr(i * goarch.PtrSize)
+ g0_pthread_setspecific(k, 0)
+ return
+ }
+ }
+ abort()
+}
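+
+// Worked example (illustrative): if pthread_key_create hands out key 5, then
+// after g0_pthread_setspecific(k, magic) the scan finds tlsbase[5] == magic,
+// so *tlsg is set to 5*goarch.PtrSize = 40, the byte offset of g's slot from
+// the pthread TLS base.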
diff --git a/src/runtime/sys_darwin_arm64.s b/src/runtime/sys_darwin_arm64.s
new file mode 100644
index 0000000..dc6caf8
--- /dev/null
+++ b/src/runtime/sys_darwin_arm64.s
@@ -0,0 +1,769 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other sys.stuff for ARM64, Darwin
+// System calls are implemented in libSystem; this file contains
+// trampolines that convert from the Go to the C calling convention.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+#define CLOCK_REALTIME 0
+
+TEXT notok<>(SB),NOSPLIT,$0
+ MOVD $0, R8
+ MOVD R8, (R8)
+ B 0(PC)
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVW 8(R0), R1 // arg 2 flags
+ MOVW 12(R0), R2 // arg 3 mode
+ MOVW R2, (RSP) // arg 3 is variadic, pass on stack
+ MOVD 0(R0), R0 // arg 1 pathname
+ BL libc_open(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_close(SB)
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 buf
+ MOVW 16(R0), R2 // arg 3 count
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_write(SB)
+ MOVD $-1, R1
+ CMP R0, R1
+ BNE noerr
+ BL libc_error(SB)
+ MOVW (R0), R0
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 buf
+ MOVW 16(R0), R2 // arg 3 count
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_read(SB)
+ MOVD $-1, R1
+ CMP R0, R1
+ BNE noerr
+ BL libc_error(SB)
+ MOVW (R0), R0
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe_trampoline(SB),NOSPLIT,$0
+ BL libc_pipe(SB) // pointer already in R0
+ CMP $0, R0
+ BEQ 3(PC)
+ BL libc_error(SB) // return negative errno value
+ NEG R0, R0
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT|NOFRAME,$0
+ MOVW 0(R0), R0
+ BL libc_exit(SB)
+ MOVD $1234, R0
+ MOVD $1002, R1
+ MOVD R0, (R1) // fail hard
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R19 // signal
+ BL libc_getpid(SB)
+ // arg 1 pid already in R0 from getpid
+ MOVD R19, R1 // arg 2 signal
+ BL libc_kill(SB)
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19
+ MOVD 0(R19), R0 // arg 1 addr
+ MOVD 8(R19), R1 // arg 2 len
+ MOVW 16(R19), R2 // arg 3 prot
+ MOVW 20(R19), R3 // arg 4 flags
+ MOVW 24(R19), R4 // arg 5 fd
+ MOVW 28(R19), R5 // arg 6 off
+ BL libc_mmap(SB)
+ MOVD $0, R1
+ MOVD $-1, R2
+ CMP R0, R2
+ BNE ok
+ BL libc_error(SB)
+ MOVW (R0), R1
+ MOVD $0, R0
+ok:
+ MOVD R0, 32(R19) // ret 1 p
+ MOVD R1, 40(R19) // ret 2 err
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_munmap(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·madvise_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVW 16(R0), R2 // arg 3 advice
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_madvise(SB)
+ RET
+
+TEXT runtime·mlock_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 len
+ MOVD 0(R0), R0 // arg 1 addr
+ BL libc_mlock(SB)
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 which
+ BL libc_setitimer(SB)
+ RET
+
+TEXT runtime·walltime_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R1 // arg 2 timespec
+ MOVW $CLOCK_REALTIME, R0 // arg 1 clock_id
+ BL libc_clock_gettime(SB)
+ RET
+
+GLOBL timebase<>(SB),NOPTR,$(machTimebaseInfo__size)
+
+TEXT runtime·nanotime_trampoline(SB),NOSPLIT,$40
+ MOVD R0, R19
+ BL libc_mach_absolute_time(SB)
+ MOVD R0, 0(R19)
+ MOVW timebase<>+machTimebaseInfo_numer(SB), R20
+ MOVD $timebase<>+machTimebaseInfo_denom(SB), R21
+ LDARW (R21), R21 // atomic read
+ CMP $0, R21
+ BNE initialized
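+ // Slow path: the timebase info is not cached yet. Query it once via
+ // mach_timebase_info and publish it, storing denom last with a
+ // store-release (STLRW) so a nonzero denom read above implies numer
+ // is already valid. The caller scales the tick count by numer/denom
+ // to convert to nanoseconds.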
+
+ SUB $(machTimebaseInfo__size+15)/16*16, RSP
+ MOVD RSP, R0
+ BL libc_mach_timebase_info(SB)
+ MOVW machTimebaseInfo_numer(RSP), R20
+ MOVW machTimebaseInfo_denom(RSP), R21
+ ADD $(machTimebaseInfo__size+15)/16*16, RSP
+
+ MOVW R20, timebase<>+machTimebaseInfo_numer(SB)
+ MOVD $timebase<>+machTimebaseInfo_denom(SB), R22
+ STLRW R21, (R22) // atomic write
+
+initialized:
+ MOVW R20, 8(R19)
+ MOVW R21, 12(R19)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$176
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+ // Save arguments.
+ MOVW R0, (8*1)(RSP) // sig
+ MOVD R1, (8*2)(RSP) // info
+ MOVD R2, (8*3)(RSP) // ctx
+
+ // this might be called in external code context,
+ // where g is not set.
+ BL runtime·load_g(SB)
+
+#ifdef GOOS_ios
+ MOVD RSP, R6
+ CMP $0, g
+ BEQ nog
+ // iOS always uses the main stack to run the signal handler.
+ // We need to switch to gsignal ourselves.
+ MOVD g_m(g), R11
+ MOVD m_gsignal(R11), R5
+ MOVD (g_stack+stack_hi)(R5), R6
+
+nog:
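+ // R6 is now the stack to run the handler on: the top of m's gsignal
+ // stack, or the incoming RSP when g is nil.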
+ // Restore arguments.
+ MOVW (8*1)(RSP), R0
+ MOVD (8*2)(RSP), R1
+ MOVD (8*3)(RSP), R2
+
+ // Reserve space for args and the stack pointer on the
+ // gsignal stack.
+ SUB $48, R6
+ // Save stack pointer.
+ MOVD RSP, R4
+ MOVD R4, (8*4)(R6)
+ // Switch to gsignal stack.
+ MOVD R6, RSP
+
+ // Save arguments.
+ MOVW R0, (8*1)(RSP)
+ MOVD R1, (8*2)(RSP)
+ MOVD R2, (8*3)(RSP)
+#endif
+
+ // Call sigtrampgo.
+ MOVD $runtime·sigtrampgo(SB), R11
+ BL (R11)
+
+#ifdef GOOS_ios
+ // Switch to old stack.
+ MOVD (8*4)(RSP), R5
+ MOVD R5, RSP
+#endif
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 how
+ BL libc_pthread_sigmask(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 new
+ MOVD 16(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 how
+ BL libc_sigaction(SB)
+ CMP $0, R0
+ BEQ 2(PC)
+ BL notok<>(SB)
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 usec
+ BL libc_usleep(SB)
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 miblen
+ MOVD 16(R0), R2 // arg 3 oldp
+ MOVD 24(R0), R3 // arg 4 oldlenp
+ MOVD 32(R0), R4 // arg 5 newp
+ MOVD 40(R0), R5 // arg 6 newlen
+ MOVD 0(R0), R0 // arg 1 mib
+ BL libc_sysctl(SB)
+ RET
+
+TEXT runtime·sysctlbyname_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 oldp
+ MOVD 16(R0), R2 // arg 3 oldlenp
+ MOVD 24(R0), R3 // arg 4 newp
+ MOVD 32(R0), R4 // arg 5 newlen
+ MOVD 0(R0), R0 // arg 1 name
+ BL libc_sysctlbyname(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ BL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 keventt
+ MOVW 16(R0), R2 // arg 3 nch
+ MOVD 24(R0), R3 // arg 4 ev
+ MOVW 32(R0), R4 // arg 5 nev
+ MOVD 40(R0), R5 // arg 6 ts
+ MOVW 0(R0), R0 // arg 1 kq
+ BL libc_kevent(SB)
+ MOVD $-1, R2
+ CMP R0, R2
+ BNE ok
+ BL libc_error(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller wants it as a negative error code
+ok:
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVD R0, R19
+ MOVW 0(R19), R0 // arg 1 fd
+ MOVW 4(R19), R1 // arg 2 cmd
+ MOVW 8(R19), R2 // arg 3 arg
+ MOVW R2, (RSP) // arg 3 is variadic, pass on stack
+ BL libc_fcntl(SB)
+ MOVD $0, R1
+ MOVD $-1, R2
+ CMP R0, R2
+ BNE noerr
+ BL libc_error(SB)
+ MOVW (R0), R1
+ MOVW $-1, R0
+noerr:
+ MOVW R0, 12(R19)
+ MOVW R1, 16(R19)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+#ifdef GOOS_ios
+ // sigaltstack on iOS is not supported and will always
+ // run the signal handler on the main stack, so our sigtramp has
+ // to do the stack switch itself.
+ MOVW $43, R0
+ BL libc_exit(SB)
+#else
+ MOVD 8(R0), R1 // arg 2 old
+ MOVD 0(R0), R0 // arg 1 new
+ CALL libc_sigaltstack(SB)
+ CBZ R0, 2(PC)
+ BL notok<>(SB)
+#endif
+ RET
+
+// Thread related functions
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$160
+ // R0 points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ SAVE_R19_TO_R28(8)
+ SAVE_F8_TO_F15(88)
+
+ MOVD m_g0(R0), g
+ BL ·save_g(SB)
+
+ BL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8)
+ RESTORE_F8_TO_F15(88)
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVD $0, R0
+
+ RET
+
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 size
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 state
+ MOVD 0(R0), R0 // arg 1 attr
+ BL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ SUB $16, RSP
+ MOVD 0(R0), R1 // arg 2 state
+ MOVD 8(R0), R2 // arg 3 start
+ MOVD 16(R0), R3 // arg 4 arg
+ MOVD RSP, R0 // arg 1 &threadid (which we throw away)
+ BL libc_pthread_create(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·raise_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 sig
+ BL libc_raise(SB)
+ RET
+
+TEXT runtime·pthread_mutex_init_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 attr
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_init(SB)
+ RET
+
+TEXT runtime·pthread_mutex_lock_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_lock(SB)
+ RET
+
+TEXT runtime·pthread_mutex_unlock_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 mutex
+ BL libc_pthread_mutex_unlock(SB)
+ RET
+
+TEXT runtime·pthread_cond_init_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 attr
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_init(SB)
+ RET
+
+TEXT runtime·pthread_cond_wait_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 mutex
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_wait(SB)
+ RET
+
+TEXT runtime·pthread_cond_timedwait_relative_np_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 mutex
+ MOVD 16(R0), R2 // arg 3 timeout
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_timedwait_relative_np(SB)
+ RET
+
+TEXT runtime·pthread_cond_signal_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 cond
+ BL libc_pthread_cond_signal(SB)
+ RET
+
+TEXT runtime·pthread_self_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // R19 is callee-save
+ BL libc_pthread_self(SB)
+ MOVD R0, 0(R19) // return value
+ RET
+
+TEXT runtime·pthread_kill_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 sig
+ MOVD 0(R0), R0 // arg 1 thread
+ BL libc_pthread_kill(SB)
+ RET
+
+TEXT runtime·pthread_key_create_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 destructor
+ MOVD 0(R0), R0 // arg 1 *key
+ BL libc_pthread_key_create(SB)
+ RET
+
+TEXT runtime·pthread_setspecific_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 value
+ MOVD 0(R0), R0 // arg 1 key
+ BL libc_pthread_setspecific(SB)
+ RET
+
+TEXT runtime·osinit_hack_trampoline(SB),NOSPLIT,$0
+ MOVD $0, R0 // arg 1 val
+ BL libc_notify_is_valid_token(SB)
+ BL libc_xpc_date_create_from_current(SB)
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
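+//
+// With 8-byte fields, r1, r2, and err live at offsets 32, 40, and 48
+// from the start of the struct, which is where the results are stored
+// below.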
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, 8(RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+
+ // If fn is declared as vararg, we have to pass the vararg arguments on the stack.
+ // (Because iOS decided not to adhere to the standard arm64 calling convention, sigh...)
+ // The only libSystem calls we support that are vararg are open, fcntl, and ioctl,
+ // which are all of the form fn(x, y, ...). So we just need to put the 3rd arg
+ // on the stack as well.
+ // If we ever have other vararg libSystem calls, we might need to handle more cases.
+ MOVD R2, (RSP)
+
+ BL (R12)
+
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMPW $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, 8(RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMP $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscallPtr is like syscallX except that the libc function reports an
+// error by returning NULL and setting errno.
+TEXT runtime·syscallPtr(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 32(R2) // save r1
+ MOVD R1, 40(R2) // save r2
+ CMP $0, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 48(R2) // save err
+ok:
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
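+//
+// With 8-byte fields, r1, r2, and err live at offsets 56, 64, and 72
+// from the start of the struct, matching the stores below.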
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, 8(RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 8(R0), R0 // a1
+
+ // If fn is declared as vararg, we have to pass the vararg arguments on the stack.
+ // See syscall above. The only function this applies to is openat, for which the 4th
+ // arg must be on the stack.
+ MOVD R3, (RSP)
+
+ BL (R12)
+
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ MOVD R1, 64(R2) // save r2
+ CMPW $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, 8(RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 72(R2) // save err
+ok:
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ MOVD R1, 64(R2) // save r2
+ CMP $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, (RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 72(R2) // save err
+ok:
+ RET
+
+// syscall9 calls a function in libc on behalf of the syscall package.
+// syscall9 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall9 must be called on the g0 stack with the
+// C calling convention (use libcCall).
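+//
+// With 8-byte fields, r1, r2, and err live at offsets 80, 88, and 96
+// from the start of the struct, matching the stores below.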
+TEXT runtime·syscall9(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, 8(RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ MOVD 48(R0), R5 // a6
+ MOVD 56(R0), R6 // a7
+ MOVD 64(R0), R7 // a8
+ MOVD 72(R0), R8 // a9
+ MOVD 8(R0), R0 // a1
+
+ // If fn is declared as vararg, we have to pass the vararg arguments on the stack.
+ // See syscall above. The only function this applies to is openat, for which the 4th
+ // arg must be on the stack.
+ MOVD R3, (RSP)
+
+ BL (R12)
+
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 80(R2) // save r1
+ MOVD R1, 88(R2) // save r2
+ CMPW $-1, R0
+ BNE ok
+ SUB $16, RSP // push structure pointer
+ MOVD R2, 8(RSP)
+ BL libc_error(SB)
+ MOVW (R0), R0
+ MOVD 8(RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 96(R2) // save err
+ok:
+ RET
+
+// syscall_x509 is for crypto/x509. It is like syscall6 but does not check for errors,
+// takes 5 uintptrs and 1 float64, and only returns one value,
+// for use with standard C ABI functions.
+TEXT runtime·syscall_x509(SB),NOSPLIT,$0
+ SUB $16, RSP // push structure pointer
+ MOVD R0, (RSP)
+
+ MOVD 0(R0), R12 // fn
+ MOVD 16(R0), R1 // a2
+ MOVD 24(R0), R2 // a3
+ MOVD 32(R0), R3 // a4
+ MOVD 40(R0), R4 // a5
+ FMOVD 48(R0), F0 // f1
+ MOVD 8(R0), R0 // a1
+ BL (R12)
+
+ MOVD (RSP), R2 // pop structure pointer
+ ADD $16, RSP
+ MOVD R0, 56(R2) // save r1
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ BL libc_issetugid(SB)
+ RET
diff --git a/src/runtime/sys_dragonfly_amd64.s b/src/runtime/sys_dragonfly_amd64.s
new file mode 100644
index 0000000..a223c2c
--- /dev/null
+++ b/src/runtime/sys_dragonfly_amd64.s
@@ -0,0 +1,412 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, DragonFly
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+TEXT runtime·sys_umtx_sleep(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - ptr
+ MOVL val+8(FP), SI // arg 2 - value
+ MOVL timeout+12(FP), DX // arg 3 - timeout
+ MOVL $469, AX // umtx_sleep
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·sys_umtx_wakeup(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - ptr
+ MOVL val+8(FP), SI // arg 2 - count
+ MOVL $470, AX // umtx_wakeup
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVQ param+0(FP), DI // arg 1 - params
+ MOVL $495, AX // lwp_create
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_start(SB),NOSPLIT,$0
+ MOVQ DI, R13 // m
+
+ // set up FS to point at m->tls
+ LEAQ m_tls(R13), DI
+ CALL runtime·settls(SB) // smashes DI
+
+ // set up m, g
+ get_tls(CX)
+ MOVQ m_g0(R13), DI
+ MOVQ R13, g_m(DI)
+ MOVQ DI, g(CX)
+
+ CALL runtime·stackcheck(SB)
+ CALL runtime·mstart(SB)
+
+ MOVQ 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 exit status
+ MOVL $1, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0x10000, DI // arg 1 how - EXTEXIT_LWP
+ MOVL $0, SI // arg 2 status
+ MOVL $0, DX // arg 3 addr
+ MOVL $494, AX // extexit
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $5, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $6, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $3, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ MOVL $0, DI
+ // dragonfly expects flags as the 2nd argument
+ MOVL flags+0(FP), SI
+ MOVL $538, AX
+ SYSCALL
+ JCC pipe2ok
+ MOVL $-1,r+8(FP)
+ MOVL $-1,w+12(FP)
+ MOVL AX, errno+16(FP)
+ RET
+pipe2ok:
+ MOVL AX, r+8(FP)
+ MOVL DX, w+12(FP)
+ MOVL $0, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $4, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_gettid(SB),NOSPLIT,$0-4
+ MOVL $496, AX // lwp_gettid
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVL pid+0(FP), DI // arg 1 - pid
+ MOVL tid+4(FP), SI // arg 2 - tid
+ MOVQ sig+8(FP), DX // arg 3 - signum
+ MOVL $497, AX // lwp_kill
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVL $20, AX // getpid
+ SYSCALL
+ MOVQ AX, DI // arg 1 - pid
+ MOVL sig+0(FP), SI // arg 2 - signum
+ MOVL $37, AX // kill
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-8
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $83, AX
+ SYSCALL
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ MOVL $232, AX // clock_gettime
+ MOVQ $0, DI // CLOCK_REALTIME
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVL $232, AX
+ MOVQ $4, DI // CLOCK_MONOTONIC
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVL sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVL $342, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL prot+16(FP), DX // arg 3 - prot
+ MOVL flags+20(FP), R10 // arg 4 - flags
+ MOVL fd+24(FP), R8 // arg 5 - fd
+ MOVL off+28(FP), R9
+ SUBQ $16, SP
+ MOVQ R9, 8(SP) // arg 7 - offset (passed on stack)
+ MOVQ $0, R9 // arg 6 - pad
+ MOVL $197, AX
+ SYSCALL
+ JCC ok
+ ADDQ $16, SP
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ ADDQ $16, SP
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL $73, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $75, AX // madvise
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $53, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
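+ // DIVL leaves usec/1e6 (the seconds) in AX and the remainder in DX;
+ // the remainder times 1000 below becomes tv_nsec.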
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $240, AX // sys_nanosleep
+ SYSCALL
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$16
+ ADDQ $8, DI // adjust for ELF: wants to use -8(FS) for g
+ MOVQ DI, 0(SP)
+ MOVQ $16, 8(SP)
+ MOVQ $0, DI // arg 1 - which
+ MOVQ SP, SI // arg 2 - tls_info
+ MOVQ $16, DX // arg 3 - infosize
+ MOVQ $472, AX // set_tls_area
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $202, AX // sys___sysctl
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $331, AX // sys_sched_yield
+ SYSCALL
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $340, AX // sys_sigprocmask
+ SYSCALL
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $362, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $363, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (ret int32, errno int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVL cmd+4(FP), SI // cmd
+ MOVL arg+8(FP), DX // arg
+ MOVL $92, AX // fcntl
+ SYSCALL
+ JCC noerr
+ MOVL $-1, ret+16(FP)
+ MOVL AX, errno+20(FP)
+ RET
+noerr:
+ MOVL AX, ret+16(FP)
+ MOVL $0, errno+20(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $253, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_freebsd_386.s b/src/runtime/sys_freebsd_386.s
new file mode 100644
index 0000000..184cd14
--- /dev/null
+++ b/src/runtime/sys_freebsd_386.s
@@ -0,0 +1,482 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 4
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_sigaltstack 53
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_setitimer 83
+#define SYS_fcntl 92
+#define SYS_sysarch 165
+#define SYS___sysctl 202
+#define SYS_clock_gettime 232
+#define SYS_nanosleep 240
+#define SYS_issetugid 253
+#define SYS_sched_yield 331
+#define SYS_sigprocmask 340
+#define SYS_kqueue 362
+#define SYS_sigaction 416
+#define SYS_sigreturn 417
+#define SYS_thr_exit 431
+#define SYS_thr_self 432
+#define SYS_thr_kill 433
+#define SYS__umtx_op 454
+#define SYS_thr_new 455
+#define SYS_mmap 477
+#define SYS_cpuset_getaffinity 487
+#define SYS_pipe2 542
+#define SYS_kevent 560
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$-4
+ MOVL $SYS__umtx_op, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+20(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$-4
+ MOVL $SYS_thr_new, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+8(FP)
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 4(SP), AX // m
+ MOVL m_g0(AX), BX
+ LEAL m_tls(AX), BP
+ MOVL m_id(AX), DI
+ ADDL $7, DI
+ PUSHAL
+ PUSHL $32
+ PUSHL BP
+ PUSHL DI
+ CALL runtime·setldt(SB)
+ POPL AX
+ POPL AX
+ POPL AX
+ POPAL
+ get_tls(CX)
+ MOVL BX, g(CX)
+
+ MOVL AX, g_m(BX)
+ CALL runtime·stackcheck(SB) // smashes AX
+ CALL runtime·mstart(SB)
+
+ MOVL 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-4
+ MOVL $SYS_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+GLOBL exitStack<>(SB),RODATA,$8
+DATA exitStack<>+0x00(SB)/4, $0
+DATA exitStack<>+0x04(SB)/4, $0
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ // thr_exit takes a single pointer argument, which it expects
+ // on the stack. We want to pass 0, so switch over to a fake
+ // stack of 0s. It won't write to the stack.
+ MOVL $exitStack<>(SB), SP
+ MOVL $SYS_thr_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-4
+ MOVL $SYS_open, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-4
+ MOVL $SYS_close, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-4
+ MOVL $SYS_read, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$12-16
+ MOVL $SYS_pipe2, AX
+ LEAL r+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL flags+0(FP), BX
+ MOVL BX, 8(SP)
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-4
+ MOVL $SYS_write, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$8-4
+ // thr_self(&0(FP))
+ LEAL ret+0(FP), AX
+ MOVL AX, 4(SP)
+ MOVL $SYS_thr_self, AX
+ INT $0x80
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$-4
+ // thr_kill(tid, sig)
+ MOVL $SYS_thr_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ // getpid
+ MOVL $SYS_getpid, AX
+ INT $0x80
+ // kill(self, sig)
+ MOVL AX, 4(SP)
+ MOVL sig+0(FP), AX
+ MOVL AX, 8(SP)
+ MOVL $SYS_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$32
+ LEAL addr+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
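+ // Copy the six 32-bit arguments from the Go frame onto the syscall
+ // stack frame, then append a zero high word so the 64-bit file
+ // offset is passed as (off, 0).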
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVSL
+ MOVL $0, AX // top 32 bits of file offset
+ STOSL
+ MOVL $SYS_mmap, AX
+ INT $0x80
+ JAE ok
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$-4
+ MOVL $SYS_munmap, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$-4
+ MOVL $SYS_madvise, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-4
+ MOVL $SYS_setitimer, AX
+ INT $0x80
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVL $SYS_clock_gettime, AX
+ LEAL 12(SP), BX
+ MOVL $CLOCK_REALTIME, 4(SP)
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL 12(SP), AX // sec
+ MOVL 16(SP), BX // nsec
+
+ // sec is in AX, nsec in BX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL BX, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32-8
+ MOVL $SYS_clock_gettime, AX
+ LEAL 12(SP), BX
+ MOVL $CLOCK_MONOTONIC, 4(SP)
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL 12(SP), AX // sec
+ MOVL 16(SP), BX // nsec
+
+ // sec is in AX, nsec in BX
+ // convert to DX:AX nsec
+ MOVL $1000000000, CX
+ MULL CX
+ ADDL BX, AX
+ ADCL $0, DX
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+TEXT runtime·asmSigaction(SB),NOSPLIT,$-4
+ MOVL $SYS_sigaction, AX
+ INT $0x80
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $~15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$12
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 16(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 20(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 24(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ // call sigreturn
+ MOVL 24(SP), AX // context
+ MOVL $0, 0(SP) // syscall gap
+ MOVL AX, 4(SP)
+ MOVL $SYS_sigreturn, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVL $SYS_sigaltstack, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$20
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 12(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVL AX, 16(SP) // tv_nsec
+
+ MOVL $0, 0(SP)
+ LEAL 12(SP), AX
+ MOVL AX, 4(SP) // arg 1 - rqtp
+ MOVL $0, 8(SP) // arg 2 - rmtp
+ MOVL $SYS_nanosleep, AX
+ INT $0x80
+ RET
+
+/*
+descriptor entry format for system call
+is the native machine format, ugly as it is:
+
+ 2-byte limit
+ 3-byte base
+ 1-byte: 0x80=present, 0x60=dpl<<5, 0x1F=type
+ 1-byte: 0x80=limit is *4k, 0x40=32-bit operand size,
+ 0x0F=4 more bits of limit
+ 1 byte: 8 more bits of base
+
+int i386_get_ldt(int, union ldt_entry *, int);
+int i386_set_ldt(int, const union ldt_entry *, int);
+
+*/
+
+// setldt(int entry, int address, int limit)
+TEXT runtime·setldt(SB),NOSPLIT,$32
+ MOVL base+4(FP), BX
+ // see comment in sys_linux_386.s; freebsd is similar
+ ADDL $0x4, BX
+
+ // set up data_desc
+ LEAL 16(SP), AX // struct data_desc
+ MOVL $0, 0(AX)
+ MOVL $0, 4(AX)
+
+ MOVW BX, 2(AX)
+ SHRL $16, BX
+ MOVB BX, 4(AX)
+ SHRL $8, BX
+ MOVB BX, 7(AX)
+
+ MOVW $0xffff, 0(AX)
+ MOVB $0xCF, 6(AX) // 32-bit operand, 4k limit unit, 4 more bits of limit
+
+ MOVB $0xF2, 5(AX) // r/w data descriptor, dpl=3, present
+
+ // call i386_set_ldt(entry, desc, 1)
+ MOVL $0xffffffff, 0(SP) // auto-allocate entry and return in AX
+ MOVL AX, 4(SP)
+ MOVL $1, 8(SP)
+ CALL i386_set_ldt<>(SB)
+
+ // compute segment selector - (entry*8+7)
+ SHLL $3, AX
+ ADDL $7, AX
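+ // (+7 sets TI=1 to select the LDT and RPL=3)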
+ MOVW AX, GS
+ RET
+
+TEXT i386_set_ldt<>(SB),NOSPLIT,$16
+ LEAL args+0(FP), AX // 0(FP) == 4(SP) before SP got moved
+ MOVL $0, 0(SP) // syscall gap
+ MOVL $1, 4(SP)
+ MOVL AX, 8(SP)
+ MOVL $SYS_sysarch, AX
+ INT $0x80
+ JAE 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$28
+ LEAL mib+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - name
+ MOVSL // arg 2 - namelen
+ MOVSL // arg 3 - oldp
+ MOVSL // arg 4 - oldlenp
+ MOVSL // arg 5 - newp
+ MOVSL // arg 6 - newlen
+ MOVL $SYS___sysctl, AX
+ INT $0x80
+ JAE 4(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $SYS_sched_yield, AX
+ INT $0x80
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$16
+ MOVL $0, 0(SP) // syscall gap
+ MOVL how+0(FP), AX // arg 1 - how
+ MOVL AX, 4(SP)
+ MOVL new+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - set
+ MOVL old+8(FP), AX
+ MOVL AX, 12(SP) // arg 3 - oset
+ MOVL $SYS_sigprocmask, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVL $SYS_kqueue, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL $SYS_kevent, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$-4
+ MOVL $SYS_fcntl, AX
+ INT $0x80
+ JAE noerr
+ MOVL $-1, ret+12(FP)
+ MOVL AX, errno+16(FP)
+ RET
+noerr:
+ MOVL AX, ret+12(FP)
+ MOVL $0, errno+16(FP)
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-28
+ MOVL $SYS_cpuset_getaffinity, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+GLOBL runtime·tlsoffset(SB),NOPTR,$4
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVL $SYS_issetugid, AX
+ INT $0x80
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_freebsd_amd64.s b/src/runtime/sys_freebsd_amd64.s
new file mode 100644
index 0000000..977ea09
--- /dev/null
+++ b/src/runtime/sys_freebsd_amd64.s
@@ -0,0 +1,588 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 4
+#define AMD64_SET_FSBASE 129
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_sigaltstack 53
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_setitimer 83
+#define SYS_fcntl 92
+#define SYS_sysarch 165
+#define SYS___sysctl 202
+#define SYS_clock_gettime 232
+#define SYS_nanosleep 240
+#define SYS_issetugid 253
+#define SYS_sched_yield 331
+#define SYS_sigprocmask 340
+#define SYS_kqueue 362
+#define SYS_sigaction 416
+#define SYS_thr_exit 431
+#define SYS_thr_self 432
+#define SYS_thr_kill 433
+#define SYS__umtx_op 454
+#define SYS_thr_new 455
+#define SYS_mmap 477
+#define SYS_cpuset_getaffinity 487
+#define SYS_pipe2 542
+#define SYS_kevent 560
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVL mode+8(FP), SI
+ MOVL val+12(FP), DX
+ MOVQ uaddr1+16(FP), R10
+ MOVQ ut+24(FP), R8
+ MOVL $SYS__umtx_op, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+32(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVQ param+0(FP), DI
+ MOVL size+8(FP), SI
+ MOVL $SYS_thr_new, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ MOVQ DI, R13 // m
+
+ // set up FS to point at m->tls
+ LEAQ m_tls(R13), DI
+ CALL runtime·settls(SB) // smashes DI
+
+ // set up m, g
+ get_tls(CX)
+ MOVQ m_g0(R13), DI
+ MOVQ R13, g_m(DI)
+ MOVQ DI, g(CX)
+
+ CALL runtime·stackcheck(SB)
+ CALL runtime·mstart(SB)
+
+ MOVQ 0, AX // crash (not reached)
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 exit status
+ MOVL $SYS_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0, DI // arg 1 long *state
+ MOVL $SYS_thr_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $SYS_open, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $SYS_close, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $SYS_read, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+ MOVL $SYS_pipe2, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $SYS_write, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$0-8
+ // thr_self(&0(FP))
+ LEAQ ret+0(FP), DI // arg 1
+ MOVL $SYS_thr_self, AX
+ SYSCALL
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-16
+ // thr_kill(tid, sig)
+ MOVQ tid+0(FP), DI // arg 1 id
+ MOVQ sig+8(FP), SI // arg 2 sig
+ MOVL $SYS_thr_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ // getpid
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ // kill(self, sig)
+ MOVQ AX, DI // arg 1 pid
+ MOVL sig+0(FP), SI // arg 2 sig
+ MOVL $SYS_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT, $-8
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $SYS_setitimer, AX
+ SYSCALL
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVL $SYS_clock_gettime, AX
+ MOVQ $CLOCK_REALTIME, DI
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32-8
+ MOVL $SYS_clock_gettime, AX
+ MOVQ $CLOCK_MONOTONIC, DI
+ LEAQ 8(SP), SI
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·asmSigaction(SB),NOSPLIT,$0
+ MOVQ sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVL $SYS_sigaction, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$16
+ MOVQ sig+0(FP), DI // arg 1 sig
+ MOVQ new+8(FP), SI // arg 2 act
+ MOVQ old+16(FP), DX // arg 3 oact
+ MOVQ _cgo_sigaction(SB), AX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ CALL AX
+ MOVQ BX, SP
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigprofNonGo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If there is no traceback support function, which means that
+ // runtime/cgo was not linked in, do the usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, via sigprofNonGoWrapper, to convert
+ // the arguments to the Go calling convention.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGoWrapper<>(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+TEXT runtime·sysMmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL prot+16(FP), DX // arg 3 prot
+ MOVL flags+20(FP), R10 // arg 4 flags
+ MOVL fd+24(FP), R8 // arg 5 fd
+ MOVL off+28(FP), R9 // arg 6 offset
+ MOVL $SYS_mmap, AX
+ SYSCALL
+ JCC ok
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+// Call the function stored in _cgo_mmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMmap(SB),NOSPLIT,$16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL prot+16(FP), DX
+ MOVL flags+20(FP), CX
+ MOVL fd+24(FP), R8
+ MOVL off+28(FP), R9
+ MOVQ _cgo_mmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ MOVQ AX, ret+32(FP)
+ RET
+
+TEXT runtime·sysMunmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 addr
+ MOVQ n+8(FP), SI // arg 2 len
+ MOVL $SYS_munmap, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// Call the function stored in _cgo_munmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMunmap(SB),NOSPLIT,$16-16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ _cgo_munmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $SYS_madvise, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $SYS_sigaltstack, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $SYS_nanosleep, AX
+ SYSCALL
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$8
+ ADDQ $8, DI // adjust for ELF: wants to use -8(FS) for g and m
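+ // sysarch(AMD64_SET_FSBASE, &base): store the adjusted base on the
+ // stack and pass its address as the sysarch argument.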
+ MOVQ DI, 0(SP)
+ MOVQ SP, SI
+ MOVQ $AMD64_SET_FSBASE, DI
+ MOVQ $SYS_sysarch, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $SYS___sysctl, AX
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $SYS_sched_yield, AX
+ SYSCALL
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $SYS_sigprocmask, AX
+ SYSCALL
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $SYS_kqueue, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $SYS_kevent, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVL cmd+4(FP), SI // cmd
+ MOVL arg+8(FP), DX // arg
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ JCC noerr
+ MOVL $-1, ret+16(FP)
+ MOVL AX, errno+20(FP)
+ RET
+noerr:
+ MOVL AX, ret+16(FP)
+ MOVL $0, errno+20(FP)
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-44
+ MOVQ level+0(FP), DI
+ MOVQ which+8(FP), SI
+ MOVQ id+16(FP), DX
+ MOVQ size+24(FP), R10
+ MOVQ mask+32(FP), R8
+ MOVL $SYS_cpuset_getaffinity, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+40(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $SYS_issetugid, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_freebsd_arm.s b/src/runtime/sys_freebsd_arm.s
new file mode 100644
index 0000000..44430f5
--- /dev/null
+++ b/src/runtime/sys_freebsd_arm.s
@@ -0,0 +1,456 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// for EABI, as we don't support OABI
+#define SYS_BASE 0x0
+
+#define SYS_exit (SYS_BASE + 1)
+#define SYS_read (SYS_BASE + 3)
+#define SYS_write (SYS_BASE + 4)
+#define SYS_open (SYS_BASE + 5)
+#define SYS_close (SYS_BASE + 6)
+#define SYS_getpid (SYS_BASE + 20)
+#define SYS_kill (SYS_BASE + 37)
+#define SYS_sigaltstack (SYS_BASE + 53)
+#define SYS_munmap (SYS_BASE + 73)
+#define SYS_madvise (SYS_BASE + 75)
+#define SYS_setitimer (SYS_BASE + 83)
+#define SYS_fcntl (SYS_BASE + 92)
+#define SYS___sysctl (SYS_BASE + 202)
+#define SYS_nanosleep (SYS_BASE + 240)
+#define SYS_issetugid (SYS_BASE + 253)
+#define SYS_clock_gettime (SYS_BASE + 232)
+#define SYS_sched_yield (SYS_BASE + 331)
+#define SYS_sigprocmask (SYS_BASE + 340)
+#define SYS_kqueue (SYS_BASE + 362)
+#define SYS_sigaction (SYS_BASE + 416)
+#define SYS_thr_exit (SYS_BASE + 431)
+#define SYS_thr_self (SYS_BASE + 432)
+#define SYS_thr_kill (SYS_BASE + 433)
+#define SYS__umtx_op (SYS_BASE + 454)
+#define SYS_thr_new (SYS_BASE + 455)
+#define SYS_mmap (SYS_BASE + 477)
+#define SYS_cpuset_getaffinity (SYS_BASE + 487)
+#define SYS_pipe2 (SYS_BASE + 542)
+#define SYS_kevent (SYS_BASE + 560)
+
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW val+8(FP), R2
+ MOVW uaddr1+12(FP), R3
+ ADD $20, R13 // arg 5 is passed on stack
+ MOVW $SYS__umtx_op, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ // BCS error
+ MOVW R0, ret+20(FP)
+ RET
+
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVW param+0(FP), R0
+ MOVW size+4(FP), R1
+ MOVW $SYS_thr_new, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ // set up g
+ MOVW m_g0(R0), g
+ MOVW R0, g_m(g)
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ MOVW $2, R8 // crash (not reached)
+ MOVW R8, (R8)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0 // arg 1 exit status
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R2
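+ // Atomically store 0 to *wait with an LDREX/STREX loop, retrying
+ // until the exclusive store succeeds.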
+storeloop:
+ LDREX (R0), R4 // loads R4
+ STREX R2, (R0), R1 // stores R2
+ CMP $0, R1
+ BNE storeloop
+ MOVW $0, R0 // arg 1 long *state
+ MOVW $SYS_thr_exit, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVW name+0(FP), R0 // arg 1 name
+ MOVW mode+4(FP), R1 // arg 2 mode
+ MOVW perm+8(FP), R2 // arg 3 perm
+ MOVW $SYS_open, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW p+4(FP), R1 // arg 2 buf
+ MOVW n+8(FP), R2 // arg 3 count
+ MOVW $SYS_read, R7
+ SWI $0
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW p+4(FP), R1 // arg 2 buf
+ MOVW n+8(FP), R2 // arg 3 count
+ MOVW $SYS_write, R7
+ SWI $0
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 fd
+ MOVW $SYS_close, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·thr_self(SB),NOSPLIT,$0-4
+ // thr_self(&0(FP))
+ MOVW $ret+0(FP), R0 // arg 1
+ MOVW $SYS_thr_self, R7
+ SWI $0
+ RET
+
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-8
+ // thr_kill(tid, sig)
+ MOVW tid+0(FP), R0 // arg 1 id
+ MOVW sig+4(FP), R1 // arg 2 signal
+ MOVW $SYS_thr_kill, R7
+ SWI $0
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ // getpid
+ MOVW $SYS_getpid, R7
+ SWI $0
+ // kill(self, sig)
+ // arg 1 - pid, now in R0
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ MOVW $SYS_kill, R7
+ SWI $0
+ RET
+
+TEXT runtime·setitimer(SB), NOSPLIT|NOFRAME, $0
+ MOVW mode+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW $SYS_setitimer, R7
+ SWI $0
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB), NOSPLIT, $32-12
+ MOVW $0, R0 // CLOCK_REALTIME
+ MOVW $8(R13), R1
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R1 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB), NOSPLIT, $32
+ MOVW $4, R0 // CLOCK_MONOTONIC
+ MOVW $8(R13), R1
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R4 // sec.high
+ MOVW 16(R13), R2 // nsec
+
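+ // Compute sec*1e9 + nsec as a 64-bit value: MULLU forms the 64-bit
+ // product sec.low*1e9 in (R1, R0), MUL scales sec.high by 1e9, and
+ // the ADD.S/ADC pair adds nsec and folds the scaled high word plus
+ // carry into R1.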
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ MUL R3, R4
+ ADD.S R2, R0
+ ADC R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
+
+TEXT runtime·asmSigaction(SB),NOSPLIT|NOFRAME,$0
+ MOVW sig+0(FP), R0 // arg 1 sig
+ MOVW new+4(FP), R1 // arg 2 act
+ MOVW old+8(FP), R2 // arg 3 oact
+ MOVW $SYS_sigaction, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$16
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW prot+8(FP), R2 // arg 3 prot
+ MOVW flags+12(FP), R3 // arg 4 flags
+ // arg 5 (fd) and arg 6 (offset_lo, offset_hi) are passed on stack
+ // note the C runtime only passes the 32-bit offset_lo to us
+ MOVW fd+16(FP), R4 // arg 5
+ MOVW R4, 4(R13)
+ MOVW off+20(FP), R5 // arg 6 lower 32-bit
+ // the word at 8(R13) is skipped due to 64-bit argument alignment.
+ MOVW R5, 12(R13)
+ MOVW $0, R6 // higher 32-bit for arg 6
+ MOVW R6, 16(R13)
+ ADD $4, R13
+ MOVW $SYS_mmap, R7
+ SWI $0
+ SUB $4, R13
+ MOVW $0, R1
+ MOVW.CS R0, R1 // if failed, put in R1
+ MOVW.CS $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW $SYS_munmap, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 addr
+ MOVW n+4(FP), R1 // arg 2 len
+ MOVW flags+8(FP), R2 // arg 3 flags
+ MOVW $SYS_madvise, R7
+ SWI $0
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVW new+0(FP), R0
+ MOVW old+4(FP), R1
+ MOVW $SYS_sigaltstack, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ // 0(R13) is the saved LR, don't use it
+ MOVW R0, 4(R13) // tv_sec.low
+ MOVW $0, R0
+ MOVW R0, 8(R13) // tv_sec.high
+ MOVW $1000, R2
+ MUL R1, R2
+ MOVW R2, 12(R13) // tv_nsec
+
+ MOVW $4(R13), R0 // arg 1 - rqtp
+ MOVW $0, R1 // arg 2 - rmtp
+ MOVW $SYS_nanosleep, R7
+ SWI $0
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVW mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+4(FP), R1 // arg 2 - namelen
+ MOVW out+8(FP), R2 // arg 3 - old
+ MOVW size+12(FP), R3 // arg 4 - oldlenp
+ // arg 5 (newp) and arg 6 (newlen) are passed on stack
+ ADD $20, R13
+ MOVW $SYS___sysctl, R7
+ SWI $0
+ SUB.CS $0, R0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_sched_yield, R7
+ SWI $0
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVW new+4(FP), R1 // arg 2 - set
+ MOVW old+8(FP), R2 // arg 3 - oset
+ MOVW $SYS_sigprocmask, R7
+ SWI $0
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVW $SYS_kqueue, R7
+ SWI $0
+ RSB.CS $0, R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0 // kq
+ MOVW ch+4(FP), R1 // changelist
+ MOVW nch+8(FP), R2 // nchanges
+ MOVW ev+12(FP), R3 // eventlist
+ ADD $20, R13 // pass arg 5 and 6 on stack
+ MOVW $SYS_kevent, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVW cmd+4(FP), R1 // cmd
+ MOVW arg+8(FP), R2 // arg
+ MOVW $SYS_fcntl, R7
+ SWI $0
+ MOVW $0, R1
+ MOVW.CS R0, R1
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ MOVW R1, errno+16(FP)
+ RET
+
+// TODO: this is only valid for ARMv7+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// TODO(minux): this only supports ARMv6K+.
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ WORD $0xee1d0f70 // mrc p15, 0, r0, c13, c0, 3
+ RET
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB), NOSPLIT, $0-28
+ MOVW level+0(FP), R0
+ MOVW which+4(FP), R1
+ MOVW id_lo+8(FP), R2
+ MOVW id_hi+12(FP), R3
+ ADD $20, R13 // Pass size and mask on stack.
+ MOVW $SYS_cpuset_getaffinity, R7
+ SWI $0
+ RSB.CS $0, R0
+ SUB $20, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// func getCntxct(physical bool) uint32
+TEXT runtime·getCntxct(SB),NOSPLIT|NOFRAME,$0-8
+ MOVB runtime·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ DMB
+
+ MOVB physical+0(FP), R0
+ CMP $1, R0
+ B.NE 3(PC)
+
+ // get CNTPCT (Physical Count Register) into R0(low) R1(high)
+ // mrrc 15, 0, r0, r1, cr14
+ WORD $0xec510f0e
+ B 2(PC)
+
+ // get CNTVCT (Virtual Count Register) into R0(low) R1(high)
+ // mrrc 15, 1, r0, r1, cr14
+ WORD $0xec510f1e
+
+ MOVW R0, ret+4(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVW $SYS_issetugid, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
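The fallback_nanotime stub above has to form sec*1e9 + nsec with only 32-bit registers: MULLU produces the 64-bit product of sec.low and 1e9, MUL contributes the low word of sec.high*1e9, and ADD.S/ADC fold in nsec with carry propagation. A minimal Go sketch of the same arithmetic follows; the helper name is invented for illustration and is not part of the runtime.

// nanosFrom32 mirrors the 32-bit multiply/accumulate sequence in
// runtime·fallback_nanotime on freebsd/arm: it computes sec*1e9 + nsec
// and returns the 64-bit result as two 32-bit halves.
func nanosFrom32(secLo, secHi, nsec uint32) (lo, hi uint32) {
	const nsPerSec = 1000000000
	prod := uint64(secLo) * nsPerSec // MULLU: full 64-bit product of the low word
	lo = uint32(prod)
	hi = uint32(prod >> 32)
	hi += secHi * nsPerSec // MUL: only the low 32 bits of sec.high*1e9 survive
	sum := uint64(lo) + uint64(nsec)
	lo = uint32(sum)        // ADD.S: add nsec into the low word...
	hi += uint32(sum >> 32) // ADC: ...and carry into the high word
	return lo, hi
}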
diff --git a/src/runtime/sys_freebsd_arm64.s b/src/runtime/sys_freebsd_arm64.s
new file mode 100644
index 0000000..8fb46f4
--- /dev/null
+++ b/src/runtime/sys_freebsd_arm64.s
@@ -0,0 +1,476 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 4
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_sigaltstack 53
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_setitimer 83
+#define SYS_fcntl 92
+#define SYS___sysctl 202
+#define SYS_nanosleep 240
+#define SYS_issetugid 253
+#define SYS_clock_gettime 232
+#define SYS_sched_yield 331
+#define SYS_sigprocmask 340
+#define SYS_kqueue 362
+#define SYS_sigaction 416
+#define SYS_thr_exit 431
+#define SYS_thr_self 432
+#define SYS_thr_kill 433
+#define SYS__umtx_op 454
+#define SYS_thr_new 455
+#define SYS_mmap 477
+#define SYS_cpuset_getaffinity 487
+#define SYS_pipe2 542
+#define SYS_kevent 560
+
+TEXT emptyfunc<>(SB),0,$0-0
+ RET
+
+// func sys_umtx_op(addr *uint32, mode int32, val uint32, uaddr1 uintptr, ut *umtx_time) int32
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVW mode+8(FP), R1
+ MOVW val+12(FP), R2
+ MOVD uaddr1+16(FP), R3
+ MOVD ut+24(FP), R4
+ MOVD $SYS__umtx_op, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+32(FP)
+ RET
+
+// func thr_new(param *thrparam, size int32) int32
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOVD param+0(FP), R0
+ MOVW size+8(FP), R1
+ MOVD $SYS_thr_new, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+// func thr_start()
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ // set up g
+ MOVD m_g0(R0), g
+ MOVD R0, g_m(g)
+ BL emptyfunc<>(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ MOVD $2, R8 // crash (not reached)
+ MOVD R8, (R8)
+ RET
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R0
+ MOVD $SYS_exit, R8
+ SVC
+ MOVD $0, R0
+ MOVD R0, (R0)
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ MOVW $0, R0
+ MOVD $SYS_thr_exit, R8
+ SVC
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R0
+ MOVW mode+8(FP), R1
+ MOVW perm+12(FP), R2
+ MOVD $SYS_open, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R0
+ MOVD $SYS_close, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R0
+ MOVW flags+0(FP), R1
+ MOVD $SYS_pipe2, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, errno+16(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_write, R8
+ SVC
+ BCC ok
+ NEG R0, R0 // caller expects negative errno
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_read, R8
+ SVC
+ BCC ok
+ NEG R0, R0 // caller expects negative errno
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP)
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP)
+
+ // nanosleep(&ts, 0)
+ ADD $8, RSP, R0
+ MOVD $0, R1
+ MOVD $SYS_nanosleep, R8
+ SVC
+ RET
+
+// func thr_self() thread
+TEXT runtime·thr_self(SB),NOSPLIT,$8-8
+ MOVD $ptr-8(SP), R0 // arg 1 &8(SP)
+ MOVD $SYS_thr_self, R8
+ SVC
+ MOVD ptr-8(SP), R0
+ MOVD R0, ret+0(FP)
+ RET
+
+// func thr_kill(t thread, sig int)
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-16
+ MOVD tid+0(FP), R0 // arg 1 pid
+ MOVD sig+8(FP), R1 // arg 2 sig
+ MOVD $SYS_thr_kill, R8
+ SVC
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW sig+0(FP), R1
+ MOVD $SYS_kill, R8
+ SVC
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_setitimer, R8
+ SVC
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB),NOSPLIT,$24-12
+ MOVW $CLOCK_REALTIME, R0
+ MOVD $8(RSP), R1
+ MOVD $SYS_clock_gettime, R8
+ SVC
+ MOVD 8(RSP), R0 // sec
+ MOVW 16(RSP), R1 // nsec
+ MOVD R0, sec+0(FP)
+ MOVW R1, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB),NOSPLIT,$24-8
+ MOVD $CLOCK_MONOTONIC, R0
+ MOVD $8(RSP), R1
+ MOVD $SYS_clock_gettime, R8
+ SVC
+ MOVD 8(RSP), R0 // sec
+ MOVW 16(RSP), R2 // nsec
+
+ // sec is in R0, nsec in R2
+	// return nsec in R0
+ MOVD $1000000000, R3
+ MUL R3, R0
+ ADD R2, R0
+
+ MOVD R0, ret+0(FP)
+ RET
+
+// func asmSigaction(sig uintptr, new, old *sigactiont) int32
+TEXT runtime·asmSigaction(SB),NOSPLIT|NOFRAME,$0
+ MOVD sig+0(FP), R0 // arg 1 sig
+ MOVD new+8(FP), R1 // arg 2 act
+ MOVD old+16(FP), R2 // arg 3 oact
+ MOVD $SYS_sigaction, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+// func sigtramp()
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$176
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 8(RSP)
+ MOVBU runtime·iscgo(SB), R0
+ CMP $0, R0
+ BEQ 2(PC)
+ BL runtime·load_g(SB)
+
+ // Restore signum to R0.
+ MOVW 8(RSP), R0
+ // R1 and R2 already contain info and ctx, respectively.
+ MOVD $runtime·sigtrampgo<ABIInternal>(SB), R3
+ BL (R3)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+
+ RET
+
+// func mmap(addr uintptr, n uintptr, prot int, flags int, fd int, off int64) (ret uintptr, err error)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+ MOVD $SYS_mmap, R8
+ SVC
+ BCS fail
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+fail:
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+
+// func munmap(addr uintptr, n uintptr) (err error)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD $SYS_munmap, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW flags+16(FP), R2
+ MOVD $SYS_madvise, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVD mib+0(FP), R0
+ MOVD miblen+8(FP), R1
+ MOVD out+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD dst+32(FP), R4
+ MOVD ndst+40(FP), R5
+ MOVD $SYS___sysctl, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD $SYS_sigaltstack, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_sched_yield, R8
+ SVC
+ RET
+
+// func sigprocmask(how int32, new, old *sigset)
+TEXT runtime·sigprocmask(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW how+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_sigprocmask, R8
+ SVC
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB),NOSPLIT|NOFRAME,$0-44
+ MOVD level+0(FP), R0
+ MOVD which+8(FP), R1
+ MOVD id+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD mask+32(FP), R4
+ MOVD $SYS_cpuset_getaffinity, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+40(FP)
+ RET
+
+// func kqueue() int32
+TEXT runtime·kqueue(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_kqueue, R8
+ SVC
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+0(FP)
+ RET
+
+// func kevent(kq int, ch unsafe.Pointer, nch int, ev unsafe.Pointer, nev int, ts *Timespec) (n int, err error)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0
+ MOVD ch+8(FP), R1
+ MOVW nch+16(FP), R2
+ MOVD ev+24(FP), R3
+ MOVW nev+32(FP), R4
+ MOVD ts+40(FP), R5
+ MOVD $SYS_kevent, R8
+ SVC
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW cmd+4(FP), R1
+ MOVW arg+8(FP), R2
+ MOVD $SYS_fcntl, R8
+ SVC
+ BCC noerr
+ MOVW $-1, R1
+ MOVW R1, ret+16(FP)
+ MOVW R0, errno+20(FP)
+ RET
+noerr:
+ MOVW R0, ret+16(FP)
+ MOVW $0, errno+20(FP)
+ RET
+
+// func getCntxct(physical bool) uint32
+TEXT runtime·getCntxct(SB),NOSPLIT,$0
+ MOVB physical+0(FP), R0
+ CMP $0, R0
+ BEQ 3(PC)
+
+ // get CNTPCT (Physical Count Register) into R0
+ MRS CNTPCT_EL0, R0
+ B 2(PC)
+
+ // get CNTVCT (Virtual Count Register) into R0
+ MRS CNTVCT_EL0, R0
+
+ MOVW R0, ret+8(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_issetugid, R8
+ SVC
+ MOVW R0, ret+0(FP)
+ RET
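Nearly every stub in this file shares one error convention: FreeBSD sets the carry flag on failure and leaves a positive errno in R0, so the wrappers either clamp the result to -1 (BCS/MOVW $-1) or negate it (NEG R0, R0) so the Go caller sees -errno. Below is a hedged Go sketch of how such a "negative errno" return can be decoded; the helper is illustrative only, not a runtime function.

// decodeNegErrno splits a raw return value produced by the NEG-on-carry
// stubs above into a result and an errno. The runtime's callers do this
// check inline; this helper only restates the convention.
func decodeNegErrno(raw int32) (ret int32, errno int32) {
	if raw < 0 {
		return -1, -raw // the stub negated the kernel's positive errno
	}
	return raw, 0
}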
diff --git a/src/runtime/sys_freebsd_riscv64.s b/src/runtime/sys_freebsd_riscv64.s
new file mode 100644
index 0000000..cbf920c
--- /dev/null
+++ b/src/runtime/sys_freebsd_riscv64.s
@@ -0,0 +1,448 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for riscv64, FreeBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 4
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_sigaltstack 53
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_setitimer 83
+#define SYS_fcntl 92
+#define SYS___sysctl 202
+#define SYS_nanosleep 240
+#define SYS_issetugid 253
+#define SYS_clock_gettime 232
+#define SYS_sched_yield 331
+#define SYS_sigprocmask 340
+#define SYS_kqueue 362
+#define SYS_sigaction 416
+#define SYS_thr_exit 431
+#define SYS_thr_self 432
+#define SYS_thr_kill 433
+#define SYS__umtx_op 454
+#define SYS_thr_new 455
+#define SYS_mmap 477
+#define SYS_cpuset_getaffinity 487
+#define SYS_pipe2 542
+#define SYS_kevent 560
+
+TEXT emptyfunc<>(SB),0,$0-0
+ RET
+
+// func sys_umtx_op(addr *uint32, mode int32, val uint32, uaddr1 uintptr, ut *umtx_time) int32
+TEXT runtime·sys_umtx_op(SB),NOSPLIT,$0
+ MOV addr+0(FP), A0
+ MOVW mode+8(FP), A1
+ MOVW val+12(FP), A2
+ MOV uaddr1+16(FP), A3
+ MOV ut+24(FP), A4
+ MOV $SYS__umtx_op, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+32(FP)
+ RET
+
+// func thr_new(param *thrparam, size int32) int32
+TEXT runtime·thr_new(SB),NOSPLIT,$0
+ MOV param+0(FP), A0
+ MOVW size+8(FP), A1
+ MOV $SYS_thr_new, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+16(FP)
+ RET
+
+// func thr_start()
+TEXT runtime·thr_start(SB),NOSPLIT,$0
+ // set up g
+ MOV m_g0(A0), g
+ MOV A0, g_m(g)
+ CALL emptyfunc<>(SB) // fault if stack check is wrong
+ CALL runtime·mstart(SB)
+
+ WORD $0 // crash
+ RET
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), A0
+ MOV $SYS_exit, T0
+ ECALL
+ WORD $0 // crash
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOV wait+0(FP), A0
+ // We're done using the stack.
+ FENCE
+ MOVW ZERO, (A0)
+ FENCE
+ MOV $0, A0 // exit code
+ MOV $SYS_thr_exit, T0
+ ECALL
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOV name+0(FP), A0
+ MOVW mode+8(FP), A1
+ MOVW perm+12(FP), A2
+ MOV $SYS_open, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), A0
+ MOV $SYS_close, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+8(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOV $r+8(FP), A0
+ MOVW flags+0(FP), A1
+ MOV $SYS_pipe2, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, errno+16(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOV fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_write, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_read, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+24(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), A0
+ MOV $1000, A1
+ MUL A1, A0, A0
+ MOV $1000000000, A1
+ DIV A1, A0, A2
+ MOV A2, 8(X2)
+ REM A1, A0, A3
+ MOV A3, 16(X2)
+ ADD $8, X2, A0
+ MOV ZERO, A1
+ MOV $SYS_nanosleep, T0
+ ECALL
+ RET
+
+// func thr_self() thread
+TEXT runtime·thr_self(SB),NOSPLIT,$8-8
+ MOV $ptr-8(SP), A0 // arg 1 &8(SP)
+ MOV $SYS_thr_self, T0
+ ECALL
+ MOV ptr-8(SP), A0
+ MOV A0, ret+0(FP)
+ RET
+
+// func thr_kill(t thread, sig int)
+TEXT runtime·thr_kill(SB),NOSPLIT,$0-16
+ MOV tid+0(FP), A0 // arg 1 pid
+ MOV sig+8(FP), A1 // arg 2 sig
+ MOV $SYS_thr_kill, T0
+ ECALL
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_getpid, T0
+ ECALL
+ // arg 1 pid - already in A0
+ MOVW sig+0(FP), A1 // arg 2
+ MOV $SYS_kill, T0
+ ECALL
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV $SYS_setitimer, T0
+ ECALL
+ RET
+
+// func fallback_walltime() (sec int64, nsec int32)
+TEXT runtime·fallback_walltime(SB),NOSPLIT,$24-12
+ MOV $CLOCK_REALTIME, A0
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, T0
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOVW 16(X2), T1 // nsec
+ MOV T0, sec+0(FP)
+ MOVW T1, nsec+8(FP)
+ RET
+
+// func fallback_nanotime() int64
+TEXT runtime·fallback_nanotime(SB),NOSPLIT,$24-8
+ MOV $CLOCK_MONOTONIC, A0
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, T0
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+
+ // sec is in T0, nsec in T1
+ // return nsec in T0
+ MOV $1000000000, T2
+ MUL T2, T0
+ ADD T1, T0
+
+ MOV T0, ret+0(FP)
+ RET
+
+// func asmSigaction(sig uintptr, new, old *sigactiont) int32
+TEXT runtime·asmSigaction(SB),NOSPLIT|NOFRAME,$0
+ MOV sig+0(FP), A0 // arg 1 sig
+ MOV new+8(FP), A1 // arg 2 act
+ MOV old+16(FP), A2 // arg 3 oact
+ MOV $SYS_sigaction, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+24(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), A0
+ MOV info+16(FP), A1
+ MOV ctx+24(FP), A2
+ MOV fn+0(FP), T1
+ JALR RA, T1
+ RET
+
+// func sigtramp(signo, ureg, ctxt unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT,$64
+ MOVW A0, 8(X2)
+ MOV A1, 16(X2)
+ MOV A2, 24(X2)
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVBU runtime·iscgo(SB), A0
+ BEQ A0, ZERO, ok
+ CALL runtime·load_g(SB)
+ok:
+ MOV $runtime·sigtrampgo(SB), A0
+ JALR RA, A0
+ RET
+
+// func mmap(addr uintptr, n uintptr, prot int, flags int, fd int, off int64) (ret uintptr, err error)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW prot+16(FP), A2
+ MOVW flags+20(FP), A3
+ MOVW fd+24(FP), A4
+ MOVW off+28(FP), A5
+ MOV $SYS_mmap, T0
+ ECALL
+ BNE T0, ZERO, fail
+ MOV A0, p+32(FP)
+ MOV ZERO, err+40(FP)
+ RET
+fail:
+ MOV ZERO, p+32(FP)
+ MOV A0, err+40(FP)
+ RET
+
+// func munmap(addr uintptr, n uintptr) (err error)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOV $SYS_munmap, T0
+ ECALL
+ BNE T0, ZERO, fail
+ RET
+fail:
+ WORD $0 // crash
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32) int32
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW flags+16(FP), A2
+ MOV $SYS_madvise, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+24(FP)
+ RET
+
+// func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOV mib+0(FP), A0
+ MOV miblen+8(FP), A1
+ MOV out+16(FP), A2
+ MOV size+24(FP), A3
+ MOV dst+32(FP), A4
+ MOV ndst+40(FP), A5
+ MOV $SYS___sysctl, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+48(FP)
+ RET
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOV new+0(FP), A0
+ MOV old+8(FP), A1
+ MOV $SYS_sigaltstack, T0
+ ECALL
+ BNE T0, ZERO, fail
+ RET
+fail:
+ WORD $0 // crash
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_sched_yield, T0
+ ECALL
+ RET
+
+// func sigprocmask(how int32, new, old *sigset)
+TEXT runtime·sigprocmask(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW how+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV $SYS_sigprocmask, T0
+ ECALL
+ BNE T0, ZERO, fail
+ RET
+fail:
+ WORD $0 // crash
+
+// func cpuset_getaffinity(level int, which int, id int64, size int, mask *byte) int32
+TEXT runtime·cpuset_getaffinity(SB),NOSPLIT|NOFRAME,$0-44
+ MOV level+0(FP), A0
+ MOV which+8(FP), A1
+ MOV id+16(FP), A2
+ MOV size+24(FP), A3
+ MOV mask+32(FP), A4
+ MOV $SYS_cpuset_getaffinity, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+40(FP)
+ RET
+
+// func kqueue() int32
+TEXT runtime·kqueue(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_kqueue, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ MOV $-1, A0
+ok:
+ MOVW A0, ret+0(FP)
+ RET
+
+// func kevent(kq int, ch unsafe.Pointer, nch int, ev unsafe.Pointer, nev int, ts *Timespec) (n int, err error)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), A0
+ MOV ch+8(FP), A1
+ MOVW nch+16(FP), A2
+ MOV ev+24(FP), A3
+ MOVW nev+32(FP), A4
+ MOV ts+40(FP), A5
+ MOV $SYS_kevent, T0
+ ECALL
+ BEQ T0, ZERO, ok
+ NEG A0, A0
+ok:
+ MOVW A0, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), A0
+ MOVW cmd+4(FP), A1
+ MOVW arg+8(FP), A2
+ MOV $SYS_fcntl, T0
+ ECALL
+ BEQ T0, ZERO, noerr
+ MOV $-1, A1
+ MOVW A1, ret+16(FP)
+ MOVW A0, errno+20(FP)
+ RET
+noerr:
+ MOVW A0, ret+16(FP)
+ MOVW ZERO, errno+20(FP)
+ RET
+
+// func getCntxct() uint32
+TEXT runtime·getCntxct(SB),NOSPLIT|NOFRAME,$0
+ RDTIME A0
+ MOVW A0, ret+0(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_issetugid, T0
+ ECALL
+ MOVW A0, ret+0(FP)
+ RET
+
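runtime·usleep above converts a microsecond count into the timespec that nanosleep expects: scale to nanoseconds (MUL $1000), then split with DIV/REM by 1e9 into whole seconds and leftover nanoseconds stored at 8(X2) and 16(X2). A small Go sketch of the same conversion follows, using an illustrative struct rather than the runtime's own timespec type.

// usecToTimespec restates the arithmetic in runtime·usleep on freebsd/riscv64.
type timespecSketch struct {
	sec  int64 // whole seconds
	nsec int64 // remaining nanoseconds, 0 <= nsec < 1e9
}

func usecToTimespec(usec uint32) timespecSketch {
	ns := int64(usec) * 1000 // MUL $1000: microseconds -> nanoseconds
	return timespecSketch{
		sec:  ns / 1000000000, // DIV
		nsec: ns % 1000000000, // REM
	}
}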
diff --git a/src/runtime/sys_libc.go b/src/runtime/sys_libc.go
new file mode 100644
index 0000000..0c6f13c
--- /dev/null
+++ b/src/runtime/sys_libc.go
@@ -0,0 +1,54 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || (openbsd && !mips64)
+
+package runtime
+
+import "unsafe"
+
+// Call fn with arg as its argument. Return what fn returns.
+// fn is the raw pc value of the entry point of the desired function.
+// Switches to the system stack, if not already there.
+// Preserves the calling point as the location where a profiler traceback will begin.
+//
+//go:nosplit
+func libcCall(fn, arg unsafe.Pointer) int32 {
+ // Leave caller's PC/SP/G around for traceback.
+ gp := getg()
+ var mp *m
+ if gp != nil {
+ mp = gp.m
+ }
+ if mp != nil && mp.libcallsp == 0 {
+ mp.libcallg.set(gp)
+ mp.libcallpc = getcallerpc()
+		// sp must be set last, because once the async CPU profiler finds
+		// all three values to be non-zero, it will use them.
+ mp.libcallsp = getcallersp()
+ } else {
+ // Make sure we don't reset libcallsp. This makes
+		// libcCall reentrant; we remember the g/pc/sp for the
+ // first call on an M, until that libcCall instance
+ // returns. Reentrance only matters for signals, as
+ // libc never calls back into Go. The tricky case is
+ // where we call libcX from an M and record g/pc/sp.
+ // Before that call returns, a signal arrives on the
+ // same M and the signal handling code calls another
+ // libc function. We don't want that second libcCall
+ // from within the handler to be recorded, and we
+ // don't want that call's completion to zero
+ // libcallsp.
+ // We don't need to set libcall* while we're in a sighandler
+ // (even if we're not currently in libc) because we block all
+ // signals while we're handling a signal. That includes the
+ // profile signal, which is the one that uses the libcall* info.
+ mp = nil
+ }
+ res := asmcgocall(fn, arg)
+ if mp != nil {
+ mp.libcallsp = 0
+ }
+ return res
+}
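For a sense of how libcCall is driven in practice, the darwin syscall wrappers elsewhere in the runtime pass it the ABI0 entry point of a small assembly trampoline plus a pointer to the packed arguments. The sketch below is an approximation of that shape (it assumes the runtime package context and the unsafe and internal/abi imports), not a definition belonging to this file:

//go:nosplit
//go:cgo_unsafe_args
func usleep(usec uint32) {
	// The trampoline is assembly that unpacks the argument and tail-calls
	// the libc function; libcCall records g/pc/sp around the transition so
	// a profiler traceback can start at the calling point, as described above.
	libcCall(unsafe.Pointer(abi.FuncPCABI0(usleep_trampoline)), unsafe.Pointer(&usec))
}
func usleep_trampoline() // defined in per-OS assembly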
diff --git a/src/runtime/sys_linux_386.s b/src/runtime/sys_linux_386.s
new file mode 100644
index 0000000..d53be24
--- /dev/null
+++ b/src/runtime/sys_linux_386.s
@@ -0,0 +1,772 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for 386, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// Most Linux systems use glibc's dynamic linker, which puts the
+// __kernel_vsyscall vdso helper at 0x10(GS) for easy access from
+// position-independent code, and setldt in the runtime does the same in the
+// statically linked case. However, systems that use an alternative libc,
+// such as Android's bionic and musl, do not save the helper anywhere, so the
+// only way to invoke a syscall from position-independent code is boring old
+// int $0x80 (which is also what the syscall wrappers in bionic/musl use).
+//
+// The benchmarks also showed that using int $0x80 is as fast as calling
+// *%gs:0x10 except on AMD Opteron. See https://golang.org/cl/19833
+// for the benchmark program and raw data.
+//#define INVOKE_SYSCALL CALL 0x10(GS) // non-portable
+#define INVOKE_SYSCALL INT $0x80
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_access 33
+#define SYS_kill 37
+#define SYS_brk 45
+#define SYS_munmap 91
+#define SYS_socketcall 102
+#define SYS_setittimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 173
+#define SYS_rt_sigaction 174
+#define SYS_rt_sigprocmask 175
+#define SYS_sigaltstack 186
+#define SYS_mmap2 192
+#define SYS_mincore 218
+#define SYS_madvise 219
+#define SYS_gettid 224
+#define SYS_futex 240
+#define SYS_sched_getaffinity 242
+#define SYS_set_thread_area 243
+#define SYS_exit_group 252
+#define SYS_timer_create 259
+#define SYS_timer_settime 260
+#define SYS_timer_delete 263
+#define SYS_clock_gettime 265
+#define SYS_tgkill 270
+#define SYS_pipe2 331
+
+TEXT runtime·exit(SB),NOSPLIT,$0
+ MOVL $SYS_exit_group, AX
+ MOVL code+0(FP), BX
+ INVOKE_SYSCALL
+ INT $3 // not reached
+ RET
+
+TEXT exit1<>(SB),NOSPLIT,$0
+ MOVL $SYS_exit, AX
+ MOVL code+0(FP), BX
+ INVOKE_SYSCALL
+ INT $3 // not reached
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $1, AX // exit (just this thread)
+ MOVL $0, BX // exit code
+ INT $0x80 // no stack; must not use CALL
+ // We may not even have a stack any more.
+ INT $3
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVL $SYS_open, AX
+ MOVL name+0(FP), BX
+ MOVL mode+4(FP), CX
+ MOVL perm+8(FP), DX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVL $SYS_close, AX
+ MOVL fd+0(FP), BX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0
+ MOVL $SYS_write, AX
+ MOVL fd+0(FP), BX
+ MOVL p+4(FP), CX
+ MOVL n+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0
+ MOVL $SYS_read, AX
+ MOVL fd+0(FP), BX
+ MOVL p+4(FP), CX
+ MOVL n+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVL $SYS_pipe2, AX
+ LEAL r+4(FP), BX
+ MOVL flags+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$8
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 0(SP)
+ MOVL $1000, AX // usec to nsec
+ MULL DX
+ MOVL AX, 4(SP)
+
+ // nanosleep(&ts, 0)
+ MOVL $SYS_nanosleep, AX
+ LEAL 0(SP), BX
+ MOVL $0, CX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, BX // arg 1 pid
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+ MOVL AX, CX // arg 2 tid
+ MOVL sig+0(FP), DX // arg 3 signal
+ MOVL $SYS_tgkill, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, BX // arg 1 pid
+ MOVL sig+0(FP), CX // arg 2 signal
+ MOVL $SYS_kill, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVL $SYS_getpid, AX
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0
+ MOVL $SYS_tgkill, AX
+ MOVL tgid+0(FP), BX
+ MOVL tid+4(FP), CX
+ MOVL sig+8(FP), DX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-12
+ MOVL $SYS_setittimer, AX
+ MOVL mode+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-16
+ MOVL $SYS_timer_create, AX
+ MOVL clockid+0(FP), BX
+ MOVL sevp+4(FP), CX
+ MOVL timerid+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-20
+ MOVL $SYS_timer_settime, AX
+ MOVL timerid+0(FP), BX
+ MOVL flags+4(FP), CX
+ MOVL new+8(FP), DX
+ MOVL old+12(FP), SI
+ INVOKE_SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-8
+ MOVL $SYS_timer_delete, AX
+ MOVL timerid+0(FP), BX
+ INVOKE_SYSCALL
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-16
+ MOVL $SYS_mincore, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL dst+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $8-12
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+
+ MOVL SP, BP // Save old SP; BP unchanged by C code.
+
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), SI // SI unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVL m_vdsoPC(SI), CX
+ MOVL m_vdsoSP(SI), DX
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+
+ LEAL sec+0(FP), DX
+ MOVL -4(DX), CX
+ MOVL CX, m_vdsoPC(SI)
+ MOVL DX, m_vdsoSP(SI)
+
+ CMPL AX, m_curg(SI) // Only switch if on curg.
+ JNE noswitch
+
+ MOVL m_g0(SI), DX
+ MOVL (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBL $16, SP // Space for results
+ ANDL $~15, SP // Align for C code
+
+ // Stack layout, depending on call path:
+ // x(SP) vDSO INVOKE_SYSCALL
+ // 12 ts.tv_nsec ts.tv_nsec
+ // 8 ts.tv_sec ts.tv_sec
+ // 4 &ts -
+ // 0 CLOCK_<id> -
+
+ MOVL runtime·vdsoClockgettimeSym(SB), AX
+ CMPL AX, $0
+ JEQ fallback
+
+ LEAL 8(SP), BX // &ts (struct timespec)
+ MOVL BX, 4(SP)
+ MOVL $0, 0(SP) // CLOCK_REALTIME
+ CALL AX
+ JMP finish
+
+fallback:
+ MOVL $SYS_clock_gettime, AX
+ MOVL $0, BX // CLOCK_REALTIME
+ LEAL 8(SP), CX
+ INVOKE_SYSCALL
+
+finish:
+ MOVL 8(SP), AX // sec
+ MOVL 12(SP), BX // nsec
+
+ MOVL BP, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVL 4(SP), CX
+ MOVL CX, m_vdsoSP(SI)
+ MOVL 0(SP), CX
+ MOVL CX, m_vdsoPC(SI)
+
+ // sec is in AX, nsec in BX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL BX, nsec+8(FP)
+ RET
+
+// int64 nanotime(void) so really
+// void nanotime(int64 *nsec)
+TEXT runtime·nanotime1(SB), NOSPLIT, $8-8
+ // Switch to g0 stack. See comment above in runtime·walltime.
+
+ MOVL SP, BP // Save old SP; BP unchanged by C code.
+
+ get_tls(CX)
+ MOVL g(CX), AX
+ MOVL g_m(AX), SI // SI unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVL m_vdsoPC(SI), CX
+ MOVL m_vdsoSP(SI), DX
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+
+ LEAL ret+0(FP), DX
+ MOVL -4(DX), CX
+ MOVL CX, m_vdsoPC(SI)
+ MOVL DX, m_vdsoSP(SI)
+
+ CMPL AX, m_curg(SI) // Only switch if on curg.
+ JNE noswitch
+
+ MOVL m_g0(SI), DX
+ MOVL (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBL $16, SP // Space for results
+ ANDL $~15, SP // Align for C code
+
+ MOVL runtime·vdsoClockgettimeSym(SB), AX
+ CMPL AX, $0
+ JEQ fallback
+
+ LEAL 8(SP), BX // &ts (struct timespec)
+ MOVL BX, 4(SP)
+ MOVL $1, 0(SP) // CLOCK_MONOTONIC
+ CALL AX
+ JMP finish
+
+fallback:
+ MOVL $SYS_clock_gettime, AX
+ MOVL $1, BX // CLOCK_MONOTONIC
+ LEAL 8(SP), CX
+ INVOKE_SYSCALL
+
+finish:
+ MOVL 8(SP), AX // sec
+ MOVL 12(SP), BX // nsec
+
+ MOVL BP, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVL 4(SP), CX
+ MOVL CX, m_vdsoSP(SI)
+ MOVL 0(SP), CX
+ MOVL CX, m_vdsoPC(SI)
+
+ // sec is in AX, nsec in BX
+ // convert to DX:AX nsec
+ MOVL $1000000000, CX
+ MULL CX
+ ADDL BX, AX
+ ADCL $0, DX
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigprocmask, AX
+ MOVL how+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ MOVL size+12(FP), SI
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigaction, AX
+ MOVL sig+0(FP), BX
+ MOVL new+4(FP), CX
+ MOVL old+8(FP), DX
+ MOVL size+12(FP), SI
+ INVOKE_SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $-15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$28
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL (28+4)(SP), BX
+ MOVL BX, 0(SP)
+ MOVL (28+8)(SP), BX
+ MOVL BX, 4(SP)
+ MOVL (28+12)(SP), BX
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+// For cgo unwinding to work, this function must look precisely like
+// the one in glibc. The glibc source code is:
+// https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/i386/libc_sigaction.c;h=0665b41bbcd0986f0b33bf19a7ecbcedf9961d0a#l59
+// The code that cares about the precise instructions used is:
+// https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/i386/linux-unwind.h;h=5486223d60272c73d5103b29ae592d2ee998e1cf#l136
+//
+// For gdb unwinding to work, this function must look precisely like the one in
+// glibc and must be named "__restore_rt" or contain the string "sigaction" in
+// the name. The gdb source code is:
+// https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/i386-linux-tdep.c;h=a6adeca1b97416f7194341151a8ce30723a786a3#l168
+TEXT runtime·sigreturn__sigaction(SB),NOSPLIT,$0
+ MOVL $SYS_rt_sigreturn, AX
+ // Sigreturn expects same SP as signal handler,
+ // so cannot CALL 0x10(GS) here.
+ INT $0x80
+ INT $3 // not reached
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVL $SYS_mmap2, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL prot+8(FP), DX
+ MOVL flags+12(FP), SI
+ MOVL fd+16(FP), DI
+ MOVL off+20(FP), BP
+ SHRL $12, BP
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS ok
+ NOTL AX
+ INCL AX
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVL $SYS_munmap, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVL $SYS_madvise, AX
+ MOVL addr+0(FP), BX
+ MOVL n+4(FP), CX
+ MOVL flags+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int32 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVL $SYS_futex, AX
+ MOVL addr+0(FP), BX
+ MOVL op+4(FP), CX
+ MOVL val+8(FP), DX
+ MOVL ts+12(FP), SI
+ MOVL addr2+16(FP), DI
+ MOVL val3+20(FP), BP
+ INVOKE_SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int32 clone(int32 flags, void *stack, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT,$0
+ MOVL $SYS_clone, AX
+ MOVL flags+0(FP), BX
+ MOVL stk+4(FP), CX
+ MOVL $0, DX // parent tid ptr
+ MOVL $0, DI // child tid ptr
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ SUBL $16, CX
+ MOVL mp+8(FP), SI
+ MOVL SI, 0(CX)
+ MOVL gp+12(FP), SI
+ MOVL SI, 4(CX)
+ MOVL fn+16(FP), SI
+ MOVL SI, 8(CX)
+ MOVL $1234, 12(CX)
+
+ // cannot use CALL 0x10(GS) here, because the stack changes during the
+ // system call (after CALL 0x10(GS), the child is still using the
+ // parent's stack when executing its RET instruction).
+ INT $0x80
+
+ // In parent, return.
+ CMPL AX, $0
+ JEQ 3(PC)
+ MOVL AX, ret+20(FP)
+ RET
+
+ // Paranoia: check that SP is as we expect.
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 12(SP), BP
+ CMPL BP, $1234
+ JEQ 2(PC)
+ INT $3
+
+ // Initialize AX to Linux tid
+ MOVL $SYS_gettid, AX
+ INVOKE_SYSCALL
+
+ MOVL 0(SP), BX // m
+ MOVL 4(SP), DX // g
+ MOVL 8(SP), SI // fn
+
+ CMPL BX, $0
+ JEQ nog
+ CMPL DX, $0
+ JEQ nog
+
+ MOVL AX, m_procid(BX) // save tid as m->procid
+
+ // set up ldt 7+id to point at m->tls.
+ LEAL m_tls(BX), BP
+ MOVL m_id(BX), DI
+ ADDL $7, DI // m0 is LDT#7. count up.
+ // setldt(tls#, &tls, sizeof tls)
+ PUSHAL // save registers
+ PUSHL $32 // sizeof tls
+ PUSHL BP // &tls
+ PUSHL DI // tls #
+ CALL runtime·setldt(SB)
+ POPL AX
+ POPL AX
+ POPL AX
+ POPAL
+
+ // Now segment is established. Initialize m, g.
+ get_tls(AX)
+ MOVL DX, g(AX)
+ MOVL BX, g_m(DX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ MOVL 0(DX), DX // paranoia; check they are not nil
+ MOVL 0(BX), BX
+
+ // more paranoia; check that stack splitting code works
+ PUSHAL
+ CALL runtime·emptyfunc(SB)
+ POPAL
+
+nog:
+ CALL SI // fn()
+ CALL exit1<>(SB)
+ MOVL $0x1234, 0x1005
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVL $SYS_sigaltstack, AX
+ MOVL new+0(FP), BX
+ MOVL old+4(FP), CX
+ INVOKE_SYSCALL
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+// <asm-i386/ldt.h>
+// struct user_desc {
+// unsigned int entry_number;
+// unsigned long base_addr;
+// unsigned int limit;
+// unsigned int seg_32bit:1;
+// unsigned int contents:2;
+// unsigned int read_exec_only:1;
+// unsigned int limit_in_pages:1;
+// unsigned int seg_not_present:1;
+// unsigned int useable:1;
+// };
+#define SEG_32BIT 0x01
+// contents are the 2 bits 0x02 and 0x04.
+#define CONTENTS_DATA 0x00
+#define CONTENTS_STACK 0x02
+#define CONTENTS_CODE 0x04
+#define READ_EXEC_ONLY 0x08
+#define LIMIT_IN_PAGES 0x10
+#define SEG_NOT_PRESENT 0x20
+#define USEABLE 0x40
+
+// `-1` means the kernel will pick a TLS entry on the first setldt call,
+// which happens during runtime init; we then store the allocated entry
+// number back here and reuse it on subsequent calls when creating new threads.
+DATA runtime·tls_entry_number+0(SB)/4, $-1
+GLOBL runtime·tls_entry_number(SB), NOPTR, $4
+
+// setldt(int entry, int address, int limit)
+// We use set_thread_area, which mucks with the GDT, instead of modify_ldt,
+// which would modify the LDT, but is disabled on some kernels.
+// The name setldt is a misnomer, but we keep it for compatibility with the
+// other platforms.
+TEXT runtime·setldt(SB),NOSPLIT,$32
+ MOVL base+4(FP), DX
+
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBL runtime·tls_g(SB), DX
+ MOVL DX, 0(DX)
+#else
+ /*
+ * When linking against the system libraries,
+ * we use its pthread_create and let it set up %gs
+ * for us. When we do that, the private storage
+ * we get is not at 0(GS), but -4(GS).
+ * To insulate the rest of the tool chain from this
+ * ugliness, 8l rewrites 0(TLS) into -4(GS) for us.
+ * To accommodate that rewrite, we translate
+ * the address here and bump the limit to 0xffffffff (no limit)
+ * so that -4(GS) maps to 0(address).
+ * Also, the final 0(GS) (current 4(DX)) has to point
+ * to itself, to mimic ELF.
+ */
+ ADDL $0x4, DX // address
+ MOVL DX, 0(DX)
+#endif
+
+ // get entry number
+ MOVL runtime·tls_entry_number(SB), CX
+
+ // set up user_desc
+ LEAL 16(SP), AX // struct user_desc
+ MOVL CX, 0(AX) // unsigned int entry_number
+ MOVL DX, 4(AX) // unsigned long base_addr
+ MOVL $0xfffff, 8(AX) // unsigned int limit
+ MOVL $(SEG_32BIT|LIMIT_IN_PAGES|USEABLE|CONTENTS_DATA), 12(AX) // flag bits
+
+ // call set_thread_area
+ MOVL AX, BX // user_desc
+ MOVL $SYS_set_thread_area, AX
+ // We can't call this via 0x10(GS) because this is called from setldt0 to set that up.
+ INT $0x80
+
+ // breakpoint on error
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+
+ // read allocated entry number back out of user_desc
+ LEAL 16(SP), AX // get our user_desc back
+ MOVL 0(AX), AX
+
+ // store entry number if the kernel allocated it
+ CMPL CX, $-1
+ JNE 2(PC)
+ MOVL AX, runtime·tls_entry_number(SB)
+
+ // compute segment selector - (entry*8+3)
+ SHLL $3, AX
+ ADDL $3, AX
+ MOVW AX, GS
+
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ INVOKE_SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVL $SYS_sched_getaffinity, AX
+ MOVL pid+0(FP), BX
+ MOVL len+4(FP), CX
+ MOVL buf+8(FP), DX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0
+ MOVL $SYS_access, AX
+ MOVL name+0(FP), BX
+ MOVL mode+4(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t addrlen)
+TEXT runtime·connect(SB),NOSPLIT,$0-16
+ // connect is implemented as socketcall(NR_socket, 3, *(rest of args))
+ // stack already should have fd, addr, addrlen.
+ MOVL $SYS_socketcall, AX
+ MOVL $3, BX // connect
+ LEAL fd+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// int socket(int domain, int type, int protocol)
+TEXT runtime·socket(SB),NOSPLIT,$0-16
+ // socket is implemented as socketcall(NR_socket, 1, *(rest of args))
+ // stack already should have domain, type, protocol.
+ MOVL $SYS_socketcall, AX
+ MOVL $1, BX // socket
+ LEAL domain+0(FP), CX
+ INVOKE_SYSCALL
+ MOVL AX, ret+12(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVL $SYS_brk, AX
+ MOVL $0, BX // NULL
+ INVOKE_SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_amd64.s b/src/runtime/sys_linux_amd64.s
new file mode 100644
index 0000000..b6c64dc
--- /dev/null
+++ b/src/runtime/sys_linux_amd64.s
@@ -0,0 +1,706 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for AMD64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+#define AT_FDCWD -100
+
+#define SYS_read 0
+#define SYS_write 1
+#define SYS_close 3
+#define SYS_mmap 9
+#define SYS_munmap 11
+#define SYS_brk 12
+#define SYS_rt_sigaction 13
+#define SYS_rt_sigprocmask 14
+#define SYS_rt_sigreturn 15
+#define SYS_sched_yield 24
+#define SYS_mincore 27
+#define SYS_madvise 28
+#define SYS_nanosleep 35
+#define SYS_setittimer 38
+#define SYS_getpid 39
+#define SYS_socket 41
+#define SYS_connect 42
+#define SYS_clone 56
+#define SYS_exit 60
+#define SYS_kill 62
+#define SYS_sigaltstack 131
+#define SYS_arch_prctl 158
+#define SYS_gettid 186
+#define SYS_futex 202
+#define SYS_sched_getaffinity 204
+#define SYS_timer_create 222
+#define SYS_timer_settime 223
+#define SYS_timer_delete 226
+#define SYS_clock_gettime 228
+#define SYS_exit_group 231
+#define SYS_tgkill 234
+#define SYS_openat 257
+#define SYS_faccessat 269
+#define SYS_pipe2 293
+
+TEXT runtime·exit(SB),NOSPLIT,$0-4
+ MOVL code+0(FP), DI
+ MOVL $SYS_exit_group, AX
+ SYSCALL
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $0, DI // exit code
+ MOVL $SYS_exit, AX
+ SYSCALL
+ // We may not even have a stack any more.
+ INT $3
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0-20
+ // This uses openat instead of open, because Android O blocks open.
+ MOVL $AT_FDCWD, DI // AT_FDCWD, so this acts like open
+ MOVQ name+0(FP), SI
+ MOVL mode+8(FP), DX
+ MOVL perm+12(FP), R10
+ MOVL $SYS_openat, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0-12
+ MOVL fd+0(FP), DI
+ MOVL $SYS_close, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0-28
+ MOVQ fd+0(FP), DI
+ MOVQ p+8(FP), SI
+ MOVL n+16(FP), DX
+ MOVL $SYS_write, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0-28
+ MOVL fd+0(FP), DI
+ MOVQ p+8(FP), SI
+ MOVL n+16(FP), DX
+ MOVL $SYS_read, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+ MOVL $SYS_pipe2, AX
+ SYSCALL
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP)
+ MOVL $1000, AX // usec to nsec
+ MULL DX
+ MOVQ AX, 8(SP)
+
+ // nanosleep(&ts, 0)
+ MOVQ SP, DI
+ MOVL $0, SI
+ MOVL $SYS_nanosleep, AX
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$0
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVL AX, R12
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVL AX, SI // arg 2 tid
+ MOVL R12, DI // arg 1 pid
+ MOVL sig+0(FP), DX // arg 3
+ MOVL $SYS_tgkill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVL AX, DI // arg 1 pid
+ MOVL sig+0(FP), SI // arg 2
+ MOVL $SYS_kill, AX
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-8
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0
+ MOVQ tgid+0(FP), DI
+ MOVQ tid+8(FP), SI
+ MOVQ sig+16(FP), DX
+ MOVL $SYS_tgkill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-24
+ MOVL mode+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL $SYS_setittimer, AX
+ SYSCALL
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVL clockid+0(FP), DI
+ MOVQ sevp+8(FP), SI
+ MOVQ timerid+16(FP), DX
+ MOVL $SYS_timer_create, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVL timerid+0(FP), DI
+ MOVL flags+4(FP), SI
+ MOVQ new+8(FP), DX
+ MOVQ old+16(FP), R10
+ MOVL $SYS_timer_settime, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVL timerid+0(FP), DI
+ MOVL $SYS_timer_delete, AX
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-28
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ dst+16(FP), DX
+ MOVL $SYS_mincore, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+ // In particular, a kernel configured with CONFIG_OPTIMIZE_INLINING=n
+ // and hardening can use a full page of stack space in gettime_sym
+ // due to stack probes inserted to avoid stack/heap collisions.
+ // See issue #20427.
+
+ MOVQ SP, R12 // Save old SP; R12 unchanged by C code.
+
+ MOVQ g_m(R14), BX // BX unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVQ m_vdsoPC(BX), CX
+ MOVQ m_vdsoSP(BX), DX
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+
+ LEAQ ret+0(FP), DX
+ MOVQ -8(DX), CX
+ MOVQ CX, m_vdsoPC(BX)
+ MOVQ DX, m_vdsoSP(BX)
+
+ CMPQ R14, m_curg(BX) // Only switch if on curg.
+ JNE noswitch
+
+ MOVQ m_g0(BX), DX
+ MOVQ (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBQ $16, SP // Space for results
+ ANDQ $~15, SP // Align for C code
+
+ MOVL $1, DI // CLOCK_MONOTONIC
+ LEAQ 0(SP), SI
+ MOVQ runtime·vdsoClockgettimeSym(SB), AX
+ CMPQ AX, $0
+ JEQ fallback
+ CALL AX
+ret:
+ MOVQ 0(SP), AX // sec
+ MOVQ 8(SP), DX // nsec
+ MOVQ R12, SP // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVQ 8(SP), CX
+ MOVQ CX, m_vdsoSP(BX)
+ MOVQ 0(SP), CX
+ MOVQ CX, m_vdsoPC(BX)
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+fallback:
+ MOVQ $SYS_clock_gettime, AX
+ SYSCALL
+ JMP ret
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0-28
+ MOVL how+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVL size+24(FP), R10
+ MOVL $SYS_rt_sigprocmask, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0-36
+ MOVQ sig+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVQ size+24(FP), R10
+ MOVL $SYS_rt_sigaction, AX
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+// Call the function stored in _cgo_sigaction using the GCC calling convention.
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$16
+ MOVQ sig+0(FP), DI
+ MOVQ new+8(FP), SI
+ MOVQ old+16(FP), DX
+ MOVQ _cgo_sigaction(SB), AX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ CALL AX
+ MOVQ BX, SP
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigprofNonGo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Used instead of sigtramp in programs that use cgo.
+// Arguments from kernel are in DI, SI, DX.
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ // If no traceback function, do usual sigtramp.
+ MOVQ runtime·cgoTraceback(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVQ _cgo_callers(SB), AX
+ TESTQ AX, AX
+ JZ sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ get_tls(CX)
+ MOVQ g(CX),AX
+ TESTQ AX, AX
+ JZ sigtrampnog // g == nil
+ MOVQ g_m(AX), AX
+ TESTQ AX, AX
+ JZ sigtramp // g.m == nil
+ MOVL m_ncgo(AX), CX
+ TESTL CX, CX
+ JZ sigtramp // g.m.ncgo == 0
+ MOVQ m_curg(AX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg == nil
+ MOVQ g_syscallsp(CX), CX
+ TESTQ CX, CX
+ JZ sigtramp // g.m.curg.syscallsp == 0
+ MOVQ m_cgoCallers(AX), R8
+ TESTQ R8, R8
+ JZ sigtramp // g.m.cgoCallers == nil
+ MOVL m_cgoCallersUse(AX), CX
+ TESTL CX, CX
+ JNZ sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigtramp(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+sigtramp:
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPL DI, $27 // 27 == SIGPROF
+ JNZ sigtramp
+
+ // Lock sigprofCallersUse.
+ MOVL $0, AX
+ MOVL $1, CX
+ MOVQ $runtime·sigprofCallersUse(SB), R11
+ LOCK
+ CMPXCHGL CX, 0(R11)
+ JNZ sigtramp // Skip stack trace if already locked.
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, via sigprofNonGoWrapper, to convert
+ // the arguments to the Go calling convention.
+ // First three arguments to traceback function are in registers already.
+ MOVQ runtime·cgoTraceback(SB), CX
+ MOVQ $runtime·sigprofCallers(SB), R8
+ MOVQ $runtime·sigprofNonGoWrapper<>(SB), R9
+ MOVQ _cgo_callers(SB), AX
+ JMP AX
+
+// For cgo unwinding to work, this function must look precisely like
+// the one in glibc. The glibc source code is:
+// https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c;h=afdce87381228f0cf32fa9fa6c8c4efa5179065c#l80
+// The code that cares about the precise instructions used is:
+// https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/i386/linux-unwind.h;h=5486223d60272c73d5103b29ae592d2ee998e1cf#l49
+//
+// For gdb unwinding to work, this function must look precisely like the one in
+// glibc and must be named "__restore_rt" or contain the string "sigaction" in
+// the name. The gdb source code is:
+// https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/amd64-linux-tdep.c;h=cbbac1a0c64e1deb8181b9d0ff6404e328e2979d#l178
+TEXT runtime·sigreturn__sigaction(SB),NOSPLIT,$0
+ MOVQ $SYS_rt_sigreturn, AX
+ SYSCALL
+ INT $3 // not reached
+
+TEXT runtime·sysMmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL prot+16(FP), DX
+ MOVL flags+20(FP), R10
+ MOVL fd+24(FP), R8
+ MOVL off+28(FP), R9
+
+ MOVL $SYS_mmap, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS ok
+ NOTQ AX
+ INCQ AX
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
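The CMPQ/NOTQ/INCQ sequence above splits the raw SYS_mmap return into a (pointer, errno) pair: Linux reports failure by returning -errno, so values in roughly [-4095, -1] are errors and anything else is the mapped address. A minimal Go sketch of that decoding:

	package sketch

	// decodeMmapResult splits the raw mmap return value the way sysMmap does.
	func decodeMmapResult(raw int64) (p uintptr, errno int64) {
		if raw < 0 && raw >= -4095 {
			return 0, -raw // NOTQ AX; INCQ AX negates AX to recover errno
		}
		return uintptr(raw), 0
	}
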
+// Call the function stored in _cgo_mmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMmap(SB),NOSPLIT,$16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL prot+16(FP), DX
+ MOVL flags+20(FP), CX
+ MOVL fd+24(FP), R8
+ MOVL off+28(FP), R9
+ MOVQ _cgo_mmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ MOVQ AX, ret+32(FP)
+ RET
+
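The ANDQ $~15, SP above rounds the stack pointer down to a 16-byte boundary, as the amd64 psABI requires at the point of a CALL into C code; the old SP is stashed at 0(SP) and restored afterwards. The same rounding, as a sketch:

	package sketch

	// alignDown16 rounds a stack pointer down to a 16-byte boundary.
	func alignDown16(sp uintptr) uintptr {
		return sp &^ 15 // clear the low four bits
	}
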
+TEXT runtime·sysMunmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ $SYS_munmap, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// Call the function stored in _cgo_munmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMunmap(SB),NOSPLIT,$16-16
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVQ _cgo_munmap(SB), AX
+ MOVQ SP, BX
+ ANDQ $~15, SP // alignment as per amd64 psABI
+ MOVQ BX, 0(SP)
+ CALL AX
+ MOVQ 0(SP), SP
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVQ n+8(FP), SI
+ MOVL flags+16(FP), DX
+ MOVQ $SYS_madvise, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI
+ MOVL op+8(FP), SI
+ MOVL val+12(FP), DX
+ MOVQ ts+16(FP), R10
+ MOVQ addr2+24(FP), R8
+ MOVL val3+32(FP), R9
+ MOVL $SYS_futex, AX
+ SYSCALL
+ MOVL AX, ret+40(FP)
+ RET
+
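The wrapper above only issues the raw futex system call. As a hedged illustration of how futex-based sleep and wakeup drive it: the constants are the standard FUTEX_WAIT/FUTEX_WAKE private values from <linux/futex.h>, and the helper names mirror, but are not, the runtime's own.

	package sketch

	import "unsafe"

	const (
		futexWaitPrivate = 0 | 128 // FUTEX_WAIT | FUTEX_PRIVATE_FLAG
		futexWakePrivate = 1 | 128 // FUTEX_WAKE | FUTEX_PRIVATE_FLAG
	)

	// futex stands in for the assembly wrapper above; this stub exists only
	// so the sketch compiles.
	func futex(addr unsafe.Pointer, op, val int32, ts, addr2 unsafe.Pointer, val3 int32) int32 {
		return 0
	}

	// futexsleep blocks while *addr still holds val (or until a wakeup or
	// signal); futexwakeup wakes at most cnt waiters.
	func futexsleep(addr *uint32, val uint32) {
		futex(unsafe.Pointer(addr), futexWaitPrivate, int32(val), nil, nil, 0)
	}

	func futexwakeup(addr *uint32, cnt uint32) {
		futex(unsafe.Pointer(addr), futexWakePrivate, int32(cnt), nil, nil, 0)
	}
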
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVL flags+0(FP), DI
+ MOVQ stk+8(FP), SI
+ MOVQ $0, DX
+ MOVQ $0, R10
+ MOVQ $0, R8
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers CX and R11.
+ MOVQ mp+16(FP), R13
+ MOVQ gp+24(FP), R9
+ MOVQ fn+32(FP), R12
+ CMPQ R13, $0 // m
+ JEQ nog1
+ CMPQ R9, $0 // g
+ JEQ nog1
+ LEAQ m_tls(R13), R8
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBQ runtime·tls_g(SB), R8
+#else
+ ADDQ $8, R8 // ELF wants to use -8(FS)
+#endif
+	ORQ	$0x00080000, DI	// add the CLONE_SETTLS flag (0x00080000) to the clone call
+nog1:
+ MOVL $SYS_clone, AX
+ SYSCALL
+
+ // In parent, return.
+ CMPQ AX, $0
+ JEQ 3(PC)
+ MOVL AX, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ MOVQ SI, SP
+
+ // If g or m are nil, skip Go-related setup.
+ CMPQ R13, $0 // m
+ JEQ nog2
+ CMPQ R9, $0 // g
+ JEQ nog2
+
+ // Initialize m->procid to Linux tid
+ MOVL $SYS_gettid, AX
+ SYSCALL
+ MOVQ AX, m_procid(R13)
+
+ // In child, set up new stack
+ get_tls(CX)
+ MOVQ R13, g_m(R9)
+ MOVQ R9, g(CX)
+ MOVQ R9, R14 // set g register
+ CALL runtime·stackcheck(SB)
+
+nog2:
+ // Call fn. This is the PC of an ABI0 function.
+ CALL R12
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVL $111, DI
+ MOVL $SYS_exit, AX
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
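The clone wrapper above returns to Go only in the parent: a positive value is the child's tid and a negative value is -errno, while the child is switched onto the new stack and calls fn directly. A hedged sketch of how a caller might read that value:

	package sketch

	import "fmt"

	// interpretCloneResult reads the parent-side return of the clone wrapper.
	func interpretCloneResult(ret int32) (tid int32, err error) {
		switch {
		case ret > 0:
			return ret, nil // child tid
		case ret < 0:
			return 0, fmt.Errorf("clone: errno %d", -ret)
		default:
			return 0, fmt.Errorf("clone: unexpected zero return in parent")
		}
	}
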
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVQ new+0(FP), DI
+ MOVQ old+8(FP), SI
+ MOVQ $SYS_sigaltstack, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$32
+#ifdef GOOS_android
+ // Android stores the TLS offset in runtime·tls_g.
+ SUBQ runtime·tls_g(SB), DI
+#else
+ ADDQ $8, DI // ELF wants to use -8(FS)
+#endif
+ MOVQ DI, SI
+ MOVQ $0x1002, DI // ARCH_SET_FS
+ MOVQ $SYS_arch_prctl, AX
+ SYSCALL
+ CMPQ AX, $0xfffffffffffff001
+ JLS 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVQ pid+0(FP), DI
+ MOVQ len+8(FP), SI
+ MOVQ buf+16(FP), DX
+ MOVL $SYS_sched_getaffinity, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0
+ // This uses faccessat instead of access, because Android O blocks access.
+ MOVL $AT_FDCWD, DI // AT_FDCWD, so this acts like access
+ MOVQ name+0(FP), SI
+ MOVL mode+8(FP), DX
+ MOVL $0, R10
+ MOVL $SYS_faccessat, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t addrlen)
+TEXT runtime·connect(SB),NOSPLIT,$0-28
+ MOVL fd+0(FP), DI
+ MOVQ addr+8(FP), SI
+ MOVL len+16(FP), DX
+ MOVL $SYS_connect, AX
+ SYSCALL
+ MOVL AX, ret+24(FP)
+ RET
+
+// int socket(int domain, int type, int protocol)
+TEXT runtime·socket(SB),NOSPLIT,$0-20
+ MOVL domain+0(FP), DI
+ MOVL typ+4(FP), SI
+ MOVL prot+8(FP), DX
+ MOVL $SYS_socket, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOVQ $0, DI
+ MOVL $SYS_brk, AX
+ SYSCALL
+ MOVQ AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_arm.s b/src/runtime/sys_linux_arm.s
new file mode 100644
index 0000000..992d32a
--- /dev/null
+++ b/src/runtime/sys_linux_arm.s
@@ -0,0 +1,652 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+// for EABI, as we don't support OABI
+#define SYS_BASE 0x0
+
+#define SYS_exit (SYS_BASE + 1)
+#define SYS_read (SYS_BASE + 3)
+#define SYS_write (SYS_BASE + 4)
+#define SYS_open (SYS_BASE + 5)
+#define SYS_close (SYS_BASE + 6)
+#define SYS_getpid (SYS_BASE + 20)
+#define SYS_kill (SYS_BASE + 37)
+#define SYS_clone (SYS_BASE + 120)
+#define SYS_rt_sigreturn (SYS_BASE + 173)
+#define SYS_rt_sigaction (SYS_BASE + 174)
+#define SYS_rt_sigprocmask (SYS_BASE + 175)
+#define SYS_sigaltstack (SYS_BASE + 186)
+#define SYS_mmap2 (SYS_BASE + 192)
+#define SYS_futex (SYS_BASE + 240)
+#define SYS_exit_group (SYS_BASE + 248)
+#define SYS_munmap (SYS_BASE + 91)
+#define SYS_madvise (SYS_BASE + 220)
+#define SYS_setitimer (SYS_BASE + 104)
+#define SYS_mincore (SYS_BASE + 219)
+#define SYS_gettid (SYS_BASE + 224)
+#define SYS_tgkill (SYS_BASE + 268)
+#define SYS_sched_yield (SYS_BASE + 158)
+#define SYS_nanosleep (SYS_BASE + 162)
+#define SYS_sched_getaffinity (SYS_BASE + 242)
+#define SYS_clock_gettime (SYS_BASE + 263)
+#define SYS_timer_create (SYS_BASE + 257)
+#define SYS_timer_settime (SYS_BASE + 258)
+#define SYS_timer_delete (SYS_BASE + 261)
+#define SYS_pipe2 (SYS_BASE + 359)
+#define SYS_access (SYS_BASE + 33)
+#define SYS_connect (SYS_BASE + 283)
+#define SYS_socket (SYS_BASE + 281)
+#define SYS_brk (SYS_BASE + 45)
+
+#define ARM_BASE (SYS_BASE + 0x0f0000)
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW perm+8(FP), R2
+ MOVW $SYS_open, R7
+ SWI $0
+ MOVW $0xfffff001, R1
+ CMP R1, R0
+ MOVW.HI $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW $SYS_close, R7
+ SWI $0
+ MOVW $0xfffff001, R1
+ CMP R1, R0
+ MOVW.HI $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ MOVW $SYS_write, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ MOVW $SYS_read, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R7
+ SWI $0
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0
+ MOVW $SYS_exit_group, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1002, R1
+ MOVW R0, (R1) // fail hard
+
+TEXT exit1<>(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1003, R1
+ MOVW R0, (R1) // fail hard
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ // Alas, there's no reliable way to make this write atomic
+ // without potentially using the stack. So it goes.
+ MOVW $0, R1
+ MOVW R1, (R0)
+ MOVW $0, R0 // exit code
+ MOVW $SYS_exit, R7
+ SWI $0
+ MOVW $1234, R0
+ MOVW $1004, R1
+ MOVW R0, (R1) // fail hard
+ JMP 0(PC)
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R7
+ SWI $0
+ MOVW R0, R4
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW R0, R1 // arg 2 tid
+ MOVW R4, R0 // arg 1 pid
+ MOVW sig+0(FP), R2 // arg 3
+ MOVW $SYS_tgkill, R7
+ SWI $0
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R7
+ SWI $0
+ // arg 1 tid already in R0 from getpid
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ MOVW $SYS_kill, R7
+ SWI $0
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-12
+ MOVW tgid+0(FP), R0
+ MOVW tid+4(FP), R1
+ MOVW sig+8(FP), R2
+ MOVW $SYS_tgkill, R7
+ SWI $0
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVW flags+12(FP), R3
+ MOVW fd+16(FP), R4
+ MOVW off+20(FP), R5
+ MOVW $SYS_mmap2, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW $0, R1
+ RSB.HI $0, R0
+ MOVW.HI R0, R1 // if error, put in R1
+ MOVW.HI $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW $SYS_munmap, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW.HI $0, R8 // crash on syscall failure
+ MOVW.HI R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW flags+8(FP), R2
+ MOVW $SYS_madvise, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0
+ MOVW mode+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW $SYS_setitimer, R7
+ SWI $0
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-16
+ MOVW clockid+0(FP), R0
+ MOVW sevp+4(FP), R1
+ MOVW timerid+8(FP), R2
+ MOVW $SYS_timer_create, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-20
+ MOVW timerid+0(FP), R0
+ MOVW flags+4(FP), R1
+ MOVW new+8(FP), R2
+ MOVW old+12(FP), R3
+ MOVW $SYS_timer_settime, R7
+ SWI $0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-8
+ MOVW timerid+0(FP), R0
+ MOVW $SYS_timer_delete, R7
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW n+4(FP), R1
+ MOVW dst+8(FP), R2
+ MOVW $SYS_mincore, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// Call a VDSO function.
+//
+// R0-R3: arguments to VDSO function (C calling convention)
+// R4: uintptr function to call
+//
+// There is no return value.
+TEXT runtime·vdsoCall(SB),NOSPLIT,$8-0
+ // R0-R3 may be arguments to fn, do not touch.
+ // R4 is function to call.
+ // R5-R9 are available as locals. They are unchanged by the C call
+ // (callee-save).
+
+ // We don't know how much stack space the VDSO code will need,
+ // so switch to g0.
+
+ // Save old SP. Use R13 instead of SP to avoid linker rewriting the offsets.
+ MOVW R13, R5
+
+ MOVW g_m(g), R6
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVW m_vdsoPC(R6), R7
+ MOVW m_vdsoSP(R6), R8
+ MOVW R7, 4(R13)
+ MOVW R8, 8(R13)
+
+ MOVW $sp-4(FP), R7 // caller's SP
+ MOVW LR, m_vdsoPC(R6)
+ MOVW R7, m_vdsoSP(R6)
+
+ MOVW m_curg(R6), R7
+
+ CMP g, R7 // Only switch if on curg.
+ B.NE noswitch
+
+ MOVW m_g0(R6), R7
+ MOVW (g_sched+gobuf_sp)(R7), R13 // Set SP to g0 stack
+
+noswitch:
+ BIC $0x7, R13 // Align for C code
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+
+	// When using cgo, g is already saved on TLS, so don't save it here.
+ MOVB runtime·iscgo(SB), R7
+ CMP $0, R7
+ BNE nosaveg
+	// If we don't have a signal stack, we won't receive a signal, so don't
+	// bother saving g.
+ MOVW m_gsignal(R6), R7 // g.m.gsignal
+ CMP $0, R7
+ BEQ nosaveg
+ // Don't save g if we are already on the signal stack, as we won't get
+ // a nested signal.
+ CMP g, R7
+ BEQ nosaveg
+	// If the signal stack is not set up (stack.lo == 0), we won't receive a
+	// signal, so don't bother saving g.
+ MOVW (g_stack+stack_lo)(R7), R7 // g.m.gsignal.stack.lo
+ CMP $0, R7
+ BEQ nosaveg
+ MOVW g, (R7)
+
+ BL (R4)
+
+ MOVW $0, R8
+ MOVW R8, (R7) // clear g slot
+
+ JMP finish
+
+nosaveg:
+ BL (R4)
+
+finish:
+ MOVW R5, R13 // Restore real SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVW 8(R13), R7
+ MOVW R7, m_vdsoSP(R6)
+ MOVW 4(R13), R7
+ MOVW R7, m_vdsoPC(R6)
+ RET
+
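The comments in vdsoCall above describe the bookkeeping around the C call. A Go-level pseudocode sketch of just that ordering, with a hypothetical mStub type standing in for the m fields involved (the register-level stack switch and g-saving are elided):

	package sketch

	// mStub stands in for the fields of m that vdsoCall touches.
	type mStub struct {
		vdsoPC, vdsoSP uintptr
	}

	// callVDSO: save the old vdsoPC/vdsoSP so the function is reentrant,
	// publish the caller's PC/SP so a SIGPROF arriving inside the C code can
	// be unwound, run the VDSO routine, then restore.
	func callVDSO(mp *mStub, callerPC, callerSP uintptr, fn func()) {
		savedPC, savedSP := mp.vdsoPC, mp.vdsoSP
		mp.vdsoPC, mp.vdsoSP = callerPC, callerSP

		fn() // the VDSO function, called with the C ABI

		// Restore SP then PC; as the comment above notes, being signaled
		// between the two stores is harmless.
		mp.vdsoSP = savedSP
		mp.vdsoPC = savedPC
	}
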
+TEXT runtime·walltime(SB),NOSPLIT,$12-12
+ MOVW $CLOCK_REALTIME, R0
+ MOVW $spec-12(SP), R1 // timespec
+
+ MOVW runtime·vdsoClockgettimeSym(SB), R4
+ CMP $0, R4
+ B.EQ fallback
+
+ BL runtime·vdsoCall(SB)
+
+ JMP finish
+
+fallback:
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+finish:
+ MOVW sec-12(SP), R0 // sec
+ MOVW nsec-8(SP), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW $0, R1
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$12-8
+ MOVW $CLOCK_MONOTONIC, R0
+ MOVW $spec-12(SP), R1 // timespec
+
+ MOVW runtime·vdsoClockgettimeSym(SB), R4
+ CMP $0, R4
+ B.EQ fallback
+
+ BL runtime·vdsoCall(SB)
+
+ JMP finish
+
+fallback:
+ MOVW $SYS_clock_gettime, R7
+ SWI $0
+
+finish:
+ MOVW sec-12(SP), R0 // sec
+ MOVW nsec-8(SP), R2 // nsec
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ ADD.S R2, R0
+ ADC $0, R1 // Add carry bit to upper half.
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+
+ RET
+
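The MULLU/ADD.S/ADC sequence above performs 64-bit arithmetic in 32-bit halves; the value it computes is simply:

	package sketch

	// monotonicNanos combines a timespec into nanoseconds: sec*1e9 + nsec.
	func monotonicNanos(sec, nsec uint32) int64 {
		return int64(sec)*1_000_000_000 + int64(nsec)
	}
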
+// int32 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0
+ MOVW op+4(FP), R1
+ MOVW val+8(FP), R2
+ MOVW ts+12(FP), R3
+ MOVW addr2+16(FP), R4
+ MOVW val3+20(FP), R5
+ MOVW $SYS_futex, R7
+ SWI $0
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 clone(int32 flags, void *stack, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT,$0
+ MOVW flags+0(FP), R0
+ MOVW stk+4(FP), R1
+ MOVW $0, R2 // parent tid ptr
+ MOVW $0, R3 // tls_val
+ MOVW $0, R4 // child tid ptr
+ MOVW $0, R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVW $-16(R1), R1
+ MOVW mp+8(FP), R6
+ MOVW R6, 0(R1)
+ MOVW gp+12(FP), R6
+ MOVW R6, 4(R1)
+ MOVW fn+16(FP), R6
+ MOVW R6, 8(R1)
+ MOVW $1234, R6
+ MOVW R6, 12(R1)
+
+ MOVW $SYS_clone, R7
+ SWI $0
+
+ // In parent, return.
+ CMP $0, R0
+ BEQ 3(PC)
+ MOVW R0, ret+20(FP)
+ RET
+
+ // Paranoia: check that SP is as we expect. Use R13 to avoid linker 'fixup'
+ NOP R13 // tell vet SP/R13 changed - stop checking offsets
+ MOVW 12(R13), R0
+ MOVW $1234, R1
+ CMP R0, R1
+ BEQ 2(PC)
+ BL runtime·abort(SB)
+
+ MOVW 0(R13), R8 // m
+ MOVW 4(R13), R0 // g
+
+ CMP $0, R8
+ BEQ nog
+ CMP $0, R0
+ BEQ nog
+
+ MOVW R0, g
+ MOVW R8, g_m(g)
+
+ // paranoia; check they are not nil
+ MOVW 0(R8), R0
+ MOVW 0(g), R0
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+
+ // Initialize m->procid to Linux tid
+ MOVW $SYS_gettid, R7
+ SWI $0
+ MOVW g_m(g), R8
+ MOVW R0, m_procid(R8)
+
+nog:
+ // Call fn
+ MOVW 8(R13), R0
+ MOVW $16(R13), R13
+ BL (R0)
+
+ // It shouldn't return. If it does, exit that thread.
+ SUB $16, R13 // restore the stack pointer to avoid memory corruption
+ MOVW $0, R0
+ MOVW R0, 4(R13)
+ BL exit1<>(SB)
+
+ MOVW $1234, R0
+ MOVW $1005, R1
+ MOVW R0, (R1)
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVW new+0(FP), R0
+ MOVW old+4(FP), R1
+ MOVW $SYS_sigaltstack, R7
+ SWI $0
+ MOVW $0xfffff001, R6
+ CMP R6, R0
+ MOVW.HI $0, R8 // crash on syscall failure
+ MOVW.HI R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13)
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ MOVW $runtime·sigtrampgo(SB), R11
+ BL (R11)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ MOVW $runtime·sigtramp(SB), R11
+ B (R11)
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW size+12(FP), R3
+ MOVW $SYS_rt_sigprocmask, R7
+ SWI $0
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0
+ MOVW sig+0(FP), R0
+ MOVW new+4(FP), R1
+ MOVW old+8(FP), R2
+ MOVW size+12(FP), R3
+ MOVW $SYS_rt_sigaction, R7
+ SWI $0
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$12
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ MOVW R0, 4(R13)
+ MOVW $1000, R0 // usec to nsec
+ MUL R0, R1
+ MOVW R1, 8(R13)
+ MOVW $4(R13), R0
+ MOVW $0, R1
+ MOVW $SYS_nanosleep, R7
+ SWI $0
+ RET
+
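The split done by usplitR0 and the MUL by 1000 above corresponds to this arithmetic (sketch): the quotient of usec/1e6 becomes tv_sec and the remainder, scaled to nanoseconds, becomes tv_nsec for the nanosleep call.

	package sketch

	// usecToTimespec converts microseconds into (tv_sec, tv_nsec).
	func usecToTimespec(usec uint32) (sec, nsec int32) {
		return int32(usec / 1_000_000), int32(usec%1_000_000) * 1000
	}
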
+// As with cas, memory barriers are complicated on ARM, but the kernel
+// provides a user helper. ARMv5 does not support SMP and has no
+// memory barrier instruction at all. ARMv6 added SMP support and has
+// a memory barrier, but it requires writing to a coprocessor
+// register. ARMv7 introduced the DMB instruction, but it's expensive
+// even on single-core devices. The kernel helper takes care of all of
+// this for us.
+
+TEXT kernelPublicationBarrier<>(SB),NOSPLIT,$0
+ // void __kuser_memory_barrier(void);
+ MOVW $0xffff0fa0, R11
+ CALL (R11)
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT,$0
+ MOVB ·goarm(SB), R11
+ CMP $7, R11
+ BLT 2(PC)
+ JMP ·armPublicationBarrier(SB)
+ JMP kernelPublicationBarrier<>(SB) // extra layer so this function is leaf and no SP adjustment on GOARM=7
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVW $SYS_sched_yield, R7
+ SWI $0
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0
+ MOVW pid+0(FP), R0
+ MOVW len+4(FP), R1
+ MOVW buf+8(FP), R2
+ MOVW $SYS_sched_getaffinity, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// b __kuser_get_tls @ 0xffff0fe0
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0xffff0fe0, R0
+ B (R0)
+
+TEXT runtime·access(SB),NOSPLIT,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW $SYS_access, R7
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·connect(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW addr+4(FP), R1
+ MOVW len+8(FP), R2
+ MOVW $SYS_connect, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·socket(SB),NOSPLIT,$0
+ MOVW domain+0(FP), R0
+ MOVW typ+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVW $SYS_socket, R7
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVW $0, R0
+ MOVW $SYS_brk, R7
+ SWI $0
+ MOVW R0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_arm64.s b/src/runtime/sys_linux_arm64.s
new file mode 100644
index 0000000..51c87be
--- /dev/null
+++ b/src/runtime/sys_linux_arm64.s
@@ -0,0 +1,787 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+#define AT_FDCWD -100
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+#define SYS_exit 93
+#define SYS_read 63
+#define SYS_write 64
+#define SYS_openat 56
+#define SYS_close 57
+#define SYS_pipe2 59
+#define SYS_nanosleep 101
+#define SYS_mmap 222
+#define SYS_munmap 215
+#define SYS_setitimer 103
+#define SYS_clone 220
+#define SYS_sched_yield 124
+#define SYS_rt_sigreturn 139
+#define SYS_rt_sigaction 134
+#define SYS_rt_sigprocmask 135
+#define SYS_sigaltstack 132
+#define SYS_madvise 233
+#define SYS_mincore 232
+#define SYS_getpid 172
+#define SYS_gettid 178
+#define SYS_kill 129
+#define SYS_tgkill 131
+#define SYS_futex 98
+#define SYS_sched_getaffinity 123
+#define SYS_exit_group 94
+#define SYS_clock_gettime 113
+#define SYS_faccessat 48
+#define SYS_socket 198
+#define SYS_connect 203
+#define SYS_brk 214
+#define SYS_timer_create 107
+#define SYS_timer_settime 110
+#define SYS_timer_delete 111
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R0
+ MOVD $SYS_exit_group, R8
+ SVC
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ MOVW $0, R0 // exit code
+ MOVD $SYS_exit, R8
+ SVC
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $AT_FDCWD, R0
+ MOVD name+0(FP), R1
+ MOVW mode+8(FP), R2
+ MOVW perm+12(FP), R3
+ MOVD $SYS_openat, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVW $-1, R0
+done:
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R0
+ MOVD $SYS_close, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVW $-1, R0
+done:
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_write, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R0
+ MOVD p+8(FP), R1
+ MOVW n+16(FP), R2
+ MOVD $SYS_read, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R0
+ MOVW flags+0(FP), R1
+ MOVW $SYS_pipe2, R8
+ SVC
+ MOVW R0, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP)
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP)
+
+ // nanosleep(&ts, 0)
+ ADD $8, RSP, R0
+ MOVD $0, R1
+ MOVD $SYS_nanosleep, R8
+ SVC
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVD $SYS_gettid, R8
+ SVC
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW R0, R19
+ MOVD $SYS_gettid, R8
+ SVC
+ MOVW R0, R1 // arg 2 tid
+ MOVW R19, R0 // arg 1 pid
+ MOVW sig+0(FP), R2 // arg 3
+ MOVD $SYS_tgkill, R8
+ SVC
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVW R0, R0 // arg 1 pid
+ MOVW sig+0(FP), R1 // arg 2
+ MOVD $SYS_kill, R8
+ SVC
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD $SYS_getpid, R8
+ SVC
+ MOVD R0, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-24
+ MOVD tgid+0(FP), R0
+ MOVD tid+8(FP), R1
+ MOVD sig+16(FP), R2
+ MOVD $SYS_tgkill, R8
+ SVC
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD $SYS_setitimer, R8
+ SVC
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVW clockid+0(FP), R0
+ MOVD sevp+8(FP), R1
+ MOVD timerid+16(FP), R2
+ MOVD $SYS_timer_create, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVW timerid+0(FP), R0
+ MOVW flags+4(FP), R1
+ MOVD new+8(FP), R2
+ MOVD old+16(FP), R3
+ MOVD $SYS_timer_settime, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVW timerid+0(FP), R0
+ MOVD $SYS_timer_delete, R8
+ SVC
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD dst+16(FP), R2
+ MOVD $SYS_mincore, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$24-12
+ MOVD RSP, R20 // R20 is unchanged by C code
+ MOVD RSP, R1
+
+ MOVD g_m(g), R21 // R21 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R2
+ MOVD m_vdsoSP(R21), R3
+ MOVD R2, 8(RSP)
+ MOVD R3, 16(RSP)
+
+ MOVD $ret-8(FP), R2 // caller's SP
+ MOVD LR, m_vdsoPC(R21)
+ MOVD R2, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R0
+ CMP g, R0
+ BNE noswitch
+
+ MOVD m_g0(R21), R3
+ MOVD (g_sched+gobuf_sp)(R3), R1 // Set RSP to g0 stack
+
+noswitch:
+ SUB $16, R1
+ BIC $15, R1 // Align for C code
+ MOVD R1, RSP
+
+ MOVW $CLOCK_REALTIME, R0
+ MOVD runtime·vdsoClockgettimeSym(SB), R2
+ CBZ R2, fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+	// If we don't have a signal stack, we won't receive a signal,
+	// so don't bother saving g.
+	// When using cgo, g is already saved on TLS, so don't save
+	// it here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBU runtime·iscgo(SB), R22
+ CBNZ R22, nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CBZ R22, nosaveg
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (R2)
+
+ MOVD ZR, (R22) // clear g slot, R22 is unchanged by C code
+
+ B finish
+
+nosaveg:
+ BL (R2)
+ B finish
+
+fallback:
+ MOVD $SYS_clock_gettime, R8
+ SVC
+
+finish:
+ MOVD 0(RSP), R3 // sec
+ MOVD 8(RSP), R5 // nsec
+
+ MOVD R20, RSP // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 16(RSP), R1
+ MOVD R1, m_vdsoSP(R21)
+ MOVD 8(RSP), R1
+ MOVD R1, m_vdsoPC(R21)
+
+ MOVD R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$24-8
+ MOVD RSP, R20 // R20 is unchanged by C code
+ MOVD RSP, R1
+
+ MOVD g_m(g), R21 // R21 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R2
+ MOVD m_vdsoSP(R21), R3
+ MOVD R2, 8(RSP)
+ MOVD R3, 16(RSP)
+
+ MOVD $ret-8(FP), R2 // caller's SP
+ MOVD LR, m_vdsoPC(R21)
+ MOVD R2, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R0
+ CMP g, R0
+ BNE noswitch
+
+ MOVD m_g0(R21), R3
+ MOVD (g_sched+gobuf_sp)(R3), R1 // Set RSP to g0 stack
+
+noswitch:
+ SUB $32, R1
+ BIC $15, R1
+ MOVD R1, RSP
+
+ MOVW $CLOCK_MONOTONIC, R0
+ MOVD runtime·vdsoClockgettimeSym(SB), R2
+ CBZ R2, fallback
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+	// If we don't have a signal stack, we won't receive a signal,
+	// so don't bother saving g.
+	// When using cgo, g is already saved on TLS, so don't save
+	// it here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBU runtime·iscgo(SB), R22
+ CBNZ R22, nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CBZ R22, nosaveg
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (R2)
+
+ MOVD ZR, (R22) // clear g slot, R22 is unchanged by C code
+
+ B finish
+
+nosaveg:
+ BL (R2)
+ B finish
+
+fallback:
+ MOVD $SYS_clock_gettime, R8
+ SVC
+
+finish:
+ MOVD 0(RSP), R3 // sec
+ MOVD 8(RSP), R5 // nsec
+
+ MOVD R20, RSP // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 16(RSP), R1
+ MOVD R1, m_vdsoSP(R21)
+ MOVD 8(RSP), R1
+ MOVD R1, m_vdsoPC(R21)
+
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVD $1000000000, R4
+ MUL R4, R3
+ ADD R5, R3
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVW size+24(FP), R3
+ MOVD $SYS_rt_sigprocmask, R8
+ SVC
+ CMN $4095, R0
+ BCC done
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+done:
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD size+24(FP), R3
+ MOVD $SYS_rt_sigaction, R8
+ SVC
+ MOVW R0, ret+32(FP)
+ RET
+
+// Call the function stored in _cgo_sigaction using the GCC calling convention.
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$0
+ MOVD sig+0(FP), R0
+ MOVD new+8(FP), R1
+ MOVD old+16(FP), R2
+ MOVD _cgo_sigaction(SB), R3
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R3
+ ADD $16, RSP
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+// Called from c-abi, R0: sig, R1: info, R2: cxt
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$176
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 8(RSP)
+ MOVBU runtime·iscgo(SB), R0
+ CBZ R0, 2(PC)
+ BL runtime·load_g(SB)
+
+ // Restore signum to R0.
+ MOVW 8(RSP), R0
+ // R1 and R2 already contain info and ctx, respectively.
+ MOVD $runtime·sigtrampgo<ABIInternal>(SB), R3
+ BL (R3)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+
+ RET
+
+// Called from c-abi, R0: sig, R1: info, R2: cxt
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT,$176
+ // Save callee-save registers because it's a callback from c code.
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+ // R0, R1 and R2 already contain sig, info and ctx, respectively.
+ CALL runtime·sigprofNonGo<ABIInternal>(SB)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+ RET
+
+// Called from c-abi, R0: sig, R1: info, R2: cxt
+TEXT runtime·cgoSigtramp(SB),NOSPLIT|NOFRAME,$0
+	// The stack unwinder, presumably written in C, may not be able to
+	// handle Go frames correctly. So this function is NOFRAME, and we
+	// save/restore LR manually.
+ MOVD LR, R10
+	// Save R27 and g because they will be clobbered;
+	// we need to restore them before jumping to sigtramp.
+ MOVD R27, R11
+ MOVD g, R12
+
+ // If no traceback function, do usual sigtramp.
+ MOVD runtime·cgoTraceback(SB), R6
+ CBZ R6, sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVD _cgo_callers(SB), R7
+ CBZ R7, sigtramp
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ // first save R0, because runtime·load_g will clobber it.
+ MOVD R0, R8
+ // Set up g register.
+ CALL runtime·load_g(SB)
+ MOVD R8, R0
+
+ CBZ g, sigtrampnog // g == nil
+ MOVD g_m(g), R6
+ CBZ R6, sigtramp // g.m == nil
+ MOVW m_ncgo(R6), R7
+	CBZW	R7, sigtramp	// g.m.ncgo == 0
+ MOVD m_curg(R6), R8
+ CBZ R8, sigtramp // g.m.curg == nil
+ MOVD g_syscallsp(R8), R7
+ CBZ R7, sigtramp // g.m.curg.syscallsp == 0
+ MOVD m_cgoCallers(R6), R4 // R4 is the fifth arg in C calling convention.
+ CBZ R4, sigtramp // g.m.cgoCallers == nil
+ MOVW m_cgoCallersUse(R6), R8
+ CBNZW R8, sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVD runtime·cgoTraceback(SB), R3
+ MOVD $runtime·sigtramp(SB), R5
+ MOVD _cgo_callers(SB), R13
+ MOVD R10, LR // restore
+ MOVD R11, R27
+ MOVD R12, g
+ B (R13)
+
+sigtramp:
+ MOVD R10, LR // restore
+ MOVD R11, R27
+ MOVD R12, g
+ B runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPW $27, R0 // 27 == SIGPROF
+ BNE sigtramp
+
+ // Lock sigprofCallersUse (cas from 0 to 1).
+ MOVW $1, R7
+ MOVD $runtime·sigprofCallersUse(SB), R8
+load_store_loop:
+ LDAXRW (R8), R9
+ CBNZW R9, sigtramp // Skip stack trace if already locked.
+ STLXRW R7, (R8), R9
+ CBNZ R9, load_store_loop
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVD runtime·cgoTraceback(SB), R3
+ MOVD $runtime·sigprofCallers(SB), R4
+ MOVD $runtime·sigprofNonGoWrapper<>(SB), R5
+ MOVD _cgo_callers(SB), R13
+ MOVD R10, LR // restore
+ MOVD R11, R27
+ MOVD R12, g
+ B (R13)
+
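The LDAXRW/STLXRW loop above (like the LOCK CMPXCHGL on amd64) is a compare-and-swap taking sigprofCallersUse from 0 to 1; losing the race means skipping the stack trace. In portable Go the same attempt reads (sketch, reusing the variable name from the assembly):

	package sketch

	import "sync/atomic"

	var sigprofCallersUse uint32 // stand-in for the runtime variable

	// tryLockSigprofCallers reports whether we won the right to fill in the
	// shared sigprofCallers buffer.
	func tryLockSigprofCallers() bool {
		return atomic.CompareAndSwapUint32(&sigprofCallersUse, 0, 1)
	}
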
+TEXT runtime·sysMmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+
+ MOVD $SYS_mmap, R8
+ SVC
+ CMN $4095, R0
+ BCC ok
+ NEG R0,R0
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+ok:
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+
+// Call the function stored in _cgo_mmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW prot+16(FP), R2
+ MOVW flags+20(FP), R3
+ MOVW fd+24(FP), R4
+ MOVW off+28(FP), R5
+ MOVD _cgo_mmap(SB), R9
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R9
+ ADD $16, RSP
+ MOVD R0, ret+32(FP)
+ RET
+
+TEXT runtime·sysMunmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD $SYS_munmap, R8
+ SVC
+ CMN $4095, R0
+ BCC cool
+ MOVD R0, 0xf0(R0)
+cool:
+ RET
+
+// Call the function stored in _cgo_munmap using the GCC calling convention.
+// This must be called on the system stack.
+TEXT runtime·callCgoMunmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVD _cgo_munmap(SB), R9
+ SUB $16, RSP // reserve 16 bytes for sp-8 where fp may be saved.
+ BL R9
+ ADD $16, RSP
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVD n+8(FP), R1
+ MOVW flags+16(FP), R2
+ MOVD $SYS_madvise, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R0
+ MOVW op+8(FP), R1
+ MOVW val+12(FP), R2
+ MOVD ts+16(FP), R3
+ MOVD addr2+24(FP), R4
+ MOVW val3+32(FP), R5
+ MOVD $SYS_futex, R8
+ SVC
+ MOVW R0, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R0
+ MOVD stk+8(FP), R1
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVD mp+16(FP), R10
+ MOVD gp+24(FP), R11
+ MOVD fn+32(FP), R12
+
+ MOVD R10, -8(R1)
+ MOVD R11, -16(R1)
+ MOVD R12, -24(R1)
+ MOVD $1234, R10
+ MOVD R10, -32(R1)
+
+ MOVD $SYS_clone, R8
+ SVC
+
+ // In parent, return.
+ CMP ZR, R0
+ BEQ child
+ MOVW R0, ret+40(FP)
+ RET
+child:
+
+ // In child, on new stack.
+ MOVD -32(RSP), R10
+ MOVD $1234, R0
+ CMP R0, R10
+ BEQ good
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+good:
+ // Initialize m->procid to Linux tid
+ MOVD $SYS_gettid, R8
+ SVC
+
+ MOVD -24(RSP), R12 // fn
+ MOVD -16(RSP), R11 // g
+ MOVD -8(RSP), R10 // m
+
+ CMP $0, R10
+ BEQ nog
+ CMP $0, R11
+ BEQ nog
+
+ MOVD R0, m_procid(R10)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVD R10, g_m(R11)
+ MOVD R11, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ MOVD R12, R0
+ BL (R0)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R0
+again:
+ MOVD $SYS_exit, R8
+ SVC
+ B again // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R0
+ MOVD old+8(FP), R1
+ MOVD $SYS_sigaltstack, R8
+ SVC
+ CMN $4095, R0
+ BCC ok
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+ok:
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVD $SYS_sched_yield, R8
+ SVC
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R0
+ MOVD len+8(FP), R1
+ MOVD buf+16(FP), R2
+ MOVD $SYS_sched_getaffinity, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int access(const char *name, int mode)
+TEXT runtime·access(SB),NOSPLIT,$0-20
+ MOVD $AT_FDCWD, R0
+ MOVD name+0(FP), R1
+ MOVW mode+8(FP), R2
+ MOVD $SYS_faccessat, R8
+ SVC
+ MOVW R0, ret+16(FP)
+ RET
+
+// int connect(int fd, const struct sockaddr *addr, socklen_t len)
+TEXT runtime·connect(SB),NOSPLIT,$0-28
+ MOVW fd+0(FP), R0
+ MOVD addr+8(FP), R1
+ MOVW len+16(FP), R2
+ MOVD $SYS_connect, R8
+ SVC
+ MOVW R0, ret+24(FP)
+ RET
+
+// int socket(int domain, int typ, int prot)
+TEXT runtime·socket(SB),NOSPLIT,$0-20
+ MOVW domain+0(FP), R0
+ MOVW typ+4(FP), R1
+ MOVW prot+8(FP), R2
+ MOVD $SYS_socket, R8
+ SVC
+ MOVW R0, ret+16(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOVD $0, R0
+ MOVD $SYS_brk, R8
+ SVC
+ MOVD R0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_loong64.s b/src/runtime/sys_linux_loong64.s
new file mode 100644
index 0000000..12e5455
--- /dev/null
+++ b/src/runtime/sys_linux_loong64.s
@@ -0,0 +1,630 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for loong64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_loong64.h"
+
+#define AT_FDCWD -100
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+#define SYS_exit 93
+#define SYS_read 63
+#define SYS_write 64
+#define SYS_close 57
+#define SYS_getpid 172
+#define SYS_kill 129
+#define SYS_mmap 222
+#define SYS_munmap 215
+#define SYS_setitimer 103
+#define SYS_clone 220
+#define SYS_nanosleep 101
+#define SYS_sched_yield 124
+#define SYS_rt_sigreturn 139
+#define SYS_rt_sigaction 134
+#define SYS_rt_sigprocmask 135
+#define SYS_sigaltstack 132
+#define SYS_madvise 233
+#define SYS_mincore 232
+#define SYS_gettid 178
+#define SYS_futex 98
+#define SYS_sched_getaffinity 123
+#define SYS_exit_group 94
+#define SYS_tgkill 131
+#define SYS_openat 56
+#define SYS_clock_gettime 113
+#define SYS_brk 214
+#define SYS_pipe2 59
+#define SYS_timer_create 107
+#define SYS_timer_settime 110
+#define SYS_timer_delete 111
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R4
+ MOVV $SYS_exit_group, R11
+ SYSCALL
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV wait+0(FP), R19
+ // We're done using the stack.
+ MOVW $0, R11
+ DBAR
+ MOVW R11, (R19)
+ DBAR
+ MOVW $0, R4 // exit code
+ MOVV $SYS_exit, R11
+ SYSCALL
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVW $AT_FDCWD, R4 // AT_FDCWD, so this acts like open
+ MOVV name+0(FP), R5
+ MOVW mode+8(FP), R6
+ MOVW perm+12(FP), R7
+ MOVV $SYS_openat, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, 2(PC)
+ MOVW $-1, R4
+ MOVW R4, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R4
+ MOVV $SYS_close, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, 2(PC)
+ MOVW $-1, R4
+ MOVW R4, ret+8(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_write, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_read, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVV $r+8(FP), R4
+ MOVW flags+0(FP), R5
+ MOVV $SYS_pipe2, R11
+ SYSCALL
+ MOVW R4, errno+16(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVWU usec+0(FP), R7
+ MOVV $1000, R6
+ MULVU R6, R7, R7
+ MOVV $1000000000, R6
+
+ DIVVU R6, R7, R5 // ts->tv_sec
+ REMVU R6, R7, R4 // ts->tv_nsec
+ MOVV R5, 8(R3)
+ MOVV R4, 16(R3)
+
+ // nanosleep(&ts, 0)
+ ADDV $8, R3, R4
+ MOVV R0, R5
+ MOVV $SYS_nanosleep, R11
+ SYSCALL
+ RET
+
+// func gettid() uint32
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVV $SYS_gettid, R11
+ SYSCALL
+ MOVW R4, ret+0(FP)
+ RET
+
+// func raise(sig uint32)
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R11
+ SYSCALL
+ MOVW R4, R23
+ MOVV $SYS_gettid, R11
+ SYSCALL
+ MOVW R4, R5 // arg 2 tid
+ MOVW R23, R4 // arg 1 pid
+ MOVW sig+0(FP), R6 // arg 3
+ MOVV $SYS_tgkill, R11
+ SYSCALL
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R11
+ SYSCALL
+ //MOVW R4, R4 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 2
+ MOVV $SYS_kill, R11
+ SYSCALL
+ RET
+
+// func getpid() int
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV $SYS_getpid, R11
+ SYSCALL
+ MOVV R4, ret+0(FP)
+ RET
+
+// func tgkill(tgid, tid, sig int)
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVV tgid+0(FP), R4
+ MOVV tid+8(FP), R5
+ MOVV sig+16(FP), R6
+ MOVV $SYS_tgkill, R11
+ SYSCALL
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV $SYS_setitimer, R11
+ SYSCALL
+ RET
+
+// func timer_create(clockid int32, sevp *sigevent, timerid *int32) int32
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVW clockid+0(FP), R4
+ MOVV sevp+8(FP), R5
+ MOVV timerid+16(FP), R6
+ MOVV $SYS_timer_create, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func timer_settime(timerid int32, flags int32, new, old *itimerspec) int32
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVW timerid+0(FP), R4
+ MOVW flags+4(FP), R5
+ MOVV new+8(FP), R6
+ MOVV old+16(FP), R7
+ MOVV $SYS_timer_settime, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func timer_delete(timerid int32) int32
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVW timerid+0(FP), R4
+ MOVV $SYS_timer_delete, R11
+ SYSCALL
+ MOVW R4, ret+8(FP)
+ RET
+
+// func mincore(addr unsafe.Pointer, n uintptr, dst *byte) int32
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV dst+16(FP), R6
+ MOVV $SYS_mincore, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$24-12
+ MOVV R3, R23 // R23 is unchanged by C code
+ MOVV R3, R25
+
+ MOVV g_m(g), R24 // R24 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R24), R11
+ MOVV m_vdsoSP(R24), R7
+ MOVV R11, 8(R3)
+ MOVV R7, 16(R3)
+
+ MOVV $ret-8(FP), R11 // caller's SP
+ MOVV R1, m_vdsoPC(R24)
+ MOVV R11, m_vdsoSP(R24)
+
+ MOVV m_curg(R24), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R24), R4
+ MOVV (g_sched+gobuf_sp)(R4), R25 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R25
+ AND $~15, R25 // Align for C code
+ MOVV R25, R3
+
+ MOVW $CLOCK_REALTIME, R4
+ MOVV $0(R3), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R20
+ BEQ R20, fallback
+
+ // Store g on gsignal's stack, see sys_linux_arm64.s for detail
+ MOVBU runtime·iscgo(SB), R25
+ BNE R25, nosaveg
+
+ MOVV m_gsignal(R24), R25 // g.m.gsignal
+ BEQ R25, nosaveg
+ BEQ g, R25, nosaveg
+
+ MOVV (g_stack+stack_lo)(R25), R25 // g.m.gsignal.stack.lo
+ MOVV g, (R25)
+
+ JAL (R20)
+
+ MOVV R0, (R25)
+ JMP finish
+
+nosaveg:
+ JAL (R20)
+
+finish:
+ MOVV 0(R3), R7 // sec
+ MOVV 8(R3), R5 // nsec
+
+ MOVV R23, R3 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R3), R25
+ MOVV R25, m_vdsoSP(R24)
+ MOVV 8(R3), R25
+ MOVV R25, m_vdsoPC(R24)
+
+ MOVV R7, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R11
+ SYSCALL
+ JMP finish
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ MOVV R3, R23 // R23 is unchanged by C code
+ MOVV R3, R25
+
+ MOVV g_m(g), R24 // R24 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R24), R11
+ MOVV m_vdsoSP(R24), R7
+ MOVV R11, 8(R3)
+ MOVV R7, 16(R3)
+
+ MOVV $ret-8(FP), R11 // caller's SP
+ MOVV R1, m_vdsoPC(R24)
+ MOVV R11, m_vdsoSP(R24)
+
+ MOVV m_curg(R24), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R24), R4
+ MOVV (g_sched+gobuf_sp)(R4), R25 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R25
+ AND $~15, R25 // Align for C code
+ MOVV R25, R3
+
+ MOVW $CLOCK_MONOTONIC, R4
+ MOVV $0(R3), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R20
+ BEQ R20, fallback
+
+ // Store g on gsignal's stack, see sys_linux_arm64.s for detail
+ MOVBU runtime·iscgo(SB), R25
+ BNE R25, nosaveg
+
+ MOVV m_gsignal(R24), R25 // g.m.gsignal
+ BEQ R25, nosaveg
+ BEQ g, R25, nosaveg
+
+ MOVV (g_stack+stack_lo)(R25), R25 // g.m.gsignal.stack.lo
+ MOVV g, (R25)
+
+ JAL (R20)
+
+ MOVV R0, (R25)
+ JMP finish
+
+nosaveg:
+ JAL (R20)
+
+finish:
+ MOVV 0(R3), R7 // sec
+ MOVV 8(R3), R5 // nsec
+
+ MOVV R23, R3 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R3), R25
+ MOVV R25, m_vdsoSP(R24)
+ MOVV 8(R3), R25
+ MOVV R25, m_vdsoPC(R24)
+
+ // sec is in R7, nsec in R5
+ // return nsec in R7
+ MOVV $1000000000, R4
+ MULVU R4, R7, R7
+ ADDVU R5, R7
+ MOVV R7, ret+0(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R11
+ SYSCALL
+ JMP finish
+
+// func rtsigprocmask(how int32, new, old *sigset, size int32)
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVW size+24(FP), R7
+ MOVV $SYS_rt_sigprocmask, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+// func rt_sigaction(sig uintptr, new, old *sigactiont, size uintptr) int32
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVV sig+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV size+24(FP), R7
+ MOVV $SYS_rt_sigaction, R11
+ SYSCALL
+ MOVW R4, ret+32(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R4
+ MOVV info+16(FP), R5
+ MOVV ctx+24(FP), R6
+ MOVV fn+0(FP), R20
+ JAL (R20)
+ RET
+
+// func sigtramp(signo, ureg, ctxt unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$168
+ MOVW R4, (1*8)(R3)
+ MOVV R5, (2*8)(R3)
+ MOVV R6, (3*8)(R3)
+
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R22_TO_R31((4*8))
+ SAVE_F24_TO_F31((14*8))
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R4
+ BEQ R4, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVV $runtime·sigtrampgo(SB), R4
+ JAL (R4)
+
+ // Restore callee-save registers.
+ RESTORE_R22_TO_R31((4*8))
+ RESTORE_F24_TO_F31((14*8))
+
+ RET
+
+// func cgoSigtramp()
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+// func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW prot+16(FP), R6
+ MOVW flags+20(FP), R7
+ MOVW fd+24(FP), R8
+ MOVW off+28(FP), R9
+
+ MOVV $SYS_mmap, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, ok
+ MOVV $0, p+32(FP)
+ SUBVU R4, R0, R4
+ MOVV R4, err+40(FP)
+ RET
+ok:
+ MOVV R4, p+32(FP)
+ MOVV $0, err+40(FP)
+ RET
+
+// func munmap(addr unsafe.Pointer, n uintptr)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV $SYS_munmap, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, 2(PC)
+ MOVV R0, 0xf3(R0) // crash
+ RET
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32)
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW flags+16(FP), R6
+ MOVV $SYS_madvise, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func futex(addr unsafe.Pointer, op int32, val uint32, ts, addr2 unsafe.Pointer, val3 uint32) int32
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVW op+8(FP), R5
+ MOVW val+12(FP), R6
+ MOVV ts+16(FP), R7
+ MOVV addr2+24(FP), R8
+ MOVW val3+32(FP), R9
+ MOVV $SYS_futex, R11
+ SYSCALL
+ MOVW R4, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R4
+ MOVV stk+8(FP), R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVV mp+16(FP), R23
+ MOVV gp+24(FP), R24
+ MOVV fn+32(FP), R25
+
+ MOVV R23, -8(R5)
+ MOVV R24, -16(R5)
+ MOVV R25, -24(R5)
+ MOVV $1234, R23
+ MOVV R23, -32(R5)
+
+ MOVV $SYS_clone, R11
+ SYSCALL
+
+ // In parent, return.
+ BEQ R4, 3(PC)
+ MOVW R4, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ MOVV -32(R3), R23
+ MOVV $1234, R19
+ BEQ R23, R19, 2(PC)
+ MOVV R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ MOVV $SYS_gettid, R11
+ SYSCALL
+
+ MOVV -24(R3), R25 // fn
+ MOVV -16(R3), R24 // g
+ MOVV -8(R3), R23 // m
+
+ BEQ R23, nog
+ BEQ R24, nog
+
+ MOVV R4, m_procid(R23)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVV R23, g_m(R24)
+ MOVV R24, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ JAL (R25)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R4
+ MOVV $SYS_exit, R11
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVV new+0(FP), R4
+ MOVV old+8(FP), R5
+ MOVV $SYS_sigaltstack, R11
+ SYSCALL
+ MOVW $-4096, R5
+ BGEU R5, R4, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_sched_yield, R11
+ SYSCALL
+ RET
+
+// func sched_getaffinity(pid, len uintptr, buf *uintptr) int32
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVV pid+0(FP), R4
+ MOVV len+8(FP), R5
+ MOVV buf+16(FP), R6
+ MOVV $SYS_sched_getaffinity, R11
+ SYSCALL
+ MOVW R4, ret+24(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0-8
+ // Implemented as brk(NULL).
+ MOVV $0, R4
+ MOVV $SYS_brk, R11
+ SYSCALL
+ MOVV R4, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_mips64x.s b/src/runtime/sys_linux_mips64x.s
new file mode 100644
index 0000000..47f2da5
--- /dev/null
+++ b/src/runtime/sys_linux_mips64x.s
@@ -0,0 +1,588 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+//
+// System calls and other sys.stuff for mips64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define AT_FDCWD -100
+
+#define SYS_exit 5058
+#define SYS_read 5000
+#define SYS_write 5001
+#define SYS_close 5003
+#define SYS_getpid 5038
+#define SYS_kill 5060
+#define SYS_mmap 5009
+#define SYS_munmap 5011
+#define SYS_setitimer 5036
+#define SYS_clone 5055
+#define SYS_nanosleep 5034
+#define SYS_sched_yield 5023
+#define SYS_rt_sigreturn 5211
+#define SYS_rt_sigaction 5013
+#define SYS_rt_sigprocmask 5014
+#define SYS_sigaltstack 5129
+#define SYS_madvise 5027
+#define SYS_mincore 5026
+#define SYS_gettid 5178
+#define SYS_futex 5194
+#define SYS_sched_getaffinity 5196
+#define SYS_exit_group 5205
+#define SYS_timer_create 5216
+#define SYS_timer_settime 5217
+#define SYS_timer_delete 5220
+#define SYS_tgkill 5225
+#define SYS_openat 5247
+#define SYS_clock_gettime 5222
+#define SYS_brk 5012
+#define SYS_pipe2 5287
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R4
+ MOVV $SYS_exit_group, R2
+ SYSCALL
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ SYNC
+ MOVW $0, R4 // exit code
+ MOVV $SYS_exit, R2
+ SYSCALL
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ // This uses openat instead of open, because Android O blocks open.
+ MOVW $AT_FDCWD, R4 // AT_FDCWD, so this acts like open
+ MOVV name+0(FP), R5
+ MOVW mode+8(FP), R6
+ MOVW perm+12(FP), R7
+ MOVV $SYS_openat, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R4
+ MOVV $SYS_close, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_write, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R4
+ MOVV p+8(FP), R5
+ MOVW n+16(FP), R6
+ MOVV $SYS_read, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVV $r+8(FP), R4
+ MOVW flags+0(FP), R5
+ MOVV $SYS_pipe2, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVWU usec+0(FP), R3
+ MOVV R3, R5
+ MOVW $1000000, R4
+ DIVVU R4, R3
+ MOVV LO, R3
+ MOVV R3, 8(R29)
+ MOVW $1000, R4
+ MULVU R3, R4
+ MOVV LO, R4
+ SUBVU R4, R5
+ MOVV R5, 16(R29)
+
+ // nanosleep(&ts, 0)
+ ADDV $8, R29, R4
+ MOVW $0, R5
+ MOVV $SYS_nanosleep, R2
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVV $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R16
+ MOVV $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, R5 // arg 2 tid
+ MOVW R16, R4 // arg 1 pid
+ MOVW sig+0(FP), R6 // arg 3
+ MOVV $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R4 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 2
+ MOVV $SYS_kill, R2
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVV $SYS_getpid, R2
+ SYSCALL
+ MOVV R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVV tgid+0(FP), R4
+ MOVV tid+8(FP), R5
+ MOVV sig+16(FP), R6
+ MOVV $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV $SYS_setitimer, R2
+ SYSCALL
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVW clockid+0(FP), R4
+ MOVV sevp+8(FP), R5
+ MOVV timerid+16(FP), R6
+ MOVV $SYS_timer_create, R2
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVW timerid+0(FP), R4
+ MOVW flags+4(FP), R5
+ MOVV new+8(FP), R6
+ MOVV old+16(FP), R7
+ MOVV $SYS_timer_settime, R2
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVW timerid+0(FP), R4
+ MOVV $SYS_timer_delete, R2
+ SYSCALL
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV dst+16(FP), R6
+ MOVV $SYS_mincore, R2
+ SYSCALL
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$16-12
+ MOVV R29, R16 // R16 is unchanged by C code
+ MOVV R29, R1
+
+ MOVV g_m(g), R17 // R17 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R17), R2
+ MOVV m_vdsoSP(R17), R3
+ MOVV R2, 8(R29)
+ MOVV R3, 16(R29)
+
+ MOVV $ret-8(FP), R2 // caller's SP
+ MOVV R31, m_vdsoPC(R17)
+ MOVV R2, m_vdsoSP(R17)
+
+ MOVV m_curg(R17), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R17), R4
+ MOVV (g_sched+gobuf_sp)(R4), R1 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R1
+ AND $~15, R1 // Align for C code
+ MOVV R1, R29
+
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVV $0(R29), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R25
+ BEQ R25, fallback
+
+ JAL (R25)
+ // check on vdso call return for kernel compatibility
+ // see https://golang.org/issues/39046
+ // if we get any error make fallback permanent.
+ BEQ R2, R0, finish
+ MOVV R0, runtime·vdsoClockgettimeSym(SB)
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVV $0(R29), R5
+ JMP fallback
+
+finish:
+ MOVV 0(R29), R3 // sec
+ MOVV 8(R29), R5 // nsec
+
+ MOVV R16, R29 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R29), R1
+ MOVV R1, m_vdsoSP(R17)
+ MOVV 8(R29), R1
+ MOVV R1, m_vdsoPC(R17)
+
+ MOVV R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R2
+ SYSCALL
+ JMP finish
+
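walltime above (and nanotime1 below) follow one pattern: publish vdsoPC/vdsoSP so a SIGPROF arriving inside the vDSO can still be unwound, switch to the g0 stack if needed, try the vDSO clock_gettime, and if it reports an error (possible on some kernels, see the issue 39046 reference above) zero the symbol so every later call goes straight to the SYSCALL fallback. A hedged Go-level sketch of just that fallback logic, with the stack switch and vdsoPC/vdsoSP bookkeeping omitted; vdsoClockGettime and clockGettimeSyscall are stand-ins for the real symbol and the SYSCALL path:

package main

import (
	"fmt"
	"time"
)

// vdsoClockGettime plays the role of runtime·vdsoClockgettimeSym: non-nil
// while the fast path is believed to work, nil forever after the first
// failure so later calls take the syscall fallback directly.
var vdsoClockGettime func(clockid int32, ts *[2]int64) int32

func clockGettimeSyscall(clockid int32, ts *[2]int64) {
	// Stand-in for the SYSCALL fallback.
	now := time.Now()
	ts[0] = now.Unix()
	ts[1] = int64(now.Nanosecond())
}

func gettime(clockid int32) (sec, nsec int64) {
	var ts [2]int64
	if fn := vdsoClockGettime; fn != nil {
		if fn(clockid, &ts) == 0 {
			return ts[0], ts[1]
		}
		// Some kernels reject the vDSO call: make the fallback
		// permanent, exactly like zeroing vdsoClockgettimeSym.
		vdsoClockGettime = nil
	}
	clockGettimeSyscall(clockid, &ts)
	return ts[0], ts[1]
}

func main() {
	sec, nsec := gettime(0) // 0 == CLOCK_REALTIME
	fmt.Println(sec, nsec)
}
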
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ MOVV R29, R16 // R16 is unchanged by C code
+ MOVV R29, R1
+
+ MOVV g_m(g), R17 // R17 = m
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVV m_vdsoPC(R17), R2
+ MOVV m_vdsoSP(R17), R3
+ MOVV R2, 8(R29)
+ MOVV R3, 16(R29)
+
+ MOVV $ret-8(FP), R2 // caller's SP
+ MOVV R31, m_vdsoPC(R17)
+ MOVV R2, m_vdsoSP(R17)
+
+ MOVV m_curg(R17), R4
+ MOVV g, R5
+ BNE R4, R5, noswitch
+
+ MOVV m_g0(R17), R4
+ MOVV (g_sched+gobuf_sp)(R4), R1 // Set SP to g0 stack
+
+noswitch:
+ SUBV $16, R1
+ AND $~15, R1 // Align for C code
+ MOVV R1, R29
+
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVV $0(R29), R5
+
+ MOVV runtime·vdsoClockgettimeSym(SB), R25
+ BEQ R25, fallback
+
+ JAL (R25)
+ // see walltime for detail
+ BEQ R2, R0, finish
+ MOVV R0, runtime·vdsoClockgettimeSym(SB)
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVV $0(R29), R5
+ JMP fallback
+
+finish:
+ MOVV 0(R29), R3 // sec
+ MOVV 8(R29), R5 // nsec
+
+ MOVV R16, R29 // restore SP
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVV 16(R29), R1
+ MOVV R1, m_vdsoSP(R17)
+ MOVV 8(R29), R1
+ MOVV R1, m_vdsoPC(R17)
+
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVV $1000000000, R4
+ MULVU R4, R3
+ MOVV LO, R3
+ ADDVU R5, R3
+ MOVV R3, ret+0(FP)
+ RET
+
+fallback:
+ MOVV $SYS_clock_gettime, R2
+ SYSCALL
+ JMP finish
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVW size+24(FP), R7
+ MOVV $SYS_rt_sigprocmask, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVV sig+0(FP), R4
+ MOVV new+8(FP), R5
+ MOVV old+16(FP), R6
+ MOVV size+24(FP), R7
+ MOVV $SYS_rt_sigaction, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+32(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R4
+ MOVV info+16(FP), R5
+ MOVV ctx+24(FP), R6
+ MOVV fn+0(FP), R25
+ JAL (R25)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$64
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 8(R29)
+ MOVV R5, 16(R29)
+ MOVV R6, 24(R29)
+ MOVV $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW prot+16(FP), R6
+ MOVW flags+20(FP), R7
+ MOVW fd+24(FP), R8
+ MOVW off+28(FP), R9
+
+ MOVV $SYS_mmap, R2
+ SYSCALL
+ BEQ R7, ok
+ MOVV $0, p+32(FP)
+ MOVV R2, err+40(FP)
+ RET
+ok:
+ MOVV R2, p+32(FP)
+ MOVV $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVV $SYS_munmap, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf3(R0) // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVV n+8(FP), R5
+ MOVW flags+16(FP), R6
+ MOVV $SYS_madvise, R2
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVV addr+0(FP), R4
+ MOVW op+8(FP), R5
+ MOVW val+12(FP), R6
+ MOVV ts+16(FP), R7
+ MOVV addr2+24(FP), R8
+ MOVW val3+32(FP), R9
+ MOVV $SYS_futex, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R4
+ MOVV stk+8(FP), R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVV mp+16(FP), R16
+ MOVV gp+24(FP), R17
+ MOVV fn+32(FP), R18
+
+ MOVV R16, -8(R5)
+ MOVV R17, -16(R5)
+ MOVV R18, -24(R5)
+ MOVV $1234, R16
+ MOVV R16, -32(R5)
+
+ MOVV $SYS_clone, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ MOVV -32(R29), R16
+ MOVV $1234, R1
+ BEQ R16, R1, 2(PC)
+ MOVV R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ MOVV $SYS_gettid, R2
+ SYSCALL
+
+ MOVV -24(R29), R18 // fn
+ MOVV -16(R29), R17 // g
+ MOVV -8(R29), R16 // m
+
+ BEQ R16, nog
+ BEQ R17, nog
+
+ MOVV R2, m_procid(R16)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVV R16, g_m(R17)
+ MOVV R17, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ JAL (R18)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R4
+ MOVV $SYS_exit, R2
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVV new+0(FP), R4
+ MOVV old+8(FP), R5
+ MOVV $SYS_sigaltstack, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVV R0, 0xf1(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVV $SYS_sched_yield, R2
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVV pid+0(FP), R4
+ MOVV len+8(FP), R5
+ MOVV buf+16(FP), R6
+ MOVV $SYS_sched_getaffinity, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0-8
+ // Implemented as brk(NULL).
+ MOVV $0, R4
+ MOVV $SYS_brk, R2
+ SYSCALL
+ MOVV R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVV R0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_mipsx.s b/src/runtime/sys_linux_mipsx.s
new file mode 100644
index 0000000..5e6b6c1
--- /dev/null
+++ b/src/runtime/sys_linux_mipsx.s
@@ -0,0 +1,507 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips || mipsle)
+
+//
+// System calls and other sys.stuff for mips, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define SYS_exit 4001
+#define SYS_read 4003
+#define SYS_write 4004
+#define SYS_open 4005
+#define SYS_close 4006
+#define SYS_getpid 4020
+#define SYS_kill 4037
+#define SYS_brk 4045
+#define SYS_mmap 4090
+#define SYS_munmap 4091
+#define SYS_setitimer 4104
+#define SYS_clone 4120
+#define SYS_sched_yield 4162
+#define SYS_nanosleep 4166
+#define SYS_rt_sigreturn 4193
+#define SYS_rt_sigaction 4194
+#define SYS_rt_sigprocmask 4195
+#define SYS_sigaltstack 4206
+#define SYS_madvise 4218
+#define SYS_mincore 4217
+#define SYS_gettid 4222
+#define SYS_futex 4238
+#define SYS_sched_getaffinity 4240
+#define SYS_exit_group 4246
+#define SYS_timer_create 4257
+#define SYS_timer_settime 4258
+#define SYS_timer_delete 4261
+#define SYS_clock_gettime 4263
+#define SYS_tgkill 4266
+#define SYS_pipe2 4328
+
+TEXT runtime·exit(SB),NOSPLIT,$0-4
+ MOVW code+0(FP), R4
+ MOVW $SYS_exit_group, R2
+ SYSCALL
+ UNDEF
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ SYNC
+ MOVW $0, R4 // exit code
+ MOVW $SYS_exit, R2
+ SYSCALL
+ UNDEF
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$0-16
+ MOVW name+0(FP), R4
+ MOVW mode+4(FP), R5
+ MOVW perm+8(FP), R6
+ MOVW $SYS_open, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0-8
+ MOVW fd+0(FP), R4
+ MOVW $SYS_close, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+4(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$0-16
+ MOVW fd+0(FP), R4
+ MOVW p+4(FP), R5
+ MOVW n+8(FP), R6
+ MOVW $SYS_write, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$0-16
+ MOVW fd+0(FP), R4
+ MOVW p+4(FP), R5
+ MOVW n+8(FP), R6
+ MOVW $SYS_read, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R4
+ MOVW flags+0(FP), R5
+ MOVW $SYS_pipe2, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$28-4
+ MOVW usec+0(FP), R3
+ MOVW R3, R5
+ MOVW $1000000, R4
+ DIVU R4, R3
+ MOVW LO, R3
+ MOVW R3, 24(R29)
+ MOVW $1000, R4
+ MULU R3, R4
+ MOVW LO, R4
+ SUBU R4, R5
+ MOVW R5, 28(R29)
+
+ // nanosleep(&ts, 0)
+ ADDU $24, R29, R4
+ MOVW $0, R5
+ MOVW $SYS_nanosleep, R2
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R16
+ MOVW $SYS_gettid, R2
+ SYSCALL
+ MOVW R2, R5 // arg 2 tid
+ MOVW R16, R4 // arg 1 pid
+ MOVW sig+0(FP), R6 // arg 3
+ MOVW $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, R4 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 2
+ MOVW $SYS_kill, R2
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT,$0-4
+ MOVW $SYS_getpid, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT,$0-12
+ MOVW tgid+0(FP), R4
+ MOVW tid+4(FP), R5
+ MOVW sig+8(FP), R6
+ MOVW $SYS_tgkill, R2
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0-12
+ MOVW mode+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW $SYS_setitimer, R2
+ SYSCALL
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-16
+ MOVW clockid+0(FP), R4
+ MOVW sevp+4(FP), R5
+ MOVW timerid+8(FP), R6
+ MOVW $SYS_timer_create, R2
+ SYSCALL
+ MOVW R2, ret+12(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-20
+ MOVW timerid+0(FP), R4
+ MOVW flags+4(FP), R5
+ MOVW new+8(FP), R6
+ MOVW old+12(FP), R7
+ MOVW $SYS_timer_settime, R2
+ SYSCALL
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-8
+ MOVW timerid+0(FP), R4
+ MOVW $SYS_timer_delete, R2
+ SYSCALL
+ MOVW R2, ret+4(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT,$0-16
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW dst+8(FP), R6
+ MOVW $SYS_mincore, R2
+ SYSCALL
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$8-12
+ MOVW $0, R4 // CLOCK_REALTIME
+ MOVW $4(R29), R5
+ MOVW $SYS_clock_gettime, R2
+ SYSCALL
+ MOVW 4(R29), R3 // sec
+ MOVW 8(R29), R5 // nsec
+ MOVW $sec+0(FP), R6
+#ifdef GOARCH_mips
+ MOVW R3, 4(R6)
+ MOVW R0, 0(R6)
+#else
+ MOVW R3, 0(R6)
+ MOVW R0, 4(R6)
+#endif
+ MOVW R5, nsec+8(FP)
+ RET
+
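The GOARCH_mips/#else split in walltime above exists because sec is an int64 return slot being filled with two 32-bit stores: on big-endian mips the low word lives at the higher offset, on little-endian mipsle at the lower one. A small host-side Go illustration of the same layout rule (illustrative only, not runtime code):

package main

import (
	"fmt"
	"unsafe"
)

func isBigEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 0
}

func main() {
	sec := int64(0x11223344) // fits in 32 bits, so the high word is zero
	var slot int64

	words := (*[2]uint32)(unsafe.Pointer(&slot))
	lo := 0
	if isBigEndian() {
		lo = 1 // low 32 bits sit at the higher offset on big-endian
	}
	words[lo] = uint32(sec) // the MOVW R3, ... store
	words[1-lo] = 0         // the MOVW R0, ... store

	fmt.Println(slot == sec) // true on either byte order
}
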
+TEXT runtime·nanotime1(SB),NOSPLIT,$8-8
+ MOVW $1, R4 // CLOCK_MONOTONIC
+ MOVW $4(R29), R5
+ MOVW $SYS_clock_gettime, R2
+ SYSCALL
+ MOVW 4(R29), R3 // sec
+ MOVW 8(R29), R5 // nsec
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVW $1000000000, R4
+ MULU R4, R3
+ MOVW LO, R3
+ ADDU R5, R3
+ SGTU R5, R3, R4
+ MOVW $ret+0(FP), R6
+#ifdef GOARCH_mips
+ MOVW R3, 4(R6)
+#else
+ MOVW R3, 0(R6)
+#endif
+ MOVW HI, R3
+ ADDU R4, R3
+#ifdef GOARCH_mips
+ MOVW R3, 0(R6)
+#else
+ MOVW R3, 4(R6)
+#endif
+ RET
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT,$0-16
+ MOVW how+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW size+12(FP), R7
+ MOVW $SYS_rt_sigprocmask, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT,$0-20
+ MOVW sig+0(FP), R4
+ MOVW new+4(FP), R5
+ MOVW old+8(FP), R6
+ MOVW size+12(FP), R7
+ MOVW $SYS_rt_sigaction, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R4
+ MOVW info+8(FP), R5
+ MOVW ctx+12(FP), R6
+ MOVW fn+0(FP), R25
+ MOVW R29, R22
+ SUBU $16, R29
+ AND $~7, R29 // shadow space for 4 args aligned to 8 bytes as per O32 ABI
+ JAL (R25)
+ MOVW R22, R29
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$12
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 4(R29)
+ MOVW R5, 8(R29)
+ MOVW R6, 12(R29)
+ MOVW $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ JMP runtime·sigtramp(SB)
+
+TEXT runtime·mmap(SB),NOSPLIT,$20-32
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW prot+8(FP), R6
+ MOVW flags+12(FP), R7
+ MOVW fd+16(FP), R8
+ MOVW off+20(FP), R9
+ MOVW R8, 16(R29)
+ MOVW R9, 20(R29)
+
+ MOVW $SYS_mmap, R2
+ SYSCALL
+ BEQ R7, ok
+ MOVW $0, p+24(FP)
+ MOVW R2, err+28(FP)
+ RET
+ok:
+ MOVW R2, p+24(FP)
+ MOVW $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0-8
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW $SYS_munmap, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0-16
+ MOVW addr+0(FP), R4
+ MOVW n+4(FP), R5
+ MOVW flags+8(FP), R6
+ MOVW $SYS_madvise, R2
+ SYSCALL
+ MOVW R2, ret+12(FP)
+ RET
+
+// int32 futex(int32 *uaddr, int32 op, int32 val, struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT,$20-28
+ MOVW addr+0(FP), R4
+ MOVW op+4(FP), R5
+ MOVW val+8(FP), R6
+ MOVW ts+12(FP), R7
+
+ MOVW addr2+16(FP), R8
+ MOVW val3+20(FP), R9
+
+ MOVW R8, 16(R29)
+ MOVW R9, 20(R29)
+
+ MOVW $SYS_futex, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW flags+0(FP), R4
+ MOVW stk+4(FP), R5
+ MOVW R0, R6 // ptid
+ MOVW R0, R7 // tls
+
+ // O32 syscall handler unconditionally copies arguments 5-8 from stack,
+ // even for syscalls with less than 8 arguments. Reserve 32 bytes of new
+ // stack so that any syscall invoked immediately in the new thread won't fail.
+ ADD $-32, R5
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOVW mp+8(FP), R16
+ MOVW gp+12(FP), R17
+ MOVW fn+16(FP), R18
+
+ MOVW $1234, R1
+
+ MOVW R16, 0(R5)
+ MOVW R17, 4(R5)
+ MOVW R18, 8(R5)
+
+ MOVW R1, 12(R5)
+
+ MOVW $SYS_clone, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW R2, ret+20(FP)
+ RET
+
+ // In child, on new stack.
+ // Check that SP is as we expect
+ NOP R29 // tell vet R29/SP changed - stop checking offsets
+ MOVW 12(R29), R16
+ MOVW $1234, R1
+ BEQ R16, R1, 2(PC)
+ MOVW (R0), R0
+
+ // Initialize m->procid to Linux tid
+ MOVW $SYS_gettid, R2
+ SYSCALL
+
+ MOVW 0(R29), R16 // m
+ MOVW 4(R29), R17 // g
+ MOVW 8(R29), R18 // fn
+
+ BEQ R16, nog
+ BEQ R17, nog
+
+ MOVW R2, m_procid(R16)
+
+ // In child, set up new stack
+ MOVW R16, g_m(R17)
+ MOVW R17, g
+
+// TODO(mips32): doesn't have runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ ADDU $32, R29
+ JAL (R18)
+
+ // It shouldn't return. If it does, exit that thread.
+ ADDU $-32, R29
+ MOVW $0xf4, R4
+ MOVW $SYS_exit, R2
+ SYSCALL
+ UNDEF
+
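The clone wrappers here (and the mips64 one earlier) all use the same handshake: the parent writes mp, gp, fn and a 1234 sentinel into the words just below the child's new stack pointer, and on mips it additionally backs the pointer off by 32 bytes because the O32 syscall handler unconditionally copies arguments 5-8 from the stack. The child then reads the slots back through its own SP and trusts them only if the sentinel matches. A hedged Go model of that handoff, with a plain array standing in for the raw stack words; the names are illustrative:

package main

import "fmt"

// childStackTop models the four words the parent leaves for the child:
// mp, gp, fn and the 1234 sentinel.
type childStackTop [4]uintptr

func parentPrepare(mp, gp, fn uintptr) childStackTop {
	return childStackTop{mp, gp, fn, 1234}
}

func childPickup(top childStackTop) (mp, gp, fn uintptr) {
	if top[3] != 1234 {
		panic("child is not on the stack the parent prepared")
	}
	return top[0], top[1], top[2]
}

func main() {
	top := parentPrepare(0x3000, 0x2000, 0x1000)
	mp, gp, fn := childPickup(top)
	fmt.Printf("mp=%#x gp=%#x fn=%#x\n", mp, gp, fn)
}
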
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVW new+0(FP), R4
+ MOVW old+4(FP), R5
+ MOVW $SYS_sigaltstack, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ UNDEF // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVW $SYS_sched_yield, R2
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT,$0-16
+ MOVW pid+0(FP), R4
+ MOVW len+4(FP), R5
+ MOVW buf+8(FP), R6
+ MOVW $SYS_sched_getaffinity, R2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+12(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-4
+ // Implemented as brk(NULL).
+ MOVW $0, R4
+ MOVW $SYS_brk, R2
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-12
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+8(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-16
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+12(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-16
+ BREAK // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+12(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_ppc64x.s b/src/runtime/sys_linux_ppc64x.s
new file mode 100644
index 0000000..d105585
--- /dev/null
+++ b/src/runtime/sys_linux_ppc64x.s
@@ -0,0 +1,759 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+//
+// System calls and other sys.stuff for ppc64, Linux
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "asm_ppc64x.h"
+#include "cgo/abi_ppc64x.h"
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_brk 45
+#define SYS_mmap 90
+#define SYS_munmap 91
+#define SYS_setitimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 172
+#define SYS_rt_sigaction 173
+#define SYS_rt_sigprocmask 174
+#define SYS_sigaltstack 185
+#define SYS_madvise 205
+#define SYS_mincore 206
+#define SYS_gettid 207
+#define SYS_futex 221
+#define SYS_sched_getaffinity 223
+#define SYS_exit_group 234
+#define SYS_timer_create 240
+#define SYS_timer_settime 241
+#define SYS_timer_delete 244
+#define SYS_clock_gettime 246
+#define SYS_tgkill 250
+#define SYS_pipe2 317
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R3
+ SYSCALL $SYS_exit_group
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ SYNC
+ MOVW R2, (R1)
+ MOVW $0, R3 // exit code
+ SYSCALL $SYS_exit
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R3
+ MOVW mode+8(FP), R4
+ MOVW perm+12(FP), R5
+ SYSCALL $SYS_open
+ BVC 2(PC)
+ MOVW $-1, R3
+ MOVW R3, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R3
+ SYSCALL $SYS_close
+ BVC 2(PC)
+ MOVW $-1, R3
+ MOVW R3, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ SYSCALL $SYS_write
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R3
+ MOVD p+8(FP), R4
+ MOVW n+16(FP), R5
+ SYSCALL $SYS_read
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ ADD $FIXED_FRAME+8, R1, R3
+ MOVW flags+0(FP), R4
+ SYSCALL $SYS_pipe2
+ MOVW R3, errno+16(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVW usec+0(FP), R3
+
+ // Use magic constant 0x8637bd06 and shift right 51
+ // to perform usec/1000000.
+ MOVD $0x8637bd06, R4
+ MULLD R3, R4, R4 // Convert usec to S.
+ SRD $51, R4, R4
+ MOVD R4, 8(R1) // Store to tv_sec
+
+ MOVD $1000000, R5
+ MULLW R4, R5, R5 // Convert tv_sec back into uS
+ SUB R5, R3, R5 // Compute remainder uS.
+ MULLD $1000, R5, R5 // Convert to nsec
+ MOVD R5, 16(R1) // Store to tv_nsec
+
+ // nanosleep(&ts, 0)
+ ADD $8, R1, R3
+ MOVW $0, R4
+ SYSCALL $SYS_nanosleep
+ RET
+
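The magic constant in usleep above is a reciprocal: 0x8637bd06 is ceil(2^51/10^6), so multiplying a 32-bit microsecond count by it and shifting right by 51 yields exactly usec/1000000 without a divide instruction. A short Go check of that identity at the boundaries around every multiple of one million (checking all 2^32 inputs would also work, just slower):

package main

import "fmt"

func main() {
	const magic = 0x8637bd06 // ceil(2^51 / 1e6)
	check := func(usec uint64) {
		got := (usec * magic) >> 51 // no overflow: usec*magic < 2^64
		want := usec / 1000000
		if got != want {
			fmt.Printf("mismatch at %d: got %d want %d\n", usec, got, want)
		}
	}
	for sec := uint64(0); sec*1000000 < 1<<32; sec++ {
		base := sec * 1000000
		check(base)
		if base > 0 {
			check(base - 1) // worst case: just below a multiple of 1e6
		}
		if base+1 < 1<<32 {
			check(base + 1)
		}
	}
	check(1<<32 - 1)
	fmt.Println("ok")
}
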
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ SYSCALL $SYS_gettid
+ MOVW R3, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_getpid
+ MOVW R3, R14
+ SYSCALL $SYS_gettid
+ MOVW R3, R4 // arg 2 tid
+ MOVW R14, R3 // arg 1 pid
+ MOVW sig+0(FP), R5 // arg 3
+ SYSCALL $SYS_tgkill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_getpid
+ MOVW R3, R3 // arg 1 pid
+ MOVW sig+0(FP), R4 // arg 2
+ SYSCALL $SYS_kill
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ SYSCALL $SYS_getpid
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD tgid+0(FP), R3
+ MOVD tid+8(FP), R4
+ MOVD sig+16(FP), R5
+ SYSCALL $SYS_tgkill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ SYSCALL $SYS_setitimer
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVW clockid+0(FP), R3
+ MOVD sevp+8(FP), R4
+ MOVD timerid+16(FP), R5
+ SYSCALL $SYS_timer_create
+ MOVW R3, ret+24(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVW timerid+0(FP), R3
+ MOVW flags+4(FP), R4
+ MOVD new+8(FP), R5
+ MOVD old+16(FP), R6
+ SYSCALL $SYS_timer_settime
+ MOVW R3, ret+24(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVW timerid+0(FP), R3
+ SYSCALL $SYS_timer_delete
+ MOVW R3, ret+8(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVD dst+16(FP), R5
+ SYSCALL $SYS_mincore
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$16-12
+ MOVD R1, R15 // R15 is unchanged by C code
+ MOVD g_m(g), R21 // R21 = m
+
+ MOVD $0, R3 // CLOCK_REALTIME
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R12 // Check for VDSO availability
+ CMP R12, R0
+ BEQ fallback
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R4
+ MOVD m_vdsoSP(R21), R5
+ MOVD R4, 32(R1)
+ MOVD R5, 40(R1)
+
+ MOVD LR, R14
+ MOVD $ret-FIXED_FRAME(FP), R5 // caller's SP
+ MOVD R14, m_vdsoPC(R21)
+ MOVD R5, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R6
+ CMP g, R6
+ BNE noswitch
+
+ MOVD m_g0(R21), R7
+ MOVD (g_sched+gobuf_sp)(R7), R1 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R1 // Space for results
+ RLDICR $0, R1, $59, R1 // Align for C code
+ MOVD R12, CTR
+ MOVD R1, R4
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBZ runtime·iscgo(SB), R22
+ CMP R22, $0
+ BNE nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CMP R22, $0
+ BEQ nosaveg
+
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (CTR) // Call from VDSO
+
+ MOVD $0, (R22) // clear g slot, R22 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (CTR) // Call from VDSO
+
+finish:
+ MOVD $0, R0 // Restore R0
+ MOVD 0(R1), R3 // sec
+ MOVD 8(R1), R5 // nsec
+ MOVD R15, R1 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 40(R1), R6
+ MOVD R6, m_vdsoSP(R21)
+ MOVD 32(R1), R6
+ MOVD R6, m_vdsoPC(R21)
+
+return:
+ MOVD R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ ADD $32, R1, R4
+ SYSCALL $SYS_clock_gettime
+ MOVD 32(R1), R3
+ MOVD 40(R1), R5
+ JMP return
+
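The nosaveg branches in walltime above (repeated in nanotime1 below) encode a small decision: before calling into the vDSO, stash g at the bottom of the gsignal stack so a signal handler running during the C code can still find it, but skip the store when cgo already keeps g in TLS, when there is no gsignal stack, or when we are already running on it. A hedged Go-level sketch of that predicate; the types and names are stand-ins, not the runtime's real structures:

package main

import "fmt"

type gG struct{ name string }

type gM struct {
	gsignal *gG
}

// needSaveG reports whether the caller must publish its g at the base of the
// gsignal stack before the vDSO call.
func needSaveG(iscgo bool, m *gM, cur *gG) bool {
	if iscgo {
		return false // cgo already saved g in TLS
	}
	if m.gsignal == nil {
		return false // no signal stack, so no signal will be delivered
	}
	if cur == m.gsignal {
		return false // already on the signal stack; no nested signal
	}
	return true
}

func main() {
	m := &gM{gsignal: &gG{name: "gsignal"}}
	fmt.Println(needSaveG(false, m, &gG{name: "user g"})) // true
	fmt.Println(needSaveG(true, m, &gG{name: "user g"}))  // false
}
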
+TEXT runtime·nanotime1(SB),NOSPLIT,$16-8
+ MOVD $1, R3 // CLOCK_MONOTONIC
+
+ MOVD R1, R15 // R15 is unchanged by C code
+ MOVD g_m(g), R21 // R21 = m
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R12 // Check for VDSO availability
+ CMP R12, R0
+ BEQ fallback
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVD m_vdsoPC(R21), R4
+ MOVD m_vdsoSP(R21), R5
+ MOVD R4, 32(R1)
+ MOVD R5, 40(R1)
+
+ MOVD LR, R14 // R14 is unchanged by C code
+ MOVD $ret-FIXED_FRAME(FP), R5 // caller's SP
+ MOVD R14, m_vdsoPC(R21)
+ MOVD R5, m_vdsoSP(R21)
+
+ MOVD m_curg(R21), R6
+ CMP g, R6
+ BNE noswitch
+
+ MOVD m_g0(R21), R7
+ MOVD (g_sched+gobuf_sp)(R7), R1 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R1 // Space for results
+ RLDICR $0, R1, $59, R1 // Align for C code
+ MOVD R12, CTR
+ MOVD R1, R4
+
+ // Store g on gsignal's stack, so if we receive a signal
+ // during VDSO code we can find the g.
+ // If we don't have a signal stack, we won't receive signal,
+ // so don't bother saving g.
+ // When using cgo, we already saved g on TLS, also don't save
+ // g here.
+ // Also don't save g if we are already on the signal stack.
+ // We won't get a nested signal.
+ MOVBZ runtime·iscgo(SB), R22
+ CMP R22, $0
+ BNE nosaveg
+ MOVD m_gsignal(R21), R22 // g.m.gsignal
+ CMP R22, $0
+ BEQ nosaveg
+
+ CMP g, R22
+ BEQ nosaveg
+ MOVD (g_stack+stack_lo)(R22), R22 // g.m.gsignal.stack.lo
+ MOVD g, (R22)
+
+ BL (CTR) // Call from VDSO
+
+ MOVD $0, (R22) // clear g slot, R22 is unchanged by C code
+
+ JMP finish
+
+nosaveg:
+ BL (CTR) // Call from VDSO
+
+finish:
+ MOVD $0, R0 // Restore R0
+ MOVD 0(R1), R3 // sec
+ MOVD 8(R1), R5 // nsec
+ MOVD R15, R1 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 40(R1), R6
+ MOVD R6, m_vdsoSP(R21)
+ MOVD 32(R1), R6
+ MOVD R6, m_vdsoPC(R21)
+
+return:
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVD $1000000000, R4
+ MULLD R4, R3
+ ADD R5, R3
+ MOVD R3, ret+0(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ ADD $32, R1, R4
+ SYSCALL $SYS_clock_gettime
+ MOVD 32(R1), R3
+ MOVD 40(R1), R5
+ JMP return
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVW size+24(FP), R6
+ SYSCALL $SYS_rt_sigprocmask
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD size+24(FP), R6
+ SYSCALL $SYS_rt_sigaction
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+32(FP)
+ RET
+
+#ifdef GOARCH_ppc64le
+// Call the function stored in _cgo_sigaction using the GCC calling convention.
+TEXT runtime·callCgoSigaction(SB),NOSPLIT,$0
+ MOVD sig+0(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVD _cgo_sigaction(SB), R12
+ MOVD R12, CTR // R12 should contain the function address
+ MOVD R1, R15 // Save R1
+ MOVD R2, 24(R1) // Save R2
+ SUB $48, R1 // reserve 32 (frame) + 16 bytes for sp-8 where fp may be saved.
+ RLDICR $0, R1, $59, R1 // Align to 16 bytes for C code
+ BL (CTR)
+ XOR R0, R0, R0 // Clear R0 as Go expects
+ MOVD R15, R1 // Restore R1
+ MOVD 24(R1), R2 // Restore R2
+ MOVW R3, ret+24(FP) // Return result
+ RET
+#endif
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R3
+ MOVD info+16(FP), R4
+ MOVD ctx+24(FP), R5
+ MOVD fn+0(FP), R12
+ MOVD R12, CTR
+ BL (CTR)
+ MOVD 24(R1), R2
+ RET
+
+#ifdef GO_PPC64X_HAS_FUNCDESC
+DEFINE_PPC64X_FUNCDESC(runtime·sigtramp, sigtramp<>)
+// cgo isn't supported on ppc64, but we need to supply a cgoSigTramp function.
+DEFINE_PPC64X_FUNCDESC(runtime·cgoSigtramp, sigtramp<>)
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME|TOPFRAME,$0
+#else
+// ppc64le doesn't need function descriptors
+// Save callee-save registers in the case of signal forwarding.
+// Same as on ARM64 https://golang.org/issue/31827 .
+//
+// Note, it is assumed this is always called indirectly (e.g via
+// a function pointer) as R2 may not be preserved when calling this
+// function. In those cases, the caller preserves their R2.
+TEXT runtime·sigtramp(SB),NOSPLIT|NOFRAME,$0
+#endif
+ // This is called with ELF calling conventions. Convert to Go.
+ // Allocate space for argument storage to call runtime.sigtrampgo.
+ STACK_AND_SAVE_HOST_TO_GO_ABI(32)
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVBZ runtime·iscgo(SB), R6
+ CMP R6, $0
+ BEQ 2(PC)
+ BL runtime·load_g(SB)
+
+ // R3,R4,R5 already hold the arguments. Forward them on.
+ // TODO: Indirectly call runtime.sigtrampgo to avoid the linker's static NOSPLIT stack
+ // overflow detection. It thinks this might be called on a small Go stack, but this is only
+ // called from a larger pthread or sigaltstack stack. Can the checker be improved to not
+ // flag a direct call here?
+ MOVD $runtime·sigtrampgo<ABIInternal>(SB), R12
+ MOVD R12, CTR
+ BL (CTR)
+ // Restore R2 (TOC pointer) in the event it might be used later in this function.
+ // If this was not compiled as shared code, R2 is undefined, reloading it is harmless.
+ MOVD 24(R1), R2
+
+ UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(32)
+ RET
+
+#ifdef GOARCH_ppc64le
+TEXT runtime·cgoSigtramp(SB),NOSPLIT|NOFRAME,$0
+ // The stack unwinder, presumably written in C, may not be able to
+ // handle Go frame correctly. So, this function is NOFRAME, and we
+ // save/restore LR manually, and obey ELFv2 calling conventions.
+ MOVD LR, R10
+
+ // We're coming from C code, initialize R0
+ MOVD $0, R0
+
+ // If no traceback function, do usual sigtramp.
+ MOVD runtime·cgoTraceback(SB), R6
+ CMP $0, R6
+ BEQ sigtramp
+
+ // If no traceback support function, which means that
+ // runtime/cgo was not linked in, do usual sigtramp.
+ MOVD _cgo_callers(SB), R6
+ CMP $0, R6
+ BEQ sigtramp
+
+ // Inspect the g in TLS without clobbering R30/R31 via runtime.load_g.
+ MOVD runtime·tls_g(SB), R9
+ MOVD 0(R9), R9
+
+ // Figure out if we are currently in a cgo call.
+ // If not, just do usual sigtramp.
+ // compared to ARM64 and others.
+ CMP $0, R9
+ BEQ sigtrampnog // g == nil
+
+ // g is not nil. Check further.
+ MOVD g_m(R9), R6
+ CMP $0, R6
+ BEQ sigtramp // g.m == nil
+ MOVW m_ncgo(R6), R7
+ CMPW $0, R7
+ BEQ sigtramp // g.m.ncgo = 0
+ MOVD m_curg(R6), R7
+ CMP $0, R7
+ BEQ sigtramp // g.m.curg == nil
+ MOVD g_syscallsp(R7), R7
+ CMP $0, R7
+ BEQ sigtramp // g.m.curg.syscallsp == 0
+ MOVD m_cgoCallers(R6), R7 // R7 is the fifth arg in C calling convention.
+ CMP $0, R7
+ BEQ sigtramp // g.m.cgoCallers == nil
+ MOVW m_cgoCallersUse(R6), R8
+ CMPW $0, R8
+ BNE sigtramp // g.m.cgoCallersUse != 0
+
+ // Jump to a function in runtime/cgo.
+ // That function, written in C, will call the user's traceback
+ // function with proper unwind info, and will then call back here.
+ // The first three arguments, and the fifth, are already in registers.
+ // Set the two remaining arguments now.
+ MOVD runtime·cgoTraceback(SB), R6
+ MOVD $runtime·sigtramp(SB), R8
+ MOVD _cgo_callers(SB), R12
+ MOVD R12, CTR
+ MOVD R10, LR // restore LR
+ JMP (CTR)
+
+sigtramp:
+ MOVD R10, LR // restore LR
+ JMP runtime·sigtramp(SB)
+
+sigtrampnog:
+ // Signal arrived on a non-Go thread. If this is SIGPROF, get a
+ // stack trace.
+ CMPW R3, $27 // 27 == SIGPROF
+ BNE sigtramp
+
+ // Lock sigprofCallersUse (cas from 0 to 1).
+ MOVW $1, R7
+ MOVD $runtime·sigprofCallersUse(SB), R8
+ SYNC
+ LWAR (R8), R6
+ CMPW $0, R6
+ BNE sigtramp
+ STWCCC R7, (R8)
+ BNE -4(PC)
+ ISYNC
+
+ // Jump to the traceback function in runtime/cgo.
+ // It will call back to sigprofNonGo, which will ignore the
+ // arguments passed in registers.
+ // First three arguments to traceback function are in registers already.
+ MOVD runtime·cgoTraceback(SB), R6
+ MOVD $runtime·sigprofCallers(SB), R7
+ MOVD $runtime·sigprofNonGoWrapper<>(SB), R8
+ MOVD _cgo_callers(SB), R12
+ MOVD R12, CTR
+ MOVD R10, LR // restore LR
+ JMP (CTR)
+#endif
+
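The LWAR/STWCCC/ISYNC sequence in sigtrampnog above is a hand-rolled compare-and-swap: it takes sigprofCallersUse from 0 to 1 so only one non-Go thread at a time fills in sigprofCallers, and bails out to the ordinary sigtramp when the flag is already held. A hedged Go equivalent of that acquire step using sync/atomic; the variable here is a local stand-in, not the runtime's sigprofCallersUse:

package main

import (
	"fmt"
	"sync/atomic"
)

var inUse uint32 // plays the role of runtime·sigprofCallersUse

// tryAcquire is the moral equivalent of the load-reserve/store-conditional
// loop: it succeeds only if it flips the flag from 0 to 1.
func tryAcquire() bool {
	return atomic.CompareAndSwapUint32(&inUse, 0, 1)
}

func main() {
	if tryAcquire() {
		fmt.Println("flag taken; record the SIGPROF stack trace")
		// The real code leaves releasing the flag to the profiling path
		// that consumes the recorded callers.
	} else {
		fmt.Println("flag busy; fall back to plain sigtramp")
	}
}
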
+// Used by cgoSigtramp to inspect without clobbering R30/R31 via runtime.load_g.
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
+
+TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
+ // This is called from C code. Callee save registers must be saved.
+ // R3,R4,R5 hold arguments, and allocate argument space to call sigprofNonGo.
+ STACK_AND_SAVE_HOST_TO_GO_ABI(32)
+
+ CALL runtime·sigprofNonGo<ABIInternal>(SB)
+
+ UNSTACK_AND_RESTORE_GO_TO_HOST_ABI(32)
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVW prot+16(FP), R5
+ MOVW flags+20(FP), R6
+ MOVW fd+24(FP), R7
+ MOVW off+28(FP), R8
+
+ SYSCALL $SYS_mmap
+ BVC ok
+ MOVD $0, p+32(FP)
+ MOVD R3, err+40(FP)
+ RET
+ok:
+ MOVD R3, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ SYSCALL $SYS_munmap
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVD n+8(FP), R4
+ MOVW flags+16(FP), R5
+ SYSCALL $SYS_madvise
+ MOVW R3, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R3
+ MOVW op+8(FP), R4
+ MOVW val+12(FP), R5
+ MOVD ts+16(FP), R6
+ MOVD addr2+24(FP), R7
+ MOVW val3+32(FP), R8
+ SYSCALL $SYS_futex
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+40(FP)
+ RET
+
+// int64 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R3
+ MOVD stk+8(FP), R4
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVD mp+16(FP), R7
+ MOVD gp+24(FP), R8
+ MOVD fn+32(FP), R12
+
+ MOVD R7, -8(R4)
+ MOVD R8, -16(R4)
+ MOVD R12, -24(R4)
+ MOVD $1234, R7
+ MOVD R7, -32(R4)
+
+ SYSCALL $SYS_clone
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+
+ // In parent, return.
+ CMP R3, $0
+ BEQ 3(PC)
+ MOVW R3, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ // initialize essential registers
+ BL runtime·reginit(SB)
+ MOVD -32(R1), R7
+ CMP R7, $1234
+ BEQ 2(PC)
+ MOVD R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ SYSCALL $SYS_gettid
+
+ MOVD -24(R1), R12 // fn
+ MOVD -16(R1), R8 // g
+ MOVD -8(R1), R7 // m
+
+ CMP R7, $0
+ BEQ nog
+ CMP R8, $0
+ BEQ nog
+
+ MOVD R3, m_procid(R7)
+
+ // TODO: setup TLS.
+
+ // In child, set up new stack
+ MOVD R7, g_m(R8)
+ MOVD R8, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ MOVD R12, CTR
+ BL (CTR)
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R3
+ SYSCALL $SYS_exit
+ BR -2(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R3
+ MOVD old+8(FP), R4
+ SYSCALL $SYS_sigaltstack
+ BVC 2(PC)
+ MOVD R0, 0xf0(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ SYSCALL $SYS_sched_yield
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R3
+ MOVD len+8(FP), R4
+ MOVD buf+16(FP), R5
+ SYSCALL $SYS_sched_getaffinity
+ BVC 2(PC)
+ NEG R3 // caller expects negative errno
+ MOVW R3, ret+24(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0
+ // Implemented as brk(NULL).
+ MOVD $0, R3
+ SYSCALL $SYS_brk
+ MOVD R3, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP) // for vet
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVD R0, 0(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP) // for vet
+ RET
diff --git a/src/runtime/sys_linux_riscv64.s b/src/runtime/sys_linux_riscv64.s
new file mode 100644
index 0000000..d1558fd
--- /dev/null
+++ b/src/runtime/sys_linux_riscv64.s
@@ -0,0 +1,584 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for riscv64, Linux
+//
+
+#include "textflag.h"
+#include "go_asm.h"
+
+#define AT_FDCWD -100
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 1
+
+#define SYS_brk 214
+#define SYS_clock_gettime 113
+#define SYS_clone 220
+#define SYS_close 57
+#define SYS_connect 203
+#define SYS_exit 93
+#define SYS_exit_group 94
+#define SYS_faccessat 48
+#define SYS_futex 98
+#define SYS_getpid 172
+#define SYS_gettid 178
+#define SYS_gettimeofday 169
+#define SYS_kill 129
+#define SYS_madvise 233
+#define SYS_mincore 232
+#define SYS_mmap 222
+#define SYS_munmap 215
+#define SYS_nanosleep 101
+#define SYS_openat 56
+#define SYS_pipe2 59
+#define SYS_pselect6 72
+#define SYS_read 63
+#define SYS_rt_sigaction 134
+#define SYS_rt_sigprocmask 135
+#define SYS_rt_sigreturn 139
+#define SYS_sched_getaffinity 123
+#define SYS_sched_yield 124
+#define SYS_setitimer 103
+#define SYS_sigaltstack 132
+#define SYS_socket 198
+#define SYS_tgkill 131
+#define SYS_timer_create 107
+#define SYS_timer_delete 111
+#define SYS_timer_settime 110
+#define SYS_tkill 130
+#define SYS_write 64
+
+// func exit(code int32)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), A0
+ MOV $SYS_exit_group, A7
+ ECALL
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOV wait+0(FP), A0
+ // We're done using the stack.
+ FENCE
+ MOVW ZERO, (A0)
+ FENCE
+ MOV $0, A0 // exit code
+ MOV $SYS_exit, A7
+ ECALL
+ JMP 0(PC)
+
+// func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOV $AT_FDCWD, A0
+ MOV name+0(FP), A1
+ MOVW mode+8(FP), A2
+ MOVW perm+12(FP), A3
+ MOV $SYS_openat, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 2(PC)
+ MOV $-1, A0
+ MOVW A0, ret+16(FP)
+ RET
+
+// func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), A0
+ MOV $SYS_close, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 2(PC)
+ MOV $-1, A0
+ MOVW A0, ret+8(FP)
+ RET
+
+// func write1(fd uintptr, p unsafe.Pointer, n int32) int32
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOV fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_write, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func read(fd int32, p unsafe.Pointer, n int32) int32
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), A0
+ MOV p+8(FP), A1
+ MOVW n+16(FP), A2
+ MOV $SYS_read, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOV $r+8(FP), A0
+ MOVW flags+0(FP), A1
+ MOV $SYS_pipe2, A7
+ ECALL
+ MOVW A0, errno+16(FP)
+ RET
+
+// func usleep(usec uint32)
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), A0
+ MOV $1000, A1
+ MUL A1, A0, A0
+ MOV $1000000000, A1
+ DIV A1, A0, A2
+ MOV A2, 8(X2)
+ REM A1, A0, A3
+ MOV A3, 16(X2)
+ ADD $8, X2, A0
+ MOV ZERO, A1
+ MOV $SYS_nanosleep, A7
+ ECALL
+ RET
+
+// func gettid() uint32
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOV $SYS_gettid, A7
+ ECALL
+ MOVW A0, ret+0(FP)
+ RET
+
+// func raise(sig uint32)
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_gettid, A7
+ ECALL
+ // arg 1 tid - already in A0
+ MOVW sig+0(FP), A1 // arg 2
+ MOV $SYS_tkill, A7
+ ECALL
+ RET
+
+// func raiseproc(sig uint32)
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_getpid, A7
+ ECALL
+ // arg 1 pid - already in A0
+ MOVW sig+0(FP), A1 // arg 2
+ MOV $SYS_kill, A7
+ ECALL
+ RET
+
+// func getpid() int
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOV $SYS_getpid, A7
+ ECALL
+ MOV A0, ret+0(FP)
+ RET
+
+// func tgkill(tgid, tid, sig int)
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOV tgid+0(FP), A0
+ MOV tid+8(FP), A1
+ MOV sig+16(FP), A2
+ MOV $SYS_tgkill, A7
+ ECALL
+ RET
+
+// func setitimer(mode int32, new, old *itimerval)
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV $SYS_setitimer, A7
+ ECALL
+ RET
+
+// func timer_create(clockid int32, sevp *sigevent, timerid *int32) int32
+TEXT runtime·timer_create(SB),NOSPLIT,$0-28
+ MOVW clockid+0(FP), A0
+ MOV sevp+8(FP), A1
+ MOV timerid+16(FP), A2
+ MOV $SYS_timer_create, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func timer_settime(timerid int32, flags int32, new, old *itimerspec) int32
+TEXT runtime·timer_settime(SB),NOSPLIT,$0-28
+ MOVW timerid+0(FP), A0
+ MOVW flags+4(FP), A1
+ MOV new+8(FP), A2
+ MOV old+16(FP), A3
+ MOV $SYS_timer_settime, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func timer_delete(timerid int32) int32
+TEXT runtime·timer_delete(SB),NOSPLIT,$0-12
+ MOVW timerid+0(FP), A0
+ MOV $SYS_timer_delete, A7
+ ECALL
+ MOVW A0, ret+8(FP)
+ RET
+
+// func mincore(addr unsafe.Pointer, n uintptr, dst *byte) int32
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOV dst+16(FP), A2
+ MOV $SYS_mincore, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$40-12
+ MOV $CLOCK_REALTIME, A0
+
+ MOV runtime·vdsoClockgettimeSym(SB), A7
+ BEQZ A7, fallback
+ MOV X2, S2 // S2,S3,S4 is unchanged by C code
+ MOV g_m(g), S3 // S3 = m
+
+ // Save the old values on stack for reentrant
+ MOV m_vdsoPC(S3), T0
+ MOV T0, 24(X2)
+ MOV m_vdsoSP(S3), T0
+ MOV T0, 32(X2)
+
+ MOV RA, m_vdsoPC(S3)
+ MOV $ret-8(FP), T1 // caller's SP
+ MOV T1, m_vdsoSP(S3)
+
+ MOV m_curg(S3), T1
+ BNE g, T1, noswitch
+
+ MOV m_g0(S3), T1
+ MOV (g_sched+gobuf_sp)(T1), X2
+
+noswitch:
+ ADDI $-24, X2 // Space for result
+ ANDI $~7, X2 // Align for C code
+ MOV $8(X2), A1
+
+ // Store g on gsignal's stack, see sys_linux_arm64.s for detail
+ MOVBU runtime·iscgo(SB), S4
+ BNEZ S4, nosaveg
+ MOV m_gsignal(S3), S4 // g.m.gsignal
+ BEQZ S4, nosaveg
+ BEQ g, S4, nosaveg
+ MOV (g_stack+stack_lo)(S4), S4 // g.m.gsignal.stack.lo
+ MOV g, (S4)
+
+ JALR RA, A7
+
+ MOV ZERO, (S4)
+ JMP finish
+
+nosaveg:
+ JALR RA, A7
+
+finish:
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+
+ MOV S2, X2 // restore stack
+ MOV 24(X2), A2
+ MOV A2, m_vdsoPC(S3)
+
+ MOV 32(X2), A3
+ MOV A3, m_vdsoSP(S3)
+
+ MOV T0, sec+0(FP)
+ MOVW T1, nsec+8(FP)
+ RET
+
+fallback:
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, A7
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+ MOV T0, sec+0(FP)
+ MOVW T1, nsec+8(FP)
+ RET
+
+// func nanotime1() int64
+TEXT runtime·nanotime1(SB),NOSPLIT,$40-8
+ MOV $CLOCK_MONOTONIC, A0
+
+ MOV runtime·vdsoClockgettimeSym(SB), A7
+ BEQZ A7, fallback
+
+ MOV X2, S2 // S2 = RSP, S2 is unchanged by C code
+ MOV g_m(g), S3 // S3 = m
+ // Save the old values on stack for reentrant
+ MOV m_vdsoPC(S3), T0
+ MOV T0, 24(X2)
+ MOV m_vdsoSP(S3), T0
+ MOV T0, 32(X2)
+
+ MOV RA, m_vdsoPC(S3)
+ MOV $ret-8(FP), T0 // caller's SP
+ MOV T0, m_vdsoSP(S3)
+
+ MOV m_curg(S3), T1
+ BNE g, T1, noswitch
+
+ MOV m_g0(S3), T1
+ MOV (g_sched+gobuf_sp)(T1), X2
+
+noswitch:
+ ADDI $-24, X2 // Space for result
+ ANDI $~7, X2 // Align for C code
+ MOV $8(X2), A1
+
+ // Store g on gsignal's stack, see sys_linux_arm64.s for detail
+ MOVBU runtime·iscgo(SB), S4
+ BNEZ S4, nosaveg
+ MOV m_gsignal(S3), S4 // g.m.gsignal
+ BEQZ S4, nosaveg
+ BEQ g, S4, nosaveg
+ MOV (g_stack+stack_lo)(S4), S4 // g.m.gsignal.stack.lo
+ MOV g, (S4)
+
+ JALR RA, A7
+
+ MOV ZERO, (S4)
+ JMP finish
+
+nosaveg:
+ JALR RA, A7
+
+finish:
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+ // restore stack
+ MOV S2, X2
+ MOV 24(X2), T2
+ MOV T2, m_vdsoPC(S3)
+
+ MOV 32(X2), T2
+ MOV T2, m_vdsoSP(S3)
+ // sec is in T0, nsec in T1
+ // return nsec in T0
+ MOV $1000000000, T2
+ MUL T2, T0
+ ADD T1, T0
+ MOV T0, ret+0(FP)
+ RET
+
+fallback:
+ MOV $8(X2), A1
+ MOV $SYS_clock_gettime, A7
+ ECALL
+ MOV 8(X2), T0 // sec
+ MOV 16(X2), T1 // nsec
+ MOV $1000000000, T2
+ MUL T2, T0
+ ADD T1, T0
+ MOV T0, ret+0(FP)
+ RET
+
+// func rtsigprocmask(how int32, new, old *sigset, size int32)
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOVW size+24(FP), A3
+ MOV $SYS_rt_sigprocmask, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func rt_sigaction(sig uintptr, new, old *sigactiont, size uintptr) int32
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOV sig+0(FP), A0
+ MOV new+8(FP), A1
+ MOV old+16(FP), A2
+ MOV size+24(FP), A3
+ MOV $SYS_rt_sigaction, A7
+ ECALL
+ MOVW A0, ret+32(FP)
+ RET
+
+// func sigfwd(fn uintptr, sig uint32, info *siginfo, ctx unsafe.Pointer)
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), A0
+ MOV info+16(FP), A1
+ MOV ctx+24(FP), A2
+ MOV fn+0(FP), T1
+ JALR RA, T1
+ RET
+
+// func sigtramp(signo, ureg, ctxt unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$64
+ MOVW A0, 8(X2)
+ MOV A1, 16(X2)
+ MOV A2, 24(X2)
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVBU runtime·iscgo(SB), A0
+ BEQ A0, ZERO, 2(PC)
+ CALL runtime·load_g(SB)
+
+ MOV $runtime·sigtrampgo(SB), A0
+ JALR RA, A0
+ RET
+
+// func cgoSigtramp()
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ MOV $runtime·sigtramp(SB), T1
+ JALR ZERO, T1
+
+// func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (p unsafe.Pointer, err int)
+TEXT runtime·mmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW prot+16(FP), A2
+ MOVW flags+20(FP), A3
+ MOVW fd+24(FP), A4
+ MOVW off+28(FP), A5
+ MOV $SYS_mmap, A7
+ ECALL
+ MOV $-4096, T0
+ BGEU T0, A0, 5(PC)
+ SUB A0, ZERO, A0
+ MOV ZERO, p+32(FP)
+ MOV A0, err+40(FP)
+ RET
+ok:
+ MOV A0, p+32(FP)
+ MOV ZERO, err+40(FP)
+ RET
+
+// func munmap(addr unsafe.Pointer, n uintptr)
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOV $SYS_munmap, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func madvise(addr unsafe.Pointer, n uintptr, flags int32)
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOV n+8(FP), A1
+ MOVW flags+16(FP), A2
+ MOV $SYS_madvise, A7
+ ECALL
+ MOVW A0, ret+24(FP)
+ RET
+
+// func futex(addr unsafe.Pointer, op int32, val uint32, ts, addr2 unsafe.Pointer, val3 uint32) int32
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOV addr+0(FP), A0
+ MOVW op+8(FP), A1
+ MOVW val+12(FP), A2
+ MOV ts+16(FP), A3
+ MOV addr2+24(FP), A4
+ MOVW val3+32(FP), A5
+ MOV $SYS_futex, A7
+ ECALL
+ MOVW A0, ret+40(FP)
+ RET
+
+// func clone(flags int32, stk, mp, gp, fn unsafe.Pointer) int32
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), A0
+ MOV stk+8(FP), A1
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ MOV mp+16(FP), T0
+ MOV gp+24(FP), T1
+ MOV fn+32(FP), T2
+
+ MOV T0, -8(A1)
+ MOV T1, -16(A1)
+ MOV T2, -24(A1)
+ MOV $1234, T0
+ MOV T0, -32(A1)
+
+ MOV $SYS_clone, A7
+ ECALL
+
+ // In parent, return.
+ BEQ ZERO, A0, child
+	MOVW	A0, ret+40(FP)
+ RET
+
+child:
+ // In child, on new stack.
+ MOV -32(X2), T0
+ MOV $1234, A0
+ BEQ A0, T0, good
+ WORD $0 // crash
+
+good:
+ // Initialize m->procid to Linux tid
+ MOV $SYS_gettid, A7
+ ECALL
+
+ MOV -24(X2), T2 // fn
+ MOV -16(X2), T1 // g
+ MOV -8(X2), T0 // m
+
+ BEQ ZERO, T0, nog
+ BEQ ZERO, T1, nog
+
+ MOV A0, m_procid(T0)
+
+ // In child, set up new stack
+ MOV T0, g_m(T1)
+ MOV T1, g
+
+nog:
+ // Call fn
+ JALR RA, T2
+
+ // It shouldn't return. If it does, exit this thread.
+ MOV $111, A0
+ MOV $SYS_exit, A7
+ ECALL
+ JMP -3(PC) // keep exiting
+
+// func sigaltstack(new, old *stackt)
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOV new+0(FP), A0
+ MOV old+8(FP), A1
+ MOV $SYS_sigaltstack, A7
+ ECALL
+ MOV $-4096, T0
+ BLTU A0, T0, 2(PC)
+ WORD $0 // crash
+ RET
+
+// func osyield()
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOV $SYS_sched_yield, A7
+ ECALL
+ RET
+
+// func sched_getaffinity(pid, len uintptr, buf *uintptr) int32
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOV pid+0(FP), A0
+ MOV len+8(FP), A1
+ MOV buf+16(FP), A2
+ MOV $SYS_sched_getaffinity, A7
+ ECALL
+ MOV A0, ret+24(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT,$0-8
+ // Implemented as brk(NULL).
+ MOV $0, A0
+ MOV $SYS_brk, A7
+ ECALL
+	MOV	A0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_linux_s390x.s b/src/runtime/sys_linux_s390x.s
new file mode 100644
index 0000000..adf5612
--- /dev/null
+++ b/src/runtime/sys_linux_s390x.s
@@ -0,0 +1,606 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// System calls and other system stuff for Linux s390x; see
+// /usr/include/asm/unistd.h for the syscall number definitions.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_brk 45
+#define SYS_mmap 90
+#define SYS_munmap 91
+#define SYS_setitimer 104
+#define SYS_clone 120
+#define SYS_sched_yield 158
+#define SYS_nanosleep 162
+#define SYS_rt_sigreturn 173
+#define SYS_rt_sigaction 174
+#define SYS_rt_sigprocmask 175
+#define SYS_sigaltstack 186
+#define SYS_madvise 219
+#define SYS_mincore 218
+#define SYS_gettid 236
+#define SYS_futex 238
+#define SYS_sched_getaffinity 240
+#define SYS_tgkill 241
+#define SYS_exit_group 248
+#define SYS_timer_create 254
+#define SYS_timer_settime 255
+#define SYS_timer_delete 258
+#define SYS_clock_gettime 260
+#define SYS_pipe2 325
+
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW code+0(FP), R2
+ MOVW $SYS_exit_group, R1
+ SYSCALL
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT|NOFRAME,$0-8
+ MOVD wait+0(FP), R1
+ // We're done using the stack.
+ MOVW $0, R2
+ MOVW R2, (R1)
+ MOVW $0, R2 // exit code
+ MOVW $SYS_exit, R1
+ SYSCALL
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD name+0(FP), R2
+ MOVW mode+8(FP), R3
+ MOVW perm+12(FP), R4
+ MOVW $SYS_open, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW fd+0(FP), R2
+ MOVW $SYS_close, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD fd+0(FP), R2
+ MOVD p+8(FP), R3
+ MOVW n+16(FP), R4
+ MOVW $SYS_write, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW fd+0(FP), R2
+ MOVD p+8(FP), R3
+ MOVW n+16(FP), R4
+ MOVW $SYS_read, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// func pipe2() (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVD $r+8(FP), R2
+ MOVW flags+0(FP), R3
+ MOVW $SYS_pipe2, R1
+ SYSCALL
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16-4
+ MOVW usec+0(FP), R2
+ MOVD R2, R4
+ MOVW $1000000, R3
+ DIVD R3, R2
+ MOVD R2, 8(R15)
+ MOVW $1000, R3
+ MULLD R2, R3
+ SUB R3, R4
+ MOVD R4, 16(R15)
+
+ // nanosleep(&ts, 0)
+ ADD $8, R15, R2
+ MOVW $0, R3
+ MOVW $SYS_nanosleep, R1
+ SYSCALL
+ RET
+
+TEXT runtime·gettid(SB),NOSPLIT,$0-4
+ MOVW $SYS_gettid, R1
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·raise(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVW R2, R10
+ MOVW $SYS_gettid, R1
+ SYSCALL
+ MOVW R2, R3 // arg 2 tid
+ MOVW R10, R2 // arg 1 pid
+	MOVW	sig+0(FP), R4	// arg 3
+ MOVW $SYS_tgkill, R1
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVW R2, R2 // arg 1 pid
+ MOVW sig+0(FP), R3 // arg 2
+ MOVW $SYS_kill, R1
+ SYSCALL
+ RET
+
+TEXT ·getpid(SB),NOSPLIT|NOFRAME,$0-8
+ MOVW $SYS_getpid, R1
+ SYSCALL
+ MOVD R2, ret+0(FP)
+ RET
+
+TEXT ·tgkill(SB),NOSPLIT|NOFRAME,$0-24
+ MOVD tgid+0(FP), R2
+ MOVD tid+8(FP), R3
+ MOVD sig+16(FP), R4
+ MOVW $SYS_tgkill, R1
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0-24
+ MOVW mode+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVW $SYS_setitimer, R1
+ SYSCALL
+ RET
+
+TEXT runtime·timer_create(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW clockid+0(FP), R2
+ MOVD sevp+8(FP), R3
+ MOVD timerid+16(FP), R4
+ MOVW $SYS_timer_create, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·timer_settime(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW timerid+0(FP), R2
+ MOVW flags+4(FP), R3
+ MOVD new+8(FP), R4
+ MOVD old+16(FP), R5
+ MOVW $SYS_timer_settime, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·timer_delete(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW timerid+0(FP), R2
+ MOVW $SYS_timer_delete, R1
+ SYSCALL
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·mincore(SB),NOSPLIT|NOFRAME,$0-28
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVD dst+16(FP), R4
+ MOVW $SYS_mincore, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$32-12
+ MOVW $0, R2 // CLOCK_REALTIME
+ MOVD R15, R7 // Backup stack pointer
+
+ MOVD g_m(g), R6 //m
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R9 // Check for VDSO availability
+ CMPBEQ R9, $0, fallback
+
+ MOVD m_vdsoPC(R6), R4
+ MOVD R4, 16(R15)
+ MOVD m_vdsoSP(R6), R4
+ MOVD R4, 24(R15)
+
+ MOVD R14, R8 // Backup return address
+ MOVD $sec+0(FP), R4 // return parameter caller
+
+ MOVD R8, m_vdsoPC(R6)
+ MOVD R4, m_vdsoSP(R6)
+
+ MOVD m_curg(R6), R5
+ CMP g, R5
+ BNE noswitch
+
+ MOVD m_g0(R6), R4
+ MOVD (g_sched+gobuf_sp)(R4), R15 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R15 // reserve 2x 8 bytes for parameters
+ MOVD $~7, R4 // align to 8 bytes because of gcc ABI
+ AND R4, R15
+ MOVD R15, R3 // R15 needs to be in R3 as expected by kernel_clock_gettime
+
+ MOVB runtime·iscgo(SB),R12
+ CMPBNE R12, $0, nosaveg
+
+ MOVD m_gsignal(R6), R12 // g.m.gsignal
+ CMPBEQ R12, $0, nosaveg
+
+ CMPBEQ g, R12, nosaveg
+ MOVD (g_stack+stack_lo)(R12), R12 // g.m.gsignal.stack.lo
+ MOVD g, (R12)
+
+ BL R9 // to vdso lookup
+
+ MOVD $0, (R12)
+
+ JMP finish
+
+nosaveg:
+ BL R9 // to vdso lookup
+
+finish:
+ MOVD 0(R15), R3 // sec
+ MOVD 8(R15), R5 // nsec
+ MOVD R7, R15 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVD 24(R15), R12
+ MOVD R12, m_vdsoSP(R6)
+ MOVD 16(R15), R12
+ MOVD R12, m_vdsoPC(R6)
+
+return:
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MOVD R3, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ MOVD $tp-16(SP), R3
+ MOVW $SYS_clock_gettime, R1
+ SYSCALL
+ LMG tp-16(SP), R2, R3
+ // sec is in R2, nsec in R3
+ MOVD R2, sec+0(FP)
+ MOVW R3, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$32-8
+ MOVW $1, R2 // CLOCK_MONOTONIC
+
+ MOVD R15, R7 // Backup stack pointer
+
+ MOVD g_m(g), R6 //m
+
+ MOVD runtime·vdsoClockgettimeSym(SB), R9 // Check for VDSO availability
+ CMPBEQ R9, $0, fallback
+
+ MOVD m_vdsoPC(R6), R4
+ MOVD R4, 16(R15)
+ MOVD m_vdsoSP(R6), R4
+ MOVD R4, 24(R15)
+
+ MOVD R14, R8 // Backup return address
+ MOVD $ret+0(FP), R4 // caller's SP
+
+ MOVD R8, m_vdsoPC(R6)
+ MOVD R4, m_vdsoSP(R6)
+
+ MOVD m_curg(R6), R5
+ CMP g, R5
+ BNE noswitch
+
+ MOVD m_g0(R6), R4
+ MOVD (g_sched+gobuf_sp)(R4), R15 // Set SP to g0 stack
+
+noswitch:
+ SUB $16, R15 // reserve 2x 8 bytes for parameters
+ MOVD $~7, R4 // align to 8 bytes because of gcc ABI
+ AND R4, R15
+ MOVD R15, R3 // R15 needs to be in R3 as expected by kernel_clock_gettime
+
+ MOVB runtime·iscgo(SB),R12
+ CMPBNE R12, $0, nosaveg
+
+ MOVD m_gsignal(R6), R12 // g.m.gsignal
+ CMPBEQ R12, $0, nosaveg
+
+ CMPBEQ g, R12, nosaveg
+ MOVD (g_stack+stack_lo)(R12), R12 // g.m.gsignal.stack.lo
+ MOVD g, (R12)
+
+ BL R9 // to vdso lookup
+
+ MOVD $0, (R12)
+
+ JMP finish
+
+nosaveg:
+ BL R9 // to vdso lookup
+
+finish:
+ MOVD 0(R15), R3 // sec
+ MOVD 8(R15), R5 // nsec
+ MOVD R7, R15 // Restore SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+
+ MOVD 24(R15), R12
+ MOVD R12, m_vdsoSP(R6)
+ MOVD 16(R15), R12
+ MOVD R12, m_vdsoPC(R6)
+
+return:
+ // sec is in R3, nsec in R5
+ // return nsec in R3
+ MULLD $1000000000, R3
+ ADD R5, R3
+ MOVD R3, ret+0(FP)
+ RET
+
+ // Syscall fallback
+fallback:
+ MOVD $tp-16(SP), R3
+ MOVD $SYS_clock_gettime, R1
+ SYSCALL
+ LMG tp-16(SP), R2, R3
+ MOVD R3, R5
+ MOVD R2, R3
+ JMP return
+
+TEXT runtime·rtsigprocmask(SB),NOSPLIT|NOFRAME,$0-28
+ MOVW how+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVW size+24(FP), R5
+ MOVW $SYS_rt_sigprocmask, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·rt_sigaction(SB),NOSPLIT|NOFRAME,$0-36
+ MOVD sig+0(FP), R2
+ MOVD new+8(FP), R3
+ MOVD old+16(FP), R4
+ MOVD size+24(FP), R5
+ MOVW $SYS_rt_sigaction, R1
+ SYSCALL
+ MOVW R2, ret+32(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R2
+ MOVD info+16(FP), R3
+ MOVD ctx+24(FP), R4
+ MOVD fn+0(FP), R5
+ BL R5
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$64
+ // initialize essential registers (just in case)
+ XOR R0, R0
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R6
+ CMPBEQ R6, $0, 2(PC)
+ BL runtime·load_g(SB)
+
+ MOVW R2, 8(R15)
+ MOVD R3, 16(R15)
+ MOVD R4, 24(R15)
+ MOVD $runtime·sigtrampgo(SB), R5
+ BL R5
+ RET
+
+TEXT runtime·cgoSigtramp(SB),NOSPLIT,$0
+ BR runtime·sigtramp(SB)
+
+// func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) unsafe.Pointer
+TEXT runtime·mmap(SB),NOSPLIT,$48-48
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW prot+16(FP), R4
+ MOVW flags+20(FP), R5
+ MOVW fd+24(FP), R6
+ MOVWZ off+28(FP), R7
+
+ // s390x uses old_mmap, so the arguments need to be placed into
+ // a struct and a pointer to the struct passed to mmap.
+ MOVD R2, addr-48(SP)
+ MOVD R3, n-40(SP)
+ MOVD R4, prot-32(SP)
+ MOVD R5, flags-24(SP)
+ MOVD R6, fd-16(SP)
+ MOVD R7, off-8(SP)
+
+ MOVD $addr-48(SP), R2
+ MOVW $SYS_mmap, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, ok
+ NEG R2
+ MOVD $0, p+32(FP)
+ MOVD R2, err+40(FP)
+ RET
+ok:
+ MOVD R2, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
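
A sketch of the argument block the code above builds on the stack (field names are invented here for illustration; the old_mmap convention simply reads six machine words from the pointer passed in R2):

package sketch

// mmapArgBlock mirrors the six doublewords stored at addr-48(SP) through
// off-8(SP) above before the mmap SYSCALL is issued.
type mmapArgBlock struct {
	addr  uintptr
	n     uintptr
	prot  uintptr
	flags uintptr
	fd    uintptr
	off   uintptr
}
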
+
+TEXT runtime·munmap(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW $SYS_munmap, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVD n+8(FP), R3
+ MOVW flags+16(FP), R4
+ MOVW $SYS_madvise, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// int64 futex(int32 *uaddr, int32 op, int32 val,
+// struct timespec *timeout, int32 *uaddr2, int32 val2);
+TEXT runtime·futex(SB),NOSPLIT|NOFRAME,$0
+ MOVD addr+0(FP), R2
+ MOVW op+8(FP), R3
+ MOVW val+12(FP), R4
+ MOVD ts+16(FP), R5
+ MOVD addr2+24(FP), R6
+ MOVW val3+32(FP), R7
+ MOVW $SYS_futex, R1
+ SYSCALL
+ MOVW R2, ret+40(FP)
+ RET
+
+// int32 clone(int32 flags, void *stk, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·clone(SB),NOSPLIT|NOFRAME,$0
+ MOVW flags+0(FP), R3
+ MOVD stk+8(FP), R2
+
+ // Copy mp, gp, fn off parent stack for use by child.
+ // Careful: Linux system call clobbers ???.
+ MOVD mp+16(FP), R7
+ MOVD gp+24(FP), R8
+ MOVD fn+32(FP), R9
+
+ MOVD R7, -8(R2)
+ MOVD R8, -16(R2)
+ MOVD R9, -24(R2)
+ MOVD $1234, R7
+ MOVD R7, -32(R2)
+
+ SYSCALL $SYS_clone
+
+ // In parent, return.
+ CMPBEQ R2, $0, 3(PC)
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In child, on new stack.
+ // initialize essential registers
+ XOR R0, R0
+ MOVD -32(R15), R7
+ CMP R7, $1234
+ BEQ 2(PC)
+ MOVD R0, 0(R0)
+
+ // Initialize m->procid to Linux tid
+ SYSCALL $SYS_gettid
+
+ MOVD -24(R15), R9 // fn
+ MOVD -16(R15), R8 // g
+ MOVD -8(R15), R7 // m
+
+ CMPBEQ R7, $0, nog
+ CMP R8, $0
+ BEQ nog
+
+ MOVD R2, m_procid(R7)
+
+ // In child, set up new stack
+ MOVD R7, g_m(R8)
+ MOVD R8, g
+ //CALL runtime·stackcheck(SB)
+
+nog:
+ // Call fn
+ BL R9
+
+ // It shouldn't return. If it does, exit that thread.
+ MOVW $111, R2
+ MOVW $SYS_exit, R1
+ SYSCALL
+ BR -2(PC) // keep exiting
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVD new+0(FP), R2
+ MOVD old+8(FP), R3
+ MOVW $SYS_sigaltstack, R1
+ SYSCALL
+ MOVD $-4095, R3
+ CMPUBLT R2, R3, 2(PC)
+ MOVD R0, 0(R0) // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT|NOFRAME,$0
+ MOVW $SYS_sched_yield, R1
+ SYSCALL
+ RET
+
+TEXT runtime·sched_getaffinity(SB),NOSPLIT|NOFRAME,$0
+ MOVD pid+0(FP), R2
+ MOVD len+8(FP), R3
+ MOVD buf+16(FP), R4
+ MOVW $SYS_sched_getaffinity, R1
+ SYSCALL
+ MOVW R2, ret+24(FP)
+ RET
+
+// func sbrk0() uintptr
+TEXT runtime·sbrk0(SB),NOSPLIT|NOFRAME,$0-8
+ // Implemented as brk(NULL).
+ MOVD $0, R2
+ MOVW $SYS_brk, R1
+ SYSCALL
+ MOVD R2, ret+0(FP)
+ RET
+
+TEXT runtime·access(SB),$0-20
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·connect(SB),$0-28
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·socket(SB),$0-20
+ MOVD $0, 2(R0) // unimplemented, only needed for android; declared in stubs_linux.go
+ MOVW R0, ret+16(FP)
+ RET
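
For reference, walltime and nanotime1 above both reduce to a single clock_gettime call (CLOCK_REALTIME and CLOCK_MONOTONIC respectively), made through the vDSO when runtime·vdsoClockgettimeSym is non-zero and through a raw SYSCALL otherwise, followed in the monotonic case by sec*1e9+nsec. A minimal user-space sketch of that final computation, assuming golang.org/x/sys/unix is available (the runtime itself never goes through that package):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// monotonicNanos mirrors the arithmetic at the end of nanotime1:
// sec*1_000_000_000 + nsec from a CLOCK_MONOTONIC reading.
func monotonicNanos() (int64, error) {
	var ts unix.Timespec
	if err := unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts); err != nil {
		return 0, err
	}
	return int64(ts.Sec)*1_000_000_000 + int64(ts.Nsec), nil
}

func main() {
	ns, err := monotonicNanos()
	fmt.Println(ns, err)
}
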
diff --git a/src/runtime/sys_loong64.go b/src/runtime/sys_loong64.go
new file mode 100644
index 0000000..812db5c
--- /dev/null
+++ b/src/runtime/sys_loong64.go
@@ -0,0 +1,20 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build loong64
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
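
The comment above is terse: gostartcall moves the old buf.pc into buf.lr (the saved return address) and installs fn as the new buf.pc, so when the scheduler later resumes this gobuf, execution starts in fn and fn's eventual return lands wherever buf.pc originally pointed (typically goexit). A toy reproduction of just that shuffle, using an invented struct rather than the runtime's unexported gobuf:

package sketch

import "unsafe"

// toyGobuf stands in for the fields gostartcall touches.
type toyGobuf struct {
	pc, lr uintptr
	ctxt   unsafe.Pointer
}

// toyGostartcall performs the same lr/pc swap as the runtime version.
func toyGostartcall(buf *toyGobuf, fn, ctxt unsafe.Pointer) {
	if buf.lr != 0 {
		panic("invalid use of gostartcall")
	}
	buf.lr = buf.pc      // old pc becomes the return address
	buf.pc = uintptr(fn) // resume in fn
	buf.ctxt = ctxt      // closure context for fn
}
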
diff --git a/src/runtime/sys_mips64x.go b/src/runtime/sys_mips64x.go
new file mode 100644
index 0000000..b715384
--- /dev/null
+++ b/src/runtime/sys_mips64x.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_mipsx.go b/src/runtime/sys_mipsx.go
new file mode 100644
index 0000000..b60135f
--- /dev/null
+++ b/src/runtime/sys_mipsx.go
@@ -0,0 +1,20 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_netbsd_386.s b/src/runtime/sys_netbsd_386.s
new file mode 100644
index 0000000..f4875cd
--- /dev/null
+++ b/src/runtime/sys_netbsd_386.s
@@ -0,0 +1,477 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_issetugid 305
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_setprivate 317
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS____lwp_park60 478
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-4
+ MOVL $SYS_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVL wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $SYS__lwp_exit, AX
+ INT $0x80
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-4
+ MOVL $SYS_open, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-4
+ MOVL $SYS_close, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-4
+ MOVL $SYS_read, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$12-16
+	MOVL	$453, AX	// SYS_pipe2
+ LEAL r+4(FP), BX
+ MOVL BX, 4(SP)
+ MOVL flags+0(FP), BX
+ MOVL BX, 8(SP)
+ INT $0x80
+ MOVL AX, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-4
+ MOVL $SYS_write, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX // caller expects negative errno
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVL AX, 12(SP) // tv_sec - l32
+ MOVL $0, 16(SP) // tv_sec - h32
+ MOVL $1000, AX
+ MULL DX
+ MOVL AX, 20(SP) // tv_nsec
+
+ MOVL $0, 0(SP)
+ LEAL 12(SP), AX
+ MOVL AX, 4(SP) // arg 1 - rqtp
+ MOVL $0, 8(SP) // arg 2 - rmtp
+ MOVL $SYS___nanosleep50, AX
+ INT $0x80
+ RET
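
The DIVL above splits the microsecond count in a single instruction: the quotient (AX) becomes tv_sec and the remainder (DX), scaled by 1000 via MULL, becomes tv_nsec. The same conversion in plain Go (a sketch, not runtime code):

package sketch

// usecToTimespec mirrors the DIVL/MULL sequence in usleep above.
func usecToTimespec(usec uint32) (sec, nsec uint32) {
	sec = usec / 1_000_000           // DIVL quotient
	nsec = (usec % 1_000_000) * 1000 // DIVL remainder, scaled to nanoseconds
	return
}
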
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$12-8
+ MOVL $0, 0(SP)
+ MOVL tid+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - target
+ MOVL sig+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signo
+ MOVL $SYS__lwp_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$12
+ MOVL $SYS_getpid, AX
+ INT $0x80
+ MOVL $0, 0(SP)
+ MOVL AX, 4(SP) // arg 1 - pid
+ MOVL sig+0(FP), AX
+ MOVL AX, 8(SP) // arg 2 - signo
+ MOVL $SYS_kill, AX
+ INT $0x80
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$36
+ LEAL addr+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - addr
+ MOVSL // arg 2 - len
+ MOVSL // arg 3 - prot
+ MOVSL // arg 4 - flags
+ MOVSL // arg 5 - fd
+ MOVL $0, AX
+ STOSL // arg 6 - pad
+ MOVSL // arg 7 - offset
+ MOVL $0, AX // top 32 bits of file offset
+ STOSL
+ MOVL $SYS_mmap, AX
+ INT $0x80
+ JAE ok
+ MOVL $0, p+24(FP)
+ MOVL AX, err+28(FP)
+ RET
+ok:
+ MOVL AX, p+24(FP)
+ MOVL $0, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$-4
+ MOVL $SYS_munmap, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$-4
+ MOVL $SYS_madvise, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-4
+ MOVL $SYS___setitimer50, AX
+ INT $0x80
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ LEAL 12(SP), BX
+ MOVL $CLOCK_REALTIME, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ INT $0x80
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL AX, sec_lo+0(FP)
+ MOVL 16(SP), AX // sec - h32
+ MOVL AX, sec_hi+4(FP)
+
+ MOVL 20(SP), BX // nsec
+ MOVL BX, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ LEAL 12(SP), BX
+ MOVL $CLOCK_MONOTONIC, 4(SP) // arg 1 - clock_id
+ MOVL BX, 8(SP) // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ INT $0x80
+
+ MOVL 16(SP), CX // sec - h32
+ IMULL $1000000000, CX
+
+ MOVL 12(SP), AX // sec - l32
+ MOVL $1000000000, BX
+ MULL BX // result in dx:ax
+
+ MOVL 20(SP), BX // nsec
+ ADDL BX, AX
+ ADCL CX, DX // add high bits with carry
+
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
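
The nanotime1 sequence above is a 64x32-bit multiply built from 32-bit instructions: MULL yields the full 64-bit product of the low second word in DX:AX, IMULL keeps only the low 32 bits of the high word's product (which occupy the result's upper half), and the nanoseconds are added with carry. A plain-Go sketch of the same computation, assuming the same truncation to 64 bits:

package sketch

// nanosFrom32 reproduces the 32-bit arithmetic in nanotime1 above;
// secLo/secHi are the two halves of the 64-bit second count.
func nanosFrom32(secLo, secHi, nsec uint32) uint64 {
	lo := uint64(secLo) * 1_000_000_000     // MULL: full 64-bit product
	hi := uint64(secHi*1_000_000_000) << 32 // IMULL: low 32 bits only, shifted into the upper half
	return lo + hi + uint64(nsec)           // ADDL/ADCL: add nsec with carry propagation
}
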
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-4
+ MOVL $SYS_getcontext, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$-4
+ MOVL $SYS___sigprocmask14, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$0
+ LEAL 140(SP), AX // Load address of ucontext
+ MOVL AX, 4(SP)
+ MOVL $SYS_setcontext, AX
+ INT $0x80
+ MOVL $-1, 4(SP) // Something failed...
+ MOVL $SYS_exit, AX
+ INT $0x80
+
+TEXT runtime·sigaction(SB),NOSPLIT,$24
+ LEAL sig+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - sig
+ MOVSL // arg 2 - act
+ MOVSL // arg 3 - oact
+ LEAL sigreturn_tramp<>(SB), AX
+ STOSL // arg 4 - tramp
+ MOVL $2, AX
+ STOSL // arg 5 - vers
+ MOVL $SYS___sigaction_sigtramp, AX
+ INT $0x80
+ JAE 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$12-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $-15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$28
+ NOP SP // tell vet SP changed - stop checking offsets
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL 32(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 36(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 40(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid);
+TEXT runtime·lwp_create(SB),NOSPLIT,$16
+ MOVL $0, 0(SP)
+ MOVL ctxt+0(FP), AX
+ MOVL AX, 4(SP) // arg 1 - context
+ MOVL flags+4(FP), AX
+ MOVL AX, 8(SP) // arg 2 - flags
+ MOVL lwpid+8(FP), AX
+ MOVL AX, 12(SP) // arg 3 - lwpid
+ MOVL $SYS__lwp_create, AX
+ INT $0x80
+ JCC 2(PC)
+ NEGL AX
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+
+ // Set FS to point at m->tls
+ LEAL m_tls(BX), BP
+ PUSHAL // save registers
+ PUSHL BP
+ CALL lwp_setprivate<>(SB)
+ POPL AX
+ POPAL
+
+ // Now segment is established. Initialize m, g.
+ get_tls(AX)
+ MOVL DX, g(AX)
+ MOVL BX, g_m(DX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ MOVL 0(DX), DX // paranoia; check they are not nil
+ MOVL 0(BX), BX
+
+ // more paranoia; check that stack splitting code works
+ PUSHAL
+ CALL runtime·emptyfunc(SB)
+ POPAL
+
+ // Call fn
+ CALL SI
+
+ // fn should never return
+ MOVL $0x1234, 0x1005
+ RET
+
+TEXT ·netbsdMstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL ·netbsdMstart0(SB)
+ RET // not reached
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVL $SYS___sigaltstack14, AX
+ MOVL new+0(FP), BX
+ MOVL old+4(FP), CX
+ INT $0x80
+ CMPL AX, $0xfffff001
+ JLS 2(PC)
+ INT $3
+ RET
+
+TEXT runtime·setldt(SB),NOSPLIT,$8
+ // Under NetBSD we set the GS base instead of messing with the LDT.
+ MOVL base+4(FP), AX
+ MOVL AX, 0(SP)
+ CALL lwp_setprivate<>(SB)
+ RET
+
+TEXT lwp_setprivate<>(SB),NOSPLIT,$16
+ // adjust for ELF: wants to use -4(GS) for g
+ MOVL base+0(FP), CX
+ ADDL $4, CX
+ MOVL $0, 0(SP) // syscall gap
+ MOVL CX, 4(SP) // arg 1 - ptr
+ MOVL $SYS__lwp_setprivate, AX
+ INT $0x80
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$-4
+ MOVL $SYS_sched_yield, AX
+ INT $0x80
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$-4
+ MOVL $SYS____lwp_park60, AX
+ INT $0x80
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$-4
+ MOVL $SYS__lwp_unpark, AX
+ INT $0x80
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$-4
+ MOVL $SYS__lwp_self, AX
+ INT $0x80
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$28
+ LEAL mib+0(FP), SI
+ LEAL 4(SP), DI
+ CLD
+ MOVSL // arg 1 - name
+ MOVSL // arg 2 - namelen
+ MOVSL // arg 3 - oldp
+ MOVSL // arg 4 - oldlenp
+ MOVSL // arg 5 - newp
+ MOVSL // arg 6 - newlen
+ MOVL $SYS___sysctl, AX
+ INT $0x80
+ JAE 4(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+GLOBL runtime·tlsoffset(SB),NOPTR,$4
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVL $SYS_kqueue, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL $SYS___kevent50, AX
+ INT $0x80
+ JAE 2(PC)
+ NEGL AX
+ MOVL AX, ret+24(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$-4
+ MOVL $SYS_fcntl, AX
+ INT $0x80
+ JAE noerr
+ MOVL $-1, ret+12(FP)
+ MOVL AX, errno+16(FP)
+ RET
+noerr:
+ MOVL AX, ret+12(FP)
+ MOVL $0, errno+16(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVL $SYS_issetugid, AX
+ INT $0x80
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_netbsd_amd64.s b/src/runtime/sys_netbsd_amd64.s
new file mode 100644
index 0000000..2f1ddcd
--- /dev/null
+++ b/src/runtime/sys_netbsd_amd64.s
@@ -0,0 +1,458 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_issetugid 305
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_setprivate 317
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS____lwp_park60 478
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVQ ctxt+0(FP), DI
+ MOVQ flags+8(FP), SI
+ MOVQ lwpid+16(FP), DX
+ MOVL $SYS__lwp_create, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+
+ // Set FS to point at m->tls.
+ LEAQ m_tls(R8), DI
+ CALL runtime·settls(SB)
+
+ // Set up new stack.
+ get_tls(CX)
+ MOVQ R8, g_m(R9)
+ MOVQ R9, g(CX)
+ CALL runtime·stackcheck(SB)
+
+ // Call fn. This is an ABI0 PC.
+ CALL R12
+
+ // It shouldn't return. If it does, exit.
+ MOVL $SYS__lwp_exit, AX
+ SYSCALL
+ JMP -3(PC) // keep exiting
+
+TEXT ·netbsdMstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL ·netbsdMstart0(SB)
+ RET // not reached
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVL $SYS_sched_yield, AX
+ SYSCALL
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$0
+ MOVL clockid+0(FP), DI // arg 1 - clockid
+ MOVL flags+4(FP), SI // arg 2 - flags
+ MOVQ ts+8(FP), DX // arg 3 - ts
+ MOVL unpark+16(FP), R10 // arg 4 - unpark
+ MOVQ hint+24(FP), R8 // arg 5 - hint
+ MOVQ unparkhint+32(FP), R9 // arg 6 - unparkhint
+ MOVL $SYS____lwp_park60, AX
+ SYSCALL
+ MOVL AX, ret+40(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVL lwp+0(FP), DI // arg 1 - lwp
+ MOVQ hint+8(FP), SI // arg 2 - hint
+ MOVL $SYS__lwp_unpark, AX
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ MOVL $SYS__lwp_self, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVL code+0(FP), DI // arg 1 - exit status
+ MOVL $SYS_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVQ wait+0(FP), AX
+ // We're done using the stack.
+ MOVL $0, (AX)
+ MOVL $SYS__lwp_exit, AX
+ SYSCALL
+ MOVL $0xf1, 0xf1 // crash
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT,$-8
+ MOVQ name+0(FP), DI // arg 1 pathname
+ MOVL mode+8(FP), SI // arg 2 flags
+ MOVL perm+12(FP), DX // arg 3 mode
+ MOVL $SYS_open, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVL $SYS_close, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT,$-8
+ MOVL fd+0(FP), DI // arg 1 fd
+ MOVQ p+8(FP), SI // arg 2 buf
+ MOVL n+16(FP), DX // arg 3 count
+ MOVL $SYS_read, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-20
+ LEAQ r+8(FP), DI
+ MOVL flags+0(FP), SI
+	MOVL	$453, AX	// SYS_pipe2
+ SYSCALL
+ MOVL AX, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVQ fd+0(FP), DI // arg 1 - fd
+ MOVQ p+8(FP), SI // arg 2 - buf
+ MOVL n+16(FP), DX // arg 3 - nbyte
+ MOVL $SYS_write, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX // caller expects negative errno
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVL $0, DX
+ MOVL usec+0(FP), AX
+ MOVL $1000000, CX
+ DIVL CX
+ MOVQ AX, 0(SP) // tv_sec
+ MOVL $1000, AX
+ MULL DX
+ MOVQ AX, 8(SP) // tv_nsec
+
+ MOVQ SP, DI // arg 1 - rqtp
+ MOVQ $0, SI // arg 2 - rmtp
+ MOVL $SYS___nanosleep50, AX
+ SYSCALL
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVL tid+0(FP), DI // arg 1 - target
+ MOVQ sig+8(FP), SI // arg 2 - signo
+ MOVL $SYS__lwp_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ MOVL $SYS_getpid, AX
+ SYSCALL
+ MOVQ AX, DI // arg 1 - pid
+ MOVL sig+0(FP), SI // arg 2 - signo
+ MOVL $SYS_kill, AX
+ SYSCALL
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-8
+ MOVL mode+0(FP), DI // arg 1 - which
+ MOVQ new+8(FP), SI // arg 2 - itv
+ MOVQ old+16(FP), DX // arg 3 - oitv
+ MOVL $SYS___setitimer50, AX
+ SYSCALL
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ MOVQ $CLOCK_REALTIME, DI // arg 1 - clock_id
+ LEAQ 8(SP), SI // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ MOVQ AX, sec+0(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ MOVQ $CLOCK_MONOTONIC, DI // arg 1 - clock_id
+ LEAQ 8(SP), SI // arg 2 - tp
+ MOVL $SYS___clock_gettime50, AX
+ SYSCALL
+ MOVQ 8(SP), AX // sec
+ MOVQ 16(SP), DX // nsec
+
+ // sec is in AX, nsec in DX
+ // return nsec in AX
+ IMULQ $1000000000, AX
+ ADDQ DX, AX
+ MOVQ AX, ret+0(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-8
+ MOVQ ctxt+0(FP), DI // arg 1 - context
+ MOVL $SYS_getcontext, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVL how+0(FP), DI // arg 1 - how
+ MOVQ new+8(FP), SI // arg 2 - set
+ MOVQ old+16(FP), DX // arg 3 - oset
+ MOVL $SYS___sigprocmask14, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$-8
+ MOVQ R15, DI // Load address of ucontext
+ MOVQ $SYS_setcontext, AX
+ SYSCALL
+ MOVQ $-1, DI // Something failed...
+ MOVL $SYS_exit, AX
+ SYSCALL
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVL sig+0(FP), DI // arg 1 - signum
+ MOVQ new+8(FP), SI // arg 2 - nsa
+ MOVQ old+16(FP), DX // arg 3 - osa
+ // arg 4 - tramp
+ LEAQ sigreturn_tramp<>(SB), R10
+ MOVQ $2, R8 // arg 5 - vers
+ MOVL $SYS___sigaction_sigtramp, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL prot+16(FP), DX // arg 3 - prot
+ MOVL flags+20(FP), R10 // arg 4 - flags
+ MOVL fd+24(FP), R8 // arg 5 - fd
+ MOVL off+28(FP), R9
+ SUBQ $16, SP
+ MOVQ R9, 8(SP) // arg 7 - offset (passed on stack)
+ MOVQ $0, R9 // arg 6 - pad
+ MOVL $SYS_mmap, AX
+ SYSCALL
+ JCC ok
+ ADDQ $16, SP
+ MOVQ $0, p+32(FP)
+ MOVQ AX, err+40(FP)
+ RET
+ok:
+ ADDQ $16, SP
+ MOVQ AX, p+32(FP)
+ MOVQ $0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL $SYS_munmap, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVQ addr+0(FP), DI // arg 1 - addr
+ MOVQ n+8(FP), SI // arg 2 - len
+ MOVL flags+16(FP), DX // arg 3 - behav
+ MOVQ $SYS_madvise, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $-1, AX
+ MOVL AX, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$-8
+ MOVQ new+0(FP), DI // arg 1 - nss
+ MOVQ old+8(FP), SI // arg 2 - oss
+ MOVQ $SYS___sigaltstack14, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$8
+ // adjust for ELF: wants to use -8(FS) for g
+ ADDQ $8, DI // arg 1 - ptr
+ MOVQ $SYS__lwp_setprivate, AX
+ SYSCALL
+ JCC 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVQ mib+0(FP), DI // arg 1 - name
+ MOVL miblen+8(FP), SI // arg 2 - namelen
+ MOVQ out+16(FP), DX // arg 3 - oldp
+ MOVQ size+24(FP), R10 // arg 4 - oldlenp
+ MOVQ dst+32(FP), R8 // arg 5 - newp
+ MOVQ ndst+40(FP), R9 // arg 6 - newlen
+ MOVQ $SYS___sysctl, AX
+ SYSCALL
+ JCC 4(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+ MOVL $0, AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVL $SYS_kqueue, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVL kq+0(FP), DI
+ MOVQ ch+8(FP), SI
+ MOVL nch+16(FP), DX
+ MOVQ ev+24(FP), R10
+ MOVL nev+32(FP), R8
+ MOVQ ts+40(FP), R9
+ MOVL $SYS___kevent50, AX
+ SYSCALL
+ JCC 2(PC)
+ NEGQ AX
+ MOVL AX, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVL fd+0(FP), DI // fd
+ MOVL cmd+4(FP), SI // cmd
+ MOVL arg+8(FP), DX // arg
+ MOVL $SYS_fcntl, AX
+ SYSCALL
+ JCC noerr
+ MOVL $-1, ret+16(FP)
+ MOVL AX, errno+20(FP)
+ RET
+noerr:
+ MOVL AX, ret+16(FP)
+ MOVL $0, errno+20(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVQ $0, DI
+ MOVQ $0, SI
+ MOVQ $0, DX
+ MOVL $SYS_issetugid, AX
+ SYSCALL
+ MOVL AX, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_netbsd_arm.s b/src/runtime/sys_netbsd_arm.s
new file mode 100644
index 0000000..960c419
--- /dev/null
+++ b/src/runtime/sys_netbsd_arm.s
@@ -0,0 +1,427 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, NetBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+
+#define SWI_OS_NETBSD 0xa00000
+#define SYS_exit SWI_OS_NETBSD | 1
+#define SYS_read SWI_OS_NETBSD | 3
+#define SYS_write SWI_OS_NETBSD | 4
+#define SYS_open SWI_OS_NETBSD | 5
+#define SYS_close SWI_OS_NETBSD | 6
+#define SYS_getpid SWI_OS_NETBSD | 20
+#define SYS_kill SWI_OS_NETBSD | 37
+#define SYS_munmap SWI_OS_NETBSD | 73
+#define SYS_madvise SWI_OS_NETBSD | 75
+#define SYS_fcntl SWI_OS_NETBSD | 92
+#define SYS_mmap SWI_OS_NETBSD | 197
+#define SYS___sysctl SWI_OS_NETBSD | 202
+#define SYS___sigaltstack14 SWI_OS_NETBSD | 281
+#define SYS___sigprocmask14 SWI_OS_NETBSD | 293
+#define SYS_issetugid SWI_OS_NETBSD | 305
+#define SYS_getcontext SWI_OS_NETBSD | 307
+#define SYS_setcontext SWI_OS_NETBSD | 308
+#define SYS__lwp_create SWI_OS_NETBSD | 309
+#define SYS__lwp_exit SWI_OS_NETBSD | 310
+#define SYS__lwp_self SWI_OS_NETBSD | 311
+#define SYS__lwp_getprivate SWI_OS_NETBSD | 316
+#define SYS__lwp_setprivate SWI_OS_NETBSD | 317
+#define SYS__lwp_kill SWI_OS_NETBSD | 318
+#define SYS__lwp_unpark SWI_OS_NETBSD | 321
+#define SYS___sigaction_sigtramp SWI_OS_NETBSD | 340
+#define SYS_kqueue SWI_OS_NETBSD | 344
+#define SYS_sched_yield SWI_OS_NETBSD | 350
+#define SYS___setitimer50 SWI_OS_NETBSD | 425
+#define SYS___clock_gettime50 SWI_OS_NETBSD | 427
+#define SYS___nanosleep50 SWI_OS_NETBSD | 430
+#define SYS___kevent50 SWI_OS_NETBSD | 435
+#define SYS____lwp_park60 SWI_OS_NETBSD | 478
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R0 // arg 1 exit status
+ SWI $SYS_exit
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-4
+ MOVW wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R2
+storeloop:
+ LDREX (R0), R4 // loads R4
+ STREX R2, (R0), R1 // stores R2
+ CMP $0, R1
+ BNE storeloop
+ SWI $SYS__lwp_exit
+ MOVW $1, R8 // crash
+ MOVW R8, (R8)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVW name+0(FP), R0
+ MOVW mode+4(FP), R1
+ MOVW perm+8(FP), R2
+ SWI $SYS_open
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0
+ SWI $SYS_close
+ MOVW.CS $-1, R0
+ MOVW R0, ret+4(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0
+ MOVW p+4(FP), R1
+ MOVW n+8(FP), R2
+ SWI $SYS_read
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT,$0-16
+ MOVW $r+4(FP), R0
+ MOVW flags+0(FP), R1
+	SWI $0xa001c5	// SYS_pipe2 (SWI_OS_NETBSD | 453)
+ MOVW R0, errno+12(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVW p+4(FP), R1 // arg 2 - buf
+ MOVW n+8(FP), R2 // arg 3 - nbyte
+ SWI $SYS_write
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R0, ret+12(FP)
+ RET
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVW ctxt+0(FP), R0
+ MOVW flags+4(FP), R1
+ MOVW lwpid+8(FP), R2
+ SWI $SYS__lwp_create
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ SWI $SYS_sched_yield
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$8
+ MOVW clockid+0(FP), R0 // arg 1 - clock_id
+ MOVW flags+4(FP), R1 // arg 2 - flags
+ MOVW ts+8(FP), R2 // arg 3 - ts
+ MOVW unpark+12(FP), R3 // arg 4 - unpark
+ MOVW hint+16(FP), R4 // arg 5 - hint
+ MOVW R4, 4(R13)
+ MOVW unparkhint+20(FP), R5 // arg 6 - unparkhint
+ MOVW R5, 8(R13)
+ SWI $SYS____lwp_park60
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVW lwp+0(FP), R0 // arg 1 - lwp
+ MOVW hint+4(FP), R1 // arg 2 - hint
+ SWI $SYS__lwp_unpark
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ SWI $SYS__lwp_self
+ MOVW R0, ret+0(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+ MOVW R0, g_m(R1)
+ MOVW R1, g
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL (R2)
+ MOVW $2, R8 // crash (not reached)
+ MOVW R8, (R8)
+ RET
+
+TEXT ·netbsdMstart(SB),NOSPLIT|TOPFRAME,$0
+ BL ·netbsdMstart0(SB)
+ RET // not reached
+
+TEXT runtime·usleep(SB),NOSPLIT,$16
+ MOVW usec+0(FP), R0
+ CALL runtime·usplitR0(SB)
+ // 0(R13) is the saved LR, don't use it
+ MOVW R0, 4(R13) // tv_sec.low
+ MOVW $0, R0
+ MOVW R0, 8(R13) // tv_sec.high
+ MOVW $1000, R2
+ MUL R1, R2
+ MOVW R2, 12(R13) // tv_nsec
+
+ MOVW $4(R13), R0 // arg 1 - rqtp
+ MOVW $0, R1 // arg 2 - rmtp
+ SWI $SYS___nanosleep50
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-8
+ MOVW tid+0(FP), R0 // arg 1 - tid
+ MOVW sig+4(FP), R1 // arg 2 - signal
+ SWI $SYS__lwp_kill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ SWI $SYS_getpid // the returned R0 is arg 1
+ MOVW sig+0(FP), R1 // arg 2 - signal
+ SWI $SYS_kill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT|NOFRAME,$0
+ MOVW mode+0(FP), R0 // arg 1 - which
+ MOVW new+4(FP), R1 // arg 2 - itv
+ MOVW old+8(FP), R2 // arg 3 - oitv
+ SWI $SYS___setitimer50
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ MOVW $0, R0 // CLOCK_REALTIME
+ MOVW $8(R13), R1
+ SWI $SYS___clock_gettime50
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R1 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW R0, sec_lo+0(FP)
+ MOVW R1, sec_hi+4(FP)
+ MOVW R2, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVW $3, R0 // CLOCK_MONOTONIC
+ MOVW $8(R13), R1
+ SWI $SYS___clock_gettime50
+
+ MOVW 8(R13), R0 // sec.low
+ MOVW 12(R13), R4 // sec.high
+ MOVW 16(R13), R2 // nsec
+
+ MOVW $1000000000, R3
+ MULLU R0, R3, (R1, R0)
+ MUL R3, R4
+ ADD.S R2, R0
+ ADC R4, R1
+
+ MOVW R0, ret_lo+0(FP)
+ MOVW R1, ret_hi+4(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT|NOFRAME,$0
+ MOVW ctxt+0(FP), R0 // arg 1 - context
+ SWI $SYS_getcontext
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVW new+4(FP), R1 // arg 2 - set
+ MOVW old+8(FP), R2 // arg 3 - oset
+ SWI $SYS___sigprocmask14
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT|NOFRAME,$0
+ // on entry, SP points to siginfo, we add sizeof(ucontext)
+ // to SP to get a pointer to ucontext.
+ ADD $0x80, R13, R0 // 0x80 == sizeof(UcontextT)
+ SWI $SYS_setcontext
+ // something failed, we have to exit
+ MOVW $0x4242, R0 // magic return number
+ SWI $SYS_exit
+ B -2(PC) // continue exit
+
+TEXT runtime·sigaction(SB),NOSPLIT,$4
+ MOVW sig+0(FP), R0 // arg 1 - signum
+ MOVW new+4(FP), R1 // arg 2 - nsa
+ MOVW old+8(FP), R2 // arg 3 - osa
+ MOVW $sigreturn_tramp<>(SB), R3 // arg 4 - tramp
+ MOVW $2, R4 // arg 5 - vers
+ MOVW R4, 4(R13)
+ ADD $4, R13 // pass arg 5 on stack
+ SWI $SYS___sigaction_sigtramp
+ SUB $4, R13
+ MOVW.CS $3, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R11
+ MOVW R13, R4
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R11)
+ MOVW R4, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVW R0, 4(R13) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ BL.NE runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$12
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW prot+8(FP), R2 // arg 3 - prot
+ MOVW flags+12(FP), R3 // arg 4 - flags
+	// arg 5 (fd) and arg 6 (offset_lo, offset_hi) are passed on stack
+ // note the C runtime only passes the 32-bit offset_lo to us
+ MOVW fd+16(FP), R4 // arg 5
+ MOVW R4, 4(R13)
+ MOVW off+20(FP), R5 // arg 6 lower 32-bit
+ MOVW R5, 8(R13)
+ MOVW $0, R6 // higher 32-bit for arg 6
+ MOVW R6, 12(R13)
+ ADD $4, R13 // pass arg 5 and arg 6 on stack
+ SWI $SYS_mmap
+ SUB $4, R13
+ MOVW $0, R1
+ MOVW.CS R0, R1 // if error, move to R1
+ MOVW.CS $0, R0
+ MOVW R0, p+24(FP)
+ MOVW R1, err+28(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ SWI $SYS_munmap
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVW addr+0(FP), R0 // arg 1 - addr
+ MOVW n+4(FP), R1 // arg 2 - len
+ MOVW flags+8(FP), R2 // arg 3 - behav
+ SWI $SYS_madvise
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT|NOFRAME,$0
+ MOVW new+0(FP), R0 // arg 1 - nss
+ MOVW old+4(FP), R1 // arg 2 - oss
+ SWI $SYS___sigaltstack14
+ MOVW.CS $0, R8 // crash on syscall failure
+ MOVW.CS R8, (R8)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$8
+ MOVW mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+4(FP), R1 // arg 2 - namelen
+ MOVW out+8(FP), R2 // arg 3 - oldp
+ MOVW size+12(FP), R3 // arg 4 - oldlenp
+ MOVW dst+16(FP), R4 // arg 5 - newp
+ MOVW R4, 4(R13)
+ MOVW ndst+20(FP), R4 // arg 6 - newlen
+ MOVW R4, 8(R13)
+ ADD $4, R13 // pass arg 5 and 6 on stack
+ SWI $SYS___sysctl
+ SUB $4, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ SWI $SYS_kqueue
+ RSB.CS $0, R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$8
+ MOVW kq+0(FP), R0 // kq
+ MOVW ch+4(FP), R1 // changelist
+ MOVW nch+8(FP), R2 // nchanges
+ MOVW ev+12(FP), R3 // eventlist
+ MOVW nev+16(FP), R4 // nevents
+ MOVW R4, 4(R13)
+ MOVW ts+20(FP), R4 // timeout
+ MOVW R4, 8(R13)
+ ADD $4, R13 // pass arg 5 and 6 on stack
+ SWI $SYS___kevent50
+ RSB.CS $0, R0
+ SUB $4, R13
+ MOVW R0, ret+24(FP)
+ RET
+
+// func fcntl(fd, cmd, args int32) int32
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0
+ MOVW cmd+4(FP), R1
+ MOVW arg+8(FP), R2
+ SWI $SYS_fcntl
+ MOVW $0, R1
+ MOVW.CS R0, R1
+ MOVW.CS $-1, R0
+ MOVW R0, ret+12(FP)
+ MOVW R1, errno+16(FP)
+ RET
+
+// TODO: this is only valid for ARMv7+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVM.WP [R1, R2, R3, R12], (R13)
+ SWI $SYS__lwp_getprivate
+ MOVM.IAW (R13), [R1, R2, R3, R12]
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ SWI $SYS_issetugid
+ MOVW R0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_netbsd_arm64.s b/src/runtime/sys_netbsd_arm64.s
new file mode 100644
index 0000000..23e7494
--- /dev/null
+++ b/src/runtime/sys_netbsd_arm64.s
@@ -0,0 +1,435 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for arm64, NetBSD
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+#define CLOCK_REALTIME 0
+#define CLOCK_MONOTONIC 3
+
+#define SYS_exit 1
+#define SYS_read 3
+#define SYS_write 4
+#define SYS_open 5
+#define SYS_close 6
+#define SYS_getpid 20
+#define SYS_kill 37
+#define SYS_munmap 73
+#define SYS_madvise 75
+#define SYS_fcntl 92
+#define SYS_mmap 197
+#define SYS___sysctl 202
+#define SYS___sigaltstack14 281
+#define SYS___sigprocmask14 293
+#define SYS_issetugid 305
+#define SYS_getcontext 307
+#define SYS_setcontext 308
+#define SYS__lwp_create 309
+#define SYS__lwp_exit 310
+#define SYS__lwp_self 311
+#define SYS__lwp_kill 318
+#define SYS__lwp_unpark 321
+#define SYS___sigaction_sigtramp 340
+#define SYS_kqueue 344
+#define SYS_sched_yield 350
+#define SYS___setitimer50 425
+#define SYS___clock_gettime50 427
+#define SYS___nanosleep50 430
+#define SYS___kevent50 435
+#define SYS_pipe2 453
+#define SYS_openat 468
+#define SYS____lwp_park60 478
+
+// int32 lwp_create(void *context, uintptr flags, void *lwpid)
+TEXT runtime·lwp_create(SB),NOSPLIT,$0
+ MOVD ctxt+0(FP), R0
+ MOVD flags+8(FP), R1
+ MOVD lwpid+16(FP), R2
+ SVC $SYS__lwp_create
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·lwp_tramp(SB),NOSPLIT,$0
+ CMP $0, R1
+ BEQ nog
+ CMP $0, R2
+ BEQ nog
+
+ MOVD R0, g_m(R1)
+ MOVD R1, g
+nog:
+ CALL (R2)
+
+ MOVD $0, R0 // crash (not reached)
+ MOVD R0, (R8)
+
+TEXT ·netbsdMstart(SB),NOSPLIT|TOPFRAME,$0
+ CALL ·netbsdMstart0(SB)
+ RET // not reached
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ SVC $SYS_sched_yield
+ RET
+
+TEXT runtime·lwp_park(SB),NOSPLIT,$0
+ MOVW clockid+0(FP), R0 // arg 1 - clockid
+ MOVW flags+4(FP), R1 // arg 2 - flags
+ MOVD ts+8(FP), R2 // arg 3 - ts
+ MOVW unpark+16(FP), R3 // arg 4 - unpark
+ MOVD hint+24(FP), R4 // arg 5 - hint
+ MOVD unparkhint+32(FP), R5 // arg 6 - unparkhint
+ SVC $SYS____lwp_park60
+ MOVW R0, ret+40(FP)
+ RET
+
+TEXT runtime·lwp_unpark(SB),NOSPLIT,$0
+ MOVW lwp+0(FP), R0 // arg 1 - lwp
+ MOVD hint+8(FP), R1 // arg 2 - hint
+ SVC $SYS__lwp_unpark
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·lwp_self(SB),NOSPLIT,$0
+ SVC $SYS__lwp_self
+ MOVW R0, ret+0(FP)
+ RET
+
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT,$-8
+ MOVW code+0(FP), R0 // arg 1 - exit status
+ SVC $SYS_exit
+ MOVD $0, R0 // If we're still running,
+ MOVD R0, (R0) // crash
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0-8
+ MOVD wait+0(FP), R0
+ // We're done using the stack.
+ MOVW $0, R1
+ STLRW R1, (R0)
+ SVC $SYS__lwp_exit
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$-8
+ MOVD name+0(FP), R0 // arg 1 - pathname
+ MOVW mode+8(FP), R1 // arg 2 - flags
+ MOVW perm+12(FP), R2 // arg 3 - mode
+ SVC $SYS_open
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$-8
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ SVC $SYS_close
+ BCC ok
+ MOVW $-1, R0
+ok:
+ MOVW R0, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R0 // arg 1 - fd
+ MOVD p+8(FP), R1 // arg 2 - buf
+ MOVW n+16(FP), R2 // arg 3 - count
+ SVC $SYS_read
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ ADD $16, RSP, R0
+ MOVW flags+0(FP), R1
+ SVC $SYS_pipe2
+ BCC pipe2ok
+ NEG R0, R0
+pipe2ok:
+ MOVW R0, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT,$-8
+ MOVD fd+0(FP), R0 // arg 1 - fd
+ MOVD p+8(FP), R1 // arg 2 - buf
+ MOVW n+16(FP), R2 // arg 3 - nbyte
+ SVC $SYS_write
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVD R3, R5
+ MOVW $1000000, R4
+ UDIV R4, R3
+ MOVD R3, 8(RSP) // sec
+ MUL R3, R4
+ SUB R4, R5
+ MOVW $1000, R4
+ MUL R4, R5
+ MOVD R5, 16(RSP) // nsec
+
+ MOVD $8(RSP), R0 // arg 1 - rqtp
+ MOVD $0, R1 // arg 2 - rmtp
+ SVC $SYS___nanosleep50
+ RET
+
+TEXT runtime·lwp_kill(SB),NOSPLIT,$0-16
+ MOVW tid+0(FP), R0 // arg 1 - target
+ MOVD sig+8(FP), R1 // arg 2 - signo
+ SVC $SYS__lwp_kill
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$16
+ SVC $SYS_getpid
+ // arg 1 - pid (from getpid)
+ MOVD sig+0(FP), R1 // arg 2 - signo
+ SVC $SYS_kill
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$-8
+ MOVW mode+0(FP), R0 // arg 1 - which
+ MOVD new+8(FP), R1 // arg 2 - itv
+ MOVD old+16(FP), R2 // arg 3 - oitv
+ SVC $SYS___setitimer50
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ MOVW $CLOCK_REALTIME, R0 // arg 1 - clock_id
+ MOVD $8(RSP), R1 // arg 2 - tp
+ SVC $SYS___clock_gettime50
+
+ MOVD 8(RSP), R0 // sec
+ MOVD 16(RSP), R1 // nsec
+
+ // sec is in R0, nsec in R1
+ MOVD R0, sec+0(FP)
+ MOVW R1, nsec+8(FP)
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB), NOSPLIT, $32
+ MOVD $CLOCK_MONOTONIC, R0 // arg 1 - clock_id
+ MOVD $8(RSP), R1 // arg 2 - tp
+ SVC $SYS___clock_gettime50
+ MOVD 8(RSP), R0 // sec
+ MOVD 16(RSP), R2 // nsec
+
+ // sec is in R0, nsec in R2
+ // return nsec in R2
+ MOVD $1000000000, R3
+ MUL R3, R0
+ ADD R2, R0
+
+ MOVD R0, ret+0(FP)
+ RET
+
+TEXT runtime·getcontext(SB),NOSPLIT,$-8
+ MOVD ctxt+0(FP), R0 // arg 1 - context
+ SVC $SYS_getcontext
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·sigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R0 // arg 1 - how
+ MOVD new+8(FP), R1 // arg 2 - set
+ MOVD old+16(FP), R2 // arg 3 - oset
+ SVC $SYS___sigprocmask14
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT sigreturn_tramp<>(SB),NOSPLIT,$-8
+ MOVD g, R0
+ SVC $SYS_setcontext
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·sigaction(SB),NOSPLIT,$-8
+ MOVW sig+0(FP), R0 // arg 1 - signum
+ MOVD new+8(FP), R1 // arg 2 - nsa
+ MOVD old+16(FP), R2 // arg 3 - osa
+ // arg 4 - tramp
+ MOVD $sigreturn_tramp<>(SB), R3
+ MOVW $2, R4 // arg 5 - vers
+ SVC $SYS___sigaction_sigtramp
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+// XXX ???
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$176
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+ // Unclobber g for now (kernel uses it as ucontext ptr)
+ // See https://github.com/golang/go/issues/30824#issuecomment-492772426
+ // This is only correct in the non-cgo case.
+ // XXX should use lwp_getprivate as suggested.
+ // 8*36 is ucontext.uc_mcontext.__gregs[_REG_X28]
+ MOVD 8*36(g), g
+
+ // this might be called in external code context,
+ // where g is not set.
+ // first save R0, because runtime·load_g will clobber it
+ MOVD R0, 8(RSP) // signum
+ MOVB runtime·iscgo(SB), R0
+ CMP $0, R0
+ // XXX branch destination
+ BEQ 2(PC)
+ BL runtime·load_g(SB)
+
+ // Restore signum to R0.
+ MOVW 8(RSP), R0
+ // R1 and R2 already contain info and ctx, respectively.
+ BL runtime·sigtrampgo<ABIInternal>(SB)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ MOVW prot+16(FP), R2 // arg 3 - prot
+ MOVW flags+20(FP), R3 // arg 4 - flags
+ MOVW fd+24(FP), R4 // arg 5 - fd
+ MOVW $0, R5 // arg 6 - pad
+ MOVD off+28(FP), R6 // arg 7 - offset
+ SVC $SYS_mmap
+ BCS fail
+ MOVD R0, p+32(FP)
+ MOVD $0, err+40(FP)
+ RET
+fail:
+ MOVD $0, p+32(FP)
+ MOVD R0, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ SVC $SYS_munmap
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVD addr+0(FP), R0 // arg 1 - addr
+ MOVD n+8(FP), R1 // arg 2 - len
+ MOVW flags+16(FP), R2 // arg 3 - behav
+ SVC $SYS_madvise
+ BCC ok
+ MOVD $-1, R0
+ok:
+ MOVD R0, ret+24(FP)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVD new+0(FP), R0 // arg 1 - nss
+ MOVD old+8(FP), R1 // arg 2 - oss
+ SVC $SYS___sigaltstack14
+ BCS fail
+ RET
+fail:
+ MOVD $0, R0
+ MOVD R0, (R0) // crash
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVD mib+0(FP), R0 // arg 1 - name
+ MOVW miblen+8(FP), R1 // arg 2 - namelen
+ MOVD out+16(FP), R2 // arg 3 - oldp
+ MOVD size+24(FP), R3 // arg 4 - oldlenp
+ MOVD dst+32(FP), R4 // arg 5 - newp
+ MOVD ndst+40(FP), R5 // arg 6 - newlen
+ SVC $SYS___sysctl
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void)
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVD $0, R0
+ SVC $SYS_kqueue
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout)
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R0 // arg 1 - kq
+ MOVD ch+8(FP), R1 // arg 2 - changelist
+ MOVW nch+16(FP), R2 // arg 3 - nchanges
+ MOVD ev+24(FP), R3 // arg 4 - eventlist
+ MOVW nev+32(FP), R4 // arg 5 - nevents
+ MOVD ts+40(FP), R5 // arg 6 - timeout
+ SVC $SYS___kevent50
+ BCC ok
+ NEG R0, R0
+ok:
+ MOVW R0, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R0 // fd
+ MOVW cmd+4(FP), R1 // cmd
+ MOVW arg+8(FP), R2 // arg
+ SVC $SYS_fcntl
+ BCC noerr
+ MOVW $-1, R1
+ MOVW R1, ret+16(FP)
+ MOVW R0, errno+20(FP)
+ RET
+noerr:
+ MOVW R0, ret+16(FP)
+ MOVW $0, errno+20(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT|NOFRAME,$0
+ SVC $SYS_issetugid
+ MOVW R0, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_nonppc64x.go b/src/runtime/sys_nonppc64x.go
new file mode 100644
index 0000000..653f1c9
--- /dev/null
+++ b/src/runtime/sys_nonppc64x.go
@@ -0,0 +1,10 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !ppc64 && !ppc64le
+
+package runtime
+
+func prepGoExitFrame(sp uintptr) {
+}
diff --git a/src/runtime/sys_openbsd.go b/src/runtime/sys_openbsd.go
new file mode 100644
index 0000000..c4b8489
--- /dev/null
+++ b/src/runtime/sys_openbsd.go
@@ -0,0 +1,75 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && !mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+// The *_trampoline functions convert from the Go calling convention to the C calling convention
+// and then call the underlying libc function. These are defined in sys_openbsd_$ARCH.s.
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_init(attr *pthreadattr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_init_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ return ret
+}
+func pthread_attr_init_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_destroy(attr *pthreadattr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_destroy_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ return ret
+}
+func pthread_attr_destroy_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_getstacksize(attr *pthreadattr, size *uintptr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_getstacksize_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ KeepAlive(size)
+ return ret
+}
+func pthread_attr_getstacksize_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_attr_setdetachstate(attr *pthreadattr, state int) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_setdetachstate_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ return ret
+}
+func pthread_attr_setdetachstate_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func pthread_create(attr *pthreadattr, start uintptr, arg unsafe.Pointer) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_create_trampoline)), unsafe.Pointer(&attr))
+ KeepAlive(attr)
+ KeepAlive(arg) // Just for consistency. Arg of course needs to be kept alive for the start function.
+ return ret
+}
+func pthread_create_trampoline()
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_pthread_attr_init pthread_attr_init "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_destroy pthread_attr_destroy "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_getstacksize pthread_attr_getstacksize "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_attr_setdetachstate pthread_attr_setdetachstate "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_create pthread_create "libpthread.so"
+//go:cgo_import_dynamic libc_pthread_sigmask pthread_sigmask "libpthread.so"
+
+//go:cgo_import_dynamic _ _ "libpthread.so"
+//go:cgo_import_dynamic _ _ "libc.so"
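
Every wrapper above follows the same shape: because of //go:cgo_unsafe_args the Go arguments are laid out contiguously, so passing the address of the first argument lets the assembly trampoline load the remaining ones at fixed offsets before making the libc call. A hypothetical wrapper showing only that pattern (example_libc_call and example_trampoline are invented names, not part of the runtime; the sketch reuses this file's own libcCall and KeepAlive helpers):

//go:nosplit
//go:cgo_unsafe_args
func example_libc_call(a *pthreadattr, b uintptr) int32 {
	ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(example_trampoline)), unsafe.Pointer(&a))
	KeepAlive(a) // keep Go-managed memory alive across the C call
	return ret
}
func example_trampoline() // hypothetical; would be implemented in sys_openbsd_$ARCH.s
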
diff --git a/src/runtime/sys_openbsd1.go b/src/runtime/sys_openbsd1.go
new file mode 100644
index 0000000..d852e3c
--- /dev/null
+++ b/src/runtime/sys_openbsd1.go
@@ -0,0 +1,46 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && !mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrsleep(ident uintptr, clock_id int32, tsp *timespec, lock uintptr, abort *uint32) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(thrsleep_trampoline)), unsafe.Pointer(&ident))
+ KeepAlive(tsp)
+ KeepAlive(abort)
+ return ret
+}
+func thrsleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrwakeup(ident uintptr, n int32) int32 {
+ return libcCall(unsafe.Pointer(abi.FuncPCABI0(thrwakeup_trampoline)), unsafe.Pointer(&ident))
+}
+func thrwakeup_trampoline()
+
+//go:nosplit
+func osyield() {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sched_yield_trampoline)), unsafe.Pointer(nil))
+}
+func sched_yield_trampoline()
+
+//go:nosplit
+func osyield_no_g() {
+ asmcgocall_no_g(unsafe.Pointer(abi.FuncPCABI0(sched_yield_trampoline)), unsafe.Pointer(nil))
+}
+
+//go:cgo_import_dynamic libc_thrsleep __thrsleep "libc.so"
+//go:cgo_import_dynamic libc_thrwakeup __thrwakeup "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+
+//go:cgo_import_dynamic _ _ "libc.so"
diff --git a/src/runtime/sys_openbsd2.go b/src/runtime/sys_openbsd2.go
new file mode 100644
index 0000000..b38e49e
--- /dev/null
+++ b/src/runtime/sys_openbsd2.go
@@ -0,0 +1,303 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && !mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+// This is exported via linkname to assembly in runtime/cgo.
+//
+//go:linkname exit
+//go:nosplit
+//go:cgo_unsafe_args
+func exit(code int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(exit_trampoline)), unsafe.Pointer(&code))
+}
+func exit_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func getthrid() (tid int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(getthrid_trampoline)), unsafe.Pointer(&tid))
+ return
+}
+func getthrid_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func raiseproc(sig uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(raiseproc_trampoline)), unsafe.Pointer(&sig))
+}
+func raiseproc_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func thrkill(tid int32, sig int) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(thrkill_trampoline)), unsafe.Pointer(&tid))
+}
+func thrkill_trampoline()
+
+// mmap is used to do low-level memory allocation via mmap. Don't allow stack
+// splits, since this function (used by sysAlloc) is called in a lot of low-level
+// parts of the runtime and callers often assume it won't acquire any locks.
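+//
+// As a rough illustration (not a call site in this file), sysAlloc obtains
+// anonymous memory through this wrapper along the following lines, using the
+// _PROT_* and _MAP_* constants from the generated defs_openbsd_*.go files;
+// the second result holds the positive errno value on failure:
+//
+//	p, errno := mmap(nil, n, _PROT_READ|_PROT_WRITE, _MAP_ANON|_MAP_PRIVATE, -1, 0)
+//	if errno != 0 {
+//		// reservation failed; p is nil
+//	}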
+//
+//go:nosplit
+func mmap(addr unsafe.Pointer, n uintptr, prot, flags, fd int32, off uint32) (unsafe.Pointer, int) {
+ args := struct {
+ addr unsafe.Pointer
+ n uintptr
+ prot, flags, fd int32
+ off uint32
+ ret1 unsafe.Pointer
+ ret2 int
+ }{addr, n, prot, flags, fd, off, nil, 0}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(mmap_trampoline)), unsafe.Pointer(&args))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+ return args.ret1, args.ret2
+}
+func mmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func munmap(addr unsafe.Pointer, n uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(munmap_trampoline)), unsafe.Pointer(&addr))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+}
+func munmap_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func madvise(addr unsafe.Pointer, n uintptr, flags int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(madvise_trampoline)), unsafe.Pointer(&addr))
+ KeepAlive(addr) // Just for consistency. Hopefully addr is not a Go address.
+}
+func madvise_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func open(name *byte, mode, perm int32) (ret int32) {
+ ret = libcCall(unsafe.Pointer(abi.FuncPCABI0(open_trampoline)), unsafe.Pointer(&name))
+ KeepAlive(name)
+ return
+}
+func open_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func closefd(fd int32) int32 {
+ return libcCall(unsafe.Pointer(abi.FuncPCABI0(close_trampoline)), unsafe.Pointer(&fd))
+}
+func close_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func read(fd int32, p unsafe.Pointer, n int32) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(read_trampoline)), unsafe.Pointer(&fd))
+ KeepAlive(p)
+ return ret
+}
+func read_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func write1(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(write_trampoline)), unsafe.Pointer(&fd))
+ KeepAlive(p)
+ return ret
+}
+func write_trampoline()
+
+func pipe2(flags int32) (r, w int32, errno int32) {
+ var p [2]int32
+ args := struct {
+ p unsafe.Pointer
+ flags int32
+ }{noescape(unsafe.Pointer(&p)), flags}
+ errno = libcCall(unsafe.Pointer(abi.FuncPCABI0(pipe2_trampoline)), unsafe.Pointer(&args))
+ return p[0], p[1], errno
+}
+func pipe2_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func setitimer(mode int32, new, old *itimerval) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(setitimer_trampoline)), unsafe.Pointer(&mode))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func setitimer_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep(usec uint32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+func usleep_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func usleep_no_g(usec uint32) {
+ asmcgocall_no_g(unsafe.Pointer(abi.FuncPCABI0(usleep_trampoline)), unsafe.Pointer(&usec))
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sysctl(mib *uint32, miblen uint32, out *byte, size *uintptr, dst *byte, ndst uintptr) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(sysctl_trampoline)), unsafe.Pointer(&mib))
+ KeepAlive(mib)
+ KeepAlive(out)
+ KeepAlive(size)
+ KeepAlive(dst)
+ return ret
+}
+func sysctl_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func fcntl(fd, cmd, arg int32) (ret int32, errno int32) {
+ args := struct {
+ fd, cmd, arg int32
+ ret, errno int32
+ }{fd, cmd, arg, 0, 0}
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(fcntl_trampoline)), unsafe.Pointer(&args))
+ return args.ret, args.errno
+}
+func fcntl_trampoline()
+
+//go:nosplit
+func nanotime1() int64 {
+ var ts timespec
+ args := struct {
+ clock_id int32
+ tp unsafe.Pointer
+ }{_CLOCK_MONOTONIC, unsafe.Pointer(&ts)}
+ if errno := libcCall(unsafe.Pointer(abi.FuncPCABI0(clock_gettime_trampoline)), unsafe.Pointer(&args)); errno < 0 {
+ // Avoid growing the nosplit stack.
+ systemstack(func() {
+ println("runtime: errno", -errno)
+ throw("clock_gettime failed")
+ })
+ }
+ return ts.tv_sec*1e9 + int64(ts.tv_nsec)
+}
+func clock_gettime_trampoline()
+
+//go:nosplit
+func walltime() (int64, int32) {
+ var ts timespec
+ args := struct {
+ clock_id int32
+ tp unsafe.Pointer
+ }{_CLOCK_REALTIME, unsafe.Pointer(&ts)}
+ if errno := libcCall(unsafe.Pointer(abi.FuncPCABI0(clock_gettime_trampoline)), unsafe.Pointer(&args)); errno < 0 {
+ // Avoid growing the nosplit stack.
+ systemstack(func() {
+ println("runtime: errno", -errno)
+ throw("clock_gettime failed")
+ })
+ }
+ return ts.tv_sec, int32(ts.tv_nsec)
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kqueue() int32 {
+ return libcCall(unsafe.Pointer(abi.FuncPCABI0(kqueue_trampoline)), nil)
+}
+func kqueue_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func kevent(kq int32, ch *keventt, nch int32, ev *keventt, nev int32, ts *timespec) int32 {
+ ret := libcCall(unsafe.Pointer(abi.FuncPCABI0(kevent_trampoline)), unsafe.Pointer(&kq))
+ KeepAlive(ch)
+ KeepAlive(ev)
+ KeepAlive(ts)
+ return ret
+}
+func kevent_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaction(sig uint32, new *sigactiont, old *sigactiont) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sigaction_trampoline)), unsafe.Pointer(&sig))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigaction_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigprocmask(how uint32, new *sigset, old *sigset) {
+ // sigprocmask is called from sigsave, which is called from needm.
+ // As such, we have to be able to run with no g here.
+ asmcgocall_no_g(unsafe.Pointer(abi.FuncPCABI0(sigprocmask_trampoline)), unsafe.Pointer(&how))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigprocmask_trampoline()
+
+//go:nosplit
+//go:cgo_unsafe_args
+func sigaltstack(new *stackt, old *stackt) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(sigaltstack_trampoline)), unsafe.Pointer(&new))
+ KeepAlive(new)
+ KeepAlive(old)
+}
+func sigaltstack_trampoline()
+
+// Not used on OpenBSD, but must be defined.
+func exitThread(wait *atomic.Uint32) {
+ throw("exitThread")
+}
+
+//go:nosplit
+//go:cgo_unsafe_args
+func issetugid() (ret int32) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(issetugid_trampoline)), unsafe.Pointer(&ret))
+ return
+}
+func issetugid_trampoline()
+
+// Tell the linker that the libc_* functions are to be found
+// in a system library, with the libc_ prefix missing.
+
+//go:cgo_import_dynamic libc_errno __errno "libc.so"
+//go:cgo_import_dynamic libc_exit exit "libc.so"
+//go:cgo_import_dynamic libc_getthrid getthrid "libc.so"
+//go:cgo_import_dynamic libc_sched_yield sched_yield "libc.so"
+//go:cgo_import_dynamic libc_thrkill thrkill "libc.so"
+
+//go:cgo_import_dynamic libc_mmap mmap "libc.so"
+//go:cgo_import_dynamic libc_munmap munmap "libc.so"
+//go:cgo_import_dynamic libc_madvise madvise "libc.so"
+
+//go:cgo_import_dynamic libc_open open "libc.so"
+//go:cgo_import_dynamic libc_close close "libc.so"
+//go:cgo_import_dynamic libc_read read "libc.so"
+//go:cgo_import_dynamic libc_write write "libc.so"
+//go:cgo_import_dynamic libc_pipe2 pipe2 "libc.so"
+
+//go:cgo_import_dynamic libc_clock_gettime clock_gettime "libc.so"
+//go:cgo_import_dynamic libc_setitimer setitimer "libc.so"
+//go:cgo_import_dynamic libc_usleep usleep "libc.so"
+//go:cgo_import_dynamic libc_sysctl sysctl "libc.so"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.so"
+//go:cgo_import_dynamic libc_getpid getpid "libc.so"
+//go:cgo_import_dynamic libc_kill kill "libc.so"
+//go:cgo_import_dynamic libc_kqueue kqueue "libc.so"
+//go:cgo_import_dynamic libc_kevent kevent "libc.so"
+
+//go:cgo_import_dynamic libc_sigaction sigaction "libc.so"
+//go:cgo_import_dynamic libc_sigaltstack sigaltstack "libc.so"
+
+//go:cgo_import_dynamic libc_issetugid issetugid "libc.so"
+
+//go:cgo_import_dynamic _ _ "libc.so"
diff --git a/src/runtime/sys_openbsd3.go b/src/runtime/sys_openbsd3.go
new file mode 100644
index 0000000..269bf86
--- /dev/null
+++ b/src/runtime/sys_openbsd3.go
@@ -0,0 +1,116 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build openbsd && !mips64
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+// The X versions of syscall expect the libc call to return a 64-bit result.
+// The non-X versions expect a 32-bit result.
+// This distinction is required because an error is indicated by returning -1,
+// and we need to know whether to check 32 or 64 bits of the result.
+// (Some libc functions that return 32 bits put junk in the upper 32 bits of AX.)
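+//
+// For example (sketch only), a call whose libc result is a 32-bit value such
+// as a file descriptor goes through syscall, while one whose result is a
+// 64-bit value such as an off_t goes through syscallX, so that a full 64-bit
+// -1 is not mistaken for a valid result:
+//
+//	r1, _, err := syscall(fn, a1, a2, a3)   // err set iff the 32-bit result is -1
+//	r1, r2, err := syscallX(fn, a1, a2, a3) // err set iff the full 64-bit result is -1
+//
+// where fn stands for the ABI0 PC of the corresponding libc trampoline in
+// package syscall.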
+
+//go:linkname syscall_syscall syscall.syscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall()
+
+//go:linkname syscall_syscallX syscall.syscallX
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscallX(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscallX)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscallX()
+
+//go:linkname syscall_syscall6 syscall.syscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6()
+
+//go:linkname syscall_syscall6X syscall.syscall6X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6X)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall6X()
+
+//go:linkname syscall_syscall10 syscall.syscall10
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall10(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall10)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall10()
+
+//go:linkname syscall_syscall10X syscall.syscall10X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_syscall10X(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ entersyscall()
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall10X)), unsafe.Pointer(&fn))
+ exitsyscall()
+ return
+}
+func syscall10X()
+
+//go:linkname syscall_rawSyscall syscall.rawSyscall
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall6 syscall.rawSyscall6
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall6(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall6X syscall.rawSyscall6X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall6X(fn, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall6X)), unsafe.Pointer(&fn))
+ return
+}
+
+//go:linkname syscall_rawSyscall10X syscall.rawSyscall10X
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_rawSyscall10X(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10 uintptr) (r1, r2, err uintptr) {
+ libcCall(unsafe.Pointer(abi.FuncPCABI0(syscall10X)), unsafe.Pointer(&fn))
+ return
+}
diff --git a/src/runtime/sys_openbsd_386.s b/src/runtime/sys_openbsd_386.s
new file mode 100644
index 0000000..6005c10
--- /dev/null
+++ b/src/runtime/sys_openbsd_386.s
@@ -0,0 +1,990 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for 386, OpenBSD
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from Go to C calling convention.
+// Some direct system call implementations currently remain.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_MONOTONIC $3
+
+TEXT runtime·setldt(SB),NOSPLIT,$0
+ // Nothing to do, pthread already set thread-local storage up.
+ RET
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
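+//
+// The Go side hands this function to pthread_create roughly as follows
+// (simplified from newosproc in os_openbsd_libc.go), passing the new m as
+// the start argument, which this stub picks up from its C argument slot:
+//
+//	var attr pthreadattr
+//	pthread_attr_init(&attr)
+//	pthread_attr_setdetachstate(&attr, _PTHREAD_CREATE_DETACHED)
+//	pthread_create(&attr, abi.FuncPCABI0(mstart_stub), unsafe.Pointer(mp))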
+TEXT runtime·mstart_stub(SB),NOSPLIT,$28
+ NOP SP // tell vet SP changed - stop checking offsets
+
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+
+ MOVL 32(SP), AX // m
+ MOVL m_g0(AX), DX
+ get_tls(CX)
+ MOVL DX, g(CX)
+
+ CALL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVL $0, AX
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVL fn+0(FP), AX
+ MOVL sig+4(FP), BX
+ MOVL info+8(FP), CX
+ MOVL ctx+12(FP), DX
+ MOVL SP, SI
+ SUBL $32, SP
+ ANDL $~15, SP // align stack: handler might be a C function
+ MOVL BX, 0(SP)
+ MOVL CX, 4(SP)
+ MOVL DX, 8(SP)
+ MOVL SI, 12(SP) // save SI: handler might be a Go function
+ CALL AX
+ MOVL 12(SP), AX
+ MOVL AX, SP
+ RET
+
+// Called by OS using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$28
+ NOP SP // tell vet SP changed - stop checking offsets
+ // Save callee-saved C registers, since the caller may be a C signal handler.
+ MOVL BX, bx-4(SP)
+ MOVL BP, bp-8(SP)
+ MOVL SI, si-12(SP)
+ MOVL DI, di-16(SP)
+ // We don't save mxcsr or the x87 control word because sigtrampgo doesn't
+ // modify them.
+
+ MOVL 32(SP), BX // signo
+ MOVL BX, 0(SP)
+ MOVL 36(SP), BX // info
+ MOVL BX, 4(SP)
+ MOVL 40(SP), BX // context
+ MOVL BX, 8(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ MOVL di-16(SP), DI
+ MOVL si-12(SP), SI
+ MOVL bp-8(SP), BP
+ MOVL bx-4(SP), BX
+ RET
+
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall - note that while asmcgocall does
+// stack alignment, creation of a frame undoes it again.
+// A pointer to the arguments is passed on the stack.
+// A single int32 result is returned in AX.
+// (For more results, make an args/results structure.)
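+//
+// The matching Go-side pattern (see sys_openbsd2.go) is either to pass the
+// address of the first argument of a //go:cgo_unsafe_args function, or to
+// pack the arguments and any extra result slots into an explicit struct, as
+// fcntl does, roughly:
+//
+//	args := struct {
+//		fd, cmd, arg int32
+//		ret, errno   int32
+//	}{fd, cmd, arg, 0, 0}
+//	libcCall(unsafe.Pointer(abi.FuncPCABI0(fcntl_trampoline)), unsafe.Pointer(&args))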
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $4, SP
+ MOVL 12(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL AX, 0(SP) // arg 1 - attr
+ CALL libc_pthread_attr_init(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $4, SP
+ MOVL 12(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL AX, 0(SP) // arg 1 - attr
+ CALL libc_pthread_attr_destroy(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - attr
+ MOVL BX, 4(SP) // arg 2 - size
+ CALL libc_pthread_attr_getstacksize(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - attr
+ MOVL BX, 4(SP) // arg 2 - state
+ CALL libc_pthread_attr_setdetachstate(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $20, SP
+ MOVL 28(SP), DX // pointer to args
+ LEAL 16(SP), AX
+ MOVL AX, 0(SP) // arg 1 - &threadid (discarded)
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 4(SP) // arg 2 - attr
+ MOVL BX, 8(SP) // arg 3 - start
+ MOVL CX, 12(SP) // arg 4 - arg
+ CALL libc_pthread_create(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - tid
+ MOVL BX, 4(SP) // arg 2 - signal
+ MOVL $0, 8(SP) // arg 3 - tcb
+ CALL libc_thrkill(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $20, SP
+ MOVL 28(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - id
+ MOVL BX, 4(SP) // arg 2 - clock_id
+ MOVL CX, 8(SP) // arg 3 - abstime
+ MOVL 12(DX), AX
+ MOVL 16(DX), BX
+ MOVL AX, 12(SP) // arg 4 - lock
+ MOVL BX, 16(SP) // arg 5 - abort
+ CALL libc_thrsleep(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - id
+ MOVL BX, 4(SP) // arg 2 - count
+ CALL libc_thrwakeup(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $4, SP
+ MOVL 12(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL AX, 0(SP) // arg 1 - status
+ CALL libc_exit(SB)
+ MOVL $0xf1, 0xf1 // crash on failure
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ CALL libc_getthrid(SB)
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 8(SP), DX // pointer to return value
+ MOVL AX, 0(DX)
+ POPL BP
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX
+ MOVL 0(DX), BX
+ CALL libc_getpid(SB)
+ MOVL AX, 0(SP) // arg 1 - pid
+ MOVL BX, 4(SP) // arg 2 - signal
+ CALL libc_kill(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ CALL libc_sched_yield(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $32, SP
+ MOVL 40(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - addr
+ MOVL BX, 4(SP) // arg 2 - len
+ MOVL CX, 8(SP) // arg 3 - prot
+ MOVL 12(DX), AX
+ MOVL 16(DX), BX
+ MOVL 20(DX), CX
+ MOVL AX, 12(SP) // arg 4 - flags
+ MOVL BX, 16(SP) // arg 5 - fd
+ MOVL $0, 20(SP) // pad
+ MOVL CX, 24(SP) // arg 6 - offset (low 32 bits)
+ MOVL $0, 28(SP) // offset (high 32 bits)
+ CALL libc_mmap(SB)
+ MOVL $0, BX
+ CMPL AX, $-1
+ JNE ok
+ CALL libc_errno(SB)
+ MOVL (AX), BX
+ MOVL $0, AX
+ok:
+ MOVL 40(SP), DX
+ MOVL AX, 24(DX)
+ MOVL BX, 28(DX)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - addr
+ MOVL BX, 4(SP) // arg 2 - len
+ CALL libc_munmap(SB)
+ CMPL AX, $-1
+ JNE 2(PC)
+ MOVL $0xf1, 0xf1 // crash on failure
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·madvise_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - addr
+ MOVL BX, 4(SP) // arg 2 - len
+ MOVL CX, 8(SP) // arg 3 - advice
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $16, SP
+ MOVL 24(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - path
+ MOVL BX, 4(SP) // arg 2 - flags
+ MOVL CX, 8(SP) // arg 3 - mode
+ MOVL $0, 12(SP) // vararg
+ CALL libc_open(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $4, SP
+ MOVL 12(SP), DX
+ MOVL 0(DX), AX
+ MOVL AX, 0(SP) // arg 1 - fd
+ CALL libc_close(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - fd
+ MOVL BX, 4(SP) // arg 2 - buf
+ MOVL CX, 8(SP) // arg 3 - count
+ CALL libc_read(SB)
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno
+noerr:
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - fd
+ MOVL BX, 4(SP) // arg 2 - buf
+ MOVL CX, 8(SP) // arg 3 - count
+ CALL libc_write(SB)
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno
+noerr:
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - fds
+ MOVL BX, 4(SP) // arg 2 - flags
+ CALL libc_pipe2(SB)
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno
+noerr:
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - which
+ MOVL BX, 4(SP) // arg 2 - new
+ MOVL CX, 8(SP) // arg 3 - old
+ CALL libc_setitimer(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $4, SP
+ MOVL 12(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL AX, 0(SP)
+ CALL libc_usleep(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $24, SP
+ MOVL 32(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - name
+ MOVL BX, 4(SP) // arg 2 - namelen
+ MOVL CX, 8(SP) // arg 3 - old
+ MOVL 12(DX), AX
+ MOVL 16(DX), BX
+ MOVL 20(DX), CX
+ MOVL AX, 12(SP) // arg 4 - oldlenp
+ MOVL BX, 16(SP) // arg 5 - newp
+ MOVL CX, 20(SP) // arg 6 - newlen
+ CALL libc_sysctl(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ CALL libc_kqueue(SB)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $24, SP
+ MOVL 32(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - kq
+ MOVL BX, 4(SP) // arg 2 - keventt
+ MOVL CX, 8(SP) // arg 3 - nch
+ MOVL 12(DX), AX
+ MOVL 16(DX), BX
+ MOVL 20(DX), CX
+ MOVL AX, 12(SP) // arg 4 - ev
+ MOVL BX, 16(SP) // arg 5 - nev
+ MOVL CX, 20(SP) // arg 6 - ts
+ CALL libc_kevent(SB)
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno
+noerr:
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - clock_id
+ MOVL BX, 4(SP) // arg 2 - tp
+ CALL libc_clock_gettime(SB)
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ NEGL AX // caller expects negative errno
+noerr:
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $16, SP
+ MOVL 24(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - fd
+ MOVL BX, 4(SP) // arg 2 - cmd
+ MOVL CX, 8(SP) // arg 3 - arg
+ MOVL $0, 12(SP) // vararg
+ CALL libc_fcntl(SB)
+ MOVL $0, BX
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), BX
+ MOVL $-1, AX
+noerr:
+ MOVL 24(SP), DX // pointer to args
+ MOVL AX, 12(DX)
+ MOVL BX, 16(DX)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - sig
+ MOVL BX, 4(SP) // arg 2 - new
+ MOVL CX, 8(SP) // arg 3 - old
+ CALL libc_sigaction(SB)
+ CMPL AX, $-1
+ JNE 2(PC)
+ MOVL $0xf1, 0xf1 // crash on failure
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $12, SP
+ MOVL 20(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL 8(DX), CX
+ MOVL AX, 0(SP) // arg 1 - how
+ MOVL BX, 4(SP) // arg 2 - new
+ MOVL CX, 8(SP) // arg 3 - old
+ CALL libc_pthread_sigmask(SB)
+ CMPL AX, $-1
+ JNE 2(PC)
+ MOVL $0xf1, 0xf1 // crash on failure
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+ SUBL $8, SP
+ MOVL 16(SP), DX // pointer to args
+ MOVL 0(DX), AX
+ MOVL 4(DX), BX
+ MOVL AX, 0(SP) // arg 1 - new
+ MOVL BX, 4(SP) // arg 2 - old
+ CALL libc_sigaltstack(SB)
+ CMPL AX, $-1
+ JNE 2(PC)
+ MOVL $0xf1, 0xf1 // crash on failure
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
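+//
+// In practice the struct is simply the argument/result frame of
+// syscall_syscall in sys_openbsd3.go, which //go:cgo_unsafe_args lets the
+// runtime treat as one block of seven consecutive words:
+//
+//	fn, a1, a2, a3, r1, r2, err uintptr
+//
+// so the (N*4)(BX) offsets below index those fields directly.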
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $12, SP
+ MOVL 20(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (4*4)(BX) // r1
+ MOVL DX, (5*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (6*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $12, SP
+ MOVL 20(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (4*4)(BX) // r1
+ MOVL DX, (5*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+ CMPL DX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (6*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $24, SP
+ MOVL 32(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+ MOVL (4*4)(BX), AX
+ MOVL (5*4)(BX), CX
+ MOVL (6*4)(BX), DX
+ MOVL AX, (3*4)(SP) // a4
+ MOVL CX, (4*4)(SP) // a5
+ MOVL DX, (5*4)(SP) // a6
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (7*4)(BX) // r1
+ MOVL DX, (8*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (9*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $24, SP
+ MOVL 32(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+ MOVL (4*4)(BX), AX
+ MOVL (5*4)(BX), CX
+ MOVL (6*4)(BX), DX
+ MOVL AX, (3*4)(SP) // a4
+ MOVL CX, (4*4)(SP) // a5
+ MOVL DX, (5*4)(SP) // a6
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (7*4)(BX) // r1
+ MOVL DX, (8*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+ CMPL DX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (9*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $40, SP
+ MOVL 48(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+ MOVL (4*4)(BX), AX
+ MOVL (5*4)(BX), CX
+ MOVL (6*4)(BX), DX
+ MOVL AX, (3*4)(SP) // a4
+ MOVL CX, (4*4)(SP) // a5
+ MOVL DX, (5*4)(SP) // a6
+ MOVL (7*4)(BX), AX
+ MOVL (8*4)(BX), CX
+ MOVL (9*4)(BX), DX
+ MOVL AX, (6*4)(SP) // a7
+ MOVL CX, (7*4)(SP) // a8
+ MOVL DX, (8*4)(SP) // a9
+ MOVL (10*4)(BX), AX
+ MOVL AX, (9*4)(SP) // a10
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (11*4)(BX) // r1
+ MOVL DX, (12*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (13*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$0
+ PUSHL BP
+ MOVL SP, BP
+
+ SUBL $40, SP
+ MOVL 48(SP), BX // pointer to args
+
+ MOVL (1*4)(BX), AX
+ MOVL (2*4)(BX), CX
+ MOVL (3*4)(BX), DX
+ MOVL AX, (0*4)(SP) // a1
+ MOVL CX, (1*4)(SP) // a2
+ MOVL DX, (2*4)(SP) // a3
+ MOVL (4*4)(BX), AX
+ MOVL (5*4)(BX), CX
+ MOVL (6*4)(BX), DX
+ MOVL AX, (3*4)(SP) // a4
+ MOVL CX, (4*4)(SP) // a5
+ MOVL DX, (5*4)(SP) // a6
+ MOVL (7*4)(BX), AX
+ MOVL (8*4)(BX), CX
+ MOVL (9*4)(BX), DX
+ MOVL AX, (6*4)(SP) // a7
+ MOVL CX, (7*4)(SP) // a8
+ MOVL DX, (8*4)(SP) // a9
+ MOVL (10*4)(BX), AX
+ MOVL AX, (9*4)(SP) // a10
+
+ MOVL (0*4)(BX), AX // fn
+ CALL AX
+
+ MOVL AX, (11*4)(BX) // r1
+ MOVL DX, (12*4)(BX) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMPL AX, $-1
+ JNE ok
+ CMPL DX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVL (AX), AX
+ MOVW AX, (13*4)(BX) // err
+
+ok:
+ MOVL $0, AX // no error (it's ignored anyway)
+ MOVL BP, SP
+ POPL BP
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ PUSHL BP
+ CALL libc_issetugid(SB)
+ NOP SP // tell vet SP changed - stop checking offsets
+ MOVL 8(SP), DX // pointer to return value
+ MOVL AX, 0(DX)
+ POPL BP
+ RET
diff --git a/src/runtime/sys_openbsd_amd64.s b/src/runtime/sys_openbsd_amd64.s
new file mode 100644
index 0000000..ff0bc24
--- /dev/null
+++ b/src/runtime/sys_openbsd_amd64.s
@@ -0,0 +1,666 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, OpenBSD.
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from Go to C calling convention.
+// Some direct system call implementations currently remain.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_amd64.h"
+
+#define CLOCK_MONOTONIC $3
+
+TEXT runtime·settls(SB),NOSPLIT,$0
+ // Nothing to do, pthread already set thread-local storage up.
+ RET
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$0
+ // DI points to the m.
+ // We are already on m's g0 stack.
+
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Load g and save to TLS entry.
+ // See cmd/link/internal/ld/sym.go:computeTLSOffset.
+ MOVQ m_g0(DI), DX // g
+ MOVQ DX, -8(FS)
+
+ CALL runtime·mstart(SB)
+
+ POP_REGS_HOST_TO_ABI0()
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ XORL AX, AX
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called using C ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Transition from C ABI to Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: g in R14, cleared X15.
+ get_tls(R12)
+ MOVQ g(R12), R14
+ PXOR X15, X15
+
+ // Reserve space for spill slots.
+ NOP SP // disable vet stack checking
+ ADJSP $24
+
+ // Call into the Go signal handler
+ MOVQ DI, AX // sig
+ MOVQ SI, BX // info
+ MOVQ DX, CX // ctx
+ CALL ·sigtrampgo<ABIInternal>(SB)
+
+ ADJSP $-24
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+//
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in DI.
+// A single int32 result is returned in AX.
+// (For more results, make an args/results structure.)
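+//
+// Concretely, a trampoline with up to three word-sized arguments just spreads
+// the args block pointed to by DI across the System V argument registers,
+// loading DI itself last since it doubles as the block pointer; for example
+// (a sketch of the pattern used by setitimer_trampoline below):
+//
+//	MOVQ 8(DI), SI  // arg 2
+//	MOVQ 16(DI), DX // arg 3
+//	MOVL 0(DI), DI  // arg 1 (must be loaded last)
+//	CALL libc_setitimer(SB)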
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_destroy(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 - stacksize
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 - detachstate
+ MOVQ 0(DI), DI // arg 1 - attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$16
+ MOVQ 0(DI), SI // arg 2 - attr
+ MOVQ 8(DI), DX // arg 3 - start
+ MOVQ 16(DI), CX // arg 4 - arg
+ MOVQ SP, DI // arg 1 - &thread (discarded)
+ CALL libc_pthread_create(SB)
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 - signal
+ MOVQ $0, DX // arg 3 - tcb
+ MOVL 0(DI), DI // arg 1 - tid
+ CALL libc_thrkill(SB)
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 - clock_id
+ MOVQ 16(DI), DX // arg 3 - abstime
+ MOVQ 24(DI), CX // arg 4 - lock
+ MOVQ 32(DI), R8 // arg 5 - abort
+ MOVQ 0(DI), DI // arg 1 - id
+ CALL libc_thrsleep(SB)
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 - count
+ MOVQ 0(DI), DI // arg 1 - id
+ CALL libc_thrwakeup(SB)
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 exit status
+ CALL libc_exit(SB)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX // BX is caller-save
+ CALL libc_getthrid(SB)
+ MOVL AX, 0(BX) // return value
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), BX // signal
+ CALL libc_getpid(SB)
+ MOVL AX, DI // arg 1 pid
+ MOVL BX, SI // arg 2 signal
+ CALL libc_kill(SB)
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ CALL libc_sched_yield(SB)
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX
+ MOVQ 0(BX), DI // arg 1 addr
+ MOVQ 8(BX), SI // arg 2 len
+ MOVL 16(BX), DX // arg 3 prot
+ MOVL 20(BX), CX // arg 4 flags
+ MOVL 24(BX), R8 // arg 5 fd
+ MOVL 28(BX), R9 // arg 6 offset
+ CALL libc_mmap(SB)
+ XORL DX, DX
+ CMPQ AX, $-1
+ JNE ok
+ CALL libc_errno(SB)
+ MOVLQSX (AX), DX // errno
+ XORQ AX, AX
+ok:
+ MOVQ AX, 32(BX)
+ MOVQ DX, 40(BX)
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 len
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_munmap(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ MOVQ 8(DI), SI // arg 2 len
+ MOVL 16(DI), DX // arg 3 advice
+ MOVQ 0(DI), DI // arg 1 addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 - flags
+ MOVL 12(DI), DX // arg 3 - mode
+ MOVQ 0(DI), DI // arg 1 - path
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_open(SB)
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 - fd
+ CALL libc_close(SB)
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 - buf
+ MOVL 16(DI), DX // arg 3 - count
+ MOVL 0(DI), DI // arg 1 - fd
+ CALL libc_read(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 buf
+ MOVL 16(DI), DX // arg 3 count
+ MOVL 0(DI), DI // arg 1 fd
+ CALL libc_write(SB)
+ TESTL AX, AX
+ JGE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 flags
+ MOVQ 0(DI), DI // arg 1 filedes
+ CALL libc_pipe2(SB)
+ TESTL AX, AX
+ JEQ 3(PC)
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 which
+ CALL libc_setitimer(SB)
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVL 0(DI), DI // arg 1 usec
+ CALL libc_usleep(SB)
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVL 8(DI), SI // arg 2 miblen
+ MOVQ 16(DI), DX // arg 3 out
+ MOVQ 24(DI), CX // arg 4 size
+ MOVQ 32(DI), R8 // arg 5 dst
+ MOVQ 40(DI), R9 // arg 6 ndst
+ MOVQ 0(DI), DI // arg 1 mib
+ CALL libc_sysctl(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ CALL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 keventt
+ MOVL 16(DI), DX // arg 3 nch
+ MOVQ 24(DI), CX // arg 4 ev
+ MOVL 32(DI), R8 // arg 5 nev
+ MOVQ 40(DI), R9 // arg 6 ts
+ MOVL 0(DI), DI // arg 1 kq
+ CALL libc_kevent(SB)
+ CMPL AX, $-1
+ JNE ok
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+ok:
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 tp
+ MOVL 0(DI), DI // arg 1 clock_id
+ CALL libc_clock_gettime(SB)
+ TESTL AX, AX
+ JEQ noerr
+ CALL libc_errno(SB)
+ MOVL (AX), AX // errno
+ NEGL AX // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX
+ MOVL 0(BX), DI // arg 1 fd
+ MOVL 4(BX), SI // arg 2 cmd
+ MOVL 8(BX), DX // arg 3 arg
+ XORL AX, AX // vararg: say "no float args"
+ CALL libc_fcntl(SB)
+ XORL DX, DX
+ CMPL AX, $-1
+ JNE noerr
+ CALL libc_errno(SB)
+ MOVL (AX), DX
+ MOVL $-1, AX
+noerr:
+ MOVL AX, 12(BX)
+ MOVL DX, 16(BX)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 sig
+ CALL libc_sigaction(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 new
+ MOVQ 16(DI), DX // arg 3 old
+ MOVL 0(DI), DI // arg 1 how
+ CALL libc_pthread_sigmask(SB)
+ TESTL AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ MOVQ 8(DI), SI // arg 2 old
+ MOVQ 0(DI), DI // arg 1 new
+ CALL libc_sigaltstack(SB)
+ TESTQ AX, AX
+ JEQ 2(PC)
+ MOVL $0xf1, 0xf1 // crash
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPL AX, $-1 // Note: high 32 bits are junk
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), CX // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL CX
+
+ MOVQ (SP), DI
+ MOVQ AX, (4*8)(DI) // r1
+ MOVQ DX, (5*8)(DI) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPQ AX, $-1
+ JNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (6*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$16
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (SP), DI
+ MOVQ AX, (7*8)(DI) // r1
+ MOVQ DX, (8*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (SP), DI
+ MOVQ AX, (9*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$48
+ // Arguments a1 to a6 get passed in registers, with a7 onwards being
+ // passed via the stack per the x86-64 System V ABI
+ // (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf).
+ MOVQ (7*8)(DI), R10 // a7
+ MOVQ (8*8)(DI), R11 // a8
+ MOVQ (9*8)(DI), R12 // a9
+ MOVQ (10*8)(DI), R13 // a10
+ MOVQ R10, (0*8)(SP) // a7
+ MOVQ R11, (1*8)(SP) // a8
+ MOVQ R12, (2*8)(SP) // a9
+ MOVQ R13, (3*8)(SP) // a10
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (4*8)(SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (11*8)(DI) // r1
+ MOVQ DX, (12*8)(DI) // r2
+
+ CMPL AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (13*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$48
+ // Arguments a1 to a6 get passed in registers, with a7 onwards being
+ // passed via the stack per the x86-64 System V ABI
+ // (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf).
+ MOVQ (7*8)(DI), R10 // a7
+ MOVQ (8*8)(DI), R11 // a8
+ MOVQ (9*8)(DI), R12 // a9
+ MOVQ (10*8)(DI), R13 // a10
+ MOVQ R10, (0*8)(SP) // a7
+ MOVQ R11, (1*8)(SP) // a8
+ MOVQ R12, (2*8)(SP) // a9
+ MOVQ R13, (3*8)(SP) // a10
+ MOVQ (0*8)(DI), R11 // fn
+ MOVQ (2*8)(DI), SI // a2
+ MOVQ (3*8)(DI), DX // a3
+ MOVQ (4*8)(DI), CX // a4
+ MOVQ (5*8)(DI), R8 // a5
+ MOVQ (6*8)(DI), R9 // a6
+ MOVQ DI, (4*8)(SP)
+ MOVQ (1*8)(DI), DI // a1
+ XORL AX, AX // vararg: say "no float args"
+
+ CALL R11
+
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (11*8)(DI) // r1
+ MOVQ DX, (12*8)(DI) // r2
+
+ CMPQ AX, $-1
+ JNE ok
+
+ CALL libc_errno(SB)
+ MOVLQSX (AX), AX
+ MOVQ (4*8)(SP), DI
+ MOVQ AX, (13*8)(DI) // err
+
+ok:
+ XORL AX, AX // no error (it's ignored anyway)
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ MOVQ DI, BX // BX is caller-save
+ CALL libc_issetugid(SB)
+ MOVL AX, 0(BX) // return value
+ RET
diff --git a/src/runtime/sys_openbsd_arm.s b/src/runtime/sys_openbsd_arm.s
new file mode 100644
index 0000000..61b901b
--- /dev/null
+++ b/src/runtime/sys_openbsd_arm.s
@@ -0,0 +1,827 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for ARM, OpenBSD
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from Go to C calling convention.
+// A few direct system call implementations currently remain; see
+// /usr/src/sys/kern/syscalls.master for their syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
+// From OpenBSD 6.7 onwards, an armv7 syscall returns two instructions
+// after the SWI instruction, to allow for a speculative execution
+// barrier to be placed after the SWI without impacting performance.
+// For now use hardware no-ops as this works with both older and newer
+// kernels. After OpenBSD 6.8 is released this should be changed to
+// speculation barriers.
+#define NOOP MOVW R0, R0
+#define INVOKE_SYSCALL \
+ SWI $0; \
+ NOOP; \
+ NOOP
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$0
+ // R0 points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ MOVM.DB.W [R4-R11], (R13)
+
+ MOVW m_g0(R0), g
+ BL runtime·save_g(SB)
+
+ BL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ MOVM.IA.W (R13), [R4-R11]
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVW $0, R0
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-16
+ MOVW sig+4(FP), R0
+ MOVW info+8(FP), R1
+ MOVW ctx+12(FP), R2
+ MOVW fn+0(FP), R3
+ MOVW R13, R9
+ SUB $24, R13
+ BIC $0x7, R13 // alignment for ELF ABI
+ BL (R3)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$0
+ // Reserve space for callee-save registers and arguments.
+ MOVM.DB.W [R4-R11], (R13)
+ SUB $16, R13
+
+ // If called from an external code context, g will not be set.
+ // Save R0, since runtime·load_g will clobber it.
+ MOVW R0, 4(R13) // signum
+ BL runtime·load_g(SB)
+
+ MOVW R1, 8(R13)
+ MOVW R2, 12(R13)
+ BL runtime·sigtrampgo(SB)
+
+ // Restore callee-save registers.
+ ADD $16, R13
+ MOVM.IA.W (R13), [R4-R11]
+
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// TODO(jsing): OpenBSD only supports GOARM=7 machines... this
+// should not be needed; however, the linker still allows GOARM=5
+// on this platform.
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVM.WP [R1, R2, R3, R12], (R13)
+ MOVW $330, R12 // sys___get_tcb
+ INVOKE_SYSCALL
+ MOVM.IAW (R13), [R1, R2, R3, R12]
+ RET
+
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall - note that while asmcgocall does
+// stack alignment, creation of a frame undoes it again.
+// A pointer to the arguments is passed in R0.
+// A single int32 result is returned in R0.
+// (For more results, make an args/results structure.)
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R0 // arg 1 attr
+ CALL libc_pthread_attr_init(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R0 // arg 1 attr
+ CALL libc_pthread_attr_destroy(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 size
+ MOVW 0(R0), R0 // arg 1 attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 state
+ MOVW 0(R0), R0 // arg 1 attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $16, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R1 // arg 2 attr
+ MOVW 4(R0), R2 // arg 3 start
+ MOVW 8(R0), R3 // arg 4 arg
+ MOVW R13, R0 // arg 1 &threadid (discarded)
+ CALL libc_pthread_create(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 - signal
+ MOVW $0, R2 // arg 3 - tcb
+ MOVW 0(R0), R0 // arg 1 - tid
+ CALL libc_thrkill(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $16, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 - clock_id
+ MOVW 8(R0), R2 // arg 3 - abstime
+ MOVW 12(R0), R3 // arg 4 - lock
+ MOVW 16(R0), R4 // arg 5 - abort (on stack)
+ MOVW R4, 0(R13)
+ MOVW 0(R0), R0 // arg 1 - id
+ CALL libc_thrsleep(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 - count
+ MOVW 0(R0), R0 // arg 1 - id
+ CALL libc_thrwakeup(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R0 // arg 1 exit status
+ BL libc_exit(SB)
+ MOVW $0, R8 // crash on failure
+ MOVW R8, (R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ MOVW R0, R8
+ BIC $0x7, R13 // align for ELF ABI
+ BL libc_getthrid(SB)
+ MOVW R0, 0(R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R4 // signal
+ BL libc_getpid(SB) // arg 1 pid
+ MOVW R4, R1 // arg 2 signal
+ BL libc_kill(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ BL libc_sched_yield(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $16, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW R0, R8
+ MOVW 4(R0), R1 // arg 2 len
+ MOVW 8(R0), R2 // arg 3 prot
+ MOVW 12(R0), R3 // arg 4 flags
+ MOVW 16(R0), R4 // arg 5 fd (on stack)
+ MOVW R4, 0(R13)
+ MOVW $0, R5 // pad (on stack)
+ MOVW R5, 4(R13)
+ MOVW 20(R0), R6 // arg 6 offset (on stack)
+ MOVW R6, 8(R13) // low 32 bits
+ MOVW $0, R7
+ MOVW R7, 12(R13) // high 32 bits
+ MOVW 0(R0), R0 // arg 1 addr
+ BL libc_mmap(SB)
+ MOVW $0, R1
+ CMP $-1, R0
+ BNE ok
+ BL libc_errno(SB)
+ MOVW (R0), R1 // errno
+ MOVW $0, R0
+ok:
+ MOVW R0, 24(R8)
+ MOVW R1, 28(R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 len
+ MOVW 0(R0), R0 // arg 1 addr
+ BL libc_munmap(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVW $0, R8 // crash on failure
+ MOVW R8, (R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 len
+ MOVW 8(R0), R2 // arg 3 advice
+ MOVW 0(R0), R0 // arg 1 addr
+ BL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ MOVW R9, R13
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 - flags
+ MOVW 8(R0), R2 // arg 3 - mode (vararg, on stack)
+ MOVW R2, 0(R13)
+ MOVW 0(R0), R0 // arg 1 - path
+ MOVW R13, R4
+ BIC $0x7, R13 // align for ELF ABI
+ BL libc_open(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R0 // arg 1 - fd
+ BL libc_close(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 - buf
+ MOVW 8(R0), R2 // arg 3 - count
+ MOVW 0(R0), R0 // arg 1 - fd
+ BL libc_read(SB)
+ CMP $-1, R0
+ BNE noerr
+ BL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ RSB.CS $0, R0 // caller expects negative errno
+noerr:
+ MOVW R9, R13
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 buf
+ MOVW 8(R0), R2 // arg 3 count
+ MOVW 0(R0), R0 // arg 1 fd
+ BL libc_write(SB)
+ CMP $-1, R0
+ BNE noerr
+ BL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ RSB.CS $0, R0 // caller expects negative errno
+noerr:
+ MOVW R9, R13
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 flags
+ MOVW 0(R0), R0 // arg 1 filedes
+ BL libc_pipe2(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ BL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ RSB.CS $0, R0 // caller expects negative errno
+ MOVW R9, R13
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 new
+ MOVW 8(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 which
+ BL libc_setitimer(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 0(R0), R0 // arg 1 usec
+ BL libc_usleep(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 miblen
+ MOVW 8(R0), R2 // arg 3 out
+ MOVW 12(R0), R3 // arg 4 size
+ MOVW 16(R0), R4 // arg 5 dst (on stack)
+ MOVW R4, 0(R13)
+ MOVW 20(R0), R5 // arg 6 ndst (on stack)
+ MOVW R5, 4(R13)
+ MOVW 0(R0), R0 // arg 1 mib
+ BL libc_sysctl(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ BL libc_kqueue(SB)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 keventt
+ MOVW 8(R0), R2 // arg 3 nch
+ MOVW 12(R0), R3 // arg 4 ev
+ MOVW 16(R0), R4 // arg 5 nev (on stack)
+ MOVW R4, 0(R13)
+ MOVW 20(R0), R5 // arg 6 ts (on stack)
+ MOVW R5, 4(R13)
+ MOVW 0(R0), R0 // arg 1 kq
+ BL libc_kevent(SB)
+ CMP $-1, R0
+ BNE ok
+ BL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ RSB.CS $0, R0 // caller expects negative errno
+ok:
+ MOVW R9, R13
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 tp
+ MOVW 0(R0), R0 // arg 1 clock_id
+ BL libc_clock_gettime(SB)
+ CMP $-1, R0
+ BNE noerr
+ BL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ RSB.CS $0, R0 // caller expects negative errno
+noerr:
+ MOVW R9, R13
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW R0, R8
+ MOVW 0(R8), R0 // arg 1 fd
+ MOVW 4(R8), R1 // arg 2 cmd
+ MOVW 8(R8), R2 // arg 3 arg (vararg, on stack)
+ MOVW R2, 0(R13)
+ BL libc_fcntl(SB)
+ MOVW $0, R1
+ CMP $-1, R0
+ BNE noerr
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW $-1, R0
+noerr:
+ MOVW R0, 12(R8)
+ MOVW R1, 16(R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 new
+ MOVW 8(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 sig
+ BL libc_sigaction(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVW $0, R8 // crash on failure
+ MOVW R8, (R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 new
+ MOVW 8(R0), R2 // arg 3 old
+ MOVW 0(R0), R0 // arg 1 how
+ BL libc_pthread_sigmask(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVW $0, R8 // crash on failure
+ MOVW R8, (R8)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+ MOVW 4(R0), R1 // arg 2 old
+ MOVW 0(R0), R0 // arg 1 new
+ BL libc_sigaltstack(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVW $0, R8 // crash on failure
+ MOVW R8, (R8)
+ MOVW R9, R13
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
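+//
+// On 32-bit arm each uintptr field is 4 bytes, so the (n*4)(R8) offsets
+// used below are fn=0, a1=4, a2=8, a3=12, r1=16, r2=20, err=24.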
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+
+ BL (R7)
+
+ MOVW R0, (4*4)(R8) // r1
+ MOVW R1, (5*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (6*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ MOVW R13, R9
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+
+ BL (R7)
+
+ MOVW R0, (4*4)(R8) // r1
+ MOVW R1, (5*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+ CMP $-1, R1
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (6*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
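+//
+// The 32-bit ARM calling convention passes only four arguments in
+// registers (R0-R3), so a5 and a6 are copied onto the C stack at
+// 0(R13) and 4(R13) before the call.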
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+ MOVW (4*4)(R8), R3 // a4
+ MOVW (5*4)(R8), R4 // a5
+ MOVW R4, 0(R13)
+ MOVW (6*4)(R8), R5 // a6
+ MOVW R5, 4(R13)
+
+ BL (R7)
+
+ MOVW R0, (7*4)(R8) // r1
+ MOVW R1, (8*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (9*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $8, R13
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+ MOVW (4*4)(R8), R3 // a4
+ MOVW (5*4)(R8), R4 // a5
+ MOVW R4, 0(R13)
+ MOVW (6*4)(R8), R5 // a6
+ MOVW R5, 4(R13)
+
+ BL (R7)
+
+ MOVW R0, (7*4)(R8) // r1
+ MOVW R1, (8*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+ CMP $-1, R1
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (9*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $24, R13
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+ MOVW (4*4)(R8), R3 // a4
+ MOVW (5*4)(R8), R4 // a5
+ MOVW R4, 0(R13)
+ MOVW (6*4)(R8), R5 // a6
+ MOVW R5, 4(R13)
+ MOVW (7*4)(R8), R6 // a7
+ MOVW R6, 8(R13)
+ MOVW (8*4)(R8), R4 // a8
+ MOVW R4, 12(R13)
+ MOVW (9*4)(R8), R5 // a9
+ MOVW R5, 16(R13)
+ MOVW (10*4)(R8), R6 // a10
+ MOVW R6, 20(R13)
+
+ BL (R7)
+
+ MOVW R0, (11*4)(R8) // r1
+ MOVW R1, (12*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (13*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$0
+ MOVW R13, R9
+ SUB $24, R13
+ BIC $0x7, R13 // align for ELF ABI
+
+ MOVW R0, R8
+
+ MOVW (0*4)(R8), R7 // fn
+ MOVW (1*4)(R8), R0 // a1
+ MOVW (2*4)(R8), R1 // a2
+ MOVW (3*4)(R8), R2 // a3
+ MOVW (4*4)(R8), R3 // a4
+ MOVW (5*4)(R8), R4 // a5
+ MOVW R4, 0(R13)
+ MOVW (6*4)(R8), R5 // a6
+ MOVW R5, 4(R13)
+ MOVW (7*4)(R8), R6 // a7
+ MOVW R6, 8(R13)
+ MOVW (8*4)(R8), R4 // a8
+ MOVW R4, 12(R13)
+ MOVW (9*4)(R8), R5 // a9
+ MOVW R5, 16(R13)
+ MOVW (10*4)(R8), R6 // a10
+ MOVW R6, 20(R13)
+
+ BL (R7)
+
+ MOVW R0, (11*4)(R8) // r1
+ MOVW R1, (12*4)(R8) // r2
+
+ // Standard libc functions return -1 on error and set errno.
+ CMP $-1, R0
+ BNE ok
+ CMP $-1, R1
+ BNE ok
+
+ // Get error code from libc.
+ BL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW R1, (13*4)(R8) // err
+
+ok:
+ MOVW $0, R0 // no error (it's ignored anyway)
+ MOVW R9, R13
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ MOVW R13, R9
+ MOVW R0, R8
+ BIC $0x7, R13 // align for ELF ABI
+ BL libc_issetugid(SB)
+ MOVW R0, 0(R8)
+ MOVW R9, R13
+ RET
diff --git a/src/runtime/sys_openbsd_arm64.s b/src/runtime/sys_openbsd_arm64.s
new file mode 100644
index 0000000..6667dad
--- /dev/null
+++ b/src/runtime/sys_openbsd_arm64.s
@@ -0,0 +1,652 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for arm64, OpenBSD
+// System calls are implemented in libc/libpthread; this file
+// contains trampolines that convert from the Go to the C calling convention.
+// Some direct system call implementations currently remain.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "cgo/abi_arm64.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
+// mstart_stub is the first function executed on a new thread started by pthread_create.
+// It just does some low-level setup and then calls mstart.
+// Note: called with the C calling convention.
+TEXT runtime·mstart_stub(SB),NOSPLIT,$144
+ // R0 points to the m.
+ // We are already on m's g0 stack.
+
+ // Save callee-save registers.
+ SAVE_R19_TO_R28(8)
+ SAVE_F8_TO_F15(88)
+
+ MOVD m_g0(R0), g
+ BL runtime·save_g(SB)
+
+ BL runtime·mstart(SB)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8)
+ RESTORE_F8_TO_F15(88)
+
+ // Go is all done with this OS thread.
+ // Tell pthread everything is ok (we never join with this thread, so
+ // the value here doesn't really matter).
+ MOVD $0, R0
+
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R0
+ MOVD info+16(FP), R1
+ MOVD ctx+24(FP), R2
+ MOVD fn+0(FP), R11
+ BL (R11) // Alignment for ELF ABI?
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$192
+ // Save callee-save registers in the case of signal forwarding.
+ // Please refer to https://golang.org/issue/31827 .
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+ // If called from an external code context, g will not be set.
+ // Save R0, since runtime·load_g will clobber it.
+ MOVW R0, 8(RSP) // signum
+ BL runtime·load_g(SB)
+
+ // Restore signum to R0.
+ MOVW 8(RSP), R0
+ // R1 and R2 already contain info and ctx, respectively.
+ BL runtime·sigtrampgo<ABIInternal>(SB)
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+
+ RET
+
+//
+// These trampolines help convert from Go calling convention to C calling convention.
+// They should be called with asmcgocall.
+// A pointer to the arguments is passed in R0.
+// A single int32 result is returned in R0.
+// (For more results, make an args/results structure.)
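+//
+// Illustrative Go-side call shape (a sketch, not the verbatim runtime
+// code): the caller packs the C arguments contiguously and passes their
+// address, e.g.
+//	libcCall(unsafe.Pointer(abi.FuncPCABI0(pthread_attr_init_trampoline)),
+//		unsafe.Pointer(&attr))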
+TEXT runtime·pthread_attr_init_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_init(SB)
+ RET
+
+TEXT runtime·pthread_attr_destroy_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_destroy(SB)
+ RET
+
+TEXT runtime·pthread_attr_getstacksize_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - size
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_getstacksize(SB)
+ RET
+
+TEXT runtime·pthread_attr_setdetachstate_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - state
+ MOVD 0(R0), R0 // arg 1 - attr
+ CALL libc_pthread_attr_setdetachstate(SB)
+ RET
+
+TEXT runtime·pthread_create_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R1 // arg 2 - attr
+ MOVD 8(R0), R2 // arg 3 - start
+ MOVD 16(R0), R3 // arg 4 - arg
+ SUB $16, RSP
+ MOVD RSP, R0 // arg 1 - &threadid (discard)
+ CALL libc_pthread_create(SB)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·thrkill_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - signal
+ MOVD $0, R2 // arg 3 - tcb
+ MOVW 0(R0), R0 // arg 1 - tid
+ CALL libc_thrkill(SB)
+ RET
+
+TEXT runtime·thrsleep_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - clock_id
+ MOVD 16(R0), R2 // arg 3 - abstime
+ MOVD 24(R0), R3 // arg 4 - lock
+ MOVD 32(R0), R4 // arg 5 - abort
+ MOVD 0(R0), R0 // arg 1 - id
+ CALL libc_thrsleep(SB)
+ RET
+
+TEXT runtime·thrwakeup_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - count
+ MOVD 0(R0), R0 // arg 1 - id
+ CALL libc_thrwakeup(SB)
+ RET
+
+TEXT runtime·exit_trampoline(SB),NOSPLIT,$0
+ MOVW 0(R0), R0 // arg 1 - status
+ CALL libc_exit(SB)
+ MOVD $0, R0 // crash on failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·getthrid_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ CALL libc_getthrid(SB)
+ MOVW R0, 0(R19) // return value
+ RET
+
+TEXT runtime·raiseproc_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ CALL libc_getpid(SB) // arg 1 - pid
+ MOVW 0(R19), R1 // arg 2 - signal
+ CALL libc_kill(SB)
+ RET
+
+TEXT runtime·sched_yield_trampoline(SB),NOSPLIT,$0
+ CALL libc_sched_yield(SB)
+ RET
+
+TEXT runtime·mmap_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ MOVD 0(R19), R0 // arg 1 - addr
+ MOVD 8(R19), R1 // arg 2 - len
+ MOVW 16(R19), R2 // arg 3 - prot
+ MOVW 20(R19), R3 // arg 4 - flags
+ MOVW 24(R19), R4 // arg 5 - fid
+ MOVW 28(R19), R5 // arg 6 - offset
+ CALL libc_mmap(SB)
+ MOVD $0, R1
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R1 // errno
+ MOVD $0, R0
+noerr:
+ MOVD R0, 32(R19)
+ MOVD R1, 40(R19)
+ RET
+
+TEXT runtime·munmap_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - len
+ MOVD 0(R0), R0 // arg 1 - addr
+ CALL libc_munmap(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·madvise_trampoline(SB), NOSPLIT, $0
+ MOVD 8(R0), R1 // arg 2 - len
+ MOVW 16(R0), R2 // arg 3 - advice
+ MOVD 0(R0), R0 // arg 1 - addr
+ CALL libc_madvise(SB)
+ // ignore failure - maybe pages are locked
+ RET
+
+TEXT runtime·open_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - flags
+ MOVW 12(R0), R2 // arg 3 - mode
+ MOVD 0(R0), R0 // arg 1 - path
+ MOVD $0, R3 // varargs
+ CALL libc_open(SB)
+ RET
+
+TEXT runtime·close_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - fd
+ CALL libc_close(SB)
+ RET
+
+TEXT runtime·read_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - buf
+ MOVW 16(R0), R2 // arg 3 - count
+ MOVW 0(R0), R0 // arg 1 - fd
+ CALL libc_read(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·write_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - buf
+ MOVW 16(R0), R2 // arg 3 - count
+ MOVW 0(R0), R0 // arg 1 - fd
+ CALL libc_write(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·pipe2_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - flags
+ MOVD 0(R0), R0 // arg 1 - filedes
+ CALL libc_pipe2(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·setitimer_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - which
+ CALL libc_setitimer(SB)
+ RET
+
+TEXT runtime·usleep_trampoline(SB),NOSPLIT,$0
+ MOVD 0(R0), R0 // arg 1 - usec
+ CALL libc_usleep(SB)
+ RET
+
+TEXT runtime·sysctl_trampoline(SB),NOSPLIT,$0
+ MOVW 8(R0), R1 // arg 2 - miblen
+ MOVD 16(R0), R2 // arg 3 - out
+ MOVD 24(R0), R3 // arg 4 - size
+ MOVD 32(R0), R4 // arg 5 - dst
+ MOVD 40(R0), R5 // arg 6 - ndst
+ MOVD 0(R0), R0 // arg 1 - mib
+ CALL libc_sysctl(SB)
+ RET
+
+TEXT runtime·kqueue_trampoline(SB),NOSPLIT,$0
+ CALL libc_kqueue(SB)
+ RET
+
+TEXT runtime·kevent_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - keventt
+ MOVW 16(R0), R2 // arg 3 - nch
+ MOVD 24(R0), R3 // arg 4 - ev
+ MOVW 32(R0), R4 // arg 5 - nev
+ MOVD 40(R0), R5 // arg 6 - ts
+ MOVW 0(R0), R0 // arg 1 - kq
+ CALL libc_kevent(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·clock_gettime_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - tp
+ MOVD 0(R0), R0 // arg 1 - clock_id
+ CALL libc_clock_gettime(SB)
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R0 // errno
+ NEG R0, R0 // caller expects negative errno value
+noerr:
+ RET
+
+TEXT runtime·fcntl_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19
+ MOVW 0(R19), R0 // arg 1 - fd
+ MOVW 4(R19), R1 // arg 2 - cmd
+ MOVW 8(R19), R2 // arg 3 - arg
+ MOVD $0, R3 // vararg
+ CALL libc_fcntl(SB)
+ MOVD $0, R1
+ CMP $-1, R0
+ BNE noerr
+ CALL libc_errno(SB)
+ MOVW (R0), R1
+ MOVW $-1, R0
+noerr:
+ MOVW R0, 12(R19)
+ MOVW R1, 16(R19)
+ RET
+
+TEXT runtime·sigaction_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - sig
+ CALL libc_sigaction(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·sigprocmask_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - new
+ MOVD 16(R0), R2 // arg 3 - old
+ MOVW 0(R0), R0 // arg 1 - how
+ CALL libc_pthread_sigmask(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+TEXT runtime·sigaltstack_trampoline(SB),NOSPLIT,$0
+ MOVD 8(R0), R1 // arg 2 - old
+ MOVD 0(R0), R0 // arg 1 - new
+ CALL libc_sigaltstack(SB)
+ CMP $-1, R0
+ BNE 3(PC)
+ MOVD $0, R0 // crash on syscall failure
+ MOVD R0, (R0)
+ RET
+
+// syscall calls a function in libc on behalf of the syscall package.
+// syscall takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD $0, R3 // vararg
+
+ CALL R11
+
+ MOVD R0, (4*8)(R19) // r1
+ MOVD R1, (5*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (6*8)(R19) // err
+
+ok:
+ RET
+
+// syscallX calls a function in libc on behalf of the syscall package.
+// syscallX takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscallX must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscallX is like syscall but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscallX(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD $0, R3 // vararg
+
+ CALL R11
+
+ MOVD R0, (4*8)(R19) // r1
+ MOVD R1, (5*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (6*8)(R19) // err
+
+ok:
+ RET
+
+// syscall6 calls a function in libc on behalf of the syscall package.
+// syscall6 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6 expects a 32-bit result and tests for 32-bit -1
+// to decide there was an error.
+TEXT runtime·syscall6(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD $0, R6 // vararg
+
+ CALL R11
+
+ MOVD R0, (7*8)(R19) // r1
+ MOVD R1, (8*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (9*8)(R19) // err
+
+ok:
+ RET
+
+// syscall6X calls a function in libc on behalf of the syscall package.
+// syscall6X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall6X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall6X is like syscall6 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall6X(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD $0, R6 // vararg
+
+ CALL R11
+
+ MOVD R0, (7*8)(R19) // r1
+ MOVD R1, (8*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (9*8)(R19) // err
+
+ok:
+ RET
+
+// syscall10 calls a function in libc on behalf of the syscall package.
+// syscall10 takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10 must be called on the g0 stack with the
+// C calling convention (use libcCall).
+TEXT runtime·syscall10(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD (7*8)(R19), R6 // a7
+ MOVD (8*8)(R19), R7 // a8
+ MOVD (9*8)(R19), R8 // a9
+ MOVD (10*8)(R19), R9 // a10
+ MOVD $0, R10 // vararg
+
+ CALL R11
+
+ MOVD R0, (11*8)(R19) // r1
+ MOVD R1, (12*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMPW $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (13*8)(R19) // err
+
+ok:
+ RET
+
+// syscall10X calls a function in libc on behalf of the syscall package.
+// syscall10X takes a pointer to a struct like:
+// struct {
+// fn uintptr
+// a1 uintptr
+// a2 uintptr
+// a3 uintptr
+// a4 uintptr
+// a5 uintptr
+// a6 uintptr
+// a7 uintptr
+// a8 uintptr
+// a9 uintptr
+// a10 uintptr
+// r1 uintptr
+// r2 uintptr
+// err uintptr
+// }
+// syscall10X must be called on the g0 stack with the
+// C calling convention (use libcCall).
+//
+// syscall10X is like syscall10 but expects a 64-bit result
+// and tests for 64-bit -1 to decide there was an error.
+TEXT runtime·syscall10X(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+
+ MOVD (0*8)(R19), R11 // fn
+ MOVD (1*8)(R19), R0 // a1
+ MOVD (2*8)(R19), R1 // a2
+ MOVD (3*8)(R19), R2 // a3
+ MOVD (4*8)(R19), R3 // a4
+ MOVD (5*8)(R19), R4 // a5
+ MOVD (6*8)(R19), R5 // a6
+ MOVD (7*8)(R19), R6 // a7
+ MOVD (8*8)(R19), R7 // a8
+ MOVD (9*8)(R19), R8 // a9
+ MOVD (10*8)(R19), R9 // a10
+ MOVD $0, R10 // vararg
+
+ CALL R11
+
+ MOVD R0, (11*8)(R19) // r1
+ MOVD R1, (12*8)(R19) // r2
+
+ // Standard libc functions return -1 on error
+ // and set errno.
+ CMP $-1, R0
+ BNE ok
+
+ // Get error code from libc.
+ CALL libc_errno(SB)
+ MOVW (R0), R0
+ MOVD R0, (13*8)(R19) // err
+
+ok:
+ RET
+
+TEXT runtime·issetugid_trampoline(SB),NOSPLIT,$0
+ MOVD R0, R19 // pointer to args
+ CALL libc_issetugid(SB)
+ MOVW R0, 0(R19) // return value
+ RET
diff --git a/src/runtime/sys_openbsd_mips64.s b/src/runtime/sys_openbsd_mips64.s
new file mode 100644
index 0000000..7ac0db0
--- /dev/null
+++ b/src/runtime/sys_openbsd_mips64.s
@@ -0,0 +1,388 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//
+// System calls and other sys.stuff for mips64, OpenBSD
+// /usr/src/sys/kern/syscalls.master for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define CLOCK_REALTIME $0
+#define CLOCK_MONOTONIC $3
+
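+// Convention used throughout this file: the system call number goes in
+// R2, arguments go in R4-R9, and SYSCALL traps into the kernel. On
+// return R7 is non-zero if the call failed, in which case R2 holds the
+// errno value (negated where the Go caller expects a negative errno).
+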
+// Exit the entire program (like C exit)
+TEXT runtime·exit(SB),NOSPLIT|NOFRAME,$0
+ MOVW code+0(FP), R4 // arg 1 - status
+ MOVV $1, R2 // sys_exit
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+// func exitThread(wait *atomic.Uint32)
+TEXT runtime·exitThread(SB),NOSPLIT,$0
+ MOVV wait+0(FP), R4 // arg 1 - notdead
+ MOVV $302, R2 // sys___threxit
+ SYSCALL
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ JMP 0(PC)
+
+TEXT runtime·open(SB),NOSPLIT|NOFRAME,$0
+ MOVV name+0(FP), R4 // arg 1 - path
+ MOVW mode+8(FP), R5 // arg 2 - mode
+ MOVW perm+12(FP), R6 // arg 3 - perm
+ MOVV $5, R2 // sys_open
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV $6, R2 // sys_close
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·read(SB),NOSPLIT|NOFRAME,$0
+ MOVW fd+0(FP), R4 // arg 1 - fd
+ MOVV p+8(FP), R5 // arg 2 - buf
+ MOVW n+16(FP), R6 // arg 3 - nbyte
+ MOVV $3, R2 // sys_read
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+// func pipe2(flags int32) (r, w int32, errno int32)
+TEXT runtime·pipe2(SB),NOSPLIT|NOFRAME,$0-20
+ MOVV $r+8(FP), R4
+ MOVW flags+0(FP), R5
+ MOVV $101, R2 // sys_pipe2
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, errno+16(FP)
+ RET
+
+TEXT runtime·write1(SB),NOSPLIT|NOFRAME,$0
+ MOVV fd+0(FP), R4 // arg 1 - fd
+ MOVV p+8(FP), R5 // arg 2 - buf
+ MOVW n+16(FP), R6 // arg 3 - nbyte
+ MOVV $4, R2 // sys_write
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·usleep(SB),NOSPLIT,$24-4
+ MOVWU usec+0(FP), R3
+ MOVV R3, R5
+ MOVW $1000000, R4
+ DIVVU R4, R3
+ MOVV LO, R3
+ MOVV R3, 8(R29) // tv_sec
+ MOVW $1000, R4
+ MULVU R3, R4
+ MOVV LO, R4
+ SUBVU R4, R5
+ MOVV R5, 16(R29) // tv_nsec
+
+ ADDV $8, R29, R4 // arg 1 - rqtp
+ MOVV $0, R5 // arg 2 - rmtp
+ MOVV $91, R2 // sys_nanosleep
+ SYSCALL
+ RET
+
+TEXT runtime·getthrid(SB),NOSPLIT,$0-4
+ MOVV $299, R2 // sys_getthrid
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
+
+TEXT runtime·thrkill(SB),NOSPLIT,$0-16
+ MOVW tid+0(FP), R4 // arg 1 - tid
+ MOVV sig+8(FP), R5 // arg 2 - signum
+ MOVW $0, R6 // arg 3 - tcb
+ MOVV $119, R2 // sys_thrkill
+ SYSCALL
+ RET
+
+TEXT runtime·raiseproc(SB),NOSPLIT,$0
+	MOVV	$20, R2			// sys_getpid
+ SYSCALL
+ MOVV R2, R4 // arg 1 - pid
+ MOVW sig+0(FP), R5 // arg 2 - signum
+ MOVV $122, R2 // sys_kill
+ SYSCALL
+ RET
+
+TEXT runtime·mmap(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+ MOVW prot+16(FP), R6 // arg 3 - prot
+ MOVW flags+20(FP), R7 // arg 4 - flags
+ MOVW fd+24(FP), R8 // arg 5 - fd
+ MOVW $0, R9 // arg 6 - pad
+ MOVW off+28(FP), R10 // arg 7 - offset
+ MOVV $197, R2 // sys_mmap
+ SYSCALL
+ MOVV $0, R4
+ BEQ R7, 3(PC)
+ MOVV R2, R4 // if error, move to R4
+ MOVV $0, R2
+ MOVV R2, p+32(FP)
+ MOVV R4, err+40(FP)
+ RET
+
+TEXT runtime·munmap(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+ MOVV $73, R2 // sys_munmap
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+TEXT runtime·madvise(SB),NOSPLIT,$0
+ MOVV addr+0(FP), R4 // arg 1 - addr
+ MOVV n+8(FP), R5 // arg 2 - len
+	MOVW	flags+16(FP), R6	// arg 3 - advice
+ MOVV $75, R2 // sys_madvise
+ SYSCALL
+ BEQ R7, 2(PC)
+ MOVW $-1, R2
+ MOVW R2, ret+24(FP)
+ RET
+
+TEXT runtime·setitimer(SB),NOSPLIT,$0
+ MOVW mode+0(FP), R4 // arg 1 - mode
+ MOVV new+8(FP), R5 // arg 2 - new value
+ MOVV old+16(FP), R6 // arg 3 - old value
+ MOVV $69, R2 // sys_setitimer
+ SYSCALL
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB), NOSPLIT, $32
+ MOVW CLOCK_REALTIME, R4 // arg 1 - clock_id
+ MOVV $8(R29), R5 // arg 2 - tp
+ MOVV $87, R2 // sys_clock_gettime
+ SYSCALL
+
+ MOVV 8(R29), R4 // sec
+ MOVV 16(R29), R5 // nsec
+ MOVV R4, sec+0(FP)
+ MOVW R5, nsec+8(FP)
+
+ RET
+
+// int64 nanotime1(void) so really
+// void nanotime1(int64 *nsec)
+TEXT runtime·nanotime1(SB),NOSPLIT,$32
+ MOVW CLOCK_MONOTONIC, R4 // arg 1 - clock_id
+ MOVV $8(R29), R5 // arg 2 - tp
+ MOVV $87, R2 // sys_clock_gettime
+ SYSCALL
+
+ MOVV 8(R29), R3 // sec
+ MOVV 16(R29), R5 // nsec
+
+ MOVV $1000000000, R4
+ MULVU R4, R3
+ MOVV LO, R3
+ ADDVU R5, R3
+ MOVV R3, ret+0(FP)
+ RET
+
+TEXT runtime·sigaction(SB),NOSPLIT,$0
+ MOVW sig+0(FP), R4 // arg 1 - signum
+ MOVV new+8(FP), R5 // arg 2 - new sigaction
+ MOVV old+16(FP), R6 // arg 3 - old sigaction
+ MOVV $46, R2 // sys_sigaction
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $3, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ RET
+
+TEXT runtime·obsdsigprocmask(SB),NOSPLIT,$0
+ MOVW how+0(FP), R4 // arg 1 - mode
+ MOVW new+4(FP), R5 // arg 2 - new
+ MOVV $48, R2 // sys_sigprocmask
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $3, R2 // crash on syscall failure
+ MOVV R2, (R2)
+ MOVW R2, ret+8(FP)
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVW sig+8(FP), R4
+ MOVV info+16(FP), R5
+ MOVV ctx+24(FP), R6
+ MOVV fn+0(FP), R25 // Must use R25, needed for PIC code.
+ CALL (R25)
+ RET
+
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME,$192
+ // initialize REGSB = PC&0xffffffff00000000
+ BGEZAL R0, 1(PC)
+ SRLV $32, R31, RSB
+ SLLV $32, RSB
+
+ // this might be called in external code context,
+ // where g is not set.
+ MOVB runtime·iscgo(SB), R1
+ BEQ R1, 2(PC)
+ JAL runtime·load_g(SB)
+
+ MOVW R4, 8(R29)
+ MOVV R5, 16(R29)
+ MOVV R6, 24(R29)
+ MOVV $runtime·sigtrampgo(SB), R1
+ JAL (R1)
+ RET
+
+// int32 tfork(void *param, uintptr psize, M *mp, G *gp, void (*fn)(void));
+TEXT runtime·tfork(SB),NOSPLIT,$0
+
+ // Copy mp, gp and fn off parent stack for use by child.
+ MOVV mm+16(FP), R16
+ MOVV gg+24(FP), R17
+ MOVV fn+32(FP), R18
+
+ MOVV param+0(FP), R4 // arg 1 - param
+ MOVV psize+8(FP), R5 // arg 2 - psize
+ MOVV $8, R2 // sys___tfork
+ SYSCALL
+
+ // Return if syscall failed.
+ BEQ R7, 4(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+40(FP)
+ RET
+
+ // In parent, return.
+ BEQ R2, 3(PC)
+ MOVW $0, ret+40(FP)
+ RET
+
+ // Initialise m, g.
+ MOVV R17, g
+ MOVV R16, g_m(g)
+
+ // Call fn.
+ CALL (R18)
+
+ // fn should never return.
+ MOVV $2, R8 // crash if reached
+ MOVV R8, (R8)
+ RET
+
+TEXT runtime·sigaltstack(SB),NOSPLIT,$0
+ MOVV new+0(FP), R4 // arg 1 - new sigaltstack
+ MOVV old+8(FP), R5 // arg 2 - old sigaltstack
+ MOVV $288, R2 // sys_sigaltstack
+ SYSCALL
+ BEQ R7, 3(PC)
+ MOVV $0, R8 // crash on syscall failure
+ MOVV R8, (R8)
+ RET
+
+TEXT runtime·osyield(SB),NOSPLIT,$0
+ MOVV $298, R2 // sys_sched_yield
+ SYSCALL
+ RET
+
+TEXT runtime·thrsleep(SB),NOSPLIT,$0
+ MOVV ident+0(FP), R4 // arg 1 - ident
+ MOVW clock_id+8(FP), R5 // arg 2 - clock_id
+ MOVV tsp+16(FP), R6 // arg 3 - tsp
+ MOVV lock+24(FP), R7 // arg 4 - lock
+ MOVV abort+32(FP), R8 // arg 5 - abort
+ MOVV $94, R2 // sys___thrsleep
+ SYSCALL
+ MOVW R2, ret+40(FP)
+ RET
+
+TEXT runtime·thrwakeup(SB),NOSPLIT,$0
+ MOVV ident+0(FP), R4 // arg 1 - ident
+ MOVW n+8(FP), R5 // arg 2 - n
+ MOVV $301, R2 // sys___thrwakeup
+ SYSCALL
+ MOVW R2, ret+16(FP)
+ RET
+
+TEXT runtime·sysctl(SB),NOSPLIT,$0
+ MOVV mib+0(FP), R4 // arg 1 - mib
+ MOVW miblen+8(FP), R5 // arg 2 - miblen
+ MOVV out+16(FP), R6 // arg 3 - out
+ MOVV size+24(FP), R7 // arg 4 - size
+ MOVV dst+32(FP), R8 // arg 5 - dest
+ MOVV ndst+40(FP), R9 // arg 6 - newlen
+ MOVV $202, R2 // sys___sysctl
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+48(FP)
+ RET
+
+// int32 runtime·kqueue(void);
+TEXT runtime·kqueue(SB),NOSPLIT,$0
+ MOVV $269, R2 // sys_kqueue
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+0(FP)
+ RET
+
+// int32 runtime·kevent(int kq, Kevent *changelist, int nchanges, Kevent *eventlist, int nevents, Timespec *timeout);
+TEXT runtime·kevent(SB),NOSPLIT,$0
+ MOVW kq+0(FP), R4 // arg 1 - kq
+ MOVV ch+8(FP), R5 // arg 2 - changelist
+ MOVW nch+16(FP), R6 // arg 3 - nchanges
+ MOVV ev+24(FP), R7 // arg 4 - eventlist
+ MOVW nev+32(FP), R8 // arg 5 - nevents
+ MOVV ts+40(FP), R9 // arg 6 - timeout
+ MOVV $72, R2 // sys_kevent
+ SYSCALL
+ BEQ R7, 2(PC)
+ SUBVU R2, R0, R2 // caller expects negative errno
+ MOVW R2, ret+48(FP)
+ RET
+
+// func fcntl(fd, cmd, arg int32) (int32, int32)
+TEXT runtime·fcntl(SB),NOSPLIT,$0
+ MOVW fd+0(FP), R4 // fd
+ MOVW cmd+4(FP), R5 // cmd
+ MOVW arg+8(FP), R6 // arg
+ MOVV $92, R2 // sys_fcntl
+ SYSCALL
+ MOVV $0, R4
+ BEQ R7, noerr
+ MOVV R2, R4
+ MOVW $-1, R2
+noerr:
+ MOVW R2, ret+16(FP)
+ MOVW R4, errno+20(FP)
+ RET
+
+// func issetugid() int32
+TEXT runtime·issetugid(SB),NOSPLIT,$0
+ MOVV $253, R2 // sys_issetugid
+ SYSCALL
+ MOVW R2, ret+0(FP)
+ RET
diff --git a/src/runtime/sys_plan9_386.s b/src/runtime/sys_plan9_386.s
new file mode 100644
index 0000000..bdcb98e
--- /dev/null
+++ b/src/runtime/sys_plan9_386.s
@@ -0,0 +1,256 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
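+// As the stubs below suggest, the Plan 9/386 kernel takes the system
+// call number in AX and the arguments on the stack, which is why these
+// NOSPLIT stubs can pass the Go arguments through unchanged; INT $64
+// traps into the kernel and the result comes back in AX (-1 on error).
+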
+// setldt(int entry, int address, int limit)
+TEXT runtime·setldt(SB),NOSPLIT,$0
+ RET
+
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVL $14, AX
+ INT $64
+ MOVL AX, ret+12(FP)
+ RET
+
+TEXT runtime·pread(SB),NOSPLIT,$0
+ MOVL $50, AX
+ INT $64
+ MOVL AX, ret+20(FP)
+ RET
+
+TEXT runtime·pwrite(SB),NOSPLIT,$0
+ MOVL $51, AX
+ INT $64
+ MOVL AX, ret+20(FP)
+ RET
+
+// int32 _seek(int64*, int32, int64, int32)
+TEXT _seek<>(SB),NOSPLIT,$0
+ MOVL $39, AX
+ INT $64
+ RET
+
+TEXT runtime·seek(SB),NOSPLIT,$24
+ LEAL ret+16(FP), AX
+ MOVL fd+0(FP), BX
+ MOVL offset_lo+4(FP), CX
+ MOVL offset_hi+8(FP), DX
+ MOVL whence+12(FP), SI
+ MOVL AX, 0(SP)
+ MOVL BX, 4(SP)
+ MOVL CX, 8(SP)
+ MOVL DX, 12(SP)
+ MOVL SI, 16(SP)
+ CALL _seek<>(SB)
+ CMPL AX, $0
+ JGE 3(PC)
+ MOVL $-1, ret_lo+16(FP)
+ MOVL $-1, ret_hi+20(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVL $4, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·exits(SB),NOSPLIT,$0
+ MOVL $8, AX
+ INT $64
+ RET
+
+TEXT runtime·brk_(SB),NOSPLIT,$0
+ MOVL $24, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·sleep(SB),NOSPLIT,$0
+ MOVL $17, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0
+ MOVL $37, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0
+ MOVL $52, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT nsec<>(SB),NOSPLIT,$0
+ MOVL $53, AX
+ INT $64
+ RET
+
+TEXT runtime·nsec(SB),NOSPLIT,$8
+ LEAL ret+4(FP), AX
+ MOVL AX, 0(SP)
+ CALL nsec<>(SB)
+ CMPL AX, $0
+ JGE 3(PC)
+ MOVL $-1, ret_lo+4(FP)
+ MOVL $-1, ret_hi+8(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$8-12
+ CALL runtime·nanotime1(SB)
+ MOVL 0(SP), AX
+ MOVL 4(SP), DX
+
+ MOVL $1000000000, CX
+ DIVL CX
+ MOVL AX, sec_lo+0(FP)
+ MOVL $0, sec_hi+4(FP)
+ MOVL DX, nsec+8(FP)
+ RET
+
+TEXT runtime·notify(SB),NOSPLIT,$0
+ MOVL $28, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·noted(SB),NOSPLIT,$0
+ MOVL $29, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0
+ MOVL $38, AX
+ INT $64
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·rfork(SB),NOSPLIT,$0
+ MOVL $19, AX
+ INT $64
+ MOVL AX, ret+4(FP)
+ RET
+
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$4
+ MOVL newm+0(FP), CX
+ MOVL m_g0(CX), DX
+
+ // Layout new m scheduler stack on os stack.
+ MOVL SP, AX
+ MOVL AX, (g_stack+stack_hi)(DX)
+ SUBL $(64*1024), AX // stack size
+ MOVL AX, (g_stack+stack_lo)(DX)
+ MOVL AX, g_stackguard0(DX)
+ MOVL AX, g_stackguard1(DX)
+
+ // Initialize procid from TOS struct.
+ MOVL _tos(SB), AX
+ MOVL 48(AX), AX
+ MOVL AX, m_procid(CX) // save pid as m->procid
+
+ // Finally, initialize g.
+ get_tls(BX)
+ MOVL DX, g(BX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ CALL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVL $0, 0(SP)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+// void sigtramp(void *ureg, int8 *note)
+TEXT runtime·sigtramp(SB),NOSPLIT,$0
+ get_tls(AX)
+
+ // check that g exists
+ MOVL g(AX), BX
+ CMPL BX, $0
+ JNE 3(PC)
+ CALL runtime·badsignal2(SB) // will exit
+ RET
+
+ // save args
+ MOVL ureg+0(FP), CX
+ MOVL note+4(FP), DX
+
+ // change stack
+ MOVL g_m(BX), BX
+ MOVL m_gsignal(BX), BP
+ MOVL (g_stack+stack_hi)(BP), BP
+ MOVL BP, SP
+
+ // make room for args and g
+ SUBL $24, SP
+
+ // save g
+ MOVL g(AX), BP
+ MOVL BP, 20(SP)
+
+ // g = m->gsignal
+ MOVL m_gsignal(BX), DI
+ MOVL DI, g(AX)
+
+ // load args and call sighandler
+ MOVL CX, 0(SP)
+ MOVL DX, 4(SP)
+ MOVL BP, 8(SP)
+
+ CALL runtime·sighandler(SB)
+ MOVL 12(SP), AX
+
+ // restore g
+ get_tls(BX)
+ MOVL 20(SP), BP
+ MOVL BP, g(BX)
+
+ // call noted(AX)
+ MOVL AX, 0(SP)
+ CALL runtime·noted(SB)
+ RET
+
+// Only used by the 64-bit runtime.
+TEXT runtime·setfpmasks(SB),NOSPLIT,$0
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// void errstr(int8 *buf, int32 len)
+TEXT errstr<>(SB),NOSPLIT,$0
+ MOVL $41, AX
+ INT $64
+ RET
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_386.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$8-8
+ get_tls(AX)
+ MOVL g(AX), BX
+ MOVL g_m(BX), BX
+ MOVL (m_mOS+mOS_errstr)(BX), CX
+ MOVL CX, 0(SP)
+ MOVL $ERRMAX, 4(SP)
+ CALL errstr<>(SB)
+ CALL runtime·findnull(SB)
+ MOVL 4(SP), AX
+ MOVL AX, ret_len+4(FP)
+ MOVL 0(SP), AX
+ MOVL AX, ret_base+0(FP)
+ RET
+
+// never called on this platform
+TEXT ·sigpanictramp(SB),NOSPLIT,$0-0
+ UNDEF
diff --git a/src/runtime/sys_plan9_amd64.s b/src/runtime/sys_plan9_amd64.s
new file mode 100644
index 0000000..a53f920
--- /dev/null
+++ b/src/runtime/sys_plan9_amd64.s
@@ -0,0 +1,257 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
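+// As on 386, the Plan 9 kernel takes system call arguments on the
+// stack; on amd64 the system call number goes in BP and SYSCALL traps
+// into the kernel, with the result returned in AX (-1 on error).
+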
+TEXT runtime·open(SB),NOSPLIT,$0
+ MOVQ $14, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·pread(SB),NOSPLIT,$0
+ MOVQ $50, BP
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+TEXT runtime·pwrite(SB),NOSPLIT,$0
+ MOVQ $51, BP
+ SYSCALL
+ MOVL AX, ret+32(FP)
+ RET
+
+// int32 _seek(int64*, int32, int64, int32)
+TEXT _seek<>(SB),NOSPLIT,$0
+ MOVQ $39, BP
+ SYSCALL
+ RET
+
+// int64 seek(int32, int64, int32)
+// Convenience wrapper around _seek, the actual system call.
+TEXT runtime·seek(SB),NOSPLIT,$32
+ LEAQ ret+24(FP), AX
+ MOVL fd+0(FP), BX
+ MOVQ offset+8(FP), CX
+ MOVL whence+16(FP), DX
+ MOVQ AX, 0(SP)
+ MOVL BX, 8(SP)
+ MOVQ CX, 16(SP)
+ MOVL DX, 24(SP)
+ CALL _seek<>(SB)
+ CMPL AX, $0
+ JGE 2(PC)
+ MOVQ $-1, ret+24(FP)
+ RET
+
+TEXT runtime·closefd(SB),NOSPLIT,$0
+ MOVQ $4, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·exits(SB),NOSPLIT,$0
+ MOVQ $8, BP
+ SYSCALL
+ RET
+
+TEXT runtime·brk_(SB),NOSPLIT,$0
+ MOVQ $24, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·sleep(SB),NOSPLIT,$0
+ MOVQ $17, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0
+ MOVQ $37, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0
+ MOVQ $52, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·nsec(SB),NOSPLIT,$0
+ MOVQ $53, BP
+ SYSCALL
+ MOVQ AX, ret+8(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$8-12
+ CALL runtime·nanotime1(SB)
+ MOVQ 0(SP), AX
+
+ // generated code for
+ // func f(x uint64) (uint64, uint64) { return x/1000000000, x%1000000000 }
+ // adapted to reduce duplication
+ MOVQ AX, CX
+ MOVQ $1360296554856532783, AX
+ MULQ CX
+ ADDQ CX, DX
+ RCRQ $1, DX
+ SHRQ $29, DX
+ MOVQ DX, sec+0(FP)
+ IMULQ $1000000000, DX
+ SUBQ DX, CX
+ MOVL CX, nsec+8(FP)
+ RET
+
+TEXT runtime·notify(SB),NOSPLIT,$0
+ MOVQ $28, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·noted(SB),NOSPLIT,$0
+ MOVQ $29, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0
+ MOVQ $38, BP
+ SYSCALL
+ MOVL AX, ret+16(FP)
+ RET
+
+TEXT runtime·rfork(SB),NOSPLIT,$0
+ MOVQ $19, BP
+ SYSCALL
+ MOVL AX, ret+8(FP)
+ RET
+
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$8
+ MOVQ newm+0(FP), CX
+ MOVQ m_g0(CX), DX
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(64*1024), AX // stack size
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Initialize procid from TOS struct.
+ MOVQ _tos(SB), AX
+ MOVL 64(AX), AX
+ MOVQ AX, m_procid(CX) // save pid as m->procid
+
+ // Finally, initialize g.
+ get_tls(BX)
+ MOVQ DX, g(BX)
+
+ CALL runtime·stackcheck(SB) // smashes AX, CX
+ CALL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVQ $0, 0(SP)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+// This is needed by asm_amd64.s
+TEXT runtime·settls(SB),NOSPLIT,$0
+ RET
+
+// void sigtramp(void *ureg, int8 *note)
+TEXT runtime·sigtramp(SB),NOSPLIT|NOFRAME,$0
+ get_tls(AX)
+
+ // check that g exists
+ MOVQ g(AX), BX
+ CMPQ BX, $0
+ JNE 3(PC)
+ CALL runtime·badsignal2(SB) // will exit
+ RET
+
+ // save args
+ MOVQ ureg+0(FP), CX
+ MOVQ note+8(FP), DX
+
+ // change stack
+ MOVQ g_m(BX), BX
+ MOVQ m_gsignal(BX), R10
+ MOVQ (g_stack+stack_hi)(R10), BP
+ MOVQ BP, SP
+
+ // make room for args and g
+ SUBQ $128, SP
+
+ // save g
+ MOVQ g(AX), BP
+ MOVQ BP, 32(SP)
+
+ // g = m->gsignal
+ MOVQ R10, g(AX)
+
+ // load args and call sighandler
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+ MOVQ BP, 16(SP)
+
+ CALL runtime·sighandler(SB)
+ MOVL 24(SP), AX
+
+ // restore g
+ get_tls(BX)
+ MOVQ 32(SP), R10
+ MOVQ R10, g(BX)
+
+ // call noted(AX)
+ MOVQ AX, 0(SP)
+ CALL runtime·noted(SB)
+ RET
+
+TEXT runtime·setfpmasks(SB),NOSPLIT,$8
+ STMXCSR 0(SP)
+ MOVL 0(SP), AX
+ ANDL $~0x3F, AX
+ ORL $(0x3F<<7), AX
+ MOVL AX, 0(SP)
+ LDMXCSR 0(SP)
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// void errstr(int8 *buf, int32 len)
+TEXT errstr<>(SB),NOSPLIT,$0
+ MOVQ $41, BP
+ SYSCALL
+ RET
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_amd64.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$16-16
+ get_tls(AX)
+ MOVQ g(AX), BX
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_errstr)(BX), CX
+ MOVQ CX, 0(SP)
+ MOVQ $ERRMAX, 8(SP)
+ CALL errstr<>(SB)
+ CALL runtime·findnull(SB)
+ MOVQ 8(SP), AX
+ MOVQ AX, ret_len+8(FP)
+ MOVQ 0(SP), AX
+ MOVQ AX, ret_base+0(FP)
+ RET
+
+// never called on this platform
+TEXT ·sigpanictramp(SB),NOSPLIT,$0-0
+ UNDEF
diff --git a/src/runtime/sys_plan9_arm.s b/src/runtime/sys_plan9_arm.s
new file mode 100644
index 0000000..5343085
--- /dev/null
+++ b/src/runtime/sys_plan9_arm.s
@@ -0,0 +1,320 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// from ../syscall/zsysnum_plan9.go
+
+#define SYS_SYSR1 0
+#define SYS_BIND 2
+#define SYS_CHDIR 3
+#define SYS_CLOSE 4
+#define SYS_DUP 5
+#define SYS_ALARM 6
+#define SYS_EXEC 7
+#define SYS_EXITS 8
+#define SYS_FAUTH 10
+#define SYS_SEGBRK 12
+#define SYS_OPEN 14
+#define SYS_OSEEK 16
+#define SYS_SLEEP 17
+#define SYS_RFORK 19
+#define SYS_PIPE 21
+#define SYS_CREATE 22
+#define SYS_FD2PATH 23
+#define SYS_BRK_ 24
+#define SYS_REMOVE 25
+#define SYS_NOTIFY 28
+#define SYS_NOTED 29
+#define SYS_SEGATTACH 30
+#define SYS_SEGDETACH 31
+#define SYS_SEGFREE 32
+#define SYS_SEGFLUSH 33
+#define SYS_RENDEZVOUS 34
+#define SYS_UNMOUNT 35
+#define SYS_SEMACQUIRE 37
+#define SYS_SEMRELEASE 38
+#define SYS_SEEK 39
+#define SYS_FVERSION 40
+#define SYS_ERRSTR 41
+#define SYS_STAT 42
+#define SYS_FSTAT 43
+#define SYS_WSTAT 44
+#define SYS_FWSTAT 45
+#define SYS_MOUNT 46
+#define SYS_AWAIT 47
+#define SYS_PREAD 50
+#define SYS_PWRITE 51
+#define SYS_TSEMACQUIRE 52
+#define SYS_NSEC 53
+
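+// As the stubs below suggest, on Plan 9/arm the system call number goes
+// in R0 and SWI $0 traps into the kernel; arguments are taken from the
+// caller's stack frame and the result comes back in R0 (-1 on error).
+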
+//func open(name *byte, mode, perm int32) int32
+TEXT runtime·open(SB),NOSPLIT,$0-16
+ MOVW $SYS_OPEN, R0
+ SWI $0
+ MOVW R0, ret+12(FP)
+ RET
+
+//func pread(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+TEXT runtime·pread(SB),NOSPLIT,$0-24
+ MOVW $SYS_PREAD, R0
+ SWI $0
+ MOVW R0, ret+20(FP)
+ RET
+
+//func pwrite(fd int32, buf unsafe.Pointer, nbytes int32, offset int64) int32
+TEXT runtime·pwrite(SB),NOSPLIT,$0-24
+ MOVW $SYS_PWRITE, R0
+ SWI $0
+ MOVW R0, ret+20(FP)
+ RET
+
+//func seek(fd int32, offset int64, whence int32) int64
+TEXT runtime·seek(SB),NOSPLIT,$0-24
+ MOVW $ret_lo+16(FP), R0
+ MOVW 0(R13), R1
+ MOVW R0, 0(R13)
+ MOVW.W R1, -4(R13)
+ MOVW $SYS_SEEK, R0
+ SWI $0
+ MOVW.W R1, 4(R13)
+ CMP $-1, R0
+ MOVW.EQ R0, ret_lo+16(FP)
+ MOVW.EQ R0, ret_hi+20(FP)
+ RET
+
+//func closefd(fd int32) int32
+TEXT runtime·closefd(SB),NOSPLIT,$0-8
+ MOVW $SYS_CLOSE, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func exits(msg *byte)
+TEXT runtime·exits(SB),NOSPLIT,$0-4
+ MOVW $SYS_EXITS, R0
+ SWI $0
+ RET
+
+//func brk_(addr unsafe.Pointer) int32
+TEXT runtime·brk_(SB),NOSPLIT,$0-8
+ MOVW $SYS_BRK_, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func sleep(ms int32) int32
+TEXT runtime·sleep(SB),NOSPLIT,$0-8
+ MOVW $SYS_SLEEP, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func plan9_semacquire(addr *uint32, block int32) int32
+TEXT runtime·plan9_semacquire(SB),NOSPLIT,$0-12
+ MOVW $SYS_SEMACQUIRE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func plan9_tsemacquire(addr *uint32, ms int32) int32
+TEXT runtime·plan9_tsemacquire(SB),NOSPLIT,$0-12
+ MOVW $SYS_TSEMACQUIRE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func nsec(*int64) int64
+TEXT runtime·nsec(SB),NOSPLIT|NOFRAME,$0-12
+ MOVW $SYS_NSEC, R0
+ SWI $0
+ MOVW arg+0(FP), R1
+ MOVW 0(R1), R0
+ MOVW R0, ret_lo+4(FP)
+ MOVW 4(R1), R0
+ MOVW R0, ret_hi+8(FP)
+ RET
+
+// func walltime() (sec int64, nsec int32)
+TEXT runtime·walltime(SB),NOSPLIT,$12-12
+ // use nsec system call to get current time in nanoseconds
+ MOVW $sysnsec_lo-8(SP), R0 // destination addr
+ MOVW R0,res-12(SP)
+ MOVW $SYS_NSEC, R0
+ SWI $0
+ MOVW sysnsec_lo-8(SP), R1 // R1:R2 = nsec
+ MOVW sysnsec_hi-4(SP), R2
+
+ // multiply nanoseconds by reciprocal of 10**9 (scaled by 2**61)
+ // to get seconds (96 bit scaled result)
+ MOVW $0x89705f41, R3 // 2**61 * 10**-9
+ MULLU R1,R3,(R6,R5) // R5:R6:R7 = R1:R2 * R3
+ MOVW $0,R7
+ MULALU R2,R3,(R7,R6)
+
+ // unscale by discarding low 32 bits, shifting the rest by 29
+ MOVW R6>>29,R6 // R6:R7 = (R5:R6:R7 >> 61)
+ ORR R7<<3,R6
+ MOVW R7>>29,R7
+
+ // subtract (10**9 * sec) from nsec to get nanosecond remainder
+ MOVW $1000000000, R5 // 10**9
+ MULLU R6,R5,(R9,R8) // R8:R9 = R6:R7 * R5
+ MULA R7,R5,R9,R9
+ SUB.S R8,R1 // R1:R2 -= R8:R9
+ SBC R9,R2
+
+ // because reciprocal was a truncated repeating fraction, quotient
+ // may be slightly too small -- adjust to make remainder < 10**9
+ CMP R5,R1 // if remainder > 10**9
+ SUB.HS R5,R1 // remainder -= 10**9
+ ADD.HS $1,R6 // sec += 1
+
+ MOVW R6,sec_lo+0(FP)
+ MOVW R7,sec_hi+4(FP)
+ MOVW R1,nsec+8(FP)
+ RET
+
+//func notify(fn unsafe.Pointer) int32
+TEXT runtime·notify(SB),NOSPLIT,$0-8
+ MOVW $SYS_NOTIFY, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func noted(mode int32) int32
+TEXT runtime·noted(SB),NOSPLIT,$0-8
+ MOVW $SYS_NOTED, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func plan9_semrelease(addr *uint32, count int32) int32
+TEXT runtime·plan9_semrelease(SB),NOSPLIT,$0-12
+ MOVW $SYS_SEMRELEASE, R0
+ SWI $0
+ MOVW R0, ret+8(FP)
+ RET
+
+//func rfork(flags int32) int32
+TEXT runtime·rfork(SB),NOSPLIT,$0-8
+ MOVW $SYS_RFORK, R0
+ SWI $0
+ MOVW R0, ret+4(FP)
+ RET
+
+//func tstart_plan9(newm *m)
+TEXT runtime·tstart_plan9(SB),NOSPLIT,$4-4
+ MOVW newm+0(FP), R1
+ MOVW m_g0(R1), g
+
+ // Layout new m scheduler stack on os stack.
+ MOVW R13, R0
+ MOVW R0, g_stack+stack_hi(g)
+ SUB $(64*1024), R0
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ // Initialize procid from TOS struct.
+ MOVW _tos(SB), R0
+ MOVW 48(R0), R0
+ MOVW R0, m_procid(R1) // save pid as m->procid
+
+ BL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVW $0, R0
+ MOVW R0, 4(R13)
+ CALL runtime·exits(SB)
+ JMP 0(PC)
+
+//func sigtramp(ureg, note unsafe.Pointer)
+TEXT runtime·sigtramp(SB),NOSPLIT,$0-8
+ // check that g and m exist
+ CMP $0, g
+ BEQ 4(PC)
+ MOVW g_m(g), R0
+ CMP $0, R0
+ BNE 2(PC)
+ BL runtime·badsignal2(SB) // will exit
+
+ // save args
+ MOVW ureg+0(FP), R1
+ MOVW note+4(FP), R2
+
+ // change stack
+ MOVW m_gsignal(R0), R3
+ MOVW (g_stack+stack_hi)(R3), R13
+
+ // make room for args, retval and g
+ SUB $24, R13
+
+ // save g
+ MOVW g, R3
+ MOVW R3, 20(R13)
+
+ // g = m->gsignal
+ MOVW m_gsignal(R0), g
+
+ // load args and call sighandler
+ ADD $4,R13,R5
+ MOVM.IA [R1-R3], (R5)
+ BL runtime·sighandler(SB)
+ MOVW 16(R13), R0 // retval
+
+ // restore g
+ MOVW 20(R13), g
+
+ // call noted(R0)
+ MOVW R0, 4(R13)
+ BL runtime·noted(SB)
+ RET
+
+//func sigpanictramp()
+TEXT runtime·sigpanictramp(SB),NOSPLIT,$0-0
+ MOVW.W R0, -4(R13)
+ B runtime·sigpanic(SB)
+
+//func setfpmasks()
+// Only used by the 64-bit runtime.
+TEXT runtime·setfpmasks(SB),NOSPLIT,$0
+ RET
+
+#define ERRMAX 128 /* from os_plan9.h */
+
+// func errstr() string
+// Only used by package syscall.
+// Grab error string due to a syscall made
+// in entersyscall mode, without going
+// through the allocator (issue 4994).
+// See ../syscall/asm_plan9_arm.s:/·Syscall/
+TEXT runtime·errstr(SB),NOSPLIT,$0-8
+ MOVW g_m(g), R0
+ MOVW (m_mOS+mOS_errstr)(R0), R1
+ MOVW R1, ret_base+0(FP)
+ MOVW $ERRMAX, R2
+ MOVW R2, ret_len+4(FP)
+ MOVW $SYS_ERRSTR, R0
+ SWI $0
+ MOVW R1, R2
+ MOVBU 0(R2), R0
+ CMP $0, R0
+ BEQ 3(PC)
+ ADD $1, R2
+ B -4(PC)
+ SUB R1, R2
+ MOVW R2, ret_len+4(FP)
+ RET
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// never called (cgo not supported)
+TEXT runtime·read_tls_fallback(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0, R0
+ MOVW R0, (R0)
+ RET
diff --git a/src/runtime/sys_ppc64x.go b/src/runtime/sys_ppc64x.go
new file mode 100644
index 0000000..56c5c95
--- /dev/null
+++ b/src/runtime/sys_ppc64x.go
@@ -0,0 +1,22 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
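+// On link-register architectures the "call" is simulated by moving the
+// old buf.pc into buf.lr, so fn appears to have been called from buf.pc
+// (typically goexit) and returns there when it finishes.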
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
+
+func prepGoExitFrame(sp uintptr)
diff --git a/src/runtime/sys_riscv64.go b/src/runtime/sys_riscv64.go
new file mode 100644
index 0000000..e710840
--- /dev/null
+++ b/src/runtime/sys_riscv64.go
@@ -0,0 +1,18 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_s390x.go b/src/runtime/sys_s390x.go
new file mode 100644
index 0000000..e710840
--- /dev/null
+++ b/src/runtime/sys_s390x.go
@@ -0,0 +1,18 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then did an immediate Gosave.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
+ if buf.lr != 0 {
+ throw("invalid use of gostartcall")
+ }
+ buf.lr = buf.pc
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_solaris_amd64.s b/src/runtime/sys_solaris_amd64.s
new file mode 100644
index 0000000..7a80020
--- /dev/null
+++ b/src/runtime/sys_solaris_amd64.s
@@ -0,0 +1,304 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// System calls and other sys.stuff for AMD64, SunOS
+// /usr/include/sys/syscall.h for syscall numbers.
+//
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+// This is needed by asm_amd64.s
+TEXT runtime·settls(SB),NOSPLIT,$8
+ RET
+
+// void libc_miniterrno(void *(*___errno)(void));
+//
+// Set the TLS errno pointer in M.
+//
+// Called using runtime·asmcgocall from os_solaris.c:/minit.
+// NOT USING GO CALLING CONVENTION.
+TEXT runtime·miniterrno(SB),NOSPLIT,$0
+ // asmcgocall will put first argument into DI.
+ CALL DI // SysV ABI so returns in AX
+ get_tls(CX)
+ MOVQ g(CX), BX
+ MOVQ g_m(BX), BX
+ MOVQ AX, (m_mOS+mOS_perrno)(BX)
+ RET
+
+// Call a library function with SysV calling conventions.
+// The called function can take a maximum of 6 INTEGER class arguments,
+// see
+// Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell
+// System V Application Binary Interface
+// AMD64 Architecture Processor Supplement
+// section 3.2.3.
+//
+// Called by runtime·asmcgocall or runtime·cgocall.
+// NOT USING GO CALLING CONVENTION.
+TEXT runtime·asmsysvicall6(SB),NOSPLIT,$0
+ // asmcgocall will put first argument into DI.
+ PUSHQ DI // save for later
+ MOVQ libcall_fn(DI), AX
+ MOVQ libcall_args(DI), R11
+ MOVQ libcall_n(DI), R10
+
+ get_tls(CX)
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ skiperrno1
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_perrno)(BX), DX
+ CMPQ DX, $0
+ JEQ skiperrno1
+ MOVL $0, 0(DX)
+
+skiperrno1:
+ CMPQ R11, $0
+ JEQ skipargs
+	// Load the 6 args into their corresponding registers.
+ MOVQ 0(R11), DI
+ MOVQ 8(R11), SI
+ MOVQ 16(R11), DX
+ MOVQ 24(R11), CX
+ MOVQ 32(R11), R8
+ MOVQ 40(R11), R9
+skipargs:
+
+ // Call SysV function
+ CALL AX
+
+ // Return result
+ POPQ DI
+ MOVQ AX, libcall_r1(DI)
+ MOVQ DX, libcall_r2(DI)
+
+ get_tls(CX)
+ MOVQ g(CX), BX
+ CMPQ BX, $0
+ JEQ skiperrno2
+ MOVQ g_m(BX), BX
+ MOVQ (m_mOS+mOS_perrno)(BX), AX
+ CMPQ AX, $0
+ JEQ skiperrno2
+ MOVL 0(AX), AX
+ MOVQ AX, libcall_err(DI)
+
+skiperrno2:
+ RET
+
+// uint32 tstart_sysvicall(M *newm);
+TEXT runtime·tstart_sysvicall(SB),NOSPLIT,$0
+ // DI contains first arg newm
+ MOVQ m_g0(DI), DX // g
+
+ // Make TLS entries point at g and m.
+ get_tls(BX)
+ MOVQ DX, g(BX)
+ MOVQ DI, g_m(DX)
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(0x100000), AX // stack size
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ ADDQ $const_stackGuard, AX
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Someday the convention will be D is always cleared.
+ CLD
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ XORL AX, AX // return 0 == success
+ MOVL AX, ret+8(FP)
+ RET
+
+// Careful, this is called by __sighndlr, a libc function. We must preserve
+// registers as per the AMD64 ABI.
+TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
+ // Note that we are executing on altsigstack here, so we have
+ // more stack available than NOSPLIT would have us believe.
+ // To defeat the linker, we make our own stack frame with
+ // more space:
+ SUBQ $168, SP
+ // save registers
+ MOVQ BX, 24(SP)
+ MOVQ BP, 32(SP)
+ MOVQ R12, 40(SP)
+ MOVQ R13, 48(SP)
+ MOVQ R14, 56(SP)
+ MOVQ R15, 64(SP)
+
+ get_tls(BX)
+ // check that g exists
+ MOVQ g(BX), R10
+ CMPQ R10, $0
+ JNE allgood
+ MOVQ SI, 72(SP)
+ MOVQ DX, 80(SP)
+ LEAQ 72(SP), AX
+ MOVQ DI, 0(SP)
+ MOVQ AX, 8(SP)
+ MOVQ $runtime·badsignal(SB), AX
+ CALL AX
+ JMP exit
+
+allgood:
+ // Save m->libcall and m->scratch. We need to do this because we
+ // might get interrupted by a signal in runtime·asmcgocall.
+
+ // save m->libcall
+ MOVQ g_m(R10), BP
+ LEAQ m_libcall(BP), R11
+ MOVQ libcall_fn(R11), R10
+ MOVQ R10, 72(SP)
+ MOVQ libcall_args(R11), R10
+ MOVQ R10, 80(SP)
+ MOVQ libcall_n(R11), R10
+ MOVQ R10, 88(SP)
+ MOVQ libcall_r1(R11), R10
+ MOVQ R10, 152(SP)
+ MOVQ libcall_r2(R11), R10
+ MOVQ R10, 160(SP)
+
+ // save m->scratch
+ LEAQ (m_mOS+mOS_scratch)(BP), R11
+ MOVQ 0(R11), R10
+ MOVQ R10, 96(SP)
+ MOVQ 8(R11), R10
+ MOVQ R10, 104(SP)
+ MOVQ 16(R11), R10
+ MOVQ R10, 112(SP)
+ MOVQ 24(R11), R10
+ MOVQ R10, 120(SP)
+ MOVQ 32(R11), R10
+ MOVQ R10, 128(SP)
+ MOVQ 40(R11), R10
+ MOVQ R10, 136(SP)
+
+ // save errno, it might be EINTR; stuff we do here might reset it.
+ MOVQ (m_mOS+mOS_perrno)(BP), R10
+ MOVL 0(R10), R10
+ MOVQ R10, 144(SP)
+
+ // prepare call
+ MOVQ DI, 0(SP)
+ MOVQ SI, 8(SP)
+ MOVQ DX, 16(SP)
+ CALL runtime·sigtrampgo(SB)
+
+ get_tls(BX)
+ MOVQ g(BX), BP
+ MOVQ g_m(BP), BP
+ // restore libcall
+ LEAQ m_libcall(BP), R11
+ MOVQ 72(SP), R10
+ MOVQ R10, libcall_fn(R11)
+ MOVQ 80(SP), R10
+ MOVQ R10, libcall_args(R11)
+ MOVQ 88(SP), R10
+ MOVQ R10, libcall_n(R11)
+ MOVQ 152(SP), R10
+ MOVQ R10, libcall_r1(R11)
+ MOVQ 160(SP), R10
+ MOVQ R10, libcall_r2(R11)
+
+ // restore scratch
+ LEAQ (m_mOS+mOS_scratch)(BP), R11
+ MOVQ 96(SP), R10
+ MOVQ R10, 0(R11)
+ MOVQ 104(SP), R10
+ MOVQ R10, 8(R11)
+ MOVQ 112(SP), R10
+ MOVQ R10, 16(R11)
+ MOVQ 120(SP), R10
+ MOVQ R10, 24(R11)
+ MOVQ 128(SP), R10
+ MOVQ R10, 32(R11)
+ MOVQ 136(SP), R10
+ MOVQ R10, 40(R11)
+
+ // restore errno
+ MOVQ (m_mOS+mOS_perrno)(BP), R11
+ MOVQ 144(SP), R10
+ MOVL R10, 0(R11)
+
+exit:
+ // restore registers
+ MOVQ 24(SP), BX
+ MOVQ 32(SP), BP
+ MOVQ 40(SP), R12
+ MOVQ 48(SP), R13
+ MOVQ 56(SP), R14
+ MOVQ 64(SP), R15
+ ADDQ $168, SP
+ RET
+
+TEXT runtime·sigfwd(SB),NOSPLIT,$0-32
+ MOVQ fn+0(FP), AX
+ MOVL sig+8(FP), DI
+ MOVQ info+16(FP), SI
+ MOVQ ctx+24(FP), DX
+ MOVQ SP, BX // callee-saved
+ ANDQ $~15, SP // alignment for x86_64 ABI
+ CALL AX
+ MOVQ BX, SP
+ RET
+
+// Called from runtime·usleep (Go). Can be called on the Go stack or the OS stack;
+// it can also be called in the cgo callback path without a g->m.
+TEXT runtime·usleep1(SB),NOSPLIT,$0
+ MOVL usec+0(FP), DI
+ MOVQ $usleep2<>(SB), AX // to hide from 6l
+
+ // Execute call on m->g0.
+ get_tls(R15)
+ CMPQ R15, $0
+ JE noswitch
+
+ MOVQ g(R15), R13
+ CMPQ R13, $0
+ JE noswitch
+ MOVQ g_m(R13), R13
+ CMPQ R13, $0
+ JE noswitch
+ // TODO(aram): do something about the cpu profiler here.
+
+ MOVQ m_g0(R13), R14
+ CMPQ g(R15), R14
+ JNE switch
+ // executing on m->g0 already
+ CALL AX
+ RET
+
+switch:
+ // Switch to m->g0 stack and back.
+ MOVQ (g_sched+gobuf_sp)(R14), R14
+ MOVQ SP, -8(R14)
+ LEAQ -8(R14), SP
+ CALL AX
+ MOVQ 0(SP), SP
+ RET
+
+noswitch:
+ // Not a Go-managed thread. Do not switch stack.
+ CALL AX
+ RET
+
+// Runs on OS stack. duration (in µs units) is in DI.
+TEXT usleep2<>(SB),NOSPLIT,$0
+ LEAQ libc_usleep(SB), AX
+ CALL AX
+ RET
+
+// Runs on OS stack, called from runtime·osyield.
+TEXT runtime·osyield1(SB),NOSPLIT,$0
+ LEAQ libc_sched_yield(SB), AX
+ CALL AX
+ RET
diff --git a/src/runtime/sys_wasm.go b/src/runtime/sys_wasm.go
new file mode 100644
index 0000000..27f9432
--- /dev/null
+++ b/src/runtime/sys_wasm.go
@@ -0,0 +1,36 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+type m0Stack struct {
+ _ [8192 * sys.StackGuardMultiplier]byte
+}
+
+var wasmStack m0Stack
+
+func wasmDiv()
+
+func wasmTruncS()
+func wasmTruncU()
+
+//go:wasmimport gojs runtime.wasmExit
+func wasmExit(code int32)
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then stopped before the first instruction in fn.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
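+	// Push the old PC onto the goroutine stack as a fake return address,
+	// then point buf.pc at fn so fn appears to have been called.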
+ sp := buf.sp
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = buf.pc
+ buf.sp = sp
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/sys_wasm.s b/src/runtime/sys_wasm.s
new file mode 100644
index 0000000..1e73ada
--- /dev/null
+++ b/src/runtime/sys_wasm.s
@@ -0,0 +1,94 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT runtime·wasmDiv(SB), NOSPLIT, $0-0
+ Get R0
+ I64Const $-0x8000000000000000
+ I64Eq
+ If
+ Get R1
+ I64Const $-1
+ I64Eq
+ If
+ I64Const $-0x8000000000000000
+ Return
+ End
+ End
+ Get R0
+ Get R1
+ I64DivS
+ Return
+
+TEXT runtime·wasmTruncS(SB), NOSPLIT, $0-0
+ Get R0
+ Get R0
+ F64Ne // NaN
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0x7ffffffffffffc00p0 // Maximum truncated representation of 0x7fffffffffffffff
+ F64Gt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $-0x7ffffffffffffc00p0 // Minimum truncated representation of -0x8000000000000000
+ F64Lt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ I64TruncF64S
+ Return
+
+TEXT runtime·wasmTruncU(SB), NOSPLIT, $0-0
+ Get R0
+ Get R0
+ F64Ne // NaN
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0xfffffffffffff800p0 // Maximum truncated representation of 0xffffffffffffffff
+ F64Gt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ F64Const $0.
+ F64Lt
+ If
+ I64Const $0x8000000000000000
+ Return
+ End
+
+ Get R0
+ I64TruncF64U
+ Return
+
+TEXT runtime·exitThread(SB), NOSPLIT, $0-0
+ UNDEF
+
+TEXT runtime·osyield(SB), NOSPLIT, $0-0
+ UNDEF
+
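+// growMemory grows the wasm linear memory by the given number of pages
+// and returns the previous size in pages, or -1 on failure (memory.grow semantics).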
+TEXT runtime·growMemory(SB), NOSPLIT, $0
+ Get SP
+ I32Load pages+0(FP)
+ GrowMemory
+ I32Store ret+8(FP)
+ RET
diff --git a/src/runtime/sys_windows_386.s b/src/runtime/sys_windows_386.s
new file mode 100644
index 0000000..41a6ee6
--- /dev/null
+++ b/src/runtime/sys_windows_386.s
@@ -0,0 +1,303 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+// Offsets into Thread Environment Block (pointer in FS)
+#define TEB_TlsSlots 0xE10
+#define TEB_ArbitraryPtr 0x14
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT,$0
+ MOVL fn+0(FP), BX
+
+ // SetLastError(0).
+ MOVL $0, 0x34(FS)
+
+ // Copy args to the stack.
+ MOVL SP, BP
+ MOVL libcall_n(BX), CX // words
+ MOVL CX, AX
+ SALL $2, AX
+ SUBL AX, SP // room for args
+ MOVL SP, DI
+ MOVL libcall_args(BX), SI
+ CLD
+ REP; MOVSL
+
+ // Call stdcall or cdecl function.
+ // DI SI BP BX are preserved, SP is not
+ CALL libcall_fn(BX)
+ MOVL BP, SP
+
+ // Return result.
+ MOVL fn+0(FP), BX
+ MOVL AX, libcall_r1(BX)
+ MOVL DX, libcall_r2(BX)
+
+ // GetLastError().
+ MOVL 0x34(FS), AX
+ MOVL AX, libcall_err(BX)
+
+ RET
+
+// faster get/set last error
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MOVL 0x34(FS), AX
+ MOVL AX, ret+0(FP)
+ RET
+
+TEXT runtime·sigFetchGSafe<ABIInternal>(SB),NOSPLIT,$0
+ get_tls(AX)
+ CMPL AX, $0
+ JE 2(PC)
+ MOVL g(AX), AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// AX is pointer to struct containing
+// exception record and context pointers.
+// CX is the kind of sigtramp function.
+// Return value of sigtrampgo is stored in AX.
+TEXT sigtramp<>(SB),NOSPLIT,$0-0
+ SUBL $40, SP
+
+ // save callee-saved registers
+ MOVL BX, 28(SP)
+ MOVL BP, 16(SP)
+ MOVL SI, 20(SP)
+ MOVL DI, 24(SP)
+
+ MOVL AX, 0(SP)
+ MOVL CX, 4(SP)
+ CALL runtime·sigtrampgo(SB)
+ MOVL 8(SP), AX
+
+ // restore callee-saved registers
+ MOVL 24(SP), DI
+ MOVL 20(SP), SI
+ MOVL 16(SP), BP
+ MOVL 28(SP), BX
+
+ ADDL $40, SP
+ // RET 4 (return and pop 4 bytes parameters)
+ BYTE $0xC2; WORD $4
+ RET // unreached; make assembler happy
+
+// Trampoline to resume execution from exception handler.
+// This is part of the control flow guard workaround.
+// It switches stacks and jumps to the continuation address.
+// DX and CX are set above at the end of sigtrampgo
+// in the context that starts executing at sigresume.
+TEXT runtime·sigresume(SB),NOSPLIT,$0
+ MOVL DX, SP
+ JMP CX
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT,$0
+ MOVL argframe+0(FP), AX
+ MOVL $const_callbackVEH, CX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT,$0-0
+ // is never called
+ INT $3
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT,$0-0
+ MOVL argframe+0(FP), AX
+ MOVL $const_callbackLastVCH, CX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·callbackasm1(SB),NOSPLIT,$0
+ MOVL 0(SP), AX // will use to find our callback context
+
+ // remove return address from stack, we are not returning to callbackasm, but to its caller.
+ ADDL $4, SP
+
+ // address to callback parameters into CX
+ LEAL 4(SP), CX
+
+ // save registers as required for windows callback
+ PUSHL DI
+ PUSHL SI
+ PUSHL BP
+ PUSHL BX
+
+ // Go ABI requires DF flag to be cleared.
+ CLD
+
+ // determine index into runtime·cbs table
+ SUBL $runtime·callbackasm(SB), AX
+ MOVL $0, DX
+ MOVL $5, BX // divide by 5 because each call instruction in runtime·callbacks is 5 bytes long
+ DIVL BX
+ SUBL $1, AX // subtract 1 because return PC is to the next slot
+
+ // Create a struct callbackArgs on our stack.
+ SUBL $(12+callbackArgs__size), SP
+ MOVL AX, (12+callbackArgs_index)(SP) // callback index
+ MOVL CX, (12+callbackArgs_args)(SP) // address of args vector
+ MOVL $0, (12+callbackArgs_result)(SP) // result
+ LEAL 12(SP), AX // AX = &callbackArgs{...}
+
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVL $0, 8(SP) // context
+ MOVL AX, 4(SP) // frame (address of callbackArgs)
+ LEAL ·callbackWrap(SB), AX
+ MOVL AX, 0(SP) // PC of function to call
+ CALL runtime·cgocallback(SB)
+
+ // Get callback result.
+ MOVL (12+callbackArgs_result)(SP), AX
+ // Get popRet.
+ MOVL (12+callbackArgs_retPop)(SP), CX // Can't use a callee-save register
+ ADDL $(12+callbackArgs__size), SP
+
+ // restore registers as required for windows callback
+ POPL BX
+ POPL BP
+ POPL SI
+ POPL DI
+
+ // remove callback parameters before return (as per Windows spec)
+ POPL DX
+ ADDL CX, SP
+ PUSHL DX
+
+ CLD
+
+ RET
+
+// void tstart(M *newm);
+TEXT tstart<>(SB),NOSPLIT,$8-4
+ MOVL newm+0(FP), CX // m
+ MOVL m_g0(CX), DX // g
+
+ // Layout new m scheduler stack on os stack.
+ MOVL SP, AX
+ MOVL AX, (g_stack+stack_hi)(DX)
+ SUBL $(64*1024), AX // initial stack size (adjusted later)
+ MOVL AX, (g_stack+stack_lo)(DX)
+ ADDL $const_stackGuard, AX
+ MOVL AX, g_stackguard0(DX)
+ MOVL AX, g_stackguard1(DX)
+
+ // Set up tls.
+ LEAL m_tls(CX), DI
+ MOVL CX, g_m(DX)
+ MOVL DX, g(DI)
+ MOVL DI, 4(SP)
+ CALL runtime·setldt(SB) // clobbers CX and DX
+
+ // Someday the convention will be D is always cleared.
+ CLD
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ RET
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT,$0
+ MOVL newm+0(FP), BX
+
+ PUSHL BX
+ CALL tstart<>(SB)
+ POPL BX
+
+ // Adjust stack for stdcall to return properly.
+ MOVL (SP), AX // save return address
+ ADDL $4, SP // remove single parameter
+ MOVL AX, (SP) // restore return address
+
+ XORL AX, AX // return 0 == success
+
+ RET
+
+// setldt(int slot, int base, int size)
+TEXT runtime·setldt(SB),NOSPLIT,$0-12
+ MOVL base+4(FP), DX
+ MOVL runtime·tls_g(SB), CX
+ MOVL DX, 0(CX)(FS)
+ RET
+
+// Runs on OS stack.
+// duration (in -100ns units) is in dt+0(FP).
+// g may be nil.
+TEXT runtime·usleep2(SB),NOSPLIT,$20-4
+ MOVL dt+0(FP), BX
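+	// Build the 64-bit relative timeout: dt is negative (units of -100ns),
+	// so sign-extend it with -1 in the high word.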
+ MOVL $-1, hi-4(SP)
+ MOVL BX, lo-8(SP)
+ LEAL lo-8(SP), BX
+ MOVL BX, ptime-12(SP)
+ MOVL $0, alertable-16(SP)
+ MOVL $-1, handle-20(SP)
+ MOVL SP, BP
+ MOVL runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+ MOVL BP, SP
+ RET
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT,$0
+ MOVL SP, BP
+ MOVL runtime·_SwitchToThread(SB), AX
+ CALL AX
+ MOVL BP, SP
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
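+	// Read the 64-bit interrupt time as hi1:lo:hi2 and retry until both
+	// high-word reads agree, giving a consistent value on 32-bit.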
+loop:
+ MOVL (_INTERRUPT_TIME+time_hi1), AX
+ MOVL (_INTERRUPT_TIME+time_lo), CX
+ MOVL (_INTERRUPT_TIME+time_hi2), DI
+ CMPL AX, DI
+ JNE loop
+
+ // wintime = DI:CX, multiply by 100
+ MOVL $100, AX
+ MULL CX
+ IMULL $100, DI
+ ADDL DI, DX
+ // wintime*100 = DX:AX
+ MOVL AX, ret_lo+0(FP)
+ MOVL DX, ret_hi+4(FP)
+ RET
+useQPC:
+ JMP runtime·nanotimeQPC(SB)
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+TEXT runtime·wintls(SB),NOSPLIT,$0
+ // Allocate a TLS slot to hold g across calls to external code
+ MOVL SP, BP
+ MOVL runtime·_TlsAlloc(SB), AX
+ CALL AX
+ MOVL BP, SP
+
+ MOVL AX, CX // TLS index
+
+ // Assert that slot is less than 64 so we can use _TEB->TlsSlots
+ CMPL CX, $64
+ JB ok
+ // Fallback to the TEB arbitrary pointer.
+ // TODO: don't use the arbitrary pointer (see go.dev/issue/59824)
+ MOVL $TEB_ArbitraryPtr, CX
+ JMP settls
+ok:
+ // Convert the TLS index at CX into
+ // an offset from TEB_TlsSlots.
+ SHLL $2, CX
+
+ // Save offset from TLS into tls_g.
+ ADDL $TEB_TlsSlots, CX
+settls:
+ MOVL CX, runtime·tls_g(SB)
+ RET
diff --git a/src/runtime/sys_windows_amd64.s b/src/runtime/sys_windows_amd64.s
new file mode 100644
index 0000000..e66f444
--- /dev/null
+++ b/src/runtime/sys_windows_amd64.s
@@ -0,0 +1,319 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "time_windows.h"
+#include "cgo/abi_amd64.h"
+
+// Offsets into Thread Environment Block (pointer in GS)
+#define TEB_TlsSlots 0x1480
+#define TEB_ArbitraryPtr 0x28
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT,$16
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ MOVQ AX, 8(SP)
+ MOVQ CX, 0(SP) // asmcgocall will put first argument into CX.
+
+ MOVQ libcall_fn(CX), AX
+ MOVQ libcall_args(CX), SI
+ MOVQ libcall_n(CX), CX
+
+ // SetLastError(0).
+ MOVQ 0x30(GS), DI
+ MOVL $0, 0x68(DI)
+
+ SUBQ $(const_maxArgs*8), SP // room for args
+
+ // Fast version, do not store args on the stack.
+ CMPL CX, $4
+ JLE loadregs
+
+ // Check we have enough room for args.
+ CMPL CX, $const_maxArgs
+ JLE 2(PC)
+ INT $3 // not enough room -> crash
+
+ // Copy args to the stack.
+ MOVQ SP, DI
+ CLD
+ REP; MOVSQ
+ MOVQ SP, SI
+
+loadregs:
+	// Load the first 4 args into their corresponding registers.
+ MOVQ 0(SI), CX
+ MOVQ 8(SI), DX
+ MOVQ 16(SI), R8
+ MOVQ 24(SI), R9
+ // Floating point arguments are passed in the XMM
+ // registers. Set them here in case any of the arguments
+ // are floating point values. For details see
+ // https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx
+ MOVQ CX, X0
+ MOVQ DX, X1
+ MOVQ R8, X2
+ MOVQ R9, X3
+
+ // Call stdcall function.
+ CALL AX
+
+ ADDQ $(const_maxArgs*8), SP
+
+ // Return result.
+ MOVQ 0(SP), CX
+ MOVQ 8(SP), SP
+ MOVQ AX, libcall_r1(CX)
+ // Floating point return values are returned in XMM0. Setting r2 to this
+ // value in case this call returned a floating point value. For details,
+ // see https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention
+ MOVQ X0, libcall_r2(CX)
+
+ // GetLastError().
+ MOVQ 0x30(GS), DI
+ MOVL 0x68(DI), AX
+ MOVQ AX, libcall_err(CX)
+
+ RET
+
+// faster get/set last error
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MOVQ 0x30(GS), AX
+ MOVL 0x68(AX), AX
+ MOVL AX, ret+0(FP)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// CX is pointer to struct containing
+// exception record and context pointers.
+// DX is the kind of sigtramp function.
+// Return value of sigtrampgo is stored in AX.
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME,$0-0
+ // Switch from the host ABI to the Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Set up ABIInternal environment: cleared X15 and R14.
+ // R14 is cleared in case there's a non-zero value in there
+ // if called from a non-go thread.
+ XORPS X15, X15
+ XORQ R14, R14
+
+ get_tls(AX)
+ CMPQ AX, $0
+ JE 2(PC)
+ // Exception from Go thread, set R14.
+ MOVQ g(AX), R14
+
+ // Reserve space for spill slots.
+ ADJSP $16
+ MOVQ CX, AX
+ MOVQ DX, BX
+ // Calling ABIInternal because TLS might be nil.
+ CALL runtime·sigtrampgo<ABIInternal>(SB)
+ // Return value is already stored in AX.
+
+ ADJSP $-16
+
+ POP_REGS_HOST_TO_ABI0()
+ RET
+
+// Trampoline to resume execution from exception handler.
+// This is part of the control flow guard workaround.
+// It switches stacks and jumps to the continuation address.
+// R8 and R9 are set above at the end of sigtrampgo
+// in the context that starts executing at sigresume.
+TEXT runtime·sigresume(SB),NOSPLIT|NOFRAME,$0
+ MOVQ R8, SP
+ JMP R9
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT|NOFRAME,$0
+ // PExceptionPointers already on CX
+ MOVQ $const_callbackVEH, DX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT|NOFRAME,$0-0
+ // PExceptionPointers already on CX
+ MOVQ $const_callbackFirstVCH, DX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT|NOFRAME,$0-0
+ // PExceptionPointers already on CX
+ MOVQ $const_callbackLastVCH, DX
+ JMP sigtramp<>(SB)
+
+TEXT runtime·callbackasm1(SB),NOSPLIT|NOFRAME,$0
+ // Construct args vector for cgocallback().
+ // By windows/amd64 calling convention first 4 args are in CX, DX, R8, R9
+ // args from the 5th on are on the stack.
+	// In any case, even if the function has 0, 1, 2, 3, or 4 args, there is reserved
+ // but uninitialized "shadow space" for the first 4 args.
+ // The values are in registers.
+ MOVQ CX, (16+0)(SP)
+ MOVQ DX, (16+8)(SP)
+ MOVQ R8, (16+16)(SP)
+ MOVQ R9, (16+24)(SP)
+ // R8 = address of args vector
+ LEAQ (16+0)(SP), R8
+
+ // remove return address from stack, we are not returning to callbackasm, but to its caller.
+ MOVQ 0(SP), AX
+ ADDQ $8, SP
+
+ // determine index into runtime·cbs table
+ MOVQ $runtime·callbackasm(SB), DX
+ SUBQ DX, AX
+ MOVQ $0, DX
+ MOVQ $5, CX // divide by 5 because each call instruction in runtime·callbacks is 5 bytes long
+ DIVL CX
+ SUBQ $1, AX // subtract 1 because return PC is to the next slot
+
+ // Switch from the host ABI to the Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // Create a struct callbackArgs on our stack to be passed as
+ // the "frame" to cgocallback and on to callbackWrap.
+ SUBQ $(24+callbackArgs__size), SP
+ MOVQ AX, (24+callbackArgs_index)(SP) // callback index
+ MOVQ R8, (24+callbackArgs_args)(SP) // address of args vector
+ MOVQ $0, (24+callbackArgs_result)(SP) // result
+ LEAQ 24(SP), AX
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVQ $0, 16(SP) // context
+ MOVQ AX, 8(SP) // frame (address of callbackArgs)
+ LEAQ ·callbackWrap<ABIInternal>(SB), BX // cgocallback takes an ABIInternal entry-point
+ MOVQ BX, 0(SP) // PC of function value to call (callbackWrap)
+ CALL ·cgocallback(SB)
+ // Get callback result.
+ MOVQ (24+callbackArgs_result)(SP), AX
+ ADDQ $(24+callbackArgs__size), SP
+
+ POP_REGS_HOST_TO_ABI0()
+
+ // The return value was placed in AX above.
+ RET
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT|NOFRAME,$0
+ // Switch from the host ABI to the Go ABI.
+ PUSH_REGS_HOST_TO_ABI0()
+
+ // CX contains first arg newm
+ MOVQ m_g0(CX), DX // g
+
+ // Layout new m scheduler stack on os stack.
+ MOVQ SP, AX
+ MOVQ AX, (g_stack+stack_hi)(DX)
+ SUBQ $(64*1024), AX // initial stack size (adjusted later)
+ MOVQ AX, (g_stack+stack_lo)(DX)
+ ADDQ $const_stackGuard, AX
+ MOVQ AX, g_stackguard0(DX)
+ MOVQ AX, g_stackguard1(DX)
+
+ // Set up tls.
+ LEAQ m_tls(CX), DI
+ MOVQ CX, g_m(DX)
+ MOVQ DX, g(DI)
+ CALL runtime·settls(SB) // clobbers CX
+
+ CALL runtime·stackcheck(SB) // clobbers AX,CX
+ CALL runtime·mstart(SB)
+
+ POP_REGS_HOST_TO_ABI0()
+
+ XORL AX, AX // return 0 == success
+ RET
+
+// set tls base to DI
+TEXT runtime·settls(SB),NOSPLIT,$0
+ MOVQ runtime·tls_g(SB), CX
+ MOVQ DI, 0(CX)(GS)
+ RET
+
+// Runs on OS stack.
+// duration (in -100ns units) is in dt+0(FP).
+// g may be nil.
+// The function leaves room for 4 syscall parameters
+// (as per windows amd64 calling convention).
+TEXT runtime·usleep2(SB),NOSPLIT,$48-4
+ MOVLQSX dt+0(FP), BX
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ MOVQ AX, 40(SP)
+ LEAQ 32(SP), R8 // ptime
+ MOVQ BX, (R8)
+ MOVQ $-1, CX // handle
+ MOVQ $0, DX // alertable
+ MOVQ runtime·_NtWaitForSingleObject(SB), AX
+ CALL AX
+ MOVQ 40(SP), SP
+ RET
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT,$0
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ SUBQ $(48), SP // room for SP and 4 args as per Windows requirement
+ // plus one extra word to keep stack 16 bytes aligned
+ MOVQ AX, 32(SP)
+ MOVQ runtime·_SwitchToThread(SB), AX
+ CALL AX
+ MOVQ 32(SP), SP
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
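+	// The 64-bit interrupt time can be read with a single MOVQ;
+	// scale from 100ns units to nanoseconds.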
+ MOVQ $_INTERRUPT_TIME, DI
+ MOVQ time_lo(DI), AX
+ IMULQ $100, AX
+ MOVQ AX, ret+0(FP)
+ RET
+useQPC:
+ JMP runtime·nanotimeQPC(SB)
+ RET
+
+// func osSetupTLS(mp *m)
+// Set up TLS for use by needm on Windows.
+TEXT runtime·osSetupTLS(SB),NOSPLIT,$0-8
+ MOVQ mp+0(FP), AX
+ LEAQ m_tls(AX), DI
+ CALL runtime·settls(SB)
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+TEXT runtime·wintls(SB),NOSPLIT,$0
+ // Allocate a TLS slot to hold g across calls to external code
+ MOVQ SP, AX
+ ANDQ $~15, SP // alignment as per Windows requirement
+ SUBQ $48, SP // room for SP and 4 args as per Windows requirement
+ // plus one extra word to keep stack 16 bytes aligned
+ MOVQ AX, 32(SP)
+ MOVQ runtime·_TlsAlloc(SB), AX
+ CALL AX
+ MOVQ 32(SP), SP
+
+ MOVQ AX, CX // TLS index
+
+ // Assert that slot is less than 64 so we can use _TEB->TlsSlots
+ CMPQ CX, $64
+ JB ok
+
+ // Fallback to the TEB arbitrary pointer.
+ // TODO: don't use the arbitrary pointer (see go.dev/issue/59824)
+ MOVQ $TEB_ArbitraryPtr, CX
+ JMP settls
+ok:
+ // Convert the TLS index at CX into
+ // an offset from TEB_TlsSlots.
+ SHLQ $3, CX
+
+ // Save offset from TLS into tls_g.
+ ADDQ $TEB_TlsSlots, CX
+settls:
+ MOVQ CX, runtime·tls_g(SB)
+ RET
diff --git a/src/runtime/sys_windows_arm.s b/src/runtime/sys_windows_arm.s
new file mode 100644
index 0000000..67009df
--- /dev/null
+++ b/src/runtime/sys_windows_arm.s
@@ -0,0 +1,320 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+// Note: For system ABI, R0-R3 are args, R4-R11 are callee-save.
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R5, R14], (R13) // push {r4, r5, lr}
+ MOVW R0, R4 // put libcall * in r4
+ MOVW R13, R5 // save stack pointer in r5
+
+ // SetLastError(0)
+ MOVW $0, R0
+ MRC 15, 0, R1, C13, C0, 2
+ MOVW R0, 0x34(R1)
+
+ MOVW 8(R4), R12 // libcall->args
+
+ // Do we have more than 4 arguments?
+ MOVW 4(R4), R0 // libcall->n
+ SUB.S $4, R0, R2
+ BLE loadregs
+
+ // Reserve stack space for remaining args
+ SUB R2<<2, R13
+ BIC $0x7, R13 // alignment for ABI
+
+ // R0: count of arguments
+ // R1:
+ // R2: loop counter, from 0 to (n-4)
+ // R3: scratch
+ // R4: pointer to libcall struct
+ // R12: libcall->args
+ MOVW $0, R2
+stackargs:
+ ADD $4, R2, R3 // r3 = args[4 + i]
+ MOVW R3<<2(R12), R3
+ MOVW R3, R2<<2(R13) // stack[i] = r3
+
+ ADD $1, R2 // i++
+ SUB $4, R0, R3 // while (i < (n - 4))
+ CMP R3, R2
+ BLT stackargs
+
+loadregs:
+ CMP $3, R0
+ MOVW.GT 12(R12), R3
+
+ CMP $2, R0
+ MOVW.GT 8(R12), R2
+
+ CMP $1, R0
+ MOVW.GT 4(R12), R1
+
+ CMP $0, R0
+ MOVW.GT 0(R12), R0
+
+ BIC $0x7, R13 // alignment for ABI
+ MOVW 0(R4), R12 // branch to libcall->fn
+ BL (R12)
+
+ MOVW R5, R13 // free stack space
+ MOVW R0, 12(R4) // save return value to libcall->r1
+ MOVW R1, 16(R4)
+
+ // GetLastError
+ MRC 15, 0, R1, C13, C0, 2
+ MOVW 0x34(R1), R0
+ MOVW R0, 20(R4) // store in libcall->err
+
+ MOVM.IA.W (R13), [R4, R5, R15]
+
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MRC 15, 0, R0, C13, C0, 2
+ MOVW 0x34(R0), R0
+ MOVW R0, ret+0(FP)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// R0 is pointer to struct containing
+// exception record and context pointers.
+// R1 is the kind of sigtramp function.
+// Return value of sigtrampgo is stored in R0.
+TEXT sigtramp<>(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4-R11, R14], (R13) // push {r4-r11, lr} (SP-=40)
+ SUB $(16), R13 // reserve space for parameters/retval to go call
+
+ MOVW R0, R6 // Save param0
+ MOVW R1, R7 // Save param1
+ BL runtime·load_g(SB) // Clobbers R0
+
+ MOVW $0, R4
+ MOVW R4, 0(R13) // No saved link register.
+ MOVW R6, 4(R13) // Move arg0 into position
+ MOVW R7, 8(R13) // Move arg1 into position
+ BL runtime·sigtrampgo(SB)
+ MOVW 12(R13), R0 // Fetch return value from stack
+
+ ADD $(16), R13 // free locals
+ MOVM.IA.W (R13), [R4-R11, R14] // pop {r4-r11, lr}
+
+ B (R14) // return
+
+// Trampoline to resume execution from exception handler.
+// This is part of the control flow guard workaround.
+// It switches stacks and jumps to the continuation address.
+// R0 and R1 are set above at the end of sigtrampgo
+// in the context that starts executing at sigresume.
+TEXT runtime·sigresume(SB),NOSPLIT|NOFRAME,$0
+ // Important: do not smash LR,
+ // which is set to a live value when handling
+ // a signal by pushing a call to sigpanic onto the stack.
+ MOVW R0, R13
+ B (R1)
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $const_callbackVEH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $const_callbackFirstVCH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVW $const_callbackLastVCH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·callbackasm1(SB),NOSPLIT|NOFRAME,$0
+ // On entry, the trampoline in zcallback_windows_arm.s left
+ // the callback index in R12 (which is volatile in the C ABI).
+
+ // Push callback register arguments r0-r3. We do this first so
+ // they're contiguous with stack arguments.
+ MOVM.DB.W [R0-R3], (R13)
+ // Push C callee-save registers r4-r11 and lr.
+ MOVM.DB.W [R4-R11, R14], (R13)
+ SUB $(16 + callbackArgs__size), R13 // space for locals
+
+ // Create a struct callbackArgs on our stack.
+ MOVW R12, (16+callbackArgs_index)(R13) // callback index
+ MOVW $(16+callbackArgs__size+4*9)(R13), R0
+ MOVW R0, (16+callbackArgs_args)(R13) // address of args vector
+ MOVW $0, R0
+ MOVW R0, (16+callbackArgs_result)(R13) // result
+
+ // Prepare for entry to Go.
+ BL runtime·load_g(SB)
+
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVW $0, R0
+ MOVW R0, 12(R13) // context
+ MOVW $16(R13), R1 // R1 = &callbackArgs{...}
+ MOVW R1, 8(R13) // frame (address of callbackArgs)
+ MOVW $·callbackWrap(SB), R1
+ MOVW R1, 4(R13) // PC of function to call
+ BL runtime·cgocallback(SB)
+
+ // Get callback result.
+ MOVW (16+callbackArgs_result)(R13), R0
+
+ ADD $(16 + callbackArgs__size), R13 // free locals
+ MOVM.IA.W (R13), [R4-R11, R12] // pop {r4-r11, lr=>r12}
+ ADD $(4*4), R13 // skip r0-r3
+ B (R12) // return
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4-R11, R14], (R13) // push {r4-r11, lr}
+
+ MOVW m_g0(R0), g
+ MOVW R0, g_m(g)
+ BL runtime·save_g(SB)
+
+ // Layout new m scheduler stack on os stack.
+ MOVW R13, R0
+ MOVW R0, g_stack+stack_hi(g)
+ SUB $(64*1024), R0
+ MOVW R0, (g_stack+stack_lo)(g)
+ MOVW R0, g_stackguard0(g)
+ MOVW R0, g_stackguard1(g)
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ // Exit the thread.
+ MOVW $0, R0
+ MOVM.IA.W (R13), [R4-R11, R15] // pop {r4-r11, pc}
+
+// Runs on OS stack.
+// duration (in -100ns units) is in dt+0(FP).
+// g may be nil.
+TEXT runtime·usleep2(SB),NOSPLIT|NOFRAME,$0-4
+ MOVW dt+0(FP), R3
+ MOVM.DB.W [R4, R14], (R13) // push {r4, lr}
+ MOVW R13, R4 // Save SP
+ SUB $8, R13 // R13 = R13 - 8
+ BIC $0x7, R13 // Align SP for ABI
+ MOVW $0, R1 // R1 = FALSE (alertable)
+ MOVW $-1, R0 // R0 = handle
+ MOVW R13, R2 // R2 = pTime
+ MOVW R3, 0(R2) // time_lo
+ MOVW R0, 4(R2) // time_hi
+ MOVW runtime·_NtWaitForSingleObject(SB), R3
+ BL (R3)
+ MOVW R4, R13 // Restore SP
+ MOVM.IA.W (R13), [R4, R15] // pop {R4, pc}
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {R4, lr}
+ MOVW R13, R4
+ BIC $0x7, R13 // alignment for ABI
+ MOVW runtime·_SwitchToThread(SB), R0
+ BL (R0)
+ MOVW R4, R13 // restore stack pointer
+ MOVM.IA.W (R13), [R4, R15] // pop {R4, pc}
+
+TEXT ·publicationBarrier(SB),NOSPLIT|NOFRAME,$0-0
+ B runtime·armPublicationBarrier(SB)
+
+// never called (this is a GOARM=7 platform)
+TEXT runtime·read_tls_fallback(SB),NOSPLIT,$0
+ MOVW $0xabcd, R0
+ MOVW R0, (R0)
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ MOVW $0, R0
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+ MOVW $_INTERRUPT_TIME, R3
+loop:
+ MOVW time_hi1(R3), R1
+ DMB MB_ISH
+ MOVW time_lo(R3), R0
+ DMB MB_ISH
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE loop
+
+ // wintime = R1:R0, multiply by 100
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+
+ // wintime*100 = R4:R3
+ MOVW R3, ret_lo+0(FP)
+ MOVW R4, ret_hi+4(FP)
+ RET
+useQPC:
+ RET runtime·nanotimeQPC(SB) // tail call
+
+// save_g saves the g register (R10) into thread local memory
+// so that we can call externally compiled
+// ARM code that will overwrite those registers.
+// NOTE: runtime.gogo assumes that R1 is preserved by this function.
+// runtime.mcall assumes this function only clobbers R0 and R11.
+// Returns with g in R0.
+// Save the value in the _TEB->TlsSlots array.
+// Effectively implements TlsSetValue().
+// tls_g stores the TLS slot allocated by TlsAlloc().
+TEXT runtime·save_g(SB),NOSPLIT,$0
+ MRC 15, 0, R0, C13, C0, 2
+ ADD $0xe10, R0
+ MOVW $runtime·tls_g(SB), R11
+ MOVW (R11), R11
+ MOVW g, R11<<2(R0)
+ MOVW g, R0 // preserve R0 across call to setg<>
+ RET
+
+// load_g loads the g register from thread-local memory,
+// for use after calling externally compiled
+// ARM code that overwrote those registers.
+// Get the value from the _TEB->TlsSlots array.
+// Effectively implements TlsGetValue().
+TEXT runtime·load_g(SB),NOSPLIT,$0
+ MRC 15, 0, R0, C13, C0, 2
+ ADD $0xe10, R0
+ MOVW $runtime·tls_g(SB), g
+ MOVW (g), g
+ MOVW g<<2(R0), g
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+// It calls back into standard C using the BL below.
+// To do that, the stack pointer must be 8-byte-aligned.
+TEXT runtime·_initcgo(SB),NOSPLIT|NOFRAME,$0
+ MOVM.DB.W [R4, R14], (R13) // push {r4, lr}
+
+ // Ensure stack is 8-byte aligned before calling C code
+ MOVW R13, R4
+ BIC $0x7, R13
+
+ // Allocate a TLS slot to hold g across calls to external code
+ MOVW $runtime·_TlsAlloc(SB), R0
+ MOVW (R0), R0
+ BL (R0)
+
+ // Assert that slot is less than 64 so we can use _TEB->TlsSlots
+ CMP $64, R0
+ MOVW $runtime·abort(SB), R1
+ BL.GE (R1)
+
+ // Save Slot into tls_g
+ MOVW $runtime·tls_g(SB), R1
+ MOVW R0, (R1)
+
+ MOVW R4, R13
+ MOVM.IA.W (R13), [R4, R15] // pop {r4, pc}
+
+// Holds the TLS Slot, which was allocated by TlsAlloc()
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
diff --git a/src/runtime/sys_windows_arm64.s b/src/runtime/sys_windows_arm64.s
new file mode 100644
index 0000000..22bf1dd
--- /dev/null
+++ b/src/runtime/sys_windows_arm64.s
@@ -0,0 +1,288 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+#include "funcdata.h"
+#include "time_windows.h"
+#include "cgo/abi_arm64.h"
+
+// Offsets into Thread Environment Block (pointer in R18)
+#define TEB_error 0x68
+#define TEB_TlsSlots 0x1480
+#define TEB_ArbitraryPtr 0x28
+
+// Note: R0-R7 are args, R8 is indirect return value address,
+// R9-R15 are caller-save, R19-R29 are callee-save.
+//
+// load_g and save_g (in tls_arm64.s) clobber R27 (REGTMP) and R0.
+
+// void runtime·asmstdcall(void *c);
+TEXT runtime·asmstdcall(SB),NOSPLIT,$16
+ STP (R19, R20), 16(RSP) // save old R19, R20
+ MOVD R0, R19 // save libcall pointer
+ MOVD RSP, R20 // save stack pointer
+
+ // SetLastError(0)
+ MOVD $0, TEB_error(R18_PLATFORM)
+ MOVD libcall_args(R19), R12 // libcall->args
+
+ // Do we have more than 8 arguments?
+ MOVD libcall_n(R19), R0
+ CMP $0, R0; BEQ _0args
+ CMP $1, R0; BEQ _1args
+ CMP $2, R0; BEQ _2args
+ CMP $3, R0; BEQ _3args
+ CMP $4, R0; BEQ _4args
+ CMP $5, R0; BEQ _5args
+ CMP $6, R0; BEQ _6args
+ CMP $7, R0; BEQ _7args
+ CMP $8, R0; BEQ _8args
+
+ // Reserve stack space for remaining args
+ SUB $8, R0, R2
+ ADD $1, R2, R3 // make even number of words for stack alignment
+ AND $~1, R3
+ LSL $3, R3
+ SUB R3, RSP
+
+ // R4: size of stack arguments (n-8)*8
+ // R5: &args[8]
+ // R6: loop counter, from 0 to (n-8)*8
+ // R7: scratch
+ // R8: copy of RSP - (R2)(RSP) assembles as (R2)(ZR)
+ SUB $8, R0, R4
+ LSL $3, R4
+ ADD $(8*8), R12, R5
+ MOVD $0, R6
+ MOVD RSP, R8
+stackargs:
+ MOVD (R6)(R5), R7
+ MOVD R7, (R6)(R8)
+ ADD $8, R6
+ CMP R6, R4
+ BNE stackargs
+
+_8args:
+ MOVD (7*8)(R12), R7
+_7args:
+ MOVD (6*8)(R12), R6
+_6args:
+ MOVD (5*8)(R12), R5
+_5args:
+ MOVD (4*8)(R12), R4
+_4args:
+ MOVD (3*8)(R12), R3
+_3args:
+ MOVD (2*8)(R12), R2
+_2args:
+ MOVD (1*8)(R12), R1
+_1args:
+ MOVD (0*8)(R12), R0
+_0args:
+
+ MOVD libcall_fn(R19), R12 // branch to libcall->fn
+ BL (R12)
+
+ MOVD R20, RSP // free stack space
+ MOVD R0, libcall_r1(R19) // save return value to libcall->r1
+ // TODO(rsc) floating point like amd64 in libcall->r2?
+
+ // GetLastError
+ MOVD TEB_error(R18_PLATFORM), R0
+ MOVD R0, libcall_err(R19)
+
+ // Restore callee-saved registers.
+ LDP 16(RSP), (R19, R20)
+ RET
+
+TEXT runtime·getlasterror(SB),NOSPLIT,$0
+ MOVD TEB_error(R18_PLATFORM), R0
+ MOVD R0, ret+0(FP)
+ RET
+
+// Called by Windows as a Vectored Exception Handler (VEH).
+// R0 is pointer to struct containing
+// exception record and context pointers.
+// R1 is the kind of sigtramp function.
+// Return value of sigtrampgo is stored in R0.
+TEXT sigtramp<>(SB),NOSPLIT,$176
+	// Switch from the host ABI to the Go ABI; save args and lr.
+ MOVD R0, R5
+ MOVD R1, R6
+ MOVD LR, R7
+ SAVE_R19_TO_R28(8*4)
+ SAVE_F8_TO_F15(8*14)
+
+	BL	runtime·load_g(SB)	// Clobbers R0, R27, R28 (g)
+
+ MOVD R5, R0
+ MOVD R6, R1
+ // Calling ABIInternal because TLS might be nil.
+ BL runtime·sigtrampgo<ABIInternal>(SB)
+ // Return value is already stored in R0.
+
+ // Restore callee-save registers.
+ RESTORE_R19_TO_R28(8*4)
+ RESTORE_F8_TO_F15(8*14)
+ MOVD R7, LR
+ RET
+
+// Trampoline to resume execution from exception handler.
+// This is part of the control flow guard workaround.
+// It switches stacks and jumps to the continuation address.
+// R0 and R1 are set above at the end of sigtrampgo
+// in the context that starts executing at sigresume.
+TEXT runtime·sigresume(SB),NOSPLIT|NOFRAME,$0
+ // Important: do not smash LR,
+ // which is set to a live value when handling
+ // a signal by pushing a call to sigpanic onto the stack.
+ MOVD R0, RSP
+ B (R1)
+
+TEXT runtime·exceptiontramp(SB),NOSPLIT|NOFRAME,$0
+ MOVD $const_callbackVEH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·firstcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVD $const_callbackFirstVCH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·lastcontinuetramp(SB),NOSPLIT|NOFRAME,$0
+ MOVD $const_callbackLastVCH, R1
+ B sigtramp<>(SB)
+
+TEXT runtime·callbackasm1(SB),NOSPLIT,$208-0
+ NO_LOCAL_POINTERS
+
+ // On entry, the trampoline in zcallback_windows_arm64.s left
+ // the callback index in R12 (which is volatile in the C ABI).
+
+ // Save callback register arguments R0-R7.
+ // We do this at the top of the frame so they're contiguous with stack arguments.
+ // The 7*8 setting up R14 looks like a bug but is not: the eighth word
+ // is the space the assembler reserved for our caller's frame pointer,
+ // but we are not called from Go so that space is ours to use,
+	// and we must be contiguous with the stack arguments.
+ MOVD $arg0-(7*8)(SP), R14
+ STP (R0, R1), (0*8)(R14)
+ STP (R2, R3), (2*8)(R14)
+ STP (R4, R5), (4*8)(R14)
+ STP (R6, R7), (6*8)(R14)
+
+ // Push C callee-save registers R19-R28.
+ // LR, FP already saved.
+ SAVE_R19_TO_R28(8*9)
+
+ // Create a struct callbackArgs on our stack.
+ MOVD $cbargs-(18*8+callbackArgs__size)(SP), R13
+ MOVD R12, callbackArgs_index(R13) // callback index
+ MOVD R14, R0
+ MOVD R0, callbackArgs_args(R13) // address of args vector
+ MOVD $0, R0
+ MOVD R0, callbackArgs_result(R13) // result
+
+ // Call cgocallback, which will call callbackWrap(frame).
+ MOVD $·callbackWrap<ABIInternal>(SB), R0 // PC of function to call, cgocallback takes an ABIInternal entry-point
+ MOVD R13, R1 // frame (&callbackArgs{...})
+ MOVD $0, R2 // context
+ STP (R0, R1), (1*8)(RSP)
+ MOVD R2, (3*8)(RSP)
+ BL runtime·cgocallback(SB)
+
+ // Get callback result.
+ MOVD $cbargs-(18*8+callbackArgs__size)(SP), R13
+ MOVD callbackArgs_result(R13), R0
+
+ RESTORE_R19_TO_R28(8*9)
+
+ RET
+
+// uint32 tstart_stdcall(M *newm);
+TEXT runtime·tstart_stdcall(SB),NOSPLIT,$96-0
+ SAVE_R19_TO_R28(8*3)
+
+ MOVD m_g0(R0), g
+ MOVD R0, g_m(g)
+ BL runtime·save_g(SB)
+
+ // Set up stack guards for OS stack.
+ MOVD RSP, R0
+ MOVD R0, g_stack+stack_hi(g)
+ SUB $(64*1024), R0
+ MOVD R0, (g_stack+stack_lo)(g)
+ MOVD R0, g_stackguard0(g)
+ MOVD R0, g_stackguard1(g)
+
+ BL runtime·emptyfunc(SB) // fault if stack check is wrong
+ BL runtime·mstart(SB)
+
+ RESTORE_R19_TO_R28(8*3)
+
+ // Exit the thread.
+ MOVD $0, R0
+ RET
+
+// Runs on OS stack.
+// duration (in -100ns units) is in dt+0(FP).
+// g may be nil.
+TEXT runtime·usleep2(SB),NOSPLIT,$32-4
+ MOVW dt+0(FP), R0
+ MOVD $16(RSP), R2 // R2 = pTime
+ MOVD R0, 0(R2) // *pTime = -dt
+ MOVD $-1, R0 // R0 = handle
+ MOVD $0, R1 // R1 = FALSE (alertable)
+ MOVD runtime·_NtWaitForSingleObject(SB), R3
+ SUB $16, RSP // skip over saved frame pointer below RSP
+ BL (R3)
+ ADD $16, RSP
+ RET
+
+// Runs on OS stack.
+TEXT runtime·switchtothread(SB),NOSPLIT,$16-0
+ MOVD runtime·_SwitchToThread(SB), R0
+ SUB $16, RSP // skip over saved frame pointer below RSP
+ BL (R0)
+ ADD $16, RSP
+ RET
+
+TEXT runtime·nanotime1(SB),NOSPLIT,$0-8
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+ MOVD $_INTERRUPT_TIME, R3
+ MOVD time_lo(R3), R0
+ MOVD $100, R1
+ MUL R1, R0
+ MOVD R0, ret+0(FP)
+ RET
+useQPC:
+ RET runtime·nanotimeQPC(SB) // tail call
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+// It calls back into standard C using the BL below.
+TEXT runtime·wintls(SB),NOSPLIT,$0
+ // Allocate a TLS slot to hold g across calls to external code
+ MOVD runtime·_TlsAlloc(SB), R0
+ SUB $16, RSP // skip over saved frame pointer below RSP
+ BL (R0)
+ ADD $16, RSP
+
+ // Assert that slot is less than 64 so we can use _TEB->TlsSlots
+ CMP $64, R0
+ BLT ok
+ // Fallback to the TEB arbitrary pointer.
+ // TODO: don't use the arbitrary pointer (see go.dev/issue/59824)
+ MOVD $TEB_ArbitraryPtr, R0
+ B settls
+ok:
+
+ // Save offset from R18 into tls_g.
+ LSL $3, R0
+ ADD $TEB_TlsSlots, R0
+settls:
+ MOVD R0, runtime·tls_g(SB)
+ RET
diff --git a/src/runtime/sys_x86.go b/src/runtime/sys_x86.go
new file mode 100644
index 0000000..9fb36c2
--- /dev/null
+++ b/src/runtime/sys_x86.go
@@ -0,0 +1,23 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || 386
+
+package runtime
+
+import (
+ "internal/goarch"
+ "unsafe"
+)
+
+// adjust Gobuf as if it executed a call to fn with context ctxt
+// and then stopped before the first instruction in fn.
+func gostartcall(buf *gobuf, fn, ctxt unsafe.Pointer) {
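+	// Simulate a CALL instruction: push the saved PC as fn's return address
+	// on the stack, then resume at fn.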
+ sp := buf.sp
+ sp -= goarch.PtrSize
+ *(*uintptr)(unsafe.Pointer(sp)) = buf.pc
+ buf.sp = sp
+ buf.pc = uintptr(fn)
+ buf.ctxt = ctxt
+}
diff --git a/src/runtime/syscall2_solaris.go b/src/runtime/syscall2_solaris.go
new file mode 100644
index 0000000..10a4fa0
--- /dev/null
+++ b/src/runtime/syscall2_solaris.go
@@ -0,0 +1,45 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:cgo_import_dynamic libc_chdir chdir "libc.so"
+//go:cgo_import_dynamic libc_chroot chroot "libc.so"
+//go:cgo_import_dynamic libc_close close "libc.so"
+//go:cgo_import_dynamic libc_execve execve "libc.so"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.so"
+//go:cgo_import_dynamic libc_forkx forkx "libc.so"
+//go:cgo_import_dynamic libc_gethostname gethostname "libc.so"
+//go:cgo_import_dynamic libc_getpid getpid "libc.so"
+//go:cgo_import_dynamic libc_ioctl ioctl "libc.so"
+//go:cgo_import_dynamic libc_setgid setgid "libc.so"
+//go:cgo_import_dynamic libc_setgroups setgroups "libc.so"
+//go:cgo_import_dynamic libc_setrlimit setrlimit "libc.so"
+//go:cgo_import_dynamic libc_setsid setsid "libc.so"
+//go:cgo_import_dynamic libc_setuid setuid "libc.so"
+//go:cgo_import_dynamic libc_setpgid setpgid "libc.so"
+//go:cgo_import_dynamic libc_syscall syscall "libc.so"
+//go:cgo_import_dynamic libc_wait4 wait4 "libc.so"
+//go:cgo_import_dynamic libc_issetugid issetugid "libc.so"
+
+//go:linkname libc_chdir libc_chdir
+//go:linkname libc_chroot libc_chroot
+//go:linkname libc_close libc_close
+//go:linkname libc_execve libc_execve
+//go:linkname libc_fcntl libc_fcntl
+//go:linkname libc_forkx libc_forkx
+//go:linkname libc_gethostname libc_gethostname
+//go:linkname libc_getpid libc_getpid
+//go:linkname libc_ioctl libc_ioctl
+//go:linkname libc_setgid libc_setgid
+//go:linkname libc_setgroups libc_setgroups
+//go:linkname libc_setrlimit libc_setrlimit
+//go:linkname libc_setsid libc_setsid
+//go:linkname libc_setuid libc_setuid
+//go:linkname libc_setpgid libc_setpgid
+//go:linkname libc_syscall libc_syscall
+//go:linkname libc_wait4 libc_wait4
+//go:linkname libc_issetugid libc_issetugid
diff --git a/src/runtime/syscall_aix.go b/src/runtime/syscall_aix.go
new file mode 100644
index 0000000..e87d4d6
--- /dev/null
+++ b/src/runtime/syscall_aix.go
@@ -0,0 +1,238 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// This file handles some syscalls from the syscall package,
+// especially syscalls used during forkAndExecInChild, which must not split the stack.
+
+//go:cgo_import_dynamic libc_chdir chdir "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_chroot chroot "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_dup2 dup2 "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_execve execve "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_fcntl fcntl "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_fork fork "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_ioctl ioctl "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setgid setgid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setgroups setgroups "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setrlimit setrlimit "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setsid setsid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setuid setuid "libc.a/shr_64.o"
+//go:cgo_import_dynamic libc_setpgid setpgid "libc.a/shr_64.o"
+
+//go:linkname libc_chdir libc_chdir
+//go:linkname libc_chroot libc_chroot
+//go:linkname libc_dup2 libc_dup2
+//go:linkname libc_execve libc_execve
+//go:linkname libc_fcntl libc_fcntl
+//go:linkname libc_fork libc_fork
+//go:linkname libc_ioctl libc_ioctl
+//go:linkname libc_setgid libc_setgid
+//go:linkname libc_setgroups libc_setgroups
+//go:linkname libc_setrlimit libc_setrlimit
+//go:linkname libc_setsid libc_setsid
+//go:linkname libc_setuid libc_setuid
+//go:linkname libc_setpgid libc_setpgid
+
+var (
+ libc_chdir,
+ libc_chroot,
+ libc_dup2,
+ libc_execve,
+ libc_fcntl,
+ libc_fork,
+ libc_ioctl,
+ libc_setgid,
+ libc_setgroups,
+ libc_setrlimit,
+ libc_setsid,
+ libc_setuid,
+ libc_setpgid libFunc
+)
+
+// In syscall_syscall6 and syscall_rawsyscall6, r2 is always 0,
+// as it is never used on AIX.
+// TODO: remove r2 from zsyscall_aix_$GOARCH.go
+
+// Syscall is needed because some packages (like net) need it too.
+// The best way is to return EINVAL and let Go handle the failure.
+// If the syscall can't fail, this function can redirect it to a real syscall.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:linkname syscall_Syscall
+func syscall_Syscall(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ return 0, 0, _EINVAL
+}
+
+// This is syscall.RawSyscall. It exists to satisfy some build dependency,
+// but it doesn't work.
+//
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:linkname syscall_RawSyscall
+func syscall_RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall not available on AIX")
+}
+
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:cgo_unsafe_args
+//go:linkname syscall_syscall6
+func syscall_syscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ c := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+ exitsyscall()
+ return c.r1, 0, c.err
+}
+
+// This is exported via linkname to assembly in the syscall package.
+//
+//go:nosplit
+//go:cgo_unsafe_args
+//go:linkname syscall_rawSyscall6
+func syscall_rawSyscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ c := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+
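+	// No entersyscall/exitsyscall here: the raw variant must not interact
+	// with the scheduler (it is used between fork and exec).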
+ asmcgocall(unsafe.Pointer(&asmsyscall6), unsafe.Pointer(&c))
+
+ return c.r1, 0, c.err
+}
+
+//go:linkname syscall_chdir syscall.chdir
+//go:nosplit
+func syscall_chdir(path uintptr) (err uintptr) {
+ _, err = syscall1(&libc_chdir, path)
+ return
+}
+
+//go:linkname syscall_chroot1 syscall.chroot1
+//go:nosplit
+func syscall_chroot1(path uintptr) (err uintptr) {
+ _, err = syscall1(&libc_chroot, path)
+ return
+}
+
+// like close, but must not split stack, for fork.
+//
+//go:linkname syscall_closeFD syscall.closeFD
+//go:nosplit
+func syscall_closeFD(fd int32) int32 {
+ _, err := syscall1(&libc_close, uintptr(fd))
+ return int32(err)
+}
+
+//go:linkname syscall_dup2child syscall.dup2child
+//go:nosplit
+func syscall_dup2child(old, new uintptr) (val, err uintptr) {
+ val, err = syscall2(&libc_dup2, old, new)
+ return
+}
+
+//go:linkname syscall_execve syscall.execve
+//go:nosplit
+func syscall_execve(path, argv, envp uintptr) (err uintptr) {
+ _, err = syscall3(&libc_execve, path, argv, envp)
+ return
+}
+
+// like exit, but must not split stack, for fork.
+//
+//go:linkname syscall_exit syscall.exit
+//go:nosplit
+func syscall_exit(code uintptr) {
+ syscall1(&libc_exit, code)
+}
+
+//go:linkname syscall_fcntl1 syscall.fcntl1
+//go:nosplit
+func syscall_fcntl1(fd, cmd, arg uintptr) (val, err uintptr) {
+ val, err = syscall3(&libc_fcntl, fd, cmd, arg)
+ return
+}
+
+//go:linkname syscall_forkx syscall.forkx
+//go:nosplit
+func syscall_forkx(flags uintptr) (pid uintptr, err uintptr) {
+ pid, err = syscall1(&libc_fork, flags)
+ return
+}
+
+//go:linkname syscall_getpid syscall.getpid
+//go:nosplit
+func syscall_getpid() (pid, err uintptr) {
+ pid, err = syscall0(&libc_getpid)
+ return
+}
+
+//go:linkname syscall_ioctl syscall.ioctl
+//go:nosplit
+func syscall_ioctl(fd, req, arg uintptr) (err uintptr) {
+ _, err = syscall3(&libc_ioctl, fd, req, arg)
+ return
+}
+
+//go:linkname syscall_setgid syscall.setgid
+//go:nosplit
+func syscall_setgid(gid uintptr) (err uintptr) {
+ _, err = syscall1(&libc_setgid, gid)
+ return
+}
+
+//go:linkname syscall_setgroups1 syscall.setgroups1
+//go:nosplit
+func syscall_setgroups1(ngid, gid uintptr) (err uintptr) {
+ _, err = syscall2(&libc_setgroups, ngid, gid)
+ return
+}
+
+//go:linkname syscall_setrlimit1 syscall.setrlimit1
+//go:nosplit
+func syscall_setrlimit1(which uintptr, lim unsafe.Pointer) (err uintptr) {
+ _, err = syscall2(&libc_setrlimit, which, uintptr(lim))
+ return
+}
+
+//go:linkname syscall_setsid syscall.setsid
+//go:nosplit
+func syscall_setsid() (pid, err uintptr) {
+ pid, err = syscall0(&libc_setsid)
+ return
+}
+
+//go:linkname syscall_setuid syscall.setuid
+//go:nosplit
+func syscall_setuid(uid uintptr) (err uintptr) {
+ _, err = syscall1(&libc_setuid, uid)
+ return
+}
+
+//go:linkname syscall_setpgid syscall.setpgid
+//go:nosplit
+func syscall_setpgid(pid, pgid uintptr) (err uintptr) {
+ _, err = syscall2(&libc_setpgid, pid, pgid)
+ return
+}
+
+//go:linkname syscall_write1 syscall.write1
+//go:nosplit
+func syscall_write1(fd, buf, nbyte uintptr) (n, err uintptr) {
+ n, err = syscall3(&libc_write, fd, buf, nbyte)
+ return
+}
diff --git a/src/runtime/syscall_solaris.go b/src/runtime/syscall_solaris.go
new file mode 100644
index 0000000..11b9c2a
--- /dev/null
+++ b/src/runtime/syscall_solaris.go
@@ -0,0 +1,330 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var (
+ libc_chdir,
+ libc_chroot,
+ libc_close,
+ libc_execve,
+ libc_fcntl,
+ libc_forkx,
+ libc_gethostname,
+ libc_getpid,
+ libc_ioctl,
+ libc_setgid,
+ libc_setgroups,
+ libc_setrlimit,
+ libc_setsid,
+ libc_setuid,
+ libc_setpgid,
+ libc_syscall,
+ libc_issetugid,
+ libc_wait4 libcFunc
+)
+
+// Many of these are exported via linkname to assembly in the syscall
+// package.
+
+//go:nosplit
+//go:linkname syscall_sysvicall6
+//go:cgo_unsafe_args
+func syscall_sysvicall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
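+	// Mark the goroutine as blocked in a syscall so its P can be handed off
+	// while the libc call runs.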
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return call.r1, call.r2, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_rawsysvicall6
+//go:cgo_unsafe_args
+func syscall_rawsysvicall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: fn,
+ n: nargs,
+ args: uintptr(unsafe.Pointer(&a1)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.r2, call.err
+}
+
+// TODO(aram): Once we remove all instances of C calling sysvicallN, make
+// sysvicallN return errors and replace the body of the following functions
+// with calls to sysvicallN.
+
+//go:nosplit
+//go:linkname syscall_chdir
+func syscall_chdir(path uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_chdir)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_chroot
+func syscall_chroot(path uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_chroot)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+// like close, but must not split stack, for forkx.
+//
+//go:nosplit
+//go:linkname syscall_close
+func syscall_close(fd int32) int32 {
+ return int32(sysvicall1(&libc_close, uintptr(fd)))
+}
+
+const _F_DUP2FD = 0x9
+
+//go:nosplit
+//go:linkname syscall_dup2
+func syscall_dup2(oldfd, newfd uintptr) (val, err uintptr) {
+ return syscall_fcntl(oldfd, _F_DUP2FD, newfd)
+}
+
+//go:nosplit
+//go:linkname syscall_execve
+//go:cgo_unsafe_args
+func syscall_execve(path, argv, envp uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_execve)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&path)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+// like exit, but must not split stack, for forkx.
+//
+//go:nosplit
+//go:linkname syscall_exit
+func syscall_exit(code uintptr) {
+ sysvicall1(&libc_exit, code)
+}
+
+//go:nosplit
+//go:linkname syscall_fcntl
+//go:cgo_unsafe_args
+func syscall_fcntl(fd, cmd, arg uintptr) (val, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_fcntl)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_forkx
+func syscall_forkx(flags uintptr) (pid uintptr, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_forkx)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&flags)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ if int(call.r1) != -1 {
+ call.err = 0
+ }
+ return call.r1, call.err
+}
+
+//go:linkname syscall_gethostname
+func syscall_gethostname() (name string, err uintptr) {
+ cname := new([_MAXHOSTNAMELEN]byte)
+ var args = [2]uintptr{uintptr(unsafe.Pointer(&cname[0])), _MAXHOSTNAMELEN}
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_gethostname)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&args[0])),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ if call.r1 != 0 {
+ return "", call.err
+ }
+ cname[_MAXHOSTNAMELEN-1] = 0
+ return gostringnocopy(&cname[0]), 0
+}
+
+//go:nosplit
+//go:linkname syscall_getpid
+func syscall_getpid() (pid, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_getpid)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&libc_getpid)), // it's unused but must be non-nil, otherwise crashes
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_ioctl
+//go:cgo_unsafe_args
+func syscall_ioctl(fd, req, arg uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_ioctl)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+// This is syscall.RawSyscall; it exists only to satisfy a build dependency,
+// but it does not work.
+//
+//go:linkname syscall_rawsyscall
+func syscall_rawsyscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall not available on Solaris")
+}
+
+// This is syscall.RawSyscall6; it exists to avoid a linker error because
+// syscall.RawSyscall6 is already declared. See golang.org/issue/24357.
+//
+//go:linkname syscall_rawsyscall6
+func syscall_rawsyscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ panic("RawSyscall6 not available on Solaris")
+}
+
+//go:nosplit
+//go:linkname syscall_setgid
+func syscall_setgid(gid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setgid)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&gid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setgroups
+//go:cgo_unsafe_args
+func syscall_setgroups(ngid, gid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setgroups)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&ngid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setrlimit
+//go:cgo_unsafe_args
+func syscall_setrlimit(which uintptr, lim unsafe.Pointer) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setrlimit)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&which)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setsid
+func syscall_setsid() (pid, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setsid)),
+ n: 0,
+ args: uintptr(unsafe.Pointer(&libc_setsid)), // it's unused but must be non-nil, otherwise crashes
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setuid
+func syscall_setuid(uid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setuid)),
+ n: 1,
+ args: uintptr(unsafe.Pointer(&uid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:nosplit
+//go:linkname syscall_setpgid
+//go:cgo_unsafe_args
+func syscall_setpgid(pid, pgid uintptr) (err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_setpgid)),
+ n: 2,
+ args: uintptr(unsafe.Pointer(&pid)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.err
+}
+
+//go:linkname syscall_syscall
+//go:cgo_unsafe_args
+func syscall_syscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_syscall)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&trap)),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ return call.r1, call.r2, call.err
+}
+
+//go:linkname syscall_wait4
+//go:cgo_unsafe_args
+func syscall_wait4(pid uintptr, wstatus *uint32, options uintptr, rusage unsafe.Pointer) (wpid int, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_wait4)),
+ n: 4,
+ args: uintptr(unsafe.Pointer(&pid)),
+ }
+ entersyscallblock()
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ exitsyscall()
+ KeepAlive(wstatus)
+ KeepAlive(rusage)
+ return int(call.r1), call.err
+}
+
+//go:nosplit
+//go:linkname syscall_write
+//go:cgo_unsafe_args
+func syscall_write(fd, buf, nbyte uintptr) (n, err uintptr) {
+ call := libcall{
+ fn: uintptr(unsafe.Pointer(&libc_write)),
+ n: 3,
+ args: uintptr(unsafe.Pointer(&fd)),
+ }
+ asmcgocall(unsafe.Pointer(&asmsysvicall6x), unsafe.Pointer(&call))
+ return call.r1, call.err
+}
diff --git a/src/runtime/syscall_unix_test.go b/src/runtime/syscall_unix_test.go
new file mode 100644
index 0000000..2a69c40
--- /dev/null
+++ b/src/runtime/syscall_unix_test.go
@@ -0,0 +1,25 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package runtime_test
+
+import (
+ "runtime"
+ "syscall"
+ "testing"
+)
+
+func TestSyscallFlagAlignment(t *testing.T) {
+ // TODO(mknyszek): Check other flags.
+ check := func(name string, got, want int) {
+ if got != want {
+ t.Errorf("flag %s does not line up: got %d, want %d", name, got, want)
+ }
+ }
+ check("O_WRONLY", runtime.O_WRONLY, syscall.O_WRONLY)
+ check("O_CREAT", runtime.O_CREAT, syscall.O_CREAT)
+ check("O_TRUNC", runtime.O_TRUNC, syscall.O_TRUNC)
+}
diff --git a/src/runtime/syscall_windows.go b/src/runtime/syscall_windows.go
new file mode 100644
index 0000000..ba88e93
--- /dev/null
+++ b/src/runtime/syscall_windows.go
@@ -0,0 +1,546 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "unsafe"
+)
+
+// cbs stores all registered Go callbacks.
+var cbs struct {
+ lock mutex // use cbsLock / cbsUnlock for race instrumentation.
+ ctxt [cb_max]winCallback
+ index map[winCallbackKey]int
+ n int
+}
+
+func cbsLock() {
+ lock(&cbs.lock)
+ // compileCallback is used by goenvs prior to completion of schedinit.
+ // raceacquire involves a racecallback to get the proc, which is not
+ // safe prior to scheduler initialization. Thus avoid instrumentation
+ // until then.
+ if raceenabled && mainStarted {
+ raceacquire(unsafe.Pointer(&cbs.lock))
+ }
+}
+
+func cbsUnlock() {
+ if raceenabled && mainStarted {
+ racerelease(unsafe.Pointer(&cbs.lock))
+ }
+ unlock(&cbs.lock)
+}
+
+// winCallback records information about a registered Go callback.
+type winCallback struct {
+ fn *funcval // Go function
+ retPop uintptr // For 386 cdecl, how many bytes to pop on return
+ abiMap abiDesc
+}
+
+// abiPartKind is the action an abiPart should take.
+type abiPartKind int
+
+const (
+ abiPartBad abiPartKind = iota
+ abiPartStack // Move a value from memory to the stack.
+ abiPartReg // Move a value from memory to a register.
+)
+
+// abiPart encodes a step in translating between calling ABIs.
+type abiPart struct {
+ kind abiPartKind
+ srcStackOffset uintptr
+ dstStackOffset uintptr // used if kind == abiPartStack
+ dstRegister int // used if kind == abiPartReg
+ len uintptr
+}
+
+func (a *abiPart) tryMerge(b abiPart) bool {
+ if a.kind != abiPartStack || b.kind != abiPartStack {
+ return false
+ }
+ if a.srcStackOffset+a.len == b.srcStackOffset && a.dstStackOffset+a.len == b.dstStackOffset {
+ a.len += b.len
+ return true
+ }
+ return false
+}
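+
+// For illustration (an informal sketch, not exercised anywhere in this file):
+// two adjacent stack moves merge into one larger copy, while register parts
+// never merge.
+//
+//	a := abiPart{kind: abiPartStack, srcStackOffset: 0, dstStackOffset: 0, len: 4}
+//	b := abiPart{kind: abiPartStack, srcStackOffset: 4, dstStackOffset: 4, len: 4}
+//	a.tryMerge(b) // returns true; a now covers 8 bytes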
+
+// abiDesc specifies how to translate from a C frame to a Go
+// frame. This does not specify how to translate back because
+// the result is always a uintptr. If the C ABI is fastcall,
+// this assumes the four fastcall registers were first spilled
+// to the shadow space.
+type abiDesc struct {
+ parts []abiPart
+
+ srcStackSize uintptr // stdcall/fastcall stack space tracking
+ dstStackSize uintptr // Go stack space used
+ dstSpill uintptr // Extra stack space for argument spill slots
+ dstRegisters int // Go ABI int argument registers used
+
+ // retOffset is the offset of the uintptr-sized result in the Go
+ // frame.
+ retOffset uintptr
+}
+
+func (p *abiDesc) assignArg(t *_type) {
+ if t.Size_ > goarch.PtrSize {
+ // We don't support this right now. In
+ // stdcall/cdecl, 64-bit ints and doubles are
+ // passed as two words (little endian); and
+ // structs are pushed on the stack. In
+ // fastcall, arguments larger than the word
+ // size are passed by reference. On arm,
+ // 8-byte aligned arguments round up to the
+ // next even register and can be split across
+ // registers and the stack.
+ panic("compileCallback: argument size is larger than uintptr")
+ }
+ if k := t.Kind_ & kindMask; GOARCH != "386" && (k == kindFloat32 || k == kindFloat64) {
+ // In fastcall, floating-point arguments in
+ // the first four positions are passed in
+ // floating-point registers, which we don't
+ // currently spill. arm passes floating-point
+ // arguments in VFP registers, which we also
+ // don't support.
+ // So basically we only support 386.
+ panic("compileCallback: float arguments not supported")
+ }
+
+ if t.Size_ == 0 {
+ // The Go ABI aligns for zero-sized types.
+ p.dstStackSize = alignUp(p.dstStackSize, uintptr(t.Align_))
+ return
+ }
+
+ // In the C ABI, we're already on a word boundary.
+ // Also, sub-word-sized fastcall register arguments
+ // are stored to the least-significant bytes of the
+ // argument word and all supported Windows
+ // architectures are little endian, so srcStackOffset
+ // is already pointing to the right place for smaller
+ // arguments. The same is true on arm.
+
+ oldParts := p.parts
+ if p.tryRegAssignArg(t, 0) {
+ // Account for spill space.
+ //
+ // TODO(mknyszek): Remove this when we no longer have
+ // caller reserved spill space.
+ p.dstSpill = alignUp(p.dstSpill, uintptr(t.Align_))
+ p.dstSpill += t.Size_
+ } else {
+ // Register assignment failed.
+ // Undo the work and stack assign.
+ p.parts = oldParts
+
+ // The Go ABI aligns arguments.
+ p.dstStackSize = alignUp(p.dstStackSize, uintptr(t.Align_))
+
+ // Copy just the size of the argument. Note that this
+ // could be a small by-value struct, but C and Go
+ // struct layouts are compatible, so we can copy these
+ // directly, too.
+ part := abiPart{
+ kind: abiPartStack,
+ srcStackOffset: p.srcStackSize,
+ dstStackOffset: p.dstStackSize,
+ len: t.Size_,
+ }
+ // Add this step to the adapter.
+ if len(p.parts) == 0 || !p.parts[len(p.parts)-1].tryMerge(part) {
+ p.parts = append(p.parts, part)
+ }
+ // The Go ABI packs arguments.
+ p.dstStackSize += t.Size_
+ }
+
+ // cdecl, stdcall, fastcall, and arm pad arguments to word size.
+ // TODO(rsc): On arm and arm64 do we need to skip the caller's saved LR?
+ p.srcStackSize += goarch.PtrSize
+}
+
+// tryRegAssignArg tries to register-assign a value of type t.
+// If this type is nested in an aggregate type, then offset is the
+// offset of this type within its parent type.
+// Assumes t.size <= goarch.PtrSize and t.size != 0.
+//
+// Returns whether the assignment succeeded.
+func (p *abiDesc) tryRegAssignArg(t *_type, offset uintptr) bool {
+ switch k := t.Kind_ & kindMask; k {
+ case kindBool, kindInt, kindInt8, kindInt16, kindInt32, kindUint, kindUint8, kindUint16, kindUint32, kindUintptr, kindPtr, kindUnsafePointer:
+ // Assign a register for all these types.
+ return p.assignReg(t.Size_, offset)
+ case kindInt64, kindUint64:
+ // Only register-assign if the registers are big enough.
+ if goarch.PtrSize == 8 {
+ return p.assignReg(t.Size_, offset)
+ }
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(t))
+ if at.Len == 1 {
+ return p.tryRegAssignArg(at.Elem, offset) // TODO fix when runtime is fully commoned up w/ abi.Type
+ }
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ for i := range st.Fields {
+ f := &st.Fields[i]
+ if !p.tryRegAssignArg(f.Typ, offset+f.Offset) {
+ return false
+ }
+ }
+ return true
+ }
+ // Pointer-sized types such as maps and channels are currently
+ // not supported.
+ panic("compileCallback: type " + toRType(t).string() + " is currently not supported for use in system callbacks")
+}
+
+// assignReg attempts to assign a single register for an
+// argument with the given size, at the given offset into the
+// value in the C ABI space.
+//
+// Returns whether the assignment was successful.
+func (p *abiDesc) assignReg(size, offset uintptr) bool {
+ if p.dstRegisters >= intArgRegs {
+ return false
+ }
+ p.parts = append(p.parts, abiPart{
+ kind: abiPartReg,
+ srcStackOffset: p.srcStackSize + offset,
+ dstRegister: p.dstRegisters,
+ len: size,
+ })
+ p.dstRegisters++
+ return true
+}
+
+type winCallbackKey struct {
+ fn *funcval
+ cdecl bool
+}
+
+func callbackasm()
+
+// callbackasmAddr returns the address of runtime.callbackasm
+// adjusted by i.
+// On x86 and amd64, runtime.callbackasm is a series of CALL instructions,
+// and we want the callback to arrive at the corresponding CALL instruction
+// instead of at the start of runtime.callbackasm.
+// On ARM, runtime.callbackasm is a series of mov and branch instructions.
+// R12 is loaded with the callback index. Each entry is two instructions,
+// hence 8 bytes.
+func callbackasmAddr(i int) uintptr {
+ var entrySize int
+ switch GOARCH {
+ default:
+ panic("unsupported architecture")
+ case "386", "amd64":
+ entrySize = 5
+ case "arm", "arm64":
+ // On ARM and ARM64, each entry is a MOV instruction
+ // followed by a branch instruction
+ entrySize = 8
+ }
+ return abi.FuncPCABI0(callbackasm) + uintptr(i*entrySize)
+}
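+
+// For example, with the entry sizes above, callback index 3 resolves to
+// callbackasm+15 on amd64 (5-byte CALLs) and to callbackasm+24 on arm64
+// (8-byte MOV+branch pairs).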
+
+const callbackMaxFrame = 64 * goarch.PtrSize
+
+// compileCallback converts a Go function fn into a C function pointer
+// that can be passed to Windows APIs.
+//
+// On 386, if cdecl is true, the returned C function will use the
+// cdecl calling convention; otherwise, it will use stdcall. On amd64,
+// it always uses fastcall. On arm, it always uses the ARM convention.
+//
+//go:linkname compileCallback syscall.compileCallback
+func compileCallback(fn eface, cdecl bool) (code uintptr) {
+ if GOARCH != "386" {
+ // cdecl is only meaningful on 386.
+ cdecl = false
+ }
+
+ if fn._type == nil || (fn._type.Kind_&kindMask) != kindFunc {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ ft := (*functype)(unsafe.Pointer(fn._type))
+
+ // Check arguments and construct ABI translation.
+ var abiMap abiDesc
+ for _, t := range ft.InSlice() {
+ abiMap.assignArg(t)
+ }
+ // The Go ABI aligns the result to the word size. src is
+ // already aligned.
+ abiMap.dstStackSize = alignUp(abiMap.dstStackSize, goarch.PtrSize)
+ abiMap.retOffset = abiMap.dstStackSize
+
+ if len(ft.OutSlice()) != 1 {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ if ft.OutSlice()[0].Size_ != goarch.PtrSize {
+ panic("compileCallback: expected function with one uintptr-sized result")
+ }
+ if k := ft.OutSlice()[0].Kind_ & kindMask; k == kindFloat32 || k == kindFloat64 {
+ // In cdecl and stdcall, float results are returned in
+ // ST(0). In fastcall, they're returned in XMM0.
+ // Either way, it's not AX.
+ panic("compileCallback: float results not supported")
+ }
+ if intArgRegs == 0 {
+ // Make room for the uintptr-sized result.
+ // If there are argument registers, the return value will
+ // be passed in the first register.
+ abiMap.dstStackSize += goarch.PtrSize
+ }
+
+ // TODO(mknyszek): Remove dstSpill from this calculation when we no longer have
+ // caller reserved spill space.
+ frameSize := alignUp(abiMap.dstStackSize, goarch.PtrSize)
+ frameSize += abiMap.dstSpill
+ if frameSize > callbackMaxFrame {
+ panic("compileCallback: function argument frame too large")
+ }
+
+ // For cdecl, the callee is responsible for popping its
+ // arguments from the C stack.
+ var retPop uintptr
+ if cdecl {
+ retPop = abiMap.srcStackSize
+ }
+
+ key := winCallbackKey{(*funcval)(fn.data), cdecl}
+
+ cbsLock()
+
+ // Check if this callback is already registered.
+ if n, ok := cbs.index[key]; ok {
+ cbsUnlock()
+ return callbackasmAddr(n)
+ }
+
+ // Register the callback.
+ if cbs.index == nil {
+ cbs.index = make(map[winCallbackKey]int)
+ }
+ n := cbs.n
+ if n >= len(cbs.ctxt) {
+ cbsUnlock()
+ throw("too many callback functions")
+ }
+ c := winCallback{key.fn, retPop, abiMap}
+ cbs.ctxt[n] = c
+ cbs.index[key] = n
+ cbs.n++
+
+ cbsUnlock()
+ return callbackasmAddr(n)
+}
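+
+// compileCallback is not called directly by user code; it is reached through
+// syscall.NewCallback and syscall.NewCallbackCDecl. As a sketch (the callback
+// body below is illustrative only), a window procedure would be created like:
+//
+//	wndProc := syscall.NewCallback(func(hwnd syscall.Handle, msg uint32, wparam, lparam uintptr) uintptr {
+//		return 0
+//	})
+//	// wndProc can then be placed in a WNDCLASSEX and passed to RegisterClassExW.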
+
+type callbackArgs struct {
+ index uintptr
+ // args points to the argument block.
+ //
+ // For cdecl and stdcall, all arguments are on the stack.
+ //
+ // For fastcall, the trampoline spills register arguments to
+ // the reserved spill slots below the stack arguments,
+ // resulting in a layout equivalent to stdcall.
+ //
+ // For arm, the trampoline stores the register arguments just
+ // below the stack arguments, so again we can treat it as one
+ // big stack arguments frame.
+ args unsafe.Pointer
+ // Below are out-args from callbackWrap
+ result uintptr
+ retPop uintptr // For 386 cdecl, how many bytes to pop on return
+}
+
+// callbackWrap is called by callbackasm to invoke a registered C callback.
+func callbackWrap(a *callbackArgs) {
+ c := cbs.ctxt[a.index]
+ a.retPop = c.retPop
+
+ // Convert from C to Go ABI.
+ var regs abi.RegArgs
+ var frame [callbackMaxFrame]byte
+ goArgs := unsafe.Pointer(&frame)
+ for _, part := range c.abiMap.parts {
+ switch part.kind {
+ case abiPartStack:
+ memmove(add(goArgs, part.dstStackOffset), add(a.args, part.srcStackOffset), part.len)
+ case abiPartReg:
+ goReg := unsafe.Pointer(&regs.Ints[part.dstRegister])
+ memmove(goReg, add(a.args, part.srcStackOffset), part.len)
+ default:
+ panic("bad ABI description")
+ }
+ }
+
+ // TODO(mknyszek): Remove this when we no longer have
+ // caller reserved spill space.
+ frameSize := alignUp(c.abiMap.dstStackSize, goarch.PtrSize)
+ frameSize += c.abiMap.dstSpill
+
+ // Even though this is copying back results, we can pass a nil
+ // type because those results must not require write barriers.
+ reflectcall(nil, unsafe.Pointer(c.fn), noescape(goArgs), uint32(c.abiMap.dstStackSize), uint32(c.abiMap.retOffset), uint32(frameSize), &regs)
+
+ // Extract the result.
+ //
+ // There's always exactly one return value, one pointer in size.
+ // If it's on the stack, then we will have reserved space for it
+ // at the end of the frame, otherwise it was passed in a register.
+ if c.abiMap.dstStackSize != c.abiMap.retOffset {
+ a.result = *(*uintptr)(unsafe.Pointer(&frame[c.abiMap.retOffset]))
+ } else {
+ var zero int
+ // On architectures with no registers, Ints[0] would be a compile error,
+ // so we use a dynamic index. These architectures will never take this
+ // branch, so this won't cause a runtime panic.
+ a.result = regs.Ints[zero]
+ }
+}
+
+const _LOAD_LIBRARY_SEARCH_SYSTEM32 = 0x00000800
+
+//go:linkname syscall_loadsystemlibrary syscall.loadsystemlibrary
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_loadsystemlibrary(filename *uint16) (handle, err uintptr) {
+ lockOSThread()
+ c := &getg().m.syscall
+ c.fn = getLoadLibraryEx()
+ c.n = 3
+ args := struct {
+ lpFileName *uint16
+ hFile uintptr // always 0
+ flags uint32
+ }{filename, 0, _LOAD_LIBRARY_SEARCH_SYSTEM32}
+ c.args = uintptr(noescape(unsafe.Pointer(&args)))
+
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ KeepAlive(filename)
+ handle = c.r1
+ if handle == 0 {
+ err = c.err
+ }
+ unlockOSThread() // not defer'd after the lockOSThread above to save stack frame size.
+ return
+}
+
+//go:linkname syscall_loadlibrary syscall.loadlibrary
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_loadlibrary(filename *uint16) (handle, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = getLoadLibrary()
+ c.n = 1
+ c.args = uintptr(noescape(unsafe.Pointer(&filename)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ KeepAlive(filename)
+ handle = c.r1
+ if handle == 0 {
+ err = c.err
+ }
+ return
+}
+
+//go:linkname syscall_getprocaddress syscall.getprocaddress
+//go:nosplit
+//go:cgo_unsafe_args
+func syscall_getprocaddress(handle uintptr, procname *byte) (outhandle, err uintptr) {
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = getGetProcAddress()
+ c.n = 2
+ c.args = uintptr(noescape(unsafe.Pointer(&handle)))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ KeepAlive(procname)
+ outhandle = c.r1
+ if outhandle == 0 {
+ err = c.err
+ }
+ return
+}
+
+//go:linkname syscall_Syscall syscall.Syscall
+//go:nosplit
+func syscall_Syscall(fn, nargs, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3)
+}
+
+//go:linkname syscall_Syscall6 syscall.Syscall6
+//go:nosplit
+func syscall_Syscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6)
+}
+
+//go:linkname syscall_Syscall9 syscall.Syscall9
+//go:nosplit
+func syscall_Syscall9(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9)
+}
+
+//go:linkname syscall_Syscall12 syscall.Syscall12
+//go:nosplit
+func syscall_Syscall12(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12)
+}
+
+//go:linkname syscall_Syscall15 syscall.Syscall15
+//go:nosplit
+func syscall_Syscall15(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15)
+}
+
+//go:linkname syscall_Syscall18 syscall.Syscall18
+//go:nosplit
+func syscall_Syscall18(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18 uintptr) (r1, r2, err uintptr) {
+ return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18)
+}
+
+// maxArgs should be divisible by 2, as the Windows stack
+// must be kept 16-byte aligned on syscall entry.
+//
+// Although it only permits a maximum of 42 parameters, that
+// is arguably large enough.
+const maxArgs = 42
+
+//go:linkname syscall_SyscallN syscall.SyscallN
+//go:nosplit
+func syscall_SyscallN(trap uintptr, args ...uintptr) (r1, r2, err uintptr) {
+ nargs := len(args)
+
+ // asmstdcall expects it can access the first 4 arguments
+ // to load them into registers.
+ var tmp [4]uintptr
+ switch {
+ case nargs < 4:
+ copy(tmp[:], args)
+ args = tmp[:]
+ case nargs > maxArgs:
+ panic("runtime: SyscallN has too many arguments")
+ }
+
+ lockOSThread()
+ defer unlockOSThread()
+ c := &getg().m.syscall
+ c.fn = trap
+ c.n = uintptr(nargs)
+ c.args = uintptr(noescape(unsafe.Pointer(&args[0])))
+ cgocall(asmstdcallAddr, unsafe.Pointer(c))
+ return c.r1, c.r2, c.err
+}
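+
+// For illustration (a sketch using only exported syscall APIs), user code
+// reaches syscall_SyscallN through syscall.SyscallN or (*syscall.Proc).Call:
+//
+//	h, _ := syscall.LoadLibrary("kernel32.dll")
+//	p, _ := syscall.GetProcAddress(h, "GetTickCount")
+//	ticks, _, _ := syscall.SyscallN(p)
+//	_ = ticks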
diff --git a/src/runtime/syscall_windows_test.go b/src/runtime/syscall_windows_test.go
new file mode 100644
index 0000000..1770b83
--- /dev/null
+++ b/src/runtime/syscall_windows_test.go
@@ -0,0 +1,1349 @@
+// Copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "fmt"
+ "internal/abi"
+ "internal/syscall/windows/sysdll"
+ "internal/testenv"
+ "io"
+ "math"
+ "os"
+ "os/exec"
+ "path/filepath"
+ "reflect"
+ "runtime"
+ "strconv"
+ "strings"
+ "syscall"
+ "testing"
+ "unsafe"
+)
+
+type DLL struct {
+ *syscall.DLL
+ t *testing.T
+}
+
+func GetDLL(t *testing.T, name string) *DLL {
+ d, e := syscall.LoadDLL(name)
+ if e != nil {
+ t.Fatal(e)
+ }
+ return &DLL{DLL: d, t: t}
+}
+
+func (d *DLL) Proc(name string) *syscall.Proc {
+ p, e := d.FindProc(name)
+ if e != nil {
+ d.t.Fatal(e)
+ }
+ return p
+}
+
+func TestStdCall(t *testing.T) {
+ type Rect struct {
+ left, top, right, bottom int32
+ }
+ res := Rect{}
+ expected := Rect{1, 1, 40, 60}
+ a, _, _ := GetDLL(t, "user32.dll").Proc("UnionRect").Call(
+ uintptr(unsafe.Pointer(&res)),
+ uintptr(unsafe.Pointer(&Rect{10, 1, 14, 60})),
+ uintptr(unsafe.Pointer(&Rect{1, 2, 40, 50})))
+ if a != 1 || res.left != expected.left ||
+ res.top != expected.top ||
+ res.right != expected.right ||
+ res.bottom != expected.bottom {
+ t.Error("stdcall USER32.UnionRect returns", a, "res=", res)
+ }
+}
+
+func Test64BitReturnStdCall(t *testing.T) {
+
+ const (
+ VER_BUILDNUMBER = 0x0000004
+ VER_MAJORVERSION = 0x0000002
+ VER_MINORVERSION = 0x0000001
+ VER_PLATFORMID = 0x0000008
+ VER_PRODUCT_TYPE = 0x0000080
+ VER_SERVICEPACKMAJOR = 0x0000020
+ VER_SERVICEPACKMINOR = 0x0000010
+ VER_SUITENAME = 0x0000040
+
+ VER_EQUAL = 1
+ VER_GREATER = 2
+ VER_GREATER_EQUAL = 3
+ VER_LESS = 4
+ VER_LESS_EQUAL = 5
+
+ ERROR_OLD_WIN_VERSION syscall.Errno = 1150
+ )
+
+ type OSVersionInfoEx struct {
+ OSVersionInfoSize uint32
+ MajorVersion uint32
+ MinorVersion uint32
+ BuildNumber uint32
+ PlatformId uint32
+ CSDVersion [128]uint16
+ ServicePackMajor uint16
+ ServicePackMinor uint16
+ SuiteMask uint16
+ ProductType byte
+ Reserve byte
+ }
+
+ d := GetDLL(t, "kernel32.dll")
+
+ var m1, m2 uintptr
+ VerSetConditionMask := d.Proc("VerSetConditionMask")
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_MAJORVERSION, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_MINORVERSION, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_SERVICEPACKMAJOR, VER_GREATER_EQUAL)
+ m1, m2, _ = VerSetConditionMask.Call(m1, m2, VER_SERVICEPACKMINOR, VER_GREATER_EQUAL)
+
+ vi := OSVersionInfoEx{
+ MajorVersion: 5,
+ MinorVersion: 1,
+ ServicePackMajor: 2,
+ ServicePackMinor: 0,
+ }
+ vi.OSVersionInfoSize = uint32(unsafe.Sizeof(vi))
+ r, _, e2 := d.Proc("VerifyVersionInfoW").Call(
+ uintptr(unsafe.Pointer(&vi)),
+ VER_MAJORVERSION|VER_MINORVERSION|VER_SERVICEPACKMAJOR|VER_SERVICEPACKMINOR,
+ m1, m2)
+ if r == 0 && e2 != ERROR_OLD_WIN_VERSION {
+ t.Errorf("VerifyVersionInfo failed: %s", e2)
+ }
+}
+
+func TestCDecl(t *testing.T) {
+ var buf [50]byte
+ fmtp, _ := syscall.BytePtrFromString("%d %d %d")
+ a, _, _ := GetDLL(t, "user32.dll").Proc("wsprintfA").Call(
+ uintptr(unsafe.Pointer(&buf[0])),
+ uintptr(unsafe.Pointer(fmtp)),
+ 1000, 2000, 3000)
+ if string(buf[:a]) != "1000 2000 3000" {
+ t.Error("cdecl USER32.wsprintfA returns", a, "buf=", buf[:a])
+ }
+}
+
+func TestEnumWindows(t *testing.T) {
+ d := GetDLL(t, "user32.dll")
+ isWindows := d.Proc("IsWindow")
+ counter := 0
+ cb := syscall.NewCallback(func(hwnd syscall.Handle, lparam uintptr) uintptr {
+ if lparam != 888 {
+ t.Error("lparam was not passed to callback")
+ }
+ b, _, _ := isWindows.Call(uintptr(hwnd))
+ if b == 0 {
+ t.Error("USER32.IsWindow returns FALSE")
+ }
+ counter++
+ return 1 // continue enumeration
+ })
+ a, _, _ := d.Proc("EnumWindows").Call(cb, 888)
+ if a == 0 {
+ t.Error("USER32.EnumWindows returns FALSE")
+ }
+ if counter == 0 {
+ t.Error("Callback has never been called or you have no windows")
+ }
+}
+
+func callback(timeFormatString unsafe.Pointer, lparam uintptr) uintptr {
+ (*(*func())(unsafe.Pointer(&lparam)))()
+ return 0 // stop enumeration
+}
+
+// nestedCall calls into Windows, back into Go, and finally to f.
+func nestedCall(t *testing.T, f func()) {
+ c := syscall.NewCallback(callback)
+ d := GetDLL(t, "kernel32.dll")
+ defer d.Release()
+ const LOCALE_NAME_USER_DEFAULT = 0
+ d.Proc("EnumTimeFormatsEx").Call(c, LOCALE_NAME_USER_DEFAULT, 0, uintptr(*(*unsafe.Pointer)(unsafe.Pointer(&f))))
+}
+
+func TestCallback(t *testing.T) {
+ var x = false
+ nestedCall(t, func() { x = true })
+ if !x {
+ t.Fatal("nestedCall did not call func")
+ }
+}
+
+func TestCallbackGC(t *testing.T) {
+ nestedCall(t, runtime.GC)
+}
+
+func TestCallbackPanicLocked(t *testing.T) {
+ runtime.LockOSThread()
+ defer runtime.UnlockOSThread()
+
+ if !runtime.LockedOSThread() {
+ t.Fatal("runtime.LockOSThread didn't")
+ }
+ defer func() {
+ s := recover()
+ if s == nil {
+ t.Fatal("did not panic")
+ }
+ if s.(string) != "callback panic" {
+ t.Fatal("wrong panic:", s)
+ }
+ if !runtime.LockedOSThread() {
+ t.Fatal("lost lock on OS thread after panic")
+ }
+ }()
+ nestedCall(t, func() { panic("callback panic") })
+ panic("nestedCall returned")
+}
+
+func TestCallbackPanic(t *testing.T) {
+ // Make sure panic during callback unwinds properly.
+ if runtime.LockedOSThread() {
+ t.Fatal("locked OS thread on entry to TestCallbackPanic")
+ }
+ defer func() {
+ s := recover()
+ if s == nil {
+ t.Fatal("did not panic")
+ }
+ if s.(string) != "callback panic" {
+ t.Fatal("wrong panic:", s)
+ }
+ if runtime.LockedOSThread() {
+ t.Fatal("locked OS thread on exit from TestCallbackPanic")
+ }
+ }()
+ nestedCall(t, func() { panic("callback panic") })
+ panic("nestedCall returned")
+}
+
+func TestCallbackPanicLoop(t *testing.T) {
+ // Make sure we don't blow out m->g0 stack.
+ for i := 0; i < 100000; i++ {
+ TestCallbackPanic(t)
+ }
+}
+
+func TestBlockingCallback(t *testing.T) {
+ c := make(chan int)
+ go func() {
+ for i := 0; i < 10; i++ {
+ c <- <-c
+ }
+ }()
+ nestedCall(t, func() {
+ for i := 0; i < 10; i++ {
+ c <- i
+ if j := <-c; j != i {
+ t.Errorf("out of sync %d != %d", j, i)
+ }
+ }
+ })
+}
+
+func TestCallbackInAnotherThread(t *testing.T) {
+ d := GetDLL(t, "kernel32.dll")
+
+ f := func(p uintptr) uintptr {
+ return p
+ }
+ r, _, err := d.Proc("CreateThread").Call(0, 0, syscall.NewCallback(f), 123, 0, 0)
+ if r == 0 {
+ t.Fatalf("CreateThread failed: %v", err)
+ }
+ h := syscall.Handle(r)
+ defer syscall.CloseHandle(h)
+
+ switch s, err := syscall.WaitForSingleObject(h, 100); s {
+ case syscall.WAIT_OBJECT_0:
+ break
+ case syscall.WAIT_TIMEOUT:
+ t.Fatal("timeout waiting for thread to exit")
+ case syscall.WAIT_FAILED:
+ t.Fatalf("WaitForSingleObject failed: %v", err)
+ default:
+ t.Fatalf("WaitForSingleObject returns unexpected value %v", s)
+ }
+
+ var ec uint32
+ r, _, err = d.Proc("GetExitCodeThread").Call(uintptr(h), uintptr(unsafe.Pointer(&ec)))
+ if r == 0 {
+ t.Fatalf("GetExitCodeThread failed: %v", err)
+ }
+ if ec != 123 {
+ t.Fatalf("expected 123, but got %d", ec)
+ }
+}
+
+type cbFunc struct {
+ goFunc any
+}
+
+func (f cbFunc) cName(cdecl bool) string {
+ name := "stdcall"
+ if cdecl {
+ name = "cdecl"
+ }
+ t := reflect.TypeOf(f.goFunc)
+ for i := 0; i < t.NumIn(); i++ {
+ name += "_" + t.In(i).Name()
+ }
+ return name
+}
+
+func (f cbFunc) cSrc(w io.Writer, cdecl bool) {
+ // Construct a C function that takes a callback with
+ // f.goFunc's signature, and calls it with integers 1..N.
+ funcname := f.cName(cdecl)
+ attr := "__stdcall"
+ if cdecl {
+ attr = "__cdecl"
+ }
+ typename := "t" + funcname
+ t := reflect.TypeOf(f.goFunc)
+ cTypes := make([]string, t.NumIn())
+ cArgs := make([]string, t.NumIn())
+ for i := range cTypes {
+ // We included stdint.h, so this works for all sized
+ // integer types, and uint8Pair_t.
+ cTypes[i] = t.In(i).Name() + "_t"
+ if t.In(i).Name() == "uint8Pair" {
+ cArgs[i] = fmt.Sprintf("(uint8Pair_t){%d,1}", i)
+ } else {
+ cArgs[i] = fmt.Sprintf("%d", i+1)
+ }
+ }
+ fmt.Fprintf(w, `
+typedef uintptr_t %s (*%s)(%s);
+uintptr_t %s(%s f) {
+ return f(%s);
+}
+ `, attr, typename, strings.Join(cTypes, ","), funcname, typename, strings.Join(cArgs, ","))
+}
+
+func (f cbFunc) testOne(t *testing.T, dll *syscall.DLL, cdecl bool, cb uintptr) {
+ r1, _, _ := dll.MustFindProc(f.cName(cdecl)).Call(cb)
+
+ want := 0
+ for i := 0; i < reflect.TypeOf(f.goFunc).NumIn(); i++ {
+ want += i + 1
+ }
+ if int(r1) != want {
+ t.Errorf("wanted result %d; got %d", want, r1)
+ }
+}
+
+type uint8Pair struct{ x, y uint8 }
+
+var cbFuncs = []cbFunc{
+ {func(i1, i2 uintptr) uintptr {
+ return i1 + i2
+ }},
+ {func(i1, i2, i3 uintptr) uintptr {
+ return i1 + i2 + i3
+ }},
+ {func(i1, i2, i3, i4 uintptr) uintptr {
+ return i1 + i2 + i3 + i4
+ }},
+ {func(i1, i2, i3, i4, i5 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5
+ }},
+ {func(i1, i2, i3, i4, i5, i6 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9
+ }},
+
+ // Non-uintptr parameters.
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint16) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 int8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+ {func(i1 int8, i2 int16, i3 int32, i4, i5 uintptr) uintptr {
+ return uintptr(i1) + uintptr(i2) + uintptr(i3) + i4 + i5
+ }},
+ {func(i1, i2, i3, i4, i5 uint8Pair) uintptr {
+ return uintptr(i1.x + i1.y + i2.x + i2.y + i3.x + i3.y + i4.x + i4.y + i5.x + i5.y)
+ }},
+ {func(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint32) uintptr {
+ runtime.GC()
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+ }},
+}
+
+//go:registerparams
+func sum2(i1, i2 uintptr) uintptr {
+ return i1 + i2
+}
+
+//go:registerparams
+func sum3(i1, i2, i3 uintptr) uintptr {
+ return i1 + i2 + i3
+}
+
+//go:registerparams
+func sum4(i1, i2, i3, i4 uintptr) uintptr {
+ return i1 + i2 + i3 + i4
+}
+
+//go:registerparams
+func sum5(i1, i2, i3, i4, i5 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5
+}
+
+//go:registerparams
+func sum6(i1, i2, i3, i4, i5, i6 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6
+}
+
+//go:registerparams
+func sum7(i1, i2, i3, i4, i5, i6, i7 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7
+}
+
+//go:registerparams
+func sum8(i1, i2, i3, i4, i5, i6, i7, i8 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8
+}
+
+//go:registerparams
+func sum9(i1, i2, i3, i4, i5, i6, i7, i8, i9 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9
+}
+
+//go:registerparams
+func sum10(i1, i2, i3, i4, i5, i6, i7, i8, i9, i10 uintptr) uintptr {
+ return i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10
+}
+
+//go:registerparams
+func sum9uint8(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+}
+
+//go:registerparams
+func sum9uint16(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint16) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+}
+
+//go:registerparams
+func sum9int8(i1, i2, i3, i4, i5, i6, i7, i8, i9 int8) uintptr {
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+}
+
+//go:registerparams
+func sum5mix(i1 int8, i2 int16, i3 int32, i4, i5 uintptr) uintptr {
+ return uintptr(i1) + uintptr(i2) + uintptr(i3) + i4 + i5
+}
+
+//go:registerparams
+func sum5andPair(i1, i2, i3, i4, i5 uint8Pair) uintptr {
+ return uintptr(i1.x + i1.y + i2.x + i2.y + i3.x + i3.y + i4.x + i4.y + i5.x + i5.y)
+}
+
+// This test forces a GC. The idea is to have enough arguments
+// that, if too few spill slots are allocated (according to the ABI),
+// compiler-generated spills may clobber the return PC.
+// Then, the GC stack scanning will catch that.
+//
+//go:registerparams
+func sum9andGC(i1, i2, i3, i4, i5, i6, i7, i8, i9 uint32) uintptr {
+ runtime.GC()
+ return uintptr(i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9)
+}
+
+// TODO(register args): Remove this once we switch to using the register
+// calling convention by default, since this is redundant with the existing
+// tests.
+var cbFuncsRegABI = []cbFunc{
+ {sum2},
+ {sum3},
+ {sum4},
+ {sum5},
+ {sum6},
+ {sum7},
+ {sum8},
+ {sum9},
+ {sum10},
+ {sum9uint8},
+ {sum9uint16},
+ {sum9int8},
+ {sum5mix},
+ {sum5andPair},
+ {sum9andGC},
+}
+
+func getCallbackTestFuncs() []cbFunc {
+ if regs := runtime.SetIntArgRegs(-1); regs > 0 {
+ return cbFuncsRegABI
+ }
+ return cbFuncs
+}
+
+type cbDLL struct {
+ name string
+ buildArgs func(out, src string) []string
+}
+
+func (d *cbDLL) makeSrc(t *testing.T, path string) {
+ f, err := os.Create(path)
+ if err != nil {
+ t.Fatalf("failed to create source file: %v", err)
+ }
+ defer f.Close()
+
+ fmt.Fprint(f, `
+#include <stdint.h>
+typedef struct { uint8_t x, y; } uint8Pair_t;
+`)
+ for _, cbf := range getCallbackTestFuncs() {
+ cbf.cSrc(f, false)
+ cbf.cSrc(f, true)
+ }
+}
+
+func (d *cbDLL) build(t *testing.T, dir string) string {
+ srcname := d.name + ".c"
+ d.makeSrc(t, filepath.Join(dir, srcname))
+ outname := d.name + ".dll"
+ args := d.buildArgs(outname, srcname)
+ cmd := exec.Command(args[0], args[1:]...)
+ cmd.Dir = dir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ return filepath.Join(dir, outname)
+}
+
+var cbDLLs = []cbDLL{
+ {
+ "test",
+ func(out, src string) []string {
+ return []string{"gcc", "-shared", "-s", "-Werror", "-o", out, src}
+ },
+ },
+ {
+ "testO2",
+ func(out, src string) []string {
+ return []string{"gcc", "-shared", "-s", "-Werror", "-o", out, "-O2", src}
+ },
+ },
+}
+
+func TestStdcallAndCDeclCallbacks(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ tmp := t.TempDir()
+
+ oldRegs := runtime.SetIntArgRegs(abi.IntArgRegs)
+ defer runtime.SetIntArgRegs(oldRegs)
+
+ for _, dll := range cbDLLs {
+ t.Run(dll.name, func(t *testing.T) {
+ dllPath := dll.build(t, tmp)
+ dll := syscall.MustLoadDLL(dllPath)
+ defer dll.Release()
+ for _, cbf := range getCallbackTestFuncs() {
+ t.Run(cbf.cName(false), func(t *testing.T) {
+ stdcall := syscall.NewCallback(cbf.goFunc)
+ cbf.testOne(t, dll, false, stdcall)
+ })
+ t.Run(cbf.cName(true), func(t *testing.T) {
+ cdecl := syscall.NewCallbackCDecl(cbf.goFunc)
+ cbf.testOne(t, dll, true, cdecl)
+ })
+ }
+ })
+ }
+}
+
+func TestRegisterClass(t *testing.T) {
+ kernel32 := GetDLL(t, "kernel32.dll")
+ user32 := GetDLL(t, "user32.dll")
+ mh, _, _ := kernel32.Proc("GetModuleHandleW").Call(0)
+ cb := syscall.NewCallback(func(hwnd syscall.Handle, msg uint32, wparam, lparam uintptr) (rc uintptr) {
+ t.Fatal("callback should never get called")
+ return 0
+ })
+ type Wndclassex struct {
+ Size uint32
+ Style uint32
+ WndProc uintptr
+ ClsExtra int32
+ WndExtra int32
+ Instance syscall.Handle
+ Icon syscall.Handle
+ Cursor syscall.Handle
+ Background syscall.Handle
+ MenuName *uint16
+ ClassName *uint16
+ IconSm syscall.Handle
+ }
+ name := syscall.StringToUTF16Ptr("test_window")
+ wc := Wndclassex{
+ WndProc: cb,
+ Instance: syscall.Handle(mh),
+ ClassName: name,
+ }
+ wc.Size = uint32(unsafe.Sizeof(wc))
+ a, _, err := user32.Proc("RegisterClassExW").Call(uintptr(unsafe.Pointer(&wc)))
+ if a == 0 {
+ t.Fatalf("RegisterClassEx failed: %v", err)
+ }
+ r, _, err := user32.Proc("UnregisterClassW").Call(uintptr(unsafe.Pointer(name)), 0)
+ if r == 0 {
+ t.Fatalf("UnregisterClass failed: %v", err)
+ }
+}
+
+func TestOutputDebugString(t *testing.T) {
+ d := GetDLL(t, "kernel32.dll")
+ p := syscall.StringToUTF16Ptr("testing OutputDebugString")
+ d.Proc("OutputDebugStringW").Call(uintptr(unsafe.Pointer(p)))
+}
+
+func TestRaiseException(t *testing.T) {
+ if strings.HasPrefix(testenv.Builder(), "windows-amd64-2012") {
+ testenv.SkipFlaky(t, 49681)
+ }
+ o := runTestProg(t, "testprog", "RaiseException")
+ if strings.Contains(o, "RaiseException should not return") {
+ t.Fatalf("RaiseException did not crash program: %v", o)
+ }
+ if !strings.Contains(o, "Exception 0xbad") {
+ t.Fatalf("No stack trace: %v", o)
+ }
+}
+
+func TestZeroDivisionException(t *testing.T) {
+ o := runTestProg(t, "testprog", "ZeroDivisionException")
+ if !strings.Contains(o, "panic: runtime error: integer divide by zero") {
+ t.Fatalf("No stack trace: %v", o)
+ }
+}
+
+func TestWERDialogue(t *testing.T) {
+ if os.Getenv("TEST_WER_DIALOGUE") == "1" {
+ const EXCEPTION_NONCONTINUABLE = 1
+ mod := syscall.MustLoadDLL("kernel32.dll")
+ proc := mod.MustFindProc("RaiseException")
+ proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)
+ t.Fatal("RaiseException should not return")
+ }
+ exe, err := os.Executable()
+ if err != nil {
+ t.Fatal(err)
+ }
+ cmd := testenv.CleanCmdEnv(testenv.Command(t, exe, "-test.run=TestWERDialogue"))
+ cmd.Env = append(cmd.Env, "TEST_WER_DIALOGUE=1", "GOTRACEBACK=wer")
+ // Child process should not open WER dialogue, but return immediately instead.
+ // The exit code can't be reliably tested here because Windows can change it.
+ _, err = cmd.CombinedOutput()
+ if err == nil {
+ t.Error("test program succeeded unexpectedly")
+ }
+}
+
+func TestWindowsStackMemory(t *testing.T) {
+ o := runTestProg(t, "testprog", "StackMemory")
+ stackUsage, err := strconv.Atoi(o)
+ if err != nil {
+ t.Fatalf("Failed to read stack usage: %v", err)
+ }
+ if expected, got := 100<<10, stackUsage; got > expected {
+ t.Fatalf("expected < %d bytes of memory per thread, got %d", expected, got)
+ }
+}
+
+var used byte
+
+func use(buf []byte) {
+ for _, c := range buf {
+ used += c
+ }
+}
+
+func forceStackCopy() (r int) {
+ var f func(int) int
+ f = func(i int) int {
+ var buf [256]byte
+ use(buf[:])
+ if i == 0 {
+ return 0
+ }
+ return i + f(i-1)
+ }
+ r = f(128)
+ return
+}
+
+func TestReturnAfterStackGrowInCallback(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+typedef uintptr_t __stdcall (*callback)(uintptr_t);
+
+uintptr_t cfunc(callback f, uintptr_t n) {
+ uintptr_t r;
+ r = f(n);
+ SetLastError(333);
+ return r;
+}
+`
+ tmpdir := t.TempDir()
+
+ srcname := "mydll.c"
+ err := os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfunc")
+
+ cb := syscall.NewCallback(func(n uintptr) uintptr {
+ forceStackCopy()
+ return n
+ })
+
+ // Use a new goroutine so that we get a small stack.
+ type result struct {
+ r uintptr
+ err syscall.Errno
+ }
+ want := result{
+ // Make it large enough to test issue #29331.
+ r: (^uintptr(0)) >> 24,
+ err: 333,
+ }
+ c := make(chan result)
+ go func() {
+ r, _, err := proc.Call(cb, want.r)
+ c <- result{r, err.(syscall.Errno)}
+ }()
+ if got := <-c; got != want {
+ t.Errorf("got %d want %d", got, want)
+ }
+}
+
+func TestSyscallN(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("skipping test: GOARCH=%s", runtime.GOARCH)
+ }
+
+ for arglen := 0; arglen <= runtime.MaxArgs; arglen++ {
+ arglen := arglen
+ t.Run(fmt.Sprintf("arg-%d", arglen), func(t *testing.T) {
+ t.Parallel()
+ args := make([]string, arglen)
+ rets := make([]string, arglen+1)
+ params := make([]uintptr, arglen)
+ for i := range args {
+ args[i] = fmt.Sprintf("int a%d", i)
+ rets[i] = fmt.Sprintf("(a%d == %d)", i, i)
+ params[i] = uintptr(i)
+ }
+ rets[arglen] = "1" // for arglen == 0
+
+ src := fmt.Sprintf(`
+ #include <stdint.h>
+ #include <windows.h>
+ int cfunc(%s) { return %s; }`, strings.Join(args, ", "), strings.Join(rets, " && "))
+
+ tmpdir := t.TempDir()
+
+ srcname := "mydll.c"
+ err := os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v\n%s", err, out)
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfunc")
+
+ // proc.Call() will call SyscallN() internally.
+ r, _, err := proc.Call(params...)
+ if r != 1 {
+ t.Errorf("got %d want 1 (err=%v)", r, err)
+ }
+ })
+ }
+}
+
+func TestFloatArgs(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("skipping test: GOARCH=%s", runtime.GOARCH)
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+uintptr_t cfunc(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 1;
+ }
+ return 0;
+}
+`
+ tmpdir := t.TempDir()
+
+ srcname := "mydll.c"
+ err := os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfunc")
+
+ r, _, err := proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ if r != 1 {
+ t.Errorf("got %d want 1 (err=%v)", r, err)
+ }
+}
+
+func TestFloatReturn(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+ if runtime.GOARCH != "amd64" {
+ t.Skipf("skipping test: GOARCH=%s", runtime.GOARCH)
+ }
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+float cfuncFloat(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 1.5f;
+ }
+ return 0;
+}
+
+double cfuncDouble(uintptr_t a, double b, float c, double d) {
+ if (a == 1 && b == 2.2 && c == 3.3f && d == 4.4e44) {
+ return 2.5;
+ }
+ return 0;
+}
+`
+ tmpdir := t.TempDir()
+
+ srcname := "mydll.c"
+ err := os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ proc := dll.MustFindProc("cfuncFloat")
+
+ _, r, err := proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ fr := math.Float32frombits(uint32(r))
+ if fr != 1.5 {
+ t.Errorf("got %f want 1.5 (err=%v)", fr, err)
+ }
+
+ proc = dll.MustFindProc("cfuncDouble")
+
+ _, r, err = proc.Call(
+ 1,
+ uintptr(math.Float64bits(2.2)),
+ uintptr(math.Float32bits(3.3)),
+ uintptr(math.Float64bits(4.4e44)),
+ )
+ dr := math.Float64frombits(uint64(r))
+ if dr != 2.5 {
+ t.Errorf("got %f want 2.5 (err=%v)", dr, err)
+ }
+}
+
+func TestTimeBeginPeriod(t *testing.T) {
+ const TIMERR_NOERROR = 0
+ if *runtime.TimeBeginPeriodRetValue != TIMERR_NOERROR {
+ t.Fatalf("timeBeginPeriod failed: it returned %d", *runtime.TimeBeginPeriodRetValue)
+ }
+}
+
+// removeOneCPU removes one (any) CPU from the affinity mask.
+// It returns the new affinity mask.
+func removeOneCPU(mask uintptr) (uintptr, error) {
+ if mask == 0 {
+ return 0, fmt.Errorf("cpu affinity mask is empty")
+ }
+ maskbits := int(unsafe.Sizeof(mask) * 8)
+ for i := 0; i < maskbits; i++ {
+ newmask := mask & ^(1 << uint(i))
+ if newmask != mask {
+ return newmask, nil
+ }
+
+ }
+ panic("not reached")
+}
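+
+// For example, removeOneCPU(0b1011) clears the lowest set bit and returns
+// 0b1010; removeOneCPU(0) reports an error.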
+
+func resumeChildThread(kernel32 *syscall.DLL, childpid int) error {
+ _OpenThread := kernel32.MustFindProc("OpenThread")
+ _ResumeThread := kernel32.MustFindProc("ResumeThread")
+ _Thread32First := kernel32.MustFindProc("Thread32First")
+ _Thread32Next := kernel32.MustFindProc("Thread32Next")
+
+ snapshot, err := syscall.CreateToolhelp32Snapshot(syscall.TH32CS_SNAPTHREAD, 0)
+ if err != nil {
+ return err
+ }
+ defer syscall.CloseHandle(snapshot)
+
+ const _THREAD_SUSPEND_RESUME = 0x0002
+
+ type ThreadEntry32 struct {
+ Size uint32
+ tUsage uint32
+ ThreadID uint32
+ OwnerProcessID uint32
+ BasePri int32
+ DeltaPri int32
+ Flags uint32
+ }
+
+ var te ThreadEntry32
+ te.Size = uint32(unsafe.Sizeof(te))
+ ret, _, err := _Thread32First.Call(uintptr(snapshot), uintptr(unsafe.Pointer(&te)))
+ if ret == 0 {
+ return err
+ }
+ for te.OwnerProcessID != uint32(childpid) {
+ ret, _, err = _Thread32Next.Call(uintptr(snapshot), uintptr(unsafe.Pointer(&te)))
+ if ret == 0 {
+ return err
+ }
+ }
+ h, _, err := _OpenThread.Call(_THREAD_SUSPEND_RESUME, 1, uintptr(te.ThreadID))
+ if h == 0 {
+ return err
+ }
+ defer syscall.Close(syscall.Handle(h))
+
+ ret, _, err = _ResumeThread.Call(h)
+ if ret == 0xffffffff {
+ return err
+ }
+ return nil
+}
+
+func TestNumCPU(t *testing.T) {
+ if os.Getenv("GO_WANT_HELPER_PROCESS") == "1" {
+ // in child process
+ fmt.Fprintf(os.Stderr, "%d", runtime.NumCPU())
+ os.Exit(0)
+ }
+
+ switch n := runtime.NumberOfProcessors(); {
+ case n < 1:
+ t.Fatalf("system cannot have %d cpu(s)", n)
+ case n == 1:
+ if runtime.NumCPU() != 1 {
+ t.Fatalf("runtime.NumCPU() returns %d on single cpu system", runtime.NumCPU())
+ }
+ return
+ }
+
+ const (
+ _CREATE_SUSPENDED = 0x00000004
+ _PROCESS_ALL_ACCESS = syscall.STANDARD_RIGHTS_REQUIRED | syscall.SYNCHRONIZE | 0xfff
+ )
+
+ kernel32 := syscall.MustLoadDLL("kernel32.dll")
+ _GetProcessAffinityMask := kernel32.MustFindProc("GetProcessAffinityMask")
+ _SetProcessAffinityMask := kernel32.MustFindProc("SetProcessAffinityMask")
+
+ cmd := exec.Command(os.Args[0], "-test.run=TestNumCPU")
+ cmd.Env = append(os.Environ(), "GO_WANT_HELPER_PROCESS=1")
+ var buf strings.Builder
+ cmd.Stdout = &buf
+ cmd.Stderr = &buf
+ cmd.SysProcAttr = &syscall.SysProcAttr{CreationFlags: _CREATE_SUSPENDED}
+ err := cmd.Start()
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer func() {
+ err = cmd.Wait()
+ childOutput := buf.String()
+ if err != nil {
+ t.Fatalf("child failed: %v: %v", err, childOutput)
+ }
+ // removeOneCPU should have decreased child cpu count by 1
+ want := fmt.Sprintf("%d", runtime.NumCPU()-1)
+ if childOutput != want {
+ t.Fatalf("child output: want %q, got %q", want, childOutput)
+ }
+ }()
+
+ defer func() {
+ err = resumeChildThread(kernel32, cmd.Process.Pid)
+ if err != nil {
+ t.Fatal(err)
+ }
+ }()
+
+ ph, err := syscall.OpenProcess(_PROCESS_ALL_ACCESS, false, uint32(cmd.Process.Pid))
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer syscall.CloseHandle(ph)
+
+ var mask, sysmask uintptr
+ ret, _, err := _GetProcessAffinityMask.Call(uintptr(ph), uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret == 0 {
+ t.Fatal(err)
+ }
+
+ newmask, err := removeOneCPU(mask)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ ret, _, err = _SetProcessAffinityMask.Call(uintptr(ph), newmask)
+ if ret == 0 {
+ t.Fatal(err)
+ }
+ ret, _, err = _GetProcessAffinityMask.Call(uintptr(ph), uintptr(unsafe.Pointer(&mask)), uintptr(unsafe.Pointer(&sysmask)))
+ if ret == 0 {
+ t.Fatal(err)
+ }
+ if newmask != mask {
+ t.Fatalf("SetProcessAffinityMask didn't set newmask of 0x%x. Current mask is 0x%x.", newmask, mask)
+ }
+}
+
+// See Issue 14959
+func TestDLLPreloadMitigation(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ tmpdir := t.TempDir()
+
+ dir0, err := os.Getwd()
+ if err != nil {
+ t.Fatal(err)
+ }
+ defer os.Chdir(dir0)
+
+ const src = `
+#include <stdint.h>
+#include <windows.h>
+
+uintptr_t cfunc(void) {
+ SetLastError(123);
+ return 0;
+}
+`
+ srcname := "nojack.c"
+ err = os.WriteFile(filepath.Join(tmpdir, srcname), []byte(src), 0)
+ if err != nil {
+ t.Fatal(err)
+ }
+ name := "nojack.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", name, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, name)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ dll.MustFindProc("cfunc")
+ dll.Release()
+
+ // Get into the directory with the DLL we'll load by base name
+ // ("nojack.dll"). Think of this as the user double-clicking an
+ // installer from their Downloads directory where a browser
+ // silently downloaded some malicious DLLs.
+ os.Chdir(tmpdir)
+
+ // First, check that we can load a DLL from the current directory,
+ // loading it only as "nojack.dll", without an absolute path.
+ delete(sysdll.IsSystemDLL, name) // in case test was run repeatedly
+ dll, err = syscall.LoadDLL(name)
+ if err != nil {
+ t.Fatalf("failed to load %s by base name before sysdll registration: %v", name, err)
+ }
+ dll.Release()
+
+ // And now verify that if we register it as a system32-only
+ // DLL, the implicit loading from the current directory no
+ // longer works.
+ sysdll.IsSystemDLL[name] = true
+ dll, err = syscall.LoadDLL(name)
+ if err == nil {
+ dll.Release()
+ t.Fatalf("Bad: insecure load of DLL by base name %q after sysdll registration: %v", name, err)
+ }
+}
+
+// Test that C code called via a DLL can use large Windows thread
+// stacks and call back into Go without crashing. See issue #20975.
+//
+// See also TestBigStackCallbackCgo.
+func TestBigStackCallbackSyscall(t *testing.T) {
+ if _, err := exec.LookPath("gcc"); err != nil {
+ t.Skip("skipping test: gcc is missing")
+ }
+
+ srcname, err := filepath.Abs("testdata/testprogcgo/bigstack_windows.c")
+ if err != nil {
+ t.Fatal("Abs failed: ", err)
+ }
+
+ tmpdir := t.TempDir()
+
+ outname := "mydll.dll"
+ cmd := exec.Command("gcc", "-shared", "-s", "-Werror", "-o", outname, srcname)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ t.Fatalf("failed to build dll: %v - %v", err, string(out))
+ }
+ dllpath := filepath.Join(tmpdir, outname)
+
+ dll := syscall.MustLoadDLL(dllpath)
+ defer dll.Release()
+
+ var ok bool
+ proc := dll.MustFindProc("bigStack")
+ cb := syscall.NewCallback(func() uintptr {
+ // Do something interesting to force stack checks.
+ forceStackCopy()
+ ok = true
+ return 0
+ })
+ proc.Call(cb)
+ if !ok {
+ t.Fatalf("callback not called")
+ }
+}
+
+var (
+ modwinmm = syscall.NewLazyDLL("winmm.dll")
+ modkernel32 = syscall.NewLazyDLL("kernel32.dll")
+
+ procCreateEvent = modkernel32.NewProc("CreateEventW")
+ procSetEvent = modkernel32.NewProc("SetEvent")
+)
+
+func createEvent() (syscall.Handle, error) {
+ r0, _, e0 := syscall.Syscall6(procCreateEvent.Addr(), 4, 0, 0, 0, 0, 0, 0)
+ if r0 == 0 {
+ return 0, syscall.Errno(e0)
+ }
+ return syscall.Handle(r0), nil
+}
+
+func setEvent(h syscall.Handle) error {
+ r0, _, e0 := syscall.Syscall(procSetEvent.Addr(), 1, uintptr(h), 0, 0)
+ if r0 == 0 {
+ return syscall.Errno(e0)
+ }
+ return nil
+}
+
+func BenchmarkChanToSyscallPing(b *testing.B) {
+ n := b.N
+ ch := make(chan int)
+ event, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ go func() {
+ for i := 0; i < n; i++ {
+ syscall.WaitForSingleObject(event, syscall.INFINITE)
+ ch <- 1
+ }
+ }()
+ for i := 0; i < n; i++ {
+ err := setEvent(event)
+ if err != nil {
+ b.Fatal(err)
+ }
+ <-ch
+ }
+}
+
+func BenchmarkSyscallToSyscallPing(b *testing.B) {
+ n := b.N
+ event1, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ event2, err := createEvent()
+ if err != nil {
+ b.Fatal(err)
+ }
+ go func() {
+ for i := 0; i < n; i++ {
+ syscall.WaitForSingleObject(event1, syscall.INFINITE)
+ if err := setEvent(event2); err != nil {
+ b.Errorf("Set event failed: %v", err)
+ return
+ }
+ }
+ }()
+ for i := 0; i < n; i++ {
+ if err := setEvent(event1); err != nil {
+ b.Fatal(err)
+ }
+ if b.Failed() {
+ break
+ }
+ syscall.WaitForSingleObject(event2, syscall.INFINITE)
+ }
+}
+
+func BenchmarkChanToChanPing(b *testing.B) {
+ n := b.N
+ ch1 := make(chan int)
+ ch2 := make(chan int)
+ go func() {
+ for i := 0; i < n; i++ {
+ <-ch1
+ ch2 <- 1
+ }
+ }()
+ for i := 0; i < n; i++ {
+ ch1 <- 1
+ <-ch2
+ }
+}
+
+func BenchmarkOsYield(b *testing.B) {
+ for i := 0; i < b.N; i++ {
+ runtime.OsYield()
+ }
+}
+
+func BenchmarkRunningGoProgram(b *testing.B) {
+ tmpdir := b.TempDir()
+
+ src := filepath.Join(tmpdir, "main.go")
+ err := os.WriteFile(src, []byte(benchmarkRunningGoProgram), 0666)
+ if err != nil {
+ b.Fatal(err)
+ }
+
+ exe := filepath.Join(tmpdir, "main.exe")
+ cmd := exec.Command(testenv.GoToolPath(b), "build", "-o", exe, src)
+ cmd.Dir = tmpdir
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ b.Fatalf("building main.exe failed: %v\n%s", err, out)
+ }
+
+ b.ResetTimer()
+ for i := 0; i < b.N; i++ {
+ cmd := exec.Command(exe)
+ out, err := cmd.CombinedOutput()
+ if err != nil {
+ b.Fatalf("running main.exe failed: %v\n%s", err, out)
+ }
+ }
+}
+
+const benchmarkRunningGoProgram = `
+package main
+
+import _ "os" // average Go program will use "os" package, do the same here
+
+func main() {
+}
+`
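Note on the call pattern above: createEvent and setEvent reach the Win32 API through the raw syscall.Syscall/Syscall6 entry points, which require passing the argument count explicitly. Outside the runtime's own tests the same procedures are usually invoked through (*syscall.LazyProc).Call, which forwards however many arguments it is given. A minimal, Windows-only sketch of that style follows; the event handling and error checks here are illustrative and not part of the test above.

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        kernel32 := syscall.NewLazyDLL("kernel32.dll")
        createEvent := kernel32.NewProc("CreateEventW")
        setEvent := kernel32.NewProc("SetEvent")

        // CreateEventW(NULL, FALSE, FALSE, NULL): auto-reset, initially unsignaled.
        h, _, callErr := createEvent.Call(0, 0, 0, 0)
        if h == 0 {
            fmt.Println("CreateEventW failed:", callErr)
            return
        }
        defer syscall.CloseHandle(syscall.Handle(h))

        // Call's error value is always non-nil; check the primary result instead.
        if r, _, callErr := setEvent.Call(h); r == 0 {
            fmt.Println("SetEvent failed:", callErr)
            return
        }
        fmt.Println("event signaled")
    }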
diff --git a/src/runtime/tagptr.go b/src/runtime/tagptr.go
new file mode 100644
index 0000000..0e17a15
--- /dev/null
+++ b/src/runtime/tagptr.go
@@ -0,0 +1,14 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// taggedPointer is a pointer with a numeric tag.
+// The size of the numeric tag is GOARCH-dependent,
+// currently at least 10 bits.
+// This should only be used with pointers allocated outside the Go heap.
+type taggedPointer uint64
+
+// minTagBits is the minimum number of tag bits that we expect.
+const minTagBits = 10
diff --git a/src/runtime/tagptr_32bit.go b/src/runtime/tagptr_32bit.go
new file mode 100644
index 0000000..f79e182
--- /dev/null
+++ b/src/runtime/tagptr_32bit.go
@@ -0,0 +1,30 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build 386 || arm || mips || mipsle
+
+package runtime
+
+import "unsafe"
+
+// The number of bits stored in the numeric tag of a taggedPointer
+const taggedPointerBits = 32
+
+// On 32-bit systems, taggedPointer has a 32-bit pointer and 32-bit count.
+
+// taggedPointerPack creates a taggedPointer from a pointer and a tag.
+// Tag bits that don't fit in the result are discarded.
+func taggedPointerPack(ptr unsafe.Pointer, tag uintptr) taggedPointer {
+ return taggedPointer(uintptr(ptr))<<32 | taggedPointer(tag)
+}
+
+// Pointer returns the pointer from a taggedPointer.
+func (tp taggedPointer) pointer() unsafe.Pointer {
+ return unsafe.Pointer(uintptr(tp >> 32))
+}
+
+// Tag returns the tag from a taggedPointer.
+func (tp taggedPointer) tag() uintptr {
+ return uintptr(tp)
+}
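On 32-bit targets the split is exact: the pointer occupies the high 32 bits and the tag the low 32, so both round-trip without masking. A standalone sketch of the same arithmetic using plain integers in place of unsafe.Pointer (the names here are illustrative only):

    package main

    import "fmt"

    func pack32(ptr, tag uint32) uint64 { return uint64(ptr)<<32 | uint64(tag) }
    func ptr32(tp uint64) uint32        { return uint32(tp >> 32) }
    func tag32(tp uint64) uint32        { return uint32(tp) }

    func main() {
        tp := pack32(0x80040000, 0x12345678)
        fmt.Printf("ptr=%#x tag=%#x\n", ptr32(tp), tag32(tp)) // ptr=0x80040000 tag=0x12345678
    }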
diff --git a/src/runtime/tagptr_64bit.go b/src/runtime/tagptr_64bit.go
new file mode 100644
index 0000000..9ff11cc
--- /dev/null
+++ b/src/runtime/tagptr_64bit.go
@@ -0,0 +1,89 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x || wasm
+
+package runtime
+
+import (
+ "internal/goarch"
+ "internal/goos"
+ "unsafe"
+)
+
+const (
+ // addrBits is the number of bits needed to represent a virtual address.
+ //
+ // See heapAddrBits for a table of address space sizes on
+ // various architectures. 48 bits is enough for all
+ // architectures except s390x.
+ //
+ // On AMD64, virtual addresses are 48-bit (or 57-bit) numbers sign extended to 64.
+ // We shift the address left 16 to eliminate the sign extended part and make
+ // room in the bottom for the count.
+ //
+ // On s390x, virtual addresses are 64-bit. There's not much we
+ // can do about this, so we just hope that the kernel doesn't
+ // get to really high addresses and panic if it does.
+ addrBits = 48
+
+ // In addition to the 16 bits taken from the top, we can take 3 from the
+ // bottom, because the pointer must be 8-byte aligned, giving a total of
+ // 19 tag bits.
+ tagBits = 64 - addrBits + 3
+
+ // On AIX, 64-bit addresses are split into 36-bit segment number and 28-bit
+ // offset in segment. Segment numbers in the range 0x0A0000000-0x0AFFFFFFF(LSA)
+ // are available for mmap.
+ // We assume all tagged addresses are from memory allocated with mmap.
+ // We use one bit to distinguish between the two ranges.
+ aixAddrBits = 57
+ aixTagBits = 64 - aixAddrBits + 3
+
+ // riscv64 SV57 mode gives 56 bits of userspace VA.
+ // tagged pointer code supports it,
+ // but broader support for SV57 mode is incomplete,
+ // and there may be other issues (see #54104).
+ riscv64AddrBits = 56
+ riscv64TagBits = 64 - riscv64AddrBits + 3
+)
+
+// The number of bits stored in the numeric tag of a taggedPointer
+const taggedPointerBits = (goos.IsAix * aixTagBits) + (goarch.IsRiscv64 * riscv64TagBits) + ((1 - goos.IsAix) * (1 - goarch.IsRiscv64) * tagBits)
+
+// taggedPointerPack creates a taggedPointer from a pointer and a tag.
+// Tag bits that don't fit in the result are discarded.
+func taggedPointerPack(ptr unsafe.Pointer, tag uintptr) taggedPointer {
+ if GOOS == "aix" {
+ if GOARCH != "ppc64" {
+ throw("check this code for aix on non-ppc64")
+ }
+ return taggedPointer(uint64(uintptr(ptr))<<(64-aixAddrBits) | uint64(tag&(1<<aixTagBits-1)))
+ }
+ if GOARCH == "riscv64" {
+ return taggedPointer(uint64(uintptr(ptr))<<(64-riscv64AddrBits) | uint64(tag&(1<<riscv64TagBits-1)))
+ }
+ return taggedPointer(uint64(uintptr(ptr))<<(64-addrBits) | uint64(tag&(1<<tagBits-1)))
+}
+
+// Pointer returns the pointer from a taggedPointer.
+func (tp taggedPointer) pointer() unsafe.Pointer {
+ if GOARCH == "amd64" {
+ // amd64 systems can place the stack above the VA hole, so we need to sign extend
+ // val before unpacking.
+ return unsafe.Pointer(uintptr(int64(tp) >> tagBits << 3))
+ }
+ if GOOS == "aix" {
+ return unsafe.Pointer(uintptr((tp >> aixTagBits << 3) | 0xa<<56))
+ }
+ if GOARCH == "riscv64" {
+ return unsafe.Pointer(uintptr(tp >> riscv64TagBits << 3))
+ }
+ return unsafe.Pointer(uintptr(tp >> tagBits << 3))
+}
+
+// Tag returns the tag from a taggedPointer.
+func (tp taggedPointer) tag() uintptr {
+ return uintptr(tp & (1<<taggedPointerBits - 1))
+}
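The constants above give tagBits = 64 - 48 + 3 = 19 on most 64-bit targets, 64 - 57 + 3 = 10 on AIX, and 64 - 56 + 3 = 11 on riscv64, all of which satisfy minTagBits = 10. A standalone sketch of the default packing scheme, using plain uint64 values instead of unsafe.Pointer (the constants mirror the code above, but the program itself is only illustrative):

    package main

    import "fmt"

    const (
        addrBits = 48
        tagBits  = 64 - addrBits + 3 // 19: 16 bits from the top, 3 from pointer alignment
    )

    func pack(ptr, tag uint64) uint64 { return ptr<<(64-addrBits) | tag&(1<<tagBits-1) }

    func pointer(tp uint64) uint64 {
        // The arithmetic shift mirrors the amd64 sign extension of high addresses.
        return uint64(int64(tp) >> tagBits << 3)
    }

    func tag(tp uint64) uint64 { return tp & (1<<tagBits - 1) }

    func main() {
        p := uint64(0x00007f1234567890) // a canonical, 8-byte-aligned address
        tp := pack(p, 1<<tagBits-1)     // a tag of all ones
        fmt.Printf("ptr=%#x tag=%#x\n", pointer(tp), tag(tp)) // round-trips both values
    }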
diff --git a/src/runtime/test_amd64.go b/src/runtime/test_amd64.go
new file mode 100644
index 0000000..70c7a4f
--- /dev/null
+++ b/src/runtime/test_amd64.go
@@ -0,0 +1,7 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func testSPWrite()
diff --git a/src/runtime/test_amd64.s b/src/runtime/test_amd64.s
new file mode 100644
index 0000000..80fa8c9
--- /dev/null
+++ b/src/runtime/test_amd64.s
@@ -0,0 +1,7 @@
+// Create a large frame to force stack growth. See #62326.
+TEXT ·testSPWrite(SB),0,$16384-0
+ // Write to SP
+ MOVQ SP, AX
+ ANDQ $~0xf, SP
+ MOVQ AX, SP
+ RET
diff --git a/src/runtime/test_stubs.go b/src/runtime/test_stubs.go
new file mode 100644
index 0000000..cefc324
--- /dev/null
+++ b/src/runtime/test_stubs.go
@@ -0,0 +1,9 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !amd64
+
+package runtime
+
+func testSPWrite() {}
diff --git a/src/runtime/testdata/testexithooks/testexithooks.go b/src/runtime/testdata/testexithooks/testexithooks.go
new file mode 100644
index 0000000..ceb3326
--- /dev/null
+++ b/src/runtime/testdata/testexithooks/testexithooks.go
@@ -0,0 +1,85 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "flag"
+ "os"
+ _ "unsafe"
+)
+
+var modeflag = flag.String("mode", "", "mode to run in")
+
+func main() {
+ flag.Parse()
+ switch *modeflag {
+ case "simple":
+ testSimple()
+ case "goodexit":
+ testGoodExit()
+ case "badexit":
+ testBadExit()
+ case "panics":
+ testPanics()
+ case "callsexit":
+ testHookCallsExit()
+ default:
+ panic("unknown mode")
+ }
+}
+
+//go:linkname runtime_addExitHook runtime.addExitHook
+func runtime_addExitHook(f func(), runOnNonZeroExit bool)
+
+func testSimple() {
+ f1 := func() { println("foo") }
+ f2 := func() { println("bar") }
+ runtime_addExitHook(f1, false)
+ runtime_addExitHook(f2, false)
+ // no explicit call to os.Exit
+}
+
+func testGoodExit() {
+ f1 := func() { println("apple") }
+ f2 := func() { println("orange") }
+ runtime_addExitHook(f1, false)
+ runtime_addExitHook(f2, false)
+ // explicit call to os.Exit
+ os.Exit(0)
+}
+
+func testBadExit() {
+ f1 := func() { println("blog") }
+ f2 := func() { println("blix") }
+ f3 := func() { println("blek") }
+ f4 := func() { println("blub") }
+ f5 := func() { println("blat") }
+ runtime_addExitHook(f1, false)
+ runtime_addExitHook(f2, true)
+ runtime_addExitHook(f3, false)
+ runtime_addExitHook(f4, true)
+ runtime_addExitHook(f5, false)
+ os.Exit(1)
+}
+
+func testPanics() {
+ f1 := func() { println("ok") }
+ f2 := func() { panic("BADBADBAD") }
+ f3 := func() { println("good") }
+ runtime_addExitHook(f1, true)
+ runtime_addExitHook(f2, true)
+ runtime_addExitHook(f3, true)
+ os.Exit(0)
+}
+
+func testHookCallsExit() {
+ f1 := func() { println("ok") }
+ f2 := func() { os.Exit(1) }
+ f3 := func() { println("good") }
+ runtime_addExitHook(f1, true)
+ runtime_addExitHook(f2, true)
+ runtime_addExitHook(f3, true)
+ os.Exit(1)
+}
diff --git a/src/runtime/testdata/testfaketime/faketime.go b/src/runtime/testdata/testfaketime/faketime.go
new file mode 100644
index 0000000..1fb15eb
--- /dev/null
+++ b/src/runtime/testdata/testfaketime/faketime.go
@@ -0,0 +1,28 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Test faketime support. This is its own test program because we have
+// to build it with custom build tags and hence want to minimize
+// dependencies.
+
+package main
+
+import (
+ "os"
+ "time"
+)
+
+func main() {
+ println("line 1")
+ // Stream switch, increments time
+ os.Stdout.WriteString("line 2\n")
+ os.Stdout.WriteString("line 3\n")
+ // Stream switch, increments time
+ os.Stderr.WriteString("line 4\n")
+ // Time jump
+ time.Sleep(1 * time.Second)
+ os.Stdout.WriteString("line 5\n")
+ // Print the current time.
+ os.Stdout.WriteString(time.Now().UTC().Format(time.RFC3339))
+}
diff --git a/src/runtime/testdata/testprog/abort.go b/src/runtime/testdata/testprog/abort.go
new file mode 100644
index 0000000..9e79d4d
--- /dev/null
+++ b/src/runtime/testdata/testprog/abort.go
@@ -0,0 +1,23 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import _ "unsafe" // for go:linkname
+
+func init() {
+ register("Abort", Abort)
+}
+
+//go:linkname runtimeAbort runtime.abort
+func runtimeAbort()
+
+func Abort() {
+ defer func() {
+ recover()
+ panic("BAD: recovered from abort")
+ }()
+ runtimeAbort()
+ println("BAD: after abort")
+}
diff --git a/src/runtime/testdata/testprog/badtraceback.go b/src/runtime/testdata/testprog/badtraceback.go
new file mode 100644
index 0000000..09aa2b8
--- /dev/null
+++ b/src/runtime/testdata/testprog/badtraceback.go
@@ -0,0 +1,50 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "runtime/debug"
+ "unsafe"
+)
+
+func init() {
+ register("BadTraceback", BadTraceback)
+}
+
+func BadTraceback() {
+ // Disable GC to prevent traceback at unexpected time.
+ debug.SetGCPercent(-1)
+ // Out of an abundance of caution, also make sure that there are
+ // no GCs actively in progress.
+ runtime.GC()
+
+ // Run badLR1 on its own stack to minimize the stack size and
+ // exercise the stack bounds logic in the hex dump.
+ go badLR1()
+ select {}
+}
+
+//go:noinline
+func badLR1() {
+ // We need two frames on LR machines because we'll smash this
+ // frame's saved LR.
+ badLR2(0)
+}
+
+//go:noinline
+func badLR2(arg int) {
+ // Smash the return PC or saved LR.
+ lrOff := unsafe.Sizeof(uintptr(0))
+ if runtime.GOARCH == "ppc64" || runtime.GOARCH == "ppc64le" {
+ lrOff = 32 // FIXED_FRAME or sys.MinFrameSize
+ }
+ lrPtr := (*uintptr)(unsafe.Pointer(uintptr(unsafe.Pointer(&arg)) - lrOff))
+ *lrPtr = 0xbad
+
+ // Print a backtrace. This should include diagnostics for the
+ // bad return PC and a hex dump.
+ panic("backtrace")
+}
diff --git a/src/runtime/testdata/testprog/checkptr.go b/src/runtime/testdata/testprog/checkptr.go
new file mode 100644
index 0000000..60e71e6
--- /dev/null
+++ b/src/runtime/testdata/testprog/checkptr.go
@@ -0,0 +1,119 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CheckPtrAlignmentNoPtr", CheckPtrAlignmentNoPtr)
+ register("CheckPtrAlignmentPtr", CheckPtrAlignmentPtr)
+ register("CheckPtrAlignmentNilPtr", CheckPtrAlignmentNilPtr)
+ register("CheckPtrArithmetic", CheckPtrArithmetic)
+ register("CheckPtrArithmetic2", CheckPtrArithmetic2)
+ register("CheckPtrSize", CheckPtrSize)
+ register("CheckPtrSmall", CheckPtrSmall)
+ register("CheckPtrSliceOK", CheckPtrSliceOK)
+ register("CheckPtrSliceFail", CheckPtrSliceFail)
+ register("CheckPtrStringOK", CheckPtrStringOK)
+ register("CheckPtrStringFail", CheckPtrStringFail)
+ register("CheckPtrAlignmentNested", CheckPtrAlignmentNested)
+}
+
+func CheckPtrAlignmentNoPtr() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[0])
+ sink2 = (*int64)(unsafe.Pointer(uintptr(p) + 1))
+}
+
+func CheckPtrAlignmentPtr() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[0])
+ sink2 = (**int64)(unsafe.Pointer(uintptr(p) + 1))
+}
+
+// CheckPtrAlignmentNilPtr tests that checkptrAlignment doesn't crash
+// on nil pointers (#47430).
+func CheckPtrAlignmentNilPtr() {
+ var do func(int)
+ do = func(n int) {
+ // Inflate the stack so runtime.shrinkstack gets called during GC
+ if n > 0 {
+ do(n - 1)
+ }
+
+ var p unsafe.Pointer
+ _ = (*int)(p)
+ }
+
+ go func() {
+ for {
+ runtime.GC()
+ }
+ }()
+
+ go func() {
+ for i := 0; ; i++ {
+ do(i % 1024)
+ }
+ }()
+
+ time.Sleep(time.Second)
+}
+
+func CheckPtrArithmetic() {
+ var x int
+ i := uintptr(unsafe.Pointer(&x))
+ sink2 = (*int)(unsafe.Pointer(i))
+}
+
+func CheckPtrArithmetic2() {
+ var x [2]int64
+ p := unsafe.Pointer(&x[1])
+ var one uintptr = 1
+ sink2 = unsafe.Pointer(uintptr(p) & ^one)
+}
+
+func CheckPtrSize() {
+ p := new(int64)
+ sink2 = p
+ sink2 = (*[100]int64)(unsafe.Pointer(p))
+}
+
+func CheckPtrSmall() {
+ sink2 = unsafe.Pointer(uintptr(1))
+}
+
+func CheckPtrSliceOK() {
+ p := new([4]int64)
+ sink2 = unsafe.Slice(&p[1], 3)
+}
+
+func CheckPtrSliceFail() {
+ p := new(int64)
+ sink2 = p
+ sink2 = unsafe.Slice(p, 100)
+}
+
+func CheckPtrStringOK() {
+ p := new([4]byte)
+ sink2 = unsafe.String(&p[1], 3)
+}
+
+func CheckPtrStringFail() {
+ p := new(byte)
+ sink2 = p
+ sink2 = unsafe.String(p, 100)
+}
+
+func CheckPtrAlignmentNested() {
+ s := make([]int8, 100)
+ p := unsafe.Pointer(&s[0])
+ n := 9
+ _ = ((*[10]int8)(unsafe.Pointer((*[10]int64)(unsafe.Pointer(&p)))))[:n:n]
+}
diff --git a/src/runtime/testdata/testprog/crash.go b/src/runtime/testdata/testprog/crash.go
new file mode 100644
index 0000000..38c8f6a
--- /dev/null
+++ b/src/runtime/testdata/testprog/crash.go
@@ -0,0 +1,139 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+)
+
+func init() {
+ register("Crash", Crash)
+ register("DoublePanic", DoublePanic)
+ register("ErrorPanic", ErrorPanic)
+ register("StringerPanic", StringerPanic)
+ register("DoubleErrorPanic", DoubleErrorPanic)
+ register("DoubleStringerPanic", DoubleStringerPanic)
+ register("StringPanic", StringPanic)
+ register("NilPanic", NilPanic)
+ register("CircularPanic", CircularPanic)
+}
+
+func test(name string) {
+ defer func() {
+ if x := recover(); x != nil {
+ fmt.Printf(" recovered")
+ }
+ fmt.Printf(" done\n")
+ }()
+ fmt.Printf("%s:", name)
+ var s *string
+ _ = *s
+ fmt.Print("SHOULD NOT BE HERE")
+}
+
+func testInNewThread(name string) {
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ test(name)
+ c <- true
+ }()
+ <-c
+}
+
+func Crash() {
+ runtime.LockOSThread()
+ test("main")
+ testInNewThread("new-thread")
+ testInNewThread("second-new-thread")
+ test("main-again")
+}
+
+type P string
+
+func (p P) String() string {
+ // Try to free the "YYY" string header when the "XXX"
+ // panic is stringified.
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+ return string(p)
+}
+
+// Test that panic message is not clobbered.
+// See issue 30150.
+func DoublePanic() {
+ defer func() {
+ panic(P("YYY"))
+ }()
+ panic(P("XXX"))
+}
+
+// Test that panic while panicking discards error message
+// See issue 52257
+type exampleError struct{}
+
+func (e exampleError) Error() string {
+ panic("important error message")
+}
+
+func ErrorPanic() {
+ panic(exampleError{})
+}
+
+type examplePanicError struct{}
+
+func (e examplePanicError) Error() string {
+ panic(exampleError{})
+}
+
+func DoubleErrorPanic() {
+ panic(examplePanicError{})
+}
+
+type exampleStringer struct{}
+
+func (s exampleStringer) String() string {
+ panic("important stringer message")
+}
+
+func StringerPanic() {
+ panic(exampleStringer{})
+}
+
+type examplePanicStringer struct{}
+
+func (s examplePanicStringer) String() string {
+ panic(exampleStringer{})
+}
+
+func DoubleStringerPanic() {
+ panic(examplePanicStringer{})
+}
+
+func StringPanic() {
+ panic("important string message")
+}
+
+func NilPanic() {
+ panic(nil)
+}
+
+type exampleCircleStartError struct{}
+
+func (e exampleCircleStartError) Error() string {
+ panic(exampleCircleEndError{})
+}
+
+type exampleCircleEndError struct{}
+
+func (e exampleCircleEndError) Error() string {
+ panic(exampleCircleStartError{})
+}
+
+func CircularPanic() {
+ panic(exampleCircleStartError{})
+}
diff --git a/src/runtime/testdata/testprog/crashdump.go b/src/runtime/testdata/testprog/crashdump.go
new file mode 100644
index 0000000..bced397
--- /dev/null
+++ b/src/runtime/testdata/testprog/crashdump.go
@@ -0,0 +1,47 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+)
+
+func init() {
+ register("CrashDumpsAllThreads", CrashDumpsAllThreads)
+}
+
+func CrashDumpsAllThreads() {
+ const count = 4
+ runtime.GOMAXPROCS(count + 1)
+
+ chans := make([]chan bool, count)
+ for i := range chans {
+ chans[i] = make(chan bool)
+ go crashDumpsAllThreadsLoop(i, chans[i])
+ }
+
+ // Wait for all the goroutines to start executing.
+ for _, c := range chans {
+ <-c
+ }
+
+ // Tell our parent that all the goroutines are executing.
+ if _, err := os.NewFile(3, "pipe").WriteString("x"); err != nil {
+ fmt.Fprintf(os.Stderr, "write to pipe failed: %v\n", err)
+ os.Exit(2)
+ }
+
+ select {}
+}
+
+func crashDumpsAllThreadsLoop(i int, c chan bool) {
+ close(c)
+ for {
+ for j := 0; j < 0x7fffffff; j++ {
+ }
+ }
+}
diff --git a/src/runtime/testdata/testprog/deadlock.go b/src/runtime/testdata/testprog/deadlock.go
new file mode 100644
index 0000000..781acbd
--- /dev/null
+++ b/src/runtime/testdata/testprog/deadlock.go
@@ -0,0 +1,363 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+ "runtime/debug"
+ "time"
+)
+
+func init() {
+ registerInit("InitDeadlock", InitDeadlock)
+ registerInit("NoHelperGoroutines", NoHelperGoroutines)
+
+ register("SimpleDeadlock", SimpleDeadlock)
+ register("LockedDeadlock", LockedDeadlock)
+ register("LockedDeadlock2", LockedDeadlock2)
+ register("GoexitDeadlock", GoexitDeadlock)
+ register("StackOverflow", StackOverflow)
+ register("ThreadExhaustion", ThreadExhaustion)
+ register("RecursivePanic", RecursivePanic)
+ register("RecursivePanic2", RecursivePanic2)
+ register("RecursivePanic3", RecursivePanic3)
+ register("RecursivePanic4", RecursivePanic4)
+ register("RecursivePanic5", RecursivePanic5)
+ register("GoexitExit", GoexitExit)
+ register("GoNil", GoNil)
+ register("MainGoroutineID", MainGoroutineID)
+ register("Breakpoint", Breakpoint)
+ register("GoexitInPanic", GoexitInPanic)
+ register("PanicAfterGoexit", PanicAfterGoexit)
+ register("RecoveredPanicAfterGoexit", RecoveredPanicAfterGoexit)
+ register("RecoverBeforePanicAfterGoexit", RecoverBeforePanicAfterGoexit)
+ register("RecoverBeforePanicAfterGoexit2", RecoverBeforePanicAfterGoexit2)
+ register("PanicTraceback", PanicTraceback)
+ register("GoschedInPanic", GoschedInPanic)
+ register("SyscallInPanic", SyscallInPanic)
+ register("PanicLoop", PanicLoop)
+}
+
+func SimpleDeadlock() {
+ select {}
+ panic("not reached")
+}
+
+func InitDeadlock() {
+ select {}
+ panic("not reached")
+}
+
+func LockedDeadlock() {
+ runtime.LockOSThread()
+ select {}
+}
+
+func LockedDeadlock2() {
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ time.Sleep(time.Millisecond)
+ select {}
+}
+
+func GoexitDeadlock() {
+ F := func() {
+ for i := 0; i < 10; i++ {
+ }
+ }
+
+ go F()
+ go F()
+ runtime.Goexit()
+}
+
+func StackOverflow() {
+ var f func() byte
+ f = func() byte {
+ var buf [64 << 10]byte
+ return buf[0] + f()
+ }
+ debug.SetMaxStack(1474560)
+ f()
+}
+
+func ThreadExhaustion() {
+ debug.SetMaxThreads(10)
+ c := make(chan int)
+ for i := 0; i < 100; i++ {
+ go func() {
+ runtime.LockOSThread()
+ c <- 0
+ select {}
+ }()
+ <-c
+ }
+}
+
+func RecursivePanic() {
+ func() {
+ defer func() {
+ fmt.Println(recover())
+ }()
+ var x [8192]byte
+ func(x [8192]byte) {
+ defer func() {
+ if err := recover(); err != nil {
+ panic("wrap: " + err.(string))
+ }
+ }()
+ panic("bad")
+ }(x)
+ }()
+ panic("again")
+}
+
+// Same as RecursivePanic, but do the first recover and the second panic in
+// separate defers, and make sure they are executed in the correct order.
+func RecursivePanic2() {
+ func() {
+ defer func() {
+ fmt.Println(recover())
+ }()
+ var x [8192]byte
+ func(x [8192]byte) {
+ defer func() {
+ panic("second panic")
+ }()
+ defer func() {
+ fmt.Println(recover())
+ }()
+ panic("first panic")
+ }(x)
+ }()
+ panic("third panic")
+}
+
+// Make sure that the first panic finished as a panic, even though the second
+// panic was recovered
+func RecursivePanic3() {
+ defer func() {
+ defer func() {
+ recover()
+ }()
+ panic("second panic")
+ }()
+ panic("first panic")
+}
+
+// Test case where a single defer recovers one panic but starts another panic. If
+// the second panic is never recovered, then the recovered first panic will still
+// appear on the panic stack (labeled '[recovered]') and the runtime stack.
+func RecursivePanic4() {
+ defer func() {
+ recover()
+ panic("second panic")
+ }()
+ panic("first panic")
+}
+
+// Test case where we have an open-coded defer higher up the stack (in two), and
+// in the current function (three) we recover in a defer while we still have
+// another defer to be processed.
+func RecursivePanic5() {
+ one()
+ panic("third panic")
+}
+
+//go:noinline
+func one() {
+ two()
+}
+
+//go:noinline
+func two() {
+ defer func() {
+ }()
+
+ three()
+}
+
+//go:noinline
+func three() {
+ defer func() {
+ }()
+
+ defer func() {
+ fmt.Println(recover())
+ }()
+
+ defer func() {
+ fmt.Println(recover())
+ panic("second panic")
+ }()
+
+ panic("first panic")
+}
+
+func GoexitExit() {
+ println("t1")
+ go func() {
+ time.Sleep(time.Millisecond)
+ }()
+ i := 0
+ println("t2")
+ runtime.SetFinalizer(&i, func(p *int) {})
+ println("t3")
+ runtime.GC()
+ println("t4")
+ runtime.Goexit()
+}
+
+func GoNil() {
+ defer func() {
+ recover()
+ }()
+ var f func()
+ go f()
+ select {}
+}
+
+func MainGoroutineID() {
+ panic("test")
+}
+
+func NoHelperGoroutines() {
+ i := 0
+ runtime.SetFinalizer(&i, func(p *int) {})
+ time.AfterFunc(time.Hour, func() {})
+ panic("oops")
+}
+
+func Breakpoint() {
+ runtime.Breakpoint()
+}
+
+func GoexitInPanic() {
+ go func() {
+ defer func() {
+ runtime.Goexit()
+ }()
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+type errorThatGosched struct{}
+
+func (errorThatGosched) Error() string {
+ runtime.Gosched()
+ return "errorThatGosched"
+}
+
+func GoschedInPanic() {
+ panic(errorThatGosched{})
+}
+
+type errorThatPrint struct{}
+
+func (errorThatPrint) Error() string {
+ fmt.Println("1")
+ fmt.Println("2")
+ return "3"
+}
+
+func SyscallInPanic() {
+ panic(errorThatPrint{})
+}
+
+func PanicAfterGoexit() {
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoveredPanicAfterGoexit() {
+ defer func() {
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoverBeforePanicAfterGoexit() {
+ // 1. defer a function that recovers
+ // 2. defer a function that panics
+ // 3. call goexit
+ // Goexit runs the #2 defer. Its panic
+ // is caught by the #1 defer. For Goexit, we explicitly
+ // resume execution in the Goexit loop, instead of resuming
+ // execution in the caller (which would make the Goexit disappear!)
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func RecoverBeforePanicAfterGoexit2() {
+ for i := 0; i < 2; i++ {
+ defer func() {
+ }()
+ }
+ // 1. defer a function that recovers
+ // 2. defer a function that panics
+ // 3. call goexit
+ // Goexit runs the #2 defer. Its panic
+ // is caught by the #1 defer. For Goexit, we explicitly
+ // resume execution in the Goexit loop, instead of resuming
+ // execution in the caller (which would make the Goexit disappear!)
+ defer func() {
+ r := recover()
+ if r == nil {
+ panic("bad recover")
+ }
+ }()
+ defer func() {
+ panic("hello")
+ }()
+ runtime.Goexit()
+}
+
+func PanicTraceback() {
+ pt1()
+}
+
+func pt1() {
+ defer func() {
+ panic("panic pt1")
+ }()
+ pt2()
+}
+
+func pt2() {
+ defer func() {
+ panic("panic pt2")
+ }()
+ panic("hello")
+}
+
+type panicError struct{}
+
+func (*panicError) Error() string {
+ panic("double error")
+}
+
+func PanicLoop() {
+ panic(&panicError{})
+}
diff --git a/src/runtime/testdata/testprog/framepointer.go b/src/runtime/testdata/testprog/framepointer.go
new file mode 100644
index 0000000..cee6f7d
--- /dev/null
+++ b/src/runtime/testdata/testprog/framepointer.go
@@ -0,0 +1,44 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build amd64 || arm64
+
+package main
+
+import "unsafe"
+
+func init() {
+ register("FramePointerAdjust", FramePointerAdjust)
+}
+
+func FramePointerAdjust() { framePointerAdjust1(0) }
+
+//go:noinline
+func framePointerAdjust1(x int) {
+ argp := uintptr(unsafe.Pointer(&x))
+ fp := *getFP()
+ if !(argp-0x100 <= fp && fp <= argp+0x100) {
+ print("saved FP=", fp, " &x=", argp, "\n")
+ panic("FAIL")
+ }
+
+ // grow the stack
+ grow(10000)
+
+ // check again
+ argp = uintptr(unsafe.Pointer(&x))
+ fp = *getFP()
+ if !(argp-0x100 <= fp && fp <= argp+0x100) {
+ print("saved FP=", fp, " &x=", argp, "\n")
+ panic("FAIL")
+ }
+}
+
+func grow(n int) {
+ if n > 0 {
+ grow(n - 1)
+ }
+}
+
+func getFP() *uintptr
diff --git a/src/runtime/testdata/testprog/framepointer_amd64.s b/src/runtime/testdata/testprog/framepointer_amd64.s
new file mode 100644
index 0000000..2cd1299
--- /dev/null
+++ b/src/runtime/testdata/testprog/framepointer_amd64.s
@@ -0,0 +1,9 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·getFP(SB), NOSPLIT|NOFRAME, $0-8
+ MOVQ BP, ret+0(FP)
+ RET
diff --git a/src/runtime/testdata/testprog/framepointer_arm64.s b/src/runtime/testdata/testprog/framepointer_arm64.s
new file mode 100644
index 0000000..cbaa286
--- /dev/null
+++ b/src/runtime/testdata/testprog/framepointer_arm64.s
@@ -0,0 +1,9 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "textflag.h"
+
+TEXT ·getFP(SB), NOSPLIT|NOFRAME, $0-8
+ MOVD R29, ret+0(FP)
+ RET
diff --git a/src/runtime/testdata/testprog/gc.go b/src/runtime/testdata/testprog/gc.go
new file mode 100644
index 0000000..5dc85fb
--- /dev/null
+++ b/src/runtime/testdata/testprog/gc.go
@@ -0,0 +1,420 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "math"
+ "os"
+ "runtime"
+ "runtime/debug"
+ "runtime/metrics"
+ "sync"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("GCFairness", GCFairness)
+ register("GCFairness2", GCFairness2)
+ register("GCSys", GCSys)
+ register("GCPhys", GCPhys)
+ register("DeferLiveness", DeferLiveness)
+ register("GCZombie", GCZombie)
+ register("GCMemoryLimit", GCMemoryLimit)
+ register("GCMemoryLimitNoGCPercent", GCMemoryLimitNoGCPercent)
+}
+
+func GCSys() {
+ runtime.GOMAXPROCS(1)
+ memstats := new(runtime.MemStats)
+ runtime.GC()
+ runtime.ReadMemStats(memstats)
+ sys := memstats.Sys
+
+ runtime.MemProfileRate = 0 // disable profiler
+
+ itercount := 100000
+ for i := 0; i < itercount; i++ {
+ workthegc()
+ }
+
+ // Should only be using a few MB.
+ // We allocated roughly 100 MB of garbage above (100000 iterations of ~1 KB each).
+ runtime.ReadMemStats(memstats)
+ if sys > memstats.Sys {
+ sys = 0
+ } else {
+ sys = memstats.Sys - sys
+ }
+ if sys > 16<<20 {
+ fmt.Printf("using too much memory: %d bytes\n", sys)
+ return
+ }
+ fmt.Printf("OK\n")
+}
+
+var sink []byte
+
+func workthegc() []byte {
+ sink = make([]byte, 1029)
+ return sink
+}
+
+func GCFairness() {
+ runtime.GOMAXPROCS(1)
+ f, err := os.Open("/dev/null")
+ if os.IsNotExist(err) {
+ // This test tests what it is intended to test only if writes are fast.
+ // If there is no /dev/null, we just don't execute the test.
+ fmt.Println("OK")
+ return
+ }
+ if err != nil {
+ fmt.Println(err)
+ os.Exit(1)
+ }
+ for i := 0; i < 2; i++ {
+ go func() {
+ for {
+ f.Write([]byte("."))
+ }
+ }()
+ }
+ time.Sleep(10 * time.Millisecond)
+ fmt.Println("OK")
+}
+
+func GCFairness2() {
+ // Make sure user code can't exploit the GC's high priority
+ // scheduling to make scheduling of user code unfair. See
+ // issue #15706.
+ runtime.GOMAXPROCS(1)
+ debug.SetGCPercent(1)
+ var count [3]int64
+ var sink [3]any
+ for i := range count {
+ go func(i int) {
+ for {
+ sink[i] = make([]byte, 1024)
+ atomic.AddInt64(&count[i], 1)
+ }
+ }(i)
+ }
+ // Note: If the unfairness is really bad, it may not even get
+ // past the sleep.
+ //
+ // If the scheduling rules change, this may not be enough time
+ // to let all goroutines run, but for now we cycle through
+ // them rapidly.
+ //
+ // OpenBSD's scheduler makes every usleep() take at least
+ // 20ms, so we need a long time to ensure all goroutines have
+ // run. If they haven't run after 30ms, give it another 1000ms
+ // and check again.
+ time.Sleep(30 * time.Millisecond)
+ var fail bool
+ for i := range count {
+ if atomic.LoadInt64(&count[i]) == 0 {
+ fail = true
+ }
+ }
+ if fail {
+ time.Sleep(1 * time.Second)
+ for i := range count {
+ if atomic.LoadInt64(&count[i]) == 0 {
+ fmt.Printf("goroutine %d did not run\n", i)
+ return
+ }
+ }
+ }
+ fmt.Println("OK")
+}
+
+func GCPhys() {
+ // This test ensures that heap-growth scavenging is working as intended.
+ //
+ // It attempts to construct a sizeable "swiss cheese" heap, with many
+ // allocChunk-sized holes. Then, it triggers a heap growth by trying to
+ // allocate as much memory as would fit in those holes.
+ //
+ // The heap growth should cause a large number of those holes to be
+ // returned to the OS.
+
+ const (
+ // The total amount of memory we're willing to allocate.
+ allocTotal = 32 << 20
+
+ // The page cache could hide 64 8-KiB pages from the scavenger today.
+ maxPageCache = (8 << 10) * 64
+ )
+
+ // How big the allocations are needs to depend on the page size.
+ // If the page size is too big and the allocations are too small,
+ // they might not be aligned to the physical page size, so the scavenger
+ // will gloss over them.
+ pageSize := os.Getpagesize()
+ var allocChunk int
+ if pageSize <= 8<<10 {
+ allocChunk = 64 << 10
+ } else {
+ allocChunk = 512 << 10
+ }
+ allocs := allocTotal / allocChunk
+
+ // Set GC percent just so this test is a little more consistent in the
+ // face of varying environments.
+ debug.SetGCPercent(100)
+
+ // Set GOMAXPROCS to 1 to minimize the amount of memory held in the page cache,
+ // and to reduce the chance that the background scavenger gets scheduled.
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ // Allocate allocTotal bytes of memory in allocChunk byte chunks.
+ // Alternate between whether the chunk will be held live or will be
+ // condemned to GC to create holes in the heap.
+ saved := make([][]byte, allocs/2+1)
+ condemned := make([][]byte, allocs/2)
+ for i := 0; i < allocs; i++ {
+ b := make([]byte, allocChunk)
+ if i%2 == 0 {
+ saved = append(saved, b)
+ } else {
+ condemned = append(condemned, b)
+ }
+ }
+
+ // Run a GC cycle just so we're at a consistent state.
+ runtime.GC()
+
+ // Drop the only reference to all the condemned memory.
+ condemned = nil
+
+ // Clear the condemned memory.
+ runtime.GC()
+
+ // At this point, the background scavenger is likely running
+ // and could pick up the work, so the next line of code doesn't
+ // end up doing anything. That's fine. What's important is that
+ // this test fails somewhat regularly if the runtime doesn't
+ // scavenge on heap growth, and doesn't fail at all otherwise.
+
+ // Make a large allocation that in theory could fit, but won't
+ // because we turned the heap into swiss cheese.
+ saved = append(saved, make([]byte, allocTotal/2))
+
+ // heapBacked is an estimate of the amount of physical memory used by
+ // this test. HeapSys is an estimate of the size of the mapped virtual
+ // address space (which may or may not be backed by physical pages)
+ // whereas HeapReleased is an estimate of the amount of bytes returned
+ // to the OS. Their difference then roughly corresponds to the amount
+ // of virtual address space that is backed by physical pages.
+ //
+ // heapBacked also subtracts out maxPageCache bytes of memory because
+ // this is memory that may be hidden from the scavenger per-P. Since
+ // GOMAXPROCS=1 here, subtracting it out once is fine.
+ var stats runtime.MemStats
+ runtime.ReadMemStats(&stats)
+ heapBacked := stats.HeapSys - stats.HeapReleased - maxPageCache
+ // If heapBacked does not exceed the heap goal by more than retainExtraPercent
+ // then the scavenger is working as expected; the newly-created holes have been
+ // scavenged immediately as part of the allocations which cannot fit in the holes.
+ //
+ // Since the runtime should scavenge the entirety of the remaining holes,
+ // theoretically there should be no more free and unscavenged memory. However due
+ // to other allocations that happen during this test we may still see some physical
+ // memory over-use.
+ overuse := (float64(heapBacked) - float64(stats.HeapAlloc)) / float64(stats.HeapAlloc)
+ // Check against our overuse threshold, which is what the scavenger always reserves
+ // to encourage allocation of memory that doesn't need to be faulted in.
+ //
+ // Add additional slack in case the page size is large and the scavenger
+ // can't reach that memory because it doesn't constitute a complete aligned
+ // physical page. Assume the worst case: a full physical page out of each
+ // allocation.
+ threshold := 0.1 + float64(pageSize)/float64(allocChunk)
+ if overuse <= threshold {
+ fmt.Println("OK")
+ return
+ }
+ // Physical memory utilization exceeds the threshold, so heap-growth scavenging
+ // did not operate as expected.
+ //
+ // In the context of this test, this indicates a large amount of
+ // fragmentation with physical pages that are otherwise unused but not
+ // returned to the OS.
+ fmt.Printf("exceeded physical memory overuse threshold of %3.2f%%: %3.2f%%\n"+
+ "(alloc: %d, goal: %d, sys: %d, rel: %d, objs: %d)\n", threshold*100, overuse*100,
+ stats.HeapAlloc, stats.NextGC, stats.HeapSys, stats.HeapReleased, len(saved))
+ runtime.KeepAlive(saved)
+ runtime.KeepAlive(condemned)
+}
+
+// Test that defer closure is correctly scanned when the stack is scanned.
+func DeferLiveness() {
+ var x [10]int
+ escape(&x)
+ fn := func() {
+ if x[0] != 42 {
+ panic("FAIL")
+ }
+ }
+ defer fn()
+
+ x[0] = 42
+ runtime.GC()
+ runtime.GC()
+ runtime.GC()
+}
+
+//go:noinline
+func escape(x any) { sink2 = x; sink2 = nil }
+
+var sink2 any
+
+// Test zombie object detection and reporting.
+func GCZombie() {
+ // Allocate several objects of unusual size (so free slots are
+ // unlikely to all be re-allocated by the runtime).
+ const size = 190
+ const count = 8192 / size
+ keep := make([]*byte, 0, (count+1)/2)
+ free := make([]uintptr, 0, (count+1)/2)
+ zombies := make([]*byte, 0, len(free))
+ for i := 0; i < count; i++ {
+ obj := make([]byte, size)
+ p := &obj[0]
+ if i%2 == 0 {
+ keep = append(keep, p)
+ } else {
+ free = append(free, uintptr(unsafe.Pointer(p)))
+ }
+ }
+
+ // Free the unreferenced objects.
+ runtime.GC()
+
+ // Bring the free objects back to life.
+ for _, p := range free {
+ zombies = append(zombies, (*byte)(unsafe.Pointer(p)))
+ }
+
+ // GC should detect the zombie objects.
+ runtime.GC()
+ println("failed")
+ runtime.KeepAlive(keep)
+ runtime.KeepAlive(zombies)
+}
+
+func GCMemoryLimit() {
+ gcMemoryLimit(100)
+}
+
+func GCMemoryLimitNoGCPercent() {
+ gcMemoryLimit(-1)
+}
+
+// Test SetMemoryLimit functionality.
+//
+// This test lives here instead of runtime/debug because the entire
+// implementation is in the runtime, and testprog gives us a more
+// consistent testing environment to help avoid flakiness.
+func gcMemoryLimit(gcPercent int) {
+ if oldProcs := runtime.GOMAXPROCS(4); oldProcs < 4 {
+ // Fail if the default GOMAXPROCS isn't at least 4.
+ // Whatever invokes this should check and do a proper t.Skip.
+ println("insufficient CPUs")
+ return
+ }
+ debug.SetGCPercent(gcPercent)
+
+ const myLimit = 256 << 20
+ if limit := debug.SetMemoryLimit(-1); limit != math.MaxInt64 {
+ print("expected MaxInt64 limit, got ", limit, " bytes instead\n")
+ return
+ }
+ if limit := debug.SetMemoryLimit(myLimit); limit != math.MaxInt64 {
+ print("expected MaxInt64 limit, got ", limit, " bytes instead\n")
+ return
+ }
+ if limit := debug.SetMemoryLimit(-1); limit != myLimit {
+ print("expected a ", myLimit, "-byte limit, got ", limit, " bytes instead\n")
+ return
+ }
+
+ target := make(chan int64)
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+
+ sinkSize := int(<-target / memLimitUnit)
+ for {
+ if len(memLimitSink) != sinkSize {
+ memLimitSink = make([]*[memLimitUnit]byte, sinkSize)
+ }
+ for i := 0; i < len(memLimitSink); i++ {
+ memLimitSink[i] = new([memLimitUnit]byte)
+ // Write to this memory to slow down the allocator, otherwise
+ // we get flaky behavior. See #52433.
+ for j := range memLimitSink[i] {
+ memLimitSink[i][j] = 9
+ }
+ }
+ // Again, Gosched to slow down the allocator.
+ runtime.Gosched()
+ select {
+ case newTarget := <-target:
+ if newTarget == math.MaxInt64 {
+ return
+ }
+ sinkSize = int(newTarget / memLimitUnit)
+ default:
+ }
+ }
+ }()
+ var m [2]metrics.Sample
+ m[0].Name = "/memory/classes/total:bytes"
+ m[1].Name = "/memory/classes/heap/released:bytes"
+
+ // Don't set this too high, because this is a *live heap* target which
+ // is not directly comparable to a total memory limit.
+ maxTarget := int64((myLimit / 10) * 8)
+ increment := int64((myLimit / 10) * 1)
+ for i := increment; i < maxTarget; i += increment {
+ target <- i
+
+ // Check to make sure the memory limit is maintained.
+ // We're just sampling here so if it transiently goes over we might miss it.
+ // The internal accounting is inconsistent anyway, so going over by a few
+ // pages is certainly possible. Just make sure we're within some bound.
+ // Note that to avoid flakiness due to #52433 (especially since we're allocating
+ // somewhat heavily here) this bound is kept loose. In practice the Go runtime
+ // should do considerably better than this bound.
+ bound := int64(myLimit + 16<<20)
+ start := time.Now()
+ for time.Since(start) < 200*time.Millisecond {
+ metrics.Read(m[:])
+ retained := int64(m[0].Value.Uint64() - m[1].Value.Uint64())
+ if retained > bound {
+ print("retained=", retained, " limit=", myLimit, " bound=", bound, "\n")
+ panic("exceeded memory limit by more than bound allows")
+ }
+ runtime.Gosched()
+ }
+ }
+
+ if limit := debug.SetMemoryLimit(math.MaxInt64); limit != myLimit {
+ print("expected a ", myLimit, "-byte limit, got ", limit, " bytes instead\n")
+ return
+ }
+ println("OK")
+}
+
+// Pick a value close to the page size. We want the sink's allocation unit
+// to be on the order of a page.
+const memLimitUnit = 8000
+
+var memLimitSink []*[memLimitUnit]byte
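For reference, the query/set/restore dance that gcMemoryLimit performs is the normal way to use this API: debug.SetMemoryLimit returns the previously effective limit, and a negative argument only queries it. A minimal sketch of that pattern in an ordinary program (the 256 MiB figure is illustrative):

    package main

    import (
        "math"
        "runtime/debug"
    )

    func main() {
        // A negative input does not change the limit; it just reports the current
        // one (math.MaxInt64 unless GOMEMLIMIT or a prior call set it).
        prev := debug.SetMemoryLimit(-1)
        _ = prev

        // Cap the Go runtime's total memory at 256 MiB and disable the
        // proportional GOGC trigger so the limit alone drives collection.
        debug.SetMemoryLimit(256 << 20)
        debug.SetGCPercent(-1)

        // ... workload ...

        // Restore the defaults.
        debug.SetGCPercent(100)
        debug.SetMemoryLimit(math.MaxInt64)
    }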
diff --git a/src/runtime/testdata/testprog/lockosthread.go b/src/runtime/testdata/testprog/lockosthread.go
new file mode 100644
index 0000000..90d98e4
--- /dev/null
+++ b/src/runtime/testdata/testprog/lockosthread.go
@@ -0,0 +1,246 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "os"
+ "runtime"
+ "sync"
+ "time"
+)
+
+var mainTID int
+
+func init() {
+ registerInit("LockOSThreadMain", func() {
+ // init is guaranteed to run on the main thread.
+ mainTID = gettid()
+ })
+ register("LockOSThreadMain", LockOSThreadMain)
+
+ registerInit("LockOSThreadAlt", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAlt", LockOSThreadAlt)
+
+ registerInit("LockOSThreadAvoidsStatePropagation", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAvoidsStatePropagation", LockOSThreadAvoidsStatePropagation)
+ register("LockOSThreadTemplateThreadRace", LockOSThreadTemplateThreadRace)
+}
+
+func LockOSThreadMain() {
+ // gettid only works on Linux, so on other platforms this just
+ // checks that the runtime doesn't do anything terrible.
+
+ // This requires GOMAXPROCS=1 from the beginning to reliably
+ // start a goroutine on the main thread.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // Because GOMAXPROCS=1, this *should* be on the main
+ // thread. Stay there.
+ runtime.LockOSThread()
+ if mainTID != 0 && gettid() != mainTID {
+ println("failed to start goroutine on main thread")
+ os.Exit(1)
+ }
+ // Exit with the thread locked, which should exit the
+ // main thread.
+ ready <- true
+ }()
+ <-ready
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is still running on a different
+ // thread.
+ if mainTID != 0 && gettid() == mainTID {
+ println("goroutine migrated to locked thread")
+ os.Exit(1)
+ }
+ println("OK")
+}
+
+func LockOSThreadAlt() {
+ // This is running locked to the main OS thread.
+
+ var subTID int
+ ready := make(chan bool, 1)
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+ subTID = gettid()
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+ runtime.UnlockOSThread()
+ for i := 0; i < 100; i++ {
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is running on a different thread.
+ if subTID != 0 && gettid() == subTID {
+ println("locked thread reused")
+ os.Exit(1)
+ }
+ exists, supported := tidExists(subTID)
+ if !supported || !exists {
+ goto ok
+ }
+ }
+ println("sub thread", subTID, "still running")
+ return
+ok:
+ println("OK")
+}
+
+func LockOSThreadAvoidsStatePropagation() {
+ // This test is similar to LockOSThreadAlt in that it will detect if a thread
+ // which should have died is still running. However, rather than do this with
+ // thread IDs, it does this by unsharing state on that thread. This way, it
+ // also detects whether new threads were cloned from the dead thread, and not
+ // from a clean thread. Cloning from a locked thread is undesirable since
+ // cloned threads will inherit potentially unwanted OS state.
+ //
+ // unshareFs, getcwd, and chdir("/tmp") are only guaranteed to work on
+ // Linux, so on other platforms this just checks that the runtime doesn't
+ // do anything terrible.
+ //
+ // This is running locked to the main OS thread.
+
+ // GOMAXPROCS=1 makes this fail much more reliably if a tainted thread is
+ // cloned from.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ if err := chdir("/"); err != nil {
+ println("failed to chdir:", err.Error())
+ os.Exit(1)
+ }
+ // On systems other than Linux, cwd == "".
+ cwd, err := getcwd()
+ if err != nil {
+ println("failed to get cwd:", err.Error())
+ os.Exit(1)
+ }
+ if cwd != "" && cwd != "/" {
+ println("unexpected cwd", cwd, " wanted /")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+
+ // Unshare details about the FS, like the CWD, with
+ // the rest of the process on this thread.
+ // On systems other than Linux, this is a no-op.
+ if err := unshareFs(); err != nil {
+ if err == errNotPermitted {
+ println("unshare not permitted")
+ os.Exit(0)
+ }
+ println("failed to unshare fs:", err.Error())
+ os.Exit(1)
+ }
+ // Chdir to somewhere else on this thread.
+ // On systems other than Linux, this is a no-op.
+ if err := chdir(os.TempDir()); err != nil {
+ println("failed to chdir:", err.Error())
+ os.Exit(1)
+ }
+
+ // The state on this thread is now considered "tainted", but it
+ // should no longer be observable in any other context.
+
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+
+ // Spawn yet another goroutine and lock it. Since GOMAXPROCS=1, if
+ // for some reason state from the (hopefully dead) locked thread above
+ // propagated into a newly created thread (via clone), or that thread
+ // is actually being re-used, then we should get scheduled on such a
+ // thread with high likelihood.
+ done := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+
+ // Get the CWD and check if this is the same as the main thread's
+ // CWD. Every thread should share the same CWD.
+ // On systems other than Linux, wd == "".
+ wd, err := getcwd()
+ if err != nil {
+ println("failed to get cwd:", err.Error())
+ os.Exit(1)
+ }
+ if wd != cwd {
+ println("bad state from old thread propagated after it should have died")
+ os.Exit(1)
+ }
+ <-done
+
+ runtime.UnlockOSThread()
+ }()
+ done <- true
+ runtime.UnlockOSThread()
+ println("OK")
+}
+
+func LockOSThreadTemplateThreadRace() {
+ // This test attempts to reproduce the race described in
+ // golang.org/issue/38931. To do so, we must have a stop-the-world
+ // (achieved via ReadMemStats) racing with two LockOSThread calls.
+ //
+ // While this test attempts to line up the timing, it is only expected
+ // to fail (and thus hang) around 2% of the time if the race is
+ // present.
+
+ // Ensure enough Ps to actually run everything in parallel. Though on
+ // <4 core machines, we are still at the whim of the kernel scheduler.
+ runtime.GOMAXPROCS(4)
+
+ go func() {
+ // Stop the world; race with LockOSThread below.
+ var m runtime.MemStats
+ for {
+ runtime.ReadMemStats(&m)
+ }
+ }()
+
+ // Try to synchronize both LockOSThreads.
+ start := time.Now().Add(10 * time.Millisecond)
+
+ var wg sync.WaitGroup
+ wg.Add(2)
+
+ for i := 0; i < 2; i++ {
+ go func() {
+ for time.Now().Before(start) {
+ }
+
+ // Add work to the local runq to trigger early startm
+ // in handoffp.
+ go func() {}()
+
+ runtime.LockOSThread()
+ runtime.Gosched() // add a preemption point.
+ wg.Done()
+ }()
+ }
+
+ wg.Wait()
+ // If both LockOSThreads completed then we did not hit the race.
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprog/main.go b/src/runtime/testdata/testprog/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprog/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
diff --git a/src/runtime/testdata/testprog/map.go b/src/runtime/testdata/testprog/map.go
new file mode 100644
index 0000000..5524289
--- /dev/null
+++ b/src/runtime/testdata/testprog/map.go
@@ -0,0 +1,77 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "runtime"
+
+func init() {
+ register("concurrentMapWrites", concurrentMapWrites)
+ register("concurrentMapReadWrite", concurrentMapReadWrite)
+ register("concurrentMapIterateWrite", concurrentMapIterateWrite)
+}
+
+func concurrentMapWrites() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[6] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
+
+func concurrentMapReadWrite() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ _ = m[6]
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
+
+func concurrentMapIterateWrite() {
+ m := map[int]int{}
+ c := make(chan struct{})
+ go func() {
+ for i := 0; i < 10000; i++ {
+ m[5] = 0
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ go func() {
+ for i := 0; i < 10000; i++ {
+ for range m {
+ }
+ runtime.Gosched()
+ }
+ c <- struct{}{}
+ }()
+ <-c
+ <-c
+}
diff --git a/src/runtime/testdata/testprog/memprof.go b/src/runtime/testdata/testprog/memprof.go
new file mode 100644
index 0000000..0392e60
--- /dev/null
+++ b/src/runtime/testdata/testprog/memprof.go
@@ -0,0 +1,51 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+)
+
+func init() {
+ register("MemProf", MemProf)
+}
+
+var memProfBuf bytes.Buffer
+var memProfStr string
+
+func MemProf() {
+ // Force heap sampling for determinism.
+ runtime.MemProfileRate = 1
+
+ for i := 0; i < 10; i++ {
+ fmt.Fprintf(&memProfBuf, "%*d\n", i, i)
+ }
+ memProfStr = memProfBuf.String()
+
+ runtime.GC()
+
+ f, err := os.CreateTemp("", "memprof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.WriteHeapProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
diff --git a/src/runtime/testdata/testprog/misc.go b/src/runtime/testdata/testprog/misc.go
new file mode 100644
index 0000000..7ccd389
--- /dev/null
+++ b/src/runtime/testdata/testprog/misc.go
@@ -0,0 +1,15 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "runtime"
+
+func init() {
+ register("NumGoroutine", NumGoroutine)
+}
+
+func NumGoroutine() {
+ println(runtime.NumGoroutine())
+}
diff --git a/src/runtime/testdata/testprog/numcpu_freebsd.go b/src/runtime/testdata/testprog/numcpu_freebsd.go
new file mode 100644
index 0000000..310c212
--- /dev/null
+++ b/src/runtime/testdata/testprog/numcpu_freebsd.go
@@ -0,0 +1,140 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "os/exec"
+ "regexp"
+ "runtime"
+ "strconv"
+ "strings"
+ "syscall"
+)
+
+var (
+ cpuSetRE = regexp.MustCompile(`(\d,?)+`)
+)
+
+func init() {
+ register("FreeBSDNumCPU", FreeBSDNumCPU)
+ register("FreeBSDNumCPUHelper", FreeBSDNumCPUHelper)
+}
+
+func FreeBSDNumCPUHelper() {
+ fmt.Printf("%d\n", runtime.NumCPU())
+}
+
+func FreeBSDNumCPU() {
+ _, err := exec.LookPath("cpuset")
+ if err != nil {
+ // Can not test without cpuset command.
+ fmt.Println("OK")
+ return
+ }
+ _, err = exec.LookPath("sysctl")
+ if err != nil {
+ // Can not test without sysctl command.
+ fmt.Println("OK")
+ return
+ }
+ cmd := exec.Command("sysctl", "-n", "kern.smp.active")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+ fmt.Printf("fail to launch '%s', error: %s, output: %s\n", strings.Join(cmd.Args, " "), err, output)
+ return
+ }
+ if !bytes.Equal(output, []byte("1\n")) {
+ // SMP mode deactivated in kernel.
+ fmt.Println("OK")
+ return
+ }
+
+ list, err := getList()
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ err = checkNCPU(list)
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ if len(list) >= 2 {
+ err = checkNCPU(list[:len(list)-1])
+ if err != nil {
+ fmt.Printf("%s\n", err)
+ return
+ }
+ }
+ fmt.Println("OK")
+ return
+}
+
+func getList() ([]string, error) {
+ pid := syscall.Getpid()
+
+ // Launch cpuset to print a list of available CPUs: pid <PID> mask: 0, 1, 2, 3.
+ cmd := exec.Command("cpuset", "-g", "-p", strconv.Itoa(pid))
+ cmdline := strings.Join(cmd.Args, " ")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+ return nil, fmt.Errorf("fail to execute '%s': %s", cmdline, err)
+ }
+ output, _, ok := bytes.Cut(output, []byte("\n"))
+ if !ok {
+ return nil, fmt.Errorf("invalid output from '%s', '\\n' not found: %s", cmdline, output)
+ }
+
+ _, cpus, ok := bytes.Cut(output, []byte(":"))
+ if !ok {
+ return nil, fmt.Errorf("invalid output from '%s', ':' not found: %s", cmdline, output)
+ }
+
+ var list []string
+ for _, val := range bytes.Split(cpus, []byte(",")) {
+ index := string(bytes.TrimSpace(val))
+ if len(index) == 0 {
+ continue
+ }
+ list = append(list, index)
+ }
+ if len(list) == 0 {
+ return nil, fmt.Errorf("empty CPU list from '%s': %s", cmdline, output)
+ }
+ return list, nil
+}
+
+func checkNCPU(list []string) error {
+ listString := strings.Join(list, ",")
+ if len(listString) == 0 {
+ return fmt.Errorf("could not check against an empty CPU list")
+ }
+
+ cListString := cpuSetRE.FindString(listString)
+ if len(cListString) == 0 {
+ return fmt.Errorf("invalid cpuset output '%s'", listString)
+ }
+ // Launch FreeBSDNumCPUHelper() with specified CPUs list.
+ cmd := exec.Command("cpuset", "-l", cListString, os.Args[0], "FreeBSDNumCPUHelper")
+ cmdline := strings.Join(cmd.Args, " ")
+ output, err := cmd.CombinedOutput()
+ if err != nil {
+ return fmt.Errorf("fail to launch child '%s', error: %s, output: %s", cmdline, err, output)
+ }
+
+ // NumCPU from FreeBSDNumCPUHelper comes with '\n'.
+ output = bytes.TrimSpace(output)
+ n, err := strconv.Atoi(string(output))
+ if err != nil {
+ return fmt.Errorf("fail to parse output from child '%s', error: %s, output: %s", cmdline, err, output)
+ }
+ if n != len(list) {
+ return fmt.Errorf("runtime.NumCPU() expected to %d, got %d when run with CPU list %s", len(list), n, cListString)
+ }
+ return nil
+}
diff --git a/src/runtime/testdata/testprog/panicprint.go b/src/runtime/testdata/testprog/panicprint.go
new file mode 100644
index 0000000..c8deabe
--- /dev/null
+++ b/src/runtime/testdata/testprog/panicprint.go
@@ -0,0 +1,111 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+type MyBool bool
+type MyComplex128 complex128
+type MyComplex64 complex64
+type MyFloat32 float32
+type MyFloat64 float64
+type MyInt int
+type MyInt8 int8
+type MyInt16 int16
+type MyInt32 int32
+type MyInt64 int64
+type MyString string
+type MyUint uint
+type MyUint8 uint8
+type MyUint16 uint16
+type MyUint32 uint32
+type MyUint64 uint64
+type MyUintptr uintptr
+
+func panicCustomComplex64() {
+ panic(MyComplex64(0.11 + 3i))
+}
+
+func panicCustomComplex128() {
+ panic(MyComplex128(32.1 + 10i))
+}
+
+func panicCustomString() {
+ panic(MyString("Panic"))
+}
+
+func panicCustomBool() {
+ panic(MyBool(true))
+}
+
+func panicCustomInt() {
+ panic(MyInt(93))
+}
+
+func panicCustomInt8() {
+ panic(MyInt8(93))
+}
+
+func panicCustomInt16() {
+ panic(MyInt16(93))
+}
+
+func panicCustomInt32() {
+ panic(MyInt32(93))
+}
+
+func panicCustomInt64() {
+ panic(MyInt64(93))
+}
+
+func panicCustomUint() {
+ panic(MyUint(93))
+}
+
+func panicCustomUint8() {
+ panic(MyUint8(93))
+}
+
+func panicCustomUint16() {
+ panic(MyUint16(93))
+}
+
+func panicCustomUint32() {
+ panic(MyUint32(93))
+}
+
+func panicCustomUint64() {
+ panic(MyUint64(93))
+}
+
+func panicCustomUintptr() {
+ panic(MyUintptr(93))
+}
+
+func panicCustomFloat64() {
+ panic(MyFloat64(-93.70))
+}
+
+func panicCustomFloat32() {
+ panic(MyFloat32(-93.70))
+}
+
+func init() {
+ register("panicCustomComplex64", panicCustomComplex64)
+ register("panicCustomComplex128", panicCustomComplex128)
+ register("panicCustomBool", panicCustomBool)
+ register("panicCustomFloat32", panicCustomFloat32)
+ register("panicCustomFloat64", panicCustomFloat64)
+ register("panicCustomInt", panicCustomInt)
+ register("panicCustomInt8", panicCustomInt8)
+ register("panicCustomInt16", panicCustomInt16)
+ register("panicCustomInt32", panicCustomInt32)
+ register("panicCustomInt64", panicCustomInt64)
+ register("panicCustomString", panicCustomString)
+ register("panicCustomUint", panicCustomUint)
+ register("panicCustomUint8", panicCustomUint8)
+ register("panicCustomUint16", panicCustomUint16)
+ register("panicCustomUint32", panicCustomUint32)
+ register("panicCustomUint64", panicCustomUint64)
+ register("panicCustomUintptr", panicCustomUintptr)
+}
diff --git a/src/runtime/testdata/testprog/panicrace.go b/src/runtime/testdata/testprog/panicrace.go
new file mode 100644
index 0000000..f058994
--- /dev/null
+++ b/src/runtime/testdata/testprog/panicrace.go
@@ -0,0 +1,27 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "sync"
+)
+
+func init() {
+ register("PanicRace", PanicRace)
+}
+
+func PanicRace() {
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer func() {
+ wg.Done()
+ runtime.Gosched()
+ }()
+ panic("crash")
+ }()
+ wg.Wait()
+}
diff --git a/src/runtime/testdata/testprog/preempt.go b/src/runtime/testdata/testprog/preempt.go
new file mode 100644
index 0000000..fb6755a
--- /dev/null
+++ b/src/runtime/testdata/testprog/preempt.go
@@ -0,0 +1,75 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "runtime"
+ "runtime/debug"
+ "sync/atomic"
+)
+
+func init() {
+ register("AsyncPreempt", AsyncPreempt)
+}
+
+func AsyncPreempt() {
+ // Run with just 1 GOMAXPROCS so the runtime is required to
+ // use scheduler preemption.
+ runtime.GOMAXPROCS(1)
+ // Disable GC so we have complete control of what we're testing.
+ debug.SetGCPercent(-1)
+ // Out of an abundance of caution, also make sure that there are
+ // no GCs actively in progress. The sweep phase of a GC cycle
+ // for instance tries to preempt Ps at the very beginning.
+ runtime.GC()
+
+ // Start a goroutine with no sync safe-points.
+ var ready, ready2 uint32
+ go func() {
+ for {
+ atomic.StoreUint32(&ready, 1)
+ dummy()
+ dummy()
+ }
+ }()
+ // Also start one with a frameless function.
+ // This is an especially interesting case for
+ // LR machines.
+ go func() {
+ atomic.AddUint32(&ready2, 1)
+ frameless()
+ }()
+ // Also test empty infinite loop.
+ go func() {
+ atomic.AddUint32(&ready2, 1)
+ for {
+ }
+ }()
+
+ // Wait for the goroutine to stop passing through sync
+ // safe-points.
+ for atomic.LoadUint32(&ready) == 0 || atomic.LoadUint32(&ready2) < 2 {
+ runtime.Gosched()
+ }
+
+ // Run a GC, which will have to stop the goroutine for STW and
+ // for stack scanning. If this doesn't work, the test will
+ // deadlock and timeout.
+ runtime.GC()
+
+ println("OK")
+}
+
+//go:noinline
+func frameless() {
+ for i := int64(0); i < 1<<62; i++ {
+ out += i * i * i * i * i * 12345
+ }
+}
+
+var out int64
+
+//go:noinline
+func dummy() {}
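AsyncPreempt above only terminates if the runtime can preempt the tight loops in dummy, frameless, and the empty for loop without their cooperation, since none of them contains a call and therefore none has a synchronous safe-point. A hypothetical driver, assuming the standard GODEBUG=asyncpreemptoff knob and an illustrative binary path, can demonstrate the failure mode by disabling asynchronous preemption and relying on a timeout:

package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

func main() {
	// With asynchronous preemption disabled, runtime.GC() in AsyncPreempt cannot
	// stop the call-free loops, so the child is expected to hit the timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, "./testprog", "AsyncPreempt") // binary path is illustrative
	cmd.Env = append(os.Environ(), "GODEBUG=asyncpreemptoff=1")
	out, err := cmd.CombinedOutput()
	fmt.Printf("output %q, err %v\n", out, err)
}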
diff --git a/src/runtime/testdata/testprog/segv.go b/src/runtime/testdata/testprog/segv.go
new file mode 100644
index 0000000..8547726
--- /dev/null
+++ b/src/runtime/testdata/testprog/segv.go
@@ -0,0 +1,32 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package main
+
+import "syscall"
+
+func init() {
+ register("Segv", Segv)
+}
+
+var Sum int
+
+func Segv() {
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for i := 0; ; i++ {
+ Sum += i
+ }
+ }()
+
+ <-c
+
+ syscall.Kill(syscall.Getpid(), syscall.SIGSEGV)
+
+ // Wait for the OS to deliver the signal.
+ select {}
+}
diff --git a/src/runtime/testdata/testprog/segv_linux.go b/src/runtime/testdata/testprog/segv_linux.go
new file mode 100644
index 0000000..aa386bb
--- /dev/null
+++ b/src/runtime/testdata/testprog/segv_linux.go
@@ -0,0 +1,29 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "syscall"
+
+func init() {
+ register("TgkillSegv", TgkillSegv)
+}
+
+func TgkillSegv() {
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for i := 0; ; i++ {
+ // Sum defined in segv.go.
+ Sum += i
+ }
+ }()
+
+ <-c
+
+ syscall.Tgkill(syscall.Getpid(), syscall.Gettid(), syscall.SIGSEGV)
+
+ // Wait for the OS to deliver the signal.
+ select {}
+}
diff --git a/src/runtime/testdata/testprog/signal.go b/src/runtime/testdata/testprog/signal.go
new file mode 100644
index 0000000..cc5ac8a
--- /dev/null
+++ b/src/runtime/testdata/testprog/signal.go
@@ -0,0 +1,30 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows && !plan9
+// +build !windows,!plan9
+
+package main
+
+import (
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("SignalExitStatus", SignalExitStatus)
+}
+
+func SignalExitStatus() {
+ syscall.Kill(syscall.Getpid(), syscall.SIGTERM)
+
+ // Should die immediately, but we've seen flakiness on various
+ // systems (see issue 14063). It's possible that the signal is
+ // being delivered to a different thread and we are returning
+ // and exiting before that thread runs again. Give the program
+ // a little while to die to make sure we pick up the signal
+ // before we return and exit the program. The time here
+ // shouldn't matter--we'll never really sleep this long.
+ time.Sleep(time.Second)
+}
diff --git a/src/runtime/testdata/testprog/sleep.go b/src/runtime/testdata/testprog/sleep.go
new file mode 100644
index 0000000..b230e60
--- /dev/null
+++ b/src/runtime/testdata/testprog/sleep.go
@@ -0,0 +1,22 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "os"
+ "time"
+)
+
+// for golang.org/issue/27250
+
+func init() {
+ register("After1", After1)
+}
+
+func After1() {
+ os.Stdout.WriteString("ready\n")
+ os.Stdout.Close()
+ <-time.After(1 * time.Second)
+}
diff --git a/src/runtime/testdata/testprog/stringconcat.go b/src/runtime/testdata/testprog/stringconcat.go
new file mode 100644
index 0000000..f233e66
--- /dev/null
+++ b/src/runtime/testdata/testprog/stringconcat.go
@@ -0,0 +1,20 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "strings"
+
+func init() {
+ register("stringconcat", stringconcat)
+}
+
+func stringconcat() {
+ s0 := strings.Repeat("0", 1<<10)
+ s1 := strings.Repeat("1", 1<<10)
+ s2 := strings.Repeat("2", 1<<10)
+ s3 := strings.Repeat("3", 1<<10)
+ s := s0 + s1 + s2 + s3
+ panic(s)
+}
diff --git a/src/runtime/testdata/testprog/syscall_windows.go b/src/runtime/testdata/testprog/syscall_windows.go
new file mode 100644
index 0000000..71bf384
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscall_windows.go
@@ -0,0 +1,73 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "internal/syscall/windows"
+ "runtime"
+ "sync"
+ "syscall"
+ "unsafe"
+)
+
+func init() {
+ register("RaiseException", RaiseException)
+ register("ZeroDivisionException", ZeroDivisionException)
+ register("StackMemory", StackMemory)
+}
+
+func RaiseException() {
+ const EXCEPTION_NONCONTINUABLE = 1
+ mod := syscall.MustLoadDLL("kernel32.dll")
+ proc := mod.MustFindProc("RaiseException")
+ proc.Call(0xbad, EXCEPTION_NONCONTINUABLE, 0, 0)
+ println("RaiseException should not return")
+}
+
+func ZeroDivisionException() {
+ x := 1
+ y := 0
+ z := x / y
+ println(z)
+}
+
+func getPagefileUsage() (uintptr, error) {
+ p, err := syscall.GetCurrentProcess()
+ if err != nil {
+ return 0, err
+ }
+ var m windows.PROCESS_MEMORY_COUNTERS
+ err = windows.GetProcessMemoryInfo(p, &m, uint32(unsafe.Sizeof(m)))
+ if err != nil {
+ return 0, err
+ }
+ return m.PagefileUsage, nil
+}
+
+func StackMemory() {
+ mem1, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ const threadCount = 100
+ var wg sync.WaitGroup
+ for i := 0; i < threadCount; i++ {
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ wg.Done()
+ select {}
+ }()
+ }
+ wg.Wait()
+ mem2, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ // This assumes that the process creates 1 thread for each
+ // thread-locked goroutine, plus 5 extra threads
+ // such as sysmon and others.
+ print((mem2 - mem1) / (threadCount + 5))
+}
diff --git a/src/runtime/testdata/testprog/syscalls.go b/src/runtime/testdata/testprog/syscalls.go
new file mode 100644
index 0000000..098d5ca
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls.go
@@ -0,0 +1,11 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "errors"
+)
+
+var errNotPermitted = errors.New("operation not permitted")
diff --git a/src/runtime/testdata/testprog/syscalls_linux.go b/src/runtime/testdata/testprog/syscalls_linux.go
new file mode 100644
index 0000000..48f8014
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls_linux.go
@@ -0,0 +1,58 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+ "syscall"
+)
+
+func gettid() int {
+ return syscall.Gettid()
+}
+
+func tidExists(tid int) (exists, supported bool) {
+ stat, err := os.ReadFile(fmt.Sprintf("/proc/self/task/%d/stat", tid))
+ if os.IsNotExist(err) {
+ return false, true
+ }
+ // Check if it's a zombie thread.
+ state := bytes.Fields(stat)[2]
+ return !(len(state) == 1 && state[0] == 'Z'), true
+}
+
+func getcwd() (string, error) {
+ if !syscall.ImplementsGetwd {
+ return "", nil
+ }
+ // Use the syscall to get the current working directory.
+ // This is imperative for checking for OS thread state
+ // after an unshare since os.Getwd might just check the
+ // environment, or use some other mechanism.
+ var buf [4096]byte
+ n, err := syscall.Getcwd(buf[:])
+ if err != nil {
+ return "", err
+ }
+ // Subtract one for null terminator.
+ return string(buf[:n-1]), nil
+}
+
+func unshareFs() error {
+ err := syscall.Unshare(syscall.CLONE_FS)
+ if err != nil {
+ errno, ok := err.(syscall.Errno)
+ if ok && errno == syscall.EPERM {
+ return errNotPermitted
+ }
+ }
+ return err
+}
+
+func chdir(path string) error {
+ return syscall.Chdir(path)
+}
diff --git a/src/runtime/testdata/testprog/syscalls_none.go b/src/runtime/testdata/testprog/syscalls_none.go
new file mode 100644
index 0000000..068bb59
--- /dev/null
+++ b/src/runtime/testdata/testprog/syscalls_none.go
@@ -0,0 +1,28 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !linux
+// +build !linux
+
+package main
+
+func gettid() int {
+ return 0
+}
+
+func tidExists(tid int) (exists, supported bool) {
+ return false, false
+}
+
+func getcwd() (string, error) {
+ return "", nil
+}
+
+func unshareFs() error {
+ return nil
+}
+
+func chdir(path string) error {
+ return nil
+}
diff --git a/src/runtime/testdata/testprog/timeprof.go b/src/runtime/testdata/testprog/timeprof.go
new file mode 100644
index 0000000..1e90af4
--- /dev/null
+++ b/src/runtime/testdata/testprog/timeprof.go
@@ -0,0 +1,45 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("TimeProf", TimeProf)
+}
+
+func TimeProf() {
+ f, err := os.CreateTemp("", "timeprof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ t0 := time.Now()
+ // We should get a profiling signal 100 times a second,
+ // so running for 1/10 second should be sufficient.
+ for time.Since(t0) < time.Second/10 {
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
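TimeProf prints the name of the profile file rather than deleting it, so the caller can inspect the samples. One hedged way to do that by hand, assuming a local Go toolchain, is to feed the printed path to `go tool pprof -top`:

package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: inspect <profile printed by TimeProf>")
	}
	// -top prints a flat listing of the heaviest samples in the CPU profile.
	out, err := exec.Command("go", "tool", "pprof", "-top", os.Args[1]).CombinedOutput()
	if err != nil {
		log.Fatalf("go tool pprof failed: %v\n%s", err, out)
	}
	fmt.Printf("%s", out)
}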
diff --git a/src/runtime/testdata/testprog/traceback_ancestors.go b/src/runtime/testdata/testprog/traceback_ancestors.go
new file mode 100644
index 0000000..8fc1aa7
--- /dev/null
+++ b/src/runtime/testdata/testprog/traceback_ancestors.go
@@ -0,0 +1,96 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+func init() {
+ register("TracebackAncestors", TracebackAncestors)
+}
+
+const numGoroutines = 3
+const numFrames = 2
+
+func TracebackAncestors() {
+ w := make(chan struct{})
+ recurseThenCallGo(w, numGoroutines, numFrames, true)
+ <-w
+ printStack()
+ close(w)
+}
+
+var ignoreGoroutines = make(map[string]bool)
+
+func printStack() {
+ buf := make([]byte, 1024)
+ for {
+ n := runtime.Stack(buf, true)
+ if n < len(buf) {
+ all := string(buf[:n])
+ var saved string
+
+ // Delete any ignored goroutines, if present.
+ for all != "" {
+ var g string
+ g, all, _ = strings.Cut(all, "\n\n")
+
+ if strings.HasPrefix(g, "goroutine ") {
+ id, _, _ := strings.Cut(strings.TrimPrefix(g, "goroutine "), " ")
+ if ignoreGoroutines[id] {
+ continue
+ }
+ }
+ if saved != "" {
+ saved += "\n\n"
+ }
+ saved += g
+ }
+
+ fmt.Print(saved)
+ return
+ }
+ buf = make([]byte, 2*len(buf))
+ }
+}
+
+func recurseThenCallGo(w chan struct{}, frames int, goroutines int, main bool) {
+ if frames == 0 {
+ // Signal to TracebackAncestors that we are done recursing and starting goroutines.
+ w <- struct{}{}
+ <-w
+ return
+ }
+ if goroutines == 0 {
+ // Record which goroutine this is so we can ignore it
+ // in the traceback if it hasn't finished exiting by
+ // the time we printStack.
+ if !main {
+ ignoreGoroutines[goroutineID()] = true
+ }
+
+ // Start the next goroutine now that there are no more recursions left
+ // for the current goroutine.
+ go recurseThenCallGo(w, frames-1, numFrames, false)
+ return
+ }
+ recurseThenCallGo(w, frames, goroutines-1, main)
+}
+
+func goroutineID() string {
+ buf := make([]byte, 128)
+ runtime.Stack(buf, false)
+ prefix := []byte("goroutine ")
+ var found bool
+ if buf, found = bytes.CutPrefix(buf, prefix); !found {
+ panic(fmt.Sprintf("expected %q at beginning of traceback:\n%s", prefix, buf))
+ }
+ id, _, _ := bytes.Cut(buf, []byte(" "))
+ return string(id)
+}
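goroutineID works by scraping the header line that runtime.Stack writes for the current goroutine, which has the form "goroutine <id> [<state>]:". The same scrape, reduced to a standalone sketch for clarity:

package main

import (
	"bytes"
	"fmt"
	"runtime"
)

// currentGoroutineID mirrors goroutineID above: take the first header line that
// runtime.Stack produces ("goroutine <id> [<state>]:") and return the <id> part.
func currentGoroutineID() string {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	id, _, _ := bytes.Cut(buf, []byte(" "))
	return string(id)
}

func main() {
	fmt.Println("running in goroutine", currentGoroutineID())
}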
diff --git a/src/runtime/testdata/testprog/unsafe.go b/src/runtime/testdata/testprog/unsafe.go
new file mode 100644
index 0000000..021b08f
--- /dev/null
+++ b/src/runtime/testdata/testprog/unsafe.go
@@ -0,0 +1,12 @@
+package main
+
+import "unsafe"
+
+func init() {
+ register("panicOnNilAndEleSizeIsZero", panicOnNilAndEleSizeIsZero)
+}
+
+func panicOnNilAndEleSizeIsZero() {
+ var p *struct{}
+ _ = unsafe.Slice(p, 5)
+}
diff --git a/src/runtime/testdata/testprog/vdso.go b/src/runtime/testdata/testprog/vdso.go
new file mode 100644
index 0000000..b18bc74
--- /dev/null
+++ b/src/runtime/testdata/testprog/vdso.go
@@ -0,0 +1,54 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Invoke signal handler in the VDSO context (see issue 32912).
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("SignalInVDSO", signalInVDSO)
+}
+
+func signalInVDSO() {
+ f, err := os.CreateTemp("", "timeprofnow")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ t0 := time.Now()
+ t1 := t0
+ // We should get a profiling signal 100 times a second,
+ // so running for 1 second should be sufficient.
+ for t1.Sub(t0) < time.Second {
+ t1 = time.Now()
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := os.Remove(name); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println("success")
+}
diff --git a/src/runtime/testdata/testprogcgo/aprof.go b/src/runtime/testdata/testprogcgo/aprof.go
new file mode 100644
index 0000000..1687014
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/aprof.go
@@ -0,0 +1,56 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Test that SIGPROF received in C code does not crash the process
+// looking for the C code's func pointer.
+
+// This is a regression test for issue 14599, where profiling fails when the
+// function is the first C function. Exported functions are the first C
+// functions, so we use an exported function. Exported functions are created in
+// lexicographical order of source files, so this file is named aprof.go to
+// ensure its function is first.
+
+// extern void CallGoNop();
+import "C"
+
+import (
+ "bytes"
+ "fmt"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("CgoCCodeSIGPROF", CgoCCodeSIGPROF)
+}
+
+//export GoNop
+func GoNop() {}
+
+func CgoCCodeSIGPROF() {
+ c := make(chan bool)
+ go func() {
+ <-c
+ start := time.Now()
+ for i := 0; i < 1e7; i++ {
+ if i%1000 == 0 {
+ if time.Since(start) > time.Second {
+ break
+ }
+ }
+ C.CallGoNop()
+ }
+ c <- true
+ }()
+
+ var buf bytes.Buffer
+ pprof.StartCPUProfile(&buf)
+ c <- true
+ <-c
+ pprof.StopCPUProfile()
+
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/aprof_c.c b/src/runtime/testdata/testprogcgo/aprof_c.c
new file mode 100644
index 0000000..d588e13
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/aprof_c.c
@@ -0,0 +1,9 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "_cgo_export.h"
+
+void CallGoNop() {
+ GoNop();
+}
diff --git a/src/runtime/testdata/testprogcgo/bigstack1_windows.c b/src/runtime/testdata/testprogcgo/bigstack1_windows.c
new file mode 100644
index 0000000..551fb68
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bigstack1_windows.c
@@ -0,0 +1,12 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This is not in bigstack_windows.c because it needs to be part of
+// testprogcgo but is not part of the DLL built from bigstack_windows.c.
+
+#include "_cgo_export.h"
+
+void CallGoBigStack1(char* p) {
+ goBigStack1(p);
+}
diff --git a/src/runtime/testdata/testprogcgo/bigstack_windows.c b/src/runtime/testdata/testprogcgo/bigstack_windows.c
new file mode 100644
index 0000000..cd85ac8
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bigstack_windows.c
@@ -0,0 +1,46 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This test source is used by both TestBigStackCallbackCgo (linked
+// directly into the Go binary) and TestBigStackCallbackSyscall
+// (compiled into a DLL).
+
+#include <windows.h>
+#include <stdio.h>
+
+#ifndef STACK_SIZE_PARAM_IS_A_RESERVATION
+#define STACK_SIZE_PARAM_IS_A_RESERVATION 0x00010000
+#endif
+
+typedef void callback(char*);
+
+// Allocate a stack that's much larger than the default.
+static const int STACK_SIZE = 16<<20;
+
+static callback *bigStackCallback;
+
+static void useStack(int bytes) {
+ // Windows doesn't like huge frames, so we grow the stack 64k at a time.
+ char x[64<<10];
+ if (bytes < sizeof x) {
+ bigStackCallback(x);
+ } else {
+ useStack(bytes - sizeof x);
+ }
+}
+
+static DWORD WINAPI threadEntry(LPVOID lpParam) {
+ useStack(STACK_SIZE - (128<<10));
+ return 0;
+}
+
+void bigStack(callback *cb) {
+ bigStackCallback = cb;
+ HANDLE hThread = CreateThread(NULL, STACK_SIZE, threadEntry, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, NULL);
+ if (hThread == NULL) {
+ fprintf(stderr, "CreateThread failed\n");
+ exit(1);
+ }
+ WaitForSingleObject(hThread, INFINITE);
+}
diff --git a/src/runtime/testdata/testprogcgo/bigstack_windows.go b/src/runtime/testdata/testprogcgo/bigstack_windows.go
new file mode 100644
index 0000000..135b5fc
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bigstack_windows.go
@@ -0,0 +1,27 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+typedef void callback(char*);
+extern void CallGoBigStack1(char*);
+extern void bigStack(callback*);
+*/
+import "C"
+
+func init() {
+ register("BigStack", BigStack)
+}
+
+func BigStack() {
+ // Create a large thread stack and call back into Go to test
+ // if Go correctly determines the stack bounds.
+ C.bigStack((*C.callback)(C.CallGoBigStack1))
+}
+
+//export goBigStack1
+func goBigStack1(x *C.char) {
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/bindm.c b/src/runtime/testdata/testprogcgo/bindm.c
new file mode 100644
index 0000000..815d8a7
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bindm.c
@@ -0,0 +1,34 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+
+#include <stdint.h>
+#include <pthread.h>
+#include <unistd.h>
+#include "_cgo_export.h"
+
+#define CTHREADS 2
+#define CHECKCALLS 100
+
+static void* checkBindMThread(void* thread) {
+ int i;
+ for (i = 0; i < CHECKCALLS; i++) {
+ GoCheckBindM((uintptr_t)thread);
+ usleep(1);
+ }
+ return NULL;
+}
+
+void CheckBindM() {
+ int i;
+ pthread_t s[CTHREADS];
+
+ for (i = 0; i < CTHREADS; i++) {
+ pthread_create(&s[i], NULL, checkBindMThread, &s[i]);
+ }
+ for (i = 0; i < CTHREADS; i++) {
+ pthread_join(s[i], NULL);
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/bindm.go b/src/runtime/testdata/testprogcgo/bindm.go
new file mode 100644
index 0000000..c2003c2
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/bindm.go
@@ -0,0 +1,61 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+
+// Test that callbacks from C to Go in the same C-thread always get the same m.
+// Make sure the extra M is bound to the C-thread.
+
+package main
+
+/*
+extern void CheckBindM();
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "sync"
+ "sync/atomic"
+)
+
+var (
+ mutex = sync.Mutex{}
+ cThreadToM = map[uintptr]uintptr{}
+ started = atomic.Uint32{}
+)
+
+// Same as CTHREADS in C; used to make sure all the C threads have actually started.
+const cThreadNum = 2
+
+func init() {
+ register("EnsureBindM", EnsureBindM)
+}
+
+//export GoCheckBindM
+func GoCheckBindM(thread uintptr) {
+ // Wait for all threads to start.
+ if started.Load() != cThreadNum {
+ // Incremented only once per thread, since each thread waits for all threads to start.
+ started.Add(1)
+ for started.Load() < cThreadNum {
+ runtime.Gosched()
+ }
+ }
+ m := runtime_getm_for_test()
+ mutex.Lock()
+ defer mutex.Unlock()
+ if savedM, ok := cThreadToM[thread]; ok && savedM != m {
+ fmt.Printf("m == %x want %x\n", m, savedM)
+ os.Exit(1)
+ }
+ cThreadToM[thread] = m
+}
+
+func EnsureBindM() {
+ C.CheckBindM()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/callback.go b/src/runtime/testdata/testprogcgo/callback.go
new file mode 100644
index 0000000..319572f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/callback.go
@@ -0,0 +1,116 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <pthread.h>
+
+void go_callback();
+
+static void *thr(void *arg) {
+ go_callback();
+ return 0;
+}
+
+static void foo() {
+ pthread_t th;
+ pthread_attr_t attr;
+ pthread_attr_init(&attr);
+ pthread_attr_setstacksize(&attr, 256 << 10);
+ pthread_create(&th, &attr, thr, 0);
+ pthread_join(th, 0);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "sync/atomic"
+ _ "unsafe" // for go:linkname
+)
+
+func init() {
+ register("CgoCallbackGC", CgoCallbackGC)
+}
+
+//export go_callback
+func go_callback() {
+ if e := extraMInUse.Load(); e == 0 {
+ fmt.Printf("in callback extraMInUse got %d want >0\n", e)
+ os.Exit(1)
+ }
+
+ runtime.GC()
+ grow()
+ runtime.GC()
+}
+
+var cnt int
+
+func grow() {
+ x := 10000
+ sum := 0
+ if grow1(&x, &sum) == 0 {
+ panic("bad")
+ }
+}
+
+func grow1(x, sum *int) int {
+ if *x == 0 {
+ return *sum + 1
+ }
+ *x--
+ sum1 := *sum + *x
+ return grow1(x, &sum1)
+}
+
+func CgoCallbackGC() {
+ P := 100
+ if os.Getenv("RUNTIME_TEST_SHORT") != "" {
+ P = 10
+ }
+
+ if e := extraMInUse.Load(); e != 0 {
+ fmt.Printf("before testing extraMInUse got %d want 0\n", e)
+ os.Exit(1)
+ }
+
+ done := make(chan bool)
+ // allocate a bunch of stack frames and spray them with pointers
+ for i := 0; i < P; i++ {
+ go func() {
+ grow()
+ done <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-done
+ }
+ // now give these stack frames to cgo callbacks
+ for i := 0; i < P; i++ {
+ go func() {
+ C.foo()
+ done <- true
+ }()
+ }
+ for i := 0; i < P; i++ {
+ <-done
+ }
+
+ if e := extraMInUse.Load(); e != 0 {
+ fmt.Printf("after testing extraMInUse got %d want 0\n", e)
+ os.Exit(1)
+ }
+
+ fmt.Printf("OK\n")
+}
+
+//go:linkname extraMInUse runtime.extraMInUse
+var extraMInUse atomic.Uint32
diff --git a/src/runtime/testdata/testprogcgo/catchpanic.go b/src/runtime/testdata/testprogcgo/catchpanic.go
new file mode 100644
index 0000000..c722d40
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/catchpanic.go
@@ -0,0 +1,47 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <signal.h>
+#include <stdlib.h>
+#include <string.h>
+
+static void abrthandler(int signum) {
+ if (signum == SIGABRT) {
+ exit(0); // success
+ }
+}
+
+void registerAbortHandler() {
+ struct sigaction act;
+ memset(&act, 0, sizeof act);
+ act.sa_handler = abrthandler;
+ sigaction(SIGABRT, &act, NULL);
+}
+
+static void __attribute__ ((constructor)) sigsetup(void) {
+ if (getenv("CGOCATCHPANIC_EARLY_HANDLER") == NULL)
+ return;
+ registerAbortHandler();
+}
+*/
+import "C"
+import "os"
+
+func init() {
+ register("CgoCatchPanic", CgoCatchPanic)
+}
+
+// Test that the SIGABRT raised by panic can be caught by an early signal handler.
+func CgoCatchPanic() {
+ if _, ok := os.LookupEnv("CGOCATCHPANIC_EARLY_HANDLER"); !ok {
+ C.registerAbortHandler()
+ }
+ panic("catch me")
+}
diff --git a/src/runtime/testdata/testprogcgo/cgo.go b/src/runtime/testdata/testprogcgo/cgo.go
new file mode 100644
index 0000000..a587db3
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/cgo.go
@@ -0,0 +1,108 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+void foo1(void) {}
+void foo2(void* p) {}
+*/
+import "C"
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "strconv"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoSignalDeadlock", CgoSignalDeadlock)
+ register("CgoTraceback", CgoTraceback)
+ register("CgoCheckBytes", CgoCheckBytes)
+}
+
+func CgoSignalDeadlock() {
+ runtime.GOMAXPROCS(100)
+ ping := make(chan bool)
+ go func() {
+ for i := 0; ; i++ {
+ runtime.Gosched()
+ select {
+ case done := <-ping:
+ if done {
+ ping <- true
+ return
+ }
+ ping <- true
+ default:
+ }
+ func() {
+ defer func() {
+ recover()
+ }()
+ var s *string
+ *s = ""
+ fmt.Printf("continued after expected panic\n")
+ }()
+ }
+ }()
+ time.Sleep(time.Millisecond)
+ start := time.Now()
+ var times []time.Duration
+ n := 64
+ if os.Getenv("RUNTIME_TEST_SHORT") != "" {
+ n = 16
+ }
+ for i := 0; i < n; i++ {
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ go func() {
+ runtime.LockOSThread()
+ select {}
+ }()
+ time.Sleep(time.Millisecond)
+ ping <- false
+ select {
+ case <-ping:
+ times = append(times, time.Since(start))
+ case <-time.After(time.Second):
+ fmt.Printf("HANG 1 %v\n", times)
+ return
+ }
+ }
+ ping <- true
+ select {
+ case <-ping:
+ case <-time.After(time.Second):
+ fmt.Printf("HANG 2 %v\n", times)
+ return
+ }
+ fmt.Printf("OK\n")
+}
+
+func CgoTraceback() {
+ C.foo1()
+ buf := make([]byte, 1)
+ runtime.Stack(buf, true)
+ fmt.Printf("OK\n")
+}
+
+func CgoCheckBytes() {
+ try, _ := strconv.Atoi(os.Getenv("GO_CGOCHECKBYTES_TRY"))
+ if try <= 0 {
+ try = 1
+ }
+ b := make([]byte, 1e6*try)
+ start := time.Now()
+ for i := 0; i < 1e3*try; i++ {
+ C.foo2(unsafe.Pointer(&b[0]))
+ if time.Since(start) > time.Second {
+ break
+ }
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/crash.go b/src/runtime/testdata/testprogcgo/crash.go
new file mode 100644
index 0000000..4d83132
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/crash.go
@@ -0,0 +1,45 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "runtime"
+)
+
+func init() {
+ register("Crash", Crash)
+}
+
+func test(name string) {
+ defer func() {
+ if x := recover(); x != nil {
+ fmt.Printf(" recovered")
+ }
+ fmt.Printf(" done\n")
+ }()
+ fmt.Printf("%s:", name)
+ var s *string
+ _ = *s
+ fmt.Print("SHOULD NOT BE HERE")
+}
+
+func testInNewThread(name string) {
+ c := make(chan bool)
+ go func() {
+ runtime.LockOSThread()
+ test(name)
+ c <- true
+ }()
+ <-c
+}
+
+func Crash() {
+ runtime.LockOSThread()
+ test("main")
+ testInNewThread("new-thread")
+ testInNewThread("second-new-thread")
+ test("main-again")
+}
diff --git a/src/runtime/testdata/testprogcgo/deadlock.go b/src/runtime/testdata/testprogcgo/deadlock.go
new file mode 100644
index 0000000..2cc68a8
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/deadlock.go
@@ -0,0 +1,30 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+char *geterror() {
+ return "cgo error";
+}
+*/
+import "C"
+import (
+ "fmt"
+)
+
+func init() {
+ register("CgoPanicDeadlock", CgoPanicDeadlock)
+}
+
+type cgoError struct{}
+
+func (cgoError) Error() string {
+ fmt.Print("") // necessary to trigger the deadlock
+ return C.GoString(C.geterror())
+}
+
+func CgoPanicDeadlock() {
+ panic(cgoError{})
+}
diff --git a/src/runtime/testdata/testprogcgo/destructor.c b/src/runtime/testdata/testprogcgo/destructor.c
new file mode 100644
index 0000000..8604d81
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/destructor.c
@@ -0,0 +1,22 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "_cgo_export.h"
+
+static void callDestructorCallback() {
+ GoDestructorCallback();
+}
+
+static void (*destructorFn)(void);
+
+void registerDestructor() {
+ destructorFn = callDestructorCallback;
+}
+
+__attribute__((destructor))
+static void destructor() {
+ if (destructorFn) {
+ destructorFn();
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/destructor.go b/src/runtime/testdata/testprogcgo/destructor.go
new file mode 100644
index 0000000..49529f0
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/destructor.go
@@ -0,0 +1,23 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// extern void registerDestructor();
+import "C"
+
+import "fmt"
+
+func init() {
+ register("DestructorCallback", DestructorCallback)
+}
+
+//export GoDestructorCallback
+func GoDestructorCallback() {
+}
+
+func DestructorCallback() {
+ C.registerDestructor()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/dll_windows.go b/src/runtime/testdata/testprogcgo/dll_windows.go
new file mode 100644
index 0000000..25380fb
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dll_windows.go
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+#include <windows.h>
+
+DWORD getthread() {
+ return GetCurrentThreadId();
+}
+*/
+import "C"
+import "runtime/testdata/testprogcgo/windows"
+
+func init() {
+ register("CgoDLLImportsMain", CgoDLLImportsMain)
+}
+
+func CgoDLLImportsMain() {
+ C.getthread()
+ windows.GetThread()
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/dropm.go b/src/runtime/testdata/testprogcgo/dropm.go
new file mode 100644
index 0000000..700b7fa
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dropm.go
@@ -0,0 +1,60 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+// Test that a sequence of callbacks from C to Go get the same m.
+// This failed to be true on arm and arm64, which was the root cause
+// of issue 13881.
+
+package main
+
+/*
+#include <stddef.h>
+#include <pthread.h>
+
+extern void GoCheckM();
+
+static void* thread(void* arg __attribute__ ((unused))) {
+ GoCheckM();
+ return NULL;
+}
+
+static void CheckM() {
+ pthread_t tid;
+ pthread_create(&tid, NULL, thread, NULL);
+ pthread_join(tid, NULL);
+ pthread_create(&tid, NULL, thread, NULL);
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+)
+
+func init() {
+ register("EnsureDropM", EnsureDropM)
+}
+
+var savedM uintptr
+
+//export GoCheckM
+func GoCheckM() {
+ m := runtime_getm_for_test()
+ if savedM == 0 {
+ savedM = m
+ } else if savedM != m {
+ fmt.Printf("m == %x want %x\n", m, savedM)
+ os.Exit(1)
+ }
+}
+
+func EnsureDropM() {
+ C.CheckM()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/dropm_stub.go b/src/runtime/testdata/testprogcgo/dropm_stub.go
new file mode 100644
index 0000000..6997cfd
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/dropm_stub.go
@@ -0,0 +1,12 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import _ "unsafe" // for go:linkname
+
+// Defined in the runtime package.
+//
+//go:linkname runtime_getm_for_test runtime.getm
+func runtime_getm_for_test() uintptr
diff --git a/src/runtime/testdata/testprogcgo/eintr.go b/src/runtime/testdata/testprogcgo/eintr.go
new file mode 100644
index 0000000..6e9677f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/eintr.go
@@ -0,0 +1,247 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <errno.h>
+#include <signal.h>
+#include <string.h>
+
+static int clearRestart(int sig) {
+ struct sigaction sa;
+
+ memset(&sa, 0, sizeof sa);
+ if (sigaction(sig, NULL, &sa) < 0) {
+ return errno;
+ }
+ sa.sa_flags &=~ SA_RESTART;
+ if (sigaction(sig, &sa, NULL) < 0) {
+ return errno;
+ }
+ return 0;
+}
+*/
+import "C"
+
+import (
+ "bytes"
+ "errors"
+ "fmt"
+ "io"
+ "log"
+ "net"
+ "os"
+ "os/exec"
+ "sync"
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("EINTR", EINTR)
+ register("Block", Block)
+}
+
+// Test various operations when a signal handler is installed without
+// the SA_RESTART flag. This tests that the os and net APIs handle EINTR.
+func EINTR() {
+ if errno := C.clearRestart(C.int(syscall.SIGURG)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+ if errno := C.clearRestart(C.int(syscall.SIGWINCH)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+ if errno := C.clearRestart(C.int(syscall.SIGCHLD)); errno != 0 {
+ log.Fatal(syscall.Errno(errno))
+ }
+
+ var wg sync.WaitGroup
+ testPipe(&wg)
+ testNet(&wg)
+ testExec(&wg)
+ wg.Wait()
+ fmt.Println("OK")
+}
+
+// spin does CPU-bound spinning and allocating for a millisecond,
+// to get a SIGURG.
+//
+//go:noinline
+func spin() (float64, []byte) {
+ stop := time.Now().Add(time.Millisecond)
+ r1 := 0.0
+ r2 := make([]byte, 200)
+ for time.Now().Before(stop) {
+ for i := 1; i < 1e6; i++ {
+ r1 += r1 / float64(i)
+ r2 = append(r2, bytes.Repeat([]byte{byte(i)}, 100)...)
+ r2 = r2[100:]
+ }
+ }
+ return r1, r2
+}
+
+// winch sends a few SIGWINCH signals to the process.
+func winch() {
+ ticker := time.NewTicker(100 * time.Microsecond)
+ defer ticker.Stop()
+ pid := syscall.Getpid()
+ for n := 10; n > 0; n-- {
+ syscall.Kill(pid, syscall.SIGWINCH)
+ <-ticker.C
+ }
+}
+
+// sendSomeSignals triggers a few SIGURG and SIGWINCH signals.
+func sendSomeSignals() {
+ done := make(chan struct{})
+ go func() {
+ spin()
+ close(done)
+ }()
+ winch()
+ <-done
+}
+
+// testPipe tests pipe operations.
+func testPipe(wg *sync.WaitGroup) {
+ r, w, err := os.Pipe()
+ if err != nil {
+ log.Fatal(err)
+ }
+ if err := syscall.SetNonblock(int(r.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ if err := syscall.SetNonblock(int(w.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ defer w.Close()
+ // Spin before calling Write so that the first ReadFull
+ // in the other goroutine will likely be interrupted
+ // by a signal.
+ sendSomeSignals()
+ // This Write will likely be interrupted by a signal
+ // as the other goroutine spins in the middle of reading.
+ // We write enough data that we should always fill the
+ // pipe buffer and need multiple write system calls.
+ if _, err := w.Write(bytes.Repeat([]byte{0}, 2<<20)); err != nil {
+ log.Fatal(err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ defer r.Close()
+ b := make([]byte, 1<<20)
+ // This ReadFull will likely be interrupted by a signal,
+ // as the other goroutine spins before writing anything.
+ if _, err := io.ReadFull(r, b); err != nil {
+ log.Fatal(err)
+ }
+ // Spin after reading half the data so that the Write
+ // in the other goroutine will likely be interrupted
+ // before it completes.
+ sendSomeSignals()
+ if _, err := io.ReadFull(r, b); err != nil {
+ log.Fatal(err)
+ }
+ }()
+}
+
+// testNet tests network operations.
+func testNet(wg *sync.WaitGroup) {
+ ln, err := net.Listen("tcp4", "127.0.0.1:0")
+ if err != nil {
+ if errors.Is(err, syscall.EAFNOSUPPORT) || errors.Is(err, syscall.EPROTONOSUPPORT) {
+ return
+ }
+ log.Fatal(err)
+ }
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ defer ln.Close()
+ c, err := ln.Accept()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer c.Close()
+ cf, err := c.(*net.TCPConn).File()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer cf.Close()
+ if err := syscall.SetNonblock(int(cf.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ // See comments in testPipe.
+ sendSomeSignals()
+ if _, err := cf.Write(bytes.Repeat([]byte{0}, 2<<20)); err != nil {
+ log.Fatal(err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ sendSomeSignals()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer c.Close()
+ cf, err := c.(*net.TCPConn).File()
+ if err != nil {
+ log.Fatal(err)
+ }
+ defer cf.Close()
+ if err := syscall.SetNonblock(int(cf.Fd()), false); err != nil {
+ log.Fatal(err)
+ }
+ // See comments in testPipe.
+ b := make([]byte, 1<<20)
+ if _, err := io.ReadFull(cf, b); err != nil {
+ log.Fatal(err)
+ }
+ sendSomeSignals()
+ if _, err := io.ReadFull(cf, b); err != nil {
+ log.Fatal(err)
+ }
+ }()
+}
+
+func testExec(wg *sync.WaitGroup) {
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "Block")
+ stdin, err := cmd.StdinPipe()
+ if err != nil {
+ log.Fatal(err)
+ }
+ cmd.Stderr = new(bytes.Buffer)
+ cmd.Stdout = cmd.Stderr
+ if err := cmd.Start(); err != nil {
+ log.Fatal(err)
+ }
+
+ go func() {
+ sendSomeSignals()
+ stdin.Close()
+ }()
+
+ if err := cmd.Wait(); err != nil {
+ log.Fatalf("%v:\n%s", err, cmd.Stdout)
+ }
+ }()
+}
+
+// Block blocks until stdin is closed.
+func Block() {
+ io.Copy(io.Discard, os.Stdin)
+}
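EINTR above checks that the os and net packages retry interrupted system calls internally once SA_RESTART has been cleared. Code that issues raw system calls does not get that for free; a minimal sketch of the retry loop such code needs (reading from stdin here purely as an example):

package main

import (
	"fmt"
	"syscall"
)

// readRetryingEINTR retries a raw read that was interrupted by a signal before
// any data arrived. The os and net packages do the equivalent internally.
func readRetryingEINTR(fd int, buf []byte) (int, error) {
	for {
		n, err := syscall.Read(fd, buf)
		if err == syscall.EINTR {
			continue
		}
		return n, err
	}
}

func main() {
	buf := make([]byte, 16)
	n, err := readRetryingEINTR(0, buf) // fd 0 is stdin
	fmt.Println(n, err)
}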
diff --git a/src/runtime/testdata/testprogcgo/exec.go b/src/runtime/testdata/testprogcgo/exec.go
new file mode 100644
index 0000000..c268bcd
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/exec.go
@@ -0,0 +1,107 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <stddef.h>
+#include <signal.h>
+#include <pthread.h>
+
+// Save the signal mask at startup so that we see what it is before
+// the Go runtime starts setting up signals.
+
+static sigset_t mask;
+
+static void init(void) __attribute__ ((constructor));
+
+static void init() {
+ sigemptyset(&mask);
+ pthread_sigmask(SIG_SETMASK, NULL, &mask);
+}
+
+int SIGINTBlocked() {
+ return sigismember(&mask, SIGINT);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "io/fs"
+ "os"
+ "os/exec"
+ "os/signal"
+ "sync"
+ "syscall"
+)
+
+func init() {
+ register("CgoExecSignalMask", CgoExecSignalMask)
+}
+
+func CgoExecSignalMask() {
+ if len(os.Args) > 2 && os.Args[2] == "testsigint" {
+ if C.SIGINTBlocked() != 0 {
+ os.Exit(1)
+ }
+ os.Exit(0)
+ }
+
+ c := make(chan os.Signal, 1)
+ signal.Notify(c, syscall.SIGTERM)
+ go func() {
+ for range c {
+ }
+ }()
+
+ const goCount = 10
+ const execCount = 10
+ var wg sync.WaitGroup
+ wg.Add(goCount*execCount + goCount)
+ for i := 0; i < goCount; i++ {
+ go func() {
+ defer wg.Done()
+ for j := 0; j < execCount; j++ {
+ c2 := make(chan os.Signal, 1)
+ signal.Notify(c2, syscall.SIGUSR1)
+ syscall.Kill(os.Getpid(), syscall.SIGTERM)
+ go func(j int) {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "CgoExecSignalMask", "testsigint")
+ cmd.Stdin = os.Stdin
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ // An overloaded system
+ // may fail with EAGAIN.
+ // This doesn't tell us
+ // anything useful; ignore it.
+ // Issue #27731.
+ if isEAGAIN(err) {
+ return
+ }
+ fmt.Printf("iteration %d: %v\n", j, err)
+ os.Exit(1)
+ }
+ }(j)
+ signal.Stop(c2)
+ }
+ }()
+ }
+ wg.Wait()
+
+ fmt.Println("OK")
+}
+
+// isEAGAIN reports whether err is an EAGAIN error from a process execution.
+func isEAGAIN(err error) bool {
+ if p, ok := err.(*fs.PathError); ok {
+ err = p.Err
+ }
+ return err == syscall.EAGAIN
+}
diff --git a/src/runtime/testdata/testprogcgo/gprof.go b/src/runtime/testdata/testprogcgo/gprof.go
new file mode 100644
index 0000000..d453b4d
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/gprof.go
@@ -0,0 +1,46 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Test taking a goroutine profile with C traceback.
+
+/*
+// Defined in gprof_c.c.
+void CallGoSleep(void);
+void gprofCgoTraceback(void* parg);
+void gprofCgoContext(void* parg);
+*/
+import "C"
+
+import (
+ "fmt"
+ "io"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("GoroutineProfile", GoroutineProfile)
+}
+
+func GoroutineProfile() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.gprofCgoTraceback), unsafe.Pointer(C.gprofCgoContext), nil)
+
+ go C.CallGoSleep()
+ go C.CallGoSleep()
+ go C.CallGoSleep()
+ time.Sleep(1 * time.Second)
+
+ prof := pprof.Lookup("goroutine")
+ prof.WriteTo(io.Discard, 1)
+ fmt.Println("OK")
+}
+
+//export GoSleep
+func GoSleep() {
+ time.Sleep(time.Hour)
+}
diff --git a/src/runtime/testdata/testprogcgo/gprof_c.c b/src/runtime/testdata/testprogcgo/gprof_c.c
new file mode 100644
index 0000000..5c7cd77
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/gprof_c.c
@@ -0,0 +1,30 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The C definitions for gprof.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <stdint.h>
+#include <stdlib.h>
+
+// Functions exported from Go.
+extern void GoSleep();
+
+struct cgoContextArg {
+ uintptr_t context;
+};
+
+void gprofCgoContext(void *arg) {
+ ((struct cgoContextArg*)arg)->context = 1;
+}
+
+void gprofCgoTraceback(void *arg) {
+ // spend some time here so the P is more likely to be retaken.
+ volatile int i;
+ for (i = 0; i < 123456789; i++);
+}
+
+void CallGoSleep() {
+ GoSleep();
+}
diff --git a/src/runtime/testdata/testprogcgo/issue29707.go b/src/runtime/testdata/testprogcgo/issue29707.go
new file mode 100644
index 0000000..7d9299f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/issue29707.go
@@ -0,0 +1,60 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+// This is for issue #29707
+
+package main
+
+/*
+#include <pthread.h>
+
+extern void* callbackTraceParser(void*);
+typedef void* (*cbTraceParser)(void*);
+
+static void testCallbackTraceParser(cbTraceParser cb) {
+ pthread_t thread_id;
+ pthread_create(&thread_id, NULL, cb, NULL);
+ pthread_join(thread_id, NULL);
+}
+*/
+import "C"
+
+import (
+ "bytes"
+ "fmt"
+ traceparser "internal/trace"
+ "runtime/trace"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoTraceParser", CgoTraceParser)
+}
+
+//export callbackTraceParser
+func callbackTraceParser(unsafe.Pointer) unsafe.Pointer {
+ time.Sleep(time.Millisecond)
+ return nil
+}
+
+func CgoTraceParser() {
+ buf := new(bytes.Buffer)
+
+ trace.Start(buf)
+ C.testCallbackTraceParser(C.cbTraceParser(C.callbackTraceParser))
+ trace.Stop()
+
+ _, err := traceparser.Parse(buf, "")
+ if err == traceparser.ErrTimeOrder {
+ fmt.Println("ErrTimeOrder")
+ } else if err != nil {
+ fmt.Println("Parse error: ", err)
+ } else {
+ fmt.Println("OK")
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/lockosthread.c b/src/runtime/testdata/testprogcgo/lockosthread.c
new file mode 100644
index 0000000..b10cc4f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/lockosthread.c
@@ -0,0 +1,13 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+#include <stdint.h>
+
+uint32_t threadExited;
+
+void setExited(void *x) {
+ __sync_fetch_and_add(&threadExited, 1);
+}
diff --git a/src/runtime/testdata/testprogcgo/lockosthread.go b/src/runtime/testdata/testprogcgo/lockosthread.go
new file mode 100644
index 0000000..e6dce36
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/lockosthread.go
@@ -0,0 +1,110 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+import (
+ "os"
+ "runtime"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+/*
+#include <pthread.h>
+#include <stdint.h>
+
+extern uint32_t threadExited;
+
+void setExited(void *x);
+*/
+import "C"
+
+var mainThread C.pthread_t
+
+func init() {
+ registerInit("LockOSThreadMain", func() {
+ // init is guaranteed to run on the main thread.
+ mainThread = C.pthread_self()
+ })
+ register("LockOSThreadMain", LockOSThreadMain)
+
+ registerInit("LockOSThreadAlt", func() {
+ // Lock the OS thread now so main runs on the main thread.
+ runtime.LockOSThread()
+ })
+ register("LockOSThreadAlt", LockOSThreadAlt)
+}
+
+func LockOSThreadMain() {
+ // This requires GOMAXPROCS=1 from the beginning to reliably
+ // start a goroutine on the main thread.
+ if runtime.GOMAXPROCS(-1) != 1 {
+ println("requires GOMAXPROCS=1")
+ os.Exit(1)
+ }
+
+ ready := make(chan bool, 1)
+ go func() {
+ // Because GOMAXPROCS=1, this *should* be on the main
+ // thread. Stay there.
+ runtime.LockOSThread()
+ self := C.pthread_self()
+ if C.pthread_equal(mainThread, self) == 0 {
+ println("failed to start goroutine on main thread")
+ os.Exit(1)
+ }
+ // Exit with the thread locked, which should exit the
+ // main thread.
+ ready <- true
+ }()
+ <-ready
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is still running on a different
+ // thread.
+ self := C.pthread_self()
+ if C.pthread_equal(mainThread, self) != 0 {
+ println("goroutine migrated to locked thread")
+ os.Exit(1)
+ }
+ println("OK")
+}
+
+func LockOSThreadAlt() {
+ // This is running locked to the main OS thread.
+
+ var subThread C.pthread_t
+ ready := make(chan bool, 1)
+ C.threadExited = 0
+ go func() {
+ // This goroutine must be running on a new thread.
+ runtime.LockOSThread()
+ subThread = C.pthread_self()
+ // Register a pthread destructor so we can tell when this
+ // thread has exited.
+ var key C.pthread_key_t
+ C.pthread_key_create(&key, (*[0]byte)(unsafe.Pointer(C.setExited)))
+ C.pthread_setspecific(key, unsafe.Pointer(new(int)))
+ ready <- true
+ // Exit with the thread locked.
+ }()
+ <-ready
+ for {
+ time.Sleep(1 * time.Millisecond)
+ // Check that this goroutine is running on a different thread.
+ self := C.pthread_self()
+ if C.pthread_equal(subThread, self) != 0 {
+ println("locked thread reused")
+ os.Exit(1)
+ }
+ if atomic.LoadUint32((*uint32)(&C.threadExited)) != 0 {
+ println("OK")
+ return
+ }
+ }
+}
diff --git a/src/runtime/testdata/testprogcgo/main.go b/src/runtime/testdata/testprogcgo/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
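Each test in this package is selected by passing its registered name as the first argument, and most of these tests report success by printing "OK". A hypothetical driver (binary path and test name chosen for illustration) therefore looks like:

package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Run one registered test by name and echo its output; "OK" indicates success.
	out, err := exec.Command("./testprogcgo", "EnsureDropM").CombinedOutput()
	if err != nil {
		log.Fatalf("run failed: %v\n%s", err, out)
	}
	fmt.Printf("%s", out)
}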
diff --git a/src/runtime/testdata/testprogcgo/needmdeadlock.go b/src/runtime/testdata/testprogcgo/needmdeadlock.go
new file mode 100644
index 0000000..b95ec77
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/needmdeadlock.go
@@ -0,0 +1,96 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+// This is for issue #42207.
+// During a call to needm we could get a SIGCHLD signal
+// which would itself call needm, causing a deadlock.
+
+/*
+#include <signal.h>
+#include <pthread.h>
+#include <sched.h>
+#include <unistd.h>
+
+extern void GoNeedM();
+
+#define SIGNALERS 10
+
+static void* needmSignalThread(void* p) {
+ pthread_t* pt = (pthread_t*)(p);
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ if (pthread_kill(*pt, SIGCHLD) < 0) {
+ return NULL;
+ }
+ usleep(1);
+ }
+ return NULL;
+}
+
+// We don't need many calls, as the deadlock is only likely
+// to occur the first couple of times that needm is called.
+// After that there will likely be an extra M available.
+#define CALLS 10
+
+static void* needmCallbackThread(void* p) {
+ int i;
+
+ for (i = 0; i < SIGNALERS; i++) {
+ sched_yield(); // Help the signal threads get started.
+ }
+ for (i = 0; i < CALLS; i++) {
+ GoNeedM();
+ }
+ return NULL;
+}
+
+static void runNeedmSignalThread() {
+ int i;
+ pthread_t caller;
+ pthread_t s[SIGNALERS];
+
+ pthread_create(&caller, NULL, needmCallbackThread, NULL);
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_create(&s[i], NULL, needmSignalThread, &caller);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_join(s[i], NULL);
+ }
+ pthread_join(caller, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "time"
+)
+
+func init() {
+ register("NeedmDeadlock", NeedmDeadlock)
+}
+
+//export GoNeedM
+func GoNeedM() {
+}
+
+func NeedmDeadlock() {
+ // The failure symptom is that the program hangs because of a
+ // deadlock in needm, so set an alarm.
+ go func() {
+ time.Sleep(5 * time.Second)
+ fmt.Println("Hung for 5 seconds")
+ os.Exit(1)
+ }()
+
+ C.runNeedmSignalThread()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/numgoroutine.go b/src/runtime/testdata/testprogcgo/numgoroutine.go
new file mode 100644
index 0000000..1b9f202
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/numgoroutine.go
@@ -0,0 +1,93 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <stddef.h>
+#include <pthread.h>
+
+extern void CallbackNumGoroutine();
+
+static void* thread2(void* arg __attribute__ ((unused))) {
+ CallbackNumGoroutine();
+ return NULL;
+}
+
+static void CheckNumGoroutine() {
+ pthread_t tid;
+ pthread_create(&tid, NULL, thread2, NULL);
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "runtime"
+ "strings"
+)
+
+var baseGoroutines int
+
+func init() {
+ register("NumGoroutine", NumGoroutine)
+}
+
+func NumGoroutine() {
+ // Test that there are just the expected number of goroutines
+ // running. Specifically, test that the spare M's goroutine
+ // doesn't show up.
+ if _, ok := checkNumGoroutine("first", 1+baseGoroutines); !ok {
+ return
+ }
+
+ // Test that the goroutine for a callback from C appears.
+ if C.CheckNumGoroutine(); !callbackok {
+ return
+ }
+
+ // Make sure we're back to the initial goroutines.
+ if _, ok := checkNumGoroutine("third", 1+baseGoroutines); !ok {
+ return
+ }
+
+ fmt.Println("OK")
+}
+
+func checkNumGoroutine(label string, want int) (string, bool) {
+ n := runtime.NumGoroutine()
+ if n != want {
+ fmt.Printf("%s NumGoroutine: want %d; got %d\n", label, want, n)
+ return "", false
+ }
+
+ sbuf := make([]byte, 32<<10)
+ sbuf = sbuf[:runtime.Stack(sbuf, true)]
+ n = strings.Count(string(sbuf), "goroutine ")
+ if n != want {
+ fmt.Printf("%s Stack: want %d; got %d:\n%s\n", label, want, n, string(sbuf))
+ return "", false
+ }
+ return string(sbuf), true
+}
+
+var callbackok bool
+
+//export CallbackNumGoroutine
+func CallbackNumGoroutine() {
+ stk, ok := checkNumGoroutine("second", 2+baseGoroutines)
+ if !ok {
+ return
+ }
+ if !strings.Contains(stk, "CallbackNumGoroutine") {
+ fmt.Printf("missing CallbackNumGoroutine from stack:\n%s\n", stk)
+ return
+ }
+
+ callbackok = true
+}
diff --git a/src/runtime/testdata/testprogcgo/panic.c b/src/runtime/testdata/testprogcgo/panic.c
new file mode 100644
index 0000000..deb5ed5
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/panic.c
@@ -0,0 +1,9 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+extern void panic_callback();
+
+void call_callback(void) {
+ panic_callback();
+}
diff --git a/src/runtime/testdata/testprogcgo/panic.go b/src/runtime/testdata/testprogcgo/panic.go
new file mode 100644
index 0000000..57ac895
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/panic.go
@@ -0,0 +1,23 @@
+package main
+
+// This program will crash.
+// We want to test unwinding from a cgo callback.
+
+/*
+void call_callback(void);
+*/
+import "C"
+
+func init() {
+ register("PanicCallback", PanicCallback)
+}
+
+//export panic_callback
+func panic_callback() {
+ var i *int
+ *i = 42
+}
+
+func PanicCallback() {
+ C.call_callback()
+}
diff --git a/src/runtime/testdata/testprogcgo/pprof.go b/src/runtime/testdata/testprogcgo/pprof.go
new file mode 100644
index 0000000..8870d0c
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/pprof.go
@@ -0,0 +1,93 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Run a slow C function saving a CPU profile.
+
+/*
+#include <stdint.h>
+
+int salt1;
+int salt2;
+
+void cpuHog() {
+ int foo = salt1;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ }
+ salt2 = foo;
+}
+
+void cpuHog2() {
+}
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+// pprofCgoTraceback is passed to runtime.SetCgoTraceback.
+// For testing purposes it pretends that all CPU hits in C code are in cpuHog.
+// Issue #29034: At least 2 frames are required to verify all frames are captured
+// since runtime/pprof ignores the runtime.goexit base frame if it exists.
+void pprofCgoTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ arg->buf[0] = (uintptr_t)(cpuHog) + 0x10;
+ arg->buf[1] = (uintptr_t)(cpuHog2) + 0x4;
+ arg->buf[2] = 0;
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoPprof", CgoPprof)
+}
+
+func CgoPprof() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.pprofCgoTraceback), nil, nil)
+
+ f, err := os.CreateTemp("", "prof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ t0 := time.Now()
+ for time.Since(t0) < time.Second {
+ C.cpuHog()
+ }
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
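
CgoPprof prints the path of the CPU profile it wrote, so a driver presumably runs the binary, reads that path from stdout, and then inspects the profile. A minimal driver sketch, under the assumption that testprogcgo has already been built to ./testprogcgo (the binary path is an assumption, and profile parsing is omitted):

package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("./testprogcgo", "CgoPprof").CombinedOutput()
	if err != nil {
		log.Fatalf("running CgoPprof: %v\n%s", err, out)
	}
	// The program prints the temp file name it wrote the profile to.
	profPath := strings.TrimSpace(string(out))
	defer os.Remove(profPath)
	fmt.Println("CPU profile written to", profPath)
}
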
diff --git a/src/runtime/testdata/testprogcgo/pprof_callback.go b/src/runtime/testdata/testprogcgo/pprof_callback.go
new file mode 100644
index 0000000..fd87eb8
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/pprof_callback.go
@@ -0,0 +1,89 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+
+package main
+
+// Make many C-to-Go callbacks while collecting a CPU profile.
+//
+// This is a regression test for issue 50936.
+
+/*
+#include <unistd.h>
+
+void goCallbackPprof();
+
+static void callGo() {
+ // Spent >20us in C so this thread is eligible for sysmon to retake its
+ // P.
+ usleep(50);
+ goCallbackPprof();
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+)
+
+func init() {
+ register("CgoPprofCallback", CgoPprofCallback)
+}
+
+//export goCallbackPprof
+func goCallbackPprof() {
+ // No-op. We want to stress the cgocall and cgocallback internals,
+ // landing as many pprof signals there as possible.
+}
+
+func CgoPprofCallback() {
+ // Issue 50936 was a crash in the SIGPROF handler when the signal
+ // arrived during the exitsyscall following a cgocall(back) in dropg or
+ // execute, when updating mp.curg.
+ //
+ // These are reachable only when exitsyscall finds no P available. Thus
+ // we make C calls from significantly more Gs than there are available
+ // Ps. Lots of runnable work combined with >20us spent in callGo makes
+ // it possible for sysmon to retake Ps, forcing C calls to go down the
+ // desired exitsyscall path.
+ //
+ // High GOMAXPROCS is used to increase opportunities for failure on
+ // high CPU machines.
+ const (
+ P = 16
+ G = 64
+ )
+ runtime.GOMAXPROCS(P)
+
+ f, err := os.CreateTemp("", "prof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+ defer f.Close()
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ for i := 0; i < G; i++ {
+ go func() {
+ for {
+ C.callGo()
+ }
+ }()
+ }
+
+ time.Sleep(time.Second)
+
+ pprof.StopCPUProfile()
+
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/raceprof.go b/src/runtime/testdata/testprogcgo/raceprof.go
new file mode 100644
index 0000000..68cabd4
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/raceprof.go
@@ -0,0 +1,79 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+// +build unix
+
+package main
+
+// Test that we can collect a lot of colliding profiling signals from
+// an external C thread. This used to fail when built with the race
+// detector, because a call of the predeclared function copy was
+// turned into a call to runtime.slicecopy, which is not marked nosplit.
+
+/*
+#include <signal.h>
+#include <stdint.h>
+#include <pthread.h>
+#include <sched.h>
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+static int raceprofCount;
+
+// We want a bunch of different profile stacks that collide in the
+// hash table maintained in runtime/cpuprof.go. This code knows the
+// size of the hash table (1 << 10) and knows that the hash function
+// is simply multiplicative.
+void raceprofTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ raceprofCount++;
+ arg->buf[0] = raceprofCount * (1 << 10);
+ arg->buf[1] = 0;
+}
+
+static void* raceprofThread(void* p) {
+ int i;
+
+ for (i = 0; i < 100; i++) {
+ pthread_kill(pthread_self(), SIGPROF);
+ sched_yield();
+ }
+ return 0;
+}
+
+void runRaceprofThread() {
+ pthread_t tid;
+ pthread_create(&tid, 0, raceprofThread, 0);
+ pthread_join(tid, 0);
+}
+*/
+import "C"
+
+import (
+ "bytes"
+ "fmt"
+ "runtime"
+ "runtime/pprof"
+ "unsafe"
+)
+
+func init() {
+ register("CgoRaceprof", CgoRaceprof)
+}
+
+func CgoRaceprof() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.raceprofTraceback), nil, nil)
+
+ var buf bytes.Buffer
+ pprof.StartCPUProfile(&buf)
+
+ C.runRaceprofThread()
+ fmt.Println("OK")
+}
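
The traceback above relies on the observation that PCs which are multiples of the profile hash table size all fall into the same bucket under a purely multiplicative hash reduced modulo that size. A toy demonstration of that arithmetic; the multiplier 41 is an illustrative assumption, not necessarily the runtime's actual constant:

package main

import "fmt"

func main() {
	const buckets = 1 << 10 // hash table size noted in the comment above
	const multiplier = 41   // illustrative only; any multiplier collides the same way

	for count := uintptr(1); count <= 5; count++ {
		pc := count * buckets                  // the PCs raceprofTraceback reports
		fmt.Println(pc * multiplier % buckets) // always 0: every stack lands in bucket 0
	}
}
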
diff --git a/src/runtime/testdata/testprogcgo/racesig.go b/src/runtime/testdata/testprogcgo/racesig.go
new file mode 100644
index 0000000..0667020
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/racesig.go
@@ -0,0 +1,93 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+// +build unix
+
+package main
+
+// Test that an external C thread that is calling malloc can be hit
+// with SIGCHLD signals. This used to fail when built with the race
+// detector, because in that case the signal handler would indirectly
+// call the C malloc function.
+
+/*
+#include <errno.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <sched.h>
+#include <unistd.h>
+
+#define ALLOCERS 100
+#define SIGNALERS 10
+
+static void* signalThread(void* p) {
+ pthread_t* pt = (pthread_t*)(p);
+ int i, j;
+
+ for (i = 0; i < 100; i++) {
+ for (j = 0; j < ALLOCERS; j++) {
+ if (pthread_kill(pt[j], SIGCHLD) < 0) {
+ return NULL;
+ }
+ }
+ usleep(1);
+ }
+ return NULL;
+}
+
+#define CALLS 100
+
+static void* mallocThread(void* p) {
+ int i;
+ void *a[CALLS];
+
+ for (i = 0; i < ALLOCERS; i++) {
+ sched_yield();
+ }
+ for (i = 0; i < CALLS; i++) {
+ a[i] = malloc(i);
+ }
+ for (i = 0; i < CALLS; i++) {
+ free(a[i]);
+ }
+ return NULL;
+}
+
+void runRaceSignalThread() {
+ int i;
+ pthread_t m[ALLOCERS];
+ pthread_t s[SIGNALERS];
+
+ for (i = 0; i < ALLOCERS; i++) {
+ pthread_create(&m[i], NULL, mallocThread, NULL);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_create(&s[i], NULL, signalThread, &m[0]);
+ }
+ for (i = 0; i < SIGNALERS; i++) {
+ pthread_join(s[i], NULL);
+ }
+ for (i = 0; i < ALLOCERS; i++) {
+ pthread_join(m[i], NULL);
+ }
+}
+*/
+import "C"
+
+import (
+ "fmt"
+)
+
+func init() {
+ register("CgoRaceSignal", CgoRaceSignal)
+}
+
+func CgoRaceSignal() {
+ C.runRaceSignalThread()
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/segv.go b/src/runtime/testdata/testprogcgo/segv.go
new file mode 100644
index 0000000..c776fe6
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/segv.go
@@ -0,0 +1,34 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package main
+
+// #include <unistd.h>
+// static void nop() {}
+import "C"
+
+import "syscall"
+
+func init() {
+ register("SegvInCgo", SegvInCgo)
+}
+
+func SegvInCgo() {
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for {
+ C.nop()
+ }
+ }()
+
+ <-c
+
+ syscall.Kill(syscall.Getpid(), syscall.SIGSEGV)
+
+ // Wait for the OS to deliver the signal.
+ C.pause()
+}
diff --git a/src/runtime/testdata/testprogcgo/segv_linux.go b/src/runtime/testdata/testprogcgo/segv_linux.go
new file mode 100644
index 0000000..517ce72
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/segv_linux.go
@@ -0,0 +1,32 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// #include <unistd.h>
+// static void nop() {}
+import "C"
+
+import "syscall"
+
+func init() {
+ register("TgkillSegvInCgo", TgkillSegvInCgo)
+}
+
+func TgkillSegvInCgo() {
+ c := make(chan bool)
+ go func() {
+ close(c)
+ for {
+ C.nop()
+ }
+ }()
+
+ <-c
+
+ syscall.Tgkill(syscall.Getpid(), syscall.Gettid(), syscall.SIGSEGV)
+
+ // Wait for the OS to deliver the signal.
+ C.pause()
+}
diff --git a/src/runtime/testdata/testprogcgo/sigfwd.go b/src/runtime/testdata/testprogcgo/sigfwd.go
new file mode 100644
index 0000000..f6a0c03
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigfwd.go
@@ -0,0 +1,87 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+package main
+
+import (
+ "fmt"
+ "os"
+)
+
+/*
+#include <signal.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+sig_atomic_t expectCSigsegv;
+int *sigfwdP;
+
+static void sigsegv() {
+ expectCSigsegv = 1;
+ *sigfwdP = 1;
+	fprintf(stderr, "ERROR: C SIGSEGV was not raised.\n");
+ exit(2);
+}
+
+static void segvhandler(int signum) {
+ if (signum == SIGSEGV) {
+ if (expectCSigsegv == 0) {
+ fprintf(stderr, "SIGSEGV caught in C unexpectedly\n");
+ exit(1);
+ }
+ fprintf(stdout, "OK\n");
+ exit(0); // success
+ }
+}
+
+static void __attribute__ ((constructor)) sigsetup(void) {
+ if (getenv("GO_TEST_CGOSIGFWD") == NULL) {
+ return;
+ }
+
+ struct sigaction act;
+
+ memset(&act, 0, sizeof act);
+ act.sa_handler = segvhandler;
+ sigaction(SIGSEGV, &act, NULL);
+}
+*/
+import "C"
+
+func init() {
+ register("CgoSigfwd", CgoSigfwd)
+}
+
+var nilPtr *byte
+
+func f() (ret bool) {
+ defer func() {
+ if recover() == nil {
+ fmt.Fprintf(os.Stderr, "ERROR: couldn't raise SIGSEGV in Go\n")
+ C.exit(2)
+ }
+ ret = true
+ }()
+ *nilPtr = 1
+ return false
+}
+
+func CgoSigfwd() {
+ if os.Getenv("GO_TEST_CGOSIGFWD") == "" {
+ fmt.Fprintf(os.Stderr, "test must be run with GO_TEST_CGOSIGFWD set\n")
+ os.Exit(1)
+ }
+
+ // Test that the signal originating in Go is handled (and recovered) by Go.
+ if !f() {
+ fmt.Fprintf(os.Stderr, "couldn't recover from SIGSEGV in Go.\n")
+ C.exit(2)
+ }
+
+ // Test that the signal originating in C is handled by C.
+ C.sigsegv()
+}
diff --git a/src/runtime/testdata/testprogcgo/sigpanic.go b/src/runtime/testdata/testprogcgo/sigpanic.go
new file mode 100644
index 0000000..cb46030
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigpanic.go
@@ -0,0 +1,28 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// This program will crash.
+// We want to test unwinding from sigpanic into C code (without a C symbolizer).
+
+/*
+#cgo CFLAGS: -O0
+
+char *pnil;
+
+static int f1(void) {
+ *pnil = 0;
+ return 0;
+}
+*/
+import "C"
+
+func init() {
+ register("TracebackSigpanic", TracebackSigpanic)
+}
+
+func TracebackSigpanic() {
+ C.f1()
+}
diff --git a/src/runtime/testdata/testprogcgo/sigstack.go b/src/runtime/testdata/testprogcgo/sigstack.go
new file mode 100644
index 0000000..12ca661
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigstack.go
@@ -0,0 +1,99 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+// Test handling of Go-allocated signal stacks when calling from
+// C-created threads with and without signal stacks. (See issue
+// #22930.)
+
+package main
+
+/*
+#include <pthread.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#ifdef _AIX
+// On AIX, SIGSTKSZ is too small to handle Go sighandler.
+#define CSIGSTKSZ 0x4000
+#else
+#define CSIGSTKSZ SIGSTKSZ
+#endif
+
+extern void SigStackCallback();
+
+static void* WithSigStack(void* arg __attribute__((unused))) {
+ // Set up an alternate system stack.
+ void* base = mmap(0, CSIGSTKSZ, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
+ if (base == MAP_FAILED) {
+ perror("mmap failed");
+ abort();
+ }
+ stack_t st = {}, ost = {};
+ st.ss_sp = (char*)base;
+ st.ss_flags = 0;
+ st.ss_size = CSIGSTKSZ;
+ if (sigaltstack(&st, &ost) < 0) {
+ perror("sigaltstack failed");
+ abort();
+ }
+
+ // Call Go.
+ SigStackCallback();
+
+ // Disable signal stack and protect it so we can detect reuse.
+ if (ost.ss_flags & SS_DISABLE) {
+ // Darwin libsystem has a bug where it checks ss_size
+ // even if SS_DISABLE is set. (The kernel gets it right.)
+ ost.ss_size = CSIGSTKSZ;
+ }
+ if (sigaltstack(&ost, NULL) < 0) {
+ perror("sigaltstack restore failed");
+ abort();
+ }
+ mprotect(base, CSIGSTKSZ, PROT_NONE);
+ return NULL;
+}
+
+static void* WithoutSigStack(void* arg __attribute__((unused))) {
+ SigStackCallback();
+ return NULL;
+}
+
+static void DoThread(int sigstack) {
+ pthread_t tid;
+ if (sigstack) {
+ pthread_create(&tid, NULL, WithSigStack, NULL);
+ } else {
+ pthread_create(&tid, NULL, WithoutSigStack, NULL);
+ }
+ pthread_join(tid, NULL);
+}
+*/
+import "C"
+
+func init() {
+ register("SigStack", SigStack)
+}
+
+func SigStack() {
+ C.DoThread(0)
+ C.DoThread(1)
+ C.DoThread(0)
+ C.DoThread(1)
+ println("OK")
+}
+
+var BadPtr *int
+
+//export SigStackCallback
+func SigStackCallback() {
+ // Cause the Go signal handler to run.
+ defer func() { recover() }()
+ *BadPtr = 42
+}
diff --git a/src/runtime/testdata/testprogcgo/sigthrow.go b/src/runtime/testdata/testprogcgo/sigthrow.go
new file mode 100644
index 0000000..665e3b0
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/sigthrow.go
@@ -0,0 +1,20 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// This program will abort.
+
+/*
+#include <stdlib.h>
+*/
+import "C"
+
+func init() {
+ register("Abort", Abort)
+}
+
+func Abort() {
+ C.abort()
+}
diff --git a/src/runtime/testdata/testprogcgo/stack_windows.go b/src/runtime/testdata/testprogcgo/stack_windows.go
new file mode 100644
index 0000000..0be1126
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/stack_windows.go
@@ -0,0 +1,57 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "C"
+import (
+ "internal/syscall/windows"
+ "runtime"
+ "sync"
+ "syscall"
+ "unsafe"
+)
+
+func init() {
+ register("StackMemory", StackMemory)
+}
+
+func getPagefileUsage() (uintptr, error) {
+ p, err := syscall.GetCurrentProcess()
+ if err != nil {
+ return 0, err
+ }
+ var m windows.PROCESS_MEMORY_COUNTERS
+ err = windows.GetProcessMemoryInfo(p, &m, uint32(unsafe.Sizeof(m)))
+ if err != nil {
+ return 0, err
+ }
+ return m.PagefileUsage, nil
+}
+
+func StackMemory() {
+ mem1, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+ const threadCount = 100
+ var wg sync.WaitGroup
+ for i := 0; i < threadCount; i++ {
+ wg.Add(1)
+ go func() {
+ runtime.LockOSThread()
+ wg.Done()
+ select {}
+ }()
+ }
+ wg.Wait()
+ mem2, err := getPagefileUsage()
+ if err != nil {
+ panic(err)
+ }
+	// This assumes that the process creates one thread for each
+	// thread-locked goroutine, plus roughly 5 extra runtime threads
+	// (sysmon and others).
+ print((mem2 - mem1) / (threadCount + 5))
+}
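
The program prints a single number: the growth in pagefile usage (commit charge) divided by the approximate thread count, i.e. an estimate of committed memory per thread. A driver could parse that number and compare it against a budget; the sketch below assumes the program's captured stdout is in out, and the 100 KiB budget is an illustrative threshold, not the value used by the real runtime tests.

package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

func main() {
	out := "8192" // stand-in for the program's captured stdout
	perThread, err := strconv.ParseUint(strings.TrimSpace(out), 10, 64)
	if err != nil {
		log.Fatalf("parsing StackMemory output %q: %v", out, err)
	}
	const budget = 100 << 10 // 100 KiB per thread; illustrative only
	if perThread > budget {
		log.Fatalf("committed %d bytes per thread, want at most %d", perThread, budget)
	}
	fmt.Printf("%d bytes committed per thread\n", perThread)
}
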
diff --git a/src/runtime/testdata/testprogcgo/stackswitch.c b/src/runtime/testdata/testprogcgo/stackswitch.c
new file mode 100644
index 0000000..3473d5b
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/stackswitch.c
@@ -0,0 +1,147 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix && !android && !openbsd
+
+// Required for darwin ucontext.
+#define _XOPEN_SOURCE
+// Required for netbsd stack_t if _XOPEN_SOURCE is set.
+#define _XOPEN_SOURCE_EXTENDED 1
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
+
+#include <assert.h>
+#include <pthread.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <ucontext.h>
+
+// musl libc does not provide getcontext, etc. Skip the test there.
+//
+// musl libc doesn't provide any direct detection mechanism. So assume any
+// non-glibc linux is using musl.
+//
+// Note that bionic does not provide getcontext either, but that is skipped via
+// the android build tag.
+#if defined(__linux__) && !defined(__GLIBC__)
+#define MUSL 1
+#endif
+#if defined(MUSL)
+void callStackSwitchCallbackFromThread(void) {
+ printf("SKIP\n");
+ exit(0);
+}
+#else
+
+// Use a stack size larger than the 32kb estimate in
+// runtime.callbackUpdateSystemStack. This ensures that a second stack
+// allocation won't accidentally count as in bounds of the first stack
+#define STACK_SIZE (64ull << 10)
+
+static ucontext_t uctx_save, uctx_switch;
+
+extern void stackSwitchCallback(void);
+
+char *stack2;
+
+static void *stackSwitchThread(void *arg) {
+ // Simple test: callback works from the normal system stack.
+ stackSwitchCallback();
+
+ // Next, verify that switching stacks doesn't break callbacks.
+
+ char *stack1 = malloc(STACK_SIZE);
+ if (stack1 == NULL) {
+ perror("malloc");
+ exit(1);
+ }
+
+ // Allocate the second stack before freeing the first to ensure we don't get
+ // the same address from malloc.
+ //
+ // Will be freed in stackSwitchThread2.
+ stack2 = malloc(STACK_SIZE);
+	if (stack2 == NULL) {
+ perror("malloc");
+ exit(1);
+ }
+
+ if (getcontext(&uctx_switch) == -1) {
+ perror("getcontext");
+ exit(1);
+ }
+ uctx_switch.uc_stack.ss_sp = stack1;
+ uctx_switch.uc_stack.ss_size = STACK_SIZE;
+ uctx_switch.uc_link = &uctx_save;
+ makecontext(&uctx_switch, stackSwitchCallback, 0);
+
+ if (swapcontext(&uctx_save, &uctx_switch) == -1) {
+ perror("swapcontext");
+ exit(1);
+ }
+
+ if (getcontext(&uctx_switch) == -1) {
+ perror("getcontext");
+ exit(1);
+ }
+ uctx_switch.uc_stack.ss_sp = stack2;
+ uctx_switch.uc_stack.ss_size = STACK_SIZE;
+ uctx_switch.uc_link = &uctx_save;
+ makecontext(&uctx_switch, stackSwitchCallback, 0);
+
+ if (swapcontext(&uctx_save, &uctx_switch) == -1) {
+ perror("swapcontext");
+ exit(1);
+ }
+
+ free(stack1);
+
+ return NULL;
+}
+
+static void *stackSwitchThread2(void *arg) {
+ // New thread. Use stack bounds that partially overlap the previous
+ // bounds. needm should refresh the stack bounds anyway since this is a
+ // new thread.
+
+ // N.B. since we used a custom stack with makecontext,
+ // callbackUpdateSystemStack had to guess the bounds. Its guess assumes
+ // a 32KiB stack.
+ char *prev_stack_lo = stack2 + STACK_SIZE - (32*1024);
+
+ // New SP is just barely in bounds, but if we don't update the bounds
+ // we'll almost certainly overflow. The SP that
+ // callbackUpdateSystemStack sees already has some data pushed, so it
+ // will be a bit below what we set here. Thus we include some slack.
+ char *new_stack_hi = prev_stack_lo + 128;
+
+ if (getcontext(&uctx_switch) == -1) {
+ perror("getcontext");
+ exit(1);
+ }
+ uctx_switch.uc_stack.ss_sp = new_stack_hi - (STACK_SIZE / 2);
+ uctx_switch.uc_stack.ss_size = STACK_SIZE / 2;
+ uctx_switch.uc_link = &uctx_save;
+ makecontext(&uctx_switch, stackSwitchCallback, 0);
+
+ if (swapcontext(&uctx_save, &uctx_switch) == -1) {
+ perror("swapcontext");
+ exit(1);
+ }
+
+ free(stack2);
+
+ return NULL;
+}
+
+void callStackSwitchCallbackFromThread(void) {
+ pthread_t thread;
+ assert(pthread_create(&thread, NULL, stackSwitchThread, NULL) == 0);
+ assert(pthread_join(thread, NULL) == 0);
+
+ assert(pthread_create(&thread, NULL, stackSwitchThread2, NULL) == 0);
+ assert(pthread_join(thread, NULL) == 0);
+}
+
+#endif
diff --git a/src/runtime/testdata/testprogcgo/stackswitch.go b/src/runtime/testdata/testprogcgo/stackswitch.go
new file mode 100644
index 0000000..a2e422f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/stackswitch.go
@@ -0,0 +1,43 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix && !android && !openbsd
+
+package main
+
+/*
+void callStackSwitchCallbackFromThread(void);
+*/
+import "C"
+
+import (
+ "fmt"
+ "runtime/debug"
+)
+
+func init() {
+ register("StackSwitchCallback", StackSwitchCallback)
+}
+
+//export stackSwitchCallback
+func stackSwitchCallback() {
+ // We want to trigger a bounds check on the g0 stack. To do this, we
+ // need to call a splittable function through systemstack().
+ // SetGCPercent contains such a systemstack call.
+ gogc := debug.SetGCPercent(100)
+ debug.SetGCPercent(gogc)
+}
+
+
+// Regression test for https://go.dev/issue/62440. It should be possible for C
+// threads to call into Go from different stacks without crashing due to g0
+// stack bounds checks.
+//
+// N.B. This is only OK for threads created in C. Threads with Go frames up the
+// stack must not change the stack out from under us.
+func StackSwitchCallback() {
+ C.callStackSwitchCallbackFromThread();
+
+ fmt.Printf("OK\n")
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic.go b/src/runtime/testdata/testprogcgo/threadpanic.go
new file mode 100644
index 0000000..2d24fe6
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic.go
@@ -0,0 +1,25 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9
+// +build !plan9
+
+package main
+
+// void start(void);
+import "C"
+
+func init() {
+ register("CgoExternalThreadPanic", CgoExternalThreadPanic)
+}
+
+func CgoExternalThreadPanic() {
+ C.start()
+ select {}
+}
+
+//export gopanic
+func gopanic() {
+ panic("BOOM")
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic_unix.c b/src/runtime/testdata/testprogcgo/threadpanic_unix.c
new file mode 100644
index 0000000..c426452
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic_unix.c
@@ -0,0 +1,26 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !plan9,!windows
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <pthread.h>
+
+void gopanic(void);
+
+static void*
+die(void* x)
+{
+ gopanic();
+ return 0;
+}
+
+void
+start(void)
+{
+ pthread_t t;
+ if(pthread_create(&t, 0, die, 0) != 0)
+ printf("pthread_create failed\n");
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpanic_windows.c b/src/runtime/testdata/testprogcgo/threadpanic_windows.c
new file mode 100644
index 0000000..ba66d0f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpanic_windows.c
@@ -0,0 +1,23 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <process.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+void gopanic(void);
+
+static unsigned int __attribute__((__stdcall__))
+die(void* x)
+{
+ gopanic();
+ return 0;
+}
+
+void
+start(void)
+{
+ if(_beginthreadex(0, 0, die, 0, 0, 0) != 0)
+ printf("_beginthreadex failed\n");
+}
diff --git a/src/runtime/testdata/testprogcgo/threadpprof.go b/src/runtime/testdata/testprogcgo/threadpprof.go
new file mode 100644
index 0000000..70717e0
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadpprof.go
@@ -0,0 +1,128 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+// Run a slow C function saving a CPU profile.
+
+/*
+#include <stdint.h>
+#include <time.h>
+#include <pthread.h>
+
+int threadSalt1;
+int threadSalt2;
+
+static pthread_t tid;
+
+void cpuHogThread() {
+ int foo = threadSalt1;
+ int i;
+
+ for (i = 0; i < 100000; i++) {
+ if (foo > 0) {
+ foo *= foo;
+ } else {
+ foo *= foo + 1;
+ }
+ }
+ threadSalt2 = foo;
+}
+
+void cpuHogThread2() {
+}
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+// pprofCgoThreadTraceback is passed to runtime.SetCgoTraceback.
+// For testing purposes it pretends that all CPU hits on the cpuHog
+// C thread are in cpuHog.
+void pprofCgoThreadTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ if (pthread_self() == tid) {
+ arg->buf[0] = (uintptr_t)(cpuHogThread) + 0x10;
+ arg->buf[1] = (uintptr_t)(cpuHogThread2) + 0x4;
+ arg->buf[2] = 0;
+ } else
+ arg->buf[0] = 0;
+}
+
+static void* cpuHogDriver(void* arg __attribute__ ((unused))) {
+ while (1) {
+ cpuHogThread();
+ }
+ return 0;
+}
+
+void runCPUHogThread(void) {
+ pthread_create(&tid, 0, cpuHogDriver, 0);
+}
+*/
+import "C"
+
+import (
+ "context"
+ "fmt"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoPprofThread", CgoPprofThread)
+ register("CgoPprofThreadNoTraceback", CgoPprofThreadNoTraceback)
+}
+
+func CgoPprofThread() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.pprofCgoThreadTraceback), nil, nil)
+ pprofThread()
+}
+
+func CgoPprofThreadNoTraceback() {
+ pprofThread()
+}
+
+func pprofThread() {
+ f, err := os.CreateTemp("", "prof")
+ if err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ if err := pprof.StartCPUProfile(f); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ // This goroutine may receive a profiling signal while creating the C-owned
+ // thread. If it does, the SetCgoTraceback handler will make the leaf end of
+ // the stack look almost (but not exactly) like the stacks the test case is
+ // trying to find. Attach a profiler label so the test can filter out those
+ // confusing samples.
+ pprof.Do(context.Background(), pprof.Labels("ignore", "ignore"), func(ctx context.Context) {
+ C.runCPUHogThread()
+ })
+
+ time.Sleep(1 * time.Second)
+
+ pprof.StopCPUProfile()
+
+ name := f.Name()
+ if err := f.Close(); err != nil {
+ fmt.Fprintln(os.Stderr, err)
+ os.Exit(2)
+ }
+
+ fmt.Println(name)
+}
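
The pprof.Do call above tags every CPU sample taken while the closure runs with the label ignore=ignore, which the driving test can use to discard confusing samples. A minimal sketch of attaching a label and reading it back on the same goroutine, using only the exported runtime/pprof API:

package main

import (
	"context"
	"fmt"
	"runtime/pprof"
)

func main() {
	pprof.Do(context.Background(), pprof.Labels("ignore", "ignore"), func(ctx context.Context) {
		// Any CPU samples taken here carry the label; it is also visible via the context.
		if v, ok := pprof.Label(ctx, "ignore"); ok {
			fmt.Println("label ignore =", v)
		}
	})
}
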
diff --git a/src/runtime/testdata/testprogcgo/threadprof.go b/src/runtime/testdata/testprogcgo/threadprof.go
new file mode 100644
index 0000000..00b511d
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/threadprof.go
@@ -0,0 +1,105 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !plan9 && !windows
+// +build !plan9,!windows
+
+package main
+
+/*
+#include <stdint.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <pthread.h>
+
+volatile int32_t spinlock;
+
+// Note that this thread is only started if GO_START_SIGPROF_THREAD
+// is set in the environment, which is only done when running the
+// CgoExternalThreadSIGPROF test.
+static void *thread1(void *p) {
+ (void)p;
+ while (spinlock == 0)
+ ;
+ pthread_kill(pthread_self(), SIGPROF);
+ spinlock = 0;
+ return NULL;
+}
+
+// This constructor function is run when the program starts.
+// It is used for the CgoExternalThreadSIGPROF test.
+__attribute__((constructor)) void issue9456() {
+ if (getenv("GO_START_SIGPROF_THREAD") != NULL) {
+ pthread_t tid;
+ pthread_create(&tid, 0, thread1, NULL);
+ }
+}
+
+void **nullptr;
+
+void *crash(void *p) {
+ *nullptr = p;
+ return 0;
+}
+
+int start_crashing_thread(void) {
+ pthread_t tid;
+ return pthread_create(&tid, 0, crash, 0);
+}
+*/
+import "C"
+
+import (
+ "fmt"
+ "os"
+ "os/exec"
+ "runtime"
+ "sync/atomic"
+ "time"
+ "unsafe"
+)
+
+func init() {
+ register("CgoExternalThreadSIGPROF", CgoExternalThreadSIGPROF)
+ register("CgoExternalThreadSignal", CgoExternalThreadSignal)
+}
+
+func CgoExternalThreadSIGPROF() {
+ // This test intends to test that sending SIGPROF to foreign threads
+ // before we make any cgo call will not abort the whole process, so
+ // we cannot make any cgo call here. See https://golang.org/issue/9456.
+ atomic.StoreInt32((*int32)(unsafe.Pointer(&C.spinlock)), 1)
+ for atomic.LoadInt32((*int32)(unsafe.Pointer(&C.spinlock))) == 1 {
+ runtime.Gosched()
+ }
+ println("OK")
+}
+
+func CgoExternalThreadSignal() {
+ if len(os.Args) > 2 && os.Args[2] == "crash" {
+ i := C.start_crashing_thread()
+ if i != 0 {
+ fmt.Println("pthread_create failed:", i)
+ // Exit with 0 because parent expects us to crash.
+ return
+ }
+
+ // We should crash immediately, but give it plenty of
+ // time before failing (by exiting 0) in case we are
+ // running on a slow system.
+ time.Sleep(5 * time.Second)
+ return
+ }
+
+ cmd := exec.Command(os.Args[0], "CgoExternalThreadSignal", "crash")
+ cmd.Dir = os.TempDir() // put any core file in tempdir
+ out, err := cmd.CombinedOutput()
+ if err == nil {
+ fmt.Println("C signal did not crash as expected")
+ fmt.Printf("\n%s\n", out)
+ os.Exit(1)
+ }
+
+ fmt.Println("OK")
+}
diff --git a/src/runtime/testdata/testprogcgo/trace.go b/src/runtime/testdata/testprogcgo/trace.go
new file mode 100644
index 0000000..875434b
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/trace.go
@@ -0,0 +1,60 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+/*
+// Defined in trace_*.c.
+void cCalledFromGo(void);
+*/
+import "C"
+import (
+ "context"
+ "fmt"
+ "log"
+ "os"
+ "runtime/trace"
+)
+
+func init() {
+ register("Trace", Trace)
+}
+
+// Trace is used by TestTraceUnwindCGO.
+func Trace() {
+ file, err := os.CreateTemp("", "testprogcgo_trace")
+ if err != nil {
+ log.Fatalf("failed to create temp file: %s", err)
+ }
+ defer file.Close()
+
+ if err := trace.Start(file); err != nil {
+ log.Fatal(err)
+ }
+ defer trace.Stop()
+
+ goCalledFromGo()
+ <-goCalledFromCThreadChan
+
+ fmt.Printf("trace path:%s", file.Name())
+}
+
+// goCalledFromGo calls cCalledFromGo which calls back into goCalledFromC and
+// goCalledFromCThread.
+func goCalledFromGo() {
+ C.cCalledFromGo()
+}
+
+//export goCalledFromC
+func goCalledFromC() {
+ trace.Log(context.Background(), "goCalledFromC", "")
+}
+
+var goCalledFromCThreadChan = make(chan struct{})
+
+//export goCalledFromCThread
+func goCalledFromCThread() {
+ trace.Log(context.Background(), "goCalledFromCThread", "")
+ close(goCalledFromCThreadChan)
+}
diff --git a/src/runtime/testdata/testprogcgo/trace_unix.c b/src/runtime/testdata/testprogcgo/trace_unix.c
new file mode 100644
index 0000000..0fa55c7
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/trace_unix.c
@@ -0,0 +1,27 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build unix
+
+// The unix C definitions for trace.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <pthread.h>
+#include <assert.h>
+
+extern void goCalledFromC(void);
+extern void goCalledFromCThread(void);
+
+static void* cCalledFromCThread(void *p) {
+ goCalledFromCThread();
+ return NULL;
+}
+
+void cCalledFromGo(void) {
+ goCalledFromC();
+
+ pthread_t thread;
+ assert(pthread_create(&thread, NULL, cCalledFromCThread, NULL) == 0);
+ assert(pthread_join(thread, NULL) == 0);
+}
diff --git a/src/runtime/testdata/testprogcgo/trace_windows.c b/src/runtime/testdata/testprogcgo/trace_windows.c
new file mode 100644
index 0000000..7758054
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/trace_windows.c
@@ -0,0 +1,29 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The windows C definitions for trace.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#include <process.h>
+#include "_cgo_export.h"
+
+extern void goCalledFromC(void);
+extern void goCalledFromCThread(void);
+
+__stdcall
+static unsigned int cCalledFromCThread(void *p) {
+ goCalledFromCThread();
+ return 0;
+}
+
+void cCalledFromGo(void) {
+ goCalledFromC();
+
+ uintptr_t thread;
+ thread = _beginthreadex(NULL, 0, cCalledFromCThread, NULL, 0, NULL);
+ WaitForSingleObject((HANDLE)thread, INFINITE);
+ CloseHandle((HANDLE)thread);
+}
diff --git a/src/runtime/testdata/testprogcgo/traceback.go b/src/runtime/testdata/testprogcgo/traceback.go
new file mode 100644
index 0000000..e2d7599
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/traceback.go
@@ -0,0 +1,54 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// This program will crash.
+// We want the stack trace to include the C functions.
+// We use a fake traceback, and a symbolizer that dumps a string we recognize.
+
+/*
+#cgo CFLAGS: -g -O0
+
+// Defined in traceback_c.c.
+extern int crashInGo;
+int tracebackF1(void);
+void cgoTraceback(void* parg);
+void cgoSymbolizer(void* parg);
+*/
+import "C"
+
+import (
+ "runtime"
+ "unsafe"
+)
+
+func init() {
+ register("CrashTraceback", CrashTraceback)
+ register("CrashTracebackGo", CrashTracebackGo)
+}
+
+func CrashTraceback() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.cgoTraceback), nil, unsafe.Pointer(C.cgoSymbolizer))
+ C.tracebackF1()
+}
+
+func CrashTracebackGo() {
+ C.crashInGo = 1
+ CrashTraceback()
+}
+
+//export h1
+func h1() {
+ h2()
+}
+
+func h2() {
+ h3()
+}
+
+func h3() {
+ var x *int
+ *x = 0
+}
diff --git a/src/runtime/testdata/testprogcgo/traceback_c.c b/src/runtime/testdata/testprogcgo/traceback_c.c
new file mode 100644
index 0000000..56eda8f
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/traceback_c.c
@@ -0,0 +1,65 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The C definitions for traceback.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <stdint.h>
+
+char *p;
+
+int crashInGo;
+extern void h1(void);
+
+int tracebackF3(void) {
+ if (crashInGo)
+ h1();
+ else
+ *p = 0;
+ return 0;
+}
+
+int tracebackF2(void) {
+ return tracebackF3();
+}
+
+int tracebackF1(void) {
+ return tracebackF2();
+}
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+struct cgoSymbolizerArg {
+ uintptr_t pc;
+ const char* file;
+ uintptr_t lineno;
+ const char* func;
+ uintptr_t entry;
+ uintptr_t more;
+ uintptr_t data;
+};
+
+void cgoTraceback(void* parg) {
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ arg->buf[0] = 1;
+ arg->buf[1] = 2;
+ arg->buf[2] = 3;
+ arg->buf[3] = 0;
+}
+
+void cgoSymbolizer(void* parg) {
+ struct cgoSymbolizerArg* arg = (struct cgoSymbolizerArg*)(parg);
+ if (arg->pc != arg->data + 1) {
+ arg->file = "unexpected data";
+ } else {
+ arg->file = "cgo symbolizer";
+ }
+ arg->lineno = arg->data + 1;
+ arg->data++;
+}
diff --git a/src/runtime/testdata/testprogcgo/tracebackctxt.go b/src/runtime/testdata/testprogcgo/tracebackctxt.go
new file mode 100644
index 0000000..62ff8ec
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/tracebackctxt.go
@@ -0,0 +1,136 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+// Test the context argument to SetCgoTraceback.
+// Use fake context, traceback, and symbolizer functions.
+
+/*
+// Defined in tracebackctxt_c.c.
+extern void C1(void);
+extern void C2(void);
+extern void tcContext(void*);
+extern void tcContextSimple(void*);
+extern void tcTraceback(void*);
+extern void tcSymbolizer(void*);
+extern int getContextCount(void);
+extern void TracebackContextPreemptionCallGo(int);
+*/
+import "C"
+
+import (
+ "fmt"
+ "runtime"
+ "sync"
+ "unsafe"
+)
+
+func init() {
+ register("TracebackContext", TracebackContext)
+ register("TracebackContextPreemption", TracebackContextPreemption)
+}
+
+var tracebackOK bool
+
+func TracebackContext() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.tcTraceback), unsafe.Pointer(C.tcContext), unsafe.Pointer(C.tcSymbolizer))
+ C.C1()
+ if got := C.getContextCount(); got != 0 {
+ fmt.Printf("at end contextCount == %d, expected 0\n", got)
+ tracebackOK = false
+ }
+ if tracebackOK {
+ fmt.Println("OK")
+ }
+}
+
+//export G1
+func G1() {
+ C.C2()
+}
+
+//export G2
+func G2() {
+ pc := make([]uintptr, 32)
+ n := runtime.Callers(0, pc)
+ cf := runtime.CallersFrames(pc[:n])
+ var frames []runtime.Frame
+ for {
+ frame, more := cf.Next()
+ frames = append(frames, frame)
+ if !more {
+ break
+ }
+ }
+
+ want := []struct {
+ function string
+ line int
+ }{
+ {"main.G2", 0},
+ {"cFunction", 0x10200},
+ {"cFunction", 0x200},
+ {"cFunction", 0x10201},
+ {"cFunction", 0x201},
+ {"main.G1", 0},
+ {"cFunction", 0x10100},
+ {"cFunction", 0x100},
+ {"main.TracebackContext", 0},
+ }
+
+ ok := true
+ i := 0
+wantLoop:
+ for _, w := range want {
+ for ; i < len(frames); i++ {
+ if w.function == frames[i].Function {
+ if w.line != 0 && w.line != frames[i].Line {
+ fmt.Printf("found function %s at wrong line %#x (expected %#x)\n", w.function, frames[i].Line, w.line)
+ ok = false
+ }
+ i++
+ continue wantLoop
+ }
+ }
+ fmt.Printf("did not find function %s in\n", w.function)
+ for _, f := range frames {
+ fmt.Println(f)
+ }
+ ok = false
+ break
+ }
+ tracebackOK = ok
+ if got := C.getContextCount(); got != 2 {
+ fmt.Printf("at bottom contextCount == %d, expected 2\n", got)
+ tracebackOK = false
+ }
+}
+
+// Issue 47441.
+func TracebackContextPreemption() {
+ runtime.SetCgoTraceback(0, unsafe.Pointer(C.tcTraceback), unsafe.Pointer(C.tcContextSimple), unsafe.Pointer(C.tcSymbolizer))
+
+ const funcs = 10
+ const calls = 1e5
+ var wg sync.WaitGroup
+ for i := 0; i < funcs; i++ {
+ wg.Add(1)
+ go func(i int) {
+ defer wg.Done()
+ for j := 0; j < calls; j++ {
+ C.TracebackContextPreemptionCallGo(C.int(i*calls + j))
+ }
+ }(i)
+ }
+ wg.Wait()
+
+ fmt.Println("OK")
+}
+
+//export TracebackContextPreemptionGoFunction
+func TracebackContextPreemptionGoFunction(i C.int) {
+ // Do some busy work.
+ fmt.Sprintf("%d\n", i)
+}
diff --git a/src/runtime/testdata/testprogcgo/tracebackctxt_c.c b/src/runtime/testdata/testprogcgo/tracebackctxt_c.c
new file mode 100644
index 0000000..910cb7b
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/tracebackctxt_c.c
@@ -0,0 +1,103 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// The C definitions for tracebackctxt.go. That file uses //export so
+// it can't put function definitions in the "C" import comment.
+
+#include <stdlib.h>
+#include <stdint.h>
+
+// Functions exported from Go.
+extern void G1(void);
+extern void G2(void);
+extern void TracebackContextPreemptionGoFunction(int);
+
+void C1() {
+ G1();
+}
+
+void C2() {
+ G2();
+}
+
+struct cgoContextArg {
+ uintptr_t context;
+};
+
+struct cgoTracebackArg {
+ uintptr_t context;
+ uintptr_t sigContext;
+ uintptr_t* buf;
+ uintptr_t max;
+};
+
+struct cgoSymbolizerArg {
+ uintptr_t pc;
+ const char* file;
+ uintptr_t lineno;
+ const char* func;
+ uintptr_t entry;
+ uintptr_t more;
+ uintptr_t data;
+};
+
+// Uses atomic adds and subtracts to catch the possibility of
+// erroneous calls from multiple threads; that should be impossible in
+// this test case, but we check just in case.
+static int contextCount;
+
+int getContextCount() {
+ return __sync_add_and_fetch(&contextCount, 0);
+}
+
+void tcContext(void* parg) {
+ struct cgoContextArg* arg = (struct cgoContextArg*)(parg);
+ if (arg->context == 0) {
+ arg->context = __sync_add_and_fetch(&contextCount, 1);
+ } else {
+ if (arg->context != __sync_add_and_fetch(&contextCount, 0)) {
+ abort();
+ }
+ __sync_sub_and_fetch(&contextCount, 1);
+ }
+}
+
+void tcContextSimple(void* parg) {
+ struct cgoContextArg* arg = (struct cgoContextArg*)(parg);
+ if (arg->context == 0) {
+ arg->context = 1;
+ }
+}
+
+void tcTraceback(void* parg) {
+ int base, i;
+ struct cgoTracebackArg* arg = (struct cgoTracebackArg*)(parg);
+ if (arg->context == 0 && arg->sigContext == 0) {
+ // This shouldn't happen in this program.
+ abort();
+ }
+ // Return a variable number of PC values.
+ base = arg->context << 8;
+ for (i = 0; i < arg->context; i++) {
+ if (i < arg->max) {
+ arg->buf[i] = base + i;
+ }
+ }
+}
+
+void tcSymbolizer(void *parg) {
+ struct cgoSymbolizerArg* arg = (struct cgoSymbolizerArg*)(parg);
+ if (arg->pc == 0) {
+ return;
+ }
+ // Report two lines per PC returned by traceback, to test more handling.
+ arg->more = arg->file == NULL;
+ arg->file = "tracebackctxt.go";
+ arg->func = "cFunction";
+ arg->lineno = arg->pc + (arg->more << 16);
+}
+
+void TracebackContextPreemptionCallGo(int i) {
+ TracebackContextPreemptionGoFunction(i);
+}
diff --git a/src/runtime/testdata/testprogcgo/windows/win.go b/src/runtime/testdata/testprogcgo/windows/win.go
new file mode 100644
index 0000000..9d9f86c
--- /dev/null
+++ b/src/runtime/testdata/testprogcgo/windows/win.go
@@ -0,0 +1,14 @@
+package windows
+
+/*
+#include <windows.h>
+
+DWORD agetthread() {
+ return GetCurrentThreadId();
+}
+*/
+import "C"
+
+func GetThread() uint32 {
+ return uint32(C.agetthread())
+}
diff --git a/src/runtime/testdata/testprognet/main.go b/src/runtime/testdata/testprognet/main.go
new file mode 100644
index 0000000..ae491a2
--- /dev/null
+++ b/src/runtime/testdata/testprognet/main.go
@@ -0,0 +1,35 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+ if cmds[name] != nil {
+ panic("duplicate registration: " + name)
+ }
+ cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+ if len(os.Args) >= 2 && os.Args[1] == name {
+ f()
+ }
+}
+
+func main() {
+ if len(os.Args) < 2 {
+ println("usage: " + os.Args[0] + " name-of-test")
+ return
+ }
+ f := cmds[os.Args[1]]
+ if f == nil {
+ println("unknown function: " + os.Args[1])
+ return
+ }
+ f()
+}
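
The register/registerInit pattern above is how each testprog-style binary exposes its test cases: a driver runs the binary with the case name as the first argument and checks its output. A minimal sketch of such an invocation, assuming the package has already been built to ./testprognet:

package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// SignalIgnoreSIGTRAP is registered in signal.go above and prints "OK".
	out, err := exec.Command("./testprognet", "SignalIgnoreSIGTRAP").CombinedOutput()
	if err != nil {
		log.Fatalf("testprognet failed: %v\n%s", err, out)
	}
	if string(out) != "OK\n" {
		log.Fatalf("unexpected output: %q", out)
	}
	fmt.Println("subprocess reported OK")
}
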
diff --git a/src/runtime/testdata/testprognet/net.go b/src/runtime/testdata/testprognet/net.go
new file mode 100644
index 0000000..714b101
--- /dev/null
+++ b/src/runtime/testdata/testprognet/net.go
@@ -0,0 +1,29 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "net"
+)
+
+func init() {
+ registerInit("NetpollDeadlock", NetpollDeadlockInit)
+ register("NetpollDeadlock", NetpollDeadlock)
+}
+
+func NetpollDeadlockInit() {
+ fmt.Println("dialing")
+ c, err := net.Dial("tcp", "localhost:14356")
+ if err == nil {
+ c.Close()
+ } else {
+ fmt.Println("error: ", err)
+ }
+}
+
+func NetpollDeadlock() {
+ fmt.Println("done")
+}
diff --git a/src/runtime/testdata/testprognet/signal.go b/src/runtime/testdata/testprognet/signal.go
new file mode 100644
index 0000000..dfa2e10
--- /dev/null
+++ b/src/runtime/testdata/testprognet/signal.go
@@ -0,0 +1,27 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows && !plan9
+// +build !windows,!plan9
+
+// This is in testprognet instead of testprog because testprog
+// must not import anything (like net, but also like os/signal)
+// that kicks off background goroutines during init.
+
+package main
+
+import (
+ "os/signal"
+ "syscall"
+)
+
+func init() {
+ register("SignalIgnoreSIGTRAP", SignalIgnoreSIGTRAP)
+}
+
+func SignalIgnoreSIGTRAP() {
+ signal.Ignore(syscall.SIGTRAP)
+ syscall.Kill(syscall.Getpid(), syscall.SIGTRAP)
+ println("OK")
+}
diff --git a/src/runtime/testdata/testprognet/signalexec.go b/src/runtime/testdata/testprognet/signalexec.go
new file mode 100644
index 0000000..62ebce7
--- /dev/null
+++ b/src/runtime/testdata/testprognet/signalexec.go
@@ -0,0 +1,71 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build darwin || dragonfly || freebsd || linux || netbsd || openbsd
+// +build darwin dragonfly freebsd linux netbsd openbsd
+
+// This is in testprognet instead of testprog because testprog
+// must not import anything (like net, but also like os/signal)
+// that kicks off background goroutines during init.
+
+package main
+
+import (
+ "fmt"
+ "os"
+ "os/exec"
+ "os/signal"
+ "sync"
+ "syscall"
+ "time"
+)
+
+func init() {
+ register("SignalDuringExec", SignalDuringExec)
+ register("Nop", Nop)
+}
+
+func SignalDuringExec() {
+ pgrp := syscall.Getpgrp()
+
+ const tries = 10
+
+ var wg sync.WaitGroup
+ c := make(chan os.Signal, tries)
+ signal.Notify(c, syscall.SIGWINCH)
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ for range c {
+ }
+ }()
+
+ for i := 0; i < tries; i++ {
+ time.Sleep(time.Microsecond)
+ wg.Add(2)
+ go func() {
+ defer wg.Done()
+ cmd := exec.Command(os.Args[0], "Nop")
+ cmd.Stdout = os.Stdout
+ cmd.Stderr = os.Stderr
+ if err := cmd.Run(); err != nil {
+ fmt.Printf("Start failed: %v", err)
+ }
+ }()
+ go func() {
+ defer wg.Done()
+ syscall.Kill(-pgrp, syscall.SIGWINCH)
+ }()
+ }
+
+ signal.Stop(c)
+ close(c)
+ wg.Wait()
+
+ fmt.Println("OK")
+}
+
+func Nop() {
+ // This is just for SignalDuringExec.
+}
diff --git a/src/runtime/testdata/testsuid/main.go b/src/runtime/testdata/testsuid/main.go
new file mode 100644
index 0000000..1949d2d
--- /dev/null
+++ b/src/runtime/testdata/testsuid/main.go
@@ -0,0 +1,25 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import (
+ "fmt"
+ "log"
+ "os"
+)
+
+func main() {
+ if os.Geteuid() == os.Getuid() {
+ os.Exit(99)
+ }
+
+ fmt.Fprintf(os.Stdout, "GOTRACEBACK=%s\n", os.Getenv("GOTRACEBACK"))
+ f, err := os.OpenFile(os.Getenv("TEST_OUTPUT"), os.O_CREATE|os.O_RDWR, 0600)
+ if err != nil {
+		log.Fatalf("os.OpenFile failed: %s", err)
+ }
+ defer f.Close()
+ fmt.Fprintf(os.Stderr, "hello\n")
+}
diff --git a/src/runtime/testdata/testwinlib/main.c b/src/runtime/testdata/testwinlib/main.c
new file mode 100644
index 0000000..55ee657
--- /dev/null
+++ b/src/runtime/testdata/testwinlib/main.c
@@ -0,0 +1,67 @@
+#include <stdio.h>
+#include <windows.h>
+#include "testwinlib.h"
+
+int exceptionCount;
+int continueCount;
+LONG WINAPI customExceptionHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
+{
+ if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT)
+ {
+ exceptionCount++;
+ // prepare context to resume execution
+ CONTEXT *c = ExceptionInfo->ContextRecord;
+#ifdef _AMD64_
+ c->Rip = *(DWORD64 *)c->Rsp;
+ c->Rsp += 8;
+#elif defined(_X86_)
+ c->Eip = *(DWORD *)c->Esp;
+ c->Esp += 4;
+#else
+ c->Pc = c->Lr;
+#endif
+ return EXCEPTION_CONTINUE_EXECUTION;
+ }
+ return EXCEPTION_CONTINUE_SEARCH;
+}
+LONG WINAPI customContinueHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
+{
+ if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_BREAKPOINT)
+ {
+ continueCount++;
+ return EXCEPTION_CONTINUE_EXECUTION;
+ }
+ return EXCEPTION_CONTINUE_SEARCH;
+}
+
+void throwFromC()
+{
+ DebugBreak();
+}
+int main()
+{
+    // Simulate a "lazily" attached debugger by calling some Go code before attaching the exception/continue handlers.
+ Dummy();
+ exceptionCount = 0;
+ continueCount = 0;
+    void *exceptionHandlerHandle = AddVectoredExceptionHandler(0, customExceptionHandler);
+ if (NULL == exceptionHandlerHandle)
+ {
+ printf("cannot add vectored exception handler\n");
+ fflush(stdout);
+ return 2;
+ }
+    void *continueHandlerHandle = AddVectoredContinueHandler(0, customContinueHandler);
+ if (NULL == continueHandlerHandle)
+ {
+ printf("cannot add vectored continue handler\n");
+ fflush(stdout);
+ return 2;
+ }
+ CallMeBack(throwFromC);
+ RemoveVectoredContinueHandler(continueHandlerHandle);
+ RemoveVectoredExceptionHandler(exceptionHandlerHandle);
+ printf("exceptionCount: %d\ncontinueCount: %d\n", exceptionCount, continueCount);
+ fflush(stdout);
+ return 0;
+}
diff --git a/src/runtime/testdata/testwinlib/main.go b/src/runtime/testdata/testwinlib/main.go
new file mode 100644
index 0000000..407331b
--- /dev/null
+++ b/src/runtime/testdata/testwinlib/main.go
@@ -0,0 +1,31 @@
+//go:build windows && cgo
+// +build windows,cgo
+
+package main
+
+// #include <windows.h>
+// typedef void(*callmeBackFunc)();
+// static void bridgeCallback(callmeBackFunc callback) {
+// callback();
+//}
+import "C"
+
+// CallMeBack call backs C code.
+//
+//export CallMeBack
+func CallMeBack(callback C.callmeBackFunc) {
+ C.bridgeCallback(callback)
+}
+
+// Dummy is called by the C code before registering the exception/continue handlers that simulate a debugger.
+// This makes sure that the Go runtime's lastcontinuehandler is reached before the C continue handler and thus
+// validates that it does not crash the program before another handler can take action.
+// The idea here is to reproduce what happens when you attach a debugger to a running program.
+// It also simulates the behavior of the .NET debugger, which registers its exception/continue handlers lazily.
+//
+//export Dummy
+func Dummy() int {
+ return 42
+}
+
+func main() {}
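
The C harness in main.c includes testwinlib.h and links against the exported CallMeBack and Dummy symbols, so the Go side has to be built as a C library; building with -buildmode=c-archive (or c-shared) is what generates that header. A hedged sketch of the build step, expressed as a small Go driver since the exact commands used by the runtime test are not shown here; the output name is an assumption.

package main

import (
	"log"
	"os/exec"
)

func main() {
	// Building the Go package as a C archive emits both testwinlib.a and the
	// testwinlib.h header that main.c includes. (c-shared would work similarly.)
	cmd := exec.Command("go", "build", "-buildmode=c-archive", "-o", "testwinlib.a", ".")
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("go build -buildmode=c-archive: %v\n%s", err, out)
	}
	// main.c is then compiled against testwinlib.a with the platform C
	// compiler; the exact compiler invocation is omitted here.
}
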
diff --git a/src/runtime/testdata/testwinlibsignal/dummy.go b/src/runtime/testdata/testwinlibsignal/dummy.go
new file mode 100644
index 0000000..e610f15
--- /dev/null
+++ b/src/runtime/testdata/testwinlibsignal/dummy.go
@@ -0,0 +1,13 @@
+//go:build windows
+// +build windows
+
+package main
+
+import "C"
+
+//export Dummy
+func Dummy() int {
+ return 42
+}
+
+func main() {}
diff --git a/src/runtime/testdata/testwinlibsignal/main.c b/src/runtime/testdata/testwinlibsignal/main.c
new file mode 100644
index 0000000..37f2482
--- /dev/null
+++ b/src/runtime/testdata/testwinlibsignal/main.c
@@ -0,0 +1,57 @@
+#include <windows.h>
+#include <stdio.h>
+
+HANDLE waitForCtrlBreakEvent;
+
+BOOL WINAPI CtrlHandler(DWORD fdwCtrlType)
+{
+ switch (fdwCtrlType)
+ {
+ case CTRL_BREAK_EVENT:
+ SetEvent(waitForCtrlBreakEvent);
+ return TRUE;
+ default:
+ return FALSE;
+ }
+}
+
+int main(void)
+{
+ waitForCtrlBreakEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
+ if (!waitForCtrlBreakEvent) {
+ fprintf(stderr, "ERROR: Could not create event\n");
+ return 1;
+ }
+
+ if (!SetConsoleCtrlHandler(CtrlHandler, TRUE))
+ {
+ fprintf(stderr, "ERROR: Could not set control handler\n");
+ return 1;
+ }
+
+ // The library must be loaded after the SetConsoleCtrlHandler call
+ // so that the library handler registers after the main program.
+ // This way the library handler gets called first.
+ HMODULE dummyDll = LoadLibrary("dummy.dll");
+ if (!dummyDll) {
+ fprintf(stderr, "ERROR: Could not load dummy.dll\n");
+ return 1;
+ }
+
+ // Call the Dummy function so that Go initialization completes, since
+ // all cgo entry points call out to _cgo_wait_runtime_init_done.
+ if (((int(*)(void))GetProcAddress(dummyDll, "Dummy"))() != 42) {
+ fprintf(stderr, "ERROR: Dummy function did not return 42\n");
+ return 1;
+ }
+
+ printf("ready\n");
+ fflush(stdout);
+
+ if (WaitForSingleObject(waitForCtrlBreakEvent, 5000) != WAIT_OBJECT_0) {
+ fprintf(stderr, "FAILURE: No signal received\n");
+ return 1;
+ }
+
+ return 0;
+}
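
main.c blocks until it receives CTRL_BREAK_EVENT, so whatever drives it has to deliver that event to the child's process group. Below is a sketch of the sending side in Go, assuming the child was started in its own process group (CREATE_NEW_PROCESS_GROUP); GenerateConsoleCtrlEvent is the documented kernel32 API, but the exact wiring used by the real runtime test is not shown here, so treat this as illustrative.

//go:build windows

package main

import (
	"log"
	"syscall"
)

// sendCtrlBreak delivers CTRL_BREAK_EVENT to the process group of pid.
func sendCtrlBreak(pid int) {
	kernel32 := syscall.NewLazyDLL("kernel32.dll")
	proc := kernel32.NewProc("GenerateConsoleCtrlEvent")
	r, _, err := proc.Call(syscall.CTRL_BREAK_EVENT, uintptr(pid))
	if r == 0 {
		log.Fatalf("GenerateConsoleCtrlEvent failed: %v", err)
	}
}

func main() {
	sendCtrlBreak(12345) // hypothetical child pid
}
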
diff --git a/src/runtime/testdata/testwinlibthrow/main.go b/src/runtime/testdata/testwinlibthrow/main.go
new file mode 100644
index 0000000..ce0c92f
--- /dev/null
+++ b/src/runtime/testdata/testwinlibthrow/main.go
@@ -0,0 +1,19 @@
+package main
+
+import (
+ "os"
+ "syscall"
+)
+
+func main() {
+ dll := syscall.MustLoadDLL("veh.dll")
+ RaiseNoExcept := dll.MustFindProc("RaiseNoExcept")
+ ThreadRaiseNoExcept := dll.MustFindProc("ThreadRaiseNoExcept")
+
+ thread := len(os.Args) > 1 && os.Args[1] == "thread"
+ if !thread {
+ RaiseNoExcept.Call()
+ } else {
+ ThreadRaiseNoExcept.Call()
+ }
+}
diff --git a/src/runtime/testdata/testwinlibthrow/veh.c b/src/runtime/testdata/testwinlibthrow/veh.c
new file mode 100644
index 0000000..08c1f9e
--- /dev/null
+++ b/src/runtime/testdata/testwinlibthrow/veh.c
@@ -0,0 +1,26 @@
+//go:build ignore
+
+#include <windows.h>
+
+__declspec(dllexport)
+void RaiseNoExcept(void)
+{
+ RaiseException(42, 0, 0, 0);
+}
+
+static DWORD WINAPI ThreadRaiser(void* Context)
+{
+ RaiseNoExcept();
+ return 0;
+}
+
+__declspec(dllexport)
+void ThreadRaiseNoExcept(void)
+{
+ HANDLE thread = CreateThread(0, 0, ThreadRaiser, 0, 0, 0);
+ if (0 != thread)
+ {
+ WaitForSingleObject(thread, INFINITE);
+ CloseHandle(thread);
+ }
+}
diff --git a/src/runtime/testdata/testwinsignal/main.go b/src/runtime/testdata/testwinsignal/main.go
new file mode 100644
index 0000000..e1136f3
--- /dev/null
+++ b/src/runtime/testdata/testwinsignal/main.go
@@ -0,0 +1,53 @@
+package main
+
+import (
+ "fmt"
+ "io"
+ "log"
+ "os"
+ "os/signal"
+ "syscall"
+ "time"
+)
+
+func main() {
+ // Ensure that this process terminates when the test times out,
+ // even if the expected signal never arrives.
+ go func() {
+ io.Copy(io.Discard, os.Stdin)
+ log.Fatal("stdin is closed; terminating")
+ }()
+
+ // Register to receive all signals.
+ c := make(chan os.Signal, 1)
+ signal.Notify(c)
+
+ // Get console window handle.
+ kernel32 := syscall.NewLazyDLL("kernel32.dll")
+ getConsoleWindow := kernel32.NewProc("GetConsoleWindow")
+ hwnd, _, err := getConsoleWindow.Call()
+ if hwnd == 0 {
+ log.Fatal("no associated console: ", err)
+ }
+
+ // Send message to close the console window.
+ const _WM_CLOSE = 0x0010
+ user32 := syscall.NewLazyDLL("user32.dll")
+ postMessage := user32.NewProc("PostMessageW")
+ ok, _, err := postMessage.Call(hwnd, _WM_CLOSE, 0, 0)
+ if ok == 0 {
+ log.Fatal("post message failed: ", err)
+ }
+
+ sig := <-c
+
+ // Allow some time for the handler to complete if it's going to.
+ //
+ // (In https://go.dev/issue/41884 the handler returned immediately,
+ // which caused Windows to terminate the program before the goroutine
+ // that received the SIGTERM had a chance to actually clean up.)
+ time.Sleep(time.Second)
+
+ // Print the signal's name: "terminated" makes the test succeed.
+ fmt.Println(sig)
+}
diff --git a/src/runtime/testdata/testwintls/main.c b/src/runtime/testdata/testwintls/main.c
new file mode 100644
index 0000000..6061828
--- /dev/null
+++ b/src/runtime/testdata/testwintls/main.c
@@ -0,0 +1,29 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <windows.h>
+
+int main(int argc, char **argv) {
+ if (argc < 3) {
+ return 1;
+ }
+ // Allocate more than 64 TLS indices
+ // so the Go runtime doesn't find
+ // enough space in the TEB TLS slots.
+ for (int i = 0; i < 65; i++) {
+ TlsAlloc();
+ }
+ HMODULE hlib = LoadLibrary(argv[1]);
+ if (hlib == NULL) {
+ return 2;
+ }
+ FARPROC proc = GetProcAddress(hlib, argv[2]);
+ if (proc == NULL) {
+ return 3;
+ }
+ if (proc() != 42) {
+ return 4;
+ }
+ return 0;
+}
\ No newline at end of file
diff --git a/src/runtime/testdata/testwintls/main.go b/src/runtime/testdata/testwintls/main.go
new file mode 100644
index 0000000..1cf296c
--- /dev/null
+++ b/src/runtime/testdata/testwintls/main.go
@@ -0,0 +1,12 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "C"
+
+//export GoFunc
+func GoFunc() int { return 42 }
+
+func main() {}
diff --git a/src/runtime/textflag.h b/src/runtime/textflag.h
new file mode 100644
index 0000000..8930312
--- /dev/null
+++ b/src/runtime/textflag.h
@@ -0,0 +1,38 @@
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// This file defines flags attached to various functions
+// and data objects. The compilers, assemblers, and linker must
+// all agree on these values.
+//
+// Keep in sync with src/cmd/internal/obj/textflag.go.
+
+// Don't profile the marked routine. This flag is deprecated.
+#define NOPROF 1
+// It is ok for the linker to see multiple definitions of this symbol. It
+// will pick one of the duplicates to use.
+#define DUPOK 2
+// Don't insert stack check preamble.
+#define NOSPLIT 4
+// Put this data in a read-only section.
+#define RODATA 8
+// This data contains no pointers.
+#define NOPTR 16
+// This is a wrapper function and should not count as disabling 'recover'.
+#define WRAPPER 32
+// This function uses its incoming context register.
+#define NEEDCTXT 64
+// Allocate a word of thread local storage and store the offset from the
+// thread local base to the thread local storage in this variable.
+#define TLSBSS 256
+// Do not insert instructions to allocate a stack frame for this function.
+// Only valid on functions that declare a frame size of 0.
+#define NOFRAME 512
+// Function can call reflect.Type.Method or reflect.Type.MethodByName.
+#define REFLECTMETHOD 1024
+// Function is the outermost frame of the call stack. Call stack unwinders
+// should stop at this function.
+#define TOPFRAME 2048
+// Function is an ABI wrapper.
+#define ABIWRAPPER 4096
diff --git a/src/runtime/time.go b/src/runtime/time.go
new file mode 100644
index 0000000..c05351c
--- /dev/null
+++ b/src/runtime/time.go
@@ -0,0 +1,1144 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Time-related runtime and pieces of package time.
+
+package runtime
+
+import (
+ "internal/abi"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Package time knows the layout of this structure.
+// If this struct changes, adjust ../time/sleep.go:/runtimeTimer.
+type timer struct {
+ // If this timer is on a heap, which P's heap it is on.
+ // puintptr rather than *p to match uintptr in the versions
+ // of this struct defined in other packages.
+ pp puintptr
+
+ // Timer wakes up at when, and then at when+period, ... (period > 0 only)
+ // each time calling f(arg, now) in the timer goroutine, so f must be
+ // a well-behaved function and not block.
+ //
+ // when must be positive on an active timer.
+ when int64
+ period int64
+ f func(any, uintptr)
+ arg any
+ seq uintptr
+
+ // What to set the when field to in timerModifiedXX status.
+ nextwhen int64
+
+ // The status field holds one of the values below.
+ status atomic.Uint32
+}
+
+// Code outside this file has to be careful in using a timer value.
+//
+// The pp, status, and nextwhen fields may only be used by code in this file.
+//
+// Code that creates a new timer value can set the when, period, f,
+// arg, and seq fields.
+// A new timer value may be passed to addtimer (called by time.startTimer).
+// After doing that no fields may be touched.
+//
+// An active timer (one that has been passed to addtimer) may be
+// passed to deltimer (time.stopTimer), after which it is no longer an
+// active timer. It is an inactive timer.
+// In an inactive timer the period, f, arg, and seq fields may be modified,
+// but not the when field.
+// It's OK to just drop an inactive timer and let the GC collect it.
+// It's not OK to pass an inactive timer to addtimer.
+// Only newly allocated timer values may be passed to addtimer.
+//
+// An active timer may be passed to modtimer. No fields may be touched.
+// It remains an active timer.
+//
+// An inactive timer may be passed to resettimer to turn into an
+// active timer with an updated when field.
+// It's OK to pass a newly allocated timer value to resettimer.
+//
+// Timer operations are addtimer, deltimer, modtimer, resettimer,
+// cleantimers, adjusttimers, and runtimer.
+//
+// We don't permit calling addtimer/deltimer/modtimer/resettimer simultaneously,
+// but adjusttimers and runtimer can be called at the same time as any of those.
+//
+// Active timers live in heaps attached to P, in the timers field.
+// Inactive timers live there too temporarily, until they are removed.
+//
+// addtimer:
+// timerNoStatus -> timerWaiting
+// anything else -> panic: invalid value
+// deltimer:
+// timerWaiting -> timerModifying -> timerDeleted
+// timerModifiedEarlier -> timerModifying -> timerDeleted
+// timerModifiedLater -> timerModifying -> timerDeleted
+// timerNoStatus -> do nothing
+// timerDeleted -> do nothing
+// timerRemoving -> do nothing
+// timerRemoved -> do nothing
+// timerRunning -> wait until status changes
+// timerMoving -> wait until status changes
+// timerModifying -> wait until status changes
+// modtimer:
+// timerWaiting -> timerModifying -> timerModifiedXX
+// timerModifiedXX -> timerModifying -> timerModifiedYY
+// timerNoStatus -> timerModifying -> timerWaiting
+// timerRemoved -> timerModifying -> timerWaiting
+// timerDeleted -> timerModifying -> timerModifiedXX
+// timerRunning -> wait until status changes
+// timerMoving -> wait until status changes
+// timerRemoving -> wait until status changes
+// timerModifying -> wait until status changes
+// cleantimers (looks in P's timer heap):
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerModifiedXX -> timerMoving -> timerWaiting
+// adjusttimers (looks in P's timer heap):
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerModifiedXX -> timerMoving -> timerWaiting
+// runtimer (looks in P's timer heap):
+// timerNoStatus -> panic: uninitialized timer
+// timerWaiting -> timerWaiting or
+// timerWaiting -> timerRunning -> timerNoStatus or
+// timerWaiting -> timerRunning -> timerWaiting
+// timerModifying -> wait until status changes
+// timerModifiedXX -> timerMoving -> timerWaiting
+// timerDeleted -> timerRemoving -> timerRemoved
+// timerRunning -> panic: concurrent runtimer calls
+// timerRemoved -> panic: inconsistent timer heap
+// timerRemoving -> panic: inconsistent timer heap
+// timerMoving -> panic: inconsistent timer heap
+
+// Values for the timer status field.
+const (
+ // Timer has no status set yet.
+ timerNoStatus = iota
+
+ // Waiting for timer to fire.
+ // The timer is in some P's heap.
+ timerWaiting
+
+ // Running the timer function.
+ // A timer will only have this status briefly.
+ timerRunning
+
+ // The timer is deleted and should be removed.
+ // It should not be run, but it is still in some P's heap.
+ timerDeleted
+
+ // The timer is being removed.
+ // The timer will only have this status briefly.
+ timerRemoving
+
+ // The timer has been stopped.
+ // It is not in any P's heap.
+ timerRemoved
+
+ // The timer is being modified.
+ // The timer will only have this status briefly.
+ timerModifying
+
+ // The timer has been modified to an earlier time.
+ // The new when value is in the nextwhen field.
+ // The timer is in some P's heap, possibly in the wrong place.
+ timerModifiedEarlier
+
+ // The timer has been modified to the same or a later time.
+ // The new when value is in the nextwhen field.
+ // The timer is in some P's heap, possibly in the wrong place.
+ timerModifiedLater
+
+ // The timer has been modified and is being moved.
+ // The timer will only have this status briefly.
+ timerMoving
+)
+
+// maxWhen is the maximum value for timer's when field.
+const maxWhen = 1<<63 - 1
+
+// verifyTimers can be set to true to add debugging checks that the
+// timer heaps are valid.
+const verifyTimers = false
+
+// Package time APIs.
+// Godoc uses the comments in package time, not these.
+
+// time.now is implemented in assembly.
+
+// timeSleep puts the current goroutine to sleep for at least ns nanoseconds.
+//
+//go:linkname timeSleep time.Sleep
+func timeSleep(ns int64) {
+ if ns <= 0 {
+ return
+ }
+
+ gp := getg()
+ t := gp.timer
+ if t == nil {
+ t = new(timer)
+ gp.timer = t
+ }
+ t.f = goroutineReady
+ t.arg = gp
+ t.nextwhen = nanotime() + ns
+ if t.nextwhen < 0 { // check for overflow.
+ t.nextwhen = maxWhen
+ }
+ gopark(resetForSleep, unsafe.Pointer(t), waitReasonSleep, traceBlockSleep, 1)
+}
+
+// resetForSleep is called after the goroutine is parked for timeSleep.
+// We can't call resettimer in timeSleep itself because if this is a short
+// sleep and there are many goroutines then the P can wind up running the
+// timer function, goroutineReady, before the goroutine has been parked.
+func resetForSleep(gp *g, ut unsafe.Pointer) bool {
+ t := (*timer)(ut)
+ resettimer(t, t.nextwhen)
+ return true
+}
+
+// startTimer adds t to the timer heap.
+//
+//go:linkname startTimer time.startTimer
+func startTimer(t *timer) {
+ if raceenabled {
+ racerelease(unsafe.Pointer(t))
+ }
+ addtimer(t)
+}
+
+// stopTimer stops a timer.
+// It reports whether t was stopped before being run.
+//
+//go:linkname stopTimer time.stopTimer
+func stopTimer(t *timer) bool {
+ return deltimer(t)
+}
+
+// resetTimer resets an inactive timer, adding it to the heap.
+//
+// Reports whether the timer was modified before it was run.
+//
+//go:linkname resetTimer time.resetTimer
+func resetTimer(t *timer, when int64) bool {
+ if raceenabled {
+ racerelease(unsafe.Pointer(t))
+ }
+ return resettimer(t, when)
+}
+
+// modTimer modifies an existing timer.
+//
+//go:linkname modTimer time.modTimer
+func modTimer(t *timer, when, period int64, f func(any, uintptr), arg any, seq uintptr) {
+ modtimer(t, when, period, f, arg, seq)
+}
+
+// Go runtime.
+
+// Ready the goroutine arg.
+func goroutineReady(arg any, seq uintptr) {
+ goready(arg.(*g), 0)
+}
+
+// addtimer adds a timer to the current P.
+// This should only be called with a newly created timer.
+// That avoids the risk of changing the when field of a timer in some P's heap,
+// which could cause the heap to become unsorted.
+// Note: this changes some unsynchronized operations to synchronized operations.
+func addtimer(t *timer) {
+ // when must be positive. A negative value will cause runtimer to
+ // overflow during its delta calculation and never expire other runtime
+ // timers. Zero will cause checkTimers to fail to notice the timer.
+ if t.when <= 0 {
+ throw("timer when must be positive")
+ }
+ if t.period < 0 {
+ throw("timer period must be non-negative")
+ }
+ if t.status.Load() != timerNoStatus {
+ throw("addtimer called with initialized timer")
+ }
+ t.status.Store(timerWaiting)
+
+ when := t.when
+
+ // Disable preemption while using pp to avoid changing another P's heap.
+ mp := acquirem()
+
+ pp := getg().m.p.ptr()
+ lock(&pp.timersLock)
+ cleantimers(pp)
+ doaddtimer(pp, t)
+ unlock(&pp.timersLock)
+
+ wakeNetPoller(when)
+
+ releasem(mp)
+}
+
+// doaddtimer adds t to the current P's heap.
+// The caller must have locked the timers for pp.
+func doaddtimer(pp *p, t *timer) {
+ // Timers rely on the network poller, so make sure the poller
+ // has started.
+ if netpollInited.Load() == 0 {
+ netpollGenericInit()
+ }
+
+ if t.pp != 0 {
+ throw("doaddtimer: P already set in timer")
+ }
+ t.pp.set(pp)
+ i := len(pp.timers)
+ pp.timers = append(pp.timers, t)
+ siftupTimer(pp.timers, i)
+ if t == pp.timers[0] {
+ pp.timer0When.Store(t.when)
+ }
+ pp.numTimers.Add(1)
+}
+
+// deltimer deletes the timer t. It may be on some other P, so we can't
+// actually remove it from the timers heap. We can only mark it as deleted.
+// It will be removed in due course by the P whose heap it is on.
+// Reports whether the timer was removed before it was run.
+func deltimer(t *timer) bool {
+ for {
+ switch s := t.status.Load(); s {
+ case timerWaiting, timerModifiedLater:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp := acquirem()
+ if t.status.CompareAndSwap(s, timerModifying) {
+ // Must fetch t.pp before changing status,
+ // as cleantimers in another goroutine
+ // can clear t.pp of a timerDeleted timer.
+ tpp := t.pp.ptr()
+ if !t.status.CompareAndSwap(timerModifying, timerDeleted) {
+ badTimer()
+ }
+ releasem(mp)
+ tpp.deletedTimers.Add(1)
+ // Timer was not yet run.
+ return true
+ } else {
+ releasem(mp)
+ }
+ case timerModifiedEarlier:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp := acquirem()
+ if t.status.CompareAndSwap(s, timerModifying) {
+ // Must fetch t.pp before setting status
+ // to timerDeleted.
+ tpp := t.pp.ptr()
+ if !t.status.CompareAndSwap(timerModifying, timerDeleted) {
+ badTimer()
+ }
+ releasem(mp)
+ tpp.deletedTimers.Add(1)
+ // Timer was not yet run.
+ return true
+ } else {
+ releasem(mp)
+ }
+ case timerDeleted, timerRemoving, timerRemoved:
+ // Timer was already run.
+ return false
+ case timerRunning, timerMoving:
+ // The timer is being run or moved, by a different P.
+ // Wait for it to complete.
+ osyield()
+ case timerNoStatus:
+ // Removing timer that was never added or
+ // has already been run. Also see issue 21874.
+ return false
+ case timerModifying:
+ // Simultaneous calls to deltimer and modtimer.
+ // Wait for the other call to complete.
+ osyield()
+ default:
+ badTimer()
+ }
+ }
+}
+
+// dodeltimer removes timer i from the current P's heap.
+// We are locked on the P when this is called.
+// It returns the smallest changed index in pp.timers.
+// The caller must have locked the timers for pp.
+func dodeltimer(pp *p, i int) int {
+ if t := pp.timers[i]; t.pp.ptr() != pp {
+ throw("dodeltimer: wrong P")
+ } else {
+ t.pp = 0
+ }
+ last := len(pp.timers) - 1
+ if i != last {
+ pp.timers[i] = pp.timers[last]
+ }
+ pp.timers[last] = nil
+ pp.timers = pp.timers[:last]
+ smallestChanged := i
+ if i != last {
+ // Moving to i may have moved the last timer to a new parent,
+ // so sift up to preserve the heap guarantee.
+ smallestChanged = siftupTimer(pp.timers, i)
+ siftdownTimer(pp.timers, i)
+ }
+ if i == 0 {
+ updateTimer0When(pp)
+ }
+ n := pp.numTimers.Add(-1)
+ if n == 0 {
+ // If there are no timers, then clearly none are modified.
+ pp.timerModifiedEarliest.Store(0)
+ }
+ return smallestChanged
+}
+
+// dodeltimer0 removes timer 0 from the current P's heap.
+// We are locked on the P when this is called.
+// The caller must have locked the timers for pp.
+func dodeltimer0(pp *p) {
+ if t := pp.timers[0]; t.pp.ptr() != pp {
+ throw("dodeltimer0: wrong P")
+ } else {
+ t.pp = 0
+ }
+ last := len(pp.timers) - 1
+ if last > 0 {
+ pp.timers[0] = pp.timers[last]
+ }
+ pp.timers[last] = nil
+ pp.timers = pp.timers[:last]
+ if last > 0 {
+ siftdownTimer(pp.timers, 0)
+ }
+ updateTimer0When(pp)
+ n := pp.numTimers.Add(-1)
+ if n == 0 {
+ // If there are no timers, then clearly none are modified.
+ pp.timerModifiedEarliest.Store(0)
+ }
+}
+
+// modtimer modifies an existing timer.
+// This is called by the netpoll code or time.Ticker.Reset or time.Timer.Reset.
+// Reports whether the timer was modified before it was run.
+func modtimer(t *timer, when, period int64, f func(any, uintptr), arg any, seq uintptr) bool {
+ if when <= 0 {
+ throw("timer when must be positive")
+ }
+ if period < 0 {
+ throw("timer period must be non-negative")
+ }
+
+ status := uint32(timerNoStatus)
+ wasRemoved := false
+ var pending bool
+ var mp *m
+loop:
+ for {
+ switch status = t.status.Load(); status {
+ case timerWaiting, timerModifiedEarlier, timerModifiedLater:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+ if t.status.CompareAndSwap(status, timerModifying) {
+ pending = true // timer not yet run
+ break loop
+ }
+ releasem(mp)
+ case timerNoStatus, timerRemoved:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+
+ // Timer was already run and t is no longer in a heap.
+ // Act like addtimer.
+ if t.status.CompareAndSwap(status, timerModifying) {
+ wasRemoved = true
+ pending = false // timer already run or stopped
+ break loop
+ }
+ releasem(mp)
+ case timerDeleted:
+ // Prevent preemption while the timer is in timerModifying.
+ // This could lead to a self-deadlock. See #38070.
+ mp = acquirem()
+ if t.status.CompareAndSwap(status, timerModifying) {
+ t.pp.ptr().deletedTimers.Add(-1)
+ pending = false // timer already stopped
+ break loop
+ }
+ releasem(mp)
+ case timerRunning, timerRemoving, timerMoving:
+ // The timer is being run or moved, by a different P.
+ // Wait for it to complete.
+ osyield()
+ case timerModifying:
+ // Multiple simultaneous calls to modtimer.
+ // Wait for the other call to complete.
+ osyield()
+ default:
+ badTimer()
+ }
+ }
+
+ t.period = period
+ t.f = f
+ t.arg = arg
+ t.seq = seq
+
+ if wasRemoved {
+ t.when = when
+ pp := getg().m.p.ptr()
+ lock(&pp.timersLock)
+ doaddtimer(pp, t)
+ unlock(&pp.timersLock)
+ if !t.status.CompareAndSwap(timerModifying, timerWaiting) {
+ badTimer()
+ }
+ releasem(mp)
+ wakeNetPoller(when)
+ } else {
+ // The timer is in some other P's heap, so we can't change
+ // the when field. If we did, the other P's heap would
+ // be out of order. So we put the new when value in the
+ // nextwhen field, and let the other P set the when field
+ // when it is prepared to resort the heap.
+ t.nextwhen = when
+
+ newStatus := uint32(timerModifiedLater)
+ if when < t.when {
+ newStatus = timerModifiedEarlier
+ }
+
+ tpp := t.pp.ptr()
+
+ if newStatus == timerModifiedEarlier {
+ updateTimerModifiedEarliest(tpp, when)
+ }
+
+ // Set the new status of the timer.
+ if !t.status.CompareAndSwap(timerModifying, newStatus) {
+ badTimer()
+ }
+ releasem(mp)
+
+ // If the new status is earlier, wake up the poller.
+ if newStatus == timerModifiedEarlier {
+ wakeNetPoller(when)
+ }
+ }
+
+ return pending
+}
+
+// resettimer resets the time when a timer should fire.
+// If used for an inactive timer, the timer will become active.
+// This should be called instead of addtimer if the timer value has been,
+// or may have been, used previously.
+// Reports whether the timer was modified before it was run.
+func resettimer(t *timer, when int64) bool {
+ return modtimer(t, when, t.period, t.f, t.arg, t.seq)
+}
+
+// cleantimers cleans up the head of the timer queue. This speeds up
+// programs that create and delete timers; leaving them in the heap
+// slows down addtimer. Reports whether no timer problems were found.
+// The caller must have locked the timers for pp.
+func cleantimers(pp *p) {
+ gp := getg()
+ for {
+ if len(pp.timers) == 0 {
+ return
+ }
+
+ // This loop can theoretically run for a while, and because
+ // it is holding timersLock it cannot be preempted.
+ // If someone is trying to preempt us, just return.
+ // We can clean the timers later.
+ if gp.preemptStop {
+ return
+ }
+
+ t := pp.timers[0]
+ if t.pp.ptr() != pp {
+ throw("cleantimers: bad p")
+ }
+ switch s := t.status.Load(); s {
+ case timerDeleted:
+ if !t.status.CompareAndSwap(s, timerRemoving) {
+ continue
+ }
+ dodeltimer0(pp)
+ if !t.status.CompareAndSwap(timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ pp.deletedTimers.Add(-1)
+ case timerModifiedEarlier, timerModifiedLater:
+ if !t.status.CompareAndSwap(s, timerMoving) {
+ continue
+ }
+ // Now we can change the when field.
+ t.when = t.nextwhen
+ // Move t to the right position.
+ dodeltimer0(pp)
+ doaddtimer(pp, t)
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+ default:
+ // Head of timers does not need adjustment.
+ return
+ }
+ }
+}
+
+// moveTimers moves a slice of timers to pp. The slice has been taken
+// from a different P.
+// This is currently called when the world is stopped, but the caller
+// is expected to have locked the timers for pp.
+func moveTimers(pp *p, timers []*timer) {
+ for _, t := range timers {
+ loop:
+ for {
+ switch s := t.status.Load(); s {
+ case timerWaiting:
+ if !t.status.CompareAndSwap(s, timerMoving) {
+ continue
+ }
+ t.pp = 0
+ doaddtimer(pp, t)
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+ break loop
+ case timerModifiedEarlier, timerModifiedLater:
+ if !t.status.CompareAndSwap(s, timerMoving) {
+ continue
+ }
+ t.when = t.nextwhen
+ t.pp = 0
+ doaddtimer(pp, t)
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+ break loop
+ case timerDeleted:
+ if !t.status.CompareAndSwap(s, timerRemoved) {
+ continue
+ }
+ t.pp = 0
+ // We no longer need this timer in the heap.
+ break loop
+ case timerModifying:
+ // Loop until the modification is complete.
+ osyield()
+ case timerNoStatus, timerRemoved:
+ // We should not see these status values in a timers heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // Some other P thinks it owns this timer,
+ // which should not happen.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+ }
+}
+
+// adjusttimers looks through the timers in the current P's heap for
+// any timers that have been modified to run earlier, and puts them in
+// the correct place in the heap. While looking for those timers,
+// it also moves timers that have been modified to run later,
+// and removes deleted timers. The caller must have locked the timers for pp.
+func adjusttimers(pp *p, now int64) {
+ // If we haven't yet reached the time of the first timerModifiedEarlier
+ // timer, don't do anything. This speeds up programs that adjust
+ // a lot of timers back and forth if the timers rarely expire.
+ // We'll postpone looking through all the adjusted timers until
+ // one would actually expire.
+ first := pp.timerModifiedEarliest.Load()
+ if first == 0 || first > now {
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+ return
+ }
+
+ // We are going to clear all timerModifiedEarlier timers.
+ pp.timerModifiedEarliest.Store(0)
+
+ var moved []*timer
+ for i := 0; i < len(pp.timers); i++ {
+ t := pp.timers[i]
+ if t.pp.ptr() != pp {
+ throw("adjusttimers: bad p")
+ }
+ switch s := t.status.Load(); s {
+ case timerDeleted:
+ if t.status.CompareAndSwap(s, timerRemoving) {
+ changed := dodeltimer(pp, i)
+ if !t.status.CompareAndSwap(timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ pp.deletedTimers.Add(-1)
+ // Go back to the earliest changed heap entry.
+ // "- 1" because the loop will add 1.
+ i = changed - 1
+ }
+ case timerModifiedEarlier, timerModifiedLater:
+ if t.status.CompareAndSwap(s, timerMoving) {
+ // Now we can change the when field.
+ t.when = t.nextwhen
+ // Take t off the heap, and hold onto it.
+ // We don't add it back yet because the
+ // heap manipulation could cause our
+ // loop to skip some other timer.
+ changed := dodeltimer(pp, i)
+ moved = append(moved, t)
+ // Go back to the earliest changed heap entry.
+ // "- 1" because the loop will add 1.
+ i = changed - 1
+ }
+ case timerNoStatus, timerRunning, timerRemoving, timerRemoved, timerMoving:
+ badTimer()
+ case timerWaiting:
+ // OK, nothing to do.
+ case timerModifying:
+ // Check again after modification is complete.
+ osyield()
+ i--
+ default:
+ badTimer()
+ }
+ }
+
+ if len(moved) > 0 {
+ addAdjustedTimers(pp, moved)
+ }
+
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+}
+
+// addAdjustedTimers adds any timers we adjusted in adjusttimers
+// back to the timer heap.
+func addAdjustedTimers(pp *p, moved []*timer) {
+ for _, t := range moved {
+ doaddtimer(pp, t)
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+ }
+}
+
+// nobarrierWakeTime looks at P's timers and returns the time when we
+// should wake up the netpoller. It returns 0 if there are no timers.
+// This function is invoked when dropping a P, and must run without
+// any write barriers.
+//
+//go:nowritebarrierrec
+func nobarrierWakeTime(pp *p) int64 {
+ next := pp.timer0When.Load()
+ nextAdj := pp.timerModifiedEarliest.Load()
+ if next == 0 || (nextAdj != 0 && nextAdj < next) {
+ next = nextAdj
+ }
+ return next
+}
+
+// runtimer examines the first timer in timers. If it is ready based on now,
+// it runs the timer and removes or updates it.
+// Returns 0 if it ran a timer, -1 if there are no more timers, or the time
+// when the first timer should run.
+// The caller must have locked the timers for pp.
+// If a timer is run, this will temporarily unlock the timers.
+//
+//go:systemstack
+func runtimer(pp *p, now int64) int64 {
+ for {
+ t := pp.timers[0]
+ if t.pp.ptr() != pp {
+ throw("runtimer: bad p")
+ }
+ switch s := t.status.Load(); s {
+ case timerWaiting:
+ if t.when > now {
+ // Not ready to run.
+ return t.when
+ }
+
+ if !t.status.CompareAndSwap(s, timerRunning) {
+ continue
+ }
+ // Note that runOneTimer may temporarily unlock
+ // pp.timersLock.
+ runOneTimer(pp, t, now)
+ return 0
+
+ case timerDeleted:
+ if !t.status.CompareAndSwap(s, timerRemoving) {
+ continue
+ }
+ dodeltimer0(pp)
+ if !t.status.CompareAndSwap(timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ pp.deletedTimers.Add(-1)
+ if len(pp.timers) == 0 {
+ return -1
+ }
+
+ case timerModifiedEarlier, timerModifiedLater:
+ if !t.status.CompareAndSwap(s, timerMoving) {
+ continue
+ }
+ t.when = t.nextwhen
+ dodeltimer0(pp)
+ doaddtimer(pp, t)
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+
+ case timerModifying:
+ // Wait for modification to complete.
+ osyield()
+
+ case timerNoStatus, timerRemoved:
+ // Should not see a new or inactive timer on the heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // These should only be set when timers are locked,
+ // and we didn't do it.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+}
+
+// runOneTimer runs a single timer.
+// The caller must have locked the timers for pp.
+// This will temporarily unlock the timers while running the timer function.
+//
+//go:systemstack
+func runOneTimer(pp *p, t *timer, now int64) {
+ if raceenabled {
+ ppcur := getg().m.p.ptr()
+ if ppcur.timerRaceCtx == 0 {
+ ppcur.timerRaceCtx = racegostart(abi.FuncPCABIInternal(runtimer) + sys.PCQuantum)
+ }
+ raceacquirectx(ppcur.timerRaceCtx, unsafe.Pointer(t))
+ }
+
+ f := t.f
+ arg := t.arg
+ seq := t.seq
+
+ if t.period > 0 {
+ // Leave in heap but adjust next time to fire.
+ delta := t.when - now
+ t.when += t.period * (1 + -delta/t.period)
+ if t.when < 0 { // check for overflow.
+ t.when = maxWhen
+ }
+ siftdownTimer(pp.timers, 0)
+ if !t.status.CompareAndSwap(timerRunning, timerWaiting) {
+ badTimer()
+ }
+ updateTimer0When(pp)
+ } else {
+ // Remove from heap.
+ dodeltimer0(pp)
+ if !t.status.CompareAndSwap(timerRunning, timerNoStatus) {
+ badTimer()
+ }
+ }
+
+ if raceenabled {
+ // Temporarily use the current P's racectx for g0.
+ gp := getg()
+ if gp.racectx != 0 {
+ throw("runOneTimer: unexpected racectx")
+ }
+ gp.racectx = gp.m.p.ptr().timerRaceCtx
+ }
+
+ unlock(&pp.timersLock)
+
+ f(arg, seq)
+
+ lock(&pp.timersLock)
+
+ if raceenabled {
+ gp := getg()
+ gp.racectx = 0
+ }
+}
+
+// clearDeletedTimers removes all deleted timers from the P's timer heap.
+// This is used to avoid clogging up the heap if the program
+// starts a lot of long-running timers and then stops them.
+// For example, this can happen via context.WithTimeout.
+//
+// This is the only function that walks through the entire timer heap,
+// other than moveTimers which only runs when the world is stopped.
+//
+// The caller must have locked the timers for pp.
+func clearDeletedTimers(pp *p) {
+ // We are going to clear all timerModifiedEarlier timers.
+ // Do this now in case new ones show up while we are looping.
+ pp.timerModifiedEarliest.Store(0)
+
+ cdel := int32(0)
+ to := 0
+ changedHeap := false
+ timers := pp.timers
+nextTimer:
+ for _, t := range timers {
+ for {
+ switch s := t.status.Load(); s {
+ case timerWaiting:
+ if changedHeap {
+ timers[to] = t
+ siftupTimer(timers, to)
+ }
+ to++
+ continue nextTimer
+ case timerModifiedEarlier, timerModifiedLater:
+ if t.status.CompareAndSwap(s, timerMoving) {
+ t.when = t.nextwhen
+ timers[to] = t
+ siftupTimer(timers, to)
+ to++
+ changedHeap = true
+ if !t.status.CompareAndSwap(timerMoving, timerWaiting) {
+ badTimer()
+ }
+ continue nextTimer
+ }
+ case timerDeleted:
+ if t.status.CompareAndSwap(s, timerRemoving) {
+ t.pp = 0
+ cdel++
+ if !t.status.CompareAndSwap(timerRemoving, timerRemoved) {
+ badTimer()
+ }
+ changedHeap = true
+ continue nextTimer
+ }
+ case timerModifying:
+ // Loop until modification complete.
+ osyield()
+ case timerNoStatus, timerRemoved:
+ // We should not see these status values in a timer heap.
+ badTimer()
+ case timerRunning, timerRemoving, timerMoving:
+ // Some other P thinks it owns this timer,
+ // which should not happen.
+ badTimer()
+ default:
+ badTimer()
+ }
+ }
+ }
+
+ // Set remaining slots in timers slice to nil,
+ // so that the timer values can be garbage collected.
+ for i := to; i < len(timers); i++ {
+ timers[i] = nil
+ }
+
+ pp.deletedTimers.Add(-cdel)
+ pp.numTimers.Add(-cdel)
+
+ timers = timers[:to]
+ pp.timers = timers
+ updateTimer0When(pp)
+
+ if verifyTimers {
+ verifyTimerHeap(pp)
+ }
+}
+
+// verifyTimerHeap verifies that the timer heap is in a valid state.
+// This is only for debugging, and is only called if verifyTimers is true.
+// The caller must have locked the timers.
+func verifyTimerHeap(pp *p) {
+ for i, t := range pp.timers {
+ if i == 0 {
+ // First timer has no parent.
+ continue
+ }
+
+ // The heap is 4-ary. See siftupTimer and siftdownTimer.
+ p := (i - 1) / 4
+ if t.when < pp.timers[p].when {
+ print("bad timer heap at ", i, ": ", p, ": ", pp.timers[p].when, ", ", i, ": ", t.when, "\n")
+ throw("bad timer heap")
+ }
+ }
+ if numTimers := int(pp.numTimers.Load()); len(pp.timers) != numTimers {
+ println("timer heap len", len(pp.timers), "!= numTimers", numTimers)
+ throw("bad timer heap len")
+ }
+}
+
+// updateTimer0When sets the P's timer0When field.
+// The caller must have locked the timers for pp.
+func updateTimer0When(pp *p) {
+ if len(pp.timers) == 0 {
+ pp.timer0When.Store(0)
+ } else {
+ pp.timer0When.Store(pp.timers[0].when)
+ }
+}
+
+// updateTimerModifiedEarliest updates the recorded nextwhen field of the
+// earliest timerModifiedEarlier value.
+// The timers for pp will not be locked.
+func updateTimerModifiedEarliest(pp *p, nextwhen int64) {
+ for {
+ old := pp.timerModifiedEarliest.Load()
+ if old != 0 && int64(old) < nextwhen {
+ return
+ }
+
+ if pp.timerModifiedEarliest.CompareAndSwap(old, nextwhen) {
+ return
+ }
+ }
+}
+
+// timeSleepUntil returns the time when the next timer should fire. Returns
+// maxWhen if there are no timers.
+// This is only called by sysmon and checkdead.
+func timeSleepUntil() int64 {
+ next := int64(maxWhen)
+
+ // Prevent allp slice changes. This is like retake.
+ lock(&allpLock)
+ for _, pp := range allp {
+ if pp == nil {
+ // This can happen if procresize has grown
+ // allp but not yet created new Ps.
+ continue
+ }
+
+ w := pp.timer0When.Load()
+ if w != 0 && w < next {
+ next = w
+ }
+
+ w = pp.timerModifiedEarliest.Load()
+ if w != 0 && w < next {
+ next = w
+ }
+ }
+ unlock(&allpLock)
+
+ return next
+}
+
+// Heap maintenance algorithms.
+// These algorithms check for slice index errors manually.
+// A slice index error can happen if the program is using racy
+// access to timers. We don't want to panic here, because
+// it will cause the program to crash with a mysterious
+// "panic holding locks" message. Instead, we panic while not
+// holding a lock.
+
+// siftupTimer puts the timer at position i in the right place
+// in the heap by moving it up toward the top of the heap.
+// It returns the smallest changed index.
+func siftupTimer(t []*timer, i int) int {
+ if i >= len(t) {
+ badTimer()
+ }
+ when := t[i].when
+ if when <= 0 {
+ badTimer()
+ }
+ tmp := t[i]
+ for i > 0 {
+ p := (i - 1) / 4 // parent
+ if when >= t[p].when {
+ break
+ }
+ t[i] = t[p]
+ i = p
+ }
+ if tmp != t[i] {
+ t[i] = tmp
+ }
+ return i
+}
+
+// siftdownTimer puts the timer at position i in the right place
+// in the heap by moving it down toward the bottom of the heap.
+func siftdownTimer(t []*timer, i int) {
+ n := len(t)
+ if i >= n {
+ badTimer()
+ }
+ when := t[i].when
+ if when <= 0 {
+ badTimer()
+ }
+ tmp := t[i]
+ for {
+ c := i*4 + 1 // left child
+ c3 := c + 2 // mid child
+ if c >= n {
+ break
+ }
+ w := t[c].when
+ if c+1 < n && t[c+1].when < w {
+ w = t[c+1].when
+ c++
+ }
+ if c3 < n {
+ w3 := t[c3].when
+ if c3+1 < n && t[c3+1].when < w3 {
+ w3 = t[c3+1].when
+ c3++
+ }
+ if w3 < w {
+ w = w3
+ c = c3
+ }
+ }
+ if w >= when {
+ break
+ }
+ t[i] = t[c]
+ i = c
+ }
+ if tmp != t[i] {
+ t[i] = tmp
+ }
+}
+
+// badTimer is called if the timer data structures have been corrupted,
+// presumably due to racy use by the program. We panic here rather than
+// panicking due to invalid slice access while holding locks.
+// See issue #25686.
+func badTimer() {
+ throw("timer data corruption")
+}
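
The heap maintained above is 4-ary, keyed by when: node i has parent (i-1)/4 and children 4*i+1 through 4*i+4. A minimal standalone sketch of the sift-up step, using plain int64 keys instead of *timer (the names here are illustrative, not part of the runtime):

package main

import "fmt"

// siftup moves the key at index i toward the root of a 4-ary min-heap
// until its parent is no larger, and returns the smallest changed index,
// mirroring what siftupTimer does for pp.timers.
func siftup(when []int64, i int) int {
	w := when[i]
	for i > 0 {
		p := (i - 1) / 4 // parent in a 4-ary heap
		if w >= when[p] {
			break
		}
		when[i] = when[p] // pull the parent down one level
		i = p
	}
	when[i] = w
	return i
}

func main() {
	h := []int64{10, 20, 30, 40, 50, 5} // 5 was just appended at index 5
	fmt.Println(siftup(h, 5), h)        // 0 [5 10 30 40 50 20]: the new key reaches the root
}
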
diff --git a/src/runtime/time_fake.go b/src/runtime/time_fake.go
new file mode 100644
index 0000000..9e24f70
--- /dev/null
+++ b/src/runtime/time_fake.go
@@ -0,0 +1,98 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build faketime && !windows
+
+// Faketime isn't currently supported on Windows. This would require
+// modifying syscall.Write to call syscall.faketimeWrite,
+// translating the Stdout and Stderr handles into FDs 1 and 2.
+// (See CL 192739 PS 3.)
+
+package runtime
+
+import "unsafe"
+
+// faketime is the simulated time in nanoseconds since 1970 for the
+// playground.
+var faketime int64 = 1257894000000000000
+
+var faketimeState struct {
+ lock mutex
+
+ // lastfaketime is the last faketime value written to fd 1 or 2.
+ lastfaketime int64
+
+ // lastfd is the fd to which lastfaketime was written.
+ //
+ // Subsequent writes to the same fd may use the same
+ // timestamp, but the timestamp must increase if the fd
+ // changes.
+ lastfd uintptr
+}
+
+//go:nosplit
+func nanotime() int64 {
+ return faketime
+}
+
+//go:linkname time_now time.now
+func time_now() (sec int64, nsec int32, mono int64) {
+ return faketime / 1e9, int32(faketime % 1e9), faketime
+}
+
+// write is like the Unix write system call.
+// We have to avoid write barriers to avoid potential deadlock
+// on write calls.
+//
+//go:nowritebarrierrec
+func write(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ if !(fd == 1 || fd == 2) {
+ // Do an ordinary write.
+ return write1(fd, p, n)
+ }
+
+ // Write with the playback header.
+
+ // First, lock to avoid interleaving writes.
+ lock(&faketimeState.lock)
+
+ // If the current fd doesn't match the fd of the previous write,
+ // ensure that the timestamp is strictly greater. That way, we can
+ // recover the original order even if we read the fds separately.
+ t := faketimeState.lastfaketime
+ if fd != faketimeState.lastfd {
+ t++
+ faketimeState.lastfd = fd
+ }
+ if faketime > t {
+ t = faketime
+ }
+ faketimeState.lastfaketime = t
+
+ // Playback header: 0 0 P B <8-byte time> <4-byte data length> (big endian)
+ var buf [4 + 8 + 4]byte
+ buf[2] = 'P'
+ buf[3] = 'B'
+ tu := uint64(t)
+ buf[4] = byte(tu >> (7 * 8))
+ buf[5] = byte(tu >> (6 * 8))
+ buf[6] = byte(tu >> (5 * 8))
+ buf[7] = byte(tu >> (4 * 8))
+ buf[8] = byte(tu >> (3 * 8))
+ buf[9] = byte(tu >> (2 * 8))
+ buf[10] = byte(tu >> (1 * 8))
+ buf[11] = byte(tu >> (0 * 8))
+ nu := uint32(n)
+ buf[12] = byte(nu >> (3 * 8))
+ buf[13] = byte(nu >> (2 * 8))
+ buf[14] = byte(nu >> (1 * 8))
+ buf[15] = byte(nu >> (0 * 8))
+ write1(fd, unsafe.Pointer(&buf[0]), int32(len(buf)))
+
+ // Write actual data.
+ res := write1(fd, p, n)
+
+ unlock(&faketimeState.lock)
+ return res
+}
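
The byte shifts above emit a fixed 16-byte frame header: "\x00\x00PB", an 8-byte big-endian fake timestamp, and a 4-byte big-endian data length, followed by the payload (the same layout parseFakeTime in time_test.go decodes). A hedged sketch of that framing with encoding/binary; frameWrite is an illustrative helper, not a runtime function:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// frameWrite appends one playback frame: "\x00\x00PB", a big-endian
// 8-byte timestamp, a big-endian 4-byte length, then the payload.
func frameWrite(w *bytes.Buffer, t uint64, data []byte) {
	var hdr [16]byte
	copy(hdr[:4], "\x00\x00PB")
	binary.BigEndian.PutUint64(hdr[4:12], t)
	binary.BigEndian.PutUint32(hdr[12:16], uint32(len(data)))
	w.Write(hdr[:])
	w.Write(data)
}

func main() {
	var buf bytes.Buffer
	frameWrite(&buf, 1257894000000000000, []byte("line 1\n"))
	fmt.Printf("% x\n", buf.Bytes()[:16]) // the 16-byte header preceding "line 1\n"
}
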
diff --git a/src/runtime/time_linux_amd64.s b/src/runtime/time_linux_amd64.s
new file mode 100644
index 0000000..1416d23
--- /dev/null
+++ b/src/runtime/time_linux_amd64.s
@@ -0,0 +1,87 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "textflag.h"
+
+#define SYS_clock_gettime 228
+
+// func time.now() (sec int64, nsec int32, mono int64)
+TEXT time·now<ABIInternal>(SB),NOSPLIT,$16-24
+ MOVQ SP, R12 // Save old SP; R12 unchanged by C code.
+
+ MOVQ g_m(R14), BX // BX unchanged by C code.
+
+ // Set vdsoPC and vdsoSP for SIGPROF traceback.
+ // Save the old values on stack and restore them on exit,
+ // so this function is reentrant.
+ MOVQ m_vdsoPC(BX), CX
+ MOVQ m_vdsoSP(BX), DX
+ MOVQ CX, 0(SP)
+ MOVQ DX, 8(SP)
+
+ LEAQ sec+0(FP), DX
+ MOVQ -8(DX), CX // Sets CX to function return address.
+ MOVQ CX, m_vdsoPC(BX)
+ MOVQ DX, m_vdsoSP(BX)
+
+ CMPQ R14, m_curg(BX) // Only switch if on curg.
+ JNE noswitch
+
+ MOVQ m_g0(BX), DX
+ MOVQ (g_sched+gobuf_sp)(DX), SP // Set SP to g0 stack
+
+noswitch:
+ SUBQ $32, SP // Space for two time results
+ ANDQ $~15, SP // Align for C code
+
+ MOVL $0, DI // CLOCK_REALTIME
+ LEAQ 16(SP), SI
+ MOVQ runtime·vdsoClockgettimeSym(SB), AX
+ CMPQ AX, $0
+ JEQ fallback
+ CALL AX
+
+ MOVL $1, DI // CLOCK_MONOTONIC
+ LEAQ 0(SP), SI
+ MOVQ runtime·vdsoClockgettimeSym(SB), AX
+ CALL AX
+
+ret:
+ MOVQ 16(SP), AX // realtime sec
+ MOVQ 24(SP), DI // realtime nsec (moved to BX below)
+ MOVQ 0(SP), CX // monotonic sec
+ IMULQ $1000000000, CX
+ MOVQ 8(SP), DX // monotonic nsec
+
+ MOVQ R12, SP // Restore real SP
+
+ // Restore vdsoPC, vdsoSP
+ // We don't worry about being signaled between the two stores.
+ // If we are not in a signal handler, we'll restore vdsoSP to 0,
+ // and no one will care about vdsoPC. If we are in a signal handler,
+ // we cannot receive another signal.
+ MOVQ 8(SP), SI
+ MOVQ SI, m_vdsoSP(BX)
+ MOVQ 0(SP), SI
+ MOVQ SI, m_vdsoPC(BX)
+
+ // set result registers; AX is already correct
+ MOVQ DI, BX
+ ADDQ DX, CX
+ RET
+
+fallback:
+ MOVQ $SYS_clock_gettime, AX
+ SYSCALL
+
+ MOVL $1, DI // CLOCK_MONOTONIC
+ LEAQ 0(SP), SI
+ MOVQ $SYS_clock_gettime, AX
+ SYSCALL
+
+ JMP ret
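
The fallback path above issues clock_gettime(2) directly when the vDSO symbol is unavailable. A user-space sketch of the same calls, assuming linux/amd64 (syscall number 228, CLOCK_REALTIME=0, CLOCK_MONOTONIC=1); error handling is minimal and the helper name is illustrative:

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// clockGettime invokes the raw clock_gettime syscall for the given clock id.
func clockGettime(clockid uintptr) syscall.Timespec {
	const sysClockGettime = 228 // linux/amd64, matching SYS_clock_gettime above
	var ts syscall.Timespec
	if _, _, errno := syscall.Syscall(sysClockGettime, clockid, uintptr(unsafe.Pointer(&ts)), 0); errno != 0 {
		panic(errno)
	}
	return ts
}

func main() {
	rt := clockGettime(0)   // CLOCK_REALTIME: wall-clock sec/nsec
	mono := clockGettime(1) // CLOCK_MONOTONIC: basis for the mono reading
	fmt.Println(rt.Sec, rt.Nsec, mono.Sec, mono.Nsec)
}
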
diff --git a/src/runtime/time_nofake.go b/src/runtime/time_nofake.go
new file mode 100644
index 0000000..70a2102
--- /dev/null
+++ b/src/runtime/time_nofake.go
@@ -0,0 +1,32 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+package runtime
+
+import "unsafe"
+
+// faketime is the simulated time in nanoseconds since 1970 for the
+// playground.
+//
+// Zero means not to use faketime.
+var faketime int64
+
+//go:nosplit
+func nanotime() int64 {
+ return nanotime1()
+}
+
+var overrideWrite func(fd uintptr, p unsafe.Pointer, n int32) int32
+
+// write must be nosplit on Windows (see write1)
+//
+//go:nosplit
+func write(fd uintptr, p unsafe.Pointer, n int32) int32 {
+ if overrideWrite != nil {
+ return overrideWrite(fd, noescape(p), n)
+ }
+ return write1(fd, p, n)
+}
diff --git a/src/runtime/time_test.go b/src/runtime/time_test.go
new file mode 100644
index 0000000..f086820
--- /dev/null
+++ b/src/runtime/time_test.go
@@ -0,0 +1,97 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "encoding/binary"
+ "errors"
+ "internal/testenv"
+ "os/exec"
+ "reflect"
+ "runtime"
+ "testing"
+)
+
+func TestFakeTime(t *testing.T) {
+ if runtime.GOOS == "windows" {
+ t.Skip("faketime not supported on windows")
+ }
+
+ // Faketime is advanced in checkdead. External linking brings in cgo,
+	// which keeps checkdead from working.
+ testenv.MustInternalLink(t, false)
+
+ t.Parallel()
+
+ exe, err := buildTestProg(t, "testfaketime", "-tags=faketime")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ var stdout, stderr bytes.Buffer
+ cmd := exec.Command(exe)
+ cmd.Stdout = &stdout
+ cmd.Stderr = &stderr
+
+ err = testenv.CleanCmdEnv(cmd).Run()
+ if err != nil {
+ t.Fatalf("exit status: %v\n%s", err, stderr.String())
+ }
+
+ t.Logf("raw stdout: %q", stdout.String())
+ t.Logf("raw stderr: %q", stderr.String())
+
+ f1, err1 := parseFakeTime(stdout.Bytes())
+ if err1 != nil {
+ t.Fatal(err1)
+ }
+ f2, err2 := parseFakeTime(stderr.Bytes())
+ if err2 != nil {
+ t.Fatal(err2)
+ }
+
+ const time0 = 1257894000000000000
+ got := [][]fakeTimeFrame{f1, f2}
+ var want = [][]fakeTimeFrame{{
+ {time0 + 1, "line 2\n"},
+ {time0 + 1, "line 3\n"},
+ {time0 + 1e9, "line 5\n"},
+ {time0 + 1e9, "2009-11-10T23:00:01Z"},
+ }, {
+ {time0, "line 1\n"},
+ {time0 + 2, "line 4\n"},
+ }}
+ if !reflect.DeepEqual(want, got) {
+ t.Fatalf("want %v, got %v", want, got)
+ }
+}
+
+type fakeTimeFrame struct {
+ time uint64
+ data string
+}
+
+func parseFakeTime(x []byte) ([]fakeTimeFrame, error) {
+ var frames []fakeTimeFrame
+ for len(x) != 0 {
+ if len(x) < 4+8+4 {
+ return nil, errors.New("truncated header")
+ }
+ const magic = "\x00\x00PB"
+ if string(x[:len(magic)]) != magic {
+ return nil, errors.New("bad magic")
+ }
+ x = x[len(magic):]
+ time := binary.BigEndian.Uint64(x)
+ x = x[8:]
+ dlen := binary.BigEndian.Uint32(x)
+ x = x[4:]
+ data := string(x[:dlen])
+ x = x[dlen:]
+ frames = append(frames, fakeTimeFrame{time, data})
+ }
+ return frames, nil
+}
diff --git a/src/runtime/time_windows.h b/src/runtime/time_windows.h
new file mode 100644
index 0000000..7c2e65c
--- /dev/null
+++ b/src/runtime/time_windows.h
@@ -0,0 +1,17 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Constants for fetching time values on Windows for use in asm code.
+
+// See https://wrkhpi.wordpress.com/2007/08/09/getting-os-information-the-kuser_shared_data-structure/
+// Archived copy at:
+// http://web.archive.org/web/20210411000829/https://wrkhpi.wordpress.com/2007/08/09/getting-os-information-the-kuser_shared_data-structure/
+
+// Must read hi1, then lo, then hi2. The snapshot is valid if hi1 == hi2.
+// Or, on 64-bit, just read lo:hi1 all at once atomically.
+#define _INTERRUPT_TIME 0x7ffe0008
+#define _SYSTEM_TIME 0x7ffe0014
+#define time_lo 0
+#define time_hi1 4
+#define time_hi2 8
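
A sketch of the hi1/lo/hi2 read protocol described above, assuming a 64-bit value that another writer publishes as three 32-bit words (the names and the use of sync/atomic here are illustrative; the runtime does this in assembly):

package main

import (
	"fmt"
	"sync/atomic"
)

// read64 reads a 64-bit value published as hi/lo halves with a duplicated
// high word: read hi1, then lo, then hi2, and accept the snapshot only when
// hi1 == hi2; otherwise the writer updated the high word mid-read, so retry.
func read64(hi1, lo, hi2 *uint32) uint64 {
	for {
		h1 := atomic.LoadUint32(hi1)
		l := atomic.LoadUint32(lo)
		h2 := atomic.LoadUint32(hi2)
		if h1 == h2 {
			return uint64(h1)<<32 | uint64(l)
		}
	}
}

func main() {
	var hi1, lo, hi2 uint32 = 1, 42, 1
	fmt.Println(read64(&hi1, &lo, &hi2)) // 4294967338, i.e. 1<<32 | 42
}
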
diff --git a/src/runtime/time_windows_386.s b/src/runtime/time_windows_386.s
new file mode 100644
index 0000000..b8b636e
--- /dev/null
+++ b/src/runtime/time_windows_386.s
@@ -0,0 +1,84 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+TEXT time·now(SB),NOSPLIT,$0-20
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+loop:
+ MOVL (_INTERRUPT_TIME+time_hi1), AX
+ MOVL (_INTERRUPT_TIME+time_lo), CX
+ MOVL (_INTERRUPT_TIME+time_hi2), DI
+ CMPL AX, DI
+ JNE loop
+
+ // w = DI:CX
+ // multiply by 100
+ MOVL $100, AX
+ MULL CX
+ IMULL $100, DI
+ ADDL DI, DX
+ // w*100 = DX:AX
+ MOVL AX, mono+12(FP)
+ MOVL DX, mono+16(FP)
+
+wall:
+ MOVL (_SYSTEM_TIME+time_hi1), CX
+ MOVL (_SYSTEM_TIME+time_lo), AX
+ MOVL (_SYSTEM_TIME+time_hi2), DX
+ CMPL CX, DX
+ JNE wall
+
+ // w = DX:AX
+ // convert to Unix epoch (but still 100ns units)
+ #define delta 116444736000000000
+ SUBL $(delta & 0xFFFFFFFF), AX
+ SBBL $(delta >> 32), DX
+
+ // nano/100 = DX:AX
+ // split into two decimal halves by div 1e9.
+ // (decimal point is two spots over from correct place,
+ // but we avoid overflow in the high word.)
+ MOVL $1000000000, CX
+ DIVL CX
+ MOVL AX, DI
+ MOVL DX, SI
+
+ // DI = nano/100/1e9 = nano/1e11 = sec/100, DX = SI = nano/100%1e9
+ // split DX into seconds and nanoseconds by div 1e7 magic multiply.
+ MOVL DX, AX
+ MOVL $1801439851, CX
+ MULL CX
+ SHRL $22, DX
+ MOVL DX, BX
+ IMULL $10000000, DX
+ MOVL SI, CX
+ SUBL DX, CX
+
+ // DI = sec/100 (still)
+ // BX = (nano/100%1e9)/1e7 = (nano/1e9)%100 = sec%100
+ // CX = (nano/100%1e9)%1e7 = (nano%1e9)/100 = nsec/100
+ // store nsec for return
+ IMULL $100, CX
+ MOVL CX, nsec+8(FP)
+
+ // DI = sec/100 (still)
+ // BX = sec%100
+ // construct DX:AX = 64-bit sec and store for return
+ MOVL $0, DX
+ MOVL $100, AX
+ MULL DI
+ ADDL BX, AX
+ ADCL $0, DX
+ MOVL AX, sec+0(FP)
+ MOVL DX, sec+4(FP)
+ RET
+useQPC:
+ JMP runtime·nowQPC(SB)
+ RET
diff --git a/src/runtime/time_windows_amd64.s b/src/runtime/time_windows_amd64.s
new file mode 100644
index 0000000..226f2b5
--- /dev/null
+++ b/src/runtime/time_windows_amd64.s
@@ -0,0 +1,42 @@
+// Copyright 2011 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+TEXT time·now(SB),NOSPLIT,$0-24
+ CMPB runtime·useQPCTime(SB), $0
+ JNE useQPC
+
+ MOVQ $_INTERRUPT_TIME, DI
+ MOVQ time_lo(DI), AX
+ IMULQ $100, AX
+ MOVQ AX, mono+16(FP)
+
+ MOVQ $_SYSTEM_TIME, DI
+ MOVQ time_lo(DI), AX
+ MOVQ $116444736000000000, DI
+ SUBQ DI, AX
+ IMULQ $100, AX
+
+ // generated code for
+ // func f(x uint64) (uint64, uint64) { return x/1000000000, x%1000000000 }
+ // adapted to reduce duplication
+ MOVQ AX, CX
+ MOVQ $1360296554856532783, AX
+ MULQ CX
+ ADDQ CX, DX
+ RCRQ $1, DX
+ SHRQ $29, DX
+ MOVQ DX, sec+0(FP)
+ IMULQ $1000000000, DX
+ SUBQ DX, CX
+ MOVL CX, nsec+8(FP)
+ RET
+useQPC:
+ JMP runtime·nowQPC(SB)
+ RET
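
What the amd64 code above computes, expressed as a hedged Go sketch: _SYSTEM_TIME is a count of 100ns ticks since 1601-01-01, so subtracting the 116444736000000000-tick delta rebases it to the Unix epoch, multiplying by 100 converts to nanoseconds, and the divide/modulo by 1e9 (done with a magic multiply in the assembly) splits it into sec and nsec. The function name is illustrative:

package main

import "fmt"

// fromWindowsTicks converts 100ns ticks since 1601-01-01 (the Windows
// system time base) into Unix seconds and nanoseconds.
func fromWindowsTicks(ticks int64) (sec int64, nsec int32) {
	const delta = 116444736000000000 // 100ns ticks between 1601 and 1970
	ns := (ticks - delta) * 100      // rebase to the Unix epoch, convert to ns
	return ns / 1000000000, int32(ns % 1000000000)
}

func main() {
	// 2009-11-10 23:00:00 UTC expressed as ticks since 1601.
	sec, nsec := fromWindowsTicks(116444736000000000 + 12578940000000000)
	fmt.Println(sec, nsec) // 1257894000 0
}
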
diff --git a/src/runtime/time_windows_arm.s b/src/runtime/time_windows_arm.s
new file mode 100644
index 0000000..8d4469f
--- /dev/null
+++ b/src/runtime/time_windows_arm.s
@@ -0,0 +1,90 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+TEXT time·now(SB),NOSPLIT,$0-20
+ MOVW $0, R0
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+ MOVW $_INTERRUPT_TIME, R3
+loop:
+ MOVW time_hi1(R3), R1
+ DMB MB_ISH
+ MOVW time_lo(R3), R0
+ DMB MB_ISH
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE loop
+
+ // wintime = R1:R0, multiply by 100
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+
+ // wintime*100 = R4:R3
+ MOVW R3, mono+12(FP)
+ MOVW R4, mono+16(FP)
+
+ MOVW $_SYSTEM_TIME, R3
+wall:
+ MOVW time_hi1(R3), R1
+ DMB MB_ISH
+ MOVW time_lo(R3), R0
+ DMB MB_ISH
+ MOVW time_hi2(R3), R2
+ CMP R1, R2
+ BNE wall
+
+	// w = R1:R0 in 100ns units
+ // convert to Unix epoch (but still 100ns units)
+ #define delta 116444736000000000
+ SUB.S $(delta & 0xFFFFFFFF), R0
+ SBC $(delta >> 32), R1
+
+ // Convert to nSec
+ MOVW $100, R2
+ MULLU R0, R2, (R4, R3) // R4:R3 = R1:R0 * R2
+ MULA R1, R2, R4, R4
+ // w = R2:R1 in nSec
+ MOVW R3, R1 // R4:R3 -> R2:R1
+ MOVW R4, R2
+
+ // multiply nanoseconds by reciprocal of 10**9 (scaled by 2**61)
+ // to get seconds (96 bit scaled result)
+ MOVW $0x89705f41, R3 // 2**61 * 10**-9
+ MULLU R1,R3,(R6,R5) // R7:R6:R5 = R2:R1 * R3
+ MOVW $0,R7
+ MULALU R2,R3,(R7,R6)
+
+ // unscale by discarding low 32 bits, shifting the rest by 29
+ MOVW R6>>29,R6 // R7:R6 = (R7:R6:R5 >> 61)
+ ORR R7<<3,R6
+ MOVW R7>>29,R7
+
+ // subtract (10**9 * sec) from nsec to get nanosecond remainder
+ MOVW $1000000000, R5 // 10**9
+ MULLU R6,R5,(R9,R8) // R9:R8 = R7:R6 * R5
+ MULA R7,R5,R9,R9
+ SUB.S R8,R1 // R2:R1 -= R9:R8
+ SBC R9,R2
+
+ // because reciprocal was a truncated repeating fraction, quotient
+ // may be slightly too small -- adjust to make remainder < 10**9
+ CMP R5,R1 // if remainder > 10**9
+ SUB.HS R5,R1 // remainder -= 10**9
+ ADD.HS $1,R6 // sec += 1
+
+ MOVW R6,sec_lo+0(FP)
+ MOVW R7,sec_hi+4(FP)
+ MOVW R1,nsec+8(FP)
+ RET
+useQPC:
+ RET runtime·nowQPC(SB) // tail call
+
diff --git a/src/runtime/time_windows_arm64.s b/src/runtime/time_windows_arm64.s
new file mode 100644
index 0000000..7943d6b
--- /dev/null
+++ b/src/runtime/time_windows_arm64.s
@@ -0,0 +1,47 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !faketime
+
+#include "go_asm.h"
+#include "textflag.h"
+#include "time_windows.h"
+
+TEXT time·now(SB),NOSPLIT,$0-24
+ MOVB runtime·useQPCTime(SB), R0
+ CMP $0, R0
+ BNE useQPC
+
+ MOVD $_INTERRUPT_TIME, R3
+ MOVD time_lo(R3), R0
+ MOVD $100, R1
+ MUL R1, R0
+ MOVD R0, mono+16(FP)
+
+ MOVD $_SYSTEM_TIME, R3
+ MOVD time_lo(R3), R0
+ // convert to Unix epoch (but still 100ns units)
+ #define delta 116444736000000000
+ SUB $delta, R0
+ // Convert to nSec
+ MOVD $100, R1
+ MUL R1, R0
+
+ // Code stolen from compiler output for:
+ //
+ // var x uint64
+ // func f() (sec uint64, nsec uint32) { return x / 1000000000, uint32(x % 1000000000) }
+ //
+ LSR $1, R0, R1
+ MOVD $-8543223759426509416, R2
+ UMULH R1, R2, R1
+ LSR $28, R1, R1
+ MOVD R1, sec+0(FP)
+ MOVD $1000000000, R2
+ MSUB R1, R0, R2, R0
+ MOVW R0, nsec+8(FP)
+ RET
+useQPC:
+ RET runtime·nowQPC(SB) // tail call
+
diff --git a/src/runtime/timeasm.go b/src/runtime/timeasm.go
new file mode 100644
index 0000000..0421388
--- /dev/null
+++ b/src/runtime/timeasm.go
@@ -0,0 +1,14 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Declarations for operating systems implementing time.now directly in assembly.
+
+//go:build !faketime && (windows || (linux && amd64))
+
+package runtime
+
+import _ "unsafe"
+
+//go:linkname time_now time.now
+func time_now() (sec int64, nsec int32, mono int64)
diff --git a/src/runtime/timestub.go b/src/runtime/timestub.go
new file mode 100644
index 0000000..1d2926b
--- /dev/null
+++ b/src/runtime/timestub.go
@@ -0,0 +1,18 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Declarations for operating systems implementing time.now
+// indirectly, in terms of walltime and nanotime assembly.
+
+//go:build !faketime && !windows && !(linux && amd64)
+
+package runtime
+
+import _ "unsafe" // for go:linkname
+
+//go:linkname time_now time.now
+func time_now() (sec int64, nsec int32, mono int64) {
+ sec, nsec = walltime()
+ return sec, nsec, nanotime()
+}
diff --git a/src/runtime/timestub2.go b/src/runtime/timestub2.go
new file mode 100644
index 0000000..49bfeb6
--- /dev/null
+++ b/src/runtime/timestub2.go
@@ -0,0 +1,10 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !aix && !darwin && !freebsd && !openbsd && !solaris && !wasip1 && !windows && !(linux && amd64)
+
+package runtime
+
+//go:wasmimport gojs runtime.walltime
+func walltime() (sec int64, nsec int32)
diff --git a/src/runtime/tls_arm.s b/src/runtime/tls_arm.s
new file mode 100644
index 0000000..d224c55
--- /dev/null
+++ b/src/runtime/tls_arm.s
@@ -0,0 +1,100 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !windows
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R10).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code; it will continue to use R10,
+// which might well result in another SIGSEGV.
+// Note: both functions will clobber R0 and R11 and
+// can be called from 5c ABI code.
+
+// On android, runtime.tls_g is a normal variable.
+// TLS offset is computed in x_cgo_inittls.
+#ifdef GOOS_android
+#define TLSG_IS_VARIABLE
+#endif
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// ARM code that will overwrite those registers.
+// NOTE: runtime.gogo assumes that R1 is preserved by this function.
+// runtime.mcall assumes this function only clobbers R0 and R11.
+// Returns with g in R0.
+TEXT runtime·save_g(SB),NOSPLIT,$0
+ // If the host does not support MRC the linker will replace it with
+ // a call to runtime.read_tls_fallback which jumps to __kuser_get_tls.
+ // The replacement function saves LR in R11 over the call to read_tls_fallback.
+ // To make stack unwinding work, this function should NOT be marked as NOFRAME,
+ // as it may contain a call, which clobbers LR even just temporarily.
+ MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
+ BIC $3, R0 // Darwin/ARM might return unaligned pointer
+ MOVW runtime·tls_g(SB), R11
+ ADD R11, R0
+ MOVW g, 0(R0)
+ MOVW g, R0 // preserve R0 across call to setg<>
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// ARM code that overwrote those registers.
+TEXT runtime·load_g(SB),NOSPLIT,$0
+ // See save_g
+ MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
+ BIC $3, R0 // Darwin/ARM might return unaligned pointer
+ MOVW runtime·tls_g(SB), R11
+ ADD R11, R0
+ MOVW 0(R0), g
+ RET
+
+// This is called from rt0_go, which runs on the system stack
+// using the initial stack allocated by the OS.
+// It calls back into standard C using the BL (R4) below.
+// To do that, the stack pointer must be 8-byte-aligned
+// on some systems, notably FreeBSD.
+// The ARM ABI says the stack pointer must be 8-byte-aligned
+// on entry to any function, but only FreeBSD's C library seems to care.
+// The caller was 8-byte aligned, but we push an LR.
+// Declare a dummy word ($4, not $0) to make sure the
+// frame is 8 bytes and stays 8-byte-aligned.
+TEXT runtime·_initcgo(SB),NOSPLIT,$4
+ // if there is an _cgo_init, call it.
+ MOVW _cgo_init(SB), R4
+ CMP $0, R4
+ B.EQ nocgo
+ MRC 15, 0, R0, C13, C0, 3 // load TLS base pointer
+ MOVW R0, R3 // arg 3: TLS base pointer
+#ifdef TLSG_IS_VARIABLE
+ MOVW $runtime·tls_g(SB), R2 // arg 2: &tls_g
+#else
+ MOVW $0, R2 // arg 2: not used when using platform tls
+#endif
+ MOVW $setg_gcc<>(SB), R1 // arg 1: setg
+ MOVW g, R0 // arg 0: G
+ BL (R4) // will clobber R0-R3
+nocgo:
+ RET
+
+// void setg_gcc(G*); set g called from gcc.
+TEXT setg_gcc<>(SB),NOSPLIT,$0
+ MOVW R0, g
+ B runtime·save_g(SB)
+
+#ifdef TLSG_IS_VARIABLE
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/4, $8
+#endif
+GLOBL runtime·tls_g+0(SB), NOPTR, $4
+#else
+GLOBL runtime·tls_g+0(SB), TLSBSS, $4
+#endif
diff --git a/src/runtime/tls_arm64.h b/src/runtime/tls_arm64.h
new file mode 100644
index 0000000..3aa8c63
--- /dev/null
+++ b/src/runtime/tls_arm64.h
@@ -0,0 +1,51 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#ifdef GOOS_android
+#define TLS_linux
+#define TLSG_IS_VARIABLE
+#endif
+#ifdef GOOS_linux
+#define TLS_linux
+#endif
+#ifdef TLS_linux
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_darwin
+#define TLS_darwin
+#endif
+#ifdef GOOS_ios
+#define TLS_darwin
+#endif
+#ifdef TLS_darwin
+#define TLSG_IS_VARIABLE
+#define MRS_TPIDR_R0 WORD $0xd53bd060 // MRS TPIDRRO_EL0, R0
+#endif
+
+#ifdef GOOS_freebsd
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_netbsd
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_openbsd
+#define MRS_TPIDR_R0 WORD $0xd53bd040 // MRS TPIDR_EL0, R0
+#endif
+
+#ifdef GOOS_windows
+#define TLS_windows
+#endif
+#ifdef TLS_windows
+#define TLSG_IS_VARIABLE
+#define MRS_TPIDR_R0 MOVD R18_PLATFORM, R0
+#endif
+
+// Define something that will break the build if
+// the GOOS is unknown.
+#ifndef MRS_TPIDR_R0
+#define MRS_TPIDR_R0 unknown_TLS_implementation_in_tls_arm64_h
+#endif
diff --git a/src/runtime/tls_arm64.s b/src/runtime/tls_arm64.s
new file mode 100644
index 0000000..52b3e8f
--- /dev/null
+++ b/src/runtime/tls_arm64.s
@@ -0,0 +1,62 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+#include "tls_arm64.h"
+
+TEXT runtime·load_g(SB),NOSPLIT,$0
+#ifndef GOOS_darwin
+#ifndef GOOS_openbsd
+#ifndef GOOS_windows
+ MOVB runtime·iscgo(SB), R0
+ CBZ R0, nocgo
+#endif
+#endif
+#endif
+
+ MRS_TPIDR_R0
+#ifdef TLS_darwin
+ // Darwin sometimes returns unaligned pointers
+ AND $0xfffffffffffffff8, R0
+#endif
+ MOVD runtime·tls_g(SB), R27
+ MOVD (R0)(R27), g
+
+nocgo:
+ RET
+
+TEXT runtime·save_g(SB),NOSPLIT,$0
+#ifndef GOOS_darwin
+#ifndef GOOS_openbsd
+#ifndef GOOS_windows
+ MOVB runtime·iscgo(SB), R0
+ CBZ R0, nocgo
+#endif
+#endif
+#endif
+
+ MRS_TPIDR_R0
+#ifdef TLS_darwin
+ // Darwin sometimes returns unaligned pointers
+ AND $0xfffffffffffffff8, R0
+#endif
+ MOVD runtime·tls_g(SB), R27
+ MOVD g, (R0)(R27)
+
+nocgo:
+ RET
+
+#ifdef TLSG_IS_VARIABLE
+#ifdef GOOS_android
+// Use the free TLS_SLOT_APP slot #2 on Android Q.
+// Earlier androids are set up in gcc_android.c.
+DATA runtime·tls_g+0(SB)/8, $16
+#endif
+GLOBL runtime·tls_g+0(SB), NOPTR, $8
+#else
+GLOBL runtime·tls_g+0(SB), TLSBSS, $8
+#endif
diff --git a/src/runtime/tls_loong64.s b/src/runtime/tls_loong64.s
new file mode 100644
index 0000000..bc3be3d
--- /dev/null
+++ b/src/runtime/tls_loong64.s
@@ -0,0 +1,26 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+//
+// NOTE: mcall() assumes this clobbers only R30 (REGTMP).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R30
+ BEQ R30, nocgo
+
+ MOVV g, runtime·tls_g(SB)
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV runtime·tls_g(SB), g
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $8
diff --git a/src/runtime/tls_mips64x.s b/src/runtime/tls_mips64x.s
new file mode 100644
index 0000000..ec2748e
--- /dev/null
+++ b/src/runtime/tls_mips64x.s
@@ -0,0 +1,30 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips64 || mips64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+//
+// NOTE: mcall() assumes this clobbers only R23 (REGTMP).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R23
+ BEQ R23, nocgo
+
+ MOVV R3, R23 // save R3
+ MOVV g, runtime·tls_g(SB) // TLS relocation clobbers R3
+ MOVV R23, R3 // restore R3
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVV runtime·tls_g(SB), g // TLS relocation clobbers R3
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $8
diff --git a/src/runtime/tls_mipsx.s b/src/runtime/tls_mipsx.s
new file mode 100644
index 0000000..71806f4
--- /dev/null
+++ b/src/runtime/tls_mipsx.s
@@ -0,0 +1,29 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build mips || mipsle
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+// NOTE: gogo assumes load_g clobbers only g (R30) and REGTMP (R23).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R23
+ BEQ R23, nocgo
+
+ MOVW R3, R23
+ MOVW g, runtime·tls_g(SB) // TLS relocation clobbers R3
+ MOVW R23, R3
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW runtime·tls_g(SB), g // TLS relocation clobbers R3
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $4
diff --git a/src/runtime/tls_ppc64x.s b/src/runtime/tls_ppc64x.s
new file mode 100644
index 0000000..17aec9f
--- /dev/null
+++ b/src/runtime/tls_ppc64x.s
@@ -0,0 +1,51 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ppc64 || ppc64le
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R30).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code, and will continue to use R30;
+// this might well result in another SIGSEGV.
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// ppc64 code that will overwrite this register.
+//
+// If !iscgo, this is a no-op.
+//
+// NOTE: setg_gcc<> assumes this clobbers only R31.
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+#ifndef GOOS_aix
+ MOVBZ runtime·iscgo(SB), R31
+ CMP R31, $0
+ BEQ nocgo
+#endif
+ MOVD runtime·tls_g(SB), R31
+ MOVD g, 0(R31)
+
+nocgo:
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// ppc64 code that overwrote those registers.
+//
+// This is never called directly from C code (it doesn't have to
+// follow the C ABI), but it may be called from a C context, where the
+// usual Go registers aren't set up.
+//
+// NOTE: _cgo_topofstack assumes this only clobbers g (R30), and R31.
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVD runtime·tls_g(SB), R31
+ MOVD 0(R31), g
+ RET
+
+GLOBL runtime·tls_g+0(SB), TLSBSS+DUPOK, $8
diff --git a/src/runtime/tls_riscv64.s b/src/runtime/tls_riscv64.s
new file mode 100644
index 0000000..397919a
--- /dev/null
+++ b/src/runtime/tls_riscv64.s
@@ -0,0 +1,30 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// If !iscgo, this is a no-op.
+//
+// NOTE: mcall() assumes this clobbers only X31 (REG_TMP).
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), X31
+ BEQ X0, X31, nocgo
+
+ MOV runtime·tls_g(SB), X31
+ ADD TP, X31 // add offset to thread pointer (X4)
+ MOV g, (X31)
+
+nocgo:
+ RET
+
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOV runtime·tls_g(SB), X31
+ ADD TP, X31 // add offset to thread pointer (X4)
+ MOV (X31), g
+ RET
+
+GLOBL runtime·tls_g(SB), TLSBSS, $8
diff --git a/src/runtime/tls_s390x.s b/src/runtime/tls_s390x.s
new file mode 100644
index 0000000..cb6a21c
--- /dev/null
+++ b/src/runtime/tls_s390x.s
@@ -0,0 +1,51 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// We have to resort to a TLS variable to save g (R13).
+// One reason is that external code might trigger
+// SIGSEGV, and our runtime.sigtramp doesn't even know we
+// are in external code, and will continue to use R13;
+// this might well result in another SIGSEGV.
+
+// save_g saves the g register into pthread-provided
+// thread-local memory, so that we can call externally compiled
+// s390x code that will overwrite this register.
+//
+// If !iscgo, this is a no-op.
+//
+// NOTE: setg_gcc<> assumes this clobbers only R10 and R11.
+TEXT runtime·save_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVB runtime·iscgo(SB), R10
+ CMPBEQ R10, $0, nocgo
+ MOVW AR0, R11
+ SLD $32, R11
+ MOVW AR1, R11
+ MOVD runtime·tls_g(SB), R10
+ MOVD g, 0(R10)(R11*1)
+nocgo:
+ RET
+
+// load_g loads the g register from pthread-provided
+// thread-local memory, for use after calling externally compiled
+// s390x code that overwrote those registers.
+//
+// This is never called directly from C code (it doesn't have to
+// follow the C ABI), but it may be called from a C context, where the
+// usual Go registers aren't set up.
+//
+// NOTE: _cgo_topofstack assumes this only clobbers g (R13), R10 and R11.
+TEXT runtime·load_g(SB),NOSPLIT|NOFRAME,$0-0
+ MOVW AR0, R11
+ SLD $32, R11
+ MOVW AR1, R11
+ MOVD runtime·tls_g(SB), R10
+ MOVD 0(R10)(R11*1), g
+ RET
+
+GLOBL runtime·tls_g+0(SB),TLSBSS,$8
diff --git a/src/runtime/tls_stub.go b/src/runtime/tls_stub.go
new file mode 100644
index 0000000..7bdfc6b
--- /dev/null
+++ b/src/runtime/tls_stub.go
@@ -0,0 +1,10 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (windows && !amd64) || !windows
+
+package runtime
+
+//go:nosplit
+func osSetupTLS(mp *m) {}
diff --git a/src/runtime/tls_windows_amd64.go b/src/runtime/tls_windows_amd64.go
new file mode 100644
index 0000000..cacaa84
--- /dev/null
+++ b/src/runtime/tls_windows_amd64.go
@@ -0,0 +1,10 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// osSetupTLS is called by needm to set up TLS for non-Go threads.
+//
+// Defined in assembly.
+func osSetupTLS(mp *m)
diff --git a/src/runtime/trace.go b/src/runtime/trace.go
new file mode 100644
index 0000000..7d7987c
--- /dev/null
+++ b/src/runtime/trace.go
@@ -0,0 +1,1818 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Go execution tracer.
+// The tracer captures a wide range of execution events like goroutine
+// creation/blocking/unblocking, syscall enter/exit/block, GC-related events,
+// changes of heap size, processor start/stop, etc., and writes them to a
+// buffer in a compact form. A nanosecond-precision timestamp and a stack
+// trace are captured for most events.
+// See https://golang.org/s/go15trace for more info.
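+//
+// As a minimal usage sketch (assuming the exported runtime/trace package API
+// of Start(io.Writer) and Stop(), which wrap StartTrace/ReadTrace/StopTrace
+// below), a program typically enables tracing like this:
+//
+//	f, err := os.Create("out.trace") // illustrative output path
+//	if err != nil {
+//		log.Fatal(err)
+//	}
+//	defer f.Close()
+//	if err := trace.Start(f); err != nil {
+//		log.Fatal(err)
+//	}
+//	defer trace.Stop()
+//
+// The resulting file can then be inspected with "go tool trace out.trace".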
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/goarch"
+ "internal/goos"
+ "runtime/internal/atomic"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// Event types in the trace, args are given in square brackets.
+const (
+ traceEvNone = 0 // unused
+ traceEvBatch = 1 // start of per-P batch of events [pid, timestamp]
+ traceEvFrequency = 2 // contains tracer timer frequency [frequency (ticks per second)]
+ traceEvStack = 3 // stack [stack id, number of PCs, array of {PC, func string ID, file string ID, line}]
+ traceEvGomaxprocs = 4 // current value of GOMAXPROCS [timestamp, GOMAXPROCS, stack id]
+ traceEvProcStart = 5 // start of P [timestamp, thread id]
+ traceEvProcStop = 6 // stop of P [timestamp]
+ traceEvGCStart = 7 // GC start [timestamp, seq, stack id]
+ traceEvGCDone = 8 // GC done [timestamp]
+ traceEvSTWStart = 9 // STW start [timestamp, kind]
+ traceEvSTWDone = 10 // STW done [timestamp]
+ traceEvGCSweepStart = 11 // GC sweep start [timestamp, stack id]
+ traceEvGCSweepDone = 12 // GC sweep done [timestamp, swept, reclaimed]
+ traceEvGoCreate = 13 // goroutine creation [timestamp, new goroutine id, new stack id, stack id]
+ traceEvGoStart = 14 // goroutine starts running [timestamp, goroutine id, seq]
+ traceEvGoEnd = 15 // goroutine ends [timestamp]
+ traceEvGoStop = 16 // goroutine stops (like in select{}) [timestamp, stack]
+ traceEvGoSched = 17 // goroutine calls Gosched [timestamp, stack]
+ traceEvGoPreempt = 18 // goroutine is preempted [timestamp, stack]
+ traceEvGoSleep = 19 // goroutine calls Sleep [timestamp, stack]
+ traceEvGoBlock = 20 // goroutine blocks [timestamp, stack]
+ traceEvGoUnblock = 21 // goroutine is unblocked [timestamp, goroutine id, seq, stack]
+ traceEvGoBlockSend = 22 // goroutine blocks on chan send [timestamp, stack]
+ traceEvGoBlockRecv = 23 // goroutine blocks on chan recv [timestamp, stack]
+ traceEvGoBlockSelect = 24 // goroutine blocks on select [timestamp, stack]
+ traceEvGoBlockSync = 25 // goroutine blocks on Mutex/RWMutex [timestamp, stack]
+ traceEvGoBlockCond = 26 // goroutine blocks on Cond [timestamp, stack]
+ traceEvGoBlockNet = 27 // goroutine blocks on network [timestamp, stack]
+ traceEvGoSysCall = 28 // syscall enter [timestamp, stack]
+ traceEvGoSysExit = 29 // syscall exit [timestamp, goroutine id, seq, real timestamp]
+ traceEvGoSysBlock = 30 // syscall blocks [timestamp]
+ traceEvGoWaiting = 31 // denotes that goroutine is blocked when tracing starts [timestamp, goroutine id]
+ traceEvGoInSyscall = 32 // denotes that goroutine is in syscall when tracing starts [timestamp, goroutine id]
+ traceEvHeapAlloc = 33 // gcController.heapLive change [timestamp, heap_alloc]
+ traceEvHeapGoal = 34 // gcController.heapGoal() (formerly next_gc) change [timestamp, heap goal in bytes]
+ traceEvTimerGoroutine = 35 // not currently used; previously denoted timer goroutine [timer goroutine id]
+ traceEvFutileWakeup = 36 // not currently used; denotes that the previous wakeup of this goroutine was futile [timestamp]
+ traceEvString = 37 // string dictionary entry [ID, length, string]
+ traceEvGoStartLocal = 38 // goroutine starts running on the same P as the last event [timestamp, goroutine id]
+ traceEvGoUnblockLocal = 39 // goroutine is unblocked on the same P as the last event [timestamp, goroutine id, stack]
+ traceEvGoSysExitLocal = 40 // syscall exit on the same P as the last event [timestamp, goroutine id, real timestamp]
+ traceEvGoStartLabel = 41 // goroutine starts running with label [timestamp, goroutine id, seq, label string id]
+ traceEvGoBlockGC = 42 // goroutine blocks on GC assist [timestamp, stack]
+ traceEvGCMarkAssistStart = 43 // GC mark assist start [timestamp, stack]
+ traceEvGCMarkAssistDone = 44 // GC mark assist done [timestamp]
+ traceEvUserTaskCreate = 45 // trace.NewTask [timestamp, internal task id, internal parent task id, name string, stack]
+ traceEvUserTaskEnd = 46 // end of a task [timestamp, internal task id, stack]
+ traceEvUserRegion = 47 // trace.WithRegion [timestamp, internal task id, mode(0:start, 1:end), name string, stack]
+ traceEvUserLog = 48 // trace.Log [timestamp, internal task id, key string id, stack, value string]
+ traceEvCPUSample = 49 // CPU profiling sample [timestamp, real timestamp, real P id (-1 when absent), goroutine id, stack]
+ traceEvCount = 50
+	// A full byte is used, but only 6 bits are available for the event type.
+	// The remaining 2 bits specify the number of arguments.
+	// That means the max event type value is 63.
+)
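+
+// The first byte of every event packs the event type into its low 6 bits and
+// an argument count into the top 2 bits (traceArgCountShift, defined below).
+// As a rough sketch, a type-6 event carrying one counted argument would start
+// with the byte
+//
+//	traceEvProcStop | 1<<traceArgCountShift // == 0x46
+//
+// see traceEventLocked for the exact rules on what is counted.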
+
+// traceBlockReason is an enumeration of reasons a goroutine might block.
+// This is the interface the rest of the runtime uses to tell the
+// tracer why a goroutine blocked. The tracer then propagates this information
+// into the trace however it sees fit.
+//
+// Note that traceBlockReasons should not be compared, since reasons that are
+// distinct by name may *not* be distinct by value.
+type traceBlockReason uint8
+
+// For maximal efficiency, just map the trace block reason directly to a trace
+// event.
+const (
+ traceBlockGeneric traceBlockReason = traceEvGoBlock
+ traceBlockForever = traceEvGoStop
+ traceBlockNet = traceEvGoBlockNet
+ traceBlockSelect = traceEvGoBlockSelect
+ traceBlockCondWait = traceEvGoBlockCond
+ traceBlockSync = traceEvGoBlockSync
+ traceBlockChanSend = traceEvGoBlockSend
+ traceBlockChanRecv = traceEvGoBlockRecv
+ traceBlockGCMarkAssist = traceEvGoBlockGC
+ traceBlockGCSweep = traceEvGoBlock
+ traceBlockSystemGoroutine = traceEvGoBlock
+ traceBlockPreempted = traceEvGoBlock
+ traceBlockDebugCall = traceEvGoBlock
+ traceBlockUntilGCEnds = traceEvGoBlock
+ traceBlockSleep = traceEvGoSleep
+)
+
+const (
+	// Timestamps in the trace are cputicks/traceTimeDiv.
+	// This makes absolute values of timestamp diffs smaller,
+	// and so they are encoded in fewer bytes.
+ // 64 on x86 is somewhat arbitrary (one tick is ~20ns on a 3GHz machine).
+ // The suggested increment frequency for PowerPC's time base register is
+ // 512 MHz according to Power ISA v2.07 section 6.2, so we use 16 on ppc64
+ // and ppc64le.
+ traceTimeDiv = 16 + 48*(goarch.Is386|goarch.IsAmd64)
+ // Maximum number of PCs in a single stack trace.
+ // Since events contain only stack id rather than whole stack trace,
+ // we can allow quite large values here.
+ traceStackSize = 128
+ // Identifier of a fake P that is used when we trace without a real P.
+ traceGlobProc = -1
+ // Maximum number of bytes to encode uint64 in base-128.
+ traceBytesPerNumber = 10
+ // Shift of the number of arguments in the first event byte.
+ traceArgCountShift = 6
+)
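+
+// For example, with the formula above traceTimeDiv evaluates to 16+48*1 = 64
+// on 386/amd64 and to 16+48*0 = 16 everywhere else, so a raw cputicks delta
+// of 6400 on amd64 corresponds to a timestamp delta of 100.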
+
+// trace is global tracing context.
+var trace struct {
+ // trace.lock must only be acquired on the system stack where
+ // stack splits cannot happen while it is held.
+ lock mutex // protects the following members
+ enabled bool // when set runtime traces events
+ shutdown bool // set when we are waiting for trace reader to finish after setting enabled to false
+ headerWritten bool // whether ReadTrace has emitted trace header
+ footerWritten bool // whether ReadTrace has emitted trace footer
+ shutdownSema uint32 // used to wait for ReadTrace completion
+ seqStart uint64 // sequence number when tracing was started
+ startTicks int64 // cputicks when tracing was started
+ endTicks int64 // cputicks when tracing was stopped
+ startNanotime int64 // nanotime when tracing was started
+ endNanotime int64 // nanotime when tracing was stopped
+ startTime traceTime // traceClockNow when tracing started
+ endTime traceTime // traceClockNow when tracing stopped
+ seqGC uint64 // GC start/done sequencer
+ reading traceBufPtr // buffer currently handed off to user
+ empty traceBufPtr // stack of empty buffers
+ fullHead traceBufPtr // queue of full buffers
+ fullTail traceBufPtr
+ stackTab traceStackTable // maps stack traces to unique ids
+ // cpuLogRead accepts CPU profile samples from the signal handler where
+ // they're generated. It uses a two-word header to hold the IDs of the P and
+ // G (respectively) that were active at the time of the sample. Because
+ // profBuf uses a record with all zeros in its header to indicate overflow,
+ // we make sure to make the P field always non-zero: The ID of a real P will
+ // start at bit 1, and bit 0 will be set. Samples that arrive while no P is
+ // running (such as near syscalls) will set the first header field to 0b10.
+ // This careful handling of the first header field allows us to store ID of
+ // the active G directly in the second field, even though that will be 0
+ // when sampling g0.
+ cpuLogRead *profBuf
+ // cpuLogBuf is a trace buffer to hold events corresponding to CPU profile
+ // samples, which arrive out of band and not directly connected to a
+ // specific P.
+ cpuLogBuf traceBufPtr
+
+ reader atomic.Pointer[g] // goroutine that called ReadTrace, or nil
+
+ signalLock atomic.Uint32 // protects use of the following member, only usable in signal handlers
+ cpuLogWrite *profBuf // copy of cpuLogRead for use in signal handlers, set without signalLock
+
+ // Dictionary for traceEvString.
+ //
+ // TODO: central lock to access the map is not ideal.
+ // option: pre-assign ids to all user annotation region names and tags
+ // option: per-P cache
+ // option: sync.Map like data structure
+ stringsLock mutex
+ strings map[string]uint64
+ stringSeq uint64
+
+ // markWorkerLabels maps gcMarkWorkerMode to string ID.
+ markWorkerLabels [len(gcMarkWorkerModeStrings)]uint64
+
+ bufLock mutex // protects buf
+ buf traceBufPtr // global trace buffer, used when running without a p
+}
+
+// gTraceState is per-G state for the tracer.
+type gTraceState struct {
+ sysExitTime traceTime // timestamp when syscall has returned
+ tracedSyscallEnter bool // syscall or cgo was entered while trace was enabled or StartTrace has emitted EvGoInSyscall about this goroutine
+ seq uint64 // trace event sequencer
+ lastP puintptr // last P emitted an event for this goroutine
+}
+
+// mTraceState is per-M state for the tracer.
+type mTraceState struct {
+ startingTrace bool // this M is in TraceStart, potentially before traceEnabled is true
+ tracedSTWStart bool // this M traced a STW start, so it should trace an end
+}
+
+// pTraceState is per-P state for the tracer.
+type pTraceState struct {
+ buf traceBufPtr
+
+ // inSweep indicates the sweep events should be traced.
+ // This is used to defer the sweep start event until a span
+ // has actually been swept.
+ inSweep bool
+
+ // swept and reclaimed track the number of bytes swept and reclaimed
+ // by sweeping in the current sweep loop (while inSweep was true).
+ swept, reclaimed uintptr
+}
+
+// traceLockInit initializes global trace locks.
+func traceLockInit() {
+ lockInit(&trace.bufLock, lockRankTraceBuf)
+ lockInit(&trace.stringsLock, lockRankTraceStrings)
+ lockInit(&trace.lock, lockRankTrace)
+ lockInit(&trace.stackTab.lock, lockRankTraceStackTab)
+}
+
+// traceBufHeader is the header of a per-P tracing buffer.
+type traceBufHeader struct {
+ link traceBufPtr // in trace.empty/full
+ lastTime traceTime // when we wrote the last event
+ pos int // next write offset in arr
+ stk [traceStackSize]uintptr // scratch buffer for traceback
+}
+
+// traceBuf is per-P tracing buffer.
+type traceBuf struct {
+ _ sys.NotInHeap
+ traceBufHeader
+	arr [64<<10 - unsafe.Sizeof(traceBufHeader{})]byte // underlying buffer for trace event data
+}
+
+// traceBufPtr is a *traceBuf that is not traced by the garbage
+// collector and doesn't have write barriers. traceBufs are not
+// allocated from the GC'd heap, so this is safe, and are often
+// manipulated in contexts where write barriers are not allowed, so
+// this is necessary.
+//
+// TODO: Since traceBuf is now embedded runtime/internal/sys.NotInHeap, this isn't necessary.
+type traceBufPtr uintptr
+
+func (tp traceBufPtr) ptr() *traceBuf { return (*traceBuf)(unsafe.Pointer(tp)) }
+func (tp *traceBufPtr) set(b *traceBuf) { *tp = traceBufPtr(unsafe.Pointer(b)) }
+func traceBufPtrOf(b *traceBuf) traceBufPtr {
+ return traceBufPtr(unsafe.Pointer(b))
+}
+
+// traceEnabled returns true if the trace is currently enabled.
+//
+//go:nosplit
+func traceEnabled() bool {
+ return trace.enabled
+}
+
+// traceShuttingDown returns true if the trace is currently shutting down.
+//
+//go:nosplit
+func traceShuttingDown() bool {
+ return trace.shutdown
+}
+
+// StartTrace enables tracing for the current process.
+// While tracing, the data will be buffered and available via ReadTrace.
+// StartTrace returns an error if tracing is already enabled.
+// Most clients should use the runtime/trace package or the testing package's
+// -test.trace flag instead of calling StartTrace directly.
+func StartTrace() error {
+ // Stop the world so that we can take a consistent snapshot
+ // of all goroutines at the beginning of the trace.
+ // Do not stop the world during GC so we ensure we always see
+ // a consistent view of GC-related events (e.g. a start is always
+ // paired with an end).
+ stopTheWorldGC(stwStartTrace)
+
+ // Prevent sysmon from running any code that could generate events.
+ lock(&sched.sysmonlock)
+
+ // We are in stop-the-world, but syscalls can finish and write to trace concurrently.
+ // Exitsyscall could check trace.enabled long before and then suddenly wake up
+ // and decide to write to trace at a random point in time.
+	// However, such a syscall will use the global trace.buf buffer, because we've
+	// acquired all Ps by stopping the world, so this protects us from such races.
+ lock(&trace.bufLock)
+
+ if trace.enabled || trace.shutdown {
+ unlock(&trace.bufLock)
+ unlock(&sched.sysmonlock)
+ startTheWorldGC()
+ return errorString("tracing is already enabled")
+ }
+
+ // Can't set trace.enabled yet. While the world is stopped, exitsyscall could
+ // already emit a delayed event (see exitTicks in exitsyscall) if we set trace.enabled here.
+ // That would lead to an inconsistent trace:
+ // - either GoSysExit appears before EvGoInSyscall,
+ // - or GoSysExit appears for a goroutine for which we don't emit EvGoInSyscall below.
+ // To instruct traceEvent that it must not ignore events below, we set trace.startingTrace.
+ // trace.enabled is set afterwards once we have emitted all preliminary events.
+ mp := getg().m
+ mp.trace.startingTrace = true
+
+ // Obtain current stack ID to use in all traceEvGoCreate events below.
+ stkBuf := make([]uintptr, traceStackSize)
+ stackID := traceStackID(mp, stkBuf, 2)
+
+ profBuf := newProfBuf(2, profBufWordCount, profBufTagCount) // after the timestamp, header is [pp.id, gp.goid]
+ trace.cpuLogRead = profBuf
+
+ // We must not acquire trace.signalLock outside of a signal handler: a
+ // profiling signal may arrive at any time and try to acquire it, leading to
+ // deadlock. Because we can't use that lock to protect updates to
+ // trace.cpuLogWrite (only use of the structure it references), reads and
+ // writes of the pointer must be atomic. (And although this field is never
+ // the sole pointer to the profBuf value, it's best to allow a write barrier
+ // here.)
+ atomicstorep(unsafe.Pointer(&trace.cpuLogWrite), unsafe.Pointer(profBuf))
+
+ // World is stopped, no need to lock.
+ forEachGRace(func(gp *g) {
+ status := readgstatus(gp)
+ if status != _Gdead {
+ gp.trace.seq = 0
+ gp.trace.lastP = getg().m.p
+ // +PCQuantum because traceFrameForPC expects return PCs and subtracts PCQuantum.
+ id := trace.stackTab.put([]uintptr{logicalStackSentinel, startPCforTrace(gp.startpc) + sys.PCQuantum})
+ traceEvent(traceEvGoCreate, -1, gp.goid, uint64(id), stackID)
+ }
+ if status == _Gwaiting {
+ // traceEvGoWaiting is implied to have seq=1.
+ gp.trace.seq++
+ traceEvent(traceEvGoWaiting, -1, gp.goid)
+ }
+ if status == _Gsyscall {
+ gp.trace.seq++
+ gp.trace.tracedSyscallEnter = true
+ traceEvent(traceEvGoInSyscall, -1, gp.goid)
+ } else if status == _Gdead && gp.m != nil && gp.m.isextra {
+			// Trigger two trace events for the dead g in the extra m,
+			// since the next event of the g will be traceEvGoSysExit in exitsyscall,
+			// when a C thread calls into Go.
+ gp.trace.seq = 0
+ gp.trace.lastP = getg().m.p
+ // +PCQuantum because traceFrameForPC expects return PCs and subtracts PCQuantum.
+ id := trace.stackTab.put([]uintptr{logicalStackSentinel, startPCforTrace(0) + sys.PCQuantum}) // no start pc
+ traceEvent(traceEvGoCreate, -1, gp.goid, uint64(id), stackID)
+ gp.trace.seq++
+ gp.trace.tracedSyscallEnter = true
+ traceEvent(traceEvGoInSyscall, -1, gp.goid)
+ } else {
+ // We need to explicitly clear the flag. A previous trace might have ended with a goroutine
+ // not emitting a GoSysExit and clearing the flag, leaving it in a stale state. Clearing
+ // it here makes it unambiguous to any goroutine exiting a syscall racing with us that
+ // no EvGoInSyscall event was emitted for it. (It's not racy to set this flag here, because
+ // it'll only get checked when the goroutine runs again, which will be after the world starts
+ // again.)
+ gp.trace.tracedSyscallEnter = false
+ }
+ })
+ traceProcStart()
+ traceGoStart()
+ // Note: startTicks needs to be set after we emit traceEvGoInSyscall events.
+ // If we do it the other way around, it is possible that exitsyscall will
+ // query sysExitTime after startTicks but before traceEvGoInSyscall timestamp.
+ // It will lead to a false conclusion that cputicks is broken.
+ trace.startTime = traceClockNow()
+ trace.startTicks = cputicks()
+ trace.startNanotime = nanotime()
+ trace.headerWritten = false
+ trace.footerWritten = false
+
+ // string to id mapping
+ // 0 : reserved for an empty string
+ // remaining: other strings registered by traceString
+ trace.stringSeq = 0
+ trace.strings = make(map[string]uint64)
+
+ trace.seqGC = 0
+ mp.trace.startingTrace = false
+ trace.enabled = true
+
+ // Register runtime goroutine labels.
+ _, pid, bufp := traceAcquireBuffer()
+ for i, label := range gcMarkWorkerModeStrings[:] {
+ trace.markWorkerLabels[i], bufp = traceString(bufp, pid, label)
+ }
+ traceReleaseBuffer(mp, pid)
+
+ unlock(&trace.bufLock)
+
+ unlock(&sched.sysmonlock)
+
+ // Record the current state of HeapGoal to avoid information loss in trace.
+ traceHeapGoal()
+
+ startTheWorldGC()
+ return nil
+}
+
+// StopTrace stops tracing, if it was previously enabled.
+// StopTrace only returns after all the reads for the trace have completed.
+func StopTrace() {
+ // Stop the world so that we can collect the trace buffers from all p's below,
+ // and also to avoid races with traceEvent.
+ stopTheWorldGC(stwStopTrace)
+
+ // See the comment in StartTrace.
+ lock(&sched.sysmonlock)
+
+ // See the comment in StartTrace.
+ lock(&trace.bufLock)
+
+ if !trace.enabled {
+ unlock(&trace.bufLock)
+ unlock(&sched.sysmonlock)
+ startTheWorldGC()
+ return
+ }
+
+ traceGoSched()
+
+ atomicstorep(unsafe.Pointer(&trace.cpuLogWrite), nil)
+ trace.cpuLogRead.close()
+ traceReadCPU()
+
+ // Loop over all allocated Ps because dead Ps may still have
+ // trace buffers.
+ for _, p := range allp[:cap(allp)] {
+ buf := p.trace.buf
+ if buf != 0 {
+ traceFullQueue(buf)
+ p.trace.buf = 0
+ }
+ }
+ if trace.buf != 0 {
+ buf := trace.buf
+ trace.buf = 0
+ if buf.ptr().pos != 0 {
+ traceFullQueue(buf)
+ }
+ }
+ if trace.cpuLogBuf != 0 {
+ buf := trace.cpuLogBuf
+ trace.cpuLogBuf = 0
+ if buf.ptr().pos != 0 {
+ traceFullQueue(buf)
+ }
+ }
+
+ // Wait for startNanotime != endNanotime. On Windows the default interval between
+ // system clock ticks is typically between 1 and 15 milliseconds, which may not
+ // have passed since the trace started. Without nanotime moving forward, trace
+	// tooling has no way of identifying how much real time each cputicks time delta
+	// represents.
+ for {
+ trace.endTime = traceClockNow()
+ trace.endTicks = cputicks()
+ trace.endNanotime = nanotime()
+
+ if trace.endNanotime != trace.startNanotime || faketime != 0 {
+ break
+ }
+ osyield()
+ }
+
+ trace.enabled = false
+ trace.shutdown = true
+ unlock(&trace.bufLock)
+
+ unlock(&sched.sysmonlock)
+
+ startTheWorldGC()
+
+ // The world is started but we've set trace.shutdown, so new tracing can't start.
+ // Wait for the trace reader to flush pending buffers and stop.
+ semacquire(&trace.shutdownSema)
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&trace.shutdownSema))
+ }
+
+ systemstack(func() {
+ // The lock protects us from races with StartTrace/StopTrace because they do stop-the-world.
+ lock(&trace.lock)
+ for _, p := range allp[:cap(allp)] {
+ if p.trace.buf != 0 {
+ throw("trace: non-empty trace buffer in proc")
+ }
+ }
+ if trace.buf != 0 {
+ throw("trace: non-empty global trace buffer")
+ }
+ if trace.fullHead != 0 || trace.fullTail != 0 {
+ throw("trace: non-empty full trace buffer")
+ }
+ if trace.reading != 0 || trace.reader.Load() != nil {
+ throw("trace: reading after shutdown")
+ }
+ for trace.empty != 0 {
+ buf := trace.empty
+ trace.empty = buf.ptr().link
+ sysFree(unsafe.Pointer(buf), unsafe.Sizeof(*buf.ptr()), &memstats.other_sys)
+ }
+ trace.strings = nil
+ trace.shutdown = false
+ trace.cpuLogRead = nil
+ unlock(&trace.lock)
+ })
+}
+
+// ReadTrace returns the next chunk of binary tracing data, blocking until data
+// is available. If tracing is turned off and all the data accumulated while it
+// was on has been returned, ReadTrace returns nil. The caller must copy the
+// returned data before calling ReadTrace again.
+// ReadTrace must be called from one goroutine at a time.
+func ReadTrace() []byte {
+top:
+ var buf []byte
+ var park bool
+ systemstack(func() {
+ buf, park = readTrace0()
+ })
+ if park {
+ gopark(func(gp *g, _ unsafe.Pointer) bool {
+ if !trace.reader.CompareAndSwapNoWB(nil, gp) {
+ // We're racing with another reader.
+ // Wake up and handle this case.
+ return false
+ }
+
+ if g2 := traceReader(); gp == g2 {
+ // New data arrived between unlocking
+ // and the CAS and we won the wake-up
+ // race, so wake up directly.
+ return false
+ } else if g2 != nil {
+ printlock()
+ println("runtime: got trace reader", g2, g2.goid)
+ throw("unexpected trace reader")
+ }
+
+ return true
+ }, nil, waitReasonTraceReaderBlocked, traceBlockSystemGoroutine, 2)
+ goto top
+ }
+
+ return buf
+}
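+
+// As a minimal sketch of the intended protocol (roughly what the runtime/trace
+// package does on the caller's behalf), a dedicated goroutine drains ReadTrace
+// into some io.Writer w chosen by the caller:
+//
+//	if err := StartTrace(); err != nil {
+//		return err
+//	}
+//	go func() {
+//		for {
+//			data := ReadTrace()
+//			if data == nil {
+//				return // tracing stopped and all buffered data consumed
+//			}
+//			w.Write(data) // data must be consumed before the next ReadTrace call
+//		}
+//	}()
+//	// run the workload to be traced, then stop:
+//	StopTrace()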
+
+// readTrace0 is ReadTrace's continuation on g0. This must run on the
+// system stack because it acquires trace.lock.
+//
+//go:systemstack
+func readTrace0() (buf []byte, park bool) {
+ if raceenabled {
+ // g0 doesn't have a race context. Borrow the user G's.
+ if getg().racectx != 0 {
+ throw("expected racectx == 0")
+ }
+ getg().racectx = getg().m.curg.racectx
+ // (This defer should get open-coded, which is safe on
+ // the system stack.)
+ defer func() { getg().racectx = 0 }()
+ }
+
+ // Optimistically look for CPU profile samples. This may write new stack
+ // records, and may write new tracing buffers. This must be done with the
+ // trace lock not held. footerWritten and shutdown are safe to access
+ // here. They are only mutated by this goroutine or during a STW.
+ if !trace.footerWritten && !trace.shutdown {
+ traceReadCPU()
+ }
+
+ // This function must not allocate while holding trace.lock:
+	// allocation can call into the heap allocator, which will try to emit a
+	// trace event while holding the heap lock.
+ lock(&trace.lock)
+
+ if trace.reader.Load() != nil {
+		// More than one goroutine reads the trace. This is bad.
+		// But we would rather not crash the program because of tracing,
+		// since tracing can be enabled at runtime on production servers.
+ unlock(&trace.lock)
+ println("runtime: ReadTrace called from multiple goroutines simultaneously")
+ return nil, false
+ }
+ // Recycle the old buffer.
+ if buf := trace.reading; buf != 0 {
+ buf.ptr().link = trace.empty
+ trace.empty = buf
+ trace.reading = 0
+ }
+ // Write trace header.
+ if !trace.headerWritten {
+ trace.headerWritten = true
+ unlock(&trace.lock)
+ return []byte("go 1.21 trace\x00\x00\x00"), false
+ }
+ // Wait for new data.
+ if trace.fullHead == 0 && !trace.shutdown {
+ // We don't simply use a note because the scheduler
+ // executes this goroutine directly when it wakes up
+ // (also a note would consume an M).
+ unlock(&trace.lock)
+ return nil, true
+ }
+newFull:
+ assertLockHeld(&trace.lock)
+ // Write a buffer.
+ if trace.fullHead != 0 {
+ buf := traceFullDequeue()
+ trace.reading = buf
+ unlock(&trace.lock)
+ return buf.ptr().arr[:buf.ptr().pos], false
+ }
+
+ // Write footer with timer frequency.
+ if !trace.footerWritten {
+ trace.footerWritten = true
+ freq := (float64(trace.endTicks-trace.startTicks) / traceTimeDiv) / (float64(trace.endNanotime-trace.startNanotime) / 1e9)
+ if freq <= 0 {
+ throw("trace: ReadTrace got invalid frequency")
+ }
+ unlock(&trace.lock)
+
+ // Write frequency event.
+ bufp := traceFlush(0, 0)
+ buf := bufp.ptr()
+ buf.byte(traceEvFrequency | 0<<traceArgCountShift)
+ buf.varint(uint64(freq))
+
+ // Dump stack table.
+ // This will emit a bunch of full buffers, we will pick them up
+ // on the next iteration.
+ bufp = trace.stackTab.dump(bufp)
+
+ // Flush final buffer.
+ lock(&trace.lock)
+ traceFullQueue(bufp)
+ goto newFull // trace.lock should be held at newFull
+ }
+ // Done.
+ if trace.shutdown {
+ unlock(&trace.lock)
+ if raceenabled {
+ // Model synchronization on trace.shutdownSema, which race
+ // detector does not see. This is required to avoid false
+ // race reports on writer passed to trace.Start.
+ racerelease(unsafe.Pointer(&trace.shutdownSema))
+ }
+ // trace.enabled is already reset, so can call traceable functions.
+ semrelease(&trace.shutdownSema)
+ return nil, false
+ }
+ // Also bad, but see the comment above.
+ unlock(&trace.lock)
+ println("runtime: spurious wakeup of trace reader")
+ return nil, false
+}
+
+// traceReader returns the trace reader that should be woken up, if any.
+// Callers should first check that trace.enabled or trace.shutdown is set.
+//
+// This must run on the system stack because it acquires trace.lock.
+//
+//go:systemstack
+func traceReader() *g {
+ // Optimistic check first
+ if traceReaderAvailable() == nil {
+ return nil
+ }
+ lock(&trace.lock)
+ gp := traceReaderAvailable()
+ if gp == nil || !trace.reader.CompareAndSwapNoWB(gp, nil) {
+ unlock(&trace.lock)
+ return nil
+ }
+ unlock(&trace.lock)
+ return gp
+}
+
+// traceReaderAvailable returns the trace reader if it is not currently
+// scheduled and should be. Callers should first check that trace.enabled
+// or trace.shutdown is set.
+func traceReaderAvailable() *g {
+ if trace.fullHead != 0 || trace.shutdown {
+ return trace.reader.Load()
+ }
+ return nil
+}
+
+// traceProcFree frees trace buffer associated with pp.
+//
+// This must run on the system stack because it acquires trace.lock.
+//
+//go:systemstack
+func traceProcFree(pp *p) {
+ buf := pp.trace.buf
+ pp.trace.buf = 0
+ if buf == 0 {
+ return
+ }
+ lock(&trace.lock)
+ traceFullQueue(buf)
+ unlock(&trace.lock)
+}
+
+// traceFullQueue queues buf into queue of full buffers.
+func traceFullQueue(buf traceBufPtr) {
+ buf.ptr().link = 0
+ if trace.fullHead == 0 {
+ trace.fullHead = buf
+ } else {
+ trace.fullTail.ptr().link = buf
+ }
+ trace.fullTail = buf
+}
+
+// traceFullDequeue dequeues from queue of full buffers.
+func traceFullDequeue() traceBufPtr {
+ buf := trace.fullHead
+ if buf == 0 {
+ return 0
+ }
+ trace.fullHead = buf.ptr().link
+ if trace.fullHead == 0 {
+ trace.fullTail = 0
+ }
+ buf.ptr().link = 0
+ return buf
+}
+
+// traceEvent writes a single event to trace buffer, flushing the buffer if necessary.
+// ev is event type.
+// If skip > 0, write current stack id as the last argument (skipping skip top frames).
+// If skip = 0, this event type should contain a stack, but we don't want
+// to collect and remember it for this particular call.
+func traceEvent(ev byte, skip int, args ...uint64) {
+ mp, pid, bufp := traceAcquireBuffer()
+ // Double-check trace.enabled now that we've done m.locks++ and acquired bufLock.
+ // This protects from races between traceEvent and StartTrace/StopTrace.
+
+ // The caller checked that trace.enabled == true, but trace.enabled might have been
+ // turned off between the check and now. Check again. traceLockBuffer did mp.locks++,
+ // StopTrace does stopTheWorld, and stopTheWorld waits for mp.locks to go back to zero,
+ // so if we see trace.enabled == true now, we know it's true for the rest of the function.
+ // Exitsyscall can run even during stopTheWorld. The race with StartTrace/StopTrace
+ // during tracing in exitsyscall is resolved by locking trace.bufLock in traceLockBuffer.
+ //
+ // Note trace_userTaskCreate runs the same check.
+ if !trace.enabled && !mp.trace.startingTrace {
+ traceReleaseBuffer(mp, pid)
+ return
+ }
+
+ if skip > 0 {
+ if getg() == mp.curg {
+ skip++ // +1 because stack is captured in traceEventLocked.
+ }
+ }
+ traceEventLocked(0, mp, pid, bufp, ev, 0, skip, args...)
+ traceReleaseBuffer(mp, pid)
+}
+
+// traceEventLocked writes a single event of type ev to the trace buffer bufp,
+// flushing the buffer if necessary. pid is the id of the current P, or
+// traceGlobProc if we're tracing without a real P.
+//
+// Preemption is disabled, and if running without a real P the global tracing
+// buffer is locked.
+//
+// Event types that do not include a stack set skip to -1. Event types that
+// include a stack may explicitly reference a stackID from the trace.stackTab
+// (obtained by an earlier call to traceStackID). Without an explicit stackID,
+// this function will automatically capture the stack of the goroutine currently
+// running on mp, skipping skip top frames or, if skip is 0, writing out an
+// empty stack record.
+//
+// It records the event's args to the traceBuf, and also makes an effort to
+// reserve extraBytes bytes of additional space immediately following the event,
+// in the same traceBuf.
+func traceEventLocked(extraBytes int, mp *m, pid int32, bufp *traceBufPtr, ev byte, stackID uint32, skip int, args ...uint64) {
+ buf := bufp.ptr()
+ // TODO: test on non-zero extraBytes param.
+	maxSize := 2 + 5*traceBytesPerNumber + extraBytes // event type, length, sequence, timestamp, stack id and two additional params
+ if buf == nil || len(buf.arr)-buf.pos < maxSize {
+ systemstack(func() {
+ buf = traceFlush(traceBufPtrOf(buf), pid).ptr()
+ })
+ bufp.set(buf)
+ }
+
+ ts := traceClockNow()
+ if ts <= buf.lastTime {
+ ts = buf.lastTime + 1
+ }
+ tsDiff := uint64(ts - buf.lastTime)
+ buf.lastTime = ts
+ narg := byte(len(args))
+ if stackID != 0 || skip >= 0 {
+ narg++
+ }
+	// We have only 2 bits for the number of arguments.
+	// If the number is >= 3, the event type is followed by the event length in bytes.
+ if narg > 3 {
+ narg = 3
+ }
+ startPos := buf.pos
+ buf.byte(ev | narg<<traceArgCountShift)
+ var lenp *byte
+ if narg == 3 {
+ // Reserve the byte for length assuming that length < 128.
+ buf.varint(0)
+ lenp = &buf.arr[buf.pos-1]
+ }
+ buf.varint(tsDiff)
+ for _, a := range args {
+ buf.varint(a)
+ }
+ if stackID != 0 {
+ buf.varint(uint64(stackID))
+ } else if skip == 0 {
+ buf.varint(0)
+ } else if skip > 0 {
+ buf.varint(traceStackID(mp, buf.stk[:], skip))
+ }
+ evSize := buf.pos - startPos
+ if evSize > maxSize {
+ throw("invalid length of trace event")
+ }
+ if lenp != nil {
+ // Fill in actual length.
+ *lenp = byte(evSize - 2)
+ }
+}
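+
+// For illustration, the byte layout produced above for a single event is,
+// in order:
+//
+//	| ev|narg<<6 | [length byte, only when narg == 3] | tsDiff varint |
+//	| arg varints ... | [stack id varint, if the event carries a stack] |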
+
+// traceCPUSample writes a CPU profile sample stack to the execution tracer's
+// profiling buffer. It is called from a signal handler, so is limited in what
+// it can do.
+func traceCPUSample(gp *g, pp *p, stk []uintptr) {
+ if !trace.enabled {
+ // Tracing is usually turned off; don't spend time acquiring the signal
+ // lock unless it's active.
+ return
+ }
+
+ // Match the clock used in traceEventLocked
+ now := traceClockNow()
+ // The "header" here is the ID of the P that was running the profiled code,
+ // followed by the ID of the goroutine. (For normal CPU profiling, it's
+ // usually the number of samples with the given stack.) Near syscalls, pp
+ // may be nil. Reporting goid of 0 is fine for either g0 or a nil gp.
+ var hdr [2]uint64
+ if pp != nil {
+ // Overflow records in profBuf have all header values set to zero. Make
+ // sure that real headers have at least one bit set.
+ hdr[0] = uint64(pp.id)<<1 | 0b1
+ } else {
+ hdr[0] = 0b10
+ }
+ if gp != nil {
+ hdr[1] = gp.goid
+ }
+
+ // Allow only one writer at a time
+ for !trace.signalLock.CompareAndSwap(0, 1) {
+ // TODO: Is it safe to osyield here? https://go.dev/issue/52672
+ osyield()
+ }
+
+ if log := (*profBuf)(atomic.Loadp(unsafe.Pointer(&trace.cpuLogWrite))); log != nil {
+ // Note: we don't pass a tag pointer here (how should profiling tags
+ // interact with the execution tracer?), but if we did we'd need to be
+ // careful about write barriers. See the long comment in profBuf.write.
+ log.write(nil, int64(now), hdr[:], stk)
+ }
+
+ trace.signalLock.Store(0)
+}
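+
+// For example, with the header encoding above a sample taken while P 3 was
+// running carries hdr[0] == 3<<1|1 == 0b111, while a sample with no P carries
+// hdr[0] == 0b10; traceReadCPU below undoes this with data[2]>>1 and
+// data[2]&0b1.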
+
+func traceReadCPU() {
+ bufp := &trace.cpuLogBuf
+
+ for {
+ data, tags, _ := trace.cpuLogRead.read(profBufNonBlocking)
+ if len(data) == 0 {
+ break
+ }
+ for len(data) > 0 {
+ if len(data) < 4 || data[0] > uint64(len(data)) {
+ break // truncated profile
+ }
+ if data[0] < 4 || tags != nil && len(tags) < 1 {
+ break // malformed profile
+ }
+ if len(tags) < 1 {
+ break // mismatched profile records and tags
+ }
+ timestamp := data[1]
+ ppid := data[2] >> 1
+ if hasP := (data[2] & 0b1) != 0; !hasP {
+ ppid = ^uint64(0)
+ }
+ goid := data[3]
+ stk := data[4:data[0]]
+ empty := len(stk) == 1 && data[2] == 0 && data[3] == 0
+ data = data[data[0]:]
+ // No support here for reporting goroutine tags at the moment; if
+ // that information is to be part of the execution trace, we'd
+ // probably want to see when the tags are applied and when they
+ // change, instead of only seeing them when we get a CPU sample.
+ tags = tags[1:]
+
+ if empty {
+ // Looks like an overflow record from the profBuf. Not much to
+ // do here, we only want to report full records.
+ //
+ // TODO: should we start a goroutine to drain the profBuf,
+ // rather than relying on a high-enough volume of tracing events
+ // to keep ReadTrace busy? https://go.dev/issue/52674
+ continue
+ }
+
+ buf := bufp.ptr()
+ if buf == nil {
+ systemstack(func() {
+ *bufp = traceFlush(*bufp, 0)
+ })
+ buf = bufp.ptr()
+ }
+ nstk := 1
+ buf.stk[0] = logicalStackSentinel
+ for ; nstk < len(buf.stk) && nstk-1 < len(stk); nstk++ {
+ buf.stk[nstk] = uintptr(stk[nstk-1])
+ }
+ stackID := trace.stackTab.put(buf.stk[:nstk])
+
+ traceEventLocked(0, nil, 0, bufp, traceEvCPUSample, stackID, 1, uint64(timestamp), ppid, goid)
+ }
+ }
+}
+
+// logicalStackSentinel is a sentinel value at pcBuf[0] signifying that
+// pcBuf[1:] holds a logical stack requiring no further processing. Any other
+// value at pcBuf[0] represents a skip value to apply to the physical stack in
+// pcBuf[1:] after inline expansion.
+const logicalStackSentinel = ^uintptr(0)
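+
+// For illustration, the two pcBuf layouts handled below are:
+//
+//	[logicalStackSentinel, pc1, pc2, ...] // logical frames, used as-is
+//	[skip, retPC1, retPC2, ...]           // physical frames, expanded by fpunwindExpand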
+
+// traceStackID captures a stack trace into pcBuf, registers it in the trace
+// stack table, and returns its unique ID. pcBuf should have a length equal to
+// traceStackSize. skip controls the number of leaf frames to omit in order to
+// hide tracer internals from stack traces, see CL 5523.
+func traceStackID(mp *m, pcBuf []uintptr, skip int) uint64 {
+ gp := getg()
+ curgp := mp.curg
+ nstk := 1
+ if tracefpunwindoff() || mp.hasCgoOnStack() {
+ // Slow path: Unwind using default unwinder. Used when frame pointer
+ // unwinding is unavailable or disabled (tracefpunwindoff), or might
+ // produce incomplete results or crashes (hasCgoOnStack). Note that no
+ // cgo callback related crashes have been observed yet. The main
+ // motivation is to take advantage of a potentially registered cgo
+ // symbolizer.
+ pcBuf[0] = logicalStackSentinel
+ if curgp == gp {
+ nstk += callers(skip+1, pcBuf[1:])
+ } else if curgp != nil {
+ nstk += gcallers(curgp, skip, pcBuf[1:])
+ }
+ } else {
+ // Fast path: Unwind using frame pointers.
+ pcBuf[0] = uintptr(skip)
+ if curgp == gp {
+ nstk += fpTracebackPCs(unsafe.Pointer(getfp()), pcBuf[1:])
+ } else if curgp != nil {
+ // We're called on the g0 stack through mcall(fn) or systemstack(fn). To
+ // behave like gcallers above, we start unwinding from sched.bp, which
+ // points to the caller frame of the leaf frame on g's stack. The return
+ // address of the leaf frame is stored in sched.pc, which we manually
+ // capture here.
+ pcBuf[1] = curgp.sched.pc
+ nstk += 1 + fpTracebackPCs(unsafe.Pointer(curgp.sched.bp), pcBuf[2:])
+ }
+ }
+ if nstk > 0 {
+ nstk-- // skip runtime.goexit
+ }
+ if nstk > 0 && curgp.goid == 1 {
+ nstk-- // skip runtime.main
+ }
+ id := trace.stackTab.put(pcBuf[:nstk])
+ return uint64(id)
+}
+
+// tracefpunwindoff returns true if frame pointer unwinding for the tracer is
+// disabled via GODEBUG or not supported by the architecture.
+// TODO(#60254): support frame pointer unwinding on plan9/amd64.
+func tracefpunwindoff() bool {
+ return debug.tracefpunwindoff != 0 || (goarch.ArchFamily != goarch.AMD64 && goarch.ArchFamily != goarch.ARM64) || goos.IsPlan9 == 1
+}
+
+// fpTracebackPCs populates pcBuf with the return addresses for each frame and
+// returns the number of PCs written to pcBuf. The returned PCs correspond to
+// "physical frames" rather than "logical frames"; that is if A is inlined into
+// B, this will return a PC for only B.
+func fpTracebackPCs(fp unsafe.Pointer, pcBuf []uintptr) (i int) {
+ for i = 0; i < len(pcBuf) && fp != nil; i++ {
+ // return addr sits one word above the frame pointer
+ pcBuf[i] = *(*uintptr)(unsafe.Pointer(uintptr(fp) + goarch.PtrSize))
+ // follow the frame pointer to the next one
+ fp = unsafe.Pointer(*(*uintptr)(fp))
+ }
+ return i
+}
+
+// traceAcquireBuffer returns trace buffer to use and, if necessary, locks it.
+func traceAcquireBuffer() (mp *m, pid int32, bufp *traceBufPtr) {
+ // Any time we acquire a buffer, we may end up flushing it,
+ // but flushes are rare. Record the lock edge even if it
+ // doesn't happen this time.
+ lockRankMayTraceFlush()
+
+ mp = acquirem()
+ if p := mp.p.ptr(); p != nil {
+ return mp, p.id, &p.trace.buf
+ }
+ lock(&trace.bufLock)
+ return mp, traceGlobProc, &trace.buf
+}
+
+// traceReleaseBuffer releases a buffer previously acquired with traceAcquireBuffer.
+func traceReleaseBuffer(mp *m, pid int32) {
+ if pid == traceGlobProc {
+ unlock(&trace.bufLock)
+ }
+ releasem(mp)
+}
+
+// lockRankMayTraceFlush records the lock ranking effects of a
+// potential call to traceFlush.
+func lockRankMayTraceFlush() {
+ lockWithRankMayAcquire(&trace.lock, getLockRank(&trace.lock))
+}
+
+// traceFlush puts buf onto stack of full buffers and returns an empty buffer.
+//
+// This must run on the system stack because it acquires trace.lock.
+//
+//go:systemstack
+func traceFlush(buf traceBufPtr, pid int32) traceBufPtr {
+ lock(&trace.lock)
+ if buf != 0 {
+ traceFullQueue(buf)
+ }
+ if trace.empty != 0 {
+ buf = trace.empty
+ trace.empty = buf.ptr().link
+ } else {
+ buf = traceBufPtr(sysAlloc(unsafe.Sizeof(traceBuf{}), &memstats.other_sys))
+ if buf == 0 {
+ throw("trace: out of memory")
+ }
+ }
+ bufp := buf.ptr()
+ bufp.link.set(nil)
+ bufp.pos = 0
+
+ // initialize the buffer for a new batch
+ ts := traceClockNow()
+ if ts <= bufp.lastTime {
+ ts = bufp.lastTime + 1
+ }
+ bufp.lastTime = ts
+ bufp.byte(traceEvBatch | 1<<traceArgCountShift)
+ bufp.varint(uint64(pid))
+ bufp.varint(uint64(ts))
+
+ unlock(&trace.lock)
+ return buf
+}
+
+// traceString adds a string to the trace.strings and returns the id.
+func traceString(bufp *traceBufPtr, pid int32, s string) (uint64, *traceBufPtr) {
+ if s == "" {
+ return 0, bufp
+ }
+
+ lock(&trace.stringsLock)
+ if raceenabled {
+ // raceacquire is necessary because the map access
+ // below is race annotated.
+ raceacquire(unsafe.Pointer(&trace.stringsLock))
+ }
+
+ if id, ok := trace.strings[s]; ok {
+ if raceenabled {
+ racerelease(unsafe.Pointer(&trace.stringsLock))
+ }
+ unlock(&trace.stringsLock)
+
+ return id, bufp
+ }
+
+ trace.stringSeq++
+ id := trace.stringSeq
+ trace.strings[s] = id
+
+ if raceenabled {
+ racerelease(unsafe.Pointer(&trace.stringsLock))
+ }
+ unlock(&trace.stringsLock)
+
+	// The memory allocation above may trigger tracing and
+	// cause *bufp to change. The following code works with *bufp,
+	// so there must be no memory allocation or other activity
+	// that triggers tracing after this point.
+
+ buf := bufp.ptr()
+ size := 1 + 2*traceBytesPerNumber + len(s)
+ if buf == nil || len(buf.arr)-buf.pos < size {
+ systemstack(func() {
+ buf = traceFlush(traceBufPtrOf(buf), pid).ptr()
+ bufp.set(buf)
+ })
+ }
+ buf.byte(traceEvString)
+ buf.varint(id)
+
+	// Double-check that the string and its length can fit.
+	// Otherwise, truncate the string.
+ slen := len(s)
+ if room := len(buf.arr) - buf.pos; room < slen+traceBytesPerNumber {
+ slen = room
+ }
+
+ buf.varint(uint64(slen))
+ buf.pos += copy(buf.arr[buf.pos:], s[:slen])
+
+ bufp.set(buf)
+ return id, bufp
+}
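+
+// For illustration, the record emitted above for a new string is laid out as:
+//
+//	| traceEvString | id varint | length varint | length bytes of string data |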
+
+// varint appends v to buf in little-endian-base-128 encoding.
+func (buf *traceBuf) varint(v uint64) {
+ pos := buf.pos
+ for ; v >= 0x80; v >>= 7 {
+ buf.arr[pos] = 0x80 | byte(v)
+ pos++
+ }
+ buf.arr[pos] = byte(v)
+ pos++
+ buf.pos = pos
+}
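+
+// For example, v = 300 (binary 1_0010_1100) is appended as the two bytes
+// 0xac, 0x02: the low 7 bits with the continuation bit set, then the rest.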
+
+// varintAt writes varint v at byte position pos in buf. This always
+// consumes traceBytesPerNumber bytes. This is intended for when the
+// caller needs to reserve space for a varint but can't populate it
+// until later.
+func (buf *traceBuf) varintAt(pos int, v uint64) {
+ for i := 0; i < traceBytesPerNumber; i++ {
+ if i < traceBytesPerNumber-1 {
+ buf.arr[pos] = 0x80 | byte(v)
+ } else {
+ buf.arr[pos] = byte(v)
+ }
+ v >>= 7
+ pos++
+ }
+}
+
+// byte appends v to buf.
+func (buf *traceBuf) byte(v byte) {
+ buf.arr[buf.pos] = v
+ buf.pos++
+}
+
+// traceStackTable maps stack traces (arrays of PCs) to unique uint32 ids.
+// It is lock-free for reading.
+type traceStackTable struct {
+ lock mutex // Must be acquired on the system stack
+ seq uint32
+ mem traceAlloc
+ tab [1 << 13]traceStackPtr
+}
+
+// traceStack is a single stack in traceStackTable.
+type traceStack struct {
+ link traceStackPtr
+ hash uintptr
+ id uint32
+ n int
+ stk [0]uintptr // real type [n]uintptr
+}
+
+type traceStackPtr uintptr
+
+func (tp traceStackPtr) ptr() *traceStack { return (*traceStack)(unsafe.Pointer(tp)) }
+
+// stack returns slice of PCs.
+func (ts *traceStack) stack() []uintptr {
+ return (*[traceStackSize]uintptr)(unsafe.Pointer(&ts.stk))[:ts.n]
+}
+
+// put returns a unique id for the stack trace pcs and caches it in the table,
+// if it sees the trace for the first time.
+func (tab *traceStackTable) put(pcs []uintptr) uint32 {
+ if len(pcs) == 0 {
+ return 0
+ }
+ hash := memhash(unsafe.Pointer(&pcs[0]), 0, uintptr(len(pcs))*unsafe.Sizeof(pcs[0]))
+ // First, search the hashtable w/o the mutex.
+ if id := tab.find(pcs, hash); id != 0 {
+ return id
+ }
+ // Now, double check under the mutex.
+ // Switch to the system stack so we can acquire tab.lock
+ var id uint32
+ systemstack(func() {
+ lock(&tab.lock)
+ if id = tab.find(pcs, hash); id != 0 {
+ unlock(&tab.lock)
+ return
+ }
+ // Create new record.
+ tab.seq++
+ stk := tab.newStack(len(pcs))
+ stk.hash = hash
+ stk.id = tab.seq
+ id = stk.id
+ stk.n = len(pcs)
+ stkpc := stk.stack()
+ copy(stkpc, pcs)
+ part := int(hash % uintptr(len(tab.tab)))
+ stk.link = tab.tab[part]
+ atomicstorep(unsafe.Pointer(&tab.tab[part]), unsafe.Pointer(stk))
+ unlock(&tab.lock)
+ })
+ return id
+}
+
+// find checks if the stack trace pcs is already present in the table.
+func (tab *traceStackTable) find(pcs []uintptr, hash uintptr) uint32 {
+ part := int(hash % uintptr(len(tab.tab)))
+Search:
+ for stk := tab.tab[part].ptr(); stk != nil; stk = stk.link.ptr() {
+ if stk.hash == hash && stk.n == len(pcs) {
+ for i, stkpc := range stk.stack() {
+ if stkpc != pcs[i] {
+ continue Search
+ }
+ }
+ return stk.id
+ }
+ }
+ return 0
+}
+
+// newStack allocates a new stack of size n.
+func (tab *traceStackTable) newStack(n int) *traceStack {
+ return (*traceStack)(tab.mem.alloc(unsafe.Sizeof(traceStack{}) + uintptr(n)*goarch.PtrSize))
+}
+
+// traceFrames returns the frames corresponding to pcs. It may
+// allocate and may emit trace events.
+func traceFrames(bufp traceBufPtr, pcs []uintptr) ([]traceFrame, traceBufPtr) {
+ frames := make([]traceFrame, 0, len(pcs))
+ ci := CallersFrames(pcs)
+ for {
+ var frame traceFrame
+ f, more := ci.Next()
+ frame, bufp = traceFrameForPC(bufp, 0, f)
+ frames = append(frames, frame)
+ if !more {
+ return frames, bufp
+ }
+ }
+}
+
+// dump writes all previously cached stacks to trace buffers,
+// releases all memory and resets state.
+//
+// This must run on the system stack because it calls traceFlush.
+//
+//go:systemstack
+func (tab *traceStackTable) dump(bufp traceBufPtr) traceBufPtr {
+ for i := range tab.tab {
+ stk := tab.tab[i].ptr()
+ for ; stk != nil; stk = stk.link.ptr() {
+ var frames []traceFrame
+ frames, bufp = traceFrames(bufp, fpunwindExpand(stk.stack()))
+
+ // Estimate the size of this record. This
+ // bound is pretty loose, but avoids counting
+ // lots of varint sizes.
+ maxSize := 1 + traceBytesPerNumber + (2+4*len(frames))*traceBytesPerNumber
+ // Make sure we have enough buffer space.
+ if buf := bufp.ptr(); len(buf.arr)-buf.pos < maxSize {
+ bufp = traceFlush(bufp, 0)
+ }
+
+ // Emit header, with space reserved for length.
+ buf := bufp.ptr()
+ buf.byte(traceEvStack | 3<<traceArgCountShift)
+ lenPos := buf.pos
+ buf.pos += traceBytesPerNumber
+
+ // Emit body.
+ recPos := buf.pos
+ buf.varint(uint64(stk.id))
+ buf.varint(uint64(len(frames)))
+ for _, frame := range frames {
+ buf.varint(uint64(frame.PC))
+ buf.varint(frame.funcID)
+ buf.varint(frame.fileID)
+ buf.varint(frame.line)
+ }
+
+ // Fill in size header.
+ buf.varintAt(lenPos, uint64(buf.pos-recPos))
+ }
+ }
+
+ tab.mem.drop()
+ *tab = traceStackTable{}
+ lockInit(&((*tab).lock), lockRankTraceStackTab)
+
+ return bufp
+}
+
+// fpunwindExpand checks if pcBuf contains logical frames (which include inlined
+// frames) or physical frames (produced by frame pointer unwinding) using a
+// sentinel value in pcBuf[0]. Logical frames are simply returned without the
+// sentinel. Physical frames are turned into logical frames via inline unwinding
+// and by applying the skip value that's stored in pcBuf[0].
+func fpunwindExpand(pcBuf []uintptr) []uintptr {
+ if len(pcBuf) > 0 && pcBuf[0] == logicalStackSentinel {
+ // pcBuf contains logical rather than inlined frames, skip has already been
+ // applied, just return it without the sentinel value in pcBuf[0].
+ return pcBuf[1:]
+ }
+
+ var (
+ cache pcvalueCache
+ lastFuncID = abi.FuncIDNormal
+ newPCBuf = make([]uintptr, 0, traceStackSize)
+ skip = pcBuf[0]
+ // skipOrAdd skips or appends retPC to newPCBuf and returns true if more
+ // pcs can be added.
+ skipOrAdd = func(retPC uintptr) bool {
+ if skip > 0 {
+ skip--
+ } else {
+ newPCBuf = append(newPCBuf, retPC)
+ }
+ return len(newPCBuf) < cap(newPCBuf)
+ }
+ )
+
+outer:
+ for _, retPC := range pcBuf[1:] {
+ callPC := retPC - 1
+ fi := findfunc(callPC)
+ if !fi.valid() {
+ // There is no funcInfo if callPC belongs to a C function. In this case
+ // we still keep the pc, but don't attempt to expand inlined frames.
+ if more := skipOrAdd(retPC); !more {
+ break outer
+ }
+ continue
+ }
+
+ u, uf := newInlineUnwinder(fi, callPC, &cache)
+ for ; uf.valid(); uf = u.next(uf) {
+ sf := u.srcFunc(uf)
+ if sf.funcID == abi.FuncIDWrapper && elideWrapperCalling(lastFuncID) {
+ // ignore wrappers
+ } else if more := skipOrAdd(uf.pc + 1); !more {
+ break outer
+ }
+ lastFuncID = sf.funcID
+ }
+ }
+ return newPCBuf
+}
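+
+// An illustrative view of the two pcBuf layouts handled above (an assumption
+// drawn from the code, not upstream documentation):
+//
+//	logical:  [logicalStackSentinel, pc1, pc2, ...]  -> returned as pcBuf[1:]
+//	physical: [skip, retPC1, retPC2, ...]            -> inline-expanded, with
+//	          the first `skip` resulting logical frames dropped.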
+
+type traceFrame struct {
+ PC uintptr
+ funcID uint64
+ fileID uint64
+ line uint64
+}
+
+// traceFrameForPC records the frame information.
+// It may allocate memory.
+func traceFrameForPC(buf traceBufPtr, pid int32, f Frame) (traceFrame, traceBufPtr) {
+ bufp := &buf
+ var frame traceFrame
+ frame.PC = f.PC
+
+ fn := f.Function
+ const maxLen = 1 << 10
+ if len(fn) > maxLen {
+ fn = fn[len(fn)-maxLen:]
+ }
+ frame.funcID, bufp = traceString(bufp, pid, fn)
+ frame.line = uint64(f.Line)
+ file := f.File
+ if len(file) > maxLen {
+ file = file[len(file)-maxLen:]
+ }
+ frame.fileID, bufp = traceString(bufp, pid, file)
+ return frame, (*bufp)
+}
+
+// traceAlloc is a non-thread-safe region allocator.
+// It holds a linked list of traceAllocBlock.
+type traceAlloc struct {
+ head traceAllocBlockPtr
+ off uintptr
+}
+
+// traceAllocBlock is a block in traceAlloc.
+//
+// traceAllocBlock is allocated from non-GC'd memory, so it must not
+// contain heap pointers. Writes to pointers to traceAllocBlocks do
+// not need write barriers.
+type traceAllocBlock struct {
+ _ sys.NotInHeap
+ next traceAllocBlockPtr
+ data [64<<10 - goarch.PtrSize]byte
+}
+
+// TODO: Since traceAllocBlock now includes a runtime/internal/sys.NotInHeap field, this isn't necessary.
+type traceAllocBlockPtr uintptr
+
+func (p traceAllocBlockPtr) ptr() *traceAllocBlock { return (*traceAllocBlock)(unsafe.Pointer(p)) }
+func (p *traceAllocBlockPtr) set(x *traceAllocBlock) { *p = traceAllocBlockPtr(unsafe.Pointer(x)) }
+
+// alloc allocates an n-byte block.
+func (a *traceAlloc) alloc(n uintptr) unsafe.Pointer {
+ n = alignUp(n, goarch.PtrSize)
+ if a.head == 0 || a.off+n > uintptr(len(a.head.ptr().data)) {
+ if n > uintptr(len(a.head.ptr().data)) {
+ throw("trace: alloc too large")
+ }
+ block := (*traceAllocBlock)(sysAlloc(unsafe.Sizeof(traceAllocBlock{}), &memstats.other_sys))
+ if block == nil {
+ throw("trace: out of memory")
+ }
+ block.next.set(a.head.ptr())
+ a.head.set(block)
+ a.off = 0
+ }
+ p := &a.head.ptr().data[a.off]
+ a.off += n
+ return unsafe.Pointer(p)
+}
+
+// drop frees all previously allocated memory and resets the allocator.
+func (a *traceAlloc) drop() {
+ for a.head != 0 {
+ block := a.head.ptr()
+ a.head.set(block.next.ptr())
+ sysFree(unsafe.Pointer(block), unsafe.Sizeof(traceAllocBlock{}), &memstats.other_sys)
+ }
+}
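+
+// Typical lifecycle of traceAlloc (a sketch, not additional upstream code):
+// callers bump-allocate records while the trace is active and release
+// everything in bulk afterwards, as traceStackTable does:
+//
+//	stk := (*traceStack)(tab.mem.alloc(size)) // carve out of the current block
+//	// ... use the records for the duration of the trace ...
+//	tab.mem.drop() // return every block to the OS in one sweep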
+
+// The following functions write specific events to trace.
+
+func traceGomaxprocs(procs int32) {
+ traceEvent(traceEvGomaxprocs, 1, uint64(procs))
+}
+
+func traceProcStart() {
+ traceEvent(traceEvProcStart, -1, uint64(getg().m.id))
+}
+
+func traceProcStop(pp *p) {
+ // Sysmon and stopTheWorld can stop Ps blocked in syscalls;
+ // to handle this we temporarily employ the P.
+ mp := acquirem()
+ oldp := mp.p
+ mp.p.set(pp)
+ traceEvent(traceEvProcStop, -1)
+ mp.p = oldp
+ releasem(mp)
+}
+
+func traceGCStart() {
+ traceEvent(traceEvGCStart, 3, trace.seqGC)
+ trace.seqGC++
+}
+
+func traceGCDone() {
+ traceEvent(traceEvGCDone, -1)
+}
+
+func traceSTWStart(reason stwReason) {
+ // Don't trace if this STW is for trace start/stop, since traceEnabled
+ // switches during a STW.
+ if reason == stwStartTrace || reason == stwStopTrace {
+ return
+ }
+ getg().m.trace.tracedSTWStart = true
+ traceEvent(traceEvSTWStart, -1, uint64(reason))
+}
+
+func traceSTWDone() {
+ mp := getg().m
+ if !mp.trace.tracedSTWStart {
+ return
+ }
+ mp.trace.tracedSTWStart = false
+ traceEvent(traceEvSTWDone, -1)
+}
+
+// traceGCSweepStart prepares to trace a sweep loop. This does not
+// emit any events until traceGCSweepSpan is called.
+//
+// traceGCSweepStart must be paired with traceGCSweepDone and there
+// must be no preemption points between these two calls.
+func traceGCSweepStart() {
+ // Delay the actual GCSweepStart event until the first span
+ // sweep. If we don't sweep anything, don't emit any events.
+ pp := getg().m.p.ptr()
+ if pp.trace.inSweep {
+ throw("double traceGCSweepStart")
+ }
+ pp.trace.inSweep, pp.trace.swept, pp.trace.reclaimed = true, 0, 0
+}
+
+// traceGCSweepSpan traces the sweep of a single page.
+//
+// This may be called outside a traceGCSweepStart/traceGCSweepDone
+// pair; however, it will not emit any trace events in this case.
+func traceGCSweepSpan(bytesSwept uintptr) {
+ pp := getg().m.p.ptr()
+ if pp.trace.inSweep {
+ if pp.trace.swept == 0 {
+ traceEvent(traceEvGCSweepStart, 1)
+ }
+ pp.trace.swept += bytesSwept
+ }
+}
+
+func traceGCSweepDone() {
+ pp := getg().m.p.ptr()
+ if !pp.trace.inSweep {
+ throw("missing traceGCSweepStart")
+ }
+ if pp.trace.swept != 0 {
+ traceEvent(traceEvGCSweepDone, -1, uint64(pp.trace.swept), uint64(pp.trace.reclaimed))
+ }
+ pp.trace.inSweep = false
+}
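+
+// The intended pairing of the three sweep hooks above is roughly (a sketch,
+// not additional upstream code):
+//
+//	traceGCSweepStart()
+//	for each page swept {
+//		traceGCSweepSpan(bytesSwept)
+//	}
+//	traceGCSweepDone()
+//
+// so a sweep loop that ends up sweeping nothing emits no events at all.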
+
+func traceGCMarkAssistStart() {
+ traceEvent(traceEvGCMarkAssistStart, 1)
+}
+
+func traceGCMarkAssistDone() {
+ traceEvent(traceEvGCMarkAssistDone, -1)
+}
+
+func traceGoCreate(newg *g, pc uintptr) {
+ newg.trace.seq = 0
+ newg.trace.lastP = getg().m.p
+ // +PCQuantum because traceFrameForPC expects return PCs and subtracts PCQuantum.
+ id := trace.stackTab.put([]uintptr{logicalStackSentinel, startPCforTrace(pc) + sys.PCQuantum})
+ traceEvent(traceEvGoCreate, 2, newg.goid, uint64(id))
+}
+
+func traceGoStart() {
+ gp := getg().m.curg
+ pp := gp.m.p
+ gp.trace.seq++
+ if pp.ptr().gcMarkWorkerMode != gcMarkWorkerNotWorker {
+ traceEvent(traceEvGoStartLabel, -1, gp.goid, gp.trace.seq, trace.markWorkerLabels[pp.ptr().gcMarkWorkerMode])
+ } else if gp.trace.lastP == pp {
+ traceEvent(traceEvGoStartLocal, -1, gp.goid)
+ } else {
+ gp.trace.lastP = pp
+ traceEvent(traceEvGoStart, -1, gp.goid, gp.trace.seq)
+ }
+}
+
+func traceGoEnd() {
+ traceEvent(traceEvGoEnd, -1)
+}
+
+func traceGoSched() {
+ gp := getg()
+ gp.trace.lastP = gp.m.p
+ traceEvent(traceEvGoSched, 1)
+}
+
+func traceGoPreempt() {
+ gp := getg()
+ gp.trace.lastP = gp.m.p
+ traceEvent(traceEvGoPreempt, 1)
+}
+
+func traceGoPark(reason traceBlockReason, skip int) {
+ // Convert the block reason directly to a trace event type.
+ // See traceBlockReason for more information.
+ traceEvent(byte(reason), skip)
+}
+
+func traceGoUnpark(gp *g, skip int) {
+ pp := getg().m.p
+ gp.trace.seq++
+ if gp.trace.lastP == pp {
+ traceEvent(traceEvGoUnblockLocal, skip, gp.goid)
+ } else {
+ gp.trace.lastP = pp
+ traceEvent(traceEvGoUnblock, skip, gp.goid, gp.trace.seq)
+ }
+}
+
+func traceGoSysCall() {
+ var skip int
+ switch {
+ case tracefpunwindoff():
+ // Unwind by skipping 1 frame relative to gp.syscallsp which is captured 3
+ // frames above this frame. For frame pointer unwinding we produce the same
+ // results by hard coding the number of frames in between our caller and the
+ // actual syscall, see cases below.
+ // TODO(felixge): Implement gp.syscallbp to avoid this workaround?
+ skip = 1
+ case GOOS == "solaris" || GOOS == "illumos":
+ // These platforms don't use a libc_read_trampoline.
+ skip = 3
+ default:
+ // Skip the extra trampoline frame used on most systems.
+ skip = 4
+ }
+ getg().m.curg.trace.tracedSyscallEnter = true
+ traceEvent(traceEvGoSysCall, skip)
+}
+
+func traceGoSysExit() {
+ gp := getg().m.curg
+ if !gp.trace.tracedSyscallEnter {
+ // There was no syscall entry traced for us at all, so there's definitely
+ // no EvGoSysBlock or EvGoInSyscall before us, which EvGoSysExit requires.
+ return
+ }
+ gp.trace.tracedSyscallEnter = false
+ ts := gp.trace.sysExitTime
+ if ts != 0 && ts < trace.startTime {
+ // There is a race between the code that initializes sysExitTimes
+ // (in exitsyscall, which runs without a P, and therefore is not
+ // stopped with the rest of the world) and the code that initializes
+ // a new trace. The recorded sysExitTime must therefore be treated
+ // as "best effort". If they are valid for this trace, then great,
+ // use them for greater accuracy. But if they're not valid for this
+ // trace, assume that the trace was started after the actual syscall
+ // exit (but before we actually managed to start the goroutine,
+ // aka right now), and assign a fresh time stamp to keep the log consistent.
+ ts = 0
+ }
+ gp.trace.sysExitTime = 0
+ gp.trace.seq++
+ gp.trace.lastP = gp.m.p
+ traceEvent(traceEvGoSysExit, -1, gp.goid, gp.trace.seq, uint64(ts))
+}
+
+func traceGoSysBlock(pp *p) {
+ // Sysmon and stopTheWorld can declare syscalls running on remote Ps as blocked;
+ // to handle this we temporarily employ the P.
+ mp := acquirem()
+ oldp := mp.p
+ mp.p.set(pp)
+ traceEvent(traceEvGoSysBlock, -1)
+ mp.p = oldp
+ releasem(mp)
+}
+
+func traceHeapAlloc(live uint64) {
+ traceEvent(traceEvHeapAlloc, -1, live)
+}
+
+func traceHeapGoal() {
+ heapGoal := gcController.heapGoal()
+ if heapGoal == ^uint64(0) {
+ // Heap-based triggering is disabled.
+ traceEvent(traceEvHeapGoal, -1, 0)
+ } else {
+ traceEvent(traceEvHeapGoal, -1, heapGoal)
+ }
+}
+
+// To access runtime functions from runtime/trace.
+// See runtime/trace/annotation.go
+
+//go:linkname trace_userTaskCreate runtime/trace.userTaskCreate
+func trace_userTaskCreate(id, parentID uint64, taskType string) {
+ if !trace.enabled {
+ return
+ }
+
+ // Same as in traceEvent.
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.trace.startingTrace {
+ traceReleaseBuffer(mp, pid)
+ return
+ }
+
+ typeStringID, bufp := traceString(bufp, pid, taskType)
+ traceEventLocked(0, mp, pid, bufp, traceEvUserTaskCreate, 0, 3, id, parentID, typeStringID)
+ traceReleaseBuffer(mp, pid)
+}
+
+//go:linkname trace_userTaskEnd runtime/trace.userTaskEnd
+func trace_userTaskEnd(id uint64) {
+ traceEvent(traceEvUserTaskEnd, 2, id)
+}
+
+//go:linkname trace_userRegion runtime/trace.userRegion
+func trace_userRegion(id, mode uint64, name string) {
+ if !trace.enabled {
+ return
+ }
+
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.trace.startingTrace {
+ traceReleaseBuffer(mp, pid)
+ return
+ }
+
+ nameStringID, bufp := traceString(bufp, pid, name)
+ traceEventLocked(0, mp, pid, bufp, traceEvUserRegion, 0, 3, id, mode, nameStringID)
+ traceReleaseBuffer(mp, pid)
+}
+
+//go:linkname trace_userLog runtime/trace.userLog
+func trace_userLog(id uint64, category, message string) {
+ if !trace.enabled {
+ return
+ }
+
+ mp, pid, bufp := traceAcquireBuffer()
+ if !trace.enabled && !mp.trace.startingTrace {
+ traceReleaseBuffer(mp, pid)
+ return
+ }
+
+ categoryID, bufp := traceString(bufp, pid, category)
+
+ // The log message is recorded after all of the normal trace event
+ // arguments, including the task, category, and stack IDs. We must ask
+ // traceEventLocked to reserve extra space for the length of the message
+ // and the message itself.
+ extraSpace := traceBytesPerNumber + len(message)
+ traceEventLocked(extraSpace, mp, pid, bufp, traceEvUserLog, 0, 3, id, categoryID)
+ buf := bufp.ptr()
+
+ // Double-check that the message and its length can fit.
+ // Otherwise, truncate the message.
+ slen := len(message)
+ if room := len(buf.arr) - buf.pos; room < slen+traceBytesPerNumber {
+ slen = room
+ }
+ buf.varint(uint64(slen))
+ buf.pos += copy(buf.arr[buf.pos:], message[:slen])
+
+ traceReleaseBuffer(mp, pid)
+}
+
+// startPCforTrace returns the start PC of a goroutine for tracing purposes.
+// If pc is a wrapper, it returns the PC of the wrapped function. Otherwise it returns pc.
+func startPCforTrace(pc uintptr) uintptr {
+ f := findfunc(pc)
+ if !f.valid() {
+ return pc // may happen for locked g in extra M since its pc is 0.
+ }
+ w := funcdata(f, abi.FUNCDATA_WrapInfo)
+ if w == nil {
+ return pc // not a wrapper
+ }
+ return f.datap.textAddr(*(*uint32)(w))
+}
+
+// traceOneNewExtraM registers the fact that a new extra M was created with
+// the tracer. This matters if the M (which has an attached G) is used while
+// the trace is still active because if it is, we need the fact that it exists
+// to show up in the final trace.
+func traceOneNewExtraM(gp *g) {
+ // Trigger two trace events for the locked g in the extra m,
+ // since the next event of the g will be traceEvGoSysExit in exitsyscall,
+ // while calling from C thread to Go.
+ traceGoCreate(gp, 0) // no start pc
+ gp.trace.seq++
+ traceEvent(traceEvGoInSyscall, -1, gp.goid)
+}
+
+// traceTime represents a timestamp for the trace.
+type traceTime uint64
+
+// traceClockNow returns a monotonic timestamp. The clock this function gets
+// the timestamp from is specific to tracing, and shouldn't be mixed with other
+// clock sources.
+//
+// nosplit because it's called from exitsyscall, which is nosplit.
+//
+//go:nosplit
+func traceClockNow() traceTime {
+ return traceTime(cputicks() / traceTimeDiv)
+}
diff --git a/src/runtime/trace/annotation.go b/src/runtime/trace/annotation.go
new file mode 100644
index 0000000..2666d14
--- /dev/null
+++ b/src/runtime/trace/annotation.go
@@ -0,0 +1,198 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace
+
+import (
+ "context"
+ "fmt"
+ "sync/atomic"
+ _ "unsafe"
+)
+
+type traceContextKey struct{}
+
+// NewTask creates a task instance with the type taskType and returns
+// it along with a Context that carries the task.
+// If the input context contains a task, the new task is its subtask.
+//
+// The taskType is used to classify task instances. Analysis tools
+// like the Go execution tracer may assume there are only a bounded
+// number of unique task types in the system.
+//
+// The returned Task's [Task.End] method is used to mark the task's end.
+// The trace tool measures task latency as the time between task creation
+// and when the End method is called, and provides the latency
+// distribution per task type.
+// If the End method is called multiple times, only the first
+// call is used in the latency measurement.
+//
+// ctx, task := trace.NewTask(ctx, "awesomeTask")
+// trace.WithRegion(ctx, "preparation", prepWork)
+// // preparation of the task
+// go func() { // continue processing the task in a separate goroutine.
+// defer task.End()
+// trace.WithRegion(ctx, "remainingWork", remainingWork)
+// }()
+func NewTask(pctx context.Context, taskType string) (ctx context.Context, task *Task) {
+ pid := fromContext(pctx).id
+ id := newID()
+ userTaskCreate(id, pid, taskType)
+ s := &Task{id: id}
+ return context.WithValue(pctx, traceContextKey{}, s), s
+
+ // We allocate a new task even when
+ // tracing is disabled because the context and task
+ // can be used across trace enable/disable boundaries,
+ // which complicates the problem.
+ //
+ // For example, consider the following scenario:
+ // - trace is enabled.
+ // - trace.WithRegion is called, so a new context ctx
+ // with a new region is created.
+ // - trace is disabled.
+ // - trace is enabled again.
+ // - trace APIs are called with the ctx. Is the ID in the task
+ // a valid one to use?
+ //
+ // TODO(hyangah): reduce the overhead at least when
+ // tracing is disabled. Maybe the id can embed a tracing
+ // round number and ignore ids generated from previous
+ // tracing round.
+}
+
+func fromContext(ctx context.Context) *Task {
+ if s, ok := ctx.Value(traceContextKey{}).(*Task); ok {
+ return s
+ }
+ return &bgTask
+}
+
+// Task is a data type for tracing a user-defined, logical operation.
+type Task struct {
+ id uint64
+ // TODO(hyangah): record parent id?
+}
+
+// End marks the end of the operation represented by the [Task].
+func (t *Task) End() {
+ userTaskEnd(t.id)
+}
+
+var lastTaskID uint64 = 0 // task id issued last time
+
+func newID() uint64 {
+ // TODO(hyangah): use per-P cache
+ return atomic.AddUint64(&lastTaskID, 1)
+}
+
+var bgTask = Task{id: uint64(0)}
+
+// Log emits a one-off event with the given category and message.
+// Category can be empty and the API assumes there are only a handful of
+// unique categories in the system.
+func Log(ctx context.Context, category, message string) {
+ id := fromContext(ctx).id
+ userLog(id, category, message)
+}
+
+// Logf is like [Log], but the value is formatted using the specified format spec.
+func Logf(ctx context.Context, category, format string, args ...any) {
+ if IsEnabled() {
+ // Ideally this should be just Log, but that will
+ // add one more frame in the stack trace.
+ id := fromContext(ctx).id
+ userLog(id, category, fmt.Sprintf(format, args...))
+ }
+}
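+
+// For example (an illustrative sketch; handleOrder and orderID are
+// hypothetical, not part of this package):
+//
+//	func handleOrder(ctx context.Context, orderID string) {
+//		trace.Log(ctx, "orderID", orderID)
+//		trace.Logf(ctx, "status", "processed order %s", orderID)
+//	}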
+
+const (
+ regionStartCode = uint64(0)
+ regionEndCode = uint64(1)
+)
+
+// WithRegion starts a region associated with its calling goroutine, runs fn,
+// and then ends the region. If the context carries a task, the region is
+// associated with the task. Otherwise, the region is attached to the background
+// task.
+//
+// The regionType is used to classify regions, so there should be only a
+// handful of unique region types.
+func WithRegion(ctx context.Context, regionType string, fn func()) {
+ // NOTE:
+ // WithRegion helps avoid misuse of the API but in practice
+ // it is quite restrictive:
+ // - Use of WithRegion forces the stack traces captured at
+ //   region start and end to be identical.
+ // - Refactoring existing code to use WithRegion is sometimes
+ //   hard and makes the code less readable,
+ //   e.g. a code block nested deep in a loop with various
+ //   exit points that return values.
+ // - Refactoring the code to use this API with a closure can
+ //   cause different GC behavior, such as retaining some parameters
+ //   longer.
+ // This causes more churn in code than hoped, and sometimes
+ // makes the code less readable.
+
+ id := fromContext(ctx).id
+ userRegion(id, regionStartCode, regionType)
+ defer userRegion(id, regionEndCode, regionType)
+ fn()
+}
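+
+// The NOTE above explains why WithRegion can be awkward; a sketch of the
+// StartRegion alternative (illustrative only; process, step1 and step2 are
+// hypothetical): a region spanning code with early returns avoids wrapping
+// the body in a closure:
+//
+//	func process(ctx context.Context) error {
+//		defer trace.StartRegion(ctx, "process").End()
+//		if err := step1(); err != nil {
+//			return err
+//		}
+//		return step2()
+//	}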
+
+// StartRegion starts a region and returns it.
+// The returned Region's [Region.End] method must be called
+// from the same goroutine where the region was started.
+// Within each goroutine, regions must nest. That is, regions started
+// after this region must be ended before this region can be ended.
+// Recommended usage is
+//
+// defer trace.StartRegion(ctx, "myTracedRegion").End()
+func StartRegion(ctx context.Context, regionType string) *Region {
+ if !IsEnabled() {
+ return noopRegion
+ }
+ id := fromContext(ctx).id
+ userRegion(id, regionStartCode, regionType)
+ return &Region{id, regionType}
+}
+
+// Region is a region of code whose execution time interval is traced.
+type Region struct {
+ id uint64
+ regionType string
+}
+
+var noopRegion = &Region{}
+
+// End marks the end of the traced code region.
+func (r *Region) End() {
+ if r == noopRegion {
+ return
+ }
+ userRegion(r.id, regionEndCode, r.regionType)
+}
+
+// IsEnabled reports whether tracing is enabled.
+// The information is advisory only. The tracing status
+// may have changed by the time this function returns.
+func IsEnabled() bool {
+ return tracing.enabled.Load()
+}
+
+//
+// Function bodies are defined in runtime/trace.go
+//
+
+// emits UserTaskCreate event.
+func userTaskCreate(id, parentID uint64, taskType string)
+
+// emits UserTaskEnd event.
+func userTaskEnd(id uint64)
+
+// emits UserRegion event.
+func userRegion(id, mode uint64, regionType string)
+
+// emits UserLog event.
+func userLog(id uint64, category, message string)
diff --git a/src/runtime/trace/annotation_test.go b/src/runtime/trace/annotation_test.go
new file mode 100644
index 0000000..69ea8f2
--- /dev/null
+++ b/src/runtime/trace/annotation_test.go
@@ -0,0 +1,156 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "context"
+ "fmt"
+ "internal/trace"
+ "reflect"
+ . "runtime/trace"
+ "strings"
+ "sync"
+ "testing"
+)
+
+func BenchmarkStartRegion(b *testing.B) {
+ b.ReportAllocs()
+ ctx, task := NewTask(context.Background(), "benchmark")
+ defer task.End()
+
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ StartRegion(ctx, "region").End()
+ }
+ })
+}
+
+func BenchmarkNewTask(b *testing.B) {
+ b.ReportAllocs()
+ pctx, task := NewTask(context.Background(), "benchmark")
+ defer task.End()
+
+ b.RunParallel(func(pb *testing.PB) {
+ for pb.Next() {
+ _, task := NewTask(pctx, "task")
+ task.End()
+ }
+ })
+}
+
+func TestUserTaskRegion(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ bgctx, cancel := context.WithCancel(context.Background())
+ defer cancel()
+
+ preExistingRegion := StartRegion(bgctx, "pre-existing region")
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ // Beginning of traced execution
+ var wg sync.WaitGroup
+ ctx, task := NewTask(bgctx, "task0") // EvUserTaskCreate("task0")
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ defer task.End() // EvUserTaskEnd("task0")
+
+ WithRegion(ctx, "region0", func() {
+ // EvUserRegionCreate("region0", start)
+ WithRegion(ctx, "region1", func() {
+ Log(ctx, "key0", "0123456789abcdef") // EvUserLog("task0", "key0", "0....f")
+ })
+ // EvUserRegion("region0", end)
+ })
+ }()
+
+ wg.Wait()
+
+ preExistingRegion.End()
+ postExistingRegion := StartRegion(bgctx, "post-existing region")
+
+ // End of traced execution
+ Stop()
+
+ postExistingRegion.End()
+
+ saveTrace(t, buf, "TestUserTaskRegion")
+ res, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ // golang.org/issues/16755
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("Parse failed: %v", err)
+ }
+
+ // Check whether we see all user annotation related records in order
+ type testData struct {
+ typ byte
+ strs []string
+ args []uint64
+ setLink bool
+ }
+
+ var got []testData
+ tasks := map[uint64]string{}
+ for _, e := range res.Events {
+ t.Logf("%s", e)
+ switch e.Type {
+ case trace.EvUserTaskCreate:
+ taskName := e.SArgs[0]
+ got = append(got, testData{trace.EvUserTaskCreate, []string{taskName}, nil, e.Link != nil})
+ if e.Link != nil && e.Link.Type != trace.EvUserTaskEnd {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ tasks[e.Args[0]] = taskName
+ case trace.EvUserLog:
+ key, val := e.SArgs[0], e.SArgs[1]
+ taskName := tasks[e.Args[0]]
+ got = append(got, testData{trace.EvUserLog, []string{taskName, key, val}, nil, e.Link != nil})
+ case trace.EvUserTaskEnd:
+ taskName := tasks[e.Args[0]]
+ got = append(got, testData{trace.EvUserTaskEnd, []string{taskName}, nil, e.Link != nil})
+ if e.Link != nil && e.Link.Type != trace.EvUserTaskCreate {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ case trace.EvUserRegion:
+ taskName := tasks[e.Args[0]]
+ regionName := e.SArgs[0]
+ got = append(got, testData{trace.EvUserRegion, []string{taskName, regionName}, []uint64{e.Args[1]}, e.Link != nil})
+ if e.Link != nil && (e.Link.Type != trace.EvUserRegion || e.Link.SArgs[0] != regionName) {
+ t.Errorf("Unexpected linked event %q->%q", e, e.Link)
+ }
+ }
+ }
+ want := []testData{
+ {trace.EvUserTaskCreate, []string{"task0"}, nil, true},
+ {trace.EvUserRegion, []string{"task0", "region0"}, []uint64{0}, true},
+ {trace.EvUserRegion, []string{"task0", "region1"}, []uint64{0}, true},
+ {trace.EvUserLog, []string{"task0", "key0", "0123456789abcdef"}, nil, false},
+ {trace.EvUserRegion, []string{"task0", "region1"}, []uint64{1}, false},
+ {trace.EvUserRegion, []string{"task0", "region0"}, []uint64{1}, false},
+ {trace.EvUserTaskEnd, []string{"task0"}, nil, false},
+ // Currently, pre-existing region is not recorded to avoid allocations.
+ // {trace.EvUserRegion, []string{"", "pre-existing region"}, []uint64{1}, false},
+ {trace.EvUserRegion, []string{"", "post-existing region"}, []uint64{0}, false},
+ }
+ if !reflect.DeepEqual(got, want) {
+ pretty := func(data []testData) string {
+ var s strings.Builder
+ for _, d := range data {
+ fmt.Fprintf(&s, "\t%+v\n", d)
+ }
+ return s.String()
+ }
+ t.Errorf("Got user region related events\n%+v\nwant:\n%+v", pretty(got), pretty(want))
+ }
+}
diff --git a/src/runtime/trace/example_test.go b/src/runtime/trace/example_test.go
new file mode 100644
index 0000000..ba96a82
--- /dev/null
+++ b/src/runtime/trace/example_test.go
@@ -0,0 +1,39 @@
+// Copyright 2017 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "fmt"
+ "log"
+ "os"
+ "runtime/trace"
+)
+
+// Example demonstrates the use of the trace package to trace
+// the execution of a Go program. The trace output will be
+// written to the file trace.out.
+func Example() {
+ f, err := os.Create("trace.out")
+ if err != nil {
+ log.Fatalf("failed to create trace output file: %v", err)
+ }
+ defer func() {
+ if err := f.Close(); err != nil {
+ log.Fatalf("failed to close trace file: %v", err)
+ }
+ }()
+
+ if err := trace.Start(f); err != nil {
+ log.Fatalf("failed to start trace: %v", err)
+ }
+ defer trace.Stop()
+
+ // your program here
+ RunMyProgram()
+}
+
+func RunMyProgram() {
+ fmt.Printf("this function will be traced")
+}
diff --git a/src/runtime/trace/trace.go b/src/runtime/trace/trace.go
new file mode 100644
index 0000000..935d222
--- /dev/null
+++ b/src/runtime/trace/trace.go
@@ -0,0 +1,154 @@
+// Copyright 2015 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Package trace contains facilities for programs to generate traces
+// for the Go execution tracer.
+//
+// # Tracing runtime activities
+//
+// The execution trace captures a wide range of execution events such as
+// goroutine creation/blocking/unblocking, syscall enter/exit/block,
+// GC-related events, changes of heap size, processor start/stop, etc.
+// When CPU profiling is active, the execution tracer makes an effort to
+// include those samples as well.
+// A precise nanosecond-precision timestamp and a stack trace is
+// captured for most events. The generated trace can be interpreted
+// using `go tool trace`.
+//
+// Support for tracing tests and benchmarks built with the standard
+// testing package is built into `go test`. For example, the following
+// command runs the test in the current directory and writes the trace
+// file (trace.out).
+//
+// go test -trace=trace.out
+//
+// This runtime/trace package provides APIs to add equivalent tracing
+// support to a standalone program. See the Example that demonstrates
+// how to use this API to enable tracing.
+//
+// There is also a standard HTTP interface to trace data. Adding the
+// following line will install a handler under the /debug/pprof/trace URL
+// to download a live trace:
+//
+// import _ "net/http/pprof"
+//
+// See the [net/http/pprof] package for more details about all of the
+// debug endpoints installed by this import.
+//
+// # User annotation
+//
+// Package trace provides user annotation APIs that can be used to
+// log interesting events during execution.
+//
+// There are three types of user annotations: log messages, regions,
+// and tasks.
+//
+// [Log] emits a timestamped message to the execution trace along with
+// additional information such as the category of the message and
+// which goroutine called [Log]. The execution tracer provides UIs to filter
+// and group goroutines using the log category and the message supplied
+// in [Log].
+//
+// A region is for logging a time interval during a goroutine's execution.
+// By definition, a region starts and ends in the same goroutine.
+// Regions can be nested to represent subintervals.
+// For example, the following code records four regions in the execution
+// trace to trace the durations of sequential steps in a cappuccino making
+// operation.
+//
+// trace.WithRegion(ctx, "makeCappuccino", func() {
+//
+// // orderID allows a specific order to be identified
+// // among many cappuccino order region records.
+// trace.Log(ctx, "orderID", orderID)
+//
+// trace.WithRegion(ctx, "steamMilk", steamMilk)
+// trace.WithRegion(ctx, "extractCoffee", extractCoffee)
+// trace.WithRegion(ctx, "mixMilkCoffee", mixMilkCoffee)
+// })
+//
+// A task is a higher-level component that aids tracing of logical
+// operations such as an RPC request, an HTTP request, or an
+// interesting local operation which may require multiple goroutines
+// working together. Since tasks can involve multiple goroutines,
+// they are tracked via a [context.Context] object. [NewTask] creates
+// a new task and embeds it in the returned [context.Context] object.
+// Log messages and regions are attached to the task, if any, in the
+// Context passed to [Log] and [WithRegion].
+//
+// For example, assume that we decided to froth milk, extract coffee,
+// and mix milk and coffee in separate goroutines. With a task,
+// the trace tool can identify the goroutines involved in a specific
+// cappuccino order.
+//
+// ctx, task := trace.NewTask(ctx, "makeCappuccino")
+// trace.Log(ctx, "orderID", orderID)
+//
+// milk := make(chan bool)
+// espresso := make(chan bool)
+//
+// go func() {
+// trace.WithRegion(ctx, "steamMilk", steamMilk)
+// milk <- true
+// }()
+// go func() {
+// trace.WithRegion(ctx, "extractCoffee", extractCoffee)
+// espresso <- true
+// }()
+// go func() {
+// defer task.End() // When assemble is done, the order is complete.
+// <-espresso
+// <-milk
+// trace.WithRegion(ctx, "mixMilkCoffee", mixMilkCoffee)
+// }()
+//
+// The trace tool computes the latency of a task by measuring the
+// time between the task creation and the task end and provides
+// latency distributions for each task type found in the trace.
+package trace
+
+import (
+ "io"
+ "runtime"
+ "sync"
+ "sync/atomic"
+)
+
+// Start enables tracing for the current program.
+// While tracing, the trace will be buffered and written to w.
+// Start returns an error if tracing is already enabled.
+func Start(w io.Writer) error {
+ tracing.Lock()
+ defer tracing.Unlock()
+
+ if err := runtime.StartTrace(); err != nil {
+ return err
+ }
+ go func() {
+ for {
+ data := runtime.ReadTrace()
+ if data == nil {
+ break
+ }
+ w.Write(data)
+ }
+ }()
+ tracing.enabled.Store(true)
+ return nil
+}
+
+// Stop stops the current tracing, if any.
+// Stop only returns after all the writes for the trace have completed.
+func Stop() {
+ tracing.Lock()
+ defer tracing.Unlock()
+ tracing.enabled.Store(false)
+
+ runtime.StopTrace()
+}
+
+var tracing struct {
+ sync.Mutex // gate mutators (Start, Stop)
+ enabled atomic.Bool
+}
diff --git a/src/runtime/trace/trace_stack_test.go b/src/runtime/trace/trace_stack_test.go
new file mode 100644
index 0000000..be3adc9
--- /dev/null
+++ b/src/runtime/trace/trace_stack_test.go
@@ -0,0 +1,333 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "internal/trace"
+ "net"
+ "os"
+ "runtime"
+ . "runtime/trace"
+ "strings"
+ "sync"
+ "testing"
+ "text/tabwriter"
+ "time"
+)
+
+// TestTraceSymbolize tests symbolization and that events have proper stacks.
+// In particular, it checks that we strip uninteresting bottom frames like goexit
+// and uninteresting top frames (runtime guts).
+func TestTraceSymbolize(t *testing.T) {
+ skipTraceSymbolizeTestIfNecessary(t)
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ defer Stop() // in case of early return
+
+ // Now we will do a bunch of things for which we verify stacks later.
+ // It is impossible to ensure that a goroutine has actually blocked
+ // on a channel, in a select or otherwise. So we kick off goroutines
+ // that need to block first in the hope that while we are executing
+ // the rest of the test, they will block.
+ go func() { // func1
+ select {}
+ }()
+ go func() { // func2
+ var c chan int
+ c <- 0
+ }()
+ go func() { // func3
+ var c chan int
+ <-c
+ }()
+ done1 := make(chan bool)
+ go func() { // func4
+ <-done1
+ }()
+ done2 := make(chan bool)
+ go func() { // func5
+ done2 <- true
+ }()
+ c1 := make(chan int)
+ c2 := make(chan int)
+ go func() { // func6
+ select {
+ case <-c1:
+ case <-c2:
+ }
+ }()
+ var mu sync.Mutex
+ mu.Lock()
+ go func() { // func7
+ mu.Lock()
+ mu.Unlock()
+ }()
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() { // func8
+ wg.Wait()
+ }()
+ cv := sync.NewCond(&sync.Mutex{})
+ go func() { // func9
+ cv.L.Lock()
+ cv.Wait()
+ cv.L.Unlock()
+ }()
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("failed to listen: %v", err)
+ }
+ go func() { // func10
+ c, err := ln.Accept()
+ if err != nil {
+ t.Errorf("failed to accept: %v", err)
+ return
+ }
+ c.Close()
+ }()
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatalf("failed to create a pipe: %v", err)
+ }
+ defer rp.Close()
+ defer wp.Close()
+ pipeReadDone := make(chan bool)
+ go func() { // func11
+ var data [1]byte
+ rp.Read(data[:])
+ pipeReadDone <- true
+ }()
+
+ time.Sleep(100 * time.Millisecond)
+ runtime.GC()
+ runtime.Gosched()
+ time.Sleep(100 * time.Millisecond) // the last chance for the goroutines above to block
+ done1 <- true
+ <-done2
+ select {
+ case c1 <- 0:
+ case c2 <- 0:
+ }
+ mu.Unlock()
+ wg.Done()
+ cv.Signal()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Fatalf("failed to dial: %v", err)
+ }
+ c.Close()
+ var data [1]byte
+ wp.Write(data[:])
+ <-pipeReadDone
+
+ oldGoMaxProcs := runtime.GOMAXPROCS(0)
+ runtime.GOMAXPROCS(oldGoMaxProcs + 1)
+
+ Stop()
+
+ runtime.GOMAXPROCS(oldGoMaxProcs)
+
+ events, _ := parseTrace(t, buf)
+
+ // Now check that the stacks are correct.
+ type eventDesc struct {
+ Type byte
+ Stk []frame
+ }
+ want := []eventDesc{
+ {trace.EvGCStart, []frame{
+ {"runtime.GC", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoStart, []frame{
+ {"runtime/trace_test.TestTraceSymbolize.func1", 0},
+ }},
+ {trace.EvGoSched, []frame{
+ {"runtime/trace_test.TestTraceSymbolize", 111},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoCreate, []frame{
+ {"runtime/trace_test.TestTraceSymbolize", 40},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.block", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func1", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func2", 0},
+ }},
+ {trace.EvGoStop, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func3", 0},
+ }},
+ {trace.EvGoBlockRecv, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func4", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 113},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSend, []frame{
+ {"runtime.chansend1", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func5", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.chanrecv1", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 114},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSelect, []frame{
+ {"runtime.selectgo", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func6", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"runtime.selectgo", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 115},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSync, []frame{
+ {"sync.(*Mutex).Lock", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func7", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*Mutex).Unlock", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockSync, []frame{
+ {"sync.(*WaitGroup).Wait", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func8", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*WaitGroup).Add", 0},
+ {"sync.(*WaitGroup).Done", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 120},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoBlockCond, []frame{
+ {"sync.(*Cond).Wait", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func9", 0},
+ }},
+ {trace.EvGoUnblock, []frame{
+ {"sync.(*Cond).Signal", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGoSleep, []frame{
+ {"time.Sleep", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ {trace.EvGomaxprocs, []frame{
+ {"runtime.startTheWorld", 0}, // this is when the current gomaxprocs is logged.
+ {"runtime.startTheWorldGC", 0},
+ {"runtime.GOMAXPROCS", 0},
+ {"runtime/trace_test.TestTraceSymbolize", 0},
+ {"testing.tRunner", 0},
+ }},
+ }
+ // Stacks for the following events are OS-dependent due to OS-specific code in net package.
+ if runtime.GOOS != "windows" && runtime.GOOS != "plan9" {
+ want = append(want, []eventDesc{
+ {trace.EvGoBlockNet, []frame{
+ {"internal/poll.(*FD).Accept", 0},
+ {"net.(*netFD).accept", 0},
+ {"net.(*TCPListener).accept", 0},
+ {"net.(*TCPListener).Accept", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func10", 0},
+ }},
+ {trace.EvGoSysCall, []frame{
+ {"syscall.read", 0},
+ {"syscall.Read", 0},
+ {"internal/poll.ignoringEINTRIO", 0},
+ {"internal/poll.(*FD).Read", 0},
+ {"os.(*File).read", 0},
+ {"os.(*File).Read", 0},
+ {"runtime/trace_test.TestTraceSymbolize.func11", 0},
+ }},
+ }...)
+ }
+ matched := make([]bool, len(want))
+ for _, ev := range events {
+ wantLoop:
+ for i, w := range want {
+ if matched[i] || w.Type != ev.Type || len(w.Stk) != len(ev.Stk) {
+ continue
+ }
+
+ for fi, f := range ev.Stk {
+ wf := w.Stk[fi]
+ if wf.Fn != f.Fn || wf.Line != 0 && wf.Line != f.Line {
+ continue wantLoop
+ }
+ }
+ matched[i] = true
+ }
+ }
+ for i, w := range want {
+ if matched[i] {
+ continue
+ }
+ seen, n := dumpEventStacks(w.Type, events)
+ t.Errorf("Did not match event %v with stack\n%s\nSeen %d events of the type\n%s",
+ trace.EventDescriptions[w.Type].Name, dumpFrames(w.Stk), n, seen)
+ }
+}
+
+func skipTraceSymbolizeTestIfNecessary(t *testing.T) {
+ testenv.MustHaveGoBuild(t)
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+}
+
+func dumpEventStacks(typ byte, events []*trace.Event) ([]byte, int) {
+ matched := 0
+ o := new(bytes.Buffer)
+ tw := tabwriter.NewWriter(o, 0, 8, 0, '\t', 0)
+ for _, ev := range events {
+ if ev.Type != typ {
+ continue
+ }
+ matched++
+ fmt.Fprintf(tw, "Offset %d\n", ev.Off)
+ for _, f := range ev.Stk {
+ fname := f.File
+ if idx := strings.Index(fname, "/go/src/"); idx > 0 {
+ fname = fname[idx:]
+ }
+ fmt.Fprintf(tw, " %v\t%s:%d\n", f.Fn, fname, f.Line)
+ }
+ }
+ tw.Flush()
+ return o.Bytes(), matched
+}
+
+type frame struct {
+ Fn string
+ Line int
+}
+
+func dumpFrames(frames []frame) []byte {
+ o := new(bytes.Buffer)
+ tw := tabwriter.NewWriter(o, 0, 8, 0, '\t', 0)
+
+ for _, f := range frames {
+ fmt.Fprintf(tw, " %v\t :%d\n", f.Fn, f.Line)
+ }
+ tw.Flush()
+ return o.Bytes()
+}
diff --git a/src/runtime/trace/trace_test.go b/src/runtime/trace/trace_test.go
new file mode 100644
index 0000000..04a43a0
--- /dev/null
+++ b/src/runtime/trace/trace_test.go
@@ -0,0 +1,794 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package trace_test
+
+import (
+ "bytes"
+ "context"
+ "flag"
+ "fmt"
+ "internal/profile"
+ "internal/race"
+ "internal/trace"
+ "io"
+ "net"
+ "os"
+ "runtime"
+ "runtime/pprof"
+ . "runtime/trace"
+ "strconv"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+)
+
+var (
+ saveTraces = flag.Bool("savetraces", false, "save traces collected by tests")
+)
+
+// TestEventBatch tests that Flush calls that happen during Start
+// don't produce corrupted traces.
+func TestEventBatch(t *testing.T) {
+ if race.Enabled {
+ t.Skip("skipping in race mode")
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ if testing.Short() {
+ t.Skip("skipping in short mode")
+ }
+ // During Start, a bunch of records are written to reflect the current
+ // snapshot of the program, including the state of each goroutine.
+ // Some string constants are also written to the trace to aid trace
+ // parsing. This test checks that a Flush of the buffer occurring during
+ // this process doesn't corrupt the trace.
+ // Exactly when a Flush happens during Start is hard to predict,
+ // so we test with a range of goroutine counts, hoping that one
+ // of them triggers a Flush.
+ // This range was chosen to fill up a ~64KB buffer with traceEvGoCreate
+ // and traceEvGoWaiting events (12-13 bytes per goroutine).
+ for g := 4950; g < 5050; g++ {
+ n := g
+ t.Run("G="+strconv.Itoa(n), func(t *testing.T) {
+ var wg sync.WaitGroup
+ wg.Add(n)
+
+ in := make(chan bool, 1000)
+ for i := 0; i < n; i++ {
+ go func() {
+ <-in
+ wg.Done()
+ }()
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ for i := 0; i < n; i++ {
+ in <- true
+ }
+ wg.Wait()
+ Stop()
+
+ _, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ })
+ }
+}
+
+func TestTraceStartStop(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ Stop()
+ size := buf.Len()
+ if size == 0 {
+ t.Fatalf("trace is empty")
+ }
+ time.Sleep(100 * time.Millisecond)
+ if size != buf.Len() {
+ t.Fatalf("trace writes after stop: %v -> %v", size, buf.Len())
+ }
+ saveTrace(t, buf, "TestTraceStartStop")
+}
+
+func TestTraceDoubleStart(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ Stop()
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ if err := Start(buf); err == nil {
+ t.Fatalf("succeed to start tracing second time")
+ }
+ Stop()
+ Stop()
+}
+
+func TestTrace(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ Stop()
+ saveTrace(t, buf, "TestTrace")
+ _, err := trace.Parse(buf, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+}
+
+func parseTrace(t *testing.T, r io.Reader) ([]*trace.Event, map[uint64]*trace.GDesc) {
+ res, err := trace.Parse(r, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ gs := trace.GoroutineStats(res.Events)
+ for goid := range gs {
+ // We don't do any particular checks on the result at the moment.
+ // But still check that RelatedGoroutines does not crash, hang, etc.
+ _ = trace.RelatedGoroutines(res.Events, goid)
+ }
+ return res.Events, gs
+}
+
+func testBrokenTimestamps(t *testing.T, data []byte) {
+ // On some processors cputicks (used to generate trace timestamps)
+ // produces non-monotonic timestamps. It is important that the parser
+ // distinguishes logically inconsistent traces (e.g. missing, excessive
+ // or misordered events) from broken timestamps. The former is a bug
+ // in the tracer, the latter is a machine issue.
+ // So now that we have a consistent trace, test that (1) the parser does
+ // not return a logical error in case of broken timestamps
+ // and (2) broken timestamps are eventually detected and reported.
+ trace.BreakTimestampsForTesting = true
+ defer func() {
+ trace.BreakTimestampsForTesting = false
+ }()
+ for i := 0; i < 1e4; i++ {
+ _, err := trace.Parse(bytes.NewReader(data), "")
+ if err == trace.ErrTimeOrder {
+ return
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ }
+}
+
+func TestTraceStress(t *testing.T) {
+ switch runtime.GOOS {
+ case "js", "wasip1":
+ t.Skip("no os.Pipe on " + runtime.GOOS)
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ if testing.Short() {
+ t.Skip("skipping in -short mode")
+ }
+
+ var wg sync.WaitGroup
+ done := make(chan bool)
+
+ // Create a goroutine blocked before tracing.
+ wg.Add(1)
+ go func() {
+ <-done
+ wg.Done()
+ }()
+
+ // Create a goroutine blocked in syscall before tracing.
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Fatalf("failed to create pipe: %v", err)
+ }
+ defer func() {
+ rp.Close()
+ wp.Close()
+ }()
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+ time.Sleep(time.Millisecond) // give the goroutine above time to block
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ procs := runtime.GOMAXPROCS(10)
+ time.Sleep(50 * time.Millisecond) // test proc stop/start events
+
+ go func() {
+ runtime.LockOSThread()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+
+ runtime.GC()
+ // Trigger GC from malloc.
+ n := int(1e3)
+ if isMemoryConstrained() {
+ // Reduce allocation to avoid running out of
+ // memory on the builder - see issue/12032.
+ n = 512
+ }
+ for i := 0; i < n; i++ {
+ _ = make([]byte, 1<<20)
+ }
+
+ // Create a bunch of busy goroutines to load all Ps.
+ for p := 0; p < 10; p++ {
+ wg.Add(1)
+ go func() {
+ // Do something useful.
+ tmp := make([]byte, 1<<16)
+ for i := range tmp {
+ tmp[i]++
+ }
+ _ = tmp
+ <-done
+ wg.Done()
+ }()
+ }
+
+ // Block in syscall.
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+
+ // Test timers.
+ timerDone := make(chan bool)
+ go func() {
+ time.Sleep(time.Millisecond)
+ timerDone <- true
+ }()
+ <-timerDone
+
+ // A bit of network.
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Fatalf("listen failed: %v", err)
+ }
+ defer ln.Close()
+ go func() {
+ c, err := ln.Accept()
+ if err != nil {
+ return
+ }
+ time.Sleep(time.Millisecond)
+ var buf [1]byte
+ c.Write(buf[:])
+ c.Close()
+ }()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Fatalf("dial failed: %v", err)
+ }
+ var tmp [1]byte
+ c.Read(tmp[:])
+ c.Close()
+
+ go func() {
+ runtime.Gosched()
+ select {}
+ }()
+
+ // Unblock helper goroutines and wait for them to finish.
+ wp.Write(tmp[:])
+ wp.Write(tmp[:])
+ close(done)
+ wg.Wait()
+
+ runtime.GOMAXPROCS(procs)
+
+ Stop()
+ saveTrace(t, buf, "TestTraceStress")
+ trace := buf.Bytes()
+ parseTrace(t, buf)
+ testBrokenTimestamps(t, trace)
+}
+
+// isMemoryConstrained reports whether the current machine is likely
+// to be memory constrained.
+// This was originally for the openbsd/arm builder (Issue 12032).
+// TODO: move this to testenv? Make this look at memory? Look at GO_BUILDER_NAME?
+func isMemoryConstrained() bool {
+ if runtime.GOOS == "plan9" {
+ return true
+ }
+ switch runtime.GOARCH {
+ case "arm", "mips", "mipsle":
+ return true
+ }
+ return false
+}
+
+// Do a bunch of various stuff (timers, GC, network, etc.) in a separate goroutine,
+// and concurrently with all that, start/stop the trace 3 times.
+func TestTraceStressStartStop(t *testing.T) {
+ switch runtime.GOOS {
+ case "js", "wasip1":
+ t.Skip("no os.Pipe on " + runtime.GOOS)
+ }
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+ outerDone := make(chan bool)
+
+ go func() {
+ defer func() {
+ outerDone <- true
+ }()
+
+ var wg sync.WaitGroup
+ done := make(chan bool)
+
+ wg.Add(1)
+ go func() {
+ <-done
+ wg.Done()
+ }()
+
+ rp, wp, err := os.Pipe()
+ if err != nil {
+ t.Errorf("failed to create pipe: %v", err)
+ return
+ }
+ defer func() {
+ rp.Close()
+ wp.Close()
+ }()
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+ time.Sleep(time.Millisecond)
+
+ go func() {
+ runtime.LockOSThread()
+ for {
+ select {
+ case <-done:
+ return
+ default:
+ runtime.Gosched()
+ }
+ }
+ }()
+
+ runtime.GC()
+ // Trigger GC from malloc.
+ n := int(1e3)
+ if isMemoryConstrained() {
+ // Reduce allocation to avoid running out of
+ // memory on the builder.
+ n = 512
+ }
+ for i := 0; i < n; i++ {
+ _ = make([]byte, 1<<20)
+ }
+
+ // Create a bunch of busy goroutines to load all Ps.
+ for p := 0; p < 10; p++ {
+ wg.Add(1)
+ go func() {
+ // Do something useful.
+ tmp := make([]byte, 1<<16)
+ for i := range tmp {
+ tmp[i]++
+ }
+ _ = tmp
+ <-done
+ wg.Done()
+ }()
+ }
+
+ // Block in syscall.
+ wg.Add(1)
+ go func() {
+ var tmp [1]byte
+ rp.Read(tmp[:])
+ <-done
+ wg.Done()
+ }()
+
+ runtime.GOMAXPROCS(runtime.GOMAXPROCS(1))
+
+ // Test timers.
+ timerDone := make(chan bool)
+ go func() {
+ time.Sleep(time.Millisecond)
+ timerDone <- true
+ }()
+ <-timerDone
+
+ // A bit of network.
+ ln, err := net.Listen("tcp", "127.0.0.1:0")
+ if err != nil {
+ t.Errorf("listen failed: %v", err)
+ return
+ }
+ defer ln.Close()
+ go func() {
+ c, err := ln.Accept()
+ if err != nil {
+ return
+ }
+ time.Sleep(time.Millisecond)
+ var buf [1]byte
+ c.Write(buf[:])
+ c.Close()
+ }()
+ c, err := net.Dial("tcp", ln.Addr().String())
+ if err != nil {
+ t.Errorf("dial failed: %v", err)
+ return
+ }
+ var tmp [1]byte
+ c.Read(tmp[:])
+ c.Close()
+
+ go func() {
+ runtime.Gosched()
+ select {}
+ }()
+
+ // Unblock helper goroutines and wait for them to finish.
+ wp.Write(tmp[:])
+ wp.Write(tmp[:])
+ close(done)
+ wg.Wait()
+ }()
+
+ for i := 0; i < 3; i++ {
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+ time.Sleep(time.Millisecond)
+ Stop()
+ saveTrace(t, buf, "TestTraceStressStartStop")
+ trace := buf.Bytes()
+ parseTrace(t, buf)
+ testBrokenTimestamps(t, trace)
+ }
+ <-outerDone
+}
+
+func TestTraceFutileWakeup(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ defer runtime.GOMAXPROCS(runtime.GOMAXPROCS(8))
+ c0 := make(chan int, 1)
+ c1 := make(chan int, 1)
+ c2 := make(chan int, 1)
+ const procs = 2
+ var done sync.WaitGroup
+ done.Add(4 * procs)
+ for p := 0; p < procs; p++ {
+ const iters = 1e3
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ c0 <- 0
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ <-c0
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ select {
+ case c1 <- 0:
+ case c2 <- 0:
+ }
+ }
+ done.Done()
+ }()
+ go func() {
+ for i := 0; i < iters; i++ {
+ runtime.Gosched()
+ select {
+ case <-c1:
+ case <-c2:
+ }
+ }
+ done.Done()
+ }()
+ }
+ done.Wait()
+
+ Stop()
+ saveTrace(t, buf, "TestTraceFutileWakeup")
+ events, _ := parseTrace(t, buf)
+ // Check that (1) trace does not contain EvFutileWakeup events and
+ // (2) there are no consecutive EvGoBlock/EvGCStart/EvGoBlock events
+ // (we call runtime.Gosched between all operations, so these would be futile wakeups).
+ gs := make(map[uint64]int)
+ for _, ev := range events {
+ switch ev.Type {
+ case trace.EvFutileWakeup:
+ t.Fatalf("found EvFutileWakeup event")
+ case trace.EvGoBlockSend, trace.EvGoBlockRecv, trace.EvGoBlockSelect:
+ if gs[ev.G] == 2 {
+ t.Fatalf("goroutine %v blocked on %v at %v right after start",
+ ev.G, trace.EventDescriptions[ev.Type].Name, ev.Ts)
+ }
+ if gs[ev.G] == 1 {
+ t.Fatalf("goroutine %v blocked on %v at %v while blocked",
+ ev.G, trace.EventDescriptions[ev.Type].Name, ev.Ts)
+ }
+ gs[ev.G] = 1
+ case trace.EvGoStart:
+ if gs[ev.G] == 1 {
+ gs[ev.G] = 2
+ }
+ default:
+ delete(gs, ev.G)
+ }
+ }
+}
+
+func TestTraceCPUProfile(t *testing.T) {
+ if IsEnabled() {
+ t.Skip("skipping because -test.trace is set")
+ }
+
+ cpuBuf := new(bytes.Buffer)
+ if err := pprof.StartCPUProfile(cpuBuf); err != nil {
+ t.Skipf("failed to start CPU profile: %v", err)
+ }
+
+ buf := new(bytes.Buffer)
+ if err := Start(buf); err != nil {
+ t.Fatalf("failed to start tracing: %v", err)
+ }
+
+ dur := 100 * time.Millisecond
+ func() {
+ // Create a region in the execution trace. Set and clear goroutine
+ // labels fully within that region, so we know that any CPU profile
+ // sample with the label must also be eligible for inclusion in the
+ // execution trace.
+ ctx := context.Background()
+ defer StartRegion(ctx, "cpuHogger").End()
+ pprof.Do(ctx, pprof.Labels("tracing", "on"), func(ctx context.Context) {
+ cpuHogger(cpuHog1, &salt1, dur)
+ })
+ // Be sure the execution trace's view, when filtered to this goroutine
+ // via the explicit goroutine ID in each event, gets many more samples
+ // than the CPU profiler when filtered to this goroutine via labels.
+ cpuHogger(cpuHog1, &salt1, dur)
+ }()
+
+ Stop()
+ pprof.StopCPUProfile()
+ saveTrace(t, buf, "TestTraceCPUProfile")
+
+ prof, err := profile.Parse(cpuBuf)
+ if err != nil {
+ t.Fatalf("failed to parse CPU profile: %v", err)
+ }
+ // Examine the CPU profiler's view. Filter it to only include samples from
+ // the single test goroutine. Use labels to execute that filter: they should
+ // apply to all work done while that goroutine is getg().m.curg, and they
+ // should apply to no other goroutines.
+ pprofSamples := 0
+ pprofStacks := make(map[string]int)
+ for _, s := range prof.Sample {
+ if s.Label["tracing"] != nil {
+ var fns []string
+ var leaf string
+ for _, loc := range s.Location {
+ for _, line := range loc.Line {
+ fns = append(fns, fmt.Sprintf("%s:%d", line.Function.Name, line.Line))
+ leaf = line.Function.Name
+ }
+ }
+ // runtime.sigprof synthesizes call stacks when "normal traceback is
+ // impossible or has failed", using particular placeholder functions
+ // to represent common failure cases. Look for those functions in
+ // the leaf position as a sign that the call stack and its
+ // symbolization are more complex than this test can handle.
+ //
+ // TODO: Make the symbolization done by the execution tracer and CPU
+ // profiler match up even in these harder cases. See #53378.
+ switch leaf {
+ case "runtime._System", "runtime._GC", "runtime._ExternalCode", "runtime._VDSO":
+ continue
+ }
+ stack := strings.Join(fns, " ")
+ samples := int(s.Value[0])
+ pprofSamples += samples
+ pprofStacks[stack] += samples
+ }
+ }
+ if pprofSamples == 0 {
+ t.Skipf("CPU profile did not include any samples while tracing was active\n%s", prof)
+ }
+
+ // Examine the execution tracer's view of the CPU profile samples. Filter it
+ // to only include samples from the single test goroutine. Use the goroutine
+ // ID that was recorded in the events: that should reflect getg().m.curg,
+ // same as the profiler's labels (even when the M is using its g0 stack).
+ totalTraceSamples := 0
+ traceSamples := 0
+ traceStacks := make(map[string]int)
+ events, _ := parseTrace(t, buf)
+ var hogRegion *trace.Event
+ for _, ev := range events {
+ if ev.Type == trace.EvUserRegion && ev.Args[1] == 0 && ev.SArgs[0] == "cpuHogger" {
+ // mode "0" indicates region start
+ hogRegion = ev
+ }
+ }
+ if hogRegion == nil {
+ t.Fatalf("execution trace did not identify cpuHogger goroutine")
+ } else if hogRegion.Link == nil {
+ t.Fatalf("execution trace did not close cpuHogger region")
+ }
+ for _, ev := range events {
+ if ev.Type == trace.EvCPUSample {
+ totalTraceSamples++
+ if ev.G == hogRegion.G {
+ traceSamples++
+ var fns []string
+ for _, frame := range ev.Stk {
+ if frame.Fn != "runtime.goexit" {
+ fns = append(fns, fmt.Sprintf("%s:%d", frame.Fn, frame.Line))
+ }
+ }
+ stack := strings.Join(fns, " ")
+ traceStacks[stack]++
+ }
+ }
+ }
+
+ // The execution trace may drop CPU profile samples if the profiling buffer
+ // overflows. Based on the size of profBufWordCount, that takes a bit over
+ // 1900 CPU samples or 19 thread-seconds at a 100 Hz sample rate. If we've
+ // hit that case, then we definitely have at least one full buffer's worth
+ // of CPU samples, so we'll call that success.
+ overflowed := totalTraceSamples >= 1900
+ if traceSamples < pprofSamples {
+ t.Logf("execution trace did not include all CPU profile samples; %d in profile, %d in trace", pprofSamples, traceSamples)
+ if !overflowed {
+ t.Fail()
+ }
+ }
+
+ for stack, traceSamples := range traceStacks {
+ pprofSamples := pprofStacks[stack]
+ delete(pprofStacks, stack)
+ if traceSamples < pprofSamples {
+ t.Logf("execution trace did not include all CPU profile samples for stack %q; %d in profile, %d in trace",
+ stack, pprofSamples, traceSamples)
+ if !overflowed {
+ t.Fail()
+ }
+ }
+ }
+ for stack, pprofSamples := range pprofStacks {
+ t.Logf("CPU profile included %d samples at stack %q not present in execution trace", pprofSamples, stack)
+ if !overflowed {
+ t.Fail()
+ }
+ }
+
+ if t.Failed() {
+ t.Logf("execution trace CPU samples:")
+ for stack, samples := range traceStacks {
+ t.Logf("%d: %q", samples, stack)
+ }
+ t.Logf("CPU profile:\n%v", prof)
+ }
+}
+
+func cpuHogger(f func(x int) int, y *int, dur time.Duration) {
+ // We only need to get one 100 Hz clock tick, so we've got
+ // a large safety buffer.
+ // But do at least 500 iterations (which should take about 100ms),
+ // otherwise TestCPUProfileMultithreaded can fail if only one
+ // thread is scheduled during the testing period.
+ t0 := time.Now()
+ accum := *y
+ for i := 0; i < 500 || time.Since(t0) < dur; i++ {
+ accum = f(accum)
+ }
+ *y = accum
+}
+
+var (
+ salt1 = 0
+)
+
+// The actual CPU hogging function.
+// Must not call other functions nor access heap/globals in the loop,
+// otherwise under race detector the samples will be in the race runtime.
+func cpuHog1(x int) int {
+ return cpuHog0(x, 1e5)
+}
+
+func cpuHog0(x, n int) int {
+ foo := x
+ for i := 0; i < n; i++ {
+ if i%1000 == 0 {
+ // Spend time in mcall, stored as gp.m.curg, with g0 running
+ runtime.Gosched()
+ }
+ if foo > 0 {
+ foo *= foo
+ } else {
+ foo *= foo + 1
+ }
+ }
+ return foo
+}
+
+func saveTrace(t *testing.T, buf *bytes.Buffer, name string) {
+ if !*saveTraces {
+ return
+ }
+ if err := os.WriteFile(name+".trace", buf.Bytes(), 0600); err != nil {
+ t.Errorf("failed to write trace file: %s", err)
+ }
+}
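
TestTraceCPUProfile above relies on running the CPU profiler and the execution tracer at the same time, so that CPU samples taken while the tracer is active are mirrored into the trace as CPU sample events. A minimal stand-alone sketch of that setup using only the public runtime/pprof and runtime/trace APIs (the output file names and the busyWork helper are illustrative, not part of the test):

	package main

	import (
		"os"
		"runtime/pprof"
		"runtime/trace"
	)

	func busyWork() int {
		x := 1
		for i := 0; i < 100000000; i++ {
			x = x*3 + 1
		}
		return x
	}

	func main() {
		cpuOut, err := os.Create("cpu.pprof") // illustrative output paths
		if err != nil {
			panic(err)
		}
		defer cpuOut.Close()
		traceOut, err := os.Create("exec.trace")
		if err != nil {
			panic(err)
		}
		defer traceOut.Close()

		// Start the CPU profiler first, then the tracer, mirroring the test:
		// CPU samples taken while the tracer is running also show up in the
		// execution trace as CPU sample events.
		if err := pprof.StartCPUProfile(cpuOut); err != nil {
			panic(err)
		}
		if err := trace.Start(traceOut); err != nil {
			panic(err)
		}

		busyWork()

		trace.Stop()
		pprof.StopCPUProfile()
	}
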
diff --git a/src/runtime/trace_cgo_test.go b/src/runtime/trace_cgo_test.go
new file mode 100644
index 0000000..3f207aa
--- /dev/null
+++ b/src/runtime/trace_cgo_test.go
@@ -0,0 +1,105 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build cgo
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/testenv"
+ "internal/trace"
+ "io"
+ "os"
+ "runtime"
+ "strings"
+ "testing"
+)
+
+// TestTraceUnwindCGO verifies that trace events emitted in cgo callbacks
+// produce the same stack traces and don't cause any crashes regardless of
+// tracefpunwindoff being set to 0 or 1.
+func TestTraceUnwindCGO(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ testenv.MustHaveGoBuild(t)
+ t.Parallel()
+
+ exe, err := buildTestProg(t, "testprogcgo")
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ logs := map[string]*trace.Event{
+ "goCalledFromC": nil,
+ "goCalledFromCThread": nil,
+ }
+ for _, tracefpunwindoff := range []int{1, 0} {
+ env := fmt.Sprintf("GODEBUG=tracefpunwindoff=%d", tracefpunwindoff)
+ got := runBuiltTestProg(t, exe, "Trace", env)
+ prefix, tracePath, found := strings.Cut(got, ":")
+ if !found || prefix != "trace path" {
+ t.Fatalf("unexpected output:\n%s\n", got)
+ }
+ defer os.Remove(tracePath)
+
+ traceData, err := os.ReadFile(tracePath)
+ if err != nil {
+ t.Fatalf("failed to read trace: %s", err)
+ }
+ events := parseTrace(t, bytes.NewReader(traceData))
+
+ for category := range logs {
+ event := mustFindLog(t, events, category)
+ if wantEvent := logs[category]; wantEvent == nil {
+ logs[category] = event
+ } else if got, want := dumpStack(event), dumpStack(wantEvent); got != want {
+ t.Errorf("%q: got stack:\n%s\nwant stack:\n%s\n", category, got, want)
+ }
+ }
+ }
+}
+
+// mustFindLog returns the EvUserLog event with the given category in events. It
+// fails if no event or multiple events match the category.
+func mustFindLog(t *testing.T, events []*trace.Event, category string) *trace.Event {
+ t.Helper()
+ var candidates []*trace.Event
+ for _, e := range events {
+ if e.Type == trace.EvUserLog && len(e.SArgs) >= 1 && e.SArgs[0] == category {
+ candidates = append(candidates, e)
+ }
+ }
+ if len(candidates) == 0 {
+ t.Fatalf("could not find log with category: %q", category)
+ } else if len(candidates) > 1 {
+ t.Errorf("found more than one log with category: %q", category)
+ }
+ return candidates[0]
+}
+
+// dumpStack returns e.Stk as a string.
+func dumpStack(e *trace.Event) string {
+ var buf bytes.Buffer
+ for _, f := range e.Stk {
+ file := strings.TrimPrefix(f.File, runtime.GOROOT())
+ fmt.Fprintf(&buf, "%s\n\t%s:%d\n", f.Fn, file, f.Line)
+ }
+ return buf.String()
+}
+
+// parseTrace parses the given trace or skips the test if the trace is broken
+// due to known issues. Partially copied from runtime/trace/trace_test.go.
+func parseTrace(t *testing.T, r io.Reader) []*trace.Event {
+ res, err := trace.Parse(r, "")
+ if err == trace.ErrTimeOrder {
+ t.Skipf("skipping trace: %v", err)
+ }
+ if err != nil {
+ t.Fatalf("failed to parse trace: %v", err)
+ }
+ return res.Events
+}
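
The user log events that TestTraceUnwindCGO compares come from trace.Log calls made inside the cgo test program. As a rough plain-Go analogue (this is not the actual testprogcgo code; the category and file name are made up), emitting such an event with the public runtime/trace API looks like:

	package main

	import (
		"context"
		"os"
		"runtime/trace"
	)

	func main() {
		f, err := os.Create("unwind.trace") // illustrative path
		if err != nil {
			panic(err)
		}
		defer f.Close()

		if err := trace.Start(f); err != nil {
			panic(err)
		}
		// trace.Log records a user log event; its stack trace is what the test
		// compares across GODEBUG=tracefpunwindoff=0 and =1 runs.
		trace.Log(context.Background(), "example", "logged from Go")
		trace.Stop()
	}
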
diff --git a/src/runtime/traceback.go b/src/runtime/traceback.go
new file mode 100644
index 0000000..32a5385
--- /dev/null
+++ b/src/runtime/traceback.go
@@ -0,0 +1,1640 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "internal/abi"
+ "internal/bytealg"
+ "internal/goarch"
+ "runtime/internal/sys"
+ "unsafe"
+)
+
+// The code in this file implements stack trace walking for all architectures.
+// The most important fact about a given architecture is whether it uses a link register.
+// On systems with link registers, the prologue for a non-leaf function stores the
+// incoming value of LR at the bottom of the newly allocated stack frame.
+// On systems without link registers (x86), the architecture pushes a return PC during
+// the call instruction, so the return PC ends up above the stack frame.
+// In this file, the return PC is always called LR, no matter how it was found.
+
+const usesLR = sys.MinFrameSize > 0
+
+const (
+ // tracebackInnerFrames is the number of innermost frames to print in a
+ // stack trace. The total maximum frames is tracebackInnerFrames +
+ // tracebackOuterFrames.
+ tracebackInnerFrames = 50
+
+ // tracebackOuterFrames is the number of outermost frames to print in a
+ // stack trace.
+ tracebackOuterFrames = 50
+)
+
+// unwindFlags control the behavior of various unwinders.
+type unwindFlags uint8
+
+const (
+ // unwindPrintErrors indicates that if unwinding encounters an error, it
+ // should print a message and stop without throwing. This is used for things
+ // like stack printing, where it's better to get incomplete information than
+ // to crash. This is also used in situations where everything may not be
+ // stopped nicely and the stack walk may not be able to complete, such as
+ // during profiling signals or during a crash.
+ //
+ // If neither unwindPrintErrors nor unwindSilentErrors is set, unwinding
+ // performs extra consistency checks and throws on any error.
+ //
+ // Note that there are a small number of fatal situations that will throw
+ // regardless of unwindPrintErrors or unwindSilentErrors.
+ unwindPrintErrors unwindFlags = 1 << iota
+
+ // unwindSilentErrors silently ignores errors during unwinding.
+ unwindSilentErrors
+
+ // unwindTrap indicates that the initial PC and SP are from a trap, not a
+ // return PC from a call.
+ //
+ // The unwindTrap flag is updated during unwinding. If set, frame.pc is the
+ // address of a faulting instruction instead of the return address of a
+ // call. It also means the liveness at pc may not be known.
+ //
+ // TODO: Distinguish frame.continpc, which is really the stack map PC, from
+ // the actual continuation PC, which is computed differently depending on
+ // this flag and a few other things.
+ unwindTrap
+
+ // unwindJumpStack indicates that, if the traceback is on a system stack, it
+ // should resume tracing at the user stack when the system stack is
+ // exhausted.
+ unwindJumpStack
+)
+
+// An unwinder iterates the physical stack frames of a Go stack.
+//
+// Typical use of an unwinder looks like:
+//
+// var u unwinder
+// for u.init(gp, 0); u.valid(); u.next() {
+// // ... use frame info in u ...
+// }
+//
+// Implementation note: This is carefully structured to be pointer-free because
+// tracebacks happen in places that disallow write barriers (e.g., signals).
+// Even if this is stack-allocated, its pointer-receiver methods don't know that
+// their receiver is on the stack, so they still emit write barriers. Here we
+// address that by carefully avoiding any pointers in this type. Another
+// approach would be to split this into a mutable part that's passed by pointer
+// but contains no pointers itself and an immutable part that's passed and
+// returned by value and can contain pointers. We could potentially hide that
+// we're doing that in trivial methods that are inlined into the caller that has
+// the stack allocation, but that's fragile.
+type unwinder struct {
+ // frame is the current physical stack frame, or all 0s if
+ // there is no frame.
+ frame stkframe
+
+ // g is the G whose stack is being unwound. If the
+ // unwindJumpStack flag is set and the unwinder jumps stacks,
+ // this will be different from the initial G.
+ g guintptr
+
+ // cgoCtxt is the index into g.cgoCtxt of the next frame on the cgo stack.
+ // The cgo stack is unwound in tandem with the Go stack as we find marker frames.
+ cgoCtxt int
+
+ // calleeFuncID is the function ID of the callee of the current
+ // frame, that is, the frame the unwinder most recently unwound from.
+ calleeFuncID abi.FuncID
+
+ // flags are the flags to this unwind. Some of these are updated as we
+ // unwind (see the flags documentation).
+ flags unwindFlags
+
+ // cache is used to cache pcvalue lookups.
+ cache pcvalueCache
+}
+
+// init initializes u to start unwinding gp's stack and positions the
+// iterator on gp's innermost frame. gp must not be the current G.
+//
+// A single unwinder can be reused for multiple unwinds.
+func (u *unwinder) init(gp *g, flags unwindFlags) {
+ // Implementation note: This starts the iterator on the first frame and we
+ // provide a "valid" method. Alternatively, this could start in a "before
+ // the first frame" state and "next" could return whether it was able to
+ // move to the next frame, but that's both more awkward to use in a "for"
+ // loop and is harder to implement because we have to do things differently
+ // for the first frame.
+ u.initAt(^uintptr(0), ^uintptr(0), ^uintptr(0), gp, flags)
+}
+
+func (u *unwinder) initAt(pc0, sp0, lr0 uintptr, gp *g, flags unwindFlags) {
+ // Don't call this "g"; it's too easy get "g" and "gp" confused.
+ if ourg := getg(); ourg == gp && ourg == ourg.m.curg {
+ // The starting sp has been passed in as a uintptr, and the caller may
+ // have other uintptr-typed stack references as well.
+ // If during one of the calls that got us here or during one of the
+ // callbacks below the stack must be grown, all these uintptr references
+ // to the stack will not be updated, and traceback will continue
+ // to inspect the old stack memory, which may no longer be valid.
+ // Even if all the variables were updated correctly, it is not clear that
+ // we want to expose a traceback that begins on one stack and ends
+ // on another stack. That could confuse callers quite a bit.
+ // Instead, we require that initAt and any other function that
+ // accepts an sp for the current goroutine (typically obtained by
+ // calling getcallersp) must not run on that goroutine's stack but
+ // instead on the g0 stack.
+ throw("cannot trace user goroutine on its own stack")
+ }
+
+ if pc0 == ^uintptr(0) && sp0 == ^uintptr(0) { // Signal to fetch saved values from gp.
+ if gp.syscallsp != 0 {
+ pc0 = gp.syscallpc
+ sp0 = gp.syscallsp
+ if usesLR {
+ lr0 = 0
+ }
+ } else {
+ pc0 = gp.sched.pc
+ sp0 = gp.sched.sp
+ if usesLR {
+ lr0 = gp.sched.lr
+ }
+ }
+ }
+
+ var frame stkframe
+ frame.pc = pc0
+ frame.sp = sp0
+ if usesLR {
+ frame.lr = lr0
+ }
+
+ // If the PC is zero, it's likely a nil function call.
+ // Start in the caller's frame.
+ if frame.pc == 0 {
+ if usesLR {
+ frame.pc = *(*uintptr)(unsafe.Pointer(frame.sp))
+ frame.lr = 0
+ } else {
+ frame.pc = uintptr(*(*uintptr)(unsafe.Pointer(frame.sp)))
+ frame.sp += goarch.PtrSize
+ }
+ }
+
+ // runtime/internal/atomic functions call into kernel helpers on
+ // arm < 7. See runtime/internal/atomic/sys_linux_arm.s.
+ //
+ // Start in the caller's frame.
+ if GOARCH == "arm" && goarm < 7 && GOOS == "linux" && frame.pc&0xffff0000 == 0xffff0000 {
+ // Note that the calls are simple BL without pushing the return
+ // address, so we use LR directly.
+ //
+ // The kernel helpers are frameless leaf functions, so SP and
+ // LR are not touched.
+ frame.pc = frame.lr
+ frame.lr = 0
+ }
+
+ f := findfunc(frame.pc)
+ if !f.valid() {
+ if flags&unwindSilentErrors == 0 {
+ print("runtime: g ", gp.goid, ": unknown pc ", hex(frame.pc), "\n")
+ tracebackHexdump(gp.stack, &frame, 0)
+ }
+ if flags&(unwindPrintErrors|unwindSilentErrors) == 0 {
+ throw("unknown pc")
+ }
+ *u = unwinder{}
+ return
+ }
+ frame.fn = f
+
+ // Populate the unwinder.
+ *u = unwinder{
+ frame: frame,
+ g: gp.guintptr(),
+ cgoCtxt: len(gp.cgoCtxt) - 1,
+ calleeFuncID: abi.FuncIDNormal,
+ flags: flags,
+ }
+
+ isSyscall := frame.pc == pc0 && frame.sp == sp0 && pc0 == gp.syscallpc && sp0 == gp.syscallsp
+ u.resolveInternal(true, isSyscall)
+}
+
+func (u *unwinder) valid() bool {
+ return u.frame.pc != 0
+}
+
+// resolveInternal fills in u.frame based on u.frame.fn, pc, and sp.
+//
+// innermost indicates that this is the first resolve on this stack. If
+// innermost is set, isSyscall indicates that the PC/SP was retrieved from
+// gp.syscall*; this is otherwise ignored.
+//
+// On entry, u.frame contains:
+// - fn is the running function.
+// - pc is the PC in the running function.
+// - sp is the stack pointer at that program counter.
+// - For the innermost frame on LR machines, lr is the program counter that called fn.
+//
+// On return, u.frame contains:
+// - fp is the stack pointer of the caller.
+// - lr is the program counter that called fn.
+// - varp, argp, and continpc are populated for the current frame.
+//
+// If fn is a stack-jumping function, resolveInternal can change the entire
+// frame state to follow that stack jump.
+//
+// This is internal to unwinder.
+func (u *unwinder) resolveInternal(innermost, isSyscall bool) {
+ frame := &u.frame
+ gp := u.g.ptr()
+
+ f := frame.fn
+ if f.pcsp == 0 {
+ // No frame information, must be external function, like race support.
+ // See golang.org/issue/13568.
+ u.finishInternal()
+ return
+ }
+
+ // Compute function info flags.
+ flag := f.flag
+ if f.funcID == abi.FuncID_cgocallback {
+ // cgocallback does write SP to switch from the g0 to the curg stack,
+ // but it carefully arranges that during the transition BOTH stacks
+ // have cgocallback frame valid for unwinding through.
+ // So we don't need to exclude it with the other SP-writing functions.
+ flag &^= abi.FuncFlagSPWrite
+ }
+ if isSyscall {
+ // Some Syscall functions write to SP, but they do so only after
+ // saving the entry PC/SP using entersyscall.
+ // Since we are using the entry PC/SP, the later SP write doesn't matter.
+ flag &^= abi.FuncFlagSPWrite
+ }
+
+ // Found an actual function.
+ // Derive frame pointer.
+ if frame.fp == 0 {
+ // Jump over system stack transitions. If we're on g0 and there's a user
+ // goroutine, try to jump. Otherwise this is a regular call.
+ // We also defensively check that this won't switch M's on us,
+ // which could happen at critical points in the scheduler.
+ // This ensures gp.m doesn't change from a stack jump.
+ if u.flags&unwindJumpStack != 0 && gp == gp.m.g0 && gp.m.curg != nil && gp.m.curg.m == gp.m {
+ switch f.funcID {
+ case abi.FuncID_morestack:
+ // morestack does not return normally -- newstack()
+ // gogo's to curg.sched. Match that.
+ // This keeps morestack() from showing up in the backtrace,
+ // but that makes some sense since it'll never be returned
+ // to.
+ gp = gp.m.curg
+ u.g.set(gp)
+ frame.pc = gp.sched.pc
+ frame.fn = findfunc(frame.pc)
+ f = frame.fn
+ flag = f.flag
+ frame.lr = gp.sched.lr
+ frame.sp = gp.sched.sp
+ u.cgoCtxt = len(gp.cgoCtxt) - 1
+ case abi.FuncID_systemstack:
+ // systemstack returns normally, so just follow the
+ // stack transition.
+ if usesLR && funcspdelta(f, frame.pc, &u.cache) == 0 {
+ // We're at the function prologue and the stack
+ // switch hasn't happened, or epilogue where we're
+ // about to return. Just unwind normally.
+ // Do this only on LR machines because on x86
+ // systemstack doesn't have an SP delta (the CALL
+ // instruction opens the frame), therefore no way
+ // to check.
+ flag &^= abi.FuncFlagSPWrite
+ break
+ }
+ gp = gp.m.curg
+ u.g.set(gp)
+ frame.sp = gp.sched.sp
+ u.cgoCtxt = len(gp.cgoCtxt) - 1
+ flag &^= abi.FuncFlagSPWrite
+ }
+ }
+ frame.fp = frame.sp + uintptr(funcspdelta(f, frame.pc, &u.cache))
+ if !usesLR {
+ // On x86, call instruction pushes return PC before entering new function.
+ frame.fp += goarch.PtrSize
+ }
+ }
+
+ // Derive link register.
+ if flag&abi.FuncFlagTopFrame != 0 {
+ // This function marks the top of the stack. Stop the traceback.
+ frame.lr = 0
+ } else if flag&abi.FuncFlagSPWrite != 0 && (!innermost || u.flags&(unwindPrintErrors|unwindSilentErrors) != 0) {
+ // The function we are in does a write to SP that we don't know
+ // how to encode in the spdelta table. Examples include context
+ // switch routines like runtime.gogo but also any code that switches
+ // to the g0 stack to run host C code.
+ // We can't reliably unwind the SP (we might not even be on
+ // the stack we think we are), so stop the traceback here.
+ //
+ // The one exception (encoded in the complex condition above) is that
+ // we assume if we're doing a precise traceback, and this is the
+ // innermost frame, that the SPWRITE function voluntarily preempted itself on entry
+ // during the stack growth check. In that case, the function has
+ // not yet had a chance to do any writes to SP and is safe to unwind.
+ // isAsyncSafePoint does not allow assembly functions to be async preempted,
+ // and preemptPark double-checks that SPWRITE functions are not async preempted.
+ // So for GC stack traversal, we can safely ignore SPWRITE for the innermost frame,
+ // but farther up the stack we'd better not find any.
+ // This is somewhat imprecise because we're just guessing that we're in the stack
+ // growth check. It would be better if SPWRITE were encoded in the spdelta
+ // table so we would know for sure that we were still in safe code.
+ //
+ // uSE uPE inn | action
+ // T _ _ | frame.lr = 0
+ // F T _ | frame.lr = 0
+ // F F F | print; panic
+ // F F T | ignore SPWrite
+ if u.flags&(unwindPrintErrors|unwindSilentErrors) == 0 && !innermost {
+ println("traceback: unexpected SPWRITE function", funcname(f))
+ throw("traceback")
+ }
+ frame.lr = 0
+ } else {
+ var lrPtr uintptr
+ if usesLR {
+ if innermost && frame.sp < frame.fp || frame.lr == 0 {
+ lrPtr = frame.sp
+ frame.lr = *(*uintptr)(unsafe.Pointer(lrPtr))
+ }
+ } else {
+ if frame.lr == 0 {
+ lrPtr = frame.fp - goarch.PtrSize
+ frame.lr = *(*uintptr)(unsafe.Pointer(lrPtr))
+ }
+ }
+ }
+
+ frame.varp = frame.fp
+ if !usesLR {
+ // On x86, call instruction pushes return PC before entering new function.
+ frame.varp -= goarch.PtrSize
+ }
+
+ // For architectures with frame pointers, if there's
+ // a frame, then there's a saved frame pointer here.
+ //
+ // NOTE: This code is not as general as it looks.
+ // On x86, the ABI is to save the frame pointer word at the
+ // top of the stack frame, so we have to back down over it.
+ // On arm64, the frame pointer should be at the bottom of
+ // the stack (with R29 (aka FP) = RSP), in which case we would
+ // not want to do the subtraction here. But we started out without
+ // any frame pointer, and when we wanted to add it, we didn't
+ // want to break all the assembly doing direct writes to 8(RSP)
+ // to set the first parameter to a called function.
+ // So we decided to write the FP link *below* the stack pointer
+ // (with R29 = RSP - 8 in Go functions).
+ // This is technically ABI-compatible but not standard.
+ // And it happens to end up mimicking the x86 layout.
+ // Other architectures may make different decisions.
+ if frame.varp > frame.sp && framepointer_enabled {
+ frame.varp -= goarch.PtrSize
+ }
+
+ frame.argp = frame.fp + sys.MinFrameSize
+
+ // Determine frame's 'continuation PC', where it can continue.
+ // Normally this is the return address on the stack, but if sigpanic
+ // is immediately below this function on the stack, then the frame
+ // stopped executing due to a trap, and frame.pc is probably not
+ // a safe point for looking up liveness information. In this panicking case,
+ // the function either doesn't return at all (if it has no defers or if the
+ // defers do not recover) or it returns from one of the calls to
+ // deferproc a second time (if the corresponding deferred func recovers).
+ // In the latter case, use a deferreturn call site as the continuation pc.
+ frame.continpc = frame.pc
+ if u.calleeFuncID == abi.FuncID_sigpanic {
+ if frame.fn.deferreturn != 0 {
+ frame.continpc = frame.fn.entry() + uintptr(frame.fn.deferreturn) + 1
+ // Note: this may perhaps keep return variables alive longer than
+ // strictly necessary, as we are using "function has a defer statement"
+ // as a proxy for "function actually deferred something". It seems
+ // to be a minor drawback. (We used to actually look through the
+ // gp._defer for a defer corresponding to this function, but that
+ // is hard to do with defer records on the stack during a stack copy.)
+ // Note: the +1 is to offset the -1 that
+ // stack.go:getStackMap does to back up a return
+ // address to make sure the pc is in the CALL instruction.
+ } else {
+ frame.continpc = 0
+ }
+ }
+}
+
+func (u *unwinder) next() {
+ frame := &u.frame
+ f := frame.fn
+ gp := u.g.ptr()
+
+ // Do not unwind past the bottom of the stack.
+ if frame.lr == 0 {
+ u.finishInternal()
+ return
+ }
+ flr := findfunc(frame.lr)
+ if !flr.valid() {
+ // This happens if you get a profiling interrupt at just the wrong time.
+ // In that context it is okay to stop early.
+ // But if no error flags are set, we're doing a garbage collection and must
+ // get everything, so crash loudly.
+ fail := u.flags&(unwindPrintErrors|unwindSilentErrors) == 0
+ doPrint := u.flags&unwindSilentErrors == 0
+ if doPrint && gp.m.incgo && f.funcID == abi.FuncID_sigpanic {
+ // We can inject sigpanic
+ // calls directly into C code,
+ // in which case we'll see a C
+ // return PC. Don't complain.
+ doPrint = false
+ }
+ if fail || doPrint {
+ print("runtime: g ", gp.goid, ": unexpected return pc for ", funcname(f), " called from ", hex(frame.lr), "\n")
+ tracebackHexdump(gp.stack, frame, 0)
+ }
+ if fail {
+ throw("unknown caller pc")
+ }
+ frame.lr = 0
+ u.finishInternal()
+ return
+ }
+
+ if frame.pc == frame.lr && frame.sp == frame.fp {
+ // If the next frame is identical to the current frame, we cannot make progress.
+ print("runtime: traceback stuck. pc=", hex(frame.pc), " sp=", hex(frame.sp), "\n")
+ tracebackHexdump(gp.stack, frame, frame.sp)
+ throw("traceback stuck")
+ }
+
+ injectedCall := f.funcID == abi.FuncID_sigpanic || f.funcID == abi.FuncID_asyncPreempt || f.funcID == abi.FuncID_debugCallV2
+ if injectedCall {
+ u.flags |= unwindTrap
+ } else {
+ u.flags &^= unwindTrap
+ }
+
+ // Unwind to next frame.
+ u.calleeFuncID = f.funcID
+ frame.fn = flr
+ frame.pc = frame.lr
+ frame.lr = 0
+ frame.sp = frame.fp
+ frame.fp = 0
+
+ // On link register architectures, sighandler saves the LR on stack
+ // before faking a call.
+ if usesLR && injectedCall {
+ x := *(*uintptr)(unsafe.Pointer(frame.sp))
+ frame.sp += alignUp(sys.MinFrameSize, sys.StackAlign)
+ f = findfunc(frame.pc)
+ frame.fn = f
+ if !f.valid() {
+ frame.pc = x
+ } else if funcspdelta(f, frame.pc, &u.cache) == 0 {
+ frame.lr = x
+ }
+ }
+
+ u.resolveInternal(false, false)
+}
+
+// finishInternal is an unwinder-internal helper called after the stack has been
+// exhausted. It sets the unwinder to an invalid state and checks that it
+// successfully unwound the entire stack.
+func (u *unwinder) finishInternal() {
+ u.frame.pc = 0
+
+ // Note that panic != nil is okay here: there can be leftover panics,
+ // because the defers on the panic stack do not nest in frame order as
+ // they do on the defer stack. If you have:
+ //
+ // frame 1 defers d1
+ // frame 2 defers d2
+ // frame 3 defers d3
+ // frame 4 panics
+ // frame 4's panic starts running defers
+ // frame 5, running d3, defers d4
+ // frame 5 panics
+ // frame 5's panic starts running defers
+ // frame 6, running d4, garbage collects
+ // frame 6, running d2, garbage collects
+ //
+ // During the execution of d4, the panic stack is d4 -> d3, which
+ // is nested properly, and we'll treat frame 3 as resumable, because we
+ // can find d3. (And in fact frame 3 is resumable. If d4 recovers
+ // and frame 5 continues running d3, d3 can recover and we'll
+ // resume execution in (returning from) frame 3.)
+ //
+ // During the execution of d2, however, the panic stack is d2 -> d3,
+ // which is inverted. The scan will match d2 to frame 2 but having
+ // d2 on the stack until then means it will not match d3 to frame 3.
+ // This is okay: if we're running d2, then all the defers after d2 have
+ // completed and their corresponding frames are dead. Not finding d3
+ // for frame 3 means we'll set frame 3's continpc == 0, which is correct
+ // (frame 3 is dead). At the end of the walk the panic stack can thus
+ // contain defers (d3 in this case) for dead frames. The inversion here
+ // always indicates a dead frame, and the effect of the inversion on the
+ // scan is to hide those dead frames, so the scan is still okay:
+ // what's left on the panic stack are exactly (and only) the dead frames.
+ //
+ // The strict check below applies only when neither error flag is set,
+ // because only then do we know the unwinder is being used in a "must be
+ // correct" context (such as stack scanning) as opposed to a "best effort"
+ // context. Those strict unwinds only happen when everything is stopped
+ // nicely.
+ // At other times, such as when gathering a stack for a profiling signal
+ // or when printing a traceback during a crash, everything may not be
+ // stopped nicely, and the stack walk may not be able to complete.
+ gp := u.g.ptr()
+ if u.flags&(unwindPrintErrors|unwindSilentErrors) == 0 && u.frame.sp != gp.stktopsp {
+ print("runtime: g", gp.goid, ": frame.sp=", hex(u.frame.sp), " top=", hex(gp.stktopsp), "\n")
+ print("\tstack=[", hex(gp.stack.lo), "-", hex(gp.stack.hi), "\n")
+ throw("traceback did not unwind completely")
+ }
+}
+
+// symPC returns the PC that should be used for symbolizing the current frame.
+// Specifically, this is the PC of the last instruction executed in this frame.
+//
+// If this frame did a normal call, then frame.pc is a return PC, so this will
+// return frame.pc-1, which points into the CALL instruction. If the frame was
+// interrupted by a signal (e.g., profiler, segv, etc) then frame.pc is for the
+// trapped instruction, so this returns frame.pc. See issue #34123. Finally,
+// frame.pc can be at function entry when the frame is initialized without
+// actually running code, like in runtime.mstart, in which case this returns
+// frame.pc because that's the best we can do.
+func (u *unwinder) symPC() uintptr {
+ if u.flags&unwindTrap == 0 && u.frame.pc > u.frame.fn.entry() {
+ // Regular call.
+ return u.frame.pc - 1
+ }
+ // Trapping instruction or we're at the function entry point.
+ return u.frame.pc
+}
+
+// cgoCallers populates pcBuf with the cgo callers of the current frame using
+// the registered cgo unwinder. It returns the number of PCs written to pcBuf.
+// If the current frame is not a cgo frame or if there's no registered cgo
+// unwinder, it returns 0.
+func (u *unwinder) cgoCallers(pcBuf []uintptr) int {
+ if cgoTraceback == nil || u.frame.fn.funcID != abi.FuncID_cgocallback || u.cgoCtxt < 0 {
+ // We don't have a cgo unwinder (typical case), or we do but we're not
+ // in a cgo frame or we're out of cgo context.
+ return 0
+ }
+
+ ctxt := u.g.ptr().cgoCtxt[u.cgoCtxt]
+ u.cgoCtxt--
+ cgoContextPCs(ctxt, pcBuf)
+ for i, pc := range pcBuf {
+ if pc == 0 {
+ return i
+ }
+ }
+ return len(pcBuf)
+}
+
+// tracebackPCs populates pcBuf with the return addresses for each frame from u
+// and returns the number of PCs written to pcBuf. The returned PCs correspond
+// to "logical frames" rather than "physical frames"; that is if A is inlined
+// into B, this will still return a PCs for both A and B. This also includes PCs
+// generated by the cgo unwinder, if one is registered.
+//
+// If skip != 0, this skips this many logical frames.
+//
+// Callers should set the unwindSilentErrors flag on u.
+func tracebackPCs(u *unwinder, skip int, pcBuf []uintptr) int {
+ var cgoBuf [32]uintptr
+ n := 0
+ for ; n < len(pcBuf) && u.valid(); u.next() {
+ f := u.frame.fn
+ cgoN := u.cgoCallers(cgoBuf[:])
+
+ // TODO: Why does &u.cache cause u to escape? (Same in traceback2)
+ for iu, uf := newInlineUnwinder(f, u.symPC(), noEscapePtr(&u.cache)); n < len(pcBuf) && uf.valid(); uf = iu.next(uf) {
+ sf := iu.srcFunc(uf)
+ if sf.funcID == abi.FuncIDWrapper && elideWrapperCalling(u.calleeFuncID) {
+ // ignore wrappers
+ } else if skip > 0 {
+ skip--
+ } else {
+ // Callers expect the pc buffer to contain return addresses
+ // and do the -1 themselves, so we add 1 to the call PC to
+ // create a return PC.
+ pcBuf[n] = uf.pc + 1
+ n++
+ }
+ u.calleeFuncID = sf.funcID
+ }
+ // Add cgo frames (if we're done skipping over the requested number of
+ // Go frames).
+ if skip == 0 {
+ n += copy(pcBuf[n:], cgoBuf[:cgoN])
+ }
+ }
+ return n
+}
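
Outside the runtime, the same return-address convention is visible through runtime.Callers and runtime.CallersFrames: Callers fills a buffer with return PCs, and CallersFrames handles the return-address adjustment and expands inlined (logical) frames when symbolizing. A small user-level sketch, not part of this file:

	package main

	import (
		"fmt"
		"runtime"
	)

	func main() {
		pcs := make([]uintptr, 32)
		// skip=1 skips the runtime.Callers frame itself, so pcs[0] is main.main.
		n := runtime.Callers(1, pcs)

		frames := runtime.CallersFrames(pcs[:n])
		for {
			frame, more := frames.Next()
			// frame.File and frame.Line refer to the call site rather than the
			// return address, because CallersFrames backs each PC up internally.
			fmt.Printf("%s\n\t%s:%d\n", frame.Function, frame.File, frame.Line)
			if !more {
				break
			}
		}
	}
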
+
+// printArgs prints function arguments in traceback.
+func printArgs(f funcInfo, argp unsafe.Pointer, pc uintptr) {
+ // The "instruction" of argument printing is encoded in _FUNCDATA_ArgInfo.
+ // See cmd/compile/internal/ssagen.emitArgInfo for the description of the
+ // encoding.
+ // These constants need to be in sync with the compiler.
+ const (
+ _endSeq = 0xff
+ _startAgg = 0xfe
+ _endAgg = 0xfd
+ _dotdotdot = 0xfc
+ _offsetTooLarge = 0xfb
+ )
+
+ const (
+ limit = 10 // print no more than 10 args/components
+ maxDepth = 5 // no more than 5 layers of nesting
+ maxLen = (maxDepth*3+2)*limit + 1 // max length of _FUNCDATA_ArgInfo (see the compiler side for reasoning)
+ )
+
+ p := (*[maxLen]uint8)(funcdata(f, abi.FUNCDATA_ArgInfo))
+ if p == nil {
+ return
+ }
+
+ liveInfo := funcdata(f, abi.FUNCDATA_ArgLiveInfo)
+ liveIdx := pcdatavalue(f, abi.PCDATA_ArgLiveIndex, pc, nil)
+ startOffset := uint8(0xff) // smallest offset that needs liveness info (slots with a lower offset are always live)
+ if liveInfo != nil {
+ startOffset = *(*uint8)(liveInfo)
+ }
+
+ isLive := func(off, slotIdx uint8) bool {
+ if liveInfo == nil || liveIdx <= 0 {
+ return true // no liveness info, always live
+ }
+ if off < startOffset {
+ return true
+ }
+ bits := *(*uint8)(add(liveInfo, uintptr(liveIdx)+uintptr(slotIdx/8)))
+ return bits&(1<<(slotIdx%8)) != 0
+ }
+
+ print1 := func(off, sz, slotIdx uint8) {
+ x := readUnaligned64(add(argp, uintptr(off)))
+ // mask out irrelevant bits
+ if sz < 8 {
+ shift := 64 - sz*8
+ if goarch.BigEndian {
+ x = x >> shift
+ } else {
+ x = x << shift >> shift
+ }
+ }
+ print(hex(x))
+ if !isLive(off, slotIdx) {
+ print("?")
+ }
+ }
+
+ start := true
+ printcomma := func() {
+ if !start {
+ print(", ")
+ }
+ }
+ pi := 0
+ slotIdx := uint8(0) // register arg spill slot index
+printloop:
+ for {
+ o := p[pi]
+ pi++
+ switch o {
+ case _endSeq:
+ break printloop
+ case _startAgg:
+ printcomma()
+ print("{")
+ start = true
+ continue
+ case _endAgg:
+ print("}")
+ case _dotdotdot:
+ printcomma()
+ print("...")
+ case _offsetTooLarge:
+ printcomma()
+ print("_")
+ default:
+ printcomma()
+ sz := p[pi]
+ pi++
+ print1(o, sz, slotIdx)
+ if o >= startOffset {
+ slotIdx++
+ }
+ }
+ start = false
+ }
+}
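
From the user's point of view, this is the machinery behind the hexadecimal argument values in a panic traceback: each scalar is printed as a hex word, aggregates appear in braces, and a trailing ? marks a slot whose value may be stale. A tiny sketch (the exact output, including whether ? appears, depends on the ABI and on optimization):

	package main

	type pair struct{ a, b int }

	//go:noinline
	func explode(x int, p pair) {
		panic("boom")
	}

	func main() {
		explode(42, pair{1, 2})
	}

	// The resulting goroutine traceback contains an argument list roughly like
	//
	//	main.explode(0x2a, {0x1, 0x2})
	//
	// possibly with a trailing ? on values whose spill slots are not known to be live.
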
+
+// funcNamePiecesForPrint returns the function name for printing to the user.
+// It returns three pieces so it doesn't need an allocation for string
+// concatenation.
+func funcNamePiecesForPrint(name string) (string, string, string) {
+ // Replace the shape name in a generic function name with "...".
+ i := bytealg.IndexByteString(name, '[')
+ if i < 0 {
+ return name, "", ""
+ }
+ j := len(name) - 1
+ for name[j] != ']' {
+ j--
+ }
+ if j <= i {
+ return name, "", ""
+ }
+ return name[:i], "[...]", name[j+1:]
+}
+
+// funcNameForPrint returns the function name for printing to the user.
+func funcNameForPrint(name string) string {
+ a, b, c := funcNamePiecesForPrint(name)
+ return a + b + c
+}
+
+// printFuncName prints a function name. name is the function name in
+// the binary's func data table.
+func printFuncName(name string) {
+ if name == "runtime.gopanic" {
+ print("panic")
+ return
+ }
+ a, b, c := funcNamePiecesForPrint(name)
+ print(a, b, c)
+}
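
For generic functions, the effect of funcNamePiecesForPrint is that the instantiation (shape) portion of the symbol name is collapsed in tracebacks. A hedged user-level sketch:

	package main

	func apply[T any](f func(T) T, v T) T {
		if f == nil {
			panic("nil function")
		}
		return f(v)
	}

	func main() {
		apply[int](nil, 7)
	}

	// The panic traceback reports the frame as main.apply[...] rather than the
	// full instantiated symbol name (e.g. main.apply[go.shape.int]).
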
+
+func printcreatedby(gp *g) {
+ // Show what created the goroutine, except for the main goroutine (goid 1).
+ pc := gp.gopc
+ f := findfunc(pc)
+ if f.valid() && showframe(f.srcFunc(), gp, false, abi.FuncIDNormal) && gp.goid != 1 {
+ printcreatedby1(f, pc, gp.parentGoid)
+ }
+}
+
+func printcreatedby1(f funcInfo, pc uintptr, goid uint64) {
+ print("created by ")
+ printFuncName(funcname(f))
+ if goid != 0 {
+ print(" in goroutine ", goid)
+ }
+ print("\n")
+ tracepc := pc // back up to CALL instruction for funcline.
+ if pc > f.entry() {
+ tracepc -= sys.PCQuantum
+ }
+ file, line := funcline(f, tracepc)
+ print("\t", file, ":", line)
+ if pc > f.entry() {
+ print(" +", hex(pc-f.entry()))
+ }
+ print("\n")
+}
+
+func traceback(pc, sp, lr uintptr, gp *g) {
+ traceback1(pc, sp, lr, gp, 0)
+}
+
+// tracebacktrap is like traceback but expects that the PC and SP were obtained
+// from a trap, not from gp->sched or gp->syscallpc/gp->syscallsp or getcallerpc/getcallersp.
+// Because they are from a trap instead of from a saved pair,
+// the initial PC must not be rewound to the previous instruction.
+// (All the saved pairs record a PC that is a return address, so we
+// rewind it into the CALL instruction.)
+// If gp.m.libcall{g,pc,sp} information is available, it uses that information in preference to
+// the pc/sp/lr passed in.
+func tracebacktrap(pc, sp, lr uintptr, gp *g) {
+ if gp.m.libcallsp != 0 {
+ // We're in C code somewhere, traceback from the saved position.
+ traceback1(gp.m.libcallpc, gp.m.libcallsp, 0, gp.m.libcallg.ptr(), 0)
+ return
+ }
+ traceback1(pc, sp, lr, gp, unwindTrap)
+}
+
+func traceback1(pc, sp, lr uintptr, gp *g, flags unwindFlags) {
+ // If the goroutine is in cgo, and we have a cgo traceback, print that.
+ if iscgo && gp.m != nil && gp.m.ncgo > 0 && gp.syscallsp != 0 && gp.m.cgoCallers != nil && gp.m.cgoCallers[0] != 0 {
+ // Lock cgoCallers so that a signal handler won't
+ // change it, copy the array, reset it, unlock it.
+ // We are locked to the thread and are not running
+ // concurrently with a signal handler.
+ // We just have to stop a signal handler from interrupting
+ // in the middle of our copy.
+ gp.m.cgoCallersUse.Store(1)
+ cgoCallers := *gp.m.cgoCallers
+ gp.m.cgoCallers[0] = 0
+ gp.m.cgoCallersUse.Store(0)
+
+ printCgoTraceback(&cgoCallers)
+ }
+
+ if readgstatus(gp)&^_Gscan == _Gsyscall {
+ // Override registers if blocked in system call.
+ pc = gp.syscallpc
+ sp = gp.syscallsp
+ flags &^= unwindTrap
+ }
+ if gp.m != nil && gp.m.vdsoSP != 0 {
+ // Override registers if running in VDSO. This comes after the
+ // _Gsyscall check to cover VDSO calls after entersyscall.
+ pc = gp.m.vdsoPC
+ sp = gp.m.vdsoSP
+ flags &^= unwindTrap
+ }
+
+ // Print traceback.
+ //
+ // We print the first tracebackInnerFrames frames, and the last
+ // tracebackOuterFrames frames. There are many possible approaches to this,
+ // and various complications:
+ //
+ // - We'd prefer to walk the stack once because in really bad situations
+ // traceback may crash (and we want as much output as possible) or the stack
+ // may be changing.
+ //
+ // - Each physical frame can represent several logical frames, so we might
+ // have to pause in the middle of a physical frame and pick up in the middle
+ // of a physical frame.
+ //
+ // - The cgo symbolizer can expand a cgo PC to more than one logical frame,
+ // and involves juggling state on the C side that we don't manage. Since its
+ // expansion state is managed on the C side, we can't capture the expansion
+ // state part way through, and because the output strings are managed on the
+ // C side, we can't capture the output. Thus, our only choice is to replay a
+ // whole expansion, potentially discarding some of it.
+ //
+ // Rejected approaches:
+ //
+ // - Do two passes where the first pass just counts and the second pass does
+ // all the printing. This is undesirable if the stack is corrupted or changing
+ // because we won't see a partial stack if we panic.
+ //
+ // - Keep a ring buffer of the last N logical frames and use this to print
+ // the bottom frames once we reach the end of the stack. This works, but
+ // requires keeping a surprising amount of state on the stack, and we have
+ // to run the cgo symbolizer twice: once to count frames, and a second time
+ // to print them, since we can't retain the strings it returns.
+ //
+ // Instead, we print the outer frames, and if we reach that limit, we clone
+ // the unwinder, count the remaining frames, and then skip forward and
+ // finish printing from the clone. This makes two passes over the outer part
+ // of the stack, but the single pass over the inner part ensures that's
+ // printed immediately and not revisited. It keeps minimal state on the
+ // stack. And through a combination of skip counts and limits, we can do all
+ // of the steps we need with a single traceback printer implementation.
+ //
+ // We could be more lax about exactly how many frames we print, for example
+ // always stopping and resuming on physical frame boundaries, or at least
+ // cgo expansion boundaries. It's not clear that's much simpler.
+ flags |= unwindPrintErrors
+ var u unwinder
+ tracebackWithRuntime := func(showRuntime bool) int {
+ const maxInt int = 0x7fffffff
+ u.initAt(pc, sp, lr, gp, flags)
+ n, lastN := traceback2(&u, showRuntime, 0, tracebackInnerFrames)
+ if n < tracebackInnerFrames {
+ // We printed the whole stack.
+ return n
+ }
+ // Clone the unwinder and figure out how many frames are left. This
+ // count will include any logical frames already printed for u's current
+ // physical frame.
+ u2 := u
+ remaining, _ := traceback2(&u, showRuntime, maxInt, 0)
+ elide := remaining - lastN - tracebackOuterFrames
+ if elide > 0 {
+ print("...", elide, " frames elided...\n")
+ traceback2(&u2, showRuntime, lastN+elide, tracebackOuterFrames)
+ } else {
+ // There are tracebackOuterFrames or fewer frames left to print.
+ // Just print the rest of the stack.
+ traceback2(&u2, showRuntime, lastN, tracebackOuterFrames)
+ }
+ return n
+ }
+ // By default, omit runtime frames. If that means we print nothing at all,
+ // repeat, this time forcing all frames to be printed.
+ if tracebackWithRuntime(false) == 0 {
+ tracebackWithRuntime(true)
+ }
+ printcreatedby(gp)
+
+ if gp.ancestors == nil {
+ return
+ }
+ for _, ancestor := range *gp.ancestors {
+ printAncestorTraceback(ancestor)
+ }
+}
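
The inner/outer split is easy to observe: a sufficiently deep crash prints the innermost 50 frames, an elision line, and then the outermost 50 frames. A minimal sketch (the elided count in the output depends on the surrounding frames):

	package main

	func recurse(n int) {
		if n == 0 {
			panic("deep stack")
		}
		recurse(n - 1)
	}

	func main() {
		recurse(1000)
	}

	// The crash output shows the 50 innermost frames, then an elision line of
	// the form "...N frames elided...", followed by the 50 outermost frames.
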
+
+// traceback2 prints a stack trace starting at u. It skips the first "skip"
+// logical frames, after which it prints at most "max" logical frames. It
+// returns n, which is the number of logical frames skipped and printed, and
+// lastN, which is the number of logical frames skipped or printed just in the
+// physical frame that u references.
+func traceback2(u *unwinder, showRuntime bool, skip, max int) (n, lastN int) {
+ // commitFrame commits to a logical frame and returns whether this frame
+ // should be printed and whether iteration should stop.
+ commitFrame := func() (pr, stop bool) {
+ if skip == 0 && max == 0 {
+ // Stop
+ return false, true
+ }
+ n++
+ lastN++
+ if skip > 0 {
+ // Skip
+ skip--
+ return false, false
+ }
+ // Print
+ max--
+ return true, false
+ }
+
+ gp := u.g.ptr()
+ level, _, _ := gotraceback()
+ var cgoBuf [32]uintptr
+ for ; u.valid(); u.next() {
+ lastN = 0
+ f := u.frame.fn
+ for iu, uf := newInlineUnwinder(f, u.symPC(), noEscapePtr(&u.cache)); uf.valid(); uf = iu.next(uf) {
+ sf := iu.srcFunc(uf)
+ callee := u.calleeFuncID
+ u.calleeFuncID = sf.funcID
+ if !(showRuntime || showframe(sf, gp, n == 0, callee)) {
+ continue
+ }
+
+ if pr, stop := commitFrame(); stop {
+ return
+ } else if !pr {
+ continue
+ }
+
+ name := sf.name()
+ file, line := iu.fileLine(uf)
+ // Print during crash.
+ // main(0x1, 0x2, 0x3)
+ // /home/rsc/go/src/runtime/x.go:23 +0xf
+ //
+ printFuncName(name)
+ print("(")
+ if iu.isInlined(uf) {
+ print("...")
+ } else {
+ argp := unsafe.Pointer(u.frame.argp)
+ printArgs(f, argp, u.symPC())
+ }
+ print(")\n")
+ print("\t", file, ":", line)
+ if !iu.isInlined(uf) {
+ if u.frame.pc > f.entry() {
+ print(" +", hex(u.frame.pc-f.entry()))
+ }
+ if gp.m != nil && gp.m.throwing >= throwTypeRuntime && gp == gp.m.curg || level >= 2 {
+ print(" fp=", hex(u.frame.fp), " sp=", hex(u.frame.sp), " pc=", hex(u.frame.pc))
+ }
+ }
+ print("\n")
+ }
+
+ // Print cgo frames.
+ if cgoN := u.cgoCallers(cgoBuf[:]); cgoN > 0 {
+ var arg cgoSymbolizerArg
+ anySymbolized := false
+ stop := false
+ for _, pc := range cgoBuf[:cgoN] {
+ if cgoSymbolizer == nil {
+ if pr, stop := commitFrame(); stop {
+ break
+ } else if pr {
+ print("non-Go function at pc=", hex(pc), "\n")
+ }
+ } else {
+ stop = printOneCgoTraceback(pc, commitFrame, &arg)
+ anySymbolized = true
+ if stop {
+ break
+ }
+ }
+ }
+ if anySymbolized {
+ // Free symbolization state.
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+ }
+ if stop {
+ return
+ }
+ }
+ }
+ return n, 0
+}
+
+// printAncestorTraceback prints the traceback of the given ancestor.
+// TODO: Unify this with gentraceback and CallersFrames.
+func printAncestorTraceback(ancestor ancestorInfo) {
+ print("[originating from goroutine ", ancestor.goid, "]:\n")
+ for fidx, pc := range ancestor.pcs {
+ f := findfunc(pc) // f previously validated
+ if showfuncinfo(f.srcFunc(), fidx == 0, abi.FuncIDNormal) {
+ printAncestorTracebackFuncInfo(f, pc)
+ }
+ }
+ if len(ancestor.pcs) == tracebackInnerFrames {
+ print("...additional frames elided...\n")
+ }
+ // Show what created the goroutine, except for the main goroutine (goid 1).
+ f := findfunc(ancestor.gopc)
+ if f.valid() && showfuncinfo(f.srcFunc(), false, abi.FuncIDNormal) && ancestor.goid != 1 {
+ // In ancestor mode, we'll already print the goroutine ancestor.
+ // Pass 0 for the goid parameter so we don't print it again.
+ printcreatedby1(f, ancestor.gopc, 0)
+ }
+}
+
+// printAncestorTracebackFuncInfo prints the given function info at a given pc
+// within an ancestor traceback. The precision of this info is reduced
+// due to only having access to the pcs recorded at the time the caller
+// goroutine was created.
+func printAncestorTracebackFuncInfo(f funcInfo, pc uintptr) {
+ u, uf := newInlineUnwinder(f, pc, nil)
+ file, line := u.fileLine(uf)
+ printFuncName(u.srcFunc(uf).name())
+ print("(...)\n")
+ print("\t", file, ":", line)
+ if pc > f.entry() {
+ print(" +", hex(pc-f.entry()))
+ }
+ print("\n")
+}
+
+func callers(skip int, pcbuf []uintptr) int {
+ sp := getcallersp()
+ pc := getcallerpc()
+ gp := getg()
+ var n int
+ systemstack(func() {
+ var u unwinder
+ u.initAt(pc, sp, 0, gp, unwindSilentErrors)
+ n = tracebackPCs(&u, skip, pcbuf)
+ })
+ return n
+}
+
+func gcallers(gp *g, skip int, pcbuf []uintptr) int {
+ var u unwinder
+ u.init(gp, unwindSilentErrors)
+ return tracebackPCs(&u, skip, pcbuf)
+}
+
+// showframe reports whether the frame with the given characteristics should
+// be printed during a traceback.
+func showframe(sf srcFunc, gp *g, firstFrame bool, calleeID abi.FuncID) bool {
+ mp := getg().m
+ if mp.throwing >= throwTypeRuntime && gp != nil && (gp == mp.curg || gp == mp.caughtsig.ptr()) {
+ return true
+ }
+ return showfuncinfo(sf, firstFrame, calleeID)
+}
+
+// showfuncinfo reports whether a function with the given characteristics should
+// be printed during a traceback.
+func showfuncinfo(sf srcFunc, firstFrame bool, calleeID abi.FuncID) bool {
+ level, _, _ := gotraceback()
+ if level > 1 {
+ // Show all frames.
+ return true
+ }
+
+ if sf.funcID == abi.FuncIDWrapper && elideWrapperCalling(calleeID) {
+ return false
+ }
+
+ name := sf.name()
+
+ // Special case: always show runtime.gopanic frame
+ // in the middle of a stack trace, so that we can
+ // see the boundary between ordinary code and
+ // panic-induced deferred code.
+ // See golang.org/issue/5832.
+ if name == "runtime.gopanic" && !firstFrame {
+ return true
+ }
+
+ return bytealg.IndexByteString(name, '.') >= 0 && (!hasPrefix(name, "runtime.") || isExportedRuntime(name))
+}
+
+// isExportedRuntime reports whether name is an exported runtime function.
+// It is only for runtime functions, so ASCII A-Z is fine.
+// TODO: this handles exported functions but not exported methods.
+func isExportedRuntime(name string) bool {
+ const n = len("runtime.")
+ return len(name) > n && name[:n] == "runtime." && 'A' <= name[n] && name[n] <= 'Z'
+}
+
+// elideWrapperCalling reports whether a wrapper function that called
+// function id should be elided from stack traces.
+func elideWrapperCalling(id abi.FuncID) bool {
+ // If the wrapper called a panic function instead of the
+ // wrapped function, we want to include it in stacks.
+ return !(id == abi.FuncID_gopanic || id == abi.FuncID_sigpanic || id == abi.FuncID_panicwrap)
+}
+
+var gStatusStrings = [...]string{
+ _Gidle: "idle",
+ _Grunnable: "runnable",
+ _Grunning: "running",
+ _Gsyscall: "syscall",
+ _Gwaiting: "waiting",
+ _Gdead: "dead",
+ _Gcopystack: "copystack",
+ _Gpreempted: "preempted",
+}
+
+func goroutineheader(gp *g) {
+ gpstatus := readgstatus(gp)
+
+ isScan := gpstatus&_Gscan != 0
+ gpstatus &^= _Gscan // drop the scan bit
+
+ // Basic string status
+ var status string
+ if 0 <= gpstatus && gpstatus < uint32(len(gStatusStrings)) {
+ status = gStatusStrings[gpstatus]
+ } else {
+ status = "???"
+ }
+
+ // Override.
+ if gpstatus == _Gwaiting && gp.waitreason != waitReasonZero {
+ status = gp.waitreason.String()
+ }
+
+ // approx time the G is blocked, in minutes
+ var waitfor int64
+ if (gpstatus == _Gwaiting || gpstatus == _Gsyscall) && gp.waitsince != 0 {
+ waitfor = (nanotime() - gp.waitsince) / 60e9
+ }
+ print("goroutine ", gp.goid, " [", status)
+ if isScan {
+ print(" (scan)")
+ }
+ if waitfor >= 1 {
+ print(", ", waitfor, " minutes")
+ }
+ if gp.lockedm != 0 {
+ print(", locked to thread")
+ }
+ print("]:\n")
+}
+
+func tracebackothers(me *g) {
+ level, _, _ := gotraceback()
+
+ // Show the current goroutine first, if we haven't already.
+ curgp := getg().m.curg
+ if curgp != nil && curgp != me {
+ print("\n")
+ goroutineheader(curgp)
+ traceback(^uintptr(0), ^uintptr(0), 0, curgp)
+ }
+
+ // We can't call locking forEachG here because this may be during fatal
+ // throw/panic, where locking could be out-of-order or a direct
+ // deadlock.
+ //
+ // Instead, use forEachGRace, which requires no locking. We don't lock
+ // against concurrent creation of new Gs, but even with allglock we may
+ // miss Gs created after this loop.
+ forEachGRace(func(gp *g) {
+ if gp == me || gp == curgp || readgstatus(gp) == _Gdead || isSystemGoroutine(gp, false) && level < 2 {
+ return
+ }
+ print("\n")
+ goroutineheader(gp)
+ // Note: gp.m == getg().m occurs when tracebackothers is called
+ // from a signal handler initiated during a systemstack call.
+ // The original G is still in the running state, and we want to
+ // print its stack.
+ if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning {
+ print("\tgoroutine running on other thread; stack unavailable\n")
+ printcreatedby(gp)
+ } else {
+ traceback(^uintptr(0), ^uintptr(0), 0, gp)
+ }
+ })
+}
+
+// tracebackHexdump hexdumps part of stk around frame.sp and frame.fp
+// for debugging purposes. If the address bad is included in the
+// hexdumped range, it will mark it as well.
+func tracebackHexdump(stk stack, frame *stkframe, bad uintptr) {
+ const expand = 32 * goarch.PtrSize
+ const maxExpand = 256 * goarch.PtrSize
+ // Start around frame.sp.
+ lo, hi := frame.sp, frame.sp
+ // Expand to include frame.fp.
+ if frame.fp != 0 && frame.fp < lo {
+ lo = frame.fp
+ }
+ if frame.fp != 0 && frame.fp > hi {
+ hi = frame.fp
+ }
+ // Expand a bit more.
+ lo, hi = lo-expand, hi+expand
+ // But don't go too far from frame.sp.
+ if lo < frame.sp-maxExpand {
+ lo = frame.sp - maxExpand
+ }
+ if hi > frame.sp+maxExpand {
+ hi = frame.sp + maxExpand
+ }
+ // And don't go outside the stack bounds.
+ if lo < stk.lo {
+ lo = stk.lo
+ }
+ if hi > stk.hi {
+ hi = stk.hi
+ }
+
+ // Print the hex dump.
+ print("stack: frame={sp:", hex(frame.sp), ", fp:", hex(frame.fp), "} stack=[", hex(stk.lo), ",", hex(stk.hi), ")\n")
+ hexdumpWords(lo, hi, func(p uintptr) byte {
+ switch p {
+ case frame.fp:
+ return '>'
+ case frame.sp:
+ return '<'
+ case bad:
+ return '!'
+ }
+ return 0
+ })
+}
+
+// isSystemGoroutine reports whether the goroutine g must be omitted
+// from stack dumps and the deadlock detector. This is any goroutine that
+// starts at a runtime.* entry point, except for runtime.main,
+// runtime.handleAsyncEvent (wasm only) and sometimes runtime.runfinq.
+//
+// If fixed is true, any goroutine that can vary between user and
+// system (that is, the finalizer goroutine) is considered a user
+// goroutine.
+func isSystemGoroutine(gp *g, fixed bool) bool {
+ // Keep this in sync with internal/trace.IsSystemGoroutine.
+ f := findfunc(gp.startpc)
+ if !f.valid() {
+ return false
+ }
+ if f.funcID == abi.FuncID_runtime_main || f.funcID == abi.FuncID_handleAsyncEvent {
+ return false
+ }
+ if f.funcID == abi.FuncID_runfinq {
+ // We include the finalizer goroutine if it's calling
+ // back into user code.
+ if fixed {
+ // This goroutine can vary. In fixed mode,
+ // always consider it a user goroutine.
+ return false
+ }
+ return fingStatus.Load()&fingRunningFinalizer == 0
+ }
+ return hasPrefix(funcname(f), "runtime.")
+}
+
+// SetCgoTraceback records three C functions to use to gather
+// traceback information from C code and to convert that traceback
+// information into symbolic information. These are used when printing
+// stack traces for a program that uses cgo.
+//
+// The traceback and context functions may be called from a signal
+// handler, and must therefore use only async-signal safe functions.
+// The symbolizer function may be called while the program is
+// crashing, and so must be cautious about using memory. None of the
+// functions may call back into Go.
+//
+// The context function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// Context uintptr
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t Context;
+// };
+//
+// If the Context field is 0, the context function is being called to
+// record the current traceback context. It should record in the
+// Context field whatever information is needed about the current
+// point of execution to later produce a stack trace, probably the
+// stack pointer and PC. In this case the context function will be
+// called from C code.
+//
+// If the Context field is not 0, then it is a value returned by a
+// previous call to the context function. This case is called when the
+// context is no longer needed; that is, when the Go code is returning
+// to its C code caller. This permits the context function to release
+// any associated resources.
+//
+// While it would be correct for the context function to record a
+// complete stack trace whenever it is called, and simply copy that
+// out in the traceback function, in a typical program the context
+// function will be called many times without ever recording a
+// traceback for that context. Recording a complete stack trace in a
+// call to the context function is likely to be inefficient.
+//
+// The traceback function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// Context uintptr
+// SigContext uintptr
+// Buf *uintptr
+// Max uintptr
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t Context;
+// uintptr_t SigContext;
+// uintptr_t* Buf;
+// uintptr_t Max;
+// };
+//
+// The Context field will be zero to gather a traceback from the
+// current program execution point. In this case, the traceback
+// function will be called from C code.
+//
+// Otherwise Context will be a value previously returned by a call to
+// the context function. The traceback function should gather a stack
+// trace from that saved point in the program execution. The traceback
+// function may be called from an execution thread other than the one
+// that recorded the context, but only when the context is known to be
+// valid and unchanging. The traceback function may also be called
+// deeper in the call stack on the same thread that recorded the
+// context. The traceback function may be called multiple times with
+// the same Context value; it will usually be appropriate to cache the
+// result, if possible, the first time this is called for a specific
+// context value.
+//
+// If the traceback function is called from a signal handler on a Unix
+// system, SigContext will be the signal context argument passed to
+// the signal handler (a C ucontext_t* cast to uintptr_t). This may be
+// used to start tracing at the point where the signal occurred. If
+// the traceback function is not called from a signal handler,
+// SigContext will be zero.
+//
+// Buf is where the traceback information should be stored. It should
+// be PC values, such that Buf[0] is the PC of the caller, Buf[1] is
+// the PC of that function's caller, and so on. Max is the maximum
+// number of entries to store. The function should store a zero to
+// indicate the top of the stack, or that the caller is on a different
+// stack, presumably a Go stack.
+//
+// Unlike runtime.Callers, the PC values returned should, when passed
+// to the symbolizer function, return the file/line of the call
+// instruction. No additional subtraction is required or appropriate.
+//
+// On all platforms, the traceback function is invoked when a call from
+// Go to C to Go requests a stack trace. On linux/amd64, linux/ppc64le,
+// linux/arm64, and freebsd/amd64, the traceback function is also invoked
+// when a signal is received by a thread that is executing a cgo call.
+// The traceback function should not make assumptions about when it is
+// called, as future versions of Go may make additional calls.
+//
+// The symbolizer function will be called with a single argument, a
+// pointer to a struct:
+//
+// struct {
+// PC uintptr // program counter to fetch information for
+// File *byte // file name (NUL terminated)
+// Lineno uintptr // line number
+// Func *byte // function name (NUL terminated)
+// Entry uintptr // function entry point
+// More uintptr // set non-zero if more info for this PC
+// Data uintptr // unused by runtime, available for function
+// }
+//
+// In C syntax, this struct will be
+//
+// struct {
+// uintptr_t PC;
+// char* File;
+// uintptr_t Lineno;
+// char* Func;
+// uintptr_t Entry;
+// uintptr_t More;
+// uintptr_t Data;
+// };
+//
+// The PC field will be a value returned by a call to the traceback
+// function.
+//
+// The first time the function is called for a particular traceback,
+// all the fields except PC will be 0. The function should fill in the
+// other fields if possible, setting them to 0/nil if the information
+// is not available. The Data field may be used to store any useful
+// information across calls. The More field should be set to non-zero
+// if there is more information for this PC, zero otherwise. If More
+// is set non-zero, the function will be called again with the same
+// PC, and may return different information (this is intended for use
+// with inlined functions). If More is zero, the function will be
+// called with the next PC value in the traceback. When the traceback
+// is complete, the function will be called once more with PC set to
+// zero; this may be used to free any information. Each call will
+// leave the fields of the struct set to the same values they had upon
+// return, except for the PC field when the More field is zero. The
+// function must not keep a copy of the struct pointer between calls.
+//
+// When calling SetCgoTraceback, the version argument is the version
+// number of the structs that the functions expect to receive.
+// Currently this must be zero.
+//
+// The symbolizer function may be nil, in which case the results of
+// the traceback function will be displayed as numbers. If the
+// traceback function is nil, the symbolizer function will never be
+// called. The context function may be nil, in which case the
+// traceback function will only be called with the context field set
+// to zero. If the context function is nil, then calls from Go to C
+// to Go will not show a traceback for the C portion of the call stack.
+//
+// SetCgoTraceback should be called only once, ideally from an init function.
+func SetCgoTraceback(version int, traceback, context, symbolizer unsafe.Pointer) {
+ if version != 0 {
+ panic("unsupported version")
+ }
+
+ if cgoTraceback != nil && cgoTraceback != traceback ||
+ cgoContext != nil && cgoContext != context ||
+ cgoSymbolizer != nil && cgoSymbolizer != symbolizer {
+ panic("call SetCgoTraceback only once")
+ }
+
+ cgoTraceback = traceback
+ cgoContext = context
+ cgoSymbolizer = symbolizer
+
+ // The context function is called when a C function calls a Go
+ // function. As such it is only called by C code in runtime/cgo.
+ if _cgo_set_context_function != nil {
+ cgocall(_cgo_set_context_function, context)
+ }
+}
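+
+// As an illustrative sketch (not part of the runtime), a program that wants
+// C frames in its tracebacks might register callbacks like the following.
+// The name exampleCgoTraceback and the C struct it uses are hypothetical,
+// and the C function only stores the terminating zero; a real traceback
+// function would unwind the C stack (for example with libunwind) into Buf.
+//
+//	package main
+//
+//	/*
+//	#include <stdint.h>
+//
+//	struct cgoTracebackArg {
+//		uintptr_t Context;
+//		uintptr_t SigContext;
+//		uintptr_t* Buf;
+//		uintptr_t Max;
+//	};
+//
+//	void exampleCgoTraceback(void* p) {
+//		struct cgoTracebackArg* arg = (struct cgoTracebackArg*)p;
+//		// No real unwinder here: store the terminating zero that the
+//		// documentation above requires.
+//		if (arg->Max > 0) {
+//			arg->Buf[0] = 0;
+//		}
+//	}
+//	*/
+//	import "C"
+//
+//	import (
+//		"runtime"
+//		"unsafe"
+//	)
+//
+//	func init() {
+//		// A nil symbolizer prints C frames as raw PCs; a nil context
+//		// function means Go-to-C-to-Go calls omit the C portion of
+//		// the stack.
+//		runtime.SetCgoTraceback(0, unsafe.Pointer(C.exampleCgoTraceback), nil, nil)
+//	}
+//
+//	func main() {}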
+
+var cgoTraceback unsafe.Pointer
+var cgoContext unsafe.Pointer
+var cgoSymbolizer unsafe.Pointer
+
+// cgoTracebackArg is the type passed to cgoTraceback.
+type cgoTracebackArg struct {
+ context uintptr
+ sigContext uintptr
+ buf *uintptr
+ max uintptr
+}
+
+// cgoContextArg is the type passed to the context function.
+type cgoContextArg struct {
+ context uintptr
+}
+
+// cgoSymbolizerArg is the type passed to cgoSymbolizer.
+type cgoSymbolizerArg struct {
+ pc uintptr
+ file *byte
+ lineno uintptr
+ funcName *byte
+ entry uintptr
+ more uintptr
+ data uintptr
+}
+
+// printCgoTraceback prints a traceback of callers.
+func printCgoTraceback(callers *cgoCallers) {
+ if cgoSymbolizer == nil {
+ for _, c := range callers {
+ if c == 0 {
+ break
+ }
+ print("non-Go function at pc=", hex(c), "\n")
+ }
+ return
+ }
+
+ commitFrame := func() (pr, stop bool) { return true, false }
+ var arg cgoSymbolizerArg
+ for _, c := range callers {
+ if c == 0 {
+ break
+ }
+ printOneCgoTraceback(c, commitFrame, &arg)
+ }
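+	// Call the symbolizer one final time with PC set to zero so it can
+	// release any state it kept for this traceback (see SetCgoTraceback).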
+ arg.pc = 0
+ callCgoSymbolizer(&arg)
+}
+
+// printOneCgoTraceback prints the traceback of a single cgo caller.
+// This can print more than one line because of inlining.
+// It returns the "stop" result of commitFrame.
+func printOneCgoTraceback(pc uintptr, commitFrame func() (pr, stop bool), arg *cgoSymbolizerArg) bool {
+ arg.pc = pc
+ for {
+ if pr, stop := commitFrame(); stop {
+ return true
+ } else if !pr {
+ continue
+ }
+
+ callCgoSymbolizer(arg)
+ if arg.funcName != nil {
+ // Note that we don't print any argument
+ // information here, not even parentheses.
+ // The symbolizer must add that if appropriate.
+ println(gostringnocopy(arg.funcName))
+ } else {
+ println("non-Go function")
+ }
+ print("\t")
+ if arg.file != nil {
+ print(gostringnocopy(arg.file), ":", arg.lineno, " ")
+ }
+ print("pc=", hex(pc), "\n")
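+		// A non-zero More result means the symbolizer has another frame
+		// (typically an inlined call) to report for this same PC.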
+ if arg.more == 0 {
+ return false
+ }
+ }
+}
+
+// callCgoSymbolizer calls the cgoSymbolizer function.
+func callCgoSymbolizer(arg *cgoSymbolizerArg) {
+ call := cgocall
+ if panicking.Load() > 0 || getg().m.curg != getg() {
+ // We do not want to call into the scheduler when panicking
+ // or when on the system stack.
+ call = asmcgocall
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(arg), unsafe.Sizeof(cgoSymbolizerArg{}))
+ }
+ if asanenabled {
+ asanwrite(unsafe.Pointer(arg), unsafe.Sizeof(cgoSymbolizerArg{}))
+ }
+ call(cgoSymbolizer, noescape(unsafe.Pointer(arg)))
+}
+
+// cgoContextPCs gets the PC values from a cgo traceback.
+func cgoContextPCs(ctxt uintptr, buf []uintptr) {
+ if cgoTraceback == nil {
+ return
+ }
+ call := cgocall
+ if panicking.Load() > 0 || getg().m.curg != getg() {
+ // We do not want to call into the scheduler when panicking
+ // or when on the system stack.
+ call = asmcgocall
+ }
+ arg := cgoTracebackArg{
+ context: ctxt,
+ buf: (*uintptr)(noescape(unsafe.Pointer(&buf[0]))),
+ max: uintptr(len(buf)),
+ }
+ if msanenabled {
+ msanwrite(unsafe.Pointer(&arg), unsafe.Sizeof(arg))
+ }
+ if asanenabled {
+ asanwrite(unsafe.Pointer(&arg), unsafe.Sizeof(arg))
+ }
+ call(cgoTraceback, noescape(unsafe.Pointer(&arg)))
+}
diff --git a/src/runtime/traceback_test.go b/src/runtime/traceback_test.go
new file mode 100644
index 0000000..1617612
--- /dev/null
+++ b/src/runtime/traceback_test.go
@@ -0,0 +1,838 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "bytes"
+ "fmt"
+ "internal/abi"
+ "internal/testenv"
+ "regexp"
+ "runtime"
+ "runtime/debug"
+ "strconv"
+ "strings"
+ "sync"
+ "testing"
+ _ "unsafe"
+)
+
+// Test traceback printing of inlined frames.
+func TestTracebackInlined(t *testing.T) {
+ testenv.SkipIfOptimizationOff(t) // This test requires inlining
+ check := func(t *testing.T, r *ttiResult, funcs ...string) {
+ t.Helper()
+
+ // Check the printed traceback.
+ frames := parseTraceback1(t, r.printed).frames
+ t.Log(r.printed)
+ // Find ttiLeaf
+ for len(frames) > 0 && frames[0].funcName != "runtime_test.ttiLeaf" {
+ frames = frames[1:]
+ }
+ if len(frames) == 0 {
+ t.Errorf("missing runtime_test.ttiLeaf")
+ return
+ }
+ frames = frames[1:]
+ // Check the function sequence.
+ for i, want := range funcs {
+ got := "<end>"
+ if i < len(frames) {
+ got = frames[i].funcName
+ if strings.HasSuffix(want, ")") {
+ got += "(" + frames[i].args + ")"
+ }
+ }
+ if got != want {
+ t.Errorf("got %s, want %s", got, want)
+ return
+ }
+ }
+ }
+
+ t.Run("simple", func(t *testing.T) {
+ // Check a simple case of inlining
+ r := ttiSimple1()
+ check(t, r, "runtime_test.ttiSimple3(...)", "runtime_test.ttiSimple2(...)", "runtime_test.ttiSimple1()")
+ })
+
+ t.Run("sigpanic", func(t *testing.T) {
+ // Check that sigpanic from an inlined function prints correctly
+ r := ttiSigpanic1()
+ check(t, r, "runtime_test.ttiSigpanic1.func1()", "panic", "runtime_test.ttiSigpanic3(...)", "runtime_test.ttiSigpanic2(...)", "runtime_test.ttiSigpanic1()")
+ })
+
+ t.Run("wrapper", func(t *testing.T) {
+ // Check that a method inlined into a wrapper prints correctly
+ r := ttiWrapper1()
+ check(t, r, "runtime_test.ttiWrapper.m1(...)", "runtime_test.ttiWrapper1()")
+ })
+
+ t.Run("excluded", func(t *testing.T) {
+ // Check that when F -> G is inlined and F is excluded from stack
+ // traces, G still appears.
+ r := ttiExcluded1()
+ check(t, r, "runtime_test.ttiExcluded3(...)", "runtime_test.ttiExcluded1()")
+ })
+}
+
+type ttiResult struct {
+ printed string
+}
+
+//go:noinline
+func ttiLeaf() *ttiResult {
+ // Get a printed stack trace.
+ printed := string(debug.Stack())
+ return &ttiResult{printed}
+}
+
+//go:noinline
+func ttiSimple1() *ttiResult {
+ return ttiSimple2()
+}
+func ttiSimple2() *ttiResult {
+ return ttiSimple3()
+}
+func ttiSimple3() *ttiResult {
+ return ttiLeaf()
+}
+
+//go:noinline
+func ttiSigpanic1() (res *ttiResult) {
+ defer func() {
+ res = ttiLeaf()
+ recover()
+ }()
+ ttiSigpanic2()
+ panic("did not panic")
+}
+func ttiSigpanic2() {
+ ttiSigpanic3()
+}
+func ttiSigpanic3() {
+ var p *int
+ *p = 3
+}
+
+//go:noinline
+func ttiWrapper1() *ttiResult {
+ var w ttiWrapper
+ m := (*ttiWrapper).m1
+ return m(&w)
+}
+
+type ttiWrapper struct{}
+
+func (w ttiWrapper) m1() *ttiResult {
+ return ttiLeaf()
+}
+
+//go:noinline
+func ttiExcluded1() *ttiResult {
+ return ttiExcluded2()
+}
+
+// ttiExcluded2 should be excluded from tracebacks. There are
+// various ways this could come up. Linking it to a "runtime." name is
+// rather synthetic, but it's easy and reliable. See issue #42754 for
+// one way this happened in real code.
+//
+//go:linkname ttiExcluded2 runtime.ttiExcluded2
+//go:noinline
+func ttiExcluded2() *ttiResult {
+ return ttiExcluded3()
+}
+func ttiExcluded3() *ttiResult {
+ return ttiLeaf()
+}
+
+var testTracebackArgsBuf [1000]byte
+
+func TestTracebackElision(t *testing.T) {
+ // Test printing exactly the maximum number of frames to make sure we don't
+ // print any "elided" message, eliding exactly 1 so we have to pick back up
+ // in the paused physical frame, and eliding 10 so we have to advance the
+ // physical frame forward.
+ for _, elided := range []int{0, 1, 10} {
+ t.Run(fmt.Sprintf("elided=%d", elided), func(t *testing.T) {
+ n := elided + runtime.TracebackInnerFrames + runtime.TracebackOuterFrames
+
+ // Start a new goroutine so we have control over the whole stack.
+ stackChan := make(chan string)
+ go tteStack(n, stackChan)
+ stack := <-stackChan
+ tb := parseTraceback1(t, stack)
+
+ // Check the traceback.
+ i := 0
+ for i < n {
+ if len(tb.frames) == 0 {
+ t.Errorf("traceback ended early")
+ break
+ }
+ fr := tb.frames[0]
+ if i == runtime.TracebackInnerFrames && elided > 0 {
+ // This should be an "elided" frame.
+ if fr.elided != elided {
+ t.Errorf("want %d frames elided", elided)
+ break
+ }
+ i += fr.elided
+ } else {
+ want := fmt.Sprintf("runtime_test.tte%d", (i+1)%5)
+ if i == 0 {
+ want = "runtime/debug.Stack"
+ } else if i == n-1 {
+ want = "runtime_test.tteStack"
+ }
+ if fr.funcName != want {
+ t.Errorf("want %s, got %s", want, fr.funcName)
+ break
+ }
+ i++
+ }
+ tb.frames = tb.frames[1:]
+ }
+ if !t.Failed() && len(tb.frames) > 0 {
+ t.Errorf("got %d more frames than expected", len(tb.frames))
+ }
+ if t.Failed() {
+ t.Logf("traceback diverged at frame %d", i)
+ off := len(stack)
+ if len(tb.frames) > 0 {
+ off = tb.frames[0].off
+ }
+ t.Logf("traceback before error:\n%s", stack[:off])
+ t.Logf("traceback after error:\n%s", stack[off:])
+ }
+ })
+ }
+}
+
+// tteStack creates a stack of n logical frames and sends the traceback to
+// stack. It cycles through 5 logical frames per physical frame to make it
+// unlikely that any part of the traceback will end on a physical boundary.
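+// For example, tteStack(8) calls tte2(7) -> tte1(6) -> tte0(5) -> tte4(4) ->
+// tte3(3) -> tte2(2) -> debug.Stack, for 8 logical frames counting tteStack
+// and debug.Stack themselves.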
+func tteStack(n int, stack chan<- string) {
+ n-- // Account for this frame
+ // This is basically a Duff's device for starting the inline stack in the
+ // right place so we wind up at tteN when n%5=N.
+ switch n % 5 {
+ case 0:
+ stack <- tte0(n)
+ case 1:
+ stack <- tte1(n)
+ case 2:
+ stack <- tte2(n)
+ case 3:
+ stack <- tte3(n)
+ case 4:
+ stack <- tte4(n)
+ default:
+ panic("unreachable")
+ }
+}
+func tte0(n int) string {
+ return tte4(n - 1)
+}
+func tte1(n int) string {
+ return tte0(n - 1)
+}
+func tte2(n int) string {
+ // tte2 opens n%5 == 2 frames. It's also the base case of the recursion,
+ // since we can open no fewer than two frames to call debug.Stack().
+ if n < 2 {
+ panic("bad n")
+ }
+ if n == 2 {
+ return string(debug.Stack())
+ }
+ return tte1(n - 1)
+}
+func tte3(n int) string {
+ return tte2(n - 1)
+}
+func tte4(n int) string {
+ return tte3(n - 1)
+}
+
+func TestTracebackArgs(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ optimized := !testenv.OptimizationOff()
+ abiSel := func(x, y string) string {
+ // select expected output based on ABI
+ // In noopt build we always spill arguments so the output is the same as stack ABI.
+ if optimized && abi.IntArgRegs > 0 {
+ return x
+ }
+ return y
+ }
+
+ tests := []struct {
+ fn func() int
+ expect string
+ }{
+ // simple ints
+ {
+ func() int { return testTracebackArgs1(1, 2, 3, 4, 5) },
+ "testTracebackArgs1(0x1, 0x2, 0x3, 0x4, 0x5)",
+ },
+ // some aggregates
+ {
+ func() int {
+ return testTracebackArgs2(false, struct {
+ a, b, c int
+ x [2]int
+ }{1, 2, 3, [2]int{4, 5}}, [0]int{}, [3]byte{6, 7, 8})
+ },
+ "testTracebackArgs2(0x0, {0x1, 0x2, 0x3, {0x4, 0x5}}, {}, {0x6, 0x7, 0x8})",
+ },
+ {
+ func() int { return testTracebackArgs3([3]byte{1, 2, 3}, 4, 5, 6, [3]byte{7, 8, 9}) },
+ "testTracebackArgs3({0x1, 0x2, 0x3}, 0x4, 0x5, 0x6, {0x7, 0x8, 0x9})",
+ },
+ // too deeply nested type
+ {
+ func() int { return testTracebackArgs4(false, [1][1][1][1][1][1][1][1][1][1]int{}) },
+ "testTracebackArgs4(0x0, {{{{{...}}}}})",
+ },
+		// a lot of zero-sized types
+ {
+ func() int {
+ z := [0]int{}
+ return testTracebackArgs5(false, struct {
+ x int
+ y [0]int
+ z [2][0]int
+ }{1, z, [2][0]int{}}, z, z, z, z, z, z, z, z, z, z, z, z)
+ },
+ "testTracebackArgs5(0x0, {0x1, {}, {{}, {}}}, {}, {}, {}, {}, {}, ...)",
+ },
+
+ // edge cases for ...
+ // no ... for 10 args
+ {
+ func() int { return testTracebackArgs6a(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) },
+ "testTracebackArgs6a(0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa)",
+ },
+ // has ... for 11 args
+ {
+ func() int { return testTracebackArgs6b(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) },
+ "testTracebackArgs6b(0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, ...)",
+ },
+ // no ... for aggregates with 10 words
+ {
+ func() int { return testTracebackArgs7a([10]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}) },
+ "testTracebackArgs7a({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa})",
+ },
+ // has ... for aggregates with 11 words
+ {
+ func() int { return testTracebackArgs7b([11]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}) },
+ "testTracebackArgs7b({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, ...})",
+ },
+ // no ... for aggregates, but with more args
+ {
+ func() int { return testTracebackArgs7c([10]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 11) },
+ "testTracebackArgs7c({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa}, ...)",
+ },
+ // has ... for aggregates and also for more args
+ {
+ func() int { return testTracebackArgs7d([11]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}, 12) },
+ "testTracebackArgs7d({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, ...}, ...)",
+ },
+ // nested aggregates, no ...
+ {
+ func() int { return testTracebackArgs8a(testArgsType8a{1, 2, 3, 4, 5, 6, 7, 8, [2]int{9, 10}}) },
+ "testTracebackArgs8a({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, {0x9, 0xa}})",
+ },
+ // nested aggregates, ... in inner but not outer
+ {
+ func() int { return testTracebackArgs8b(testArgsType8b{1, 2, 3, 4, 5, 6, 7, 8, [3]int{9, 10, 11}}) },
+ "testTracebackArgs8b({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, {0x9, 0xa, ...}})",
+ },
+ // nested aggregates, ... in outer but not inner
+ {
+ func() int { return testTracebackArgs8c(testArgsType8c{1, 2, 3, 4, 5, 6, 7, 8, [2]int{9, 10}, 11}) },
+ "testTracebackArgs8c({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, {0x9, 0xa}, ...})",
+ },
+ // nested aggregates, ... in both inner and outer
+ {
+ func() int { return testTracebackArgs8d(testArgsType8d{1, 2, 3, 4, 5, 6, 7, 8, [3]int{9, 10, 11}, 12}) },
+ "testTracebackArgs8d({0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, {0x9, 0xa, ...}, ...})",
+ },
+
+ // Register argument liveness.
+ // 1, 3 are used and live, 2, 4 are dead (in register ABI).
+ // Address-taken (7) and stack ({5, 6}) args are always live.
+ {
+ func() int {
+ poisonStack() // poison arg area to make output deterministic
+ return testTracebackArgs9(1, 2, 3, 4, [2]int{5, 6}, 7)
+ },
+ abiSel(
+ "testTracebackArgs9(0x1, 0xffffffff?, 0x3, 0xff?, {0x5, 0x6}, 0x7)",
+ "testTracebackArgs9(0x1, 0x2, 0x3, 0x4, {0x5, 0x6}, 0x7)"),
+ },
+ // No live.
+		// (Note: this assumes at least 5 int registers if the register ABI is used.)
+ {
+ func() int {
+ poisonStack() // poison arg area to make output deterministic
+ return testTracebackArgs10(1, 2, 3, 4, 5)
+ },
+ abiSel(
+ "testTracebackArgs10(0xffffffff?, 0xffffffff?, 0xffffffff?, 0xffffffff?, 0xffffffff?)",
+ "testTracebackArgs10(0x1, 0x2, 0x3, 0x4, 0x5)"),
+ },
+ // Conditional spills.
+ // Spill in conditional, not executed.
+ {
+ func() int {
+ poisonStack() // poison arg area to make output deterministic
+ return testTracebackArgs11a(1, 2, 3)
+ },
+ abiSel(
+ "testTracebackArgs11a(0xffffffff?, 0xffffffff?, 0xffffffff?)",
+ "testTracebackArgs11a(0x1, 0x2, 0x3)"),
+ },
+ // 2 spills in conditional, not executed; 3 spills in conditional, executed, but not statically known.
+ // So print 0x3?.
+ {
+ func() int {
+ poisonStack() // poison arg area to make output deterministic
+ return testTracebackArgs11b(1, 2, 3, 4)
+ },
+ abiSel(
+ "testTracebackArgs11b(0xffffffff?, 0xffffffff?, 0x3?, 0x4)",
+ "testTracebackArgs11b(0x1, 0x2, 0x3, 0x4)"),
+ },
+ }
+ for _, test := range tests {
+ n := test.fn()
+ got := testTracebackArgsBuf[:n]
+ if !bytes.Contains(got, []byte(test.expect)) {
+ t.Errorf("traceback does not contain expected string: want %q, got\n%s", test.expect, got)
+ }
+ }
+}
+
+//go:noinline
+func testTracebackArgs1(a, b, c, d, e int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a < 0 {
+ // use in-reg args to keep them alive
+ return a + b + c + d + e
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs2(a bool, b struct {
+ a, b, c int
+ x [2]int
+}, _ [0]int, d [3]byte) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a {
+ // use in-reg args to keep them alive
+ return b.a + b.b + b.c + b.x[0] + b.x[1] + int(d[0]) + int(d[1]) + int(d[2])
+ }
+ return n
+
+}
+
+//go:noinline
+//go:registerparams
+func testTracebackArgs3(x [3]byte, a, b, c int, y [3]byte) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a < 0 {
+ // use in-reg args to keep them alive
+ return int(x[0]) + int(x[1]) + int(x[2]) + a + b + c + int(y[0]) + int(y[1]) + int(y[2])
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs4(a bool, x [1][1][1][1][1][1][1][1][1][1]int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a {
+ panic(x) // use args to keep them alive
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs5(a bool, x struct {
+ x int
+ y [0]int
+ z [2][0]int
+}, _, _, _, _, _, _, _, _, _, _, _, _ [0]int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a {
+ panic(x) // use args to keep them alive
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs6a(a, b, c, d, e, f, g, h, i, j int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a < 0 {
+ // use in-reg args to keep them alive
+ return a + b + c + d + e + f + g + h + i + j
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs6b(a, b, c, d, e, f, g, h, i, j, k int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a < 0 {
+ // use in-reg args to keep them alive
+ return a + b + c + d + e + f + g + h + i + j + k
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs7a(a [10]int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a[0] < 0 {
+ // use in-reg args to keep them alive
+ return a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9]
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs7b(a [11]int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a[0] < 0 {
+ // use in-reg args to keep them alive
+ return a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9] + a[10]
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs7c(a [10]int, b int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a[0] < 0 {
+ // use in-reg args to keep them alive
+ return a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9] + b
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs7d(a [11]int, b int) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a[0] < 0 {
+ // use in-reg args to keep them alive
+ return a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8] + a[9] + a[10] + b
+ }
+ return n
+}
+
+type testArgsType8a struct {
+ a, b, c, d, e, f, g, h int
+ i [2]int
+}
+type testArgsType8b struct {
+ a, b, c, d, e, f, g, h int
+ i [3]int
+}
+type testArgsType8c struct {
+ a, b, c, d, e, f, g, h int
+ i [2]int
+ j int
+}
+type testArgsType8d struct {
+ a, b, c, d, e, f, g, h int
+ i [3]int
+ j int
+}
+
+//go:noinline
+func testTracebackArgs8a(a testArgsType8a) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a.a < 0 {
+ // use in-reg args to keep them alive
+ return a.b + a.c + a.d + a.e + a.f + a.g + a.h + a.i[0] + a.i[1]
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs8b(a testArgsType8b) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a.a < 0 {
+ // use in-reg args to keep them alive
+ return a.b + a.c + a.d + a.e + a.f + a.g + a.h + a.i[0] + a.i[1] + a.i[2]
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs8c(a testArgsType8c) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a.a < 0 {
+ // use in-reg args to keep them alive
+ return a.b + a.c + a.d + a.e + a.f + a.g + a.h + a.i[0] + a.i[1] + a.j
+ }
+ return n
+}
+
+//go:noinline
+func testTracebackArgs8d(a testArgsType8d) int {
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a.a < 0 {
+ // use in-reg args to keep them alive
+ return a.b + a.c + a.d + a.e + a.f + a.g + a.h + a.i[0] + a.i[1] + a.i[2] + a.j
+ }
+ return n
+}
+
+// nosplit to avoid preemption or morestack spilling registers.
+//
+//go:nosplit
+//go:noinline
+func testTracebackArgs9(a int64, b int32, c int16, d int8, x [2]int, y int) int {
+ if a < 0 {
+ println(&y) // take address, make y live, even if no longer used at traceback
+ }
+ n := runtime.Stack(testTracebackArgsBuf[:], false)
+ if a < 0 {
+ // use half of in-reg args to keep them alive, the other half are dead
+ return int(a) + int(c)
+ }
+ return n
+}
+
+// nosplit to avoid preemption or morestack spilling registers.
+//
+//go:nosplit
+//go:noinline
+func testTracebackArgs10(a, b, c, d, e int32) int {
+ // no use of any args
+ return runtime.Stack(testTracebackArgsBuf[:], false)
+}
+
+// norace to avoid race instrumentation changing spill locations.
+// nosplit to avoid preemption or morestack spilling registers.
+//
+//go:norace
+//go:nosplit
+//go:noinline
+func testTracebackArgs11a(a, b, c int32) int {
+ if a < 0 {
+ println(a, b, c) // spill in a conditional, may not execute
+ }
+ if b < 0 {
+ return int(a + b + c)
+ }
+ return runtime.Stack(testTracebackArgsBuf[:], false)
+}
+
+// norace to avoid race instrumentation changing spill locations.
+// nosplit to avoid preemption or morestack spilling registers.
+//
+//go:norace
+//go:nosplit
+//go:noinline
+func testTracebackArgs11b(a, b, c, d int32) int {
+ var x int32
+ if a < 0 {
+ print() // spill b in a conditional
+ x = b
+ } else {
+ print() // spill c in a conditional
+ x = c
+ }
+ if d < 0 { // d is always needed
+ return int(x + d)
+ }
+ return runtime.Stack(testTracebackArgsBuf[:], false)
+}
+
+// Poison the arg area with deterministic values.
+//
+//go:noinline
+func poisonStack() [20]int {
+ return [20]int{-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1}
+}
+
+func TestTracebackParentChildGoroutines(t *testing.T) {
+ parent := fmt.Sprintf("goroutine %d", runtime.Goid())
+ var wg sync.WaitGroup
+ wg.Add(1)
+ go func() {
+ defer wg.Done()
+ buf := make([]byte, 1<<10)
+ // We collect the stack only for this goroutine (by passing
+ // false to runtime.Stack). We expect to see the current
+ // goroutine ID, and the parent goroutine ID in a message like
+ // "created by ... in goroutine N".
+ stack := string(buf[:runtime.Stack(buf, false)])
+ child := fmt.Sprintf("goroutine %d", runtime.Goid())
+ if !strings.Contains(stack, parent) || !strings.Contains(stack, child) {
+ t.Errorf("did not see parent (%s) and child (%s) IDs in stack, got %s", parent, child, stack)
+ }
+ }()
+ wg.Wait()
+}
+
+type traceback struct {
+ frames []*tbFrame
+ createdBy *tbFrame // no args
+}
+
+type tbFrame struct {
+ funcName string
+ args string
+ inlined bool
+
+ // elided is set to the number of frames elided, and the other fields are
+ // set to the zero value.
+ elided int
+
+ off int // byte offset in the traceback text of this frame
+}
+
+// parseTraceback parses a printed traceback to make it easier for tests to
+// check the result.
+func parseTraceback(t *testing.T, tb string) []*traceback {
+ //lines := strings.Split(tb, "\n")
+ //nLines := len(lines)
+ off := 0
+ lineNo := 0
+ fatal := func(f string, args ...any) {
+ msg := fmt.Sprintf(f, args...)
+ t.Fatalf("%s (line %d):\n%s", msg, lineNo, tb)
+ }
+ parseFrame := func(funcName, args string) *tbFrame {
+ // Consume file/line/etc
+ if !strings.HasPrefix(tb, "\t") {
+ fatal("missing source line")
+ }
+ _, tb, _ = strings.Cut(tb, "\n")
+ lineNo++
+ inlined := args == "..."
+ return &tbFrame{funcName: funcName, args: args, inlined: inlined, off: off}
+ }
+ var elidedRe = regexp.MustCompile(`^\.\.\.([0-9]+) frames elided\.\.\.$`)
+ var tbs []*traceback
+ var cur *traceback
+ tbLen := len(tb)
+ for len(tb) > 0 {
+ var line string
+ off = tbLen - len(tb)
+ line, tb, _ = strings.Cut(tb, "\n")
+ lineNo++
+ switch {
+ case strings.HasPrefix(line, "goroutine "):
+ cur = &traceback{}
+ tbs = append(tbs, cur)
+ case line == "":
+ // Separator between goroutines
+ cur = nil
+ case line[0] == '\t':
+ fatal("unexpected indent")
+ case strings.HasPrefix(line, "created by "):
+ funcName := line[len("created by "):]
+ cur.createdBy = parseFrame(funcName, "")
+ case strings.HasSuffix(line, ")"):
+ line = line[:len(line)-1] // Trim trailing ")"
+ funcName, args, found := strings.Cut(line, "(")
+ if !found {
+ fatal("missing (")
+ }
+ frame := parseFrame(funcName, args)
+ cur.frames = append(cur.frames, frame)
+ case elidedRe.MatchString(line):
+ // "...N frames elided..."
+ nStr := elidedRe.FindStringSubmatch(line)
+ n, _ := strconv.Atoi(nStr[1])
+ frame := &tbFrame{elided: n}
+ cur.frames = append(cur.frames, frame)
+ }
+ }
+ return tbs
+}
+
+// parseTraceback1 is like parseTraceback, but expects tb to contain exactly one
+// goroutine.
+func parseTraceback1(t *testing.T, tb string) *traceback {
+ tbs := parseTraceback(t, tb)
+ if len(tbs) != 1 {
+ t.Fatalf("want 1 goroutine, got %d:\n%s", len(tbs), tb)
+ }
+ return tbs[0]
+}
+
+//go:noinline
+func testTracebackGenericFn[T any](buf []byte) int {
+ return runtime.Stack(buf[:], false)
+}
+
+func testTracebackGenericFnInlined[T any](buf []byte) int {
+ return runtime.Stack(buf[:], false)
+}
+
+type testTracebackGenericTyp[P any] struct{ x P }
+
+//go:noinline
+func (t testTracebackGenericTyp[P]) M(buf []byte) int {
+ return runtime.Stack(buf[:], false)
+}
+
+func (t testTracebackGenericTyp[P]) Inlined(buf []byte) int {
+ return runtime.Stack(buf[:], false)
+}
+
+func TestTracebackGeneric(t *testing.T) {
+ if *flagQuick {
+ t.Skip("-quick")
+ }
+ var x testTracebackGenericTyp[int]
+ tests := []struct {
+ fn func([]byte) int
+ expect string
+ }{
+ // function, not inlined
+ {
+ testTracebackGenericFn[int],
+ "testTracebackGenericFn[...](",
+ },
+ // function, inlined
+ {
+ func(buf []byte) int { return testTracebackGenericFnInlined[int](buf) },
+ "testTracebackGenericFnInlined[...](",
+ },
+ // method, not inlined
+ {
+ x.M,
+ "testTracebackGenericTyp[...].M(",
+ },
+ // method, inlined
+ {
+ func(buf []byte) int { return x.Inlined(buf) },
+ "testTracebackGenericTyp[...].Inlined(",
+ },
+ }
+ var buf [1000]byte
+ for _, test := range tests {
+ n := test.fn(buf[:])
+ got := buf[:n]
+ if !bytes.Contains(got, []byte(test.expect)) {
+ t.Errorf("traceback does not contain expected string: want %q, got\n%s", test.expect, got)
+ }
+ if bytes.Contains(got, []byte("shape")) { // should not contain shape name
+ t.Errorf("traceback contains shape name: got\n%s", got)
+ }
+ }
+}
diff --git a/src/runtime/tracebackx_test.go b/src/runtime/tracebackx_test.go
new file mode 100644
index 0000000..b318fa3
--- /dev/null
+++ b/src/runtime/tracebackx_test.go
@@ -0,0 +1,18 @@
+// Copyright 2023 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+func XTestSPWrite(t TestingT) {
+ // Test that we can traceback from the stack check prologue of a function
+ // that writes to SP. See #62326.
+
+ // Start a goroutine to minimize the initial stack and ensure we grow the stack.
+ done := make(chan bool)
+ go func() {
+ testSPWrite() // Defined in assembly
+ done <- true
+ }()
+ <-done
+}
diff --git a/src/runtime/type.go b/src/runtime/type.go
new file mode 100644
index 0000000..1150a53
--- /dev/null
+++ b/src/runtime/type.go
@@ -0,0 +1,469 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Runtime type representation.
+
+package runtime
+
+import (
+ "internal/abi"
+ "unsafe"
+)
+
+type nameOff = abi.NameOff
+type typeOff = abi.TypeOff
+type textOff = abi.TextOff
+
+type _type = abi.Type
+
+// rtype is a wrapper that allows us to define additional methods.
+type rtype struct {
+ *abi.Type // embedding is okay here (unlike reflect) because none of this is public
+}
+
+func (t rtype) string() string {
+ s := t.nameOff(t.Str).Name()
+ if t.TFlag&abi.TFlagExtraStar != 0 {
+ return s[1:]
+ }
+ return s
+}
+
+func (t rtype) uncommon() *uncommontype {
+ return t.Uncommon()
+}
+
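+// name returns the type's name within its package, or "" for unnamed types.
+// It takes everything after the last '.' that appears outside square
+// brackets, so dots inside type arguments such as "Map[foo.K]" are not
+// mistaken for the package qualifier.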
+func (t rtype) name() string {
+ if t.TFlag&abi.TFlagNamed == 0 {
+ return ""
+ }
+ s := t.string()
+ i := len(s) - 1
+ sqBrackets := 0
+ for i >= 0 && (s[i] != '.' || sqBrackets != 0) {
+ switch s[i] {
+ case ']':
+ sqBrackets++
+ case '[':
+ sqBrackets--
+ }
+ i--
+ }
+ return s[i+1:]
+}
+
+// pkgpath returns the path of the package where t was defined, if
+// available. This is not the same as the reflect package's PkgPath
+// method, in that it returns the package path for struct and interface
+// types, not just named types.
+func (t rtype) pkgpath() string {
+ if u := t.uncommon(); u != nil {
+ return t.nameOff(u.PkgPath).Name()
+ }
+ switch t.Kind_ & kindMask {
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t.Type))
+ return st.PkgPath.Name()
+ case kindInterface:
+ it := (*interfacetype)(unsafe.Pointer(t.Type))
+ return it.PkgPath.Name()
+ }
+ return ""
+}
+
+// reflectOffs holds type offsets defined at run time by the reflect package.
+//
+// When a type is defined at run time, its *rtype data lives on the heap.
+// There are a wide range of possible addresses the heap may use, that
+// may not be representable as a 32-bit offset. Moreover the GC may
+// one day start moving heap memory, in which case there is no stable
+// offset that can be defined.
+//
+// To provide stable offsets, we pin *rtype objects in a global map
+// and treat the offset as an identifier. We use negative offsets that
+// do not overlap with any compile-time module offsets.
+//
+// Entries are created by reflect.addReflectOff.
+var reflectOffs struct {
+ lock mutex
+ next int32
+ m map[int32]unsafe.Pointer
+ minv map[unsafe.Pointer]int32
+}
+
+func reflectOffsLock() {
+ lock(&reflectOffs.lock)
+ if raceenabled {
+ raceacquire(unsafe.Pointer(&reflectOffs.lock))
+ }
+}
+
+func reflectOffsUnlock() {
+ if raceenabled {
+ racerelease(unsafe.Pointer(&reflectOffs.lock))
+ }
+ unlock(&reflectOffs.lock)
+}
+
+func resolveNameOff(ptrInModule unsafe.Pointer, off nameOff) name {
+ if off == 0 {
+ return name{}
+ }
+ base := uintptr(ptrInModule)
+ for md := &firstmoduledata; md != nil; md = md.next {
+ if base >= md.types && base < md.etypes {
+ res := md.types + uintptr(off)
+ if res > md.etypes {
+ println("runtime: nameOff", hex(off), "out of range", hex(md.types), "-", hex(md.etypes))
+ throw("runtime: name offset out of range")
+ }
+ return name{Bytes: (*byte)(unsafe.Pointer(res))}
+ }
+ }
+
+	// No module found. See if it is a run time name.
+ reflectOffsLock()
+ res, found := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if !found {
+ println("runtime: nameOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: name offset base pointer out of range")
+ }
+ return name{Bytes: (*byte)(res)}
+}
+
+func (t rtype) nameOff(off nameOff) name {
+ return resolveNameOff(unsafe.Pointer(t.Type), off)
+}
+
+func resolveTypeOff(ptrInModule unsafe.Pointer, off typeOff) *_type {
+ if off == 0 || off == -1 {
+ // -1 is the sentinel value for unreachable code.
+ // See cmd/link/internal/ld/data.go:relocsym.
+ return nil
+ }
+ base := uintptr(ptrInModule)
+ var md *moduledata
+ for next := &firstmoduledata; next != nil; next = next.next {
+ if base >= next.types && base < next.etypes {
+ md = next
+ break
+ }
+ }
+ if md == nil {
+ reflectOffsLock()
+ res := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if res == nil {
+ println("runtime: typeOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: type offset base pointer out of range")
+ }
+ return (*_type)(res)
+ }
+ if t := md.typemap[off]; t != nil {
+ return t
+ }
+ res := md.types + uintptr(off)
+ if res > md.etypes {
+ println("runtime: typeOff", hex(off), "out of range", hex(md.types), "-", hex(md.etypes))
+ throw("runtime: type offset out of range")
+ }
+ return (*_type)(unsafe.Pointer(res))
+}
+
+func (t rtype) typeOff(off typeOff) *_type {
+ return resolveTypeOff(unsafe.Pointer(t.Type), off)
+}
+
+func (t rtype) textOff(off textOff) unsafe.Pointer {
+ if off == -1 {
+ // -1 is the sentinel value for unreachable code.
+ // See cmd/link/internal/ld/data.go:relocsym.
+ return unsafe.Pointer(abi.FuncPCABIInternal(unreachableMethod))
+ }
+ base := uintptr(unsafe.Pointer(t.Type))
+ var md *moduledata
+ for next := &firstmoduledata; next != nil; next = next.next {
+ if base >= next.types && base < next.etypes {
+ md = next
+ break
+ }
+ }
+ if md == nil {
+ reflectOffsLock()
+ res := reflectOffs.m[int32(off)]
+ reflectOffsUnlock()
+ if res == nil {
+ println("runtime: textOff", hex(off), "base", hex(base), "not in ranges:")
+ for next := &firstmoduledata; next != nil; next = next.next {
+ println("\ttypes", hex(next.types), "etypes", hex(next.etypes))
+ }
+ throw("runtime: text offset base pointer out of range")
+ }
+ return res
+ }
+ res := md.textAddr(uint32(off))
+ return unsafe.Pointer(res)
+}
+
+type uncommontype = abi.UncommonType
+
+type interfacetype = abi.InterfaceType
+
+type maptype = abi.MapType
+
+type arraytype = abi.ArrayType
+
+type chantype = abi.ChanType
+
+type slicetype = abi.SliceType
+
+type functype = abi.FuncType
+
+type ptrtype = abi.PtrType
+
+type name = abi.Name
+
+type structtype = abi.StructType
+
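+// pkgPath returns the package path, if any, recorded in the name data n.
+// The flag bits and varint layout it decodes are those of internal/abi.Name.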
+func pkgPath(n name) string {
+ if n.Bytes == nil || *n.Data(0)&(1<<2) == 0 {
+ return ""
+ }
+ i, l := n.ReadVarint(1)
+ off := 1 + i + l
+ if *n.Data(0)&(1<<1) != 0 {
+ i2, l2 := n.ReadVarint(off)
+ off += i2 + l2
+ }
+ var nameOff nameOff
+ copy((*[4]byte)(unsafe.Pointer(&nameOff))[:], (*[4]byte)(unsafe.Pointer(n.Data(off)))[:])
+ pkgPathName := resolveNameOff(unsafe.Pointer(n.Bytes), nameOff)
+ return pkgPathName.Name()
+}
+
+// typelinksinit scans the types from extra modules and builds the
+// moduledata typemap used to de-duplicate type pointers.
+func typelinksinit() {
+ if firstmoduledata.next == nil {
+ return
+ }
+ typehash := make(map[uint32][]*_type, len(firstmoduledata.typelinks))
+
+ modules := activeModules()
+ prev := modules[0]
+ for _, md := range modules[1:] {
+ // Collect types from the previous module into typehash.
+ collect:
+ for _, tl := range prev.typelinks {
+ var t *_type
+ if prev.typemap == nil {
+ t = (*_type)(unsafe.Pointer(prev.types + uintptr(tl)))
+ } else {
+ t = prev.typemap[typeOff(tl)]
+ }
+ // Add to typehash if not seen before.
+ tlist := typehash[t.Hash]
+ for _, tcur := range tlist {
+ if tcur == t {
+ continue collect
+ }
+ }
+ typehash[t.Hash] = append(tlist, t)
+ }
+
+ if md.typemap == nil {
+ // If any of this module's typelinks match a type from a
+ // prior module, prefer that prior type by adding the offset
+ // to this module's typemap.
+ tm := make(map[typeOff]*_type, len(md.typelinks))
+ pinnedTypemaps = append(pinnedTypemaps, tm)
+ md.typemap = tm
+ for _, tl := range md.typelinks {
+ t := (*_type)(unsafe.Pointer(md.types + uintptr(tl)))
+ for _, candidate := range typehash[t.Hash] {
+ seen := map[_typePair]struct{}{}
+ if typesEqual(t, candidate, seen) {
+ t = candidate
+ break
+ }
+ }
+ md.typemap[typeOff(tl)] = t
+ }
+ }
+
+ prev = md
+ }
+}
+
+type _typePair struct {
+ t1 *_type
+ t2 *_type
+}
+
+func toRType(t *abi.Type) rtype {
+ return rtype{t}
+}
+
+// typesEqual reports whether two types are equal.
+//
+// Everywhere in the runtime and reflect packages, it is assumed that
+// there is exactly one *_type per Go type, so that pointer equality
+// can be used to test if types are equal. There is one place that
+// breaks this assumption: buildmode=shared. In this case a type can
+// appear as two different pieces of memory. This is hidden from the
+// runtime and reflect package by the per-module typemap built in
+// typelinksinit. It uses typesEqual to map types from later modules
+// back into earlier ones.
+//
+// Only typelinksinit needs this function.
+func typesEqual(t, v *_type, seen map[_typePair]struct{}) bool {
+ tp := _typePair{t, v}
+ if _, ok := seen[tp]; ok {
+ return true
+ }
+
+	// Mark these types as seen, and thus equivalent, which prevents an
+	// infinite loop if the two types are identical but recursively defined
+	// and loaded from different modules.
+ seen[tp] = struct{}{}
+
+ if t == v {
+ return true
+ }
+ kind := t.Kind_ & kindMask
+ if kind != v.Kind_&kindMask {
+ return false
+ }
+ rt, rv := toRType(t), toRType(v)
+ if rt.string() != rv.string() {
+ return false
+ }
+ ut := t.Uncommon()
+ uv := v.Uncommon()
+ if ut != nil || uv != nil {
+ if ut == nil || uv == nil {
+ return false
+ }
+ pkgpatht := rt.nameOff(ut.PkgPath).Name()
+ pkgpathv := rv.nameOff(uv.PkgPath).Name()
+ if pkgpatht != pkgpathv {
+ return false
+ }
+ }
+ if kindBool <= kind && kind <= kindComplex128 {
+ return true
+ }
+ switch kind {
+ case kindString, kindUnsafePointer:
+ return true
+ case kindArray:
+ at := (*arraytype)(unsafe.Pointer(t))
+ av := (*arraytype)(unsafe.Pointer(v))
+ return typesEqual(at.Elem, av.Elem, seen) && at.Len == av.Len
+ case kindChan:
+ ct := (*chantype)(unsafe.Pointer(t))
+ cv := (*chantype)(unsafe.Pointer(v))
+ return ct.Dir == cv.Dir && typesEqual(ct.Elem, cv.Elem, seen)
+ case kindFunc:
+ ft := (*functype)(unsafe.Pointer(t))
+ fv := (*functype)(unsafe.Pointer(v))
+ if ft.OutCount != fv.OutCount || ft.InCount != fv.InCount {
+ return false
+ }
+ tin, vin := ft.InSlice(), fv.InSlice()
+ for i := 0; i < len(tin); i++ {
+ if !typesEqual(tin[i], vin[i], seen) {
+ return false
+ }
+ }
+ tout, vout := ft.OutSlice(), fv.OutSlice()
+ for i := 0; i < len(tout); i++ {
+ if !typesEqual(tout[i], vout[i], seen) {
+ return false
+ }
+ }
+ return true
+ case kindInterface:
+ it := (*interfacetype)(unsafe.Pointer(t))
+ iv := (*interfacetype)(unsafe.Pointer(v))
+ if it.PkgPath.Name() != iv.PkgPath.Name() {
+ return false
+ }
+ if len(it.Methods) != len(iv.Methods) {
+ return false
+ }
+ for i := range it.Methods {
+ tm := &it.Methods[i]
+ vm := &iv.Methods[i]
+ // Note the mhdr array can be relocated from
+ // another module. See #17724.
+ tname := resolveNameOff(unsafe.Pointer(tm), tm.Name)
+ vname := resolveNameOff(unsafe.Pointer(vm), vm.Name)
+ if tname.Name() != vname.Name() {
+ return false
+ }
+ if pkgPath(tname) != pkgPath(vname) {
+ return false
+ }
+ tityp := resolveTypeOff(unsafe.Pointer(tm), tm.Typ)
+ vityp := resolveTypeOff(unsafe.Pointer(vm), vm.Typ)
+ if !typesEqual(tityp, vityp, seen) {
+ return false
+ }
+ }
+ return true
+ case kindMap:
+ mt := (*maptype)(unsafe.Pointer(t))
+ mv := (*maptype)(unsafe.Pointer(v))
+ return typesEqual(mt.Key, mv.Key, seen) && typesEqual(mt.Elem, mv.Elem, seen)
+ case kindPtr:
+ pt := (*ptrtype)(unsafe.Pointer(t))
+ pv := (*ptrtype)(unsafe.Pointer(v))
+ return typesEqual(pt.Elem, pv.Elem, seen)
+ case kindSlice:
+ st := (*slicetype)(unsafe.Pointer(t))
+ sv := (*slicetype)(unsafe.Pointer(v))
+ return typesEqual(st.Elem, sv.Elem, seen)
+ case kindStruct:
+ st := (*structtype)(unsafe.Pointer(t))
+ sv := (*structtype)(unsafe.Pointer(v))
+ if len(st.Fields) != len(sv.Fields) {
+ return false
+ }
+ if st.PkgPath.Name() != sv.PkgPath.Name() {
+ return false
+ }
+ for i := range st.Fields {
+ tf := &st.Fields[i]
+ vf := &sv.Fields[i]
+ if tf.Name.Name() != vf.Name.Name() {
+ return false
+ }
+ if !typesEqual(tf.Typ, vf.Typ, seen) {
+ return false
+ }
+ if tf.Name.Tag() != vf.Name.Tag() {
+ return false
+ }
+ if tf.Offset != vf.Offset {
+ return false
+ }
+ if tf.Name.IsEmbedded() != vf.Name.IsEmbedded() {
+ return false
+ }
+ }
+ return true
+ default:
+ println("runtime: impossible type kind", kind)
+ throw("runtime: impossible type kind")
+ return false
+ }
+}
diff --git a/src/runtime/typekind.go b/src/runtime/typekind.go
new file mode 100644
index 0000000..bd2dec9
--- /dev/null
+++ b/src/runtime/typekind.go
@@ -0,0 +1,43 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ kindBool = 1 + iota
+ kindInt
+ kindInt8
+ kindInt16
+ kindInt32
+ kindInt64
+ kindUint
+ kindUint8
+ kindUint16
+ kindUint32
+ kindUint64
+ kindUintptr
+ kindFloat32
+ kindFloat64
+ kindComplex64
+ kindComplex128
+ kindArray
+ kindChan
+ kindFunc
+ kindInterface
+ kindMap
+ kindPtr
+ kindSlice
+ kindString
+ kindStruct
+ kindUnsafePointer
+
+ kindDirectIface = 1 << 5
+ kindGCProg = 1 << 6
+ kindMask = (1 << 5) - 1
+)
+
+// isDirectIface reports whether t is stored directly in an interface value.
+func isDirectIface(t *_type) bool {
+ return t.Kind_&kindDirectIface != 0
+}
diff --git a/src/runtime/unsafe.go b/src/runtime/unsafe.go
new file mode 100644
index 0000000..6675264
--- /dev/null
+++ b/src/runtime/unsafe.go
@@ -0,0 +1,114 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import (
+ "runtime/internal/math"
+ "unsafe"
+)
+
+func unsafestring(ptr unsafe.Pointer, len int) {
+ if len < 0 {
+ panicunsafestringlen()
+ }
+
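+	// -uintptr(ptr) is the number of bytes addressable from ptr to the end
+	// of the address space, so the check also rejects a string whose end
+	// would wrap around the address space.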
+ if uintptr(len) > -uintptr(ptr) {
+ if ptr == nil {
+ panicunsafestringnilptr()
+ }
+ panicunsafestringlen()
+ }
+}
+
+// Keep this code in sync with cmd/compile/internal/walk/builtin.go:walkUnsafeString
+func unsafestring64(ptr unsafe.Pointer, len64 int64) {
+ len := int(len64)
+ if int64(len) != len64 {
+ panicunsafestringlen()
+ }
+ unsafestring(ptr, len)
+}
+
+func unsafestringcheckptr(ptr unsafe.Pointer, len64 int64) {
+ unsafestring64(ptr, len64)
+
+ // Check that underlying array doesn't straddle multiple heap objects.
+ // unsafestring64 has already checked for overflow.
+ if checkptrStraddles(ptr, uintptr(len64)) {
+ throw("checkptr: unsafe.String result straddles multiple allocations")
+ }
+}
+
+func panicunsafestringlen() {
+ panic(errorString("unsafe.String: len out of range"))
+}
+
+func panicunsafestringnilptr() {
+ panic(errorString("unsafe.String: ptr is nil and len is not zero"))
+}
+
+// Keep this code in sync with cmd/compile/internal/walk/builtin.go:walkUnsafeSlice
+func unsafeslice(et *_type, ptr unsafe.Pointer, len int) {
+ if len < 0 {
+ panicunsafeslicelen1(getcallerpc())
+ }
+
+ if et.Size_ == 0 {
+ if ptr == nil && len > 0 {
+ panicunsafeslicenilptr1(getcallerpc())
+ }
+ }
+
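+	// As in unsafestring above, mem > -uintptr(ptr) means the slice would
+	// extend past the end of the address space.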
+ mem, overflow := math.MulUintptr(et.Size_, uintptr(len))
+ if overflow || mem > -uintptr(ptr) {
+ if ptr == nil {
+ panicunsafeslicenilptr1(getcallerpc())
+ }
+ panicunsafeslicelen1(getcallerpc())
+ }
+}
+
+// Keep this code in sync with cmd/compile/internal/walk/builtin.go:walkUnsafeSlice
+func unsafeslice64(et *_type, ptr unsafe.Pointer, len64 int64) {
+ len := int(len64)
+ if int64(len) != len64 {
+ panicunsafeslicelen1(getcallerpc())
+ }
+ unsafeslice(et, ptr, len)
+}
+
+func unsafeslicecheckptr(et *_type, ptr unsafe.Pointer, len64 int64) {
+ unsafeslice64(et, ptr, len64)
+
+ // Check that underlying array doesn't straddle multiple heap objects.
+ // unsafeslice64 has already checked for overflow.
+ if checkptrStraddles(ptr, uintptr(len64)*et.Size_) {
+ throw("checkptr: unsafe.Slice result straddles multiple allocations")
+ }
+}
+
+func panicunsafeslicelen() {
+ // This is called only from compiler-generated code, so we can get the
+ // source of the panic.
+ panicunsafeslicelen1(getcallerpc())
+}
+
+//go:yeswritebarrierrec
+func panicunsafeslicelen1(pc uintptr) {
+ panicCheck1(pc, "unsafe.Slice: len out of range")
+ panic(errorString("unsafe.Slice: len out of range"))
+}
+
+func panicunsafeslicenilptr() {
+ // This is called only from compiler-generated code, so we can get the
+ // source of the panic.
+ panicunsafeslicenilptr1(getcallerpc())
+}
+
+//go:yeswritebarrierrec
+func panicunsafeslicenilptr1(pc uintptr) {
+ panicCheck1(pc, "unsafe.Slice: ptr is nil and len is not zero")
+ panic(errorString("unsafe.Slice: ptr is nil and len is not zero"))
+}
diff --git a/src/runtime/utf8.go b/src/runtime/utf8.go
new file mode 100644
index 0000000..52b7576
--- /dev/null
+++ b/src/runtime/utf8.go
@@ -0,0 +1,132 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+// Numbers fundamental to the encoding.
+const (
+ runeError = '\uFFFD' // the "error" Rune or "Unicode replacement character"
+ runeSelf = 0x80 // characters below runeSelf are represented as themselves in a single byte.
+ maxRune = '\U0010FFFF' // Maximum valid Unicode code point.
+)
+
+// Code points in the surrogate range are not valid for UTF-8.
+const (
+ surrogateMin = 0xD800
+ surrogateMax = 0xDFFF
+)
+
+const (
+ t1 = 0x00 // 0000 0000
+ tx = 0x80 // 1000 0000
+ t2 = 0xC0 // 1100 0000
+ t3 = 0xE0 // 1110 0000
+ t4 = 0xF0 // 1111 0000
+ t5 = 0xF8 // 1111 1000
+
+ maskx = 0x3F // 0011 1111
+ mask2 = 0x1F // 0001 1111
+ mask3 = 0x0F // 0000 1111
+ mask4 = 0x07 // 0000 0111
+
+ rune1Max = 1<<7 - 1
+ rune2Max = 1<<11 - 1
+ rune3Max = 1<<16 - 1
+
+ // The default lowest and highest continuation byte.
+ locb = 0x80 // 1000 0000
+ hicb = 0xBF // 1011 1111
+)
+
+// countrunes returns the number of runes in s.
+func countrunes(s string) int {
+ n := 0
+ for range s {
+ n++
+ }
+ return n
+}
+
+// decoderune returns the non-ASCII rune at the start of
+// s[k:] and the index after the rune in s.
+//
+// decoderune assumes that the caller has checked that
+// the rune to be decoded is a non-ASCII rune.
+//
+// If the string appears to be incomplete or decoding problems
+// are encountered, (runeError, k + 1) is returned to ensure
+// progress when decoderune is used to iterate over a string.
+func decoderune(s string, k int) (r rune, pos int) {
+ pos = k
+
+ if k >= len(s) {
+ return runeError, k + 1
+ }
+
+ s = s[k:]
+
+ switch {
+ case t2 <= s[0] && s[0] < t3:
+ // 0080-07FF two byte sequence
+ if len(s) > 1 && (locb <= s[1] && s[1] <= hicb) {
+ r = rune(s[0]&mask2)<<6 | rune(s[1]&maskx)
+ pos += 2
+ if rune1Max < r {
+ return
+ }
+ }
+ case t3 <= s[0] && s[0] < t4:
+ // 0800-FFFF three byte sequence
+ if len(s) > 2 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) {
+ r = rune(s[0]&mask3)<<12 | rune(s[1]&maskx)<<6 | rune(s[2]&maskx)
+ pos += 3
+ if rune2Max < r && !(surrogateMin <= r && r <= surrogateMax) {
+ return
+ }
+ }
+ case t4 <= s[0] && s[0] < t5:
+ // 10000-1FFFFF four byte sequence
+ if len(s) > 3 && (locb <= s[1] && s[1] <= hicb) && (locb <= s[2] && s[2] <= hicb) && (locb <= s[3] && s[3] <= hicb) {
+ r = rune(s[0]&mask4)<<18 | rune(s[1]&maskx)<<12 | rune(s[2]&maskx)<<6 | rune(s[3]&maskx)
+ pos += 4
+ if rune3Max < r && r <= maxRune {
+ return
+ }
+ }
+ }
+
+ return runeError, k + 1
+}
+
+// encoderune writes into p (which must be large enough) the UTF-8 encoding of the rune.
+// It returns the number of bytes written.
+func encoderune(p []byte, r rune) int {
+ // Negative values are erroneous. Making it unsigned addresses the problem.
+ switch i := uint32(r); {
+ case i <= rune1Max:
+ p[0] = byte(r)
+ return 1
+ case i <= rune2Max:
+ _ = p[1] // eliminate bounds checks
+ p[0] = t2 | byte(r>>6)
+ p[1] = tx | byte(r)&maskx
+ return 2
+ case i > maxRune, surrogateMin <= i && i <= surrogateMax:
+ r = runeError
+ fallthrough
+ case i <= rune3Max:
+ _ = p[2] // eliminate bounds checks
+ p[0] = t3 | byte(r>>12)
+ p[1] = tx | byte(r>>6)&maskx
+ p[2] = tx | byte(r)&maskx
+ return 3
+ default:
+ _ = p[3] // eliminate bounds checks
+ p[0] = t4 | byte(r>>18)
+ p[1] = tx | byte(r>>12)&maskx
+ p[2] = tx | byte(r>>6)&maskx
+ p[3] = tx | byte(r)&maskx
+ return 4
+ }
+}
diff --git a/src/runtime/vdso_elf32.go b/src/runtime/vdso_elf32.go
new file mode 100644
index 0000000..1b8afbe
--- /dev/null
+++ b/src/runtime/vdso_elf32.go
@@ -0,0 +1,79 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (386 || arm)
+
+package runtime
+
+// ELF32 structure definitions for use by the vDSO loader
+
+type elfSym struct {
+ st_name uint32
+ st_value uint32
+ st_size uint32
+ st_info byte
+ st_other byte
+ st_shndx uint16
+}
+
+type elfVerdef struct {
+ vd_version uint16 /* Version revision */
+ vd_flags uint16 /* Version information */
+ vd_ndx uint16 /* Version Index */
+ vd_cnt uint16 /* Number of associated aux entries */
+ vd_hash uint32 /* Version name hash value */
+ vd_aux uint32 /* Offset in bytes to verdaux array */
+ vd_next uint32 /* Offset in bytes to next verdef entry */
+}
+
+type elfEhdr struct {
+ e_ident [_EI_NIDENT]byte /* Magic number and other info */
+ e_type uint16 /* Object file type */
+ e_machine uint16 /* Architecture */
+ e_version uint32 /* Object file version */
+ e_entry uint32 /* Entry point virtual address */
+ e_phoff uint32 /* Program header table file offset */
+ e_shoff uint32 /* Section header table file offset */
+ e_flags uint32 /* Processor-specific flags */
+ e_ehsize uint16 /* ELF header size in bytes */
+ e_phentsize uint16 /* Program header table entry size */
+ e_phnum uint16 /* Program header table entry count */
+ e_shentsize uint16 /* Section header table entry size */
+ e_shnum uint16 /* Section header table entry count */
+ e_shstrndx uint16 /* Section header string table index */
+}
+
+type elfPhdr struct {
+ p_type uint32 /* Segment type */
+ p_offset uint32 /* Segment file offset */
+ p_vaddr uint32 /* Segment virtual address */
+ p_paddr uint32 /* Segment physical address */
+ p_filesz uint32 /* Segment size in file */
+ p_memsz uint32 /* Segment size in memory */
+ p_flags uint32 /* Segment flags */
+ p_align uint32 /* Segment alignment */
+}
+
+type elfShdr struct {
+ sh_name uint32 /* Section name (string tbl index) */
+ sh_type uint32 /* Section type */
+ sh_flags uint32 /* Section flags */
+ sh_addr uint32 /* Section virtual addr at execution */
+ sh_offset uint32 /* Section file offset */
+ sh_size uint32 /* Section size in bytes */
+ sh_link uint32 /* Link to another section */
+ sh_info uint32 /* Additional section information */
+ sh_addralign uint32 /* Section alignment */
+ sh_entsize uint32 /* Entry size if section holds table */
+}
+
+type elfDyn struct {
+ d_tag int32 /* Dynamic entry type */
+ d_val uint32 /* Integer value */
+}
+
+type elfVerdaux struct {
+ vda_name uint32 /* Version or dependency names */
+ vda_next uint32 /* Offset in bytes to next verdaux entry */
+}
diff --git a/src/runtime/vdso_elf64.go b/src/runtime/vdso_elf64.go
new file mode 100644
index 0000000..d41d25e
--- /dev/null
+++ b/src/runtime/vdso_elf64.go
@@ -0,0 +1,79 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (amd64 || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x)
+
+package runtime
+
+// ELF64 structure definitions for use by the vDSO loader
+
+type elfSym struct {
+ st_name uint32
+ st_info byte
+ st_other byte
+ st_shndx uint16
+ st_value uint64
+ st_size uint64
+}
+
+type elfVerdef struct {
+ vd_version uint16 /* Version revision */
+ vd_flags uint16 /* Version information */
+ vd_ndx uint16 /* Version Index */
+ vd_cnt uint16 /* Number of associated aux entries */
+ vd_hash uint32 /* Version name hash value */
+ vd_aux uint32 /* Offset in bytes to verdaux array */
+ vd_next uint32 /* Offset in bytes to next verdef entry */
+}
+
+type elfEhdr struct {
+ e_ident [_EI_NIDENT]byte /* Magic number and other info */
+ e_type uint16 /* Object file type */
+ e_machine uint16 /* Architecture */
+ e_version uint32 /* Object file version */
+ e_entry uint64 /* Entry point virtual address */
+ e_phoff uint64 /* Program header table file offset */
+ e_shoff uint64 /* Section header table file offset */
+ e_flags uint32 /* Processor-specific flags */
+ e_ehsize uint16 /* ELF header size in bytes */
+ e_phentsize uint16 /* Program header table entry size */
+ e_phnum uint16 /* Program header table entry count */
+ e_shentsize uint16 /* Section header table entry size */
+ e_shnum uint16 /* Section header table entry count */
+ e_shstrndx uint16 /* Section header string table index */
+}
+
+type elfPhdr struct {
+ p_type uint32 /* Segment type */
+ p_flags uint32 /* Segment flags */
+ p_offset uint64 /* Segment file offset */
+ p_vaddr uint64 /* Segment virtual address */
+ p_paddr uint64 /* Segment physical address */
+ p_filesz uint64 /* Segment size in file */
+ p_memsz uint64 /* Segment size in memory */
+ p_align uint64 /* Segment alignment */
+}
+
+type elfShdr struct {
+ sh_name uint32 /* Section name (string tbl index) */
+ sh_type uint32 /* Section type */
+ sh_flags uint64 /* Section flags */
+ sh_addr uint64 /* Section virtual addr at execution */
+ sh_offset uint64 /* Section file offset */
+ sh_size uint64 /* Section size in bytes */
+ sh_link uint32 /* Link to another section */
+ sh_info uint32 /* Additional section information */
+ sh_addralign uint64 /* Section alignment */
+ sh_entsize uint64 /* Entry size if section holds table */
+}
+
+type elfDyn struct {
+ d_tag int64 /* Dynamic entry type */
+ d_val uint64 /* Integer value */
+}
+
+type elfVerdaux struct {
+ vda_name uint32 /* Version or dependency names */
+ vda_next uint32 /* Offset in bytes to next verdaux entry */
+}
diff --git a/src/runtime/vdso_freebsd.go b/src/runtime/vdso_freebsd.go
new file mode 100644
index 0000000..0fe21cf
--- /dev/null
+++ b/src/runtime/vdso_freebsd.go
@@ -0,0 +1,114 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const _VDSO_TH_NUM = 4 // defined in <sys/vdso.h> #ifdef _KERNEL
+
+var timekeepSharedPage *vdsoTimekeep
+
+//go:nosplit
+func (bt *bintime) Add(bt2 *bintime) {
+ u := bt.frac
+ bt.frac += bt2.frac
+ if u > bt.frac {
+ bt.sec++
+ }
+ bt.sec += bt2.sec
+}
+
+//go:nosplit
+func (bt *bintime) AddX(x uint64) {
+ u := bt.frac
+ bt.frac += x
+ if u > bt.frac {
+ bt.sec++
+ }
+}
+
+var (
+ // binuptimeDummy is used in binuptime as the address of an atomic.Load, to simulate
+ // an atomic_thread_fence_acq() call, which acts as a barrier against instruction
+ // reordering and memory reordering.
+ binuptimeDummy uint32
+
+ zeroBintime bintime
+)
+
+// based on /usr/src/lib/libc/sys/__vdso_gettimeofday.c
+//
+//go:nosplit
+func binuptime(abs bool) (bt bintime) {
+ timehands := (*[_VDSO_TH_NUM]vdsoTimehands)(add(unsafe.Pointer(timekeepSharedPage), vdsoTimekeepSize))
+ for {
+ if timekeepSharedPage.enabled == 0 {
+ return zeroBintime
+ }
+
+ curr := atomic.Load(&timekeepSharedPage.current) // atomic_load_acq_32
+ th := &timehands[curr]
+ gen := atomic.Load(&th.gen) // atomic_load_acq_32
+ bt = th.offset
+
+ if tc, ok := th.getTimecounter(); !ok {
+ return zeroBintime
+ } else {
+ delta := (tc - th.offset_count) & th.counter_mask
+ bt.AddX(th.scale * uint64(delta))
+ }
+ if abs {
+ bt.Add(&th.boottime)
+ }
+
+ atomic.Load(&binuptimeDummy) // atomic_thread_fence_acq()
+ if curr == timekeepSharedPage.current && gen != 0 && gen == th.gen {
+ break
+ }
+ }
+ return bt
+}
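
binuptime is a lock-free, seqlock-style reader: it snapshots the current timehands slot and its generation counter, reads the time fields, and retries if the slot or generation changed underneath it (a generation of 0 marks an update in progress); the load of binuptimeDummy stands in for an acquire fence. A rough sketch of the retry pattern under those assumptions (editorial; readConsistent and its arguments are invented and use sync/atomic rather than the runtime's internal atomics):

package main

import (
    "fmt"
    "sync/atomic"
)

// readConsistent retries until it observes data that no writer touched
// mid-read, mirroring the gen recheck in binuptime. A generation of 0
// means "update in progress", so such reads are discarded too.
func readConsistent(gen *uint32, read func() int64) int64 {
    for {
        g := atomic.LoadUint32(gen) // snapshot the generation
        v := read()                 // read the protected data
        if g != 0 && g == atomic.LoadUint32(gen) {
            return v // generation unchanged: the snapshot is consistent
        }
        // A writer intervened (or was mid-update); try again.
    }
}

func main() {
    var gen uint32 = 1
    data := int64(42)
    fmt.Println(readConsistent(&gen, func() int64 { return data }))
}
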
+
+//go:nosplit
+func vdsoClockGettime(clockID int32) bintime {
+ if timekeepSharedPage == nil || timekeepSharedPage.ver != _VDSO_TK_VER_CURR {
+ return zeroBintime
+ }
+ abs := false
+ switch clockID {
+ case _CLOCK_MONOTONIC:
+ /* ok */
+ case _CLOCK_REALTIME:
+ abs = true
+ default:
+ return zeroBintime
+ }
+ return binuptime(abs)
+}
+
+func fallback_nanotime() int64
+func fallback_walltime() (sec int64, nsec int32)
+
+//go:nosplit
+func nanotime1() int64 {
+ bt := vdsoClockGettime(_CLOCK_MONOTONIC)
+ if bt == zeroBintime {
+ return fallback_nanotime()
+ }
+ return int64((1e9 * uint64(bt.sec)) + ((1e9 * uint64(bt.frac>>32)) >> 32))
+}
+
+func walltime() (sec int64, nsec int32) {
+ bt := vdsoClockGettime(_CLOCK_REALTIME)
+ if bt == zeroBintime {
+ return fallback_walltime()
+ }
+ return int64(bt.sec), int32((1e9 * uint64(bt.frac>>32)) >> 32)
+}
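
The conversions above treat bt.frac as a binary fraction of a second scaled by 2^64: frac>>32 rescales it to units of 2^-32, and multiplying by 1e9 then shifting right by 32 yields nanoseconds without overflowing 64 bits. A quick editorial check of that arithmetic for half a second:

package main

import "fmt"

func main() {
    // frac = 2^63 represents exactly half a second in 64.64 fixed point.
    frac := uint64(1) << 63
    nsec := (1e9 * uint64(frac>>32)) >> 32
    fmt.Println(nsec) // 500000000, i.e. 0.5s expressed in nanoseconds
}
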
diff --git a/src/runtime/vdso_freebsd_arm.go b/src/runtime/vdso_freebsd_arm.go
new file mode 100644
index 0000000..669fed0
--- /dev/null
+++ b/src/runtime/vdso_freebsd_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _VDSO_TH_ALGO_ARM_GENTIM = 1
+)
+
+func getCntxct(physical bool) uint32
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_ARM_GENTIM:
+ return getCntxct(th.physical != 0), true
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_freebsd_arm64.go b/src/runtime/vdso_freebsd_arm64.go
new file mode 100644
index 0000000..37b26d7
--- /dev/null
+++ b/src/runtime/vdso_freebsd_arm64.go
@@ -0,0 +1,21 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _VDSO_TH_ALGO_ARM_GENTIM = 1
+)
+
+func getCntxct(physical bool) uint32
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_ARM_GENTIM:
+ return getCntxct(th.physical != 0), true
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_freebsd_riscv64.go b/src/runtime/vdso_freebsd_riscv64.go
new file mode 100644
index 0000000..a4fff4b
--- /dev/null
+++ b/src/runtime/vdso_freebsd_riscv64.go
@@ -0,0 +1,21 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ _VDSO_TH_ALGO_RISCV_RDTIME = 1
+)
+
+func getCntxct() uint32
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_RISCV_RDTIME:
+ return getCntxct(), true
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_freebsd_x86.go b/src/runtime/vdso_freebsd_x86.go
new file mode 100644
index 0000000..66d1c65
--- /dev/null
+++ b/src/runtime/vdso_freebsd_x86.go
@@ -0,0 +1,90 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build freebsd && (386 || amd64)
+
+package runtime
+
+import (
+ "runtime/internal/atomic"
+ "unsafe"
+)
+
+const (
+ _VDSO_TH_ALGO_X86_TSC = 1
+ _VDSO_TH_ALGO_X86_HPET = 2
+)
+
+const (
+ _HPET_DEV_MAP_MAX = 10
+ _HPET_MAIN_COUNTER = 0xf0 /* Main counter register */
+
+ hpetDevPath = "/dev/hpetX\x00"
+)
+
+var hpetDevMap [_HPET_DEV_MAP_MAX]uintptr
+
+//go:nosplit
+func (th *vdsoTimehands) getTSCTimecounter() uint32 {
+ tsc := cputicks()
+ if th.x86_shift > 0 {
+ tsc >>= th.x86_shift
+ }
+ return uint32(tsc)
+}
+
+//go:nosplit
+func (th *vdsoTimehands) getHPETTimecounter() (uint32, bool) {
+ idx := int(th.x86_hpet_idx)
+ if idx >= len(hpetDevMap) {
+ return 0, false
+ }
+
+ p := atomic.Loaduintptr(&hpetDevMap[idx])
+ if p == 0 {
+ systemstack(func() { initHPETTimecounter(idx) })
+ p = atomic.Loaduintptr(&hpetDevMap[idx])
+ }
+ if p == ^uintptr(0) {
+ return 0, false
+ }
+ return *(*uint32)(unsafe.Pointer(p + _HPET_MAIN_COUNTER)), true
+}
+
+//go:systemstack
+func initHPETTimecounter(idx int) {
+ const digits = "0123456789"
+
+ var devPath [len(hpetDevPath)]byte
+ copy(devPath[:], hpetDevPath)
+ devPath[9] = digits[idx]
+
+ fd := open(&devPath[0], 0 /* O_RDONLY */ |_O_CLOEXEC, 0)
+ if fd < 0 {
+ atomic.Casuintptr(&hpetDevMap[idx], 0, ^uintptr(0))
+ return
+ }
+
+ addr, mmapErr := mmap(nil, physPageSize, _PROT_READ, _MAP_SHARED, fd, 0)
+ closefd(fd)
+ newP := uintptr(addr)
+ if mmapErr != 0 {
+ newP = ^uintptr(0)
+ }
+ if !atomic.Casuintptr(&hpetDevMap[idx], 0, newP) && mmapErr == 0 {
+ munmap(addr, physPageSize)
+ }
+}
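
initHPETTimecounter publishes its result with a compare-and-swap so concurrent initializers race safely: a slot moves from 0 to either the mapped address or the ^uintptr(0) sentinel meaning "tried and failed", and a racer that mapped the device but lost the CAS unmaps its redundant mapping. A small sketch of that publish-or-clean-up idiom (editorial; publish and its caller are invented and use sync/atomic):

package main

import (
    "fmt"
    "sync/atomic"
)

// publish installs v into slot only if the slot is still unclaimed.
// It returns the value that ended up in the slot and whether the
// caller's v was installed; a losing caller is expected to release
// whatever resource backs its own v.
func publish(slot *uintptr, v uintptr) (uintptr, bool) {
    if atomic.CompareAndSwapUintptr(slot, 0, v) {
        return v, true
    }
    return atomic.LoadUintptr(slot), false // someone else won the race
}

func main() {
    var slot uintptr
    const failed = ^uintptr(0) // sentinel: initialization was attempted and failed

    winner, won := publish(&slot, 0x1000)
    fmt.Printf("winner=%#x installedOurs=%v failed=%v\n", winner, won, winner == failed)
}
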
+
+//go:nosplit
+func (th *vdsoTimehands) getTimecounter() (uint32, bool) {
+ switch th.algo {
+ case _VDSO_TH_ALGO_X86_TSC:
+ return th.getTSCTimecounter(), true
+ case _VDSO_TH_ALGO_X86_HPET:
+ return th.getHPETTimecounter()
+ default:
+ return 0, false
+ }
+}
diff --git a/src/runtime/vdso_in_none.go b/src/runtime/vdso_in_none.go
new file mode 100644
index 0000000..3a6ee6f
--- /dev/null
+++ b/src/runtime/vdso_in_none.go
@@ -0,0 +1,13 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build (linux && !386 && !amd64 && !arm && !arm64 && !loong64 && !mips64 && !mips64le && !ppc64 && !ppc64le && !riscv64 && !s390x) || !linux
+
+package runtime
+
+// A dummy version of inVDSOPage for targets that don't use a VDSO.
+
+func inVDSOPage(pc uintptr) bool {
+ return false
+}
diff --git a/src/runtime/vdso_linux.go b/src/runtime/vdso_linux.go
new file mode 100644
index 0000000..4523615
--- /dev/null
+++ b/src/runtime/vdso_linux.go
@@ -0,0 +1,295 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (386 || amd64 || arm || arm64 || loong64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x)
+
+package runtime
+
+import "unsafe"
+
+// Look up symbols in the Linux vDSO.
+
+// This code was originally based on the sample Linux vDSO parser at
+// https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/vDSO/parse_vdso.c
+
+// This implements the ELF dynamic linking spec at
+// http://sco.com/developers/gabi/latest/ch5.dynamic.html
+
+// The version section is documented at
+// https://refspecs.linuxfoundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/symversion.html
+
+const (
+ _AT_SYSINFO_EHDR = 33
+
+ _PT_LOAD = 1 /* Loadable program segment */
+ _PT_DYNAMIC = 2 /* Dynamic linking information */
+
+ _DT_NULL = 0 /* Marks end of dynamic section */
+ _DT_HASH = 4 /* Dynamic symbol hash table */
+ _DT_STRTAB = 5 /* Address of string table */
+ _DT_SYMTAB = 6 /* Address of symbol table */
+ _DT_GNU_HASH = 0x6ffffef5 /* GNU-style dynamic symbol hash table */
+ _DT_VERSYM = 0x6ffffff0
+ _DT_VERDEF = 0x6ffffffc
+
+ _VER_FLG_BASE = 0x1 /* Version definition of file itself */
+
+ _SHN_UNDEF = 0 /* Undefined section */
+
+ _SHT_DYNSYM = 11 /* Dynamic linker symbol table */
+
+ _STT_FUNC = 2 /* Symbol is a code object */
+
+ _STT_NOTYPE = 0 /* Symbol type is not specified */
+
+ _STB_GLOBAL = 1 /* Global symbol */
+ _STB_WEAK = 2 /* Weak symbol */
+
+ _EI_NIDENT = 16
+
+ // Maximum indices for the array types used when traversing the vDSO ELF structures.
+ // Computed from architecture-specific max provided by vdso_linux_*.go
+ vdsoSymTabSize = vdsoArrayMax / unsafe.Sizeof(elfSym{})
+ vdsoDynSize = vdsoArrayMax / unsafe.Sizeof(elfDyn{})
+ vdsoSymStringsSize = vdsoArrayMax // byte
+ vdsoVerSymSize = vdsoArrayMax / 2 // uint16
+ vdsoHashSize = vdsoArrayMax / 4 // uint32
+
+ // vdsoBloomSizeScale is a scaling factor for gnuhash tables which are uint32 indexed,
+ // but contain uintptrs
+ vdsoBloomSizeScale = unsafe.Sizeof(uintptr(0)) / 4 // uint32
+)
+
+/* How to extract and insert information held in the st_info field. */
+func _ELF_ST_BIND(val byte) byte { return val >> 4 }
+func _ELF_ST_TYPE(val byte) byte { return val & 0xf }
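
st_info packs the symbol binding in the high nibble and the symbol type in the low nibble, so these two helpers simply invert the usual ELF64_ST_INFO(bind, type) = bind<<4 | type encoding. A tiny editorial example:

package main

import "fmt"

func elfSTBind(val byte) byte { return val >> 4 }
func elfSTType(val byte) byte { return val & 0xf }

func main() {
    const stbGlobal, sttFunc = 1, 2
    info := byte(stbGlobal<<4 | sttFunc) // how a linker encodes a global function: 0x12
    fmt.Println(elfSTBind(info) == stbGlobal, elfSTType(info) == sttFunc) // true true
}
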
+
+type vdsoSymbolKey struct {
+ name string
+ symHash uint32
+ gnuHash uint32
+ ptr *uintptr
+}
+
+type vdsoVersionKey struct {
+ version string
+ verHash uint32
+}
+
+type vdsoInfo struct {
+ valid bool
+
+ /* Load information */
+ loadAddr uintptr
+ loadOffset uintptr /* loadAddr - recorded vaddr */
+
+ /* Symbol table */
+ symtab *[vdsoSymTabSize]elfSym
+ symstrings *[vdsoSymStringsSize]byte
+ chain []uint32
+ bucket []uint32
+ symOff uint32
+ isGNUHash bool
+
+ /* Version table */
+ versym *[vdsoVerSymSize]uint16
+ verdef *elfVerdef
+}
+
+// see vdso_linux_*.go for vdsoSymbolKeys[] and vdso*Sym vars
+
+func vdsoInitFromSysinfoEhdr(info *vdsoInfo, hdr *elfEhdr) {
+ info.valid = false
+ info.loadAddr = uintptr(unsafe.Pointer(hdr))
+
+ pt := unsafe.Pointer(info.loadAddr + uintptr(hdr.e_phoff))
+
+ // We need two things from the segment table: the load offset
+ // and the dynamic table.
+ var foundVaddr bool
+ var dyn *[vdsoDynSize]elfDyn
+ for i := uint16(0); i < hdr.e_phnum; i++ {
+ pt := (*elfPhdr)(add(pt, uintptr(i)*unsafe.Sizeof(elfPhdr{})))
+ switch pt.p_type {
+ case _PT_LOAD:
+ if !foundVaddr {
+ foundVaddr = true
+ info.loadOffset = info.loadAddr + uintptr(pt.p_offset-pt.p_vaddr)
+ }
+
+ case _PT_DYNAMIC:
+ dyn = (*[vdsoDynSize]elfDyn)(unsafe.Pointer(info.loadAddr + uintptr(pt.p_offset)))
+ }
+ }
+
+ if !foundVaddr || dyn == nil {
+ return // Failed
+ }
+
+ // Fish out the useful bits of the dynamic table.
+
+ var hash, gnuhash *[vdsoHashSize]uint32
+ info.symstrings = nil
+ info.symtab = nil
+ info.versym = nil
+ info.verdef = nil
+ for i := 0; dyn[i].d_tag != _DT_NULL; i++ {
+ dt := &dyn[i]
+ p := info.loadOffset + uintptr(dt.d_val)
+ switch dt.d_tag {
+ case _DT_STRTAB:
+ info.symstrings = (*[vdsoSymStringsSize]byte)(unsafe.Pointer(p))
+ case _DT_SYMTAB:
+ info.symtab = (*[vdsoSymTabSize]elfSym)(unsafe.Pointer(p))
+ case _DT_HASH:
+ hash = (*[vdsoHashSize]uint32)(unsafe.Pointer(p))
+ case _DT_GNU_HASH:
+ gnuhash = (*[vdsoHashSize]uint32)(unsafe.Pointer(p))
+ case _DT_VERSYM:
+ info.versym = (*[vdsoVerSymSize]uint16)(unsafe.Pointer(p))
+ case _DT_VERDEF:
+ info.verdef = (*elfVerdef)(unsafe.Pointer(p))
+ }
+ }
+
+ if info.symstrings == nil || info.symtab == nil || (hash == nil && gnuhash == nil) {
+ return // Failed
+ }
+
+ if info.verdef == nil {
+ info.versym = nil
+ }
+
+ if gnuhash != nil {
+ // Parse the GNU hash table header.
+ nbucket := gnuhash[0]
+ info.symOff = gnuhash[1]
+ bloomSize := gnuhash[2]
+ info.bucket = gnuhash[4+bloomSize*uint32(vdsoBloomSizeScale):][:nbucket]
+ info.chain = gnuhash[4+bloomSize*uint32(vdsoBloomSizeScale)+nbucket:]
+ info.isGNUHash = true
+ } else {
+ // Parse the hash table header.
+ nbucket := hash[0]
+ nchain := hash[1]
+ info.bucket = hash[2 : 2+nbucket]
+ info.chain = hash[2+nbucket : 2+nbucket+nchain]
+ }
+
+ // That's all we need.
+ info.valid = true
+}
+
+func vdsoFindVersion(info *vdsoInfo, ver *vdsoVersionKey) int32 {
+ if !info.valid {
+ return 0
+ }
+
+ def := info.verdef
+ for {
+ if def.vd_flags&_VER_FLG_BASE == 0 {
+ aux := (*elfVerdaux)(add(unsafe.Pointer(def), uintptr(def.vd_aux)))
+ if def.vd_hash == ver.verHash && ver.version == gostringnocopy(&info.symstrings[aux.vda_name]) {
+ return int32(def.vd_ndx & 0x7fff)
+ }
+ }
+
+ if def.vd_next == 0 {
+ break
+ }
+ def = (*elfVerdef)(add(unsafe.Pointer(def), uintptr(def.vd_next)))
+ }
+
+ return -1 // cannot match any version
+}
+
+func vdsoParseSymbols(info *vdsoInfo, version int32) {
+ if !info.valid {
+ return
+ }
+
+ apply := func(symIndex uint32, k vdsoSymbolKey) bool {
+ sym := &info.symtab[symIndex]
+ typ := _ELF_ST_TYPE(sym.st_info)
+ bind := _ELF_ST_BIND(sym.st_info)
+ // On ppc64x, VDSO functions are of type _STT_NOTYPE.
+ if typ != _STT_FUNC && typ != _STT_NOTYPE || bind != _STB_GLOBAL && bind != _STB_WEAK || sym.st_shndx == _SHN_UNDEF {
+ return false
+ }
+ if k.name != gostringnocopy(&info.symstrings[sym.st_name]) {
+ return false
+ }
+ // Check symbol version.
+ if info.versym != nil && version != 0 && int32(info.versym[symIndex]&0x7fff) != version {
+ return false
+ }
+
+ *k.ptr = info.loadOffset + uintptr(sym.st_value)
+ return true
+ }
+
+ if !info.isGNUHash {
+ // Old-style DT_HASH table.
+ for _, k := range vdsoSymbolKeys {
+ if len(info.bucket) > 0 {
+ for chain := info.bucket[k.symHash%uint32(len(info.bucket))]; chain != 0; chain = info.chain[chain] {
+ if apply(chain, k) {
+ break
+ }
+ }
+ }
+ }
+ return
+ }
+
+ // New-style DT_GNU_HASH table.
+ for _, k := range vdsoSymbolKeys {
+ symIndex := info.bucket[k.gnuHash%uint32(len(info.bucket))]
+ if symIndex < info.symOff {
+ continue
+ }
+ for ; ; symIndex++ {
+ hash := info.chain[symIndex-info.symOff]
+ if hash|1 == k.gnuHash|1 {
+ // Found a hash match.
+ if apply(symIndex, k) {
+ break
+ }
+ }
+ if hash&1 != 0 {
+ // End of chain.
+ break
+ }
+ }
+ }
+}
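
The symHash and gnuHash fields of each vdsoSymbolKey are precomputed so no string hashing happens at startup; they should correspond to the classic SysV ELF hash (used by DT_HASH) and the GNU djb2-style hash (used by DT_GNU_HASH) of the symbol name. An editorial sketch of the two hash functions, which ought to reproduce constants like those attached to "__vdso_clock_gettime" in the per-architecture files below:

package main

import "fmt"

// elfHash is the SysV ABI hash used by DT_HASH sections.
func elfHash(name string) uint32 {
    var h uint32
    for i := 0; i < len(name); i++ {
        h = h<<4 + uint32(name[i])
        g := h & 0xf0000000
        h ^= g >> 24
        h &^= g
    }
    return h
}

// gnuHash is the DJB-style hash used by DT_GNU_HASH sections.
func gnuHash(name string) uint32 {
    h := uint32(5381)
    for i := 0; i < len(name); i++ {
        h = h*33 + uint32(name[i])
    }
    return h
}

func main() {
    fmt.Printf("%#x %#x\n", elfHash("__vdso_clock_gettime"), gnuHash("__vdso_clock_gettime"))
}
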
+
+func vdsoauxv(tag, val uintptr) {
+ switch tag {
+ case _AT_SYSINFO_EHDR:
+ if val == 0 {
+ // Something went wrong
+ return
+ }
+ var info vdsoInfo
+ // TODO(rsc): I don't understand why the compiler thinks info escapes
+ // when passed to the three functions below.
+ info1 := (*vdsoInfo)(noescape(unsafe.Pointer(&info)))
+ vdsoInitFromSysinfoEhdr(info1, (*elfEhdr)(unsafe.Pointer(val)))
+ vdsoParseSymbols(info1, vdsoFindVersion(info1, &vdsoLinuxVersion))
+ }
+}
+
+// inVDSOPage reports whether pc is on the VDSO page.
+//
+//go:nosplit
+func inVDSOPage(pc uintptr) bool {
+ for _, k := range vdsoSymbolKeys {
+ if *k.ptr != 0 {
+ page := *k.ptr &^ (physPageSize - 1)
+ return pc >= page && pc < page+physPageSize
+ }
+ }
+ return false
+}
diff --git a/src/runtime/vdso_linux_386.go b/src/runtime/vdso_linux_386.go
new file mode 100644
index 0000000..5092c7c
--- /dev/null
+++ b/src/runtime/vdso_linux_386.go
@@ -0,0 +1,21 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/x86/galign.go arch.MAXWIDTH initialization, but it must also
+ // be constrained to the maximum positive int.
+ vdsoArrayMax = 1<<31 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_amd64.go b/src/runtime/vdso_linux_amd64.go
new file mode 100644
index 0000000..4e9f748
--- /dev/null
+++ b/src/runtime/vdso_linux_amd64.go
@@ -0,0 +1,23 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/amd64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_gettimeofday", 0x315ca59, 0xb01bca00, &vdsoGettimeofdaySym},
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+var (
+ vdsoGettimeofdaySym uintptr
+ vdsoClockgettimeSym uintptr
+)
diff --git a/src/runtime/vdso_linux_arm.go b/src/runtime/vdso_linux_arm.go
new file mode 100644
index 0000000..ac3bdcf
--- /dev/null
+++ b/src/runtime/vdso_linux_arm.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/arm/galign.go arch.MAXWIDTH initialization, but it must also
+ // be constrained to the maximum positive int.
+ vdsoArrayMax = 1<<31 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_arm64.go b/src/runtime/vdso_linux_arm64.go
new file mode 100644
index 0000000..2f003cd
--- /dev/null
+++ b/src/runtime/vdso_linux_arm64.go
@@ -0,0 +1,21 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/arm64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// key and version at man 7 vdso : aarch64
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6.39", 0x75fcb89}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__kernel_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_loong64.go b/src/runtime/vdso_linux_loong64.go
new file mode 100644
index 0000000..e00ef95
--- /dev/null
+++ b/src/runtime/vdso_linux_loong64.go
@@ -0,0 +1,27 @@
+// Copyright 2022 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && loong64
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/loong64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// Not currently described in the manpages as of May 2022, but it will
+// eventually appear; when that happens, see man 7 vdso : loongarch.
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_5.10", 0xae78f70}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vdso_linux_mips64x.go b/src/runtime/vdso_linux_mips64x.go
new file mode 100644
index 0000000..1444f8e
--- /dev/null
+++ b/src/runtime/vdso_linux_mips64x.go
@@ -0,0 +1,27 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (mips64 || mips64le)
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/mips64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// see man 7 vdso : mips
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6", 0x3ae75f6}
+
+// The symbol name is not __kernel_clock_gettime as suggested by the manpage;
+// according to Linux source code it should be __vdso_clock_gettime instead.
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vdso_linux_ppc64x.go b/src/runtime/vdso_linux_ppc64x.go
new file mode 100644
index 0000000..09c8d9d
--- /dev/null
+++ b/src/runtime/vdso_linux_ppc64x.go
@@ -0,0 +1,24 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && (ppc64 || ppc64le)
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/ppc64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6.15", 0x75fcba5}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__kernel_clock_gettime", 0xb0cd725, 0xdfa941fd, &vdsoClockgettimeSym},
+}
+
+// initialize with vsyscall fallbacks
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vdso_linux_riscv64.go b/src/runtime/vdso_linux_riscv64.go
new file mode 100644
index 0000000..f427124
--- /dev/null
+++ b/src/runtime/vdso_linux_riscv64.go
@@ -0,0 +1,21 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/riscv64/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+// key and version at man 7 vdso : riscv
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_4.15", 0xae77f75}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__vdso_clock_gettime", 0xd35ec75, 0x6e43a318, &vdsoClockgettimeSym},
+}
+
+// initialize to fall back to syscall
+var vdsoClockgettimeSym uintptr = 0
diff --git a/src/runtime/vdso_linux_s390x.go b/src/runtime/vdso_linux_s390x.go
new file mode 100644
index 0000000..c1c0b1b
--- /dev/null
+++ b/src/runtime/vdso_linux_s390x.go
@@ -0,0 +1,25 @@
+// Copyright 2021 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build linux && s390x
+// +build linux,s390x
+
+package runtime
+
+const (
+ // vdsoArrayMax is the byte-size of a maximally sized array on this architecture.
+ // See cmd/compile/internal/s390x/galign.go arch.MAXWIDTH initialization.
+ vdsoArrayMax = 1<<50 - 1
+)
+
+var vdsoLinuxVersion = vdsoVersionKey{"LINUX_2.6.29", 0x75fcbb9}
+
+var vdsoSymbolKeys = []vdsoSymbolKey{
+ {"__kernel_clock_gettime", 0xb0cd725, 0xdfa941fd, &vdsoClockgettimeSym},
+}
+
+// initialize with vsyscall fallbacks
+var (
+ vdsoClockgettimeSym uintptr = 0
+)
diff --git a/src/runtime/vlop_386.s b/src/runtime/vlop_386.s
new file mode 100644
index 0000000..b478ff8
--- /dev/null
+++ b/src/runtime/vlop_386.s
@@ -0,0 +1,56 @@
+// Inferno's libkern/vlop-386.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlop-386.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "textflag.h"
+
+/*
+ * C runtime for 64-bit divide.
+ */
+
+// runtime·_mul64by32(lo64 *uint64, a uint64, b uint32) (hi32 uint32)
+// sets *lo64 = low 64 bits of 96-bit product a*b; returns high 32 bits.
+TEXT runtime·_mul64by32(SB), NOSPLIT, $0
+ MOVL lo64+0(FP), CX
+ MOVL a_lo+4(FP), AX
+ MULL b+12(FP)
+ MOVL AX, 0(CX)
+ MOVL DX, BX
+ MOVL a_hi+8(FP), AX
+ MULL b+12(FP)
+ ADDL AX, BX
+ ADCL $0, DX
+ MOVL BX, 4(CX)
+ MOVL DX, AX
+ MOVL AX, hi32+16(FP)
+ RET
+
+TEXT runtime·_div64by32(SB), NOSPLIT, $0
+ MOVL r+12(FP), CX
+ MOVL a_lo+0(FP), AX
+ MOVL a_hi+4(FP), DX
+ DIVL b+8(FP)
+ MOVL DX, 0(CX)
+ MOVL AX, q+16(FP)
+ RET
diff --git a/src/runtime/vlop_arm.s b/src/runtime/vlop_arm.s
new file mode 100644
index 0000000..9e19938
--- /dev/null
+++ b/src/runtime/vlop_arm.s
@@ -0,0 +1,260 @@
+// Inferno's libkern/vlop-arm.s
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlop-arm.s
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+#include "go_asm.h"
+#include "go_tls.h"
+#include "funcdata.h"
+#include "textflag.h"
+
+// func runtime·udiv(n, d uint32) (q, r uint32)
+// The compiler knows the register usage of this function.
+// Reference:
+// Sloss, Andrew, et al.; ARM System Developer's Guide: Designing and Optimizing System Software,
+// Morgan Kaufmann; 1st edition (April 8, 2004), ISBN 978-1558608740
+#define Rq R0 // input d, output q
+#define Rr R1 // input n, output r
+#define Rs R2 // Rs, RM, and Ra are temporaries
+#define RM R3
+#define Ra R11
+
+// Be careful: Ra == R11 will be used by the linker for synthesized instructions.
+// Note: this function does not have a frame.
+TEXT runtime·udiv(SB),NOSPLIT|NOFRAME,$0
+ MOVBU internal∕cpu·ARM+const_offsetARMHasIDIVA(SB), Ra
+ CMP $0, Ra
+ BNE udiv_hardware
+
+ CLZ Rq, Rs // find normalizing shift
+ MOVW.S Rq<<Rs, Ra
+ MOVW $fast_udiv_tab<>-64(SB), RM
+ ADD.NE Ra>>25, RM, Ra // index by most significant 7 bits of divisor
+ MOVBU.NE (Ra), Ra
+
+ SUB.S $7, Rs
+ RSB $0, Rq, RM // M = -q
+ MOVW.PL Ra<<Rs, Rq
+
+ // 1st Newton iteration
+ MUL.PL RM, Rq, Ra // a = -q*d
+ BMI udiv_by_large_d
+ MULAWT Ra, Rq, Rq, Rq // q approx q-(q*q*d>>32)
+ TEQ RM->1, RM // check for d=0 or d=1
+
+ // 2nd Newton iteration
+ MUL.NE RM, Rq, Ra
+ MOVW.NE $0, Rs
+ MULAL.NE Rq, Ra, (Rq,Rs)
+ BEQ udiv_by_0_or_1
+
+ // q now accurate enough for a remainder r, 0<=r<3*d
+ MULLU Rq, Rr, (Rq,Rs) // q = (r * q) >> 32
+ ADD RM, Rr, Rr // r = n - d
+ MULA RM, Rq, Rr, Rr // r = n - (q+1)*d
+
+ // since 0 <= n-q*d < 3*d; thus -d <= r < 2*d
+ CMN RM, Rr // t = r-d
+ SUB.CS RM, Rr, Rr // if (t<-d || t>=0) r=r+d
+ ADD.CC $1, Rq
+ ADD.PL RM<<1, Rr
+ ADD.PL $2, Rq
+ RET
+
+// use hardware divider
+udiv_hardware:
+ DIVUHW Rq, Rr, Rs
+ MUL Rs, Rq, RM
+ RSB Rr, RM, Rr
+ MOVW Rs, Rq
+ RET
+
+udiv_by_large_d:
+ // at this point we know d>=2^(31-6)=2^25
+ SUB $4, Ra, Ra
+ RSB $0, Rs, Rs
+ MOVW Ra>>Rs, Rq
+ MULLU Rq, Rr, (Rq,Rs)
+ MULA RM, Rq, Rr, Rr
+
+ // q now accurate enough for a remainder r, 0<=r<4*d
+ CMN Rr>>1, RM // if(r/2 >= d)
+ ADD.CS RM<<1, Rr
+ ADD.CS $2, Rq
+ CMN Rr, RM
+ ADD.CS RM, Rr
+ ADD.CS $1, Rq
+ RET
+
+udiv_by_0_or_1:
+ // carry set if d==1, carry clear if d==0
+ BCC udiv_by_0
+ MOVW Rr, Rq
+ MOVW $0, Rr
+ RET
+
+udiv_by_0:
+ MOVW $runtime·panicdivide(SB), R11
+ B (R11)
+
+// var tab [64]byte
+// tab[0] = 255; for i := 1; i <= 63; i++ { tab[i] = (1<<14)/(64+i) }
+// laid out here as little-endian uint32s
+DATA fast_udiv_tab<>+0x00(SB)/4, $0xf4f8fcff
+DATA fast_udiv_tab<>+0x04(SB)/4, $0xe6eaedf0
+DATA fast_udiv_tab<>+0x08(SB)/4, $0xdadde0e3
+DATA fast_udiv_tab<>+0x0c(SB)/4, $0xcfd2d4d7
+DATA fast_udiv_tab<>+0x10(SB)/4, $0xc5c7cacc
+DATA fast_udiv_tab<>+0x14(SB)/4, $0xbcbec0c3
+DATA fast_udiv_tab<>+0x18(SB)/4, $0xb4b6b8ba
+DATA fast_udiv_tab<>+0x1c(SB)/4, $0xacaeb0b2
+DATA fast_udiv_tab<>+0x20(SB)/4, $0xa5a7a8aa
+DATA fast_udiv_tab<>+0x24(SB)/4, $0x9fa0a2a3
+DATA fast_udiv_tab<>+0x28(SB)/4, $0x999a9c9d
+DATA fast_udiv_tab<>+0x2c(SB)/4, $0x93949697
+DATA fast_udiv_tab<>+0x30(SB)/4, $0x8e8f9092
+DATA fast_udiv_tab<>+0x34(SB)/4, $0x898a8c8d
+DATA fast_udiv_tab<>+0x38(SB)/4, $0x85868788
+DATA fast_udiv_tab<>+0x3c(SB)/4, $0x81828384
+GLOBL fast_udiv_tab<>(SB), RODATA, $64
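
The table above holds 64 reciprocal estimates packed four bytes to a little-endian word, generated exactly as the comment says: tab[0] = 255 and tab[i] = (1<<14)/(64+i). A short editorial generator that reproduces those DATA directives:

package main

import "fmt"

func main() {
    var tab [64]byte
    tab[0] = 255
    for i := 1; i <= 63; i++ {
        tab[i] = byte((1 << 14) / (64 + i))
    }
    // Pack four bytes per word, little-endian, matching the DATA directives.
    for off := 0; off < 64; off += 4 {
        w := uint32(tab[off]) | uint32(tab[off+1])<<8 | uint32(tab[off+2])<<16 | uint32(tab[off+3])<<24
        fmt.Printf("DATA fast_udiv_tab<>+%#04x(SB)/4, $%#08x\n", off, w)
    }
}
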
+
+// The linker will pass numerator in R8
+#define Rn R8
+// The linker expects the result in RTMP
+#define RTMP R11
+
+TEXT runtime·_divu(SB), NOSPLIT, $16-0
+ // It's not strictly true that there are no local pointers.
+ // It could be that the saved registers Rq, Rr, Rs, and RM
+ // contain pointers. However, the only way this can matter
+ // is if the stack grows (which it can't, udiv is nosplit)
+ // or if a fault happens and more frames are added to
+ // the stack due to deferred functions.
+ // In the latter case, the stack can grow arbitrarily,
+ // and garbage collection can happen, and those
+ // operations care about pointers, but in that case
+ // the calling frame is dead, and so are the saved
+ // registers. So we can claim there are no pointers here.
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ BL runtime·udiv(SB)
+ MOVW Rq, RTMP
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_modu(SB), NOSPLIT, $16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ BL runtime·udiv(SB)
+ MOVW Rr, RTMP
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_div(SB),NOSPLIT,$16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ CMP $0, Rr
+ BGE d1
+ RSB $0, Rr, Rr
+ CMP $0, Rq
+ BGE d2
+ RSB $0, Rq, Rq
+d0:
+ BL runtime·udiv(SB) /* none/both neg */
+ MOVW Rq, RTMP
+ B out1
+d1:
+ CMP $0, Rq
+ BGE d0
+ RSB $0, Rq, Rq
+d2:
+ BL runtime·udiv(SB) /* one neg */
+ RSB $0, Rq, RTMP
+out1:
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+TEXT runtime·_mod(SB),NOSPLIT,$16-0
+ NO_LOCAL_POINTERS
+ MOVW Rq, 4(R13)
+ MOVW Rr, 8(R13)
+ MOVW Rs, 12(R13)
+ MOVW RM, 16(R13)
+ MOVW Rn, Rr /* numerator */
+ MOVW g_m(g), Rq
+ MOVW m_divmod(Rq), Rq /* denominator */
+ CMP $0, Rq
+ RSB.LT $0, Rq, Rq
+ CMP $0, Rr
+ BGE m1
+ RSB $0, Rr, Rr
+ BL runtime·udiv(SB) /* neg numerator */
+ RSB $0, Rr, RTMP
+ B out
+m1:
+ BL runtime·udiv(SB) /* pos numerator */
+ MOVW Rr, RTMP
+out:
+ MOVW 4(R13), Rq
+ MOVW 8(R13), Rr
+ MOVW 12(R13), Rs
+ MOVW 16(R13), RM
+ RET
+
+// _mul64by32 and _div64by32 not implemented on arm
+TEXT runtime·_mul64by32(SB), NOSPLIT, $0
+ MOVW $0, R0
+ MOVW (R0), R1 // crash
+
+TEXT runtime·_div64by32(SB), NOSPLIT, $0
+ MOVW $0, R0
+ MOVW (R0), R1 // crash
diff --git a/src/runtime/vlop_arm_test.go b/src/runtime/vlop_arm_test.go
new file mode 100644
index 0000000..015126a
--- /dev/null
+++ b/src/runtime/vlop_arm_test.go
@@ -0,0 +1,128 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime_test
+
+import (
+ "runtime"
+ "testing"
+)
+
+// arm soft division benchmarks adapted from
+// https://ridiculousfish.com/files/division_benchmarks.tar.gz
+
+const numeratorsSize = 1 << 21
+
+var numerators = randomNumerators()
+
+type randstate struct {
+ hi, lo uint32
+}
+
+func (r *randstate) rand() uint32 {
+ r.hi = r.hi<<16 + r.hi>>16
+ r.hi += r.lo
+ r.lo += r.hi
+ return r.hi
+}
+
+func randomNumerators() []uint32 {
+ numerators := make([]uint32, numeratorsSize)
+ random := &randstate{2147483563, 2147483563 ^ 0x49616E42}
+ for i := range numerators {
+ numerators[i] = random.rand()
+ }
+ return numerators
+}
+
+func bmUint32Div(divisor uint32, b *testing.B) {
+ var sum uint32
+ for i := 0; i < b.N; i++ {
+ sum += numerators[i&(numeratorsSize-1)] / divisor
+ }
+}
+
+func BenchmarkUint32Div7(b *testing.B) { bmUint32Div(7, b) }
+func BenchmarkUint32Div37(b *testing.B) { bmUint32Div(37, b) }
+func BenchmarkUint32Div123(b *testing.B) { bmUint32Div(123, b) }
+func BenchmarkUint32Div763(b *testing.B) { bmUint32Div(763, b) }
+func BenchmarkUint32Div1247(b *testing.B) { bmUint32Div(1247, b) }
+func BenchmarkUint32Div9305(b *testing.B) { bmUint32Div(9305, b) }
+func BenchmarkUint32Div13307(b *testing.B) { bmUint32Div(13307, b) }
+func BenchmarkUint32Div52513(b *testing.B) { bmUint32Div(52513, b) }
+func BenchmarkUint32Div60978747(b *testing.B) { bmUint32Div(60978747, b) }
+func BenchmarkUint32Div106956295(b *testing.B) { bmUint32Div(106956295, b) }
+
+func bmUint32Mod(divisor uint32, b *testing.B) {
+ var sum uint32
+ for i := 0; i < b.N; i++ {
+ sum += numerators[i&(numeratorsSize-1)] % divisor
+ }
+}
+
+func BenchmarkUint32Mod7(b *testing.B) { bmUint32Mod(7, b) }
+func BenchmarkUint32Mod37(b *testing.B) { bmUint32Mod(37, b) }
+func BenchmarkUint32Mod123(b *testing.B) { bmUint32Mod(123, b) }
+func BenchmarkUint32Mod763(b *testing.B) { bmUint32Mod(763, b) }
+func BenchmarkUint32Mod1247(b *testing.B) { bmUint32Mod(1247, b) }
+func BenchmarkUint32Mod9305(b *testing.B) { bmUint32Mod(9305, b) }
+func BenchmarkUint32Mod13307(b *testing.B) { bmUint32Mod(13307, b) }
+func BenchmarkUint32Mod52513(b *testing.B) { bmUint32Mod(52513, b) }
+func BenchmarkUint32Mod60978747(b *testing.B) { bmUint32Mod(60978747, b) }
+func BenchmarkUint32Mod106956295(b *testing.B) { bmUint32Mod(106956295, b) }
+
+func TestUsplit(t *testing.T) {
+ var den uint32 = 1000000
+ for _, x := range []uint32{0, 1, 999999, 1000000, 1010101, 0xFFFFFFFF} {
+ q1, r1 := runtime.Usplit(x)
+ q2, r2 := x/den, x%den
+ if q1 != q2 || r1 != r2 {
+ t.Errorf("%d/1e6, %d%%1e6 = %d, %d, want %d, %d", x, x, q1, r1, q2, r2)
+ }
+ }
+}
+
+//go:noinline
+func armFloatWrite(a *[129]float64) {
+ // This used to miscompile on arm5.
+ // The offset is too big to fit in a load.
+ // So the code does:
+ // ldr r0, [sp, #8]
+ // bl 6f690 <_sfloat>
+ // ldr fp, [pc, #32] ; (address of 128.0)
+ // vldr d0, [fp]
+ // ldr fp, [pc, #28] ; (1024)
+ // add fp, fp, r0
+ // vstr d0, [fp]
+ // The software floating-point emulator gives up on the add.
+ // This causes the store to not work.
+ // See issue 15440.
+ a[128] = 128.0
+}
+func TestArmFloatBigOffsetWrite(t *testing.T) {
+ var a [129]float64
+ for i := 0; i < 128; i++ {
+ a[i] = float64(i)
+ }
+ armFloatWrite(&a)
+ for i, x := range a {
+ if x != float64(i) {
+ t.Errorf("bad entry %d:%f\n", i, x)
+ }
+ }
+}
+
+//go:noinline
+func armFloatRead(a *[129]float64) float64 {
+ return a[128]
+}
+func TestArmFloatBigOffsetRead(t *testing.T) {
+ var a [129]float64
+ for i := 0; i < 129; i++ {
+ a[i] = float64(i)
+ }
+ if x := armFloatRead(&a); x != 128.0 {
+ t.Errorf("bad value %f\n", x)
+ }
+}
diff --git a/src/runtime/vlrt.go b/src/runtime/vlrt.go
new file mode 100644
index 0000000..4b12f59
--- /dev/null
+++ b/src/runtime/vlrt.go
@@ -0,0 +1,310 @@
+// Inferno's libkern/vlrt-arm.c
+// https://bitbucket.org/inferno-os/inferno-os/src/master/libkern/vlrt-arm.c
+//
+// Copyright © 1994-1999 Lucent Technologies Inc. All rights reserved.
+// Revisions Copyright © 2000-2007 Vita Nuova Holdings Limited (www.vitanuova.com). All rights reserved.
+// Portions Copyright 2009 The Go Authors. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in
+// all copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+// THE SOFTWARE.
+
+//go:build arm || 386 || mips || mipsle
+
+package runtime
+
+import "unsafe"
+
+const (
+ sign32 = 1 << (32 - 1)
+ sign64 = 1 << (64 - 1)
+)
+
+func float64toint64(d float64) (y uint64) {
+ _d2v(&y, d)
+ return
+}
+
+func float64touint64(d float64) (y uint64) {
+ _d2v(&y, d)
+ return
+}
+
+func int64tofloat64(y int64) float64 {
+ if y < 0 {
+ return -uint64tofloat64(-uint64(y))
+ }
+ return uint64tofloat64(uint64(y))
+}
+
+func uint64tofloat64(y uint64) float64 {
+ hi := float64(uint32(y >> 32))
+ lo := float64(uint32(y))
+ d := hi*(1<<32) + lo
+ return d
+}
+
+func int64tofloat32(y int64) float32 {
+ if y < 0 {
+ return -uint64tofloat32(-uint64(y))
+ }
+ return uint64tofloat32(uint64(y))
+}
+
+func uint64tofloat32(y uint64) float32 {
+ // divide into top 18, mid 23, and bottom 23 bits.
+ // (23-bit integers fit into a float32 without loss.)
+ top := uint32(y >> 46)
+ mid := uint32(y >> 23 & (1<<23 - 1))
+ bot := uint32(y & (1<<23 - 1))
+ if top == 0 {
+ return float32(mid)*(1<<23) + float32(bot)
+ }
+ if bot != 0 {
+ // Top is not zero, so the bits in bot
+ // won't make it into the final mantissa.
+ // In fact, the bottom bit of mid won't
+ // make it into the mantissa either.
+ // We only need to make sure that if top+mid
+ // is about to round down in a round-to-even
+ // scenario, and bot is not zero, we make it
+ // round up instead.
+ mid |= 1
+ }
+ return float32(top)*(1<<46) + float32(mid)*(1<<23)
+}
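
The 18/23/23 split works because each piece fits exactly in a float32 mantissa, and OR-ing a sticky bit into mid when low bits are discarded keeps round-to-nearest-even correct. An editorial cross-check against the native conversion (meant to be run on a platform that has one):

package main

import "fmt"

func toFloat32(y uint64) float32 {
    top := uint32(y >> 46)
    mid := uint32(y >> 23 & (1<<23 - 1))
    bot := uint32(y & (1<<23 - 1))
    if top == 0 {
        return float32(mid)*(1<<23) + float32(bot)
    }
    if bot != 0 {
        mid |= 1 // sticky bit: discarded low bits still influence rounding
    }
    return float32(top)*(1<<46) + float32(mid)*(1<<23)
}

func main() {
    for _, y := range []uint64{0, 1<<23 + 1, 1<<52 + 12345, 1<<63 + 1, ^uint64(0)} {
        fmt.Println(toFloat32(y) == float32(y)) // expect: all true
    }
}
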
+
+func _d2v(y *uint64, d float64) {
+ x := *(*uint64)(unsafe.Pointer(&d))
+
+ xhi := uint32(x>>32)&0xfffff | 0x100000
+ xlo := uint32(x)
+ sh := 1075 - int32(uint32(x>>52)&0x7ff)
+
+ var ylo, yhi uint32
+ if sh >= 0 {
+ sh := uint32(sh)
+ /* v = (hi||lo) >> sh */
+ if sh < 32 {
+ if sh == 0 {
+ ylo = xlo
+ yhi = xhi
+ } else {
+ ylo = xlo>>sh | xhi<<(32-sh)
+ yhi = xhi >> sh
+ }
+ } else {
+ if sh == 32 {
+ ylo = xhi
+ } else if sh < 64 {
+ ylo = xhi >> (sh - 32)
+ }
+ }
+ } else {
+ /* v = (hi||lo) << -sh */
+ sh := uint32(-sh)
+ if sh <= 11 {
+ ylo = xlo << sh
+ yhi = xhi<<sh | xlo>>(32-sh)
+ } else {
+ /* overflow */
+ yhi = uint32(d) /* causes something awful */
+ }
+ }
+ if x&sign64 != 0 {
+ if ylo != 0 {
+ ylo = -ylo
+ yhi = ^yhi
+ } else {
+ yhi = -yhi
+ }
+ }
+
+ *y = uint64(yhi)<<32 | uint64(ylo)
+}
+func uint64div(n, d uint64) uint64 {
+ // Check for 32 bit operands
+ if uint32(n>>32) == 0 && uint32(d>>32) == 0 {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ return uint64(uint32(n) / uint32(d))
+ }
+ q, _ := dodiv(n, d)
+ return q
+}
+
+func uint64mod(n, d uint64) uint64 {
+ // Check for 32 bit operands
+ if uint32(n>>32) == 0 && uint32(d>>32) == 0 {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ return uint64(uint32(n) % uint32(d))
+ }
+ _, r := dodiv(n, d)
+ return r
+}
+
+func int64div(n, d int64) int64 {
+ // Check for 32 bit operands
+ if int64(int32(n)) == n && int64(int32(d)) == d {
+ if int32(n) == -0x80000000 && int32(d) == -1 {
+ // special case: 32-bit -0x80000000 / -1 = -0x80000000,
+ // but 64-bit -0x80000000 / -1 = 0x80000000.
+ return 0x80000000
+ }
+ if int32(d) == 0 {
+ panicdivide()
+ }
+ return int64(int32(n) / int32(d))
+ }
+
+ nneg := n < 0
+ dneg := d < 0
+ if nneg {
+ n = -n
+ }
+ if dneg {
+ d = -d
+ }
+ uq, _ := dodiv(uint64(n), uint64(d))
+ q := int64(uq)
+ if nneg != dneg {
+ q = -q
+ }
+ return q
+}
+
+//go:nosplit
+func int64mod(n, d int64) int64 {
+ // Check for 32 bit operands
+ if int64(int32(n)) == n && int64(int32(d)) == d {
+ if int32(d) == 0 {
+ panicdivide()
+ }
+ return int64(int32(n) % int32(d))
+ }
+
+ nneg := n < 0
+ if nneg {
+ n = -n
+ }
+ if d < 0 {
+ d = -d
+ }
+ _, ur := dodiv(uint64(n), uint64(d))
+ r := int64(ur)
+ if nneg {
+ r = -r
+ }
+ return r
+}
+
+//go:noescape
+func _mul64by32(lo64 *uint64, a uint64, b uint32) (hi32 uint32)
+
+//go:noescape
+func _div64by32(a uint64, b uint32, r *uint32) (q uint32)
+
+//go:nosplit
+func dodiv(n, d uint64) (q, r uint64) {
+ if GOARCH == "arm" {
+ // arm doesn't have a division instruction, so
+ // slowdodiv is the best that we can do.
+ return slowdodiv(n, d)
+ }
+
+ if GOARCH == "mips" || GOARCH == "mipsle" {
+ // There is no _div64by32 on mips, and using only _mul64by32 doesn't bring much benefit.
+ return slowdodiv(n, d)
+ }
+
+ if d > n {
+ return 0, n
+ }
+
+ if uint32(d>>32) != 0 {
+ t := uint32(n>>32) / uint32(d>>32)
+ var lo64 uint64
+ hi32 := _mul64by32(&lo64, d, t)
+ if hi32 != 0 || lo64 > n {
+ return slowdodiv(n, d)
+ }
+ return uint64(t), n - lo64
+ }
+
+ // d is 32 bit
+ var qhi uint32
+ if uint32(n>>32) >= uint32(d) {
+ if uint32(d) == 0 {
+ panicdivide()
+ }
+ qhi = uint32(n>>32) / uint32(d)
+ n -= uint64(uint32(d)*qhi) << 32
+ } else {
+ qhi = 0
+ }
+
+ var rlo uint32
+ qlo := _div64by32(n, uint32(d), &rlo)
+ return uint64(qhi)<<32 + uint64(qlo), uint64(rlo)
+}
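
When the divisor fits in 32 bits, dodiv is schoolbook long division in base 2^32: the high quotient digit comes from dividing the high word, and _div64by32 then divides the remaining 64-bit value, whose high half is by then smaller than d. The same step can be sketched portably with math/bits (editorial; this is not what the runtime uses, since it calls the assembly helper):

package main

import (
    "fmt"
    "math/bits"
)

// div64by32 mimics the _div64by32 helper: it divides n by a 32-bit d,
// assuming the high 32 bits of n are already smaller than d.
func div64by32(n uint64, d uint32) (q, r uint32) {
    return bits.Div32(uint32(n>>32), uint32(n), d)
}

func main() {
    n, d := uint64(10_000_000_005), uint32(1_000_000)
    qhi := uint32(n>>32) / d // high base-2^32 digit of the quotient (0 here)
    n -= uint64(d*qhi) << 32
    qlo, r := div64by32(n, d)
    fmt.Println(uint64(qhi)<<32+uint64(qlo), r) // 10000 5
}
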
+
+//go:nosplit
+func slowdodiv(n, d uint64) (q, r uint64) {
+ if d == 0 {
+ panicdivide()
+ }
+
+ // Set up the divisor and find the number of iterations needed.
+ capn := n
+ if n >= sign64 {
+ capn = sign64
+ }
+ i := 0
+ for d < capn {
+ d <<= 1
+ i++
+ }
+
+ for ; i >= 0; i-- {
+ q <<= 1
+ if n >= d {
+ n -= d
+ q |= 1
+ }
+ d >>= 1
+ }
+ return q, n
+}
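
slowdodiv is plain restoring (shift-and-subtract) division: shift d up under n, then walk back down one bit at a time, emitting a quotient bit whenever the shifted divisor still fits. An editorial copy of the loop on ordinary values so it can be traced, e.g. for 100 / 7:

package main

import "fmt"

// slowDiv mirrors slowdodiv so the shift-and-subtract loop can be traced;
// d must be nonzero.
func slowDiv(n, d uint64) (q, r uint64) {
    const sign64 = 1 << 63
    capn := n
    if n >= sign64 {
        capn = sign64 // cap the shift so d<<1 cannot run past bit 63
    }
    i := 0
    for d < capn {
        d <<= 1
        i++
    }
    for ; i >= 0; i-- {
        q <<= 1
        if n >= d { // the shifted divisor still fits: emit a 1 bit and subtract
            n -= d
            q |= 1
        }
        d >>= 1
    }
    return q, n
}

func main() {
    fmt.Println(slowDiv(100, 7)) // 14 2
}
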
+
+// Floating point control word values.
+// Bits 0-5 are bits to disable floating-point exceptions.
+// Bits 8-9 are the precision control:
+//
+// 0 = single precision a.k.a. float32
+// 2 = double precision a.k.a. float64
+//
+// Bits 10-11 are the rounding mode:
+//
+// 0 = round to nearest (even on a tie)
+// 3 = round toward zero
+var (
+ controlWord64 uint16 = 0x3f + 2<<8 + 0<<10
+ controlWord64trunc uint16 = 0x3f + 2<<8 + 3<<10
+)
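
With the bit layout described above, the two constants come out to 0x23f (all exceptions masked, double precision, round to nearest) and 0xe3f (the same, but round toward zero, which is what float-to-int truncation needs). A quick editorial check of the arithmetic:

package main

import "fmt"

func main() {
    controlWord64 := uint16(0x3f + 2<<8 + 0<<10)      // exceptions masked, double precision, round to nearest
    controlWord64trunc := uint16(0x3f + 2<<8 + 3<<10) // same, but round toward zero
    fmt.Printf("%#x %#x\n", controlWord64, controlWord64trunc) // 0x23f 0xe3f
}
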
diff --git a/src/runtime/wincallback.go b/src/runtime/wincallback.go
new file mode 100644
index 0000000..14847db
--- /dev/null
+++ b/src/runtime/wincallback.go
@@ -0,0 +1,127 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build ignore
+
+// Generate Windows callback assembly file.
+
+package main
+
+import (
+ "bytes"
+ "fmt"
+ "os"
+)
+
+const maxCallback = 2000
+
+func genasm386Amd64() {
+ var buf bytes.Buffer
+
+ buf.WriteString(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+//go:build 386 || amd64
+
+#include "textflag.h"
+
+// runtime·callbackasm is called by external code to
+// execute a Go-implemented callback function. It is not
+// called from the start; instead, runtime·compilecallback
+// always returns an address at the appropriate offset into
+// runtime·callbackasm, so different callbacks start at a different
+// CALL instruction in runtime·callbackasm. This determines
+// which Go callback function is executed later on.
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+`)
+ for i := 0; i < maxCallback; i++ {
+ buf.WriteString("\tCALL\truntime·callbackasm1(SB)\n")
+ }
+
+ filename := "zcallback_windows.s"
+ err := os.WriteFile(filename, buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func genasmArm() {
+ var buf bytes.Buffer
+
+ buf.WriteString(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+`)
+ for i := 0; i < maxCallback; i++ {
+ fmt.Fprintf(&buf, "\tMOVW\t$%d, R12\n", i)
+ buf.WriteString("\tB\truntime·callbackasm1(SB)\n")
+ }
+
+ err := os.WriteFile("zcallback_windows_arm.s", buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func genasmArm64() {
+ var buf bytes.Buffer
+
+ buf.WriteString(`// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+`)
+ for i := 0; i < maxCallback; i++ {
+ fmt.Fprintf(&buf, "\tMOVD\t$%d, R12\n", i)
+ buf.WriteString("\tB\truntime·callbackasm1(SB)\n")
+ }
+
+ err := os.WriteFile("zcallback_windows_arm64.s", buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func gengo() {
+ var buf bytes.Buffer
+
+ fmt.Fprintf(&buf, `// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+package runtime
+
+const cb_max = %d // maximum number of windows callbacks allowed
+`, maxCallback)
+ err := os.WriteFile("zcallback_windows.go", buf.Bytes(), 0666)
+ if err != nil {
+ fmt.Fprintf(os.Stderr, "wincallback: %s\n", err)
+ os.Exit(2)
+ }
+}
+
+func main() {
+ genasm386Amd64()
+ genasmArm()
+ genasmArm64()
+ gengo()
+}
diff --git a/src/runtime/write_err.go b/src/runtime/write_err.go
new file mode 100644
index 0000000..81ae872
--- /dev/null
+++ b/src/runtime/write_err.go
@@ -0,0 +1,13 @@
+// Copyright 2009 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+//go:build !android
+
+package runtime
+
+import "unsafe"
+
+func writeErr(b []byte) {
+ write(2, unsafe.Pointer(&b[0]), int32(len(b)))
+}
diff --git a/src/runtime/write_err_android.go b/src/runtime/write_err_android.go
new file mode 100644
index 0000000..a876900
--- /dev/null
+++ b/src/runtime/write_err_android.go
@@ -0,0 +1,162 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+var (
+ writeHeader = []byte{6 /* ANDROID_LOG_ERROR */, 'G', 'o', 0}
+ writePath = []byte("/dev/log/main\x00")
+ writeLogd = []byte("/dev/socket/logdw\x00")
+
+ // guarded by printlock/printunlock.
+ writeFD uintptr
+ writeBuf [1024]byte
+ writePos int
+)
+
+// Prior to Android-L, logging was done through writes to /dev/log files implemented
+// in kernel ring buffers. In Android-L, those /dev/log files are no longer
+// accessible and logging is done through a centralized user-mode logger, logd.
+//
+// https://android.googlesource.com/platform/system/core/+/refs/tags/android-6.0.1_r78/liblog/logd_write.c
+type loggerType int32
+
+const (
+ unknown loggerType = iota
+ legacy
+ logd
+ // TODO(hakim): logging for emulator?
+)
+
+var logger loggerType
+
+func writeErr(b []byte) {
+ if logger == unknown {
+ // Use logd if /dev/socket/logdw is available.
+ if v := uintptr(access(&writeLogd[0], 0x02 /* W_OK */)); v == 0 {
+ logger = logd
+ initLogd()
+ } else {
+ logger = legacy
+ initLegacy()
+ }
+ }
+
+ // Write to stderr for command-line programs.
+ write(2, unsafe.Pointer(&b[0]), int32(len(b)))
+
+ // Log format: "<header>\x00<message m bytes>\x00"
+ //
+ // <header>
+ // In legacy mode: "<priority 1 byte><tag n bytes>".
+ // In logd mode: "<android_log_header_t 11 bytes><priority 1 byte><tag n bytes>"
+ //
+ // The entire log needs to be delivered in a single syscall (the NDK
+ // does this with writev). Each log is its own line, so we need to
+ // buffer writes until we see a newline.
+ var hlen int
+ switch logger {
+ case logd:
+ hlen = writeLogdHeader()
+ case legacy:
+ hlen = len(writeHeader)
+ }
+
+ dst := writeBuf[hlen:]
+ for _, v := range b {
+ if v == 0 { // android logging won't print a zero byte
+ v = '0'
+ }
+ dst[writePos] = v
+ writePos++
+ if v == '\n' || writePos == len(dst)-1 {
+ dst[writePos] = 0
+ write(writeFD, unsafe.Pointer(&writeBuf[0]), int32(hlen+writePos))
+ for i := range dst {
+ dst[i] = 0
+ }
+ writePos = 0
+ }
+ }
+}
+
+func initLegacy() {
+ // In legacy mode, logs are written to /dev/log/main
+ writeFD = uintptr(open(&writePath[0], 0x1 /* O_WRONLY */, 0))
+ if writeFD == 0 {
+ // It is hard to do anything here. Write to stderr just
+ // in case user has root on device and has run
+ // adb shell setprop log.redirect-stdio true
+ msg := []byte("runtime: cannot open /dev/log/main\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ exit(2)
+ }
+
+ // Prepopulate the invariant header part.
+ copy(writeBuf[:len(writeHeader)], writeHeader)
+}
+
+// used in initLogdWrite but defined here to avoid heap allocation.
+var logdAddr sockaddr_un
+
+func initLogd() {
+ // In logd mode, logs are sent to the logd via a unix domain socket.
+ logdAddr.family = _AF_UNIX
+ copy(logdAddr.path[:], writeLogd)
+
+ // We are not using non-blocking I/O because writes taking this path
+ // are most likely triggered by a panic. We see no advantage to
+ // non-blocking I/O for a panic, but a clear disadvantage (dropping the
+ // panic message), and blocking I/O simplifies the code a lot.
+ fd := socket(_AF_UNIX, _SOCK_DGRAM|_O_CLOEXEC, 0)
+ if fd < 0 {
+ msg := []byte("runtime: cannot create a socket for logging\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ exit(2)
+ }
+
+ errno := connect(fd, unsafe.Pointer(&logdAddr), int32(unsafe.Sizeof(logdAddr)))
+ if errno < 0 {
+ msg := []byte("runtime: cannot connect to /dev/socket/logdw\x00")
+ write(2, unsafe.Pointer(&msg[0]), int32(len(msg)))
+ // TODO(hakim): or should we just close fd and hope for better luck next time?
+ exit(2)
+ }
+ writeFD = uintptr(fd)
+
+ // Prepopulate invariant part of the header.
+ // The first 11 bytes will be populated later in writeLogdHeader.
+ copy(writeBuf[11:11+len(writeHeader)], writeHeader)
+}
+
+// writeLogdHeader populates the log header in writeBuf and returns its length.
+func writeLogdHeader() int {
+ hdr := writeBuf[:11]
+
+ // The first 11 bytes of the header correspond to android_log_header_t
+ // as defined in system/core/include/private/android_logger.h
+ // hdr[0] log type id (unsigned char), defined in <log/log.h>
+ // hdr[1:3] tid (uint16_t)
+ // hdr[3:11] log_time defined in <log/log_read.h>
+ // hdr[3:7] sec uint32, little endian.
+ // hdr[7:11] nsec uint32, little endian.
+ hdr[0] = 0 // LOG_ID_MAIN
+ sec, nsec, _ := time_now()
+ packUint32(hdr[3:7], uint32(sec))
+ packUint32(hdr[7:11], uint32(nsec))
+
+ // TODO(hakim): hdr[1:3] = gettid?
+
+ return 11 + len(writeHeader)
+}
+
+func packUint32(b []byte, v uint32) {
+ // little-endian.
+ b[0] = byte(v)
+ b[1] = byte(v >> 8)
+ b[2] = byte(v >> 16)
+ b[3] = byte(v >> 24)
+}
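
Putting the pieces together, a logd packet as assembled above is the 11-byte android_log_header_t (type id, tid, seconds, nanoseconds) followed by the priority byte, the NUL-terminated tag, and the NUL-terminated message. An editorial sketch of building such a header into a buffer under the layout assumed by writeLogdHeader (the timestamp and message values are invented):

package main

import "fmt"

func packUint32(b []byte, v uint32) {
    b[0] = byte(v)
    b[1] = byte(v >> 8)
    b[2] = byte(v >> 16)
    b[3] = byte(v >> 24)
}

func main() {
    var buf [1024]byte
    tag := []byte{6 /* ANDROID_LOG_ERROR */, 'G', 'o', 0}

    buf[0] = 0 // LOG_ID_MAIN
    // buf[1:3] would carry the writer's tid; the runtime currently leaves it zero.
    packUint32(buf[3:7], 1700000000) // seconds, little-endian (sample value)
    packUint32(buf[7:11], 123456789) // nanoseconds, little-endian (sample value)
    copy(buf[11:], tag)              // priority byte + tag + NUL

    hlen := 11 + len(tag)
    msg := "runtime: something went wrong\n"
    n := copy(buf[hlen:], msg)
    buf[hlen+n] = 0 // the message is NUL-terminated too

    fmt.Println(hlen, hlen+n+1) // header length and total packet length
}
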
diff --git a/src/runtime/zcallback_windows.go b/src/runtime/zcallback_windows.go
new file mode 100644
index 0000000..2c3cb28
--- /dev/null
+++ b/src/runtime/zcallback_windows.go
@@ -0,0 +1,5 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+package runtime
+
+const cb_max = 2000 // maximum number of windows callbacks allowed
diff --git a/src/runtime/zcallback_windows.s b/src/runtime/zcallback_windows.s
new file mode 100644
index 0000000..86d70d6
--- /dev/null
+++ b/src/runtime/zcallback_windows.s
@@ -0,0 +1,2015 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+//go:build 386 || amd64
+
+#include "textflag.h"
+
+// runtime·callbackasm is called by external code to
+// execute a Go-implemented callback function. It is not
+// called from the start; instead, runtime·compilecallback
+// always returns an address at the appropriate offset into
+// runtime·callbackasm, so different callbacks start at a different
+// CALL instruction in runtime·callbackasm. This determines
+// which Go callback function is executed later on.
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
+ CALL runtime·callbackasm1(SB)
diff --git a/src/runtime/zcallback_windows_arm.s b/src/runtime/zcallback_windows_arm.s
new file mode 100644
index 0000000..f943d84
--- /dev/null
+++ b/src/runtime/zcallback_windows_arm.s
@@ -0,0 +1,4012 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
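(For orientation only: a minimal Go sketch of how a stub table like the one below can be emitted, in the spirit of the wincallback.go generator named in the header above. The entry count and file handling here are illustrative assumptions, not the generator's actual code.)

    package main

    import (
    	"bytes"
    	"fmt"
    	"os"
    )

    func main() {
    	// Illustrative entry count; the real generator defines its own limit.
    	const maxCallback = 2000
    	var buf bytes.Buffer
    	buf.WriteString("// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.\n\n")
    	buf.WriteString("#include \"textflag.h\"\n\n")
    	buf.WriteString("TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0\n")
    	for i := 0; i < maxCallback; i++ {
    		// Each fixed-size entry loads the callback index into R12 and
    		// branches to the shared handler, so an entry's offset within
    		// callbackasm encodes which callback was invoked.
    		fmt.Fprintf(&buf, "\tMOVW\t$%d, R12\n\tB\truntime·callbackasm1(SB)\n", i)
    	}
    	if err := os.WriteFile("zcallback_windows_arm.s", buf.Bytes(), 0666); err != nil {
    		panic(err)
    	}
    }
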
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+ MOVW $0, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1, R12
+ B runtime·callbackasm1(SB)
+ MOVW $2, R12
+ B runtime·callbackasm1(SB)
+ MOVW $3, R12
+ B runtime·callbackasm1(SB)
+ MOVW $4, R12
+ B runtime·callbackasm1(SB)
+ MOVW $5, R12
+ B runtime·callbackasm1(SB)
+ MOVW $6, R12
+ B runtime·callbackasm1(SB)
+ MOVW $7, R12
+ B runtime·callbackasm1(SB)
+ MOVW $8, R12
+ B runtime·callbackasm1(SB)
+ MOVW $9, R12
+ B runtime·callbackasm1(SB)
+ MOVW $10, R12
+ B runtime·callbackasm1(SB)
+ MOVW $11, R12
+ B runtime·callbackasm1(SB)
+ MOVW $12, R12
+ B runtime·callbackasm1(SB)
+ MOVW $13, R12
+ B runtime·callbackasm1(SB)
+ MOVW $14, R12
+ B runtime·callbackasm1(SB)
+ MOVW $15, R12
+ B runtime·callbackasm1(SB)
+ MOVW $16, R12
+ B runtime·callbackasm1(SB)
+ MOVW $17, R12
+ B runtime·callbackasm1(SB)
+ MOVW $18, R12
+ B runtime·callbackasm1(SB)
+ MOVW $19, R12
+ B runtime·callbackasm1(SB)
+ MOVW $20, R12
+ B runtime·callbackasm1(SB)
+ MOVW $21, R12
+ B runtime·callbackasm1(SB)
+ MOVW $22, R12
+ B runtime·callbackasm1(SB)
+ MOVW $23, R12
+ B runtime·callbackasm1(SB)
+ MOVW $24, R12
+ B runtime·callbackasm1(SB)
+ MOVW $25, R12
+ B runtime·callbackasm1(SB)
+ MOVW $26, R12
+ B runtime·callbackasm1(SB)
+ MOVW $27, R12
+ B runtime·callbackasm1(SB)
+ MOVW $28, R12
+ B runtime·callbackasm1(SB)
+ MOVW $29, R12
+ B runtime·callbackasm1(SB)
+ MOVW $30, R12
+ B runtime·callbackasm1(SB)
+ MOVW $31, R12
+ B runtime·callbackasm1(SB)
+ MOVW $32, R12
+ B runtime·callbackasm1(SB)
+ MOVW $33, R12
+ B runtime·callbackasm1(SB)
+ MOVW $34, R12
+ B runtime·callbackasm1(SB)
+ MOVW $35, R12
+ B runtime·callbackasm1(SB)
+ MOVW $36, R12
+ B runtime·callbackasm1(SB)
+ MOVW $37, R12
+ B runtime·callbackasm1(SB)
+ MOVW $38, R12
+ B runtime·callbackasm1(SB)
+ MOVW $39, R12
+ B runtime·callbackasm1(SB)
+ MOVW $40, R12
+ B runtime·callbackasm1(SB)
+ MOVW $41, R12
+ B runtime·callbackasm1(SB)
+ MOVW $42, R12
+ B runtime·callbackasm1(SB)
+ MOVW $43, R12
+ B runtime·callbackasm1(SB)
+ MOVW $44, R12
+ B runtime·callbackasm1(SB)
+ MOVW $45, R12
+ B runtime·callbackasm1(SB)
+ MOVW $46, R12
+ B runtime·callbackasm1(SB)
+ MOVW $47, R12
+ B runtime·callbackasm1(SB)
+ MOVW $48, R12
+ B runtime·callbackasm1(SB)
+ MOVW $49, R12
+ B runtime·callbackasm1(SB)
+ MOVW $50, R12
+ B runtime·callbackasm1(SB)
+ MOVW $51, R12
+ B runtime·callbackasm1(SB)
+ MOVW $52, R12
+ B runtime·callbackasm1(SB)
+ MOVW $53, R12
+ B runtime·callbackasm1(SB)
+ MOVW $54, R12
+ B runtime·callbackasm1(SB)
+ MOVW $55, R12
+ B runtime·callbackasm1(SB)
+ MOVW $56, R12
+ B runtime·callbackasm1(SB)
+ MOVW $57, R12
+ B runtime·callbackasm1(SB)
+ MOVW $58, R12
+ B runtime·callbackasm1(SB)
+ MOVW $59, R12
+ B runtime·callbackasm1(SB)
+ MOVW $60, R12
+ B runtime·callbackasm1(SB)
+ MOVW $61, R12
+ B runtime·callbackasm1(SB)
+ MOVW $62, R12
+ B runtime·callbackasm1(SB)
+ MOVW $63, R12
+ B runtime·callbackasm1(SB)
+ MOVW $64, R12
+ B runtime·callbackasm1(SB)
+ MOVW $65, R12
+ B runtime·callbackasm1(SB)
+ MOVW $66, R12
+ B runtime·callbackasm1(SB)
+ MOVW $67, R12
+ B runtime·callbackasm1(SB)
+ MOVW $68, R12
+ B runtime·callbackasm1(SB)
+ MOVW $69, R12
+ B runtime·callbackasm1(SB)
+ MOVW $70, R12
+ B runtime·callbackasm1(SB)
+ MOVW $71, R12
+ B runtime·callbackasm1(SB)
+ MOVW $72, R12
+ B runtime·callbackasm1(SB)
+ MOVW $73, R12
+ B runtime·callbackasm1(SB)
+ MOVW $74, R12
+ B runtime·callbackasm1(SB)
+ MOVW $75, R12
+ B runtime·callbackasm1(SB)
+ MOVW $76, R12
+ B runtime·callbackasm1(SB)
+ MOVW $77, R12
+ B runtime·callbackasm1(SB)
+ MOVW $78, R12
+ B runtime·callbackasm1(SB)
+ MOVW $79, R12
+ B runtime·callbackasm1(SB)
+ MOVW $80, R12
+ B runtime·callbackasm1(SB)
+ MOVW $81, R12
+ B runtime·callbackasm1(SB)
+ MOVW $82, R12
+ B runtime·callbackasm1(SB)
+ MOVW $83, R12
+ B runtime·callbackasm1(SB)
+ MOVW $84, R12
+ B runtime·callbackasm1(SB)
+ MOVW $85, R12
+ B runtime·callbackasm1(SB)
+ MOVW $86, R12
+ B runtime·callbackasm1(SB)
+ MOVW $87, R12
+ B runtime·callbackasm1(SB)
+ MOVW $88, R12
+ B runtime·callbackasm1(SB)
+ MOVW $89, R12
+ B runtime·callbackasm1(SB)
+ MOVW $90, R12
+ B runtime·callbackasm1(SB)
+ MOVW $91, R12
+ B runtime·callbackasm1(SB)
+ MOVW $92, R12
+ B runtime·callbackasm1(SB)
+ MOVW $93, R12
+ B runtime·callbackasm1(SB)
+ MOVW $94, R12
+ B runtime·callbackasm1(SB)
+ MOVW $95, R12
+ B runtime·callbackasm1(SB)
+ MOVW $96, R12
+ B runtime·callbackasm1(SB)
+ MOVW $97, R12
+ B runtime·callbackasm1(SB)
+ MOVW $98, R12
+ B runtime·callbackasm1(SB)
+ MOVW $99, R12
+ B runtime·callbackasm1(SB)
+ MOVW $100, R12
+ B runtime·callbackasm1(SB)
+ MOVW $101, R12
+ B runtime·callbackasm1(SB)
+ MOVW $102, R12
+ B runtime·callbackasm1(SB)
+ MOVW $103, R12
+ B runtime·callbackasm1(SB)
+ MOVW $104, R12
+ B runtime·callbackasm1(SB)
+ MOVW $105, R12
+ B runtime·callbackasm1(SB)
+ MOVW $106, R12
+ B runtime·callbackasm1(SB)
+ MOVW $107, R12
+ B runtime·callbackasm1(SB)
+ MOVW $108, R12
+ B runtime·callbackasm1(SB)
+ MOVW $109, R12
+ B runtime·callbackasm1(SB)
+ MOVW $110, R12
+ B runtime·callbackasm1(SB)
+ MOVW $111, R12
+ B runtime·callbackasm1(SB)
+ MOVW $112, R12
+ B runtime·callbackasm1(SB)
+ MOVW $113, R12
+ B runtime·callbackasm1(SB)
+ MOVW $114, R12
+ B runtime·callbackasm1(SB)
+ MOVW $115, R12
+ B runtime·callbackasm1(SB)
+ MOVW $116, R12
+ B runtime·callbackasm1(SB)
+ MOVW $117, R12
+ B runtime·callbackasm1(SB)
+ MOVW $118, R12
+ B runtime·callbackasm1(SB)
+ MOVW $119, R12
+ B runtime·callbackasm1(SB)
+ MOVW $120, R12
+ B runtime·callbackasm1(SB)
+ MOVW $121, R12
+ B runtime·callbackasm1(SB)
+ MOVW $122, R12
+ B runtime·callbackasm1(SB)
+ MOVW $123, R12
+ B runtime·callbackasm1(SB)
+ MOVW $124, R12
+ B runtime·callbackasm1(SB)
+ MOVW $125, R12
+ B runtime·callbackasm1(SB)
+ MOVW $126, R12
+ B runtime·callbackasm1(SB)
+ MOVW $127, R12
+ B runtime·callbackasm1(SB)
+ MOVW $128, R12
+ B runtime·callbackasm1(SB)
+ MOVW $129, R12
+ B runtime·callbackasm1(SB)
+ MOVW $130, R12
+ B runtime·callbackasm1(SB)
+ MOVW $131, R12
+ B runtime·callbackasm1(SB)
+ MOVW $132, R12
+ B runtime·callbackasm1(SB)
+ MOVW $133, R12
+ B runtime·callbackasm1(SB)
+ MOVW $134, R12
+ B runtime·callbackasm1(SB)
+ MOVW $135, R12
+ B runtime·callbackasm1(SB)
+ MOVW $136, R12
+ B runtime·callbackasm1(SB)
+ MOVW $137, R12
+ B runtime·callbackasm1(SB)
+ MOVW $138, R12
+ B runtime·callbackasm1(SB)
+ MOVW $139, R12
+ B runtime·callbackasm1(SB)
+ MOVW $140, R12
+ B runtime·callbackasm1(SB)
+ MOVW $141, R12
+ B runtime·callbackasm1(SB)
+ MOVW $142, R12
+ B runtime·callbackasm1(SB)
+ MOVW $143, R12
+ B runtime·callbackasm1(SB)
+ MOVW $144, R12
+ B runtime·callbackasm1(SB)
+ MOVW $145, R12
+ B runtime·callbackasm1(SB)
+ MOVW $146, R12
+ B runtime·callbackasm1(SB)
+ MOVW $147, R12
+ B runtime·callbackasm1(SB)
+ MOVW $148, R12
+ B runtime·callbackasm1(SB)
+ MOVW $149, R12
+ B runtime·callbackasm1(SB)
+ MOVW $150, R12
+ B runtime·callbackasm1(SB)
+ MOVW $151, R12
+ B runtime·callbackasm1(SB)
+ MOVW $152, R12
+ B runtime·callbackasm1(SB)
+ MOVW $153, R12
+ B runtime·callbackasm1(SB)
+ MOVW $154, R12
+ B runtime·callbackasm1(SB)
+ MOVW $155, R12
+ B runtime·callbackasm1(SB)
+ MOVW $156, R12
+ B runtime·callbackasm1(SB)
+ MOVW $157, R12
+ B runtime·callbackasm1(SB)
+ MOVW $158, R12
+ B runtime·callbackasm1(SB)
+ MOVW $159, R12
+ B runtime·callbackasm1(SB)
+ MOVW $160, R12
+ B runtime·callbackasm1(SB)
+ MOVW $161, R12
+ B runtime·callbackasm1(SB)
+ MOVW $162, R12
+ B runtime·callbackasm1(SB)
+ MOVW $163, R12
+ B runtime·callbackasm1(SB)
+ MOVW $164, R12
+ B runtime·callbackasm1(SB)
+ MOVW $165, R12
+ B runtime·callbackasm1(SB)
+ MOVW $166, R12
+ B runtime·callbackasm1(SB)
+ MOVW $167, R12
+ B runtime·callbackasm1(SB)
+ MOVW $168, R12
+ B runtime·callbackasm1(SB)
+ MOVW $169, R12
+ B runtime·callbackasm1(SB)
+ MOVW $170, R12
+ B runtime·callbackasm1(SB)
+ MOVW $171, R12
+ B runtime·callbackasm1(SB)
+ MOVW $172, R12
+ B runtime·callbackasm1(SB)
+ MOVW $173, R12
+ B runtime·callbackasm1(SB)
+ MOVW $174, R12
+ B runtime·callbackasm1(SB)
+ MOVW $175, R12
+ B runtime·callbackasm1(SB)
+ MOVW $176, R12
+ B runtime·callbackasm1(SB)
+ MOVW $177, R12
+ B runtime·callbackasm1(SB)
+ MOVW $178, R12
+ B runtime·callbackasm1(SB)
+ MOVW $179, R12
+ B runtime·callbackasm1(SB)
+ MOVW $180, R12
+ B runtime·callbackasm1(SB)
+ MOVW $181, R12
+ B runtime·callbackasm1(SB)
+ MOVW $182, R12
+ B runtime·callbackasm1(SB)
+ MOVW $183, R12
+ B runtime·callbackasm1(SB)
+ MOVW $184, R12
+ B runtime·callbackasm1(SB)
+ MOVW $185, R12
+ B runtime·callbackasm1(SB)
+ MOVW $186, R12
+ B runtime·callbackasm1(SB)
+ MOVW $187, R12
+ B runtime·callbackasm1(SB)
+ MOVW $188, R12
+ B runtime·callbackasm1(SB)
+ MOVW $189, R12
+ B runtime·callbackasm1(SB)
+ MOVW $190, R12
+ B runtime·callbackasm1(SB)
+ MOVW $191, R12
+ B runtime·callbackasm1(SB)
+ MOVW $192, R12
+ B runtime·callbackasm1(SB)
+ MOVW $193, R12
+ B runtime·callbackasm1(SB)
+ MOVW $194, R12
+ B runtime·callbackasm1(SB)
+ MOVW $195, R12
+ B runtime·callbackasm1(SB)
+ MOVW $196, R12
+ B runtime·callbackasm1(SB)
+ MOVW $197, R12
+ B runtime·callbackasm1(SB)
+ MOVW $198, R12
+ B runtime·callbackasm1(SB)
+ MOVW $199, R12
+ B runtime·callbackasm1(SB)
+ MOVW $200, R12
+ B runtime·callbackasm1(SB)
+ MOVW $201, R12
+ B runtime·callbackasm1(SB)
+ MOVW $202, R12
+ B runtime·callbackasm1(SB)
+ MOVW $203, R12
+ B runtime·callbackasm1(SB)
+ MOVW $204, R12
+ B runtime·callbackasm1(SB)
+ MOVW $205, R12
+ B runtime·callbackasm1(SB)
+ MOVW $206, R12
+ B runtime·callbackasm1(SB)
+ MOVW $207, R12
+ B runtime·callbackasm1(SB)
+ MOVW $208, R12
+ B runtime·callbackasm1(SB)
+ MOVW $209, R12
+ B runtime·callbackasm1(SB)
+ MOVW $210, R12
+ B runtime·callbackasm1(SB)
+ MOVW $211, R12
+ B runtime·callbackasm1(SB)
+ MOVW $212, R12
+ B runtime·callbackasm1(SB)
+ MOVW $213, R12
+ B runtime·callbackasm1(SB)
+ MOVW $214, R12
+ B runtime·callbackasm1(SB)
+ MOVW $215, R12
+ B runtime·callbackasm1(SB)
+ MOVW $216, R12
+ B runtime·callbackasm1(SB)
+ MOVW $217, R12
+ B runtime·callbackasm1(SB)
+ MOVW $218, R12
+ B runtime·callbackasm1(SB)
+ MOVW $219, R12
+ B runtime·callbackasm1(SB)
+ MOVW $220, R12
+ B runtime·callbackasm1(SB)
+ MOVW $221, R12
+ B runtime·callbackasm1(SB)
+ MOVW $222, R12
+ B runtime·callbackasm1(SB)
+ MOVW $223, R12
+ B runtime·callbackasm1(SB)
+ MOVW $224, R12
+ B runtime·callbackasm1(SB)
+ MOVW $225, R12
+ B runtime·callbackasm1(SB)
+ MOVW $226, R12
+ B runtime·callbackasm1(SB)
+ MOVW $227, R12
+ B runtime·callbackasm1(SB)
+ MOVW $228, R12
+ B runtime·callbackasm1(SB)
+ MOVW $229, R12
+ B runtime·callbackasm1(SB)
+ MOVW $230, R12
+ B runtime·callbackasm1(SB)
+ MOVW $231, R12
+ B runtime·callbackasm1(SB)
+ MOVW $232, R12
+ B runtime·callbackasm1(SB)
+ MOVW $233, R12
+ B runtime·callbackasm1(SB)
+ MOVW $234, R12
+ B runtime·callbackasm1(SB)
+ MOVW $235, R12
+ B runtime·callbackasm1(SB)
+ MOVW $236, R12
+ B runtime·callbackasm1(SB)
+ MOVW $237, R12
+ B runtime·callbackasm1(SB)
+ MOVW $238, R12
+ B runtime·callbackasm1(SB)
+ MOVW $239, R12
+ B runtime·callbackasm1(SB)
+ MOVW $240, R12
+ B runtime·callbackasm1(SB)
+ MOVW $241, R12
+ B runtime·callbackasm1(SB)
+ MOVW $242, R12
+ B runtime·callbackasm1(SB)
+ MOVW $243, R12
+ B runtime·callbackasm1(SB)
+ MOVW $244, R12
+ B runtime·callbackasm1(SB)
+ MOVW $245, R12
+ B runtime·callbackasm1(SB)
+ MOVW $246, R12
+ B runtime·callbackasm1(SB)
+ MOVW $247, R12
+ B runtime·callbackasm1(SB)
+ MOVW $248, R12
+ B runtime·callbackasm1(SB)
+ MOVW $249, R12
+ B runtime·callbackasm1(SB)
+ MOVW $250, R12
+ B runtime·callbackasm1(SB)
+ MOVW $251, R12
+ B runtime·callbackasm1(SB)
+ MOVW $252, R12
+ B runtime·callbackasm1(SB)
+ MOVW $253, R12
+ B runtime·callbackasm1(SB)
+ MOVW $254, R12
+ B runtime·callbackasm1(SB)
+ MOVW $255, R12
+ B runtime·callbackasm1(SB)
+ MOVW $256, R12
+ B runtime·callbackasm1(SB)
+ MOVW $257, R12
+ B runtime·callbackasm1(SB)
+ MOVW $258, R12
+ B runtime·callbackasm1(SB)
+ MOVW $259, R12
+ B runtime·callbackasm1(SB)
+ MOVW $260, R12
+ B runtime·callbackasm1(SB)
+ MOVW $261, R12
+ B runtime·callbackasm1(SB)
+ MOVW $262, R12
+ B runtime·callbackasm1(SB)
+ MOVW $263, R12
+ B runtime·callbackasm1(SB)
+ MOVW $264, R12
+ B runtime·callbackasm1(SB)
+ MOVW $265, R12
+ B runtime·callbackasm1(SB)
+ MOVW $266, R12
+ B runtime·callbackasm1(SB)
+ MOVW $267, R12
+ B runtime·callbackasm1(SB)
+ MOVW $268, R12
+ B runtime·callbackasm1(SB)
+ MOVW $269, R12
+ B runtime·callbackasm1(SB)
+ MOVW $270, R12
+ B runtime·callbackasm1(SB)
+ MOVW $271, R12
+ B runtime·callbackasm1(SB)
+ MOVW $272, R12
+ B runtime·callbackasm1(SB)
+ MOVW $273, R12
+ B runtime·callbackasm1(SB)
+ MOVW $274, R12
+ B runtime·callbackasm1(SB)
+ MOVW $275, R12
+ B runtime·callbackasm1(SB)
+ MOVW $276, R12
+ B runtime·callbackasm1(SB)
+ MOVW $277, R12
+ B runtime·callbackasm1(SB)
+ MOVW $278, R12
+ B runtime·callbackasm1(SB)
+ MOVW $279, R12
+ B runtime·callbackasm1(SB)
+ MOVW $280, R12
+ B runtime·callbackasm1(SB)
+ MOVW $281, R12
+ B runtime·callbackasm1(SB)
+ MOVW $282, R12
+ B runtime·callbackasm1(SB)
+ MOVW $283, R12
+ B runtime·callbackasm1(SB)
+ MOVW $284, R12
+ B runtime·callbackasm1(SB)
+ MOVW $285, R12
+ B runtime·callbackasm1(SB)
+ MOVW $286, R12
+ B runtime·callbackasm1(SB)
+ MOVW $287, R12
+ B runtime·callbackasm1(SB)
+ MOVW $288, R12
+ B runtime·callbackasm1(SB)
+ MOVW $289, R12
+ B runtime·callbackasm1(SB)
+ MOVW $290, R12
+ B runtime·callbackasm1(SB)
+ MOVW $291, R12
+ B runtime·callbackasm1(SB)
+ MOVW $292, R12
+ B runtime·callbackasm1(SB)
+ MOVW $293, R12
+ B runtime·callbackasm1(SB)
+ MOVW $294, R12
+ B runtime·callbackasm1(SB)
+ MOVW $295, R12
+ B runtime·callbackasm1(SB)
+ MOVW $296, R12
+ B runtime·callbackasm1(SB)
+ MOVW $297, R12
+ B runtime·callbackasm1(SB)
+ MOVW $298, R12
+ B runtime·callbackasm1(SB)
+ MOVW $299, R12
+ B runtime·callbackasm1(SB)
+ MOVW $300, R12
+ B runtime·callbackasm1(SB)
+ MOVW $301, R12
+ B runtime·callbackasm1(SB)
+ MOVW $302, R12
+ B runtime·callbackasm1(SB)
+ MOVW $303, R12
+ B runtime·callbackasm1(SB)
+ MOVW $304, R12
+ B runtime·callbackasm1(SB)
+ MOVW $305, R12
+ B runtime·callbackasm1(SB)
+ MOVW $306, R12
+ B runtime·callbackasm1(SB)
+ MOVW $307, R12
+ B runtime·callbackasm1(SB)
+ MOVW $308, R12
+ B runtime·callbackasm1(SB)
+ MOVW $309, R12
+ B runtime·callbackasm1(SB)
+ MOVW $310, R12
+ B runtime·callbackasm1(SB)
+ MOVW $311, R12
+ B runtime·callbackasm1(SB)
+ MOVW $312, R12
+ B runtime·callbackasm1(SB)
+ MOVW $313, R12
+ B runtime·callbackasm1(SB)
+ MOVW $314, R12
+ B runtime·callbackasm1(SB)
+ MOVW $315, R12
+ B runtime·callbackasm1(SB)
+ MOVW $316, R12
+ B runtime·callbackasm1(SB)
+ MOVW $317, R12
+ B runtime·callbackasm1(SB)
+ MOVW $318, R12
+ B runtime·callbackasm1(SB)
+ MOVW $319, R12
+ B runtime·callbackasm1(SB)
+ MOVW $320, R12
+ B runtime·callbackasm1(SB)
+ MOVW $321, R12
+ B runtime·callbackasm1(SB)
+ MOVW $322, R12
+ B runtime·callbackasm1(SB)
+ MOVW $323, R12
+ B runtime·callbackasm1(SB)
+ MOVW $324, R12
+ B runtime·callbackasm1(SB)
+ MOVW $325, R12
+ B runtime·callbackasm1(SB)
+ MOVW $326, R12
+ B runtime·callbackasm1(SB)
+ MOVW $327, R12
+ B runtime·callbackasm1(SB)
+ MOVW $328, R12
+ B runtime·callbackasm1(SB)
+ MOVW $329, R12
+ B runtime·callbackasm1(SB)
+ MOVW $330, R12
+ B runtime·callbackasm1(SB)
+ MOVW $331, R12
+ B runtime·callbackasm1(SB)
+ MOVW $332, R12
+ B runtime·callbackasm1(SB)
+ MOVW $333, R12
+ B runtime·callbackasm1(SB)
+ MOVW $334, R12
+ B runtime·callbackasm1(SB)
+ MOVW $335, R12
+ B runtime·callbackasm1(SB)
+ MOVW $336, R12
+ B runtime·callbackasm1(SB)
+ MOVW $337, R12
+ B runtime·callbackasm1(SB)
+ MOVW $338, R12
+ B runtime·callbackasm1(SB)
+ MOVW $339, R12
+ B runtime·callbackasm1(SB)
+ MOVW $340, R12
+ B runtime·callbackasm1(SB)
+ MOVW $341, R12
+ B runtime·callbackasm1(SB)
+ MOVW $342, R12
+ B runtime·callbackasm1(SB)
+ MOVW $343, R12
+ B runtime·callbackasm1(SB)
+ MOVW $344, R12
+ B runtime·callbackasm1(SB)
+ MOVW $345, R12
+ B runtime·callbackasm1(SB)
+ MOVW $346, R12
+ B runtime·callbackasm1(SB)
+ MOVW $347, R12
+ B runtime·callbackasm1(SB)
+ MOVW $348, R12
+ B runtime·callbackasm1(SB)
+ MOVW $349, R12
+ B runtime·callbackasm1(SB)
+ MOVW $350, R12
+ B runtime·callbackasm1(SB)
+ MOVW $351, R12
+ B runtime·callbackasm1(SB)
+ MOVW $352, R12
+ B runtime·callbackasm1(SB)
+ MOVW $353, R12
+ B runtime·callbackasm1(SB)
+ MOVW $354, R12
+ B runtime·callbackasm1(SB)
+ MOVW $355, R12
+ B runtime·callbackasm1(SB)
+ MOVW $356, R12
+ B runtime·callbackasm1(SB)
+ MOVW $357, R12
+ B runtime·callbackasm1(SB)
+ MOVW $358, R12
+ B runtime·callbackasm1(SB)
+ MOVW $359, R12
+ B runtime·callbackasm1(SB)
+ MOVW $360, R12
+ B runtime·callbackasm1(SB)
+ MOVW $361, R12
+ B runtime·callbackasm1(SB)
+ MOVW $362, R12
+ B runtime·callbackasm1(SB)
+ MOVW $363, R12
+ B runtime·callbackasm1(SB)
+ MOVW $364, R12
+ B runtime·callbackasm1(SB)
+ MOVW $365, R12
+ B runtime·callbackasm1(SB)
+ MOVW $366, R12
+ B runtime·callbackasm1(SB)
+ MOVW $367, R12
+ B runtime·callbackasm1(SB)
+ MOVW $368, R12
+ B runtime·callbackasm1(SB)
+ MOVW $369, R12
+ B runtime·callbackasm1(SB)
+ MOVW $370, R12
+ B runtime·callbackasm1(SB)
+ MOVW $371, R12
+ B runtime·callbackasm1(SB)
+ MOVW $372, R12
+ B runtime·callbackasm1(SB)
+ MOVW $373, R12
+ B runtime·callbackasm1(SB)
+ MOVW $374, R12
+ B runtime·callbackasm1(SB)
+ MOVW $375, R12
+ B runtime·callbackasm1(SB)
+ MOVW $376, R12
+ B runtime·callbackasm1(SB)
+ MOVW $377, R12
+ B runtime·callbackasm1(SB)
+ MOVW $378, R12
+ B runtime·callbackasm1(SB)
+ MOVW $379, R12
+ B runtime·callbackasm1(SB)
+ MOVW $380, R12
+ B runtime·callbackasm1(SB)
+ MOVW $381, R12
+ B runtime·callbackasm1(SB)
+ MOVW $382, R12
+ B runtime·callbackasm1(SB)
+ MOVW $383, R12
+ B runtime·callbackasm1(SB)
+ MOVW $384, R12
+ B runtime·callbackasm1(SB)
+ MOVW $385, R12
+ B runtime·callbackasm1(SB)
+ MOVW $386, R12
+ B runtime·callbackasm1(SB)
+ MOVW $387, R12
+ B runtime·callbackasm1(SB)
+ MOVW $388, R12
+ B runtime·callbackasm1(SB)
+ MOVW $389, R12
+ B runtime·callbackasm1(SB)
+ MOVW $390, R12
+ B runtime·callbackasm1(SB)
+ MOVW $391, R12
+ B runtime·callbackasm1(SB)
+ MOVW $392, R12
+ B runtime·callbackasm1(SB)
+ MOVW $393, R12
+ B runtime·callbackasm1(SB)
+ MOVW $394, R12
+ B runtime·callbackasm1(SB)
+ MOVW $395, R12
+ B runtime·callbackasm1(SB)
+ MOVW $396, R12
+ B runtime·callbackasm1(SB)
+ MOVW $397, R12
+ B runtime·callbackasm1(SB)
+ MOVW $398, R12
+ B runtime·callbackasm1(SB)
+ MOVW $399, R12
+ B runtime·callbackasm1(SB)
+ MOVW $400, R12
+ B runtime·callbackasm1(SB)
+ MOVW $401, R12
+ B runtime·callbackasm1(SB)
+ MOVW $402, R12
+ B runtime·callbackasm1(SB)
+ MOVW $403, R12
+ B runtime·callbackasm1(SB)
+ MOVW $404, R12
+ B runtime·callbackasm1(SB)
+ MOVW $405, R12
+ B runtime·callbackasm1(SB)
+ MOVW $406, R12
+ B runtime·callbackasm1(SB)
+ MOVW $407, R12
+ B runtime·callbackasm1(SB)
+ MOVW $408, R12
+ B runtime·callbackasm1(SB)
+ MOVW $409, R12
+ B runtime·callbackasm1(SB)
+ MOVW $410, R12
+ B runtime·callbackasm1(SB)
+ MOVW $411, R12
+ B runtime·callbackasm1(SB)
+ MOVW $412, R12
+ B runtime·callbackasm1(SB)
+ MOVW $413, R12
+ B runtime·callbackasm1(SB)
+ MOVW $414, R12
+ B runtime·callbackasm1(SB)
+ MOVW $415, R12
+ B runtime·callbackasm1(SB)
+ MOVW $416, R12
+ B runtime·callbackasm1(SB)
+ MOVW $417, R12
+ B runtime·callbackasm1(SB)
+ MOVW $418, R12
+ B runtime·callbackasm1(SB)
+ MOVW $419, R12
+ B runtime·callbackasm1(SB)
+ MOVW $420, R12
+ B runtime·callbackasm1(SB)
+ MOVW $421, R12
+ B runtime·callbackasm1(SB)
+ MOVW $422, R12
+ B runtime·callbackasm1(SB)
+ MOVW $423, R12
+ B runtime·callbackasm1(SB)
+ MOVW $424, R12
+ B runtime·callbackasm1(SB)
+ MOVW $425, R12
+ B runtime·callbackasm1(SB)
+ MOVW $426, R12
+ B runtime·callbackasm1(SB)
+ MOVW $427, R12
+ B runtime·callbackasm1(SB)
+ MOVW $428, R12
+ B runtime·callbackasm1(SB)
+ MOVW $429, R12
+ B runtime·callbackasm1(SB)
+ MOVW $430, R12
+ B runtime·callbackasm1(SB)
+ MOVW $431, R12
+ B runtime·callbackasm1(SB)
+ MOVW $432, R12
+ B runtime·callbackasm1(SB)
+ MOVW $433, R12
+ B runtime·callbackasm1(SB)
+ MOVW $434, R12
+ B runtime·callbackasm1(SB)
+ MOVW $435, R12
+ B runtime·callbackasm1(SB)
+ MOVW $436, R12
+ B runtime·callbackasm1(SB)
+ MOVW $437, R12
+ B runtime·callbackasm1(SB)
+ MOVW $438, R12
+ B runtime·callbackasm1(SB)
+ MOVW $439, R12
+ B runtime·callbackasm1(SB)
+ MOVW $440, R12
+ B runtime·callbackasm1(SB)
+ MOVW $441, R12
+ B runtime·callbackasm1(SB)
+ MOVW $442, R12
+ B runtime·callbackasm1(SB)
+ MOVW $443, R12
+ B runtime·callbackasm1(SB)
+ MOVW $444, R12
+ B runtime·callbackasm1(SB)
+ MOVW $445, R12
+ B runtime·callbackasm1(SB)
+ MOVW $446, R12
+ B runtime·callbackasm1(SB)
+ MOVW $447, R12
+ B runtime·callbackasm1(SB)
+ MOVW $448, R12
+ B runtime·callbackasm1(SB)
+ MOVW $449, R12
+ B runtime·callbackasm1(SB)
+ MOVW $450, R12
+ B runtime·callbackasm1(SB)
+ MOVW $451, R12
+ B runtime·callbackasm1(SB)
+ MOVW $452, R12
+ B runtime·callbackasm1(SB)
+ MOVW $453, R12
+ B runtime·callbackasm1(SB)
+ MOVW $454, R12
+ B runtime·callbackasm1(SB)
+ MOVW $455, R12
+ B runtime·callbackasm1(SB)
+ MOVW $456, R12
+ B runtime·callbackasm1(SB)
+ MOVW $457, R12
+ B runtime·callbackasm1(SB)
+ MOVW $458, R12
+ B runtime·callbackasm1(SB)
+ MOVW $459, R12
+ B runtime·callbackasm1(SB)
+ MOVW $460, R12
+ B runtime·callbackasm1(SB)
+ MOVW $461, R12
+ B runtime·callbackasm1(SB)
+ MOVW $462, R12
+ B runtime·callbackasm1(SB)
+ MOVW $463, R12
+ B runtime·callbackasm1(SB)
+ MOVW $464, R12
+ B runtime·callbackasm1(SB)
+ MOVW $465, R12
+ B runtime·callbackasm1(SB)
+ MOVW $466, R12
+ B runtime·callbackasm1(SB)
+ MOVW $467, R12
+ B runtime·callbackasm1(SB)
+ MOVW $468, R12
+ B runtime·callbackasm1(SB)
+ MOVW $469, R12
+ B runtime·callbackasm1(SB)
+ MOVW $470, R12
+ B runtime·callbackasm1(SB)
+ MOVW $471, R12
+ B runtime·callbackasm1(SB)
+ MOVW $472, R12
+ B runtime·callbackasm1(SB)
+ MOVW $473, R12
+ B runtime·callbackasm1(SB)
+ MOVW $474, R12
+ B runtime·callbackasm1(SB)
+ MOVW $475, R12
+ B runtime·callbackasm1(SB)
+ MOVW $476, R12
+ B runtime·callbackasm1(SB)
+ MOVW $477, R12
+ B runtime·callbackasm1(SB)
+ MOVW $478, R12
+ B runtime·callbackasm1(SB)
+ MOVW $479, R12
+ B runtime·callbackasm1(SB)
+ MOVW $480, R12
+ B runtime·callbackasm1(SB)
+ MOVW $481, R12
+ B runtime·callbackasm1(SB)
+ MOVW $482, R12
+ B runtime·callbackasm1(SB)
+ MOVW $483, R12
+ B runtime·callbackasm1(SB)
+ MOVW $484, R12
+ B runtime·callbackasm1(SB)
+ MOVW $485, R12
+ B runtime·callbackasm1(SB)
+ MOVW $486, R12
+ B runtime·callbackasm1(SB)
+ MOVW $487, R12
+ B runtime·callbackasm1(SB)
+ MOVW $488, R12
+ B runtime·callbackasm1(SB)
+ MOVW $489, R12
+ B runtime·callbackasm1(SB)
+ MOVW $490, R12
+ B runtime·callbackasm1(SB)
+ MOVW $491, R12
+ B runtime·callbackasm1(SB)
+ MOVW $492, R12
+ B runtime·callbackasm1(SB)
+ MOVW $493, R12
+ B runtime·callbackasm1(SB)
+ MOVW $494, R12
+ B runtime·callbackasm1(SB)
+ MOVW $495, R12
+ B runtime·callbackasm1(SB)
+ MOVW $496, R12
+ B runtime·callbackasm1(SB)
+ MOVW $497, R12
+ B runtime·callbackasm1(SB)
+ MOVW $498, R12
+ B runtime·callbackasm1(SB)
+ MOVW $499, R12
+ B runtime·callbackasm1(SB)
+ MOVW $500, R12
+ B runtime·callbackasm1(SB)
+ MOVW $501, R12
+ B runtime·callbackasm1(SB)
+ MOVW $502, R12
+ B runtime·callbackasm1(SB)
+ MOVW $503, R12
+ B runtime·callbackasm1(SB)
+ MOVW $504, R12
+ B runtime·callbackasm1(SB)
+ MOVW $505, R12
+ B runtime·callbackasm1(SB)
+ MOVW $506, R12
+ B runtime·callbackasm1(SB)
+ MOVW $507, R12
+ B runtime·callbackasm1(SB)
+ MOVW $508, R12
+ B runtime·callbackasm1(SB)
+ MOVW $509, R12
+ B runtime·callbackasm1(SB)
+ MOVW $510, R12
+ B runtime·callbackasm1(SB)
+ MOVW $511, R12
+ B runtime·callbackasm1(SB)
+ MOVW $512, R12
+ B runtime·callbackasm1(SB)
+ MOVW $513, R12
+ B runtime·callbackasm1(SB)
+ MOVW $514, R12
+ B runtime·callbackasm1(SB)
+ MOVW $515, R12
+ B runtime·callbackasm1(SB)
+ MOVW $516, R12
+ B runtime·callbackasm1(SB)
+ MOVW $517, R12
+ B runtime·callbackasm1(SB)
+ MOVW $518, R12
+ B runtime·callbackasm1(SB)
+ MOVW $519, R12
+ B runtime·callbackasm1(SB)
+ MOVW $520, R12
+ B runtime·callbackasm1(SB)
+ MOVW $521, R12
+ B runtime·callbackasm1(SB)
+ MOVW $522, R12
+ B runtime·callbackasm1(SB)
+ MOVW $523, R12
+ B runtime·callbackasm1(SB)
+ MOVW $524, R12
+ B runtime·callbackasm1(SB)
+ MOVW $525, R12
+ B runtime·callbackasm1(SB)
+ MOVW $526, R12
+ B runtime·callbackasm1(SB)
+ MOVW $527, R12
+ B runtime·callbackasm1(SB)
+ MOVW $528, R12
+ B runtime·callbackasm1(SB)
+ MOVW $529, R12
+ B runtime·callbackasm1(SB)
+ MOVW $530, R12
+ B runtime·callbackasm1(SB)
+ MOVW $531, R12
+ B runtime·callbackasm1(SB)
+ MOVW $532, R12
+ B runtime·callbackasm1(SB)
+ MOVW $533, R12
+ B runtime·callbackasm1(SB)
+ MOVW $534, R12
+ B runtime·callbackasm1(SB)
+ MOVW $535, R12
+ B runtime·callbackasm1(SB)
+ MOVW $536, R12
+ B runtime·callbackasm1(SB)
+ MOVW $537, R12
+ B runtime·callbackasm1(SB)
+ MOVW $538, R12
+ B runtime·callbackasm1(SB)
+ MOVW $539, R12
+ B runtime·callbackasm1(SB)
+ MOVW $540, R12
+ B runtime·callbackasm1(SB)
+ MOVW $541, R12
+ B runtime·callbackasm1(SB)
+ MOVW $542, R12
+ B runtime·callbackasm1(SB)
+ MOVW $543, R12
+ B runtime·callbackasm1(SB)
+ MOVW $544, R12
+ B runtime·callbackasm1(SB)
+ MOVW $545, R12
+ B runtime·callbackasm1(SB)
+ MOVW $546, R12
+ B runtime·callbackasm1(SB)
+ MOVW $547, R12
+ B runtime·callbackasm1(SB)
+ MOVW $548, R12
+ B runtime·callbackasm1(SB)
+ MOVW $549, R12
+ B runtime·callbackasm1(SB)
+ MOVW $550, R12
+ B runtime·callbackasm1(SB)
+ MOVW $551, R12
+ B runtime·callbackasm1(SB)
+ MOVW $552, R12
+ B runtime·callbackasm1(SB)
+ MOVW $553, R12
+ B runtime·callbackasm1(SB)
+ MOVW $554, R12
+ B runtime·callbackasm1(SB)
+ MOVW $555, R12
+ B runtime·callbackasm1(SB)
+ MOVW $556, R12
+ B runtime·callbackasm1(SB)
+ MOVW $557, R12
+ B runtime·callbackasm1(SB)
+ MOVW $558, R12
+ B runtime·callbackasm1(SB)
+ MOVW $559, R12
+ B runtime·callbackasm1(SB)
+ MOVW $560, R12
+ B runtime·callbackasm1(SB)
+ MOVW $561, R12
+ B runtime·callbackasm1(SB)
+ MOVW $562, R12
+ B runtime·callbackasm1(SB)
+ MOVW $563, R12
+ B runtime·callbackasm1(SB)
+ MOVW $564, R12
+ B runtime·callbackasm1(SB)
+ MOVW $565, R12
+ B runtime·callbackasm1(SB)
+ MOVW $566, R12
+ B runtime·callbackasm1(SB)
+ MOVW $567, R12
+ B runtime·callbackasm1(SB)
+ MOVW $568, R12
+ B runtime·callbackasm1(SB)
+ MOVW $569, R12
+ B runtime·callbackasm1(SB)
+ MOVW $570, R12
+ B runtime·callbackasm1(SB)
+ MOVW $571, R12
+ B runtime·callbackasm1(SB)
+ MOVW $572, R12
+ B runtime·callbackasm1(SB)
+ MOVW $573, R12
+ B runtime·callbackasm1(SB)
+ MOVW $574, R12
+ B runtime·callbackasm1(SB)
+ MOVW $575, R12
+ B runtime·callbackasm1(SB)
+ MOVW $576, R12
+ B runtime·callbackasm1(SB)
+ MOVW $577, R12
+ B runtime·callbackasm1(SB)
+ MOVW $578, R12
+ B runtime·callbackasm1(SB)
+ MOVW $579, R12
+ B runtime·callbackasm1(SB)
+ MOVW $580, R12
+ B runtime·callbackasm1(SB)
+ MOVW $581, R12
+ B runtime·callbackasm1(SB)
+ MOVW $582, R12
+ B runtime·callbackasm1(SB)
+ MOVW $583, R12
+ B runtime·callbackasm1(SB)
+ MOVW $584, R12
+ B runtime·callbackasm1(SB)
+ MOVW $585, R12
+ B runtime·callbackasm1(SB)
+ MOVW $586, R12
+ B runtime·callbackasm1(SB)
+ MOVW $587, R12
+ B runtime·callbackasm1(SB)
+ MOVW $588, R12
+ B runtime·callbackasm1(SB)
+ MOVW $589, R12
+ B runtime·callbackasm1(SB)
+ MOVW $590, R12
+ B runtime·callbackasm1(SB)
+ MOVW $591, R12
+ B runtime·callbackasm1(SB)
+ MOVW $592, R12
+ B runtime·callbackasm1(SB)
+ MOVW $593, R12
+ B runtime·callbackasm1(SB)
+ MOVW $594, R12
+ B runtime·callbackasm1(SB)
+ MOVW $595, R12
+ B runtime·callbackasm1(SB)
+ MOVW $596, R12
+ B runtime·callbackasm1(SB)
+ MOVW $597, R12
+ B runtime·callbackasm1(SB)
+ MOVW $598, R12
+ B runtime·callbackasm1(SB)
+ MOVW $599, R12
+ B runtime·callbackasm1(SB)
+ MOVW $600, R12
+ B runtime·callbackasm1(SB)
+ MOVW $601, R12
+ B runtime·callbackasm1(SB)
+ MOVW $602, R12
+ B runtime·callbackasm1(SB)
+ MOVW $603, R12
+ B runtime·callbackasm1(SB)
+ MOVW $604, R12
+ B runtime·callbackasm1(SB)
+ MOVW $605, R12
+ B runtime·callbackasm1(SB)
+ MOVW $606, R12
+ B runtime·callbackasm1(SB)
+ MOVW $607, R12
+ B runtime·callbackasm1(SB)
+ MOVW $608, R12
+ B runtime·callbackasm1(SB)
+ MOVW $609, R12
+ B runtime·callbackasm1(SB)
+ MOVW $610, R12
+ B runtime·callbackasm1(SB)
+ MOVW $611, R12
+ B runtime·callbackasm1(SB)
+ MOVW $612, R12
+ B runtime·callbackasm1(SB)
+ MOVW $613, R12
+ B runtime·callbackasm1(SB)
+ MOVW $614, R12
+ B runtime·callbackasm1(SB)
+ MOVW $615, R12
+ B runtime·callbackasm1(SB)
+ MOVW $616, R12
+ B runtime·callbackasm1(SB)
+ MOVW $617, R12
+ B runtime·callbackasm1(SB)
+ MOVW $618, R12
+ B runtime·callbackasm1(SB)
+ MOVW $619, R12
+ B runtime·callbackasm1(SB)
+ MOVW $620, R12
+ B runtime·callbackasm1(SB)
+ MOVW $621, R12
+ B runtime·callbackasm1(SB)
+ MOVW $622, R12
+ B runtime·callbackasm1(SB)
+ MOVW $623, R12
+ B runtime·callbackasm1(SB)
+ MOVW $624, R12
+ B runtime·callbackasm1(SB)
+ MOVW $625, R12
+ B runtime·callbackasm1(SB)
+ MOVW $626, R12
+ B runtime·callbackasm1(SB)
+ MOVW $627, R12
+ B runtime·callbackasm1(SB)
+ MOVW $628, R12
+ B runtime·callbackasm1(SB)
+ MOVW $629, R12
+ B runtime·callbackasm1(SB)
+ MOVW $630, R12
+ B runtime·callbackasm1(SB)
+ MOVW $631, R12
+ B runtime·callbackasm1(SB)
+ MOVW $632, R12
+ B runtime·callbackasm1(SB)
+ MOVW $633, R12
+ B runtime·callbackasm1(SB)
+ MOVW $634, R12
+ B runtime·callbackasm1(SB)
+ MOVW $635, R12
+ B runtime·callbackasm1(SB)
+ MOVW $636, R12
+ B runtime·callbackasm1(SB)
+ MOVW $637, R12
+ B runtime·callbackasm1(SB)
+ MOVW $638, R12
+ B runtime·callbackasm1(SB)
+ MOVW $639, R12
+ B runtime·callbackasm1(SB)
+ MOVW $640, R12
+ B runtime·callbackasm1(SB)
+ MOVW $641, R12
+ B runtime·callbackasm1(SB)
+ MOVW $642, R12
+ B runtime·callbackasm1(SB)
+ MOVW $643, R12
+ B runtime·callbackasm1(SB)
+ MOVW $644, R12
+ B runtime·callbackasm1(SB)
+ MOVW $645, R12
+ B runtime·callbackasm1(SB)
+ MOVW $646, R12
+ B runtime·callbackasm1(SB)
+ MOVW $647, R12
+ B runtime·callbackasm1(SB)
+ MOVW $648, R12
+ B runtime·callbackasm1(SB)
+ MOVW $649, R12
+ B runtime·callbackasm1(SB)
+ MOVW $650, R12
+ B runtime·callbackasm1(SB)
+ MOVW $651, R12
+ B runtime·callbackasm1(SB)
+ MOVW $652, R12
+ B runtime·callbackasm1(SB)
+ MOVW $653, R12
+ B runtime·callbackasm1(SB)
+ MOVW $654, R12
+ B runtime·callbackasm1(SB)
+ MOVW $655, R12
+ B runtime·callbackasm1(SB)
+ MOVW $656, R12
+ B runtime·callbackasm1(SB)
+ MOVW $657, R12
+ B runtime·callbackasm1(SB)
+ MOVW $658, R12
+ B runtime·callbackasm1(SB)
+ MOVW $659, R12
+ B runtime·callbackasm1(SB)
+ MOVW $660, R12
+ B runtime·callbackasm1(SB)
+ MOVW $661, R12
+ B runtime·callbackasm1(SB)
+ MOVW $662, R12
+ B runtime·callbackasm1(SB)
+ MOVW $663, R12
+ B runtime·callbackasm1(SB)
+ MOVW $664, R12
+ B runtime·callbackasm1(SB)
+ MOVW $665, R12
+ B runtime·callbackasm1(SB)
+ MOVW $666, R12
+ B runtime·callbackasm1(SB)
+ MOVW $667, R12
+ B runtime·callbackasm1(SB)
+ MOVW $668, R12
+ B runtime·callbackasm1(SB)
+ MOVW $669, R12
+ B runtime·callbackasm1(SB)
+ MOVW $670, R12
+ B runtime·callbackasm1(SB)
+ MOVW $671, R12
+ B runtime·callbackasm1(SB)
+ MOVW $672, R12
+ B runtime·callbackasm1(SB)
+ MOVW $673, R12
+ B runtime·callbackasm1(SB)
+ MOVW $674, R12
+ B runtime·callbackasm1(SB)
+ MOVW $675, R12
+ B runtime·callbackasm1(SB)
+ MOVW $676, R12
+ B runtime·callbackasm1(SB)
+ MOVW $677, R12
+ B runtime·callbackasm1(SB)
+ MOVW $678, R12
+ B runtime·callbackasm1(SB)
+ MOVW $679, R12
+ B runtime·callbackasm1(SB)
+ MOVW $680, R12
+ B runtime·callbackasm1(SB)
+ MOVW $681, R12
+ B runtime·callbackasm1(SB)
+ MOVW $682, R12
+ B runtime·callbackasm1(SB)
+ MOVW $683, R12
+ B runtime·callbackasm1(SB)
+ MOVW $684, R12
+ B runtime·callbackasm1(SB)
+ MOVW $685, R12
+ B runtime·callbackasm1(SB)
+ MOVW $686, R12
+ B runtime·callbackasm1(SB)
+ MOVW $687, R12
+ B runtime·callbackasm1(SB)
+ MOVW $688, R12
+ B runtime·callbackasm1(SB)
+ MOVW $689, R12
+ B runtime·callbackasm1(SB)
+ MOVW $690, R12
+ B runtime·callbackasm1(SB)
+ MOVW $691, R12
+ B runtime·callbackasm1(SB)
+ MOVW $692, R12
+ B runtime·callbackasm1(SB)
+ MOVW $693, R12
+ B runtime·callbackasm1(SB)
+ MOVW $694, R12
+ B runtime·callbackasm1(SB)
+ MOVW $695, R12
+ B runtime·callbackasm1(SB)
+ MOVW $696, R12
+ B runtime·callbackasm1(SB)
+ MOVW $697, R12
+ B runtime·callbackasm1(SB)
+ MOVW $698, R12
+ B runtime·callbackasm1(SB)
+ MOVW $699, R12
+ B runtime·callbackasm1(SB)
+ MOVW $700, R12
+ B runtime·callbackasm1(SB)
+ MOVW $701, R12
+ B runtime·callbackasm1(SB)
+ MOVW $702, R12
+ B runtime·callbackasm1(SB)
+ MOVW $703, R12
+ B runtime·callbackasm1(SB)
+ MOVW $704, R12
+ B runtime·callbackasm1(SB)
+ MOVW $705, R12
+ B runtime·callbackasm1(SB)
+ MOVW $706, R12
+ B runtime·callbackasm1(SB)
+ MOVW $707, R12
+ B runtime·callbackasm1(SB)
+ MOVW $708, R12
+ B runtime·callbackasm1(SB)
+ MOVW $709, R12
+ B runtime·callbackasm1(SB)
+ MOVW $710, R12
+ B runtime·callbackasm1(SB)
+ MOVW $711, R12
+ B runtime·callbackasm1(SB)
+ MOVW $712, R12
+ B runtime·callbackasm1(SB)
+ MOVW $713, R12
+ B runtime·callbackasm1(SB)
+ MOVW $714, R12
+ B runtime·callbackasm1(SB)
+ MOVW $715, R12
+ B runtime·callbackasm1(SB)
+ MOVW $716, R12
+ B runtime·callbackasm1(SB)
+ MOVW $717, R12
+ B runtime·callbackasm1(SB)
+ MOVW $718, R12
+ B runtime·callbackasm1(SB)
+ MOVW $719, R12
+ B runtime·callbackasm1(SB)
+ MOVW $720, R12
+ B runtime·callbackasm1(SB)
+ MOVW $721, R12
+ B runtime·callbackasm1(SB)
+ MOVW $722, R12
+ B runtime·callbackasm1(SB)
+ MOVW $723, R12
+ B runtime·callbackasm1(SB)
+ MOVW $724, R12
+ B runtime·callbackasm1(SB)
+ MOVW $725, R12
+ B runtime·callbackasm1(SB)
+ MOVW $726, R12
+ B runtime·callbackasm1(SB)
+ MOVW $727, R12
+ B runtime·callbackasm1(SB)
+ MOVW $728, R12
+ B runtime·callbackasm1(SB)
+ MOVW $729, R12
+ B runtime·callbackasm1(SB)
+ MOVW $730, R12
+ B runtime·callbackasm1(SB)
+ MOVW $731, R12
+ B runtime·callbackasm1(SB)
+ MOVW $732, R12
+ B runtime·callbackasm1(SB)
+ MOVW $733, R12
+ B runtime·callbackasm1(SB)
+ MOVW $734, R12
+ B runtime·callbackasm1(SB)
+ MOVW $735, R12
+ B runtime·callbackasm1(SB)
+ MOVW $736, R12
+ B runtime·callbackasm1(SB)
+ MOVW $737, R12
+ B runtime·callbackasm1(SB)
+ MOVW $738, R12
+ B runtime·callbackasm1(SB)
+ MOVW $739, R12
+ B runtime·callbackasm1(SB)
+ MOVW $740, R12
+ B runtime·callbackasm1(SB)
+ MOVW $741, R12
+ B runtime·callbackasm1(SB)
+ MOVW $742, R12
+ B runtime·callbackasm1(SB)
+ MOVW $743, R12
+ B runtime·callbackasm1(SB)
+ MOVW $744, R12
+ B runtime·callbackasm1(SB)
+ MOVW $745, R12
+ B runtime·callbackasm1(SB)
+ MOVW $746, R12
+ B runtime·callbackasm1(SB)
+ MOVW $747, R12
+ B runtime·callbackasm1(SB)
+ MOVW $748, R12
+ B runtime·callbackasm1(SB)
+ MOVW $749, R12
+ B runtime·callbackasm1(SB)
+ MOVW $750, R12
+ B runtime·callbackasm1(SB)
+ MOVW $751, R12
+ B runtime·callbackasm1(SB)
+ MOVW $752, R12
+ B runtime·callbackasm1(SB)
+ MOVW $753, R12
+ B runtime·callbackasm1(SB)
+ MOVW $754, R12
+ B runtime·callbackasm1(SB)
+ MOVW $755, R12
+ B runtime·callbackasm1(SB)
+ MOVW $756, R12
+ B runtime·callbackasm1(SB)
+ MOVW $757, R12
+ B runtime·callbackasm1(SB)
+ MOVW $758, R12
+ B runtime·callbackasm1(SB)
+ MOVW $759, R12
+ B runtime·callbackasm1(SB)
+ MOVW $760, R12
+ B runtime·callbackasm1(SB)
+ MOVW $761, R12
+ B runtime·callbackasm1(SB)
+ MOVW $762, R12
+ B runtime·callbackasm1(SB)
+ MOVW $763, R12
+ B runtime·callbackasm1(SB)
+ MOVW $764, R12
+ B runtime·callbackasm1(SB)
+ MOVW $765, R12
+ B runtime·callbackasm1(SB)
+ MOVW $766, R12
+ B runtime·callbackasm1(SB)
+ MOVW $767, R12
+ B runtime·callbackasm1(SB)
+ MOVW $768, R12
+ B runtime·callbackasm1(SB)
+ MOVW $769, R12
+ B runtime·callbackasm1(SB)
+ MOVW $770, R12
+ B runtime·callbackasm1(SB)
+ MOVW $771, R12
+ B runtime·callbackasm1(SB)
+ MOVW $772, R12
+ B runtime·callbackasm1(SB)
+ MOVW $773, R12
+ B runtime·callbackasm1(SB)
+ MOVW $774, R12
+ B runtime·callbackasm1(SB)
+ MOVW $775, R12
+ B runtime·callbackasm1(SB)
+ MOVW $776, R12
+ B runtime·callbackasm1(SB)
+ MOVW $777, R12
+ B runtime·callbackasm1(SB)
+ MOVW $778, R12
+ B runtime·callbackasm1(SB)
+ MOVW $779, R12
+ B runtime·callbackasm1(SB)
+ MOVW $780, R12
+ B runtime·callbackasm1(SB)
+ MOVW $781, R12
+ B runtime·callbackasm1(SB)
+ MOVW $782, R12
+ B runtime·callbackasm1(SB)
+ MOVW $783, R12
+ B runtime·callbackasm1(SB)
+ MOVW $784, R12
+ B runtime·callbackasm1(SB)
+ MOVW $785, R12
+ B runtime·callbackasm1(SB)
+ MOVW $786, R12
+ B runtime·callbackasm1(SB)
+ MOVW $787, R12
+ B runtime·callbackasm1(SB)
+ MOVW $788, R12
+ B runtime·callbackasm1(SB)
+ MOVW $789, R12
+ B runtime·callbackasm1(SB)
+ MOVW $790, R12
+ B runtime·callbackasm1(SB)
+ MOVW $791, R12
+ B runtime·callbackasm1(SB)
+ MOVW $792, R12
+ B runtime·callbackasm1(SB)
+ MOVW $793, R12
+ B runtime·callbackasm1(SB)
+ MOVW $794, R12
+ B runtime·callbackasm1(SB)
+ MOVW $795, R12
+ B runtime·callbackasm1(SB)
+ MOVW $796, R12
+ B runtime·callbackasm1(SB)
+ MOVW $797, R12
+ B runtime·callbackasm1(SB)
+ MOVW $798, R12
+ B runtime·callbackasm1(SB)
+ MOVW $799, R12
+ B runtime·callbackasm1(SB)
+ MOVW $800, R12
+ B runtime·callbackasm1(SB)
+ MOVW $801, R12
+ B runtime·callbackasm1(SB)
+ MOVW $802, R12
+ B runtime·callbackasm1(SB)
+ MOVW $803, R12
+ B runtime·callbackasm1(SB)
+ MOVW $804, R12
+ B runtime·callbackasm1(SB)
+ MOVW $805, R12
+ B runtime·callbackasm1(SB)
+ MOVW $806, R12
+ B runtime·callbackasm1(SB)
+ MOVW $807, R12
+ B runtime·callbackasm1(SB)
+ MOVW $808, R12
+ B runtime·callbackasm1(SB)
+ MOVW $809, R12
+ B runtime·callbackasm1(SB)
+ MOVW $810, R12
+ B runtime·callbackasm1(SB)
+ MOVW $811, R12
+ B runtime·callbackasm1(SB)
+ MOVW $812, R12
+ B runtime·callbackasm1(SB)
+ MOVW $813, R12
+ B runtime·callbackasm1(SB)
+ MOVW $814, R12
+ B runtime·callbackasm1(SB)
+ MOVW $815, R12
+ B runtime·callbackasm1(SB)
+ MOVW $816, R12
+ B runtime·callbackasm1(SB)
+ MOVW $817, R12
+ B runtime·callbackasm1(SB)
+ MOVW $818, R12
+ B runtime·callbackasm1(SB)
+ MOVW $819, R12
+ B runtime·callbackasm1(SB)
+ MOVW $820, R12
+ B runtime·callbackasm1(SB)
+ MOVW $821, R12
+ B runtime·callbackasm1(SB)
+ MOVW $822, R12
+ B runtime·callbackasm1(SB)
+ MOVW $823, R12
+ B runtime·callbackasm1(SB)
+ MOVW $824, R12
+ B runtime·callbackasm1(SB)
+ MOVW $825, R12
+ B runtime·callbackasm1(SB)
+ MOVW $826, R12
+ B runtime·callbackasm1(SB)
+ MOVW $827, R12
+ B runtime·callbackasm1(SB)
+ MOVW $828, R12
+ B runtime·callbackasm1(SB)
+ MOVW $829, R12
+ B runtime·callbackasm1(SB)
+ MOVW $830, R12
+ B runtime·callbackasm1(SB)
+ MOVW $831, R12
+ B runtime·callbackasm1(SB)
+ MOVW $832, R12
+ B runtime·callbackasm1(SB)
+ MOVW $833, R12
+ B runtime·callbackasm1(SB)
+ MOVW $834, R12
+ B runtime·callbackasm1(SB)
+ MOVW $835, R12
+ B runtime·callbackasm1(SB)
+ MOVW $836, R12
+ B runtime·callbackasm1(SB)
+ MOVW $837, R12
+ B runtime·callbackasm1(SB)
+ MOVW $838, R12
+ B runtime·callbackasm1(SB)
+ MOVW $839, R12
+ B runtime·callbackasm1(SB)
+ MOVW $840, R12
+ B runtime·callbackasm1(SB)
+ MOVW $841, R12
+ B runtime·callbackasm1(SB)
+ MOVW $842, R12
+ B runtime·callbackasm1(SB)
+ MOVW $843, R12
+ B runtime·callbackasm1(SB)
+ MOVW $844, R12
+ B runtime·callbackasm1(SB)
+ MOVW $845, R12
+ B runtime·callbackasm1(SB)
+ MOVW $846, R12
+ B runtime·callbackasm1(SB)
+ MOVW $847, R12
+ B runtime·callbackasm1(SB)
+ MOVW $848, R12
+ B runtime·callbackasm1(SB)
+ MOVW $849, R12
+ B runtime·callbackasm1(SB)
+ MOVW $850, R12
+ B runtime·callbackasm1(SB)
+ MOVW $851, R12
+ B runtime·callbackasm1(SB)
+ MOVW $852, R12
+ B runtime·callbackasm1(SB)
+ MOVW $853, R12
+ B runtime·callbackasm1(SB)
+ MOVW $854, R12
+ B runtime·callbackasm1(SB)
+ MOVW $855, R12
+ B runtime·callbackasm1(SB)
+ MOVW $856, R12
+ B runtime·callbackasm1(SB)
+ MOVW $857, R12
+ B runtime·callbackasm1(SB)
+ MOVW $858, R12
+ B runtime·callbackasm1(SB)
+ MOVW $859, R12
+ B runtime·callbackasm1(SB)
+ MOVW $860, R12
+ B runtime·callbackasm1(SB)
+ MOVW $861, R12
+ B runtime·callbackasm1(SB)
+ MOVW $862, R12
+ B runtime·callbackasm1(SB)
+ MOVW $863, R12
+ B runtime·callbackasm1(SB)
+ MOVW $864, R12
+ B runtime·callbackasm1(SB)
+ MOVW $865, R12
+ B runtime·callbackasm1(SB)
+ MOVW $866, R12
+ B runtime·callbackasm1(SB)
+ MOVW $867, R12
+ B runtime·callbackasm1(SB)
+ MOVW $868, R12
+ B runtime·callbackasm1(SB)
+ MOVW $869, R12
+ B runtime·callbackasm1(SB)
+ MOVW $870, R12
+ B runtime·callbackasm1(SB)
+ MOVW $871, R12
+ B runtime·callbackasm1(SB)
+ MOVW $872, R12
+ B runtime·callbackasm1(SB)
+ MOVW $873, R12
+ B runtime·callbackasm1(SB)
+ MOVW $874, R12
+ B runtime·callbackasm1(SB)
+ MOVW $875, R12
+ B runtime·callbackasm1(SB)
+ MOVW $876, R12
+ B runtime·callbackasm1(SB)
+ MOVW $877, R12
+ B runtime·callbackasm1(SB)
+ MOVW $878, R12
+ B runtime·callbackasm1(SB)
+ MOVW $879, R12
+ B runtime·callbackasm1(SB)
+ MOVW $880, R12
+ B runtime·callbackasm1(SB)
+ MOVW $881, R12
+ B runtime·callbackasm1(SB)
+ MOVW $882, R12
+ B runtime·callbackasm1(SB)
+ MOVW $883, R12
+ B runtime·callbackasm1(SB)
+ MOVW $884, R12
+ B runtime·callbackasm1(SB)
+ MOVW $885, R12
+ B runtime·callbackasm1(SB)
+ MOVW $886, R12
+ B runtime·callbackasm1(SB)
+ MOVW $887, R12
+ B runtime·callbackasm1(SB)
+ MOVW $888, R12
+ B runtime·callbackasm1(SB)
+ MOVW $889, R12
+ B runtime·callbackasm1(SB)
+ MOVW $890, R12
+ B runtime·callbackasm1(SB)
+ MOVW $891, R12
+ B runtime·callbackasm1(SB)
+ MOVW $892, R12
+ B runtime·callbackasm1(SB)
+ MOVW $893, R12
+ B runtime·callbackasm1(SB)
+ MOVW $894, R12
+ B runtime·callbackasm1(SB)
+ MOVW $895, R12
+ B runtime·callbackasm1(SB)
+ MOVW $896, R12
+ B runtime·callbackasm1(SB)
+ MOVW $897, R12
+ B runtime·callbackasm1(SB)
+ MOVW $898, R12
+ B runtime·callbackasm1(SB)
+ MOVW $899, R12
+ B runtime·callbackasm1(SB)
+ MOVW $900, R12
+ B runtime·callbackasm1(SB)
+ MOVW $901, R12
+ B runtime·callbackasm1(SB)
+ MOVW $902, R12
+ B runtime·callbackasm1(SB)
+ MOVW $903, R12
+ B runtime·callbackasm1(SB)
+ MOVW $904, R12
+ B runtime·callbackasm1(SB)
+ MOVW $905, R12
+ B runtime·callbackasm1(SB)
+ MOVW $906, R12
+ B runtime·callbackasm1(SB)
+ MOVW $907, R12
+ B runtime·callbackasm1(SB)
+ MOVW $908, R12
+ B runtime·callbackasm1(SB)
+ MOVW $909, R12
+ B runtime·callbackasm1(SB)
+ MOVW $910, R12
+ B runtime·callbackasm1(SB)
+ MOVW $911, R12
+ B runtime·callbackasm1(SB)
+ MOVW $912, R12
+ B runtime·callbackasm1(SB)
+ MOVW $913, R12
+ B runtime·callbackasm1(SB)
+ MOVW $914, R12
+ B runtime·callbackasm1(SB)
+ MOVW $915, R12
+ B runtime·callbackasm1(SB)
+ MOVW $916, R12
+ B runtime·callbackasm1(SB)
+ MOVW $917, R12
+ B runtime·callbackasm1(SB)
+ MOVW $918, R12
+ B runtime·callbackasm1(SB)
+ MOVW $919, R12
+ B runtime·callbackasm1(SB)
+ MOVW $920, R12
+ B runtime·callbackasm1(SB)
+ MOVW $921, R12
+ B runtime·callbackasm1(SB)
+ MOVW $922, R12
+ B runtime·callbackasm1(SB)
+ MOVW $923, R12
+ B runtime·callbackasm1(SB)
+ MOVW $924, R12
+ B runtime·callbackasm1(SB)
+ MOVW $925, R12
+ B runtime·callbackasm1(SB)
+ MOVW $926, R12
+ B runtime·callbackasm1(SB)
+ MOVW $927, R12
+ B runtime·callbackasm1(SB)
+ MOVW $928, R12
+ B runtime·callbackasm1(SB)
+ MOVW $929, R12
+ B runtime·callbackasm1(SB)
+ MOVW $930, R12
+ B runtime·callbackasm1(SB)
+ MOVW $931, R12
+ B runtime·callbackasm1(SB)
+ MOVW $932, R12
+ B runtime·callbackasm1(SB)
+ MOVW $933, R12
+ B runtime·callbackasm1(SB)
+ MOVW $934, R12
+ B runtime·callbackasm1(SB)
+ MOVW $935, R12
+ B runtime·callbackasm1(SB)
+ MOVW $936, R12
+ B runtime·callbackasm1(SB)
+ MOVW $937, R12
+ B runtime·callbackasm1(SB)
+ MOVW $938, R12
+ B runtime·callbackasm1(SB)
+ MOVW $939, R12
+ B runtime·callbackasm1(SB)
+ MOVW $940, R12
+ B runtime·callbackasm1(SB)
+ MOVW $941, R12
+ B runtime·callbackasm1(SB)
+ MOVW $942, R12
+ B runtime·callbackasm1(SB)
+ MOVW $943, R12
+ B runtime·callbackasm1(SB)
+ MOVW $944, R12
+ B runtime·callbackasm1(SB)
+ MOVW $945, R12
+ B runtime·callbackasm1(SB)
+ MOVW $946, R12
+ B runtime·callbackasm1(SB)
+ MOVW $947, R12
+ B runtime·callbackasm1(SB)
+ MOVW $948, R12
+ B runtime·callbackasm1(SB)
+ MOVW $949, R12
+ B runtime·callbackasm1(SB)
+ MOVW $950, R12
+ B runtime·callbackasm1(SB)
+ MOVW $951, R12
+ B runtime·callbackasm1(SB)
+ MOVW $952, R12
+ B runtime·callbackasm1(SB)
+ MOVW $953, R12
+ B runtime·callbackasm1(SB)
+ MOVW $954, R12
+ B runtime·callbackasm1(SB)
+ MOVW $955, R12
+ B runtime·callbackasm1(SB)
+ MOVW $956, R12
+ B runtime·callbackasm1(SB)
+ MOVW $957, R12
+ B runtime·callbackasm1(SB)
+ MOVW $958, R12
+ B runtime·callbackasm1(SB)
+ MOVW $959, R12
+ B runtime·callbackasm1(SB)
+ MOVW $960, R12
+ B runtime·callbackasm1(SB)
+ MOVW $961, R12
+ B runtime·callbackasm1(SB)
+ MOVW $962, R12
+ B runtime·callbackasm1(SB)
+ MOVW $963, R12
+ B runtime·callbackasm1(SB)
+ MOVW $964, R12
+ B runtime·callbackasm1(SB)
+ MOVW $965, R12
+ B runtime·callbackasm1(SB)
+ MOVW $966, R12
+ B runtime·callbackasm1(SB)
+ MOVW $967, R12
+ B runtime·callbackasm1(SB)
+ MOVW $968, R12
+ B runtime·callbackasm1(SB)
+ MOVW $969, R12
+ B runtime·callbackasm1(SB)
+ MOVW $970, R12
+ B runtime·callbackasm1(SB)
+ MOVW $971, R12
+ B runtime·callbackasm1(SB)
+ MOVW $972, R12
+ B runtime·callbackasm1(SB)
+ MOVW $973, R12
+ B runtime·callbackasm1(SB)
+ MOVW $974, R12
+ B runtime·callbackasm1(SB)
+ MOVW $975, R12
+ B runtime·callbackasm1(SB)
+ MOVW $976, R12
+ B runtime·callbackasm1(SB)
+ MOVW $977, R12
+ B runtime·callbackasm1(SB)
+ MOVW $978, R12
+ B runtime·callbackasm1(SB)
+ MOVW $979, R12
+ B runtime·callbackasm1(SB)
+ MOVW $980, R12
+ B runtime·callbackasm1(SB)
+ MOVW $981, R12
+ B runtime·callbackasm1(SB)
+ MOVW $982, R12
+ B runtime·callbackasm1(SB)
+ MOVW $983, R12
+ B runtime·callbackasm1(SB)
+ MOVW $984, R12
+ B runtime·callbackasm1(SB)
+ MOVW $985, R12
+ B runtime·callbackasm1(SB)
+ MOVW $986, R12
+ B runtime·callbackasm1(SB)
+ MOVW $987, R12
+ B runtime·callbackasm1(SB)
+ MOVW $988, R12
+ B runtime·callbackasm1(SB)
+ MOVW $989, R12
+ B runtime·callbackasm1(SB)
+ MOVW $990, R12
+ B runtime·callbackasm1(SB)
+ MOVW $991, R12
+ B runtime·callbackasm1(SB)
+ MOVW $992, R12
+ B runtime·callbackasm1(SB)
+ MOVW $993, R12
+ B runtime·callbackasm1(SB)
+ MOVW $994, R12
+ B runtime·callbackasm1(SB)
+ MOVW $995, R12
+ B runtime·callbackasm1(SB)
+ MOVW $996, R12
+ B runtime·callbackasm1(SB)
+ MOVW $997, R12
+ B runtime·callbackasm1(SB)
+ MOVW $998, R12
+ B runtime·callbackasm1(SB)
+ MOVW $999, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1000, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1001, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1002, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1003, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1004, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1005, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1006, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1007, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1008, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1009, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1010, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1011, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1012, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1013, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1014, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1015, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1016, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1017, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1018, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1019, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1020, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1021, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1022, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1023, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1024, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1025, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1026, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1027, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1028, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1029, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1030, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1031, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1032, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1033, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1034, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1035, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1036, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1037, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1038, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1039, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1040, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1041, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1042, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1043, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1044, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1045, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1046, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1047, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1048, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1049, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1050, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1051, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1052, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1053, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1054, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1055, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1056, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1057, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1058, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1059, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1060, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1061, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1062, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1063, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1064, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1065, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1066, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1067, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1068, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1069, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1070, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1071, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1072, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1073, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1074, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1075, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1076, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1077, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1078, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1079, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1080, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1081, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1082, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1083, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1084, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1085, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1086, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1087, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1088, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1089, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1090, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1091, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1092, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1093, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1094, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1095, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1096, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1097, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1098, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1099, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1100, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1101, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1102, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1103, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1104, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1105, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1106, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1107, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1108, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1109, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1110, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1111, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1112, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1113, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1114, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1115, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1116, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1117, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1118, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1119, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1120, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1121, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1122, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1123, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1124, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1125, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1126, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1127, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1128, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1129, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1130, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1131, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1132, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1133, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1134, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1135, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1136, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1137, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1138, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1139, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1140, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1141, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1142, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1143, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1144, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1145, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1146, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1147, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1148, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1149, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1150, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1151, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1152, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1153, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1154, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1155, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1156, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1157, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1158, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1159, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1160, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1161, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1162, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1163, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1164, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1165, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1166, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1167, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1168, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1169, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1170, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1171, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1172, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1173, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1174, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1175, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1176, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1177, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1178, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1179, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1180, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1181, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1182, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1183, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1184, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1185, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1186, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1187, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1188, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1189, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1190, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1191, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1192, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1193, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1194, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1195, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1196, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1197, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1198, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1199, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1200, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1201, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1202, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1203, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1204, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1205, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1206, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1207, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1208, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1209, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1210, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1211, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1212, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1213, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1214, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1215, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1216, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1217, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1218, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1219, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1220, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1221, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1222, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1223, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1224, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1225, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1226, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1227, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1228, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1229, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1230, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1231, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1232, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1233, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1234, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1235, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1236, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1237, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1238, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1239, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1240, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1241, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1242, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1243, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1244, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1245, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1246, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1247, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1248, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1249, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1250, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1251, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1252, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1253, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1254, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1255, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1256, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1257, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1258, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1259, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1260, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1261, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1262, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1263, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1264, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1265, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1266, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1267, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1268, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1269, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1270, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1271, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1272, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1273, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1274, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1275, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1276, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1277, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1278, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1279, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1280, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1281, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1282, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1283, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1284, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1285, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1286, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1287, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1288, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1289, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1290, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1291, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1292, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1293, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1294, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1295, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1296, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1297, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1298, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1299, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1300, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1301, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1302, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1303, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1304, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1305, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1306, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1307, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1308, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1309, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1310, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1311, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1312, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1313, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1314, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1315, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1316, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1317, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1318, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1319, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1320, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1321, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1322, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1323, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1324, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1325, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1326, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1327, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1328, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1329, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1330, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1331, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1332, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1333, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1334, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1335, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1336, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1337, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1338, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1339, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1340, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1341, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1342, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1343, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1344, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1345, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1346, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1347, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1348, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1349, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1350, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1351, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1352, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1353, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1354, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1355, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1356, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1357, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1358, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1359, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1360, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1361, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1362, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1363, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1364, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1365, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1366, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1367, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1368, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1369, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1370, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1371, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1372, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1373, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1374, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1375, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1376, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1377, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1378, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1379, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1380, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1381, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1382, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1383, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1384, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1385, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1386, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1387, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1388, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1389, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1390, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1391, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1392, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1393, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1394, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1395, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1396, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1397, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1398, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1399, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1400, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1401, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1402, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1403, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1404, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1405, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1406, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1407, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1408, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1409, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1410, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1411, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1412, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1413, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1414, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1415, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1416, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1417, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1418, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1419, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1420, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1421, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1422, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1423, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1424, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1425, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1426, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1427, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1428, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1429, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1430, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1431, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1432, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1433, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1434, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1435, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1436, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1437, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1438, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1439, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1440, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1441, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1442, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1443, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1444, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1445, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1446, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1447, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1448, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1449, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1450, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1451, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1452, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1453, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1454, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1455, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1456, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1457, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1458, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1459, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1460, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1461, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1462, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1463, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1464, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1465, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1466, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1467, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1468, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1469, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1470, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1471, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1472, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1473, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1474, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1475, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1476, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1477, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1478, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1479, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1480, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1481, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1482, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1483, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1484, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1485, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1486, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1487, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1488, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1489, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1490, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1491, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1492, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1493, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1494, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1495, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1496, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1497, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1498, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1499, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1500, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1501, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1502, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1503, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1504, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1505, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1506, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1507, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1508, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1509, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1510, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1511, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1512, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1513, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1514, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1515, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1516, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1517, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1518, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1519, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1520, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1521, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1522, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1523, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1524, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1525, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1526, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1527, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1528, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1529, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1530, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1531, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1532, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1533, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1534, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1535, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1536, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1537, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1538, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1539, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1540, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1541, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1542, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1543, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1544, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1545, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1546, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1547, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1548, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1549, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1550, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1551, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1552, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1553, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1554, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1555, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1556, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1557, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1558, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1559, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1560, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1561, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1562, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1563, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1564, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1565, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1566, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1567, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1568, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1569, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1570, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1571, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1572, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1573, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1574, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1575, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1576, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1577, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1578, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1579, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1580, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1581, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1582, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1583, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1584, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1585, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1586, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1587, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1588, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1589, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1590, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1591, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1592, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1593, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1594, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1595, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1596, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1597, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1598, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1599, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1600, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1601, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1602, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1603, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1604, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1605, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1606, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1607, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1608, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1609, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1610, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1611, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1612, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1613, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1614, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1615, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1616, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1617, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1618, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1619, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1620, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1621, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1622, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1623, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1624, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1625, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1626, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1627, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1628, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1629, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1630, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1631, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1632, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1633, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1634, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1635, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1636, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1637, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1638, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1639, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1640, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1641, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1642, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1643, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1644, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1645, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1646, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1647, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1648, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1649, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1650, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1651, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1652, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1653, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1654, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1655, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1656, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1657, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1658, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1659, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1660, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1661, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1662, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1663, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1664, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1665, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1666, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1667, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1668, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1669, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1670, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1671, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1672, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1673, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1674, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1675, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1676, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1677, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1678, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1679, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1680, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1681, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1682, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1683, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1684, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1685, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1686, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1687, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1688, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1689, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1690, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1691, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1692, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1693, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1694, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1695, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1696, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1697, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1698, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1699, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1700, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1701, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1702, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1703, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1704, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1705, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1706, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1707, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1708, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1709, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1710, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1711, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1712, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1713, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1714, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1715, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1716, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1717, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1718, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1719, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1720, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1721, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1722, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1723, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1724, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1725, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1726, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1727, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1728, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1729, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1730, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1731, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1732, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1733, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1734, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1735, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1736, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1737, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1738, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1739, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1740, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1741, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1742, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1743, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1744, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1745, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1746, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1747, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1748, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1749, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1750, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1751, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1752, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1753, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1754, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1755, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1756, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1757, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1758, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1759, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1760, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1761, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1762, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1763, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1764, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1765, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1766, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1767, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1768, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1769, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1770, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1771, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1772, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1773, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1774, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1775, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1776, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1777, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1778, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1779, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1780, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1781, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1782, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1783, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1784, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1785, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1786, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1787, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1788, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1789, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1790, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1791, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1792, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1793, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1794, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1795, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1796, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1797, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1798, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1799, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1800, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1801, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1802, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1803, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1804, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1805, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1806, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1807, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1808, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1809, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1810, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1811, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1812, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1813, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1814, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1815, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1816, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1817, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1818, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1819, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1820, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1821, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1822, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1823, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1824, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1825, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1826, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1827, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1828, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1829, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1830, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1831, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1832, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1833, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1834, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1835, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1836, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1837, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1838, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1839, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1840, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1841, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1842, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1843, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1844, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1845, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1846, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1847, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1848, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1849, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1850, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1851, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1852, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1853, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1854, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1855, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1856, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1857, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1858, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1859, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1860, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1861, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1862, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1863, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1864, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1865, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1866, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1867, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1868, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1869, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1870, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1871, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1872, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1873, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1874, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1875, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1876, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1877, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1878, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1879, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1880, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1881, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1882, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1883, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1884, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1885, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1886, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1887, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1888, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1889, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1890, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1891, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1892, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1893, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1894, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1895, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1896, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1897, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1898, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1899, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1900, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1901, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1902, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1903, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1904, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1905, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1906, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1907, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1908, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1909, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1910, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1911, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1912, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1913, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1914, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1915, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1916, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1917, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1918, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1919, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1920, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1921, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1922, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1923, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1924, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1925, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1926, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1927, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1928, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1929, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1930, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1931, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1932, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1933, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1934, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1935, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1936, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1937, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1938, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1939, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1940, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1941, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1942, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1943, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1944, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1945, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1946, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1947, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1948, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1949, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1950, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1951, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1952, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1953, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1954, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1955, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1956, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1957, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1958, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1959, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1960, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1961, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1962, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1963, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1964, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1965, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1966, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1967, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1968, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1969, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1970, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1971, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1972, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1973, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1974, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1975, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1976, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1977, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1978, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1979, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1980, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1981, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1982, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1983, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1984, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1985, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1986, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1987, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1988, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1989, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1990, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1991, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1992, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1993, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1994, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1995, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1996, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1997, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1998, R12
+ B runtime·callbackasm1(SB)
+ MOVW $1999, R12
+ B runtime·callbackasm1(SB)
diff --git a/src/runtime/zcallback_windows_arm64.s b/src/runtime/zcallback_windows_arm64.s
new file mode 100644
index 0000000..69fb057
--- /dev/null
+++ b/src/runtime/zcallback_windows_arm64.s
@@ -0,0 +1,4012 @@
+// Code generated by wincallback.go using 'go generate'. DO NOT EDIT.
+
+// External code calls into callbackasm at an offset corresponding
+// to the callback index. Callbackasm is a table of MOV and B instructions.
+// The MOV instruction loads R12 with the callback index, and the
+// B instruction branches to callbackasm1.
+// callbackasm1 takes the callback index from R12 and
+// indexes into an array that stores information about each callback.
+// It then calls the Go implementation for that callback.
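Because every generated entry below is the same fixed size, external code reaches the entry for callback index i at a constant offset from callbackasm. A minimal sketch of that address arithmetic follows, assuming an 8-byte entry on arm64 (one 4-byte MOVD plus one 4-byte B); the callbackEntryAddr helper and the base address are illustrative placeholders, not the runtime's own code:

	package main

	import "fmt"

	// entrySize is an assumption for arm64: each generated slot is one
	// 4-byte MOVD followed by one 4-byte B, i.e. 8 bytes per callback.
	const entrySize = 8

	// callbackEntryAddr is a hypothetical helper: given the address of
	// runtime·callbackasm, it returns the address external (Windows) code
	// should call for callback index i. Branching there executes
	// "MOVD $i, R12; B runtime·callbackasm1(SB)", so callbackasm1
	// receives the index in R12.
	func callbackEntryAddr(callbackasmBase uintptr, i int) uintptr {
		return callbackasmBase + uintptr(i)*entrySize
	}

	func main() {
		const base = 0x1000 // placeholder base address for illustration
		fmt.Printf("entry for callback 0:   %#x\n", callbackEntryAddr(base, 0))
		fmt.Printf("entry for callback 399: %#x\n", callbackEntryAddr(base, 399))
	}

Calling through callbackEntryAddr(base, i) lands on the i-th MOVD/B pair in the table, which loads i into R12 and branches to callbackasm1 for dispatch.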
+#include "textflag.h"
+
+TEXT runtime·callbackasm(SB),NOSPLIT|NOFRAME,$0
+ MOVD $0, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1, R12
+ B runtime·callbackasm1(SB)
+ MOVD $2, R12
+ B runtime·callbackasm1(SB)
+ MOVD $3, R12
+ B runtime·callbackasm1(SB)
+ MOVD $4, R12
+ B runtime·callbackasm1(SB)
+ MOVD $5, R12
+ B runtime·callbackasm1(SB)
+ MOVD $6, R12
+ B runtime·callbackasm1(SB)
+ MOVD $7, R12
+ B runtime·callbackasm1(SB)
+ MOVD $8, R12
+ B runtime·callbackasm1(SB)
+ MOVD $9, R12
+ B runtime·callbackasm1(SB)
+ MOVD $10, R12
+ B runtime·callbackasm1(SB)
+ MOVD $11, R12
+ B runtime·callbackasm1(SB)
+ MOVD $12, R12
+ B runtime·callbackasm1(SB)
+ MOVD $13, R12
+ B runtime·callbackasm1(SB)
+ MOVD $14, R12
+ B runtime·callbackasm1(SB)
+ MOVD $15, R12
+ B runtime·callbackasm1(SB)
+ MOVD $16, R12
+ B runtime·callbackasm1(SB)
+ MOVD $17, R12
+ B runtime·callbackasm1(SB)
+ MOVD $18, R12
+ B runtime·callbackasm1(SB)
+ MOVD $19, R12
+ B runtime·callbackasm1(SB)
+ MOVD $20, R12
+ B runtime·callbackasm1(SB)
+ MOVD $21, R12
+ B runtime·callbackasm1(SB)
+ MOVD $22, R12
+ B runtime·callbackasm1(SB)
+ MOVD $23, R12
+ B runtime·callbackasm1(SB)
+ MOVD $24, R12
+ B runtime·callbackasm1(SB)
+ MOVD $25, R12
+ B runtime·callbackasm1(SB)
+ MOVD $26, R12
+ B runtime·callbackasm1(SB)
+ MOVD $27, R12
+ B runtime·callbackasm1(SB)
+ MOVD $28, R12
+ B runtime·callbackasm1(SB)
+ MOVD $29, R12
+ B runtime·callbackasm1(SB)
+ MOVD $30, R12
+ B runtime·callbackasm1(SB)
+ MOVD $31, R12
+ B runtime·callbackasm1(SB)
+ MOVD $32, R12
+ B runtime·callbackasm1(SB)
+ MOVD $33, R12
+ B runtime·callbackasm1(SB)
+ MOVD $34, R12
+ B runtime·callbackasm1(SB)
+ MOVD $35, R12
+ B runtime·callbackasm1(SB)
+ MOVD $36, R12
+ B runtime·callbackasm1(SB)
+ MOVD $37, R12
+ B runtime·callbackasm1(SB)
+ MOVD $38, R12
+ B runtime·callbackasm1(SB)
+ MOVD $39, R12
+ B runtime·callbackasm1(SB)
+ MOVD $40, R12
+ B runtime·callbackasm1(SB)
+ MOVD $41, R12
+ B runtime·callbackasm1(SB)
+ MOVD $42, R12
+ B runtime·callbackasm1(SB)
+ MOVD $43, R12
+ B runtime·callbackasm1(SB)
+ MOVD $44, R12
+ B runtime·callbackasm1(SB)
+ MOVD $45, R12
+ B runtime·callbackasm1(SB)
+ MOVD $46, R12
+ B runtime·callbackasm1(SB)
+ MOVD $47, R12
+ B runtime·callbackasm1(SB)
+ MOVD $48, R12
+ B runtime·callbackasm1(SB)
+ MOVD $49, R12
+ B runtime·callbackasm1(SB)
+ MOVD $50, R12
+ B runtime·callbackasm1(SB)
+ MOVD $51, R12
+ B runtime·callbackasm1(SB)
+ MOVD $52, R12
+ B runtime·callbackasm1(SB)
+ MOVD $53, R12
+ B runtime·callbackasm1(SB)
+ MOVD $54, R12
+ B runtime·callbackasm1(SB)
+ MOVD $55, R12
+ B runtime·callbackasm1(SB)
+ MOVD $56, R12
+ B runtime·callbackasm1(SB)
+ MOVD $57, R12
+ B runtime·callbackasm1(SB)
+ MOVD $58, R12
+ B runtime·callbackasm1(SB)
+ MOVD $59, R12
+ B runtime·callbackasm1(SB)
+ MOVD $60, R12
+ B runtime·callbackasm1(SB)
+ MOVD $61, R12
+ B runtime·callbackasm1(SB)
+ MOVD $62, R12
+ B runtime·callbackasm1(SB)
+ MOVD $63, R12
+ B runtime·callbackasm1(SB)
+ MOVD $64, R12
+ B runtime·callbackasm1(SB)
+ MOVD $65, R12
+ B runtime·callbackasm1(SB)
+ MOVD $66, R12
+ B runtime·callbackasm1(SB)
+ MOVD $67, R12
+ B runtime·callbackasm1(SB)
+ MOVD $68, R12
+ B runtime·callbackasm1(SB)
+ MOVD $69, R12
+ B runtime·callbackasm1(SB)
+ MOVD $70, R12
+ B runtime·callbackasm1(SB)
+ MOVD $71, R12
+ B runtime·callbackasm1(SB)
+ MOVD $72, R12
+ B runtime·callbackasm1(SB)
+ MOVD $73, R12
+ B runtime·callbackasm1(SB)
+ MOVD $74, R12
+ B runtime·callbackasm1(SB)
+ MOVD $75, R12
+ B runtime·callbackasm1(SB)
+ MOVD $76, R12
+ B runtime·callbackasm1(SB)
+ MOVD $77, R12
+ B runtime·callbackasm1(SB)
+ MOVD $78, R12
+ B runtime·callbackasm1(SB)
+ MOVD $79, R12
+ B runtime·callbackasm1(SB)
+ MOVD $80, R12
+ B runtime·callbackasm1(SB)
+ MOVD $81, R12
+ B runtime·callbackasm1(SB)
+ MOVD $82, R12
+ B runtime·callbackasm1(SB)
+ MOVD $83, R12
+ B runtime·callbackasm1(SB)
+ MOVD $84, R12
+ B runtime·callbackasm1(SB)
+ MOVD $85, R12
+ B runtime·callbackasm1(SB)
+ MOVD $86, R12
+ B runtime·callbackasm1(SB)
+ MOVD $87, R12
+ B runtime·callbackasm1(SB)
+ MOVD $88, R12
+ B runtime·callbackasm1(SB)
+ MOVD $89, R12
+ B runtime·callbackasm1(SB)
+ MOVD $90, R12
+ B runtime·callbackasm1(SB)
+ MOVD $91, R12
+ B runtime·callbackasm1(SB)
+ MOVD $92, R12
+ B runtime·callbackasm1(SB)
+ MOVD $93, R12
+ B runtime·callbackasm1(SB)
+ MOVD $94, R12
+ B runtime·callbackasm1(SB)
+ MOVD $95, R12
+ B runtime·callbackasm1(SB)
+ MOVD $96, R12
+ B runtime·callbackasm1(SB)
+ MOVD $97, R12
+ B runtime·callbackasm1(SB)
+ MOVD $98, R12
+ B runtime·callbackasm1(SB)
+ MOVD $99, R12
+ B runtime·callbackasm1(SB)
+ MOVD $100, R12
+ B runtime·callbackasm1(SB)
+ MOVD $101, R12
+ B runtime·callbackasm1(SB)
+ MOVD $102, R12
+ B runtime·callbackasm1(SB)
+ MOVD $103, R12
+ B runtime·callbackasm1(SB)
+ MOVD $104, R12
+ B runtime·callbackasm1(SB)
+ MOVD $105, R12
+ B runtime·callbackasm1(SB)
+ MOVD $106, R12
+ B runtime·callbackasm1(SB)
+ MOVD $107, R12
+ B runtime·callbackasm1(SB)
+ MOVD $108, R12
+ B runtime·callbackasm1(SB)
+ MOVD $109, R12
+ B runtime·callbackasm1(SB)
+ MOVD $110, R12
+ B runtime·callbackasm1(SB)
+ MOVD $111, R12
+ B runtime·callbackasm1(SB)
+ MOVD $112, R12
+ B runtime·callbackasm1(SB)
+ MOVD $113, R12
+ B runtime·callbackasm1(SB)
+ MOVD $114, R12
+ B runtime·callbackasm1(SB)
+ MOVD $115, R12
+ B runtime·callbackasm1(SB)
+ MOVD $116, R12
+ B runtime·callbackasm1(SB)
+ MOVD $117, R12
+ B runtime·callbackasm1(SB)
+ MOVD $118, R12
+ B runtime·callbackasm1(SB)
+ MOVD $119, R12
+ B runtime·callbackasm1(SB)
+ MOVD $120, R12
+ B runtime·callbackasm1(SB)
+ MOVD $121, R12
+ B runtime·callbackasm1(SB)
+ MOVD $122, R12
+ B runtime·callbackasm1(SB)
+ MOVD $123, R12
+ B runtime·callbackasm1(SB)
+ MOVD $124, R12
+ B runtime·callbackasm1(SB)
+ MOVD $125, R12
+ B runtime·callbackasm1(SB)
+ MOVD $126, R12
+ B runtime·callbackasm1(SB)
+ MOVD $127, R12
+ B runtime·callbackasm1(SB)
+ MOVD $128, R12
+ B runtime·callbackasm1(SB)
+ MOVD $129, R12
+ B runtime·callbackasm1(SB)
+ MOVD $130, R12
+ B runtime·callbackasm1(SB)
+ MOVD $131, R12
+ B runtime·callbackasm1(SB)
+ MOVD $132, R12
+ B runtime·callbackasm1(SB)
+ MOVD $133, R12
+ B runtime·callbackasm1(SB)
+ MOVD $134, R12
+ B runtime·callbackasm1(SB)
+ MOVD $135, R12
+ B runtime·callbackasm1(SB)
+ MOVD $136, R12
+ B runtime·callbackasm1(SB)
+ MOVD $137, R12
+ B runtime·callbackasm1(SB)
+ MOVD $138, R12
+ B runtime·callbackasm1(SB)
+ MOVD $139, R12
+ B runtime·callbackasm1(SB)
+ MOVD $140, R12
+ B runtime·callbackasm1(SB)
+ MOVD $141, R12
+ B runtime·callbackasm1(SB)
+ MOVD $142, R12
+ B runtime·callbackasm1(SB)
+ MOVD $143, R12
+ B runtime·callbackasm1(SB)
+ MOVD $144, R12
+ B runtime·callbackasm1(SB)
+ MOVD $145, R12
+ B runtime·callbackasm1(SB)
+ MOVD $146, R12
+ B runtime·callbackasm1(SB)
+ MOVD $147, R12
+ B runtime·callbackasm1(SB)
+ MOVD $148, R12
+ B runtime·callbackasm1(SB)
+ MOVD $149, R12
+ B runtime·callbackasm1(SB)
+ MOVD $150, R12
+ B runtime·callbackasm1(SB)
+ MOVD $151, R12
+ B runtime·callbackasm1(SB)
+ MOVD $152, R12
+ B runtime·callbackasm1(SB)
+ MOVD $153, R12
+ B runtime·callbackasm1(SB)
+ MOVD $154, R12
+ B runtime·callbackasm1(SB)
+ MOVD $155, R12
+ B runtime·callbackasm1(SB)
+ MOVD $156, R12
+ B runtime·callbackasm1(SB)
+ MOVD $157, R12
+ B runtime·callbackasm1(SB)
+ MOVD $158, R12
+ B runtime·callbackasm1(SB)
+ MOVD $159, R12
+ B runtime·callbackasm1(SB)
+ MOVD $160, R12
+ B runtime·callbackasm1(SB)
+ MOVD $161, R12
+ B runtime·callbackasm1(SB)
+ MOVD $162, R12
+ B runtime·callbackasm1(SB)
+ MOVD $163, R12
+ B runtime·callbackasm1(SB)
+ MOVD $164, R12
+ B runtime·callbackasm1(SB)
+ MOVD $165, R12
+ B runtime·callbackasm1(SB)
+ MOVD $166, R12
+ B runtime·callbackasm1(SB)
+ MOVD $167, R12
+ B runtime·callbackasm1(SB)
+ MOVD $168, R12
+ B runtime·callbackasm1(SB)
+ MOVD $169, R12
+ B runtime·callbackasm1(SB)
+ MOVD $170, R12
+ B runtime·callbackasm1(SB)
+ MOVD $171, R12
+ B runtime·callbackasm1(SB)
+ MOVD $172, R12
+ B runtime·callbackasm1(SB)
+ MOVD $173, R12
+ B runtime·callbackasm1(SB)
+ MOVD $174, R12
+ B runtime·callbackasm1(SB)
+ MOVD $175, R12
+ B runtime·callbackasm1(SB)
+ MOVD $176, R12
+ B runtime·callbackasm1(SB)
+ MOVD $177, R12
+ B runtime·callbackasm1(SB)
+ MOVD $178, R12
+ B runtime·callbackasm1(SB)
+ MOVD $179, R12
+ B runtime·callbackasm1(SB)
+ MOVD $180, R12
+ B runtime·callbackasm1(SB)
+ MOVD $181, R12
+ B runtime·callbackasm1(SB)
+ MOVD $182, R12
+ B runtime·callbackasm1(SB)
+ MOVD $183, R12
+ B runtime·callbackasm1(SB)
+ MOVD $184, R12
+ B runtime·callbackasm1(SB)
+ MOVD $185, R12
+ B runtime·callbackasm1(SB)
+ MOVD $186, R12
+ B runtime·callbackasm1(SB)
+ MOVD $187, R12
+ B runtime·callbackasm1(SB)
+ MOVD $188, R12
+ B runtime·callbackasm1(SB)
+ MOVD $189, R12
+ B runtime·callbackasm1(SB)
+ MOVD $190, R12
+ B runtime·callbackasm1(SB)
+ MOVD $191, R12
+ B runtime·callbackasm1(SB)
+ MOVD $192, R12
+ B runtime·callbackasm1(SB)
+ MOVD $193, R12
+ B runtime·callbackasm1(SB)
+ MOVD $194, R12
+ B runtime·callbackasm1(SB)
+ MOVD $195, R12
+ B runtime·callbackasm1(SB)
+ MOVD $196, R12
+ B runtime·callbackasm1(SB)
+ MOVD $197, R12
+ B runtime·callbackasm1(SB)
+ MOVD $198, R12
+ B runtime·callbackasm1(SB)
+ MOVD $199, R12
+ B runtime·callbackasm1(SB)
+ MOVD $200, R12
+ B runtime·callbackasm1(SB)
+ MOVD $201, R12
+ B runtime·callbackasm1(SB)
+ MOVD $202, R12
+ B runtime·callbackasm1(SB)
+ MOVD $203, R12
+ B runtime·callbackasm1(SB)
+ MOVD $204, R12
+ B runtime·callbackasm1(SB)
+ MOVD $205, R12
+ B runtime·callbackasm1(SB)
+ MOVD $206, R12
+ B runtime·callbackasm1(SB)
+ MOVD $207, R12
+ B runtime·callbackasm1(SB)
+ MOVD $208, R12
+ B runtime·callbackasm1(SB)
+ MOVD $209, R12
+ B runtime·callbackasm1(SB)
+ MOVD $210, R12
+ B runtime·callbackasm1(SB)
+ MOVD $211, R12
+ B runtime·callbackasm1(SB)
+ MOVD $212, R12
+ B runtime·callbackasm1(SB)
+ MOVD $213, R12
+ B runtime·callbackasm1(SB)
+ MOVD $214, R12
+ B runtime·callbackasm1(SB)
+ MOVD $215, R12
+ B runtime·callbackasm1(SB)
+ MOVD $216, R12
+ B runtime·callbackasm1(SB)
+ MOVD $217, R12
+ B runtime·callbackasm1(SB)
+ MOVD $218, R12
+ B runtime·callbackasm1(SB)
+ MOVD $219, R12
+ B runtime·callbackasm1(SB)
+ MOVD $220, R12
+ B runtime·callbackasm1(SB)
+ MOVD $221, R12
+ B runtime·callbackasm1(SB)
+ MOVD $222, R12
+ B runtime·callbackasm1(SB)
+ MOVD $223, R12
+ B runtime·callbackasm1(SB)
+ MOVD $224, R12
+ B runtime·callbackasm1(SB)
+ MOVD $225, R12
+ B runtime·callbackasm1(SB)
+ MOVD $226, R12
+ B runtime·callbackasm1(SB)
+ MOVD $227, R12
+ B runtime·callbackasm1(SB)
+ MOVD $228, R12
+ B runtime·callbackasm1(SB)
+ MOVD $229, R12
+ B runtime·callbackasm1(SB)
+ MOVD $230, R12
+ B runtime·callbackasm1(SB)
+ MOVD $231, R12
+ B runtime·callbackasm1(SB)
+ MOVD $232, R12
+ B runtime·callbackasm1(SB)
+ MOVD $233, R12
+ B runtime·callbackasm1(SB)
+ MOVD $234, R12
+ B runtime·callbackasm1(SB)
+ MOVD $235, R12
+ B runtime·callbackasm1(SB)
+ MOVD $236, R12
+ B runtime·callbackasm1(SB)
+ MOVD $237, R12
+ B runtime·callbackasm1(SB)
+ MOVD $238, R12
+ B runtime·callbackasm1(SB)
+ MOVD $239, R12
+ B runtime·callbackasm1(SB)
+ MOVD $240, R12
+ B runtime·callbackasm1(SB)
+ MOVD $241, R12
+ B runtime·callbackasm1(SB)
+ MOVD $242, R12
+ B runtime·callbackasm1(SB)
+ MOVD $243, R12
+ B runtime·callbackasm1(SB)
+ MOVD $244, R12
+ B runtime·callbackasm1(SB)
+ MOVD $245, R12
+ B runtime·callbackasm1(SB)
+ MOVD $246, R12
+ B runtime·callbackasm1(SB)
+ MOVD $247, R12
+ B runtime·callbackasm1(SB)
+ MOVD $248, R12
+ B runtime·callbackasm1(SB)
+ MOVD $249, R12
+ B runtime·callbackasm1(SB)
+ MOVD $250, R12
+ B runtime·callbackasm1(SB)
+ MOVD $251, R12
+ B runtime·callbackasm1(SB)
+ MOVD $252, R12
+ B runtime·callbackasm1(SB)
+ MOVD $253, R12
+ B runtime·callbackasm1(SB)
+ MOVD $254, R12
+ B runtime·callbackasm1(SB)
+ MOVD $255, R12
+ B runtime·callbackasm1(SB)
+ MOVD $256, R12
+ B runtime·callbackasm1(SB)
+ MOVD $257, R12
+ B runtime·callbackasm1(SB)
+ MOVD $258, R12
+ B runtime·callbackasm1(SB)
+ MOVD $259, R12
+ B runtime·callbackasm1(SB)
+ MOVD $260, R12
+ B runtime·callbackasm1(SB)
+ MOVD $261, R12
+ B runtime·callbackasm1(SB)
+ MOVD $262, R12
+ B runtime·callbackasm1(SB)
+ MOVD $263, R12
+ B runtime·callbackasm1(SB)
+ MOVD $264, R12
+ B runtime·callbackasm1(SB)
+ MOVD $265, R12
+ B runtime·callbackasm1(SB)
+ MOVD $266, R12
+ B runtime·callbackasm1(SB)
+ MOVD $267, R12
+ B runtime·callbackasm1(SB)
+ MOVD $268, R12
+ B runtime·callbackasm1(SB)
+ MOVD $269, R12
+ B runtime·callbackasm1(SB)
+ MOVD $270, R12
+ B runtime·callbackasm1(SB)
+ MOVD $271, R12
+ B runtime·callbackasm1(SB)
+ MOVD $272, R12
+ B runtime·callbackasm1(SB)
+ MOVD $273, R12
+ B runtime·callbackasm1(SB)
+ MOVD $274, R12
+ B runtime·callbackasm1(SB)
+ MOVD $275, R12
+ B runtime·callbackasm1(SB)
+ MOVD $276, R12
+ B runtime·callbackasm1(SB)
+ MOVD $277, R12
+ B runtime·callbackasm1(SB)
+ MOVD $278, R12
+ B runtime·callbackasm1(SB)
+ MOVD $279, R12
+ B runtime·callbackasm1(SB)
+ MOVD $280, R12
+ B runtime·callbackasm1(SB)
+ MOVD $281, R12
+ B runtime·callbackasm1(SB)
+ MOVD $282, R12
+ B runtime·callbackasm1(SB)
+ MOVD $283, R12
+ B runtime·callbackasm1(SB)
+ MOVD $284, R12
+ B runtime·callbackasm1(SB)
+ MOVD $285, R12
+ B runtime·callbackasm1(SB)
+ MOVD $286, R12
+ B runtime·callbackasm1(SB)
+ MOVD $287, R12
+ B runtime·callbackasm1(SB)
+ MOVD $288, R12
+ B runtime·callbackasm1(SB)
+ MOVD $289, R12
+ B runtime·callbackasm1(SB)
+ MOVD $290, R12
+ B runtime·callbackasm1(SB)
+ MOVD $291, R12
+ B runtime·callbackasm1(SB)
+ MOVD $292, R12
+ B runtime·callbackasm1(SB)
+ MOVD $293, R12
+ B runtime·callbackasm1(SB)
+ MOVD $294, R12
+ B runtime·callbackasm1(SB)
+ MOVD $295, R12
+ B runtime·callbackasm1(SB)
+ MOVD $296, R12
+ B runtime·callbackasm1(SB)
+ MOVD $297, R12
+ B runtime·callbackasm1(SB)
+ MOVD $298, R12
+ B runtime·callbackasm1(SB)
+ MOVD $299, R12
+ B runtime·callbackasm1(SB)
+ MOVD $300, R12
+ B runtime·callbackasm1(SB)
+ MOVD $301, R12
+ B runtime·callbackasm1(SB)
+ MOVD $302, R12
+ B runtime·callbackasm1(SB)
+ MOVD $303, R12
+ B runtime·callbackasm1(SB)
+ MOVD $304, R12
+ B runtime·callbackasm1(SB)
+ MOVD $305, R12
+ B runtime·callbackasm1(SB)
+ MOVD $306, R12
+ B runtime·callbackasm1(SB)
+ MOVD $307, R12
+ B runtime·callbackasm1(SB)
+ MOVD $308, R12
+ B runtime·callbackasm1(SB)
+ MOVD $309, R12
+ B runtime·callbackasm1(SB)
+ MOVD $310, R12
+ B runtime·callbackasm1(SB)
+ MOVD $311, R12
+ B runtime·callbackasm1(SB)
+ MOVD $312, R12
+ B runtime·callbackasm1(SB)
+ MOVD $313, R12
+ B runtime·callbackasm1(SB)
+ MOVD $314, R12
+ B runtime·callbackasm1(SB)
+ MOVD $315, R12
+ B runtime·callbackasm1(SB)
+ MOVD $316, R12
+ B runtime·callbackasm1(SB)
+ MOVD $317, R12
+ B runtime·callbackasm1(SB)
+ MOVD $318, R12
+ B runtime·callbackasm1(SB)
+ MOVD $319, R12
+ B runtime·callbackasm1(SB)
+ MOVD $320, R12
+ B runtime·callbackasm1(SB)
+ MOVD $321, R12
+ B runtime·callbackasm1(SB)
+ MOVD $322, R12
+ B runtime·callbackasm1(SB)
+ MOVD $323, R12
+ B runtime·callbackasm1(SB)
+ MOVD $324, R12
+ B runtime·callbackasm1(SB)
+ MOVD $325, R12
+ B runtime·callbackasm1(SB)
+ MOVD $326, R12
+ B runtime·callbackasm1(SB)
+ MOVD $327, R12
+ B runtime·callbackasm1(SB)
+ MOVD $328, R12
+ B runtime·callbackasm1(SB)
+ MOVD $329, R12
+ B runtime·callbackasm1(SB)
+ MOVD $330, R12
+ B runtime·callbackasm1(SB)
+ MOVD $331, R12
+ B runtime·callbackasm1(SB)
+ MOVD $332, R12
+ B runtime·callbackasm1(SB)
+ MOVD $333, R12
+ B runtime·callbackasm1(SB)
+ MOVD $334, R12
+ B runtime·callbackasm1(SB)
+ MOVD $335, R12
+ B runtime·callbackasm1(SB)
+ MOVD $336, R12
+ B runtime·callbackasm1(SB)
+ MOVD $337, R12
+ B runtime·callbackasm1(SB)
+ MOVD $338, R12
+ B runtime·callbackasm1(SB)
+ MOVD $339, R12
+ B runtime·callbackasm1(SB)
+ MOVD $340, R12
+ B runtime·callbackasm1(SB)
+ MOVD $341, R12
+ B runtime·callbackasm1(SB)
+ MOVD $342, R12
+ B runtime·callbackasm1(SB)
+ MOVD $343, R12
+ B runtime·callbackasm1(SB)
+ MOVD $344, R12
+ B runtime·callbackasm1(SB)
+ MOVD $345, R12
+ B runtime·callbackasm1(SB)
+ MOVD $346, R12
+ B runtime·callbackasm1(SB)
+ MOVD $347, R12
+ B runtime·callbackasm1(SB)
+ MOVD $348, R12
+ B runtime·callbackasm1(SB)
+ MOVD $349, R12
+ B runtime·callbackasm1(SB)
+ MOVD $350, R12
+ B runtime·callbackasm1(SB)
+ MOVD $351, R12
+ B runtime·callbackasm1(SB)
+ MOVD $352, R12
+ B runtime·callbackasm1(SB)
+ MOVD $353, R12
+ B runtime·callbackasm1(SB)
+ MOVD $354, R12
+ B runtime·callbackasm1(SB)
+ MOVD $355, R12
+ B runtime·callbackasm1(SB)
+ MOVD $356, R12
+ B runtime·callbackasm1(SB)
+ MOVD $357, R12
+ B runtime·callbackasm1(SB)
+ MOVD $358, R12
+ B runtime·callbackasm1(SB)
+ MOVD $359, R12
+ B runtime·callbackasm1(SB)
+ MOVD $360, R12
+ B runtime·callbackasm1(SB)
+ MOVD $361, R12
+ B runtime·callbackasm1(SB)
+ MOVD $362, R12
+ B runtime·callbackasm1(SB)
+ MOVD $363, R12
+ B runtime·callbackasm1(SB)
+ MOVD $364, R12
+ B runtime·callbackasm1(SB)
+ MOVD $365, R12
+ B runtime·callbackasm1(SB)
+ MOVD $366, R12
+ B runtime·callbackasm1(SB)
+ MOVD $367, R12
+ B runtime·callbackasm1(SB)
+ MOVD $368, R12
+ B runtime·callbackasm1(SB)
+ MOVD $369, R12
+ B runtime·callbackasm1(SB)
+ MOVD $370, R12
+ B runtime·callbackasm1(SB)
+ MOVD $371, R12
+ B runtime·callbackasm1(SB)
+ MOVD $372, R12
+ B runtime·callbackasm1(SB)
+ MOVD $373, R12
+ B runtime·callbackasm1(SB)
+ MOVD $374, R12
+ B runtime·callbackasm1(SB)
+ MOVD $375, R12
+ B runtime·callbackasm1(SB)
+ MOVD $376, R12
+ B runtime·callbackasm1(SB)
+ MOVD $377, R12
+ B runtime·callbackasm1(SB)
+ MOVD $378, R12
+ B runtime·callbackasm1(SB)
+ MOVD $379, R12
+ B runtime·callbackasm1(SB)
+ MOVD $380, R12
+ B runtime·callbackasm1(SB)
+ MOVD $381, R12
+ B runtime·callbackasm1(SB)
+ MOVD $382, R12
+ B runtime·callbackasm1(SB)
+ MOVD $383, R12
+ B runtime·callbackasm1(SB)
+ MOVD $384, R12
+ B runtime·callbackasm1(SB)
+ MOVD $385, R12
+ B runtime·callbackasm1(SB)
+ MOVD $386, R12
+ B runtime·callbackasm1(SB)
+ MOVD $387, R12
+ B runtime·callbackasm1(SB)
+ MOVD $388, R12
+ B runtime·callbackasm1(SB)
+ MOVD $389, R12
+ B runtime·callbackasm1(SB)
+ MOVD $390, R12
+ B runtime·callbackasm1(SB)
+ MOVD $391, R12
+ B runtime·callbackasm1(SB)
+ MOVD $392, R12
+ B runtime·callbackasm1(SB)
+ MOVD $393, R12
+ B runtime·callbackasm1(SB)
+ MOVD $394, R12
+ B runtime·callbackasm1(SB)
+ MOVD $395, R12
+ B runtime·callbackasm1(SB)
+ MOVD $396, R12
+ B runtime·callbackasm1(SB)
+ MOVD $397, R12
+ B runtime·callbackasm1(SB)
+ MOVD $398, R12
+ B runtime·callbackasm1(SB)
+ MOVD $399, R12
+ B runtime·callbackasm1(SB)
+ MOVD $400, R12
+ B runtime·callbackasm1(SB)
+ MOVD $401, R12
+ B runtime·callbackasm1(SB)
+ MOVD $402, R12
+ B runtime·callbackasm1(SB)
+ MOVD $403, R12
+ B runtime·callbackasm1(SB)
+ MOVD $404, R12
+ B runtime·callbackasm1(SB)
+ MOVD $405, R12
+ B runtime·callbackasm1(SB)
+ MOVD $406, R12
+ B runtime·callbackasm1(SB)
+ MOVD $407, R12
+ B runtime·callbackasm1(SB)
+ MOVD $408, R12
+ B runtime·callbackasm1(SB)
+ MOVD $409, R12
+ B runtime·callbackasm1(SB)
+ MOVD $410, R12
+ B runtime·callbackasm1(SB)
+ MOVD $411, R12
+ B runtime·callbackasm1(SB)
+ MOVD $412, R12
+ B runtime·callbackasm1(SB)
+ MOVD $413, R12
+ B runtime·callbackasm1(SB)
+ MOVD $414, R12
+ B runtime·callbackasm1(SB)
+ MOVD $415, R12
+ B runtime·callbackasm1(SB)
+ MOVD $416, R12
+ B runtime·callbackasm1(SB)
+ MOVD $417, R12
+ B runtime·callbackasm1(SB)
+ MOVD $418, R12
+ B runtime·callbackasm1(SB)
+ MOVD $419, R12
+ B runtime·callbackasm1(SB)
+ MOVD $420, R12
+ B runtime·callbackasm1(SB)
+ MOVD $421, R12
+ B runtime·callbackasm1(SB)
+ MOVD $422, R12
+ B runtime·callbackasm1(SB)
+ MOVD $423, R12
+ B runtime·callbackasm1(SB)
+ MOVD $424, R12
+ B runtime·callbackasm1(SB)
+ MOVD $425, R12
+ B runtime·callbackasm1(SB)
+ MOVD $426, R12
+ B runtime·callbackasm1(SB)
+ MOVD $427, R12
+ B runtime·callbackasm1(SB)
+ MOVD $428, R12
+ B runtime·callbackasm1(SB)
+ MOVD $429, R12
+ B runtime·callbackasm1(SB)
+ MOVD $430, R12
+ B runtime·callbackasm1(SB)
+ MOVD $431, R12
+ B runtime·callbackasm1(SB)
+ MOVD $432, R12
+ B runtime·callbackasm1(SB)
+ MOVD $433, R12
+ B runtime·callbackasm1(SB)
+ MOVD $434, R12
+ B runtime·callbackasm1(SB)
+ MOVD $435, R12
+ B runtime·callbackasm1(SB)
+ MOVD $436, R12
+ B runtime·callbackasm1(SB)
+ MOVD $437, R12
+ B runtime·callbackasm1(SB)
+ MOVD $438, R12
+ B runtime·callbackasm1(SB)
+ MOVD $439, R12
+ B runtime·callbackasm1(SB)
+ MOVD $440, R12
+ B runtime·callbackasm1(SB)
+ MOVD $441, R12
+ B runtime·callbackasm1(SB)
+ MOVD $442, R12
+ B runtime·callbackasm1(SB)
+ MOVD $443, R12
+ B runtime·callbackasm1(SB)
+ MOVD $444, R12
+ B runtime·callbackasm1(SB)
+ MOVD $445, R12
+ B runtime·callbackasm1(SB)
+ MOVD $446, R12
+ B runtime·callbackasm1(SB)
+ MOVD $447, R12
+ B runtime·callbackasm1(SB)
+ MOVD $448, R12
+ B runtime·callbackasm1(SB)
+ MOVD $449, R12
+ B runtime·callbackasm1(SB)
+ MOVD $450, R12
+ B runtime·callbackasm1(SB)
+ MOVD $451, R12
+ B runtime·callbackasm1(SB)
+ MOVD $452, R12
+ B runtime·callbackasm1(SB)
+ MOVD $453, R12
+ B runtime·callbackasm1(SB)
+ MOVD $454, R12
+ B runtime·callbackasm1(SB)
+ MOVD $455, R12
+ B runtime·callbackasm1(SB)
+ MOVD $456, R12
+ B runtime·callbackasm1(SB)
+ MOVD $457, R12
+ B runtime·callbackasm1(SB)
+ MOVD $458, R12
+ B runtime·callbackasm1(SB)
+ MOVD $459, R12
+ B runtime·callbackasm1(SB)
+ MOVD $460, R12
+ B runtime·callbackasm1(SB)
+ MOVD $461, R12
+ B runtime·callbackasm1(SB)
+ MOVD $462, R12
+ B runtime·callbackasm1(SB)
+ MOVD $463, R12
+ B runtime·callbackasm1(SB)
+ MOVD $464, R12
+ B runtime·callbackasm1(SB)
+ MOVD $465, R12
+ B runtime·callbackasm1(SB)
+ MOVD $466, R12
+ B runtime·callbackasm1(SB)
+ MOVD $467, R12
+ B runtime·callbackasm1(SB)
+ MOVD $468, R12
+ B runtime·callbackasm1(SB)
+ MOVD $469, R12
+ B runtime·callbackasm1(SB)
+ MOVD $470, R12
+ B runtime·callbackasm1(SB)
+ MOVD $471, R12
+ B runtime·callbackasm1(SB)
+ MOVD $472, R12
+ B runtime·callbackasm1(SB)
+ MOVD $473, R12
+ B runtime·callbackasm1(SB)
+ MOVD $474, R12
+ B runtime·callbackasm1(SB)
+ MOVD $475, R12
+ B runtime·callbackasm1(SB)
+ MOVD $476, R12
+ B runtime·callbackasm1(SB)
+ MOVD $477, R12
+ B runtime·callbackasm1(SB)
+ MOVD $478, R12
+ B runtime·callbackasm1(SB)
+ MOVD $479, R12
+ B runtime·callbackasm1(SB)
+ MOVD $480, R12
+ B runtime·callbackasm1(SB)
+ MOVD $481, R12
+ B runtime·callbackasm1(SB)
+ MOVD $482, R12
+ B runtime·callbackasm1(SB)
+ MOVD $483, R12
+ B runtime·callbackasm1(SB)
+ MOVD $484, R12
+ B runtime·callbackasm1(SB)
+ MOVD $485, R12
+ B runtime·callbackasm1(SB)
+ MOVD $486, R12
+ B runtime·callbackasm1(SB)
+ MOVD $487, R12
+ B runtime·callbackasm1(SB)
+ MOVD $488, R12
+ B runtime·callbackasm1(SB)
+ MOVD $489, R12
+ B runtime·callbackasm1(SB)
+ MOVD $490, R12
+ B runtime·callbackasm1(SB)
+ MOVD $491, R12
+ B runtime·callbackasm1(SB)
+ MOVD $492, R12
+ B runtime·callbackasm1(SB)
+ MOVD $493, R12
+ B runtime·callbackasm1(SB)
+ MOVD $494, R12
+ B runtime·callbackasm1(SB)
+ MOVD $495, R12
+ B runtime·callbackasm1(SB)
+ MOVD $496, R12
+ B runtime·callbackasm1(SB)
+ MOVD $497, R12
+ B runtime·callbackasm1(SB)
+ MOVD $498, R12
+ B runtime·callbackasm1(SB)
+ MOVD $499, R12
+ B runtime·callbackasm1(SB)
+ MOVD $500, R12
+ B runtime·callbackasm1(SB)
+ MOVD $501, R12
+ B runtime·callbackasm1(SB)
+ MOVD $502, R12
+ B runtime·callbackasm1(SB)
+ MOVD $503, R12
+ B runtime·callbackasm1(SB)
+ MOVD $504, R12
+ B runtime·callbackasm1(SB)
+ MOVD $505, R12
+ B runtime·callbackasm1(SB)
+ MOVD $506, R12
+ B runtime·callbackasm1(SB)
+ MOVD $507, R12
+ B runtime·callbackasm1(SB)
+ MOVD $508, R12
+ B runtime·callbackasm1(SB)
+ MOVD $509, R12
+ B runtime·callbackasm1(SB)
+ MOVD $510, R12
+ B runtime·callbackasm1(SB)
+ MOVD $511, R12
+ B runtime·callbackasm1(SB)
+ MOVD $512, R12
+ B runtime·callbackasm1(SB)
+ MOVD $513, R12
+ B runtime·callbackasm1(SB)
+ MOVD $514, R12
+ B runtime·callbackasm1(SB)
+ MOVD $515, R12
+ B runtime·callbackasm1(SB)
+ MOVD $516, R12
+ B runtime·callbackasm1(SB)
+ MOVD $517, R12
+ B runtime·callbackasm1(SB)
+ MOVD $518, R12
+ B runtime·callbackasm1(SB)
+ MOVD $519, R12
+ B runtime·callbackasm1(SB)
+ MOVD $520, R12
+ B runtime·callbackasm1(SB)
+ MOVD $521, R12
+ B runtime·callbackasm1(SB)
+ MOVD $522, R12
+ B runtime·callbackasm1(SB)
+ MOVD $523, R12
+ B runtime·callbackasm1(SB)
+ MOVD $524, R12
+ B runtime·callbackasm1(SB)
+ MOVD $525, R12
+ B runtime·callbackasm1(SB)
+ MOVD $526, R12
+ B runtime·callbackasm1(SB)
+ MOVD $527, R12
+ B runtime·callbackasm1(SB)
+ MOVD $528, R12
+ B runtime·callbackasm1(SB)
+ MOVD $529, R12
+ B runtime·callbackasm1(SB)
+ MOVD $530, R12
+ B runtime·callbackasm1(SB)
+ MOVD $531, R12
+ B runtime·callbackasm1(SB)
+ MOVD $532, R12
+ B runtime·callbackasm1(SB)
+ MOVD $533, R12
+ B runtime·callbackasm1(SB)
+ MOVD $534, R12
+ B runtime·callbackasm1(SB)
+ MOVD $535, R12
+ B runtime·callbackasm1(SB)
+ MOVD $536, R12
+ B runtime·callbackasm1(SB)
+ MOVD $537, R12
+ B runtime·callbackasm1(SB)
+ MOVD $538, R12
+ B runtime·callbackasm1(SB)
+ MOVD $539, R12
+ B runtime·callbackasm1(SB)
+ MOVD $540, R12
+ B runtime·callbackasm1(SB)
+ MOVD $541, R12
+ B runtime·callbackasm1(SB)
+ MOVD $542, R12
+ B runtime·callbackasm1(SB)
+ MOVD $543, R12
+ B runtime·callbackasm1(SB)
+ MOVD $544, R12
+ B runtime·callbackasm1(SB)
+ MOVD $545, R12
+ B runtime·callbackasm1(SB)
+ MOVD $546, R12
+ B runtime·callbackasm1(SB)
+ MOVD $547, R12
+ B runtime·callbackasm1(SB)
+ MOVD $548, R12
+ B runtime·callbackasm1(SB)
+ MOVD $549, R12
+ B runtime·callbackasm1(SB)
+ MOVD $550, R12
+ B runtime·callbackasm1(SB)
+ MOVD $551, R12
+ B runtime·callbackasm1(SB)
+ MOVD $552, R12
+ B runtime·callbackasm1(SB)
+ MOVD $553, R12
+ B runtime·callbackasm1(SB)
+ MOVD $554, R12
+ B runtime·callbackasm1(SB)
+ MOVD $555, R12
+ B runtime·callbackasm1(SB)
+ MOVD $556, R12
+ B runtime·callbackasm1(SB)
+ MOVD $557, R12
+ B runtime·callbackasm1(SB)
+ MOVD $558, R12
+ B runtime·callbackasm1(SB)
+ MOVD $559, R12
+ B runtime·callbackasm1(SB)
+ MOVD $560, R12
+ B runtime·callbackasm1(SB)
+ MOVD $561, R12
+ B runtime·callbackasm1(SB)
+ MOVD $562, R12
+ B runtime·callbackasm1(SB)
+ MOVD $563, R12
+ B runtime·callbackasm1(SB)
+ MOVD $564, R12
+ B runtime·callbackasm1(SB)
+ MOVD $565, R12
+ B runtime·callbackasm1(SB)
+ MOVD $566, R12
+ B runtime·callbackasm1(SB)
+ MOVD $567, R12
+ B runtime·callbackasm1(SB)
+ MOVD $568, R12
+ B runtime·callbackasm1(SB)
+ MOVD $569, R12
+ B runtime·callbackasm1(SB)
+ MOVD $570, R12
+ B runtime·callbackasm1(SB)
+ MOVD $571, R12
+ B runtime·callbackasm1(SB)
+ MOVD $572, R12
+ B runtime·callbackasm1(SB)
+ MOVD $573, R12
+ B runtime·callbackasm1(SB)
+ MOVD $574, R12
+ B runtime·callbackasm1(SB)
+ MOVD $575, R12
+ B runtime·callbackasm1(SB)
+ MOVD $576, R12
+ B runtime·callbackasm1(SB)
+ MOVD $577, R12
+ B runtime·callbackasm1(SB)
+ MOVD $578, R12
+ B runtime·callbackasm1(SB)
+ MOVD $579, R12
+ B runtime·callbackasm1(SB)
+ MOVD $580, R12
+ B runtime·callbackasm1(SB)
+ MOVD $581, R12
+ B runtime·callbackasm1(SB)
+ MOVD $582, R12
+ B runtime·callbackasm1(SB)
+ MOVD $583, R12
+ B runtime·callbackasm1(SB)
+ MOVD $584, R12
+ B runtime·callbackasm1(SB)
+ MOVD $585, R12
+ B runtime·callbackasm1(SB)
+ MOVD $586, R12
+ B runtime·callbackasm1(SB)
+ MOVD $587, R12
+ B runtime·callbackasm1(SB)
+ MOVD $588, R12
+ B runtime·callbackasm1(SB)
+ MOVD $589, R12
+ B runtime·callbackasm1(SB)
+ MOVD $590, R12
+ B runtime·callbackasm1(SB)
+ MOVD $591, R12
+ B runtime·callbackasm1(SB)
+ MOVD $592, R12
+ B runtime·callbackasm1(SB)
+ MOVD $593, R12
+ B runtime·callbackasm1(SB)
+ MOVD $594, R12
+ B runtime·callbackasm1(SB)
+ MOVD $595, R12
+ B runtime·callbackasm1(SB)
+ MOVD $596, R12
+ B runtime·callbackasm1(SB)
+ MOVD $597, R12
+ B runtime·callbackasm1(SB)
+ MOVD $598, R12
+ B runtime·callbackasm1(SB)
+ MOVD $599, R12
+ B runtime·callbackasm1(SB)
+ MOVD $600, R12
+ B runtime·callbackasm1(SB)
+ MOVD $601, R12
+ B runtime·callbackasm1(SB)
+ MOVD $602, R12
+ B runtime·callbackasm1(SB)
+ MOVD $603, R12
+ B runtime·callbackasm1(SB)
+ MOVD $604, R12
+ B runtime·callbackasm1(SB)
+ MOVD $605, R12
+ B runtime·callbackasm1(SB)
+ MOVD $606, R12
+ B runtime·callbackasm1(SB)
+ MOVD $607, R12
+ B runtime·callbackasm1(SB)
+ MOVD $608, R12
+ B runtime·callbackasm1(SB)
+ MOVD $609, R12
+ B runtime·callbackasm1(SB)
+ MOVD $610, R12
+ B runtime·callbackasm1(SB)
+ MOVD $611, R12
+ B runtime·callbackasm1(SB)
+ MOVD $612, R12
+ B runtime·callbackasm1(SB)
+ MOVD $613, R12
+ B runtime·callbackasm1(SB)
+ MOVD $614, R12
+ B runtime·callbackasm1(SB)
+ MOVD $615, R12
+ B runtime·callbackasm1(SB)
+ MOVD $616, R12
+ B runtime·callbackasm1(SB)
+ MOVD $617, R12
+ B runtime·callbackasm1(SB)
+ MOVD $618, R12
+ B runtime·callbackasm1(SB)
+ MOVD $619, R12
+ B runtime·callbackasm1(SB)
+ MOVD $620, R12
+ B runtime·callbackasm1(SB)
+ MOVD $621, R12
+ B runtime·callbackasm1(SB)
+ MOVD $622, R12
+ B runtime·callbackasm1(SB)
+ MOVD $623, R12
+ B runtime·callbackasm1(SB)
+ MOVD $624, R12
+ B runtime·callbackasm1(SB)
+ MOVD $625, R12
+ B runtime·callbackasm1(SB)
+ MOVD $626, R12
+ B runtime·callbackasm1(SB)
+ MOVD $627, R12
+ B runtime·callbackasm1(SB)
+ MOVD $628, R12
+ B runtime·callbackasm1(SB)
+ MOVD $629, R12
+ B runtime·callbackasm1(SB)
+ MOVD $630, R12
+ B runtime·callbackasm1(SB)
+ MOVD $631, R12
+ B runtime·callbackasm1(SB)
+ MOVD $632, R12
+ B runtime·callbackasm1(SB)
+ MOVD $633, R12
+ B runtime·callbackasm1(SB)
+ MOVD $634, R12
+ B runtime·callbackasm1(SB)
+ MOVD $635, R12
+ B runtime·callbackasm1(SB)
+ MOVD $636, R12
+ B runtime·callbackasm1(SB)
+ MOVD $637, R12
+ B runtime·callbackasm1(SB)
+ MOVD $638, R12
+ B runtime·callbackasm1(SB)
+ MOVD $639, R12
+ B runtime·callbackasm1(SB)
+ MOVD $640, R12
+ B runtime·callbackasm1(SB)
+ MOVD $641, R12
+ B runtime·callbackasm1(SB)
+ MOVD $642, R12
+ B runtime·callbackasm1(SB)
+ MOVD $643, R12
+ B runtime·callbackasm1(SB)
+ MOVD $644, R12
+ B runtime·callbackasm1(SB)
+ MOVD $645, R12
+ B runtime·callbackasm1(SB)
+ MOVD $646, R12
+ B runtime·callbackasm1(SB)
+ MOVD $647, R12
+ B runtime·callbackasm1(SB)
+ MOVD $648, R12
+ B runtime·callbackasm1(SB)
+ MOVD $649, R12
+ B runtime·callbackasm1(SB)
+ MOVD $650, R12
+ B runtime·callbackasm1(SB)
+ MOVD $651, R12
+ B runtime·callbackasm1(SB)
+ MOVD $652, R12
+ B runtime·callbackasm1(SB)
+ MOVD $653, R12
+ B runtime·callbackasm1(SB)
+ MOVD $654, R12
+ B runtime·callbackasm1(SB)
+ MOVD $655, R12
+ B runtime·callbackasm1(SB)
+ MOVD $656, R12
+ B runtime·callbackasm1(SB)
+ MOVD $657, R12
+ B runtime·callbackasm1(SB)
+ MOVD $658, R12
+ B runtime·callbackasm1(SB)
+ MOVD $659, R12
+ B runtime·callbackasm1(SB)
+ MOVD $660, R12
+ B runtime·callbackasm1(SB)
+ MOVD $661, R12
+ B runtime·callbackasm1(SB)
+ MOVD $662, R12
+ B runtime·callbackasm1(SB)
+ MOVD $663, R12
+ B runtime·callbackasm1(SB)
+ MOVD $664, R12
+ B runtime·callbackasm1(SB)
+ MOVD $665, R12
+ B runtime·callbackasm1(SB)
+ MOVD $666, R12
+ B runtime·callbackasm1(SB)
+ MOVD $667, R12
+ B runtime·callbackasm1(SB)
+ MOVD $668, R12
+ B runtime·callbackasm1(SB)
+ MOVD $669, R12
+ B runtime·callbackasm1(SB)
+ MOVD $670, R12
+ B runtime·callbackasm1(SB)
+ MOVD $671, R12
+ B runtime·callbackasm1(SB)
+ MOVD $672, R12
+ B runtime·callbackasm1(SB)
+ MOVD $673, R12
+ B runtime·callbackasm1(SB)
+ MOVD $674, R12
+ B runtime·callbackasm1(SB)
+ MOVD $675, R12
+ B runtime·callbackasm1(SB)
+ MOVD $676, R12
+ B runtime·callbackasm1(SB)
+ MOVD $677, R12
+ B runtime·callbackasm1(SB)
+ MOVD $678, R12
+ B runtime·callbackasm1(SB)
+ MOVD $679, R12
+ B runtime·callbackasm1(SB)
+ MOVD $680, R12
+ B runtime·callbackasm1(SB)
+ MOVD $681, R12
+ B runtime·callbackasm1(SB)
+ MOVD $682, R12
+ B runtime·callbackasm1(SB)
+ MOVD $683, R12
+ B runtime·callbackasm1(SB)
+ MOVD $684, R12
+ B runtime·callbackasm1(SB)
+ MOVD $685, R12
+ B runtime·callbackasm1(SB)
+ MOVD $686, R12
+ B runtime·callbackasm1(SB)
+ MOVD $687, R12
+ B runtime·callbackasm1(SB)
+ MOVD $688, R12
+ B runtime·callbackasm1(SB)
+ MOVD $689, R12
+ B runtime·callbackasm1(SB)
+ MOVD $690, R12
+ B runtime·callbackasm1(SB)
+ MOVD $691, R12
+ B runtime·callbackasm1(SB)
+ MOVD $692, R12
+ B runtime·callbackasm1(SB)
+ MOVD $693, R12
+ B runtime·callbackasm1(SB)
+ MOVD $694, R12
+ B runtime·callbackasm1(SB)
+ MOVD $695, R12
+ B runtime·callbackasm1(SB)
+ MOVD $696, R12
+ B runtime·callbackasm1(SB)
+ MOVD $697, R12
+ B runtime·callbackasm1(SB)
+ MOVD $698, R12
+ B runtime·callbackasm1(SB)
+ MOVD $699, R12
+ B runtime·callbackasm1(SB)
+ MOVD $700, R12
+ B runtime·callbackasm1(SB)
+ MOVD $701, R12
+ B runtime·callbackasm1(SB)
+ MOVD $702, R12
+ B runtime·callbackasm1(SB)
+ MOVD $703, R12
+ B runtime·callbackasm1(SB)
+ MOVD $704, R12
+ B runtime·callbackasm1(SB)
+ MOVD $705, R12
+ B runtime·callbackasm1(SB)
+ MOVD $706, R12
+ B runtime·callbackasm1(SB)
+ MOVD $707, R12
+ B runtime·callbackasm1(SB)
+ MOVD $708, R12
+ B runtime·callbackasm1(SB)
+ MOVD $709, R12
+ B runtime·callbackasm1(SB)
+ MOVD $710, R12
+ B runtime·callbackasm1(SB)
+ MOVD $711, R12
+ B runtime·callbackasm1(SB)
+ MOVD $712, R12
+ B runtime·callbackasm1(SB)
+ MOVD $713, R12
+ B runtime·callbackasm1(SB)
+ MOVD $714, R12
+ B runtime·callbackasm1(SB)
+ MOVD $715, R12
+ B runtime·callbackasm1(SB)
+ MOVD $716, R12
+ B runtime·callbackasm1(SB)
+ MOVD $717, R12
+ B runtime·callbackasm1(SB)
+ MOVD $718, R12
+ B runtime·callbackasm1(SB)
+ MOVD $719, R12
+ B runtime·callbackasm1(SB)
+ MOVD $720, R12
+ B runtime·callbackasm1(SB)
+ MOVD $721, R12
+ B runtime·callbackasm1(SB)
+ MOVD $722, R12
+ B runtime·callbackasm1(SB)
+ MOVD $723, R12
+ B runtime·callbackasm1(SB)
+ MOVD $724, R12
+ B runtime·callbackasm1(SB)
+ MOVD $725, R12
+ B runtime·callbackasm1(SB)
+ MOVD $726, R12
+ B runtime·callbackasm1(SB)
+ MOVD $727, R12
+ B runtime·callbackasm1(SB)
+ MOVD $728, R12
+ B runtime·callbackasm1(SB)
+ MOVD $729, R12
+ B runtime·callbackasm1(SB)
+ MOVD $730, R12
+ B runtime·callbackasm1(SB)
+ MOVD $731, R12
+ B runtime·callbackasm1(SB)
+ MOVD $732, R12
+ B runtime·callbackasm1(SB)
+ MOVD $733, R12
+ B runtime·callbackasm1(SB)
+ MOVD $734, R12
+ B runtime·callbackasm1(SB)
+ MOVD $735, R12
+ B runtime·callbackasm1(SB)
+ MOVD $736, R12
+ B runtime·callbackasm1(SB)
+ MOVD $737, R12
+ B runtime·callbackasm1(SB)
+ MOVD $738, R12
+ B runtime·callbackasm1(SB)
+ MOVD $739, R12
+ B runtime·callbackasm1(SB)
+ MOVD $740, R12
+ B runtime·callbackasm1(SB)
+ MOVD $741, R12
+ B runtime·callbackasm1(SB)
+ MOVD $742, R12
+ B runtime·callbackasm1(SB)
+ MOVD $743, R12
+ B runtime·callbackasm1(SB)
+ MOVD $744, R12
+ B runtime·callbackasm1(SB)
+ MOVD $745, R12
+ B runtime·callbackasm1(SB)
+ MOVD $746, R12
+ B runtime·callbackasm1(SB)
+ MOVD $747, R12
+ B runtime·callbackasm1(SB)
+ MOVD $748, R12
+ B runtime·callbackasm1(SB)
+ MOVD $749, R12
+ B runtime·callbackasm1(SB)
+ MOVD $750, R12
+ B runtime·callbackasm1(SB)
+ MOVD $751, R12
+ B runtime·callbackasm1(SB)
+ MOVD $752, R12
+ B runtime·callbackasm1(SB)
+ MOVD $753, R12
+ B runtime·callbackasm1(SB)
+ MOVD $754, R12
+ B runtime·callbackasm1(SB)
+ MOVD $755, R12
+ B runtime·callbackasm1(SB)
+ MOVD $756, R12
+ B runtime·callbackasm1(SB)
+ MOVD $757, R12
+ B runtime·callbackasm1(SB)
+ MOVD $758, R12
+ B runtime·callbackasm1(SB)
+ MOVD $759, R12
+ B runtime·callbackasm1(SB)
+ MOVD $760, R12
+ B runtime·callbackasm1(SB)
+ MOVD $761, R12
+ B runtime·callbackasm1(SB)
+ MOVD $762, R12
+ B runtime·callbackasm1(SB)
+ MOVD $763, R12
+ B runtime·callbackasm1(SB)
+ MOVD $764, R12
+ B runtime·callbackasm1(SB)
+ MOVD $765, R12
+ B runtime·callbackasm1(SB)
+ MOVD $766, R12
+ B runtime·callbackasm1(SB)
+ MOVD $767, R12
+ B runtime·callbackasm1(SB)
+ MOVD $768, R12
+ B runtime·callbackasm1(SB)
+ MOVD $769, R12
+ B runtime·callbackasm1(SB)
+ MOVD $770, R12
+ B runtime·callbackasm1(SB)
+ MOVD $771, R12
+ B runtime·callbackasm1(SB)
+ MOVD $772, R12
+ B runtime·callbackasm1(SB)
+ MOVD $773, R12
+ B runtime·callbackasm1(SB)
+ MOVD $774, R12
+ B runtime·callbackasm1(SB)
+ MOVD $775, R12
+ B runtime·callbackasm1(SB)
+ MOVD $776, R12
+ B runtime·callbackasm1(SB)
+ MOVD $777, R12
+ B runtime·callbackasm1(SB)
+ MOVD $778, R12
+ B runtime·callbackasm1(SB)
+ MOVD $779, R12
+ B runtime·callbackasm1(SB)
+ MOVD $780, R12
+ B runtime·callbackasm1(SB)
+ MOVD $781, R12
+ B runtime·callbackasm1(SB)
+ MOVD $782, R12
+ B runtime·callbackasm1(SB)
+ MOVD $783, R12
+ B runtime·callbackasm1(SB)
+ MOVD $784, R12
+ B runtime·callbackasm1(SB)
+ MOVD $785, R12
+ B runtime·callbackasm1(SB)
+ MOVD $786, R12
+ B runtime·callbackasm1(SB)
+ MOVD $787, R12
+ B runtime·callbackasm1(SB)
+ MOVD $788, R12
+ B runtime·callbackasm1(SB)
+ MOVD $789, R12
+ B runtime·callbackasm1(SB)
+ MOVD $790, R12
+ B runtime·callbackasm1(SB)
+ MOVD $791, R12
+ B runtime·callbackasm1(SB)
+ MOVD $792, R12
+ B runtime·callbackasm1(SB)
+ MOVD $793, R12
+ B runtime·callbackasm1(SB)
+ MOVD $794, R12
+ B runtime·callbackasm1(SB)
+ MOVD $795, R12
+ B runtime·callbackasm1(SB)
+ MOVD $796, R12
+ B runtime·callbackasm1(SB)
+ MOVD $797, R12
+ B runtime·callbackasm1(SB)
+ MOVD $798, R12
+ B runtime·callbackasm1(SB)
+ MOVD $799, R12
+ B runtime·callbackasm1(SB)
+ MOVD $800, R12
+ B runtime·callbackasm1(SB)
+ MOVD $801, R12
+ B runtime·callbackasm1(SB)
+ MOVD $802, R12
+ B runtime·callbackasm1(SB)
+ MOVD $803, R12
+ B runtime·callbackasm1(SB)
+ MOVD $804, R12
+ B runtime·callbackasm1(SB)
+ MOVD $805, R12
+ B runtime·callbackasm1(SB)
+ MOVD $806, R12
+ B runtime·callbackasm1(SB)
+ MOVD $807, R12
+ B runtime·callbackasm1(SB)
+ MOVD $808, R12
+ B runtime·callbackasm1(SB)
+ MOVD $809, R12
+ B runtime·callbackasm1(SB)
+ MOVD $810, R12
+ B runtime·callbackasm1(SB)
+ MOVD $811, R12
+ B runtime·callbackasm1(SB)
+ MOVD $812, R12
+ B runtime·callbackasm1(SB)
+ MOVD $813, R12
+ B runtime·callbackasm1(SB)
+ MOVD $814, R12
+ B runtime·callbackasm1(SB)
+ MOVD $815, R12
+ B runtime·callbackasm1(SB)
+ MOVD $816, R12
+ B runtime·callbackasm1(SB)
+ MOVD $817, R12
+ B runtime·callbackasm1(SB)
+ MOVD $818, R12
+ B runtime·callbackasm1(SB)
+ MOVD $819, R12
+ B runtime·callbackasm1(SB)
+ MOVD $820, R12
+ B runtime·callbackasm1(SB)
+ MOVD $821, R12
+ B runtime·callbackasm1(SB)
+ MOVD $822, R12
+ B runtime·callbackasm1(SB)
+ MOVD $823, R12
+ B runtime·callbackasm1(SB)
+ MOVD $824, R12
+ B runtime·callbackasm1(SB)
+ MOVD $825, R12
+ B runtime·callbackasm1(SB)
+ MOVD $826, R12
+ B runtime·callbackasm1(SB)
+ MOVD $827, R12
+ B runtime·callbackasm1(SB)
+ MOVD $828, R12
+ B runtime·callbackasm1(SB)
+ MOVD $829, R12
+ B runtime·callbackasm1(SB)
+ MOVD $830, R12
+ B runtime·callbackasm1(SB)
+ MOVD $831, R12
+ B runtime·callbackasm1(SB)
+ MOVD $832, R12
+ B runtime·callbackasm1(SB)
+ MOVD $833, R12
+ B runtime·callbackasm1(SB)
+ MOVD $834, R12
+ B runtime·callbackasm1(SB)
+ MOVD $835, R12
+ B runtime·callbackasm1(SB)
+ MOVD $836, R12
+ B runtime·callbackasm1(SB)
+ MOVD $837, R12
+ B runtime·callbackasm1(SB)
+ MOVD $838, R12
+ B runtime·callbackasm1(SB)
+ MOVD $839, R12
+ B runtime·callbackasm1(SB)
+ MOVD $840, R12
+ B runtime·callbackasm1(SB)
+ MOVD $841, R12
+ B runtime·callbackasm1(SB)
+ MOVD $842, R12
+ B runtime·callbackasm1(SB)
+ MOVD $843, R12
+ B runtime·callbackasm1(SB)
+ MOVD $844, R12
+ B runtime·callbackasm1(SB)
+ MOVD $845, R12
+ B runtime·callbackasm1(SB)
+ MOVD $846, R12
+ B runtime·callbackasm1(SB)
+ MOVD $847, R12
+ B runtime·callbackasm1(SB)
+ MOVD $848, R12
+ B runtime·callbackasm1(SB)
+ MOVD $849, R12
+ B runtime·callbackasm1(SB)
+ MOVD $850, R12
+ B runtime·callbackasm1(SB)
+ MOVD $851, R12
+ B runtime·callbackasm1(SB)
+ MOVD $852, R12
+ B runtime·callbackasm1(SB)
+ MOVD $853, R12
+ B runtime·callbackasm1(SB)
+ MOVD $854, R12
+ B runtime·callbackasm1(SB)
+ MOVD $855, R12
+ B runtime·callbackasm1(SB)
+ MOVD $856, R12
+ B runtime·callbackasm1(SB)
+ MOVD $857, R12
+ B runtime·callbackasm1(SB)
+ MOVD $858, R12
+ B runtime·callbackasm1(SB)
+ MOVD $859, R12
+ B runtime·callbackasm1(SB)
+ MOVD $860, R12
+ B runtime·callbackasm1(SB)
+ MOVD $861, R12
+ B runtime·callbackasm1(SB)
+ MOVD $862, R12
+ B runtime·callbackasm1(SB)
+ MOVD $863, R12
+ B runtime·callbackasm1(SB)
+ MOVD $864, R12
+ B runtime·callbackasm1(SB)
+ MOVD $865, R12
+ B runtime·callbackasm1(SB)
+ MOVD $866, R12
+ B runtime·callbackasm1(SB)
+ MOVD $867, R12
+ B runtime·callbackasm1(SB)
+ MOVD $868, R12
+ B runtime·callbackasm1(SB)
+ MOVD $869, R12
+ B runtime·callbackasm1(SB)
+ MOVD $870, R12
+ B runtime·callbackasm1(SB)
+ MOVD $871, R12
+ B runtime·callbackasm1(SB)
+ MOVD $872, R12
+ B runtime·callbackasm1(SB)
+ MOVD $873, R12
+ B runtime·callbackasm1(SB)
+ MOVD $874, R12
+ B runtime·callbackasm1(SB)
+ MOVD $875, R12
+ B runtime·callbackasm1(SB)
+ MOVD $876, R12
+ B runtime·callbackasm1(SB)
+ MOVD $877, R12
+ B runtime·callbackasm1(SB)
+ MOVD $878, R12
+ B runtime·callbackasm1(SB)
+ MOVD $879, R12
+ B runtime·callbackasm1(SB)
+ MOVD $880, R12
+ B runtime·callbackasm1(SB)
+ MOVD $881, R12
+ B runtime·callbackasm1(SB)
+ MOVD $882, R12
+ B runtime·callbackasm1(SB)
+ MOVD $883, R12
+ B runtime·callbackasm1(SB)
+ MOVD $884, R12
+ B runtime·callbackasm1(SB)
+ MOVD $885, R12
+ B runtime·callbackasm1(SB)
+ MOVD $886, R12
+ B runtime·callbackasm1(SB)
+ MOVD $887, R12
+ B runtime·callbackasm1(SB)
+ MOVD $888, R12
+ B runtime·callbackasm1(SB)
+ MOVD $889, R12
+ B runtime·callbackasm1(SB)
+ MOVD $890, R12
+ B runtime·callbackasm1(SB)
+ MOVD $891, R12
+ B runtime·callbackasm1(SB)
+ MOVD $892, R12
+ B runtime·callbackasm1(SB)
+ MOVD $893, R12
+ B runtime·callbackasm1(SB)
+ MOVD $894, R12
+ B runtime·callbackasm1(SB)
+ MOVD $895, R12
+ B runtime·callbackasm1(SB)
+ MOVD $896, R12
+ B runtime·callbackasm1(SB)
+ MOVD $897, R12
+ B runtime·callbackasm1(SB)
+ MOVD $898, R12
+ B runtime·callbackasm1(SB)
+ MOVD $899, R12
+ B runtime·callbackasm1(SB)
+ MOVD $900, R12
+ B runtime·callbackasm1(SB)
+ MOVD $901, R12
+ B runtime·callbackasm1(SB)
+ MOVD $902, R12
+ B runtime·callbackasm1(SB)
+ MOVD $903, R12
+ B runtime·callbackasm1(SB)
+ MOVD $904, R12
+ B runtime·callbackasm1(SB)
+ MOVD $905, R12
+ B runtime·callbackasm1(SB)
+ MOVD $906, R12
+ B runtime·callbackasm1(SB)
+ MOVD $907, R12
+ B runtime·callbackasm1(SB)
+ MOVD $908, R12
+ B runtime·callbackasm1(SB)
+ MOVD $909, R12
+ B runtime·callbackasm1(SB)
+ MOVD $910, R12
+ B runtime·callbackasm1(SB)
+ MOVD $911, R12
+ B runtime·callbackasm1(SB)
+ MOVD $912, R12
+ B runtime·callbackasm1(SB)
+ MOVD $913, R12
+ B runtime·callbackasm1(SB)
+ MOVD $914, R12
+ B runtime·callbackasm1(SB)
+ MOVD $915, R12
+ B runtime·callbackasm1(SB)
+ MOVD $916, R12
+ B runtime·callbackasm1(SB)
+ MOVD $917, R12
+ B runtime·callbackasm1(SB)
+ MOVD $918, R12
+ B runtime·callbackasm1(SB)
+ MOVD $919, R12
+ B runtime·callbackasm1(SB)
+ MOVD $920, R12
+ B runtime·callbackasm1(SB)
+ MOVD $921, R12
+ B runtime·callbackasm1(SB)
+ MOVD $922, R12
+ B runtime·callbackasm1(SB)
+ MOVD $923, R12
+ B runtime·callbackasm1(SB)
+ MOVD $924, R12
+ B runtime·callbackasm1(SB)
+ MOVD $925, R12
+ B runtime·callbackasm1(SB)
+ MOVD $926, R12
+ B runtime·callbackasm1(SB)
+ MOVD $927, R12
+ B runtime·callbackasm1(SB)
+ MOVD $928, R12
+ B runtime·callbackasm1(SB)
+ MOVD $929, R12
+ B runtime·callbackasm1(SB)
+ MOVD $930, R12
+ B runtime·callbackasm1(SB)
+ MOVD $931, R12
+ B runtime·callbackasm1(SB)
+ MOVD $932, R12
+ B runtime·callbackasm1(SB)
+ MOVD $933, R12
+ B runtime·callbackasm1(SB)
+ MOVD $934, R12
+ B runtime·callbackasm1(SB)
+ MOVD $935, R12
+ B runtime·callbackasm1(SB)
+ MOVD $936, R12
+ B runtime·callbackasm1(SB)
+ MOVD $937, R12
+ B runtime·callbackasm1(SB)
+ MOVD $938, R12
+ B runtime·callbackasm1(SB)
+ MOVD $939, R12
+ B runtime·callbackasm1(SB)
+ MOVD $940, R12
+ B runtime·callbackasm1(SB)
+ MOVD $941, R12
+ B runtime·callbackasm1(SB)
+ MOVD $942, R12
+ B runtime·callbackasm1(SB)
+ MOVD $943, R12
+ B runtime·callbackasm1(SB)
+ MOVD $944, R12
+ B runtime·callbackasm1(SB)
+ MOVD $945, R12
+ B runtime·callbackasm1(SB)
+ MOVD $946, R12
+ B runtime·callbackasm1(SB)
+ MOVD $947, R12
+ B runtime·callbackasm1(SB)
+ MOVD $948, R12
+ B runtime·callbackasm1(SB)
+ MOVD $949, R12
+ B runtime·callbackasm1(SB)
+ MOVD $950, R12
+ B runtime·callbackasm1(SB)
+ MOVD $951, R12
+ B runtime·callbackasm1(SB)
+ MOVD $952, R12
+ B runtime·callbackasm1(SB)
+ MOVD $953, R12
+ B runtime·callbackasm1(SB)
+ MOVD $954, R12
+ B runtime·callbackasm1(SB)
+ MOVD $955, R12
+ B runtime·callbackasm1(SB)
+ MOVD $956, R12
+ B runtime·callbackasm1(SB)
+ MOVD $957, R12
+ B runtime·callbackasm1(SB)
+ MOVD $958, R12
+ B runtime·callbackasm1(SB)
+ MOVD $959, R12
+ B runtime·callbackasm1(SB)
+ MOVD $960, R12
+ B runtime·callbackasm1(SB)
+ MOVD $961, R12
+ B runtime·callbackasm1(SB)
+ MOVD $962, R12
+ B runtime·callbackasm1(SB)
+ MOVD $963, R12
+ B runtime·callbackasm1(SB)
+ MOVD $964, R12
+ B runtime·callbackasm1(SB)
+ MOVD $965, R12
+ B runtime·callbackasm1(SB)
+ MOVD $966, R12
+ B runtime·callbackasm1(SB)
+ MOVD $967, R12
+ B runtime·callbackasm1(SB)
+ MOVD $968, R12
+ B runtime·callbackasm1(SB)
+ MOVD $969, R12
+ B runtime·callbackasm1(SB)
+ MOVD $970, R12
+ B runtime·callbackasm1(SB)
+ MOVD $971, R12
+ B runtime·callbackasm1(SB)
+ MOVD $972, R12
+ B runtime·callbackasm1(SB)
+ MOVD $973, R12
+ B runtime·callbackasm1(SB)
+ MOVD $974, R12
+ B runtime·callbackasm1(SB)
+ MOVD $975, R12
+ B runtime·callbackasm1(SB)
+ MOVD $976, R12
+ B runtime·callbackasm1(SB)
+ MOVD $977, R12
+ B runtime·callbackasm1(SB)
+ MOVD $978, R12
+ B runtime·callbackasm1(SB)
+ MOVD $979, R12
+ B runtime·callbackasm1(SB)
+ MOVD $980, R12
+ B runtime·callbackasm1(SB)
+ MOVD $981, R12
+ B runtime·callbackasm1(SB)
+ MOVD $982, R12
+ B runtime·callbackasm1(SB)
+ MOVD $983, R12
+ B runtime·callbackasm1(SB)
+ MOVD $984, R12
+ B runtime·callbackasm1(SB)
+ MOVD $985, R12
+ B runtime·callbackasm1(SB)
+ MOVD $986, R12
+ B runtime·callbackasm1(SB)
+ MOVD $987, R12
+ B runtime·callbackasm1(SB)
+ MOVD $988, R12
+ B runtime·callbackasm1(SB)
+ MOVD $989, R12
+ B runtime·callbackasm1(SB)
+ MOVD $990, R12
+ B runtime·callbackasm1(SB)
+ MOVD $991, R12
+ B runtime·callbackasm1(SB)
+ MOVD $992, R12
+ B runtime·callbackasm1(SB)
+ MOVD $993, R12
+ B runtime·callbackasm1(SB)
+ MOVD $994, R12
+ B runtime·callbackasm1(SB)
+ MOVD $995, R12
+ B runtime·callbackasm1(SB)
+ MOVD $996, R12
+ B runtime·callbackasm1(SB)
+ MOVD $997, R12
+ B runtime·callbackasm1(SB)
+ MOVD $998, R12
+ B runtime·callbackasm1(SB)
+ MOVD $999, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1000, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1001, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1002, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1003, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1004, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1005, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1006, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1007, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1008, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1009, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1010, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1011, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1012, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1013, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1014, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1015, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1016, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1017, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1018, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1019, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1020, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1021, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1022, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1023, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1024, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1025, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1026, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1027, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1028, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1029, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1030, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1031, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1032, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1033, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1034, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1035, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1036, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1037, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1038, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1039, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1040, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1041, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1042, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1043, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1044, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1045, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1046, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1047, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1048, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1049, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1050, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1051, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1052, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1053, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1054, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1055, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1056, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1057, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1058, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1059, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1060, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1061, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1062, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1063, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1064, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1065, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1066, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1067, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1068, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1069, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1070, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1071, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1072, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1073, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1074, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1075, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1076, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1077, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1078, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1079, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1080, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1081, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1082, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1083, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1084, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1085, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1086, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1087, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1088, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1089, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1090, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1091, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1092, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1093, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1094, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1095, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1096, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1097, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1098, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1099, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1100, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1101, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1102, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1103, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1104, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1105, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1106, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1107, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1108, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1109, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1110, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1111, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1112, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1113, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1114, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1115, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1116, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1117, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1118, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1119, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1120, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1121, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1122, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1123, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1124, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1125, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1126, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1127, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1128, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1129, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1130, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1131, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1132, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1133, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1134, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1135, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1136, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1137, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1138, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1139, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1140, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1141, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1142, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1143, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1144, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1145, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1146, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1147, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1148, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1149, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1150, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1151, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1152, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1153, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1154, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1155, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1156, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1157, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1158, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1159, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1160, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1161, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1162, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1163, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1164, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1165, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1166, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1167, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1168, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1169, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1170, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1171, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1172, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1173, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1174, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1175, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1176, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1177, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1178, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1179, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1180, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1181, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1182, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1183, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1184, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1185, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1186, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1187, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1188, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1189, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1190, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1191, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1192, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1193, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1194, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1195, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1196, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1197, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1198, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1199, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1200, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1201, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1202, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1203, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1204, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1205, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1206, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1207, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1208, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1209, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1210, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1211, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1212, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1213, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1214, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1215, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1216, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1217, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1218, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1219, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1220, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1221, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1222, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1223, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1224, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1225, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1226, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1227, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1228, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1229, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1230, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1231, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1232, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1233, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1234, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1235, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1236, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1237, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1238, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1239, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1240, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1241, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1242, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1243, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1244, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1245, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1246, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1247, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1248, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1249, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1250, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1251, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1252, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1253, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1254, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1255, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1256, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1257, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1258, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1259, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1260, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1261, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1262, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1263, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1264, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1265, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1266, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1267, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1268, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1269, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1270, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1271, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1272, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1273, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1274, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1275, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1276, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1277, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1278, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1279, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1280, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1281, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1282, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1283, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1284, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1285, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1286, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1287, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1288, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1289, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1290, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1291, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1292, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1293, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1294, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1295, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1296, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1297, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1298, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1299, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1300, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1301, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1302, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1303, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1304, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1305, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1306, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1307, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1308, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1309, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1310, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1311, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1312, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1313, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1314, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1315, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1316, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1317, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1318, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1319, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1320, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1321, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1322, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1323, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1324, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1325, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1326, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1327, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1328, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1329, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1330, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1331, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1332, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1333, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1334, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1335, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1336, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1337, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1338, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1339, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1340, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1341, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1342, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1343, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1344, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1345, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1346, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1347, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1348, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1349, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1350, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1351, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1352, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1353, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1354, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1355, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1356, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1357, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1358, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1359, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1360, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1361, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1362, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1363, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1364, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1365, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1366, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1367, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1368, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1369, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1370, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1371, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1372, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1373, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1374, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1375, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1376, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1377, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1378, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1379, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1380, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1381, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1382, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1383, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1384, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1385, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1386, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1387, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1388, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1389, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1390, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1391, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1392, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1393, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1394, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1395, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1396, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1397, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1398, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1399, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1400, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1401, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1402, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1403, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1404, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1405, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1406, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1407, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1408, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1409, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1410, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1411, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1412, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1413, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1414, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1415, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1416, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1417, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1418, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1419, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1420, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1421, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1422, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1423, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1424, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1425, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1426, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1427, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1428, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1429, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1430, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1431, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1432, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1433, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1434, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1435, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1436, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1437, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1438, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1439, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1440, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1441, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1442, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1443, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1444, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1445, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1446, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1447, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1448, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1449, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1450, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1451, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1452, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1453, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1454, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1455, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1456, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1457, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1458, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1459, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1460, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1461, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1462, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1463, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1464, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1465, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1466, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1467, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1468, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1469, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1470, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1471, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1472, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1473, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1474, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1475, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1476, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1477, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1478, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1479, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1480, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1481, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1482, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1483, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1484, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1485, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1486, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1487, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1488, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1489, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1490, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1491, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1492, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1493, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1494, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1495, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1496, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1497, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1498, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1499, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1500, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1501, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1502, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1503, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1504, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1505, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1506, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1507, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1508, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1509, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1510, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1511, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1512, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1513, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1514, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1515, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1516, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1517, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1518, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1519, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1520, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1521, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1522, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1523, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1524, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1525, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1526, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1527, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1528, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1529, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1530, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1531, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1532, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1533, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1534, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1535, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1536, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1537, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1538, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1539, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1540, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1541, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1542, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1543, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1544, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1545, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1546, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1547, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1548, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1549, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1550, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1551, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1552, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1553, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1554, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1555, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1556, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1557, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1558, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1559, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1560, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1561, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1562, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1563, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1564, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1565, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1566, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1567, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1568, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1569, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1570, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1571, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1572, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1573, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1574, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1575, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1576, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1577, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1578, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1579, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1580, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1581, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1582, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1583, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1584, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1585, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1586, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1587, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1588, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1589, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1590, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1591, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1592, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1593, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1594, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1595, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1596, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1597, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1598, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1599, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1600, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1601, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1602, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1603, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1604, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1605, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1606, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1607, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1608, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1609, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1610, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1611, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1612, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1613, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1614, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1615, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1616, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1617, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1618, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1619, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1620, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1621, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1622, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1623, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1624, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1625, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1626, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1627, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1628, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1629, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1630, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1631, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1632, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1633, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1634, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1635, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1636, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1637, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1638, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1639, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1640, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1641, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1642, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1643, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1644, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1645, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1646, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1647, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1648, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1649, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1650, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1651, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1652, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1653, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1654, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1655, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1656, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1657, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1658, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1659, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1660, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1661, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1662, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1663, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1664, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1665, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1666, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1667, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1668, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1669, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1670, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1671, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1672, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1673, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1674, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1675, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1676, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1677, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1678, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1679, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1680, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1681, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1682, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1683, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1684, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1685, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1686, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1687, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1688, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1689, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1690, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1691, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1692, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1693, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1694, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1695, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1696, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1697, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1698, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1699, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1700, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1701, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1702, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1703, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1704, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1705, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1706, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1707, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1708, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1709, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1710, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1711, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1712, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1713, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1714, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1715, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1716, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1717, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1718, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1719, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1720, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1721, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1722, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1723, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1724, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1725, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1726, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1727, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1728, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1729, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1730, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1731, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1732, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1733, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1734, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1735, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1736, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1737, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1738, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1739, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1740, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1741, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1742, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1743, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1744, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1745, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1746, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1747, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1748, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1749, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1750, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1751, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1752, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1753, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1754, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1755, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1756, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1757, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1758, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1759, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1760, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1761, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1762, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1763, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1764, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1765, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1766, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1767, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1768, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1769, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1770, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1771, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1772, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1773, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1774, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1775, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1776, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1777, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1778, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1779, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1780, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1781, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1782, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1783, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1784, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1785, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1786, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1787, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1788, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1789, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1790, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1791, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1792, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1793, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1794, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1795, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1796, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1797, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1798, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1799, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1800, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1801, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1802, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1803, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1804, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1805, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1806, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1807, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1808, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1809, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1810, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1811, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1812, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1813, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1814, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1815, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1816, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1817, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1818, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1819, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1820, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1821, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1822, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1823, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1824, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1825, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1826, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1827, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1828, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1829, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1830, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1831, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1832, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1833, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1834, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1835, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1836, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1837, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1838, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1839, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1840, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1841, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1842, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1843, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1844, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1845, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1846, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1847, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1848, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1849, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1850, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1851, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1852, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1853, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1854, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1855, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1856, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1857, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1858, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1859, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1860, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1861, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1862, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1863, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1864, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1865, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1866, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1867, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1868, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1869, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1870, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1871, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1872, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1873, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1874, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1875, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1876, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1877, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1878, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1879, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1880, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1881, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1882, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1883, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1884, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1885, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1886, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1887, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1888, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1889, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1890, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1891, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1892, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1893, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1894, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1895, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1896, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1897, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1898, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1899, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1900, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1901, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1902, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1903, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1904, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1905, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1906, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1907, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1908, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1909, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1910, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1911, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1912, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1913, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1914, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1915, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1916, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1917, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1918, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1919, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1920, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1921, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1922, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1923, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1924, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1925, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1926, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1927, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1928, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1929, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1930, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1931, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1932, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1933, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1934, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1935, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1936, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1937, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1938, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1939, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1940, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1941, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1942, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1943, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1944, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1945, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1946, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1947, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1948, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1949, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1950, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1951, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1952, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1953, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1954, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1955, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1956, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1957, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1958, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1959, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1960, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1961, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1962, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1963, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1964, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1965, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1966, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1967, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1968, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1969, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1970, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1971, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1972, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1973, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1974, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1975, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1976, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1977, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1978, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1979, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1980, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1981, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1982, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1983, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1984, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1985, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1986, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1987, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1988, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1989, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1990, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1991, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1992, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1993, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1994, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1995, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1996, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1997, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1998, R12
+ B runtime·callbackasm1(SB)
+ MOVD $1999, R12
+ B runtime·callbackasm1(SB)