summaryrefslogtreecommitdiffstats
path: root/docs/RANDOM_SEEDS.md
diff options
context:
space:
mode:
authorDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-27 13:00:47 +0000
committerDaniel Baumann <daniel.baumann@progress-linux.org>2024-04-27 13:00:47 +0000
commit2cb7e0aaedad73b076ea18c6900b0e86c5760d79 (patch)
treeda68ca54bb79f4080079bf0828acda937593a4e1 /docs/RANDOM_SEEDS.md
parentInitial commit. (diff)
downloadsystemd-upstream.tar.xz
systemd-upstream.zip
Adding upstream version 247.3.upstream/247.3upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to '')
-rw-r--r--docs/RANDOM_SEEDS.md435
1 files changed, 435 insertions, 0 deletions
diff --git a/docs/RANDOM_SEEDS.md b/docs/RANDOM_SEEDS.md
new file mode 100644
index 0000000..da3fe40
--- /dev/null
+++ b/docs/RANDOM_SEEDS.md
@@ -0,0 +1,435 @@
+---
+title: Random Seeds
+category: Concepts
+layout: default
+---
+
+# Random Seeds
+
+systemd can help in a number of ways with providing reliable, high quality
+random numbers from early boot on.
+
+## Linux Kernel Entropy Pool
+
+Today's computer systems require random number generators for numerous
+cryptographic and other purposes. On Linux systems, the kernel's entropy pool
+is typically used as high-quality source of random numbers. The kernel's
+entropy pool combines various entropy inputs together, mixes them and provides
+an API to userspace as well as to internal kernel subsystems to retrieve
+it. This entropy pool needs to be initialized with a minimal level of entropy
+before it can provide high quality, cryptographic random numbers to
+applications. Until the entropy pool is fully initialized application requests
+for high-quality random numbers cannot be fulfilled.
+
+The Linux kernel provides three relevant userspace APIs to request random data
+from the kernel's entropy pool:
+
+* The [`getrandom()`](http://man7.org/linux/man-pages/man2/getrandom.2.html)
+ system call with its `flags` parameter set to 0. If invoked the calling
+ program will synchronously block until the random pool is fully initialized
+ and the requested bytes can be provided.
+
+* The `getrandom()` system call with its `flags` parameter set to
+ `GRND_NONBLOCK`. If invoked the request for random bytes will fail if the
+ pool is not initialized yet.
+
+* Reading from the
+ [`/dev/urandom`](http://man7.org/linux/man-pages/man4/urandom.4.html)
+ pseudo-device will always return random bytes immediately, even if the pool
+ is not initialized. The provided random bytes will be of low quality in this
+ case however. Moreover the kernel will log about all programs using this
+ interface in this state, and which thus potentially rely on an uninitialized
+ entropy pool.
+
+(Strictly speaking there are more APIs, for example `/dev/random`, but these
+should not be used by almost any application and hence aren't mentioned here.)
+
+Note that the time it takes to initialize the random pool may differ between
+systems. If local hardware random number generators are available,
+initialization is likely quick, but particularly in embedded and virtualized
+environments available entropy is small and thus random pool initialization
+might take a long time (up to tens of minutes!).
+
+Modern hardware tends to come with a number of hardware random number
+generators (hwrng), that may be used to relatively quickly fill up the entropy
+pool. Specifically:
+
+* All recent Intel and AMD CPUs provide the CPU opcode
+ [RDRAND](https://en.wikipedia.org/wiki/RdRand) to acquire random bytes. Linux
+ includes random bytes generated this way in its entropy pool, but didn't use
+ to credit entropy for it (i.e. data from this source wasn't considered good
+ enough to consider the entropy pool properly filled even though it was
+ used). This has changed recently however, and most big distributions have
+ turned on the `CONFIG_RANDOM_TRUST_CPU=y` kernel compile time option. This
+ means systems with CPUs supporting this opcode will be able to very quickly
+ reach the "pool filled" state.
+
+* The TPM security chip that is available on all modern desktop systems has a
+ hwrng. It is also fed into the entropy pool, but generally not credited
+ entropy. You may use `rng_core.default_quality=1000` on the kernel command
+ line to change that, but note that this is a global setting affect all
+ hwrngs. (Yeah, that's weird.)
+
+* Many Intel and AMD chipsets have hwrng chips. Their Linux drivers usually
+ don't credit entropy. (But there's `rng_core.default_quality=1000`, see
+ above.)
+
+* Various embedded boards have hwrng chips. Some drivers automatically credit
+ entropy, others do not. Some WiFi chips appear to have hwrng sources too, and
+ they usually do not credit entropy for them.
+
+* `virtio-rng` is used in virtualized environments and retrieves random data
+ from the VM host. It credits full entropy.
+
+* The EFI firmware typically provides a RNG API. When transitioning from UEFI
+ to kernel mode Linux will query some random data through it, and feed it into
+ the pool, but not credit entropy to it. What kind of random source is behind
+ the EFI RNG API is often not entirely clear, but it hopefully is some kind of
+ hardware source.
+
+If neither of these are available (in fact, even if they are), Linux generates
+entropy from various non-hwrng sources in various subsystems, all of which
+ultimately are rooted in IRQ noise, a very "slow" source of entropy, in
+particular in virtualized environments.
+
+## `systemd`'s Use of Random Numbers
+
+systemd is responsible for bringing up the OS. It generally runs as the first
+userspace process the kernel invokes. Because of that it runs at a time where
+the entropy pool is typically not yet initialized, and thus requests to acquire
+random bytes will either be delayed, will fail or result in a noisy kernel log
+message (see above).
+
+Various other components run during early boot that require random bytes. For
+example, initial RAM disks nowadays communicate with encrypted networks or
+access encrypted storage which might need random numbers. systemd itself
+requires random numbers as well, including for the following uses:
+
+* systemd assigns 'invocation' UUIDs to all services it invokes that uniquely
+ identify each invocation. This is useful retain a global handle on a specific
+ service invocation and relate it to other data. For example, log data
+ collected by the journal usually includes the invocation UUID and thus the
+ runtime context the service manager maintains can be neatly matched up with
+ the log data a specific service invocation generated. systemd also
+ initializes `/etc/machine-id` with a randomized UUID. (systemd also makes use
+ of the randomized "boot id" the kernel exposes in
+ `/proc/sys/kernel/random/boot_id`). These UUIDs are exclusively Type 4 UUIDs,
+ i.e. randomly generated ones.
+
+* systemd maintains various hash tables internally. In order to harden them
+ against [collision
+ attacks](https://www.cs.auckland.ac.nz/~mcw/Teaching/refs/misc/denial-of-service.pdf)
+ they are seeded with random numbers.
+
+* At various places systemd needs random bytes for temporary file name
+ generation, UID allocation randomization, and similar.
+
+* systemd-resolved and systemd-networkd use random number generators to harden
+ the protocols they implement against packet forgery.
+
+* systemd-udevd and systemd-nspawn can generate randomized MAC addresses for
+ network devices.
+
+Note that these cases generally do not require a cryptographic-grade random
+number generator, as most of these utilize random numbers to minimize risk of
+collision and not to generate secret key material. However, they usually do
+require "medium-grade" random data. For example: systemd's hash-maps are
+reseeded if they grow beyond certain thresholds (and thus collisions are more
+likely). This means they are generally fine with low-quality (even constant)
+random numbers initially as long as they get better with time, so that
+collision attacks are eventually thwarted as better, non-guessable seeds are
+acquired.
+
+## Keeping `systemd'`s Demand on the Kernel Entropy Pool Minimal
+
+Since most of systemd's own use of random numbers do not require
+cryptographic-grade RNGs, it tries to avoid reading entropy from the kernel
+entropy pool if possible. If it succeeds this has the benefit that there's no
+need to delay the early boot process until entropy is available, and noisy
+kernel log messages about early reading from `/dev/urandom` are avoided
+too. Specifically:
+
+1. When generating [Type 4
+ UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_\(random\)),
+ systemd tries to use Intel's and AMD's RDRAND CPU opcode directly, if
+ available. While some doubt the quality and trustworthiness of the entropy
+ provided by these opcodes, they should be good enough for generating UUIDs,
+ if not key material (though, as mentioned, today's big distributions opted
+ to trust it for that too, now, see above — but we are not going to make that
+ decision for you, and for anything key material related will only use the
+ kernel's entropy pool). If RDRAND is not available or doesn't work, it will
+ use synchronous `getrandom()` as fallback, and `/dev/urandom` on old kernels
+ where that system call doesn't exist yet. This means on non-Intel/AMD
+ systems UUID generation will block on kernel entropy initialization.
+
+2. For seeding hash tables, and all the other similar purposes systemd first
+ tries RDRAND, and if that's not available will try to use asynchronous
+ `getrandom()` (if the kernel doesn't support this system call,
+ `/dev/urandom` is used). This may fail too in case the pool is not
+ initialized yet, in which case it will fall back to glibc's internal rand()
+ calls, i.e. weak pseudo-random numbers. This should make sure we use good
+ random bytes if we can, but neither delay boot nor trigger noisy kernel log
+ messages during early boot for these use-cases.
+
+## `systemd`'s Support for Filling the Kernel Entropy Pool
+
+systemd has various provisions to ensure the kernel entropy is filled during
+boot, in order to ensure the entropy pool is filled up quickly.
+
+1. When systemd's PID 1 detects it runs in a virtualized environment providing
+ the `virtio-rng` interface it will load the necessary kernel modules to make
+ use of it during earliest boot, if possible — much earlier than regular
+ kernel module loading done by `systemd-udevd.service`. This should ensure
+ that in VM environments the entropy pool is quickly filled, even before
+ systemd invokes the first service process — as long as the VM environment
+ provides virtualized RNG hardware (and VM environments really should!).
+
+2. The
+ [`systemd-random-seed.service`](https://www.freedesktop.org/software/systemd/man/systemd-random-seed.service.html)
+ system service will load a random seed from `/var/lib/systemd/random-seed`
+ into the kernel entropy pool. By default it does not credit entropy for it
+ though, since the seed is — more often than not — not reset when 'golden'
+ master images of an OS are created, and thus replicated into every
+ installation. If OS image builders carefully reset the random seed file
+ before generating the image it should be safe to credit entropy, which can
+ be enabled by setting the `$SYSTEMD_RANDOM_SEED_CREDIT` environment variable
+ for the service to `1` (or even `force`, see man page). Note however, that
+ this service typically runs relatively late during early boot: long after
+ the initial RAM disk (`initrd`) completed, and after the `/var/` file system
+ became writable. This is usually too late for many applications, it is hence
+ not advised to rely exclusively on this functionality to seed the kernel's
+ entropy pool. Also note that this service synchronously waits until the
+ kernel's entropy pool is initialized before completing start-up. It may thus
+ be used by other services as synchronization point to order against, if they
+ require an initialized entropy pool to operate correctly.
+
+3. The
+ [`systemd-boot`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
+ EFI boot loader included in systemd is able to maintain and provide a random
+ seed stored in the EFI System Partition (ESP) to the booted OS, which allows
+ booting up with a fully initialized entropy pool from earliest boot
+ on. During installation of the boot loader (or when invoking [`bootctl
+ random-seed`](https://www.freedesktop.org/software/systemd/man/bootctl.html#random-seed))
+ a seed file with an initial seed is placed in a file `/loader/random-seed`
+ in the ESP. In addition, an identically sized randomized EFI variable called
+ the 'system token' is set, which is written to the machine's firmware NVRAM.
+ During boot, when `systemd-boot` finds both the random seed file and the
+ system token they are combined and hashed with SHA256 (in counter mode, to
+ generate sufficient data), to generate a new random seed file to store in
+ the ESP as well as a random seed to pass to the OS kernel. The new random
+ seed file for the ESP is then written to the ESP, ensuring this is completed
+ before the OS is invoked. Very early during initialization PID 1 will read
+ the random seed provided in the EFI variable and credit it fully to the
+ kernel's entropy pool.
+
+ This mechanism is able to safely provide an initialized entropy pool already
+ in the `initrd` and guarantees that different seeds are passed from the boot
+ loader to the OS on every boot (in a way that does not allow regeneration of
+ an old seed file from a new seed file). Moreover, when an OS image is
+ replicated between multiple images and the random seed is not reset, this
+ will still result in different random seeds being passed to the OS, as the
+ per-machine 'system token' is specific to the physical host, and not
+ included in OS disk images. If the 'system token' is properly initialized
+ and kept sufficiently secret it should not be possible to regenerate the
+ entropy pool of different machines, even if this seed is the only source of
+ entropy.
+
+ Note that the writes to the ESP needed to maintain the random seed should be
+ minimal. The size of the random seed file is directly derived from the Linux
+ kernel's entropy pool size, which defaults to 512 bytes. This means updating
+ the random seed in the ESP should be doable safely with a single sector
+ write (since hard-disk sectors typically happen to be 512 bytes long, too),
+ which should be safe even with FAT file system drivers built into
+ low-quality EFI firmwares.
+
+ As a special restriction: in virtualized environments PID 1 will refrain
+ from using this mechanism, for safety reasons. This is because on VM
+ environments the EFI variable space and the disk space is generally not
+ maintained physically separate (for example, `qemu` in EFI mode stores the
+ variables in the ESP itself). The robustness towards sloppy OS image
+ generation is the main purpose of maintaining the 'system token' however,
+ and if the EFI variable storage is not kept physically separate from the OS
+ image there's no point in it. That said, OS builders that know that they are
+ not going to replicate the built image on multiple systems may opt to turn
+ off the 'system token' concept by setting `random-seed-mode always` in the
+ ESP's
+ [`/loader/loader.conf`](https://www.freedesktop.org/software/systemd/man/loader.conf.html)
+ file. If done, `systemd-boot` will use the random seed file even if no
+ system token is found in EFI variables.
+
+4. A kernel command line option `systemd.random_seed=` may be used to pass in a
+ base64 encoded seed to initialize the kernel's entropy pool from during
+ early service manager initialization. This option is only safe in testing
+ environments, as the random seed passed this way is accessible to
+ unprivileged programs via `/proc/cmdline`. Using this option outside of
+ testing environments is a security problem since cryptographic key material
+ derived from the entropy pool initialized with a seed accessible to
+ unprivileged programs should not be considered secret.
+
+With the four mechanisms described above it should be possible to provide
+early-boot entropy in most cases. Specifically:
+
+1. On EFI systems, `systemd-boot`'s random seed logic should make sure good
+ entropy is available during earliest boot — as long as `systemd-boot` is
+ used as boot loader, and outside of virtualized environments.
+
+2. On virtualized systems, the early `virtio-rng` hookup should ensure entropy
+ is available early on — as long as the VM environment provides virtualized
+ RNG devices, which they really should all do in 2019. Complain to your
+ hosting provider if they don't. For VMs used in testing environments,
+ `systemd.random_seed=` may be used as an alternative to a virtualized RNG.
+
+3. On Intel/AMD systems systemd's own reliance on the kernel entropy pool is
+ minimal (as RDRAND is used on those for UUID generation). This only works if
+ the CPU has RDRAND of course, which most physical CPUs do (but I hear many
+ virtualized CPUs do not. Pity.)
+
+4. In all other cases, `systemd-random-seed.service` will help a bit, but — as
+ mentioned — is too late to help with early boot.
+
+This primarily leaves two kind of systems in the cold:
+
+1. Some embedded systems. Many embedded chipsets have hwrng functionality these
+ days. Consider using them while crediting
+ entropy. (i.e. `rng_core.default_quality=1000` on the kernel command line is
+ your friend). Or accept that the system might take a bit longer to
+ boot. Alternatively, consider implementing a solution similar to
+ systemd-boot's random seed concept in your platform's boot loader.
+
+2. Virtualized environments that lack both virtio-rng and RDRAND, outside of
+ test environments. Tough luck. Talk to your hosting provider, and ask them
+ to fix this.
+
+3. Also note: if you deploy an image without any random seed and/or without
+ installing any 'system token' in an EFI variable, as described above, this
+ means that on the first boot no seed can be passed to the OS
+ either. However, as the boot completes (with entropy acquired elsewhere),
+ systemd will automatically install both a random seed in the GPT and a
+ 'system token' in the EFI variable space, so that any future boots will have
+ entropy from earliest boot on — all provided `systemd-boot` is used.
+
+## Frequently Asked Questions
+
+1. *Why don't you just use getrandom()? That's all you need!*
+
+ Did you read any of the above? getrandom() is hooked to the kernel entropy
+ pool, and during early boot it's not going to be filled yet, very likely. We
+ do use it in many cases, but not in all. Please read the above again!
+
+2. *Why don't you use
+ [getentropy()](http://man7.org/linux/man-pages/man3/getentropy.3.html)? That's
+ all you need!*
+
+ Same story. That call is just a different name for `getrandom()` with
+ `flags` set to zero, and some additional limitations, and thus it also needs
+ the kernel's entropy pool to be initialized, which is the whole problem we
+ are trying to address here.
+
+3. *Why don't you generate your UUIDs with
+ [`uuidd`](http://man7.org/linux/man-pages/man8/uuidd.8.html)? That's all you
+ need!*
+
+ First of all, that's a system service, i.e. something that runs as "payload"
+ of systemd, long after systemd is already up and hence can't provide us
+ UUIDs during earliest boot yet. Don't forget: to assign the invocation UUID
+ for the `uuidd.service` start we already need a UUID that the service is
+ supposed to provide us. More importantly though, `uuidd` needs state/a random
+ seed/a MAC address/host ID to operate, all of which are not available during
+ early boot.
+
+4. *Why don't you generate your UUIDs with `/proc/sys/kernel/random/uuid`?
+ That's all you need!*
+
+ This is just a different, more limited interface to `/dev/urandom`. It gains
+ us nothing.
+
+5. *Why don't you use [`rngd`](https://github.com/nhorman/rng-tools),
+ [`haveged`](http://www.issihosts.com/haveged/),
+ [`egd`](http://egd.sourceforge.net/)? That's all you need!*
+
+ Like `uuidd` above these are system services, hence come too late for our
+ use-case. In addition much of what `rngd` provides appears to be equivalent
+ to `CONFIG_RANDOM_TRUST_CPU=y` or `rng_core.default_quality=1000`, except
+ being more complex and involving userspace. These services partly measure
+ system behavior (such as scheduling effects) which the kernel either
+ already feeds into its pool anyway (and thus shouldn't be fed into it a
+ second time, crediting entropy for it a second time) or is at least
+ something the kernel could much better do on its own. Hence, if what these
+ daemons do is still desirable today, this would be much better implemented
+ in kernel (which would be very welcome of course, but wouldn't really help
+ us here in our specific problem, see above).
+
+6. *Why don't you use [`arc4random()`](https://man.openbsd.org/arc4random.3)?
+ That's all you need!*
+
+ This doesn't solve the issue, since it requires a nonce to start from, and
+ it gets that from `getrandom()`, and thus we have to wait for random pool
+ initialization the same way as calling `getrandom()`
+ directly. `arc4random()` is nothing more than optimization, in fact it
+ implements similar algorithms that the kernel entropy pool implements
+ anyway, hence besides being able to provide random bytes with higher
+ throughput there's little it gets us over just using `getrandom()`. Also,
+ it's not supported by glibc. And as long as that's the case we are not keen
+ on using it, as we'd have to maintain that on our own, and we don't want to
+ maintain our own cryptographic primitives if we don't have to. Since
+ systemd's uses are not performance relevant (besides the pool initialization
+ delay, which this doesn't solve), there's hence little benefit for us to
+ call these functions. That said, if glibc learns these APIs one day, we'll
+ certainly make use of them where appropriate.
+
+7. *This is boring: NetBSD had [boot loader entropy seed
+ support](https://netbsd.gw.com/cgi-bin/man-cgi?boot+8) since ages!*
+
+ Yes, NetBSD has that, and the above is inspired by that (note though: this
+ article is about a lot more than that). NetBSD's support is not really safe,
+ since it neither updates the random seed before using it, nor has any
+ safeguards against replicating the same disk image with its random seed on
+ multiple machines (which the 'system token' mentioned above is supposed to
+ address). This means reuse of the same random seed by the boot loader is
+ much more likely.
+
+8. *Why does PID 1 upload the boot loader provided random seed into kernel
+ instead of kernel doing that on its own?*
+
+ That's a good question. Ideally the kernel would do that on its own, and we
+ wouldn't have to involve userspace in this.
+
+9. *What about non-EFI?*
+
+ The boot loader random seed logic described above uses EFI variables to pass
+ the seed from the boot loader to the OS. Other systems might have similar
+ functionality though, and it shouldn't be too hard to implement something
+ similar for them. Ideally, we'd have an official way to pass such a seed as
+ part of the `struct boot_params` from the boot loader to the kernel, but
+ this is currently not available.
+
+10. *I use a different boot loader than `systemd-boot`, I'd like to use boot
+ loader random seeds too!*
+
+ Well, consider just switching to `systemd-boot`, it's worth it. See
+ [systemd-boot(7)](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
+ for an introduction why. That said, any boot loader can re-implement the
+ logic described above, and can pass a random seed that systemd as PID 1
+ will then upload into the kernel's entropy pool. For details see the [Boot
+ Loader Interface](https://systemd.io/BOOT_LOADER_INTERFACE) documentation.
+
+11. *Why not pass the boot loader random seed via kernel command line instead
+ of as EFI variable?*
+
+ The kernel command line is accessible to unprivileged processes via
+ `/proc/cmdline`. It's not desirable if unprivileged processes can use this
+ information to possibly gain too much information about the current state
+ of the kernel's entropy pool.
+
+ That said, we actually do implement this with the `systemd.random_seed=`
+ kernel command line option. Don't use this outside of testing environments,
+ however, for the aforementioned reasons.
+
+12. *Why doesn't `systemd-boot` rewrite the 'system token' too each time
+ when updating the random seed file stored in the ESP?*
+
+ The system token is stored as persistent EFI variable, i.e. in some form of
+ NVRAM. These memory chips tend be of low quality in many machines, and
+ hence we shouldn't write them too often. Writing them once during
+ installation should generally be OK, but rewriting them on every single
+ boot would probably wear the chip out too much, and we shouldn't risk that.