author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-12 03:50:42 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-06-12 03:50:42 +0000 |
commit | 78e9bb837c258ac0ec7712b3d612cc2f407e731e (patch) | |
tree | f515d16b6efd858a9aeb5b0ef5d6f90bf288283d /docs | |
parent | Adding debian version 255.5-1. (diff) | |
download | systemd-78e9bb837c258ac0ec7712b3d612cc2f407e731e.tar.xz systemd-78e9bb837c258ac0ec7712b3d612cc2f407e731e.zip |
Merging upstream version 256.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat
28 files changed, 1264 insertions, 984 deletions
diff --git a/docs/AUTOPKGTEST.md b/docs/AUTOPKGTEST.md deleted file mode 100644 index 393b74e..0000000 --- a/docs/AUTOPKGTEST.md +++ /dev/null @@ -1,92 +0,0 @@ ---- -title: Autopkgtest - Defining tests for Debian packages -category: Documentation for Developers -layout: default -SPDX-License-Identifier: LGPL-2.1-or-later ---- - -# Test description - -Full system integration/acceptance testing is done through [autopkgtests](https://salsa.debian.org/ci-team/autopkgtest/-/blob/master/doc/README.package-tests.rst). These test the actual installed binary distribution packages. They are run in QEMU or containers and thus can do intrusive and destructive things such as installing arbitrary packages, modifying arbitrary files in the system (including grub boot parameters), rebooting, or loading kernel modules. - -The tests for systemd are defined in the [Debian package's debian/tests](https://salsa.debian.org/systemd-team/systemd/tree/master/debian/tests) directory. For validating a pull request, the Debian package is built using the unpatched code from that PR (via the [checkout-upstream](https://salsa.debian.org/systemd-team/systemd/blob/master/debian/extra/checkout-upstream) script), and the tests run against these built packages. Note that some tests which check Debian specific behaviour are skipped in "test upstream" mode. - -# Infrastructure - -systemd's GitHub project has webhooks that trigger autopkgtests on Ubuntu 18.04 LTS on three architectures: - -* i386: 32 bit x86, little endian, QEMU (OpenStack cloud instance) -* amd64: 64 bit x86, little endian, QEMU (OpenStack cloud instance) -* arm64: 64 bit ARM, little endian, QEMU (OpenStack cloud instance) -* s390x: 64 bit IBM z/Series, big endian, LXC (this architecture is not yet available in Canonical's OpenStack and thus skips some tests) - -Please see the [Ubuntu CI infrastructure](https://wiki.ubuntu.com/ProposedMigration/AutopkgtestInfrastructure) documentation for details about how this works. 
- -# Manually retrying/triggering tests on the infrastructure - -The current tests are fairly solid by now, but rarely they fail on infrastructure/network issues or race conditions. If you encounter these, please notify @iainlane in the GitHub PR for debugging/fixing those -- transient infrastructure issues are supposed to be detected automatically, and tests auto-retry on those; and flaky tests should of course be fixed properly. But sometimes it is useful to trigger tests on a different Ubuntu release too, for example to test a PR on a newer kernel or against current build/binary dependencies (cgroup changes, util-linux, gcc, etc.). - -This can be done using the generic [retry-github-test](https://git.launchpad.net/autopkgtest-cloud/tree/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/tools/retry-github-test) script from [Ubuntu's autopkgtest infrastructure](https://git.launchpad.net/autopkgtest-cloud): you need the parameterized URL from the [configured webhooks](https://github.com/systemd/systemd/settings/hooks) and the shared secret (Ubuntu's CI needs to restrict access to avoid DoSing and misuse). - -You can use Martin Pitt's [retry-gh-systemd-test](https://piware.de/gitweb/?p=bin.git;a=blob;f=retry-gh-systemd-test) shell wrapper around retry-github-test for that. You need to adjust the path where you put retry-github-test and the file with the shared secret, then you can call it like this: - -```sh -$ retry-gh-systemd-test <#PR> <architecture> [release] -``` - -where `release` defaults to `bionic` (aka Ubuntu 18.04 LTS). For example: - -```sh -$ retry-gh-systemd-test 1234 amd64 -$ retry-gh-systemd-test 2345 s390x cosmic -``` - -Please make sure to not trigger unknown [releases](https://launchpad.net/ubuntu/+series) or architectures as they will cause a pending test on the PR which never gets finished. 
- -# Test the code from the PR locally - -As soon as a test on the infrastructure finishes, the "Details" link in the PR "checks" section will point to the `log.gz` log. You can download the individual test log, built .debs, and other artifacts that tests leave behind (some dump a complete journal or the udev database on failure) by replacing `/log.gz` with `/artifacts.tar.gz` in that URL. You can then unpack the tarball and use `sudo dpkg -iO binaries/*.deb` to install the debs from the PR into an Ubuntu VM of the same release/architecture for manually testing a PR. - -# Run autopkgtests locally - -Preparations: - -* Get autopkgtest: - ```sh - git clone https://salsa.debian.org/ci-team/autopkgtest.git - ``` - -* Install necessary dependencies; on Debian/Ubuntu you can simply run `sudo apt install autopkgtest` (instead of the above cloning), on Fedora do `yum install qemu-kvm dpkg-perl` - -* Build a test image based on Ubuntu cloud images for the desired release/arch: - ```sh - autopkgtest/tools/autopkgtest-buildvm-ubuntu-cloud -r bionic -a amd64 - ``` - - This will build `autopkgtest-bionic-amd64.img`. This is normally being used through the `autopkgtest` command (see below), but you can boot this normally in QEMU (using `-snapshot` is highly recommended) to interactively poke around; this provides a easy throw-away test environment. 
- - -The most basic mode of operation is to run the tests for the current distro packages: - -```sh -autopkgtest/runner/autopkgtest systemd -- qemu autopkgtest-bionic-amd64.img -``` - -But autopkgtest allows lots of [different modes](https://salsa.debian.org/ci-team/autopkgtest/-/blob/master/doc/README.running-tests.rst) and [options](http://manpages.ubuntu.com/autopkgtest), like running a shell on failure (`-s`), running a single test only (`--test-name`), running the tests from a local checkout of the Debian source tree (possibly with modifications to the test) instead of from the distribution source, or running QEMU with more than one CPU (check the [autopkgtest-virt-qemu manpage](http://manpages.ubuntu.com/autopkgtest-virt-qemu). - -A common use case is to check out the Debian packaging git for getting/modifying the tests locally: - -```sh -git clone https://salsa.debian.org/systemd-team/systemd.git /tmp/systemd-debian/ -``` - -and running these against the binaries from a PR (see above), running only the `logind` test, getting a shell on failure, showing the boot output, and running with 2 CPUs: - -```sh -autopkgtest/runner/autopkgtest --test-name logind /tmp/binaries/*.deb /tmp/systemd-debian/ -s -- \ - qemu --show-boot --cpus 2 /srv/vm/autopkgtest-bionic-amd64.img -``` - -# Contact - -For troubles with the infrastructure, please notify [iainlane](https://github.com/iainlane) in the affected PR. diff --git a/docs/CNAME b/docs/CNAME deleted file mode 100644 index cdcf4d9..0000000 --- a/docs/CNAME +++ /dev/null @@ -1 +0,0 @@ -systemd.io
\ No newline at end of file diff --git a/docs/CODING_STYLE.md b/docs/CODING_STYLE.md index b4e88c9..8f687e6 100644 --- a/docs/CODING_STYLE.md +++ b/docs/CODING_STYLE.md @@ -780,3 +780,13 @@ SPDX-License-Identifier: LGPL-2.1-or-later good idea where it might end up running inside of libsystemd.so or similar. Hence, use TLS (i.e. `thread_local`) where appropriate, and maybe the occasional `pthread_once()`. + +## Tests + +- Use the assertion macros from `tests.h` (`ASSERT_GE()`, `ASSERT_OK()`, ...) to + make sure a descriptive error is logged when an assertion fails. If no assertion + macro exists for your specific use case, please add a new assertion macro in a + separate commit. + +- When modifying existing tests, please convert the test to use the new assertion + macros from `tests.h` if it is not already using those. diff --git a/docs/CONTAINER_INTERFACE.md b/docs/CONTAINER_INTERFACE.md index 460cc67..e64953c 100644 --- a/docs/CONTAINER_INTERFACE.md +++ b/docs/CONTAINER_INTERFACE.md @@ -164,10 +164,15 @@ manager, please consider supporting the following interfaces. issuing `journalctl -m`. The container machine ID can be determined from `/etc/machine-id` in the container. -3. If the container manager wants to cleanly shutdown the container, it might +3. If the container manager wants to cleanly shut down the container, it might be a good idea to send `SIGRTMIN+3` to its init process. systemd will then do a clean shutdown. Note however, that since only systemd understands - `SIGRTMIN+3` like this, this might confuse other init systems. + `SIGRTMIN+3` like this, this might confuse other init systems. A container + manager may implement the `$NOTIFY_SOCKET` protocol mentioned below in which + case it will receive a notification message `X_SYSTEMD_SIGNALS_LEVEL=2` that + indicates if and when these additional signal handlers are installed. 
If + these signals are sent to the container's PID 1 before this notification + message is sent they might not be handled correctly yet. 4. To support [Socket Activated Containers](https://0pointer.de/blog/projects/socket-activated-containers.html) @@ -189,12 +194,14 @@ manager, please consider supporting the following interfaces. unit they created for their container. That's private property of systemd, and no other code should modify it. -6. systemd running inside the container can report when boot-up is complete - using the usual `sd_notify()` protocol that is also used when a service - wants to tell the service manager about readiness. A container manager can - set the `$NOTIFY_SOCKET` environment variable to a suitable socket path to - make use of this functionality. (Also see information about - `/run/host/notify` below.) +6. systemd running inside the container can report when boot-up is complete, + boot progress and functionality as well as various other bits of system + information using the `sd_notify()` protocol that is also used when a + service wants to tell the service manager about readiness. A container + manager can set the `$NOTIFY_SOCKET` environment variable to a suitable + socket path to make use of this functionality. (Also see information about + `/run/host/notify` below, as well as the Readiness Protocol section on + [systemd(1)](https://www.freedesktop.org/software/systemd/man/latest/systemd.html) ## Networking @@ -272,6 +279,30 @@ care should be taken to avoid naming conflicts. `systemd` (and in particular 7. The `/run/host/credentials/` directory is a good place to pass credentials into the container, using the `$CREDENTIALS_DIRECTORY` protocol, see above. +8. The `/run/host/unix-export/` directory shall be writable from the container + payload, and is where container payload can bind `AF_UNIX` sockets in that + shall be *exported* to the host, so that the host can connect to them. 
The + container manager should bind mount this directory on the host side + (read-only ideally), so that the host can connect to contained sockets. This + is most prominently used by `systemd-ssh-generator` when run in such a + container to automatically bind an SSH socket into that directory, which + then can be used to connect to the container. + +9. The `/run/host/unix-export/ssh` `AF_UNIX` socket will be automatically bound + by `systemd-ssh-generator` in the container if possible, and can be used to + connect to the container. + +10. The `/run/host/userdb/` directory may be used to drop-in additional JSON + user records that `nss-systemd` inside the container shall include in the + system's user database. This is useful to make host users and their home + directories automatically accessible to containers in transitive + fashion. See `nss-systemd(8)` for details. + +11. The `/run/host/home/` directory may be used to bind mount host home + directories of users that shall be made available in the container to. This + may be used in combination with `/run/host/userdb/` above: one defines the + user record, the other contains the user's home directory. + ## What You Shouldn't Do 1. Do not drop `CAP_MKNOD` from the container. `PrivateDevices=` is a commonly diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index c247102..5274f01 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -33,13 +33,13 @@ For older versions that are still supported by your distribution please use resp ## Security vulnerability reports -See [reporting of security vulnerabilities](/SECURITY). +See [reporting of security vulnerabilities](https://systemd.io/SECURITY). ## Posting Pull Requests * Make sure to post PRs only relative to a recent tip of the `main` branch. -* Follow our [Coding Style](/CODING_STYLE) when contributing code. This is a requirement for all code we merge. -* Please make sure to test your change before submitting the PR. 
See the [Hacking guide](/HACKING) for details on how to do this. +* Follow our [Coding Style](https://systemd.io/CODING_STYLE) when contributing code. This is a requirement for all code we merge. +* Please make sure to test your change before submitting the PR. See the [Hacking guide](https://systemd.io/HACKING) for details on how to do this. * Make sure to run the test suite locally, before posting your PR. We use a CI system, meaning we don't even look at your PR if the build and tests don't pass. * If you need to update the code in an existing PR, force-push into the same branch, overriding old commits with new versions. * After you have pushed a new version, add a comment explaining the latest changes. diff --git a/docs/CONTROL_GROUP_INTERFACE.md b/docs/CONTROL_GROUP_INTERFACE.md index c82a2c3..f95cf76 100644 --- a/docs/CONTROL_GROUP_INTERFACE.md +++ b/docs/CONTROL_GROUP_INTERFACE.md @@ -223,9 +223,9 @@ Use these APIs to register any kind of process workload with systemd to be place ### Reading Accounting Information -Note that there's currently no systemd API to retrieve accounting information from cgroups. For now, if you need to retrieve this information use `/proc/$PID/cgroup` to determine the cgroup path for your process in the `cpuacct` controller (or whichever controller matters to you), and then read the attributes directly from the cgroup tree. +Note that there's currently no systemd API to retrieve accounting information from cgroups. For now, if you need to retrieve this information use `/proc/$PID/cgroup` to determine the cgroup path for your process in the `cpuacct` controller (or whichever controller matters to you), and then read the attributes directly from the cgroup tree. -If you want to collect the exit status and other runtime parameters of your transient scope or service unit after the processes in them ended set the `RemainAfterExited` boolean property when creating it. 
This will has the effect that the unit will stay around even after all processes in it died, in the `SubState="exited"` state. Simply watch for state changes until this state is reached, then read the status details from the various properties you need, and finally terminate the unit via `StopUnit()` on the `Manager` object or `Stop()` on the `Unit` object itself. +If you want to collect the exit status and other runtime parameters of your transient scope or service unit after the processes in them ended set the `RemainAfterExit` boolean property when creating it. This will has the effect that the unit will stay around even after all processes in it died, in the `SubState="exited"` state. Simply watch for state changes until this state is reached, then read the status details from the various properties you need, and finally terminate the unit via `StopUnit()` on the `Manager` object or `Stop()` on the `Unit` object itself. ### Becoming a Controller @@ -241,7 +241,7 @@ Service and scope units know a special `Delegate` boolean property. If set, then 2. Access to the cgroup directory of the scope/service is permitted, and files/and directories are updated to get write access for the user specified in `User=` if the scope/unit runs unprivileged. Note that in this case access to any controllers is not available. 3. systemd will refrain from moving processes across the "delegation" boundary. -Generally, the `Delegate` property is only useful for services that need to manage their own cgroup subtrees, such as container managers. After creating a unit with this property set, they should use `/proc/$PID/cgroup` to figure out the cgroup subtree path they may manage (the one from the name=systemd hierarchy!). Managers should refrain from making any changes to the cgroup tree outside of the subtrees for units they created with the `Delegate` flag turned on. 
+Generally, the `Delegate` property is only useful for services that need to manage their own cgroup subtrees, such as container managers. After creating a unit with this property set, they should use `/proc/$PID/cgroup` to figure out the cgroup subtree path they may manage (the one from the name=systemd hierarchy!). Managers should refrain from making any changes to the cgroup tree outside of the subtrees for units they created with the `Delegate` flag turned on. Note that scope units created by `machined`'s `CreateMachine()` call have this flag set. diff --git a/docs/COREDUMP.md b/docs/COREDUMP.md index d235479..ce58f16 100644 --- a/docs/COREDUMP.md +++ b/docs/COREDUMP.md @@ -30,7 +30,7 @@ Specifically, PID 1 provides the following functionality: to the current working directory of the crashing process.) Net effect: after PID1 has started and performed this setup coredumps are -disabled, but by means of the the `kernel.core_pattern` sysctl rather than by +disabled, but by means of the `kernel.core_pattern` sysctl rather than by size limit. This is generally preferable, since the pattern can be updated trivially at the right time to enable coredumping once the system is ready, taking comprehensive effect on all userspace. (Or to say this differently: disabling coredumps via the size limit is problematic, since it cannot easily diff --git a/docs/CREDENTIALS.md b/docs/CREDENTIALS.md index efa948b..1203f61 100644 --- a/docs/CREDENTIALS.md +++ b/docs/CREDENTIALS.md @@ -67,7 +67,8 @@ purpose. Specifically, the following features are provided: ## Configuring per-Service Credentials -Within unit files, there are four settings to configure service credentials. +Within unit files, there are the following settings to configure service +credentials. 1. `LoadCredential=` may be used to load a credential from disk, from an `AF_UNIX` socket, or propagate them from a system credential. 
@@ -94,7 +95,7 @@ Each credential configured with these options carries a short name (suitable for inclusion in a filename) in the unit file, under which the invoked service code can then retrieve it. Each name should only be specified once. -For details about these four settings [see the man +For details about these settings [see the man page](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Credentials). It is a good idea to also enable mount namespacing for services that process @@ -208,7 +209,7 @@ via `systemd-creds cat`. ## Encryption Credentials are supposed to be useful for carrying sensitive information, such -as cryptographic key material. For this kind of data (symmetric) encryption and +as cryptographic key material. For such purposes (symmetric) encryption and authentication are provided to make storage of the data at rest safer. The data may be encrypted and authenticated with AES256-GCM. The encryption key can either be one derived from the local TPM2 device, or one stored in @@ -284,8 +285,8 @@ services where they are ultimately consumed. 1. A container manager may set the `$CREDENTIALS_DIRECTORY` environment variable for systemd running as PID 1 in the container, the same way as - systemd would set it for a service it - invokes. [`systemd-nspawn(1)`](https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#Credentials)'s + systemd would set it for a service it invokes. + [`systemd-nspawn(1)`](https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#Credentials)'s `--set-credential=` and `--load-credential=` switches implement this, in order to pass arbitrary credentials from host to container payload. Also see the [Container Interface](/CONTAINER_INTERFACE) documentation. @@ -300,31 +301,27 @@ services where they are ultimately consumed. host through the hypervisor into the VM via qemu's `fw_cfg` mechanism. (All three of these specific switches would set credential `foo` to `bar`.) 
Passing credentials via the SMBIOS mechanism is typically preferable over - `fw_cfg` since it is faster and less specific to the chosen VMM - implementation. Moreover, `fw_cfg` has a 55 character limitation on names - passed that way. So some settings may not fit. + `fw_cfg` since it is faster and less specific to the chosen VMM implementation. + Moreover, `fw_cfg` has a 55 character limitation on names passed that way. So some settings may not fit. -3. Credentials may be passed from the initrd to the host during the initrd → - host transition. Provisioning systems that run in the initrd may use this to - install credentials on the system. All files placed in - `/run/credentials/@initrd/` are imported into the set of file system - credentials during the transition. The files (and their directory) are - removed once this is completed. +3. Credentials may be passed from the initrd to the host during the initrd → host transition. + Provisioning systems that run in the initrd may use this to install credentials on the system. + All files placed in `/run/credentials/@initrd/` are imported into the set of file system credentials during the transition. + The files (and their directory) are removed once this is completed. 4. Credentials may also be passed from the UEFI environment to userspace, if the [`systemd-stub`](https://www.freedesktop.org/software/systemd/man/systemd-stub.html) - UEFI kernel stub is used. This allows placing encrypted credentials in the - EFI System Partition, which are then picked up by `systemd-stub` and passed - to the kernel and ultimately userspace where systemd receives them. This is - useful to implement secure parameterization of vendor-built and signed + UEFI kernel stub is used. + This allows placing encrypted credentials in the EFI System Partition, which are then picked up by `systemd-stub` and passed to the kernel and ultimately userspace where systemd receives them. 
+ This is useful to implement secure parameterization of vendor-built and signed initrds, as userspace can place credentials next to these EFI kernels, and be sure they can be accessed securely from initrd context. 5. Credentials can also be passed into a system via the kernel command line, via the `systemd.set_credential=` and `systemd.set_credential_binary=` - kernel command line options (the latter takes Base64 encoded binary - data). Note though that any data specified here is visible to all userspace + kernel command line options (the latter takes Base64 encoded binary data). + Note though that any data specified here is visible to all userspace applications (even unprivileged ones) via `/proc/cmdline`. Typically, this is hence not useful to pass sensitive information, and should be avoided. @@ -376,19 +373,19 @@ Various services shipped with `systemd` consume credentials for tweaking behavio * [`systemd(1)`](https://www.freedesktop.org/software/systemd/man/systemd.html) (I.E.: PID1, the system manager) will look for the credential `vmm.notify_socket` and will use it to send a `READY=1` datagram when the system has finished - booting. This is useful for hypervisors/VMMs or other processes on the host - to receive a notification via VSOCK when a virtual machine has finished booting. + booting. + This is useful for hypervisors/VMMs or other processes on the host to receive a notification via VSOCK when a virtual machine has finished booting. Note that in case the hypervisor does not support `SOCK_DGRAM` over `AF_VSOCK`, - `SOCK_SEQPACKET` will be tried instead. The credential payload should be in the - form: `vsock:<CID>:<PORT>`. Also note that this requires support for VHOST to be - built-in both the guest and the host kernels, and the kernel modules to be loaded. + `SOCK_SEQPACKET` will be tried instead. + The credential payload should be in the form: `vsock:<CID>:<PORT>`. 
+ Also note that this requires support for VHOST to be built-in both the guest and the host kernels, and the kernel modules to be loaded. * [`systemd-sysusers(8)`](https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html) will look for the credentials `passwd.hashed-password.<username>`, `passwd.plaintext-password.<username>` and `passwd.shell.<username>` to configure the password (either in UNIX hashed form, or plaintext) or shell of - system users created. Replace `<username>` with the system user of your - choice, for example, `root`. + system users created. + Replace `<username>` with the system user of your choice, for example, `root`. * [`systemd-firstboot(1)`](https://www.freedesktop.org/software/systemd/man/systemd-firstboot.html) will look for the credentials `firstboot.locale`, `firstboot.locale-messages`, @@ -455,7 +452,7 @@ qemu-system-x86_64 \ -device scsi-hd,drive=hd,bootindex=1 \ -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=42 \ -smbios type=11,value=io.systemd.credential:vmm.notify_socket=vsock:2:1234 \ - -smbios type=11,value=io.systemd.credential.binary:tmpfiles.extra=$(echo "f~ /root/.ssh/authorized_keys 600 root root - $(ssh-add -L | base64 -w 0)" | base64 -w 0) + -smbios type=11,value=io.systemd.credential.binary:tmpfiles.extra=$(echo -e "d /root/.ssh 0750 root root -\nf~ /root/.ssh/authorized_keys 0600 root root - $(ssh-add -L | base64 -w 0)" | base64 -w 0) ``` A process on the host can listen for the notification, for example: @@ -488,17 +485,14 @@ credentials that must be decrypted/validated before use, such as those from The `ImportCredential=` setting (and the `LoadCredential=` and `LoadCredentialEncrypted=` settings when configured with a relative source -path) will search for the source file to read the credential from -automatically. Primarily, these credentials are searched among the credentials -passed into the system. 
If not found there, they are searched in -`/etc/credstore/`, `/run/credstore/`, -`/usr/lib/credstore/`. `LoadCredentialEncrypted=` will also search -`/etc/credstore.encrypted/` and similar directories. `ImportCredential=` will -search both the non-encrypted and encrypted directories. These directories are -hence a great place to store credentials to load on the system. +path) will search for the source file to read the credential from automatically. +Primarily, these credentials are searched among the credentials passed into the system. If not found there, they are searched in `/etc/credstore/`, `/run/credstore/`, `/usr/lib/credstore/`. `LoadCredentialEncrypted=` will also search +`/etc/credstore.encrypted/` and similar directories. +`ImportCredential=` will search both the non-encrypted and encrypted directories. +These directories are hence a great place to store credentials to load on the system. ## Conditionalizing Services Sometimes it makes sense to conditionalize system services and invoke them only -if the right system credential is passed to the system. Use the -`ConditionCredential=` and `AssertCredential=` unit file settings for that. +if the right system credential is passed to the system. +Use the `ConditionCredential=` and `AssertCredential=` unit file settings for that. diff --git a/docs/DAEMON_SOCKET_ACTIVATION.md b/docs/DAEMON_SOCKET_ACTIVATION.md index 70a3299..107615e 100644 --- a/docs/DAEMON_SOCKET_ACTIVATION.md +++ b/docs/DAEMON_SOCKET_ACTIVATION.md @@ -35,10 +35,6 @@ PrivateNetwork=true **/etc/systemd/system/my-nginx.socket** ``` -[Unit] -After=network.target -Requires=network.target - [Socket] ListenStream=80 ListenStream=0.0.0.0:80 diff --git a/docs/DEBUGGING.md b/docs/DEBUGGING.md index 3e89a5d..175e557 100644 --- a/docs/DEBUGGING.md +++ b/docs/DEBUGGING.md @@ -93,7 +93,7 @@ systemctl enable debug-shell.service or by specifying ```sh -systemd.debug-shell=1 +systemd.debug_shell=1 ``` on the kernel command line. 
diff --git a/docs/ELF_DLOPEN_METADATA.md b/docs/ELF_DLOPEN_METADATA.md new file mode 100644 index 0000000..5c3bf1e --- /dev/null +++ b/docs/ELF_DLOPEN_METADATA.md @@ -0,0 +1,89 @@ +--- +title: Dlopen Metadata for ELF Files +category: Interfaces +layout: default +SPDX-License-Identifier: LGPL-2.1-or-later +--- + +# `dlopen()` Metadata for ELF Files + +*Intended audience: hackers working on packaging ELF files that use dlopen to load libraries.* + +## Motivation + +Using `dlopen()` to load optional dependencies brings several advantages: programs can gracefully downgrade +a feature when a library is not available, and the shared library is only loaded into the process (and its +ELF constructors are run) only when the requested feature is actually used. But it also has some drawbacks, +and the main one is that it is harder to track a program's dependencies, since unlike build-time dynamic +linking there will not be a mention in the ELF metadata. This specification aims to solve this problem by +providing a standardized specification for a custom ELF note that can be used to list `dlopen()` +dependencies. + +## Implementation + +This document will attempt to define a common metadata format specification, so that multiple implementers +might use it when coding upstream software, and packagers might use it when building packages and setting +dependencies. + +The metadata will be embedded in a series of new, 4-byte-aligned, allocated, 0-padded, read-only ELF header +sections, in a JSON array containing name-value objects, either one ELF note per dependency or as a single +note listing multiple dependencies in the top-level array. Implementers working on parsing ELF files should +not assume a specific list of names, but parse anything that is included in the section, and should look for +the note using the `note type`. Implementers working on build tools should strive to use the same names, for +consistency. The most common will be listed here. 
+ +* Section header + +``` +SECTION: `.note.dlopen` +note type: `0x407c0c0a` +Owner: `FDO` (FreeDesktop.org) +Value: an array of JSON objects encoded as a zero-terminated UTF-8 string +``` + +* JSON payload + +```json +[ + { + "soname": ["libfoo.so.1"], + "feature": "foo", + "description": "Enables the foo feature", + "priority": "recommended" + } +] +``` + +The format is a single JSON array containing objects, encoded as a zero-terminated `UTF-8` string. Each key +in each object shall be unique as per recommendations of [RFC8259](https://datatracker.ietf.org/doc/html/rfc8259#section-4). +Strings shall not contain any control characters or use `\uXXX` escaping. + +Reference implementations of [packaging tools for `.deb` and `.rpm`](https://github.com/systemd/package-notes) +are available, and provide macros/helpers to parse the note when building packages and adding dependencies. + +## Well-known keys + +The metadata format is intentionally extensible, so that upstreams and later revisions of this spec can add +their own information. The 'soname' array is required, with at least one element, everything else is +optional. If alternative soname versions for the same library are supported at the same time, an array can +be used, listing the most preferred first, and parsers are expected to select only the first one that is +available on the system, as it is a mechanism to specify alternatives. If the `priority` field is used, it +must follow the specification and use one of the values specified in the table. If it is not specified, a +parser should assume 'recommended' if a priority is needed. If the `feature` field is used, it will identify +an individual feature, and multiple entries using the same `feature` denote functionality that requires all +of the libraries they specify in order to be enabled. 
+ +| Key name | Key type | Mandatory | Key description | Example value | +|-------------|----------------------------|-----------|--------------------------------------------------------------------------|----------------------------------| +| soname | array of strings | yes | The library names loaded by `dlopen()` | [ "libfoo.so.1", "libfoo.so.0" ] | +| feature | string | no | A keyword identifying the feature that the library contributes to enable | "foo" | +| description | string | no | A human-readable text string describing the feature | "Enables the foo feature" | +| priority | string | no | The priority of the feature, one of: required, recommended, suggested | "recommended" | + +### Priority definition + +| Priority | Semantics | +|-------------|--------------------------------------------------------------------------------------------------------------------------------------| +| required | Core functionality needs the dependency, the binary will not work if it cannot be found | +| recommended | Important functionality needs the dependency, the binary will work but in most cases the dependency should be provided | +| suggested | Secondary functionality needs the dependency, the binary will work and the dependency is only needed for full-featured installations | diff --git a/docs/ELF_PACKAGE_METADATA.md b/docs/ELF_PACKAGE_METADATA.md index 176f574..2b58cf1 100644 --- a/docs/ELF_PACKAGE_METADATA.md +++ b/docs/ELF_PACKAGE_METADATA.md @@ -14,7 +14,7 @@ or parse ELF core files.* ELF binaries get stamped with a unique, build-time generated hex string identifier called `build-id`, [which gets embedded as an ELF note called `.note.gnu.build-id`](https://fedoraproject.org/wiki/Releases/FeatureBuildId). -In most cases, this allows to associate a stripped binary with its debugging information. +In most cases, this allows a stripped binary to be associated with its debugging information. 
It is used, for example, to dynamically fetch DWARF symbols from a debuginfo server, or to query the local package manager and find out the package metadata or, again, the DWARF symbols or program sources. diff --git a/docs/ENVIRONMENT.md b/docs/ENVIRONMENT.md index 5e15b2b..fd8aa0c 100644 --- a/docs/ENVIRONMENT.md +++ b/docs/ENVIRONMENT.md @@ -126,6 +126,9 @@ All tools: * `$SYSTEMD_NETLINK_DEFAULT_TIMEOUT` — specifies the default timeout of waiting replies for netlink messages from the kernel. Defaults to 25 seconds. +* `$SYSTEMD_VERITY_SHARING=0` — if set, sharing dm-verity devices by + using a stable `<ROOTHASH>-verity` device mapper name will be disabled. + `systemctl`: * `$SYSTEMCTL_FORCE_BUS=1` — if set, do not connect to PID 1's private D-Bus @@ -180,6 +183,17 @@ All tools: expected format is six groups of two hexadecimal digits separated by colons, e.g. `SYSTEMD_NSPAWN_NETWORK_MAC=12:34:56:78:90:AB` +`systemd-vmspawn`: + +* `$SYSTEMD_VMSPAWN_NETWORK_MAC=...` — if set, allows users to set a specific MAC + address for a VM, ensuring that it uses the provided value instead of + generating a random one. It is effective when used with `--network-tap`. The + expected format is six groups of two hexadecimal digits separated by colons, + e.g. `SYSTEMD_VMSPAWN_NETWORK_MAC=12:34:56:78:90:AB` + +* `$SYSTEMD_VMSPAWN_QEMU_EXTRA=…` – may contain additional command line + arguments to append the qemu command line. + `systemd-logind`: * `$SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1` — if set, report that @@ -241,7 +255,7 @@ All tools: when determining stable network interface names. This may be used to revert to naming schemes of older udev versions, in order to provide more stable naming across updates. 
This environment variable takes precedence over the - kernel command line option `net.naming-scheme=`, except if the value is + kernel command line option `net.naming_scheme=`, except if the value is prefixed with `:` in which case the kernel command line option takes precedence, if it is specified as well. @@ -249,6 +263,21 @@ All tools: devices sysfs path are actually backed by sysfs. Relaxing this verification is useful for testing purposes. +* `$SYSTEMD_UDEV_EXTRA_TIMEOUT_SEC=` — Specifies an extra timespan that the + udev manager process waits for a worker process to kill slow programs specified + by IMPORT{program}=, PROGRAM=, or RUN=, and to finalize processing of the event. + If the worker process cannot finalize the event within the specified timespan, + the worker process is killed by the manager process. Defaults to 10 seconds, + maximum allowed is 5 hours. + +`udevadm` and `systemd-hwdb`: + +* `SYSTEMD_HWDB_UPDATE_BYPASS=` — If set to "1", execution of hwdb updates is skipped + when `udevadm hwdb --update` or `systemd-hwdb update` are invoked. This can + be useful if either of these tools is invoked unconditionally as a child + process by another tool, such as package managers running either of these + tools in a postinstall script. + `nss-systemd`: * `$SYSTEMD_NSS_BYPASS_SYNTHETIC=1` — if set, `nss-systemd` won't synthesize @@ -312,7 +341,7 @@ All tools: for cases where we don't need to track given unit type, e.g. `--user` manager often doesn't need to deal with device or swap units because they are handled by the `--system` manager (PID 1). Note that setting certain unit - type as unsupported may not prevent loading some units of that type if they + type as unsupported might not prevent loading some units of that type if they are referenced by other units of another supported type.
* `$SYSTEMD_DEFAULT_MOUNT_RATE_LIMIT_BURST` — can be set to override the mount @@ -330,16 +359,22 @@ All tools: `systemd-gpt-auto-generator` to ensure the root partition is mounted writable in accordance to the GPT partition flags. -`systemd-firstboot` and `localectl`: +`systemd-firstboot`, `localectl`, and `systemd-localed`: * `$SYSTEMD_LIST_NON_UTF8_LOCALES=1` — if set, non-UTF-8 locales are listed among the installed ones. By default non-UTF-8 locales are suppressed from the selection, since we are living in the 21st century. +* `$SYSTEMD_KEYMAP_DIRECTORIES=` — takes a colon (`:`) separated list of keymap + directories. The directories must be absolute and normalized. If unset, the + default keymap directories (/usr/share/keymaps/, /usr/share/kbd/keymaps/, and + /usr/lib/kbd/keymaps/) will be used. + `systemd-resolved`: * `$SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME` — if set to "0", `systemd-resolved` - won't synthesize system hostname on both regular and reverse lookups. + won't synthesize A/AAAA/PTR RRs for the system hostname on either regular or + reverse lookups. `systemd-sysext`: @@ -354,6 +389,13 @@ All tools: `$SYSTEMD_CONFEXT_HIERARCHIES` works for confext images and supports the systemd-confext multi-call functionality of sysext. +* `$SYSTEMD_SYSEXT_MUTABLE_MODE` — this variable may be used to override the + default mutability mode for hierarchies managed by `systemd-sysext`. It takes + the same values the `--mutable=` command line switch does. Note that the + command line still overrides the effect of the environment + variable. Similarly, `$SYSTEMD_CONFEXT_MUTABLE_MODE` works for confext images + and supports the systemd-confext multi-call functionality of sysext. + `systemd-tmpfiles`: * `$SYSTEMD_TMPFILES_FORCE_SUBVOL` — if unset, `v`/`q`/`Q` lines will create @@ -462,6 +504,12 @@ disk images with `--image=` or similar: devices when opening them. Defaults to on, set this to "0" to disable this feature.
+* `$SYSTEMD_ALLOW_USERSPACE_VERITY` — takes a boolean, which controls whether + to consider the userspace Verity public key store in `/etc/verity.d/` (and + related directories) to authenticate signatures on Verity hashes of disk + images. Defaults to true, i.e. userspace signature validation is allowed. If + false, authentication can be done only via the kernel's internal keyring. + `systemd-cryptsetup`: * `$SYSTEMD_CRYPTSETUP_USE_TOKEN_MODULE` – takes a boolean, which controls @@ -533,6 +581,14 @@ SYSTEMD_HOME_DEBUG_SUFFIX=foo \ `mkfs` when formatting LUKS home directories. There's one variable for each of the supported file systems for the LUKS home directory backend. +* `$SYSTEMD_HOME_LOCK_FREEZE_SESSION` - Takes a boolean. When false, the user's + session will not be frozen when the home directory is locked. Note that the kernel + may still freeze any task that tries to access data from the user's locked home + directory. This can lead to data loss, security leaks, or other undesired behavior + caused by parts of the session becoming unresponsive due to disk I/O while other + parts of the session continue running. Thus, we highly recommend that this variable + isn't used unless necessary. Defaults to true. + `kernel-install`: * `$KERNEL_INSTALL_BYPASS` – If set to "1", execution of kernel-install is skipped @@ -585,6 +641,13 @@ SYSTEMD_HOME_DEBUG_SUFFIX=foo \ `nftables`. Selects the firewall backend to use. If not specified tries to use `nftables` and falls back to `iptables` if that's not available. +`systemd-networkd`: + +* `$SYSTEMD_NETWORK_PERSISTENT_STORAGE_READY` – takes a boolean. If true, + systemd-networkd tries to open the persistent storage on start. To make this + work, ProtectSystem=strict in systemd-networkd.service needs to be downgraded + or disabled. 
+ `systemd-storagetm`: * `$SYSTEMD_NVME_MODEL`, `$SYSTEMD_NVME_FIRMWARE`, `$SYSTEMD_NVME_SERIAL`, @@ -595,3 +658,58 @@ SYSTEMD_HOME_DEBUG_SUFFIX=foo \ latter two via the environment variable unless `systemd-storagetm` is invoked to expose a single device only, since those identifiers better should be kept unique. + +`systemd-pcrlock`, `systemd-pcrextend`: + +* `$SYSTEMD_MEASURE_LOG_USERSPACE` – the path to the `tpm2-measure.log` file + (containing userspace measurement data) to read. This allows overriding the + default of `/run/log/systemd/tpm2-measure.log`. + +* `$SYSTEMD_MEASURE_LOG_FIRMWARE` – the path to the `binary_bios_measurements` + file (containing firmware measurement data) to read. This allows overriding + the default of `/sys/kernel/security/tpm0/binary_bios_measurements`. + +`systemd-sleep`: + +* `$SYSTEMD_SLEEP_FREEZE_USER_SESSIONS` - Takes a boolean. When true (the default), + `user.slice` will be frozen during sleep. When false it will not be. We recommend + against using this variable, because it can lead to undesired behavior, especially + for systems that use home directory encryption and for + `systemd-suspend-then-hibernate.service`. + +Tools using the Varlink protocol (such as `varlinkctl`) or sd-bus (such as +`busctl`): + +* `$SYSTEMD_SSH` – the ssh binary to invoke when the `ssh:` transport is + used. May be a filename (which is searched for in `$PATH`) or absolute path. + +* `$SYSTEMD_VARLINK_LISTEN` – interpreted by some tools that provide a Varlink + service. Takes a file system path: if specified the tool will listen on an + `AF_UNIX` stream socket on the specified path in addition to whatever else it + would listen on. + +`systemd-mountfsd`: + +* `$SYSTEMD_MOUNTFSD_TRUSTED_DIRECTORIES` – takes a boolean argument. If true + disk images from the usual disk image directories (`/var/lib/machines/`, + `/var/lib/confexts/`, …) will be considered "trusted", i.e. 
are validated + with a more relaxed image policy (typically not requiring Verity signature + checking) than those from other directories (where Verity signature checks + are mandatory). If false all images are treated the same, regardless if + placed in the usual disk image directories or elsewhere. If not set defaults + to a compile time setting. + +* `$SYSTEMD_MOUNTFSD_IMAGE_POLICY_TRUSTED`, + `$SYSTEMD_MOUNTFSD_IMAGE_POLICY_UNTRUSTED` – the default image policy to + apply to trusted and untrusted disk images. An image is considered trusted if + placed in a trusted disk image directory (see above), or if suitable polkit + authentication was acquired. See `systemd.image-policy(7)` for the valid + syntax for image policy strings. + +`systemd-run`, `run0`, `systemd-nspawn`, `systemd-vmspawn`: + +* `$SYSTEMD_TINT_BACKGROUND` – Takes a boolean. When false the automatic + tinting of the background for containers, VMs, and interactive `systemd-run` + and `run0` invocations is turned off. Note that this environment variable has + no effect if the background color is explicitly selected via the relevant + `--background=` switch of the tool. diff --git a/docs/HACKING.md b/docs/HACKING.md index 45334d8..51499d7 100644 --- a/docs/HACKING.md +++ b/docs/HACKING.md @@ -7,41 +7,33 @@ SPDX-License-Identifier: LGPL-2.1-or-later # Hacking on systemd -We welcome all contributions to systemd. If you notice a bug or a missing -feature, please feel invited to fix it, and submit your work as a +We welcome all contributions to systemd. +If you notice a bug or a missing feature, please feel invited to fix it, and submit your work as a [GitHub Pull Request (PR)](https://github.com/systemd/systemd/pull/new). -Please make sure to follow our [Coding Style](/CODING_STYLE) when submitting -patches. Also have a look at our [Contribution Guidelines](/CONTRIBUTING). - -When adding new functionality, tests should be added. 
For shared functionality -(in `src/basic/` and `src/shared/`) unit tests should be sufficient. The general -policy is to keep tests in matching files underneath `src/test/`, -e.g. `src/test/test-path-util.c` contains tests for any functions in -`src/basic/path-util.c`. If adding a new source file, consider adding a matching -test executable. For features at a higher level, tests in `src/test/` are very -strongly recommended. If that is not possible, integration tests in `test/` are -encouraged. - -Please also have a look at our list of [code quality tools](/CODE_QUALITY) we -have setup for systemd, to ensure our codebase stays in good shape. - -Please always test your work before submitting a PR. For many of the components -of systemd testing is straightforward as you can simply compile systemd and -run the relevant tool from the build directory. - -For some components (most importantly, systemd/PID 1 itself) this is not -possible, however. In order to simplify testing for cases like this we provide -a set of `mkosi` build files directly in the source tree. -[mkosi](https://github.com/systemd/mkosi) is a tool for building clean OS images -from an upstream distribution in combination with a fresh build of the project -in the local working directory. To make use of this, please install `mkosi` v19 -or newer using your distribution's package manager or from the -[GitHub repository](https://github.com/systemd/mkosi). `mkosi` will build an -image for the host distro by default. First, run `mkosi genkey` to generate a key -and certificate to be used for secure boot and verity signing. After that is done, -it is sufficient to type `mkosi` in the systemd project directory to generate a disk -image you can boot either in `systemd-nspawn` or in a UEFI-capable VM: +Please make sure to follow our [Coding Style](/CODING_STYLE) when submitting patches. +Also have a look at our [Contribution Guidelines](/CONTRIBUTING). + +When adding new functionality, tests should be added. 
+For shared functionality (in `src/basic/` and `src/shared/`) unit tests should be sufficient. +The general policy is to keep tests in matching files underneath `src/test/`, +e.g. `src/test/test-path-util.c` contains tests for any functions in `src/basic/path-util.c`. +If adding a new source file, consider adding a matching test executable. +For features at a higher level, tests in `src/test/` are very strongly recommended. +If that is not possible, integration tests in `test/` are encouraged. + +Please always test your work before submitting a PR. +For many of the components of systemd testing is straightforward as you can simply compile systemd and run the relevant tool from the build directory. + +For some components (most importantly, systemd/PID 1 itself) this is not possible, however. +In order to simplify testing for cases like this we provide a set of `mkosi` config files directly in the source tree. +[mkosi](https://mkosi.systemd.io/) +is a tool for building clean OS images from an upstream distribution in combination with a fresh build of the project in the local working directory. +To make use of this, please install `mkosi` v19 or newer using your distribution's package manager or from the +[GitHub repository](https://github.com/systemd/mkosi). +`mkosi` will build an image for the host distro by default. +First, run `mkosi genkey` to generate a key and certificate to be used for secure boot and verity signing. +After that is done, it is sufficient to type `mkosi` in the systemd project directory to generate a disk image you can boot either in `systemd-nspawn` or in a UEFI-capable VM: ```sh $ sudo mkosi boot # nspawn still needs sudo for now @@ -53,11 +45,62 @@ or: $ mkosi qemu ``` -Every time you rerun the `mkosi` command a fresh image is built, incorporating -all current changes you made to the project tree. +Every time you rerun the `mkosi` command a fresh image is built, +incorporating all current changes you made to the project tree. 
-Putting this all together, here's a series of commands for preparing a patch -for systemd: +By default a directory image is built. +This requires `virtiofsd` to be installed on the host. +To build a disk image instead which does not require `virtiofsd`, add the following to `mkosi.local.conf`: + +```conf +[Output] +Format=disk +``` + +To boot in UEFI mode instead of using QEMU's direct kernel boot, add the following to `mkosi.local.conf`: + +```conf +[Host] +QemuFirmware=uefi +``` + +To avoid having to build a new image all the time when iterating on a patch, +add the following to `mkosi.local.conf`: + +```conf +[Host] +RuntimeBuildSources=yes +``` + +After enabling this setting, the source and build directories will be mounted to +`/work/src` and `/work/build` respectively when booting the image as a container +or virtual machine. To build the latest changes and re-install, run +`meson install -C /work/build --only-changed` in the container or virtual machine +and optionally restart the daemon(s) you're working on using +`systemctl restart <units>` or `systemctl daemon-reexec` if you're working on pid1 +or `systemctl soft-reboot` to restart everything. + +Aside from the image, the `mkosi.output` directory will also be populated with a +set of distribution packages. Assuming you're running the same distribution and +release as the mkosi image, you can install these rpms on your host or test +system as well for any testing or debugging that cannot easily be performed in a +VM or container. + +By default, no debuginfo packages are produced. 
To produce debuginfo packages, +run mkosi with the `WITH_DEBUG` environment variable set to `1`: + +```sh +$ mkosi -E WITH_DEBUG=1 -f +``` + +or configure it in `mkosi.local.conf`: + +```conf +[Content] +Environment=WITH_DEBUG=1 +``` + +Putting this all together, here's a series of commands for preparing a patch for systemd: ```sh $ git clone https://github.com/systemd/mkosi.git # If mkosi v19 or newer is not packaged by your distribution @@ -74,9 +117,8 @@ $ git push -u <REMOTE> # where REMOTE is your "fork" on GitHub And after that, head over to your repo on GitHub and click "Compare & pull request" -If you want to do a local build without mkosi, most distributions also provide -very simple and convenient ways to install most development packages necessary -to build systemd: +If you want to do a local build without mkosi, +most distributions also provide very simple and convenient ways to install most development packages necessary to build systemd: ```sh # Fedora @@ -105,85 +147,72 @@ Happy hacking! Some source files are generated during build. We use two templating engines: * meson's `configure_file()` directive uses syntax with `@VARIABLE@`. - See the - [Meson docs for `configure_file()`](https://mesonbuild.com/Reference-manual.html#configure_file) - for details. +See the [Meson docs for `configure_file()`](https://mesonbuild.com/Reference-manual.html#configure_file) for details. {% raw %} * most files are rendered using jinja2, with `{{VARIABLE}}` and `{% if … %}`, - `{% elif … %}`, `{% else … %}`, `{% endif … %}` blocks. `{# … #}` is a - jinja2 comment, i.e. that block will not be visible in the rendered - output. `{% raw %} … `{% endraw %}`{{ '{' }}{{ '% endraw %' }}}` creates a block - where jinja2 syntax is not interpreted. +`{% elif … %}`, `{% else … %}`, `{% endif … %}` blocks. `{# … #}` is a jinja2 comment, +i.e. that block will not be visible in the rendered output. 
+`{% raw %} … `{% endraw %}`{{ '{' }}{{ '% endraw %' }}}` creates a block where jinja2 syntax is not interpreted. - See the - [Jinja Template Designer Documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) - for details. +See the [Jinja Template Designer Documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) for details. Please note that files for both template engines use the `.in` extension. ## Developer and release modes -In the default meson configuration (`-Dmode=developer`), certain checks are -enabled that are suitable when hacking on systemd (such as internal -documentation consistency checks). Those are not useful when compiling for -distribution and can be disabled by setting `-Dmode=release`. +In the default meson configuration (`-Dmode=developer`), +certain checks are enabled that are suitable when hacking on systemd (such as internal documentation consistency checks). +Those are not useful when compiling for distribution and can be disabled by setting `-Dmode=release`. ## Sanitizers in mkosi -See [Testing systemd using sanitizers](/TESTING_WITH_SANITIZERS) for more information -on how to build with sanitizers enabled in mkosi. +See [Testing systemd using sanitizers](/TESTING_WITH_SANITIZERS) for more information on how to build with sanitizers enabled in mkosi. ## Fuzzers -systemd includes fuzzers in `src/fuzz/` that use libFuzzer and are automatically -run by [OSS-Fuzz](https://github.com/google/oss-fuzz) with sanitizers. -To add a fuzz target, create a new `src/fuzz/fuzz-foo.c` file with a `LLVMFuzzerTestOneInput` -function and add it to the list in `src/fuzz/meson.build`. +systemd includes fuzzers in `src/fuzz/` that use libFuzzer and are automatically run by [OSS-Fuzz](https://github.com/google/oss-fuzz) with sanitizers. +To add a fuzz target, create a new `src/fuzz/fuzz-foo.c` file with a `LLVMFuzzerTestOneInput` function and add it to the list in `src/fuzz/meson.build`. 
-Whenever possible, a seed corpus and a dictionary should also be added with new -fuzz targets. The dictionary should be named `src/fuzz/fuzz-foo.dict` and the seed -corpus should be built and exported as `$OUT/fuzz-foo_seed_corpus.zip` in -`tools/oss-fuzz.sh`. +Whenever possible, a seed corpus and a dictionary should also be added with new fuzz targets. +The dictionary should be named `src/fuzz/fuzz-foo.dict` and the seed corpus should be built and exported as `$OUT/fuzz-foo_seed_corpus.zip` in `tools/oss-fuzz.sh`. -The fuzzers can be built locally if you have libFuzzer installed by running -`tools/oss-fuzz.sh`, or by running: +The fuzzers can be built locally if you have libFuzzer installed by running `tools/oss-fuzz.sh`, or by running: -``` +```sh CC=clang CXX=clang++ \ meson setup build-libfuzz -Dllvm-fuzz=true -Db_sanitize=address,undefined -Db_lundef=false \ - -Dc_args='-fno-omit-frame-pointer -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION' +-Dc_args='-fno-omit-frame-pointer -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION' ninja -C build-libfuzz fuzzers ``` -Each fuzzer then can be then run manually together with a directory containing -the initial corpus: +Each fuzzer then can be then run manually together with a directory containing the initial corpus: ``` export UBSAN_OPTIONS=print_stacktrace=1:print_summary=1:halt_on_error=1 build-libfuzz/fuzz-varlink-idl test/fuzz/fuzz-varlink-idl/ ``` -Note: the `halt_on_error=1` UBSan option is especially important, otherwise -the fuzzer won't crash when undefined behavior is triggered. +Note: the `halt_on_error=1` UBSan option is especially important, +otherwise the fuzzer won't crash when undefined behavior is triggered. You should also confirm that the fuzzers can be built and run using [the OSS-Fuzz toolchain](https://google.github.io/oss-fuzz/advanced-topics/reproducing/#building-using-docker): -``` +```sh path_to_systemd=... 
git clone --depth=1 https://github.com/google/oss-fuzz cd oss-fuzz for sanitizer in address undefined memory; do - for engine in libfuzzer afl honggfuzz; do - ./infra/helper.py build_fuzzers --sanitizer "$sanitizer" --engine "$engine" \ - --clean systemd "$path_to_systemd" +for engine in libfuzzer afl honggfuzz; do +./infra/helper.py build_fuzzers --sanitizer "$sanitizer" --engine "$engine" \ +--clean systemd "$path_to_systemd" - ./infra/helper.py check_build --sanitizer "$sanitizer" --engine "$engine" \ - -e ALLOWED_BROKEN_TARGETS_PERCENTAGE=0 systemd - done +./infra/helper.py check_build --sanitizer "$sanitizer" --engine "$engine" \ +-e ALLOWED_BROKEN_TARGETS_PERCENTAGE=0 systemd +done done ./infra/helper.py build_fuzzers --clean --architecture i386 systemd "$path_to_systemd" @@ -193,8 +222,8 @@ done ./infra/helper.py coverage --no-corpus-download systemd ``` -If you find a bug that impacts the security of systemd, please follow the -guidance in [CONTRIBUTING.md](/CONTRIBUTING) on how to report a security vulnerability. +If you find a bug that impacts the security of systemd, +please follow the guidance in [CONTRIBUTING.md](/CONTRIBUTING) on how to report a security vulnerability. For more details on building fuzzers and integrating with OSS-Fuzz, visit: @@ -203,55 +232,39 @@ For more details on building fuzzers and integrating with OSS-Fuzz, visit: ## Debugging binaries that need to run as root in vscode -When trying to debug binaries that need to run as root, we need to do some custom configuration in vscode to -have it try to run the applications as root and to ask the user for the root password when trying to start -the binary. To achieve this, we'll use a custom debugger path which points to a script that starts `gdb` as -root using `pkexec`. pkexec will prompt the user for their root password via a graphical interface. This -guide assumes the C/C++ extension is used for debugging. 
+When trying to debug binaries that need to run as root, +we need to do some custom configuration in vscode to have it try to run the applications as root and to ask the user for the root password when trying to start the binary. +To achieve this, we'll use a custom debugger path which points to a script that starts `gdb` as root using `pkexec`. +pkexec will prompt the user for their root password via a graphical interface. +This guide assumes the C/C++ extension is used for debugging. -First, create a file `sgdb` in the root of the systemd repository with the following contents and make it -executable: +First, create a file `sgdb` in the root of the systemd repository with the following contents and make it executable: -``` +```sh #!/bin/sh exec pkexec gdb "$@" ``` -Then, open launch.json in vscode, and set `miDebuggerPath` to `${workspaceFolder}/sgdb` for the corresponding -debug configuration. Now, whenever you try to debug the application, vscode will try to start gdb as root via -pkexec which will prompt you for your password via a graphical interface. After entering your password, -vscode should be able to start debugging the application. +Then, open launch.json in vscode, and set `miDebuggerPath` to `${workspaceFolder}/sgdb` for the corresponding debug configuration. +Now, whenever you try to debug the application, vscode will try to start gdb as root via pkexec which will prompt you for your password via a graphical interface. +After entering your password, vscode should be able to start debugging the application. 
-For more information on how to set up a debug configuration for C binaries, please refer to the official -vscode documentation [here](https://code.visualstudio.com/docs/cpp/launch-json-reference) +For more information on how to set up a debug configuration for C binaries, +please refer to the official vscode documentation [here](https://code.visualstudio.com/docs/cpp/launch-json-reference) ## Debugging systemd with mkosi + vscode -To simplify debugging systemd when testing changes using mkosi, we're going to show how to attach -[VSCode](https://code.visualstudio.com/)'s debugger to an instance of systemd running in a mkosi image using -QEMU. - -To allow VSCode's debugger to attach to systemd running in a mkosi image, we have to make sure it can access -the virtual machine spawned by mkosi where systemd is running. mkosi makes this possible via a handy SSH -option that makes the generated image accessible via SSH when booted. Thus you must build the image with -`mkosi --ssh`. The easiest way to set the option is to create a file `mkosi.local.conf` in the root of the -repository and add the following contents: - -``` -[Host] -Ssh=yes -RuntimeTrees=. -``` - -Also make sure that the SSH agent is running on your system and that you've added your SSH key to it with -`ssh-add`. Also make sure that `virtiofsd` is installed. +To simplify debugging systemd when testing changes using mkosi, we're going to show how to attach [VSCode](https://code.visualstudio.com/)'s debugger to an instance of systemd running in a mkosi image using QEMU. -After rebuilding the image and booting it with `mkosi qemu`, you should now be able to connect to it by -running `mkosi ssh` from the same directory in another terminal window. +To allow VSCode's debugger to attach to systemd running in a mkosi image, +we have to make sure it can access the virtual machine spawned by mkosi where systemd is running. 
+After booting the image with `mkosi qemu`, +you should now be able to connect to it by running `mkosi ssh` from the same directory in another terminal window. -Now we need to configure VSCode. First, make sure the C/C++ extension is installed. If you're already using -a different extension for code completion and other IDE features for C in VSCode, make sure to disable the -corresponding parts of the C/C++ extension in your VSCode user settings by adding the following entries: +Now we need to configure VSCode. +First, make sure the C/C++ extension is installed. +If you're already using a different extension for code completion and other IDE features for C in VSCode, +make sure to disable the corresponding parts of the C/C++ extension in your VSCode user settings by adding the following entries: ```json "C_Cpp.formatting": "Disabled", @@ -260,9 +273,9 @@ corresponding parts of the C/C++ extension in your VSCode user settings by addin "C_Cpp.suggestSnippets": false, ``` -With the extension set up, we can create the launch.json file in the .vscode/ directory to tell the VSCode -debugger how to attach to the systemd instance running in our mkosi container/VM. Create the file, and possibly -the directory, and add the following contents: +With the extension set up, +we can create the launch.json file in the .vscode/ directory to tell the VSCode debugger how to attach to the systemd instance running in our mkosi container/VM. 
+Create the file, and possibly the directory, and add the following contents: ```json { @@ -276,16 +289,12 @@ the directory, and add the following contents: "name": "systemd", "pipeTransport": { "pipeProgram": "mkosi", - "pipeArgs": [ - "-C", - "/path/to/systemd/repo/directory/on/host/system/", - "ssh" - ], + "pipeArgs": ["-C", "${workspaceFolder}", "ssh"], "debuggerPath": "/usr/bin/gdb" }, "MIMode": "gdb", "sourceFileMap": { - "/root/src/systemd": { + "/work/src": { "editorPath": "${workspaceFolder}", "useForBreakpoints": false }, @@ -295,29 +304,28 @@ the directory, and add the following contents: } ``` -Now that the debugger knows how to connect to our process in the container/VM and we've set up the necessary -source mappings, go to the "Run and Debug" window and run the "systemd" debug configuration. If everything -goes well, the debugger should now be attached to the systemd instance running in the container/VM. You can -attach breakpoints from the editor and enjoy all the other features of VSCode's debugger. +Now that the debugger knows how to connect to our process in the container/VM and we've set up the necessary source mappings, +go to the "Run and Debug" window and run the "systemd" debug configuration. +If everything goes well, the debugger should now be attached to the systemd instance running in the container/VM. +You can attach breakpoints from the editor and enjoy all the other features of VSCode's debugger. -To debug systemd components other than PID 1, set "program" to the full path of the component you want to -debug and set "processId" to "${command:pickProcess}". Now, when starting the debugger, VSCode will ask you -the PID of the process you want to debug. Run `systemctl show --property MainPID --value <component>` in the -container to figure out the PID and enter it when asked and VSCode will attach to that process instead. 
+To debug systemd components other than PID 1,
+set "program" to the full path of the component you want to debug and set "processId" to "${command:pickProcess}".
+Now, when starting the debugger, VSCode will ask you the PID of the process you want to debug.
+Run `systemctl show --property MainPID --value <component>`
+in the container to figure out the PID and enter it when asked and VSCode will attach to that process instead.

 ## Debugging systemd-boot

-During boot, systemd-boot and the stub loader will output messages like
-`systemd-boot@0x0A` and `systemd-stub@0x0B`, providing the base of the loaded
-code. This location can then be used to attach to a QEMU session (provided it
-was run with `-s`). See `debug-sd-boot.sh` script in the tools folder which
-automates this process.
+During boot, systemd-boot and the stub loader will output messages like `systemd-boot@0x0A` and `systemd-stub@0x0B`,
+providing the base of the loaded code.
+This location can then be used to attach to a QEMU session (provided it was run with `-s`).
+See `debug-sd-boot.sh` script in the tools folder which automates this process.

 If the debugger is too slow to attach to examine an early boot code passage,
-the call to `DEFINE_EFI_MAIN_FUNCTION()` can be modified to enable waiting. As
-soon as the debugger has control, we can then run `set variable wait = 0` or
-`return` to continue. Once the debugger has attached, setting breakpoints will
-work like usual.
+the call to `DEFINE_EFI_MAIN_FUNCTION()` can be modified to enable waiting.
+As soon as the debugger has control, we can then run `set variable wait = 0` or `return` to continue.
+Once the debugger has attached, setting breakpoints will work like usual.
To debug systemd-boot in an IDE such as VSCode we can use a launch configuration like this: ```json diff --git a/docs/INCOMPATIBILITIES.md b/docs/INCOMPATIBILITIES.md index 332f1ef..784f3a2 100644 --- a/docs/INCOMPATIBILITIES.md +++ b/docs/INCOMPATIBILITIES.md @@ -30,7 +30,7 @@ Many of the incompatibilities are specific to distribution-specific extensions o * Early boot runlevels as they are used by some distributions are no longer supported. i.e. "fake", distribution-specific runlevels such as "S" or "b" cannot be used with systemd. * On SysV systems changes to init scripts or any other files that define the boot process (such as /etc/fstab) usually had an immediate effect on everything started later. This is different on systemd-based systems where init script information and other boot-time configuration files are only reread when "systemctl daemon-reload" is issued. (Note that some commands, notably "systemctl enable"/"systemctl disable" do this implicitly however.) This is by design, and a safety feature, since it ensures that half-completed changes are not read at the wrong time. * Multiple entries for the same mount path in /etc/fstab are not supported. In systemd there's only a single unit definition for each mount path read at any time. Also the listing order of mounts in /etc/fstab has no effect, mounts are executed in parallel and dependencies between them generated automatically depending on path prefixes and source paths. -* systemd's handling of the existing "nofail" mount option in /etc/fstab is stricter than it used to be on some sysvinit distributions: mount points that fail and are not listed as "nofail" will cause the boot to be stopped, for security reasons, as we we should not permit unprivileged code to run without everything listed — and not expressly exempted through "nofail" — being around. 
Hence, please mark all mounts where booting shall proceed regardless whether they succeeded or not with "nofail" +* systemd's handling of the existing "nofail" mount option in /etc/fstab is stricter than it used to be on some sysvinit distributions: mount points that fail and are not listed as "nofail" will cause the boot to be stopped, for security reasons, as we should not permit unprivileged code to run without everything listed — and not expressly exempted through "nofail" — being around. Hence, please mark all mounts where booting shall proceed regardless whether they succeeded or not with "nofail" * Some SysV systems support an "rc.local" script that is supposed to be called "last" during boot. In systemd, the script is supported, but the semantics are less strict, as there is simply no concept of "last service", as the boot process is event- and request-based, parallelized and compositive. In general, it's a good idea to write proper unit files with properly defined dependencies, and avoid making use of rc.local. * systemd assumes that the UID boundary between system and regular users is a choice the distribution makes, and not the administrator. Hence it expects this setting as compile-time option to be picked by the distribution. It will _not_ check /etc/login.defs during runtime. diff --git a/docs/INHIBITOR_LOCKS.md b/docs/INHIBITOR_LOCKS.md index 7dafc5e..1308f6e 100644 --- a/docs/INHIBITOR_LOCKS.md +++ b/docs/INHIBITOR_LOCKS.md @@ -87,7 +87,7 @@ A delay lock taken this way should be released ASAP on reception of PrepareForSh **ListInhibitors()** lists all currently active inhibitor locks. It returns an array of structs, each consisting of What, Who, Why, Mode as above, plus the PID and UID of the process that requested the lock. -The **PrepareForShutdown()** and **PrepareForSleep()** signals are emitted when a system suspend or shutdown has been requested and is about to be executed, as well as after the the suspend/shutdown was completed (or failed). 
+The **PrepareForShutdown()** and **PrepareForSleep()** signals are emitted when a system suspend or shutdown has been requested and is about to be executed, as well as after the suspend/shutdown was completed (or failed). The signals carry a boolean argument. If _True_ the shutdown/sleep has been requested, and the preparation phase for it begins, if _False_ the operation has finished completion (or failed). @@ -99,9 +99,9 @@ The signal with _False_ is generally delivered only after the system comes back The signal with _False_ is usually the signal on which applications request a new delay lock in order to be synchronously notified about the next suspend/shutdown cycle. -Note that watching PrepareForShutdown(true)[?](//secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2Finhibit&page=Software%2Fsystemd%2Finhibit%2FPrepareForSleep)/PrepareForSleep(true) without taking a delay lock is racy and should not be done, as any code that an application might want to execute on this signal might not actually finish before the suspend/shutdown cycle is executed. +Note that watching PrepareForShutdown(true)/PrepareForSleep(true) without taking a delay lock is racy and should not be done, as any code that an application might want to execute on this signal might not actually finish before the suspend/shutdown cycle is executed. -_Again_: if you watch PrepareForSuspend(true), then you really should have taken a delay lock first. PrepareForShutdown(false) may be subscribed to by applications which want to be notified about system resume events. +_Again_: if you watch PrepareForShutdown(true)/PrepareForSleep(true), then you really should have taken a delay lock first. PrepareForSleep(false) may be subscribed to by applications which want to be notified about system resume events. Note that this will only be sent out for suspend/resume cycles done via logind, i.e. 
generally only for high-level user-induced suspend cycles, and not automatic, low-level kernel induced ones which might exist on certain devices with more aggressive power management. diff --git a/docs/MEMORY_PRESSURE.md b/docs/MEMORY_PRESSURE.md index 69c23ec..da1c9b2 100644 --- a/docs/MEMORY_PRESSURE.md +++ b/docs/MEMORY_PRESSURE.md @@ -169,7 +169,7 @@ pressure handling: setting controls whether to enable the memory pressure protocol for the service in question. -* The `MemoryPressureThresholdSec=` setting allows to configure the threshold +* The `MemoryPressureThresholdSec=` setting allows configuring the threshold when to signal memory pressure to the services. It takes a time value (usually in the millisecond range) that defines a threshold per 1s time window: if memory allocation latencies grow beyond this threshold diff --git a/docs/PAX_CONTROL_GROUPS.md b/docs/PAX_CONTROL_GROUPS.md index 4b2374a..4491959 100644 --- a/docs/PAX_CONTROL_GROUPS.md +++ b/docs/PAX_CONTROL_GROUPS.md @@ -105,12 +105,11 @@ systemd adheres to the recommendations above and guarantees additional behavior It is hence OK to pre-create cgroups and then let systemd use it, without having systemd remove it afterwards. - If a service cgroup already exists, systemd will not override the attributes of the cgroup with the exception of those explicitly configured in the systemd unit files. It is hence OK to pre-create cgroups for use in systemd, and pre-apply attributes to it. -- To avoid that systemd places all services in automatic cgroups in the "cpu" hierarchy change the [?](https://secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2FPaxControlGroups&page=DefaultControllers) DefaultControllers= in /etc/systemd/system.conf and set it to the empty string. +- To avoid that systemd places all services in automatic cgroups in the "cpu" hierarchy change the DefaultControllers= in /etc/systemd/system.conf and set it to the empty string. 
- By default systemd will place services only in automatic cgroups in the "cpu" hierarchy and in its own private tree "name=systemd". - If you want it to duplicate these trees in other hierarchies add them to [?](https://secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2FPaxControlGroups&page=DefaultControllers) DefaultControllers= in /etc/systemd/system.conf -- To opt-out or opt-in specific services from the automatic tree generation in the kernel controller hierarchies use [?](https://secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2FPaxControlGroups&page=ControlGroup) ControlGroup= in the unit file. - Use "[?](https://secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2FPaxControlGroups&page=ControlGroup) ControlGroup=cpu:/" to opt-out of cgroup assignment for a service or - [?](https://secure.freedesktop.org/write/www/ikiwiki.cgi?do=create&from=Software%2Fsystemd%2FPaxControlGroups&page=ControlGroup) ControlGroup=cpu:/foo/bar" to manipulate the cgroup path. + If you want it to duplicate these trees in other hierarchies add them to DefaultControllers= in /etc/systemd/system.conf +- To opt-out or opt-in specific services from the automatic tree generation in the kernel controller hierarchies use ControlGroup= in the unit file. + Use "ControlGroup=cpu:/" to opt-out of cgroup assignment for a service or "ControlGroup=cpu:/foo/bar" to manipulate the cgroup path. - Stay away from the name=systemd named hierarchy. It's private property of systemd. You are welcome to explore it, but it is uncool to modify it from outside systemd. diff --git a/docs/PORTABLE_SERVICES.md b/docs/PORTABLE_SERVICES.md index 6f5ff11..a0bb11b 100644 --- a/docs/PORTABLE_SERVICES.md +++ b/docs/PORTABLE_SERVICES.md @@ -19,32 +19,32 @@ two specific features of container management: The primary tool for interacting with Portable Services is `portablectl`, and they are managed by the `systemd-portabled` service. 
-Portable services don't bring anything inherently new to the table. All they do -is put together known concepts to cover a specific set of use-cases in a +Portable services don't bring anything inherently new to the table. +All they do is put together known concepts to cover a specific set of use-cases in a slightly nicer way. ## So, what *is* a "Portable Service"? A portable service is ultimately just an OS tree, either inside of a directory, -or inside a raw disk image containing a Linux file system. This tree is called -the "image". It can be "attached" or "detached" from the system. When -"attached", specific systemd units from the image are made available on the -host system, then behaving pretty much exactly like locally installed system -services. When "detached", these units are removed again from the host, leaving +or inside a raw disk image containing a Linux file system. +This tree is called the "image". It can be "attached" or "detached" from the system. +When "attached", specific systemd units from the image are made available on the +host system, then behaving pretty much exactly like locally installed system services. +When "detached", these units are removed again from the host, leaving no artifacts around (except maybe messages they might have logged). -The OS tree/image can be created with any tool of your choice. For example, you -can use `dnf --installroot=` if you like, or `debootstrap`, the image format is -entirely generic, and doesn't have to carry any specific metadata beyond what -distribution images carry anyway. Or to say this differently: the image format -doesn't define any new metadata as unit files and OS tree directories or disk -images are already sufficient, and pretty universally available these days. One -particularly nice tool for creating suitable images is -[mkosi](https://github.com/systemd/mkosi), but many other existing tools will -do too. +The OS tree/image can be created with any tool of your choice. 
+For example, you can use `dnf --installroot=` if you like, or `debootstrap`, the image format is +entirely generic, and doesn't have to carry any specific metadata beyond what distribution images carry anyway. +Or to say this differently: +The image format doesn't define any new metadata as unit files and OS tree directories or disk +images are already sufficient, and pretty universally available these days. +One particularly nice tool for creating suitable images is +[mkosi](https://github.com/systemd/mkosi), +but many other existing tools will do too. -Portable services may also be constructed from layers, similarly to container -environments. See [Extension Images](#extension-images) below. +Portable services may also be constructed from layers, similarly to container environments. +See [Extension Images](#extension-images) below. If you so will, "Portable Services" are a nicer way to manage chroot() environments, with better security, tooling and behavior. @@ -55,26 +55,21 @@ environments, with better security, tooling and behavior. systemd-nspawn/LXC-type OS containers, for Docker/rkt-like micro service containers, and even certain 'lightweight' VM runtimes. -"Portable services" do not provide a fully isolated environment to the payload, -like containers mostly intend to. Instead, they are more like regular system -services, can be controlled with the same tools, are exposed the same way in -all infrastructure, and so on. The main difference is that they use a different -root directory than the rest of the system. Hence, the intent is not to run -code in a different, isolated environment from the host — like most containers -would — but to run it in the same environment, but with stricter access -controls on what the service can see and do. +"Portable services" do not provide a fully isolated environment to the payload, like containers mostly intend to. 
+Instead, they are more like regular system services, can be controlled with the same tools, are exposed the same way in all infrastructure, and so on. +The main difference is that they use a different root directory than the rest of the system. +Hence, the intent is not to run code in a different, isolated environment from the host — like most containers would — but to run it in the same environment, but with stricter access controls on what the service can see and do. One point of differentiation: since programs running as "portable services" are pretty much regular system services, they won't run as PID 1 (like they would -under Docker), but as normal processes. A corollary of that is that they aren't -supposed to manage anything in their own environment (such as the network) as -the execution environment is mostly shared with the rest of the system. +under Docker), but as normal processes. + +A corollary of that is that they aren't supposed to manage anything in their own environment (such as the network) as the execution environment is mostly shared with the rest of the system. The primary focus use-case of "portable services" is to extend the host system with encapsulated extensions, but provide almost full integration with the rest -of the system, though possibly restricted by security knobs. This focus -includes system extensions otherwise sometimes called "super-privileged -containers". +of the system, though possibly restricted by security knobs. +This focus includes system extensions otherwise sometimes called "super-privileged containers". Note that portable services are only available for system services, not for user services (i.e. the functionality cannot be used for the stuff @@ -103,15 +98,15 @@ This command does the following: `foobar-*.{service|socket|target|timer|path}`, `foobar@.{service|socket|target|timer|path}` as well as `foobar.*.{service|socket|target|timer|path}` and - `foobar.{service|socket|target|timer|path}` are copied out. 
These unit files - are placed in `/etc/systemd/system.attached/` (which is part of the normal - unit file search path of PID 1, and thus loaded exactly like regular unit - files). Within the images the unit files are looked for at the usual - locations, i.e. in `/usr/lib/systemd/system/` and `/etc/systemd/system/` and - so on, relative to the image's root. - -3. For each such unit file a drop-in file is created. Let's say - `foobar-waldo.service` was one of the unit files copied to + `foobar.{service|socket|target|timer|path}` + are copied out. + These unit files are placed in `/etc/systemd/system.attached/` + (which is part of the normal unit file search path of PID 1, and thus loaded exactly like regular unit + files). + Within the images the unit files are looked for at the usual locations, i.e. in `/usr/lib/systemd/system/` and `/etc/systemd/system/` and so on, relative to the image's root. + +3. For each such unit file a drop-in file is created. + Let's say `foobar-waldo.service` was one of the unit files copied to `/etc/systemd/system.attached/`, then a drop-in file `/etc/systemd/system.attached/foobar-waldo.service.d/20-portable.conf` is created, containing a few lines of additional configuration: @@ -123,31 +118,30 @@ This command does the following: LogExtraFields=PORTABLE=foobar ``` -4. For each such unit a "profile" drop-in is linked in. This "profile" drop-in - generally contains security options that lock down the service. By default - the `default` profile is used, which provides a medium level of security. +4. For each such unit a "profile" drop-in is linked in. + This "profile" drop-in generally contains security options that lock down the service. + By default the `default` profile is used, which provides a medium level of security. There's also `trusted`, which runs the service with no restrictions, i.e. in - the host file system root and with full privileges. The `strict` profile - comes with the toughest security restrictions. 
Finally, `nonetwork` is like - `default` but without network access. Users may define their own profiles - too (or modify the existing ones). + the host file system root and with full privileges. + The `strict` profile comes with the toughest security restrictions. + Finally, `nonetwork` is like `default` but without network access. + Users may define their own profiles too (or modify the existing ones). And that's already it. Note that the images need to stay around (and in the same location) as long as the -portable service is attached. If an image is moved, the `RootImage=` line -written to the unit drop-in would point to a non-existent path, and break -access to the image. +portable service is attached. +If an image is moved, the `RootImage=` line written to the unit drop-in would point to a non-existent path, and break access to the image. -The `portablectl detach` command executes the reverse operation: it looks for -the drop-ins and the unit files associated with the image, and removes them. +The `portablectl detach` command executes the reverse operation: +it looks for the drop-ins and the unit files associated with the image, and removes them. Note that `portablectl attach` won't enable or start any of the units it copies out by default, but `--enable` and `--now` parameters are available as shortcuts. The same is true for the opposite `detach` operation. -The `portablectl reattach` command combines a `detach` with an `attach`. It is -useful in case an image gets upgraded, as it allows performing a `restart` +The `portablectl reattach` command combines a `detach` with an `attach`. +It is useful in case an image gets upgraded, as it allows performing a `restart` operation on the units instead of `stop` plus `start`, thus providing lower downtime and avoiding losing runtime state associated with the unit such as the file descriptor store. @@ -155,13 +149,12 @@ file descriptor store.
## Requirements on Images Note that portable services don't introduce any new image format, but most OS -images should just work the way they are. Specifically, the following -requirements are made for an image that can be attached/detached with -`portablectl`. +images should just work the way they are. +Specifically, the following requirements are made for an image that can be attached/detached with `portablectl`. 1. It must contain an executable that shall be invoked, along with all its - dependencies. Any binary code needs to be compiled for an architecture - compatible with the host. + dependencies. + Any binary code needs to be compiled for an architecture compatible with the host. 2. The image must either be a plain sub-directory (or btrfs subvolume) containing the binaries and its dependencies in a classic Linux OS tree, or @@ -172,10 +165,9 @@ requirements are made for an image that can be attached/detached with [Discoverable Partitions Specification](https://uapi-group.org/specifications/specs/discoverable_partitions_specification). 3. The image must at least contain one matching unit file, with the right name - prefix and suffix (see above). The unit file is searched in the usual paths, - i.e. primarily /etc/systemd/system/ and /usr/lib/systemd/system/ within the - image. (The implementation will check a couple of other paths too, but it's - recommended to use these two paths.) + prefix and suffix (see above). + The unit file is searched in the usual paths, i.e. primarily /etc/systemd/system/ and /usr/lib/systemd/system/ within the image. + (The implementation will check a couple of other paths too, but it's recommended to use these two paths.) 4. The image must contain an os-release file, either in `/etc/os-release` or `/usr/lib/os-release`. The file should follow the standard format. 
@@ -187,17 +179,17 @@ requirements are made for an image that can be attached/detached with `/tmp/`, `/var/tmp/` that can be mounted over with the corresponding version from the host. -7. The OS might require other files or directories to be in place. For example, - if the image is built based on glibc, the dynamic loader needs to be +7. The OS might require other files or directories to be in place. + For example, if the image is built based on glibc, the dynamic loader needs to be available in `/lib/ld-linux.so.2` or `/lib64/ld-linux-x86-64.so.2` (or similar, depending on architecture), and if the distribution implements a merged `/usr/` tree, this means `/lib` and/or `/lib64` need to be symlinks - to their respective counterparts below `/usr/`. For details see your - distribution's documentation. + to their respective counterparts below `/usr/`. + For details see your distribution's documentation. Note that images created by tools such as `debootstrap`, `dnf --installroot=` -or `mkosi` generally satisfy all of the above. If you wonder what the most -minimal image would be that complies with the requirements above, it could +or `mkosi` generally satisfy all of the above. +If you wonder what the most minimal image would be that complies with the requirements above, it could consist of this: ``` @@ -216,10 +208,9 @@ consist of this: And that's it. -Note that qualifying images do not have to contain an init system of their -own. If they do, it's fine, it will be ignored by the portable service logic, -but they generally don't have to, and it might make sense to avoid any, to keep -images minimal. +Note that qualifying images do not have to contain an init system of their own. +If they do, it's fine, it will be ignored by the portable service logic, +but they generally don't have to, and it might make sense to avoid any, to keep images minimal. 
If the image is writable, and some of the files or directories that are overmounted from the host do not exist yet they will be automatically created. @@ -227,8 +218,8 @@ On read-only, immutable images (e.g. `erofs` or `squashfs` images) all files and directories to over-mount must exist already. Note that as no new image format or metadata is defined, it's very -straightforward to define images that can be made use of in a number of -different ways. For example, by using `mkosi -b` you can trivially build a +straightforward to define images that can be made use of in a number of different ways. +For example, by using `mkosi -b` you can trivially build a single, unified image that: 1. Can be attached as portable service, to run any container services natively @@ -242,35 +233,33 @@ single, unified image that: 4. Can be booted directly on bare-metal systems. -Of course, to facilitate 2, 3 and 4 you need to include an init system in the -image. To facilitate 3 and 4 you also need to include a boot loader in the -image. As mentioned, `mkosi -b` takes care of all of that for you, but any -other image generator should work too. +Of course, to facilitate 2, 3 and 4 you need to include an init system in the image. +To facilitate 3 and 4 you also need to include a boot loader in the +image. +As mentioned, `mkosi -b` takes care of all of that for you, but any other image generator should work too. The [os-release(5)](https://www.freedesktop.org/software/systemd/man/os-release.html) file may optionally be extended with a `PORTABLE_PREFIXES=` field listing all -supported portable service prefixes for the image (see above). This is useful -for informational purposes (as it allows recognizing portable service images +supported portable service prefixes for the image (see above).
+This is useful for informational purposes (as it allows recognizing portable service images from their contents as such), but is also useful to protect the image from -being used under a wrong name and prefix. This is particularly relevant if the -images are cryptographically authenticated (via Verity or a similar mechanism) -as this way the (not necessarily authenticated) image file name can be -validated against the (authenticated) image contents. If the field is not -specified the image will work fine, but is not necessarily recognizable as -portable service image, and any set of units included in the image may be -attached, there are no restrictions enforced. +being used under a wrong name and prefix. +This is particularly relevant if the images are cryptographically authenticated (via Verity or a similar mechanism) as this way the (not necessarily authenticated) image file name can be +validated against the (authenticated) image contents. +If the field is not specified the image will work fine, but is not necessarily recognizable as +portable service image, and any set of units included in the image may be attached, there are no restrictions enforced. ## Extension Images Portable services can be delivered as one or multiple images that extend the base -image, and are combined with OverlayFS at runtime, when they are attached. This -enables a workflow that splits the base 'runtime' from the daemon, so that multiple +image, and are combined with OverlayFS at runtime, when they are attached. +This enables a workflow that splits the base 'runtime' from the daemon, so that multiple portable services can share the same 'runtime' image (libraries, tools) without having to include everything each time, with the layering happening only at runtime. The `--extension` parameter of `portablectl` can be used to specify as many upper -layers as desired. On top of the requirements listed in the previous section, the -following must be also be observed: +layers as desired. 
+On top of the requirements listed in the previous section, the following must be also be observed: 1. The base/OS image must contain an `os-release file`, either in `/etc/os-release` or `/usr/lib/os-release`, in the standard format. @@ -296,25 +285,25 @@ following must be also be observed: ## Execution Environment -Note that the code in portable service images is run exactly like regular -services. Hence there's no new execution environment to consider. And, unlike -Docker would do it, as these are regular system services they aren't run as PID +Note that the code in portable service images is run exactly like regular services. +Hence there's no new execution environment to consider. +And, unlike Docker would do it, as these are regular system services they aren't run as PID 1 either, but with regular PID values. ## Access to host resources If services shipped with this mechanism shall be able to access host resources (such as files or AF_UNIX sockets for IPC), use the normal `BindPaths=` and -`BindReadOnlyPaths=` settings in unit files to mount them in. In fact, the -`default` profile mentioned above makes use of this to ensure +`BindReadOnlyPaths=` settings in unit files to mount them in. +In fact, the `default` profile mentioned above makes use of this to ensure `/etc/resolv.conf`, the D-Bus system bus socket or write access to the logging subsystem are available to the service. ## Instantiation -Sometimes it makes sense to instantiate the same set of services multiple -times. The portable service concept does not introduce a new logic for this. It -is recommended to use the regular systemd unit templating for this, i.e. to +Sometimes it makes sense to instantiate the same set of services multiple times. +The portable service concept does not introduce a new logic for this. +It is recommended to use the regular systemd unit templating for this, i.e. 
to include template units such as `foobar@.service`, so that instantiation is as simple as: @@ -330,11 +319,10 @@ units shipped with the OS itself as for attached portable services. ## Immutable images with local data -It's a good idea to keep portable service images read-only during normal -operation. In fact, all but the `trusted` profile will default to this kind of -behaviour, by setting the `ProtectSystem=strict` option. In this case writable -service data may be placed on the host file system. Use `StateDirectory=` in -the unit files to enable such behaviour and add a local data directory to the +It's a good idea to keep portable service images read-only during normal operation. +In fact, all but the `trusted` profile will default to this kind of behaviour, by setting the `ProtectSystem=strict` option. +In this case writable service data may be placed on the host file system. +Use `StateDirectory=` in the unit files to enable such behaviour and add a local data directory to the services copied onto the host. ## Logging @@ -342,24 +330,19 @@ services copied onto the host. Several fields are automatically added to log messages generated by a portable service (or about a portable service, e.g.: start/stop logs from systemd). The `PORTABLE=` field will refer to the name of the portable image where the unit -was loaded from. In case extensions are used, additionally there will be a -`PORTABLE_ROOT=` field, referring to the name of image used as the base layer -(i.e.: `RootImage=` or `RootDirectory=`), and one `PORTABLE_EXTENSION=` field per +was loaded from. In case extensions are used, additionally there will be a `PORTABLE_ROOT=` field, referring to the name of image used as the base layer (i.e.: `RootImage=` or `RootDirectory=`), and one `PORTABLE_EXTENSION=` field per each extension image used. -The `os-release` file from the portable image will be parsed and added as structured -metadata to the journal log entries.
The parsed fields will be the first ID field which -is set from the set of `IMAGE_ID` and `ID` in this order of preference, and the first -version field which is set from a set of `IMAGE_VERSION`, `VERSION_ID`, and `BUILD_ID` -in this order of preference. The ID and version, if any, are concatenated with an -underscore (`_`) as separator. If only either one is found, it will be used by itself. +The `os-release` file from the portable image will be parsed and added as structured metadata to the journal log entries. +The parsed fields will be the first ID field which is set from the set of `IMAGE_ID` and `ID` in this order of preference, and the first version field which is set from a set of `IMAGE_VERSION`, `VERSION_ID`, and `BUILD_ID` in this order of preference. +The ID and version, if any, are concatenated with an underscore (`_`) as separator. +If only either one is found, it will be used by itself. The field will be named `PORTABLE_NAME_AND_VERSION=`. In case extensions are used, the same fields in the same order, but prefixed by `SYSEXT_`/`CONFEXT_`, are parsed from each `extension-release` file, and are appended -to the journal as log entries, using `PORTABLE_EXTENSION_NAME_AND_VERSION=` as the -field name. The base layer's field will be named `PORTABLE_ROOT_NAME_AND_VERSION=` -instead of `PORTABLE_NAME_AND_VERSION=` in this case. +to the journal as log entries, using `PORTABLE_EXTENSION_NAME_AND_VERSION=` as the field name. +The base layer's field will be named `PORTABLE_ROOT_NAME_AND_VERSION=` instead of `PORTABLE_NAME_AND_VERSION=` in this case. For example, a portable service `app0` using two extensions `app0.raw` and `app1.raw` (with `SYSEXT_ID=app`, and `SYSEXT_VERSION_ID=` `0` and `1` in their diff --git a/docs/RELEASE.md b/docs/RELEASE.md index df04cb4..f299c62 100644 --- a/docs/RELEASE.md +++ b/docs/RELEASE.md @@ -12,17 +12,19 @@ SPDX-License-Identifier: LGPL-2.1-or-later 3. Update the time and place in NEWS 4. 
Update hwdb (`ninja -C build update-hwdb`, `ninja -C build update-hwdb-autosuspend`, commit separately). 5. Update syscall numbers (`ninja -C build update-syscall-tables update-syscall-header`). -6. [RC1] Update version and library numbers in `meson.build` -7. Check dbus docs with `ninja -C build update-dbus-docs` -8. Update translation strings (`cd build`, `meson compile systemd-pot`, `meson compile systemd-update-po`) - drop the header comments from `systemd.pot` + re-add SPDX before committing. If the only change in a file is the 'POT-Creation-Date' field, then ignore that file. -9. Tag the release: `version=vXXX-rcY && git tag -s "${version}" -m "systemd ${version}"` -10. Do `ninja -C build` -11. Make sure that the version string and package string match: `build/systemctl --version` -12. [FINAL] Close the github milestone and open a new one (https://github.com/systemd/systemd/milestones) -13. "Draft" a new release on github (https://github.com/systemd/systemd/releases/new), mark "This is a pre-release" if appropriate. -14. Check that announcement to systemd-devel, with a copy&paste from NEWS, was sent. This should happen automatically. -15. Update IRC topic (`/msg chanserv TOPIC #systemd Version NNN released | Online resources https://systemd.io/`) -16. Push commits to stable, create an empty -stable branch: `git push systemd-stable --atomic origin/main:main origin/main:refs/heads/${version}-stable`. -17. Build and upload the documentation (on the -stable branch): `ninja -C build doc-sync` -18. [FINAL] Change the default branch to latest release (https://github.com/systemd/systemd-stable/settings/branches). -19. [FINAL] Change the Github Pages branch in the stable repository to the newly created branch (https://github.com/systemd/systemd-stable/settings/pages) and set the 'Custom domain' to 'systemd.io' +6. [RC1] Update library numbers in `meson.build` +7. Update version number in `meson.version` (e.g. from `256~devel` to `256~rc1` or from `256~rc3` to `256`). 
Note that this uses a tilde (\~) instead of a hyphen (-) because tildes sort lower in version comparisons according to the [version format specification](https://uapi-group.org/specifications/specs/version_format_specification/), and we want `255~rc1` to sort lower than `255`. +8. Check dbus docs with `ninja -C build update-dbus-docs` +9. Update translation strings (`ninja -C build systemd-pot`, `ninja -C build systemd-update-po`) - drop the header comments from `systemd.pot` + re-add SPDX before committing. If the only change in a file is the 'POT-Creation-Date' field, then ignore that file. +10. Tag the release: `version="v$(sed 's/~/-/g' meson.version)" && git tag -s "${version}" -m "systemd ${version}"` (tildes are replaced with hyphens, because git doesn't accept the former). +11. Do `ninja -C build` +12. Make sure that the version string and package string match: `build/systemctl --version` +13. [FINAL] Close the github milestone and open a new one (https://github.com/systemd/systemd/milestones) +14. "Draft" a new release on github (https://github.com/systemd/systemd/releases/new), mark "This is a pre-release" if appropriate. +15. Check that announcement to systemd-devel, with a copy&paste from NEWS, was sent. This should happen automatically. +16. Update IRC topic (`/msg chanserv TOPIC #systemd Version NNN released | Online resources https://systemd.io/`) +17. [FINAL] Push commits to stable, create an empty -stable branch: `git push systemd-stable --atomic origin/main:main origin/main:refs/heads/${version}-stable`. +18. [FINAL] Build and upload the documentation (on the -stable branch): `ninja -C build doc-sync` +19. [FINAL] Change the default branch to latest release (https://github.com/systemd/systemd-stable/settings/branches). +20. [FINAL] Change the Github Pages branch in the stable repository to the newly created branch (https://github.com/systemd/systemd-stable/settings/pages) and set the 'Custom domain' to 'systemd.io' +21. 
[FINAL] Update version number in `meson.version` to the devel version of the next release (e.g. from `256` to `257~devel`) diff --git a/docs/USER_RECORD.md b/docs/USER_RECORD.md index 5d43de5..0268cc1 100644 --- a/docs/USER_RECORD.md +++ b/docs/USER_RECORD.md @@ -7,10 +7,11 @@ SPDX-License-Identifier: LGPL-2.1-or-later # JSON User Records -systemd optionally processes user records that go beyond the classic UNIX (or -glibc NSS) `struct passwd`. Various components of systemd are able to provide -and consume records in a more extensible format of a dictionary of key/value -pairs, encoded as JSON. Specifically: +systemd optionally processes user records that go beyond the classic UNIX (or glibc NSS) `struct passwd`. +Various components of systemd are able to provide and consume records in a more extensible format of a dictionary of key/value +pairs, encoded as JSON. + +Specifically: 1. [`systemd-homed.service`](https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html) manages `human` user home directories and embeds these JSON records @@ -24,8 +25,8 @@ pairs, encoded as JSON. Specifically: 3. [`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html) processes these JSON records of users that log in, and applies various - resource management settings to the per-user slice units it manages. This - allows setting global limits on resource consumption by a specific user. + resource management settings to the per-user slice units it manages. + This allows setting global limits on resource consumption by a specific user. 4. [`nss-systemd`](https://www.freedesktop.org/software/systemd/man/nss-systemd.html) is a glibc NSS module that synthesizes classic NSS records from these JSON @@ -37,14 +38,13 @@ pairs, encoded as JSON. Specifically: records, making them discoverable to the rest of the system. 6. 
[`systemd-userdbd.service`](https://www.freedesktop.org/software/systemd/man/systemd-userdbd.service.html) - is a small service that can translate UNIX/glibc NSS records to these JSON - user records. It also provides a unified [Varlink](https://varlink.org/) API - for querying and enumerating records of this type, optionally acquiring them - from various other services. + is a small service that can translate UNIX/glibc NSS records to these JSON user records. + It also provides a unified [Varlink](https://varlink.org/) API for querying and enumerating records of this type, + optionally acquiring them from various other services. JSON user records may contain various fields that are not available in `struct -passwd`, and are extensible for other applications. For example, the record may -contain information about: +passwd`, and are extensible for other applications. +For example, the record may contain information about: 1. Additional security credentials (PKCS#11 security token information, biometrical authentication information, SSH public key information) @@ -74,6 +74,10 @@ the following extensions are envisioned: Similar to JSON User Records there are also [JSON Group Records](/GROUP_RECORD) that encapsulate UNIX groups. +JSON User Records are not suitable for storing all identity information about +the user, such as binary data or large unstructured blobs of text. These parts +of a user's identity should be stored in the [Blob Directories](/USER_RECORD_BLOB_DIRS). + JSON User Records may be transferred or written to disk in various protocols and formats. To inquire about such records defined on the local system use the [User/Group Lookup API via Varlink](/USER_GROUP_API). User/group records may @@ -83,88 +87,82 @@ for details. ## Why JSON? -JSON is nicely extensible and widely used. In particular it's easy to -synthesize and process with numerous programming languages. 
It's particularly -popular in the web communities, which hopefully should make it easy to link +JSON is nicely extensible and widely used. +In particular it's easy to synthesize and process with numerous programming languages. +It's particularly popular in the web communities, which hopefully should make it easy to link user credential data from the web and from local systems more closely together. Please note that this specification assumes that JSON numbers may cover the full -integer range of -2^63 … 2^64-1 without loss of precision (i.e. INT64_MIN … -UINT64_MAX). Please read, write and process user records as defined by this -specification only with JSON implementations that provide this number range. +integer range of -2^63 … 2^64-1 without loss of precision (i.e. INT64_MIN … UINT64_MAX). +Please read, write and process user records as defined by this specification only with JSON implementations that provide this number range. ## General Structure The JSON user records generated and processed by systemd follow a general structure, consisting of seven distinct "sections". Specifically: -1. Various fields are placed at the top-level of user record (the `regular` - section). These are generally fields that shall apply unconditionally to the +1. Various fields are placed at the top-level of user record (the `regular` section). + These are generally fields that shall apply unconditionally to the user in all contexts, are portable and not security sensitive. -2. A number of fields are located in the `privileged` section (a sub-object of - the user record). Fields contained in this object are security sensitive, - i.e. contain information that the user and the administrator should be able - to see, but other users should not. In many ways this matches the data - stored in `/etc/shadow` in classic Linux user accounts, i.e. includes - password hashes and more. 
Algorithmically, when a user record is passed to - an untrusted client, by monopolizing such sensitive records in a single +2. A number of fields are located in the `privileged` section (a sub-object of the user record). + Fields contained in this object are security sensitive, + i.e. contain information that the user and the administrator should be able to see, but other users should not. + In many ways this matches the data stored in `/etc/shadow` in classic Linux user accounts, i.e. includes + password hashes and more. + Algorithmically, when a user record is passed to an untrusted client, by monopolizing such sensitive records in a single object field we can easily remove it from view. 3. A number of fields are located in objects inside the `perMachine` section - (an array field of the user record). Primarily these are resource - management-related fields, as those tend to make sense on a specific system + (an array field of the user record). + Primarily these are resource management-related fields, as those tend to make sense on a specific system only, e.g. limiting a user's memory use to 1G only makes sense on a specific - system that has more than 1G of memory. Each object inside the `perMachine` - array comes with a `matchMachineId` or `matchHostname` field which indicate - which systems to apply the listed settings to. Note that many fields - accepted in the `perMachine` section can also be set at the top level (the - `regular` section), where they define the fallback if no matching object in - `perMachine` is found. + system that has more than 1G of memory. + Each object inside the `perMachine` array comes with a `matchMachineId` or `matchHostname` field which indicate + which systems to apply the listed settings to. + Note that many fields accepted in the `perMachine` section can also be set at the top level (the + `regular` section), where they define the fallback if no matching object in `perMachine` is found. 4. 
Various fields are located in the `binding` section (a sub-sub-object of the user record; an intermediary object is inserted which is keyed by the - machine ID of the host). Fields included in this section "bind" the object - to a specific system. They generally include non-portable information about - paths or UID assignments, that are true on a specific system, but not + machine ID of the host). + Fields included in this section "bind" the object to a specific system. + They generally include non-portable information about paths or UID assignments, + that are true on a specific system, but not necessarily on others, and which are managed automatically by some user - record manager (such as `systemd-homed`). Data in this section is considered - part of the user record only in the local context, and is generally not - ported to other systems. Due to that it is not included in the reduced user - record the cryptographic signature defined in the `signature` section is - calculated on. In `systemd-homed` this section is also removed when the - user's record is stored in the `~/.identity` file in the home directory, so - that every system with access to the home directory can manage these - `binding` fields individually. Typically, the binding section is persisted - to the local disk. + record manager (such as `systemd-homed`). + Data in this section is considered part of the user record only in the local context, and is generally not + ported to other systems. + Due to that it is not included in the reduced user record the cryptographic signature defined in the `signature` section is calculated on. + In `systemd-homed` this section is also removed when the user's record is stored in the `~/.identity` file in the home directory, + so that every system with access to the home directory can manage these + `binding` fields individually. + Typically, the binding section is persisted to the local disk. 5. 
Various fields are located in the `status` section (a sub-sub-object of the user record, also with an intermediary object between that is keyed by the - machine ID, similar to the way the `binding` section is organized). This - section is augmented during runtime only, and never persisted to disk. The - idea is that this section contains information about current runtime + machine ID, similar to the way the `binding` section is organized). + This section is augmented during runtime only, and never persisted to disk. + The idea is that this section contains information about current runtime resource usage (for example: currently used disk space of the user), that changes dynamically but is otherwise immediately associated with the user record and for many purposes should be considered to be part of the user record. 6. The `signature` section contains one or more cryptographic signatures of a - reduced version of the user record. This is used to ensure that only user - records defined by a specific source are accepted on a system, by validating - the signature against the set of locally accepted signature public keys. The - signature is calculated from the JSON user record with all sections removed, + reduced version of the user record. + This is used to ensure that only user records defined by a specific source are accepted on a system, by validating + the signature against the set of locally accepted signature public keys. + The signature is calculated from the JSON user record with all sections removed, except for `regular`, `privileged`, `perMachine`. Specifically, `binding`, - `status`, `signature` itself and `secret` are removed first and thus not - covered by the signature. This section is optional, and is only used when - cryptographic validation of user records is required (as it is by - `systemd-homed.service` for example). - -7. The `secret` section contains secret user credentials, such as password or - PIN information. 
This data is never persisted, and never returned when user - records are inquired by a client, privileged or not. This data should only - be included in a user record very briefly, for example when certain very - specific operations are executed. For example, in tools such as - `systemd-homed` this section may be included in user records, when creating + `status`, `signature` itself and `secret` are removed first and thus not covered by the signature. + This section is optional, and is only used when cryptographic validation of user records is required + (as it is by `systemd-homed.service` for example). + +7. The `secret` section contains secret user credentials, such as password or PIN information. + This data is never persisted, and never returned when user records are inquired by a client, privileged or not. + This data should only be included in a user record very briefly, for example when certain very specific operations are executed. + For example, in tools such as `systemd-homed` this section may be included in user records, when creating a new home directory, as passwords and similar credentials need to be provided to encrypt the home directory with. @@ -181,118 +179,120 @@ Here's a tabular overview of the sections and their properties: | secret | no | no | yes | no | Note that services providing user records to the local system are free to -manage only a subset of these sections and never include the others in -them. For example, a service that has no concept of signed records (for example +manage only a subset of these sections and never include the others in them. +For example, a service that has no concept of signed records (for example because the records it manages are inherently trusted anyway) does not have to -bother with the `signature` section. 
A service that only defines records in a -strictly local context and without signatures doesn't have to deal with the -`perMachine` or `binding` sections and can include its data exclusively in the -regular section. A service that uses a separate, private channel for -authenticating users (or that doesn't have a concept of authentication at all) +bother with the `signature` section. +A service that only defines records in a strictly local context and without signatures doesn't have to deal with the +`perMachine` or `binding` sections and can include its data exclusively in the regular section. +A service that uses a separate, private channel for authenticating users (or that doesn't have a concept of authentication at all) does not need to be concerned with the `secret` section of user records, as the fields included therein are only useful when executing authentication operations natively against JSON user records. -The `systemd-homed` manager uses all seven sections for various -purposes. Inside the home directories (and if the LUKS2 backend is used, also +The `systemd-homed` manager uses all seven sections for various purposes. +Inside the home directories (and if the LUKS2 backend is used, also in the LUKS2 header) a user record containing the `regular`, `privileged`, `perMachine` and `signature` sections is stored. `systemd-homed` also stores a version of the record on the host, with the same four sections and augmented -with an additional, fifth `binding` section. When a local client enquires about -a user record managed by `systemd-homed` the service will add in some +with an additional, fifth `binding` section. +When a local client enquires about a user record managed by `systemd-homed` the service will add in some additional information about the user and home directory in the `status` -section — this version is only transferred via IPC and never written to -disk. 
Finally the `secret` section is used during authentication operations via +section — this version is only transferred via IPC and never written to disk. +Finally the `secret` section is used during authentication operations via IPC to transfer the user record along with its authentication tokens in one go. ## Fields in the `regular` section -As mentioned, the `regular` section's fields are placed at the top level -object. The following fields are currently defined: - -`userName` → The UNIX user name for this record. Takes a string with a valid -UNIX user name. This field is the only mandatory field, all others are -optional. Corresponds with the `pw_name` field of `struct passwd` and the -`sp_namp` field of `struct spwd` (i.e. the shadow user record stored in -`/etc/shadow`). See [User/Group Name Syntax](/USER_NAMES) for -the (relaxed) rules the various systemd components enforce on user/group names. - -`realm` → The "realm" a user is defined in. This concept allows distinguishing -users with the same name that originate in different organizations or -installations. This should take a string in DNS domain syntax, but doesn't have -to refer to an actual DNS domain (though it is recommended to use one for -this). The idea is that the user `lpoetter` in the `redhat.com` realm might be -distinct from the same user in the `poettering.hq` realm. User records for the -same user name that have different realm fields are considered referring to -different users. When updating a user record it is required that any new -version has to match in both `userName` and `realm` field. This field is -optional, when unset the user should not be considered part of any realm. A -user record with a realm set is never compatible (for the purpose of updates, +As mentioned, the `regular` section's fields are placed at the top level object. +The following fields are currently defined: + +`userName` → The UNIX user name for this record. +Takes a string with a valid UNIX user name. 
+This field is the only mandatory field, all others are optional. +Corresponds with the `pw_name` field of `struct passwd` and the `sp_namp` field of `struct spwd` (i.e. the shadow user record stored in `/etc/shadow`). +See [User/Group Name Syntax](/USER_NAMES) +for the (relaxed) rules the various systemd components enforce on user/group names. + +`realm` → The "realm" a user is defined in. +This concept allows distinguishing users with the same name that originate in different organizations or installations. +This should take a string in DNS domain syntax, but doesn't have +to refer to an actual DNS domain (though it is recommended to use one for this). +The idea is that the user `lpoetter` in the `redhat.com` realm might be +distinct from the same user in the `poettering.hq` realm. +User records for the same user name that have different realm fields are considered referring to different users. +When updating a user record it is required that any new version has to match in both `userName` and `realm` field. +This field is optional, when unset the user should not be considered part of any realm. +A user record with a realm set is never compatible (for the purpose of updates, see above) with a user record without one set, even if the `userName` field matches. +`blobDirectory` → The absolute path to a world-readable copy of the user's blob +directory. See [Blob Directories](/USER_RECORD_BLOB_DIRS) for more details. + +`blobManifest` → An object, which maps valid blob directory filenames (see +[Blob Directories](/USER_RECORD_BLOB_DIRS) for requirements) to SHA256 hashes +formatted as hex strings. This exists for the purpose of including the contents +of the blob directory in the record's signature. Managers that support blob +directories and utilize signed user records (like `systemd-homed`) should use +this field to verify the contents of the blob directory whenever appropriate. + `realName` → The real name of the user, a string. 
This should contain the -user's real ("human") name, and corresponds loosely to the GECOS field of -classic UNIX user records. When converting a `struct passwd` to a JSON user -record this field is initialized from GECOS (i.e. the `pw_gecos` field), and -vice versa when converting back. That said, unlike GECOS this field is supposed -to contain only the real name and no other information. This field must not -contain control characters (such as `\n`) or colons (`:`), since those are used +user's real ("human") name, and corresponds loosely to the GECOS field of classic UNIX user records. +When converting a `struct passwd` to a JSON user record this field is initialized from GECOS (i.e. the `pw_gecos` field), and +vice versa when converting back. +That said, unlike GECOS this field is supposed to contain only the real name and no other information. +This field must not contain control characters (such as `\n`) or colons (`:`), since those are used as record separators in classic `/etc/passwd` files and similar formats. -`emailAddress` → The email address of the user, formatted as -string. [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) +`emailAddress` → The email address of the user, formatted as string. +[`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) initializes the `$EMAIL` environment variable from this value for all login sessions. `iconName` → The name of an icon picked by the user, for example for the -purpose of an avatar. This must be a string, and should follow the semantics -defined in the [Icon Naming -Specification](https://standards.freedesktop.org/icon-naming-spec/icon-naming-spec-latest.html). +purpose of an avatar. +This must be a string, and should follow the semantics defined in the +[Icon Naming Specification](https://standards.freedesktop.org/icon-naming-spec/icon-naming-spec-latest.html). 
-`location` → A free-form location string describing the location of the user, -if that is applicable. It's probably wise to use a location string processable -by geo-location subsystems, but this is not enforced nor required. Example: -`Berlin, Germany` or `Basement, Room 3a`. +`location` → A free-form location string describing the location of the user, if that is applicable. +It's probably wise to use a location string processable by geo-location subsystems, but this is not enforced nor required. +Example: `Berlin, Germany` or `Basement, Room 3a`. `disposition` → A string, one of `intrinsic`, `system`, `dynamic`, `regular`, `container`, `reserved`. If specified clarifies the disposition of the user, -i.e. the context it is defined in. For regular, "human" users this should be -`regular`, for system users (i.e. users that system services run under, and -similar) this should be `system`. The `intrinsic` disposition should be used -only for the two users that have special meaning to the OS kernel itself, -i.e. the `root` and `nobody` users. The `container` string should be used for -users that are used by an OS container, and hence will show up in `ps` listings -and such, but are only defined in container context. Finally `reserved` should -be used for any users outside of these use-cases. Note that this property is -entirely optional and applications are assumed to be able to derive the +i.e. the context it is defined in. +For regular, "human" users this should be `regular`, for system users (i.e. users that system services run under, and similar) this should be `system`. +The `intrinsic` disposition should be used only for the two users that have special meaning to the OS kernel itself, +i.e. the `root` and `nobody` users. +The `container` string should be used for users that are used by an OS container, and hence will show up in `ps` listings +and such, but are only defined in container context. 
+Finally `reserved` should be used for any users outside of these use-cases. +Note that this property is entirely optional and applications are assumed to be able to derive the disposition of a user automatically from a record even in absence of this field, based on other fields, for example the numeric UID. By setting this field explicitly applications can override this default determination. `lastChangeUSec` → An unsigned 64-bit integer value, referring to a timestamp in µs since the epoch 1970, indicating when the user record (specifically, any of the -`regular`, `privileged`, `perMachine` sections) was last changed. This field is -used when comparing two records of the same user to identify the newer one, and +`regular`, `privileged`, `perMachine` sections) was last changed. +This field is used when comparing two records of the same user to identify the newer one, and is used for example for automatic updating of user records, where appropriate. `lastPasswordChangeUSec` → Similar, also an unsigned 64-bit integer value, -indicating the point in time the password (or any authentication token) of the -user was last changed. This corresponds to the `sp_lstchg` field of `struct -spwd`, i.e. the matching field in the user shadow database `/etc/shadow`, +indicating the point in time the password (or any authentication token) of the user was last changed. +This corresponds to the `sp_lstchg` field of `struct spwd`, i.e. the matching field in the user shadow database `/etc/shadow`, though provides finer resolution. -`shell` → A string, referring to the shell binary to use for terminal logins of -this user. This corresponds with the `pw_shell` field of `struct passwd`, and -should contain an absolute file system path. For system users not suitable for -terminal log-in this field should not be set. - -`umask` → The `umask` to set for the user's login sessions. Takes an -integer. 
Note that usually on UNIX the umask is noted in octal, but JSON's -integers are generally written in decimal, hence in this context we denote it -umask in decimal too. The specified value should be in the valid range for -umasks, i.e. 0000…0777 (in octal as typical in UNIX), or 0…511 (in decimal, how -it actually appears in the JSON record). This `umask` is automatically set by -[`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) +`shell` → A string, referring to the shell binary to use for terminal logins of this user. +This corresponds with the `pw_shell` field of `struct passwd`, and should contain an absolute file system path. +For system users not suitable for terminal log-in this field should not be set. + +`umask` → The `umask` to set for the user's login sessions. +Takes an integer. Note that usually on UNIX the umask is noted in octal, but JSON's +integers are generally written in decimal, hence in this context we denote it umask in decimal too. +The specified value should be in the valid range for umasks, i.e. 0000…0777 (in octal as typical in UNIX), or 0…511 (in decimal, how +it actually appears in the JSON record). +This `umask` is automatically set by [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) for all login sessions of the user. `environment` → An array of strings, each containing an environment variable @@ -301,20 +301,30 @@ and its value to set for the user's login session, in a format compatible with environment variable listed here is automatically set by [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) for all login sessions of the user. - `timeZone` → A string indicating a preferred timezone to use for the user. When logging in [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) will automatically initialize the `$TZ` environment variable from this -string. 
The string should be a `tzdata` compatible location string, for -example: `Europe/Berlin`. +string. +The string should be a `tzdata` compatible location string, for example: `Europe/Berlin`. `preferredLanguage` → A string indicating the preferred language/locale for the -user. When logging in +user. It is combined with the `additionalLanguages` field to initialize the `$LANG` +and `$LANGUAGE` environment variables on login; see below for more details. This string +should be in a format compatible with the `$LANG` environment variable, for example: +`de_DE.UTF-8`. + +`additionalLanguages` → An array of strings indicating the preferred languages/locales +that should be used in the event that translations for the `preferredLanguage` are +missing, listed in order of descending priority. This allows multi-lingual users to +specify all the languages that they know, so software lacking translations in the user's +primary language can try another language that the user knows rather than falling back to +the default English. All entries in this field must be valid locale names, compatible with +the `$LANG` variable, for example: `de_DE.UTF-8`. When logging in [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) -will automatically initialize the `$LANG` environment variable from this -string. The string hence should be in a format compatible with this environment -variable, for example: `de_DE.UTF8`. +will prepend `preferredLanguage` (if set) to this list (if set), remove duplicates, +and then automatically initialize the `$LANGUAGE` variable with the resulting list. +It will also initialize `$LANG` variable with the first entry in the resulting list. `niceLevel` → An integer value in the range -20…19. 
When logging in [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) @@ -322,10 +332,10 @@ will automatically initialize the login process' nice level to this value with, which is then inherited by all the user's processes, see [`setpriority()`](https://man7.org/linux/man-pages/man2/setpriority.2.html) for more information. - `resourceLimits` → An object, where each key refers to a Linux resource limit -(such as `RLIMIT_NOFILE` and similar). Their values should be an object with -two keys `cur` and `max` for the soft and hard resource limit. When logging in +(such as `RLIMIT_NOFILE` and similar). +Their values should be an object with two keys `cur` and `max` for the soft and hard resource limit. +When logging in [`pam_systemd`](https://www.freedesktop.org/software/systemd/man/pam_systemd.html) will automatically initialize the login process' resource limits to these values, which is then inherited by all the user's processes, see @@ -334,127 +344,119 @@ information. `locked` → A boolean value. If true, the user account is locked, the user may not log in. If this field is missing it should be assumed to be false, -i.e. logins are permitted. This field corresponds to the `sp_expire` field of -`struct spwd` (i.e. the `/etc/shadow` data for a user) being set to zero or -one. +i.e. logins are permitted. +This field corresponds to the `sp_expire` field of `struct spwd` (i.e. the `/etc/shadow` data for a user) being set to zero or one. `notBeforeUSec` → An unsigned 64-bit integer value, indicating a time in µs since the UNIX epoch (1970) before which the record should be considered invalid for the purpose of logging in. `notAfterUSec` → Similar, but indicates the point in time *after* which logins -shall not be permitted anymore. This corresponds to the `sp_expire` field of -`struct spwd`, when it is set to a value larger than one, but provides finer -granularity. +shall not be permitted anymore. 
+This corresponds to the `sp_expire` field of `struct spwd`, when it is set to a value larger than one, but provides finer granularity. `storage` → A string, one of `classic`, `luks`, `directory`, `subvolume`, -`fscrypt`, `cifs`. Indicates the storage mechanism for the user's home -directory. If `classic` the home directory is a plain directory as in classic +`fscrypt`, `cifs`. +Indicates the storage mechanism for the user's home directory. If `classic` the home directory is a plain directory as in classic UNIX. When `directory`, the home directory is a regular directory, but the `~/.identity` file in it contains the user's user record, so that the directory is self-contained. Similar, `subvolume` is a `btrfs` subvolume that also contains a `~/.identity` user record; `fscrypt` is an `fscrypt`-encrypted directory, also containing the `~/.identity` user record; `luks` is a per-user -LUKS volume that is mounted as home directory, and `cifs` a home directory -mounted from a Windows File Share. The five latter types are primarily used by -`systemd-homed` when managing home directories, but may be used if other +LUKS volume that is mounted as home directory, and `cifs` a home directory mounted from a Windows File Share. +The five latter types are primarily used by `systemd-homed` when managing home directories, but may be used if other managers are used too. If this is not set, `classic` is the implied default. `diskSize` → An unsigned 64-bit integer, indicating the intended home directory -disk space in bytes to assign to the user. Depending on the selected storage -type this might be implemented differently: for `luks` this is the intended size -of the file system and LUKS volume, while for the others this likely translates +disk space in bytes to assign to the user. 
+Depending on the selected storage type this might be implemented differently: +for `luks` this is the intended size of the file system and LUKS volume, while for the others this likely translates to classic file system quota settings. `diskSizeRelative` → Similar to `diskSize` but takes a relative value, but -specifies a fraction of the available disk space on the selected storage medium -to assign to the user. This unsigned integer value is normalized to 2^32 = -100%. +specifies a fraction of the available disk space on the selected storage medium to assign to the user. +This unsigned integer value is normalized to 2^32 = 100%. `skeletonDirectory` → Takes a string with the absolute path to the skeleton -directory to populate a new home directory from. This is only used when a home -directory is first created, and defaults to `/etc/skel` if not defined. +directory to populate a new home directory from. +This is only used when a home directory is first created, and defaults to `/etc/skel` if not defined. `accessMode` → Takes an unsigned integer in the range 0…511 indicating the UNIX access mask for the home directory when it is first created. `tasksMax` → Takes an unsigned 64-bit integer indicating the maximum number of -tasks the user may start in parallel during system runtime. This counts -all tasks (i.e. threads, where each process is at least one thread) the user starts or that are -forked from these processes even if the user identity is changed (for example -by setuid binaries/`su`/`sudo` and similar). +tasks the user may start in parallel during system runtime. +This counts all tasks (i.e. threads, where each process is at least one thread) the user starts or that are +forked from these processes even if the user identity is changed (for example by setuid binaries/`su`/`sudo` and similar). 
[`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html) enforces this by setting the `TasksMax` slice property for the user's slice `user-$UID.slice`. `memoryHigh`/`memoryMax` → These take unsigned 64-bit integers indicating upper memory limits for all processes of the user (plus all processes forked off them -that might have changed user identity), in bytes. Enforced by +that might have changed user identity), +in bytes. Enforced by [`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html), similar to `tasksMax`. `cpuWeight`/`ioWeight` → These take unsigned integers in the range 1…10000 (defaults to 100) and configure the CPU and IO scheduling weights for the -user's processes as a whole. Also enforced by +user's processes as a whole. +Also enforced by [`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html), similar to `tasksMax`, `memoryHigh` and `memoryMax`. `mountNoDevices`/`mountNoSuid`/`mountNoExecute` → Three booleans that control the `nodev`, `nosuid`, `noexec` mount flags of the user's home -directories. Note that these booleans are only honored if the home directory -is managed by a subsystem such as `systemd-homed.service` that automatically +directories. +Note that these booleans are only honored if the home directory is managed by a subsystem such as `systemd-homed.service` that automatically mounts home directories on login. -`cifsDomain` → A string indicating the Windows File Sharing domain (CIFS) to -use. This is generally useful, but particularly when `cifs` is used as storage +`cifsDomain` → A string indicating the Windows File Sharing domain (CIFS) to use. +This is generally useful, but particularly when `cifs` is used as storage mechanism for the user's home directory, see above. `cifsUserName` → A string indicating the Windows File Sharing user name (CIFS) -to associate this user record with. 
This is generally useful, but particularly -useful when `cifs` is used as storage mechanism for the user's home directory, -see above. +to associate this user record with. +This is generally useful, but particularly useful when `cifs` is used as storage mechanism for the user's home directory, see above. `cifsService` → A string indicating the Windows File Share service (CIFS) to -mount as home directory of the user on login. Should be in format -`//<host>/<service>/<directory/…>`. The directory part is optional. If missing -the top-level directory of the CIFS share is used. +mount as home directory of the user on login. +Should be in format `//<host>/<service>/<directory/…>`. +The directory part is optional. If missing the top-level directory of the CIFS share is used. `cifsExtraMountOptions` → A string with additional mount options to pass to `mount.cifs` when mounting the home directory CIFS share. `imagePath` → A string with an absolute file system path to the file, directory -or block device to use for storage backing the home directory. If the `luks` -storage is used, this refers to the loopback file or block device node to store -the LUKS volume on. For `fscrypt`, `directory`, `subvolume` this refers to the -directory to bind mount as home directory on login. Not defined for `classic` -or `cifs`. - -`homeDirectory` → A string with an absolute file system path to the home -directory. This is where the image indicated in `imagePath` is mounted to on -login and thus indicates the application facing home directory while the home -directory is active, and is what the user's `$HOME` environment variable is set -to during log-in. It corresponds to the `pw_dir` field of `struct passwd`. - -`uid` → An unsigned integer in the range 0…4294967295: the numeric UNIX user ID (UID) to -use for the user. This corresponds to the `pw_uid` field of `struct passwd`. - -`gid` → An unsigned integer in the range 0…4294967295: the numeric UNIX group -ID (GID) to use for the user. 
This corresponds to the `pw_gid` field of -`struct passwd`. - -`memberOf` → An array of strings, each indicating a UNIX group this user shall -be a member of. The listed strings must be valid group names, but it is not -required that all groups listed exist in all contexts: any entry for which no +or block device to use for storage backing the home directory. +If the `luks` storage is used, this refers to the loopback file or block device node to store the LUKS volume on. +For `fscrypt`, `directory`, `subvolume` this refers to the directory to bind mount as home directory on login. +Not defined for `classic` or `cifs`. + +`homeDirectory` → A string with an absolute file system path to the home directory. +This is where the image indicated in `imagePath` is mounted to on login and thus indicates the application facing home directory while the home +directory is active, and is what the user's `$HOME` environment variable is set to during log-in. +It corresponds to the `pw_dir` field of `struct passwd`. + +`uid` → An unsigned integer in the range 0…4294967295: the numeric UNIX user ID (UID) to use for the user. +This corresponds to the `pw_uid` field of `struct passwd`. + +`gid` → An unsigned integer in the range 0…4294967295: the numeric UNIX group ID (GID) to use for the user. +This corresponds to the `pw_gid` field of `struct passwd`. + +`memberOf` → An array of strings, each indicating a UNIX group this user shall be a member of. +The listed strings must be valid group names, but it is not required that all groups listed exist in all contexts: any entry for which no group exists should be silently ignored. `fileSystemType` → A string, one of `ext4`, `xfs`, `btrfs` (possibly others) to -use as file system for the user's home directory. This is primarily relevant -when the storage mechanism used is `luks` as a file system to use inside the +use as file system for the user's home directory. 
+This is primarily relevant when the storage mechanism used is `luks` as a file system to use inside the LUKS container must be selected. `partitionUuid` → A string containing a lower-case, text-formatted UUID, referencing -the GPT partition UUID the home directory is located in. This is primarily -relevant when the storage mechanism used is `luks`. +the GPT partition UUID the home directory is located in. +This is primarily relevant when the storage mechanism used is `luks`. `luksUuid` → A string containing a lower-case, text-formatted UUID, referencing the LUKS volume UUID the home directory is located in. This is primarily @@ -466,9 +468,9 @@ primarily relevant when the storage mechanism used is `luks`. `luksDiscard` → A boolean. If true and `luks` storage is used, controls whether the loopback block devices, LUKS and the file system on top shall be used in -`discard` mode, i.e. erased sectors should always be returned to the underlying -storage. If false and `luks` storage is used turns this behavior off. In -addition, depending on this setting an `FITRIM` or `fallocate()` operation is +`discard` mode, i.e. erased sectors should always be returned to the underlying storage. +If false and `luks` storage is used turns this behavior off. +In addition, depending on this setting an `FITRIM` or `fallocate()` operation is executed to make sure the image matches the selected option. `luksOfflineDiscard` → A boolean. Similar to `luksDiscard`, it controls whether @@ -503,45 +505,51 @@ memory cost for the PBKDF operation, when LUKS storage is used, in bytes. `luksPbkdfParallelThreads` → An unsigned 64-bit integer, indicating the intended required parallel threads for the PBKDF operation, when LUKS storage is used. -`luksSectorSize` → An unsigned 64-bit integer, indicating the sector size to -use for the LUKS storage mechanism, in bytes. Must be a power of two between -512 and 4096. 
+`luksSectorSize` → An unsigned 64-bit integer, indicating the sector size to use for the LUKS storage mechanism, in bytes. +Must be a power of two between 512 and 4096. -`autoResizeMode` → A string, one of `off`, `grow`, `shrink-and-grow`. Unless -set to `off`, controls whether the home area shall be grown automatically to -the size configured in `diskSize` automatically at login time. If set to -`shrink-and-grow` the home area is also shrunk to the minimal size possible +`autoResizeMode` → A string, one of `off`, `grow`, `shrink-and-grow`. +Unless set to `off`, controls whether the home area shall be grown automatically to +the size configured in `diskSize` automatically at login time. +If set to `shrink-and-grow` the home area is also shrunk to the minimal size possible (as dictated by used disk space and file system constraints) on logout. -`rebalanceWeight` → An unsigned integer, `null` or a boolean. Configures the -free disk space rebalancing weight for the home area. The integer must be in -the range 1…10000 to configure an explicit weight. If unset, or set to `null` -or `true` the default weight of 100 is implied. If set to 0 or `false` -rebalancing is turned off for this home area. +`rebalanceWeight` → An unsigned integer, `null` or a boolean. +Configures the free disk space rebalancing weight for the home area. +The integer must be in the range 1…10000 to configure an explicit weight. +If unset, or set to `null` or `true` the default weight of 100 is implied. +If set to 0 or `false` rebalancing is turned off for this home area. -`service` → A string declaring the service that defines or manages this user -record. It is recommended to use reverse domain name notation for this. For -example, if `systemd-homed` manages a user a string of `io.systemd.Home` is -used for this. +`service` → A string declaring the service that defines or manages this user record. +It is recommended to use reverse domain name notation for this.
+For example, if `systemd-homed` manages a user a string of `io.systemd.Home` is used for this. `rateLimitIntervalUSec` → An unsigned 64-bit integer that configures the -authentication rate limiting enforced on the user account. This specifies a -timer interval (in µs) within which to count authentication attempts. When the -counter goes above the value configured in `rateLimitIntervalBurst` log-ins are +authentication rate limiting enforced on the user account. +This specifies a timer interval (in µs) within which to count authentication attempts. +When the counter goes above the value configured in `rateLimitIntervalBurst` log-ins are temporarily refused until the interval passes. `rateLimitIntervalBurst` → An unsigned 64-bit integer, closely related to `rateLimitIntervalUSec`, that puts a limit on authentication attempts within the configured time interval. -`enforcePasswordPolicy` → A boolean. Configures whether to enforce the system's -password policy when creating the home directory for the user or changing the -user's password. By default the policy is enforced, but if this field is false -it is bypassed. +`enforcePasswordPolicy` → A boolean. +Configures whether to enforce the system's password policy when creating the home directory for the user or changing the user's password. +By default the policy is enforced, but if this field is false it is bypassed. + +`autoLogin` → A boolean. +If true the user record is marked as suitable for auto-login. +Systems are supposed to automatically log in a user marked this way during boot, if there's exactly one user on it defined this way. -`autoLogin` → A boolean. If true the user record is marked as suitable for -auto-login. Systems are supposed to automatically log in a user marked this way -during boot, if there's exactly one user on it defined this way. +`preferredSessionType` → A string that indicates the user's preferred session type +(i.e. `x11`, `wayland`, or other values valid for `$XDG_SESSION_TYPE`).
This should +be used by the display manager to pre-select the correct environment to log into. + +`preferredSessionLauncher` → A string that indicates the user's preferred session launcher +desktop entry file (i.e. `gnome`, `gnome-classic`, `plasma`, `kodi`, or others that appear +in `/usr/share/xsessions/` or `/usr/share/wayland-sessions/`). This should be used by the +display manager to pre-select the correct environment to launch when the user logs in. `stopDelayUSec` → An unsigned 64-bit integer, indicating the time in µs the per-user service manager is kept around after the user fully logged out. This @@ -549,50 +557,45 @@ value is honored by [`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html). If set to zero the per-user service manager is immediately terminated when the user logs out, and longer values optimize high-frequency log-ins as the -necessary work to set up and tear down a log-in is reduced if the service -manager stays running. +necessary work to set up and tear down a log-in is reduced if the service manager stays running. -`killProcesses` → A boolean. If true all processes of the user are -automatically killed when the user logs out. This is enforced by -[`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html). If -false any processes left around when the user logs out are left running. +`killProcesses` → A boolean. +If true all processes of the user are automatically killed when the user logs out. +This is enforced by +[`systemd-logind.service`](https://www.freedesktop.org/software/systemd/man/systemd-logind.service.html). +If false any processes left around when the user logs out are left running. `passwordChangeMinUSec`/`passwordChangeMaxUSec` → An unsigned 64-bit integer, -encoding how much time has to pass at least/at most between password changes of -the user. This corresponds with the `sp_min` and `sp_max` fields of `struct -spwd` (i.e. 
the `/etc/shadow` entries of the user), but offers finer -granularity. +encoding how much time has to pass at least/at most between password changes of the user. +This corresponds with the `sp_min` and `sp_max` fields of `struct spwd` (i.e. the `/etc/shadow` entries of the user), but offers finer granularity. `passwordChangeWarnUSec` → An unsigned 64-bit integer, encoding how much time to -warn the user before their password expires, in µs. This corresponds with the -`sp_warn` field of `struct spwd`. +warn the user before their password expires, in µs. +This corresponds with the `sp_warn` field of `struct spwd`. `passwordChangeInactiveUSec` → An unsigned 64-bit integer, encoding how much -time has to pass after the password expired that the account is -deactivated. This corresponds with the `sp_inact` field of `struct spwd`. +time has to pass after the password expired that the account is deactivated. +This corresponds with the `sp_inact` field of `struct spwd`. -`passwordChangeNow` → A boolean. If true the user has to change their password -on next login. This corresponds with the `sp_lstchg` field of `struct spwd` -being set to zero. +`passwordChangeNow` → A boolean. +If true the user has to change their password on next login. +This corresponds with the `sp_lstchg` field of `struct spwd` being set to zero. `pkcs11TokenUri` → An array of strings, each with an RFC 7512 compliant PKCS#11 URI referring to security token (or smart card) of some form, that shall be -associated with the user and may be used for authentication. The URI is used to -search for an X.509 certificate and associated private key that may be used to -decrypt an encrypted secret key that is used to unlock the user's account (see -below). It's undefined how precise the URI is: during log-in it is tested -against all plugged in security tokens and if there's exactly one matching +associated with the user and may be used for authentication. 
+The URI is used to search for an X.509 certificate and associated private key that may be used to +decrypt an encrypted secret key that is used to unlock the user's account (see below). +It's undefined how precise the URI is: during log-in it is tested against all plugged in security tokens and if there's exactly one matching private key found with it it is used. -`fido2HmacCredential` → An array of strings, each with a Base64-encoded FIDO2 -credential ID that shall be used for authentication with FIDO2 devices that -implement the `hmac-secret` extension. The salt to pass to the FIDO2 device is -found in `fido2HmacSalt`. +`fido2HmacCredential` → An array of strings, each with a Base64-encoded FIDO2 +credential ID that shall be used for authentication with FIDO2 devices that implement the `hmac-secret` extension. +The salt to pass to the FIDO2 device is found in `fido2HmacSalt`. -`recoveryKeyType` → An array of strings, each indicating the type of one -recovery key. The only supported recovery key type at the moment is `modhex64`, -for details see the description of `recoveryKey` below. An account may have any -number of recovery keys defined, and the array should have one entry for each. +`recoveryKeyType` → An array of strings, each indicating the type of one recovery key. +The only supported recovery key type at the moment is `modhex64`, for details see the description of `recoveryKey` below. +An account may have any number of recovery keys defined, and the array should have one entry for each. `privileged` → An object, which contains the fields of the `privileged` section of the user record, see below. @@ -620,75 +623,71 @@ user record, see below. ## Fields in the `privileged` section As mentioned, the `privileged` section is encoded in a sub-object of the user -record top-level object, in the `privileged` field.
Any data included in this -object shall only be visible to the administrator and the user themselves, and -be suppressed implicitly when other users get access to a user record. It thus -takes the role of the `/etc/shadow` records for each user, which has similarly -restrictive access semantics. The following fields are currently defined: +record top-level object, in the `privileged` field. +Any data included in this object shall only be visible to the administrator and the user themselves, +and be suppressed implicitly when other users get access to a user record. +It thus takes the role of the `/etc/shadow` records for each user, which has similarly restrictive access semantics. +The following fields are currently defined: -`passwordHint` → A user-selected password hint in free-form text. This should -be a string like "What's the name of your first pet?", but is entirely for the -user to choose. +`passwordHint` → A user-selected password hint in free-form text. +This should be a string like "What's the name of your first pet?", but is entirely for the user to choose. `hashedPassword` → An array of strings, each containing a hashed UNIX password string, in the format -[`crypt(3)`](https://man7.org/linux/man-pages/man3/crypt.3.html) generates. This -corresponds with `sp_pwdp` field of `struct spwd` (and in a way the `pw_passwd` +[`crypt(3)`](https://man7.org/linux/man-pages/man3/crypt.3.html) generates. +This corresponds with `sp_pwdp` field of `struct spwd` (and in a way the `pw_passwd` field of `struct passwd`). `sshAuthorizedKeys` → An array of strings, each listing an SSH public key that -is authorized to access the account. The strings should follow the same format -as the lines in the traditional `~/.ssh/authorized_keys` file. +is authorized to access the account. +The strings should follow the same format as the lines in the traditional `~/.ssh/authorized_keys` file. -`pkcs11EncryptedKey` → An array of objects. 
Each element of the array should be -an object consisting of three string fields: `uri` shall contain a PKCS#11 +`pkcs11EncryptedKey` → An array of objects. +Each element of the array should be an object consisting of three string fields: `uri` shall contain a PKCS#11 security token URI, `data` shall contain a Base64-encoded encrypted key and -`hashedPassword` shall contain a UNIX password hash to test the key -against. Authenticating with a security token against this account shall work +`hashedPassword` shall contain a UNIX password hash to test the key against. +Authenticating with a security token against this account shall work as follows: the encrypted secret key is converted from its Base64 representation into binary, then decrypted with the PKCS#11 `C_Decrypt()` -function of the PKCS#11 module referenced by the specified URI, using the -private key found on the same token. The resulting decrypted key is then -Base64-encoded and tested against the specified UNIX hashed password. The -Base64-encoded decrypted key may also be used to unlock further resources -during log-in, for example the LUKS or `fscrypt` storage backend. It is -generally recommended that for each entry in `pkcs11EncryptedKey` there's also +function of the PKCS#11 module referenced by the specified URI, using the private key found on the same token. +The resulting decrypted key is then Base64-encoded and tested against the specified UNIX hashed password. +The Base64-encoded decrypted key may also be used to unlock further resources +during log-in, for example the LUKS or `fscrypt` storage backend. +It is generally recommended that for each entry in `pkcs11EncryptedKey` there's also a matching one in `pkcs11TokenUri` and vice versa, with the same URI, appearing in the same order, but this should not be required by applications processing user records. `fido2HmacSalt` → An array of objects, implementing authentication support with -FIDO2 devices that implement the `hmac-secret` extension. 
Each element of the -array should be an object consisting of three string fields: `credential`, -`salt`, `hashedPassword`, and three boolean fields: `up`, `uv` and -`clientPin`. The first two string fields shall contain Base64-encoded binary -data: the FIDO2 credential ID and the salt value to pass to the FIDO2 -device. During authentication this salt along with the credential ID is sent to -the FIDO2 token, which will HMAC hash the salt with its internal secret key and -return the result. This resulting binary key should then be Base64-encoded and -used as string password for the further layers of the stack. The -`hashedPassword` field of the `fido2HmacSalt` field shall be a UNIX password -hash to test this derived secret key against for authentication. The `up`, `uv` -and `clientPin` booleans map to the FIDO2 concepts of the same name and encode -whether the `uv`/`up` options are enabled during the authentication, and -whether a PIN shall be required. It is generally recommended that for each -entry in `fido2HmacSalt` there's also a matching one in `fido2HmacCredential`, -and vice versa, with the same credential ID, appearing in the same order, but -this should not be required by applications processing user records. - -`recoveryKey`→ An array of objects, each defining a recovery key. The object -has two mandatory fields: `type` indicates the type of recovery key. The only -currently permitted value is the string `modhex64`. The `hashedPassword` field -contains a UNIX password hash of the normalized recovery key. Recovery keys are -in most ways similar to regular passwords, except that they are generated by -the computer, not chosen by the user, and are longer. Currently, the only -supported recovery key format is `modhex64`, which consists of 64 +FIDO2 devices that implement the `hmac-secret` extension. 
+Each element of the array should be an object consisting of three string fields: `credential`, +`salt`, `hashedPassword`, and three boolean fields: `up`, `uv` and `clientPin`. +The first two string fields shall contain Base64-encoded binary +data: the FIDO2 credential ID and the salt value to pass to the FIDO2 device. +During authentication this salt along with the credential ID is sent to +the FIDO2 token, which will HMAC hash the salt with its internal secret key and return the result. +This resulting binary key should then be Base64-encoded and used as string password for the further layers of the stack. +The `hashedPassword` field of the `fido2HmacSalt` field shall be a UNIX password +hash to test this derived secret key against for authentication. +The `up`, `uv` and `clientPin` booleans map to the FIDO2 concepts of the same name and encode +whether the `uv`/`up` options are enabled during the authentication, and whether a PIN shall be required. +It is generally recommended that for each entry in `fido2HmacSalt` there's also a matching one in `fido2HmacCredential`, +and vice versa, with the same credential ID, appearing in the same order, +but this should not be required by applications processing user records. + +`recoveryKey`→ An array of objects, each defining a recovery key. +The object has two mandatory fields: `type` indicates the type of recovery key. +The only currently permitted value is the string `modhex64`. +The `hashedPassword` field contains a UNIX password hash of the normalized recovery key. +Recovery keys are in most ways similar to regular passwords, except that they are generated by +the computer, not chosen by the user, and are longer. +Currently, the only supported recovery key format is `modhex64`, which consists of 64 [modhex](https://developers.yubico.com/yubico-c/Manuals/modhex.1.html) characters (i.e. 256bit of information), in groups of 8 chars separated by dashes, -e.g. 
`lhkbicdj-trbuftjv-tviijfck-dfvbknrh-uiulbhui-higltier-kecfhkbk-egrirkui`. Recovery -keys should be accepted wherever regular passwords are. The `recoveryKey` field -should always be accompanied by a `recoveryKeyType` field (see above), and each +e.g. `lhkbicdj-trbuftjv-tviijfck-dfvbknrh-uiulbhui-higltier-kecfhkbk-egrirkui`. +Recovery keys should be accepted wherever regular passwords are. +The `recoveryKey` field should always be accompanied by a `recoveryKeyType` field (see above), and each entry in either should map 1:1 to an entry in the other, in the same order and matching the type. When accepting a recovery key it should be brought automatically into normalized form, i.e. the dashes inserted when missing, and @@ -697,25 +696,24 @@ recovery keys are effectively case-insensitive. ## Fields in the `perMachine` section -As mentioned, the `perMachine` section contains settings that shall apply to -specific systems only. This is primarily interesting for resource management -properties as they tend to require a per-system focus, however they may be used -for other purposes too. +As mentioned, the `perMachine` section contains settings that shall apply to specific systems only. +This is primarily interesting for resource management properties as they tend to require a per-system focus, +however they may be used for other purposes too. -The `perMachine` field in the top-level object is an array of objects. When -processing the user record first the various fields on the top-level object -should be parsed. Then, the `perMachine` array should be iterated in order, and +The `perMachine` field in the top-level object is an array of objects. +When processing the user record first the various fields on the top-level object should be parsed. 
+Then, the `perMachine` array should be iterated in order, and the various settings within each contained object should be applied that match either the indicated machine ID or host name, overriding any corresponding -settings previously parsed from the top-level object. There may be multiple -array entries that match a specific system, in which case all settings should -be applied. If the same option is set in the top-level object as in a +settings previously parsed from the top-level object. +There may be multiple array entries that match a specific system, in which case all settings should be applied. +If the same option is set in the top-level object as in a per-machine object then the per-machine setting wins and entirely undoes the -setting in the top-level object (i.e. no merging of properties that are arrays -is done). If the same option is set in multiple per-machine objects the one +setting in the top-level object (i.e. no merging of properties that are arrays is done). +If the same option is set in multiple per-machine objects the one specified later in the array wins (and here too no merging of individual fields -is done, the later field always wins in full). To summarize, the order of -application is (last one wins): +is done, the later field always wins in full). +To summarize, the order of application is (last one wins): 1. Settings in the top-level object 2. Settings in the first matching `perMachine` array entry @@ -725,26 +723,24 @@ application is (last one wins): The following fields are defined in this section: -`matchMachineId` → An array of strings that are formatted 128-bit IDs in -hex. If any of the specified IDs match the system's local machine ID -(i.e. matches `/etc/machine-id`) the fields in this object are honored. (As a -special case, if only a single machine ID is listed this field may be a single +`matchMachineId` → An array of strings that are formatted 128-bit IDs in hex. 
+If any of the specified IDs match the system's local machine ID +(i.e. matches `/etc/machine-id`) the fields in this object are honored. +(As a special case, if only a single machine ID is listed this field may be a single string rather than an array of strings.) -`matchHostname` → An array of strings that are valid hostnames. If any of the -specified hostnames match the system's local hostname, the fields in this -object are honored. If both `matchHostname` and `matchMachineId` are used -within the same array entry, the object is honored when either match succeeds, -i.e. the two match types are combined in OR, not in AND. (As a special case, if -only a single machine ID is listed this field may be a single string rather -than an array of strings.) +`matchHostname` → An array of strings that are valid hostnames. +If any of the specified hostnames match the system's local hostname, the fields in this object are honored. +If both `matchHostname` and `matchMachineId` are used within the same array entry, the object is honored when either match succeeds, +i.e. the two match types are combined in OR, not in AND. +(As a special case, if only a single hostname is listed this field may be a single string rather than an array of strings.) -These two are the only two fields specific to this section. All other fields -that may be used in this section are identical to the equally named ones in the +These two are the only two fields specific to this section. +All other fields that may be used in this section are identical to the equally named ones in the `regular` section (i.e. at the top-level object). 
Specifically, these are: -`iconName`, `location`, `shell`, `umask`, `environment`, `timeZone`, -`preferredLanguage`, `niceLevel`, `resourceLimits`, `locked`, `notBeforeUSec`, +`blobDirectory`, `blobManifest`, `iconName`, `location`, `shell`, `umask`, `environment`, `timeZone`, +`preferredLanguage`, `additionalLanguages`, `niceLevel`, `resourceLimits`, `locked`, `notBeforeUSec`, `notAfterUSec`, `storage`, `diskSize`, `diskSizeRelative`, `skeletonDirectory`, `accessMode`, `tasksMax`, `memoryHigh`, `memoryMax`, `cpuWeight`, `ioWeight`, `mountNoDevices`, `mountNoSuid`, `mountNoExecute`, `cifsDomain`, @@ -755,33 +751,31 @@ that may be used in this section are identical to the equally named ones in the `luksPbkdfType`, `luksPbkdfForceIterations`, `luksPbkdfTimeCostUSec`, `luksPbkdfMemoryCost`, `luksPbkdfParallelThreads`, `luksSectorSize`, `autoResizeMode`, `rebalanceWeight`, `rateLimitIntervalUSec`, `rateLimitBurst`, `enforcePasswordPolicy`, -`autoLogin`, `stopDelayUSec`, `killProcesses`, `passwordChangeMinUSec`, -`passwordChangeMaxUSec`, `passwordChangeWarnUSec`, +`autoLogin`, `preferredSessionType`, `preferredSessionLauncher`, `stopDelayUSec`, `killProcesses`, +`passwordChangeMinUSec`, `passwordChangeMaxUSec`, `passwordChangeWarnUSec`, `passwordChangeInactiveUSec`, `passwordChangeNow`, `pkcs11TokenUri`, `fido2HmacCredential`. ## Fields in the `binding` section -As mentioned, the `binding` section contains additional fields about the user -record, that bind it to the local system. These fields are generally used by a -local user manager (such as `systemd-homed.service`) to add in fields that make -sense in a local context but not necessarily in a global one. For example, a -user record that contains no `uid` field in the regular section is likely +As mentioned, the `binding` section contains additional fields about the user record, that bind it to the local system. 
+These fields are generally used by a local user manager (such as `systemd-homed.service`) to add in fields that make +sense in a local context but not necessarily in a global one. +For example, a user record that contains no `uid` field in the regular section is likely extended with one in the `binding` section to assign a local UID if no global UID is defined. All fields in the `binding` section only make sense in a local context and are -suppressed when the user record is ported between systems. The `binding` section -is generally persisted on the system but not in the home directories themselves +suppressed when the user record is ported between systems. +The `binding` section is generally persisted on the system but not in the home directories themselves and the home directory is supposed to be fully portable and thus not contain the information that `binding` is supposed to contain that binds the portable record to a specific system. The `binding` sub-object on the top-level user record object is keyed by the -machine ID the binding is intended for, which point to an object with the -fields of the bindings. These fields generally match fields that may also be -defined in the `regular` and `perMachine` sections, however override -both. Usually, the `binding` value should not contain settings different from +machine ID the binding is intended for, which point to an object with the fields of the bindings. +These fields generally match fields that may also be defined in the `regular` and `perMachine` sections, however override both. 
+Usually, the `binding` value should not contain settings different from those set via `regular` or `perMachine`, however this might happen if some settings are not supported locally (think: `fscrypt` is recorded as intended storage mechanism in the `regular` section, but the local kernel does not @@ -791,13 +785,13 @@ was created with `luks` as storage mechanism but later the user record was updated to prefer `subvolume`, which however doesn't change the actual storage used already which is pinned in the `binding` section). -The following fields are defined in the `binding` section. They all have an -identical format and override their equally named counterparts in the `regular` +The following fields are defined in the `binding` section. +They all have an identical format and override their equally named counterparts in the `regular` and `perMachine` sections: -`imagePath`, `homeDirectory`, `partitionUuid`, `luksUuid`, `fileSystemUuid`, -`uid`, `gid`, `storage`, `fileSystemType`, `luksCipher`, `luksCipherMode`, -`luksVolumeKeySize`. +`blobDirectory`, `imagePath`, `homeDirectory`, `partitionUuid`, `luksUuid`, +`fileSystemUuid`, `uid`, `gid`, `storage`, `fileSystemType`, `luksCipher`, +`luksCipherMode`, `luksVolumeKeySize`. ## Fields in the `status` section @@ -808,14 +802,14 @@ only acquired "on-the-fly" when requested. This section is arranged similarly to the `binding` section: the `status` sub-object of the top-level user record object is keyed by the machine ID, -which points to the object with the fields defined here. The following fields -are defined: - -`diskUsage` → An unsigned 64-bit integer. The currently used disk space of the -home directory in bytes. This value might be determined in different ways, -depending on the selected storage mechanism. For LUKS storage this is the file -size of the loopback file or block device size. 
For the -directory/subvolume/fscrypt storage this is the current disk space used as +which points to the object with the fields defined here. +The following fields are defined: + +`diskUsage` → An unsigned 64-bit integer. +The currently used disk space of the home directory in bytes. +This value might be determined in different ways, depending on the selected storage mechanism. +For LUKS storage this is the file size of the loopback file or block device size. +For the directory/subvolume/fscrypt storage this is the current disk space used as reported by the file system quota subsystem. `diskFree` → An unsigned 64-bit integer, denoting the number of "free" bytes in @@ -824,52 +818,45 @@ reported by `diskSize` and the used already as reported in `diskFree`, but possibly skewed by metadata sizes, disk compression and similar. `diskSize` → An unsigned 64-bit integer, denoting the disk space currently -allotted to the user, in bytes. Depending on the storage mechanism this can mean -different things (see above). In contrast to the top-level field of the same -(or the one in the `perMachine` section), this field reports the current size -allotted to the user, not the intended one. The values may differ when user -records are updated without the home directory being re-sized. +allotted to the user, in bytes. Depending on the storage mechanism this can mean different things (see above). +In contrast to the top-level field of the same (or the one in the `perMachine` section), +this field reports the current size allotted to the user, not the intended one. +The values may differ when user records are updated without the home directory being re-sized. `diskCeiling`/`diskFloor` → Unsigned 64-bit integers indicating upper and lower -bounds when changing the `diskSize` value, in bytes. These values are typically -derived from the underlying data storage, and indicate in which range the home -directory may be re-sized in, i.e. 
in which sensible range the `diskSize` value -should be kept. - -`state` → A string indicating the current state of the home directory. The -precise set of values exposed here are up to the service managing the home -directory to define (i.e. are up to the service identified with the `service` -field below). However, it is recommended to stick to a basic vocabulary here: +bounds when changing the `diskSize` value, in bytes. +These values are typically derived from the underlying data storage, and indicate in which range the home +directory may be re-sized in, i.e. in which sensible range the `diskSize` value should be kept. + +`state` → A string indicating the current state of the home directory. +The precise set of values exposed here are up to the service managing the home +directory to define (i.e. are up to the service identified with the `service` field below). +However, it is recommended to stick to a basic vocabulary here: `inactive` for a home directory currently not mounted, `absent` for a home directory that cannot be mounted currently because it does not exist on the local system, `active` for a home directory that is currently mounted and accessible. -`service` → A string identifying the service that manages this user record. For -example `systemd-homed.service` sets this to `io.systemd.Home` to all user -records it manages. This is particularly relevant to define clearly the context -in which `state` lives, see above. Note that this field also exists on the -top-level object (i.e. in the `regular` section), which it overrides. The -`regular` field should be used if conceptually the user record can only be -managed by the specified service, and this `status` field if it can -conceptually be managed by different managers, but currently is managed by the +`service` → A string identifying the service that manages this user record. +For example `systemd-homed.service` sets this to `io.systemd.Home` to all user records it manages. 
+This is particularly relevant to define clearly the context in which `state` lives, see above. +Note that this field also exists on the top-level object (i.e. in the `regular` section), which it overrides. +The `regular` field should be used if conceptually the user record can only be managed by the specified service, +and this `status` field if it can conceptually be managed by different managers, but currently is managed by the specified one. -`signedLocally` → A boolean. If true indicates that the user record is signed -by a public key for which the private key is available locally. This means that -the user record may be modified locally as it can be re-signed with the private -key. If false indicates that the user record is signed by a public key -recognized by the local manager but whose private key is not available -locally. This means the user record cannot be modified locally as it couldn't -be signed afterwards. +`signedLocally` → A boolean. +If true indicates that the user record is signed by a public key for which the private key is available locally. +This means that the user record may be modified locally as it can be re-signed with the private key. +If false indicates that the user record is signed by a public key recognized by the local manager but whose private key is not available locally. +This means the user record cannot be modified locally as it couldn't be signed afterwards. -`goodAuthenticationCounter` → An unsigned 64-bit integer. This counter is -increased by one on every successful authentication attempt, i.e. an -authentication attempt where a security token of some form was presented and it -was correct. +`goodAuthenticationCounter` → An unsigned 64-bit integer. +This counter is increased by one on every successful authentication attempt, i.e. an +authentication attempt where a security token of some form was presented and it was correct. -`badAuthenticationCounter` → An unsigned 64-bit integer. 
This counter is -increased by one on every unsuccessful authentication attempt, i.e. an +`badAuthenticationCounter` → An unsigned 64-bit integer. +This counter is increased by one on every unsuccessful authentication attempt, i.e. an authentication attempt where a security token of some form was presented and it was incorrect. @@ -884,14 +871,13 @@ UNIX epoch (1970) where the most recent rate limiting interval has been started, as configured with `rateLimitIntervalUSec`. `rateLimitCount` → An unsigned 64-bit integer, counting the authentication -attempts in the current rate limiting interval, see above. If this counter -grows beyond the value configured in `rateLimitBurst` authentication attempts -are temporarily refused. - -`removable` → A boolean value. If true the manager of this user record -determined the home directory being on removable media. If false it was -determined the home directory is in internal built-in media. (This is used by -`systemd-logind.service` to automatically pick the right default value for +attempts in the current rate limiting interval, see above. +If this counter grows beyond the value configured in `rateLimitBurst` authentication attempts are temporarily refused. + +`removable` → A boolean value. +If true the manager of this user record determined the home directory being on removable media. +If false it was determined the home directory is in internal built-in media. +(This is used by `systemd-logind.service` to automatically pick the right default value for `stopDelayUSec` if the field is not explicitly specified: for home directories on removable media the delay is selected very low to minimize the chance the home directory remains in unclean state if the storage device is removed from @@ -903,26 +889,37 @@ itself. `fileSystemType` → The file system type backing the home directory: a short string, such as "btrfs", "ext4", "xfs".
+`fallbackShell`, `fallbackHomeDirectory` → These fields have the same contents +and format as the `shell` and `homeDirectory` fields (see above). When the +`useFallback` field (see below) is set to true, the data from these fields +should override the fields of the same name without the `fallback` prefix. + +`useFallback` → A boolean that allows choosing between the regular `shell` and +`homeDirectory` fields or the fallback fields of the same name (see above). If +`true` the fallback fields should be used in place of the regular fields, if +`false` or unset the regular fields should be used. This mechanism is used to +enable subsystems such as SSH to allow logins into user accounts, whose homed +directories need further unlocking (because the SSH native authentication +cannot release a suitable disk encryption key), which the fallback shell +provides. + ## Fields in the `signature` section As mentioned, the `signature` section of the user record may contain one or -more cryptographic signatures of the user record. Like all others, this section -is optional, and only used when cryptographic validation of user records shall -be used. Specifically, all user records managed by `systemd-homed.service` will -carry such signatures and the service refuses managing user records that come -without signature or with signatures not recognized by any locally defined -public key. - -The `signature` field in the top-level user record object is an array of -objects. Each object encapsulates one signature and has two fields: `data` and -`key` (both are strings). The `data` field contains the actual signature, -encoded in Base64, the `key` field contains a copy of the public key whose -private key was used to make the signature, in PEM format. Currently only -signatures with Ed25519 keys are defined. +more cryptographic signatures of the user record. +Like all others, this section is optional, and only used when cryptographic validation of user records shall be used.
+Specifically, all user records managed by `systemd-homed.service` will carry such signatures and the service refuses managing user records that come +without signature or with signatures not recognized by any locally defined public key. + +The `signature` field in the top-level user record object is an array of objects. +Each object encapsulates one signature and has two fields: `data` and `key` (both are strings). +The `data` field contains the actual signature, encoded in Base64, the `key` field contains a copy of the public key whose +private key was used to make the signature, in PEM format. +Currently only signatures with Ed25519 keys are defined. Before signing the user record should be brought into "normalized" form, -i.e. the keys in all objects should be sorted alphabetically. All redundant -white-space and newlines should be removed and the JSON text then signed. +i.e. the keys in all objects should be sorted alphabetically. +All redundant white-space and newlines should be removed and the JSON text then signed. The signatures only cover the `regular`, `perMachine` and `privileged` sections of the user records, all other sections (including `signature` itself), are @@ -930,40 +927,35 @@ removed before the signature is calculated. Rationale for signing and threat model: while a multi-user operating system like Linux strives for being sufficiently secure even after a user acquired a -local login session reality tells us this is not the case. Hence it is -essential to restrict carefully which users may gain access to a system and -which ones shall not. A minimal level of trust must be established between -system, user record and the user themselves before a log-in request may be -permitted.
In particular if the home directory is provided in its own LUKS2 -encapsulated file system it is essential this trust is established before the -user logs in (and hence the file system mounted), since file system -implementations on Linux are well known to be relatively vulnerable to rogue -disk images. User records and home directories in many contexts are expected to -be something shareable between multiple systems, and the transfer between them -might not happen via exclusively trusted channels. Hence it's essential that -the user record is not manipulated between uses. Finally, resource management -(which may be done by the various fields of the user record) is security -sensitive, since it should forcefully lock the user into the assigned resource -usage and not allow them to use more. The requirement of being able to trust -the user record data combined with the potential transfer over untrusted +local login session reality tells us this is not the case. +Hence it is essential to restrict carefully which users may gain access to a system and which ones shall not. +A minimal level of trust must be established between system, +user record and the user themselves before a log-in request may be permitted. +In particular if the home directory is provided in its own LUKS2 encapsulated file system +it is essential this trust is established before the user logs in (and hence the file system mounted), +since file system implementations on Linux are well known to be relatively vulnerable to rogue disk images. +User records and home directories in many contexts are expected to be something shareable between multiple systems, +and the transfer between them might not happen via exclusively trusted channels. +Hence it's essential that the user record is not manipulated between uses.
+Finally, resource management (which may be done by the various fields of the user record) is security +sensitive, since it should forcefully lock the user into the assigned resource usage and not allow them to use more. +The requirement of being able to trust the user record data combined with the potential transfer over untrusted channels suggest a cryptographic signature mechanism where only user records signed by a recognized key are permitted to log in locally. -Note that other mechanisms for establishing sufficient trust exist too, and are -perfectly valid as well. For example, systems like LDAP/ActiveDirectory -generally insist on user record transfer from trusted servers via encrypted TLS -channels only. Or traditional UNIX users created locally in `/etc/passwd` never -exist outside of the local trusted system, hence transfer and trust in the -source are not an issue. The major benefit of operating with signed user -records is that they are self-sufficiently trusted, not relying on a secure -channel for transfer, and thus being compatible with a more distributed model +Note that other mechanisms for establishing sufficient trust exist too, and are perfectly valid as well. +For example, systems like LDAP/ActiveDirectory generally insist on user record transfer from trusted servers via encrypted TLS channels only. +Or traditional UNIX users created locally in `/etc/passwd` never exist outside of the local trusted system, +hence transfer and trust in the source are not an issue. +The major benefit of operating with signed user records is that they are self-sufficiently trusted, +not relying on a secure channel for transfer, and thus being compatible with a more distributed model of home directory transfer, including on USB sticks and such. ## Fields in the `secret` section As mentioned, the `secret` section of the user record should never be persisted -nor transferred across machines. 
It is only defined in short-lived operations, -for example when a user record is first created or registered, as the secret +nor transferred across machines. +It is only defined in short-lived operations, for example when a user record is first created or registered, as the secret key data needs to be available to derive encryption keys from and similar. The `secret` field of the top-level user record contains the following fields: @@ -971,26 +963,26 @@ The `secret` field of the top-level user record contains the following fields: `password` → an array of strings, each containing a plain text password. `tokenPin` → an array of strings, each containing a plain text PIN, suitable -for unlocking security tokens that require that. (The field `pkcs11Pin` should -be considered a compatibility alias for this field, and merged with `tokenPin` +for unlocking security tokens that require that. +(The field `pkcs11Pin` should be considered a compatibility alias for this field, and merged with `tokenPin` in case both are set.) -`pkcs11ProtectedAuthenticationPathPermitted` → a boolean. If set to true allows -the receiver to use the PKCS#11 "protected authentication path" (i.e. a -physical button/touch element on the security token) for authenticating the -user. If false or unset, authentication this way shall not be attempted. +`pkcs11ProtectedAuthenticationPathPermitted` → a boolean. +If set to true allows the receiver to use the PKCS#11 "protected authentication path" (i.e. a +physical button/touch element on the security token) for authenticating the user. +If false or unset, authentication this way shall not be attempted. -`fido2UserPresencePermitted` → a boolean. If set to true allows the receiver to -use the FIDO2 "user presence" flag. This is similar to the concept of -`pkcs11ProtectedAuthenticationPathPermitted`, but exposes the FIDO2 "up" -concept behind it. If false or unset authentication this way shall not be -attempted. +`fido2UserPresencePermitted` → a boolean. 
+If set to true allows the receiver to use the FIDO2 "user presence" flag. +This is similar to the concept of `pkcs11ProtectedAuthenticationPathPermitted`, +but exposes the FIDO2 "up" concept behind it. +If false or unset authentication this way shall not be attempted. `fido2UserVerificationPermitted` → a boolean. If set to true allows the -receiver to use the FIDO2 "user verification" flag. This is similar to the -concept of `pkcs11ProtectedAuthenticationPathPermitted`, but exposes the FIDO2 -"uv" concept behind it. If false or unset authentication this way shall not be -attempted. +receiver to use the FIDO2 "user verification" flag. +This is similar to the concept of `pkcs11ProtectedAuthenticationPathPermitted`, +but exposes the FIDO2 "uv" concept behind it. +If false or unset authentication this way shall not be attempted. ## Mapping to `struct passwd` and `struct spwd` @@ -1023,21 +1015,19 @@ is stored in the shadow entry `struct spwd`'s field `sp_pwdp`. ## Extending These Records -User records following this specification are supposed to be extendable for -various applications. In general, subsystems are free to introduce their own -keys, as long as: +User records following this specification are supposed to be extendable for various applications. +In general, subsystems are free to introduce their own keys, as long as: * Care should be taken to place the keys in the right section, i.e. the most appropriate for the data field. -* Care should be taken to avoid namespace clashes. Please prefix your fields - with a short identifier of your project to avoid ambiguities and +* Care should be taken to avoid namespace clashes. + Please prefix your fields with a short identifier of your project to avoid ambiguities and incompatibilities. -* This specification is supposed to be a living specification. If you need - additional fields, please consider submitting them upstream for inclusion in - this specification.
If they are reasonably universally useful, it would be - best to list them here. +* This specification is supposed to be a living specification. + If you need additional fields, please consider submitting them upstream for inclusion in this specification. + If they are reasonably universally useful, it would be best to list them here. ## Examples @@ -1045,7 +1035,7 @@ The shortest valid user record looks like this: ```json { - "userName" : "u" +"userName" : "u" } ``` @@ -1073,6 +1063,7 @@ A fully featured user record associated with a home directory managed by "fileSystemUuid" : "758e88c8-5851-4a2a-b88f-e7474279c111", "gid" : 60232, "homeDirectory" : "/home/grobie", + "blobDirectory" : "/var/cache/systemd/homed/grobie/", "imagePath" : "/home/grobie.home", "luksCipher" : "aes", "luksCipherMode" : "xts-plain64", @@ -1083,6 +1074,10 @@ A fully featured user record associated with a home directory managed by "uid" : 60232 } }, + "blobManifest" : { + "avatar" : "c0636851d25a62d817ff7da4e081d1e646e42c74d0ecb53425f75fcf1ba43b52", + "login-background" : "da7ad0222a6edbc6cd095149c72d38d92fd3114f606e4b57469857ef47fade18" + }, "disposition" : "regular", "enforcePasswordPolicy" : false, "lastChangeUSec" : 1565950024279735, @@ -1119,8 +1114,8 @@ A fully featured user record associated with a home directory managed by ``` When `systemd-homed.service` manages a home directory it will also include a -version of the user record in the home directory itself in the `~/.identity` -file. This version lacks the `binding` and `status` sections which are used for +version of the user record in the home directory itself in the `~/.identity` file. +This version lacks the `binding` and `status` sections which are used for local management of the user, but are not intended to be portable between systems. 
It would hence look like this: diff --git a/docs/USER_RECORD_BLOB_DIRS.md b/docs/USER_RECORD_BLOB_DIRS.md new file mode 100644 index 0000000..efbc5cd --- /dev/null +++ b/docs/USER_RECORD_BLOB_DIRS.md @@ -0,0 +1,128 @@ +--- +title: User Record Blob Directories +category: Users, Groups and Home Directories +layout: default +SPDX-License-Identifier: LGPL-2.1-or-later +--- + +# User Record Blob Directories + +The blob directories are for storing binary or unstructured data that would +otherwise be stored in [JSON User Records](/USER_RECORD). For instance, +this includes image files such as the user's avatar picture. This data, +like most of the user record, will be made publicly available to the +system. + +The JSON User Record specifies the location of the blob directory via the +`blobDirectory` field. If the field is unset, then there is no blob directory +and thus no blob files to look for. Note that `blobDirectory` can exist in the +`regular`, `perMachine`, and `status` sections. The blob directory is completely +owned and managed by the service that owns the rest of the user record (as +specified in the `service` field). + +For consistency, blob directories have certain restrictions placed on them +that may be enforced by their owning service. Services implementing blob +directories are free to ignore these restrictions, but software that wishes +to store some of its data in blob directories must adhere to the following: + +* The directory only contains regular files; no sub-directories or any special + files are permitted. + +* Filenames inside of the directory are restricted to + [URI Unreserved Characters](https://www.rfc-editor.org/rfc/rfc3986#section-2.3) + (alphanumeric, `-`, `.`, `_`, and `~`), and must not start with a dot. + +* The total size of the directory should not exceed 64M. + +* File ownership and permissions will not be preserved. The service may reset + the mode of the files to 0644, and ownership to whatever it wishes. 
+
+* Timestamps, xattrs, ACLs, or any other metadata on the files will not be preserved.
+
+Services are required to ensure that the directory and its contents are
+world-readable. Aside from this requirement, services are free to provide
+the directory and its contents in whatever manner they like, including but
+not limited to synthesizing the directory at runtime using external data
+or keeping around multiple copies. Thus, only the service that owns the
+directory is permitted to write to this directory in any way: for all
+other software the directory is strictly read-only.
+
+Services may choose to provide some way to change user records. Services
+that provide this functionality should support changing the blob directory also.
+Care must be taken to avoid exposing sensitive data to malicious clients. This
+includes but is not limited to disallowing symlinks and using file descriptors
+(excluding O_PATH!) to ensure that the client actually has permission to access
+the data it wants the service to publish.
+
+Services that make use of the `signature` section in the records they manage
+should enforce `blobManifest`. This ensures that the contents of the blob directory
+are part of the cryptographically signed data.
+
+## Known Files
+
+Various files in the blob directories have known semantic meanings.
+The following files are currently defined:
+
+`avatar` → An image file that should be used as the user's avatar picture.
+The exact file type and resolution of this image are left unspecified,
+and requirements will depend on the capabilities of the components that will
+display it. However, we suggest the use of commonly-supported picture formats
+(e.g. PNG or JPEG) and a resolution of 512 x 512. This image should not have any
+transparency. If missing, of an incompatible file type, or otherwise unusable,
+then the user does not have a profile picture and a default will be used instead.
+
+`login-background` → An image file that will be used as the user's background on the
+login screen (e.g. in GDM). The exact file type and resolution are left unspecified
+and are ultimately up to the components that will render this background image. This
+image should not have any transparency. If missing, of an incompatible file type, or
+otherwise unusable, a fallback background of some kind will be used.
+
+## Extending These Directories
+
+Like JSON User Records, the blob directories are intended to be extendable for
+various applications. In general, subsystems are free to introduce their own
+files, as long as:
+
+* The requirements listed above are all met.
+
+* Care is taken to avoid namespace clashes. Please prefix your file names with
+  a short identifier of your project to avoid ambiguities and incompatibilities.
+
+* This specification is supposed to be a living specification. If you need
+  additional files, please consider defining them upstream for inclusion in
+  this specification. If they are reasonably universally useful, it would be
+  best to list them here.
+
+## Examples
+
+The simplest way to define a user record is via the drop-in directories (as documented
+in [nss-systemd(8)](https://www.freedesktop.org/software/systemd/man/latest/nss-systemd.html)
+and [systemd-userdbd.service(8)](https://www.freedesktop.org/software/systemd/man/latest/systemd-userdbd.service.html)).
+Such records can have blob directories by simply referring to some persistent
+place from the record, possibly next to the record itself. For instance,
+`/etc/userdb/grobie.user` may contain:
+
+```json
+{
+  "userName": "grobie",
+  "disposition": "regular",
+  "homeDirectory": "/home/grobie",
+  "blobDirectory": "/etc/userdb/grobie.blob/"
+}
+```
+
+In this case, `/etc/userdb/grobie.blob/` will be the blob directory for the
+user `grobie`.
+
+A more complicated case is a home directory managed by `systemd-homed.service`.
+When it manages a home directory, it maintains and synchronizes two separate
+blob directories: one belonging to the system in `/var/cache/systemd/home`,
+and another belonging to the home directory in `~/.identity-blob`. The system
+blob directory ensures that the blob data is available while the home directory
+is encrypted or otherwise unavailable, and the home blob directory ensures that
+the user account remains portable between systems. To implement this behavior,
+`systemd-homed.service` always sets `blobDirectory` to the system blob directory
+in the `binding` section of the user record (i.e. this is _not_ persisted to
+`~/.identity`). If some client tries to update the user record with a new blob
+directory, `systemd-homed.service` will copy the updated blob directory into both
+the system and home blob locations.
diff --git a/docs/VM_INTERFACE.md b/docs/VM_INTERFACE.md
new file mode 100644
index 0000000..abe7067
--- /dev/null
+++ b/docs/VM_INTERFACE.md
@@ -0,0 +1,54 @@
+---
+title: VM Interface
+category: Interfaces
+layout: default
+SPDX-License-Identifier: LGPL-2.1-or-later
+---
+
+# The VM Interface
+
+Also consult [Writing Virtual Machine or Container
+Managers](https://systemd.io/WRITING_VM_AND_CONTAINER_MANAGERS).
+
+systemd has a number of interfaces for interacting with virtual machine
+managers, when systemd is used inside of a VM. If you work on a VM manager,
+please consider supporting the following interfaces.
+
+1. systemd supports passing immutable binary data blobs with limited size and
+   restricted access to services via the `ImportCredential=`, `LoadCredential=`
+   and `SetCredential=` settings. These credentials may be passed into a system
+   via SMBIOS Type 11 vendor strings, see
+   [systemd(1)](https://www.freedesktop.org/software/systemd/man/latest/systemd.html)
+   for details. This concept may be used to flexibly configure various facets
+   of the guest system.
+   See [systemd.system-credentials(7)](https://www.freedesktop.org/software/systemd/man/latest/systemd.system-credentials.html)
+   for a list of system credentials implemented by various systemd components.
+
+2. Readiness, information about various system properties and functionality, as
+   well as progress of boot may be reported by systemd to a machine manager via
+   the `sd_notify()` protocol via `AF_VSOCK` sockets. The address of this
+   socket may be configured via the `vmm.notify_socket` system credential. See
+   [systemd(1)](https://www.freedesktop.org/software/systemd/man/latest/systemd.html).
+
+3. The
+   [systemd-ssh-generator(8)](https://www.freedesktop.org/software/systemd/man/latest/systemd-ssh-generator.html)
+   functionality will automatically bind SSH login functionality to `AF_VSOCK`
+   port 22, if the system runs in a VM.
+
+4. If not initialized yet, the system's
+   [machine-id(5)](https://www.freedesktop.org/software/systemd/man/latest/machine-id.html)
+   is automatically set to the SMBIOS product UUID if available and invocation
+   in a VM environment is detected.
+
+5. The
+   [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/latest/systemd-boot.html)
+   and
+   [`systemd-stub(7)`](https://www.freedesktop.org/software/systemd/man/latest/systemd-stub.html)
+   components support two SMBIOS Type 11 vendor strings that may be used to
+   extend the kernel command line of booted Linux environments:
+   `io.systemd.stub.kernel-cmdline-extra=` and
+   `io.systemd.boot.kernel-cmdline-extra=`.
+
+Also see
+[smbios-type-11(7)](https://www.freedesktop.org/software/systemd/man/latest/smbios-type-11.html)
+for a list of supported SMBIOS Type 11 vendor strings.
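To illustrate items 1 and 2 of the VM interface list above, here is a minimal Python sketch of how a VM manager might build a QEMU argument that passes a system credential as an SMBIOS Type 11 vendor string. It assumes QEMU as the hypervisor and the base64 form `io.systemd.credential.binary:NAME=VALUE` described in the systemd credentials documentation. The helper name `smbios_credential_args` and the example socket address `vsock-stream:2:1234` are hypothetical and not part of this commit.

```python
import base64

# Vendor-string prefix for base64-encoded credentials (assumed from the
# systemd credentials documentation, not from this diff).
SMBIOS_CRED_PREFIX = "io.systemd.credential.binary:"


def smbios_credential_args(name: str, value: bytes) -> list[str]:
    """Return QEMU arguments that pass one credential via SMBIOS Type 11."""
    # "=" separates name and value; "," separates QEMU sub-options.
    if not name or "=" in name or "," in name:
        raise ValueError(f"unsuitable credential name: {name!r}")
    # Base64 output never contains "," so no QEMU escaping is needed.
    encoded = base64.b64encode(value).decode("ascii")
    return ["-smbios", f"type=11,value={SMBIOS_CRED_PREFIX}{name}={encoded}"]


if __name__ == "__main__":
    # Hypothetical readiness-notification address for vmm.notify_socket.
    print(" ".join(smbios_credential_args("vmm.notify_socket", b"vsock-stream:2:1234")))
```

Encoding the value as base64 sidesteps QEMU's comma-escaping rules for option values, which is why the sketch prefers the binary variant over the plain-text one.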
diff --git a/docs/WRITING_DISPLAY_MANAGERS.md b/docs/WRITING_DISPLAY_MANAGERS.md index 467e8a8..1fe70d0 100644 --- a/docs/WRITING_DISPLAY_MANAGERS.md +++ b/docs/WRITING_DISPLAY_MANAGERS.md @@ -33,8 +33,6 @@ Minimal porting (without multi-seat) requires the following: The former should contain "seat0", the latter the VT number your session runs on. pam_systemd can determine these values automatically but it's nice to pass these variables anyway. In summary: porting a display manager from ConsoleKit to systemd primarily means removing code, not necessarily adding any new code. Here, a cheers to simplicity! -Complete porting (with multi-seat) requires the following (Before you continue, make sure to read up on [Multi-Seat on Linux](https://www.freedesktop.org/wiki/Software/systemd/multiseat) first.): - 1. Subscribe to seats showing up and going away, via the systemd-logind D-Bus interface's SeatAdded and SeatRemoved signals. Take possession of each seat by spawning your greeter on it. However, do so exclusively for seats where the boolean CanGraphical property is true. diff --git a/docs/WRITING_VM_AND_CONTAINER_MANAGERS.md b/docs/WRITING_VM_AND_CONTAINER_MANAGERS.md index e3cc280..724d3d6 100644 --- a/docs/WRITING_VM_AND_CONTAINER_MANAGERS.md +++ b/docs/WRITING_VM_AND_CONTAINER_MANAGERS.md @@ -9,17 +9,12 @@ SPDX-License-Identifier: LGPL-2.1-or-later _Or: How to hook up your favorite VM or container manager with systemd_ -Nomenclature: a _Virtual Machine_ shall refer to a system running on -virtualized hardware consisting of a full OS with its own kernel. A _Container_ -shall refer to a system running on the same shared kernel of the host, but -running a mostly complete OS with its own init system. Both kinds of -virtualized systems shall collectively be called "machines". - -systemd provides a number of integration points with virtual machine and -container managers, such as libvirt, LXC or systemd-nspawn. 
On one hand there -are integration points of the VM/container manager towards the host OS it is -running on, and on the other there integration points for container managers -towards the guest OS it is managing. +Nomenclature: a _Virtual Machine_ shall refer to a system running on virtualized hardware consisting of a full OS with its own kernel. +A _Container_ shall refer to a system running on the same shared kernel of the host, but running a mostly complete OS with its own init system. +Both kinds of virtualized systems shall collectively be called "machines". + +systemd provides a number of integration points with virtual machine and container managers, such as libvirt, LXC or systemd-nspawn. +On one hand there are integration points of the VM/container manager towards the host OS it is running on, and on the other there are integration points for container managers towards the guest OS it is managing. Note that this document does not cover lightweight containers for the purpose of application sandboxes, i.e. containers that do _not_ run a init system of @@ -27,36 +22,19 @@ their own. ## Host OS Integration -All virtual machines and containers should be registered with the -[systemd-machined(8)](https://www.freedesktop.org/software/systemd/man/latest/systemd-machined.service.html) -mini service that is part of systemd. This provides integration into the core -OS at various points. For example, tools like ps, cgls, gnome-system-manager -use this registration information to show machine information for running -processes, as each of the VM's/container's processes can reliably attributed to -a registered machine. The various systemd tools (like systemctl, journalctl, -loginctl, systemd-run, ...) all support a -M switch that operates on machines -registered with machined. "machinectl" may be used to execute operations on any -such machine.
When a machine is registered via machined its processes will -automatically be placed in a systemd scope unit (that is located in the -machines.slice slice) and thus appear in "systemctl" and similar commands. The -scope unit name is based on the machine meta information passed to machined at -registration. - -For more details on the APIs provided by machine consult [the bus API interface -documentation](https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.machine1.html). +All virtual machines and containers should be registered with the [machined](https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.machine1) mini service that is part of systemd. This provides integration into the core OS at various points. For example, tools like ps, cgls, gnome-system-monitor use this registration information to show machine information for running processes, as each of the VM's/container's processes can reliably be attributed to a registered machine. +The various systemd tools (like systemctl, journalctl, loginctl, systemd-run, ...) all support a -M switch that operates on machines registered with machined. +"machinectl" may be used to execute operations on any such machine. +When a machine is registered via machined, its processes will automatically be placed in a systemd scope unit (that is located in the machines.slice slice) and thus appear in "systemctl" and similar commands. +The scope unit name is based on the machine meta information passed to machined at registration. + +For more details on the APIs provided by machined, consult [the bus API interface documentation](https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.machine1). ## Guest OS Integration -As container virtualization is much less comprehensive, and the guest is less -isolated from the host, there are a number of interfaces defined how the -container manager can set up the environment for systemd running inside a -container.
These Interfaces are documented in [Container Interface of -systemd](https://systemd.io/CONTAINER_INTERFACE). - -VM virtualization is more comprehensive and fewer integration APIs are -available. In fact there's only one: a VM manager may initialize the SMBIOS DMI -field "Product UUUID" to a UUID uniquely identifying this virtual machine -instance. This is read in the guest via /sys/class/dmi/id/product_uuid, and -used as configuration source for /etc/machine-id if in the guest, if that file -is not initialized yet. Note that this is currently only supported for kvm -hosts, but may be extended to other managers as well. +As container virtualization is much less comprehensive, and the guest is less isolated from the host, there are a number of interfaces that define how the container manager can set up the environment for systemd running inside a container. These interfaces are documented in [Container Interface of systemd](/CONTAINER_INTERFACE). + +VM virtualization is more comprehensive and fewer integration APIs are available. +In fact there's only one: a VM manager may initialize the SMBIOS DMI field "Product UUID" to a UUID uniquely identifying this virtual machine instance. +This is read in the guest via `/sys/class/dmi/id/product_uuid`, and used as the configuration source for `/etc/machine-id` in the guest, if that file is not initialized yet. +Note that this is currently only supported for KVM hosts, but may be extended to other managers as well.
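The product UUID hand-off described above can be sketched in a few lines of Python. This is only an illustration of the format conversion, assuming that a machine ID is the UUID written as 32 lowercase hexadecimal characters without dashes, as machine-id(5) describes. The function name `machine_id_from_product_uuid` is hypothetical, rejecting an all-zero UUID is this sketch's own assumption, and systemd's actual checks (such as detecting the virtualization environment) are not reproduced here.

```python
import uuid


def machine_id_from_product_uuid(raw: str) -> str:
    """Convert SMBIOS product UUID text to 32 lowercase hex characters."""
    # uuid.UUID accepts upper- or lowercase input, with or without dashes.
    parsed = uuid.UUID(raw.strip())
    # Assumption of this sketch: an all-zero UUID is not a usable identity.
    if parsed.int == 0:
        raise ValueError("all-zero product UUID is not a usable machine ID")
    return parsed.hex


if __name__ == "__main__":
    # Reading this sysfs file typically requires root privileges.
    with open("/sys/class/dmi/id/product_uuid", encoding="ascii") as f:
        print(machine_id_from_product_uuid(f.read()))
```

A VM manager can use the same conversion in reverse to predict which machine ID a freshly booted guest will adopt for a given SMBIOS product UUID.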
diff --git a/docs/_data/extra_pages.json b/docs/_data/extra_pages.json index d24e301..09a6bed 100644 --- a/docs/_data/extra_pages.json +++ b/docs/_data/extra_pages.json @@ -102,7 +102,7 @@ { "category": "Publications", "title": "SUSE White Paper on systemd", - "url": "https://www.suse.com/media/white-paper/systemd_in_suse_linux_enterprise_12_white_paper.pdf" + "url": "https://documentation.suse.com/external-tree/en-us/sles/12-SP4/systemd_in_suse_linux_enterprise_12_white_paper.pdf" }, { "category": "Videos for Users and Administrators", @@ -401,16 +401,6 @@ }, { "category": "Documentation for Developers - external links", - "title": "On /etc/os-release", - "url": "http://0pointer.de/blog/projects/os-release.html" - }, - { - "category": "Documentation for Developers - external links", - "title": "Control Groups vs. Control Groups", - "url": "http://0pointer.de/blog/projects/cgroups-vs-cgroups.html" - }, - { - "category": "Documentation for Developers - external links", "title": "The 30 Biggest Myths about systemd", "url": "http://0pointer.de/blog/projects/the-biggest-myths.html" }, diff --git a/docs/_includes/footer.html b/docs/_includes/footer.html index 3e5214e..bdaa0ee 100644 --- a/docs/_includes/footer.html +++ b/docs/_includes/footer.html @@ -1,7 +1,7 @@ <!-- SPDX-License-Identifier: LGPL-2.1-or-later --> <footer class="site-footer"> - <p>© systemd, 2023</p> + <p>© systemd, 2024</p> <p><a href="https://github.com/systemd/systemd/tree/main/docs">Website source</a></p> </footer> diff --git a/docs/sysvinit/README.in b/docs/sysvinit/README.in index 89effc8..ace1aba 100644 --- a/docs/sysvinit/README.in +++ b/docs/sysvinit/README.in @@ -24,4 +24,4 @@ Further reading: man:systemctl(1) man:systemd(1) https://0pointer.de/blog/projects/systemd-for-admins-3.html - https://www.freedesktop.org/wiki/Software/systemd/Incompatibilities + https://systemd.io/INCOMPATIBILITIES |