Diffstat (limited to 'docs')
-rw-r--r-- docs/AUTOMATIC_BOOT_ASSESSMENT.md 207
-rw-r--r-- docs/BLOCK_DEVICE_LOCKING.md 67
-rw-r--r-- docs/BOOT_LOADER_INTERFACE.md 118
-rw-r--r-- docs/BOOT_LOADER_SPECIFICATION.md 187
-rw-r--r-- docs/CGROUP_DELEGATION.md 477
-rw-r--r-- docs/CNAME 1
-rw-r--r-- docs/CODE_OF_CONDUCT.md 18
-rw-r--r-- docs/CODE_QUALITY.md 68
-rw-r--r-- docs/CODING_STYLE.md 524
-rw-r--r-- docs/CONTRIBUTING.md 41
-rw-r--r-- docs/DISTRO_PORTING.md 79
-rw-r--r-- docs/ENVIRONMENT.md 179
-rw-r--r-- docs/HACKING.md 127
-rw-r--r-- docs/PORTABLE_SERVICES.md 260
-rw-r--r-- docs/PREDICTABLE_INTERFACE_NAMES.md 68
-rw-r--r-- docs/RELEASE.md 16
-rw-r--r-- docs/TRANSIENT-SETTINGS.md 465
-rw-r--r-- docs/TRANSLATORS.md 78
-rw-r--r-- docs/UIDS-GIDS.md 282
-rw-r--r-- docs/_config.yml 1
-rw-r--r-- docs/index.md 11
-rw-r--r-- docs/sysvinit/README.in 27
-rw-r--r-- docs/sysvinit/meson.build 11
-rw-r--r-- docs/var-log/README.in 26
-rw-r--r-- docs/var-log/meson.build 11
25 files changed, 3349 insertions, 0 deletions
diff --git a/docs/AUTOMATIC_BOOT_ASSESSMENT.md b/docs/AUTOMATIC_BOOT_ASSESSMENT.md
new file mode 100644
index 0000000..6f7182a
--- /dev/null
+++ b/docs/AUTOMATIC_BOOT_ASSESSMENT.md
@@ -0,0 +1,207 @@
+---
+title: Automatic Boot Assessment
+---
+
+# Automatic Boot Assessment
+
+systemd provides support for automatically reverting to the previous
+version of the OS or kernel in case the system consistently fails to boot. This
+support is built into several of its components. When used together these
+components provide a complete solution on UEFI systems, built as an add-on to the
+[Boot Loader
+Specification](https://systemd.io/BOOT_LOADER_SPECIFICATION). However, the
+different components may also be used independently, and in combination with
+other software, to implement similar schemes, for example with other boot
+loaders or for non-UEFI systems. Here's a brief overview of the complete set of
+components:
+
+* The
+ [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
+ boot loader optionally maintains a per-boot-loader-entry counter that is
+ decreased by one on each attempt to boot the entry, prioritizing entries that
+ have non-zero counters over those which already reached a counter of zero
+ when choosing the entry to boot.
+
+* The
+ [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
+ service automatically marks a boot loader entry, for which boot counting as
+ mentioned above is enabled, as "good" when a boot has been determined to be
+ successful, thus turning off boot counting for it.
+
+* The
+ [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
+ generator automatically pulls in `systemd-bless-boot.service` when use of
+ `systemd-boot` with boot counting enabled is detected.
+
+* The
+ [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
+ service is a simple health check tool that determines whether the boot
+ completed successfully. When enabled it becomes an indirect dependency of
+ `systemd-bless-boot.service` (by means of `boot-complete.target`, see
+ below), ensuring that the boot will not be considered successful if there are
+ any failed services.
+
+* The `boot-complete.target` target unit (see
+ [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
+ serves as a generic extension point both for units that shall be considered
+ necessary to consider a boot successful on one side (example:
+ `systemd-boot-check-no-failures.service` as described above), and units that
+ want to act only if the boot is successful on the other (example:
+ `systemd-bless-boot.service` as described above).
+
+* The
+ [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
+ script can optionally create boot loader entries that carry an initial boot
+ counter (the initial counter is configurable in `/etc/kernel/tries`).
+
+# Details
+
+The boot counting data `systemd-boot` and `systemd-bless-boot.service`
+manage is stored in the name of the boot loader entries. If a boot loader entry
+file name contains `+` followed by one or two numbers (if two numbers, then
+those need to be separated by `-`) right before the `.conf` suffix, then boot
+counting is enabled for it. The first number is the "tries left" counter
+encoding how many attempts to boot this entry shall still be made. The second
+number is the "tries done" counter, encoding how many failed attempts to boot
+it have already been made. Each time a boot loader entry marked this way is
+booted the first counter is decreased by one, and the second one increased by
+one. (If the second counter is missing, then it is assumed to be equivalent to
+zero.) If the "tries left" counter is above zero the entry is still considered
+for booting (the entry's state is considered to be "indeterminate"); as soon as
+it reaches zero the entry is not tried anymore (entry state "bad"). If the boot
+attempt completes successfully the entry's counters are removed from the name
+(entry state "good"), thus turning off boot counting for the future.
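
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the file name convention, not code taken from `systemd-boot` or `systemd-bless-boot`; the names `BootCounter`, `parse_entry_name` and `entry_state` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# "+LEFT" or "+LEFT-DONE" directly before the ".conf" suffix enables counting.
_COUNTER_RE = re.compile(r"^(?P<stem>.*)\+(?P<left>\d+)(?:-(?P<done>\d+))?\.conf$")


class BootCounter(NamedTuple):
    stem: str                  # entry name without counter tag and suffix
    tries_left: Optional[int]  # None means boot counting is off
    tries_done: int            # a missing second counter counts as zero


def parse_entry_name(name: str) -> BootCounter:
    """Split an entry file name into its stem and boot counters."""
    if not name.endswith(".conf"):
        raise ValueError(f"not a boot loader entry file name: {name!r}")
    m = _COUNTER_RE.match(name)
    if m is None:
        return BootCounter(name[: -len(".conf")], None, 0)
    return BootCounter(m["stem"], int(m["left"]), int(m["done"] or 0))


def entry_state(name: str) -> str:
    """Map a file name to the states named in the text above."""
    counter = parse_entry_name(name)
    if counter.tries_left is None:
        return "good"
    return "indeterminate" if counter.tries_left > 0 else "bad"
```

For example, `entry_state("4.14.11-300.fc27.x86_64+0-3.conf")` classifies the entry as bad, while the same name without a counter tag is classified as good.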
+
+## Walkthrough
+
+Here's an example walkthrough of how this all fits together.
+
+1. The user runs `echo 3 > /etc/kernel/tries` to enable boot counting.
+
+2. A new kernel is installed. `kernel-install` is used to generate a new boot
+ loader entry file for it. Let's say the version string for the new kernel is
+ `4.14.11-300.fc27.x86_64`, a new boot loader entry
+ `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.
+
+3. The system is booted for the first time after the new kernel is
+ installed. The boot loader now sees the `+3` counter in the entry file
+ name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`,
+ indicating that at this point one attempt has started and thus two
+ are left. After the rename completes the entry is booted as usual.
+
+4. Let's say this attempt to boot fails. On the following boot the boot loader
+ will hence see the `+2-1` tag in the name, and hence rename the entry file to
+ `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.
+
+5. Let's say the boot fails again. On the subsequent boot the loader hence will
+ see the `+1-2` tag, and rename the file to
+ `4.14.11-300.fc27.x86_64+0-3.conf` and boot it.
+
+6. If this boot also fails, on the next boot the boot loader will see the
+ tag `+0-3`, i.e. the counter reached zero. At this point the entry will be
+ considered "bad", and ordered to the end of the list of entries. The next
+ newest boot entry is now tried, i.e. the system automatically reverts
+ to an earlier version.
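
The rename sequence of the failing walkthrough can be reproduced with a small simulation. This is a sketch of the counter arithmetic only; it works on strings and touches no files, and `name_for_next_attempt` is a hypothetical helper, not a function of any systemd component.

```python
import re

# Counter tag "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix.
_TAG_RE = re.compile(r"\+(\d+)(?:-(\d+))?\.conf$")


def name_for_next_attempt(name: str) -> str:
    """Return the file name the entry would carry once a boot attempt starts."""
    m = _TAG_RE.search(name)
    if m is None:
        return name  # boot counting is off, nothing to rename
    left, done = int(m.group(1)), int(m.group(2) or 0)
    if left == 0:
        return name  # entry is "bad" and is no longer counted down
    return f"{name[:m.start()]}+{left - 1}-{done + 1}.conf"


# Replay consecutive failed boots of the entry from the walkthrough.
history = ["4.14.11-300.fc27.x86_64+3.conf"]
while True:
    following = name_for_next_attempt(history[-1])
    if following == history[-1]:
        break
    history.append(following)

for name in history:
    print(name)
```

The loop stops at the `+0-3` name, which is the point where the boot loader would fall back to the next entry.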
+
+The above describes the walkthrough when the selected boot entry continuously
+fails. Let's have a look at an alternative ending to this walkthrough. In this
+scenario the first 4 steps are the same as above:
+
+1. *as above*
+
+2. *as above*
+
+3. *as above*
+
+4. *as above*
+
+5. Let's say the second boot succeeds. The kernel initializes properly, systemd
+ is started and invokes all generators.
+
+6. One of the generators started is `systemd-bless-boot-generator` which
+ detects that boot counting is used. It hence pulls
+ `systemd-bless-boot.service` into the initial transaction.
+
+7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
+ `boot-complete.target` unit. This unit is hence also pulled into the initial
+ transaction.
+
+8. The `boot-complete.target` unit is ordered after and pulls in various units
+ that are required to succeed for the boot process to be considered
+ successful. One such unit is `systemd-boot-check-no-failures.service`.
+
+9. `systemd-boot-check-no-failures.service` is run after all its own
+ dependencies completed, and assesses that the boot completed
+ successfully. It hence exits cleanly.
+
+10. This allows `boot-complete.target` to be reached. This signifies to the
+ system that this boot attempt shall be considered successful.
+
+11. Which in turn permits `systemd-bless-boot.service` to run. It now
+ determines which boot loader entry file was used to boot the system, and
+ renames it dropping the counter tag. Thus
+ `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
+ `4.14.11-300.fc27.x86_64.conf`. From this moment boot counting is turned
+ off.
+
+12. On the following boot (and all subsequent boots after that) the entry is
+ now seen with boot counting turned off, no further renaming takes place.
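
The final rename in step 11 amounts to stripping the counter tag from the file name. The following sketch shows that step in isolation, under the assumption that the entry file name is already known; how `systemd-bless-boot.service` finds the booted entry is not modelled here, and `bless` is a hypothetical helper name.

```python
import os
import re

# Counter tag "+LEFT" or "+LEFT-DONE", only when followed by the ".conf" suffix.
_TAG_RE = re.compile(r"\+\d+(?:-\d+)?(?=\.conf$)")


def bless(entries_dir: str, entry_name: str) -> str:
    """Rename an entry so that it no longer carries boot counters.

    Returns the resulting file name. Entries without a counter tag are
    left untouched.
    """
    good_name = _TAG_RE.sub("", entry_name)
    if good_name != entry_name:
        # A rename within one directory replaces the name in a single step.
        os.rename(
            os.path.join(entries_dir, entry_name),
            os.path.join(entries_dir, good_name),
        )
    return good_name
```

Calling `bless(entries_dir, "4.14.11-300.fc27.x86_64+1-2.conf")` on a directory that contains that file leaves `4.14.11-300.fc27.x86_64.conf` behind.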
+
+# How to adapt this scheme to other setups
+
+Of the stack described above many components may be replaced or augmented. Here
+are a couple of recommendations.
+
+1. To support alternative boot loaders in place of `systemd-boot` two scenarios
+ are recommended:
+
+ a. Boot loaders already implementing the Boot Loader Specification can simply
+ implement an equivalent file rename based logic, and thus integrate fully
+ with the rest of the stack.
+
+ b. Boot loaders that want to implement boot counting and store the counters
+ elsewhere can provide their own replacements for
+ `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
+ continue to use `boot-complete.target` and thus support any services
+ ordered before that.
+
+2. To support additional components that shall succeed before the boot is
+ considered successful, simply place them in units (if they aren't already)
+ and order them before the generic `boot-complete.target` target unit,
+ combined with `Requires=` dependencies from the target, so that the target
+ cannot be reached when any of the units fail. You may add any number of
+ units like this, and only if they all succeed the boot entry is marked as
+ good. Note that the target unit shall pull in these boot checking units, not
+ the other way around.
+
+3. To support additional components that shall only run on boot success, simply
+ wrap them in a unit and order them after `boot-complete.target`, pulling it
+ in.
+
+# FAQ
+
+1. *Why do you use file renames to store the counter? Why not a regular file?*
+ — Mainly two reasons: it's relatively likely that renames can be implemented
+ atomically even in simpler file systems, while writing to file contents has
+ a much bigger chance to result in incomplete or corrupt data, as renaming
+ generally avoids allocating or releasing data blocks. Moreover it has the
+ benefit that the boot count metadata is directly attached to the boot loader
+ entry file, and thus the lifecycle of the metadata and the entry itself are
+ bound together. This means no additional clean-up needs to take place to
+ drop the boot loader counting information for an entry when it is removed.
+
+2. *Why not use EFI variables for storing the boot counter?* — The memory chips
+ used to back the persistent EFI variables are generally not of the highest
+ quality, hence shouldn't be written to more than necessary. This means we
+ can't really use them for changes made regularly during boot, but only for
+ rarely made configuration changes.
+
+3. *I have a service which — when it fails — should immediately cause a
+ reboot. How does that fit in with the above?* — Well, that's orthogonal to
+ the above, please use `FailureAction=` in the unit file for this.
+
+4. *Under some condition I want to mark the current boot loader entry as bad
+ right away, so that it is never tried again. How do I do that?* — You may
+ invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
+ current boot loader entry as "bad" right away so that it isn't tried again
+ on later boots.
diff --git a/docs/BLOCK_DEVICE_LOCKING.md b/docs/BLOCK_DEVICE_LOCKING.md
new file mode 100644
index 0000000..58178ad
--- /dev/null
+++ b/docs/BLOCK_DEVICE_LOCKING.md
@@ -0,0 +1,67 @@
+---
+title: Locking Block Device Access
+---
+
+# Locking Block Device Access
+
+*TL;DR: Use BSD file locks
+[(`flock(2)`)](http://man7.org/linux/man-pages/man2/flock.2.html) on block
+device nodes to synchronize access for partitioning and file system formatting
+tools.*
+
+`systemd-udevd` probes all block devices showing up for file system superblock
+and partition table information (utilizing `libblkid`). If another program
+concurrently modifies a superblock or partition table this probing might be
+affected, which is bad in itself, but also might in turn result in undesired
+effects in programs subscribing to `udev` events.
+
+Applications manipulating a block device can temporarily stop `systemd-udevd`
+from processing rules on it — and thus bar it from probing the device — by
+taking a BSD file lock on the block device node. Specifically, whenever
+`systemd-udevd` starts processing a block device it takes a `LOCK_SH|LOCK_NB`
+lock using [`flock(2)`](http://man7.org/linux/man-pages/man2/flock.2.html) on
+the main block device (i.e. never on any partition block device, but on the
+device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
+fails with `EWOULDBLOCK`), it refrains from processing the device. If it manages
+to take the lock it is kept for the entire time the device is processed.
+
+Note that `systemd-udevd` also watches all block device nodes it manages for
+`inotify()` `IN_CLOSE_WRITE` events: whenever such an event is seen, this is used as
+a trigger to re-run the rule-set for the device.
+
+These two concepts allow tools such as disk partitioners or file system
+formatting tools to safely and easily take exclusive ownership of a block
+device while operating: before starting work on the block device, they should
+take a `LOCK_EX` lock on it. This has two effects: first of all, in case
+`systemd-udevd` is still processing the device the tool will wait for it to
+finish. Second, after the lock is taken, it can be sure that
+`systemd-udevd` will refrain from processing the block device, and thus all
+other client applications subscribed to it won't get device notifications from
+potentially half-written data either. After the operation is complete the
+partitioner/formatter can simply close the device node. This also has two effects:
+first, it implicitly releases the lock, so that `systemd-udevd` can process events on
+the device node again. Second, it results in an `IN_CLOSE_WRITE` event, which causes
+`systemd-udevd` to immediately re-process the device — seeing all changes the
+tool made — and notify subscribed clients about it.
+
+Besides synchronizing block device access between `systemd-udevd` and such
+tools this scheme may also be used to synchronize access between those tools
+themselves. However, do note that `flock()` locks are advisory only. This means
+if one tool honours this scheme and another tool does not, they will of course
+not be synchronized properly, and might interfere with each other's work.
+
+Note that the file locks follow the usual access semantics of BSD locks: since
+`systemd-udevd` never writes to such block devices it only takes a `LOCK_SH`
+*shared* lock. A program intending to make changes to the block device should
+take a `LOCK_EX` *exclusive* lock instead. For further details, see the
+`flock(2)` man page.
+
+And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
+(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
+orthogonal. The scheme discussed above uses the former and not the latter,
+because these types of locks more closely match the required semantics.
+
+Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
+manipulating block devices in all tools that change file system block devices
+(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
+opening the node.
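
A minimal sketch of the recommended locking pattern in Python, using `fcntl.flock()` from the standard library. The helper names `with_exclusive_lock` and `is_busy` are made up for this example, and the path would be a whole-disk device node (a hypothetical `/dev/sdX`, never a partition node); it also works on regular files, which is convenient for trying it out.

```python
import fcntl
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def with_exclusive_lock(path: str, operation: Callable[[int], T]) -> T:
    """Run operation(fd) while holding a LOCK_EX BSD lock on path."""
    fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    try:
        # Blocks until any holder of a shared lock has released it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        return operation(fd)
    finally:
        # Closing the descriptor drops the lock implicitly.
        os.close(fd)


def is_busy(path: str) -> bool:
    """Probe like a reader would: try LOCK_SH | LOCK_NB and report failure."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        return True  # somebody holds an exclusive lock
    else:
        return False
    finally:
        os.close(fd)
```

A partitioning tool would pass its actual work as `operation`. While that callback runs, `is_busy()` on the same path returns `True` from any other open file description.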
diff --git a/docs/BOOT_LOADER_INTERFACE.md b/docs/BOOT_LOADER_INTERFACE.md
new file mode 100644
index 0000000..50488ee
--- /dev/null
+++ b/docs/BOOT_LOADER_INTERFACE.md
@@ -0,0 +1,118 @@
+---
+title: The Boot Loader Interface
+---
+
+# The Boot Loader Interface
+
+systemd can interface with the boot loader to receive performance data and
+other information, and pass control information. This is only supported on EFI
+systems. Data is transferred between the boot loader and systemd in EFI
+variables. All EFI variables use the vendor UUID
+`4a67b082-0a4c-41cf-b6c7-440b29bb8c4f`.
+
+* The EFI Variable `LoaderTimeInitUSec` contains the timestamp in microseconds
+ when the loader was initialized. This value is the time spent in the firmware
+ for initialization. It is formatted as a numeric, NUL-terminated, decimal
+ string, in UTF-16.
+
+* The EFI Variable `LoaderTimeExecUSec` contains the timestamp in microseconds
+ when the loader finished its work and is about to execute the kernel. The
+ time spent in the loader is the difference between `LoaderTimeExecUSec` and
+ `LoaderTimeInitUSec`. This value is formatted the same way as
+ `LoaderTimeInitUSec`.
+
+* The EFI variable `LoaderDevicePartUUID` contains the partition GUID of the
+ ESP the boot loader was run from, formatted as a NUL-terminated UTF-16 string, in
+ normal GUID syntax.
+
+* The EFI variable `LoaderConfigTimeout` contains the boot menu timeout
+ currently in use. It may be modified both by the boot loader and by the
+ host. The value should be formatted as a numeric, NUL-terminated, decimal
+ string, in UTF-16. The time is specified in seconds.
+
+* Similarly, the EFI variable `LoaderConfigTimeoutOneShot` contains a boot menu
+ timeout for a single following boot. It is set by the OS in order to request
+ display of the boot menu on the following boot. When set it overrides
+ `LoaderConfigTimeout`. It is removed automatically after being read by the
+ boot loader, to ensure it only takes effect a single time. This value is
+ formatted the same way as `LoaderConfigTimeout`. If set to `0` the boot menu
+ timeout is turned off, and the menu is shown indefinitely.
+
+* The EFI variable `LoaderEntries` may contain a series of boot loader entry
+ identifiers, one after the other, each individually NUL terminated. This may
+ be used to let the OS know which boot menu entries were discovered by the
+ boot loader. A boot loader entry identifier should be a short, non-empty
+ alphanumeric string (possibly containing `-`, too). The list should be in the
+ order the entries are shown on screen during boot. See below regarding a
+ recommended vocabulary for boot loader entry identifiers.
+
+* The EFI variable `LoaderEntryDefault` contains the default boot loader entry
+ to use. It contains a NUL-terminated boot loader entry identifier.
+
+* Similarly, the EFI variable `LoaderEntryOneShot` contains the default boot
+ loader entry to use for a single following boot. It is set by the OS in order
+ to request booting into a specific menu entry on the following boot. When set
+ it overrides `LoaderEntryDefault`. It is removed automatically after being read
+ by the boot loader, to ensure it only takes effect a single time. This value
+ is formatted the same way as `LoaderEntryDefault`.
+
+* The EFI variable `LoaderEntrySelected` contains the boot loader entry
+ identifier that was booted. It is set by the boot loader and read by
+ the OS in order to identify which entry has been used for the current boot.
+
+* The EFI variable `LoaderFeatures` contains a 64-bit unsigned integer with a
+ number of flag bits that are set by the boot loader and passed to the OS to
+ indicate the features the boot loader supports. Specifically, the following
+ bits are defined:
+
+ * `1 << 0` → The boot loader honours `LoaderConfigTimeout` when set.
+ * `1 << 1` → The boot loader honours `LoaderConfigTimeoutOneShot` when set.
+ * `1 << 2` → The boot loader honours `LoaderEntryDefault` when set.
+ * `1 << 3` → The boot loader honours `LoaderEntryOneShot` when set.
+ * `1 << 4` → The boot loader supports boot counting as described in [Automatic Boot Assessment](https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT).
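
The flag bits listed above map naturally onto a Python `IntFlag`. The member names below are invented for this sketch, and decoding the payload as 8 little-endian bytes is an assumption that the text above does not state.

```python
from enum import IntFlag


class LoaderFeatures(IntFlag):
    # Bit positions as listed above; the member names are illustrative only.
    CONFIG_TIMEOUT = 1 << 0
    CONFIG_TIMEOUT_ONE_SHOT = 1 << 1
    ENTRY_DEFAULT = 1 << 2
    ENTRY_ONE_SHOT = 1 << 3
    BOOT_COUNTING = 1 << 4


def decode_loader_features(payload: bytes) -> LoaderFeatures:
    """Decode the variable payload (assumed: 8 bytes, little-endian)."""
    if len(payload) != 8:
        raise ValueError("expected a 64-bit value")
    return LoaderFeatures(int.from_bytes(payload, "little"))


def supports(features: LoaderFeatures, wanted: LoaderFeatures) -> bool:
    """True if every bit in wanted is set in features."""
    return features & wanted == wanted
```

A caller that wants to request a one-shot entry would first check `supports(features, LoaderFeatures.ENTRY_ONE_SHOT)` before writing `LoaderEntryOneShot`.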
+
+If `LoaderTimeInitUSec` and `LoaderTimeExecUSec` are set, `systemd-analyze`
+will include them in its boot-time analysis. If `LoaderDevicePartUUID` is set,
+systemd will mount the ESP that was used for the boot to `/boot`, but only if
+that directory is empty, and only if no other file systems are mounted
+there. The `systemctl reboot --boot-loader-entry=…` and `systemctl reboot
+--boot-loader-menu=…` commands rely on the `LoaderFeatures`,
+`LoaderConfigTimeoutOneShot`, `LoaderEntries` and `LoaderEntryOneShot` variables.
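
The string formats described above can be decoded with a few helpers. This is a sketch under stated assumptions: the payload is the raw variable data (when reading through Linux `efivarfs`, the 4-byte attribute header would have to be stripped first), the encoding is UTF-16 little-endian, and `LoaderEntries` uses the same encoding as the other variables. The function names are made up for this example.

```python
from typing import List


def decode_efi_string(payload: bytes) -> str:
    """Decode a NUL-terminated UTF-16 string, dropping the terminator."""
    return payload.decode("utf-16-le").split("\x00", 1)[0]


def decode_efi_decimal(payload: bytes) -> int:
    """Decode a numeric, NUL-terminated, decimal UTF-16 string."""
    return int(decode_efi_string(payload))


def decode_loader_entries(payload: bytes) -> List[str]:
    """Split a series of individually NUL-terminated identifiers."""
    return [item for item in payload.decode("utf-16-le").split("\x00") if item]


def loader_time_usec(init_payload: bytes, exec_payload: bytes) -> int:
    """Time spent in the boot loader: LoaderTimeExecUSec - LoaderTimeInitUSec."""
    return decode_efi_decimal(exec_payload) - decode_efi_decimal(init_payload)
```

With `LoaderTimeInitUSec` holding `1500000` and `LoaderTimeExecUSec` holding `2750000`, `loader_time_usec()` yields 1250000 µs spent in the loader.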
+
+## Boot Loader Entry Identifiers
+
+While boot loader entries may be named relatively freely, it's highly
+recommended to follow the following rules when picking identifiers for the
+entries, so that programs (and users) can derive basic context and meaning from
+the identifiers as passed in `LoaderEntries`, `LoaderEntryDefault`,
+`LoaderEntryOneShot`, `LoaderEntrySelected`, and possibly show nicely localized
+names for them in UIs.
+
+1. When boot loader entries are defined through [Boot Loader
+ Specification](https://systemd.io/BOOT_LOADER_SPECIFICATION) drop-in files
+ the identifier should be derived directly from the drop-in snippet name, but
+ with the `.conf` (or `.efi` in case of Type #2 entries) suffix removed.
+
+2. Entries automatically discovered by the boot loader (as opposed to being
+ configured in configuration files) should generally have an identifier
+ prefixed with `auto-`.
+
+3. Boot menu entries referring to Microsoft Windows installations should either
+ use the identifier `windows` or use the `windows-` prefix for the
+ identifier. If a menu entry is automatically discovered, it should be
+ prefixed with `auto-`, see above. (Example: an automatically
+ discovered Windows installation might have the identifier `auto-windows` or
+ `auto-windows-10`.)
+
+4. Similarly, boot menu entries referring to Apple MacOS X installations should
+ use the identifier `osx` or one that is prefixed with `osx-`. If such an
+ entry is automatically discovered by the boot loader use `auto-osx` as
+ identifier, or `auto-osx-` as prefix for the identifier, see above.
+
+5. If a boot menu entry encapsulates the EFI shell program, it should use the
+ identifier `efi-shell` (or when automatically discovered: `auto-efi-shell`,
+ see above).
+
+6. If a boot menu entry encapsulates a reboot into EFI firmware setup feature,
+ it should use the identifier `reboot-to-firmware-setup` (or
+ `auto-reboot-to-firmware-setup` in case it is automatically discovered).
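
A consumer of these identifiers, for example a UI that wants to show a suitable icon, could apply the vocabulary above as follows. This is an illustrative sketch; `entry_identifier`, `classify_identifier` and the returned category strings are made up for this example.

```python
from typing import Tuple


def entry_identifier(file_name: str) -> str:
    """Derive the identifier from a drop-in name per rule 1."""
    for suffix in (".conf", ".efi"):
        if file_name.endswith(suffix):
            return file_name[: -len(suffix)]
    raise ValueError(f"unexpected drop-in name: {file_name!r}")


def classify_identifier(identifier: str) -> Tuple[bool, str]:
    """Return (automatically_discovered, category) for an identifier."""
    auto = identifier.startswith("auto-")
    base = identifier[len("auto-"):] if auto else identifier
    if base == "windows" or base.startswith("windows-"):
        return auto, "windows"
    if base == "osx" or base.startswith("osx-"):
        return auto, "osx"
    if base == "efi-shell":
        return auto, "efi-shell"
    if base == "reboot-to-firmware-setup":
        return auto, "reboot-to-firmware-setup"
    return auto, "other"
```

For instance, `classify_identifier("auto-windows-10")` reports an automatically discovered Windows entry, while an identifier derived from a kernel drop-in falls into the `"other"` category.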
diff --git a/docs/BOOT_LOADER_SPECIFICATION.md b/docs/BOOT_LOADER_SPECIFICATION.md
new file mode 100644
index 0000000..3612ff1
--- /dev/null
+++ b/docs/BOOT_LOADER_SPECIFICATION.md
@@ -0,0 +1,187 @@
+---
+title: The Boot Loader Specification
+---
+
+# The Boot Loader Specification
+
+_TL;DR: Currently there's little cooperation between multiple distributions in dual-boot (or triple, ... multi-boot) setups, and we'd like to improve this situation by getting everybody to commit to a single boot configuration format that is based on drop-in files, and thus is robust, simple, works without rewriting configuration files and is free of namespace clashes._
+
+The Boot Loader Specification defines a scheme by which different operating systems can cooperatively manage a boot loader configuration directory that accepts drop-in files for boot menu items, defined in a format that is shared between various boot loader implementations, operating systems, and userspace programs. The target audience for this specification is:
+
+* Boot loader developers, to write a boot loader that directly reads its configuration at runtime from these drop-in snippets
+* Distribution and Core OS developers, in order to create these snippets at OS/kernel package installation time
+* UI developers, for implementing a user interface that discovers the available boot options
+* OS Installer developers, for setting up the initial drop-in directory
+
+## Why is there a need for this specification?
+
+Of course, without this specification things already work mostly fine. But here's why we think this specification is needed:
+
+* To make the boot more robust, as no explicit rewriting of configuration files is required any more
+* To improve dual-boot scenarios. Currently, multiple Linux installations tend to fight over which boot loader becomes the primary one in possession of the MBR, and only that one installation can then freely update its boot loader configuration. Other Linux installs have to be manually configured to never touch the MBR and instead install a chain-loaded boot loader in their own partition headers. In this new scheme, as all installations share a loader directory, no manual configuration has to take place, and all participants implicitly cooperate due to removal of name collisions and can install/remove their own boot menu entries at will, without interfering with the entries of other installed operating systems.
+* Drop-in directories are otherwise now pretty ubiquitous on Linux as an easy way to extend configuration without having to edit, regenerate or manipulate configuration files. For the sake of uniformity, we should do the same for extending the boot menu.
+* Userspace code can sanely parse boot loader configuration, which is essential with modern BIOSes that do not necessarily initialize USB keyboards during boot anymore, making boot menus hard to reach for the user. If userspace code can parse the boot loader configuration, too, this allows for UIs that can select a boot menu item to boot into before rebooting the machine, thus not requiring interactivity during early boot.
+* To unify and thus simplify configuration of the various boot loaders around, which makes configuration of the boot loading process easier for users, administrators and developers alike.
+* For boot loaders with configuration _scripts_ such as grub2, adopting this spec allows for mostly static scripts that are generated only once at first installation, but then do not need to be updated anymore as that is done via drop-in files exclusively.
+
+## Why not simply rely on the EFI boot menu logic?
+
+The EFI specification provides a boot options logic that can offer similar functionality. Here's why we think that it is not enough for our uses:
+
+* The various EFI implementations implement the boot order/boot item logic to different levels. Some firmware implementations do not offer a boot menu at all and instead unconditionally follow the EFI boot order, booting the first item that is working.
+* If the firmware setup is used to reset all data usually all EFI boot entries are lost, making the system entirely unbootable, as the firmware setups generally do not offer a UI to define additional boot items. By placing the menu item information on disk, it is always available, regardless of whether the BIOS setup data is lost.
+* Hard disk images should be movable between machines and be bootable without requiring explicit EFI variables to be set. This also requires that the list of boot options is defined on disk, and not in EFI variables alone.
+* EFI is not universal yet (especially on non-x86 platforms); this specification is useful both for EFI and non-EFI boot loaders.
+* Many EFI systems disable USB support during early boot to optimize boot times, thus making keyboard input unavailable in the EFI menu. It is thus useful if the OS UI has a standardized way to discover available boot options which can be booted to.
+
+## Technical Details
+
+Everything described below is located on a placeholder file system `$BOOT`. The installer program should pick `$BOOT` according to the following rules:
+
+* On disks with MBR disk labels
+ * If the OS is installed on a disk with MBR disk label, and a partition with the MBR type id of 0xEA already exists it should be used as `$BOOT`.
+ * Otherwise, if the OS is installed on a disk with MBR disk label, a new partition with MBR type id of 0xEA shall be created, of a suitable size (let's say 500MB), and it should be used as `$BOOT`.
+* On disks with GPT disk labels
+ * If the OS is installed on a disk with GPT disk label, and a partition with the GPT type GUID of bc13c2ff-59e6-4262-a352-b275fd6f7172 already exists, it should be used as `$BOOT`.
+ * Otherwise, if the OS is installed on a disk with GPT disk label, and an ESP partition (i.e. with the GPT type GUID of c12a7328-f81f-11d2-ba4b-00a0c93ec93b) already exists and is large enough (let's say 250MB) and otherwise qualifies, it should be used as `$BOOT`.
+ * Otherwise, if the OS is installed on a disk with GPT disk label, and if the ESP partition already exists but is too small, a new suitably sized (let's say 500MB) partition with GPT type GUID of bc13c2ff-59e6-4262-a352-b275fd6f7172 shall be created and it should be used as `$BOOT`.
+ * Otherwise, if the OS is installed on a disk with GPT disk label, and no ESP partition exists yet, a new suitably sized (let's say 500MB) ESP should be created and should be used as `$BOOT`.
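
The selection rules above form a small decision tree, sketched here in Python. The partition model (a list of `(type, size_in_bytes)` tuples), the returned action strings and the function name `choose_boot` are assumptions made for this illustration; the "otherwise qualifies" condition for the ESP is not modelled, and the 250MB threshold is the example figure from the text, taken as decimal megabytes.

```python
from typing import List, Tuple, Union

MBR_BOOT_TYPE = 0xEA
GPT_BOOT_TYPE = "bc13c2ff-59e6-4262-a352-b275fd6f7172"
GPT_ESP_TYPE = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"
MIN_ESP_BYTES = 250 * 1000 * 1000  # "let's say 250MB"

Partition = Tuple[Union[int, str], int]  # (type id or type GUID, size in bytes)


def choose_boot(label: str, partitions: List[Partition]) -> str:
    """Return which action an installer would take to obtain $BOOT."""
    types = [ptype for ptype, _ in partitions]
    if label == "mbr":
        if MBR_BOOT_TYPE in types:
            return "use existing 0xEA partition"
        return "create 0xEA partition"
    if label == "gpt":
        if GPT_BOOT_TYPE in types:
            return "use existing boot partition"
        esp_sizes = [size for ptype, size in partitions if ptype == GPT_ESP_TYPE]
        if not esp_sizes:
            return "create ESP"
        if max(esp_sizes) >= MIN_ESP_BYTES:
            return "use existing ESP"
        return "create boot partition"
    raise ValueError(f"unsupported disk label: {label!r}")
```

A GPT disk that only has a 100MB ESP thus ends up with a new partition of type `bc13c2ff-59e6-4262-a352-b275fd6f7172` next to the ESP.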
+
+This placeholder file system shall be determined during _installation time_, and an fstab entry may be created. It should be mounted to either `/boot/` or `/efi/`. Additional locations like `/boot/efi/`, with `/boot/` being a separate file system, might be supported by implementations. This is not recommended because the mounting of `$BOOT` is then dependent on and requires the mounting of the intermediate file system.
+
+**Note:** _`$BOOT` should be considered **shared** among all OS installations of a system. Instead of maintaining one `$BOOT` per installed OS (as `/boot/` was traditionally handled), all installed OSes share the same place to drop in their boot-time configuration._
+
+For systems where the firmware is able to read file systems directly, `$BOOT` must be a file system readable by the firmware. For other systems, `$BOOT` must be a VFAT (16 or 32) file system. Applications accessing `$BOOT` should hence not assume that fancier file system features such as symlinks, hardlinks, access control or case sensitivity are supported.
+
+This specification defines two types of boot loader entries. The first type is
+text based, very simple and suitable for a variety of firmware, architecture
+and image types ("Type #1"). The second type is specific to EFI, but allows
+single-file images that embed all metadata in the kernel binary itself, which
+is useful to cryptographically sign them as one file for the purpose of
+SecureBoot ("Type #2").
+
+Not all boot loader entries will apply to all systems. For example, Type #1
+entries that use the `efi` key and all Type #2 entries only apply to EFI
+systems. Entries using the `architecture` key might specify an architecture that
+doesn't match the local one. Boot loaders should ignore all entries that don't
+match the local platform and what the boot loader can support, and hide them
+from the user. Only entries matching the feature set of boot loader and system
+shall be considered and displayed. This allows image builders to put together
+images that transparently support multiple different architectures.
+
+### Type #1 Boot Loader Specification Entries
+
+We define two directories below `$BOOT`:
+
+* `$BOOT/loader/` is the directory containing all files needed for Type #1 entries
+* `$BOOT/loader/entries/` is the directory containing the drop-in snippets. This directory contains one `.conf` file for each boot menu item.
+
+**Note:** _In all cases the `/loader/` directory should be located directly in the root of the file system. Specifically, if `$BOOT` is the ESP, then `/loader/` directory should be located directly in the root directory of the ESP, and not in the `/EFI/` subdirectory._
+
+Inside the `$BOOT/loader/entries/` directory each OS vendor may drop one or more configuration snippets with the suffix ".conf", one for each boot menu item. The file name of the file is used for identification of the boot item but shall never be presented to the user in the UI. The file name may be chosen freely but should be unique enough to avoid clashes between OS installations. More specifically it is suggested to include the machine ID (`/etc/machine-id` or the D-Bus machine ID for OSes that lack `/etc/machine-id`), the kernel version (as returned by `uname -r`) and an OS identifier (the `ID` field of `/etc/os-release`). Example: `$BOOT/loader/entries/6a9857a393724b7a981ebb5b8495b9ea-3.8.0-2.fc19.x86_64.conf`.
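
As a sketch, a file name in the shape of the example above could be assembled like this. The helper name `entry_file_name` is made up, and the layout (machine ID, a dash, the kernel version) simply mirrors the example; the specification leaves the exact naming to the OS vendor.

```python
import re

_MACHINE_ID_RE = re.compile(r"^[0-9a-f]{32}$")


def entry_file_name(machine_id: str, kernel_version: str) -> str:
    """Build '<machine-id>-<kernel-version>.conf' as in the example above."""
    if not _MACHINE_ID_RE.match(machine_id):
        raise ValueError("machine ID must be 32 lower case hexadecimal characters")
    if not kernel_version or "/" in kernel_version:
        raise ValueError("kernel version must be a non-empty single path component")
    return f"{machine_id}-{kernel_version}.conf"
```

On a running system the two inputs would typically come from `/etc/machine-id` and `uname -r`.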
+
+These configuration snippets shall be Unix-style text files (i.e. line separation with a single newline character), in the UTF-8 encoding. The configuration snippets are loosely inspired by Grub1's configuration syntax. Lines beginning with '#' shall be ignored and used for commenting. The first word of a line is used as key and shall be separated by a space from its value. The following keys are known:
+
+* `title` shall contain a human readable title string for this menu item. This will be displayed in the boot menu for the item. It is a good idea to initialize this from the `PRETTY_NAME` of `/etc/os-release`. This name should be descriptive and does not have to be unique. If a boot loader discovers two entries with the same title it is a good idea to show more than just the raw title in the UI, for example by appending the `version` field. This field is optional. Example: "Fedora 18 (Spherical Cow)".
+* `version` shall contain a human readable version string for this menu item. This is usually the kernel version and is intended for use by OSes to install multiple kernel versions at the same time with the same `title` field. This field shall be in a syntax that is useful for Debian-style version sorts, so that the boot loader UI can determine the newest version easily and show it first or preselect it automatically. This field is optional. Example: `3.7.2-201.fc18.x86_64`.
+* `machine-id` shall contain the machine ID of the OS `/etc/machine-id`. This is useful for boot loaders and applications to filter out boot entries, for example to show only a single newest kernel per OS, or to group items by OS, or to maybe filter out the currently booted OS in UIs that want to show only other installed operating systems. This ID shall be formatted as 32 lower case hexadecimal characters (i.e. without any UUID formatting). This key is optional. Example: `4098b3f648d74c13b1f04ccfba7798e8`.
+* `linux` refers to the Linux kernel to spawn and shall be a path relative to the `$BOOT` directory. It is recommended that every distribution creates a machine id and version specific subdirectory below `$BOOT` and places its kernels and initial RAM disk images there. Example: `/6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux`.
+* `initrd` refers to the initrd to use when executing the kernel. This also shall be a path relative to the `$BOOT` directory. This key is optional. This key may appear more than once in which case all specified images are used, in the order they are listed. Example: `6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd`.
+* `efi` refers to an arbitrary EFI program. This also takes a path relative to `$BOOT`. If this key is set, and the system is not an EFI system this entry should be hidden.
+* `options` shall contain kernel parameters to pass to the Linux kernel to spawn. This key is optional and may appear more than once in which case all specified parameters are used in the order they are listed.
+* `devicetree` refers to the binary device tree to use when executing the
+kernel. This also shall be a path relative to the `$BOOT` directory. This
+key is optional. Example: `6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.armv7hl/tegra20-paz00.dtb`.
+* `architecture` refers to the architecture this entry is defined for. The argument should be an architecture identifier, using the architecture vocabulary defined by the EFI specification (i.e. `IA32`, `x64`, `IA64`, `ARM`, `AA64`, …). If specified and this does not match (case insensitively) the local system architecture this entry should be hidden.
+
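The hiding rules for the `efi` and `architecture` keys can be sketched roughly as follows. This is only an illustration, not code from any boot loader: the function name `entry_applies` and the dict-of-lists representation of an entry are assumptions made for this example.

```python
def entry_applies(entry, local_arch, is_efi):
    """Return True if an entry should be shown on the local system.

    `entry` is a hypothetical dict mapping key names to lists of values.
    """
    # An entry that launches an EFI program is hidden on non-EFI systems.
    if "efi" in entry and not is_efi:
        return False
    # The architecture comparison is case-insensitive, as the text requires.
    for arch in entry.get("architecture", []):
        if arch.lower() != local_arch.lower():
            return False
    return True


entries = [
    {"title": ["Fedora 19"], "linux": ["/vmlinuz"], "architecture": ["x64"]},
    {"title": ["ARM build"], "linux": ["/vmlinuz-arm"], "architecture": ["AA64"]},
    {"title": ["EFI shell"], "efi": ["/shell.efi"]},
]
# On a non-EFI x64 machine only the first entry survives the filter.
visible = [e["title"][0] for e in entries if entry_applies(e, "X64", is_efi=False)]
print(visible)
```
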
+Each configuration drop-in snippet must include at least a `linux` or an `efi` key and is otherwise not valid. Here's an example for a complete drop-in file:
+
+ # /boot/loader/entries/6a9857a393724b7a981ebb5b8495b9ea-3.8.0-2.fc19.x86_64.conf
+ title Fedora 19 (Rawhide)
+ version 3.8.0-2.fc19.x86_64
+ machine-id 6a9857a393724b7a981ebb5b8495b9ea
+ options root=UUID=6d3376e4-fc93-4509-95ec-a21d68011da2
+ architecture x64
+ linux /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux
+ initrd /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd
+
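A minimal sketch of how such a snippet might be parsed, following the rules stated above ('#' comments, first word is the key, repeatable keys such as `initrd` and `options`, and at least one of `linux` or `efi` required). The names `parse_entry` and `is_valid` are made up for illustration, and real boot loaders may be stricter or more lenient than this.

```python
def parse_entry(text):
    """Parse one Type #1 snippet into a dict of key -> list of values."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and lines beginning with '#' carry no settings.
        if not line or line.startswith("#"):
            continue
        # The first word is the key, the remainder of the line is its value.
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keys may repeat (e.g. initrd, options), so values are collected.
        entry.setdefault(parts[0], []).append(value)
    return entry


def is_valid(entry):
    """A snippet needs at least a `linux` or an `efi` key."""
    return "linux" in entry or "efi" in entry


sample = """\
# example snippet
title Fedora 19 (Rawhide)
version 3.8.0-2.fc19.x86_64
options root=UUID=6d3376e4-fc93-4509-95ec-a21d68011da2
linux /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux
initrd /6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/initrd
"""
entry = parse_entry(sample)
print(entry["title"][0], is_valid(entry))
```
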
+On EFI systems all Linux kernel images should be EFI images. In order to increase compatibility with EFI systems it is highly recommended only to install EFI kernel images, even on non-EFI systems, if that's applicable and supported on the specific architecture.
+
+Note that these configuration snippets may only reference kernels (and EFI programs) that reside on the same file system as the configuration snippets, i.e. everything referenced must be contained in the same file system. This is by design, as referencing other partitions or devices would require a non-trivial language for denoting device paths. If kernels/initrds are to be read from other partitions/disks the boot loader can do this in its own native configuration, using its own specific device path language, and this is out of focus for this specification. More specifically, on non-EFI systems configuration snippets following this specification cannot be used to spawn other operating systems (such as Windows).
+
+### Type #2 EFI Unified Kernel Images
+
+A unified kernel image is a single EFI PE executable combining an EFI stub
+loader, a kernel image, an initramfs image, and the kernel command line. See
+the description of the `--uefi` option in
+[dracut(8)](http://man7.org/linux/man-pages/man8/dracut.8.html). Such unified
+images will be searched for under `$BOOT/EFI/Linux/` and must have the
+extension `.efi`. Support for images of this type is of course specific to
+systems with EFI firmware. Ignore this section if you work on systems not
+supporting EFI.
+
+Images of this type have the advantage that all metadata and payload that makes
+up the boot entry is combined in a single PE file that can be signed
+cryptographically as one for the purpose of EFI SecureBoot.
+
+A valid unified kernel image must contain two PE sections:
+
+* `.cmdline` section with the kernel command line
+* `.osrel` section with an embedded copy of the [os-release](https://www.freedesktop.org/software/systemd/man/os-release.html) file describing the image
+
+The `PRETTY_NAME=` and `VERSION_ID=` fields in the embedded os-release file are used the same as `title` and `version` in the "boot loader specification" entries. The `.cmdline` section is used instead of the `options` field. `linux` and `initrd` fields are not necessary, and there is no counterpart for the `machine-id` field.
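
The field mapping described above can be sketched like this. The sketch assumes the `.osrel` and `.cmdline` section contents have already been extracted from the PE file and decoded to text (how to do that is not shown), and the function names are hypothetical. The os-release parsing is simplified and relies on `shlex` for the shell-style quoting.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (simplified)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex takes care of the single and double quoting of values.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def menu_item_from_unified_image(osrel_text, cmdline_text):
    """Map the embedded sections to the Type #1 field names."""
    osrel = parse_os_release(osrel_text)
    return {
        "title": osrel.get("PRETTY_NAME"),
        "version": osrel.get("VERSION_ID"),
        "options": cmdline_text.strip(),
    }


item = menu_item_from_unified_image(
    'ID=fedora\nVERSION_ID=19\nPRETTY_NAME="Fedora 19 (Rawhide)"\n',
    "root=UUID=6d3376e4-fc93-4509-95ec-a21d68011da2\n",
)
print(item)
```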
+
+On EFI, any such images shall be added to the list of valid boot entries.
+
+### Additional notes
+
+Note that these configuration snippets do not need to be the only configuration source for a boot loader. It may extend this list of entries with additional items from other configuration files (for example its own native configuration files) or automatically detected other entries without explicit configuration.
+
+To make this explicitly clear: this specification is designed with "free" operating systems in mind. Starting Windows or macOS is out of focus with these configuration snippets; use boot-loader specific solutions for that. In the text above, if we say "OS" we hence imply "free", i.e. primarily Linux (though this could easily be extended to the BSDs and whatnot).
+
+Note that all paths used in the configuration snippets use a Unix-style "/" as path separator. This needs to be converted to an EFI-style "\" separator in EFI boot loaders.
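
The separator conversion is a plain character replacement. A tiny sketch (the helper name `to_efi_path` is made up for this example):

```python
def to_efi_path(path):
    """Convert a Unix-style path from a snippet to EFI-style separators."""
    return path.replace("/", "\\")


print(to_efi_path("/6a9857a393724b7a981ebb5b8495b9ea/3.8.0-2.fc19.x86_64/linux"))
# prints \6a9857a393724b7a981ebb5b8495b9ea\3.8.0-2.fc19.x86_64\linux
```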
+
+
+## Logic
+
+A _boot loader_ needs a file system driver to discover and read `$BOOT`, then
+simply reads all files `$BOOT/loader/entries/*.conf`, and populates its boot
+menu with this. On EFI, it then extends this with any unified kernel images
+found in `$BOOT/EFI/Linux/*.efi`. It may also add additional entries, for
+example a "Reboot into firmware" option. Optionally it may sort the menu based
+on the `machine-id` and `version` fields, and possibly others. It uses the file
+name to identify specific items, for example in case it supports storing away
+default entry information somewhere. A boot loader should generally not modify
+these files.
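
The discovery step described above might look roughly like this in a userspace tool. It is a sketch only: `discover_boot_items` is a hypothetical name, `$BOOT` is assumed to be already mounted and passed in as a directory, and sorting here is by file name rather than by `machine-id` or `version`.

```python
import tempfile
from pathlib import Path


def discover_boot_items(boot, is_efi):
    """List Type #1 snippets and, on EFI only, Type #2 unified images."""
    boot = Path(boot)
    items = []
    entries_dir = boot / "loader" / "entries"
    if entries_dir.is_dir():
        items += sorted(entries_dir.glob("*.conf"))
    images_dir = boot / "EFI" / "Linux"
    if is_efi and images_dir.is_dir():
        items += sorted(images_dir.glob("*.efi"))
    return items


# Demo against a throwaway directory tree standing in for $BOOT.
with tempfile.TemporaryDirectory() as boot:
    (Path(boot) / "loader" / "entries").mkdir(parents=True)
    (Path(boot) / "EFI" / "Linux").mkdir(parents=True)
    (Path(boot) / "loader" / "entries" / "a.conf").touch()
    (Path(boot) / "EFI" / "Linux" / "b.efi").touch()
    names_efi = [p.name for p in discover_boot_items(boot, is_efi=True)]
    names_other = [p.name for p in discover_boot_items(boot, is_efi=False)]

print(names_efi, names_other)
```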
+
+For "Boot Loader Specification Entries" (Type #1), the _kernel package
+installer_ installs the kernel and initrd images to `$BOOT` (it is recommended
+to place these files in a vendor and OS and installation specific directory)
+and then generates a configuration snippet for it, placing this in
+`$BOOT/loader/entries/xyz.conf`, with xyz as concatenation of machine id and
+version information (see above). The files created by a kernel package are
+private property of the kernel package and should be removed along with it.
+
+For "EFI Unified Kernel Images" (Type #2), the vendor or kernel package
+installer creates the combined image and drops it into `$BOOT/EFI/Linux/`. This
+file is also private property of the kernel package and should be removed along
+with it.
+
+A _UI application_ intended to show available boot options shall operate similar to a boot loader, but might apply additional filters, for example by filtering out the booted OS via the machine ID, or by suppressing all but the newest kernel versions.
+
+An _OS installer_ picks the right place for `$BOOT` as defined above (possibly creating a partition and file system for it) and pre-creates the `/loader/entries/` directory in it. It then installs an appropriate boot loader that can read these snippets. Finally, it installs one or more kernel packages.
+
+
+## Out of Focus
+
+There are a couple of items that are out of focus for this specification:
+
+* If userspace can figure out the available boot options, then this is only useful to a degree: we'd still need to come up with a way for userspace to communicate the default boot loader entry to the boot loader, temporarily or persistently. Defining a common scheme for this is certainly a good idea, but out of focus for this specification.
+* This specification is just about "Free" Operating systems. Hooking in other operating systems (like Windows and macOS) into the boot menu is a different story and should probably happen outside of this specification. For example, boot loaders might choose to detect other available OSes dynamically at runtime without explicit configuration (like `systemd-boot` does it), or via native configuration (for example via explicit Grub2 configuration generated once at installation).
+* This specification leaves undefined what to do about systems which are upgraded from an OS that does not implement this specification. As the previous boot loader logic was largely handled in distribution-specific ways we probably should leave the upgrade path (and whether there actually is one) to the distributions. The simplest solution might be to simply continue with the old scheme for old installations and use this new scheme only for new installations.
+
+
+## Links
+
+[systemd-boot(7)](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)<br>
+[bootctl(1)](https://www.freedesktop.org/software/systemd/man/bootctl.html)
diff --git a/docs/CGROUP_DELEGATION.md b/docs/CGROUP_DELEGATION.md
new file mode 100644
index 0000000..8bf1b69
--- /dev/null
+++ b/docs/CGROUP_DELEGATION.md
@@ -0,0 +1,477 @@
+---
+title: Control Group APIs and Delegation
+---
+
+# Control Group APIs and Delegation
+
+*Intended audience: hackers working on userspace subsystems that require direct
+cgroup access, such as container managers and similar.*
+
+So you are wondering about resource management with systemd, you know Linux
+control groups (cgroups) a bit and are trying to integrate your software with
+what systemd has to offer there. Here's a bit of documentation about the
+concepts and interfaces involved with this.
+
+What's described here has been part of systemd and documented since v205.
+However, it has been updated and improved substantially, even
+though the concepts stayed mostly the same. This is an attempt to provide more
+comprehensive up-to-date information about all this, particularly in light of the
+poor implementations of the components interfacing with systemd of current
+container managers.
+
+Before you read on, please make sure you read the low-level [kernel
+documentation about
+cgroup v2](https://www.kernel.org/doc/Documentation/cgroup-v2.txt). This
+documentation then adds in the higher-level view from systemd.
+
+This document augments the existing documentation we already have:
+
+* [The New Control Group Interfaces](https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/)
+* [Writing VM and Container Managers](https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/)
+
+These wiki documents are not as up to date as they should be, currently, but
+the basic concepts still fully apply. You should read them too, if you do something
+with cgroups and systemd, in particular as they shine more light on the various
+D-Bus APIs provided. (That said, sooner or later we should probably fold that
+wiki documentation into this very document, too.)
+
+## Two Key Design Rules
+
+Much of the philosophy behind these concepts is based on a couple of basic
+design ideas of cgroup v2 (which we however try to adapt as far as we can to
+cgroup v1 too). Specifically two cgroup v2 rules are the most relevant:
+
+1. The **no-processes-in-inner-nodes** rule: this means that it's not permitted
+to have processes directly attached to a cgroup that also has child cgroups and
+vice versa. A cgroup is either an inner node or a leaf node of the tree, and if
+it's an inner node it may not contain processes directly, and if it's a leaf
+node then it may not have child cgroups. (Note that there are some minor
+exceptions to this rule, though. E.g. the root cgroup is special and allows
+both processes and children — which is used in particular to maintain kernel
+threads.)
+
+2. The **single-writer** rule: this means that each cgroup only has a single
+writer, i.e. a single process managing it. It's OK if different cgroups have
+different processes managing them. However, only a single process should own a
+specific cgroup, and when it does that ownership is exclusive, and nothing else
+should manipulate it at the same time. This rule ensures that various pieces of
+software don't step on each other's toes constantly.
+
+These two rules have various effects. For example, one corollary of this is: if
+your container manager creates and manages cgroups in the system's root cgroup
+you violate rule #2, as the root cgroup is managed by systemd and hence off
+limits to everybody else.
+
+Note that rule #1 is generally enforced by the kernel if cgroup v2 is used: as
+soon as you add a process to a cgroup it is ensured the rule is not
+violated. On cgroup v1 this rule didn't exist, and hence isn't enforced, even
+though it's a good thing to follow it then too. Rule #2 is not enforced on
+either cgroup v1 or cgroup v2 (this is UNIX after all, in the general case
+root can do anything, modulo SELinux and friends), but if you ignore it you'll
+be in constant pain as various pieces of software will fight over cgroup
+ownership.
+
+Note that cgroup v1 is currently the most deployed implementation, even though
+it's semantically broken in many ways, and in many cases doesn't actually do
+what people think it does. cgroup v2 is where things are going, and most new
+kernel features in this area are only added to cgroup v2, and not cgroup v1
+anymore. For example cgroup v2 provides proper cgroup-empty notifications, has
+support for all kinds of per-cgroup BPF magic, supports secure delegation of
+cgroup trees to less privileged processes and so on, which all are not
+available on cgroup v1.
+
+## Three Different Tree Setups 🌳
+
+systemd supports three different modes of setting up cgroups. Specifically:
+
+1. **Unified** — this is the simplest mode, and exposes a pure cgroup v2
+logic. In this mode `/sys/fs/cgroup` is the only mounted cgroup API file system
+and all available controllers are exclusively exposed through it.
+
+2. **Legacy** — this is the traditional cgroup v1 mode. In this mode the
+various controllers each get their own cgroup file system mounted to
+`/sys/fs/cgroup/<controller>/`. On top of that systemd manages its own cgroup
+hierarchy for managing purposes as `/sys/fs/cgroup/systemd/`.
+
+3. **Hybrid** — this is a hybrid between the unified and legacy mode. It's set
+up mostly like legacy, except that there's also an additional hierarchy
+`/sys/fs/cgroup/unified/` that contains the cgroup v2 hierarchy. (Note that in
+this mode the unified hierarchy won't have controllers attached, the
+controllers are all mounted as separate hierarchies as in legacy mode,
+i.e. `/sys/fs/cgroup/unified/` is purely and exclusively about core cgroup v2
+functionality and not about resource management.) In this mode compatibility
+with cgroup v1 is retained while some cgroup v2 features are available
+too. This mode is a stopgap. Don't bother with this too much unless you have
+too much free time.
+
+To say this clearly, legacy and hybrid modes have no future. If you develop
+software today and don't focus on the unified mode, then you are writing
+software for yesterday, not tomorrow. They are primarily supported for
+compatibility reasons and will not receive new features. Sorry.
+
+Superficially, in legacy and hybrid modes it might appear that the parallel
+cgroup hierarchies for each controller are orthogonal from each other. In
+systemd they are not: the hierarchies of all controllers are always kept in
+sync (at least mostly: sub-trees might be suppressed in certain hierarchies if
+no controller usage is required for them). The fact that systemd keeps these
+hierarchies in sync means that the legacy and hybrid hierarchies are
+conceptually very close to the unified hierarchy. In particular this allows us
+to talk of one specific cgroup and actually mean the same cgroup in all
+available controller hierarchies. E.g. if we talk about the cgroup `/foo/bar/`
+then we actually mean `/sys/fs/cgroup/cpu/foo/bar/` as well as
+`/sys/fs/cgroup/memory/foo/bar/`, `/sys/fs/cgroup/pids/foo/bar/`, and so on.
+Note that in cgroup v2 the controller hierarchies aren't orthogonal, hence
+thinking about them as orthogonal won't help you in the long run anyway.
+
+If you wonder how to detect which of these three modes is currently used, use
+`statfs()` on `/sys/fs/cgroup/`. If it reports `CGROUP2_SUPER_MAGIC` in its
+`.f_type` field, then you are in unified mode. If it reports `TMPFS_MAGIC` then
+you are either in legacy or hybrid mode. To distinguish these two cases, run
+`statfs()` again on `/sys/fs/cgroup/unified/`. If that succeeds and reports
+`CGROUP2_SUPER_MAGIC` you are in hybrid mode, otherwise not.
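
The decision logic above, written out as a small sketch. The two magic numbers are the values from `<linux/magic.h>`. Python's standard library does not expose the `f_type` field of `statfs()` directly, so this example only shows the classification step and assumes the caller obtained the `f_type` values some other way (for instance from C); the function name `cgroup_mode` is made up.

```python
CGROUP2_SUPER_MAGIC = 0x63677270  # value from <linux/magic.h>
TMPFS_MAGIC = 0x01021994          # value from <linux/magic.h>


def cgroup_mode(root_f_type, unified_f_type=None):
    """Classify the setup from statfs() f_type values.

    root_f_type:    f_type reported for /sys/fs/cgroup/
    unified_f_type: f_type reported for /sys/fs/cgroup/unified/,
                    or None if that statfs() call failed
    """
    if root_f_type == CGROUP2_SUPER_MAGIC:
        return "unified"
    if root_f_type == TMPFS_MAGIC:
        if unified_f_type == CGROUP2_SUPER_MAGIC:
            return "hybrid"
        return "legacy"
    # Anything else is not one of the three setups described here.
    return "unknown"


print(cgroup_mode(CGROUP2_SUPER_MAGIC))
print(cgroup_mode(TMPFS_MAGIC, CGROUP2_SUPER_MAGIC))
print(cgroup_mode(TMPFS_MAGIC, None))
```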
+
+## systemd's Unit Types
+
+The low-level kernel cgroups feature is exposed in systemd in three different
+"unit" types. Specifically:
+
+1. 💼 The `.service` unit type. This unit type is for units encapsulating
+ processes systemd itself starts. Units of these types have cgroups that are
+ the leaves of the cgroup tree the systemd instance manages (though possibly
+ they might contain a sub-tree of their own managed by something else, made
+ possible by the concept of delegation, see below). Service units are usually
+ instantiated based on a unit file on disk that describes the command line to
+ invoke and other properties of the service. However, service units may also
+ be declared and started programmatically at runtime through a D-Bus API
+ (which is called *transient* services).
+
+2. 👓 The `.scope` unit type. This is very similar to `.service`. The main
+ difference: the processes the units of this type encapsulate are forked off
+ by some unrelated manager process, and that manager asked systemd to expose
+ them as a unit. Unlike services, scopes can only be declared and started
+ programmatically, i.e. are always transient. That's because they encapsulate
+ processes forked off by something else, i.e. existing runtime objects, and
+ hence cannot really be defined fully in 'offline' concepts such as unit
+ files.
+
+3. 🔪 The `.slice` unit type. Units of this type do not directly contain any
+ processes. Units of this type are the inner nodes of part of the cgroup tree
+ the systemd instance manages. Much like services, slices can be defined
+ either on disk with unit files or programmatically as transient units.
+
+Slices expose the trunk and branches of a tree, and scopes and services are
+attached to those branches as leaves. The idea is that scopes and services can
+be moved around though, i.e. assigned to a different slice if needed.
+
+The naming of slice units directly maps to the cgroup tree path. This is not
+the case for service and scope units however. A slice named `foo-bar-baz.slice`
+maps to a cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/`. A service
+`quux.service` which is attached to the slice `foo-bar-baz.slice` maps to the
+cgroup `/foo.slice/foo-bar.slice/foo-bar-baz.slice/quux.service/`.
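
The mapping from slice names to cgroup paths can be illustrated with a short sketch. The helper names are hypothetical, and the sketch ignores unit name escaping and other corner cases that systemd itself handles.

```python
def slice_to_cgroup_path(slice_name):
    """Expand a slice unit name into its cgroup path."""
    suffix = ".slice"
    if not slice_name.endswith(suffix):
        raise ValueError("not a slice unit: " + slice_name)
    stem = slice_name[: -len(suffix)]
    # The root slice `-.slice` maps to the top of the tree.
    if stem == "-":
        return "/"
    parts = stem.split("-")
    path = "/"
    # Each dash-separated prefix names one ancestor slice.
    for i in range(1, len(parts) + 1):
        path += "-".join(parts[:i]) + suffix + "/"
    return path


def unit_cgroup_path(unit_name, slice_name):
    """Path of a service or scope unit attached to the given slice."""
    return slice_to_cgroup_path(slice_name) + unit_name + "/"


print(slice_to_cgroup_path("foo-bar-baz.slice"))
print(unit_cgroup_path("quux.service", "foo-bar-baz.slice"))
```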
+
+By default systemd sets up four slice units:
+
+1. `-.slice` is the root slice, i.e. the parent of everything else. On the host
+ system it maps directly to the top-level directory of cgroup v2.
+
+2. `system.slice` is where system services are by default placed, unless
+ configured otherwise.
+
+3. `user.slice` is where user sessions are placed. Each user gets a slice of
+ its own below that.
+
+4. `machines.slice` is where VMs and containers are supposed to be
+ placed. `systemd-nspawn` makes use of this by default, and you're very welcome
+ to place your containers and VMs there too if you hack on managers for those.
+
+Users may define any number of additional slices they like though, the four
+above are just the defaults.
+
+## Delegation
+
+Container managers and suchlike often want to control cgroups directly using
+the raw kernel APIs. That's entirely fine and supported, as long as proper
+*delegation* is followed. Delegation is a concept we inherited from cgroup v2,
+but we expose it on cgroup v1 too. Delegation means that some parts of the
+cgroup tree may be managed by different managers than others. As long as it is
+clear which manager manages which part of the tree each one can do within its
+sub-graph of the tree whatever it wants.
+
+Only sub-trees can be delegated (though whoever decides to request a sub-tree
+can delegate sub-sub-trees further to somebody else if they like). Delegation
+takes place at a specific cgroup: in systemd there's a `Delegate=` property you
+can set for a service or scope unit. If you do, it's the cut-off point for
+systemd's cgroup management: the unit itself is managed by systemd, i.e. all
+its attributes are managed exclusively by systemd, however your program may
+create/remove sub-cgroups inside it freely, and those then become exclusive
+property of your program, systemd won't touch them — all attributes of *those*
+sub-cgroups can be manipulated freely and exclusively by your program.
+
+By turning on the `Delegate=` property for a scope or service you get a few
+guarantees:
+
+1. systemd won't fiddle with your sub-tree of the cgroup tree anymore. It won't
+ change attributes of any cgroups below it, nor will it create or remove any
+ cgroups thereunder, nor migrate processes across the boundaries of that
+ sub-tree as it deems useful anymore.
+
+2. If your service makes use of the `User=` functionality, then the sub-tree
+ will be `chown()`ed to the indicated user so that it can correctly create
+ cgroups below it. Note however that systemd will do that only in the unified
+ hierarchy (in unified and hybrid mode) as well as on systemd's own private
+ hierarchy (in legacy and hybrid mode). It won't pass ownership of the legacy
+   controller hierarchies. Delegation to less privileged processes is not safe
+ in cgroup v1 (as a limitation of the kernel), hence systemd won't facilitate
+ access to it.
+
+3. Any BPF IP filter programs systemd installs will be installed with
+ `BPF_F_ALLOW_MULTI` so that your program can install additional ones.
+
+In unit files the `Delegate=` property is superficially exposed as
+boolean. However, since v236 it optionally takes a list of controller names
+instead. If so, delegation is requested for listed controllers
+specifically. Note that this only encodes a request. Depending on various
+parameters it might happen that your service actually will get fewer
+controllers delegated (for example, because the controller is not available on
+the current kernel or was turned off) or more. If no list is specified
+(i.e. the property simply set to `yes`) then all available controllers are
+delegated.
+
+Let's stress one thing: delegation is available on scope and service units
+only. It's expressly not available on slice units. Why? Because slice units are
+our *inner* nodes of the cgroup trees and we freely attach services and scopes
+to them. If we'd allow delegation on slice units then this would mean that
+both systemd and your own manager would create/delete cgroups below the slice
+unit and that conflicts with the single-writer rule.
+
+So, if you want to do your own raw cgroups kernel level access, then allocate a
+scope unit, or a service unit (or just use the service unit you already have
+for your service code), and turn on delegation for it.
+
+(OK, here's one caveat: if you turn on delegation for a service, and that
+service has `ExecStartPost=`, `ExecReload=`, `ExecStop=` or `ExecStopPost=`
+set, then these commands will be executed within the `.control/` sub-cgroup of
+your service's cgroup. This is necessary because by turning on delegation we
+have to assume that the cgroup delegated to your service is now an *inner*
+cgroup, which means that it may not directly contain any processes. Hence, if
+your service has any of these four settings set, you must be prepared that a
+`.control/` subcgroup might appear, managed by the service manager. This also
+means that your service code should have moved itself further down the cgroup
+tree by the time it notifies the service manager about start-up readiness, so
+that the service's main cgroup is definitely an inner node by the time the
+service manager might start `ExecStartPost=`.)
+
+## Three Scenarios
+
+Let's say you write a container manager, and you wonder what to do regarding
+cgroups for it, as you want your manager to be able to run on systemd systems.
+
+You basically have three options:
+
+1. 😊 The *integration-is-good* option. For this, you register each container
+ you have either as a systemd service (i.e. let systemd invoke the executor
+ binary for you) or a systemd scope (i.e. your manager executes the binary
+   directly, but then tells systemd about it). In this mode the administrator
+ can use the usual systemd resource management and reporting commands
+ individually on those containers. By turning on `Delegate=` for these scopes
+ or services you make it possible to run cgroup-enabled programs in your
+ containers, for example a nested systemd instance. This option has two
+ sub-options:
+
+ a. You transiently register the service or scope by directly contacting
+ systemd via D-Bus. In this case systemd will just manage the unit for you
+ and nothing else.
+
+ b. Instead you register the service or scope through `systemd-machined`
+ (also via D-Bus). This mini-daemon is basically just a proxy for the same
+ operations as in a. The main benefit of this: this way you let the system
+ know that what you are registering is a container, and this opens up
+ certain additional integration points. For example, `journalctl -M` can
+ then be used to directly look into any container's journal logs (should
+ the container run systemd inside), or `systemctl -M` can be used to
+ directly invoke systemd operations inside the containers. Moreover tools
+ like "ps" can then show you to which container a process belongs (`ps -eo
+ pid,comm,machine`), and even gnome-system-monitor supports it.
+
+2. 🙁 The *i-like-islands* option. If all you care about is your own cgroup tree,
+   and you want to have as little to do with systemd as possible and have no
+ interest in integration with the rest of the system, then this is a valid
+ option. For this all you have to do is turn on `Delegate=` for your main
+ manager daemon. Then figure out the cgroup systemd placed your daemon in:
+ you can now freely create sub-cgroups beneath it. Don't forget the
+ *no-processes-in-inner-nodes* rule however: you have to move your main
+ daemon process out of that cgroup (and into a sub-cgroup) before you can
+ start further processes in any of your sub-cgroups.
+
+3. 🙁 The *i-like-continents* option. In this option you'd leave your manager
+ daemon where it is, and would not turn on delegation on its unit. However,
+ as first thing you register a new scope unit with systemd, and that scope
+ unit would have `Delegate=` turned on, and then you place all your
+ containers underneath it. From systemd's PoV there'd be two units: your
+ manager service and the big scope that contains all your containers in one.
+
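For the *i-like-islands* option, the "figure out your cgroup, then move yourself into a sub-cgroup" steps might be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the `/proc/self/cgroup` line format (`ID:controllers:path`) is the one documented in cgroups(7), and on a real unified-mode system the directory passed to `move_into_subgroup` would be `/sys/fs/cgroup` followed by the path returned by `own_cgroup`.

```python
import os


def own_cgroup(proc_cgroup_text):
    """Return this process's cgroup path from /proc/self/cgroup content."""
    named = None
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        # The cgroup v2 entry has hierarchy ID 0 and no controller list.
        if hierarchy_id == "0" and controllers == "":
            return path
        # On cgroup v1, systemd tracks units in its own named hierarchy.
        if controllers == "name=systemd":
            named = path
    return named


def move_into_subgroup(cgroup_dir, name, pid):
    """Create a child cgroup and migrate `pid` into it."""
    child = os.path.join(cgroup_dir, name)
    os.makedirs(child, exist_ok=True)
    # Writing a PID to cgroup.procs moves that process into the cgroup.
    with open(os.path.join(child, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return child


sample = "0::/system.slice/mymanager.service\n"
print(own_cgroup(sample))
```

Once the main daemon has moved itself into a child such as `supervisor/` (an arbitrary name), the delegated cgroup itself contains no processes anymore, and further sub-cgroups for containers can be populated without violating the no-processes-in-inner-nodes rule.
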
+BTW: if for whatever reason you say "I hate D-Bus, I'll never call any D-Bus
+API, kthxbye", then options #1 and #3 are not available, as they generally
+involve talking to systemd from your program code, via D-Bus. You still have
+option #2 in that case however, as you can simply set `Delegate=` in your
+service's unit file and you are done and have your own sub-tree. In fact, #2 is
+the one option that allows you to completely ignore systemd's existence: you
+can entirely generically follow the single rule that you just use the cgroup
+you are started in, and everything below it, whatever that might be. That said,
+maybe if you dislike D-Bus and systemd that much, the better approach might be
+to work on that, and widen your horizon a bit. You are welcome.
+
+## Controller Support
+
+systemd supports a number of controllers (but not all). Specifically, supported
+are:
+
+* on cgroup v1: `cpu`, `cpuacct`, `blkio`, `memory`, `devices`, `pids`
+* on cgroup v2: `cpu`, `io`, `memory`, `pids`
+
+It is our intention to natively support all cgroup v2 controllers as they are
+added to the kernel. However, regarding cgroup v1: at this point we will not
+add support for any other controllers anymore. This means systemd currently
+does not and will never manage the following controllers on cgroup v1:
+`freezer`, `cpuset`, `net_cls`, `perf_event`, `net_prio`, `hugetlb`. Why not?
+Depending on the case, either their API semantics or implementations aren't
+really usable, or it's very clear they have no future on cgroup v2, and we
+won't add new code for stuff that clearly has no future.
+
+Effectively this means that all those mentioned cgroup v1 controllers are up
+for grabs: systemd won't manage them, and hence won't delegate them to your
+code (however, systemd will still mount their hierarchies, simply because it
+mounts all controller hierarchies it finds available in the kernel). If you
+decide to use them, then that's fine, but systemd won't help you with it (but
+also not interfere with it). To be nice to other tenants it might be wise to
+replicate the cgroup hierarchies of the other controllers in them too however,
+but of course that's between you and those other tenants, and systemd won't
+care. Replicating the cgroup hierarchies in those unsupported controllers would
+mean replicating the full cgroup paths in them, and hence the prefixing
+`.slice` components too, otherwise the hierarchies will start being orthogonal
+after all, and that's not really desirable. One more thing: systemd will clean
+up after you in the hierarchies it manages: if your daemon goes down, its
+cgroups will be removed too. You basically get the guarantee that you start
+with a pristine cgroup sub-tree for your service or scope whenever it is
+started. This is not the case however in the hierarchies systemd doesn't
+manage. This means that your programs should be ready to deal with left-over
+cgroups in them — from previous runs, and be extra careful with them as they
+might still carry settings that might not be valid anymore.
+
+Note a particular asymmetry here: if your systemd version doesn't support a
+specific controller on cgroup v1 you can still make use of it for delegation,
+by directly fiddling with its hierarchy and replicating the cgroup tree there
+as necessary (as suggested above). However, on cgroup v2 this is different:
+separately mounted hierarchies are not available, and delegation always has to
+happen through systemd itself. This means: when you update your kernel and it
+adds a new, so far unseen controller, and you want to use it for delegation,
+then you also need to update systemd to a version that groks it.
+
+## systemd as Container Payload
+
+systemd can happily run as a container payload's PID 1. Note that systemd
+unconditionally needs write access to the cgroup tree however, hence you need
+to delegate a sub-tree to it. Note that there's nothing too special you have to
+do beyond that: just invoke systemd as PID 1 inside the root of the delegated
+cgroup sub-tree, and it will figure out the rest: it will determine the cgroup
+it is running in and take possession of it. It won't interfere with any cgroup
+outside of the sub-tree it was invoked in. Use of `CLONE_NEWCGROUP` is hence
+optional (but of course wise).
+
+Note one particular asymmetry here though: systemd will try to take possession
+of the root cgroup you pass to it *in* *full*, i.e. it will not only
+create/remove child cgroups below it, it will also attempt to manage the
+attributes of it. OTOH as mentioned above, when delegating a cgroup tree to
+somebody else it only passes the rights to create/remove sub-cgroups, but will
+insist on managing the delegated cgroup tree's top-level attributes. Or in
+other words: systemd is *greedy* when accepting delegated cgroup trees and also
+*greedy* when delegating them to others: it insists on managing attributes on
+the specific cgroup in both cases. A container manager that is itself a payload
+of a host systemd which wants to run a systemd as its own container payload
+instead hence needs to insert an extra level in the hierarchy in between, so
+that the systemd on the host and the one in the container won't fight for the
+attributes. That said, you likely should do that anyway, due to the
+no-processes-in-inner-cgroups rule, see below.
+
+When systemd runs as container payload it will make use of all hierarchies it
+has write access to. For legacy mode you need to make at least
+`/sys/fs/cgroup/systemd/` available, all other hierarchies are optional. For
+hybrid mode you need to add `/sys/fs/cgroup/unified/`. Finally, for fully
+unified you (of course, I guess) need to provide only `/sys/fs/cgroup/` itself.
+
+## Some Dos
+
+1. ⚡ If you go for implementation option 1a or 1b (as in the list above), then
+ each of your containers will have its own systemd-managed unit and hence
+ cgroup with possibly further sub-cgroups below. Typically the first process
+ running in that unit will be some kind of executor program, which will in
+ turn fork off the payload processes of the container. In this case don't
+ forget that there are two levels of delegation involved: first, systemd
+ delegates a cgroup sub-tree to your executor. And then your executor should
+ delegate a sub-tree further down to the container payload. Oh, and because
+ of the no-process-in-inner-nodes rule, your executor needs to migrate itself
+ to a sub-cgroup of the cgroup it got delegated, too. Most likely you hence
+ want a two-pronged approach: below the cgroup you got started in, you want
+ one cgroup maybe called `supervisor/` where your manager runs in and then
+ for each container a sibling cgroup of that maybe called `payload-xyz/`.
+
+2. ⚡ Don't forget that the cgroups you create have to have names that are
+ suitable as UNIX file names, and that they live in the same namespace as the
+ various kernel attribute files. Hence, when you want to allow the user
+ arbitrary naming, you might need to escape some of the names (for example,
+ you really don't want to create a cgroup named `tasks`, just because the
+ user created a container by that name, because `tasks` after all is a magic
+ attribute in cgroup v1, and your `mkdir()` will hence fail with `EEXIST`). In
+ systemd we do escaping by prefixing names that might collide with a kernel
+ attribute name with an underscore. You might want to do the same, but this
+ is really up to you how you do it. Just do it, and be careful.
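+
+As a rough illustration of the escaping idea, here is a minimal C sketch. The
+function name `escape_cgroup_name()` and the list of reserved names are
+hypothetical and deliberately incomplete; this is not systemd's actual
+implementation, which covers more cases.
+
+```c
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical, incomplete list of names that collide with kernel attribute files. */
+static const char *const reserved_names[] = {
+        "tasks",
+        "notify_on_release",
+        "release_agent",
+};
+
+static bool name_needs_escape(const char *name) {
+        size_t i;
+
+        /* Names already starting with "_" are escaped too, so that unescaping stays unambiguous. */
+        if (name[0] == '_')
+                return true;
+
+        /* Generic attribute files such as "cgroup.procs". */
+        if (strncmp(name, "cgroup.", 7) == 0)
+                return true;
+
+        for (i = 0; i < sizeof(reserved_names) / sizeof(reserved_names[0]); i++)
+                if (strcmp(name, reserved_names[i]) == 0)
+                        return true;
+
+        return false;
+}
+
+/* Returns 0 and a newly allocated string in *ret on success, negative errno on failure. */
+static int escape_cgroup_name(const char *name, char **ret) {
+        size_t l;
+        bool escape;
+        char *s;
+
+        assert(name);
+        assert(ret);
+
+        /* Reject names that are not usable as a single file name. */
+        if (name[0] == 0 || strchr(name, '/') || strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
+                return -EINVAL;
+
+        escape = name_needs_escape(name);
+        l = strlen(name);
+
+        s = malloc(l + escape + 1);
+        if (!s)
+                return -ENOMEM;
+
+        if (escape)
+                s[0] = '_';
+        memcpy(s + escape, name, l + 1); /* copies the trailing NUL as well */
+
+        *ret = s; /* only written on success */
+        return 0;
+}
+```
+
+With this sketch a container named `tasks` would get the cgroup name `_tasks`,
+while a name such as `payload-xyz` is passed through unchanged.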
+
+## Some Don'ts
+
+1. 🚫 Never create your own cgroups below arbitrary cgroups systemd manages, i.e.
+ cgroups you haven't set `Delegate=` in. Specifically: 🔥 don't create your
+ own cgroups below the root cgroup 🔥. That's owned by systemd, and you will
+ step on systemd's toes if you ignore that, and systemd will step on
+ yours. Get your own delegated sub-tree, you may create as many cgroups there
+ as you like. Seriously, if you create cgroups directly in the cgroup root,
+ then all you do is ask for trouble.
+
+2. 🚫 Don't attempt to set `Delegate=` in slice units, and in particular not in
+ `-.slice`. It's not supported, and will generate an error.
+
+3. 🚫 Never *write* to any of the attributes of a cgroup systemd created for
+ you. It's systemd's private property. You are welcome to manipulate the
+ attributes of cgroups you created in your own delegated sub-tree, but the
+ cgroup tree of systemd itself is off limits for you. It's fine to *read*
+ from any attribute you like however. That's totally OK and welcome.
+
+4. 🚫 When not using `CLONE_NEWCGROUP` when delegating a sub-tree to a
+ container payload running systemd, then don't get the idea that you can bind
+ mount only a sub-tree of the host's cgroup tree into the container. Part of
+ the cgroup API is that `/proc/$PID/cgroup` reports the cgroup path of every
+ process, and hence any path below `/sys/fs/cgroup/` needs to match what
+ `/proc/$PID/cgroup` of the payload processes reports. What you can do safely
+ however, is mount the upper parts of the cgroup tree read-only (or even
+ replace the middle bits with an intermediary `tmpfs` — but be careful not to
+ break the `statfs()` detection logic discussed above), as long as the path
+ to the delegated sub-tree remains accessible as-is.
+
+5. ⚡ Currently, the algorithm for mapping between slice/scope/service unit
+ naming and their cgroup paths is not considered public API of systemd, and
+ may change in future versions. This means: it's best to avoid implementing a
+ local logic of translating cgroup paths to slice/scope/service names in your
+ program, or vice versa — it's likely going to break sooner or later. Use the
+ appropriate D-Bus API calls for that instead, so that systemd translates
+ this for you. (Specifically: each Unit object has a `ControlGroup` property
+ to get the cgroup for a unit. The method `GetUnitByControlGroup()` may be
+ used to get the unit for a cgroup.)
+
+6. ⚡ Think twice before delegating cgroup v1 controllers to less privileged
+ containers. It's not safe, you basically allow your containers to freeze the
+ system with that and worse. Delegation is a strong point of cgroup v2 though,
+ and there it's safe to treat delegation boundaries as privilege boundaries.
+
+And that's it for now. If you have further questions, refer to the systemd
+mailing list.
+
+— Berlin, 2018-04-20
diff --git a/docs/CNAME b/docs/CNAME
new file mode 100644
index 0000000..cdcf4d9
--- /dev/null
+++ b/docs/CNAME
@@ -0,0 +1 @@
+systemd.io
\ No newline at end of file
diff --git a/docs/CODE_OF_CONDUCT.md b/docs/CODE_OF_CONDUCT.md
new file mode 100644
index 0000000..da290ec
--- /dev/null
+++ b/docs/CODE_OF_CONDUCT.md
@@ -0,0 +1,18 @@
+---
+title: The systemd Community Conduct Guidelines
+---
+
+# The systemd Community Conduct Guidelines
+
+This document provides community guidelines for a safe, respectful, productive, and collaborative place for any person who is willing to contribute to systemd. It applies to all “collaborative spaces”, which are defined as community communications channels (such as mailing lists, submitted patches, commit comments, etc.).
+
+- Participants will be tolerant of opposing views.
+- Participants must ensure that their language and actions are free of personal attacks and disparaging personal remarks.
+- When interpreting the words and actions of others, participants should always assume good intentions.
+- Behaviour which can be reasonably considered harassment will not be tolerated.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at systemd-conduct@googlegroups.com. This team currently consists of David Strauss <<systemd-conduct@davidstrauss.net>>, Ekaterina Gerasimova (Kat) <<Kittykat3756@gmail.com>>, and Zbigniew Jędrzejewski-Szmek <<zbyszek@in.waw.pl>>. In the unfortunate event that you wish to make a complaint against one of the members, you may instead contact any of the other members individually.
+
+All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident.
diff --git a/docs/CODE_QUALITY.md b/docs/CODE_QUALITY.md
new file mode 100644
index 0000000..18363f0
--- /dev/null
+++ b/docs/CODE_QUALITY.md
@@ -0,0 +1,68 @@
+---
+title: Code Quality Tools
+---
+
+# Code Quality Tools
+
+The systemd project has a number of code quality tools set up in the source
+tree and on the GitHub infrastructure. Here's a non-exhaustive list of the
+available functionality:
+
+1. Use `ninja -C build test` to run the unit tests. Some tests are skipped if
+ no privileges are available, hence consider also running them with `sudo
+ ninja -C build test`. A couple of unit tests are considered "unsafe" (as
+ they change system state); to run those too, build with `meson
+ -Dtests=unsafe`. Finally, some unit tests are considered to be very slow; to
+ build them too, use `meson -Dslow-tests=true`. (Note that there are a couple
+ of manual tests in addition to these unit tests.)
+
+2. Use `./test/run-integration-tests.sh` to run the full integration test
+ suite. This will build OS images with a number of integration tests and run
+ them in nspawn and qemu. Requires root.
+
+3. Use `./coccinelle/run-coccinelle.sh` to run all
+ [Coccinelle](http://coccinelle.lip6.fr/) semantic patch scripts we ship. The
+ output will show false positives, hence take it with a pinch of salt.
+
+4. Use `./tools/find-double-newline.sh recdiff` to find double newlines. Use
+ `./tools/find-double-newline.sh recpatch` to fix them. Take this with a grain
+ of salt, in particular as we generally leave foreign header files we include in
+ our tree unmodified, if possible.
+
+5. Similarly, use `./tools/find-tabs.sh recdiff` to find TABs, and
+ `./tools/find-tabs.sh recpatch` to fix them. (Again, grain of salt, foreign
+ headers should usually be left unmodified.)
+
+6. Use `ninja -C build check-api-docs` to compare the list of exported
+ symbols of `libsystemd.so` and `libudev.so` with the list of man pages. Symbols
+ lacking documentation are highlighted.
+
+7. Use `ninja -C build hwdb-update` to automatically download and import the
+ PCI, USB and OUI databases into hwdb.
+
+8. Use `ninja -C build man/update-man-rules` to update the meson rules for
+ building man pages automatically from the docbook XML files included in
+ `man/`.
+
+9. There are multiple CI systems in use that run on every GitHub PR submission.
+
+10. [Coverity](https://scan.coverity.com/) is analyzing systemd master in
+ regular intervals. The reports are available
+ [online](https://scan.coverity.com/projects/systemd).
+
+11. [oss-fuzz](https://oss-fuzz.com/) is continuously fuzzing the
+ codebase. Reports are available
+ [online](https://oss-fuzz.com/v2/testcases?project=systemd).
+
+12. Our tree includes `.editorconfig`, `.dir-locals.el` and `.vimrc` files, to
+ ensure that editors follow the right indentation styles automatically.
+
+13. When building systemd from a git checkout the build scripts will
+ automatically enable a git commit hook that ensures whitespace cleanliness.
+
+14. [LGTM](https://lgtm.com/) analyzes every commit pushed to master. The list
+ of active alerts can be found
+ [here](https://lgtm.com/projects/g/systemd/systemd/alerts/?mode=list).
+
+Access to Coverity and oss-fuzz reports is limited. Please reach out to the
+maintainers if you need access.
diff --git a/docs/CODING_STYLE.md b/docs/CODING_STYLE.md
new file mode 100644
index 0000000..7bad3f5
--- /dev/null
+++ b/docs/CODING_STYLE.md
@@ -0,0 +1,524 @@
+---
+title: Coding Style
+---
+
+# Coding Style
+
+- 8ch indent, no tabs, except for files in `man/` which are 2ch indent,
+ and still no tabs.
+
+- We prefer `/* comments */` over `// comments` in code you commit, please. This
+ way `// comments` are left for developers to use for local, temporary
+ commenting of code for debug purposes (i.e. uncommittable stuff), making such
+ comments easily discernible from explanatory, documenting code comments
+ (i.e. committable stuff).
+
+- Don't break code lines too eagerly. We do **not** force line breaks at 80ch,
+ all of today's screens should be much larger than that. But then again, don't
+ overdo it, ~109ch should be enough really. The `.editorconfig`, `.vimrc` and
+ `.dir-locals.el` files contained in the repository will set this limit up for
+ you automatically, if you let them (as well as a few other things).
+
+- Variables and functions **must** be static, unless they have a
+ prototype, and are supposed to be exported.
+
+- structs in `PascalCase` (with exceptions, such as public API structs),
+ variables and functions in `snake_case`.
+
+- The destructors always deregister the object from the next bigger
+ object, not the other way around.
+
+- To minimize strict aliasing violations, we prefer unions over casting.
+
+- For robustness reasons, destructors should be able to destruct
+ half-initialized objects, too.
+
+- Error codes are returned as negative `Exxx`. e.g. `return -EINVAL`. There
+ are some exceptions: for constructors, it is OK to return `NULL` on
+ OOM. For lookup functions, `NULL` is fine too for "not found".
+
+ Be strict with this. When you write a function that can fail due to
+ more than one cause, it *really* should have an `int` as the return value
+ for the error code.
+
+- Do not bother with error checking whether writing to stdout/stderr
+ worked.
+
+- Do not log errors from "library" code, only do so from "main
+ program" code. (With one exception: it is OK to log with DEBUG level
+ from any code, with the exception of maybe inner loops).
+
+- Always check OOM. There is no excuse. In program code, you can use
+ `log_oom()` for then printing a short message, but not in "library" code.
+
+- Do not issue NSS requests (that includes user name and host name
+ lookups) from PID 1 as this might trigger deadlocks when those
+ lookups involve synchronously talking to services that we would need
+ to start up.
+
+- Do not synchronously talk to any other service from PID 1, due to
+ risk of deadlocks.
+
+- Avoid fixed-size string buffers, unless you really know the maximum
+ size and that maximum size is small. They are a source of errors,
+ since they possibly result in truncated strings. It is often nicer
+ to use dynamic memory, `alloca()` or VLAs. If you do allocate fixed-size
+ strings on the stack, then it is probably only OK if you either
+ use a maximum size such as `LINE_MAX`, or count in detail the maximum
+ size a string can have. (`DECIMAL_STR_MAX` and `DECIMAL_STR_WIDTH`
+ macros are your friends for this!)
+
+ Or in other words, if you use `char buf[256]` then you are likely
+ doing something wrong!
+
+- Stay uniform. For example, always use `usec_t` for time
+ values. Do not mix `usec` with `msec` or other units.
+
+- Make use of `_cleanup_free_` and friends. It makes your code much
+ nicer to read (and shorter)!
+
+- Be exceptionally careful when formatting and parsing floating point
+ numbers. Their syntax is locale dependent (i.e. `5.000` in en_US is
+ generally understood as 5, while in de_DE as 5000).
+
+- Try to use this:
+
+ ```c
+ void foo() {
+ }
+ ```
+
+ instead of this:
+
+ ```c
+ void foo()
+ {
+ }
+ ```
+
+ But it is OK if you do not.
+
+- Single-line `if` blocks should not be enclosed in `{}`. Use this:
+
+ ```c
+ if (foobar)
+ waldo();
+ ```
+
+ instead of this:
+
+ ```c
+ if (foobar) {
+ waldo();
+ }
+ ```
+
+- Do not write `foo ()`, write `foo()`.
+
+- Please use `streq()` and `strneq()` instead of `strcmp()`, `strncmp()` where
+ applicable (i.e. wherever you just care about equality/inequality, not about
+ the sorting order).
+
+- Preferably allocate stack variables on the top of the block:
+
+ ```c
+ {
+ int a, b;
+
+ a = 5;
+ b = a;
+ }
+ ```
+
+- Unless you allocate an array, `double` is always a better choice
+ than `float`. Processors speak `double` natively anyway, so there is
+ no speed benefit, and on calls like `printf()` `float`s get promoted
+ to `double`s anyway, so there is no point.
+
+- Do not mix function invocations with variable definitions in one
+ line. Wrong:
+
+ ```c
+ {
+ int a = foobar();
+ uint64_t x = 7;
+ }
+ ```
+
+ Right:
+
+ ```c
+ {
+ int a;
+ uint64_t x = 7;
+
+ a = foobar();
+ }
+ ```
+
+- Use `goto` for cleaning up, and only use it for that. i.e. you may
+ only jump to the end of a function, and little else. Never jump
+ backwards!
+
+- Think about the types you use. If a value cannot sensibly be
+ negative, do not use `int`, but use `unsigned`.
+
+- Use `char` only for actual characters. Use `uint8_t` or `int8_t`
+ when you actually mean a byte-sized unsigned or signed
+ integer. When referring to a generic byte, we generally prefer the
+ unsigned variant `uint8_t`. Do not use types based on `short`. They
+ *never* make sense. Use `int`, `long`, `long long`, all in
+ unsigned and signed fashion, and the fixed-size types
+ `uint8_t`, `uint16_t`, `uint32_t`, `uint64_t`, `int8_t`, `int16_t`, `int32_t` and so on,
+ as well as `size_t`, but nothing else. Do not use kernel types like
+ `u32` and so on, leave that to the kernel.
+
+- Public API calls (i.e. functions exported by our shared libraries)
+ must be marked `_public_` and need to be prefixed with `sd_`. No
+ other functions should be prefixed like that.
+
+- In public API calls, you **must** validate all your input arguments for
+ programming error with `assert_return()` and return a sensible return
+ code. In all other calls, it is recommended to check for programming
+ errors with a more brutal `assert()`. We are more forgiving to public
+ users than for ourselves! Note that `assert()` and `assert_return()`
+ really only should be used for detecting programming errors, not for
+ runtime errors. `assert()` and `assert_return()` by usage of `_likely_()`
+ inform the compiler that it should not expect these checks to fail,
+ and they inform fellow programmers about the expected validity and
+ range of parameters.
+
+- Never use `strtol()`, `atoi()` and similar calls. Use `safe_atoli()`,
+ `safe_atou32()` and suchlike instead. They are much nicer to use in
+ most cases and correctly check for parsing errors.
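+
+ To illustrate what such wrappers spare callers from, here is a minimal sketch
+ of a strict parser built directly on `strtol()`. The name
+ `parse_int_strict()` is hypothetical, and this is not how `safe_atoi()` and
+ friends are implemented; it only shows the checks that are easy to forget:
+
+ ```c
+ #include <assert.h>
+ #include <errno.h>
+ #include <limits.h>
+ #include <stdlib.h>
+
+ /* Hypothetical helper: parses a decimal int, returns 0 on success or a negative errno. */
+ static int parse_int_strict(const char *s, int *ret) {
+         char *end = NULL;
+         long l;
+
+         assert(s);
+         assert(ret);
+
+         errno = 0;
+         l = strtol(s, &end, 10);
+         if (errno > 0)
+                 return -errno; /* e.g. ERANGE on overflow of long */
+         if (end == s || *end != 0)
+                 return -EINVAL; /* no digits at all, or trailing garbage */
+         if (l < INT_MIN || l > INT_MAX)
+                 return -ERANGE;
+
+         *ret = (int) l; /* only written on success */
+         return 0;
+ }
+ ```
+
+ With this, `"12abc"` and `""` are rejected with `-EINVAL`, whereas `atoi()`
+ would silently return 12 and 0 for them.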
+
+- For every function you add, think about whether it is a "logging"
+ function or a "non-logging" function. "Logging" functions do logging
+ on their own, "non-logging" functions never log on their own and
+ expect their callers to log. All functions in "library" code,
+ i.e. in `src/shared/` and suchlike must be "non-logging". Every time a
+ "logging" function calls a "non-logging" function, it should log
+ about the resulting errors. If a "logging" function calls another
+ "logging" function, then it should not generate log messages, so
+ that log messages are not generated twice for the same errors.
+
+- If possible, do a combined log & return operation:
+
+ ```c
+ r = operation(...);
+ if (r < 0)
+ return log_(error|warning|notice|...)_errno(r, "Failed to ...: %m");
+ ```
+
+ If the error value is "synthetic", i.e. it was not received from
+ the called function, use `SYNTHETIC_ERRNO` wrapper to tell the logging
+ system to not log the errno value, but still return it:
+
+ ```c
+ n = read(..., s, sizeof s);
+ if (n != sizeof s)
+ return log_error_errno(SYNTHETIC_ERRNO(EIO), "Failed to read ...");
+ ```
+
+- Avoid static variables, except for caches and very few other
+ cases. Think about thread-safety! While most of our code is never
+ used in threaded environments, at least the library code should make
+ sure it works correctly in them. Instead of doing a lot of locking
+ for that, we tend to prefer using TLS to do per-thread caching (which
+ only works for small, fixed-size cache objects), or we disable
+ caching for any thread that is not the main thread. Use
+ `is_main_thread()` to detect whether the calling thread is the main
+ thread.
+
+- Command line option parsing:
+ - Do not print full `help()` on error, be specific about the error.
+ - Do not print messages to stdout on error.
+ - Do not POSIX_ME_HARDER unless necessary, i.e. avoid `+` in option string.
+
+- Do not write functions that clobber call-by-reference variables on
+ failure. Use temporary variables for these cases and change the
+ passed in variables only on success.
+
+- When you allocate a file descriptor, it should be made `O_CLOEXEC`
+ right from the beginning, as none of our files should leak to forked
+ binaries by default. Hence, whenever you open a file, `O_CLOEXEC` must
+ be specified, right from the beginning. This also applies to
+ sockets. Effectively, this means that all invocations to:
+
+ - `open()` must get `O_CLOEXEC` passed,
+ - `socket()` and `socketpair()` must get `SOCK_CLOEXEC` passed,
+ - `recvmsg()` must get `MSG_CMSG_CLOEXEC` set,
+ - `F_DUPFD_CLOEXEC` should be used instead of `F_DUPFD`, and so on,
+ - invocations of `fopen()` should take `e`.
+
+- We never use the POSIX version of `basename()` (which glibc defines in
+ `libgen.h`), only the GNU version (which glibc defines in `string.h`).
+ The only reason to include `libgen.h` is because `dirname()`
+ is needed. Every time you need that please immediately undefine
+ `basename()`, and add a comment about it, so that no code ever ends up
+ using the POSIX version!
+
+- Use the bool type for booleans, not integers. One exception: in public
+ headers (i.e. those in `src/systemd/sd-*.h`) use integers after all, as `bool`
+ is C99 and in our public APIs we try to stick to C89 (with a few extensions).
+
+- When you invoke certain calls like `unlink()`, or `mkdir_p()` and you
+ know it is safe to ignore the error it might return (because a later
+ call would detect the failure anyway, or because the error is in an
+ error path and you thus couldn't do anything about it anyway), then
+ make this clear by casting the invocation explicitly to `(void)`. Code
+ checks like Coverity understand that, and will not complain about
+ ignored error codes. Hence, please use this:
+
+ ```c
+ (void) unlink("/foo/bar/baz");
+ ```
+
+ instead of just this:
+
+ ```c
+ unlink("/foo/bar/baz");
+ ```
+
+ Don't cast function calls to `(void)` that return no error
+ conditions. Specifically, the various `xyz_unref()` calls that return a `NULL`
+ object shouldn't be cast to `(void)`, since not using the return value does not
+ hide any errors.
+
+- Don't invoke `exit()`, ever. It is not a replacement for proper error
+ handling. Please escalate errors up your call chain, and use normal
+ `return` to exit from the main function of a process. If you
+ `fork()`ed off a child process, please use `_exit()` instead of `exit()`,
+ so that the exit handlers are not run.
+
+- Please never use `dup()`. Use `fcntl(fd, F_DUPFD_CLOEXEC, 3)`
+ instead. For two reasons: first, you want `O_CLOEXEC` set on the new `fd`
+ (see above). Second, `dup()` will happily duplicate your `fd` as 0, 1,
+ 2, i.e. stdin, stdout, stderr, should those `fd`s be closed. Given the
+ special semantics of those `fd`s, it's probably a good idea to avoid
+ them. `F_DUPFD_CLOEXEC` with `3` as parameter avoids them.
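+
+ A minimal sketch of such a `dup()` replacement follows. The helper name
+ `fd_dup_cloexec()` is made up for this example; only the `fcntl()` call
+ itself is what the rule above asks for:
+
+ ```c
+ #include <assert.h>
+ #include <errno.h>
+ #include <fcntl.h>
+
+ /* Hypothetical helper: duplicates fd with close-on-exec set, never returning 0, 1 or 2. */
+ static int fd_dup_cloexec(int fd) {
+         int nfd;
+
+         assert(fd >= 0);
+
+         /* The third argument is the lowest fd number the kernel may pick. */
+         nfd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
+         if (nfd < 0)
+                 return -errno;
+
+         return nfd;
+ }
+ ```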
+
+- When you define a destructor or `unref()` call for an object, please
+ accept a `NULL` object and simply treat this as NOP. This is similar
+ to how libc `free()` works, which accepts `NULL` pointers and becomes a
+ NOP for them. By following this scheme a lot of `if` checks can be
+ removed before invoking your destructor, which makes the code
+ substantially more readable and robust.
+
+- Related to this: when you define a destructor or `unref()` call for an
+ object, please make it return the same type it takes and always
+ return `NULL` from it. This allows writing code like this:
+
+ ```c
+ p = foobar_unref(p);
+ ```
+
+ which will always work regardless if `p` is initialized or not, and
+ guarantees that `p` is `NULL` afterwards, all in just one line.
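+
+ Put together with the previous rule, a destructor could look like the
+ following sketch. The `Foobar` type and its fields are invented for
+ illustration only:
+
+ ```c
+ #include <assert.h>
+ #include <stdlib.h>
+ #include <string.h>
+
+ typedef struct Foobar {
+         char *name;
+         char *data;
+ } Foobar;
+
+ /* Accepts NULL and half-initialized objects, and always returns NULL. */
+ static Foobar *foobar_free(Foobar *p) {
+         if (!p)
+                 return NULL;
+
+         free(p->name); /* free(NULL) is a NOP, so unset fields are fine */
+         free(p->data);
+         free(p);
+         return NULL;
+ }
+
+ static Foobar *foobar_new(const char *name) {
+         Foobar *p;
+
+         assert(name);
+
+         p = calloc(1, sizeof(Foobar)); /* zero-initialized, so all pointers start as NULL */
+         if (!p)
+                 return NULL;
+
+         p->name = strdup(name);
+         if (!p->name)
+                 return foobar_free(p); /* destructs the half-initialized object, yields NULL */
+
+         return p;
+ }
+ ```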
+
+- Use `alloca()`, but never forget that it is not OK to invoke `alloca()`
+ within a loop or within function call parameters. `alloca()` memory is
+ released at the end of a function, and not at the end of a `{}`
+ block. Thus, if you invoke it in a loop, you keep increasing the
+ stack pointer without ever releasing memory again. (VLAs have better
+ behavior in this case, so consider using them as an alternative.)
+ Regarding not using `alloca()` within function parameters, see the
+ BUGS section of the `alloca(3)` man page.
+
+- Use `memzero()` or even better `zero()` instead of `memset(..., 0, ...)`
+
+- Instead of using `memzero()`/`memset()` to initialize structs allocated
+ on the stack, please try to use C99 structure initializers. It's
+ shorter, prettier and actually even faster at execution. Hence:
+
+ ```c
+ struct foobar t = {
+ .foo = 7,
+ .bar = "bazz",
+ };
+ ```
+
+ instead of:
+
+ ```c
+ struct foobar t;
+ zero(t);
+ t.foo = 7;
+ t.bar = "bazz";
+ ```
+
+- When returning a return code from `main()`, please preferably use
+ `EXIT_FAILURE` and `EXIT_SUCCESS` as defined by libc.
+
+- The order in which header files are included doesn't matter too
+ much. systemd-internal headers must not rely on an include order, so
+ it is safe to include them in any order possible.
+ However, to not clutter global includes, and to make sure internal
+ definitions will not affect global headers, please always include the
+ headers of external components first (these are all headers enclosed
+ in <>), followed by our own exported headers (usually everything
+ that's prefixed by `sd-`), and then followed by internal headers.
+ Furthermore, in all three groups, order all includes alphabetically
+ so duplicate includes can easily be detected.
+
+- To implement an endless loop, use `for (;;)` rather than `while (1)`.
+ The latter is a bit ugly anyway, since you probably really
+ meant `while (true)`. To avoid the discussion what the right
+ always-true expression for an infinite while loop is, our
+ recommendation is to simply write it without any such expression by
+ using `for (;;)`.
+
+- Never use the `off_t` type, and particularly avoid it in public
+ APIs. It's really weirdly defined, as it usually is 64-bit and we
+ don't support it any other way, but it could in theory also be
+ 32-bit. Which one it is depends on a compiler switch chosen by the
+ compiled program, which hence corrupts APIs using it unless they can
+ also follow the program's choice. Moreover, in systemd we should
+ parse values the same way on all architectures and cannot expose
+ `off_t` values over D-Bus. To avoid any confusion regarding conversion
+ and ABIs, always use simply `uint64_t` directly.
+
+- Commit message subject lines should be prefixed with an appropriate
+ component name of some kind. For example "journal: ", "nspawn: " and
+ so on.
+
+- Do not use "Signed-Off-By:" in your commit messages. That's a kernel
+ thing we don't do in the systemd project.
+
+- Avoid leaving long-running child processes around, i.e. `fork()`s that
+ are not followed quickly by an `execv()` in the child. Resource
+ management is unclear in this case, and memory CoW will result in
+ unexpected penalties in the parent much, much later on.
+
+- Don't block execution for arbitrary amounts of time using `usleep()`
+ or a similar call, unless you really know what you are doing. Just "giving
+ something some time", or so is a lazy excuse. Always wait for the
+ proper event, instead of doing time-based poll loops.
+
+- To determine the length of a constant string `"foo"`, don't bother with
+ `sizeof("foo")-1`, please use `strlen()` instead (both gcc and clang optimize
+ the call away for fixed strings). The only exception is when declaring an
+ array. In that case use `STRLEN`, which evaluates to a static constant and
+ doesn't force the compiler to create a VLA.
+
+- If you want to concatenate two or more strings, consider using `strjoina()`
+ or `strjoin()` rather than `asprintf()`, as the latter is a lot slower. This
+ matters particularly in inner loops (but note that `strjoina()` cannot be
+ used there).
+
+- Please avoid using global variables as much as you can. And if you
+ do use them make sure they are static at least, instead of
+ exported. Especially in library-like code it is important to avoid
+ global variables. Why are global variables bad? They usually hinder
+ generic reusability of code (since they break in threaded programs,
+ and usually would require locking there), and as the code using them
+ has side-effects make programs non-transparent. That said, there are
+ many cases where they explicitly make a lot of sense, and are OK to
+ use. For example, the log level and target in `log.c` is stored in a
+ global variable, and that's OK and probably expected by most. Also
+ in many cases we cache data in global variables. If you add more
+ caches like this, please be careful however, and think about
+ threading. Only use static variables if you are sure that
+ thread-safety doesn't matter in your case. Alternatively, consider
+ using TLS, which is pretty easy to use with gcc's `thread_local`
+ concept. It's also OK to store data that is inherently global in
+ global variables, for example data parsed from command lines, see
+ below.
+
+- If you parse a command line, and want to store the parsed parameters
+ in global variables, please consider prefixing their names with
+ `arg_`. We have been following this naming rule in most of our
+ tools, and we should continue to do so, as it makes it easy to
+ identify command line parameter variables, and makes it clear why it
+ is OK that they are global variables.
+
+- When exposing public C APIs, be careful what function parameters you make
+ `const`. For example, a parameter taking a context object should probably not
+ be `const`, even if you are writing an otherwise read-only accessor function
+ for it. The reason is that making it `const` fixates the contract that your
+ call won't alter the object ever, as part of the API. However, that's often
+ quite a promise, given that this even prohibits object-internal caching or
+ lazy initialization of object variables. Moreover, it's usually not too useful
+ for client applications. Hence, please be careful and avoid `const` on object
+ parameters, unless you are very sure `const` is appropriate.
+
+- Make sure to enforce limits on every user controllable resource. If the user
+ can allocate resources in your code, your code must enforce some form of
+ limits after which it will refuse operation. It's fine if it is hard-coded (at
+ least initially), but it needs to be there. This is particularly important
+ for objects that unprivileged users may allocate, but also matters for
+ everything else any user may allocate.
+
+- `htonl()`/`ntohl()` and `htons()`/`ntohs()` are weird. Please use `htobe32()` and
+ `htobe16()` instead: they are much more descriptive and say what is actually
+ happening. After all, `htonl()` and `htons()` don't operate on `long`s and
+ `short`s as their names would suggest, but on `uint32_t` and `uint16_t`. Also,
+ "network byte order" is just a weird name for "big endian", hence we might
+ as well call it "big endian" right away.
+
+- You might wonder what kind of common code belongs in `src/shared/` and what
+ belongs in `src/basic/`. The split is like this: anything that is used to
+ implement the public shared objects we provide (sd-bus, sd-login, sd-id128,
+ nss-systemd, nss-mymachines, nss-resolve, nss-myhostname, pam_systemd), must
+ be located in `src/basic` (those objects are not allowed to link to
+ libsystemd-shared.so). Conversely, anything which is shared between multiple
+ components and does not need to be in `src/basic/`, should be in
+ `src/shared/`.
+
+ To summarize:
+
+ `src/basic/`
+ - may be used by all code in the tree
+ - may not use any code outside of `src/basic/`
+
+ `src/libsystemd/`
+ - may be used by all code in the tree, except for code in `src/basic/`
+ - may not use any code outside of `src/basic/`, `src/libsystemd/`
+
+ `src/shared/`
+ - may be used by all code in the tree, except for code in `src/basic/`,
+ `src/libsystemd/`, `src/nss-*`, `src/login/pam_systemd.*`, and files under
+ `src/journal/` that end up in `libjournal-client.a` convenience library.
+ - may not use any code outside of `src/basic/`, `src/libsystemd/`, `src/shared/`
+
+- Our focus is on the GNU libc (glibc), not any other libcs. If other libcs are
+ incompatible with glibc it's on them. However, if there are equivalent POSIX
+ and Linux/GNU-specific APIs, we generally prefer the POSIX APIs. If there
+ aren't, we are happy to use GNU or Linux APIs, and expect non-GNU
+ implementations of libc to catch up with glibc.
+
+- Whenever installing a signal handler, make sure to set `SA_RESTART` for it, so
+ that interrupted system calls are automatically restarted, and we minimize
+ hassles with handling `EINTR` (in particular as `EINTR` handling is pretty broken
+ on Linux).
+
+- When applying C-style unescaping as well as specifier expansion on the same
+ string, always apply the C-style unescaping first, followed by the specifier
+ expansion. When doing the reverse, make sure to escape `%` in specifier-style
+ first (i.e. `%` → `%%`), and then do C-style escaping where necessary.
+
+- It's a good idea to use `O_NONBLOCK` when opening 'foreign' regular files, i.e.
+ file system objects that are supposed to be regular files whose paths were
+ specified by the user and hence might actually refer to other types of file
+ system objects. This is a good idea so that we don't end up blocking on
+ 'strange' file nodes, for example if the user pointed us to a FIFO or device
+ node which may block when opening. Moreover even for actual regular files
+ `O_NONBLOCK` has a benefit: it bypasses any mandatory lock that might be in
+ effect on the regular file. If in doubt consider turning off `O_NONBLOCK` again
+ after opening.
+
+- When referring to a configuration file option in the documentation and such,
+ please always suffix it with `=`, to indicate that it is a configuration file
+ setting.
+
+- When referring to a command line option in the documentation and such, please
+ always prefix with `--` or `-` (as appropriate), to indicate that it is a
+ command line option.
+
+- When referring to a file system path that is a directory, please always
+ suffix it with `/`, to indicate that it is a directory, not a regular file
+ (or other file system object).
+
+- Don't use `fgets()`: it's too hard to properly handle errors such as overly
+ long lines. Use `read_line()` instead, which is our own function that handles
+ this much more nicely.
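As a rough illustration of the `fgets()` point that closes CODING_STYLE.md above: the sketch below is *not* systemd's `read_line()`, whose signature and behaviour are not shown in this diff. It is a stand-alone approximation, with the hypothetical name `read_line_bounded()`, of what "properly handling overly long lines" can look like: the caller passes an explicit limit, and a line exceeding it is reported as an error instead of being silently split across two reads.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only. Reads one line (without the
 * trailing newline) into a newly allocated, NUL-terminated buffer.
 * Returns 1 if a line was read, 0 on EOF, and a negative errno-style value
 * on failure; -ENOBUFS means the line was longer than 'limit' bytes. */
static int read_line_bounded(FILE *f, size_t limit, char **ret) {
        char *buf = NULL;
        size_t n = 0, allocated = 0;
        int c;

        assert(f);
        assert(ret);

        for (;;) {
                c = fgetc(f);
                if (c == EOF) {
                        if (ferror(f) || n == 0) {
                                int r = ferror(f) ? -EIO : 0;
                                free(buf);
                                return r;
                        }
                        break; /* last line without trailing newline */
                }
                if (c == '\n')
                        break;

                if (n >= limit) { /* enforce the limit, refuse operation */
                        free(buf);
                        return -ENOBUFS;
                }

                if (n + 1 >= allocated) { /* room for this byte plus NUL */
                        size_t m = allocated > 0 ? allocated * 2 : 64;
                        char *p = realloc(buf, m);
                        if (!p) {
                                free(buf);
                                return -ENOMEM;
                        }
                        buf = p;
                        allocated = m;
                }

                buf[n++] = (char) c;
        }

        if (!buf) { /* empty line: nothing was allocated yet */
                buf = malloc(1);
                if (!buf)
                        return -ENOMEM;
        }

        buf[n] = 0;
        *ret = buf;
        return 1;
}
```

A caller would loop on `read_line_bounded(f, LIMIT, &line)` until it returns 0, freeing `line` after each successful call. Note that in this sketch the remainder of an over-long line is left unread in the stream, so a caller that wants to continue after `-ENOBUFS` has to skip to the next newline itself.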
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
new file mode 100644
index 0000000..f40d9a0
--- /dev/null
+++ b/docs/CONTRIBUTING.md
@@ -0,0 +1,41 @@
+---
+title: Contributing
+---
+
+# Contributing
+
+We welcome contributions from everyone. However, please follow these guidelines when posting a GitHub Pull Request or filing a GitHub Issue on the systemd project:
+
+## Filing Issues
+
+* We use GitHub Issues **exclusively** for tracking **bugs** and **feature requests** of systemd. If you are looking for help, please contact our [mailing list](https://lists.freedesktop.org/mailman/listinfo/systemd-devel) instead.
+* We only track bugs in the **two most recently released versions** of systemd in the GitHub Issue tracker. If you are using an older version of systemd, please contact your distribution's bug tracker instead.
+* When filing an issue, specify the **systemd version** you are experiencing the issue with. Also, indicate which **distribution** you are using.
+* Please include an explanation of how to reproduce the issue you are pointing out.
+
+Following these guidelines makes it easier for us to process your issue, and ensures we won't close your issue right away for being misfiled.
+
+### Older downstream versions
+For older versions that are still supported by your distribution, please use the respective downstream tracker:
+* **Fedora** - [bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=systemd)
+* **RHEL-7/CentOS-7** - [bugzilla](https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207&component=systemd) or [systemd-rhel github](https://github.com/lnykryn/systemd-rhel/issues)
+* **Debian** - [bugs.debian.org](https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=systemd)
+
+## Security vulnerability reports
+
+If you discover a security vulnerability, we'd appreciate a non-public disclosure. The issue tracker and mailing list listed above are fully public. If you need to reach systemd developers in a non-public way, report the issue in one of the "big" distributions using systemd: [Fedora](https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=systemd) (be sure to check "Security Sensitive Bug" under "Show Advanced Fields"), [Ubuntu](https://launchpad.net/ubuntu/+source/systemd/+filebug) (be sure to change "This bug contains information that is" from "Public" to "Private Security"), or [Debian](mailto:security@debian.org). Various systemd developers are active distribution maintainers and will propagate the information about the bug to other parties.
+
+## Posting Pull Requests
+
+* Make sure to post PRs only relative to a very recent git master.
+* Follow our [Coding Style](CODING_STYLE.md) when contributing code. This is a requirement for all code we merge.
+* Please make sure to test your change before submitting the PR. See the [Hacking guide](HACKING.md) for details on how to do this.
+* Make sure to run the test suite locally, before posting your PR. We use a CI system, meaning we don't even look at your PR, if the build and tests don't pass.
+* If you need to update the code in an existing PR, force-push into the same branch, overriding old commits with new versions.
+* After you have pushed a new version, add a comment about the new version (no notification is sent just for the commits, so it's easy to miss the update without an explicit comment). If you are a member of the systemd project on GitHub, remove the `reviewed/needs-rework` label.
+
+## Final Words
+
+We'd like to apologize in advance if we are not able to process and reply to your issue or PR right away. We have a lot of work to do, but we are trying our best!
+
+Thank you very much for your contributions!
diff --git a/docs/DISTRO_PORTING.md b/docs/DISTRO_PORTING.md
new file mode 100644
index 0000000..f8b98bc
--- /dev/null
+++ b/docs/DISTRO_PORTING.md
@@ -0,0 +1,79 @@
+---
+title: Porting systemd To New Distributions
+---
+
+# Porting systemd To New Distributions
+
+## HOWTO
+
+You need to make the following changes to adapt systemd to your
+distribution:
+
+1. Find the right configure parameters for:
+
+ * `-Drootprefix=`
+ * `-Dsysvinit-path=`
+ * `-Dsysvrcnd-path=`
+ * `-Drc-local=`
+ * `-Dhalt-local=`
+ * `-Dloadkeys-path=`
+ * `-Dsetfont-path=`
+ * `-Dtty-gid=`
+ * `-Dntp-servers=`
+ * `-Ddns-servers=`
+ * `-Dsupport-url=`
+
+2. Try it out.
+
+ Play around (as an ordinary user) with
+ `/usr/lib/systemd/systemd --test --system` for a test run
+ of systemd without booting. This will read the unit files and
+ print the initial transaction it would execute during boot-up.
+ This will also inform you about ordering loops and suchlike.
+
+## NTP Pool
+
+By default, systemd-timesyncd uses the Google Public NTP servers
+`time[1-4].google.com`, if no other NTP configuration is available.
+They serve time that uses a
+[leap second smear](https://developers.google.com/time/smear)
+and can be up to .5s off from servers that use stepped leap seconds.
+
+If you prefer to use leap second steps, please register your own
+vendor pool at ntp.org and make it the built-in default by
+passing `-Dntp-servers=` to meson. Registering vendor
+pools is [free](http://www.pool.ntp.org/en/vendors.html).
+
+Use `-Dntp-servers=` to direct systemd-timesyncd to different fallback
+NTP servers.
+
+## DNS Servers
+
+By default, systemd-resolved uses the Google Public DNS servers
+`8.8.8.8`, `8.8.4.4`, `2001:4860:4860::8888`, `2001:4860:4860::8844`
+as fallback, if no other DNS configuration is available.
+
+Use `-Ddns-servers=` to direct systemd-resolved to different fallback
+DNS servers.
+
+## PAM
+
+The default PAM config shipped by systemd is really bare bones.
+It does not include many modules your distro might want to enable
+to provide a more seamless experience. For example, limits set in
+`/etc/security/limits.conf` will not be read unless you load `pam_limits`.
+Make sure you add modules your distro expects from user services.
+
+Pass `-Dpamconfdir=no` to meson to avoid installing this file and
+instead install your own.
+
+## Contributing Upstream
+
+We generally no longer accept distribution-specific patches to
+systemd upstream. If you have to make changes to systemd's source code
+to make it work on your distribution, unless your code is generic
+enough to be generally useful, we are unlikely to merge it. Please
+always consider adopting the upstream defaults. If that is not
+possible, please maintain the relevant patches downstream.
+
+Thank you for understanding.
diff --git a/docs/ENVIRONMENT.md b/docs/ENVIRONMENT.md
new file mode 100644
index 0000000..99b5b03
--- /dev/null
+++ b/docs/ENVIRONMENT.md
@@ -0,0 +1,179 @@
+---
+title: Known Environment Variables
+---
+
+# Known Environment Variables
+
+A number of systemd components take additional runtime parameters via
+environment variables. Many of these environment variables are not supported at
+the same level as command line switches and other interfaces are: we don't
+document them in the man pages and we make no stability guarantees for
+them. While they generally are unlikely to be dropped any time soon again, we
+do not want to guarantee that they stay around for good either.
+
+Below is a (non-exhaustive) list of the environment variables understood by
+the various tools. Note that this list only covers environment variables not
+documented in the proper man pages.
+
+All tools:
+
+* `$SYSTEMD_OFFLINE=[0|1]` — if set to `1`, then `systemctl` will
+ refrain from talking to PID 1; this has the same effect as the historical
+ detection of `chroot()`. Setting this variable to `0` instead has a similar
+ effect as `SYSTEMD_IGNORE_CHROOT=1`; i.e. tools will try to
+ communicate with PID 1 even if a `chroot()` environment is detected.
+ You almost certainly want to set this to `1` if you maintain a package build system
+ or similar and are trying to use a modern container system and not plain
+ `chroot()`.
+
+* `$SYSTEMD_IGNORE_CHROOT=1` — if set, don't check whether being invoked in a
+ `chroot()` environment. This is particularly relevant for systemctl, as it
+ will not alter its behaviour for `chroot()` environments if set. Normally it
+ refrains from talking to PID 1 in such a case; turning most operations such
+ as `start` into no-ops. If that's what's explicitly desired, you might
+ consider setting `SYSTEMD_OFFLINE=1`.
+
+* `$SD_EVENT_PROFILE_DELAYS=1` — if set, the sd-event event loop implementation
+ will print latency information at runtime.
+
+* `$SYSTEMD_PROC_CMDLINE` — if set, may contain a string that is used as kernel
+ command line instead of the actual one readable from /proc/cmdline. This is
+ useful for debugging, in order to test generators and other code against
+ specific kernel command lines.
+
+* `$SYSTEMD_IN_INITRD` — takes a boolean. If set, overrides initrd detection.
+ This is useful for debugging and testing initrd-only programs in the main
+ system.
+
+* `$SYSTEMD_BUS_TIMEOUT=SECS` — specifies the maximum time to wait for method call
+ completion. If no time unit is specified, assumes seconds. The usual other units
+ are understood, too (us, ms, s, min, h, d, w, month, y). If it is not set or set
+ to 0, then the built-in default is used.
+
+* `$SYSTEMD_MEMPOOL=0` — if set, the internal memory caching logic employed by
+ hash tables is turned off, and libc malloc() is used for all allocations.
+
+* `$SYSTEMD_EMOJI=0` — if set, tools such as "systemd-analyze security" will
+ not output graphical smiley emojis, but ASCII alternatives instead. Note that
+ this only controls use of Unicode emoji glyphs, and has no effect on other
+ Unicode glyphs.
+
+systemctl:
+
+* `$SYSTEMCTL_FORCE_BUS=1` — if set, do not connect to PID1's private D-Bus
+ listener, and instead always connect through the dbus-daemon D-Bus broker.
+
+* `$SYSTEMCTL_INSTALL_CLIENT_SIDE=1` — if set, enable or disable unit files on
+ the client side, instead of asking PID 1 to do this.
+
+* `$SYSTEMCTL_SKIP_SYSV=1` — if set, do not call out to SysV compatibility hooks.
+
+systemd-nspawn:
+
+* `$UNIFIED_CGROUP_HIERARCHY=1` — if set, force nspawn into unified cgroup
+ hierarchy mode.
+
+* `$SYSTEMD_NSPAWN_API_VFS_WRITABLE=1` — if set, make /sys and /proc/sys and
+ friends writable in the container. If set to "network", leave only
+ /proc/sys/net writable.
+
+* `$SYSTEMD_NSPAWN_CONTAINER_SERVICE=…` — override the "service" name nspawn
+ uses to register with machined. If unset defaults to "nspawn", but with this
+ variable may be set to any other value.
+
+* `$SYSTEMD_NSPAWN_USE_CGNS=0` — if set, do not use cgroup namespacing, even if
+ it is available.
+
+* `$SYSTEMD_NSPAWN_LOCK=0` — if set, do not lock container images when running.
+
+* `$SYSTEMD_NSPAWN_TMPFS_TMP=0` — if set, do not overmount /tmp in the
+ container with a tmpfs, but leave the directory from the image in place.
+
+systemd-logind:
+
+* `$SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1` — if set, report that
+ hibernation is available even if the swap devices do not provide enough room
+ for it.
+
+systemd-udevd:
+
+* `$NET_NAMING_SCHEME=` – if set, takes a network naming scheme (i.e. one of
+ "v238", "v239", "v240"…, or the special value "latest") as parameter. If
+ specified udev's net_id builtin will follow the specified naming scheme when
+ determining stable network interface names. This may be used to revert to
+ naming schemes of older udev versions, in order to provide more stable naming
+ across updates. This environment variable takes precedence over the kernel
+ command line option `net.naming-scheme=`, except if the value is prefixed
+ with `:` in which case the kernel command line option takes precedence, if it
+ is specified as well.
+
+installed systemd tests:
+
+* `$SYSTEMD_TEST_DATA` — override the location of test data. This is useful if
+ a test executable is moved to an arbitrary location.
+
+nss-systemd:
+
+* `$SYSTEMD_NSS_BYPASS_SYNTHETIC=1` — if set, `nss-systemd` won't synthesize
+ user/group records for the `root` and `nobody` users if they are missing from
+ `/etc/passwd`.
+
+* `$SYSTEMD_NSS_DYNAMIC_BYPASS=1` — if set, `nss-systemd` won't return
+ user/group records for dynamically registered service users (i.e. users
+ registered through `DynamicUser=1`).
+
+* `$SYSTEMD_NSS_BYPASS_BUS=1` — if set, `nss-systemd` won't use D-Bus to do
+ dynamic user lookups. This is primarily useful to make `nss-systemd` work
+ safely from within `dbus-daemon`.
+
+systemd-timedated:
+
+* `$SYSTEMD_TIMEDATED_NTP_SERVICES=…` — colon-separated list of unit names of
+ NTP client services. If set, `timedatectl set-ntp on` enables and starts the
+ first existing unit listed in the environment variable, and
+ `timedatectl set-ntp off` disables and stops all listed units.
+
+systemd-sulogin-shell:
+
+* `$SYSTEMD_SULOGIN_FORCE=1` — This skips asking for the root password if the
+ root password is not available (such as when the root account is locked).
+ See `sulogin(8)` for more details.
+
+bootctl and other tools that access the EFI System Partition (ESP):
+
+* `$SYSTEMD_RELAX_ESP_CHECKS=1` — if set, the ESP validation checks are
+ relaxed. Specifically, validation checks that ensure the specified ESP path
+ is a FAT file system are turned off, as are checks that the path is located
+ on a GPT partition with the correct type UUID.
+
+* `$SYSTEMD_ESP_PATH=…` — override the path to the EFI System Partition. This
+ may be used to override ESP path auto detection, and redirect any accesses to
+ the ESP to the specified directory. Note that unlike with bootctl's --path=
+ switch only very superficial validation of the specified path is done when
+ this environment variable is used.
+
+systemd itself:
+
+* `$SYSTEMD_ACTIVATION_UNIT` — set for all NSS and PAM module invocations that
+ are done by the service manager on behalf of a specific unit, in child
+ processes that are later (after execve()) going to become unit
+ processes. Contains the full unit name (e.g. "foobar.service"). NSS and PAM
+ modules can use this information to determine in which context and on whose
+ behalf they are being called, which may be useful to avoid deadlocks, for
+ example to bypass IPC calls to the very service that is about to be
+ started. Note that NSS and PAM modules should be careful to only rely on this
+ data when invoked privileged, or possibly only when getppid() returns 1, as
+ setting environment variables is of course possible in any context, even
+ unprivileged ones.
+
+* `$SYSTEMD_ACTIVATION_SCOPE` — closely related to `$SYSTEMD_ACTIVATION_UNIT`,
+ it is either set to `system` or `user` depending on whether the NSS/PAM
+ module is called by systemd in `--system` or `--user` mode.
+
+systemd-remount-fs:
+
+* `$SYSTEMD_REMOUNT_ROOT_RW=1` — if set and no entry for the root directory
+ exists in /etc/fstab (this file always takes precedence), then the root
+ directory is remounted writable. This is primarily used by
+ systemd-gpt-auto-generator to ensure the root partition is mounted writable
+ in accordance with the GPT partition flags.
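The `$SYSTEMD_ACTIVATION_UNIT` entry in ENVIRONMENT.md above suggests that NSS and PAM modules only trust the variable when `getppid()` returns 1. A minimal sketch of such a check might look like the following. The function name `invoked_for_unit()` and its parameters are hypothetical and not taken from any systemd module; the parent PID is passed in explicitly so the logic can be exercised outside of a real service manager context, and the "parent is PID 1" test is only meaningful for the system manager, not for `--user` instances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only. Returns true if we appear to
 * be invoked by the service manager on behalf of the unit 'own_unit', in
 * which case a module might want to bypass IPC calls to that very unit to
 * avoid a deadlock. */
static bool invoked_for_unit(const char *own_unit, pid_t parent) {
        const char *e;

        assert(own_unit);

        /* Anybody may set environment variables, hence only trust the
         * variable if our parent is PID 1. */
        if (parent != 1)
                return false;

        e = getenv("SYSTEMD_ACTIVATION_UNIT");
        if (!e)
                return false;

        /* The variable carries the full unit name, e.g. "foobar.service". */
        return strcmp(e, own_unit) == 0;
}
```

A module would typically call this as `invoked_for_unit("foobar.service", getppid())` and fall back to its normal lookup path whenever the result is false.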
diff --git a/docs/HACKING.md b/docs/HACKING.md
new file mode 100644
index 0000000..b14be72
--- /dev/null
+++ b/docs/HACKING.md
@@ -0,0 +1,127 @@
+---
+title: Hacking on systemd
+---
+
+# Hacking on systemd
+
+We welcome all contributions to systemd. If you notice a bug or a missing
+feature, please feel invited to fix it, and submit your work as a GitHub Pull
+Request (PR) at https://github.com/systemd/systemd/pull/new.
+
+Please make sure to follow our [Coding Style](CODING_STYLE.md) when submitting patches.
+Also have a look at our [Contribution Guidelines](CONTRIBUTING.md).
+
+When adding new functionality, tests should be added. For shared functionality
+(in `src/basic/` and `src/shared/`) unit tests should be sufficient. The general
+policy is to keep tests in matching files underneath `src/test/`,
+e.g. `src/test/test-path-util.c` contains tests for any functions in
+`src/basic/path-util.c`. If adding a new source file, consider adding a matching
+test executable. For features at a higher level, tests in `src/test/` are very
+strongly recommended. If that is not possible, integration tests in `test/` are
+encouraged.
+
+Please also have a look at our list of [code quality tools](CODE_QUALITY.md) we have set up for systemd,
+to ensure our codebase stays in good shape.
+
+Please always test your work before submitting a PR. For many of the components
+of systemd testing is straight-forward as you can simply compile systemd and
+run the relevant tool from the build directory.
+
+For some components (most importantly, systemd/PID1 itself) this is not
+possible, however. In order to simplify testing for cases like this we provide
+a set of `mkosi` build files directly in the source tree. `mkosi` is a tool for
+building clean OS images from an upstream distribution in combination with a
+fresh build of the project in the local working directory. To make use of this,
+please acquire `mkosi` from https://github.com/systemd/mkosi first, unless your
+distribution has packaged it already and you can get it from there. After the
+tool is installed it is sufficient to type `mkosi` in the systemd project
+directory to generate a disk image `image.raw` you can boot either in
+`systemd-nspawn` or in a UEFI-capable VM:
+
+```
+# systemd-nspawn -bi image.raw
+```
+
+or:
+
+```
+# qemu-system-x86_64 -enable-kvm -m 512 -smp 2 -bios /usr/share/edk2/ovmf/OVMF_CODE.fd -hda image.raw
+```
+
+Every time you rerun the `mkosi` command a fresh image is built, incorporating
+all current changes you made to the project tree.
+
+Alternatively, you may install the systemd version from your git check-out
+directly on top of your host system's directory tree. This mostly works fine,
+but of course you should know what you are doing as you might make your system
+unbootable in case of a bug in your changes. Also, you might step into your
+package manager's territory with this. Be careful!
+
+And never forget: most distributions provide very simple and convenient ways to
+install all development packages necessary to build systemd. For example, on
+Fedora the following command line should be sufficient to install all of
+systemd's build dependencies:
+
+```
+# dnf builddep systemd
+```
+
+Putting this all together, here's a series of commands for preparing a patch
+for systemd (this example is for Fedora):
+
+```sh
+$ sudo dnf builddep systemd # install build dependencies
+$ sudo dnf install mkosi # install tool to quickly build images
+$ git clone https://github.com/systemd/systemd.git
+$ cd systemd
+$ vim src/core/main.c # or wherever you'd like to make your changes
+$ meson build # configure the build
+$ ninja -C build # build it locally, see if everything compiles fine
+$ ninja -C build test # run some simple regression tests
+$ (umask 077; echo 123 > mkosi.rootpw) # set root password used by mkosi
+$ sudo mkosi # build a test image
+$ sudo systemd-nspawn -bi image.raw # boot up the test image
+$ git add -p # interactively put together your patch
+$ git commit # commit it
+$ git push REMOTE HEAD:refs/heads/BRANCH
+ # where REMOTE is your "fork" on GitHub
+ # and BRANCH is a branch name.
+```
+
+And after that, head over to your repo on GitHub and click "Compare & pull request".
+
+Happy hacking!
+
+
+## Fuzzers
+
+systemd includes fuzzers in `src/fuzz/` that use libFuzzer and are automatically
+run by [OSS-Fuzz](https://github.com/google/oss-fuzz) with sanitizers. To add a
+fuzz target, create a new `src/fuzz/fuzz-foo.c` file with a `LLVMFuzzerTestOneInput`
+function and add it to the list in `src/fuzz/meson.build`.
+
+Whenever possible, a seed corpus and a dictionary should also be added with new
+fuzz targets. The dictionary should be named `src/fuzz/fuzz-foo.dict` and the seed
+corpus should be built and exported as `$OUT/fuzz-foo_seed_corpus.zip` in
+`tools/oss-fuzz.sh`.
+
+The fuzzers can be built locally if you have libFuzzer installed by running
+`tools/oss-fuzz.sh`. You should also confirm that the fuzzer runs in the
+OSS-Fuzz environment by checking out the OSS-Fuzz repo, and then running
+commands like this:
+
+```
+python infra/helper.py build_image systemd
+python infra/helper.py build_fuzzers --sanitizer memory systemd ../systemd
+python infra/helper.py run_fuzzer systemd fuzz-foo
+```
+
+If you find a bug that impacts the security of systemd, please follow the
+guidance in [CONTRIBUTING.md](CONTRIBUTING.md) on how to report a security vulnerability.
+
+For more details on building fuzzers and integrating with OSS-Fuzz, visit:
+
+- https://github.com/google/oss-fuzz/blob/master/docs/new_project_guide.md
+- https://llvm.org/docs/LibFuzzer.html
+- https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md
+- https://chromium.googlesource.com/chromium/src/testing/libfuzzer/+/HEAD/efficient_fuzzer.md
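To make the "Fuzzers" section of HACKING.md above a little more concrete, here is a minimal sketch of what a `src/fuzz/fuzz-foo.c` file could contain. Only the `LLVMFuzzerTestOneInput` entry point is the real libFuzzer interface; `count_assignments()` is a made-up stand-in for whatever systemd function a real target would exercise, and the 64 KiB input cap is an arbitrary assumption. Existing targets in the tree may well be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test: counts the '=' characters in a
 * NUL-terminated string. A real fuzz target would call into the parser or
 * decoder it is meant to exercise instead. */
static size_t count_assignments(const char *s) {
        size_t n = 0;

        for (; *s; s++)
                if (*s == '=')
                        n++;

        return n;
}

/* Entry point called by libFuzzer once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        char *str;
        size_t n;

        /* Keep runs fast by ignoring unreasonably large inputs. */
        if (size > 64 * 1024)
                return 0;

        /* The input is not NUL-terminated, so make a terminated copy. */
        str = malloc(size + 1);
        if (!str)
                return 0;
        if (size > 0)
                memcpy(str, data, size);
        str[size] = 0;

        n = count_assignments(str);

        /* Invariants are checked with assert(), so that violations show up
         * as crashes the fuzzer can report. */
        assert(n <= size);

        free(str);
        return 0;
}
```

Since libFuzzer supplies `main()`, the file defines no `main()` of its own. Returning 0 is the convention for "input processed"; bugs surface through sanitizer reports or failed assertions rather than through the return value.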
diff --git a/docs/PORTABLE_SERVICES.md b/docs/PORTABLE_SERVICES.md
new file mode 100644
index 0000000..5b6c085
--- /dev/null
+++ b/docs/PORTABLE_SERVICES.md
@@ -0,0 +1,260 @@
+---
+title: Portable Services Introduction
+---
+
+# Portable Services Introduction
+
+This systemd version includes a preview of the "portable service"
+concept. "Portable Services" are supposed to be an incremental improvement over
+traditional system services, making two specific facets of container management
+available to system services more readily. Specifically:
+
+1. The bundling of applications, i.e. packing up multiple services, their
+ binaries and all their dependencies in a single image, and running them
+ directly from it.
+
+2. Stricter default security policies, i.e. sand-boxing of applications.
+
+The primary tool for interfacing with "portable services" is the new
+"portablectl" program. It's currently shipped in /usr/lib/systemd/portablectl
+(i.e. not in the `$PATH`), since it's not yet considered part of the officially
+supported systemd interfaces — it's a preview still after all.
+
+Portable services don't bring anything inherently new to the table. All they do
+is put together known concepts in a slightly nicer way, to cover a specific set
+of use-cases.
+
+## So, what *is* a "Portable Service"?
+
+A portable service is ultimately just an OS tree, either inside of a directory
+tree, or inside a raw disk image containing a Linux file system. This tree is
+called the "image". It can be "attached" or "detached" from the system. When
+"attached" specific systemd units from the image are made available on the host
+system, then behaving pretty much exactly like locally installed system
+services. When "detached" these units are removed again from the host, leaving
+no artifacts around (except maybe messages they might have logged).
+
+The OS tree/image can be created with any tool of your choice. For example, you
+can use `dnf --installroot=` if you like, or `debootstrap`, the image format is
+entirely generic, and doesn't have to carry any specific metadata beyond what
+distribution images carry anyway. Or to say this differently: the image format
+doesn't define any new metadata as unit files and OS tree directories or disk
+images are already sufficient, and pretty universally available these days. One
+particularly nice tool for creating suitable images is
+[mkosi](https://github.com/systemd/mkosi), but many other existing tools will
+do too.
+
+If you so will, "Portable Services" are a nicer way to manage chroot()
+environments, with better security, tooling and behavior.
+
+## Where's the difference to a "Container"?
+
+"Container" is a very vague term, after all it is used for
+systemd-nspawn/LXC-type OS containers, for Docker/rkt-like micro service
+containers, and even certain 'lightweight' VM runtimes.
+
+The "portable service" concept ultimately will not provide a fully isolated
+environment to the payload, like containers mostly intend to. Instead they are
+from the beginning more like regular system services, can be controlled with
+the same tools, are exposed the same way in all infrastructure and so on. Their
+main difference is that they use a different root directory than the rest of the
+system. Hence, the intention is not to run code in a different, isolated world
+from the host — like most containers would do it —, but to run it in the same
+world, but with stricter access controls on what the service can see and do.
+
+As one point of differentiation: as programs run as "portable services" are
+pretty much regular system services, they won't run as PID 1 (like Docker would
+do it), but as normal processes. A corollary of that is that they aren't supposed
+to manage anything in their own environment (such as the network) as the
+execution environment is mostly shared with the rest of the system.
+
+The primary focus use-case of "portable services" is to extend the host system
+with encapsulated extensions, but provide almost full integration with the rest
+of the system, though possibly restricted by effective security knobs. This
+focus includes system extensions otherwise sometimes called "super-privileged
+containers".
+
+Note that portable services are only available for system services, not for
+user services, i.e. the functionality cannot be used for the stuff
+bubblewrap/flatpak is focusing on.
+
+## Mode of Operation
+
+If you have a portable service image, maybe in a raw disk image called
+`foobar_0.7.23.raw`, then attaching the services to the host is as easy as:
+
+```
+# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw
+```
+
+This command does the following:
+
+1. It dissects the image, checks and validates the `/etc/os-release`
+ (or `/usr/lib/os-release`, see below) data of the image, and looks for
+ all included unit files.
+
+2. It copies out all unit files with a suffix of `.service`, `.socket`,
+ `.target`, `.timer` and `.path` whose name begins with the image's name
+ (with the .raw removed), truncated at the first underscore (if there is
+ one). This prefix name generated from the image name must be followed by a
+ ".", "-" or "@" character in the unit name. Or in other words, given the
+ image name of `foobar_0.7.23.raw` all unit files matching
+ `foobar-*.{service|socket|target|timer|path}`,
+ `foobar@.{service|socket|target|timer|path}` as well as
+ `foobar.*.{service|socket|target|timer|path}` and
+ `foobar.{service|socket|target|timer|path}` are copied out. These unit files
+ are placed in `/etc/systemd/system.attached/` (which is part of the normal
+ unit file search path of PID 1, and thus loaded exactly like regular unit
+ files). Within the images the unit files are looked for at the usual
+ locations, i.e. in `/usr/lib/systemd/system/` and `/etc/systemd/system/` and
+ so on, relative to the image's root.
+
+3. For each such unit file a drop-in file is created. Let's say
+ `foobar-waldo.service` was one of the unit files copied to
+ `/etc/systemd/system.attached/`, then a drop-in file
+ `/etc/systemd/system.attached/foobar-waldo.service.d/20-portable.conf` is
+ created, containing a few lines of additional configuration:
+
+ ```
+ [Service]
+ RootImage=/path/to/foobar.raw
+ Environment=PORTABLE=foobar
+ LogExtraFields=PORTABLE=foobar
+ ```
+
+4. For each such unit a "profile" drop-in is linked in. This "profile" drop-in
+ generally contains security options that lock down the service. By default
+ the `default` profile is used, which provides a medium level of
+ security. There's also `trusted` which runs the service at the highest
+ privileges, i.e. host's root and everything. The `strict` profile comes with
+ the toughest security restrictions. Finally, `nonetwork` is like `default`
+ but without network access. Users may define their own profiles too (or
+ modify the existing ones).
+
+And that's already it.
+
+Note that the images need to stay around (and in the same location) as long as the
+portable service is attached. If an image is moved, the `RootImage=` line
+written to the unit drop-in would point to a non-existing place, and break the
+logic.
+
+The `portablectl detach` command executes the reverse operation: it looks for
+the drop-ins and the unit files associated with the image, and removes them
+again.
+
+Note that `portablectl attach` won't enable or start any of the units it copies
+out. This still has to take place in a second, separate step. (That said, we
+might add options to do this automatically later on.)
+
+## Requirements on Images
+
+Note that portable services don't introduce any new image format, but most OS
+images should just work the way they are. Specifically, the following
+requirements are made for an image that can be attached/detached with
+`portablectl`.
+
+1. It must contain a binary that shall be invoked, including all its
+ dependencies. If binary code, the code needs to be compiled for an
+ architecture compatible with the host.
+
+2. The image must either be a plain sub-directory (or btrfs subvolume)
+ containing the binaries and their dependencies in a classic Linux OS tree, or
+ must be a raw disk image either containing only one, naked file system, or
+ an image with a partition table understood by the Linux kernel with only a
+ single partition defined, or alternatively, a GPT partition table with a set
+ of properly marked partitions following the [Discoverable Partitions
+ Specification](https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/).
+
+3. The image must at least contain one matching unit file, with the right name
+ prefix and suffix (see above). The unit file is searched in the usual paths,
+ i.e. primarily `/etc/systemd/system/` and `/usr/lib/systemd/system/` within the
+ image. (The implementation will check a couple of other paths too, but it's
+ recommended to use these two paths.)
+
+4. The image must contain an os-release file, either in `/etc/os-release` or
+ `/usr/lib/os-release`. The file should follow the standard format.
+
+5. The image must contain the files `/etc/resolv.conf` and `/etc/machine-id`
+ (empty files are ok); they will be bind mounted from the host at runtime.
+
+Note that generally images created by tools such as `debootstrap`, `dnf
+--installroot=` or `mkosi` qualify for all of the above in one way or
+another. If you wonder what the most minimal image would be that complies with
+the requirements above, it could consist of this:
+
+```
+/usr/bin/minimald # a statically compiled binary
+/usr/lib/systemd/system/minimal-test.service # the unit file for the service, with ExecStart=/usr/bin/minimald
+/usr/lib/os-release # an os-release file explaining what this is
+```
+
+And that's it.
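+
+As a rough sketch, the unit file of such a minimal image would not need more
+than the following. Only the `ExecStart=` line is taken from the listing above;
+the `Description=` text is an assumption made for illustration:
+
+```
+[Unit]
+Description=Minimal portable service (example)
+
+[Service]
+ExecStart=/usr/bin/minimald
+```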
+
+Note that qualifying images do not have to contain an init system of their
+own. If they do, it's fine, it will be ignored by the portable service logic,
+but they generally don't have to, and it might make sense to avoid any, to keep
+images minimal.
+
+Note that as no new image format or metadata is defined, it's very
+straight-forward to define images that can be made use of in a number of
+different ways. For example, by using `mkosi -b` you can trivially build a
+single, unified image that:
+
+1. Can be attached as portable service, to run any container services natively
+ on the host.
+
+2. Can be run as OS container, using `systemd-nspawn`, by booting the image
+ with `systemd-nspawn -i -b`.
+
+3. Can be booted directly as VM image, using a generic VM executor such as
+ `virtualbox`/`qemu`/`kvm`.
+
+4. Can be booted directly on bare-metal systems.
+
+Of course, to facilitate 2, 3 and 4 you need to include an init system in the
+image. To facilitate 3 and 4 you also need to include a boot loader in the
+image. As mentioned, `mkosi -b` takes care of all of that for you, but any other
+image generator should work too.
+
+## Execution Environment
+
+Note that the code in portable service images is run exactly like regular
+services. Hence there's no new execution environment to consider. Also, unlike
+what Docker would do, as these are regular system services they aren't run as
+PID 1 either, but with regular PID values.
+
+## Access to host resources
+
+If services shipped with this mechanism shall be able to access host resources
+(such as files or AF_UNIX sockets for IPC), use the normal `BindPaths=` and
+`BindReadOnlyPaths=` settings in unit files to mount them in. In fact the
+`default` profile mentioned above makes use of this to ensure
+`/etc/resolv.conf`, the D-Bus system bus socket or write access to the logging
+subsystem are available to the service.
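+
+As an illustration, a unit file shipped in the image could pull in host
+resources as sketched below. Both paths are hypothetical and only serve as
+examples; `BindReadOnlyPaths=` and `BindPaths=` are the regular settings
+mentioned above:
+
+```
+[Service]
+# Hypothetical host configuration file, visible read-only inside the service
+BindReadOnlyPaths=/etc/foobar.conf
+# Hypothetical host directory holding an AF_UNIX socket, mounted writable
+BindPaths=/run/foobar-ipc
+```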
+
+## Instantiation
+
+Sometimes it makes sense to instantiate the same set of services multiple
+times. The portable service concept does not introduce a new logic for this. It
+is recommended to use the regular unit templating of systemd for this, i.e. to
+include template units such as `foobar@.service`, so that instantiation is as
+simple as:
+
+```
+# /usr/lib/systemd/portablectl attach foobar_0.7.23.raw
+# systemctl enable --now foobar@instancea.service
+# systemctl enable --now foobar@instanceb.service
+…
+```
+
+The benefit of this approach is that templating works exactly the same for
+units shipped with the OS itself as for attached portable services.
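+
+A template unit such as `foobar@.service` inside the image might look roughly
+like the sketch below. The binary path and its `--instance=` option are
+hypothetical; `%i` is the regular systemd specifier that expands to the
+instance name (e.g. `instancea`):
+
+```
+[Unit]
+Description=Foobar instance %i
+
+[Service]
+# Hypothetical daemon; %i expands to the part between "@" and ".service"
+ExecStart=/usr/bin/foobard --instance=%i
+```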
+
+## Immutable images with local data
+
+It's a good idea to keep portable service images read-only during normal
+operation. In fact all but the `trusted` profile will default to this kind of
+behaviour, by setting the `ProtectSystem=strict` option. In this case writable
+service data may be placed on the host file system. Use `StateDirectory=` in
+the unit files to enable such behaviour and add a local data directory to the
+services copied onto the host.
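+
+A minimal sketch of such a setup follows. The directory name `foobar` is an
+assumption chosen for illustration; for system services `StateDirectory=`
+refers to a directory below `/var/lib/` on the host:
+
+```
+[Service]
+# Creates /var/lib/foobar on the host and makes it writable for the service,
+# while the image itself stays read-only
+StateDirectory=foobar
+```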
diff --git a/docs/PREDICTABLE_INTERFACE_NAMES.md b/docs/PREDICTABLE_INTERFACE_NAMES.md
new file mode 100644
index 0000000..b29016f
--- /dev/null
+++ b/docs/PREDICTABLE_INTERFACE_NAMES.md
@@ -0,0 +1,68 @@
+---
+title: Predictable Network Interface Names
+---
+
+# Predictable Network Interface Names
+
+Starting with v197 systemd/udev will automatically assign predictable, stable network interface names for all local Ethernet, WLAN and WWAN interfaces. This is a departure from the traditional interface naming scheme ("eth0", "eth1", "wlan0", ...), but should fix real problems.
+
+
+## Why?
+
+The classic naming scheme for network interfaces applied by the kernel is to simply assign names beginning with "eth0", "eth1", ... to all interfaces as they are probed by the drivers. As the driver probing is generally not predictable for modern technology, this means that as soon as multiple network interfaces are available the assignment of the names "eth0", "eth1" and so on is generally not fixed anymore, and it might very well happen that "eth0" on one boot ends up being "eth1" on the next. This can have serious security implications, for example in firewall rules which are coded for certain naming schemes, and which are hence very sensitive to unpredictably changing names.
+
+To fix this problem multiple solutions have been proposed and implemented. For a long time udev shipped support for assigning permanent "ethX" names to certain interfaces based on their MAC addresses. This turned out to have a multitude of problems, among them: this required a writable root directory which is generally not available; the statelessness of the system is lost as booting an OS image on a system will result in changed configuration of the image; on many systems MAC addresses are not actually fixed, such as on a lot of embedded hardware and particularly on all kinds of virtualization solutions. The biggest problem of all, however, is that the userspace components trying to assign the interface name raced against the kernel assigning new names from the same "ethX" namespace, a race condition with all kinds of weird effects, among them that assignment of names sometimes failed. As a result support for this has been removed from systemd/udev a while back.
+
+Another solution that has been implemented is "biosdevname" which tries to find fixed slot topology information in certain firmware interfaces and uses them to assign fixed names to interfaces which incorporate their physical location on the mainboard. In a way this naming scheme is similar to what is already done natively in udev for various device nodes via `/dev/*/by-path/` symlinks. In many cases, biosdevname departs from the low-level kernel device identification schemes that udev generally uses for these symlinks, and instead invents its own enumeration schemes.
+
+Finally, many distributions support renaming interfaces to user-chosen names (think: "internet0", "dmz0", ...) keyed off their MAC addresses or physical locations as part of their networking scripts. This is a very good choice but does have the problem that it implies that the user is willing and capable of choosing and assigning these names.
+
+We believe it is a good default choice to generalize the scheme pioneered by "biosdevname". Assigning fixed names based on firmware/topology/location information has the big advantage that the names are fully automatic, fully predictable, that they stay fixed even if hardware is added or removed (i.e. no reenumeration takes place) and that broken hardware can be replaced seamlessly. That said, they admittedly are sometimes harder to read than the "eth0" or "wlan0" everybody is used to. Example: "enp5s0"
+
+
+## What precisely has changed in v197?
+
+With systemd 197 we have added native support for a number of different naming policies into systemd/udevd proper and made a scheme similar to biosdevname's (but generally more powerful, and closer to kernel-internal device identification schemes) the default. The following different naming schemes for network interfaces are now supported by udev natively:
+
+1. Names incorporating Firmware/BIOS provided index numbers for on-board devices (example: `eno1`)
+1. Names incorporating Firmware/BIOS provided PCI Express hotplug slot index numbers (example: `ens1`)
+1. Names incorporating physical/geographical location of the connector of the hardware (example: `enp2s0`)
+1. Names incorporating the interface's MAC address (example: `enx78e7d1ea46da`)
+1. Classic, unpredictable kernel-native ethX naming (example: `eth0`)
+
+By default, systemd v197 will now name interfaces following policy 1) if that information from the firmware is applicable and available, falling back to 2) if that information from the firmware is applicable and available, falling back to 3) if applicable, falling back to 5) in all other cases. Policy 4) is not used by default, but is available if the user chooses so.
+
+This combined policy is only applied as last resort. That means, if the system has biosdevname installed, it will take precedence. If the user has added udev rules which change the name of the kernel devices these will take precedence too. Also, any distribution specific naming schemes generally take precedence.
+
+
+## Come again, what good does this do?
+
+With this new scheme you now get:
+
+* Stable interface names across reboots
+* Stable interface names even when hardware is added or removed, i.e. no re-enumeration takes place (to the level the firmware permits this)
+* Stable interface names when kernels or drivers are updated/changed
+* Stable interface names even if you have to replace broken ethernet cards by new ones
+* The names are automatically determined without user configuration, they just work
+* The interface names are fully predictable, i.e. just by looking at lspci you can figure out what the interface is going to be called
+* Fully stateless operation, changing the hardware configuration will not result in changes in /etc
+* Compatibility with read-only root
+* The network interface naming now follows more closely the scheme used for aliasing block device nodes and other device nodes in /dev via symlinks
+* Applicability to both x86 and non-x86 machines
+* The same on all distributions that adopted systemd/udev
+* It's easy to opt out of the scheme (see below)
+
+Does this have any drawbacks? Yes, it does. Previously it was practically guaranteed that hosts equipped with a single ethernet card only had a single "eth0" interface. With this new scheme in place, an administrator now has to check first what the local interface name is before invoking commands on it, where previously there was a good chance that "eth0" was the right name.
+
+
+## I don't like this, how do I disable this?
+
+You basically have three options:
+
+1. You disable the assignment of fixed names, so that the unpredictable kernel names are used again. For this, simply mask udev's .link file for the default policy: `ln -s /dev/null /etc/systemd/network/99-default.link`
+1. You create your own manual naming scheme, for example by naming your interfaces "internet0", "dmz0" or "lan0". For that create your own .link files in `/etc/systemd/network/`, that choose an explicit name or a better naming scheme for one, some, or all of your interfaces. See [systemd.link(5)](http://www.freedesktop.org/software/systemd/man/systemd.link.html) for more information.
+1. You pass `net.ifnames=0` on the kernel command line.
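+
+For the second option, a .link file could look roughly like the following sketch. The file name `/etc/systemd/network/10-dmz0.link` and the MAC address are made up for illustration; `MACAddress=` and `Name=` are regular settings documented in systemd.link(5):
+
+```
+[Match]
+# Hypothetical MAC address of the interface to rename
+MACAddress=00:11:22:33:44:55
+
+[Link]
+# The user-chosen interface name
+Name=dmz0
+```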
+
+## What does the new naming scheme look like, precisely?
+
+That's documented in detail in a comment block in [the sources of the net_id built-in](https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L20). Please refer to this in case you are wondering how to decode the new interface names.
diff --git a/docs/RELEASE.md b/docs/RELEASE.md
new file mode 100644
index 0000000..4bf5ab8
--- /dev/null
+++ b/docs/RELEASE.md
@@ -0,0 +1,16 @@
+---
+title: Steps to a Successful Release
+---
+
+# Steps to a Successful Release
+
+1. Add all items to NEWS
+2. Update the contributors list in NEWS ("make git-contrib")
+3. Update the time and place in NEWS
+4. Update version in configure.ac and library numbers in Makefile.am
+5. Check that "make distcheck" works
+6. Tag the release ("make git-tag")
+7. Upload the documentation ("make doc-sync")
+8. Close the github milestone and open a new one (https://github.com/systemd/systemd/milestones)
+9. Send announcement to systemd-devel, with a copy&paste from NEWS
+10. Update IRC topic ("/msg chanserv TOPIC #systemd Version NNN released")
diff --git a/docs/TRANSIENT-SETTINGS.md b/docs/TRANSIENT-SETTINGS.md
new file mode 100644
index 0000000..0ac77f0
--- /dev/null
+++ b/docs/TRANSIENT-SETTINGS.md
@@ -0,0 +1,465 @@
+---
+title: What settings are currently available for transient units?
+---
+
+# What settings are currently available for transient units?
+
+Our intention is to make all settings that are available as unit file settings
+also available for transient units, through the D-Bus API. At the moment, some
+unit types (device, swap, target) are not supported at all as transient units,
+but most others are pretty well supported, with some notable omissions.
+
+The lists below contain all settings currently available in unit files. The
+ones currently available in transient units are prefixed with `✓`.
+
+## Generic Unit Settings
+
+Most generic unit settings are available for transient units.
+
+```
+✓ Description=
+✓ Documentation=
+✓ SourcePath=
+✓ Requires=
+✓ Requisite=
+✓ Wants=
+✓ BindsTo=
+✓ Conflicts=
+✓ Before=
+✓ After=
+✓ OnFailure=
+✓ PropagatesReloadTo=
+✓ ReloadPropagatedFrom=
+✓ PartOf=
+✓ JoinsNamespaceOf=
+✓ RequiresMountsFor=
+✓ StopWhenUnneeded=
+✓ RefuseManualStart=
+✓ RefuseManualStop=
+✓ AllowIsolate=
+✓ DefaultDependencies=
+✓ OnFailureJobMode=
+✓ IgnoreOnIsolate=
+✓ JobTimeoutSec=
+✓ JobRunningTimeoutSec=
+✓ JobTimeoutAction=
+✓ JobTimeoutRebootArgument=
+✓ StartLimitIntervalSec=SECONDS
+✓ StartLimitBurst=UNSIGNED
+✓ StartLimitAction=ACTION
+✓ FailureAction=
+✓ SuccessAction=
+✓ FailureActionExitStatus=
+✓ SuccessActionExitStatus=
+✓ AddRef=
+✓ RebootArgument=STRING
+✓ ConditionPathExists=
+✓ ConditionPathExistsGlob=
+✓ ConditionPathIsDirectory=
+✓ ConditionPathIsSymbolicLink=
+✓ ConditionPathIsMountPoint=
+✓ ConditionPathIsReadWrite=
+✓ ConditionDirectoryNotEmpty=
+✓ ConditionFileNotEmpty=
+✓ ConditionFileIsExecutable=
+✓ ConditionNeedsUpdate=
+✓ ConditionFirstBoot=
+✓ ConditionKernelCommandLine=
+✓ ConditionKernelVersion=
+✓ ConditionArchitecture=
+✓ ConditionVirtualization=
+✓ ConditionSecurity=
+✓ ConditionCapability=
+✓ ConditionHost=
+✓ ConditionACPower=
+✓ ConditionUser=
+✓ ConditionGroup=
+✓ ConditionControlGroupController=
+✓ AssertPathExists=
+✓ AssertPathExistsGlob=
+✓ AssertPathIsDirectory=
+✓ AssertPathIsSymbolicLink=
+✓ AssertPathIsMountPoint=
+✓ AssertPathIsReadWrite=
+✓ AssertDirectoryNotEmpty=
+✓ AssertFileNotEmpty=
+✓ AssertFileIsExecutable=
+✓ AssertNeedsUpdate=
+✓ AssertFirstBoot=
+✓ AssertKernelCommandLine=
+✓ AssertKernelVersion=
+✓ AssertArchitecture=
+✓ AssertVirtualization=
+✓ AssertSecurity=
+✓ AssertCapability=
+✓ AssertHost=
+✓ AssertACPower=
+✓ AssertUser=
+✓ AssertGroup=
+✓ AssertControlGroupController=
+✓ CollectMode=
+```
+
+## Execution-Related Settings
+
+All execution-related settings are available for transient units.
+
+```
+✓ WorkingDirectory=
+✓ RootDirectory=
+✓ RootImage=
+✓ User=
+✓ Group=
+✓ SupplementaryGroups=
+✓ Nice=
+✓ OOMScoreAdjust=
+✓ IOSchedulingClass=
+✓ IOSchedulingPriority=
+✓ CPUSchedulingPolicy=
+✓ CPUSchedulingPriority=
+✓ CPUSchedulingResetOnFork=
+✓ CPUAffinity=
+✓ UMask=
+✓ Environment=
+✓ EnvironmentFile=
+✓ PassEnvironment=
+✓ UnsetEnvironment=
+✓ DynamicUser=
+✓ RemoveIPC=
+✓ StandardInput=
+✓ StandardOutput=
+✓ StandardError=
+✓ StandardInputText=
+✓ StandardInputData=
+✓ TTYPath=
+✓ TTYReset=
+✓ TTYVHangup=
+✓ TTYVTDisallocate=
+✓ SyslogIdentifier=
+✓ SyslogFacility=
+✓ SyslogLevel=
+✓ SyslogLevelPrefix=
+✓ LogLevelMax=
+✓ LogExtraFields=
+✓ LogRateLimitIntervalSec=
+✓ LogRateLimitBurst=
+✓ SecureBits=
+✓ CapabilityBoundingSet=
+✓ AmbientCapabilities=
+✓ TimerSlackNSec=
+✓ NoNewPrivileges=
+✓ KeyringMode=
+✓ SystemCallFilter=
+✓ SystemCallArchitectures=
+✓ SystemCallErrorNumber=
+✓ MemoryDenyWriteExecute=
+✓ RestrictNamespaces=
+✓ RestrictRealtime=
+✓ RestrictAddressFamilies=
+✓ LockPersonality=
+✓ LimitCPU=
+✓ LimitFSIZE=
+✓ LimitDATA=
+✓ LimitSTACK=
+✓ LimitCORE=
+✓ LimitRSS=
+✓ LimitNOFILE=
+✓ LimitAS=
+✓ LimitNPROC=
+✓ LimitMEMLOCK=
+✓ LimitLOCKS=
+✓ LimitSIGPENDING=
+✓ LimitMSGQUEUE=
+✓ LimitNICE=
+✓ LimitRTPRIO=
+✓ LimitRTTIME=
+✓ ReadWritePaths=
+✓ ReadOnlyPaths=
+✓ InaccessiblePaths=
+✓ BindPaths=
+✓ BindReadOnlyPaths=
+✓ TemporaryFileSystem=
+✓ PrivateTmp=
+✓ PrivateDevices=
+✓ PrivateMounts=
+✓ ProtectKernelTunables=
+✓ ProtectKernelModules=
+✓ ProtectControlGroups=
+✓ PrivateNetwork=
+✓ PrivateUsers=
+✓ ProtectSystem=
+✓ ProtectHome=
+✓ MountFlags=
+✓ MountAPIVFS=
+✓ Personality=
+✓ RuntimeDirectoryPreserve=
+✓ RuntimeDirectoryMode=
+✓ RuntimeDirectory=
+✓ StateDirectoryMode=
+✓ StateDirectory=
+✓ CacheDirectoryMode=
+✓ CacheDirectory=
+✓ LogsDirectoryMode=
+✓ LogsDirectory=
+✓ ConfigurationDirectoryMode=
+✓ ConfigurationDirectory=
+✓ PAMName=
+✓ IgnoreSIGPIPE=
+✓ UtmpIdentifier=
+✓ UtmpMode=
+✓ SELinuxContext=
+✓ SmackProcessLabel=
+✓ AppArmorProfile=
+✓ Slice=
+```
+
+## Resource Control Settings
+
+All cgroup/resource control settings are available for transient units.
+
+```
+✓ CPUAccounting=
+✓ CPUWeight=
+✓ StartupCPUWeight=
+✓ CPUShares=
+✓ StartupCPUShares=
+✓ CPUQuota=
+✓ MemoryAccounting=
+✓ MemoryMin=
+✓ MemoryLow=
+✓ MemoryHigh=
+✓ MemoryMax=
+✓ MemorySwapMax=
+✓ MemoryLimit=
+✓ DeviceAllow=
+✓ DevicePolicy=
+✓ IOAccounting=
+✓ IOWeight=
+✓ StartupIOWeight=
+✓ IODeviceWeight=
+✓ IOReadBandwidthMax=
+✓ IOWriteBandwidthMax=
+✓ IOReadIOPSMax=
+✓ IOWriteIOPSMax=
+✓ BlockIOAccounting=
+✓ BlockIOWeight=
+✓ StartupBlockIOWeight=
+✓ BlockIODeviceWeight=
+✓ BlockIOReadBandwidth=
+✓ BlockIOWriteBandwidth=
+✓ TasksAccounting=
+✓ TasksMax=
+✓ Delegate=
+✓ IPAccounting=
+✓ IPAddressAllow=
+✓ IPAddressDeny=
+```
+
+## Process Killing Settings
+
+All process killing settings are available for transient units:
+
+```
+✓ SendSIGKILL=
+✓ SendSIGHUP=
+✓ KillMode=
+✓ KillSignal=
+✓ FinalKillSignal=
+✓ WatchdogSignal=
+```
+
+## Service Unit Settings
+
+Most service unit settings are available for transient units.
+
+```
+✓ PIDFile=
+✓ ExecStartPre=
+✓ ExecStart=
+✓ ExecStartPost=
+✓ ExecReload=
+✓ ExecStop=
+✓ ExecStopPost=
+✓ RestartSec=
+✓ TimeoutStartSec=
+✓ TimeoutStopSec=
+✓ TimeoutSec=
+✓ RuntimeMaxSec=
+✓ WatchdogSec=
+✓ Type=
+✓ Restart=
+✓ RootDirectoryStartOnly=
+✓ RemainAfterExit=
+✓ GuessMainPID=
+✓ RestartPreventExitStatus=
+✓ RestartForceExitStatus=
+✓ SuccessExitStatus=
+✓ NonBlocking=
+✓ BusName=
+✓ FileDescriptorStoreMax=
+✓ NotifyAccess=
+ Sockets=
+✓ USBFunctionDescriptors=
+✓ USBFunctionStrings=
+```
+
+## Mount Unit Settings
+
+All mount unit settings are available to transient units:
+
+```
+✓ What=
+✓ Where=
+✓ Options=
+✓ Type=
+✓ TimeoutSec=
+✓ DirectoryMode=
+✓ SloppyOptions=
+✓ LazyUnmount=
+✓ ForceUnmount=
+```
+
+## Automount Unit Settings
+
+All automount unit settings are available to transient units:
+
+```
+✓ Where=
+✓ DirectoryMode=
+✓ TimeoutIdleSec=
+```
+
+## Timer Unit Settings
+
+Most timer unit settings are available to transient units.
+
+```
+✓ OnCalendar=
+✓ OnActiveSec=
+✓ OnBootSec=
+✓ OnStartupSec=
+✓ OnUnitActiveSec=
+✓ OnUnitInactiveSec=
+✓ Persistent=
+✓ WakeSystem=
+✓ RemainAfterElapse=
+✓ AccuracySec=
+✓ RandomizedDelaySec=
+ Unit=
+```
+
+## Slice Unit Settings
+
+Slice units are fully supported as transient units, but they have no settings
+of their own beyond the generic unit and resource control settings.
+
+## Scope Unit Settings
+
+Scope units are fully supported as transient units (in fact they only exist as
+such).
+
+```
+✓ TimeoutStopSec=
+```
+
+## Socket Unit Settings
+
+Most socket unit settings are available to transient units.
+
+```
+✓ ListenStream=
+✓ ListenDatagram=
+✓ ListenSequentialPacket=
+✓ ListenFIFO=
+✓ ListenNetlink=
+✓ ListenSpecial=
+✓ ListenMessageQueue=
+✓ ListenUSBFunction=
+✓ SocketProtocol=
+✓ BindIPv6Only=
+✓ Backlog=
+✓ BindToDevice=
+✓ ExecStartPre=
+✓ ExecStartPost=
+✓ ExecStopPre=
+✓ ExecStopPost=
+✓ TimeoutSec=
+✓ SocketUser=
+✓ SocketGroup=
+✓ SocketMode=
+✓ DirectoryMode=
+✓ Accept=
+✓ Writable=
+✓ MaxConnections=
+✓ MaxConnectionsPerSource=
+✓ KeepAlive=
+✓ KeepAliveTimeSec=
+✓ KeepAliveIntervalSec=
+✓ KeepAliveProbes=
+✓ DeferAcceptSec=
+✓ NoDelay=
+✓ Priority=
+✓ ReceiveBuffer=
+✓ SendBuffer=
+✓ IPTOS=
+✓ IPTTL=
+✓ Mark=
+✓ PipeSize=
+✓ FreeBind=
+✓ Transparent=
+✓ Broadcast=
+✓ PassCredentials=
+✓ PassSecurity=
+✓ TCPCongestion=
+✓ ReusePort=
+✓ MessageQueueMaxMessages=
+✓ MessageQueueMessageSize=
+✓ RemoveOnStop=
+✓ Symlinks=
+✓ FileDescriptorName=
+ Service=
+✓ TriggerLimitIntervalSec=
+✓ TriggerLimitBurst=
+✓ SmackLabel=
+✓ SmackLabelIPIn=
+✓ SmackLabelIPOut=
+✓ SELinuxContextFromNet=
+```
+
+## Swap Unit Settings
+
+Swap units are currently not available at all as transient units:
+
+```
+ What=
+ Priority=
+ Options=
+ TimeoutSec=
+```
+
+## Path Unit Settings
+
+Most path unit settings are available to transient units.
+
+```
+✓ PathExists=
+✓ PathExistsGlob=
+✓ PathChanged=
+✓ PathModified=
+✓ DirectoryNotEmpty=
+ Unit=
+✓ MakeDirectory=
+✓ DirectoryMode=
+```
+
+## Install Section
+
+The `[Install]` section is currently not available at all for transient units, and it probably doesn't even make sense.
+
+```
+ Alias=
+ WantedBy=
+ RequiredBy=
+ Also=
+ DefaultInstance=
+```
diff --git a/docs/TRANSLATORS.md b/docs/TRANSLATORS.md
new file mode 100644
index 0000000..d155c1c
--- /dev/null
+++ b/docs/TRANSLATORS.md
@@ -0,0 +1,78 @@
+---
+title: Notes for Translators
+---
+
+# Notes for Translators
+
+systemd depends on the `gettext` package for multilingual support.
+
+You'll find the i18n files in the `po/` directory.
+
+The build system (meson/ninja) can be used to generate a template (`*.pot`),
+which can be used to create new translations.
+
+It can also merge the template into the existing translations (`*.po`), to pick
+up new strings in need of translation.
+
+Finally, it is able to compile the translations (to `*.gmo` files), so that
+they can be used by systemd software. (This step is also useful to confirm the
+syntax of the `*.po` files is correct.)
+
+## Creating a New Translation
+
+To create a translation to a language not yet available, start by creating the
+initial template:
+
+```
+$ ninja -C build/ systemd-pot
+```
+
+This will generate the file `po/systemd.pot` in the source tree.
+
+Then simply copy it to a new <code><i>${lang_code}</i>.po</code> file, where
+<code><i>${lang_code}</i></code> is the two-letter code for a language
+(possibly followed by a two-letter uppercase country code), according to the
+ISO 639 standard.
+
+In short:
+
+<pre>
+$ cp po/systemd.pot po/<i>${lang_code}</i>.po
+</pre>
+
+Then edit the new <code>po/<i>${lang_code}</i>.po</code> file (for example,
+using the `poedit` GUI editor).
+
+## Updating an Existing Translation
+
+Start by updating the `*.po` files from the latest template:
+
+```
+$ ninja -C build/ systemd-update-po
+```
+
+This will touch all the `*.po` files, so you'll want to pay attention when
+creating a git commit from this change, to only include the one translation
+you're actually updating.
+
+Edit the `*.po` file, looking for empty translations and translations marked as
+"fuzzy" (which means the merger found a similar message that needs to be
+reviewed as it's expected not to match exactly).
+
+You can use any text editor to update the `*.po` files, but a good choice is
+the `poedit` editor, a graphical application specifically designed for this
+purpose.
+
+Once you're done, create a git commit for the update of the `po/*.po` file you
+touched. Remember to undo the changes to the other `*.po` files (for instance,
+using `git checkout -- po/` after you commit the changes you do want to keep).
+
+## Recompiling Translations
+
+You can recompile the `*.po` files using the following command:
+
+```
+$ ninja -C build/ systemd-gmo
+```
+
+The resulting files will be saved in the `build/po/` directory.
diff --git a/docs/UIDS-GIDS.md b/docs/UIDS-GIDS.md
new file mode 100644
index 0000000..25345a9
--- /dev/null
+++ b/docs/UIDS-GIDS.md
@@ -0,0 +1,282 @@
+---
+title: Users, Groups, UIDs and GIDs on `systemd` Systems
+---
+
+# Users, Groups, UIDs and GIDs on `systemd` Systems
+
+Here's a summary of the requirements `systemd` (and Linux) make on UID/GID
+assignments and their ranges.
+
+Note that while in theory UIDs and GIDs are orthogonal concepts they really
+aren't IRL. With that in mind, when we discuss UIDs below it should be assumed
+that whatever we say about UIDs applies to GIDs in mostly the same way, and all
+the special assignments and ranges for UIDs always have mostly the same
+validity for GIDs too.
+
+## Special Linux UIDs
+
+In theory, the range of the C type `uid_t` is 32bit wide on Linux,
+i.e. 0…4294967295. However, four UIDs are special on Linux:
+
+1. 0 → The `root` super-user
+
+2. 65534 → The `nobody` UID, also called the "overflow" UID or similar. It's
+ where various subsystems map unmappable users to, for example file systems
+ only supporting 16bit UIDs, NFS or user namespacing. (The latter can be
+ changed with a sysctl during runtime, but that's not supported on
+ `systemd`. If you do change it you void your warranty.) Because Fedora is a
+ bit confused the `nobody` user is called `nfsnobody` there (and they have a
+ different `nobody` user at UID 99). I hope this will be corrected eventually
+ though. (Also, some distributions call the `nobody` group `nogroup`. I wish
+ they didn't.)
+
+3. 4294967295, aka "32bit `(uid_t) -1`" → This UID is not a valid user ID, as
+ `setresuid()`, `chown()` and friends treat -1 as a special request to not
+ change the UID of the process/file. This UID is hence not available for
+ assignment to users in the user database.
+
+4. 65535, aka "16bit `(uid_t) -1`" → Before Linux kernel 2.4 `uid_t` used to be
+ 16bit, and programs compiled for that would hence assume that `(uid_t) -1`
+ is 65535. This UID is hence not usable either.
+
+The `nss-systemd` glibc NSS module will synthesize user database records for
+the UIDs 0 and 65534 if the system user database doesn't list them. This means
+that any system where this module is enabled works to some minimal level
+without `/etc/passwd`.
+
+## Special Distribution UID ranges
+
+Distributions generally split the available UID range in two:
+
+1. 1…999 → System users. These are users that do not map to actual "human"
+ users, but are used as security identities for system daemons, to implement
+ privilege separation and run system daemons with minimal privileges.
+
+2. 1000…65533 and 65536…4294967294 → Everything else, i.e. regular (human) users.
+
+Note that most distributions allow changing the boundary between system and
+regular users, even during runtime as user configuration. Moreover, some older
+systems placed the boundary at 499/500, or even 99/100. In `systemd`, the
+boundary is configurable only during compilation time, as this should be a
+decision for distribution builders, not for users. Moreover, we strongly
+discourage downstreams from changing the boundary from the upstream default of
+999/1000.
+
+Also note that programs such as `adduser` tend to allocate from a subset of the
+available regular user range only, usually 1000…60000. This subset is usually
+user-configurable, too.
+
+Note that systemd requires that system users and groups are resolvable without
+networking available — a requirement that is not made for regular users. This
+means regular users may be stored in remote LDAP or NIS databases, but system
+users may not (except when there's a consistent local cache kept, that is
+available during earliest boot, including in the initial RAM disk).
+
+## Special `systemd` GIDs
+
+`systemd` defines no special UIDs beyond what Linux already defines (see
+above). However, it does define some special group/GID assignments, which are
+primarily used for `systemd-udevd`'s device management. The precise list of the
+currently defined groups is found in this `sysusers.d` snippet:
+[basic.conf](https://raw.githubusercontent.com/systemd/systemd/master/sysusers.d/basic.conf.in)
+
+It's strongly recommended that downstream distributions include these groups in
+their default group databases.
+
+Note that the actual GID numbers assigned to these groups do not have to be
+constant beyond a specific system. There's one exception however: the `tty`
+group must have the GID 5. That's because it must be encoded in the `devpts`
+mount parameters during earliest boot, at a time where NSS lookups are not
+possible. (Note that the actual GID can be changed during `systemd` build time,
+but downstreams are strongly advised against doing that.)
+
+## Special `systemd` UID ranges
+
+`systemd` defines a number of special UID ranges:
+
+1. 61184…65519 → UIDs for dynamic users are allocated from this range (see the
+ `DynamicUser=` documentation in
+ [`systemd.exec(5)`](https://www.freedesktop.org/software/systemd/man/systemd.exec.html)). This
+ range has been chosen so that it is below the 16bit boundary (i.e. below
+ 65535), in order to provide compatibility with container environments that
+ assign a 64K range of UIDs to containers using user namespacing. This range
+ is above the 60000 boundary, so that its allocations are unlikely to be
+ affected by `adduser` allocations (see above). And we leave some room
+ upwards for other purposes. (And if you wonder why precisely these numbers:
+ if you write them in hexadecimal, they might make more sense: 0xEF00 and
+ 0xFFEF). The `nss-systemd` module will synthesize user records implicitly
+ for all currently allocated dynamic users from this range. Thus, NSS-based
+ user record resolving works correctly without those users being in
+ `/etc/passwd`.
+
+2. 524288…1879048191 → UID range for `systemd-nspawn`'s automatic allocation of
+ per-container UID ranges. When the `--private-users=pick` switch is used (or
+ `-U`) then it will automatically find a so far unused 16bit subrange of this
+ range and assign it to the container. The range is picked so that the upper
+ 16bit of the 32bit UIDs are constant for all users of the container, while
+ the lower 16bit directly encode the 65536 UIDs assigned to the
+ container. This mode of allocation means that the upper 16bit of any UID
+ assigned to a container are kind of a "container ID", while the lower 16bit
+ directly expose the container's own UID numbers. If you wonder why precisely
+ these numbers, consider them in hexadecimal: 0x00080000…0x6FFFFFFF. This
+ range is above the 16bit boundary. Moreover it's below the 31bit boundary,
+ as some broken code (specifically: the kernel's `devpts` file system)
+ erroneously considers UIDs signed integers, and hence can't deal with values
+ above 2^31. The `nss-mymachines` glibc NSS module will synthesize user
+ database records for all UIDs assigned to a running container from this
+ range.
+
+Note for both allocation ranges: when a UID allocation takes place NSS is
+checked for collisions first, and a different UID is picked if an entry is
+found. Thus, the user database is used as a synchronization mechanism to ensure
+exclusive ownership of UIDs and UID ranges. To ensure compatibility with other
+subsystems allocating from the same ranges it is hence essential that they
+ensure that whatever they pick shows up in the user/group databases, either by
+providing an NSS module, or by adding entries directly to `/etc/passwd` and
+`/etc/group`. Note that, for performance reasons, `systemd-nspawn` only does an
+NSS check for the first UID of the range it allocates, not for all 65536 of
+them. Also note that while the allocation logic is operating, the glibc
+`lckpwdf()` user database lock is taken, in order to make this logic race-free.
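The collision check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not `systemd-nspawn`'s actual code: the function and constant names are invented for this sketch, the linear scan over candidate bases is an assumption (the text does not say how candidates are chosen), and the `lckpwdf()` lock is left out because the Python standard library has no binding for it.

```python
import pwd

# Boundaries taken from the text; the names are hypothetical.
CONTAINER_UID_BASE_MIN = 0x00080000
CONTAINER_UID_BASE_MAX = 0x6FFF0000
UIDS_PER_CONTAINER = 0x10000


def uid_is_taken(uid, lookup=pwd.getpwuid):
    """Return True if the user database (NSS) has a record for this UID."""
    try:
        lookup(uid)
    except KeyError:
        # pwd.getpwuid() raises KeyError when no entry exists.
        return False
    return True


def pick_container_base(lookup=pwd.getpwuid, start=CONTAINER_UID_BASE_MIN):
    """Return the first 16bit-aligned base whose first UID is unused.

    Only the first UID of each candidate range is checked, mirroring the
    behaviour described in the text. Returns None if nothing is free.
    """
    if start & 0xFFFF:
        raise ValueError("start must have its lower 16 bits cleared")
    for base in range(start, CONTAINER_UID_BASE_MAX + 1, UIDS_PER_CONTAINER):
        if not uid_is_taken(base, lookup):
            return base
    return None
```

The `lookup` parameter exists only so that the logic can be exercised without touching the real user database. A real allocator would hold the `lckpwdf()` lock around the whole check.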
+
+## Figuring out the system's UID boundaries
+
+The most important boundaries of the local system may be queried with
+`pkg-config`:
+
+```
+$ pkg-config --variable=systemuidmax systemd
+999
+$ pkg-config --variable=dynamicuidmin systemd
+61184
+$ pkg-config --variable=dynamicuidmax systemd
+65519
+$ pkg-config --variable=containeruidbasemin systemd
+524288
+$ pkg-config --variable=containeruidbasemax systemd
+1878982656
+```
+
+(Note that the latter encodes the maximum UID *base* `systemd-nspawn` might
+pick — given that 64K UIDs are assigned to each container according to this
+allocation logic, the maximum UID used for this range is hence
+1878982656+65535=1879048191.)
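The arithmetic in the note above, and the hexadecimal values mentioned earlier, can be checked with a short Python sketch. The constants are copied from the `pkg-config` output shown above; the names are made up for this illustration and are not part of any systemd interface.

```python
# Values copied from the pkg-config output above; names are illustrative.
DYNAMIC_UID_MIN = 61184             # 0xEF00
DYNAMIC_UID_MAX = 65519             # 0xFFEF
CONTAINER_UID_BASE_MIN = 524288     # 0x00080000
CONTAINER_UID_BASE_MAX = 1878982656 # 0x6FFF0000
UIDS_PER_CONTAINER = 65536          # one 16bit range per container


def highest_container_uid(base_max=CONTAINER_UID_BASE_MAX):
    """Return the last UID covered by the highest possible container base."""
    return base_max + UIDS_PER_CONTAINER - 1


def number_of_container_bases():
    """Count the 16bit-aligned bases between the minimum and maximum base."""
    span = CONTAINER_UID_BASE_MAX - CONTAINER_UID_BASE_MIN
    return span // UIDS_PER_CONTAINER + 1


if __name__ == "__main__":
    print(hex(DYNAMIC_UID_MIN), hex(DYNAMIC_UID_MAX))
    print(hex(CONTAINER_UID_BASE_MIN), hex(highest_container_uid()))
    print(number_of_container_bases())
```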
+
+Note that systemd does not make any of these values runtime-configurable. All
+these boundaries are chosen during build time. That said, the system UID/GID
+boundary is traditionally configured in `/etc/login.defs`, though systemd won't
+look there during runtime.
+
+## Considerations for container managers
+
+If you hack on a container manager, and wonder how and how many UIDs best to
+assign to your containers, here are a few recommendations:
+
+1. Definitely, don't assign fewer than 65536 UIDs/GIDs. After all the `nobody`
+user has magic properties, and hence should be available in your container, and
+given that it's assigned the UID 65534, you should really cover the full 16bit
+range in your container. Note that systemd will — as mentioned — synthesize
+user records for the `nobody` user, and assumes its availability in various
+other parts of its codebase, too, hence assigning fewer users means you lose
+compatibility with running systemd code inside your container. And most likely
+other packages make similar assumptions.
+
+2. While it's fine to assign more than 65536 UIDs/GIDs to a container, there's
+most likely not much value in doing so, as Linux distributions won't use the
+higher ranges by default (as mentioned neither `adduser` nor `systemd`'s
+dynamic user concept allocate from above the 16bit range). Unless you actively
+care for nested containers, it's hence probably a good idea to allocate exactly
+65536 UIDs per container, and neither fewer nor more. A nice side effect is
+that by doing so, you expose the same number of UIDs per container as Linux 2.2
+supported for the whole system, back in the day.
+
+3. Consider allocating UID ranges for containers so that the first UID you
+assign has the lower 16bits all set to zero. That way, the upper 16bits become
+a container ID of some kind, while the lower 16bits directly encode the
+internal container UID. This is the way `systemd-nspawn` allocates UID ranges
+(see above). Following this allocation logic ensures best compatibility with
+`systemd-nspawn` and all other container managers following the scheme, as it
+is sufficient then to check NSS for the first UID you pick regarding conflicts,
+as that's what they do, too. Moreover, it makes `chown()`ing container file
+system trees nicely robust to interruptions: as the external UID encodes the
+internal UID in a fixed way, it's very easy to adjust the container's base UID
+without the need to know the original base UID: to change the container base,
+just mask away the upper 16bit, and insert the upper 16bit of the new container
+base instead. Here are the easy conversions to derive the internal UID, the
+external UID, and the container base UID from each other:
+
+ ```
+ INTERNAL_UID = EXTERNAL_UID & 0x0000FFFF
+ CONTAINER_BASE_UID = EXTERNAL_UID & 0xFFFF0000
+ EXTERNAL_UID = INTERNAL_UID | CONTAINER_BASE_UID
+ ```
+
+4. When picking a UID range for containers, make sure to check NSS first, with
+a simple `getpwuid()` call: if there's already a user record for the first UID
+you want to pick, then it's already in use: pick a different one. Wrap that
+call in a `lckpwdf()` + `ulckpwdf()` pair, to make allocation
+race-free. Provide an NSS module that makes all UIDs you end up taking show up
+in the user database, and make sure that the NSS module returns up-to-date
+information before you release the lock, so that other system components can
+safely use the NSS user database as allocation check, too. Note that if you
+follow this scheme no changes to `/etc/passwd` need to be made, thus minimizing
+the artifacts the container manager persistently leaves in the system.
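The mask conversions from recommendation 3 can be written out as a small Python sketch. The function names are hypothetical and exist only for illustration; the masks are the ones given in the text.

```python
UID_LOW_MASK = 0x0000FFFF   # lower 16 bits: UID as seen inside the container
UID_HIGH_MASK = 0xFFFF0000  # upper 16 bits: the container base


def internal_uid(external):
    """INTERNAL_UID = EXTERNAL_UID & 0x0000FFFF"""
    return external & UID_LOW_MASK


def container_base_uid(external):
    """CONTAINER_BASE_UID = EXTERNAL_UID & 0xFFFF0000"""
    return external & UID_HIGH_MASK


def external_uid(internal, base):
    """EXTERNAL_UID = INTERNAL_UID | CONTAINER_BASE_UID"""
    if not 0 <= internal <= UID_LOW_MASK:
        raise ValueError("internal UID must fit into 16 bits")
    if base & UID_LOW_MASK:
        raise ValueError("container base must have its lower 16 bits cleared")
    return internal | base


def rebase_uid(external, new_base):
    """Move an external UID to a new container base.

    The old base is never needed: the upper 16 bits are masked away and
    replaced by those of the new base.
    """
    return external_uid(internal_uid(external), new_base)
```

Since `rebase_uid()` gives the same result for a UID that has already been moved as for one that has not, an interrupted recursive `chown()` pass can simply be run again from the start.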
+
+## Summary
+
+| UID/GID | Purpose | Defined By | Listed in |
+|-----------------------|-----------------------|---------------|-------------------------------|
+| 0 | `root` user | Linux | `/etc/passwd` + `nss-systemd` |
+| 1…4 | System users | Distributions | `/etc/passwd` |
+| 5 | `tty` group | `systemd` | `/etc/group` |
+| 6…999 | System users | Distributions | `/etc/passwd` |
+| 1000…60000 | Regular users | Distributions | `/etc/passwd` + LDAP/NIS/… |
+| 60001…61183 | Unused | | |
+| 61184…65519 | Dynamic service users | `systemd` | `nss-systemd` |
+| 65520…65533 | Unused | | |
+| 65534 | `nobody` user | Linux | `/etc/passwd` + `nss-systemd` |
+| 65535 | 16bit `(uid_t) -1` | Linux | |
+| 65536…524287 | Unused | | |
+| 524288…1879048191 | Container UID ranges | `systemd` | `nss-mymachines` |
+| 1879048192…4294967294 | Unused | | |
+| 4294967295 | 32bit `(uid_t) -1` | Linux | |
+
+Note that "Unused" in the table above doesn't mean that these ranges are
+really unused. It just means that these ranges have no well-established
+pre-defined purposes between Linux, generic low-level distributions and
+`systemd`. There might very well be other packages that allocate from these
+ranges.
+
+## Notes on resolvability of user and group names
+
+User names, UIDs, group names and GIDs don't have to be resolvable using NSS
+(i.e. `getpwuid()` and `getpwnam()` and friends) all the time. However, systemd
+makes the following requirements:
+
+System users generally have to be resolvable during early boot already. This
+means they should not be provided by any networked service (as those usually
+become available during late boot only), except if a local cache is kept that
+makes them available during early boot too (i.e. before networking is
+up). Specifically, system users need to be resolvable at least before
+`systemd-udevd.service` and `systemd-tmpfiles.service` are started, as both
+need to resolve system users — but note that there might be more services
+requiring full resolvability of system users than just these two.
+
+Regular users do not need to be resolvable during early boot; it is sufficient
+if they become resolvable during late boot. Specifically, regular users need to
+be resolvable at the point in time the `nss-user-lookup.target` unit is
+reached. This target unit is generally used as synchronization point between
+providers of the user database and consumers of it. Services that require that
+the user database is fully available (for example, the login service
+`systemd-logind.service`) are ordered *after* it, while services that provide
+parts of the user database (for example an LDAP user database client) are
+ordered *before* it. Note that `nss-user-lookup.target` is a *passive* unit: in
+order to minimize synchronization points on systems that don't need it the unit
+is pulled into the initial transaction only if there's at least one service
+that really needs it, and that means only if there's a service providing the
+local user database somehow through IPC or suchlike. Or in other words: if you
+hack on some networked user database project, then make sure you order your
+service `Before=nss-user-lookup.target` and that you pull it in with
+`Wants=nss-user-lookup.target`. However, if you hack on some project that needs
+the user database to be up in full, then order your service
+`After=nss-user-lookup.target`, but do *not* pull it in via a `Wants=`
+dependency.
diff --git a/docs/_config.yml b/docs/_config.yml
new file mode 100644
index 0000000..ee845ee
--- /dev/null
+++ b/docs/_config.yml
@@ -0,0 +1 @@
+theme: jekyll-theme-primer
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..ffb30b9
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,11 @@
+---
+title: systemd Documentation
+---
+
+# systemd Documentation
+
+{% for p in site.pages %}
+ {% if p.url != page.url and p.title %}
+* [{{ p.title }}]({{ p.url | relative_url }})
+ {% endif %}
+{% endfor %}
diff --git a/docs/sysvinit/README.in b/docs/sysvinit/README.in
new file mode 100644
index 0000000..de5d80d
--- /dev/null
+++ b/docs/sysvinit/README.in
@@ -0,0 +1,27 @@
+You are looking for the traditional init scripts in @SYSTEM_SYSVINIT_PATH@,
+and they are gone?
+
+Here's an explanation of what's going on:
+
+You are running a systemd-based OS where traditional init scripts have
+been replaced by native systemd service files. Service files provide
+very similar functionality to init scripts. To make use of service
+files simply invoke "systemctl", which will output a list of all
+currently running services (and other units). Use "systemctl
+list-unit-files" to get a listing of all known unit files, including
+stopped, disabled and masked ones. Use "systemctl start
+foobar.service" and "systemctl stop foobar.service" to start or stop a
+service, respectively. For further details, please refer to
+systemctl(1).
+
+Note that traditional init scripts continue to function on a systemd
+system. An init script @SYSTEM_SYSVINIT_PATH@/foobar is implicitly mapped
+into a service unit foobar.service during system initialization.
+
+Thank you!
+
+Further reading:
+ man:systemctl(1)
+ man:systemd(1)
+ http://0pointer.de/blog/projects/systemd-for-admins-3.html
+ https://www.freedesktop.org/wiki/Software/systemd/Incompatibilities
diff --git a/docs/sysvinit/meson.build b/docs/sysvinit/meson.build
new file mode 100644
index 0000000..fbac59a
--- /dev/null
+++ b/docs/sysvinit/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: LGPL-2.1+
+
+file = configure_file(
+ input : 'README.in',
+ output : 'README',
+ configuration : substs)
+
+if conf.get('HAVE_SYSV_COMPAT') == 1
+ install_data(file,
+ install_dir : sysvinit_path)
+endif
diff --git a/docs/var-log/README.in b/docs/var-log/README.in
new file mode 100644
index 0000000..2e64fb1
--- /dev/null
+++ b/docs/var-log/README.in
@@ -0,0 +1,26 @@
+You are looking for the traditional text log files in @VARLOGDIR@, and
+they are gone?
+
+Here's an explanation of what's going on:
+
+You are running a systemd-based OS where traditional syslog has been
+replaced with the Journal. The journal stores the same (and more)
+information as classic syslog. To make use of the journal and access
+the collected log data simply invoke "journalctl", which will output
+the logs in the same text-based format the syslog files in
+@VARLOGDIR@ used to have. For further details, please refer to
+journalctl(1).
+
+Alternatively, consider installing one of the traditional syslog
+implementations available for your distribution, which will generate
+the classic log files for you. Syslog implementations such as
+syslog-ng or rsyslog may be installed side-by-side with the journal
+and will continue to function the way they always did.
+
+Thank you!
+
+Further reading:
+ man:journalctl(1)
+ man:systemd-journald.service(8)
+ man:journald.conf(5)
+ http://0pointer.de/blog/projects/the-journal.html
diff --git a/docs/var-log/meson.build b/docs/var-log/meson.build
new file mode 100644
index 0000000..0ddff20
--- /dev/null
+++ b/docs/var-log/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: LGPL-2.1+
+
+file = configure_file(
+ input : 'README.in',
+ output : 'README',
+ configuration : substs)
+
+if conf.get('HAVE_SYSV_COMPAT') == 1
+ install_data(file,
+ install_dir : varlogdir)
+endif