10 files changed, 594 insertions, 0 deletions
diff --git a/health/guides/systemdunits/systemd_automount_unit_failed_state.md b/health/guides/systemdunits/systemd_automount_unit_failed_state.md
new file mode 100644
index 00000000..eb3024a9
--- /dev/null
+++ b/health/guides/systemdunits/systemd_automount_unit_failed_state.md
@@ -0,0 +1,58 @@
+### Understand the alert
+
+This alert is triggered when a `systemd` automount unit enters the `failed` state. It means that `systemd` could not set up or serve an automount point, so the filesystem behind it may not be mounted when it is accessed.
+
+### What is an automount unit?
+
+An automount unit is a type of `systemd` unit that mounts a filesystem on demand, the first time its mount point is accessed. Automount units use the `.automount` file extension, are typically located in the `/etc/systemd/system` directory, and each one requires a matching `.mount` unit that describes the filesystem itself.
+
+### Troubleshoot the alert
+
+1. Identify the failed automount unit(s)
+
+To list all `systemd` automount units and their states, run the following command:
+
+```
+systemctl list-units --all --type=automount
+```
+
+Look for the unit(s) with a `failed` state.
+
+2. Check the automount unit file
+
+Examine the failed unit's configuration file in `/etc/systemd/system/` or `/lib/systemd/system/` (depending on your system). If there is an error in the configuration, fix it and reload the `systemd` configuration.
+
+```
+sudo systemctl daemon-reload
+```
+
+3. Check the journal for errors
+
+Use the `journalctl` command to check for any system logs related to the failed automount unit:
+
+```
+sudo journalctl -u [UnitName].automount
+```
+
+Replace `[UnitName]` with the name of the failed automount unit. Analyze the logs to identify the root cause of the failure.
+
+4. Attempt to restart the automount unit
+
+After identifying and addressing the cause of the failure, try to restart the automount unit:
+
+```
+sudo systemctl restart [UnitName].automount
+```
+
+Check the unit's status:
+
+```
+systemctl status [UnitName].automount
+```
+
+If it's in the `active` state, the issue has been resolved.
+
+### Useful resources
+
+1. [Arch Linux Wiki: systemd automount](https://wiki.archlinux.org/title/Fstab#systemd_automount)
+2. [systemd automount unit file example](https://www.freedesktop.org/software/systemd/man/systemd.automount.html#Examples)
diff --git a/health/guides/systemdunits/systemd_device_unit_failed_state.md b/health/guides/systemdunits/systemd_device_unit_failed_state.md
new file mode 100644
index 00000000..8a7fc39d
--- /dev/null
+++ b/health/guides/systemdunits/systemd_device_unit_failed_state.md
@@ -0,0 +1,65 @@
+### Understand the alert
+
+This alert is triggered when a `systemd device unit` enters a `failed state`. If you receive this alert, it means that a device managed by `systemd` on your Linux system has encountered an issue and is currently in a non-operational state.
+
+### What is a systemd device unit?
+
+`systemd` is a system and service manager for Linux operating systems. A `device unit` in `systemd` is a unit that represents a kernel device exposed through the `sysfs`/`udev` device tree (under `/sys`). Device units are used to automatically discover and manage devices present on the system.
+
+### What does a failed state mean?
+
+A `failed state` implies that the device has encountered an issue and is currently non-operational. The problem could be related to hardware, driver, or configuration issues.
+
+### Troubleshoot the alert
+
+1. Identify the failed device unit:
+
+   Check the `systemd` status for failed units using the following command:
+
+   ```
+   systemctl --failed --type=device
+   ```
+
+   This will show you the list of device units that are currently in a failed state.
+
+2. Check logs for errors:
+
+   Use the `journalctl` command to check the logs for any error messages related to the failed device unit. For instance, if the failed unit is `example.device`, you can execute:
+
+   ```
+   journalctl -xe -u example.device
+   ```
+
+   This will show you the logs for the unit, including any error messages that may help you identify the root cause of the failure.
+
+3. Fix the issue:
+
+   Depending on the results from the previous steps, you might need to:
+
+   - Check the hardware connections and make sure they are properly connected.
+   - Update or reinstall the device driver.
+   - Check and correct device configurations if needed.
+
+4. Restart the device unit:
+
+   Once the issue has been fixed, restart the device unit using `systemctl`:
+
+   ```
+   sudo systemctl restart example.device
+   ```
+
+   Replace `example.device` with the specific device unit name.
+
+5. Validate the fix:
+
+   Check if the device unit is now operational by executing the following command:
+
+   ```
+   systemctl status example.device
+   ```
+
+   This should show you that the device unit is now active and running properly.
+
+### Useful resources
+
+1. [Systemd Device Units](https://www.freedesktop.org/software/systemd/man/systemd.device.html)
diff --git a/health/guides/systemdunits/systemd_mount_unit_failed_state.md b/health/guides/systemdunits/systemd_mount_unit_failed_state.md
new file mode 100644
index 00000000..5840b7ce
--- /dev/null
+++ b/health/guides/systemdunits/systemd_mount_unit_failed_state.md
@@ -0,0 +1,54 @@
+### Understand the alert
+
+This alert is triggered when a `systemd` mount unit enters a `failed state`. If you receive this alert, it means that your system has encountered an issue with mounting a filesystem or a mount point.
+
+### What is a systemd mount unit?
+
+`systemd` is the init system used in most Linux distributions to manage services, processes, and system startup. A mount unit is a configuration file that describes how a filesystem or mount point should be mounted and managed by `systemd`.
+
+### What does a failed state mean?
+
+A `failed state` indicates that there was an issue with mounting the filesystem, or the mount point failed to function as expected. This can be caused by multiple factors, such as incorrect configuration, missing dependencies, or hardware issues.
+
+### Troubleshoot the alert
+
+- Identify the failed mount unit
+
+  Check the status of your `systemd` mount units by running:
+  ```
+  systemctl list-units --type=mount
+  ```
+  Look for units with a `failed` state.
+
+- Check the journal logs
+
+  To gain more insight into the issue, check the `systemd` journal logs for the failed mount unit:
+  ```
+  journalctl -u [unit-name].mount
+  ```
+  Replace `[unit-name]` with the actual name of the failed mount unit.
+
+- Verify the mount unit configuration
+
+  Review the mount unit configuration file located at `/etc/systemd/system/[unit-name].mount`. Ensure that options such as the filesystem type, device, and mount point are correct. If the unit was generated from `/etc/fstab`, review the corresponding entry there instead.
+
+- Check system logs for hardware or filesystem issues
+
+  Review the system logs (e.g., `/var/log/syslog` or `/var/log/messages`) for any hardware or filesystem related errors. Ensure that the device and mount point are properly connected and accessible.
+
+- Restart the mount unit
+
+  If you have made any changes to the configuration or resolved a hardware issue, attempt to restart the mount unit by running:
+  ```
+  sudo systemctl restart [unit-name].mount
+  ```
+
+- Seek technical support
+
+  If the issue persists, consider reaching out to support, as there might be an underlying issue that needs to be addressed.
+
+### Useful resources
+
+1. [systemd.mount - Mount unit configuration](https://www.freedesktop.org/software/systemd/man/systemd.mount.html)
+2. [systemctl - Control the systemd system and service manager](https://www.freedesktop.org/software/systemd/man/systemctl.html)
+3. [journalctl - Query the systemd journal](https://www.freedesktop.org/software/systemd/man/journalctl.html)
+\ No newline at end of file
diff --git a/health/guides/systemdunits/systemd_path_unit_failed_state.md b/health/guides/systemdunits/systemd_path_unit_failed_state.md
new file mode 100644
index 00000000..9a4749b6
--- /dev/null
+++ b/health/guides/systemdunits/systemd_path_unit_failed_state.md
@@ -0,0 +1,61 @@
+### Understand the alert
+
+This alert is triggered when a `systemd path unit` enters a `failed state`. A path unit in a failed state can no longer watch its configured files or directories, for example because the watch could not be set up or because the unit it activates failed repeatedly, so the associated service is no longer triggered when those paths change.
+
+### What is a systemd path unit?
+
+`systemd` is an init system and system manager that manages services and their dependencies on Linux systems. A `path unit` is a type of unit configuration file that runs a service in response to the existence or modification of files and directories. These units are used to monitor files and directories and trigger actions based on changes to them.
+
+### Troubleshoot the alert
+
+1. Identify the failed systemd path unit
+
+First, you need to identify which path unit is experiencing issues. To list all failed units:
+
+   ```
+   systemctl --state=failed
+   ```
+
+Take note of the units whose names end in `.path` in the output.
+
+2. Inspect the path unit status
+
+To get more details about the specific failed path unit, run:
+
+   ```
+   systemctl status <failed-path-unit>
+   ```
+
+Replace `<failed-path-unit>` with the name of the failed path unit you identified previously.
+
+3. Review logs for the failed path unit
+
+To view the logs for the failed path unit, use the `journalctl` command:
+
+   ```
+   journalctl -u <failed-path-unit>
+   ```
+
+Again, replace `<failed-path-unit>` with the name of the failed path unit. Review the logs to identify possible reasons for the failure.
+
+4. Reload the unit configuration (if necessary)
+
+If you discovered an issue in the unit configuration file and resolved it, reload the configuration by running:
+
+   ```
+   sudo systemctl daemon-reload
+   ```
+
+5. Restart the failed path unit
+
+Once you have identified and resolved the issue causing the failed state, try to restart the path unit:
+
+   ```
+   sudo systemctl restart <failed-path-unit>
+   ```
+
+Replace `<failed-path-unit>` with the name of the failed path unit. Then, monitor the path unit status to ensure it is running without issues.
+
+### Useful resources
+
+1. [Understanding Systemd Units and Unit Files (DigitalOcean)](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files)
diff --git a/health/guides/systemdunits/systemd_scope_unit_failed_state.md b/health/guides/systemdunits/systemd_scope_unit_failed_state.md
new file mode 100644
index 00000000..e080ae36
--- /dev/null
+++ b/health/guides/systemdunits/systemd_scope_unit_failed_state.md
@@ -0,0 +1,57 @@
+### Understand the alert
+
+This alert is triggered when a systemd scope unit enters a failed state. If you receive this alert, it means that one of your systemd scope units is not working properly and requires attention.
+
+### What is a systemd scope unit?
+
+Systemd is the system and service manager on modern Linux systems. It is responsible for managing and controlling system processes, services, and units. A scope unit is a type of systemd unit that groups several externally created processes together in a single unit. Scope units are created programmatically at runtime rather than from unit files, and are used to organize and manage the resources of a group of processes.
+
+### Troubleshoot the alert
+
+1. Identify the systemd scope unit in the failed state
+
+To list all the systemd scope units on the system, run the following command:
+
+```
+systemctl list-units --type=scope
+```
+
+Look for the units with a 'failed' state.
+
+2. Check the status of the systemd scope unit
+
+To get more information about the failed systemd scope unit, use the `systemctl status` command followed by the unit name:
+
+```
+systemctl status UNIT_NAME
+```
+
+This command will display the unit status, any error messages, and the last few lines of the unit logs.
+
+3. Consult the logs for further details
+
+To get additional information about the unit's failure, you can use the `journalctl` command for the specific unit:
+
+```
+journalctl -u UNIT_NAME
+```
+
+This command will display the logs of the systemd scope unit, allowing you to identify any issues or error messages.
+
+4. Restart the systemd scope unit
+
+If the issue appears to be temporary, try restarting the unit using the following command:
+
+```
+systemctl restart UNIT_NAME
+```
+
+This will attempt to stop the failed unit and start it again.
+
+5. Debug and fix the issue
+
+If the systemd scope unit keeps failing, refer to the documentation and logs to debug the issue and apply the necessary fixes. You might need to update the unit's configuration, fix application issues, or address system resource limitations.
+
+### Useful resources
+
+1. [Managing Services with systemd (Red Hat Enterprise Linux 7 System Administrator's Guide)](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/chap-Managing_Services_with_systemd)
diff --git a/health/guides/systemdunits/systemd_service_unit_failed_state.md b/health/guides/systemdunits/systemd_service_unit_failed_state.md
new file mode 100644
index 00000000..f7356799
--- /dev/null
+++ b/health/guides/systemdunits/systemd_service_unit_failed_state.md
@@ -0,0 +1,66 @@
+### Understand the alert
+
+This alert is triggered when a `systemd service unit` enters the `failed state`. If you receive this alert, it means that a service on your system has stopped working and requires attention.
+
+### What is a systemd service unit?
+
+A `systemd service unit` is, simply stated, a service configuration file that describes how a specific service should be controlled and managed on a Linux system. It includes information about service dependencies, the order in which it should start, and more. Systemd is responsible for managing these services and making sure they are functioning as intended.
+
+### What does the failed state mean?
+
+When a `systemd service unit` enters the `failed state`, it indicates that the service has encountered a fault, such as an incorrect configuration file, crashing, or failing to start due to other dependencies. When this occurs, the service is rendered non-functional, and you should troubleshoot the issue to restore normal functionality.
+
+### Troubleshoot the alert
+
+1. Identify the failed service unit
+
+   Use the following command to list all failed service units:
+
+   ```
+   systemctl --state=failed --type=service
+   ```
+
+   Take note of the failed service unit name as you will use it in the next steps.
+
+2. Check the service unit status
+
+   Use the following command to investigate the status and any error messages:
+
+   ```
+   systemctl status <failed_service_unit>
+   ```
+
+   Replace `<failed_service_unit>` with the name of the failed service unit you identified earlier.
+
+3. Examine the logs for the failed service
+
+   Use the following command to inspect the logs for any clues:
+
+   ```
+   journalctl -u <failed_service_unit> --since "1 hour ago"
+   ```
+
+   Adjust the `--since` parameter to view logs from a specific timeframe.
+
+4. Resolve the issue
+
+   Based on the information gathered from the status and logs, try to resolve the issue causing the failure. This can involve updating configuration files, installing missing dependencies, or addressing issues with other services that the failed service unit depends on.
+
+5. Restart the service
+
+   Once the issue has been addressed, restart the service to restore functionality:
+
+   ```
+   sudo systemctl restart <failed_service_unit>
+   ```
+
+   Verify that the service has started successfully:
+
+   ```
+   systemctl status <failed_service_unit>
+   ```
+
+### Useful resources
+
+1. [Systemd: Managing Services (ArchWiki)](https://wiki.archlinux.org/title/Systemd#Managing_services)
+2. [How To Use Systemctl to Manage Systemd Services and Units (DigitalOcean)](https://www.digitalocean.com/community/tutorials/how-to-use-systemctl-to-manage-systemd-services-and-units)
diff --git a/health/guides/systemdunits/systemd_slice_unit_failed_state.md b/health/guides/systemdunits/systemd_slice_unit_failed_state.md
new file mode 100644
index 00000000..d736f83f
--- /dev/null
+++ b/health/guides/systemdunits/systemd_slice_unit_failed_state.md
@@ -0,0 +1,58 @@
+### Understand the alert
+
+This alert is triggered when a `systemd slice unit` enters a `failed state`. Systemd slice units organize units and their processes into a hierarchy so that resources can be managed per group. If you receive this alert, it means that there is an issue with a specific slice unit, which can affect the units grouped under it.
+
+### What does the failed state mean?
+
+A `failed state` in the context of systemd units means that the unit has encountered a problem and is not functioning properly. This could be caused by a variety of reasons, such as misconfiguration, dependency issues, or unhandled errors in the underlying service.
+
+### Troubleshoot the alert
+
+- Identify the problematic systemd slice unit.
+
+  Run the following command to list all systemd slice units and their states:
+
+  ```bash
+  systemctl list-units --all --type=slice
+  ```
+
+  Look for the units with the `failed` state in the output, and take note of the affected unit(s).
+
+- Investigate the specific issue with the failed unit.
+
+  Use the `systemctl status` command followed by the unit name to get more information about the problem:
+
+  ```bash
+  systemctl status <unit-name>
+  ```
+
+  The output will provide more details on the issue and may include error messages or log entries that can help identify the root cause.
+
+- Check the unit logs for additional clues.
+
+  The `journalctl` command can be used to view the logs related to a specific unit by specifying the `-u` flag followed by the unit name:
+
+  ```bash
+  journalctl -u <unit-name>
+  ```
+
+  Analyze the log entries for any reported errors or warnings that could be related to the failure.
+
+- Address the root cause of the issue.
+
+  Based on the information gathered, take the necessary steps to resolve the issue with the failed unit. This may involve reconfiguring the unit, adjusting dependencies, or fixing the underlying service.
+
+- Restart the unit and verify its status.
+
+  Once the issue has been resolved, restart the systemd unit using the `systemctl restart` command:
+
+  ```bash
+  systemctl restart <unit-name>
+  ```
+
+  Afterwards, check the unit's status to confirm that it is no longer in a failed state and is functioning properly:
+
+  ```bash
+  systemctl status <unit-name>
+  ```
+
diff --git a/health/guides/systemdunits/systemd_socket_unit_failed_state.md b/health/guides/systemdunits/systemd_socket_unit_failed_state.md
new file mode 100644
index 00000000..9d2d4366
--- /dev/null
+++ b/health/guides/systemdunits/systemd_socket_unit_failed_state.md
@@ -0,0 +1,65 @@
+### Understand the alert
+
+The `systemd_socket_unit_failed_state` alert is triggered when a `systemd` socket unit on your Linux server enters a failed state. This could indicate issues with the services that depend on these socket units, impacting their functionality or performance.
+
+### What is a systemd socket unit?
+
+`systemd` is the system and service manager for modern Linux systems. It initializes and manages the services on the system, ensuring a smooth boot process and operation.
+
+A socket unit is a special kind of `systemd` unit that encapsulates local IPC (inter-process communication) or network sockets. Socket units are defined by `.socket` files and are used to start services automatically when incoming traffic is received on the socket addresses they manage.
+
+### Troubleshoot the alert
+
+1. Identify the failed socket unit(s):
+
+To list the socket units that are currently in a failed state, run:
+
+```
+systemctl --state=failed --type=socket
+```
+
+This command will display the socket units in a failed state.
+
+2. Check the status of the failed socket unit:
+
+To view the detailed status of a particular failed socket unit, use:
+
+```
+systemctl status your_socket_unit.socket
+```
+
+Replace `your_socket_unit` with the name of the failed socket unit you're investigating. This will provide more information about the socket unit and possible error messages.
+
+3. Examine the logs:
+
+Check the logs for any errors or issues related to the failed socket unit:
+
+```
+journalctl -u your_socket_unit.socket
+```
+
+This will display relevant logs for the socket unit.
+
+4. Restart the failed socket unit:
+
+Once the issue is identified and resolved, you can attempt to restart the failed socket unit:
+
+```
+systemctl restart your_socket_unit.socket
+```
+
+This will attempt to restart the socket unit and put it into an active state.
+
+5. Monitor the socket unit:
+
+After restarting the socket unit, monitor its status to ensure it stays active and operational:
+
+```
+systemctl status your_socket_unit.socket
+```
+
+Verify that the socket unit remains in an active state.
+
+### Useful resources
+
+1. [systemd.socket - Socket unit configuration](https://www.freedesktop.org/software/systemd/man/systemd.socket.html)
diff --git a/health/guides/systemdunits/systemd_swap_unit_failed_state.md b/health/guides/systemdunits/systemd_swap_unit_failed_state.md
new file mode 100644
index 00000000..516156d0
--- /dev/null
+++ b/health/guides/systemdunits/systemd_swap_unit_failed_state.md
@@ -0,0 +1,58 @@
+### Understand the alert
+
+This alert monitors the state of your `systemd` swap units and is triggered when a swap unit is in the `failed` state. If you receive this alert, it means that you have an issue with one or more of your swap units managed by `systemd`.
+
+### What is a swap unit?
+
+A swap unit is a `systemd` unit that describes a swap device: a dedicated partition or a file on the filesystem (called a swap file) used for expanding system memory. When the physical memory (RAM) gets full, the Linux system swaps some of the least used memory pages to this swap space, allowing more applications to run without the need for extra physical memory.
+
+### What does the failed state mean?
+
+If a `systemd` swap unit is in the `failed` state, it means that there was an issue initializing or activating the swap space. This might be due to configuration issues, disk space limitations, or filesystem errors.
+
+### Troubleshoot the alert
+
+1. Check the status of the swap units:
+
+   To list the swap units and their states, run the following command:
+
+   ```
+   systemctl list-units --type=swap
+   ```
+
+   Look for the failed swap units and note their names.
+
+2. Investigate the failed swap units:
+
+   For each failed swap unit, check its status and any relevant messages by running:
+
+   ```
+   systemctl status <swap_unit_name>
+   ```
+
+   Replace `<swap_unit_name>` with the name of the failed swap unit.
+
+3. Check system logs:
+
+   Examine the system logs for any errors or information related to the failed swap units with:
+
+   ```
+   journalctl -xeu <swap_unit_name>
+   ```
+
+4. Identify the issue and take corrective actions:
+
+   Based on the information from the previous steps, you may need to:
+
+   - Adjust swap unit configurations
+   - Increase disk space or allocate a larger swap partition
+   - Resolve disk or filesystem issues
+   - Restart the swap units
+
+5. Verify that the swap units are working:
+
+   After resolving the issue, ensure the swap units are active and running by repeating step 1.
+
+### Useful resources
+
+1. [systemd.swap — Swap unit configuration](https://www.freedesktop.org/software/systemd/man/systemd.swap.html)
diff --git a/health/guides/systemdunits/systemd_target_unit_failed_state.md b/health/guides/systemdunits/systemd_target_unit_failed_state.md
new file mode 100644
index 00000000..84340514
--- /dev/null
+++ b/health/guides/systemdunits/systemd_target_unit_failed_state.md
@@ -0,0 +1,52 @@
+### Understand the alert
+
+The `systemd_target_unit_failed_state` alert is triggered when a `systemd` target unit goes into a failed state. Systemd is the system and service manager for Linux, and target units are groups of systemd units that are organized for a specific purpose. If this alert is triggered, it means there is an issue with one of your systemd target units.
+
+### What does failed state mean?
+
+A systemd target unit in the failed state means that one or more of the units that make up that target, whether services or any other kind of systemd unit, have encountered an issue and cannot continue running.
+
+### Troubleshoot the alert
+
+1. First, you need to identify which systemd target unit is causing the alert. You can list all the failed units by running:
+
+   ```
+   systemctl --failed --all
+   ```
+
+2. Once you have identified the problematic target unit, check its status for more information about the issue. Replace `<target_unit>` with the actual target unit name:
+
+   ```
+   systemctl status <target_unit>
+   ```
+
+3. Look at the logs of the failed target unit to collect more details on the issue:
+
+   ```
+   journalctl -u <target_unit>
+   ```
+
+4. Based on the information gathered in steps 2 and 3, troubleshoot and fix the problem(s) in your target unit. This may involve:
+   - Editing the unit file
+   - Checking the services and processes that compose the target
+   - Looking into configuration files and directories
+
+5. Reload the `systemd` manager configuration to apply any changes you made, then restart the target unit:
+
+   ```
+   sudo systemctl daemon-reload
+   sudo systemctl restart <target_unit>
+   ```
+
+6. Verify that the target unit has been successfully restarted:
+
+   ```
+   systemctl is-active <target_unit>
+   ```
+
+7. Continue monitoring the target unit to ensure that it remains stable and does not return to a failed state.
+
+### Useful resources
+
+1. [systemd man pages (targets)](https://www.freedesktop.org/software/systemd/man/systemd.target.html)
+2. [systemd Targets - ArchWiki](https://wiki.archlinux.org/title/Systemd#Targets)