From be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Fri, 19 Apr 2024 04:57:58 +0200
Subject: Adding upstream version 1.44.3.

Signed-off-by: Daniel Baumann
---
 health/guides/zfs/zfs_memory_throttle.md | 21 ++++++++++++
 health/guides/zfs/zfs_pool_state_crit.md | 58 ++++++++++++++++++++++++++++++++
 health/guides/zfs/zfs_pool_state_warn.md | 20 +++++++++++
 3 files changed, 99 insertions(+)
 create mode 100644 health/guides/zfs/zfs_memory_throttle.md
 create mode 100644 health/guides/zfs/zfs_pool_state_crit.md
 create mode 100644 health/guides/zfs/zfs_pool_state_warn.md

(limited to 'health/guides/zfs')

diff --git a/health/guides/zfs/zfs_memory_throttle.md b/health/guides/zfs/zfs_memory_throttle.md
new file mode 100644
index 00000000..3903a02e
--- /dev/null
+++ b/health/guides/zfs/zfs_memory_throttle.md
@@ -0,0 +1,21 @@
+### Understand the alert
+
+This alert indicates the number of times ZFS had to limit the growth of the Adaptive Replacement Cache (ARC) in the last 10 minutes. The ARC keeps the most recently and most frequently used data in RAM to improve read performance. When ARC growth is throttled, read performance can suffer because of a higher chance of cache misses.
+
+### Troubleshoot the alert
+
+1. **Monitor RAM usage**: Check your system's RAM usage to determine whether there is sufficient memory available for the ARC. If other processes are consuming a large amount of RAM, ARC growth may be throttled to free up resources.
+
+2. **Increase RAM capacity**: If you consistently experience ARC throttling, consider adding RAM. This allows for a larger ARC, improving read performance and reducing the likelihood of cache misses.
+
+3. **Adjust ARC size**: If you are using ZFS on Linux, you can adjust the ARC size by setting the `zfs_arc_min` and `zfs_arc_max` module parameters in `/etc/modprobe.d/zfs.conf`. On FreeBSD, you can adjust the `vfs.zfs.arc_max` sysctl parameter.
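As a concrete sketch of the Linux side of step 3: the module parameters take byte counts, so convert from GiB before writing the config. The sizing below (a 16 GiB ceiling and 4 GiB floor, as if on a hypothetical 32 GiB host) is an illustrative assumption, not a recommendation.

```shell
# Hypothetical sizing for a 32 GiB host: cap the ARC at half of RAM and
# keep a 4 GiB floor. Both module parameters take byte counts.
arc_max_bytes=$((16 * 1024 * 1024 * 1024))   # 16 GiB ceiling
arc_min_bytes=$((4 * 1024 * 1024 * 1024))    # 4 GiB floor

# Print the lines you would place in /etc/modprobe.d/zfs.conf; they take
# effect the next time the zfs module is loaded (e.g. after a reboot).
echo "options zfs zfs_arc_max=${arc_max_bytes}"
echo "options zfs zfs_arc_min=${arc_min_bytes}"
```

On FreeBSD, `vfs.zfs.arc_max` is likewise given in bytes and is typically set in `/boot/loader.conf` so it persists across reboots.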
+Make sure to set these values according to your system's RAM capacity and workload requirements.
+
+4. **Evaluate workload**: Analyze your system's workload to identify any processes or applications that cause high memory usage and lead to ARC throttling. Optimize or limit these processes if necessary.
+
+### Useful resources
+
+1. [Linux: ZFS Caching](https://www.45drives.com/community/articles/zfs-caching/)
+2. [FreeBSD: OpenZFS documentation](https://openzfs.org/w/index.php?title=Features&mobileaction=toggle_view_mobile#Single_Copy_ARC)
+3. [ZFS on Linux Performance Tuning Guide](https://github.com/zfsonlinux/zfs/wiki/Performance-Tuning)
+4. [FreeBSD ZFS Tuning Guide](https://wiki.freebsd.org/ZFSTuningGuide)
diff --git a/health/guides/zfs/zfs_pool_state_crit.md b/health/guides/zfs/zfs_pool_state_crit.md
new file mode 100644
index 00000000..72db4b06
--- /dev/null
+++ b/health/guides/zfs/zfs_pool_state_crit.md
@@ -0,0 +1,58 @@
+### Understand the alert
+
+The `zfs_pool_state_crit` alert indicates that your ZFS pool is faulted or unavailable, which can make data inaccessible or lead to data loss. Identify the current state of the pool and take corrective action to remedy the situation.
+
+### Troubleshoot the alert
+
+1. **Check the current ZFS pool state**
+
+   Run the `zpool status` command to view the status of all ZFS pools:
+
+   ```
+   zpool status
+   ```
+
+   This displays the pool state, device states, and any errors that occurred. Take note of any devices in the DEGRADED, FAULTED, UNAVAIL, or OFFLINE state.
+
+2. **Assess the problematic devices**
+
+   Check for hardware issues or file system errors on the affected devices. For example, if a device is FAULTED due to a hardware failure, replace the device. If a device is UNAVAIL or OFFLINE, check its connectivity and make sure it is properly accessible.
+
+3. **Repair the pool**
+
+   Depending on the root cause of the problem, you may need to take different actions:
+
+   - Repair file system errors using the `zpool scrub` command. This initiates a scrub, which verifies all data in the pool and repairs any errors it can:
+
+     ```
+     zpool scrub [pool_name]
+     ```
+
+   - Replace a failed device using the `zpool replace` command. For example, if a new device `/dev/sdb` will replace `/dev/sda`, run:
+
+     ```
+     zpool replace [pool_name] /dev/sda /dev/sdb
+     ```
+
+   - Bring an OFFLINE device back ONLINE using the `zpool online` command:
+
+     ```
+     zpool online [pool_name] [device]
+     ```
+
+   Note: Replace `[pool_name]` and `[device]` with the appropriate values for your system.
+
+4. **Verify the pool state**
+
+   After taking the necessary corrective actions, run `zpool status` again to verify that the pool state has improved.
+
+5. **Monitor pool health**
+
+   Continuously monitor the health of your ZFS pools to avoid future issues. Consider setting up periodic scrubs and reviewing system logs to catch hardware or file system errors early.
+
+### Useful resources
+
+1. [Determining the Health Status of ZFS Storage Pools](https://docs.oracle.com/cd/E19253-01/819-5461/gamno/index.html)
+2. [Chapter 11, Oracle Solaris ZFS Troubleshooting and Pool Recovery](https://docs.oracle.com/cd/E53394_01/html/E54801/gavwg.html)
+3. [ZFS on FreeBSD documentation](https://docs.freebsd.org/en/books/handbook/zfs/)
+4. [OpenZFS documentation](https://openzfs.github.io/openzfs-docs/)
\ No newline at end of file
diff --git a/health/guides/zfs/zfs_pool_state_warn.md b/health/guides/zfs/zfs_pool_state_warn.md
new file mode 100644
index 00000000..ffba2045
--- /dev/null
+++ b/health/guides/zfs/zfs_pool_state_warn.md
@@ -0,0 +1,20 @@
+### Understand the alert
+
+This alert is triggered when the state of a ZFS pool changes to a warning state, indicating potential issues with the pool such as disk errors, corruption, or degraded performance.
+
+### Troubleshoot the alert
+
+1. **Check pool status**: Use the `zpool status` command to check the status of the ZFS pool and identify any issues or errors.
+
+2. **Review disk health**: Inspect the health of the disks in the ZFS pool using `smartctl` or other disk health monitoring tools.
+
+3. **Replace faulty disks**: If a disk in the ZFS pool is faulty, replace it with a new one and perform a resilvering operation using `zpool replace`.
+
+4. **Scrub the pool**: Run a manual scrub on the ZFS pool with `zpool scrub` to verify data integrity and repair any detected issues.
+
+5. **Monitor pool health**: Keep an eye on the ZFS pool's health and performance metrics to ensure that issues are resolved and do not recur.
+
+### Useful resources
+
+1. [ZFS on Linux Documentation](https://openzfs.github.io/openzfs-docs/)
+2. [FreeBSD Handbook - ZFS](https://www.freebsd.org/doc/handbook/zfs.html)
--
cgit v1.2.3