author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-03-09 13:19:22 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-03-09 13:19:22 +0000
commit c21c3b0befeb46a51b6bf3758ffa30813bea0ff0
tree 9754ff1ca740f6346cf8483ec915d4054bc5da2d /health/guides/apcupsd
parent Adding upstream version 1.43.2.
Adding upstream version 1.44.3. (upstream/1.44.3)
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/apcupsd')
-rw-r--r-- health/guides/apcupsd/apcupsd_10min_ups_load.md | 22
-rw-r--r-- health/guides/apcupsd/apcupsd_last_collected_secs.md | 46
-rw-r--r-- health/guides/apcupsd/apcupsd_ups_charge.md | 45
3 files changed, 113 insertions, 0 deletions
diff --git a/health/guides/apcupsd/apcupsd_10min_ups_load.md b/health/guides/apcupsd/apcupsd_10min_ups_load.md
new file mode 100644
index 000000000..4069de9f0
--- /dev/null
+++ b/health/guides/apcupsd/apcupsd_10min_ups_load.md
@@ -0,0 +1,22 @@
+### Understand the alert
+
+This alert is related to your APC uninterruptible power supply (UPS) device. If you receive this alert, your UPS is under high load, which could cause it to enter bypass mode or shut down to protect itself. The alert is raised in a warning state when the average UPS load over the last 10 minutes is in the 70-80% range and in a critical state when it is in the 85-95% range.
+
+### Troubleshoot the alert
+
+Follow these steps to address the high load on your UPS device:
+
+1. **Identify devices connected to the UPS**: Make a list of all the devices connected to the UPS. This list could include computers, servers, routers, and other essential equipment.
+
+2. **Assess the importance of each device**: Prioritize the devices connected to the UPS based on their importance to your network infrastructure. Determine which devices are mission-critical and which ones can be temporarily disconnected without causing significant disruptions.
+
+3. **Disconnect non-critical devices**: Once you have assessed the importance of the connected devices, disconnect any non-critical devices to reduce the load on the UPS. This action will help ensure that the mission-critical devices continue to receive power during a utility failure.
+
+4. **Consider additional UPS capacity**: If you receive this alert frequently or cannot disconnect enough devices to reduce the load, consider adding UPS capacity to your infrastructure, either as extra UPS units or as a larger UPS with a higher output capacity.
+
+5. **Monitor the UPS load**: After disconnecting non-critical devices or adding UPS capacity, continue monitoring the UPS load with the Netdata Agent to ensure it stays within acceptable limits. If the alert persists, you may need to reevaluate your infrastructure and device connections.
+
+### Useful resources
+
+1. [APC UPS Management](https://www.schneider-electric.com/en/product-category/870_IDSof_0145_NET/!ut/p/z1/hZBNbsIwDMD3ejK_Sh4xWb1tgwEfkFLCVKrUYKngigoXrWtJ_gSCk_bm0RfbT707TIAx8WuuDIwdwmK28Q2YY3Agq3XkKAGwpTEgZUPAHD7HxAqcAkgxV7OuBHSkrBSV7eGzvdN1jQZSYhNnhP7YvfFGttb8j7LlPvTXSuC7V-q1DXce8XtWjZmfrniT7ufcTtT8AKaWHzA!!/dz/d5/L2dBISEvZ0FBIS9nQSEh/)
+2. [Understanding the Different Types of UPS Systems](https://www.apc.com/us/en/faqs/FA157448/)
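Editor's note on `apcupsd_10min_ups_load.md`: steps 2 and 3 of the guide above (prioritize devices, then disconnect non-critical ones) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Device` class, the function names, the wattage figures, and the 70% target are all hypothetical and are not part of Netdata or `apcupsd`. It also assumes you know each device's approximate power draw and the rated output of the UPS.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical record for one device plugged into the UPS."""
    name: str
    watts: float      # approximate power draw (assumed to be known)
    critical: bool    # True if the device must stay powered


def load_percent(devices, capacity_watts):
    """Estimated UPS load as a percentage of its rated output."""
    return 100.0 * sum(d.watts for d in devices) / capacity_watts


def devices_to_disconnect(devices, capacity_watts, target_percent=70.0):
    """Pick non-critical devices, largest draw first, until the estimated
    load is at or below target_percent. Returns (removed, final_load)."""
    remaining = list(devices)
    removed = []
    candidates = sorted(
        (d for d in devices if not d.critical),
        key=lambda d: d.watts,
        reverse=True,
    )
    for dev in candidates:
        if load_percent(remaining, capacity_watts) <= target_percent:
            break
        remaining.remove(dev)
        removed.append(dev)
    return removed, load_percent(remaining, capacity_watts)


# Illustrative inventory; every value here is made up.
inventory = [
    Device("server", 400, critical=True),
    Device("router", 50, critical=True),
    Device("monitor", 150, critical=False),
    Device("printer", 300, critical=False),
]
removed, final_load = devices_to_disconnect(inventory, capacity_watts=1000)
print([d.name for d in removed], final_load)
```

If the estimated load is still above the target after every non-critical device has been removed, that is a sign that step 4 (more UPS capacity) applies.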
diff --git a/health/guides/apcupsd/apcupsd_last_collected_secs.md b/health/guides/apcupsd/apcupsd_last_collected_secs.md
new file mode 100644
index 000000000..7c8f8035d
--- /dev/null
+++ b/health/guides/apcupsd/apcupsd_last_collected_secs.md
@@ -0,0 +1,46 @@
+### Understand the alert
+
+This alert is related to your American Power Conversion (APC) uninterruptible power supply (UPS) device. The Netdata Agent monitors the number of seconds since the last successful data collection by querying the `apcaccess` tool. If you receive this alert, it means that no data collection has taken place for some time, which might indicate a problem with the APC UPS device or connection.
+
+### Troubleshoot the alert
+
+1. Verify the `apcaccess` tool is installed and functioning properly
+ ```
+ $ apcaccess status
+ ```
+ This command should provide you with a status display of the UPS. If the command is not found, you may need to install the `apcaccess` tool.
+
+2. Check the APC UPS daemon
+
+ a. Check the status of the APC UPS daemon
+ ```
+ $ systemctl status apcupsd
+ ```
+
+ b. Check for common errors, such as a wrong device path, incorrect permissions, or other configuration issues in `/etc/apcupsd/apcupsd.conf`.
+
+ c. If needed, restart the APC UPS daemon
+ ```
+ $ systemctl restart apcupsd
+ ```
+
+3. Inspect system logs
+
+ Check the system logs for any error messages related to APC UPS or `apcupsd`, which might give more insights into the issue.
+
+4. Verify the UPS connection
+
+ Ensure that the UPS device is properly connected to your server, both physically (USB/Serial) and in the configuration file (`/etc/apcupsd/apcupsd.conf`).
+
+5. Update Netdata configuration
+
+ If the issue is still not resolved, review the Netdata configuration of the `apcupsd` collector, which gathers the data that the `apcupsd_last_collected_secs` alert monitors.
+
+6. Check your UPS device
+
+ If all previous steps have been completed and the issue persists, your UPS device might be faulty. Consider contacting the manufacturer for support or replacing the device with a known-good unit.
+
+### Useful resources
+
+1. [Netdata - APC UPS monitoring](https://learn.netdata.cloud/docs/data-collection/ups/apc-ups)
+2. [`apcupsd` - Power management and control software for APC UPS](https://github.com/apcupsd/apcupsd)
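Editor's note on `apcupsd_last_collected_secs.md`: the first troubleshooting step (confirm that `apcaccess status` returns data) can also be scripted. The sketch below is a minimal illustration, not Netdata's own collector code. It assumes `apcaccess` prints one `KEY : value` pair per line; the function names and the 10-second timeout are hypothetical choices.

```python
import subprocess


def parse_apcaccess(text):
    """Turn 'KEY : value' lines into a dict, splitting on the first colon
    only so that values containing colons (such as DATE) stay intact."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def read_status(timeout_seconds=10):
    """Run `apcaccess status` and return its parsed fields, or None if the
    tool is missing, times out, or exits with an error."""
    try:
        result = subprocess.run(
            ["apcaccess", "status"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_apcaccess(result.stdout)


if __name__ == "__main__":
    status = read_status()
    if status is None:
        print("apcaccess returned no data; check the tool and the apcupsd daemon")
    else:
        print("UPS status:", status.get("STATUS", "unknown"))
```

A `None` result points at steps 1 and 2 of the guide (tool or daemon), while a populated result with an unexpected `STATUS` points at the connection or the device itself.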
diff --git a/health/guides/apcupsd/apcupsd_ups_charge.md b/health/guides/apcupsd/apcupsd_ups_charge.md
new file mode 100644
index 000000000..600520b58
--- /dev/null
+++ b/health/guides/apcupsd/apcupsd_ups_charge.md
@@ -0,0 +1,45 @@
+### Understand the alert
+
+This alert is related to the charge level of your American Power Conversion (APC) uninterruptible power supply (UPS) device. When the UPS charge level drops below a certain threshold, you receive an alert indicating that the system is running on battery and may shut down if external power is not restored soon.
+
+- Warning state: UPS charge < 100%
+- Critical state: UPS charge < 50%
+
+The main purpose of a UPS is to provide a temporary power source to connected devices in case of a power outage. As the battery charge decreases, you need to either restore the power supply or prepare the equipment for a graceful shutdown.
+
+### Troubleshoot the alert
+
+1. Check the UPS charge level and status
+
+ To investigate the current status and charge level of the UPS, use the `apcaccess` command, which provides information about the APC UPS device.
+
+ ```
+ apcaccess
+ ```
+
+ Look for the `STATUS` and `BCHARGE` fields in the output.
+
+2. Restore the power supply (if possible)
+
+ If the power outage is temporary or local (e.g. due to a tripped circuit breaker), try to restore the power supply to the UPS by fixing the issue or connecting the UPS to a different power source.
+
+3. Prepare for a graceful shutdown
+
+ If you cannot restore power to the UPS, or if the battery charge is critically low, you should immediately prepare your machine and any connected devices for a graceful shutdown. This will help to avoid data loss or system corruption due to an abrupt shutdown.
+
+ For Linux systems, you can execute the following command to perform a graceful shutdown:
+
+ ```
+ sudo shutdown -h +1 "UPS battery is low. The system will shut down in 1 minute."
+ ```
+
+ For Windows systems, open a command prompt with admin privileges and execute the following command to perform a graceful shutdown:
+
+ ```
+ shutdown /s /t 60 /c "UPS battery is low. The system will shut down in 1 minute."
+ ```
+
+4. Monitor UPS and system logs
+
+ Keep an eye on UPS and system logs to detect any issues with the power supply or UPS device. This can help you stay informed about the system's status and troubleshoot any potential problems.
+
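Editor's note on `apcupsd_ups_charge.md`: the thresholds listed in the guide (warning below 100%, critical below 50%) and the `STATUS` and `BCHARGE` fields it mentions can be combined into a small decision helper. This is a hedged sketch: the function names are hypothetical, it assumes `BCHARGE` is reported as a number followed by a unit (for example `87.0 Percent`), and it assumes `STATUS` contains the flag `ONBATT` while the UPS runs on battery.

```python
def parse_percent(value):
    """Extract the leading number from a value such as '87.0 Percent'."""
    return float(value.split()[0])


def charge_state(charge_percent):
    """Map a charge level to the alert states described in the guide:
    critical below 50%, warning below 100%, otherwise clear."""
    if charge_percent < 50:
        return "critical"
    if charge_percent < 100:
        return "warning"
    return "clear"


def should_shut_down(status, charge_percent):
    """Suggest a graceful shutdown only when the UPS is on battery
    (assumed flag: ONBATT) and the charge is in the critical range."""
    on_battery = "ONBATT" in status.split()
    return on_battery and charge_state(charge_percent) == "critical"


# Illustrative values, not real UPS output.
charge = parse_percent("42.0 Percent")
print(charge_state(charge), should_shut_down("ONBATT", charge))
```

The helper only returns a recommendation. Issuing the actual `shutdown` command from step 3 is deliberately left to the operator.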