Diffstat (limited to 'health/guides/upsd')
-rw-r--r-- health/guides/upsd/upsd_10min_ups_load.md 38
-rw-r--r-- health/guides/upsd/upsd_ups_battery_charge.md 38
-rw-r--r-- health/guides/upsd/upsd_ups_last_collected_secs.md 34
3 files changed, 110 insertions, 0 deletions
diff --git a/health/guides/upsd/upsd_10min_ups_load.md b/health/guides/upsd/upsd_10min_ups_load.md
new file mode 100644
index 000000000..fad4a2f6f
--- /dev/null
+++ b/health/guides/upsd/upsd_10min_ups_load.md
@@ -0,0 +1,38 @@
+### Understand the alert
+
+This alert is based on the `upsd_10min_ups_load` metric, which measures the average UPS load over the last 10 minutes. If you receive this alert, it means that the load on your UPS is higher than expected, which may lead to an unstable power supply and ungraceful system shutdowns.
+
+### Troubleshoot the alert
+
+1. Verify the UPS load status
+
+ Check the current load on the UPS using the `upsc` command with your UPS identifier:
+ ```
+ upsc <your_ups_identifier>
+ ```
+ Look for the `ups.load` metric in the command output to identify the current load percentage.
+
+2. Analyze the connected devices
+
+ Make an inventory of all devices connected to the UPS, including servers, networking devices, and other equipment. Determine if all devices are essential or if some can be moved to another power source or disconnected entirely.
+
+3. Balance the load between multiple UPS units (if available)
+
+ If you have more than one UPS, consider distributing the connected devices across the units to balance the load so that no single UPS is overloaded.
+
+4. Upgrade or replace the UPS
+
+ If necessary, consider upgrading to a higher-capacity UPS to handle the increased load, or replacing the current unit if it is malfunctioning or cannot provide the required power.
+
+5. Monitor power usage trends
+
+ Regularly review your power usage patterns and system logs, and take action to prevent load spikes that could trigger the `upsd_10min_ups_load` alert.
+
+6. Optimize device power consumption
+
+ Implement power-saving strategies for connected devices, such as enabling power-saving modes, reducing CPU usage, or using power-efficient networking equipment.
+
+### Useful resources
+
+1. [NUT User Manual](https://networkupstools.org/docs/user-manual.chunked/index.html)
+2. [Five steps to reduce UPS energy consumption](https://sp.ts.fujitsu.com/dmsp/Publications/public/wp-reduce-ups-energy-consumption-ww-en.pdf)
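
Note on step 1 of `upsd_10min_ups_load.md`: a minimal sketch of how the `ups.load` value could be pulled out of the `upsc` output and compared against a limit. The helper names `ups_load_from_output` and `ups_load_exceeds` are hypothetical, and the threshold of 70 is an example value, not the limit configured for this alert.

```shell
# Hypothetical helper: read `upsc` output on stdin and print the ups.load value.
ups_load_from_output() {
    awk -F': ' '$1 == "ups.load" { print $2; exit }'
}

# Hypothetical helper: exit 0 when the load is above the threshold percentage.
# The default of 70 is an example value, not the alert's configured limit.
ups_load_exceeds() {
    load="$1"
    threshold="${2:-70}"
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l > t) }'
}

# Example usage against a live server (replace the identifier with your own):
#   load=$(upsc <your_ups_identifier> | ups_load_from_output)
#   ups_load_exceeds "$load" 70 && echo "UPS load is ${load}%, above 70%"
```

The parsing is kept separate from the `upsc` call so it can be tried on saved output before pointing it at a real UPS.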
diff --git a/health/guides/upsd/upsd_ups_battery_charge.md b/health/guides/upsd/upsd_ups_battery_charge.md
new file mode 100644
index 000000000..0d8f757f2
--- /dev/null
+++ b/health/guides/upsd/upsd_ups_battery_charge.md
@@ -0,0 +1,38 @@
+### Understand the alert
+
+The `upsd_ups_battery_charge` alert indicates that the average UPS charge over the last minute has dropped below a predefined threshold. This might be due to a power outage, a UPS malfunction, or a sudden surge in power demands that the UPS can't handle.
+
+### Troubleshoot the alert
+
+1. Check UPS status and connections
+
+Inspect the UPS physical connections, including power cables, communication cables, and any other devices connected to it. Ensure that everything is plugged in correctly and firmly.
+
+2. Check UPS logs and error messages
+
+Review the UPS logs for any error messages or events that occurred around the time the alert was triggered. This information can help you pinpoint the cause of the issue. The Network UPS Tools (NUT) daemons and drivers write their messages to the system log (syslog).
+
+3. Monitor UPS charge level
+
+Keep an eye on the UPS charge level to determine if it's increasing or decreasing. This information can help you understand the overall health of your UPS.
+
+4. Test UPS batteries
+
+Test the UPS batteries to ensure that they are functioning correctly and have enough charge to power your devices during a power outage. Replace any faulty batteries or upgrade to higher-capacity batteries if needed.
+
+5. Check the UPS load
+
+Review the devices connected to the UPS and calculate their total power consumption. Ensure that the UPS is not overloaded and is capable of supporting the power demands of your devices.
+
+6. Restore the power supply
+
+If the UPS charge level remains low, try restoring the power supply to your UPS. This could involve switching to a different power source, fixing any faulty connections, or resolving issues with your local power grid.
+
+7. Prepare for a graceful shutdown
+
+If you can't restore the power supply to this UPS or if the problem persists, prepare your machine for a graceful shutdown to minimize the risk of data loss or hardware damage.
+
+### Useful resources
+
+1. [NUT User Manual](https://networkupstools.org/docs/user-manual.chunked/index.html)
+2. [UPS troubleshooting guide](https://www.apc.com/us/en/faqs/FA158852/)
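
Note on steps 3 and 6 of `upsd_ups_battery_charge.md`: a minimal sketch of how `upsc` output could be summarized to tell a power outage from a UPS that is on line power but low on charge. It relies on the NUT `ups.status` flags `OL` (on line), `OB` (on battery), and `LB` (low battery) and on the `battery.charge` variable; the helper name `ups_power_state` is hypothetical, and your UPS driver may not report every variable.

```shell
# Hypothetical helper: summarize power state from `upsc` output on stdin.
ups_power_state() {
    awk -F': ' '
        $1 == "ups.status"     { status = $2 }
        $1 == "battery.charge" { charge = $2 }
        END {
            # ups.status is a space-separated list of flags, e.g. "OB LB".
            n = split(status, flags, " ")
            state = "unknown"
            for (i = 1; i <= n; i++) {
                if (flags[i] == "OL") state = "on line power"
                if (flags[i] == "OB") state = "on battery"
            }
            for (i = 1; i <= n; i++)
                if (flags[i] == "LB") state = state ", low battery"
            printf "%s, charge %s%%\n", state, charge
        }'
}

# Example usage against a live server (replace the identifier with your own):
#   upsc <your_ups_identifier> | ups_power_state
```

An `on battery` result points to a loss of input power, while a falling charge with `on line power` suggests a battery or charging problem.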
diff --git a/health/guides/upsd/upsd_ups_last_collected_secs.md b/health/guides/upsd/upsd_ups_last_collected_secs.md
new file mode 100644
index 000000000..818247834
--- /dev/null
+++ b/health/guides/upsd/upsd_ups_last_collected_secs.md
@@ -0,0 +1,34 @@
+### Understand the alert
+
+This alert relates to Network UPS Tools (NUT), which monitors power devices such as uninterruptible power supplies, power distribution units, solar controllers, and server power supply units. If you receive this alert, it means there is a problem with the data collection process, which needs troubleshooting so that monitoring works correctly.
+
+### Troubleshoot the alert
+
+#### Check the upsd server
+
+1. Check the status of the `upsd` daemon (on many distributions the systemd unit is named `nut-server` rather than `upsd`):
+
+ ```
+ $ systemctl status upsd
+ ```
+
+2. Check for obvious and common errors in the log or output. If any errors are found, resolve them accordingly.
+
+3. Restart the daemon if needed:
+
+ ```
+ $ systemctl restart upsd
+ ```
+
+#### Diagnose a bad driver
+
+1. `upsd` expects the drivers to either update their status regularly or at least answer periodic queries, called pings. If a driver doesn't answer, `upsd` will declare it "stale" and no more information will be provided to the clients.
+
+2. If `upsd` complains about staleness when you start it, then either your driver or configuration files are probably broken. Be sure that the driver is actually running, and that the UPS definition in [ups.conf(5)](https://networkupstools.org/docs/man/ups.conf.html) is correct. Also, make sure that you start your driver(s) before starting `upsd`.
+
+3. Data can also be marked stale if the driver can no longer communicate with the UPS. In this case, the driver should also provide diagnostic information in the syslog. If this happens, check the serial or USB cabling, or inspect the network path in the case of an SNMP UPS.
+
+### Useful resources
+
+1. [NUT User Manual](https://networkupstools.org/docs/user-manual.chunked/index.html)
+2. [ups.conf(5)](https://networkupstools.org/docs/man/ups.conf.html)
\ No newline at end of file
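
Note on the "Diagnose a bad driver" section of `upsd_ups_last_collected_secs.md`: a minimal sketch that maps an error message from `upsc` to the next troubleshooting step described in the guide. The helper name `classify_upsc_error` is hypothetical, and the matched message substrings are assumptions based on typical `upsc` error output; check them against the messages your NUT version prints.

```shell
# Hypothetical helper: map an upsc error message to a troubleshooting hint.
# The matched substrings are assumptions; verify them against your NUT version.
classify_upsc_error() {
    case "$1" in
        *"Data stale"*)
            echo "stale: the driver is not updating upsd; check the driver and cabling" ;;
        *"Driver not connected"*)
            echo "no driver: start the driver before upsd" ;;
        *"Connection failure"*)
            echo "unreachable: check that the upsd daemon is running" ;;
        *)
            echo "unrecognized: inspect the syslog" ;;
    esac
}

# Example usage against a live server (replace the identifier with your own):
#   msg=$(upsc <your_ups_identifier> ups.status 2>&1) || classify_upsc_error "$msg"
```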