summaryrefslogtreecommitdiffstats
path: root/health/guides/memory/1hour_ecc_memory_uncorrectable.md
diff options
context:
space:
mode:
Diffstat (limited to 'health/guides/memory/1hour_ecc_memory_uncorrectable.md')
-rw-r--r--health/guides/memory/1hour_ecc_memory_uncorrectable.md27
1 files changed, 27 insertions, 0 deletions
diff --git a/health/guides/memory/1hour_ecc_memory_uncorrectable.md b/health/guides/memory/1hour_ecc_memory_uncorrectable.md
new file mode 100644
index 000000000..509ff5448
--- /dev/null
+++ b/health/guides/memory/1hour_ecc_memory_uncorrectable.md
@@ -0,0 +1,27 @@
+### Understand the alert
+
+This alert, `1hour_ecc_memory_uncorrectable`, indicates that there are ECC (Error-Correcting Code) uncorrectable errors detected in your system's memory within the last hour. ECC errors are caused by issues in the system's RAM (Random Access Memory). These uncorrectable errors are severe and may lead to system crashes or data corruption.
+
+### What are ECC errors?
+
+ECC memory is designed to detect and, in some cases, correct data corruption in the memory, preventing system crashes and providing overall system stability. ECC errors fall into two categories:
+
+1. **Correctable Errors**: These are errors that the ECC memory can detect and correct, preventing system crashes and ensuring data integrity.
+2. **Uncorrectable Errors**: These are more severe errors that the ECC memory cannot correct, often requiring faulty memory modules to be replaced to prevent system crashes and data corruption.
+
+### Troubleshoot the alert
+
+- **Inspect the memory modules**: Power off the system and check the memory modules for any signs of damage or poor contact with the socket. Ensure that the memory modules are seated firmly and there is proper contact.
+
+- **Run memory diagnostics**: Run memory diagnostic tools, like [Memtest86+](https://www.memtest.org/) to identify any memory errors and verify the memory's health. If errors are detected, it's an indication that the memory modules need to be replaced.
+
+- **Replace faulty memory modules**: If uncorrectable errors continue occurring or if diagnostics identify faulty memory modules, consider replacing them. Before doing so, check if the memory modules are still covered under warranty.
+
+- **Check system logs**: Review system logs, such as Event Viewer on Windows or `/var/log` on Linux systems, for any related messages or errors that may help to diagnose the issue further.
+
+- **Update firmware**: Ensure your system's firmware and BIOS are up-to-date. Manufacturers often release stability and performance improvements that can potentially resolve or mitigate ECC errors.
+
+
+### Useful resources
+
+1. [How to Check Memory Problems in Linux](https://www.cyberciti.biz/faq/linux-check-memory-usage/)