4 files changed, 175 insertions, 0 deletions
diff --git a/health/guides/redis/redis_bgsave_broken.md b/health/guides/redis/redis_bgsave_broken.md
new file mode 100644
index 00000000..23ed75ff
--- /dev/null
+++ b/health/guides/redis/redis_bgsave_broken.md
@@ -0,0 +1,23 @@
+### Understand the alert
+
+This alert is triggered when the Redis server fails to save the RDB snapshot to disk. This can indicate issues with the disk, the Redis server itself, or other factors affecting the save operation.
+
+### Troubleshoot the alert
+
+1. **Check Redis logs**: Inspect the Redis logs for error messages related to the failed RDB save operation. The log location is set by the `logfile` directive in `redis.conf` (run `redis-cli CONFIG GET logfile` to see it). On many Linux distributions it is `/var/log/redis/redis-server.log`.
+
+2. **Verify disk space**: Ensure that your server has enough disk space available for the RDB snapshot. Insufficient disk space can cause the save operation to fail.
+
+3. **Check disk health**: Use disk health monitoring tools like `smartctl` to inspect the health of the disk where the RDB snapshot is being saved.
+
+4. **Review Redis configuration**: Check your Redis server's configuration file (`redis.conf`) for any misconfigurations or settings that may be causing the issue. Ensure that the `dir` and `dbfilename` options are correctly set.
+
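+5. **Monitor server resources**: Monitor your server's CPU and RAM usage. Memory matters in particular: a background save forks a child process, and if the fork cannot allocate memory the save fails with a log message such as `Can't save in background: fork: Cannot allocate memory`. The Redis documentation recommends setting `vm.overcommit_memory = 1` to avoid this.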
+
+6. **Restart Redis**: If the issue persists, consider restarting the Redis server to clear any temporary issues or stuck processes.
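+
+Several of these checks can be run from a shell. The sketch below is not part of Redis: `rdb_last_bgsave_status` is a hypothetical helper that reads `INFO persistence` output, where the `rdb_last_bgsave_status` field reports `ok` or `err` for the most recent background save. The usage lines assume `redis-cli` can reach the server on its default host and port.
+
+```bash
+# Hypothetical helper (not shipped with Redis): reads `INFO persistence`
+# output on stdin and prints the value of rdb_last_bgsave_status.
+rdb_last_bgsave_status() {
+  # INFO lines end with CRLF; drop the CR before splitting on ':'.
+  tr -d '\r' | awk -F: '$1 == "rdb_last_bgsave_status" { print $2 }'
+}
+
+# Usage (assumes default host/port):
+#   redis-cli INFO persistence | rdb_last_bgsave_status
+#
+# Free space on the filesystem that holds the snapshot directory. When its
+# output is not a terminal, `CONFIG GET dir` prints the key and the value on
+# separate lines, so keep only the last one.
+#   df -h "$(redis-cli CONFIG GET dir | tail -n 1)"
+```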
+
+### Useful resources
+
+1. [Redis Configuration Documentation](https://redis.io/topics/config)
+2. [Redis Persistence Documentation](https://redis.io/topics/persistence)
+3. [Redis Troubleshooting Guide](https://redis.io/topics/problems)
diff --git a/health/guides/redis/redis_bgsave_slow.md b/health/guides/redis/redis_bgsave_slow.md
new file mode 100644
index 00000000..6a04bdf2
--- /dev/null
+++ b/health/guides/redis/redis_bgsave_slow.md
@@ -0,0 +1,54 @@
+### Understand the alert
+
+This alert, `redis_bgsave_slow`, indicates that the ongoing Redis RDB save operation is taking too long. Common causes are a large dataset or a lack of CPU resources. Under these conditions, Redis might stop serving clients for a few milliseconds, or even up to a second.
+
+### What is the Redis RDB save operation?
+
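+Redis RDB (Redis Database) is a point-in-time snapshot of the dataset, stored as a compact binary file. When a save is triggered by the `BGSAVE` command or by the `save` rules in `redis.conf`, Redis forks a child process that writes the snapshot to disk in the background while the parent process keeps serving clients. The fork itself can take noticeable time when the dataset is large, which is what causes the brief pauses described above.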
+
+### Troubleshoot the alert
+
+1. Check the CPU usage
+
+Use the `top` command to see if the CPU usage is unusually high.
+
+```bash
+top
+```
+
+If the CPU usage is high, identify the processes that are consuming the most CPU resources and determine if they are necessary. Minimize the load by closing unnecessary processes.
+
+2. Analyze the dataset size
+
+Check how much memory Redis is using, a close proxy for the dataset size, with the `INFO memory` command:
+
+```bash
+redis-cli INFO memory | grep "used_memory_human"
+```
+
+If the dataset size is large, consider optimizing your data structure or implementing data management strategies, such as data expiration or partitioning.
+
+3. Monitor the Redis RDB save operation
+
+Check the save timings in the `INFO persistence` section:
+
+```bash
+redis-cli INFO persistence | grep -E "rdb_(current|last)_bgsave_time_sec"
+```
+
+`rdb_current_bgsave_time_sec` is the duration of the ongoing save (or -1 if none is running), and `rdb_last_bgsave_time_sec` is the duration of the most recent completed save. If saves take unusually long or fail frequently, consider optimizing your Redis configuration or improving hardware resources such as CPU and disk I/O.
+
+4. Change the save operation frequency
+
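+To limit how often RDB saves run, adjust the `save` directive in your Redis configuration file (`redis.conf`). For example, the following rule saves the dataset only if at least 10000 keys have changed and at least 300 seconds (5 minutes) have passed since the last save:
+
+```
+save 300 10000
+```
+
+After modifying the configuration, restart the Redis service for the changes to take effect. Alternatively, apply the rule at runtime with `redis-cli CONFIG SET save "300 10000"` and persist it to `redis.conf` with `redis-cli CONFIG REWRITE`.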
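+
+To follow an ongoing save over time, the persistence fields can be summarized by a small shell function. This is a sketch, not a Redis tool: `bgsave_progress` is a hypothetical helper that assumes the `rdb_bgsave_in_progress`, `rdb_current_bgsave_time_sec`, and `rdb_last_bgsave_time_sec` fields of `INFO persistence`.
+
+```bash
+# Hypothetical helper (not shipped with Redis): reads `INFO persistence`
+# output on stdin and reports how long the current background save has run.
+bgsave_progress() {
+  # INFO lines end with CRLF; drop the CR before splitting on ':'.
+  tr -d '\r' | awk -F: '
+    $1 == "rdb_bgsave_in_progress"      { in_progress = $2 }
+    $1 == "rdb_current_bgsave_time_sec" { current = $2 }
+    $1 == "rdb_last_bgsave_time_sec"    { last = $2 }
+    END {
+      if (in_progress == 1)
+        printf "in progress for %s s (last save took %s s)\n", current, last
+      else
+        printf "no save in progress (last save took %s s)\n", last
+    }'
+}
+
+# Usage (assumes default host/port):
+#   redis-cli INFO persistence | bgsave_progress
+```
+
+The duration of the most recent fork, during which the main Redis process does not serve clients, is reported in microseconds as `latest_fork_usec` in `INFO stats`.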
+
+### Useful resources
+
+1. [Redis Persistence](https://redis.io/topics/persistence)
+2. [Redis configuration](https://redis.io/topics/config)
diff --git a/health/guides/redis/redis_connections_rejected.md b/health/guides/redis/redis_connections_rejected.md
new file mode 100644
index 00000000..78460246
--- /dev/null
+++ b/health/guides/redis/redis_connections_rejected.md
@@ -0,0 +1,48 @@
+### Understand the alert
+
+The `redis_connections_rejected` alert is triggered when Redis has rejected one or more new connections in the last minute because the `maxclients` limit was reached. While the limit is reached, Redis cannot accept any new connections.
+
+### What does maxclients limit mean?
+
+The `maxclients` limit in Redis is the maximum number of clients that can be connected to the Redis instance at the same time. When the Redis server reaches its `maxclients` limit, any new connection attempts will be rejected.
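+
+Because rejections begin only when the limit is reached, it helps to know how close the instance is to it. The sketch below is an illustration, not a Redis tool: `client_headroom` is a hypothetical helper that reads the `connected_clients` field from `INFO clients` and compares it with a `maxclients` value passed as its first argument.
+
+```bash
+# Hypothetical helper (not shipped with Redis): reads `INFO clients` output
+# on stdin and prints connected_clients as a share of the limit given in $1.
+client_headroom() {
+  # INFO lines end with CRLF; drop the CR before splitting on ':'.
+  tr -d '\r' | awk -F: -v max="$1" '
+    $1 == "connected_clients" && max > 0 {
+      printf "%d of %d clients (%.1f%%)\n", $2, max, 100 * $2 / max
+    }'
+}
+
+# Usage (assumes default host/port; CONFIG GET prints key and value on two lines):
+#   redis-cli INFO clients | client_headroom "$(redis-cli CONFIG GET maxclients | tail -n 1)"
+```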
+
+### Troubleshoot the alert
+
+1. Check the current number of connections in Redis:
+
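+   Use the `redis-cli` command-line tool to check how many clients are currently connected to the Redis server. The `connected_clients` field of `INFO clients` reports the count directly; counting the lines of `CLIENT LIST` also works. Both counts include the `redis-cli` connection used to run the command:
+
+   ```
+   redis-cli INFO clients | grep connected_clients
+   redis-cli client list | wc -l
+   ```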
+
+2. Check Redis configuration file for the maxclients setting:
+
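+   The limit is set by `maxclients` in the Redis configuration file, usually called `redis.conf`. The directive is commented out by default, in which case the default of 10000 applies. `CONFIG GET` returns the value the running server is actually using:
+
+   ```
+   grep 'maxclients' /etc/redis/redis.conf
+   redis-cli CONFIG GET maxclients
+   ```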
+
+3. Increase the maxclients limit.
+
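+   If necessary, increase the `maxclients` limit in the Redis configuration file (`redis.conf`), and then restart the Redis service to apply the changes (on Debian and Ubuntu the service is usually named `redis-server`). The limit can also be raised at runtime with `redis-cli CONFIG SET maxclients <new_limit>`.
+
+   ```
+   sudo systemctl restart redis
+   ```
+
+   _**Note**: Increasing the `maxclients` limit can increase memory consumption. Each client also uses a file descriptor, so make sure the Redis process's open-files limit (`ulimit -n`, or `LimitNOFILE` for a systemd service) is high enough; otherwise Redis lowers the effective limit at startup and logs a warning._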
+
+4. Inspect client connections.
+
+   Determine if the connections are legitimate and needed for your application's requirements, or if some clients are connecting unnecessarily. Optimize your application or services as needed to reduce the number of unwanted connections.
+
+5. Monitor connection usage.
+
+   Keep an eye on connection usage over time to better understand the trends and patterns in your system, and adjust the `maxclients` configuration accordingly.
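+
+When inspecting client connections, grouping them by source address often shows which application or host is holding most of them. `clients_by_ip` below is a hypothetical helper, not part of Redis; it assumes the `addr=<ip>:<port>` field that `CLIENT LIST` prints for each connection.
+
+```bash
+# Hypothetical helper (not shipped with Redis): reads `CLIENT LIST` output
+# on stdin and prints "<count> <address>" per client address, busiest first.
+clients_by_ip() {
+  tr -d '\r' | awk '
+    {
+      for (i = 1; i <= NF; i++) {
+        # Match only the "addr=" field, not "laddr=" (the local address).
+        if ($i ~ /^addr=/) {
+          addr = substr($i, 6)        # drop the "addr=" prefix
+          sub(/:[0-9]+$/, "", addr)   # drop the trailing ":port"
+          count[addr]++
+        }
+      }
+    }
+    END { for (a in count) print count[a], a }' | sort -rn
+}
+
+# Usage (assumes default host/port):
+#   redis-cli CLIENT LIST | clients_by_ip
+```
+
+Redis also keeps a running total of connections refused because of `maxclients` in the `rejected_connections` field of `INFO stats`, which shows whether rejections are still happening.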
+
+### Useful resources
+
+1. [Redis Clients documentation](https://redis.io/topics/clients)
+2. [Redis configuration documentation](https://redis.io/topics/config)
diff --git a/health/guides/redis/redis_master_link_down.md b/health/guides/redis/redis_master_link_down.md
new file mode 100644
index 00000000..5a2d2429
--- /dev/null
+++ b/health/guides/redis/redis_master_link_down.md
@@ -0,0 +1,50 @@
+### Understand the alert
+
+The `redis_master_link_down` alert is triggered when there is a disconnection between a Redis master and its slave for more than 10 seconds. This alert indicates a potential problem with the replication process and can impact the data consistency across multiple instances.
+
+### Troubleshoot the alert
+
+1. Check the Redis logs
+
+   Examine the Redis logs for errors or issues related to the disconnection between the master and slave instances. The log location is set by the `logfile` directive (check it with `redis-cli CONFIG GET logfile`); common package defaults are `/var/log/redis/redis.log` or `/var/log/redis/redis-server.log`. Look for messages related to replication, network errors, or timeouts:
+
+   ```
+   grep -i "replication" /var/log/redis/redis.log
+   grep -i "timeout" /var/log/redis/redis.log
+   ```
+
+2. Check the Redis replication status
+
+   Run the `INFO replication` command on the Redis master to see its role and connected replicas:
+
+   ```
+   redis-cli INFO replication
+   ```
+
+   Also check the replication status on the slave instance by running the same command against its address, for example `redis-cli -h <slave_ip_address> -p <redis_port> INFO replication`. On the slave, the `master_link_status` field shows whether the link is `up` or `down`.
+
+3. Verify the network connection between the master and slave instances
+
+   The slave opens the replication connection to the master, so test connectivity from the slave host to the master using `ping` and `telnet` or `nc`. Make sure the connection is stable and that no firewall or network policy blocks the Redis port:
+
+   ```
+   ping <master_ip_address>
+   telnet <master_ip_address> <redis_port>
+   ```
+
+4. Restart the Redis instances (if needed)
+
+   If Redis instances are experiencing issues or are unable to reconnect, consider restarting them. Be cautious as restarting instances might result in data loss or consistency issues.
+
+   ```
+   sudo systemctl restart redis
+   ```
+
+5. Monitor the situation
+
+   After addressing the potential issues, keep an eye on the Redis instances to make sure the problem does not recur.
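+
+Checking the slave's side of the link can be partly automated. The sketch below parses `INFO replication` output from a slave; `master_link_summary` is a hypothetical helper name, and it assumes the standard `role`, `master_link_status`, and `master_link_down_since_seconds` fields.
+
+```bash
+# Hypothetical helper (not shipped with Redis): reads `INFO replication`
+# output from a slave on stdin and summarizes the link to its master.
+master_link_summary() {
+  # INFO lines end with CRLF; drop the CR before splitting on ':'.
+  tr -d '\r' | awk -F: '
+    $1 == "role"                           { role = $2 }
+    $1 == "master_link_status"             { status = $2 }
+    $1 == "master_link_down_since_seconds" { down = $2 }
+    END {
+      if (role != "slave") { print "not a replica (role=" role ")"; exit }
+      if (status == "down") printf "link down for %s s\n", down
+      else print "link " status
+    }'
+}
+
+# Usage (assumes the slave is reachable at the given address):
+#   redis-cli -h <slave_ip_address> -p <redis_port> INFO replication | master_link_summary
+```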
+
+### Useful resources
+
+1. [Redis Replication Documentation](https://redis.io/topics/replication)