author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000
---|---|---
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-04-19 02:57:58 +0000
commit | be1c7e50e1e8809ea56f2c9d472eccd8ffd73a97 (patch) |
tree | 9754ff1ca740f6346cf8483ec915d4054bc5da2d | /health/guides/mysql
parent | Initial commit. (diff) |
Adding upstream version 1.44.3.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'health/guides/mysql')
-rw-r--r-- | health/guides/mysql/mysql_10s_slow_queries.md | 25
-rw-r--r-- | health/guides/mysql/mysql_10s_table_locks_immediate.md | 46
-rw-r--r-- | health/guides/mysql/mysql_10s_table_locks_waited.md | 37
-rw-r--r-- | health/guides/mysql/mysql_10s_waited_locks_ratio.md | 36
-rw-r--r-- | health/guides/mysql/mysql_connections.md | 74
-rw-r--r-- | health/guides/mysql/mysql_galera_cluster_size.md | 50
-rw-r--r-- | health/guides/mysql/mysql_galera_cluster_size_max_2m.md | 40
-rw-r--r-- | health/guides/mysql/mysql_galera_cluster_state_crit.md | 46
-rw-r--r-- | health/guides/mysql/mysql_galera_cluster_state_warn.md | 43
-rw-r--r-- | health/guides/mysql/mysql_galera_cluster_status.md | 39
-rw-r--r-- | health/guides/mysql/mysql_replication.md | 64
-rw-r--r-- | health/guides/mysql/mysql_replication_lag.md | 30
12 files changed, 530 insertions, 0 deletions
diff --git a/health/guides/mysql/mysql_10s_slow_queries.md b/health/guides/mysql/mysql_10s_slow_queries.md
new file mode 100644
index 00000000..17321844

### Understand the alert

This alert presents the number of slow queries in the last 10 seconds. If you receive it, the server is currently executing an unusually high number of slow queries.

The alert is raised into warning when the value exceeds 10. If the number of slow queries in the last 10 seconds exceeds 20, the alert is raised into critical.

Queries are considered "slow" if they take longer than `long_query_time` seconds, a configurable server variable. The time is measured in real time, not CPU time.

### Troubleshoot the alert

- Determine which queries are the problem and try to optimize them

To identify the slow queries, you can enable the slow-query log of MySQL (a runtime alternative is sketched after these steps):

1. Locate the `my.cnf` file.
2. Enable the slow-query log by setting the `slow_query_log` variable to `On`.
3. Set the `slow_query_log_file` variable to the path where the log files should be stored.
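A minimal runtime sketch of the same settings, in case you cannot edit `my.cnf` right away. The log path and the 2-second threshold are illustrative values, not part of this guide, and runtime changes are lost when the server restarts:

```
-- Enable the slow-query log at runtime (reverts on restart).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';

-- Illustrative threshold: log statements that take longer than 2 seconds.
SET GLOBAL long_query_time = 2;

-- Confirm the new settings.
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';
```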
After you know which queries are taking longer than preferred, you can use the `EXPLAIN` keyword to see how many rows are accessed, which indexes are used, and what operations are performed.
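For example, a minimal `EXPLAIN` sketch against a hypothetical `orders` table (the table and columns are placeholders for whatever query appears in your slow-query log):

```
-- Inspect the execution plan of a slow query.
EXPLAIN SELECT id, total
FROM orders
WHERE customer_id = 42
  AND created_at > '2024-01-01';

-- "type: ALL" (a full table scan) together with a large "rows" estimate
-- usually means a supporting index is missing.
```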
After you've found the cause of the slow queries, you can start optimizing them. Consider adding an index and rethink how you `JOIN` tables. Both methods help reduce the amount of data that is read without actually being needed.

### Useful resources

1. [SQL Query Optimisation](https://opensource.com/article/17/5/speed-your-mysql-queries-300-times)

diff --git a/health/guides/mysql/mysql_10s_table_locks_immediate.md b/health/guides/mysql/mysql_10s_table_locks_immediate.md
new file mode 100644
index 00000000..7b375b43

### Understand the alert

This alert is triggered when the number of table immediate locks in MySQL increases within the last 10 seconds. Table locks are used to control concurrent access to tables, and immediate locks are granted when the requested lock is available.

### What are table immediate locks?

In MySQL, table immediate locks are a mechanism for managing concurrent access to tables. When a table lock is requested and is available, an immediate lock is granted, allowing the process to continue execution. This ensures that multiple processes can't modify the same data simultaneously, which could cause data inconsistencies.

### Troubleshoot the alert

1. Identify the queries causing the table locks:

   You can use the following command to display the process list in MySQL, which will include information about the locks:

   ```
   SHOW FULL PROCESSLIST;
   ```

2. Analyze the queries:

   Check the queries causing the table locks to determine if they are necessary, can be optimized, or should be terminated. To terminate a specific query, use the `KILL QUERY` command followed by the connection ID:

   ```
   KILL QUERY connection_id;
   ```

3. Check table lock status:

   To get more information about the lock status, you can use the following command to display the lock status of all tables:

   ```
   SHOW OPEN TABLES WHERE in_use > 0;
   ```

4. Optimize database queries and configurations:

   Improve query performance by optimizing the queries and indexing the tables (an indexing sketch follows this list). Additionally, check your MySQL configuration and adjust it if necessary to minimize the number of locks required.

5. Monitor the lock situation:

   Keep monitoring the lock situation with the `SHOW FULL PROCESSLIST` command to see if the problem persists. If the issue is not resolved, consider increasing the MySQL lock timeout or seek assistance from a database administrator or the MySQL community.
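As referenced in step 4, a minimal indexing sketch. The `orders` table and `customer_id` column are hypothetical placeholders for whatever table your `SHOW FULL PROCESSLIST` output points to:

```
-- Add an index on the column used in the WHERE clause of a locking query.
ALTER TABLE orders ADD INDEX idx_orders_customer_id (customer_id);

-- Refresh table statistics so the optimizer can take the new index into account.
ANALYZE TABLE orders;
```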
### Useful resources

1. [MySQL Table Locking](https://dev.mysql.com/doc/refman/8.0/en/table-locking.html)
2. [MySQL Lock Information](https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html)

diff --git a/health/guides/mysql/mysql_10s_table_locks_waited.md b/health/guides/mysql/mysql_10s_table_locks_waited.md
new file mode 100644
index 00000000..1cac9e92

### Understand the alert

This alert is triggered when there is a high number of `table locks waited` in the last 10 seconds for a MySQL database. Table locks prevent multiple processes from writing to a table at the same time, ensuring the integrity of the data. However, many waited table locks can indicate a performance issue, as some queries may be causing deadlocks or taking too long to complete.

### Troubleshoot the alert

1. Identify queries causing locks

   Use the following MySQL command to view the currently running queries and identify the ones causing the table locks:

   ```
   SHOW FULL PROCESSLIST;
   ```

2. Examine locked tables

   Use the following command to find more information about the locked tables:

   ```
   SHOW OPEN TABLES WHERE In_use > 0;
   ```

3. Optimize query performance

   Analyze the queries causing the table locks and optimize them to improve performance. This may include creating or modifying indexes, optimizing the SQL query structure, or adjusting the MySQL server configuration settings.

4. Consider using InnoDB

   If your MySQL database is using the MyISAM storage engine, consider switching to the InnoDB storage engine to take advantage of row-level locking and reduce the number of table locks (a conversion sketch follows this list).

5. Monitor MySQL performance

   Keep an eye on MySQL performance metrics such as table locks, query response times, and overall database performance to prevent future issues. Tools like the Netdata Agent can help in monitoring MySQL performance.
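As referenced in step 4, a minimal conversion sketch. The `mydb.orders` table is a hypothetical placeholder; `ALTER TABLE ... ENGINE` rebuilds the table, so test it on a non-production copy and schedule it for a low-traffic window:

```
-- List tables that still use the MyISAM storage engine.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

-- Convert one table to InnoDB.
ALTER TABLE mydb.orders ENGINE = InnoDB;
```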
### Useful resources

1. [InnoDB Locking and Transaction Model](https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-transaction-model.html)

diff --git a/health/guides/mysql/mysql_10s_waited_locks_ratio.md b/health/guides/mysql/mysql_10s_waited_locks_ratio.md
new file mode 100644
index 00000000..60c03059

### Understand the alert

This alert indicates a high ratio of waited table locks in your MySQL database over the last 10 seconds. If you receive this alert, it means that there might be performance issues due to contention for table locks.

### What are table locks?

Table locks are a method used by MySQL to ensure data consistency and prevent multiple clients from modifying the same data at the same time. When a client attempts to modify data, it must first acquire a lock on the table. If the lock is not available, the client must wait until the lock is released by another client.

### Troubleshoot the alert

1. Identify problematic queries:

   Use the following command to display the queries that are causing table locks in your MySQL database:

   ```
   SHOW FULL PROCESSLIST;
   ```

   Look for queries with a state of `'Locked'` or `'Waiting for table lock'` and note down their details.

2. Optimize your queries:

   Analyze the problematic queries identified in the previous step and try to optimize them. You can use `EXPLAIN` or similar tools to get insights into the performance of the queries.

3. Consider splitting your table(s):

   If the problem persists after optimizing the queries, consider splitting large tables into smaller ones. This can help reduce contention for table locks and improve performance.

4. Use replication:

   Another solution to this issue is to implement MySQL replication, which can reduce contention for table locks by allowing read queries to be executed on replica servers rather than the primary server.

### Useful resources

1. [Documentation: Table Locking Issues](https://dev.mysql.com/doc/refman/5.7/en/table-locking.html)
2. [MySQL Replication](https://dev.mysql.com/doc/refman/8.0/en/replication.html)

diff --git a/health/guides/mysql/mysql_connections.md b/health/guides/mysql/mysql_connections.md
new file mode 100644
index 00000000..2f57fef2

### Understand the alert

The `mysql_connections` alert indicates the percentage of used client connections compared to the maximum configured connections. When you receive this alert, it means your MySQL or MariaDB server is reaching its connection limit, which could lead to performance issues or failed connections for clients.

### Troubleshoot the alert

1. **Check the current connection usage**

   Use the following command to see the current used and total connections:

   ```
   mysql -u root -p -e "SHOW STATUS LIKE 'max_used_connections'; SHOW VARIABLES LIKE 'max_connections';"
   ```

   This will display the maximum number of connections used since the server was started and the maximum allowed number of connections (`max_connections`).

2. **Monitor connections over time**

   You can monitor the connection usage over time using the following command:

   ```
   watch -n 1 "mysql -u root -p -e 'SHOW STATUS LIKE \"Threads_connected\";'"
   ```

   This will update the number of currently connected threads every second.

3. **Identify connection-consuming processes**

   If connection usage is high, check which processes or clients are using connections (a per-user breakdown is sketched after this list):

   ```
   mysql -u root -p -e "SHOW PROCESSLIST;"
   ```

   This gives you an overview of the currently connected clients, their states, and the queries being executed.

4. **Optimize client connections**

   Analyze the processes using connections and ensure they close their connections properly when done, utilize connection pooling, and reduce the number of connections where possible.

5. **Increase the connection limit (if necessary)**

   If you need to increase the `max_connections` value, follow these steps:

   - Log into MySQL from the terminal:

     ```
     mysql -u root -p
     ```

   - Check the current limit:

     ```
     show variables like "max_connections";
     ```

   - Set a new limit temporarily:

     ```
     set global max_connections = "LIMIT";
     ```

     Replace "LIMIT" with the desired new limit.

   - To set the limit permanently, locate the `my.cnf` file (typically under `/etc`, but it may vary depending on your installation) and append `max_connections = LIMIT` under the `[mysqld]` section.

     Replace "LIMIT" with the desired new limit, then restart the MySQL/MariaDB service.
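As referenced in step 3, a sketch of a per-user breakdown built on `information_schema.processlist`, which shows at a glance which accounts and hosts hold the most connections:

```
-- Count open connections per user and client host.
SELECT user,
       SUBSTRING_INDEX(host, ':', 1) AS client_host,
       COUNT(*) AS connections
FROM information_schema.processlist
GROUP BY user, client_host
ORDER BY connections DESC;
```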
### Useful resources

1. [How to Increase Max Connections in MySQL](https://ubiq.co/database-blog/how-to-increase-max-connections-in-mysql/)
2. [MySQL 5.7 Reference Manual: SHOW STATUS Syntax](https://dev.mysql.com/doc/refman/5.7/en/show-status.html)
3. [MySQL 5.7 Reference Manual: SHOW PROCESSLIST Syntax](https://dev.mysql.com/doc/refman/5.7/en/show-processlist.html)
4. [MySQL 5.7 Reference Manual: mysqld – The MySQL Server](https://dev.mysql.com/doc/refman/5.7/en/mysqld.html)

diff --git a/health/guides/mysql/mysql_galera_cluster_size.md b/health/guides/mysql/mysql_galera_cluster_size.md
new file mode 100644
index 00000000..ebe5d64a

### Understand the alert

This alert monitors the Galera cluster size and checks for a discrepancy between the current cluster size and the maximum size seen in the last 2 minutes. A warning is raised if the current size is larger than that maximum, and a critical alert is raised if the current size is smaller.

### Troubleshoot the alert

1. Check the network connectivity:

   Galera Cluster relies on persistent network connections. Review your system logs for any connectivity issues or network errors. If you find such issues, work with your network administrator to resolve them.

2. Check the status of MySQL nodes:

   You can use the following query to examine the status of all nodes in the Galera cluster:

   ```
   SHOW STATUS LIKE 'wsrep_cluster_%';
   ```

   Look for the `wsrep_cluster_size` and `wsrep_cluster_status` values, and analyze whether there are any inconsistencies or issues.

3. Review Galera logs:

   Inspect the logs of the Galera cluster for any errors, warnings, or issues. The log files are usually located in the `/var/log/mysql` or `/var/lib/mysql` directories.

4. Check node synchronization:

   - Ensure that all nodes are synced by checking the `wsrep_local_state_comment` status variable. A value of 'Synced' indicates that the node is in sync with the cluster.

     ```
     SHOW STATUS LIKE 'wsrep_local_state_comment';
     ```

   - If any node is not synced, check its logs to find the cause of the issue and resolve it.

5. Restart nodes if necessary:

   If you find that a node is not working properly, you can try to restart the MySQL service on the affected node:

   ```
   sudo systemctl restart mysql
   ```

   Keep in mind that restarting a node can cause temporary downtime for applications connecting to that specific node.

6. If the issue persists, consider contacting the Galera Cluster support team for assistance or consult the [Galera Cluster documentation](https://galeracluster.com/library/documentation/) for further guidance.

### Useful resources

1. [Galera Cluster Monitoring](https://galeracluster.com/library/training/tutorials/galera-monitoring.html)
2. [Galera Cluster Documentation](https://galeracluster.com/library/documentation/)

diff --git a/health/guides/mysql/mysql_galera_cluster_size_max_2m.md b/health/guides/mysql/mysql_galera_cluster_size_max_2m.md
new file mode 100644
index 00000000..0f14ca8a

### Understand the alert

This alert calculates the maximum size of the MySQL Galera cluster over a 2-minute period, starting from one minute ago. If you receive this alert, it means that there has been a significant change in the cluster size, which might affect the database's performance, stability, and data consistency.

### What is MySQL Galera Cluster?

MySQL Galera Cluster is a synchronous multi-master cluster for MySQL, built on the Galera replication plugin. It provides high availability and improved performance for MySQL databases by synchronizing data across multiple nodes.

### What does the cluster size mean?

The cluster size refers to the number of nodes participating in a MySQL Galera Cluster. An adequate cluster size ensures that the database can handle larger workloads, tolerate node failures, and perform automatic failovers.

### Troubleshoot the alert

- Determine the current cluster size

  1. Connect to any node in the cluster and run the following SQL query:

     ```
     SHOW STATUS LIKE 'wsrep_cluster_size';
     ```

  2. The query will display the current number of nodes in the cluster.

- Identify the cause of the cluster size change

  1. Check the MySQL and Galera logs on all nodes to identify any issues, such as network connectivity problems, node crashes, or hardware failures. A status query covering the relevant counters is sketched after this list.

  2. Review the logs for events such as nodes joining or leaving the cluster. Look for patterns that could lead to instability (e.g., frequent node join and leave events).

- Resolve the issue

  1. Fix any identified problems causing the cluster size change. This may involve resolving network issues, restarting failed nodes, or replacing faulty hardware.

  2. If necessary, plan and execute a controlled reconfiguration of the Galera cluster to maintain the optimal cluster size.
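As referenced under "Identify the cause of the cluster size change", a sketch of a single query that summarizes the membership counters most relevant to size changes; run it on each node and compare the results:

```
-- Membership and health overview for this node.
SHOW GLOBAL STATUS WHERE Variable_name IN
    ('wsrep_cluster_size', 'wsrep_cluster_status', 'wsrep_connected', 'wsrep_ready');
```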
### Useful resources

1. [Galera Cluster Documentation](https://galeracluster.com/library/documentation/)
2. [Monitoring Galera Cluster for MySQL or MariaDB](https://severalnines.com/database-blog/monitoring-galera-cluster-mysql-or-mariadb)

diff --git a/health/guides/mysql/mysql_galera_cluster_state_crit.md b/health/guides/mysql/mysql_galera_cluster_state_crit.md
new file mode 100644
index 00000000..c1ac649e

### Understand the alert

The `mysql_galera_cluster_state_crit` alert is triggered when the Galera node state is either `Undefined`, `Joining`, or `Error`. This indicates that there is an issue with a Galera node in your MySQL Galera Cluster.

### What is a MySQL Galera Cluster?

MySQL Galera Cluster is a synchronous, multi-master database cluster that provides high availability, no data loss, and scalability for your MySQL databases. It uses the Galera replication library and the MySQL server to achieve these goals.

### Troubleshoot the alert

To troubleshoot the MySQL Galera Cluster state critical alert, follow these steps:

1. Inspect the MariaDB error log

   Check the MariaDB error log for any relevant error messages that can help identify the issue.

   ```
   sudo tail -f /var/log/mysql/error.log
   ```

2. Check the Galera node's status

   Connect to the problematic MySQL node and check the Galera node status by running the following query:

   ```
   SHOW STATUS LIKE 'wsrep_%';
   ```

   Take note of the values of `wsrep_local_state` and `wsrep_local_state_comment`.

3. Diagnose the issue

   - If `wsrep_local_state` is 0 (`Undefined`), the node is not part of any cluster.
   - If `wsrep_local_state` is 1 (`Joining`), the node is trying to connect or reconnect to the cluster.
   - If `wsrep_local_state` is 5 (`Error`), the node has encountered a consistency error.

4. Resolve the issue

   - For an `Undefined` state, check and fix the wsrep configuration settings and restart the node.
   - For a `Joining` state, ensure that the node can communicate with the other nodes in the cluster and that the cluster's state is healthy, then retry joining the node to the cluster.
   - For an `Error` state, the node may need to be resynchronized with the cluster. Restart the mysqld process on the affected node, or perform a full state transfer to recover.

5. Monitor the cluster

   After resolving the issue, monitor the cluster to ensure that all nodes are healthy and remain in sync.

diff --git a/health/guides/mysql/mysql_galera_cluster_state_warn.md b/health/guides/mysql/mysql_galera_cluster_state_warn.md
new file mode 100644
index 00000000..e03ffa2e

### Understand the alert

This alert checks the state of a Galera node in a MySQL Galera cluster. If you receive this alert, it means that the node is either in the **Donor/Desynced** state or the **Joined** state, which can indicate potential issues within the cluster.

### What do the Donor/Desynced and Joined states mean?

1. **Donor/Desynced**: When a node is in the Donor/Desynced state, it is providing a State Snapshot Transfer (SST) to another node in the cluster. During this time, the node is not synchronized with the rest of the cluster and cannot process any write or commit requests.

2. **Joined**: In the Joined state, a node has completed the initial SST and is now catching up with any missing transactions through an Incremental State Transfer (IST).

### Troubleshoot the alert

1. Check the Galera cluster status with the following command:

   ```
   SHOW STATUS LIKE 'wsrep_%';
   ```

2. Verify whether any node is in the Donor/Desynced or Joined state:

   ```
   SELECT VARIABLE_NAME, VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_local_state_comment', 'wsrep_cluster_status', 'wsrep_ready');
   ```

3. Identify the cause of the node state change. Some possible reasons are:

   - A new node has joined the cluster and requires an SST.
   - A node has been restarted and is rejoining the cluster.
   - A node experienced a temporary network issue and is now resynchronizing with the cluster.

4. Monitor the progress of the resynchronization process using the `SHOW STATUS` command shown above, and wait for the node to reach the *Synced* state.

5. If the node remains in the Donor/Desynced or Joined state for an extended period, investigate further to determine the cause of the issue:

   - Inspect the MySQL error log for any relevant messages.
   - Check for network issues or connectivity problems between the nodes.
   - Verify the cluster configuration and ensure all nodes have a consistent configuration.

6. Contact your DBA for assistance if the issue persists, as they may need to perform additional investigation and troubleshooting.

### Useful resources

1. [Galera Cluster Documentation](https://galeracluster.com/library/documentation/)

diff --git a/health/guides/mysql/mysql_galera_cluster_status.md b/health/guides/mysql/mysql_galera_cluster_status.md
new file mode 100644
index 00000000..c5b07516

### Understand the alert

This alert indicates the current status of the Galera node cluster component in your MySQL or MariaDB database. Receiving this alert means that there is a potential issue with the cluster, such as a network partition that has caused the cluster to split into multiple components.

### Troubleshoot the alert

1. **Check the status of the Galera cluster**

   First, determine the current status of the cluster to understand the severity of the issue. Check the value of the alert and refer to the table in the alert description to see which state your cluster is in. A status query is sketched after this list.

2. **Verify cluster connectivity**

   If your cluster is in a non-primary state or disconnected, verify that all the nodes in your cluster can communicate with each other. You can use tools like `ping`, `traceroute`, or `mtr` to test connectivity between the cluster nodes. If there is a network issue, get in touch with your network administrator to resolve it.

3. **Examine node logs**

   Check the logs on each node for any indication of issues or error messages that can help identify the root cause of the problem. The logs are usually located in the `/var/log/mysqld.log` file or in the `/var/log/mysql/error.log` file. Look for lines that contain "ERROR" or "WARNING" as a starting point.

4. **Inspect Galera cluster settings**

   Analyze your Galera cluster configuration file (`/etc/my.cnf` or `/etc/mysql/my.cnf`) to make sure you have the correct settings, including the `wsrep_cluster_address` value, which defines the initial list of nodes in the cluster. If you find any misconfiguration, correct it and restart your database service.

5. **Force a new primary component**

   If you have a split-brain scenario, where multiple parts of the cluster claim to be the primary component, you need to force a new primary component. To do this, you can use the `SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';` statement on one of the nodes that has the most up-to-date data. This action will force that node to act as the new primary component.
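As referenced in step 1, a minimal status-check sketch. For `wsrep_cluster_status`, `Primary` is the healthy value; `non-Primary` or `Disconnected` points to a partitioned or isolated node:

```
-- Component status and size as seen from this node.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
```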
### Prevention

To minimize the risk of cluster issues, ensure the following:

1. Use reliable and redundant network connections between nodes.
2. Configure Galera cluster settings correctly.
3. Regularly monitor the cluster status and review logs.
4. Use the latest stable version of the Galera cluster software.

### Useful resources

1. [MariaDB Galera Cluster Documentation](https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/)

diff --git a/health/guides/mysql/mysql_replication.md b/health/guides/mysql/mysql_replication.md
new file mode 100644
index 00000000..50f7e543

### Understand the alert

This alert is triggered when the replication status of a MySQL server indicates a problem or failure. Replication is important for redundancy, data backup, and load balancing. Issues with replication threads can lead to data inconsistencies or potential loss of data.

### Troubleshoot the alert

1. Identify the failing thread:

   Use the appropriate command for your MySQL or MariaDB version to check the status of the replication threads and determine which of them (I/O or SQL) is not running.

   For MySQL and MariaDB before v10.2.0, use:

   ```
   SHOW SLAVE STATUS\G
   ```

   For MariaDB v10.2.0+, use:

   ```
   SHOW ALL SLAVES STATUS\G
   ```

2. Inspect the MySQL error log:

   The MySQL error log can provide valuable information about the possible cause of the replication issues. Check the log for any replication-related error messages:

   ```
   tail -f /path/to/mysql/error.log
   ```

   Replace `/path/to/mysql/error.log` with the correct path to the MySQL error log file.

3. Check the source MySQL server:

   Replication issues can also originate from the source MySQL server. Make sure that the source server is properly configured and running, and that the binary logs are being written and flushed correctly (see the sketch after this list).

   Refer to the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-howto.html) for more information on configuring replication.

4. Restart the replication threads:

   After identifying and resolving any issues found in the previous steps, you can try restarting the replication threads:

   ```
   STOP SLAVE;
   START SLAVE;
   ```

   For MariaDB v10.2.0+ with multi-source replication, you may need to specify the connection name:

   ```
   STOP ALL SLAVES;
   START ALL SLAVES;
   ```

5. Verify the replication status:

   After restarting the replication threads, use the appropriate command from step 1 to verify that the threads are running and that replication is working as expected.
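As referenced in step 3, a minimal sketch of verifying binary logging on the source server; these are standard statements run on the source, not on the replica:

```
-- Confirm that binary logging is enabled on the source.
SHOW VARIABLES LIKE 'log_bin';

-- Show the current binary log file and position; the position should advance as writes happen.
SHOW MASTER STATUS;

-- List the binary log files available to replicas.
SHOW BINARY LOGS;
```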
### Useful resources

1. [How To Set Up Replication in MySQL](https://www.digitalocean.com/community/tutorials/how-to-set-up-replication-in-mysql)
2. [MySQL Replication Administration and Status](https://dev.mysql.com/doc/refman/5.7/en/replication-administration-status.html)
3. [Replication Replica I/O Thread States](https://dev.mysql.com/doc/refman/5.7/en/replica-io-thread-states.html)
4. [Replication Replica SQL Thread States](https://dev.mysql.com/doc/refman/5.7/en/replica-sql-thread-states.html)

diff --git a/health/guides/mysql/mysql_replication_lag.md b/health/guides/mysql/mysql_replication_lag.md
new file mode 100644
index 00000000..9c57f810

### Understand the alert

This alert presents the number of seconds that the replica is behind the master. Receiving it means that the replication SQL thread is far behind in processing the source binary log. A constantly high (or increasing) value indicates that the replica is unable to handle events from the source in a timely fashion.

This alert is raised into warning when the metric exceeds 10 seconds. If the number of seconds that the replica is behind the master exceeds 30 seconds, the alert is raised into critical.

### Troubleshoot the alert

- Query optimization and `log_slow_slave_statements`

To minimize slave `SQL_THREAD` lag, focus on query optimization. The following settings will help you identify the problem (a runtime sketch follows this list):

1. Enable [log_slow_slave_statements](https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_log_slow_slave_statements) to see queries executed by the slave that take more than [long_query_time](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_long_query_time).
2. To get more information about query performance, set the configuration option [log_slow_verbosity](https://www.percona.com/doc/percona-server/5.1/diagnostics/slow_extended.html?id=percona-server:features:slow_extended_51&redirect=2#log_slow_verbosity) to `full`.
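A minimal runtime sketch of these settings, applied on the replica. The 2-second `long_query_time` is an illustrative value, the changes revert on restart, and the `log_slow_verbosity` option exists only on Percona Server:

```
-- Log replicated statements that exceed long_query_time to the slow-query log.
SET GLOBAL log_slow_slave_statements = 'ON';
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Percona Server only: record extra detail for each slow statement.
-- SET GLOBAL log_slow_verbosity = 'full';
```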
You can also read the Percona blog for a good write-up about [MySQL replication slave lag](https://www.percona.com/blog/2014/05/02/how-to-identify-and-cure-mysql-replication-slave-lag/).

### Useful resources

1. [Replication in MySQL](https://www.digitalocean.com/community/tutorials/how-to-set-up-replication-in-mysql)
2. [MySQL Replication Slave Lag](https://www.percona.com/blog/2014/05/02/how-to-identify-and-cure-mysql-replication-slave-lag/)
3. [log_slow_slave_statements](https://dev.mysql.com/doc/refman/8.0/en/replication-options-replica.html#sysvar_log_slow_slave_statements)
4. [long_query_time](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_long_query_time)
5. [log_slow_verbosity](https://www.percona.com/doc/percona-server/5.1/diagnostics/slow_extended.html?id=percona-server:features:slow_extended_51&redirect=2#log_slow_verbosity)