Diffstat (limited to 'health/guides/postgres')
 health/guides/postgres/postgres_acquired_locks_utilization.md      | 43 +
 health/guides/postgres/postgres_db_cache_io_ratio.md               | 51 +
 health/guides/postgres/postgres_db_deadlocks_rate.md               | 39 +
 health/guides/postgres/postgres_db_transactions_rollback_ratio.md  | 55 +
 health/guides/postgres/postgres_index_bloat_size_perc.md           | 33 +
 health/guides/postgres/postgres_table_bloat_size_perc.md           | 58 +
 health/guides/postgres/postgres_table_cache_io_ratio.md            | 32 +
 health/guides/postgres/postgres_table_index_cache_io_ratio.md      | 45 +
 health/guides/postgres/postgres_table_last_autoanalyze_time.md     | 41 +
 health/guides/postgres/postgres_table_last_autovacuum_time.md      | 50 +
 health/guides/postgres/postgres_table_toast_cache_io_ratio.md      | 39 +
 health/guides/postgres/postgres_table_toast_index_cache_io_ratio.md| 41 +
 health/guides/postgres/postgres_total_connection_utilization.md    | 45 +
 health/guides/postgres/postgres_txid_exhaustion_perc.md            | 33 +
 14 files changed, 605 insertions(+), 0 deletions(-)
diff --git a/health/guides/postgres/postgres_acquired_locks_utilization.md b/health/guides/postgres/postgres_acquired_locks_utilization.md
new file mode 100644
index 000000000..d0b76eae3
--- /dev/null
+++ b/health/guides/postgres/postgres_acquired_locks_utilization.md
@@ -0,0 +1,43 @@
+### Understand the alert
+
+This alert monitors the average `acquired locks utilization` over the last minute in PostgreSQL databases. If you receive this alert, it means that the acquired locks utilization for your system is near or above the warning threshold (15% or 20%).
+
+### What are acquired locks?
+
+In PostgreSQL, a lock is a mechanism used to control access to shared resources, such as database tables or rows. When multiple users or tasks are working with the database, locks help coordinate their activities and prevent conflicts.
+
+Acquired locks utilization refers to the percentage of locks currently in use in the system, compared to the total number of locks available.
+
+### Troubleshoot the alert
+
+1. Identify the most lock-intensive queries:
+
+ You can use the following SQL query to get the list of most lock-intensive queries running on your PostgreSQL server:
+
+    ```sql
+    SELECT l.pid, l.locktype, l.mode, l.granted, a.client_addr, a.query_start,
+           now() - a.query_start AS duration, a.query
+    FROM pg_locks l
+    JOIN pg_stat_activity a ON l.pid = a.pid
+    WHERE a.state != 'idle'
+    ORDER BY duration DESC;
+    ```
+
+2. Analyze the problematic queries and look for ways to optimize them, such as:
+
+ a. Adding missing indexes for faster query execution.
+ b. Updating and optimizing query plans.
+ c. Adjusting lock types or lock levels, if possible.
+
+3. Check the overall health and performance of your PostgreSQL server:
+
+ a. Monitor the CPU, memory, and disk usage.
+ b. Consider configuring the autovacuum settings to maintain your database's health.
+
+4. Monitor database server logs for any errors or issues.
+
+5. If the problem persists, consider adjusting the warning threshold (`warn` option), or even increasing the available locks in the PostgreSQL configuration (`max_locks_per_transaction`).
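+
+   For example, a minimal sketch of raising the lock table size in `postgresql.conf` (the value is illustrative; this parameter takes effect only after a server restart):
+
+    ```
+    max_locks_per_transaction = 128   # default is 64; sizes the shared lock table
+    ```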
+
+### Useful resources
+
+1. [PostgreSQL Locks Monitoring](https://www.postgresql.org/docs/current/monitoring-locks.html)
+2. [PostgreSQL Server Activity statistics](https://www.postgresql.org/docs/current/monitoring-stats.html)
diff --git a/health/guides/postgres/postgres_db_cache_io_ratio.md b/health/guides/postgres/postgres_db_cache_io_ratio.md
new file mode 100644
index 000000000..d39329763
--- /dev/null
+++ b/health/guides/postgres/postgres_db_cache_io_ratio.md
@@ -0,0 +1,51 @@
+### Understand the alert
+
+The `postgres_db_cache_io_ratio` alert is related to PostgreSQL databases and measures the `cache hit ratio` in the last minute. If you receive this alert, it means that your database server cache is not as efficient as it should be, and your system is frequently reading data from disk instead of cache, causing possible slow performance and higher I/O workload.
+
+### What does cache hit ratio mean?
+
+Cache hit ratio is an indicator of how frequently the data required for a query is found in the cache instead of reading it directly from disk. Higher cache hit ratios mean increased query performance and less disk I/O, which can greatly impact your database performance.
+
+### Troubleshoot the alert
+
+1. Determine if the cache hit ratio issue is affecting your overall database performance using `htop`:
+
+ ```
+ htop
+ ```
+
+    Check the `Load average` values in the header. If they are in the safe zone (green), the cache hit ratio issue might not be affecting overall performance; if they are in the yellow or red zone, further troubleshooting is necessary.
+
+2. Check per-database cache hit ratio:
+
+ Run the following query to see cache hit ratios for each database:
+    ```sql
+    SELECT dbname,
+           (block_cache_hit_kb / NULLIF(block_cache_miss_read_kb + block_cache_hit_kb, 0)) * 100 AS cache_hit_ratio
+    FROM (SELECT datname AS dbname,
+                 sum(blks_read * 8.0 / 1024) AS block_cache_miss_read_kb,
+                 sum(blks_hit * 8.0 / 1024) AS block_cache_hit_kb
+          FROM pg_stat_database
+          GROUP BY datname) t;
+    ```
+
+ Analyze the results to determine which databases have a low cache hit ratio.
+
+3. Analyze PostgreSQL cache settings:
+
+ Check the cache settings in the `postgresql.conf` file. You may need to increase the `shared_buffers` parameter to allocate more memory for caching purposes, if there is available memory on the host.
+
+    For example, to set an increased `shared_buffers` value:
+ ```
+ shared_buffers = 2GB # Change the value according to your host's available memory.
+ ```
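+
+    Alternatively, on PostgreSQL 9.4+ you can set the parameter with `ALTER SYSTEM`, which writes it to `postgresql.auto.conf` (the value is illustrative; `shared_buffers` still requires a restart to take effect):
+
+    ```sql
+    ALTER SYSTEM SET shared_buffers = '2GB';
+    ```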
+
+ Restart the PostgreSQL service to apply the changes:
+ ```
+ sudo systemctl restart postgresql
+ ```
+
+    Afterwards, monitor the cache hit ratio to determine whether the change improved performance. The buffer cache needs time to warm up, so keep an eye on the ratio and overall system health over a longer period before drawing conclusions.
+
+### Useful resources
+
+1. [Tuning Your PostgreSQL Server](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
diff --git a/health/guides/postgres/postgres_db_deadlocks_rate.md b/health/guides/postgres/postgres_db_deadlocks_rate.md
new file mode 100644
index 000000000..0b670b640
--- /dev/null
+++ b/health/guides/postgres/postgres_db_deadlocks_rate.md
@@ -0,0 +1,39 @@
+### Understand the alert
+
+This alert calculates the number of deadlocks in your PostgreSQL database in the last minute. If you receive this alert, it means that the number of deadlocks has surpassed the warning threshold (10 deadlocks per minute by default).
+
+### What are deadlocks?
+
+In a PostgreSQL database, a deadlock occurs when two or more transactions are waiting for one another to release a lock, causing a cyclical dependency. As a result, none of these transactions can proceed, and the database server may be unable to process other requests.
+
+### Troubleshoot the alert
+
+- Identify deadlock occurrences and problematic queries
+
+1. Check the PostgreSQL log for deadlock messages. You can typically find these logs in `/var/log/postgresql/` or in the `pg_log/` (on newer versions, `log/`) subdirectory of the data directory.
+
+ Look for messages like: `DETAIL: Process 12345 waits for ShareLock on transaction 67890; blocked by process 98765.`
+
+2. To find the problematic queries, examine the log entries before the deadlock messages. Most often, these entries will contain the SQL queries that led to the deadlocks.
+
+- Analyze and optimize the problematic queries
+
+1. Analyze the execution plans of the problematic queries using the `EXPLAIN` command. This can help you identify which parts of the query are causing the deadlock.
+
+2. Optimize the queries by rewriting them or by adding appropriate indices to speed up the processing time.
+
+- Avoid long-running transactions
+
+1. Long-running transactions increase the chances of deadlocks. Monitor your database for long-running transactions and try to minimize their occurrence.
+
+2. Set sensible lock timeouts to avoid transactions waiting indefinitely for a lock.
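+
+   For example, a minimal sketch of session-level timeouts (the values are illustrative and should be tuned to your workload):
+
+    ```sql
+    SET lock_timeout = '5s';  -- abort any statement that waits longer than 5s for a lock
+    SET idle_in_transaction_session_timeout = '60s';  -- close sessions idling inside an open transaction (PostgreSQL 9.6+)
+    ```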
+
+- Review your application logic
+
+1. Inspect your application code for any circular dependencies that could lead to deadlocks.
+
+2. Use advisory locks when possible to minimize lock contention in the database.
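+
+   For example, a critical section can be serialized with a transaction-scoped advisory lock (the key `42` is an arbitrary, application-defined value):
+
+    ```sql
+    BEGIN;
+    SELECT pg_advisory_xact_lock(42);  -- blocks until the lock is free; released automatically at COMMIT/ROLLBACK
+    -- ... perform the conflicting updates here ...
+    COMMIT;
+    ```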
+
+### Useful resources
+
+1. [PostgreSQL: Deadlocks](https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-DEADLOCKS)
diff --git a/health/guides/postgres/postgres_db_transactions_rollback_ratio.md b/health/guides/postgres/postgres_db_transactions_rollback_ratio.md
new file mode 100644
index 000000000..b2f94fede
--- /dev/null
+++ b/health/guides/postgres/postgres_db_transactions_rollback_ratio.md
@@ -0,0 +1,55 @@
+### Understand the alert
+
+This alert calculates the `PostgreSQL database transactions rollback ratio` for the last five minutes. If you receive this alert, it means that the percentage of `aborted transactions` in the specified PostgreSQL database is higher than the defined threshold.
+
+### What does transactions rollback ratio mean?
+
+In a PostgreSQL database, the transactions rollback ratio represents the proportion of aborted transactions (those that roll back) in relation to the total number of transactions processed. A high rollback ratio may indicate issues with the application logic, database performance, or excessive `deadlocks` causing transactions to be aborted frequently.
+
+### Troubleshoot the alert
+
+1. Check the PostgreSQL logs for any error messages or unusual activities related to transactions that might help identify the cause of the high rollback ratio.
+
+ ```
+ vi /var/log/postgresql/postgresql.log
+ ```
+
+ Replace `/var/log/postgresql/postgresql.log` with the appropriate path to your PostgreSQL log file.
+
+2. Investigate recent database changes or application code modifications that might have led to the increased rollback ratio.
+
+3. Examine the PostgreSQL database table and index statistics to identify potential performance bottlenecks.
+
+ ```
+ SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup, n_dead_tup, last_vacuum, last_analyze
+ FROM pg_stat_all_tables
+ WHERE schemaname = 'your_schema_name';
+ ```
+
+ Replace `your_schema_name` with the appropriate schema name.
+
+4. Identify the most expensive queries using the `pg_stat_statements` view; slow, frequently-called queries are the most likely to be aborted and rolled back:
+
+    ```sql
+    -- On PostgreSQL 13+ use total_exec_time instead of total_time
+    SELECT substring(query, 1, 50) AS short_query, calls, total_time, rows,
+           100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
+    FROM pg_stat_statements
+    WHERE calls > 50
+    ORDER BY (total_time / calls) DESC;
+    ```
+
+5. Investigate database locks and deadlocks using pg_locks:
+
+ ```
+ SELECT database, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted
+ FROM pg_catalog.pg_locks;
+ ```
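+
+    On PostgreSQL 9.6+ you can also list blocked/blocking session pairs directly, using the built-in `pg_blocking_pids()` function:
+
+    ```sql
+    SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, left(query, 60) AS query
+    FROM pg_stat_activity
+    WHERE cardinality(pg_blocking_pids(pid)) > 0;
+    ```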
+
+6. Make necessary changes in the application logic or database configuration to resolve the issues causing a high rollback ratio. Consult a PostgreSQL expert, if needed.
+
+### Useful resources
+
+1. [Monitoring PostgreSQL - rollback ratio](https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS)
+2. [PostgreSQL: Database Indexes](https://www.postgresql.org/docs/current/indexes.html)
+3. [PostgreSQL: Deadlocks](https://www.postgresql.org/docs/current/explicit-locking.html#LOCK-BUILT-IN-DEADLOCK-AVOIDANCE)
+4. [PostgreSQL: Log files](https://www.postgresql.org/docs/current/runtime-config-logging.html)
+5. [PostgreSQL: pg_stat_statements module](https://www.postgresql.org/docs/current/pgstatstatements.html) \ No newline at end of file
diff --git a/health/guides/postgres/postgres_index_bloat_size_perc.md b/health/guides/postgres/postgres_index_bloat_size_perc.md
new file mode 100644
index 000000000..bd6e4ba08
--- /dev/null
+++ b/health/guides/postgres/postgres_index_bloat_size_perc.md
@@ -0,0 +1,33 @@
+### Understand the alert
+
+This alert monitors index bloat in a PostgreSQL database table. If you receive this alert, it indicates that the index is bloated and is taking up more disk space than necessary, which can lead to performance issues.
+
+### What does index bloat mean?
+
+In PostgreSQL, when a row is updated or deleted, the old row data remains in the index while the new data is added. Over time, this causes the index to grow in size (bloat), leading to increased disk usage and degraded query performance. This alert measures the bloat size percentage for each index in the specified database and table.
+
+### Troubleshoot the alert
+
+1. Identify the bloated index in your PostgreSQL database, as mentioned in the alert's info field (e.g. `db [database] table [table] index [index]`).
+
+2. Rebuild the bloated index:
+
+ Use the `REINDEX` command to rebuild the bloated index. This will free up the space occupied by the old row data and help optimize query performance.
+
+ ```
+ REINDEX INDEX [index_name];
+ ```
+
+   **Note:** `REINDEX` blocks writes to the table while it rebuilds the index, so plan to run this command during maintenance windows or periods of low database usage. On PostgreSQL 12+, `REINDEX INDEX CONCURRENTLY [index_name];` rebuilds without blocking writes, at the cost of taking longer.
+
+3. Monitor the index bloat size after rebuilding:
+
+ After rebuilding the index, continue monitoring the index bloat size and performance to ensure the issue has been resolved.
+
+   You can use tools like [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html) (a built-in PostgreSQL extension) and the built-in `pg_stat_user_indexes` statistics view to keep an eye on your database's performance and catch bloat issues before they negatively impact your PostgreSQL setup.
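+
+   As a rough check, you can compare the index size before and after the rebuild (replace `index_name` with the index from the alert):
+
+   ```sql
+   SELECT pg_size_pretty(pg_relation_size('index_name'));
+   ```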
+
+### Useful resources
+
+1. [PostgreSQL documentation: REINDEX](https://www.postgresql.org/docs/current/sql-reindex.html)
+2. [PostgreSQL documentation: pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html)
+3. [PostgreSQL documentation: Routine Vacuuming](https://www.postgresql.org/docs/current/routine-vacuuming.html)
diff --git a/health/guides/postgres/postgres_table_bloat_size_perc.md b/health/guides/postgres/postgres_table_bloat_size_perc.md
new file mode 100644
index 000000000..0edc21bb1
--- /dev/null
+++ b/health/guides/postgres/postgres_table_bloat_size_perc.md
@@ -0,0 +1,58 @@
+### Understand the alert
+
+The `postgres_table_bloat_size_perc` alert measures the bloat size percentage in a PostgreSQL database table. If you receive this alert, it means that the bloat size in a particular table in your PostgreSQL database has crossed the warning or critical threshold.
+
+### What is bloat size?
+
+In PostgreSQL, bloat size refers to the wasted storage space caused by dead rows and unused space that accumulates in database tables over time. It is a result of frequent database operations (inserts, updates, and deletes), impacting database performance and storage footprint.
+
+### Troubleshoot the alert
+
+- Investigate the bloat size and impacted table
+
+To get a detailed report on bloated tables in your PostgreSQL database, use the [`pgstattuple`](https://www.postgresql.org/docs/current/pgstattuple.html) extension. First, install the extension if it isn't already installed:
+
+ ```
+ CREATE EXTENSION pgstattuple;
+ ```
+
+Then, run the following query to list tables with a high percentage of dead-tuple bloat. Note that `pgstattuple` physically scans each table it inspects, so this can be slow and I/O-intensive on large databases:
+
+    ```sql
+    SELECT t.schemaname,
+           t.tablename,
+           pg_size_pretty(s.dead_tuple_len) AS dead_tuple_size,
+           round(s.dead_tuple_percent::numeric, 2) AS dead_tuple_percent
+    FROM pg_tables t,
+         LATERAL pgstattuple(format('%I.%I', t.schemaname, t.tablename)::regclass) s
+    WHERE t.schemaname NOT IN ('pg_catalog', 'information_schema')
+      AND s.dead_tuple_percent > 10
+    ORDER BY s.dead_tuple_percent DESC;
+    ```
+
+- Reclaim storage space
+
+Reducing the bloat size in PostgreSQL tables involves reclaiming wasted storage space. Here are two approaches:
+
+ 1. **VACUUM**: The `VACUUM` command removes dead rows and makes their space available for reuse within the table. Note that it normally does not shrink the file on disk; `VACUUM FULL` does, but it rewrites the whole table under an exclusive lock. Use the following command to clean up the impacted table:
+
+ ```
+ VACUUM VERBOSE ANALYZE <schema_name>.<table_name>;
+ ```
+
+ 2. **REINDEX**: If the issue persists after using `VACUUM`, consider reindexing the table. The `REINDEX` command rebuilds the table's indexes, which can improve query performance and reduce index bloat. It is more intrusive than a plain `VACUUM` (it blocks writes to the table), so be sure you understand its implications before running it:
+
+ ```
+ REINDEX TABLE <schema_name>.<table_name>;
+ ```
+
+- Monitor the bloat size
+
+Continue monitoring the bloat size in your PostgreSQL tables by regularly checking the `postgres_table_bloat_size_perc` alert on Netdata.
+
+### Useful resources
+
+1. [How to monitor and fix Database bloats in PostgreSQL?](https://blog.netdata.cloud/postgresql-database-bloat/)
diff --git a/health/guides/postgres/postgres_table_cache_io_ratio.md b/health/guides/postgres/postgres_table_cache_io_ratio.md
new file mode 100644
index 000000000..382f8ee4d
--- /dev/null
+++ b/health/guides/postgres/postgres_table_cache_io_ratio.md
@@ -0,0 +1,32 @@
+### Understand the alert
+
+This alert monitors the PostgreSQL table cache hit ratio, which is the percentage of database read requests that can be served from the cache without requiring I/O operations. If you receive this alert, it means your PostgreSQL table cache hit ratio is too low, indicating performance issues with the database.
+
+### What does PostgreSQL table cache hit ratio mean?
+
+The PostgreSQL table cache hit ratio is an important metric for analyzing the performance of a database. A high cache hit ratio means that most read requests are being served from the cache, reducing the need for disk I/O operations and improving overall database performance. On the other hand, a low cache hit ratio indicates that more I/O operations are required, which can lead to performance degradation.
+
+### Troubleshoot the alert
+
+To address the low cache hit ratio issue, follow these steps:
+
+1. Analyze database performance:
+
+Analyze the database performance to identify potential bottlenecks and areas for optimization. You can use PostgreSQL performance monitoring tools such as `pg_top`, `pg_stat_statements`, and `pg_stat_user_tables` to gather information about query execution, table access patterns, and other performance metrics.
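+
+As a starting point, the built-in `pg_statio_user_tables` view shows per-table cache efficiency directly (the `LIMIT` is illustrative):
+
+```sql
+SELECT relname,
+       heap_blks_hit,
+       heap_blks_read,
+       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct
+FROM pg_statio_user_tables
+ORDER BY heap_blks_read DESC
+LIMIT 10;
+```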
+
+2. Optimize queries:
+
+Review and optimize complex or long-running SQL queries that may be causing performance issues. Utilize PostgreSQL features like `EXPLAIN` and `EXPLAIN ANALYZE` to analyze query execution plans and identify optimization opportunities. Indexing and query optimization can reduce I/O requirements and improve cache hit ratios.
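+
+For example, a sketch of inspecting buffer usage for a hypothetical query (`orders` is an illustrative table name; `BUFFERS` reports shared-cache hits versus disk reads per plan node):
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT * FROM orders WHERE created_at > now() - interval '1 day';
+```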
+
+3. Increase shared_buffers:
+
+If you have a dedicated database server with sufficient memory, you can consider increasing the `shared_buffers` in your PostgreSQL configuration. This increases the amount of memory available to the PostgreSQL cache and can help improve cache hit ratios. Before making changes to the configuration, ensure that you analyze the existing memory usage patterns and leave enough free memory for other system processes and caching demands.
+
+4. Monitor cache hit ratios:
+
+Keep monitoring cache hit ratios after making changes to your configuration or optimization efforts. Depending on the results, you may need to adjust further settings, indexes, or queries to optimize database performance.
+
+### Useful resources
+
+1. [PostgreSQL: Resource Consumption settings](https://www.postgresql.org/docs/current/runtime-config-resource.html)
+2. [Performance Monitoring and Tuning in PostgreSQL](https://learn.netdata.cloud/docs/agent/collectors/python.d.plugin/postgres#monitoring)
diff --git a/health/guides/postgres/postgres_table_index_cache_io_ratio.md b/health/guides/postgres/postgres_table_index_cache_io_ratio.md
new file mode 100644
index 000000000..5c5bb2bd8
--- /dev/null
+++ b/health/guides/postgres/postgres_table_index_cache_io_ratio.md
@@ -0,0 +1,45 @@
+### Understand the alert
+
+This alert monitors the PostgreSQL table index cache hit ratio, specifically the average index cache hit ratio over the last minute, for a specific database and table. If you receive this alert, it means that your table index caching is not efficient and might result in slow database performance.
+
+### What does cache hit ratio mean?
+
+Cache hit ratio is the percentage of cache accesses that find the requested item already in the cache (hits) out of all cache accesses (hits plus misses). A higher cache hit ratio means that your database entries are found in the cache more often, reducing the need to access the disk and consequently speeding up database operations.
+
+### Troubleshoot the alert
+
+1. Check cache configuration settings
+
+- `shared_buffers`: This parameter sets the amount of shared memory used for the buffer pool, which is the most common caching mechanism. You can check its current value by running the following query:
+
+ ```
+ SHOW shared_buffers;
+ ```
+
+- `effective_cache_size`: This parameter tells the PostgreSQL query planner how much memory is effectively available for caching data in total, counting both `shared_buffers` and the operating system's page cache. To check its current value, run:
+
+ ```
+ SHOW effective_cache_size;
+ ```
+
+2. Analyze the query workload
+
+- Queries using inefficient indexes or not using indexes properly might contribute to a higher cache miss ratio. To find the most expensive queries, you can run:
+
+    ```sql
+    -- On PostgreSQL 13+ use total_exec_time instead of total_time
+    SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
+    ```
+
+- Check if your database is using proper indexes. You can create a missing index based on your query plan or modify existing indexes to cover more cases.
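+
+  As a rough starting point, rarely used indexes can be spotted with the built-in `pg_stat_user_indexes` view (an `idx_scan` of zero may mean the index is unused, or simply that the workload has not exercised it yet):
+
+  ```sql
+  SELECT schemaname, relname, indexrelname, idx_scan
+  FROM pg_stat_user_indexes
+  ORDER BY idx_scan ASC
+  LIMIT 10;
+  ```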
+
+3. Increase cache size
+
+- If the cache settings are low and disk I/O is high, you might need to increase the cache size. Remember that increasing the cache size may also impact system memory usage, so monitor the changes and adjust the settings accordingly.
+
+4. Optimize storage performance
+
+- Verify that the underlying storage system performs well by monitoring disk latency and throughput rates. If required, consider upgrading the disk subsystem or using faster disks.
+
+### Useful resources
+
+1. [PostgreSQL Performance Tuning Guide](https://www.cybertec-postgresql.com/en/postgresql-performance-tuning/)
diff --git a/health/guides/postgres/postgres_table_last_autoanalyze_time.md b/health/guides/postgres/postgres_table_last_autoanalyze_time.md
new file mode 100644
index 000000000..1a7a3d79b
--- /dev/null
+++ b/health/guides/postgres/postgres_table_last_autoanalyze_time.md
@@ -0,0 +1,41 @@
+### Understand the alert
+
+This alert is triggered when the time elapsed since a PostgreSQL table was last analyzed by the AutoVacuum daemon exceeds one week. AutoVacuum is responsible for recovering storage, optimizing the database, and updating statistics used by the PostgreSQL query planner. If you receive this alert, it indicates that one or more of your PostgreSQL tables have not been analyzed recently, which may impact query performance.
+
+### What is PostgreSQL table autoanalyze?
+
+In PostgreSQL, table autoanalyze is a process carried out by the AutoVacuum daemon. This process analyzes the table contents and gathers statistics for the query planner to help it make better decisions about optimizing your queries. Regular autoanalyze is crucial for maintaining good performance in your PostgreSQL database.
+
+### Troubleshoot the alert
+
+1. Check the current AutoVacuum settings: To verify if AutoVacuum is enabled and configured correctly in your PostgreSQL database, run the following SQL command:
+
+ ```sql
+ SHOW autovacuum;
+ ```
+
+ If it returns `on`, AutoVacuum is enabled. Otherwise, enable AutoVacuum by modifying the `postgresql.conf` file, and set `autovacuum = on`. Then, restart the PostgreSQL service.
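+
+   You can also check when each table was last analyzed, manually or by autovacuum, using the built-in `pg_stat_user_tables` view:
+
+    ```sql
+    SELECT schemaname, relname, last_analyze, last_autoanalyze
+    FROM pg_stat_user_tables
+    ORDER BY last_autoanalyze NULLS FIRST;
+    ```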
+
+2. Analyze the table manually: If AutoVacuum is enabled but the table has not been analyzed recently, you can manually analyze the table by running the following SQL command:
+
+ ```sql
+ ANALYZE [VERBOSE] [schema_name.]table_name;
+ ```
+
+ Replace `[schema_name.]table_name` with the appropriate schema and table name. The optional `VERBOSE` keyword provides detailed information about the analyze process.
+
+3. Investigate any errors during autoanalyze: If AutoVacuum is enabled and running but you still receive this alert, check the PostgreSQL log files for any errors or issues related to the AutoVacuum process. Address any issues discovered in the logs.
+
+4. Monitor AutoVacuum activity: To get an overview of vacuum activity, you can monitor the `pg_stat_progress_vacuum` view (on PostgreSQL 13+, `pg_stat_progress_analyze` reports the progress of ANALYZE specifically). Run the following SQL command to inspect the view:
+
+ ```sql
+ SELECT * FROM pg_stat_progress_vacuum;
+ ```
+
+ Analyze the results to determine if there are any inefficiencies or issues with the AutoVacuum settings.
+
+### Useful resources
+
+1. [PostgreSQL: AutoVacuum](https://www.postgresql.org/docs/current/routine-vacuuming.html)
+2. [PostgreSQL: Analyzing a Table](https://www.postgresql.org/docs/current/sql-analyze.html)
+3. [PostgreSQL: Monitoring AutoVacuum Progress](https://www.postgresql.org/docs/current/progress-reporting.html#VACUUM-PART) \ No newline at end of file
diff --git a/health/guides/postgres/postgres_table_last_autovacuum_time.md b/health/guides/postgres/postgres_table_last_autovacuum_time.md
new file mode 100644
index 000000000..8a79b0d31
--- /dev/null
+++ b/health/guides/postgres/postgres_table_last_autovacuum_time.md
@@ -0,0 +1,50 @@
+### Understand the alert
+
+This alert is related to the PostgreSQL database and checks the time since the last autovacuum operation occurred on a specific table. If you receive this alert, it means that the table has not been vacuumed by the autovacuum daemon for more than a week (7 days).
+
+### What is autovacuum in PostgreSQL?
+
+Autovacuum is a feature in PostgreSQL that automates the maintenance of the database by reclaiming storage, optimizing the performance of the database, and updating statistics. It operates on individual tables and performs the following tasks:
+
+1. Reclaims storage occupied by dead rows and updates the Free Space Map.
+2. Optimizes the performance by updating statistics and executing the `ANALYZE` command.
+3. Updates the visibility map, which reduces the work future vacuums have to do and enables index-only scans.
+
+### Troubleshoot the alert
+
+- Check the autovacuum status
+
+To check if the autovacuum daemon is running for the PostgreSQL instance, run the following SQL command:
+
+ ```
+ SHOW autovacuum;
+ ```
+
+If the result is "off", then the autovacuum is disabled for the PostgreSQL instance. You can enable it by modifying the `postgresql.conf` configuration file and setting `autovacuum = on`.
+
+- Verify table-specific autovacuum settings
+
+Sometimes, autovacuum settings might be altered for individual tables. To check the autovacuum settings for the specific table mentioned in the alert, run the following SQL command:
+
+    ```sql
+    SELECT relname, reloptions
+    FROM pg_class
+    JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
+    WHERE relname = '<table_name>' AND nspname = '<schema_name>';
+    ```
+
+Look for any custom `autovacuum_*` settings in the `reloptions` column and adjust them accordingly to allow the autovacuum daemon to run on the table.
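+
+To see which tables have accumulated dead rows and when autovacuum last processed them, query the built-in `pg_stat_user_tables` view:
+
+    ```
+    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
+    FROM pg_stat_user_tables
+    ORDER BY n_dead_tup DESC
+    LIMIT 10;
+    ```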
+
+- Monitor the PostgreSQL logs
+
+Inspect the PostgreSQL logs for any error messages or unusual behavior related to autovacuum. The log file location depends on your PostgreSQL installation and configuration.
+
+- Manually vacuum the table
+
+If the autovacuum daemon has not run for a long time on the table, you can manually vacuum the table to reclaim storage and update statistics. To perform a manual vacuum, run the following SQL command:
+
+ ```
+ VACUUM (VERBOSE, ANALYZE) <schema_name>.<table_name>;
+ ```
+
+### Useful resources
+
+1. [PostgreSQL: Autovacuum](https://www.postgresql.org/docs/current/runtime-config-autovacuum.html)
+2. [PostgreSQL: Routine Vacuuming](https://www.postgresql.org/docs/current/routine-vacuuming.html)
diff --git a/health/guides/postgres/postgres_table_toast_cache_io_ratio.md b/health/guides/postgres/postgres_table_toast_cache_io_ratio.md
new file mode 100644
index 000000000..c33a2373c
--- /dev/null
+++ b/health/guides/postgres/postgres_table_toast_cache_io_ratio.md
@@ -0,0 +1,39 @@
+### Understand the alert
+
+This alert monitors the TOAST hit ratio (i.e., cached I/O efficiency) of a specific table in a PostgreSQL database. If the hit ratio is low, it indicates that the database is performing more disk I/O operations than needed for the table, which may cause performance issues.
+
+### What is TOAST?
+
+TOAST (The Oversized-Attribute Storage Technique) is a mechanism in PostgreSQL to efficiently store large data items. It allows you to store large values (such as text or binary data) in a separate table, improving the overall performance of the database.
+
+### What does the hit ratio mean?
+
+The hit ratio is the percentage of cache hits (successful reads from the cache) compared to total cache requests (hits + misses). A high hit ratio indicates that the data frequently needed is stored in the cache, resulting in fewer disk I/O operations and better performance.
+
+### Troubleshoot the alert
+
+1. Verify if the alert is accurate by checking the TOAST hit ratio in the affected PostgreSQL system. You can use the following query to retrieve the hit ratio of a specific table:
+
+    ```sql
+    -- toast_blks_* in pg_statio_user_tables track TOAST block reads and cache hits
+    SELECT CASE
+             WHEN coalesce(toast_blks_hit, 0) + coalesce(toast_blks_read, 0) = 0 THEN 0
+             ELSE 100.0 * toast_blks_hit / (toast_blks_hit + toast_blks_read)
+           END AS toast_cache_hit_ratio
+    FROM pg_statio_user_tables
+    WHERE schemaname = 'your_schema' AND relname = 'your_table';
+    ```
+
+ Replace `your_schema` and `your_table` with the appropriate values.
+
+2. Examine the table's indexes, and consider creating new indexes to improve query performance. Be cautious when creating indexes, as too many can negatively impact performance.
+
+3. Analyze the table's read and write patterns to determine if you need to adjust the cache settings, such as increasing the `shared_buffers` configuration value.
+
+4. Inspect the application's queries to see if any can be optimized to improve performance. For example, use EXPLAIN ANALYZE to determine if the queries are using indexes effectively.
+
+5. Monitor overall PostgreSQL performance with tools like pg_stat_statements or pg_stat_activity to identify potential bottlenecks and areas for improvement.
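+
+   For instance, a quick snapshot of active sessions and what they are waiting on:
+
+    ```sql
+    SELECT pid, state, wait_event_type, wait_event,
+           now() - query_start AS runtime, left(query, 60) AS query
+    FROM pg_stat_activity
+    WHERE state <> 'idle'
+    ORDER BY runtime DESC NULLS LAST;
+    ```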
+
+### Useful resources
+
+1. [PostgreSQL TOAST Overview](https://www.postgresql.org/docs/current/storage-toast.html)
+2. [Tuning Your PostgreSQL Server](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
diff --git a/health/guides/postgres/postgres_table_toast_index_cache_io_ratio.md b/health/guides/postgres/postgres_table_toast_index_cache_io_ratio.md
new file mode 100644
index 000000000..6aeb38624
--- /dev/null
+++ b/health/guides/postgres/postgres_table_toast_index_cache_io_ratio.md
@@ -0,0 +1,41 @@
+### Understand the alert
+
+This alert monitors the `PostgreSQL` TOAST index cache hit ratio for a specific table in a database. A low hit ratio indicates a potential performance issue, as it means that a high number of cache misses are occurring. If you receive this alert, it suggests that your system is experiencing higher cache miss rates, which may lead to increased I/O load and reduced query performance.
+
+### What is TOAST?
+
+TOAST (The Oversized-Attribute Storage Technique) is a technique used by PostgreSQL to handle large data values. It allows PostgreSQL to store large records more efficiently by compressing and storing them separately from the main table. The TOAST index cache helps PostgreSQL efficiently access large data values, and a high cache hit ratio is desired for better performance.
+
+### Troubleshoot the alert
+
+- Check the current cache hit ratio
+
+ Run the following query in the PostgreSQL prompt to see the current hit ratio:
+
+    ```sql
+    -- tidx_blks_* in pg_statio_all_tables track TOAST index block reads and cache hits
+    SELECT schemaname, relname, tidx_blks_hit, tidx_blks_read,
+           round(100.0 * tidx_blks_hit / NULLIF(tidx_blks_hit + tidx_blks_read, 0), 2) AS hit_ratio
+    FROM pg_statio_all_tables
+    WHERE relname = '${label:table}';
+    ```
+
+- Investigate the workload on the database
+
+ Inspect the queries running on the database to determine if any specific queries are causing excessive cache misses. Use [`pg_stat_statements`](https://www.postgresql.org/docs/current/pgstatstatements.html) module to gather information on query performance.
+
+- Increase `work_mem` configuration value
+
+   If the issue persists, consider increasing the `work_mem` value in the PostgreSQL configuration file (`postgresql.conf`). This parameter sets the memory available to each sort and hash operation; raising it can reduce temporary-file I/O, though note that the buffer cache itself is sized by `shared_buffers`.
+
+   Unlike `shared_buffers`, changing `work_mem` does not require a restart: reloading the configuration (for example with `SELECT pg_reload_conf();`) is enough.
+
+- Optimize table structure
+
+ Assess if the table design can be optimized to reduce the number of large data values or if additional indexes can be created to improve cache hit ratio.
+
+- Monitor the effect of increased cache miss ratios
+
+ Keep an eye on overall database performance metrics, such as query execution times and I/O load, to determine the impact of increased cache miss ratios on database performance.
+
+### Useful resources
+
+1. [PostgreSQL: The TOAST Technique](https://www.postgresql.org/docs/current/storage-toast.html)
diff --git a/health/guides/postgres/postgres_total_connection_utilization.md b/health/guides/postgres/postgres_total_connection_utilization.md
new file mode 100644
index 000000000..266f4cbd0
--- /dev/null
+++ b/health/guides/postgres/postgres_total_connection_utilization.md
@@ -0,0 +1,45 @@
+### Understand the alert
+
+This alert monitors the total `connection utilization` of a PostgreSQL database. If you receive this alert, it means that your `PostgreSQL` database is experiencing a high demand for connections. This can lead to performance degradation and, in extreme cases, could potentially prevent new connections from being established.
+
+### What does connection utilization mean?
+
+`Connection utilization` refers to the percentage of `database connections` currently in use compared to the maximum number of connections allowed by the PostgreSQL server. A high connection utilization implies that the server is handling a large number of concurrent connections, and its resources may be strained, leading to decreased performance.
+
+### Troubleshoot the alert
+
+1. Check the current connections to the PostgreSQL database:
+
+ You can use the following SQL query to check the number of active connections for each database:
+
+ ```
+ SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;
+ ```
+
+ or use the following command to check the total connections to all databases:
+
+ ```
+ SELECT count(*) FROM pg_stat_activity;
+ ```
+
+2. Identify the source of increased connections:
+
+ To find out which user or application is responsible for the high connection count, you can use the following SQL query:
+
+ ```
+ SELECT usename, application_name, count(*) FROM pg_stat_activity GROUP BY usename, application_name;
+ ```
+
+ This query shows the number of connections per user and application, which can help you identify the source of the increased connection demand.
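+
+    It can also help to break connections down by state; a large number of `idle` connections often points at an application that is not releasing connections back to its pool:
+
+    ```sql
+    SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;
+    ```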
+
+3. Optimize connection pooling:
+
+   If you are using a connection pooler such as `PgBouncer` (or pooling built into your application server), consider adjusting the pool settings to better manage the available connections. This can help mitigate high connection utilization.
+
+4. Increase the maximum connections limit:
+
+ If your server has the necessary resources, you may consider increasing the maximum number of connections allowed by the PostgreSQL server. To do this, modify the `max_connections` configuration parameter in the `postgresql.conf` file and then restart the PostgreSQL service.
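+
+   A minimal sketch (the value `200` is illustrative; `max_connections` only takes effect after a restart, and each connection consumes memory):
+
+    ```sql
+    SHOW max_connections;
+    ALTER SYSTEM SET max_connections = 200;  -- written to postgresql.auto.conf
+    -- then restart PostgreSQL, e.g. sudo systemctl restart postgresql
+    ```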
+
+### Useful resources
+
+1. [PostgreSQL: max_connections](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS)
diff --git a/health/guides/postgres/postgres_txid_exhaustion_perc.md b/health/guides/postgres/postgres_txid_exhaustion_perc.md
new file mode 100644
index 000000000..9c2284956
--- /dev/null
+++ b/health/guides/postgres/postgres_txid_exhaustion_perc.md
@@ -0,0 +1,33 @@
+### Understand the alert
+
+This alert monitors the percentage of transaction ID (TXID) exhaustion in a PostgreSQL database, specifically the rate at which the system is approaching a `TXID wraparound`. If the alert is triggered, it means that your PostgreSQL database is more than 90% towards exhausting its available transaction IDs, and you should take action to prevent transaction ID wraparound.
+
+### What is TXID wraparound?
+
+In PostgreSQL, transaction IDs are 32-bit integers, and a new one is assigned to each transaction. Because the counter is finite, it eventually wraps back around to the beginning and previous transaction IDs are reused. Unless old rows are "frozen" beforehand (normally done by autovacuum), wraparound would make them appear to be in the future and thus invisible, so PostgreSQL stops accepting new transactions when it gets dangerously close to this point.
+
+### Troubleshoot the alert
+
+1. Check how close each database is to the wraparound limit. Connect to your PostgreSQL database and run the following SQL query:
+
+    ```sql
+    SELECT datname,
+           age(datfrozenxid) AS xid_age,
+           current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age,
+           round(100.0 * age(datfrozenxid) / current_setting('autovacuum_freeze_max_age')::int, 1) AS pct_towards_forced_autovacuum
+    FROM pg_database
+    ORDER BY xid_age DESC;
+    ```
+
+2. Vacuum the database to prevent transaction ID wraparound. Run the following command:
+
+ ```
+ vacuumdb --all --freeze
+ ```
+
+   With the `--freeze` option, `vacuumdb` aggressively freezes old rows in every database, advancing `datfrozenxid` and pushing the transaction ID wraparound point back, in addition to the usual storage reclamation.
+
+3. Configure autovacuum settings for long-term prevention. Adjust `autovacuum_freeze_max_age`, `autovacuum_vacuum_scale_factor`, `autovacuum_analyze_scale_factor`, `vacuum_cost_limit`, and `maintenance_work_mem` in the PostgreSQL configuration file `postgresql.conf`. Then reload or restart the PostgreSQL service for the changes to take effect:
+
+ ```
+ service postgresql restart
+ ```
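+
+   A hedged example of related settings in `postgresql.conf` (all values are illustrative and must be tuned to your workload and hardware):
+
+    ```
+    autovacuum_freeze_max_age = 200000000   # the default; lower it to freeze more eagerly
+    maintenance_work_mem = 1GB              # more memory speeds up vacuum
+    vacuum_cost_limit = 2000                # lets (auto)vacuum do more work per cost cycle
+    ```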
+
+### Useful resources
+
+1. [Preventing Transaction ID Wraparound Failures](https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND)