1 files changed, 417 insertions, 0 deletions
diff --git a/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md b/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md
new file mode 100644
index 000000000..4f2a91101
--- /dev/null
+++ b/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md
@@ -0,0 +1,417 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/postgres/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/postgres/metadata.yaml"
+sidebar_label: "PostgreSQL"
+learn_status: "Published"
+learn_rel_path: "Collecting Metrics/Databases"
+most_popular: True
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# PostgreSQL
+
+
+<img src="https://netdata.cloud/img/postgres.svg" width="150"/>
+
+
+Plugin: go.d.plugin
+Module: postgres
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors the activity and performance of Postgres servers, collects replication statistics, metrics for each database, table and index, and more.
+
+
+It establishes a connection to the Postgres instance via a TCP or UNIX socket.
+To collect metrics for database tables and indexes, it establishes an additional connection for each discovered database.
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+By default, it detects instances running on localhost by trying to connect as root and netdata using known PostgreSQL TCP and UNIX sockets:
+
+- 127.0.0.1:5432
+- /var/run/postgresql/
+
+
+#### Limits
+
+Table and index metrics are not collected for databases with more than 50 tables or 250 indexes.
+These limits can be changed in the configuration file.
+
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per PostgreSQL instance
+
+These metrics refer to the entire monitored application.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.connections_utilization | used | percentage |
+| postgres.connections_usage | available, used | connections |
+| postgres.connections_state_count | active, idle, idle_in_transaction, idle_in_transaction_aborted, disabled | connections |
+| postgres.transactions_duration | a dimension per bucket | transactions/s |
+| postgres.queries_duration | a dimension per bucket | queries/s |
+| postgres.locks_utilization | used | percentage |
+| postgres.checkpoints_rate | scheduled, requested | checkpoints/s |
+| postgres.checkpoints_time | write, sync | milliseconds |
+| postgres.bgwriter_halts_rate | maxwritten | events/s |
+| postgres.buffers_io_rate | checkpoint, backend, bgwriter | B/s |
+| postgres.buffers_backend_fsync_rate | fsync | calls/s |
+| postgres.buffers_allocated_rate | allocated | B/s |
+| postgres.wal_io_rate | write | B/s |
+| postgres.wal_files_count | written, recycled | files |
+| postgres.wal_archiving_files_count | ready, done | files/s |
+| postgres.autovacuum_workers_count | analyze, vacuum_analyze, vacuum, vacuum_freeze, brin_summarize | workers |
+| postgres.txid_exhaustion_towards_autovacuum_perc | emergency_autovacuum | percentage |
+| postgres.txid_exhaustion_perc | txid_exhaustion | percentage |
+| postgres.txid_exhaustion_oldest_txid_num | xid | xid |
+| postgres.catalog_relations_count | ordinary_table, index, sequence, toast_table, view, materialized_view, composite_type, foreign_table, partitioned_table, partitioned_index | relations |
+| postgres.catalog_relations_size | ordinary_table, index, sequence, toast_table, view, materialized_view, composite_type, foreign_table, partitioned_table, partitioned_index | B |
+| postgres.uptime | uptime | seconds |
+| postgres.databases_count | databases | databases |
+
+### Per repl application
+
+These metrics refer to the replication application.
+
+Labels:
+
+| Label      | Description     |
+|:-----------|:----------------|
+| application | application name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.replication_app_wal_lag_size | sent_lag, write_lag, flush_lag, replay_lag | B |
+| postgres.replication_app_wal_lag_time | write_lag, flush_lag, replay_lag | seconds |
+
+### Per repl slot
+
+These metrics refer to the replication slot.
+
+Labels:
+
+| Label      | Description     |
+|:-----------|:----------------|
+| slot | replication slot name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.replication_slot_files_count | wal_keep, pg_replslot_files | files |
+
+### Per database
+
+These metrics refer to the database.
+
+Labels:
+
+| Label      | Description     |
+|:-----------|:----------------|
+| database | database name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.db_transactions_ratio | committed, rollback | percentage |
+| postgres.db_transactions_rate | committed, rollback | transactions/s |
+| postgres.db_connections_utilization | used | percentage |
+| postgres.db_connections_count | connections | connections |
+| postgres.db_cache_io_ratio | miss | percentage |
+| postgres.db_io_rate | memory, disk | B/s |
+| postgres.db_ops_fetched_rows_ratio | fetched | percentage |
+| postgres.db_ops_read_rows_rate | returned, fetched | rows/s |
+| postgres.db_ops_write_rows_rate | inserted, deleted, updated | rows/s |
+| postgres.db_conflicts_rate | conflicts | queries/s |
+| postgres.db_conflicts_reason_rate | tablespace, lock, snapshot, bufferpin, deadlock | queries/s |
+| postgres.db_deadlocks_rate | deadlocks | deadlocks/s |
+| postgres.db_locks_held_count | access_share, row_share, row_exclusive, share_update, share, share_row_exclusive, exclusive, access_exclusive | locks |
+| postgres.db_locks_awaited_count | access_share, row_share, row_exclusive, share_update, share, share_row_exclusive, exclusive, access_exclusive | locks |
+| postgres.db_temp_files_created_rate | created | files/s |
+| postgres.db_temp_files_io_rate | written | B/s |
+| postgres.db_size | size | B |
+
+### Per table
+
+These metrics refer to the database table.
+
+Labels:
+
+| Label      | Description     |
+|:-----------|:----------------|
+| database | database name |
+| schema | schema name |
+| table | table name |
+| parent_table | parent table name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.table_rows_dead_ratio | dead | percentage |
+| postgres.table_rows_count | live, dead | rows |
+| postgres.table_ops_rows_rate | inserted, deleted, updated | rows/s |
+| postgres.table_ops_rows_hot_ratio | hot | percentage |
+| postgres.table_ops_rows_hot_rate | hot | rows/s |
+| postgres.table_cache_io_ratio | miss | percentage |
+| postgres.table_io_rate | memory, disk | B/s |
+| postgres.table_index_cache_io_ratio | miss | percentage |
+| postgres.table_index_io_rate | memory, disk | B/s |
+| postgres.table_toast_cache_io_ratio | miss | percentage |
+| postgres.table_toast_io_rate | memory, disk | B/s |
+| postgres.table_toast_index_cache_io_ratio | miss | percentage |
+| postgres.table_toast_index_io_rate | memory, disk | B/s |
+| postgres.table_scans_rate | index, sequential | scans/s |
+| postgres.table_scans_rows_rate | index, sequential | rows/s |
+| postgres.table_autovacuum_since_time | time | seconds |
+| postgres.table_vacuum_since_time | time | seconds |
+| postgres.table_autoanalyze_since_time | time | seconds |
+| postgres.table_analyze_since_time | time | seconds |
+| postgres.table_null_columns | null | columns |
+| postgres.table_size | size | B |
+| postgres.table_bloat_size_perc | bloat | percentage |
+| postgres.table_bloat_size | bloat | B |
+
+### Per index
+
+These metrics refer to the table index.
+
+Labels:
+
+| Label      | Description     |
+|:-----------|:----------------|
+| database | database name |
+| schema | schema name |
+| table | table name |
+| parent_table | parent table name |
+| index | index name |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| postgres.index_size | size | B |
+| postgres.index_bloat_size_perc | bloat | percentage |
+| postgres.index_bloat_size | bloat | B |
+| postgres.index_usage_status | used, unused | status |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name  | On metric | Description |
+|:------------|:----------|:------------|
+| [ postgres_total_connection_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.connections_utilization | average total connection utilization over the last minute |
+| [ postgres_acquired_locks_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.locks_utilization | average acquired locks utilization over the last minute |
+| [ postgres_txid_exhaustion_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.txid_exhaustion_perc | percent towards TXID wraparound |
+| [ postgres_db_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_cache_io_ratio | average cache hit ratio in db ${label:database} over the last minute |
+| [ postgres_db_transactions_rollback_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_cache_io_ratio | average aborted transactions percentage in db ${label:database} over the last five minutes |
+| [ postgres_db_deadlocks_rate ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_deadlocks_rate | number of deadlocks detected in db ${label:database} in the last minute |
+| [ postgres_table_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_cache_io_ratio | average cache hit ratio in db ${label:database} table ${label:table} over the last minute |
+| [ postgres_table_index_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_index_cache_io_ratio | average index cache hit ratio in db ${label:database} table ${label:table} over the last minute |
+| [ postgres_table_toast_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_toast_cache_io_ratio | average TOAST hit ratio in db ${label:database} table ${label:table} over the last minute |
+| [ postgres_table_toast_index_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_toast_index_cache_io_ratio | average index TOAST hit ratio in db ${label:database} table ${label:table} over the last minute |
+| [ postgres_table_bloat_size_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_bloat_size_perc | bloat size percentage in db ${label:database} table ${label:table} |
+| [ postgres_table_last_autovacuum_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_autovacuum_since_time | time elapsed since db ${label:database} table ${label:table} was vacuumed by the autovacuum daemon |
+| [ postgres_table_last_autoanalyze_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_autoanalyze_since_time | time elapsed since db ${label:database} table ${label:table} was analyzed by the autovacuum daemon |
+| [ postgres_index_bloat_size_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.index_bloat_size_perc | bloat size percentage in db ${label:database} table ${label:table} index ${label:index} |
+
+
+## Setup
+
+### Prerequisites
+
+#### Create netdata user
+
+Create a user with granted `pg_monitor`
+or `pg_read_all_stat` [built-in role](https://www.postgresql.org/docs/current/predefined-roles.html).
+
+To create the `netdata` user with these permissions, execute the following in the psql session, as a user with CREATEROLE privileges:
+
+```postgresql
+CREATE USER netdata;
+GRANT pg_monitor TO netdata;
+```
+
+After creating the new user, restart the Netdata agent with `sudo systemctl restart netdata`, or
+the [appropriate method](/docs/netdata-agent/start-stop-restart.md) for your
+system.
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `go.d/postgres.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config go.d/postgres.conf
+```
+#### Options
+
+The following options can be defined globally: update_every, autodetection_retry.
+
+
+<details open><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Data collection frequency. | 5 | no |
+| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
+| dsn | Postgres server DSN (Data Source Name). See [DSN syntax](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING). | postgres://postgres:postgres@127.0.0.1:5432/postgres | yes |
+| timeout | Query timeout in seconds. | 2 | no |
+| collect_databases_matching | Databases selector. Determines which database metrics will be collected. Syntax is [simple patterns](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/pkg/matcher#simple-patterns-matcher). |  | no |
+| max_db_tables | Maximum number of tables in the database. Table metrics will not be collected for databases that have more tables than max_db_tables. 0 means no limit. | 50 | no |
+| max_db_indexes | Maximum number of indexes in the database. Index metrics will not be collected for databases that have more indexes than max_db_indexes. 0 means no limit. | 250 | no |
+
+</details>
+
+#### Examples
+
+##### TCP socket
+
+An example configuration.
+
+```yaml
+jobs:
+  - name: local
+    dsn: 'postgresql://netdata@127.0.0.1:5432/postgres'
+
+```
+##### Unix socket
+
+An example configuration.
+
+<details open><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    dsn: 'host=/var/run/postgresql dbname=postgres user=netdata'
+
+```
+</details>
+
+##### Multi-instance
+
+> **Note**: When you define multiple jobs, their names must be unique.
+
+Local and remote instances.
+
+
+<details open><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    dsn: 'postgresql://netdata@127.0.0.1:5432/postgres'
+
+  - name: remote
+    dsn: 'postgresql://netdata@203.0.113.0:5432/postgres'
+
+```
+</details>
+
+
+
+## Troubleshooting
+
+### Debug Mode
+
+**Important**: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.
+
+To troubleshoot issues with the `postgres` collector, run the `go.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
+  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
+
+  ```bash
+  cd /usr/libexec/netdata/plugins.d/
+  ```
+
+- Switch to the `netdata` user.
+
+  ```bash
+  sudo -u netdata -s
+  ```
+
+- Run the `go.d.plugin` to debug the collector:
+
+  ```bash
+  ./go.d.plugin -d -m postgres
+  ```
+
+### Getting Logs
+
+If you're encountering problems with the `postgres` collector, follow these steps to retrieve logs and identify potential issues:
+
+- **Run the command** specific to your system (systemd, non-systemd, or Docker container).
+- **Examine the output** for any warnings or error messages that might indicate issues.  These messages should provide clues about the root cause of the problem.
+
+#### System with systemd
+
+Use the following command to view logs generated since the last Netdata service restart:
+
+```bash
+journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep postgres
+```
+
+#### System without systemd
+
+Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name:
+
+```bash
+grep postgres /var/log/netdata/collector.log
+```
+
+**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues.
+
+#### Docker Container
+
+If your Netdata runs in a Docker container named "netdata" (replace if different), use this command:
+
+```bash
+docker logs netdata 2>&1 | grep postgres
+```
+
+