diff options
Diffstat (limited to 'src/go/plugin/go.d/modules/postgres/integrations/postgresql.md')
-rw-r--r-- | src/go/plugin/go.d/modules/postgres/integrations/postgresql.md | 417 |
1 files changed, 417 insertions, 0 deletions
diff --git a/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md b/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md new file mode 100644 index 000000000..4f2a91101 --- /dev/null +++ b/src/go/plugin/go.d/modules/postgres/integrations/postgresql.md @@ -0,0 +1,417 @@ +<!--startmeta +custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/postgres/README.md" +meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/plugin/go.d/modules/postgres/metadata.yaml" +sidebar_label: "PostgreSQL" +learn_status: "Published" +learn_rel_path: "Collecting Metrics/Databases" +most_popular: True +message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE" +endmeta--> + +# PostgreSQL + + +<img src="https://netdata.cloud/img/postgres.svg" width="150"/> + + +Plugin: go.d.plugin +Module: postgres + +<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" /> + +## Overview + +This collector monitors the activity and performance of Postgres servers, collects replication statistics, metrics for each database, table and index, and more. + + +It establishes a connection to the Postgres instance via a TCP or UNIX socket. +To collect metrics for database tables and indexes, it establishes an additional connection for each discovered database. + + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +By default, it detects instances running on localhost by trying to connect as root and netdata using known PostgreSQL TCP and UNIX sockets: + +- 127.0.0.1:5432 +- /var/run/postgresql/ + + +#### Limits + +Table and index metrics are not collected for databases with more than 50 tables or 250 indexes. +These limits can be changed in the configuration file. + + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per PostgreSQL instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.connections_utilization | used | percentage | +| postgres.connections_usage | available, used | connections | +| postgres.connections_state_count | active, idle, idle_in_transaction, idle_in_transaction_aborted, disabled | connections | +| postgres.transactions_duration | a dimension per bucket | transactions/s | +| postgres.queries_duration | a dimension per bucket | queries/s | +| postgres.locks_utilization | used | percentage | +| postgres.checkpoints_rate | scheduled, requested | checkpoints/s | +| postgres.checkpoints_time | write, sync | milliseconds | +| postgres.bgwriter_halts_rate | maxwritten | events/s | +| postgres.buffers_io_rate | checkpoint, backend, bgwriter | B/s | +| postgres.buffers_backend_fsync_rate | fsync | calls/s | +| postgres.buffers_allocated_rate | allocated | B/s | +| postgres.wal_io_rate | write | B/s | +| postgres.wal_files_count | written, recycled | files | +| postgres.wal_archiving_files_count | ready, done | files/s | +| postgres.autovacuum_workers_count | analyze, vacuum_analyze, vacuum, vacuum_freeze, brin_summarize | workers | +| postgres.txid_exhaustion_towards_autovacuum_perc | emergency_autovacuum | percentage | +| postgres.txid_exhaustion_perc | txid_exhaustion | percentage | +| postgres.txid_exhaustion_oldest_txid_num | xid | xid | +| postgres.catalog_relations_count | ordinary_table, index, sequence, toast_table, view, materialized_view, composite_type, foreign_table, partitioned_table, partitioned_index | relations | +| postgres.catalog_relations_size | ordinary_table, index, sequence, toast_table, view, materialized_view, composite_type, foreign_table, partitioned_table, partitioned_index | B | +| postgres.uptime | uptime | seconds | +| postgres.databases_count | databases | databases | + +### Per repl application + +These metrics refer to the replication application. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| application | application name | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.replication_app_wal_lag_size | sent_lag, write_lag, flush_lag, replay_lag | B | +| postgres.replication_app_wal_lag_time | write_lag, flush_lag, replay_lag | seconds | + +### Per repl slot + +These metrics refer to the replication slot. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| slot | replication slot name | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.replication_slot_files_count | wal_keep, pg_replslot_files | files | + +### Per database + +These metrics refer to the database. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| database | database name | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.db_transactions_ratio | committed, rollback | percentage | +| postgres.db_transactions_rate | committed, rollback | transactions/s | +| postgres.db_connections_utilization | used | percentage | +| postgres.db_connections_count | connections | connections | +| postgres.db_cache_io_ratio | miss | percentage | +| postgres.db_io_rate | memory, disk | B/s | +| postgres.db_ops_fetched_rows_ratio | fetched | percentage | +| postgres.db_ops_read_rows_rate | returned, fetched | rows/s | +| postgres.db_ops_write_rows_rate | inserted, deleted, updated | rows/s | +| postgres.db_conflicts_rate | conflicts | queries/s | +| postgres.db_conflicts_reason_rate | tablespace, lock, snapshot, bufferpin, deadlock | queries/s | +| postgres.db_deadlocks_rate | deadlocks | deadlocks/s | +| postgres.db_locks_held_count | access_share, row_share, row_exclusive, share_update, share, share_row_exclusive, exclusive, access_exclusive | locks | +| postgres.db_locks_awaited_count | access_share, row_share, row_exclusive, share_update, share, share_row_exclusive, exclusive, access_exclusive | locks | +| postgres.db_temp_files_created_rate | created | files/s | +| postgres.db_temp_files_io_rate | written | B/s | +| postgres.db_size | size | B | + +### Per table + +These metrics refer to the database table. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| database | database name | +| schema | schema name | +| table | table name | +| parent_table | parent table name | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.table_rows_dead_ratio | dead | percentage | +| postgres.table_rows_count | live, dead | rows | +| postgres.table_ops_rows_rate | inserted, deleted, updated | rows/s | +| postgres.table_ops_rows_hot_ratio | hot | percentage | +| postgres.table_ops_rows_hot_rate | hot | rows/s | +| postgres.table_cache_io_ratio | miss | percentage | +| postgres.table_io_rate | memory, disk | B/s | +| postgres.table_index_cache_io_ratio | miss | percentage | +| postgres.table_index_io_rate | memory, disk | B/s | +| postgres.table_toast_cache_io_ratio | miss | percentage | +| postgres.table_toast_io_rate | memory, disk | B/s | +| postgres.table_toast_index_cache_io_ratio | miss | percentage | +| postgres.table_toast_index_io_rate | memory, disk | B/s | +| postgres.table_scans_rate | index, sequential | scans/s | +| postgres.table_scans_rows_rate | index, sequential | rows/s | +| postgres.table_autovacuum_since_time | time | seconds | +| postgres.table_vacuum_since_time | time | seconds | +| postgres.table_autoanalyze_since_time | time | seconds | +| postgres.table_analyze_since_time | time | seconds | +| postgres.table_null_columns | null | columns | +| postgres.table_size | size | B | +| postgres.table_bloat_size_perc | bloat | percentage | +| postgres.table_bloat_size | bloat | B | + +### Per index + +These metrics refer to the table index. + +Labels: + +| Label | Description | +|:-----------|:----------------| +| database | database name | +| schema | schema name | +| table | table name | +| parent_table | parent table name | +| index | index name | + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| postgres.index_size | size | B | +| postgres.index_bloat_size_perc | bloat | percentage | +| postgres.index_bloat_size | bloat | B | +| postgres.index_usage_status | used, unused | status | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ postgres_total_connection_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.connections_utilization | average total connection utilization over the last minute | +| [ postgres_acquired_locks_utilization ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.locks_utilization | average acquired locks utilization over the last minute | +| [ postgres_txid_exhaustion_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.txid_exhaustion_perc | percent towards TXID wraparound | +| [ postgres_db_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_cache_io_ratio | average cache hit ratio in db ${label:database} over the last minute | +| [ postgres_db_transactions_rollback_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_cache_io_ratio | average aborted transactions percentage in db ${label:database} over the last five minutes | +| [ postgres_db_deadlocks_rate ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.db_deadlocks_rate | number of deadlocks detected in db ${label:database} in the last minute | +| [ postgres_table_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_cache_io_ratio | average cache hit ratio in db ${label:database} table ${label:table} over the last minute | +| [ postgres_table_index_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_index_cache_io_ratio | average index cache hit ratio in db ${label:database} table ${label:table} over the last minute | +| [ postgres_table_toast_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_toast_cache_io_ratio | average TOAST hit ratio in db ${label:database} table ${label:table} over the last minute | +| [ postgres_table_toast_index_cache_io_ratio ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_toast_index_cache_io_ratio | average index TOAST hit ratio in db ${label:database} table ${label:table} over the last minute | +| [ postgres_table_bloat_size_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_bloat_size_perc | bloat size percentage in db ${label:database} table ${label:table} | +| [ postgres_table_last_autovacuum_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_autovacuum_since_time | time elapsed since db ${label:database} table ${label:table} was vacuumed by the autovacuum daemon | +| [ postgres_table_last_autoanalyze_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.table_autoanalyze_since_time | time elapsed since db ${label:database} table ${label:table} was analyzed by the autovacuum daemon | +| [ postgres_index_bloat_size_perc ](https://github.com/netdata/netdata/blob/master/src/health/health.d/postgres.conf) | postgres.index_bloat_size_perc | bloat size percentage in db ${label:database} table ${label:table} index ${label:index} | + + +## Setup + +### Prerequisites + +#### Create netdata user + +Create a user with granted `pg_monitor` +or `pg_read_all_stat` [built-in role](https://www.postgresql.org/docs/current/predefined-roles.html). + +To create the `netdata` user with these permissions, execute the following in the psql session, as a user with CREATEROLE privileges: + +```postgresql +CREATE USER netdata; +GRANT pg_monitor TO netdata; +``` + +After creating the new user, restart the Netdata agent with `sudo systemctl restart netdata`, or +the [appropriate method](/docs/netdata-agent/start-stop-restart.md) for your +system. + + + +### Configuration + +#### File + +The configuration file name for this integration is `go.d/postgres.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config go.d/postgres.conf +``` +#### Options + +The following options can be defined globally: update_every, autodetection_retry. + + +<details open><summary>Config options</summary> + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Data collection frequency. | 5 | no | +| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no | +| dsn | Postgres server DSN (Data Source Name). See [DSN syntax](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING). | postgres://postgres:postgres@127.0.0.1:5432/postgres | yes | +| timeout | Query timeout in seconds. | 2 | no | +| collect_databases_matching | Databases selector. Determines which database metrics will be collected. Syntax is [simple patterns](https://github.com/netdata/netdata/tree/master/src/go/plugin/go.d/pkg/matcher#simple-patterns-matcher). | | no | +| max_db_tables | Maximum number of tables in the database. Table metrics will not be collected for databases that have more tables than max_db_tables. 0 means no limit. | 50 | no | +| max_db_indexes | Maximum number of indexes in the database. Index metrics will not be collected for databases that have more indexes than max_db_indexes. 0 means no limit. | 250 | no | + +</details> + +#### Examples + +##### TCP socket + +An example configuration. + +```yaml +jobs: + - name: local + dsn: 'postgresql://netdata@127.0.0.1:5432/postgres' + +``` +##### Unix socket + +An example configuration. + +<details open><summary>Config</summary> + +```yaml +jobs: + - name: local + dsn: 'host=/var/run/postgresql dbname=postgres user=netdata' + +``` +</details> + +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Local and remote instances. + + +<details open><summary>Config</summary> + +```yaml +jobs: + - name: local + dsn: 'postgresql://netdata@127.0.0.1:5432/postgres' + + - name: remote + dsn: 'postgresql://netdata@203.0.113.0:5432/postgres' + +``` +</details> + + + +## Troubleshooting + +### Debug Mode + +**Important**: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature. + +To troubleshoot issues with the `postgres` collector, run the `go.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `go.d.plugin` to debug the collector: + + ```bash + ./go.d.plugin -d -m postgres + ``` + +### Getting Logs + +If you're encountering problems with the `postgres` collector, follow these steps to retrieve logs and identify potential issues: + +- **Run the command** specific to your system (systemd, non-systemd, or Docker container). +- **Examine the output** for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem. + +#### System with systemd + +Use the following command to view logs generated since the last Netdata service restart: + +```bash +journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep postgres +``` + +#### System without systemd + +Locate the collector log file, typically at `/var/log/netdata/collector.log`, and use `grep` to filter for collector's name: + +```bash +grep postgres /var/log/netdata/collector.log +``` + +**Note**: This method shows logs from all restarts. Focus on the **latest entries** for troubleshooting current issues. + +#### Docker Container + +If your Netdata runs in a Docker container named "netdata" (replace if different), use this command: + +```bash +docker logs netdata 2>&1 | grep postgres +``` + + |