Diffstat (limited to '')
-rw-r--r--  src/go/collectors/go.d.plugin/modules/consul/integrations/consul.md  324
1 files changed, 324 insertions, 0 deletions
diff --git a/src/go/collectors/go.d.plugin/modules/consul/integrations/consul.md b/src/go/collectors/go.d.plugin/modules/consul/integrations/consul.md
new file mode 100644
index 000000000..c6ac92dc2
--- /dev/null
+++ b/src/go/collectors/go.d.plugin/modules/consul/integrations/consul.md
@@ -0,0 +1,324 @@
+<!--startmeta
+custom_edit_url: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/consul/README.md"
+meta_yaml: "https://github.com/netdata/netdata/edit/master/src/go/collectors/go.d.plugin/modules/consul/metadata.yaml"
+sidebar_label: "Consul"
+learn_status: "Published"
+learn_rel_path: "Collecting Metrics/Service Discovery / Registry"
+most_popular: True
+message: "DO NOT EDIT THIS FILE DIRECTLY, IT IS GENERATED BY THE COLLECTOR'S metadata.yaml FILE"
+endmeta-->
+
+# Consul
+
+
+<img src="https://netdata.cloud/img/consul.svg" width="150"/>
+
+
+Plugin: go.d.plugin
+Module: consul
+
+<img src="https://img.shields.io/badge/maintained%20by-Netdata-%2300ab44" />
+
+## Overview
+
+This collector monitors [key metrics](https://developer.hashicorp.com/consul/docs/agent/telemetry#key-metrics) of Consul Agents: transaction timings, leadership changes, memory usage and more.
+
+
+It periodically sends HTTP requests to the [Consul REST API](https://developer.hashicorp.com/consul/api-docs).
+
+Used endpoints:
+
+- [/operator/autopilot/health](https://developer.hashicorp.com/consul/api-docs/operator/autopilot#read-health)
+- [/agent/checks](https://developer.hashicorp.com/consul/api-docs/agent/check#list-checks)
+- [/agent/self](https://developer.hashicorp.com/consul/api-docs/agent#read-configuration)
+- [/agent/metrics](https://developer.hashicorp.com/consul/api-docs/agent#view-metrics)
+- [/coordinate/nodes](https://developer.hashicorp.com/consul/api-docs/coordinate#read-lan-coordinates-for-all-nodes)
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+This collector discovers instances running on the local host that provide metrics on port 8500.
+
+On startup, it tries to collect metrics from:
+
+- http://localhost:8500
+- http://127.0.0.1:8500
+
+
+#### Limits
+
+The default configuration for this integration does not impose any limits on data collection.
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics are grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+The set of metrics depends on the [Consul Agent mode](https://developer.hashicorp.com/consul/docs/install/glossary#agent).
+
+
+### Per Consul instance
+
+These metrics refer to the entire monitored application.
+
+This scope has no labels.
+
+Metrics:
+
+| Metric | Dimensions | Unit | Leader | Follower | Client |
+|:------|:----------|:----|:---:|:---:|:---:|
+| consul.client_rpc_requests_rate | rpc | requests/s | • | • | • |
+| consul.client_rpc_requests_exceeded_rate | exceeded | requests/s | • | • | • |
+| consul.client_rpc_requests_failed_rate | failed | requests/s | • | • | • |
+| consul.memory_allocated | allocated | bytes | • | • | • |
+| consul.memory_sys | sys | bytes | • | • | • |
+| consul.gc_pause_time | gc_pause | seconds | • | • | • |
+| consul.kvs_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
+| consul.kvs_apply_operations_rate | kvs_apply | ops/s | • | • | |
+| consul.txn_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
+| consul.txn_apply_operations_rate | txn_apply | ops/s | • | • | |
+| consul.autopilot_health_status | healthy, unhealthy | status | • | • | |
+| consul.autopilot_failure_tolerance | failure_tolerance | servers | • | • | |
+| consul.autopilot_server_health_status | healthy, unhealthy | status | • | • | |
+| consul.autopilot_server_stable_time | stable | seconds | • | • | |
+| consul.autopilot_server_serf_status | active, failed, left, none | status | • | • | |
+| consul.autopilot_server_voter_status | voter, not_voter | status | • | • | |
+| consul.network_lan_rtt | min, max, avg | ms | • | • | |
+| consul.raft_commit_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
+| consul.raft_commits_rate | commits | commits/s | • | | |
+| consul.raft_leader_last_contact_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | | |
+| consul.raft_leader_oldest_log_age | oldest_log_age | seconds | • | | |
+| consul.raft_follower_last_contact_leader_time | leader_last_contact | ms | | • | |
+| consul.raft_rpc_install_snapshot_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | | • | |
+| consul.raft_leader_elections_rate | leader | elections/s | • | • | |
+| consul.raft_leadership_transitions_rate | leadership | transitions/s | • | • | |
+| consul.server_leadership_status | leader, not_leader | status | • | • | |
+| consul.raft_thread_main_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
+| consul.raft_thread_fsm_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage | • | • | |
+| consul.raft_fsm_last_restore_duration | last_restore_duration | ms | • | • | |
+| consul.raft_boltdb_freelist_bytes | freelist | bytes | • | • | |
+| consul.raft_boltdb_logs_per_batch_rate | written | logs/s | • | • | |
+| consul.raft_boltdb_store_logs_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms | • | • | |
+| consul.license_expiration_time | license_expiration | seconds | • | • | • |
+
+### Per node check
+
+Metrics about checks at the node level.
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| datacenter | Datacenter Identifier |
+| node_name | The node's name |
+| check_name | The check's name |
+
+Metrics:
+
+| Metric | Dimensions | Unit | Leader | Follower | Client |
+|:------|:----------|:----|:---:|:---:|:---:|
+| consul.node_health_check_status | passing, maintenance, warning, critical | status | • | • | • |
+
+### Per service check
+
+Metrics about checks at the service level.
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| datacenter | Datacenter Identifier |
+| node_name | The node's name |
+| check_name | The check's name |
+| service_name | The service's name |
+
+Metrics:
+
+| Metric | Dimensions | Unit | Leader | Follower | Client |
+|:------|:----------|:----|:---:|:---:|:---:|
+| consul.service_health_check_status | passing, maintenance, warning, critical | status | • | • | • |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ consul_node_health_check_status ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.node_health_check_status | node health check ${label:check_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_service_health_check_status ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.service_health_check_status | service health check ${label:check_name} for service ${label:service_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_client_rpc_requests_exceeded ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.client_rpc_requests_exceeded_rate | number of rate-limited RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_client_rpc_requests_failed ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.client_rpc_requests_failed_rate | number of failed RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_gc_pause_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.gc_pause_time | time spent in stop-the-world garbage collection pauses on server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_autopilot_health_status ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.autopilot_health_status | datacenter ${label:datacenter} cluster is unhealthy as reported by server ${label:node_name} |
+| [ consul_autopilot_server_health_status ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.autopilot_server_health_status | server ${label:node_name} from datacenter ${label:datacenter} is unhealthy |
+| [ consul_raft_leader_last_contact_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.raft_leader_last_contact_time | median time elapsed since leader server ${label:node_name} datacenter ${label:datacenter} was last able to contact the follower nodes |
+| [ consul_raft_leadership_transitions ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.raft_leadership_transitions_rate | there has been a leadership change and server ${label:node_name} datacenter ${label:datacenter} has become the leader |
+| [ consul_raft_thread_main_saturation ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.raft_thread_main_saturation_perc | average saturation of the main Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_raft_thread_fsm_saturation ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.raft_thread_fsm_saturation_perc | average saturation of the FSM Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
+| [ consul_license_expiration_time ](https://github.com/netdata/netdata/blob/master/src/health/health.d/consul.conf) | consul.license_expiration_time | Consul Enterprise license expiration time on node ${label:node_name} datacenter ${label:datacenter} |
+
+
+## Setup
+
+### Prerequisites
+
+#### Enable Prometheus telemetry
+
+[Enable](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-prometheus_retention_time) telemetry on your Consul agent by setting `prometheus_retention_time` to a value greater than `0`.
+
+
+#### Add required ACLs to Token
+
+Required **only if authentication is enabled**.
+
+| ACL | Endpoint |
+|:---------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `operator:read` | [autopilot health status](https://developer.hashicorp.com/consul/api-docs/operator/autopilot#read-health) |
+| `node:read` | [checks](https://developer.hashicorp.com/consul/api-docs/agent/check#list-checks) |
+| `agent:read`    | [configuration](https://developer.hashicorp.com/consul/api-docs/agent#read-configuration), [metrics](https://developer.hashicorp.com/consul/api-docs/agent#view-metrics), and [LAN coordinates](https://developer.hashicorp.com/consul/api-docs/coordinate#read-lan-coordinates-for-all-nodes) |
+
+
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `go.d/consul.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config go.d/consul.conf
+```
+#### Options
+
+The following options can be defined globally: `update_every`, `autodetection_retry`.
+
+
+<details><summary>All options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Data collection interval in seconds. | 1 | no |
+| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
+| url | Server URL. | http://localhost:8500 | yes |
+| acl_token | ACL token used in every request. | | no |
+| max_checks | Checks processing/charting limit. | | no |
+| max_filter | Checks processing/charting filter. Uses [simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md). | | no |
+| username | Username for basic HTTP authentication. | | no |
+| password | Password for basic HTTP authentication. | | no |
+| proxy_url | Proxy URL. | | no |
+| proxy_username | Username for proxy basic HTTP authentication. | | no |
+| proxy_password | Password for proxy basic HTTP authentication. | | no |
+| timeout | HTTP request timeout in seconds. | 1 | no |
+| method | HTTP request method. | GET | no |
+| body | HTTP request body. | | no |
+| headers | HTTP request headers. | | no |
+| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
+| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
+| tls_ca | Certification authority that the client uses when verifying the server's certificates. | | no |
+| tls_cert | Client TLS certificate. | | no |
+| tls_key | Client TLS key. | | no |
+
+</details>
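+
+As a minimal sketch of how the global options combine with per-job settings, `update_every` and `autodetection_retry` can be set at the top level of `go.d/consul.conf`, and a job may override them. The values `5`, `60` and `10` are illustrative assumptions, not recommendations.
+
+```yaml
+# Global defaults applied to every job in this file (illustrative values).
+update_every: 5
+autodetection_retry: 60
+
+jobs:
+  - name: local
+    url: http://127.0.0.1:8500
+
+  - name: remote
+    url: http://203.0.113.10:8500
+    update_every: 10 # assumed per-job override of the global value
+```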
+
+#### Examples
+
+##### Basic
+
+An example configuration.
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:8500
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
+
+```
+##### Basic HTTP auth
+
+Local server with basic HTTP authentication.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:8500
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
+    username: foo
+    password: bar
+
+```
+</details>
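+
+##### HTTPS with self-signed certificate
+
+A hedged sketch for a Consul agent that serves its HTTP API over TLS with a self-signed certificate. The HTTPS address and port are assumptions; use whatever your agent actually listens on. `tls_skip_verify` disables certificate chain and hostname validation, so setting `tls_ca` is preferable where a CA file is available.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: https://127.0.0.1:8501 # assumed HTTPS listener address
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
+    tls_skip_verify: yes
+```
+</details>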
+
+##### Multi-instance
+
+> **Note**: When you define multiple jobs, their names must be unique.
+
+Collecting metrics from local and remote instances.
+
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:8500
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
+
+  - name: remote
+    url: http://203.0.113.10:8500
+    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"
+
+```
+</details>
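+
+##### Client TLS certificate
+
+If the agent requires client certificates, the `tls_ca`, `tls_cert` and `tls_key` options from the options table can be combined as sketched here. All file paths and the HTTPS address are hypothetical placeholders, not defaults shipped with Netdata or Consul.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: https://127.0.0.1:8501 # assumed HTTPS listener address
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
+    tls_ca: /path/to/ca.pem # hypothetical path: CA used to verify the server
+    tls_cert: /path/to/client.pem # hypothetical path: client certificate
+    tls_key: /path/to/client-key.pem # hypothetical path: client private key
+```
+</details>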
+
+
+
+## Troubleshooting
+
+### Debug Mode
+
+To troubleshoot issues with the `consul` collector, run the `go.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
+  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
+
+  ```bash
+  cd /usr/libexec/netdata/plugins.d/
+  ```
+
+- Switch to the `netdata` user.
+
+  ```bash
+  sudo -u netdata -s
+  ```
+
+- Run the `go.d.plugin` to debug the collector:
+
+  ```bash
+  ./go.d.plugin -d -m consul
+  ```
+
+