From 5da14042f70711ea5cf66e034699730335462f66 Mon Sep 17 00:00:00 2001
From: Daniel Baumann
Date: Sun, 5 May 2024 14:08:03 +0200
Subject: Merging upstream version 1.45.3+dfsg.

Signed-off-by: Daniel Baumann
---
 .../elasticsearch/integrations/elasticsearch.md | 343 +++++++++++++++++++++
 1 file changed, 343 insertions(+)
 create mode 100644 src/go/collectors/go.d.plugin/modules/elasticsearch/integrations/elasticsearch.md

(limited to 'src/go/collectors/go.d.plugin/modules/elasticsearch/integrations/elasticsearch.md')

diff --git a/src/go/collectors/go.d.plugin/modules/elasticsearch/integrations/elasticsearch.md b/src/go/collectors/go.d.plugin/modules/elasticsearch/integrations/elasticsearch.md
new file mode 100644
index 000000000..9978bf073
--- /dev/null
+++ b/src/go/collectors/go.d.plugin/modules/elasticsearch/integrations/elasticsearch.md
@@ -0,0 +1,343 @@
+
+# Elasticsearch
+
+
+
+
+
+Plugin: go.d.plugin
+Module: elasticsearch
+
+
+
+## Overview
+
+This collector monitors the performance and health of the Elasticsearch cluster.
+
+
+It uses [Cluster APIs](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster.html) to collect metrics.
+
+Used endpoints:
+
+| Endpoint | Description | API |
+|------------------------|----------------------|-------------------------------------------------------------------------------------------------------------|
+| `/` | Node info | |
+| `/_nodes/stats` | Nodes metrics | [Nodes stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html) |
+| `/_nodes/_local/stats` | Local node metrics | [Nodes stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html) |
+| `/_cluster/health` | Cluster health stats | [Cluster health API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html) |
+| `/_cluster/stats` | Cluster metrics | [Cluster stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-stats.html) |
+
+
+This collector is supported on all platforms.
+
+This collector supports collecting metrics from multiple instances of this integration, including remote instances.
+
+
+### Default Behavior
+
+#### Auto-Detection
+
+By default, it detects instances running on localhost by attempting to connect to port 9200:
+
+- http://127.0.0.1:9200
+- https://127.0.0.1:9200
+
+
+#### Limits
+
+By default, this collector monitors only the node it is connected to. To monitor all cluster nodes, set the `cluster_mode` configuration option to `yes`.
+
+
+#### Performance Impact
+
+The default configuration for this integration is not expected to impose a significant performance impact on the system.
+
+
+## Metrics
+
+Metrics grouped by *scope*.
+
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
+
+
+
+### Per node
+
+These metrics refer to the cluster node.
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| cluster_name | Name of the cluster. Based on the [Cluster name setting](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#cluster-name). |
+| node_name | Human-readable identifier for the node. Based on the [Node name setting](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#node-name). |
+| host | Network host for the node, based on the [Network host setting](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#network.host). |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| elasticsearch.node_indices_indexing | index | operations/s |
+| elasticsearch.node_indices_indexing_current | index | operations |
+| elasticsearch.node_indices_indexing_time | index | milliseconds |
+| elasticsearch.node_indices_search | queries, fetches | operations/s |
+| elasticsearch.node_indices_search_current | queries, fetches | operations |
+| elasticsearch.node_indices_search_time | queries, fetches | milliseconds |
+| elasticsearch.node_indices_refresh | refresh | operations/s |
+| elasticsearch.node_indices_refresh_time | refresh | milliseconds |
+| elasticsearch.node_indices_flush | flush | operations/s |
+| elasticsearch.node_indices_flush_time | flush | milliseconds |
+| elasticsearch.node_indices_fielddata_memory_usage | used | bytes |
+| elasticsearch.node_indices_fielddata_evictions | evictions | operations/s |
+| elasticsearch.node_indices_segments_count | segments | segments |
+| elasticsearch.node_indices_segments_memory_usage_total | used | bytes |
+| elasticsearch.node_indices_segments_memory_usage | terms, stored_fields, term_vectors, norms, points, doc_values, index_writer, version_map, fixed_bit_set | bytes |
+| elasticsearch.node_indices_translog_operations | total, uncommitted | operations |
+| elasticsearch.node_indices_translog_size | total, uncommitted | bytes |
+| elasticsearch.node_file_descriptors | open | fd |
+| elasticsearch.node_jvm_heap | inuse | percentage |
+| elasticsearch.node_jvm_heap_bytes | committed, used | bytes |
+| elasticsearch.node_jvm_buffer_pools_count | direct, mapped | pools |
+| elasticsearch.node_jvm_buffer_pool_direct_memory | total, used | bytes |
+| elasticsearch.node_jvm_buffer_pool_mapped_memory | total, used | bytes |
+| elasticsearch.node_jvm_gc_count | young, old | gc/s |
+| elasticsearch.node_jvm_gc_time | young, old | milliseconds |
+| elasticsearch.node_thread_pool_queued | generic, search, search_throttled, get, analyze, write, snapshot, warmer, refresh, listener, fetch_shard_started, fetch_shard_store, flush, force_merge, management | threads |
+| elasticsearch.node_thread_pool_rejected | generic, search, search_throttled, get, analyze, write, snapshot, warmer, refresh, listener, fetch_shard_started, fetch_shard_store, flush, force_merge, management | threads |
+| elasticsearch.node_cluster_communication_packets | received, sent | pps |
+| elasticsearch.node_cluster_communication_traffic | received, sent | bytes/s |
+| elasticsearch.node_http_connections | open | connections |
+| elasticsearch.node_breakers_trips | requests, fielddata, in_flight_requests, model_inference, accounting, parent | trips/s |
+
+### Per cluster
+
+These metrics refer to the cluster.
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| cluster_name | Name of the cluster. Based on the [Cluster name setting](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#cluster-name). |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| elasticsearch.cluster_health_status | green, yellow, red | status |
+| elasticsearch.cluster_number_of_nodes | nodes, data_nodes | nodes |
+| elasticsearch.cluster_shards_count | active_primary, active, relocating, initializing, unassigned, delayed_unaasigned | shards |
+| elasticsearch.cluster_pending_tasks | pending | tasks |
+| elasticsearch.cluster_number_of_in_flight_fetch | in_flight_fetch | fetches |
+| elasticsearch.cluster_indices_count | indices | indices |
+| elasticsearch.cluster_indices_shards_count | total, primaries, replication | shards |
+| elasticsearch.cluster_indices_docs_count | docs | docs |
+| elasticsearch.cluster_indices_store_size | size | bytes |
+| elasticsearch.cluster_indices_query_cache | hit, miss | events/s |
+| elasticsearch.cluster_nodes_by_role_count | coordinating_only, data, data_cold, data_content, data_frozen, data_hot, data_warm, ingest, master, ml, remote_cluster_client, voting_only | nodes |
+
+### Per index
+
+These metrics refer to the index.
+
+Labels:
+
+| Label | Description |
+|:-----------|:----------------|
+| cluster_name | Name of the cluster. Based on the [Cluster name setting](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#cluster-name). |
+| index | Name of the index. |
+
+Metrics:
+
+| Metric | Dimensions | Unit |
+|:------|:----------|:----|
+| elasticsearch.node_index_health | green, yellow, red | status |
+| elasticsearch.node_index_shards_count | shards | shards |
+| elasticsearch.node_index_docs_count | docs | docs |
+| elasticsearch.node_index_store_size | store_size | bytes |
+
+
+
+## Alerts
+
+
+The following alerts are available:
+
+| Alert name | On metric | Description |
+|:------------|:----------|:------------|
+| [ elasticsearch_node_indices_search_time_query ](https://github.com/netdata/netdata/blob/master/src/health/health.d/elasticsearch.conf) | elasticsearch.node_indices_search_time | search performance is degraded, queries run slowly. |
+| [ elasticsearch_node_indices_search_time_fetch ](https://github.com/netdata/netdata/blob/master/src/health/health.d/elasticsearch.conf) | elasticsearch.node_indices_search_time | search performance is degraded, fetches run slowly. |
+| [ elasticsearch_cluster_health_status_red ](https://github.com/netdata/netdata/blob/master/src/health/health.d/elasticsearch.conf) | elasticsearch.cluster_health_status | cluster health status is red. |
+| [ elasticsearch_cluster_health_status_yellow ](https://github.com/netdata/netdata/blob/master/src/health/health.d/elasticsearch.conf) | elasticsearch.cluster_health_status | cluster health status is yellow. |
+| [ elasticsearch_node_index_health_red ](https://github.com/netdata/netdata/blob/master/src/health/health.d/elasticsearch.conf) | elasticsearch.node_index_health | node index $label:index health status is red. |
+
+
+## Setup
+
+### Prerequisites
+
+No action required.
+
+### Configuration
+
+#### File
+
+The configuration file name for this integration is `go.d/elasticsearch.conf`.
+
+
+You can edit the configuration file using the `edit-config` script from the
+Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration.md#the-netdata-config-directory).
+
+```bash
+cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
+sudo ./edit-config go.d/elasticsearch.conf
+```
+#### Options
+
+The following options can be defined globally: update_every, autodetection_retry.
+
+
+<details><summary>Config options</summary>
+
+| Name | Description | Default | Required |
+|:----|:-----------|:-------|:--------:|
+| update_every | Data collection frequency. | 5 | no |
+| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
+| url | Server URL. | http://127.0.0.1:9200 | yes |
+| cluster_mode | Controls whether to collect metrics for all nodes in the cluster or only for the local node. | false | no |
+| collect_node_stats | Controls whether to collect nodes metrics. | true | no |
+| collect_cluster_health | Controls whether to collect cluster health metrics. | true | no |
+| collect_cluster_stats | Controls whether to collect cluster stats metrics. | true | no |
+| collect_indices_stats | Controls whether to collect indices metrics. | false | no |
+| timeout | HTTP request timeout. | 2 | no |
+| username | Username for basic HTTP authentication. | | no |
+| password | Password for basic HTTP authentication. | | no |
+| proxy_url | Proxy URL. | | no |
+| proxy_username | Username for proxy basic HTTP authentication. | | no |
+| proxy_password | Password for proxy basic HTTP authentication. | | no |
+| method | HTTP request method. | GET | no |
+| body | HTTP request body. | | no |
+| headers | HTTP request headers. | | no |
+| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
+| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
+| tls_ca | Certification authority that the client uses when verifying the server's certificates. | | no |
+| tls_cert | Client TLS certificate. | | no |
+| tls_key | Client TLS key. | | no |
+
+</details>
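+
+For illustration, several of the options above can be combined in a single job. This is a minimal sketch, not a recommended setup: the job name and the chosen values are assumptions, and `timeout` is assumed to be expressed in seconds.
+
+```yaml
+jobs:
+  - name: local_with_indices    # hypothetical job name
+    url: http://127.0.0.1:9200
+    update_every: 10            # illustrative value; the default is 5
+    timeout: 5                  # illustrative value; the default is 2
+    collect_indices_stats: yes  # per-index metrics are disabled by default
+```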
+
+#### Examples
+
+##### Basic single node mode
+
+A basic example configuration.
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:9200
+
+```
+##### Cluster mode
+
+Cluster mode example configuration.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:9200
+    cluster_mode: yes
+
+```
+</details>
+
+##### HTTP authentication
+
+Basic HTTP authentication.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:9200
+    username: username
+    password: password
+
+```
+</details>
+
+##### HTTPS with self-signed certificate
+
+Elasticsearch with enabled HTTPS and self-signed certificate.
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: https://127.0.0.1:9200
+    tls_skip_verify: yes
+
+```
+</details>
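+
+Where the certificate authority that signed the server certificate is available, verification can stay enabled by setting `tls_ca` rather than `tls_skip_verify`. A minimal sketch, assuming `tls_ca` takes a file path and using a hypothetical CA bundle location:
+
+```yaml
+jobs:
+  - name: local
+    url: https://127.0.0.1:9200
+    # Hypothetical path; replace with the CA file that signed the server certificate.
+    tls_ca: /path/to/ca.pem
+```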
+
+##### Multi-instance
+
+> **Note**: When you define multiple jobs, their names must be unique.
+
+Collecting metrics from local and remote instances.
+
+
+<details><summary>Config</summary>
+
+```yaml
+jobs:
+  - name: local
+    url: http://127.0.0.1:9200
+
+  - name: remote
+    url: http://192.0.2.1:9200
+
+```
+</details>
+
+
+
+## Troubleshooting
+
+### Debug Mode
+
+To troubleshoot issues with the `elasticsearch` collector, run the `go.d.plugin` with the debug option enabled. The output
+should give you clues as to why the collector isn't working.
+
+- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on
+  your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`.
+
+  ```bash
+  cd /usr/libexec/netdata/plugins.d/
+  ```
+
+- Switch to the `netdata` user.
+
+  ```bash
+  sudo -u netdata -s
+  ```
+
+- Run the `go.d.plugin` to debug the collector:
+
+  ```bash
+  ./go.d.plugin -d -m elasticsearch
+  ```
+
+
--
cgit v1.2.3