From 517a443636daa1e8085cb4e5325524a54e8a8fd7 Mon Sep 17 00:00:00 2001 From: Daniel Baumann Date: Tue, 17 Oct 2023 11:30:23 +0200 Subject: Merging upstream version 1.43.0. Signed-off-by: Daniel Baumann --- collectors/python.d.plugin/riakkv/README.md | 150 +------------- .../python.d.plugin/riakkv/integrations/riakkv.md | 219 +++++++++++++++++++++ 2 files changed, 220 insertions(+), 149 deletions(-) mode change 100644 => 120000 collectors/python.d.plugin/riakkv/README.md create mode 100644 collectors/python.d.plugin/riakkv/integrations/riakkv.md (limited to 'collectors/python.d.plugin/riakkv') diff --git a/collectors/python.d.plugin/riakkv/README.md b/collectors/python.d.plugin/riakkv/README.md deleted file mode 100644 index e822c551e..000000000 --- a/collectors/python.d.plugin/riakkv/README.md +++ /dev/null @@ -1,149 +0,0 @@ - - -# Riak KV collector - -Collects database stats from `/stats` endpoint. - -## Requirements - -- An accessible `/stats` endpoint. See [the Riak KV configuration reference documentation](https://docs.riak.com/riak/kv/2.2.3/configuring/reference/#client-interfaces) - for how to enable this. - -The following charts are included, which are mostly derived from the metrics -listed -[here](https://docs.riak.com/riak/kv/latest/using/reference/statistics-monitoring/index.html#riak-metrics-to-graph). - -1. **Throughput** in operations/s - -- **KV operations** - - gets - - puts - -- **Data type updates** - - counters - - sets - - maps - -- **Search queries** - - queries - -- **Search documents** - - indexed - -- **Strong consistency operations** - - gets - - puts - -2. **Latency** in milliseconds - -- **KV latency** of the past minute - - get (mean, median, 95th / 99th / 100th percentile) - - put (mean, median, 95th / 99th / 100th percentile) - -- **Data type latency** of the past minute - - counter_merge (mean, median, 95th / 99th / 100th percentile) - - set_merge (mean, median, 95th / 99th / 100th percentile) - - map_merge (mean, median, 95th / 99th / 100th percentile) - -- **Search latency** of the past minute - - query (median, min, max, 95th / 99th percentile) - - index (median, min, max, 95th / 99th percentile) - -- **Strong consistency latency** of the past minute - - get (mean, median, 95th / 99th / 100th percentile) - - put (mean, median, 95th / 99th / 100th percentile) - -3. **Erlang VM metrics** - -- **System counters** - - processes - -- **Memory allocation** in MB - - processes.allocated - - processes.used - -4. **General load / health metrics** - -- **Siblings encountered in KV operations** during the past minute - - get (mean, median, 95th / 99th / 100th percentile) - -- **Object size in KV operations** during the past minute in KB - - get (mean, median, 95th / 99th / 100th percentile) - -- **Message queue length** in unprocessed messages - - vnodeq_size (mean, median, 95th / 99th / 100th percentile) - -- **Index operations** encountered by Search - - errors - -- **Protocol buffer connections** - - active - -- **Repair operations coordinated by this node** - - read - -- **Active finite state machines by kind** - - get - - put - - secondary_index - - list_keys - -- **Rejected finite state machines** - - get - - put - -- **Number of writes to Search failed due to bad data format by reason** - - bad_entry - - extract_fail - -## Configuration - -Edit the `python.d/riakkv.conf` configuration file using `edit-config` from the Netdata [config -directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md), which is typically at `/etc/netdata`. - -```bash -cd /etc/netdata # Replace this path with your Netdata config directory, if different -sudo ./edit-config python.d/riakkv.conf -``` - -The module needs to be passed the full URL to Riak's stats endpoint. -For example: - -```yaml -myriak: - url: http://myriak.example.com:8098/stats -``` - -With no explicit configuration given, the module will attempt to connect to -`http://localhost:8098/stats`. - -The default update frequency for the plugin is set to 2 seconds as Riak -internally updates the metrics every second. If we were to update the metrics -every second, the resulting graph would contain odd jitter. -### Troubleshooting - -To troubleshoot issues with the `riakkv` module, run the `python.d.plugin` with the debug option enabled. The -output will give you the output of the data collection job or error messages on why the collector isn't working. - -First, navigate to your plugins directory, usually they are located under `/usr/libexec/netdata/plugins.d/`. If that's -not the case on your system, open `netdata.conf` and look for the setting `plugins directory`. Once you're in the -plugin's directory, switch to the `netdata` user. - -```bash -cd /usr/libexec/netdata/plugins.d/ -sudo su -s /bin/bash netdata -``` - -Now you can manually run the `riakkv` module in debug mode: - -```bash -./python.d.plugin riakkv debug trace -``` - diff --git a/collectors/python.d.plugin/riakkv/README.md b/collectors/python.d.plugin/riakkv/README.md new file mode 120000 index 000000000..f43ece09b --- /dev/null +++ b/collectors/python.d.plugin/riakkv/README.md @@ -0,0 +1 @@ +integrations/riakkv.md \ No newline at end of file diff --git a/collectors/python.d.plugin/riakkv/integrations/riakkv.md b/collectors/python.d.plugin/riakkv/integrations/riakkv.md new file mode 100644 index 000000000..f83def446 --- /dev/null +++ b/collectors/python.d.plugin/riakkv/integrations/riakkv.md @@ -0,0 +1,219 @@ + + +# RiakKV + + + + + +Plugin: python.d.plugin +Module: riakkv + + + +## Overview + +This collector monitors RiakKV metrics about throughput, latency, resources and more.' + + +This collector reads the database stats from the `/stats` endpoint. + +This collector is supported on all platforms. + +This collector supports collecting metrics from multiple instances of this integration, including remote instances. + + +### Default Behavior + +#### Auto-Detection + +If the /stats endpoint is accessible, RiakKV instances on the local host running on port 8098 will be autodetected. + +#### Limits + +The default configuration for this integration does not impose any limits on data collection. + +#### Performance Impact + +The default configuration for this integration is not expected to impose a significant performance impact on the system. + + +## Metrics + +Metrics grouped by *scope*. + +The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels. + + + +### Per RiakKV instance + +These metrics refer to the entire monitored application. + +This scope has no labels. + +Metrics: + +| Metric | Dimensions | Unit | +|:------|:----------|:----| +| riak.kv.throughput | gets, puts | operations/s | +| riak.dt.vnode_updates | counters, sets, maps | operations/s | +| riak.search | queries | queries/s | +| riak.search.documents | indexed | documents/s | +| riak.consistent.operations | gets, puts | operations/s | +| riak.kv.latency.get | mean, median, 95, 99, 100 | ms | +| riak.kv.latency.put | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.counter_merge | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.set_merge | mean, median, 95, 99, 100 | ms | +| riak.dt.latency.map_merge | mean, median, 95, 99, 100 | ms | +| riak.search.latency.query | median, min, 95, 99, 999, max | ms | +| riak.search.latency.index | median, min, 95, 99, 999, max | ms | +| riak.consistent.latency.get | mean, median, 95, 99, 100 | ms | +| riak.consistent.latency.put | mean, median, 95, 99, 100 | ms | +| riak.vm | processes | total | +| riak.vm.memory.processes | allocated, used | MB | +| riak.kv.siblings_encountered.get | mean, median, 95, 99, 100 | siblings | +| riak.kv.objsize.get | mean, median, 95, 99, 100 | KB | +| riak.search.vnodeq_size | mean, median, 95, 99, 100 | messages | +| riak.search.index | errors | errors | +| riak.core.protobuf_connections | active | connections | +| riak.core.repairs | read | repairs | +| riak.core.fsm_active | get, put, secondary index, list keys | fsms | +| riak.core.fsm_rejected | get, put | fsms | +| riak.search.index | bad_entry, extract_fail | writes | + + + +## Alerts + + +The following alerts are available: + +| Alert name | On metric | Description | +|:------------|:----------|:------------| +| [ riakkv_1h_kv_get_mean_latency ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.get | average time between reception of client GET request and subsequent response to client over the last hour | +| [ riakkv_kv_get_slow ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.get | average time between reception of client GET request and subsequent response to the client over the last 3 minutes, compared to the average over the last hour | +| [ riakkv_1h_kv_put_mean_latency ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.put | average time between reception of client PUT request and subsequent response to the client over the last hour | +| [ riakkv_kv_put_slow ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.kv.latency.put | average time between reception of client PUT request and subsequent response to the client over the last 3 minutes, compared to the average over the last hour | +| [ riakkv_vm_high_process_count ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.vm | number of processes running in the Erlang VM | +| [ riakkv_list_keys_active ](https://github.com/netdata/netdata/blob/master/health/health.d/riakkv.conf) | riak.core.fsm_active | number of currently running list keys finite state machines | + + +## Setup + +### Prerequisites + +#### Configure RiakKV to enable /stats endpoint + +You can follow the RiakKV configuration reference documentation for how to enable this. + +Source : https://docs.riak.com/riak/kv/2.2.3/configuring/reference/#client-interfaces + + + +### Configuration + +#### File + +The configuration file name for this integration is `python.d/riakkv.conf`. + + +You can edit the configuration file using the `edit-config` script from the +Netdata [config directory](https://github.com/netdata/netdata/blob/master/docs/configure/nodes.md#the-netdata-config-directory). + +```bash +cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata +sudo ./edit-config python.d/riakkv.conf +``` +#### Options + +There are 2 sections: + +* Global variables +* One or more JOBS that can define multiple different instances to monitor. + +The following options can be defined globally: priority, penalty, autodetection_retry, update_every, but can also be defined per JOB to override the global values. + +Additionally, the following collapsed table contains all the options that can be configured inside a JOB definition. + +Every configuration JOB starts with a `job_name` value which will appear in the dashboard, unless a `name` parameter is specified. + + +
Config options + +| Name | Description | Default | Required | +|:----|:-----------|:-------|:--------:| +| update_every | Sets the default data collection frequency. | 5 | False | +| priority | Controls the order of charts at the netdata dashboard. | 60000 | False | +| autodetection_retry | Sets the job re-check interval in seconds. | 0 | False | +| penalty | Indicates whether to apply penalty to update_every in case of failures. | yes | False | +| url | The url of the server | no | True | + +
+ +#### Examples + +##### Basic (default) + +A basic example configuration per job + +```yaml +local: +url: 'http://localhost:8098/stats' + +``` +##### Multi-instance + +> **Note**: When you define multiple jobs, their names must be unique. + +Collecting metrics from local and remote instances. + + +
Config + +```yaml +local: + url: 'http://localhost:8098/stats' + +remote: + url: 'http://192.0.2.1:8098/stats' + +``` +
+ + + +## Troubleshooting + +### Debug Mode + +To troubleshoot issues with the `riakkv` collector, run the `python.d.plugin` with the debug option enabled. The output +should give you clues as to why the collector isn't working. + +- Navigate to the `plugins.d` directory, usually at `/usr/libexec/netdata/plugins.d/`. If that's not the case on + your system, open `netdata.conf` and look for the `plugins` setting under `[directories]`. + + ```bash + cd /usr/libexec/netdata/plugins.d/ + ``` + +- Switch to the `netdata` user. + + ```bash + sudo -u netdata -s + ``` + +- Run the `python.d.plugin` to debug the collector: + + ```bash + ./python.d.plugin riakkv debug trace + ``` + + -- cgit v1.2.3